### Hashing, Chaining, Dictionaries

#### Dictionaries
- **A**bstract **D**ata **T**ype
- Maintain set of items, each with a *key*
- Operations:
  1. ```insert(item)``` - Overwrite existing keys
  2. ```delete(item)``` - Delete key if exists, else raise error
  3. ```search(key)``` - Return item with given key, else raise error
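
The three operations above can be mirrored with Python's built-in `dict`, which is a handy reference for the expected behaviour. This is only a sketch: `DictADT` is a hypothetical wrapper introduced here for illustration, and it takes a key and a value separately rather than a single item.

```python
class DictADT:
    """Hypothetical wrapper exposing the three dictionary operations."""

    def __init__(self):
        self._items = {}

    def insert(self, key, value):
        # Overwrites the stored value if the key already exists
        self._items[key] = value

    def delete(self, key):
        # Raises KeyError if the key is absent
        del self._items[key]

    def search(self, key):
        # Raises KeyError if the key is absent
        return self._items[key]


d = DictADT()
d.insert("apple", 1)
d.insert("apple", 2)       # same key: the old value is replaced
print(d.search("apple"))   # 2
d.delete("apple")
```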

#### Prehashing
- Maps keys to non-negative integers
- In theory, keys are *finite* and *discrete* (string of bits)
- Ideally, ```prehash(x) == prehash(y)``` holds iff **x==y**
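
As a rough illustration of the "string of bits" view, a string can be prehashed by reading its bytes as one large integer. The function name `prehash_str` and the leading `0x01` marker byte are assumptions made for this sketch; the cells below use Python's built-in `hash()` instead, which does not guarantee distinct values for distinct keys.

```python
def prehash_str(s):
    """Map a string to a non-negative integer, distinct for distinct strings.

    A 0x01 byte is prepended so that leading NUL characters are not lost
    when the bytes are interpreted as a big-endian integer.
    """
    data = b"\x01" + s.encode("utf-8")
    return int.from_bytes(data, "big")


print(prehash_str("ab"))                        # 90466
print(prehash_str("ab") == prehash_str("ba"))   # False
```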

#### Hashing
- Reduce the universe of all keys (integers) down to a reasonable size **m** for the table. Let this function be h()
- **m** = Θ(n), where n is the number of keys stored
- if h(k<sub>1</sub>) == h(k<sub>2</sub>) and k<sub>1</sub> != k<sub>2</sub>, this is called a **collision**
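
A small sketch of a collision, using the division method h(k) = k mod m as a stand-in hash function. The name `division_hash` and the table size of 7 are assumptions chosen for illustration only.

```python
def division_hash(k, m):
    """Division-method hash: reduce an integer key to a slot in range(m)."""
    return k % m


k1, k2 = 10, 24
print(division_hash(k1, 7), division_hash(k2, 7))   # 3 3
# Distinct keys, same slot: this is a collision
print(k1 != k2 and division_hash(k1, 7) == division_hash(k2, 7))   # True
```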

#### Chaining
- Based on linked lists
- Colliding elements are stored as a linked list

#### Simple uniform hashing
**Assumption** : Each key is equally likely to hash to any slot of the table, independent of where the other keys hash

Expected length of a chain for *n* keys and *m* slots in the table is $\alpha = n/m$ (the load factor)
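
The n/m figure can be checked empirically. The sketch below simulates simple uniform hashing by throwing n keys into m slots uniformly at random and averaging the length of one fixed chain over many trials. The function name, trial count and seed are arbitrary choices for illustration.

```python
import random


def mean_length_of_slot(n, m, slot=0, trials=2000, seed=0):
    """Average length of one chain when n keys land in m slots uniformly."""
    rng = random.Random(seed)
    total = 0
    for _trial in range(trials):
        # Count how many of the n keys fall into the chosen slot
        total += sum(1 for _key in range(n) if rng.randrange(m) == slot)
    return total / trials


print(50 / 10)                            # load factor: 5.0
print(mean_length_of_slot(n=50, m=10))    # close to 5.0
```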



In [27]:
import random

In [8]:
class Node():
    def __init__(self, key=None, value=None):
        self.key=key
        self.value=value
        self.next=None

In [72]:
def universal_hashing_function(k):
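    '''
    Implements the universal hashing function
    h(k)=[(ak+b) % p] % m
    k is the key to be hashed
    p is a large prime > U (size of the universe of keys)
    a is a random integer in [1, p-1] and b is a random integer in [0, p-1]
    m is the size of the table
    a, b, p and m are read from module-level globals defined in a later cell
    '''
    # Prehash: Python's built-in hash() maps the key to an integer
    k=hash(k)
    return ((a*k + b) % p) % m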
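
The same formula can also be packaged without globals. The sketch below is a hypothetical variant, not the notebook's own function: `make_universal_hash` draws a and b once and returns a closure. It assumes non-negative integer keys smaller than p, with p = 2**31 - 1 (a known prime) as the default.

```python
import random


def make_universal_hash(m, p=2_147_483_647, rng=random):
    """Return h(k) = ((a*k + b) % p) % m with a and b drawn once at random."""
    a = rng.randrange(1, p)   # a must be nonzero
    b = rng.randrange(0, p)

    def h(k):
        return ((a * k + b) % p) % m

    return h


h = make_universal_hash(m=10)
print(h(12), h(345))   # two slots in range(10); values vary per run

# For two fixed distinct keys, the collision rate over random choices of
# (a, b) should come out at roughly 1/m
rng = random.Random(0)
trials = 20000
hits = 0
for _ in range(trials):
    g = make_universal_hash(m=10, rng=rng)
    hits += g(12) == g(345)
print(hits / trials)
```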


In [94]:
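def insert(key, value):
    # Returns 1 if an existing key was overwritten, else 0
    key_hash=universal_hashing_function(key)
    key_collision=0
    if not dictionary[key_hash]:
        dictionary[key_hash]=Node(key, value)
    else:
        temp_node=dictionary[key_hash]
        prev=None
        while(temp_node):
            if temp_node.key==key:
                temp_node.value=value
                key_collision=1
                break
            else:
                # Remember the last node so a new one can be appended to it
                prev=temp_node
                temp_node=temp_node.next
        if not key_collision:
            # temp_node is None here; append to the tail of the chain
            prev.next=Node(key, value)
    return key_collision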

In [74]:
def getitem(key):
    key_hash=universal_hashing_function(key)
    if not dictionary[key_hash]:
        raise KeyError("Key does not exist")
    else:
        temp_node=dictionary[key_hash]
        while(temp_node):
            if temp_node.key==key:
                return temp_node.value
            else:
                temp_node=temp_node.next
        raise KeyError("Key does not exist")

In [75]:
def delete(key):
    key_hash=universal_hashing_function(key)
    if not dictionary[key_hash]:
        raise KeyError("Key does not exist")
    else:
        temp_node=dictionary[key_hash]
        prev=None
        while(temp_node):
            if temp_node.key==key:
                if not prev:
                    dictionary[key_hash]=temp_node.next
                else:
                    prev.next=temp_node.next
                return
            else:
                prev=temp_node
                temp_node=temp_node.next
        raise KeyError("Key does not exist")

In [52]:
def show():
    for idx in range(len(dictionary)):
        entry=dictionary[idx]
        if entry:
            temp=entry
            while(temp):
                print("{{ {} : {} }}".format(temp.key, temp.value), end=", ")
                temp=temp.next

#### Large Prime Number Generation
The following cell implements a large prime number generator based on the Miller-Rabin primality test. The code is taken from [this link](https://medium.com/@prudywsh/how-to-generate-big-prime-numbers-miller-rabin-49e6e6af32fb)

In [24]:
from random import randrange, getrandbits
def is_prime(n, k=128):
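    """ Test if a number is prime (Miller-Rabin, probabilistic)
        Args:
            n -- int -- the number to test
            k -- int -- the number of tests to do
        return True if n is probably prime, False if n is composite
    """
    # Handle the small primes 2 and 3 directly,
    # then reject n <= 1 and every other even number.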
    if n == 2 or n == 3:
        return True
    if n <= 1 or n % 2 == 0:
        return False
    # find r and s
    s = 0
    r = n - 1
    while r & 1 == 0:
        s += 1
        r //= 2
    # do k tests
    for _ in range(k):
        a = randrange(2, n - 1)
        x = pow(a, r, n)
        if x != 1 and x != n - 1:
            j = 1
            while j < s and x != n - 1:
                x = pow(x, 2, n)
                if x == 1:
                    return False
                j += 1
            if x != n - 1:
                return False
    return True
def generate_prime_candidate(length):
    """ Generate an odd integer randomly
        Args:
            length -- int -- the length of the number to generate, in bits
        return an integer
    """
    # generate random bits
    p = getrandbits(length)
    # apply a mask to set MSB and LSB to 1
    p |= (1 << length - 1) | 1
    return p
def generate_prime_number(length=1024):
    """ Generate a prime
        Args:
            length -- int -- length of the prime to generate, in bits
        return a prime
    """
    p = 4
    # keep generating while the primality test fails
    while not is_prime(p, 64):
        p = generate_prime_candidate(length)
    return p

In [97]:
m=1000
p=generate_prime_number(length=128)
# a must be nonzero; otherwise every key would hash to the same slot
a,b=random.randint(1,p-1), random.randint(0,p-1)
dictionary=[None for _ in range(m)]

In [98]:
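# insert() returns 1 when the key already exists and its value is overwritten,
# so c counts repeated keys rather than distinct keys sharing a slot
c=0
for _ in range(50):
    c+=insert(key=random.randint(0,100), value=random.random())
print("{} collisions".format(c))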

11 collisions


In [100]:
show()

{ 93 : 0.11736287184252514 }, { 87 : 0.407841307989703 }, { 41 : 0.10999523340638151 }, { 95 : 0.7786571451985072 }, { 45 : 0.3411739410674578 }, { 92 : 0.9800387466963579 }, { 31 : 0.3643077573522411 }, { 35 : 0.13952374830847702 }, { 28 : 0.15788370512702476 }, { 82 : 0.0033165954375194984 }, { 75 : 0.1463276267028083 }, { 39 : 0.14321754910912732 }, { 86 : 0.38583691569482725 }, { 79 : 0.6814294414692729 }, { 36 : 0.5835915470309202 }, { 22 : 0.5070119855708576 }, { 26 : 0.016311742179795252 }, { 23 : 0.13778941957572943 }, { 77 : 0.12247972886867553 }, { 63 : 0.5377870209516074 }, { 27 : 0.0806751858863981 }, { 13 : 0.7622952254024683 }, { 24 : 0.5089521097763241 }, { 71 : 0.5271012696135149 }, { 64 : 0.854599561168042 }, { 21 : 0.8971324295723885 }, { 68 : 0.619753056361715 }, { 7 : 0.9245424961414358 }, { 18 : 0.05595102735937674 }, { 58 : 0.5508895653656204 }, { 15 : 0.2182296867595075 }, { 62 : 0.775066016802474 }, { 12 : 0.0917282562996844 }, { 59 : 0.3236011085896672 }, { 52 

In [101]:
delete(32)

KeyError: 'Key does not exist'

In [102]:
getitem(12)

0.0917282562996844