### Hashing, Chaining, Dictionaries

#### Dictionaries
- **A**bstract **D**ata **T**ype
- Maintain set of items, each with a *key*
- Operations:
  1. ```insert(item)``` - Add item, overwriting any existing item with the same key
  2. ```delete(item)``` - Remove the item if its key exists, else raise an error
  3. ```search(key)``` - Return the item with the given key, else raise an error
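
The three operations can be sketched with Python's built-in `dict` standing in for the ADT. This is only an illustration of the interface semantics; the free functions `insert`, `delete` and `search` and the `table` variable are names assumed for this sketch, not part of the class defined later.

```python
# Illustrative only: a built-in dict stands in for the dictionary ADT.
def insert(table, key, value):
    """Store value under key, overwriting any existing entry."""
    table[key] = value

def delete(table, key):
    """Remove key; raises KeyError if it is absent."""
    del table[key]

def search(table, key):
    """Return the value stored under key; raises KeyError if it is absent."""
    return table[key]

table = {}
insert(table, "apple", 1)
insert(table, "apple", 2)        # same key: the old value is overwritten
found = search(table, "apple")   # value from the second insert
delete(table, "apple")           # a later search("apple") would raise KeyError
```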

#### Prehashing
- Maps keys to non-negative integers
- In theory, keys are *finite* and *discrete* (string of bits)
- ```prehash(x) == prehash(y)``` should hold iff **x==y**
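
As a minimal sketch of a prehash that satisfies this condition for strings, the bytes of the string can be read as one large integer. The function name `prehash` and the `0x01` prefix byte are choices made for this example. Python's built-in `hash()`, which the class below uses as its prehash, maps into a fixed-width integer range and so cannot guarantee the condition for all keys.

```python
def prehash(text):
    """Map a string to a non-negative integer, distinct for distinct strings."""
    data = text.encode("utf-8")
    # Prefix a 0x01 byte so that strings starting with NUL characters
    # do not collapse onto shorter strings.
    return int.from_bytes(b"\x01" + data, "big")

x = prehash("ab")
y = prehash("ba")
same = prehash("ab")
```

Here `x` and `y` differ because the byte order differs, while `same` equals `x`.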

#### Hashing
- Reduce the universe of all keys (integers) down to a reasonable size **m** for the table. Let this function be h()
- **m** = Θ(n), where n is the number of keys stored
- If h(k<sub>1</sub>) == h(k<sub>2</sub>) and k<sub>1</sub> != k<sub>2</sub>, this is called a **collision**
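
A small sketch of a collision, assuming the simple division method h(k) = k mod m with a table size of 7. Both the method and the numbers are chosen only for illustration; the class below uses a universal hash function instead.

```python
m = 7  # assumed table size for this example

def h(k):
    """Division-method hash: reduce a non-negative integer key to a slot in [0, m)."""
    return k % m

k1, k2 = 10, 24
# Both keys leave remainder 3 when divided by 7, so they land in the same slot.
collision = (h(k1) == h(k2)) and (k1 != k2)
```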

#### Chaining
- Based on linked lists
- Colliding elements are stored in the same slot as a linked list (a chain)

#### Simple uniform hashing
**Assumption**: Each key is equally likely to hash to any slot of the table, independently of where the other keys hash.

Expected length of a chain for *n* keys and *m* slots in the table is $\alpha = n/m$, the load factor.
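
The n/m figure can be checked empirically. The sketch below models the assumption directly by dropping each of n keys into a uniformly random slot, then averages the length of one fixed chain (slot 0) over many trials. The function name `mean_chain_length`, the trial count and the seed are assumptions of this example.

```python
import random

def mean_chain_length(n, m, trials=2000, seed=0):
    """Estimate the expected length of the chain in slot 0 under simple uniform hashing."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each of the n keys independently picks one of the m slots.
        total += sum(1 for _ in range(n) if rng.randrange(m) == 0)
    return total / trials

estimate = mean_chain_length(n=100, m=10)
alpha = 100 / 10  # load factor n/m
```

With these parameters `estimate` should come out close to `alpha`, and the gap shrinks as the number of trials grows.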



In [None]:
import random

In [None]:
class Node():
    def __init__(self, key=None, value=None):
        self.key=key
        self.value=value
        self.next=None

#### Large Prime Number Generation
The Dictionary class uses a large prime number generator based on the Miller-Rabin primality test. It is adapted from the code at [this link](https://medium.com/@prudywsh/how-to-generate-big-prime-numbers-miller-rabin-49e6e6af32fb)

In [None]:
class Dictionary():
    
    def __init__(self, m=1000):
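        self.m=m
        self.data=[None for _ in range(m)]
        # the generator is a method, so it must be called through self
        self.p=self.generate_prime_number(length=256)
        # a is drawn non-zero; uses the module-level `import random`
        self.a, self.b= random.randint(1,self.p-1), random.randint(0,self.p-1)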
        self._length_=0
        self._alpha_=0
    
    def is_prime(self, n, k=128):
        """ Test if a number is prime
            Args:
                n -- int -- the number to test
                k -- int -- the number of tests to do
            return True if n is probably prime (Miller-Rabin test)
        """
        if n == 2 or n == 3:
            return True
        if n <= 1 or n % 2 == 0:
            return False
        # write n - 1 as 2^s * r with r odd
        s = 0
        r = n - 1
        while r & 1 == 0:
            s += 1
            r //= 2
        # Run k tests
        for _ in range(k):
            a = random.randrange(2, n - 1)
            x = pow(a, r, n)
            if x != 1 and x != n - 1:
                j = 1
                while j < s and x != n - 1:
                    x = pow(x, 2, n)
                    if x == 1:
                        return False
                    j += 1
                if x != n - 1:
                    return False
        return True
    
    def generate_prime_candidate(self, length):
        """ Generate an odd integer randomly
            Args:
                length -- int -- the length of the number to generate, in bits
            return an integer
        """
        # generate random bits
        p = random.getrandbits(length)
        # apply a mask to set MSB and LSB to 1
        p |= (1 << (length - 1)) | 1
        return p
    
    def generate_prime_number(self, length=1024):
        """ Generate a prime
            Args:
                length -- int -- length of the prime to generate, in bits
            return a prime
        """
        p = 4
        # keep generating while the primality test fails
        while not self.is_prime(p, 64):
            p = self.generate_prime_candidate(length)
        return p
    
    def universal_hashing_function(self, k):
        '''
        Implements the universal hashing function
        h(k)=[(ak+b) % p] % m
        k is the key to be hashed
        p is a large prime > U (size of the universe of keys)
        a,b are random non-negative integers smaller than p
        m is the size of the table
        '''
        k=hash(k)
        return ((self.a*k + self.b) % self.p) % self.m
    
    def insert(self, key, value):
        """Insert or overwrite key; return 1 if a new node was chained behind an existing one, else 0."""
        key_hash=self.universal_hashing_function(key)
        chain=0
        if not self.data[key_hash]:
            self.data[key_hash]=Node(key, value)
        else:
            temp_node=self.data[key_hash]
            prev=None
            while(temp_node):
                if temp_node.key==key:
                    # key already present: overwrite and leave the length unchanged
                    temp_node.value=value
                    return chain
                prev=temp_node
                temp_node=temp_node.next
            # key not found in the chain: append a new node (a collision)
            prev.next=Node(key, value)
            chain=1
        self._length_+=1
        self._alpha_=self._length_/self.m
        return chain

    def getitem(self, key):
        key_hash=self.universal_hashing_function(key)
        if not self.data[key_hash]:
            raise KeyError("Key does not exist")
        else:
            temp_node=self.data[key_hash]
            while(temp_node):
                if temp_node.key==key:
                    return temp_node.value
                else:
                    temp_node=temp_node.next
            raise KeyError("Key does not exist")

    def delete(self, key):
        key_hash=self.universal_hashing_function(key)
        if not self.data[key_hash]:
            raise KeyError("Key does not exist")
        else:
            temp_node=self.data[key_hash]
            prev=None
            while(temp_node):
                if temp_node.key==key:
                    if not prev:
                        self.data[key_hash]=temp_node.next
                    else:
                        prev.next=temp_node.next
                    self._length_-=1
                    self._alpha_=self._length_/self.m
                    return
                else:
                    prev=temp_node
                    temp_node=temp_node.next
            raise KeyError("Key does not exist")

    def show(self):
        for idx in range(len(self.data)):
            entry=self.data[idx]
            if entry:
                temp=entry
                while(temp):
                    print("{{ {} : {} }}".format(temp.key, temp.value), end=", ")
                    temp=temp.next

### Testing code

In [None]:
dictionary=Dictionary()
collisions=0

In [None]:
item_count=random.randint(0,125)
print("{} items will be tried".format(item_count))
for _ in range(item_count):
    collisions+=dictionary.insert(key=random.randint(0,1000), value=random.random())
print("{} collisions".format(collisions))

In [None]:
print("Load factor: {}, Length: {}".format(dictionary._alpha_, dictionary._length_))

In [None]:
dictionary.show()

In [None]:
# keys are random, so 665 may be absent; in that case this raises KeyError
dictionary.delete(665)