## Hash Functions

Main idea: construct a data structure in which inserting, searching for, and deleting an element are fast, ideally $\mathcal{O}(1)$. We want to store elements $x$, each with an associated key $x.key\in U$, where $U$ is the universe of all possible keys. We normally take keys to be natural numbers. Let $\mathbb{Z}_n \equiv \{0,1,\dots, n-1\}$. The simplest table we can design is a *direct-address table*.
- let $d:U \rightarrow T$ be such that for an element $(x, x.key)$, $key \mapsto key$ and $x \mapsto T[x.key]$, i.e. each element is placed at the position given by its key in the direct-address table $T$.
- Direct_address_Search(T, k): return T[k]
- Direct_address_Insert(T, x): T[x.key] = x
- Direct_address_Delete(T, x): T[x.key] = NIL
- all operations are $\mathcal{O}(1)$
- Direct addressing works well if the universe of distinct keys is relatively small, since the number of slots in the table is $m = |U|$
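
The three operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DirectAddressTable` and its method names are hypothetical, keys are assumed to be integers in $\mathbb{Z}_{|U|}$, and `None` plays the role of NIL.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in {0, ..., universe_size - 1}."""

    def __init__(self, universe_size: int):
        # one slot per possible key, so m = |U|
        self.slots = [None] * universe_size

    def insert(self, key: int, value) -> None:
        self.slots[key] = value      # T[key] = x

    def search(self, key: int):
        return self.slots[key]       # None (NIL) if the key is absent

    def delete(self, key: int) -> None:
        self.slots[key] = None       # T[key] = NIL


table = DirectAddressTable(universe_size=8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
print(table.search(3), table.search(5))
```

Every operation is a single list access, which is where the $\mathcal{O}(1)$ bound comes from; the price is a list as long as the whole universe of keys.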

Normally the universe of keys is large and we cannot reserve a slot for each individual key. Instead we use a *hash function* in the following way:
- $h:U \rightarrow T$, such that $key \mapsto h(key)\in \mathbb{Z}_m$ and $x \mapsto T[h(x.key)]$
- the number of slots in the *hash table* $T$ is $m$, and the number of distinct keys we store in it is $n$.
- we define the *load factor* $\alpha=\frac{n}{m}$, which can be greater than $1$
- we can have *collisions* in the hash table: $$\exists k_1,k_2\in U, \,\, \text{with} \,\, k_1 \neq k_2, \,\, \text{and} \,\, h(k_1)=h(k_2)$$
- depending on the design of the hash function we could have few or many collisions.
  - The poorest hash function maps all distinct keys into the same slot of the hash table
  - In *uniform hashing* each key is equally likely to be placed in any slot, i.e. $$\Pr(h(key)=i) = \frac{1}{m} \quad \forall i\in \mathbb{Z}_m$$
  - let $n_i$, $i\in \mathbb{Z}_m$, denote the number of elements stored in slot $i$ of the hash table. Then $\sum\limits_{i=0}^{m-1} n_i = n$
  - under uniform hashing, after inserting $n$ elements into the hash table the expected number of elements in each slot is $$\mathbb{E}[n_i] = \alpha = \frac{n}{m}$$
- We can resolve collisions by *chaining*:
  - keep a doubly linked list in each slot of the hash table and insert new elements at the beginning of the list.
  - Chaining_Hash_insert(T, x): insert x at the beginning of the linked list $T[h(x.key)]$
  - Chaining_Hash_search(T, x): search for x.key in the linked list $T[h(x.key)]$
  - Chaining_Hash_delete(T, x): delete the element with $x.key$ from the linked list $T[h(x.key)]$
  - searching is $\mathcal{O}(1+\alpha)$ on average, since we have to traverse the linked list.
  - inserting is $\mathcal{O}(1)$, since we insert the new element at the head of the corresponding linked list
  - deleting is $\mathcal{O}(1)$ if the lists are doubly linked and we already hold a pointer to the element, and $\mathcal{O}(1+\alpha)$ on average if we first have to search the linked list for the element
- choosing a hash function:
  - division method: $$h: \mathbb{N} \rightarrow \mathbb{Z}_m \,\,\, \text{by} \,\,\, k \mapsto h(k)=k \bmod m $$
  - multiplication method: $$h: \mathbb{N} \rightarrow \mathbb{Z}_m \,\,\, \text{by} \,\,\, k \mapsto h(k)= \lfloor m\left(k\cdot A \bmod 1\right)\rfloor, \,\,\,A\in(0,1)$$
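
The two methods above translate almost directly into Python. The sketch below is an illustration, not a tuned implementation: the function names are hypothetical, and the default $A = (\sqrt{5}-1)/2$ is an assumption that does not come from the text (it is the constant commonly attributed to Knuth). The multiplication method is written with floating-point arithmetic, so it is only reliable for keys small enough that $k\cdot A$ keeps enough fractional precision.

```python
import math


def hash_division(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


def hash_multiplication(k: int, m: int, A: float = (math.sqrt(5) - 1) / 2) -> int:
    """Multiplication method: h(k) = floor(m * (k*A mod 1)), with A in (0, 1)."""
    fractional_part = (k * A) % 1.0      # k*A mod 1, a value in [0, 1)
    return math.floor(m * fractional_part)


# keys of the form "sum of character codes", as used in the notebook example
keys = [520, 519, 776, 423, 879, 553, 849]
print([hash_division(k, 10) for k in keys])
print([hash_multiplication(k, 10) for k in keys])
```

With the division method the choice of $m$ matters, because $h(k)$ depends only on $k \bmod m$; with the multiplication method the value of $m$ is not critical, and the quality depends on $A$ instead.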

In [1]:
class HashTableChaining:


    def __init__(self, m: int):
        # define the number of slots in the hash table
        self.size = m
        # one Python list per slot stands in for the linked list (chain) of that slot
        self.hash_table = [[] for _ in range(m)]


    def key_word(self, word: str) -> int:
        '''Compute an integer key for a string as the sum of the Unicode code
        points of its characters. The key is not unique: anagrams share a key.
        '''
        return sum(ord(c) for c in word)


    def hash_function(self, kword: int) -> int:
        # simple hash function (division method): key mod m
        return kword % self.size

    def add_element(self, word: str):
        '''add an element in its corresponding position on the hash table in the form
        of a tuple (element, h(key), key)
        '''
        kword = self.key_word(word)
        hash_value_word = self.hash_function(kword)
        slot = self.hash_table[hash_value_word]
        # check if the element is already on the hash table
        if (word, hash_value_word, kword) not in slot:
            slot.append((word, hash_value_word, kword))

    def search_element(self, word: str):
        kword = self.key_word(word)
        hash_value_word = self.hash_function(kword)
        return (word, hash_value_word, kword) in self.hash_table[hash_value_word]
    
    def delete_element(self, word: str):
        kword = self.key_word(word)
        hash_value_word = self.hash_function(kword)
        slot = self.hash_table[hash_value_word]
        # check if the element is on the hash table
        if (word, hash_value_word, kword) in slot:
            slot.remove((word, hash_value_word, kword))

    def print_hash_table(self):
        for i in range(self.size):
            print(f'slot number {i} contains the elements (element, h(key), key): {self.hash_table[i]}')


In [2]:
htable = HashTableChaining(10)
words = ['table', 'chair', 'monitor', 'desk', 'computer', 'mouse', 'keyboard']

for w in words:
    htable.add_element(w)

htable.print_hash_table()

# pass the result as a separate argument: reusing the same quote type inside an
# f-string is a syntax error before Python 3.12
print('is "table" on the hash table:', htable.search_element('table'))
print('is "blabla" on the hash table:', htable.search_element('blabla'))
print('deleting "keyboard" from the hash table')
htable.delete_element('keyboard')
htable.print_hash_table()


slot number 0 contains the elements (element, h(key), key): [('table', 0, 520)]
slot number 1 contains the elements (element, h(key), key): []
slot number 2 contains the elements (element, h(key), key): []
slot number 3 contains the elements (element, h(key), key): [('desk', 3, 423), ('mouse', 3, 553)]
slot number 4 contains the elements (element, h(key), key): []
slot number 5 contains the elements (element, h(key), key): []
slot number 6 contains the elements (element, h(key), key): [('monitor', 6, 776)]
slot number 7 contains the elements (element, h(key), key): []
slot number 8 contains the elements (element, h(key), key): []
slot number 9 contains the elements (element, h(key), key): [('chair', 9, 519), ('computer', 9, 879), ('keyboard', 9, 849)]
is "table" on the hash table: True
is "blabla" on the hash table: False
deleting "keyboard" from the hash table
slot number 0 contains the elements (element, h(key), key): [('table', 0, 520)]
slot number 1 contains the elements (element, h(key), key): []
slot number 2 contains the elements (element, h(key), key): []
slot number 3 contains the elements (element, h(key), key): [('desk', 3, 423), ('mouse', 3, 553)]
slot number 4 contains the elements (element, h(key), key): []
slot number 5 contains the elements (element, h(key), key): []
slot number 6 contains the elements (element, h(key), key): [('monitor', 6, 776)]
slot number 7 contains the elements (element, h(key), key): []
slot number 8 contains the elements (element, h(key), key): []
slot number 9 contains the elements (element, h(key), key): [('chair', 9, 519), ('computer', 9, 879)]
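
The class above keeps a plain Python list in each slot, so removing an element still scans the chain. The $\mathcal{O}(1)$ deletion mentioned in the chaining bullets needs real doubly linked nodes and a reference to the node being removed. A minimal sketch of that variant follows; the names `Node` and `ChainedTable` are hypothetical, keys are assumed to be non-negative integers hashed with the division method, and `insert` does not check for duplicates.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = None
        self.next = None


class ChainedTable:
    def __init__(self, m: int):
        self.m = m
        self.heads = [None] * m          # head node of the chain in each slot

    def _slot(self, key: int) -> int:
        return key % self.m              # division method

    def insert(self, key: int, value) -> Node:
        """Insert at the head of the chain in O(1); the node is returned as a handle."""
        i = self._slot(key)
        node = Node(key, value)
        node.next = self.heads[i]
        if self.heads[i] is not None:
            self.heads[i].prev = node
        self.heads[i] = node
        return node

    def search(self, key: int):
        """Walk the chain of the slot: O(1 + alpha) on average."""
        node = self.heads[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, node: Node) -> None:
        """Unlink a node we already hold a reference to: O(1), no traversal."""
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.heads[self._slot(node.key)] = node.next
        if node.next is not None:
            node.next.prev = node.prev
        node.prev = node.next = None


t = ChainedTable(10)
desk = t.insert(423, "desk")     # both keys hash to slot 3
mouse = t.insert(553, "mouse")
t.delete(desk)
print(t.search(553).value, t.search(423))
```

Deleting by key instead of by node handle would first call `search`, which brings the cost back to $\mathcal{O}(1+\alpha)$ on average.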

### Universal Hash Family

- A fixed hash function is vulnerable to adversarial inputs: an attacker can choose keys that all map to the same slot, making every operation $\mathcal{O}(n)$.  
To avoid such situations we can choose a hash function at random from a *universal family* of hash functions $$\mathcal{H} = \{h_1, h_2, \cdots, h_k \}$$  
Such a family is called *universal* if $$\forall \, k_1,k_2\in U, \,\, k_1\neq k_2: \quad \Pr_{h\in \mathcal{H}}\left(h(k_1) =h(k_2) \right)\leq \frac{1}{m}$$
- In universal hashing the expected number of elements in the slot that a key $k$ hashes to is bounded by:
$$\mathbb{E}(n_{h(k)}) \leq \begin{cases}
1 +\alpha, & \text{if}\,\, k\in T[h(k)]\\
\alpha, & \text{if} \,\, k\notin T[h(k)]
\end{cases}
$$
where $n$ elements in total are inserted into a hash table with $m$ slots and collisions are resolved by chaining.
- example of a universal hash family: let $p$ be a prime such that $k<p$ for all $k\in U$ and also $m<p$, and let $\mathbb{Z}_p^* = \{1,\dots,p-1\}$. Then
$$\mathcal{H}_{pm} = \{h_{ab}: \,\, a \in \mathbb{Z}_p^*, \,\, b\in \mathbb{Z}_p\}, \quad h_{ab}: \mathbb{N} \rightarrow \mathbb{Z}_m, \quad h_{ab}(k) = \left((a\cdot k+b) \bmod p \right)\bmod m$$
In the above hash family there are $p\cdot(p-1)$ different hash functions to choose from.
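
Drawing a function from $\mathcal{H}_{pm}$ amounts to drawing the pair $(a, b)$ at random. The following is a minimal sketch of that idea; the helper names `h_ab` and `draw_hash` are hypothetical, and the choice $p = 2^{31}-1$ (a Mersenne prime) is an assumption that only works if every key is smaller than $p$.

```python
import random


def h_ab(k: int, a: int, b: int, p: int, m: int) -> int:
    """h_ab(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def draw_hash(p: int, m: int, rng=random):
    """Pick a in Z_p^* and b in Z_p at random and return the function k -> h_ab(k)."""
    a = rng.randrange(1, p)      # a in {1, ..., p-1}
    b = rng.randrange(0, p)      # b in {0, ..., p-1}
    return lambda k: h_ab(k, a, b, p, m)


rng = random.Random(0)           # seeded only so that the sketch is reproducible
p = 2**31 - 1                    # assumed prime bound on the keys
h = draw_hash(p, 10, rng)
print([h(k) for k in (520, 519, 776, 423, 879, 553, 849)])
```

Because $a$ and $b$ are drawn when the table is created, an adversary who does not know them cannot prepare a set of keys that is guaranteed to collide.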

### Open Address Hashing

Unlike chaining, open addressing stores every element in the table itself: when a collision happens, i.e. when $h(k_1)=h(k_2)$ for $k_1,k_2 \in U$ with $k_1\neq k_2$, $k_2$ is placed in another empty slot of the hash table.
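
The text does not say how the "other empty slot" is chosen. One common choice is *linear probing*, which tries slots $h(k), h(k)+1, h(k)+2, \dots$ (modulo $m$) in turn. The sketch below assumes that probing scheme, integer keys, and the division method; the class name is hypothetical and deletion is left out.

```python
class LinearProbingTable:
    """Open addressing with linear probing; stores at most m keys."""

    def __init__(self, m: int):
        self.m = m
        self.slots = [None] * m          # None marks an empty slot

    def _probe(self, key: int):
        # probe sequence: h(k), h(k)+1, ..., wrapping around the table once
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key: int) -> int:
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j                 # slot where the key ended up
        raise OverflowError("hash table is full")

    def search(self, key: int):
        for j in self._probe(key):
            if self.slots[j] is None:    # an empty slot ends the probe sequence
                return None
            if self.slots[j] == key:
                return j
        return None


t = LinearProbingTable(10)
placed = [t.insert(k) for k in (423, 553, 519, 879, 849)]
print(placed)             # [3, 4, 9, 0, 1]
print(t.search(553))      # 4
print(t.search(777))      # None
```

Since every element occupies a slot of its own, the load factor of an open-address table can never exceed $1$.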

### Perfect Hashing

- When the set of keys is static and does not change over time, we would like every operation on the hash table to be $\mathcal{O}(1)$ in the worst case.
- to achieve this, the number of slots in the table has to be significantly greater than the number of keys.
- Theorem: when hashing $n$ keys into a table with $m=n^2$ slots using a hash function chosen at random from a universal hash family $\mathcal{H}$,  
the probability of any collision happening is less than $1/2$.
- to construct a perfect hash table we follow these steps:
  - choose $h_{ab}\in \mathcal{H}_{pm}$,
  - hash the $n$ keys into a primary table $T$ with $m=n$ slots
  - if $n_j\geq 2$ keys collide in slot $j$, pick a secondary hash function $\tilde{h}_{\tilde{a}\tilde{b}}\in \mathcal{H}_{pm_j}$ with $m_j = n_j^2$  
    and hash the $n_j$ elements of slot $j$ into a secondary hash table $T_j$ with $m_j$ slots.  
    If a collision happens in $T_j$, pick another $\tilde{h}_{\tilde{a}\tilde{b}}$ and repeat until there are no collisions in $T_j$
- once the tables are built, every search is $\mathcal{O}(1)$ in the worst case.
- we can prove that the expected total size of the secondary tables is $$\mathbb{E}\left(\sum\limits_{j=0}^{m-1} n_j^2 \right) < 2n,$$ so the extra space is $\mathcal{O}(n)$ in expectation.
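
The construction steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the keys are assumed to be distinct non-negative integers smaller than the prime $P = 2^{31}-1$, the key set is assumed to be non-empty, and slots holding a single key simply get a secondary table of size $1^2 = 1$.

```python
import random

P = 2**31 - 1   # assumed prime, larger than every key


def draw_hash(m: int, rng):
    """Draw h_ab from the universal family H_{P,m}."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m


def build_perfect_table(keys, rng):
    n = len(keys)
    h = draw_hash(n, rng)                    # primary hash into m = n slots
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h(k)].append(k)

    secondary = []
    for bucket in buckets:
        m_j = len(bucket) ** 2               # n_j^2 slots for n_j keys
        while True:                          # redraw until T_j has no collision
            h_j = draw_hash(m_j, rng) if m_j else None
            table = [None] * m_j
            collision = False
            for k in bucket:
                pos = h_j(k)
                if table[pos] is not None:
                    collision = True
                    break
                table[pos] = k
            if not collision:
                break
        secondary.append((h_j, table))
    return h, secondary


def contains(structure, k: int) -> bool:
    """Two hash evaluations and one comparison: O(1) in the worst case."""
    h, secondary = structure
    h_j, table = secondary[h(k)]
    return bool(table) and table[h_j(k)] == k


keys = [520, 519, 776, 423, 879, 553, 849]
structure = build_perfect_table(keys, random.Random(0))
print(all(contains(structure, k) for k in keys), contains(structure, 1000))
```

By the theorem above, each attempt at a secondary table succeeds with probability greater than $1/2$, so the inner loop is expected to run fewer than two times per slot.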