# Hash Tables

- A hash table is a data structure that implements an associative array, mapping keys to values via a hash function.

- The associative array stores a set of (key, value) pairs and supports insertion, deletion, and lookup. The keys must be unique.

- Ideally, the hash function maps each key to a unique bucket (an index into the underlying array). In practice, most hash table designs use an imperfect hash function, which can cause hash collisions (two or more keys assigned to the same bucket). Such collisions are handled by a collision-resolution mechanism.
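
To make buckets and collisions concrete, here is a minimal sketch. The function `toy_hash` and the table size `NUM_BUCKETS` are made up for this illustration. It is deliberately a weak hash, not the one Python uses internally.

```python
def toy_hash(key: str, num_buckets: int) -> int:
    """Map a string key to a bucket index in range(num_buckets)."""
    # Sum of the character codes, reduced modulo the number of buckets.
    return sum(ord(ch) for ch in key) % num_buckets


NUM_BUCKETS = 8  # assumed table size for this illustration

# One list per bucket, so that colliding keys remain visible.
buckets = [[] for _ in range(NUM_BUCKETS)]
for key in ["ab", "ba", "cat", "dog"]:
    buckets[toy_hash(key, NUM_BUCKETS)].append(key)

# "ab" and "ba" contain the same characters, so this hash cannot tell them apart.
collisions = {i: keys for i, keys in enumerate(buckets) if len(keys) > 1}
print(collisions)  # {3: ['ab', 'ba']}
```

Because the hash only sums character codes, any two anagrams collide. Practical hash functions mix the input far more thoroughly, but they cannot rule collisions out entirely.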


## Load factor

- The load factor is a critical statistic of a hash table.

- An array $A$ of length $m$ is partially filled with $n$ elements, where $m \geq n$.
    - A value $x$ is stored at index $A[h(x)]$, where $h$ is the hash function and $0 \leq h(x) < m$.

$$ \text{load factor } (\alpha) = \frac{n}{m} $$

- The performance of the hash table deteriorates as the load factor $\alpha$ increases.
    - The hash table is resized and rehashed when the load factor $\alpha$ approaches 1.
    - Acceptable values of the load factor $\alpha$ typically range from 0.6 to 0.75.
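
The relationship between $n$, $m$, and resizing can be sketched with a toy table. Everything here is an assumption made for illustration: the class name `ProbingTable`, linear probing as the collision strategy, doubling on resize, and a threshold of 0.75. Real implementations differ in all of these choices.

```python
class ProbingTable:
    """Toy open-addressing hash table (linear probing) that resizes itself."""

    def __init__(self, m=4, max_load=0.75):
        self.slots = [None] * m     # the array A of length m
        self.n = 0                  # number of stored elements
        self.max_load = max_load    # assumed resize threshold

    def load_factor(self):
        return self.n / len(self.slots)  # alpha = n / m

    def _find_slot(self, slots, key):
        # Start at h(x) and step forward until the key or an empty slot is found.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def _resize(self, new_m):
        # Rehash: each index depends on m, so every entry is placed again.
        new_slots = [None] * new_m
        for entry in self.slots:
            if entry is not None:
                new_slots[self._find_slot(new_slots, entry[0])] = entry
        self.slots = new_slots

    def put(self, key, value):
        i = self._find_slot(self.slots, key)
        if self.slots[i] is None:  # a new key, not an update
            if (self.n + 1) / len(self.slots) > self.max_load:
                self._resize(2 * len(self.slots))
                i = self._find_slot(self.slots, key)
            self.n += 1
        self.slots[i] = (key, value)

    def get(self, key):
        entry = self.slots[self._find_slot(self.slots, key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


table = ProbingTable()
for k in range(10):
    table.put(k, k * k)
print(len(table.slots), table.load_factor())  # 16 0.625
```

Starting from $m = 4$, the table doubles to 8 and then to 16 slots while the ten keys are inserted, so $\alpha$ never exceeds the chosen threshold.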


## Use cases
- Hash tables are used to map one thing to another.
- Used when frequent lookups are required.
- DNS resolution, to translate web addresses to IP addresses.
- To prevent duplicate entries.
- As a cache.
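
Two of these use cases can be sketched in a few lines. The names `register`, `slow_square`, and `cached_square` are hypothetical and chosen only for this example. `slow_square` stands in for any expensive computation or lookup.

```python
seen = set()  # Python sets are hash tables that store keys only


def register(name: str) -> bool:
    """Return True if the name is new, False if it is a duplicate."""
    if name in seen:  # membership test, O(1) on average
        return False
    seen.add(name)
    return True


cache = {}  # maps an argument to an already computed result


def slow_square(x: int) -> int:
    """Stand-in for an expensive computation (hypothetical)."""
    return x * x


def cached_square(x: int) -> int:
    if x not in cache:
        cache[x] = slow_square(x)  # computed once, then reused
    return cache[x]


print(register("tom"), register("tom"))  # True False
print(cached_square(12))                 # 144
```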

```python
# In Python, the built-in dictionary is a hash map.
phone_book = {}
phone_book["Maximilian"] = 45223544
phone_book["Mustermann"] = 1234445334
```

```python
phone_book["Mustermann"]
```

```
1234445334
```

## Time complexity
| Operation | Average case | Worst case |
|:-:|:-:|:-:|
| Search | $O(1)$ | $O(n)$ |
| Insert | $O(1)$ | $O(n)$ |
| Deletion | $O(1)$ | $O(n)$ |
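
The $O(n)$ worst case arises when many keys collide. As a rough demonstration, the sketch below uses a hypothetical key class, `CollidingKey`, whose hash is always 0, and counts how many equality checks one lookup needs. The exact count depends on the interpreter, but a lookup for a missing key cannot finish without comparing against every stored key.

```python
class CollidingKey:
    """Key whose hash is always 0, so every instance collides with the others."""

    comparisons = 0  # class-wide counter of equality checks

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0

    def __eq__(self, other):
        CollidingKey.comparisons += 1
        return isinstance(other, CollidingKey) and self.name == other.name


n = 200
table = {CollidingKey(i): i for i in range(n)}

CollidingKey.comparisons = 0          # ignore the checks made while building
found = CollidingKey(-1) in table     # look up a key that is not stored
print(found, CollidingKey.comparisons)
```

With well-distributed hashes, the same lookup needs only a handful of comparisons regardless of `n`, which is where the $O(1)$ average case comes from.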


- Assuming a computer can perform 10 operations per second, approximate lookup (search) times compare as follows, using the worst case for simple and binary search and the average case for hash tables:

    | # of items | Simple search<br/>$O(n)$ | Binary search<br/>$O(\log n)$ | Hash table<br/>$O(1)$ |
    |:-:|:-:|:-:|:-:|
    | 100 | 10 sec | 1 sec | Instant |
    | 1000 | 1.7 min | 1 sec | Instant |
    | 10000 | 16.7 min | 2 sec | Instant |
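
The figures in the table follow from simple arithmetic, sketched here. The function names are illustrative. The sketch assumes that simple search needs up to $n$ steps and binary search up to $\lceil \log_2 n \rceil$ steps. The table rounds the binary-search times up to whole seconds.

```python
import math

OPS_PER_SECOND = 10  # the rate assumed in the text


def simple_search_seconds(n: int) -> float:
    return n / OPS_PER_SECOND  # up to n comparisons


def binary_search_seconds(n: int) -> float:
    steps = math.ceil(math.log2(n))  # halvings needed in the worst case
    return steps / OPS_PER_SECOND


for n in (100, 1000, 10000):
    print(n, simple_search_seconds(n), binary_search_seconds(n))
# 100 10.0 0.7
# 1000 100.0 1.0
# 10000 1000.0 1.4
```

For 1000 items, 100 seconds is about 1.7 minutes. For 10000 items, 1000 seconds is about 16.7 minutes.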


- A simple way to resolve collisions is to grow a linked list at the index where the collision occurs (separate chaining). If a list grows too long, it slows the hash table down considerably, because lookup in a linked list has time complexity $O(n)$.
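
A minimal sketch of this chaining idea follows. The class `ChainedHashTable` is hypothetical, and as a simplifying assumption each chain is a Python list rather than a true linked list. The lookup cost within a chain is linear either way.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        # Every key with the same bucket index shares one chain.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self._chain(key):  # linear scan of the chain
            if k == key:
                return v
        raise KeyError(key)

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
        raise KeyError(key)

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


table = ChainedHashTable(num_buckets=4)
for k in (0, 4, 8, 12):  # all multiples of 4 share bucket 0
    table.put(k, str(k))
print(table.get(8), table.longest_chain())  # 8 4
```

Here all four keys end up in one chain, so finding a key means scanning the whole chain. Keeping the load factor low and using a better-distributed hash keeps chains short.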