## 7.3 – Dictionaries
### Hash Tables
The *associative array* is an abstract data type that you will recognise from its Python implementation: the dictionary. It is most commonly implemented as a *hash table* (or *hash map*), and in practice the names are often used interchangeably for the same idea: a structure which maps arbitrary *keys* to *values*.

In many units, this is where you would learn about the basic idea behind hash functions, how we can combine those with an array, how we run into collisions, and so on. But you've seen all that in the hash set!

Suppose we're given a key-value pair. The hash set already gives us the structure for how to turn an arbitrary key into an index for an array. If you have a fixed set of keys and can design a *perfect* hash function, one that maps every key to its own unique index, you do not need to store the key at all: you can just store the value at that index in the array.
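As a minimal sketch of that fixed-key case, suppose the only keys we will ever see are the lowercase letters `a` to `z`. This key set and the helper name `letter_index` are assumptions made for illustration. Every key maps to its own slot, so the array holds values only.

```python
def letter_index(letter):
    # Perfect hash for the fixed key set 'a'..'z': each key gets its own slot.
    return ord(letter) - ord("a")


counts = [0] * 26  # values only; the keys themselves are never stored

for letter in "hello":
    counts[letter_index(letter)] += 1

print(counts[letter_index("l")])  # 2
print(counts[letter_index("z")])  # 0
```

This only works because no two keys can ever share an index. As soon as new, unplanned keys can arrive, that guarantee is gone.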

But the hash table is usually designed to support later additions, meaning collisions are possible, meaning we need to store the key also. So the way we'll do this is to store *two* items in each location in the underlying array, the key and the value. We can do this using a tuple, or we could write a class which contained fields `.key` and `.value`. I like the aesthetic of the latter, and if you are doing this in another language you might not have tuples, only `struct`, so let's do it that way.

In [1]:
class HashTableCell:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class HashTable:
    def __init__(self, arr_size=100):
        self.arr = [None] * arr_size
        self.n = arr_size
        
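    def _find_index(self, key):
        # Linear probing: start at the hashed index and step forward until we
        # find the key or an empty cell. Trying at most n cells stops the loop
        # from running forever when the table is full.
        key_index = hash(key) % self.n
        for _ in range(self.n):
            cell = self.arr[key_index]
            if cell is None or cell.key == key:
                return key_index
            key_index = (key_index + 1) % self.n
        return None

    def add(self, key, value):
        key_index = self._find_index(key)
        if key_index is None:
            raise OverflowError("Hash table is full")
        # Either an empty cell, or the cell already holding this key (replace)
        self.arr[key_index] = HashTableCell(key, value)

    def get(self, key):
        key_index = self._find_index(key)
        if key_index is None or self.arr[key_index] is None:
            raise KeyError(f"No such key {key}")
        return self.arr[key_index].value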
    
    
my_table = HashTable()
my_table.add("ray", 5000)
my_table.add("ali", 3000)
my_table.add("sam", 2000)

print(my_table.get("sam"))

my_table.add("sam", 9000)

print(my_table.get("sam"))

2000
9000


Easy once you know the basics. Notice that this implementation also replaces the existing value when a known key is added again, which is the standard behaviour.

#### Chaining
I also want to introduce a different collision resolution mechanism called *chaining*.

The code above uses *linear probing*: when the hashed index is already occupied by a different key, we step forward and put the item into the next available space.

With chaining, each space in the array instead holds a list of all the key-value pairs that hash to that index!

So we have a kind of 2D data structure, but each row could be a different length. Ideally each chain is a *linked list*: we'll mostly be searching the chain for a key, which is $O(k)$ in the chain length $k$ whichever kind of list we use, so the linked list costs us nothing there and avoids the occasional resize that an array list needs as it grows.
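To picture the shape of that storage, here is a rough sketch using small integer keys, for which CPython's `hash(n)` is simply `n`, so the bucket indices are easy to predict. It uses plain Python lists for the chains rather than linked lists, and it only appends: there is no key replacement and no lookup. The name `buckets` and the sample data are made up for illustration.

```python
buckets = [[] for _ in range(10)]  # one (initially empty) chain per slot

for key, value in [(3, "a"), (13, "b"), (23, "c"), (7, "d")]:
    # 3, 13 and 23 all hash to index 3, so they share a chain
    buckets[hash(key) % len(buckets)].append((key, value))

print(buckets[3])  # [(3, 'a'), (13, 'b'), (23, 'c')]
print(buckets[7])  # [(7, 'd')]
```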

***Exercise:*** Adapt the code above to use chaining. You can use a built in Python list, a `deque`, or reuse the LinkedList class from the previous notebook.

### Hash Table Complexity
The hash table has the same complexity considerations as the hash set. Provided the array is big enough relative to the number of items stored, we get $O(1)$ average-case access. But if the load factor (items stored divided by array size) gets too high, the performance degrades. As previously mentioned, most implementations will resize the array once it gets too full, a feature we haven't included above.
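Resizing could look something like the sketch below. It is one possible approach, not how any particular library does it: the class name `ResizingTable`, the `max_load` threshold of 0.5 and the doubling strategy are all assumptions chosen for illustration, and cells are plain `(key, value)` tuples to keep the block self-contained.

```python
class ResizingTable:
    """Linear-probing table that doubles its array when it gets too full."""

    def __init__(self, arr_size=8, max_load=0.5):
        self.arr = [None] * arr_size  # each slot is None or a (key, value) tuple
        self.count = 0                # number of keys currently stored
        self.max_load = max_load      # assumed threshold, for illustration only

    def _find(self, key):
        # Probe until we reach the key or an empty slot. Keeping the load
        # factor below 1 guarantees an empty slot exists, so this terminates.
        i = hash(key) % len(self.arr)
        while self.arr[i] is not None and self.arr[i][0] != key:
            i = (i + 1) % len(self.arr)
        return i

    def add(self, key, value):
        i = self._find(key)
        if self.arr[i] is None:
            self.count += 1
        self.arr[i] = (key, value)
        if self.count / len(self.arr) > self.max_load:
            self._resize()

    def get(self, key):
        cell = self.arr[self._find(key)]
        if cell is None:
            raise KeyError(f"No such key {key}")
        return cell[1]

    def _resize(self):
        # Every item must be re-added: its index depends on the array length.
        old_cells = [cell for cell in self.arr if cell is not None]
        self.arr = [None] * (2 * len(self.arr))
        self.count = 0
        for key, value in old_cells:
            self.add(key, value)


table = ResizingTable()
for n in range(20):
    table.add(f"key{n}", n)

print(len(table.arr), table.get("key7"))  # 64 7
```

Each resize costs $O(n)$ because every item is rehashed, but doubling means resizes become rarer as the table grows, so the average cost per `add` stays constant.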

What about *chaining* vs *probing*? You could write a dissertation on hash table collision resolution, but in short: chaining performs more consistently as the load factor gets high, while probing is slightly faster when the load factor is kept within the normal range.
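The first half of that claim can be checked with a rough experiment. The sketch below (the function name `average_lookups` and all its parameters are made up for illustration) inserts the same random integer keys into a linear-probing array and a chained array, then reports how many slots or chain entries a successful search examines on average. It counts comparisons only, so it ignores memory effects such as cache behaviour, which are a large part of why probing can win at normal load factors.

```python
import random


def average_lookups(load_factor, size=1000, seed=0):
    """Return (probing, chaining) average items examined per successful search."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), int(load_factor * size))  # distinct keys

    # Linear probing: with no deletions, the slots tried when inserting a key
    # are exactly the slots examined when searching for it later.
    slots = [None] * size
    probe_total = 0
    for key in keys:
        i = hash(key) % size
        steps = 1
        while slots[i] is not None:
            i = (i + 1) % size
            steps += 1
        slots[i] = key
        probe_total += steps

    # Chaining: a key sitting at position p in its chain costs p + 1 comparisons.
    chains = [[] for _ in range(size)]
    chain_total = 0
    for key in keys:
        chain = chains[hash(key) % size]
        chain.append(key)
        chain_total += len(chain)

    return probe_total / len(keys), chain_total / len(keys)


for load in (0.25, 0.5, 0.9):
    probing, chaining = average_lookups(load)
    print(f"load {load}: probing {probing:.2f}, chaining {chaining:.2f}")
```

At low load factors the two numbers should be close, while at 0.9 the probing figure should climb much faster than the chaining one.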

<img src="./resources/chain_probe.png" width=450 />

You can read a lot more about this subject online, such as [good old Wikipedia](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution).

## What Next?
Once you're done with hash tables, head back to Engage to move on to the next section: a data structure that isn't already built into Python!