# Hash Tables / Dictionaries

Hash tables give constant-time, O(1), lookups, inserts and deletes on average, which is as fast as a lookup can get.
The alternative would be searching a list of names, which is O(n) for a simple search over an unsorted list, or O(log n) for a binary search over a sorted one.

Hash tables are particularly good for: 

    • Filtering out duplicates
    • Modelling relationships between items as key:value pairs
    • Storing data to easily retrieve it later (cache)
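
As a small illustration of the first use, here is a sketch of filtering duplicates out of a list with a dictionary. The function name `remove_duplicates` and the sample names are made up for this example.

```python
def remove_duplicates(names):
    """Return the names in their original order with repeats dropped."""
    seen = {}      # hash table of names we have already come across
    unique = []
    for name in names:
        if name not in seen:   # average O(1) membership check
            seen[name] = True
            unique.append(name)
    return unique

print(remove_duplicates(["Tom", "Ann", "Tom", "Raj", "Ann"]))
# ['Tom', 'Ann', 'Raj']
```

Each membership check is a single hash lookup on average, so the whole pass is O(n) rather than comparing every name against every other name.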

In [1]:
voted = {}
def check_voter(name):
    '''
    Checks whether name is already present in the dictionary.
    If it is, the person has already voted.
    If not, create an entry in the dictionary and set it to True.
    Calling again with the same name will then refuse the vote.
    '''
    if name in voted:
        print("kick them out!")
    else:
        voted[name] = True
        print("let them vote")

In [2]:
check_voter('Tom')

let them vote


In [4]:
check_voter('Tom')

kick them out!


In [5]:
cache = {}

def get_page(url):
    '''
    Cache can be used to speed up load times of URLs which have static content.
    Here, if the page has been requested before, we return the data stored in the cache dictionary for this url.
    Otherwise, we fetch the data with some function "get_data_from_server" and add it to the cache for next time.
    '''
    if url in cache:  # membership test, so empty (falsy) pages count as cached too
        return cache[url]
    else:
        data = get_data_from_server(url)
        cache[url] = data
        return data

### Load Factor
As hash tables use an array for storage, two different keys can end up being assigned the same slot. This is called a collision, and it becomes more likely as the array fills up.
If we simply overwrote the slot on a collision, we would lose data.
To get around this, we can start a linked list at each slot, but if many items land in the same slot, item retrieval time degrades from O(1), constant time, towards O(n), linear time.
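
A minimal sketch of the list-per-slot idea follows. It is a toy, not how Python's own `dict` works: the class name `ChainedHashTable` and its methods are hypothetical, and a plain Python list stands in for the linked list at each slot.

```python
class ChainedHashTable:
    """Toy hash table that handles collisions with a list per slot."""

    def __init__(self, num_slots=4):
        self.slots = [[] for _ in range(num_slots)]

    def _index(self, key):
        # Map the key's hash onto one of the available slots.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)   # same key: update in place
                return
        bucket.append((key, value))        # new key: chain onto the slot

    def get(self, key, default=None):
        # Walk the chain in this slot; long chains make this O(n).
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = ChainedHashTable(num_slots=2)
for number in [1, 3, 5]:   # with 2 slots, every odd int lands in slot 1
    table.put(number, str(number))
print(table.slots)
# [[], [(1, '1'), (3, '3'), (5, '5')]]
```

All three keys collide here, so `get` has to scan a chain of three pairs, which is the linear-time behaviour we want to avoid.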

We can see how full our hash table array is by calculating the 'load factor' as follows:

    • No. of items in hash table / total number of slots

Suppose we had the following array with 1 out of 3 slots filled:

    [ , 15, ]
    
Here, our load factor is 0.333...
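
The same calculation can be sketched in code. The helper name `load_factor` is made up for this example, and `None` is assumed to mark an empty slot.

```python
def load_factor(slots):
    """Number of filled slots divided by the total number of slots."""
    filled = sum(1 for slot in slots if slot is not None)
    return filled / len(slots)

slots = [None, 15, None]    # 1 out of 3 slots filled
print(load_factor(slots))
# 0.3333333333333333
```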

When the load factor approaches a certain threshold, we need to increase the size of the array so that new items continue to fit in with few collisions. This is called 'resizing' and involves re-inserting all existing items into a new, larger array (commonly twice the size), because the slot an item belongs in depends on the array's length.

A good rule of thumb is to resize when the load factor reaches 0.7.
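
A rough sketch of resizing, under some assumptions: the helpers `resize` and `insert` are hypothetical, each slot holds a list of (key, value) pairs, the array doubles in size, and every key inserted is new (there is no duplicate handling).

```python
MAX_LOAD_FACTOR = 0.7   # rule-of-thumb threshold

def resize(buckets):
    """Return an array twice as large with every pair re-inserted."""
    new_buckets = [[] for _ in range(len(buckets) * 2)]
    for bucket in buckets:
        for key, value in bucket:
            # The slot depends on the array length, so it must be recomputed.
            new_buckets[hash(key) % len(new_buckets)].append((key, value))
    return new_buckets

def insert(buckets, count, key, value):
    """Insert a new key, resizing first if the load factor would pass 0.7."""
    if (count + 1) / len(buckets) > MAX_LOAD_FACTOR:
        buckets = resize(buckets)
    buckets[hash(key) % len(buckets)].append((key, value))
    return buckets, count + 1

buckets, count = [[] for _ in range(4)], 0
for n in range(6):
    buckets, count = insert(buckets, count, n, n * n)
    print(count, len(buckets))   # item count and number of slots after each insert
```

Resizing touches every stored item, so it is expensive, but it happens rarely enough that inserts stay O(1) on average.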

As for a hash function that distributes items evenly rather than clumping them together (which would result in collisions), it's not something we'll usually have to worry about, as most languages ship a built-in hash table that handles this already. A cryptographic hash such as SHA would also spread keys evenly, but it is slower than needed here, so built-in hash tables generally rely on faster hash functions.
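
To get a feel for how a built-in hash function spreads keys, here is a small sketch using Python's built-in `hash()`. The helper `slot_for` and the `user0`, `user1`, ... keys are made up for this example, and the exact counts vary between runs because Python randomises string hashes per process.

```python
def slot_for(key, num_slots):
    """Map a key to a slot index using Python's built-in hash()."""
    return hash(key) % num_slots

num_slots = 8
counts = [0] * num_slots
for i in range(1000):
    counts[slot_for(f"user{i}", num_slots)] += 1

print(counts)   # items per slot; ideally each is close to 1000 / 8
```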