# Hash Tables - Part 1

## Hash Table Basics

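Hash functions are different from hash tables: a hash function maps a key to an index, while a hash table is the data structure that stores keys at those indices. Hash tables, built on top of hash functions, **are often used for fast data lookup**.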

Attributes:
* A set of possible keys K, of size N
* A subset *k* of K (the keys we actually store), of size I
* A hash table of size m, where m is prime and m << N
* A hash function h(k) = k mod m, which maps each key k in *k* to an index from 0 to m - 1
* Let n be the number of keys in the table; the load factor is α = n/m
* To put k into the table, simply assign table[h(k)] = k (this ignores collisions, which are handled in the next section)
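A minimal sketch of these definitions, assuming integer keys and an illustrative prime table size m = 11 (both are choices for this example, not from the notes). It also shows why collisions need handling: 5 and 27 leave the same remainder mod 11, so plain assignment makes the second key overwrite the first.

```python
M = 11  # table size m: a prime, chosen here only for illustration

def h(k: int) -> int:
    """Division-method hash: h(k) = k mod m."""
    return k % M

table = [None] * M  # one slot per index, no collision handling yet

def put(k: int) -> None:
    """Store k at table[h(k)]; a colliding key silently overwrites the old one."""
    table[h(k)] = k

for k in (5, 13, 27):  # 5 and 27 both hash to index 5
    put(k)

n = sum(slot is not None for slot in table)  # keys actually stored (5 was lost)
alpha = n / M                                # load factor α = n/m
print(table)
print(f"n = {n}, alpha = {alpha:.3f}")
```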

## Dealing with Collisions (Open Chaining)

When two keys hash to the same index h(k), they collide. Instead of storing a single key at each index, the solution is to store a bucket there: a (Python) list of every key that hashes to that index.

This idea is called **open chaining** (also known as separate chaining).
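To make the bucket idea concrete, here is a small sketch in which 3 and 10 both hash to index 3 (mod 7) and simply share a list instead of overwriting each other. Integer keys, m = 7, and the helper names `chain_insert`/`chain_find` are assumptions for illustration; new keys are prepended to the front of their bucket.

```python
M = 7  # small prime table size, chosen for illustration

buckets = [[] for _ in range(M)]  # one (initially empty) chain per index

def chain_insert(k: int) -> None:
    """Add k to the front of its bucket unless it is already there."""
    bucket = buckets[k % M]
    if k not in bucket:
        bucket.insert(0, k)

def chain_find(k: int) -> bool:
    """Only the one bucket that k hashes to needs to be scanned."""
    return k in buckets[k % M]

# 3 and 10 collide (both are 3 mod 7); the second 10 is a duplicate and is ignored
for k in (3, 10, 5, 10):
    chain_insert(k)

print(buckets[3])                       # [10, 3]
print(chain_find(10), chain_find(17))   # True False (17 also maps to bucket 3)
```

Unlike the notebook code, which stores `None` in empty slots, this sketch starts every slot with an empty list; that removes the `None` checks at the cost of allocating m lists up front.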

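```python
# insert hash chaining
# Assumes hash_table (a list of m slots, each None or a list of keys),
# HASHPRIME, and a two-argument hash(key, HASHPRIME) helper are defined
# elsewhere in the notebook; Python's built-in hash() takes one argument.

# adds key to hash table, if key is not a duplicate
def insert(key):
    if key is None:
        return
    key_hash = hash(key, HASHPRIME)
    if hash_table[key_hash] is None:
        hash_table[key_hash] = [key]
    else:
        for item in hash_table[key_hash]:
            if item == key:
                return  # duplicate
        # not a duplicate: prepend once, after the whole bucket was scanned
        hash_table[key_hash] = [key] + hash_table[key_hash]
```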

```python
# find hash chaining

# return whether key is in table
def find(key):
    if key is None:
        return False
    bucket = hash_table[hash(key, HASHPRIME)]
    if bucket is not None:
        for item in bucket:
            if item == key:
                return True
    return False
```

```python
# delete hash chaining

# removes key from hash table, if key is present
def delete(key):
    if key is None:
        return
    key_hash = hash(key, HASHPRIME)
    bucket = hash_table[key_hash]
    if bucket is None:
        return
    for i in range(len(bucket)):
        if bucket[i] == key:
            hash_table[key_hash] = bucket[:i] + bucket[i + 1:]
            if len(hash_table[key_hash]) == 0:
                hash_table[key_hash] = None  # clean up empty list case
            return  # found and deleted, so return from within loop
    return  # get here only if key was not in table
```

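## What are the Big-O costs of hash tables?

The expected search time of a hash table with chaining is O(1):
* This holds provided that m is close in size to n, so the load factor α = n/m stays bounded by a constant (more precisely, the expected cost of a search is O(1 + α) under a uniform hash function).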
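With chaining, a search scans only one bucket, and the average bucket length is the load factor α. The sketch below measures the average number of comparisons needed to find each stored key as α grows; random integer keys, m = 1009, the division hash k mod m, and the function name `average_successful_probes` are all assumptions for this experiment. Under a uniform hash the result should stay close to 1 + α/2, so it is constant as long as m grows in proportion to n.

```python
import random

def average_successful_probes(n: int, m: int, seed: int = 0) -> float:
    """Average comparisons to find each of n distinct keys in m chained buckets."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), n)  # n distinct random integer keys
    buckets = [[] for _ in range(m)]
    for k in keys:
        buckets[k % m].append(k)        # append to the end of the chain
    total = 0
    for chain in buckets:
        # the key at position j (0-based) needs j + 1 comparisons to find
        total += sum(j + 1 for j in range(len(chain)))
    return total / n

m = 1009
for n in (m // 2, m, 2 * m, 10 * m):
    print(f"alpha = {n / m:.1f}: {average_successful_probes(n, m):.2f} comparisons")
```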

## How can we optimize this further?

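We can keep the list in each bucket sorted. A search can then stop as soon as it reaches a key larger than the one it is looking for, so even an unsuccessful search scans only part of the bucket. This makes it so that if you mis-estimate n by a factor of 2, performance is still good.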
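A small sketch of why sorting helps, assuming a bucket is a Python list of comparable keys (the function names are illustrative): a search of a sorted bucket can give up at the first key that is not smaller than the target, while a search of an unsorted bucket must inspect every key before reporting that the target is absent.

```python
def comparisons_unsorted(bucket: list, target) -> int:
    """Keys inspected by a linear scan that ignores ordering."""
    count = 0
    for key in bucket:
        count += 1
        if key == target:
            break
    return count

def comparisons_sorted(bucket: list, target) -> int:
    """Keys inspected when the bucket is sorted: stop at the first key >= target."""
    count = 0
    for key in bucket:
        count += 1
        if key >= target:
            break
    return count

bucket = ["apple", "kiwi", "mango", "pear", "plum"]  # already sorted
print(comparisons_unsorted(bucket, "lime"))  # 5: must scan everything to report "absent"
print(comparisons_sorted(bucket, "lime"))    # 3: stops at "mango" > "lime"
```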

You can also store each bucket as a balanced binary search tree such as an AVL tree, which makes search within a bucket of size b take O(log b) time, but this requires more complex code.
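A middle ground, swapped in here and not part of the original notes, is to keep each bucket as a sorted list and binary-search it with Python's standard `bisect` module. Lookup in a bucket of size b then takes O(log b) comparisons, although insertion and deletion still shift list elements in O(b) time, which a balanced tree such as an AVL tree would avoid. The `sorted_bucket_*` helper names are hypothetical.

```python
import bisect

def sorted_bucket_find(bucket: list, key) -> bool:
    """Binary search a sorted bucket in O(log b) comparisons."""
    i = bisect.bisect_left(bucket, key)  # first index with bucket[i] >= key
    return i < len(bucket) and bucket[i] == key

def sorted_bucket_insert(bucket: list, key) -> None:
    """Insert key in sorted position unless it is already present."""
    i = bisect.bisect_left(bucket, key)
    if i == len(bucket) or bucket[i] != key:
        bucket.insert(i, key)  # list insert still shifts up to b elements

def sorted_bucket_delete(bucket: list, key) -> None:
    """Remove key if present, keeping the bucket sorted."""
    i = bisect.bisect_left(bucket, key)
    if i < len(bucket) and bucket[i] == key:
        del bucket[i]

bucket = []
for word in ("pear", "apple", "mango", "apple"):
    sorted_bucket_insert(bucket, word)
print(bucket)  # ['apple', 'mango', 'pear']
sorted_bucket_delete(bucket, "mango")
print(bucket, sorted_bucket_find(bucket, "pear"))  # ['apple', 'pear'] True
```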

```python
# insert (sorted buckets)

# adds key to hash table, if key is not a duplicate,
# keeping each bucket in ascending order
def insert(key):
    if key is None:
        return
    key_hash = hash(key, HASHPRIME)
    bucket = hash_table[key_hash]
    if bucket is None:
        hash_table[key_hash] = [key]
        return
    for i in range(len(bucket)):
        if bucket[i] == key:
            return  # duplicate
        if bucket[i] > key:
            # first element larger than key: insert key just before it
            hash_table[key_hash] = bucket[:i] + [key] + bucket[i:]
            return
    # only get here if all values in the bucket are smaller than key
    hash_table[key_hash] = bucket + [key]
```

```python
# find (sorted buckets)

# return whether key is in table
def find(key):
    if key is None:
        return False
    bucket = hash_table[hash(key, HASHPRIME)]
    if bucket is not None:
        bucket_len = len(bucket)
        i = 0
        # skip keys smaller than the target; the bucket is sorted
        while i < bucket_len and bucket[i] < key:
            i += 1
        if i < bucket_len and bucket[i] == key:
            return True
    return False
```

```python
# delete (sorted buckets)

# removes key from hash table, if key is present
def delete(key):
    if key is None:
        return
    key_hash = hash(key, HASHPRIME)
    bucket = hash_table[key_hash]
    if bucket is None:
        return
    for i in range(len(bucket)):
        if bucket[i] == key:
            hash_table[key_hash] = bucket[:i] + bucket[i + 1:]
            if len(hash_table[key_hash]) == 0:
                hash_table[key_hash] = None  # clean up empty list case
            return  # found and deleted, so return from within loop
        if bucket[i] > key:
            return  # bucket is sorted, so key cannot appear later
    return  # get here only if key was not in table
```
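The notebook cells above rely on globals defined elsewhere (`hash_table`, `HASHPRIME`, and a two-argument `hash` helper). For experimenting outside the notebook, here is a self-contained sketch of the same sorted-chaining table. The class name, the default table size 101, and the polynomial string hash `_string_hash` (base 31) are hypothetical stand-ins, not the notes' own definitions.

```python
class SortedChainedHashTable:
    """Hash table of strings using chaining with sorted buckets (a sketch)."""

    def __init__(self, size: int = 101) -> None:
        self.size = size              # number of buckets m (a prime)
        self.table = [None] * size    # None marks an empty bucket, as in the notes

    def _string_hash(self, s: str) -> int:
        """Hypothetical polynomial string hash reduced mod the table size."""
        h = 0
        for ch in s:
            h = (h * 31 + ord(ch)) % self.size
        return h

    def insert(self, key: str) -> None:
        if key is None:
            return
        idx = self._string_hash(key)
        bucket = self.table[idx]
        if bucket is None:
            self.table[idx] = [key]
            return
        for i, item in enumerate(bucket):
            if item == key:
                return                # duplicate
            if item > key:
                bucket.insert(i, key) # keep the bucket sorted
                return
        bucket.append(key)            # key is larger than everything in the bucket

    def find(self, key: str) -> bool:
        if key is None:
            return False
        bucket = self.table[self._string_hash(key)]
        if bucket is None:
            return False
        for item in bucket:
            if item >= key:
                return item == key    # sorted: no need to look further
        return False

    def delete(self, key: str) -> None:
        if key is None:
            return
        idx = self._string_hash(key)
        bucket = self.table[idx]
        if bucket is None:
            return
        for i, item in enumerate(bucket):
            if item == key:
                del bucket[i]
                if not bucket:
                    self.table[idx] = None  # clean up empty list case
                return
            if item > key:
                return                # sorted: key is not present

t = SortedChainedHashTable(size=3)    # tiny table to force collisions
for w in ("kiwi", "apple", "fig", "apple"):
    t.insert(w)
print(t.find("fig"), t.find("grape"))  # True False
t.delete("fig")
print(t.find("fig"))                   # False
```

Mutating the bucket list in place avoids rebuilding it with slices on every insert and delete, which the notebook version does.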