### Collisions / Separate Chaining

Instead of placing a single value in the cell, separate chaining places a `reference` to an array that holds every key-value pair hashing to that cell.  
It is important that our table is designed to have `few` collisions.  
Otherwise it will be `no better` than an array, where in the worst-case scenario the lookup takes O(n) steps.  

In [5]:
class HashTable:
    
    def __init__(self):
        self.capacity = 16
        self.table = [None] * self.capacity

    def _hash(self, x: str) -> int:  # Maps a string key to a bucket index
        h = 0
        base = 257
        for ch in x:
            h = (h * base + ord(ch)) % self.capacity
        return h

    def __setitem__(self, key, val):
        idx = self._hash(key)

        if self.table[idx] is None: # Look Here
            self.table[idx] = []

        # Search for existing key
        for i, (k, _) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, val)  # UPDATE
                return  # Stop here so the key is not appended a second time

        # Key not found - INSERT
        self.table[idx].append((key, val))
        

    def __getitem__(self, key):
        idx = self._hash(key)
        
        if self.table[idx] is not None:

            # Searching
            for k, v in self.table[idx]:  # O(1) on average while chains stay short; O(n) in the worst case
                if k == key:
                    return v

        return None

thesaurus = HashTable()

# Insert
thesaurus['bad'] = 'evil'
thesaurus['cab'] = 'taxi'
thesaurus['dab'] = 'pat'

# Search
print(thesaurus['bad'])  # evil
print(thesaurus['cab'])  # taxi
print(thesaurus['dab'])  # pat

# Update
thesaurus['cab'] = 'taxiii'
print(thesaurus['cab'])  # taxiii


evil
taxi
pat
taxiii
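
To see the chaining at work, the sketch below recomputes the bucket index for each example key outside the class. `bucket_index` is a hypothetical standalone copy of `_hash`, assuming the same capacity of 16 and base of 257 used above.

```python
def bucket_index(key: str, capacity: int = 16) -> int:
    # Standalone copy of HashTable._hash, for illustration only
    h = 0
    base = 257
    for ch in key:
        h = (h * base + ord(ch)) % capacity
    return h

# Group the example keys by the bucket they land in
buckets = {}
for key in ['bad', 'cab', 'dab']:
    buckets.setdefault(bucket_index(key), []).append(key)

print(buckets)  # {7: ['bad', 'dab'], 6: ['cab']}
```

Here 'bad' and 'dab' share bucket 7, so a lookup for 'dab' has to walk a two-item chain. Because 257 % 16 == 1, this particular hash reduces to the sum of the character codes modulo 16, which means anagrams always collide.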


### Array Search / Speed Up

Hash tables can make your code faster, even if your data `doesn't` come in pairs.  
By `converting` an array into a hash table, we can go from O(n) searches to O(1) searches.  
Using a hash table is like using an `index` in a book.  

In [22]:
array = [61, 30, 91, 11, 54, 38, 72]

def search_array(arr, x): # O(n)
    for i in range(len(arr)):
        if arr[i] == x:
            return i
    return None

print("Search array | O(n):")
print(search_array(array, 11))
print(search_array(array, 22))

def convert_array(arr):
    d = {}
    for n in arr: 
        d[n] = True
    return d

# Build the hash table once: O(n)
lookup = convert_array(array)

def number_exists(table, key): # O(1) - Look Here
    if key in table:
        return table[key]
    return None

print("Search converted | O(1):")
print(number_exists(lookup, 11))
print(number_exists(lookup, 22))

Search array | O(n):
3
None
Search converted | O(1):
True
None
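
When only membership matters, the `True` values are never actually read, so Python's built-in `set` (which is also hash-based) can stand in for the dictionary. A minimal sketch, assuming the same `array` as above; the name `lookup` is illustrative.

```python
array = [61, 30, 91, 11, 54, 38, 72]

# Build the set once: O(n)
lookup = set(array)

# Each membership test is O(1) on average
print(11 in lookup)  # True
print(22 in lookup)  # False
```

Note that this returns `False` for a missing value rather than `None`, which is usually the more natural answer to an "exists?" question.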


### Practical Example / Array Subset

Let's use hash tables to boost the speed of a very `practical` algorithm.  
We want to find out if an array is a `subset` of another array.  

The `O(n * m)` approach is to use nested loops.  
We iterate through every element of the `smaller` array, and for each one we run a second loop over the larger array.  
If we find an element that isn't contained in the larger array, we return `false`.  
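
For comparison, the nested-loop approach described above might look like the sketch below. The function name `issubset_nested` and its variable names are illustrative, not taken from the referenced book.

```python
def issubset_nested(arr1, arr2):
    # Pick the smaller and the larger array
    if len(arr1) > len(arr2):
        smaller, larger = arr2, arr1
    else:
        smaller, larger = arr1, arr2

    # For every item of the smaller array, scan the larger one
    for item in smaller:
        found = False
        for candidate in larger:
            if item == candidate:
                found = True
                break
        if not found:
            return False

    return True

print(issubset_nested(['a', 'b', 'c', 'd', 'e', 'f'], ['b', 'd', 'f']))       # True
print(issubset_nested(['a', 'b', 'c', 'd', 'e', 'f'], ['b', 'd', 'f', 'h']))  # False
```

With n items in the smaller array and m in the larger one, the inner loop can run up to m times for each of the n items, which is where the n * m step count comes from.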

A faster approach is to store the `larger` array in a hash table.  
This will be our `index` that will allow us to make O(1) lookups.  

We iterate through each item of the larger array in order to `build` the hash table.  
We iterate through each item of the smaller array taking just `one step` for the lookup in hash table.  

If `N` is the total number of items in both arrays combined, our algorithm is O(N), since we touch each item once.  
This is a `huge` improvement over the initial O(n * m) algorithm.

In [23]:
def issubset(arr1, arr2):

    # Initialize smaller and larger arrays
    S = []
    L = []

    # Initialize hash table (dictionary)
    H = {}

    # Determine which array is larger
    if len(arr1) > len(arr2):
        S = arr2
        L = arr1
    else:
        S = arr1
        L = arr2

    # Build the hash table
    for item in L:
        H[item] = True

    # The brilliant part, a second non-nested loop through smaller array
    for item in S:
        if item not in H:
            return False

    return True

A = ['a', 'b', 'c', 'd', 'e', 'f']
B = ['b', 'd', 'f']
C = ['b', 'd', 'f', 'h']

print("B is a subset of A ==", issubset(A, B))
print("C is a subset of A ==", issubset(A, C))

B is a subset of A == True
C is a subset of A == False


### References   
> [A Common-Sense Guide to Data Structures and Algorithms](https://www.amazon.com/gp/product/B08KYMK4NR/), Jay Wengrow