### Hash Tables / Fast Reading

Most programming languages provide a data structure called a hash table, built for `fast` reading.  
Other names include hashes, maps, dictionaries or `associative` arrays.  
In Python, hash tables are called `dictionaries`.  

Searching an unordered array takes `O(N) steps`, and searching an ordered array takes O(log N) steps with binary search.  
With a data structure like a hash table, a lookup by key takes `O(1)` steps on average!

In [15]:
menu = {
    'a': 1,
    'b': 2,
    'c': 3,
}

# Searching a dictionary takes only O(1) steps
item = menu['b']

# Output result
print(item)

2
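To make the step counts above concrete, here is a minimal sketch that counts comparisons for a linear scan and for a binary search, next to a dictionary lookup. The helper names `linear_search_steps` and `binary_search_steps` are hypothetical and only serve this illustration.

```python
def linear_search_steps(items, target):
    """Count the comparisons made by a left-to-right scan."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Count the comparisons made by a binary search on a sorted list."""
    steps = 0
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps

numbers = list(range(1, 1025))        # 1024 sorted values
lookup = {n: True for n in numbers}   # the same values as dictionary keys

print("Linear search steps:", linear_search_steps(numbers, 1024))
print("Binary search steps:", binary_search_steps(numbers, 1024))
print("Dictionary lookup:", 1024 in lookup)  # one hashed lookup, no scan
```

The linear scan has to touch every element to reach the last one, while the binary search needs only about log2(1024) comparisons. The dictionary lookup jumps straight to the cell for that key.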


### Hashing / Converting to Numbers

The process of taking characters and converting them `to numbers` is called hashing.  
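For example, BAD converts to the digits 2, 1 and 4 (written as 214), and the `sum` hashing function adds them together to get 7.  
The hashing `function` can use addition, multiplication or something else, as long as the same input always produces the same number.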

In [16]:
mapping = { 
    'A': 1,
    'B': 2,
    'C': 3,
    'D': 4,
    'E': 5,
}

def hashing(x: str) -> str:
    hash = ''
    for char in x:
        hash += str(mapping[char])
    return hash

def hashing_sum(x: str) -> int:
    hash = 0
    for char in x:
        hash += mapping[char]
    return hash

def hashing_product(x: str) -> int:
    hash = 1
    for char in x:
        hash *= mapping[char]
    return hash

print("Hashing | BAD =", hashing('BAD'))
print("Hashing to Sum | BAD =", hashing_sum('BAD'))
print("Hashing to Product | BAD =", hashing_product('BAD'))

Hashing | BAD = 214
Hashing to Sum | BAD = 7
Hashing to Product | BAD = 8
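Two properties of these functions are worth seeing in code: the same input always gives the same hash, and different inputs can still end up with the same hash. The sketch below redefines the sum function in a compact form so it runs on its own. It assumes the same letter-to-number mapping as the cell above.

```python
# Same letter-to-number mapping as in the cell above
mapping = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}

def hashing_sum(word: str) -> int:
    """Add up the number assigned to each letter."""
    return sum(mapping[char] for char in word)

# The same input always produces the same hash
print(hashing_sum('BAD') == hashing_sum('BAD'))

# Different inputs can share a hash: 2+1+4 and 4+1+2 both give 7
print(hashing_sum('BAD'), hashing_sum('DAB'))
```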


### Storing Data

This is how a hash table `stores` the data.  
Let's imagine a simple thesaurus application: when a user searches for a word, the app returns just one `synonym`.  

To search for a value, we use the `key` to find that value.  
To find the value associated with `bad`, the computer executes two simple steps:  
    1. The computer hashes the `key`: bad = 2*1*4 = 8  
    2. It then returns the `value` stored in cell 8 of the hash table  

In [17]:
class MyHashTable:

    # Initialize internal table (an array with 16 empty cells)
    def __init__(self):
        self.table = [None] * 16
        self.mapping = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

    # Hash multiplication function
    def hash_entry(self, x):
        hash = 1
        for char in x:
            hash *= self.mapping[char]
        return hash

    def add_entry(self, key, value):
        hash = self.hash_entry(key)
        self.table[hash] = value    # O(1) - Look Here

    def get_entry(self, key):
        hash = self.hash_entry(key)
        return self.table[hash]    # O(1) - Look Here

thesaurus = MyHashTable()

thesaurus.add_entry('bad', 'evil')
thesaurus.add_entry('cab', 'taxi')

value1 = thesaurus.get_entry('bad')
value2 = thesaurus.get_entry('cab')

print(value1)
print(value2)

evil
taxi
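One limitation of the class above is that the product can exceed the table size: `bee` hashes to 2*5*5 = 50, which is outside a 16-cell table and raises an `IndexError`. A common remedy is to wrap the hash with the modulo operator. The sketch below uses a hypothetical class name, `BoundedHashTable`, and treats the modulo step as an assumption added for this illustration. It does not handle collisions.

```python
class BoundedHashTable:  # hypothetical name, not part of the notebook above

    def __init__(self, size=16):
        self.table = [None] * size
        self.mapping = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

    def hash_entry(self, key):
        product = 1
        for char in key:
            product *= self.mapping[char]
        # Assumption: wrap with modulo so the index always fits the table
        return product % len(self.table)

    def add_entry(self, key, value):
        self.table[self.hash_entry(key)] = value

    def get_entry(self, key):
        return self.table[self.hash_entry(key)]

words = BoundedHashTable()
words.add_entry('bee', 'insect')   # 2*5*5 = 50, and 50 % 16 = 2
print(words.get_entry('bee'))
```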


### Python Syntax / Dictionaries

Python's literal `syntax` does the same.  
When we have a key and want to find its value, the `key` itself tells us where that value is stored.  
The `{}, []` literals create instances of Python's built-in dictionary and list classes.

In [18]:
thesaurus = {}

thesaurus['bad'] = 'evil'
thesaurus['cab'] = 'taxi'

print(thesaurus['bad']) # O(1)
print(thesaurus['cab']) # O(1)

evil
taxi
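Looking up a key that was never stored behaves differently depending on the syntax used. The short sketch below shows the standard options on a built-in `dict`: the `in` operator, `dict.get` with and without a default, and the `KeyError` raised by square-bracket access.

```python
thesaurus = {'bad': 'evil', 'cab': 'taxi'}

# Membership test
print('bad' in thesaurus)

# get() returns None (or the given default) instead of raising an error
print(thesaurus.get('dab'))
print(thesaurus.get('dab', 'no synonym'))

# Square brackets raise KeyError for a missing key
try:
    thesaurus['dab']
except KeyError as error:
    print("Missing key:", error)
```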


### Wrong Implementation

This hash table implementation is wrong: it returns the right values, but every search scans the list of keys, so it takes `O(n)` linear time.

In [19]:
class MyWrongHashTable:

    def __init__(self):
        self.keys = []
        self.values = []

    def add(self, key, value):
        self.keys.append(key)
        self.values.append(value) # O(1)

    def get(self, key):
        for k in range(len(self.keys)): # O(n)
            if key == self.keys[k]:
                return self.values[k]

thesaurus = MyWrongHashTable()
thesaurus.add('bad', 'evil')
thesaurus.add('cab', 'taxi')

value1 = thesaurus.get('bad')
value2 = thesaurus.get('cab')

print(value1)
print(value2)

evil
taxi


### Collisions

Trying to add data to a `cell` that is already filled results in a collision.  
Here `bad` and `dab` both hash to 8, so the second entry overwrites the first.  

In [20]:
HT = MyHashTable()

HT.add_entry('bad', 'evil')  # 8
HT.add_entry('cab', 'taxi')  # 6
HT.add_entry('dab', 'pat')   # 8

print(HT.get_entry('bad'))  # pat (wrong) !
print(HT.get_entry('cab'))
print(HT.get_entry('dab'))

pat
taxi
pat


### Collisions / Separate Chaining

Instead of placing a single value in the cell, separate chaining places a `reference` to an array of key-value pairs.  
It is important that our table is designed to have `few` collisions.   
Otherwise it will be `no better` than an array, where in the worst-case scenario the lookup takes O(n) steps.  

In [21]:
class MyHashTable_v2:

    # Initialize internal table (an array with 16 empty cells)
    def __init__(self):
        self.table = [None] * 16
        self.mapping = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

    # Hash multiplication function
    def hash_entry(self, x):
        hash = 1
        for char in x:
            hash *= self.mapping[char]
        return hash

    def add_entry(self, key, value):
        index = self.hash_entry(key)

        if self.table[index] is None: # Look Here
            self.table[index] = []

        self.table[index].append((key, value))

    def get_entry(self, key):
        index = self.hash_entry(key)

        if self.table[index] is not None: # Look Here
            for k, v in self.table[index]:
                if k == key:
                    return v

        return None

HT = MyHashTable_v2()

HT.add_entry('bad', 'evil')  # 8
HT.add_entry('cab', 'taxi')  # 6
HT.add_entry('dab', 'pat')   # 8

print(HT.get_entry('bad'))  # evil (right) !
print(HT.get_entry('cab'))
print(HT.get_entry('dab'))

evil
taxi
pat
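One case the chained version above does not cover is adding the same key twice: the second pair is appended, and the lookup still returns the first value. The sketch below is one possible variation, under a hypothetical name `ChainedHashTable`, that replaces the pair in place. The modulo step and the pre-built empty buckets are assumptions added for this illustration.

```python
class ChainedHashTable:  # hypothetical name for this sketch

    def __init__(self, size=16):
        # One independent bucket (list of key-value pairs) per cell
        self.table = [[] for _ in range(size)]
        self.mapping = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

    def hash_entry(self, key):
        product = 1
        for char in key:
            product *= self.mapping[char]
        # Assumption: modulo keeps the index inside the table
        return product % len(self.table)

    def add_entry(self, key, value):
        bucket = self.table[self.hash_entry(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # Key already stored: replace it
                return
        bucket.append((key, value))        # New key: chain it in the bucket

    def get_entry(self, key):
        for k, v in self.table[self.hash_entry(key)]:
            if k == key:
                return v
        return None

HT = ChainedHashTable()
HT.add_entry('bad', 'evil')    # cell 8
HT.add_entry('dab', 'pat')     # cell 8 as well, chained after 'bad'
HT.add_entry('bad', 'wicked')  # same key: value replaced, not duplicated

print(HT.get_entry('bad'))
print(HT.get_entry('dab'))
print(len(HT.table[8]))
```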


### Array Search / Speed Up

Hash tables can make your code faster, even if your data `doesn't` come in pairs.  
By `converting` an array into a hash table once, up front, we can go from O(n) searches to O(1) searches.  
Using a hash table is like using an `index` in a book.  

In [22]:
array = [61, 30, 91, 11, 54, 38, 72]

def search_array(arr, x): # O(n)
    for i in range(len(arr)):
        if arr[i] == x:
            return i
    return None

print("Search array | O(n):")
print(search_array(array, 11))
print(search_array(array, 22))

def convert_array(arr): # O(n), but done only once
    d = {}
    for n in arr: 
        d[n] = True
    return d

def number_exists(table, key): # O(1) - Look Here
    if key in table:
        return table[key]
    return None

# Build the hash table once; rebuilding it on every search would cost O(n) each time
table = convert_array(array)

print("Search converted | O(1):")
print(number_exists(table, 11))
print(number_exists(table, 22))

Search array | O(n):
3
None
Search converted | O(1):
True
None
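When only membership matters and there is no value to store, Python's built-in `set` is a natural fit, since it is also hash-based. A minimal sketch using the same numbers as above:

```python
array = [61, 30, 91, 11, 54, 38, 72]

# Build the set once: O(n)
seen = set(array)

# Each membership test is then O(1) on average
print(11 in seen)
print(22 in seen)
```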


### Practical Example / Array Subset

Let's use hash tables to boost the speed of a very `practical` algorithm.  
We want to find out if an array is a `subset` of another array.  

The `O(n * m)` approach is to use nested loops, where n and m are the lengths of the two arrays.   
We iterate through every element of the `smaller` array, and for each one we run a second loop over the larger array.  
If we find an element that isn't contained in the larger array, we return `false`.  
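For reference, the nested-loop approach described above could be sketched as follows. The function name `issubset_nested` is hypothetical, and the sketch assumes the caller passes the smaller array first.

```python
def issubset_nested(smaller, larger):
    # Outer loop: every element of the smaller array
    for item in smaller:
        found = False
        # Inner loop: scan the larger array for that element
        for candidate in larger:
            if item == candidate:
                found = True
                break
        if not found:
            return False
    return True

print(issubset_nested(['b', 'd', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']))
print(issubset_nested(['b', 'd', 'h'], ['a', 'b', 'c', 'd', 'e', 'f']))
```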

A faster approach is to store the `larger` array in a hash table.  
This will be our `index` that will allow us to make O(1) lookups.  

We iterate through each item of the larger array in order to `build` the hash table.  
We then iterate through each item of the smaller array, taking just `one step` for each lookup in the hash table.  

If `N` is the total number of items in both arrays combined, our algorithm is O(N), since we touch each item once.  
This is a `huge` improvement over the initial nested-loop algorithm, which takes O(n * m) steps.

In [23]:
def issubset(arr1, arr2):

    # Initialize smaller and larger arrays
    S = []
    L = []

    # Initialize hash table (dictionary)
    H = {}

    # Determine which array is larger
    if len(arr1) > len(arr2):
        S = arr2
        L = arr1
    else:
        S = arr1
        L = arr2

    # Build the hash table
    for item in L:
        H[item] = True

    # The brilliant part, a second non-nested loop through smaller array
    for item in S:
        if item not in H:
            return False

    return True

A = ['a', 'b', 'c', 'd', 'e', 'f']
B = ['b', 'd', 'f']
C = ['b', 'd', 'f', 'h']

print("B is a subset of A ==", issubset(A, B))
print("C is a subset of A ==", issubset(A, C))

B is a subset of A == True
C is a subset of A == False
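Python's built-in `set` type offers the same check directly, and it is hash-based as well. The sketch below is an alternative to the hand-written function, not a replacement for it. Unlike `issubset` above, it does not pick the smaller array automatically: the candidate subset goes on the left.

```python
A = ['a', 'b', 'c', 'd', 'e', 'f']
B = ['b', 'd', 'f']
C = ['b', 'd', 'f', 'h']

# set.issubset accepts any iterable as its argument
print("B is a subset of A ==", set(B).issubset(A))

# The <= operator does the same for two sets
print("C is a subset of A ==", set(C) <= set(A))
```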
