# Chapter 7

## Enter the Hash Table

In Python, the hash table is called a "dictionary". Here's an example.

In [3]:
menu = {"french fries": 0.75, 
        "hamburger": 2.50, 
        "hot dog": 1.50, 
        "soda": 0.60}

In [4]:
menu

{'french fries': 0.75, 'hamburger': 2.5, 'hot dog': 1.5, 'soda': 0.6}

In [5]:
menu["french fries"]

0.75

Looking up a value in a hash table has an efficiency of $O(1)$ on average, because the key is hashed to find the location of its value directly.
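For contrast, here is a minimal sketch of the same menu stored as a list of pairs, which forces a step-by-step search. The names `menu_pairs`, `menu_dict`, and `linear_lookup` are illustrative and do not appear in the original notebook.

```python
# Hypothetical list-of-pairs version of the menu, for comparison only.
menu_pairs = [("french fries", 0.75),
              ("hamburger", 2.50),
              ("hot dog", 1.50),
              ("soda", 0.60)]

def linear_lookup(pairs, key):
    """Scan the pairs one by one: up to N steps for N items."""
    for name, price in pairs:
        if name == key:
            return price
    return None

# The dictionary hashes the key and goes straight to the value.
menu_dict = dict(menu_pairs)

print(linear_lookup(menu_pairs, "soda"))  # walks past three other items first
print(menu_dict["soda"])                  # one hashed lookup
```

Both calls return the same price; the difference is how many items have to be inspected along the way.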

## Building a Thesaurus for Fun and Profit, but Mainly Profit

In [6]:
thesaurus = {}

In [7]:
thesaurus["bad"] = "evil"

In [8]:
thesaurus

{'bad': 'evil'}

In [9]:
thesaurus["cab"] = "taxi"

In [10]:
thesaurus

{'bad': 'evil', 'cab': 'taxi'}

In [11]:
thesaurus["ace"] = "star"

In [12]:
thesaurus

{'bad': 'evil', 'cab': 'taxi', 'ace': 'star'}

In [13]:
thesaurus["bad"]

'evil'

## Dealing with Collisions

In [14]:
thesaurus["dab"] = "pat"

In [15]:
thesaurus

{'bad': 'evil', 'cab': 'taxi', 'ace': 'star', 'dab': 'pat'}
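Python resolves collisions internally, so nothing unusual shows up in the output above. To see how "bad" and "dab" could collide at all, here is a toy sketch that assumes a deliberately weak hash function (multiply the letter positions, a=1, b=2, and so on) and resolves collisions with separate chaining. The function `toy_hash` and the class `ToyHashTable` are hypothetical illustrations; this is not how Python's `dict` is implemented.

```python
def toy_hash(key, num_cells=16):
    """Assumed toy hash for lowercase words: multiply letter positions
    (a=1, b=2, ...), then wrap the product to the table size."""
    product = 1
    for ch in key:
        product *= ord(ch) - ord("a") + 1
    return product % num_cells

class ToyHashTable:
    """Separate chaining: each cell holds a list of [key, value] pairs."""

    def __init__(self, num_cells=16):
        self.cells = [[] for _ in range(num_cells)]

    def put(self, key, value):
        cell = self.cells[toy_hash(key, len(self.cells))]
        for pair in cell:
            if pair[0] == key:      # key already present: overwrite its value
                pair[1] = value
                return
        cell.append([key, value])   # new key: append to this cell's chain

    def get(self, key):
        cell = self.cells[toy_hash(key, len(self.cells))]
        for stored_key, stored_value in cell:   # short linear scan within one cell
            if stored_key == key:
                return stored_value
        return None

table = ToyHashTable()
for word, synonym in [("bad", "evil"), ("cab", "taxi"), ("ace", "star"), ("dab", "pat")]:
    table.put(word, synonym)

print(toy_hash("bad"), toy_hash("dab"))     # both words land in the same cell
print(table.get("bad"), table.get("dab"))   # yet each keeps its own value
```

With this toy hash, "bad" (2 × 1 × 4) and "dab" (4 × 1 × 2) both map to cell 8, so that cell ends up holding two pairs and a lookup there takes one extra comparison.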

## Practical Examples

Instead of using arrays for creating sets (i.e., array-based sets), we can use a hash table.

In [16]:
# note: this name shadows Python's built-in set type for the rest of the session
set = {}

In [17]:
# a piece of data in the set is the key,
# and the corresponding value can be anything... usually
# something that doesn't take much memory
set["apple"] = 1
set["banana"] = 1
set["cucumber"] = 1

In [18]:
set

{'apple': 1, 'banana': 1, 'cucumber': 1}

Instead of an $O(N)$ linear search to check whether a piece of data already exists before inserting it, we can just add the data to the hash table set, which takes $O(1)$. Even if it already exists, we just overwrite the existing data with the new (duplicate) data.

In [19]:
set["banana"] = 1

In [20]:
set

{'apple': 1, 'banana': 1, 'cucumber': 1}
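Checking membership works the same way. A brief sketch, using the hypothetical name `produce_set` to avoid shadowing Python's built-in `set` type:

```python
produce_set = {"apple": 1, "banana": 1, "cucumber": 1}

# `in` hashes the key and checks its location: O(1) on average.
print("banana" in produce_set)
print("durian" in produce_set)

# Python's built-in set type offers the same behavior without dummy values.
builtin_set = {"apple", "banana", "cucumber"}
builtin_set.add("banana")   # adding a duplicate leaves the set unchanged
print(len(builtin_set))
```

The built-in `set` is itself backed by a hash table, so it is usually the more natural choice when the values would only ever be placeholders.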

Hash tables are perfect for any situation where we want to keep track of which values exist within a dataset.

Using a hash table, we can write a function that has $O(N)$ efficiency (instead of $O(N^2)$) to check whether an array has duplicate values.

In [47]:
def has_duplicate_value(array):
    existing_values = {}
    for value in array:
        # a value already stored as a key has been seen before
        if value in existing_values:
            return True
        existing_values[value] = 1
    return False

In [48]:
short_list = [1, 3, 5, 7, 9, 11]

In [49]:
has_duplicate_value(short_list)

False

In [50]:
short_list = [1, 3, 5, 7, 9, 3]

In [51]:
has_duplicate_value(short_list)

True
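For comparison, here is a sketch of the nested-loop approach that the hash table version improves on. The name `has_duplicate_value_quadratic` is illustrative and is not part of the original notebook.

```python
def has_duplicate_value_quadratic(array):
    """Compare every pair of positions: O(N^2) comparisons in the worst case."""
    for i in range(len(array)):
        for j in range(i + 1, len(array)):
            if array[i] == array[j]:
                return True
    return False

print(has_duplicate_value_quadratic([1, 3, 5, 7, 9, 11]))
print(has_duplicate_value_quadratic([1, 3, 5, 7, 9, 3]))
```

It gives the same answers as the hash table version, but the number of comparisons grows with the square of the array length instead of linearly.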

Let’s say we are building an electronic voting machine, in which voters can choose from a list of candidates or write in another candidate.

In [67]:
votes = []

def add_vote(candidate):
    votes.append(candidate)

In [68]:
add_vote("Thomas Jefferson")
add_vote("John Adams")
add_vote("John Adams")
add_vote("Thomas Jefferson")
add_vote("John Adams")
add_vote("John Adams")
add_vote("Thomas Jefferson")
add_vote("Thomas Jefferson")
add_vote("Thomas Jefferson")
add_vote("Bill McGillicuddy")

In [69]:
votes

['Thomas Jefferson',
 'John Adams',
 'John Adams',
 'Thomas Jefferson',
 'John Adams',
 'John Adams',
 'Thomas Jefferson',
 'Thomas Jefferson',
 'Thomas Jefferson',
 'Bill McGillicuddy']

Insertions are $O(1)$, but we'd end up with a really long array, and counting the votes at the end of the day would take $O(N)$. That's unnecessary. Instead, use a hash table to store the data in the first place.

In [70]:
votes = {}

def add_vote(candidate):
    # a candidate with no votes yet starts at 0, then this vote is added
    votes[candidate] = votes.get(candidate, 0) + 1

def count_votes():
    return votes

In [71]:
add_vote("Thomas Jefferson")
add_vote("John Adams")
add_vote("John Adams")
add_vote("Thomas Jefferson")
add_vote("John Adams")
add_vote("John Adams")
add_vote("Thomas Jefferson")
add_vote("Thomas Jefferson")
add_vote("Thomas Jefferson")
add_vote("Bill McGillicuddy")

In [72]:
count_votes()

{'Thomas Jefferson': 5, 'John Adams': 4, 'Bill McGillicuddy': 1}

This way each insertion is $O(1)$, and counting the votes at the end of the day is also $O(1)$, because the running tally is already stored in the hash table. Boom.
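As an aside, Python's standard library provides `collections.Counter`, a dictionary subclass built for this kind of tallying. A minimal sketch, where `ballots` and `tally` are hypothetical names standing in for the incoming votes and the running count:

```python
from collections import Counter

# Hypothetical sample of incoming votes.
ballots = ["Thomas Jefferson", "John Adams", "John Adams", "Bill McGillicuddy"]

tally = Counter()
for candidate in ballots:
    tally[candidate] += 1   # missing keys count as 0, so no existence check is needed

print(tally["John Adams"])    # 2
print(tally.most_common(1))   # [('John Adams', 2)]
```

Each update is still a single hash table write, so the efficiency matches the hand-written `add_vote` above.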