# [CptS 215 Data Analytics Systems and Algorithms](https://github.com/gsprint23/cpts215)
[Washington State University](https://wsu.edu)

[Gina Sprint](http://eecs.wsu.edu/~gsprint/)
# Hash Table Implementation

Learner objectives for this lesson:
* Implement a hash table with collision handling
* Introduce the map abstract data type


## Acknowledgments
Content used in this lesson is based upon information in the following sources:
* [Miller and Ranum](http://interactivepython.org/runestone/static/pythonds/index.html)

## Hash Table Implementation
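We can implement a hash table with a Python list, where each list index is a slot. The `HashTable` class below stores integer items and uses the remainder method (`item % size`) as its hash function. It resolves collisions with open addressing using linear probing: if an item's slot is already taken, the "plus 1" rehash tries the next slot, wrapping around to slot 0 after the last one. Each of `put`, `get`, and `remove` returns a slot number on success and -1 on failure.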

In [1]:
class HashTable:
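    '''
    A fixed-size hash table of integer items that uses open addressing
    with linear probing to resolve collisions.
    '''
    def __init__(self, size=11):
        '''
        Create a table with size slots, all initially empty (None).
        '''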
        self.size = size
        self.slots = [None] * self.size
        
    def put(self, item):
        '''
        Place an item in the hash table.
        Return slot number if successful, -1 otherwise (no available slots, table is full)
        '''
        hashvalue = self.hashfunction(item)
        slot_placed = -1
        if self.slots[hashvalue] is None or self.slots[hashvalue] == item: # empty slot or slot contains item already
            self.slots[hashvalue] = item
            slot_placed = hashvalue
        else:
            nextslot = self.rehash(hashvalue)
            while self.slots[nextslot] is not None and self.slots[nextslot] != item: # keep probing past other items
                nextslot = self.rehash(nextslot)
                if nextslot == hashvalue: # we have done a full circle through the hash table
                    # no available slots
                    return slot_placed

            self.slots[nextslot] = item
            slot_placed = nextslot
        return slot_placed
        
    def get(self, item):
        '''
        Return the slot position if item is in the hash table, -1 otherwise.
        '''
        startslot = self.hashfunction(item)
        
        stop = False
        found = False
        position = startslot
        while self.slots[position] is not None and not found and not stop:
            if self.slots[position] == item:
                found = True
            else:
                position = self.rehash(position)
                if position == startslot:
                    stop = True
        if found:
            return position
        return -1
    
    def remove(self, item):
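        '''
        Remove item from the hash table.
        Return the slot position if item was in the hash table, -1 otherwise.
        Caution: resetting the slot to None can cut off the probe sequence of
        items that linear probing placed after it.
        '''
        startslot = self.hashfunction(item)
        
        stop = False
        found = False
        position = startslot
        while self.slots[position] is not None and not found and not stop:
            if self.slots[position] == item:
                found = True
                self.slots[position] = None # mark the slot empty again
            else:
                position = self.rehash(position)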
                if position == startslot:
                    stop = True
        if found:
            return position
        return -1

    def hashfunction(self, item):
        '''
        Remainder method: map an integer item to a slot index in [0, size).
        '''
        return item % self.size

    def rehash(self, oldhash):
        '''
        Plus 1 rehash for linear probing: try the next slot, wrapping around at the end.
        '''
        return (oldhash + 1) % self.size
    
ht = HashTable()
print(ht.put(61))
print(ht.put(7))
print(ht.put(12))
print(ht.put(44))
print(ht.put(92))
print(ht.put(55))
print(ht.put(9))
print(ht.put(4))
print(ht.put(21))
print(ht.slots)
print(ht.put(23))
print(ht.put(39))
print(ht.slots)
# hash table is full, no room to put again
print(ht.put(90))
print(ht.slots)
print(ht.remove(55))
print(ht.slots)

6
7
1
0
4
2
9
5
10
[44, 12, 55, None, 92, 4, 61, 7, None, 9, 21]
3
8
[44, 12, 55, 23, 92, 4, 61, 7, 39, 9, 21]
-1
[44, 12, 55, 23, 92, 4, 61, 7, 39, 9, 21]
2
[44, 12, None, 23, 92, 4, 61, 7, 39, 9, 21]
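
One caveat about `remove` above: it sets the slot back to `None`, but `get` and `remove` stop probing at the first `None`. An item that linear probing placed *past* the removed slot then becomes unreachable. For example, after `ht.remove(55)` empties slot 2, `ht.get(23)` starts at slot 1 (23 % 11), moves to slot 2, finds `None`, and returns -1 even though 23 sits in slot 3. A common fix is lazy deletion: mark removed slots with a special "tombstone" value that searches skip over but insertions may reuse. The following is a minimal, self-contained sketch of that idea, keeping the same remainder hash and plus-1 probing; the names `TombstoneHashTable`, `_probe`, and `DELETED` are hypothetical and chosen only for illustration.

```python
class _Deleted:
    '''Marker type for a slot whose item was removed (a "tombstone").'''
    def __repr__(self):
        return "DELETED"

DELETED = _Deleted()  # hypothetical sentinel used only in this sketch


class TombstoneHashTable:
    '''Linear-probing hash table of integers that marks removed slots with DELETED.'''

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size

    def _probe(self, item):
        '''Yield each slot index once, in linear-probing order from the home slot.'''
        start = item % self.size  # remainder method, as in HashTable
        for i in range(self.size):
            yield (start + i) % self.size

    def get(self, item):
        '''Return the slot holding item, or -1. Tombstones do not stop the search.'''
        for slot in self._probe(item):
            if self.slots[slot] is None:
                return -1  # a truly empty slot ends the probe sequence
            if self.slots[slot] == item:
                return slot
        return -1

    def put(self, item):
        '''Insert item, reusing the first empty or tombstone slot; return it, or -1 if full.'''
        existing = self.get(item)
        if existing != -1:
            return existing  # already stored, nothing to do
        for slot in self._probe(item):
            if self.slots[slot] is None or self.slots[slot] is DELETED:
                self.slots[slot] = item
                return slot
        return -1

    def remove(self, item):
        '''Replace item with a tombstone; return its old slot, or -1 if absent.'''
        slot = self.get(item)
        if slot != -1:
            self.slots[slot] = DELETED
        return slot


t = TombstoneHashTable()
for x in (44, 55, 66):  # all three hash to slot 0, so they land in slots 0, 1, 2
    t.put(x)
t.remove(55)            # slot 1 becomes a tombstone instead of None
print(t.get(66))        # 2: the search probes past the tombstone
print(t.put(77))        # 1: insertion reuses the tombstone slot
```

One trade-off: in this sketch a tombstone only disappears when an insertion reuses it, so a table that sees many removals keeps longer probe sequences; periodically rebuilding the table is one way to clear them.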


## Map Implementation
Now that we know how to implement a hash table, we can implement a dictionary! The abstract data type of a dictionary is also called a map. A map is an unordered collection of key-value pairs. The keys are unique, so each key is associated with exactly one value (different keys may still share the same value). To store the pairs, we will use a hash table for the keys (because lookup is efficient) and a parallel list for the values. To look up a key, we hash it to find its slot in the hash table; the key's value is stored at the same index in the parallel list.
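
The parallel-list lookup just described can be sketched in a few lines. This is a stripped-down illustration with no collision handling; the helper `slot_for` is hypothetical, and it uses the simple sum-of-code-points string hash discussed in the aside that follows.

```python
SIZE = 11               # number of slots (same default as HashTable)
slots = [None] * SIZE   # keys, one per slot
values = [None] * SIZE  # values, parallel to slots

def slot_for(key):
    '''Hypothetical helper: sum the character code points, then take the remainder.'''
    return sum(ord(ch) for ch in key) % SIZE

# Store the pair "cat" -> 3: the key goes into slots, its value into values at the same index
i = slot_for("cat")
slots[i] = "cat"
values[i] = 3

# Look up "cat": hash the key to find its slot, then read the value at that index
j = slot_for("cat")
if slots[j] == "cat":
    print(j, values[j])  # prints: 4 3
```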

### An Aside: Hashing Strings
The example implementation below assumes a string key and any value. A simple way to hash a string key is to compute an integer for the string by summing the Unicode code points (`ord()` values) of its characters and then applying the remainder method. This approach works, but it disregards the order of the characters, so strings that contain the same characters in a different order (anagrams) always hash to the same slot. For example:

In [2]:
c = ord("c")
a = ord("a")
t = ord("t")
print("f(\"cat\") = (%d + %d + %d) %% 11 = %d" %(c, a, t, (c + a + t) % 11))
print("f(\"tac\") = (%d + %d + %d) %% 11 = %d" %(t, a, c, (t + a + c) % 11))

f("cat") = (99 + 97 + 116) % 11 = 4
f("tac") = (116 + 97 + 99) % 11 = 4


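To address this issue, we could instead call the special method [`__hash__()`](https://docs.python.org/3/reference/datamodel.html#object.__hash__), which the built-in `hash()` function also calls. Python's string hash takes character order into account, so anagrams usually hash to different values: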

In [10]:
cat_hash = "cat".__hash__()
tac_hash = "tac".__hash__()

print("f(\"cat\") = \"cat\".__hash__() %% 11 = %d %% 11 = %d" %(cat_hash, cat_hash % 11))
print("f(\"tac\") = \"tac\".__hash__() %% 11 = %d %% 11 = %d" %(tac_hash, tac_hash % 11))

f("cat") = "cat".__hash__() % 11 = 5585333550734385640 % 11 = 5
f("tac") = "tac".__hash__() % 11 = 1899368083108669241 % 11 = 2


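Note: By default, the `__hash__()` values of `str`, `bytes` and `datetime` objects are "salted" with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python. As a result, the hash values and slot numbers shown above will likely differ each time you run this code (the `PYTHONHASHSEED` environment variable controls this behavior).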
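
If a string hash should be both order-sensitive and the same on every run, a common alternative is to weight each character by its position with a polynomial hash, evaluated with Horner's rule. The sketch below is a swapped-in technique, not the method used by the `Map` class; the function name `weighted_string_hash` and the base 31 are assumptions for illustration (31 is a popular choice, for example in Java's `String.hashCode`). It only stops anagrams from *always* colliding; unrelated strings can still land in the same slot.

```python
def weighted_string_hash(key, size=11, base=31):
    '''Position-weighted (polynomial) string hash reduced modulo size.

    Computes (c0*base**(n-1) + c1*base**(n-2) + ... + c(n-1)) % size using
    Horner's rule, taking the remainder at every step to keep numbers small.
    '''
    h = 0
    for ch in key:
        h = (h * base + ord(ch)) % size
    return h

print(weighted_string_hash("cat"))  # 10
print(weighted_string_hash("tac"))  # 6
```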

### Example

In [11]:
class Map(HashTable):
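    '''
    A map (dictionary) of string keys built on HashTable. Keys are stored in
    self.slots; the value for each key is stored at the same index in the
    parallel list self.values.
    '''
    def __init__(self, size=11):
        '''
        Create an empty map with size slots.
        '''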
        super().__init__(size)
        self.values = [None] * self.size # holds values
        
    def __str__(self):
        '''
        Return every slot as key:value in slot order (empty slots appear as None:None).
        '''
        s = ""
        for slot, key in enumerate(self.slots):
            value = self.values[slot]
            s += str(key) + ":" + str(value) + ", "
        return s
    
    def __len__(self):
        '''
        Return the number of key-value pairs stored in the map.
        '''
        count = 0
        for item in self.slots:
            if item is not None:
                count += 1
        return count
    
    def __getitem__(self, key):
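        '''
        Support m[key]: return the value for key, or raise KeyError if key is not in the map.
        '''
        slot = super().get(key) # HashTable.get returns the slot number or -1
        if slot == -1:
            raise KeyError(key)
        return self.values[slot]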

    def __setitem__(self, key, data):
        '''
        Support m[key] = data by delegating to put.
        '''
        self.put(key, data)
        
    def __delitem__(self, key):
        '''
        Support del m[key] by delegating to remove.
        '''
        self.remove(key)
        
    def __contains__(self, key):
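        '''
        Support key in m. Check the key's slot rather than its value, so that
        a stored value of -1 is not mistaken for a missing key.
        '''
        return super().get(key) != -1
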
    def put(self, key, value):
        '''
        Add a new key-value pair to the map. If the key is already in the map, replace the old value with the new value.
        Return the slot used, or -1 if the table is full (the pair is not stored).
        '''
        slot = super().put(key)
        if slot != -1:
            self.values[slot] = value
        return slot
        
    def get(self, key):
        '''
        Return the value stored for key, or -1 if key is not in the map.
        '''
        slot = super().get(key)
        if slot != -1:
            return self.values[slot]
        return -1
    
    def remove(self, key):
        '''
        Removes key:value pair.
        Returns slot location if item in hashtable, -1 otherwise
        '''
        slot = super().remove(key)
        if slot != -1:
            self.values[slot] = None
        return slot

    def hashfunction(self, item):
        '''
        Sum the Unicode code points of the characters in the string key,
        then apply the remainder method.
        '''
        total = 0
        for ch in item:
            total += ord(ch)
        return total % self.size
    
m = Map()
m["cat"] = len("cat")
m["dog"] = len("dog")
m["lion"] = len("lion")
m["tiger"] = len("tiger")
m["bird"] = len("bird")
m["cow"] = len("cow")
m["goat"] = len("goat")
m["pig"] = len("pig")
m["chicken"] = len("chicken")
print(m)
m["llama"] = len("llama")
m["rooster"] = len("rooster")
print(m)
# hash table is full, no room to put again
m["fish"] = len("fish")
print(m)
del m["lion"]
print(m)
print(len(m))
print("cow" in m)
print("fish" in m)

tiger:5, cow:3, pig:3, chicken:7, cat:3, lion:4, dog:3, None:None, None:None, goat:4, bird:4, 
tiger:5, cow:3, pig:3, chicken:7, cat:3, lion:4, dog:3, llama:5, rooster:7, goat:4, bird:4, 
tiger:5, cow:3, pig:3, chicken:7, cat:3, lion:4, dog:3, llama:5, rooster:7, goat:4, bird:4, 
tiger:5, cow:3, pig:3, chicken:7, cat:3, None:None, dog:3, llama:5, rooster:7, goat:4, bird:4, 
10
True
False
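
Python's built-in `dict` is itself a hash-table-based map, and the special methods `__setitem__`, `__getitem__`, `__delitem__`, `__contains__`, and `__len__` are what let `Map` be used with the same syntax. For comparison, here is a short sketch of the same operations on a `dict`; one difference is that `dict` resizes itself as it fills, so it does not run out of room the way the fixed-size `Map` does.

```python
# Python's built-in dict: the same map operations, using the same syntax as Map
d = {}
d["cat"] = len("cat")          # insert a key-value pair
d["cow"] = len("cow")
d["cat"] = 99                  # existing key: the old value is replaced
del d["cow"]                   # remove a key and its value
print(len(d))                  # 1
print("cat" in d, "cow" in d)  # True False
try:
    d["fish"]                  # looking up a missing key
except KeyError as err:
    print("KeyError:", err)    # KeyError: 'fish'
```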


## Practice Problems
Note: the following problems are adapted from Koffman and Wolfgang.

### 1
Write a method to display all key-value pairs in a `Map`, one pair per line. Define an iterator for the `Map` class to do this.

### 2
Using the `Map` class, implement a cell phone contact list that maintains the contacts of a cell phone owner. Each contact is a person's name associated with a list of phone numbers that can be changed. The contact list should provide the following interface:
* `add_or_change_entry(name, numbers)`: Changes the numbers associated with the given `name` (string), or adds a new entry with this `name` and list of `numbers` (list) if `name` is not already present. Returns the old list of numbers, or `None` if this is a new entry.
* `lookup_entry(name)`: Searches the contact list for the given `name` and returns its list of numbers or `None` if the `name` is not found.
* `remove_entry(name)`: Removes the entry with the specified `name` from the contact list and returns its list of numbers or `None` if the `name` is not in the contact list.
* `display()`: Displays the contact list in order by `name`.