# ENGR30004

## Week 7 - Intermediate Data Structures

Today, we will be looking at hashtables and sets.

### Learning Objectives

- Implementing a Hashtable suitable for a real-life problem (Problem 1)
    - Figure out how to use a Python list to implement the functionality of a hashtable.
    - Understand hash functions, and create a basic hash function for a simple real-life scenario.
    - Understand hash collisions, and how to handle them effectively.
    - Be able to compare and contrast different implementations of hash functions and collision handling.
    
- Using a Set (Problem 2)
    - Identify problems where Sets are useful.
    - Familiarise yourself with basic Set functions.

### Problem 1: Hashtables

We will learn how to use hashtables through a real-life example.

Imagine you are collecting data for a university. To keep the example simple, you are collecting only student `ID numbers` and the corresponding `names`.

The ID numbers are of the format `XXXXXX C`, where an `X` stands for a digit from `[0-9]` (inclusive), and `C` stands for an uppercase letter `[A-Z]` (inclusive).

Example: `529401 F`

Assume that you have a large collection of data (given in the file `student_records.txt`) and that you need to look up the name of the student with a given ID number.
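Before hashing, it can help to check that a key really matches the `XXXXXX C` format described above. The following is a minimal sketch using a regular expression; `ID_PATTERN` and `is_valid_id` are hypothetical names introduced here for illustration and are not part of the workshop skeleton.

```python
import re

# Hypothetical helper: six digits, one space, one uppercase letter.
ID_PATTERN = re.compile(r"[0-9]{6} [A-Z]")

def is_valid_id(student_id):
    """Return True if student_id has the form XXXXXX C."""
    return ID_PATTERN.fullmatch(student_id) is not None

print(is_valid_id("529401 F"))  # True
print(is_valid_id("52940 F"))   # False: only five digits
print(is_valid_id("529401 f"))  # False: lowercase letter
```

`fullmatch` is used so that extra characters before or after the ID are rejected as well.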

#### Hash function

Create a hash function suitable for the given problem. If you can, separate the hash function into a hash code and a compression function.

The following are different hash functions (`hf1`, `hf2`, `hf3`, `hf4`). In the workshops, we compared these functions.

In [None]:
def hash_code(key):
    # Hash code: sum of the six digits at the start of the ID
    hashcode = 0
    for i in range(0, 6):
        hashcode += int(key[i])

    return hashcode

In [None]:
def hash_compression(hc):
    # Compression: map the hash code onto the table indices 0 to 99
    return hc % 100

In [None]:
def hf1(key):
    hashcode = hash_code(key) #hash code
    return hash_compression(hashcode) # hash compression

In [None]:
def hf2(key):
    # Uses only the trailing letter, so only indices 0 to 25 are ever produced
    hashcode = ord(key[-1]) - ord('A')
    return hashcode

In [None]:
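# https://docs.python.org/3/library/functions.html#hash
# Note: Python salts str hashes per process (unless PYTHONHASHSEED is set),
# so hf3 can return different values for the same key in different sessions.
def hf3(key):
    return hash(key) % 100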

In [None]:
# https://docs.python.org/3/library/hashlib.html - another library you can use to get a hash code
from hashlib import md5

def hf4(key):
    hashcode_hex = md5(key.encode()).hexdigest()  # MD5 digest as a hex string
    hashcode = int(hashcode_hex, base=16)         # local name no longer shadows hash_code()
    return hash_compression(hashcode)

In [None]:
hf1("529401 F")

21

In [None]:
hf2("529401 F")

5

In [None]:
hf3("839554 M")

69

In [None]:
hf4("839554 M")

82
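One rough way to compare hash functions is to count how many keys land in a slot that an earlier key already occupies. The sketch below re-implements the ideas behind `hf1`, `hf2` and `hf4` under new, hypothetical names (`digit_sum_hash`, `letter_hash`, `md5_hash`) so that it runs on its own; `count_collisions` is also a hypothetical helper. `hf3` is left out because Python salts string hashes per process, so its collision count is not reproducible between sessions.

```python
from collections import Counter
from hashlib import md5

def digit_sum_hash(key):
    # Same idea as hf1: sum of the six digits, compressed modulo 100.
    return sum(int(ch) for ch in key[:6]) % 100

def letter_hash(key):
    # Same idea as hf2: position of the trailing letter in the alphabet (0 to 25).
    return ord(key[-1]) - ord('A')

def md5_hash(key):
    # Same idea as hf4: MD5 digest read as an integer, compressed modulo 100.
    return int(md5(key.encode()).hexdigest(), 16) % 100

def count_collisions(hash_function, keys):
    """Number of keys that land in a slot already taken by an earlier key."""
    slot_counts = Counter(hash_function(key) for key in keys)
    return sum(n - 1 for n in slot_counts.values())

# The first ten IDs from student_records.txt, as printed later in this notebook.
sample_keys = ["839554 M", "857000 X", "340118 L", "199760 L", "387025 Y",
               "719977 H", "308335 V", "983106 F", "618587 E", "693671 O"]

for hf in (digit_sum_hash, letter_hash, md5_hash):
    print(hf.__name__, count_collisions(hf, sample_keys))
```

Ten keys are too few to draw firm conclusions; running the same comparison over the whole file gives a fairer picture of how evenly each function spreads the keys.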

#### Hashtable

Implement a crude version of a hashtable using Python lists. Use the skeleton given below.

How will you handle hash collisions? (Implement one strategy you learnt in class.)

In [None]:
# Without collision handling: a colliding insert overwrites the existing entry
class Hashtable:

    def __init__(self):
        self.hash_table = [None] * 100

    def insert(self, key, value):
        index = hf3(key) #choose a hash function from above
        self.hash_table[index] = (key, value)

    def search(self, key):
        index = hf3(key) #choose a hash function from above
        entry = self.hash_table[index]
        # Compare keys too: a different key may hash to the same index
        if entry is not None and entry[0] == key:
            return entry[1]
        raise Exception("Entry not Found")

    def count(self):
        return sum(entry is not None for entry in self.hash_table)

    def remove(self, key):
        index = hf3(key) #choose a hash function from above
        entry = self.hash_table[index]
        if entry is not None and entry[0] == key:
            # Clear the slot; list.pop(index) would shift every later entry
            self.hash_table[index] = None
            return entry
        raise Exception("Entry not Found to delete")

In [None]:
# With collision handling - this example uses chaining
class Hashtable:

    def __init__(self):
        self.hash_table = [None] * 100

    def insert(self, key, value):
        index = hf3(key)
        hash_table_index_val = self.hash_table[index]

        if hash_table_index_val is None:
            self.hash_table[index] = (key, value)
        elif type(hash_table_index_val) is list:
            self.hash_table[index].append((key, value))
        else:
            # Second entry for this slot: turn the single tuple into a chain
            self.hash_table[index] = [hash_table_index_val, (key, value)]

    def search(self, key):
        index = hf3(key)
        hash_table_index_val = self.hash_table[index]

        if type(hash_table_index_val) is list:
            for k, val in hash_table_index_val:
                if k == key:
                    return val
        elif type(hash_table_index_val) is tuple:
            if key == hash_table_index_val[0]:
                return hash_table_index_val[1]

        raise Exception("Entry not Found")

    def count(self):
        # Counts occupied slots; a chain holding several entries counts once
        return sum(entry is not None for entry in self.hash_table)

    def remove(self, key):
        index = hf3(key)
        hash_table_index_val = self.hash_table[index]

        if type(hash_table_index_val) is list:
            for k, val in hash_table_index_val:
                if k == key:
                    hash_table_index_val.remove((k, val))
                    return
        elif type(hash_table_index_val) is tuple:
            if key == hash_table_index_val[0]:
                self.hash_table[index] = None
                return  # entry removed, so skip the exception below

        raise Exception("Entry not Found to delete")

Test your solution against the `student_records.txt` file.

In [None]:
ht = Hashtable()

In [None]:
# Read the first 40 records; each line looks like "839554 M Michael"
with open("student_records.txt", 'r') as f:
    for i in range(40):
        line = f.readline().split(' ')
        key = line[0] + " " + line[1]
        val = line[2].strip()  # drop the trailing newline
        print(key + " " + val)

        ht.insert(key, val)

839554 M Michael
857000 X Christopher
340118 L Jessica
199760 L Matthew
387025 Y Ashley
719977 H Jennifer
308335 V Joshua
983106 F Amanda
618587 E Daniel
693671 O David
910524 R James
737903 K Robert
146971 Q John
415442 H Joseph
469007 D Andrew
180166 D Ryan
819207 V Brandon
279847 Y Jason
594450 O Justin
747579 X Sarah
576519 H William
244644 Z Jonathan
397033 S Stephanie
374357 G Brian
618581 R Nicole
728758 K Nicholas
839883 Y Anthony
218661 T Heather
430472 M Eric
777329 K Elizabeth
222913 D Adam
302859 O Megan
191094 F Melissa
972734 O Kevin
593442 Q Steven
621176 W Thomas
900308 H Timothy
491176 I Christina
740845 K Kyle
569451 T Rachel


In [None]:
ht.search("191094 F")

'Melissa'

In [None]:
ht.search("983106 F")

'Amanda'

In [None]:
ht.search("308335 V")

'Joshua'

In [None]:
ht.search("308335 A")

Exception: Entry not Found

In [None]:
ht.count()

33
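Note that `count` reports occupied slots rather than stored entries: with chaining, a slot that holds several entries contributes only one to the total. The following is a minimal sketch of counting entries instead, assuming the slot layout used by the chaining class (each slot is `None`, a single tuple, or a list of tuples). `count_entries` is a hypothetical helper, not part of the workshop skeleton.

```python
def count_entries(table):
    """Count stored (key, value) pairs in a chained table.

    Assumes each slot is None, a single tuple, or a list of tuples.
    """
    total = 0
    for slot in table:
        if slot is None:
            continue
        # A list holds several chained entries; a tuple is a single entry.
        total += len(slot) if isinstance(slot, list) else 1
    return total

# Small hand-built table: one empty slot, one single entry, one chain of two.
example_table = [
    None,
    ("340118 L", "Jessica"),
    [("839554 M", "Michael"), ("857000 X", "Christopher")],
]

occupied_slots = sum(slot is not None for slot in example_table)
print(occupied_slots)                # 2
print(count_entries(example_table))  # 3
```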

#### Hashtables Using Python Standard Library

Implement the same solution using a Python dictionary.

In [None]:
# python dictionaries - https://docs.python.org/3/tutorial/datastructures.html#dictionaries
ht = dict()

# Read the first 40 records; each line looks like "839554 M Michael"
with open("student_records.txt", 'r') as f:
    for i in range(40):
        line = f.readline().split(' ')
        key = line[0] + " " + line[1]
        val = line[2].strip()  # drop the trailing newline
        print(key + " " + val)

        ht[key] = val

839554 M Michael
857000 X Christopher
340118 L Jessica
199760 L Matthew
387025 Y Ashley
719977 H Jennifer
308335 V Joshua
983106 F Amanda
618587 E Daniel
693671 O David
910524 R James
737903 K Robert
146971 Q John
415442 H Joseph
469007 D Andrew
180166 D Ryan
819207 V Brandon
279847 Y Jason
594450 O Justin
747579 X Sarah
576519 H William
244644 Z Jonathan
397033 S Stephanie
374357 G Brian
618581 R Nicole
728758 K Nicholas
839883 Y Anthony
218661 T Heather
430472 M Eric
777329 K Elizabeth
222913 D Adam
302859 O Megan
191094 F Melissa
972734 O Kevin
593442 Q Steven
621176 W Thomas
900308 H Timothy
491176 I Christina
740845 K Kyle
569451 T Rachel


In [None]:
ht["191094 F"]

'Melissa'

In [None]:
ht["983106 F"]

'Amanda'

In [None]:
ht["308335 V"]

'Joshua'

In [None]:
ht["308335 A"]

KeyError: '308335 A'
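Indexing a dictionary with a missing key raises `KeyError`, as the last lookup shows. When a missing ID is an expected case, a membership test or `dict.get` avoids the exception. A small sketch, using three records copied from the listing above (the name `records` is chosen here for illustration):

```python
# A few records copied from the listing above.
records = {"191094 F": "Melissa", "983106 F": "Amanda", "308335 V": "Joshua"}

# Membership test: no exception for unknown IDs.
print("308335 A" in records)               # False

# dict.get returns a default (None unless given) instead of raising.
print(records.get("308335 A"))             # None
print(records.get("308335 A", "unknown"))  # unknown
print(records.get("308335 V"))             # Joshua
```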

### Problem 2: Sets

Let us explore sets while solving a programming question. We will be using sets from the Python standard library.

https://docs.python.org/3/tutorial/datastructures.html#sets

The following data are the winning Powerball numbers for some draws. (Don't gamble, btw!)

**August**

- **Draw 1:** 28, 25, 10, 27, 14, 11, 32, 6
- **Draw 2:** 26, 7, 9, 10, 2, 30, 8, 3
- **Draw 3:** 17, 10, 4, 26, 19, 23, 21, 7
- **Draw 4:** 16, 25, 12, 7, 11, 10, 21, 18



**July**

- **Draw 1:** 23, 26, 7, 4, 33, 35, 22, 17
- **Draw 2:** 28, 25, 15, 13, 20, 29, 16, 11
- **Draw 3:** 25, 32, 17, 16, 3, 11, 26, 2
- **Draw 4:** 7, 8, 17, 9, 19, 32, 15, 6
- **Draw 5:** 13, 19, 18, 17, 11, 16, 20, 15




Answer the following questions using sets.

1. Which balls were drawn in August?

In [None]:
August_1 = set([28, 25, 10, 27, 14, 11, 32, 6])
August_2 = {26, 7, 9, 10, 2, 30, 8, 3}
August_3 = {17, 10, 4, 26, 19, 23, 21, 7}
August_4 = {16, 25, 12, 7, 11, 10, 21, 18}

August = August_1 | August_2 | August_3 | August_4
August

{2,
 3,
 4,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 14,
 16,
 17,
 18,
 19,
 21,
 23,
 25,
 26,
 27,
 28,
 30,
 32}

2. Which balls were drawn in July?

In [None]:
july1 = {23, 26, 7, 4, 33, 35, 22, 17}
july2 = {28, 25, 15, 13, 20, 29, 16, 11}
july3 = {25, 32, 17, 16, 3, 11, 26, 2}
july4 = {7, 8, 17, 9, 19, 32, 15, 6}
july5 = {13, 19, 18, 17, 11, 16, 20, 15}

july = july1 | july2 | july3 | july4 | july5
july

{2,
 3,
 4,
 6,
 7,
 8,
 9,
 11,
 13,
 15,
 16,
 17,
 18,
 19,
 20,
 22,
 23,
 25,
 26,
 28,
 29,
 32,
 33,
 35}

3. How many unique balls were drawn in each month?

In [None]:
len(August), len(july)

(23, 24)

4. Which balls were drawn in both August and July?

In [None]:
August & july

{2, 3, 4, 6, 7, 8, 9, 11, 16, 17, 18, 19, 23, 25, 26, 28, 32}

5. Which balls were drawn in August but not in July? And which balls were drawn in July but not in August?

In [None]:
August - july

{10, 12, 14, 21, 27, 30}

In [None]:
july - August

{13, 15, 20, 22, 29, 33, 35}
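The operators used above (`|`, `&`, `-`) each have a method equivalent, and sets also support symmetric difference (`^`), membership tests, and subset checks. The sketch below illustrates these on two of the draws; the lowercase names `august_1` and `july_2` are chosen here so that the block runs on its own.

```python
# Two of the draws from above, small enough to check by hand.
august_1 = {28, 25, 10, 27, 14, 11, 32, 6}
july_2 = {28, 25, 15, 13, 20, 29, 16, 11}

# Operators and their method equivalents give the same result.
print(august_1 | july_2 == august_1.union(july_2))          # True
print(august_1 & july_2 == august_1.intersection(july_2))   # True
print(august_1 - july_2 == august_1.difference(july_2))     # True

# Balls drawn in both draws.
print(sorted(august_1 & july_2))   # [11, 25, 28]

# Symmetric difference: balls drawn in exactly one of the two draws.
print(sorted(august_1 ^ july_2))   # [6, 10, 13, 14, 15, 16, 20, 27, 29, 32]

# Membership and subset tests.
print(10 in august_1)              # True
print({25, 28} <= july_2)          # True: both balls appear in july_2
```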

If you have completed the exercises, go back and implement a different hash function or collision handling mechanism.
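As one possible alternative to chaining, the sketch below uses open addressing with linear probing: on a collision, the entry goes into the next free slot. This is an illustrative sketch under stated assumptions, not the workshop's reference solution. The class name `ProbingHashtable`, the `_DELETED` tombstone marker, and the fixed capacity without resizing are all choices made here, and the built-in hash is a digit sum in the spirit of `hf1`.

```python
class ProbingHashtable:
    """Open addressing with linear probing; fixed capacity, no resizing."""

    _DELETED = object()  # tombstone so searches continue past removed entries

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.slots = [None] * capacity

    def _hash(self, key):
        # Digit-sum hash in the spirit of hf1; assumes keys look like "529401 F".
        return sum(int(ch) for ch in key[:6]) % self.capacity

    def _probe(self, key):
        # Visit every slot once, starting at the key's home index.
        start = self._hash(key)
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def insert(self, key, value):
        first_free = None
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                if first_free is None:
                    first_free = index
                break  # key cannot be stored beyond an empty slot
            if slot is self._DELETED:
                if first_free is None:
                    first_free = index  # reusable, but keep looking for the key
            elif slot[0] == key:
                self.slots[index] = (key, value)  # overwrite existing key
                return
        if first_free is None:
            raise Exception("Hashtable is full")
        self.slots[first_free] = (key, value)

    def search(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                break  # an empty slot ends the probe sequence
            if slot is not self._DELETED and slot[0] == key:
                return slot[1]
        raise Exception("Entry not Found")

    def remove(self, key):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                break
            if slot is not self._DELETED and slot[0] == key:
                self.slots[index] = self._DELETED
                return
        raise Exception("Entry not Found to delete")


table = ProbingHashtable()
table.insert("199760 L", "Matthew")
table.insert("693671 O", "David")   # same digit sum (32), so it moves to the next slot
print(table.search("693671 O"))     # David
table.remove("199760 L")
print(table.search("693671 O"))     # David (found past the tombstone)
```

The tombstone matters: if `remove` reset the slot to `None`, the search for `"693671 O"` would stop at the emptied home slot and wrongly report the entry as missing.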