# Hash Table Implementation

## Lesson Overview

To best understand hash tables, it's important to understand their component parts:

*   Hash function
*   Hash collisions
*   Hash collision resolution strategies

Once you have a good sense of those, you can put them together into a full hash table implementation. This is a step up in complexity, so the potential for bugs grows as well. For one, most maps and dictionaries (and consequently your hash tables) don't store just the value in a bucket; they store a combination of three pieces of data:
 
*   the `key`, or the unique identifier for the data being stored
*   the `hash_code`, or the result of the hash function run on the key (in this lesson's implementation, the number of the bucket where the entry ends up)
*   the `value`, or the data corresponding to the key being stored

In [None]:
#persistent
class TableEntry:

  def __init__(self, hash_code, key, value):
    self.hash_code = hash_code
    self.key = key
    self.value = value

### Writing a hash table class

To build a working `HashTable` class, we'll need a hash function, a way to resolve collisions, and ultimately, a way to put data into the table that makes use of those functions. Here's one possible implementation below.

In [None]:
# This is the default initial table size for the hash table.
INITIAL_TABLE_SIZE = 5

class HashTable:

  def __init__(self):
    self.current_table_size = INITIAL_TABLE_SIZE
    # Make INITIAL_TABLE_SIZE buckets for TableEntry instances.
    self.buckets = [None] * INITIAL_TABLE_SIZE
    # This is a quick shorthand to allow us to check if the bucket currently is
    # being used by our hash table.
    # It stores booleans: False when the bucket is empty, and True otherwise.
    # It is more efficient to store this array of booleans than to look up if
    # elements of the buckets array are None.
    self.bucket_in_use = [False] * INITIAL_TABLE_SIZE
    self.num_buckets_in_use = 0
    self.keys = set()
    self.values = set()

  def hash_function(self, key):
    # Python provides a hash() function, which returns an integer for the given
    # argument. Our hash_function takes that and uses the modulo operator to get
    # a bucket that we can use for our hash table.
    hash_code = hash(key)
    return hash_code % self.current_table_size

  def resolve_collisions(self, key):
    # If we hit the same bucket twice during this linear resolution, we should
    # notify the user that a collision couldn't be resolved.
    bucket_num = self.hash_function(key)
    initial_bucket_num = bucket_num
    while self.bucket_in_use[bucket_num]:
      bucket_num = (bucket_num + 1) % self.current_table_size
      if bucket_num == initial_bucket_num:
        raise ValueError(
          'A hash table collision has occurred that cannot be resolved.')
    return bucket_num

  def put(self, key, value):
    bucket_num = self.hash_function(key)
    if self.bucket_in_use[bucket_num]:
      bucket_num = self.resolve_collisions(key)
    self.buckets[bucket_num] = TableEntry(bucket_num, key, value)
    self.bucket_in_use[bucket_num] = True
    # Keep the count of occupied buckets current.
    self.num_buckets_in_use += 1
    self.keys.add(key)
    self.values.add(value)

That said, what happens when you ultimately fill the table? Since a collision can't be resolved, you'll run into a `ValueError`. Is there a way to fix that problem before it throws an error?
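To see the failure concretely, here is a minimal standalone sketch, assuming a fixed-size table that uses the same linear probing idea as `resolve_collisions`. `FixedSizeTable` is a hypothetical name used only for illustration. Integer keys are used because `hash()` of a small non-negative integer is the integer itself, which keeps the bucket numbers predictable.

```python
class FixedSizeTable:
  """Hypothetical, simplified table: linear probing with no resizing."""

  def __init__(self, size):
    self.size = size
    self.buckets = [None] * size

  def put(self, key, value):
    start = hash(key) % self.size
    bucket_num = start
    # Probe forward until an empty bucket is found.
    while self.buckets[bucket_num] is not None:
      bucket_num = (bucket_num + 1) % self.size
      if bucket_num == start:
        # We wrapped all the way around: every bucket is occupied.
        raise ValueError('The table is full.')
    self.buckets[bucket_num] = (key, value)


table = FixedSizeTable(5)
for key in range(5):
  table.put(key, 'value')  # Fills all five buckets.

try:
  table.put(5, 'one too many')
  overflowed = False
except ValueError:
  overflowed = True

print(overflowed)
# Prints: True
```

The sixth `put()` has nowhere to go, so the probe wraps around to its starting bucket and raises.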

### Resizing a hash table

You now need to consider things like the number of your buckets relative to the number of entries you're going to have. For instance, calling the `put()` method adds a new entry to your map. You need to have at least one bucket available to put that new entry into. If you have more entries than buckets, you'll never be able to resolve a collision and `put()` will fail. As a result, you need to be able to **resize** your hash table.

We haven't talked about the need for a `resize()` method yet, but it's critical for allowing your hash table to take on more elements. A map usually starts small and then, once the fraction of buckets in use reaches a threshold (`BUCKET_RESIZE_PERCENTAGE` in the code below), it allocates a larger backing table. CPython's dictionary, for instance (implemented as a hash table), resizes its backing table when it becomes about 2/3 full, which is the threshold used here. Let's look at an implementation of this method.

```python
# This indicates how much the backing table will expand when we resize, e.g. if
# RESIZE_FACTOR == 2, the table will double in size when resize() is called.
RESIZE_FACTOR = 2

# What % of buckets must be full before we resize?
BUCKET_RESIZE_PERCENTAGE = 0.6666666666

def resize(self):
  old_buckets = self.buckets
  self.current_table_size *= RESIZE_FACTOR
  # Swap in the new, empty arrays before rehashing, so that hash_function()
  # and resolve_collisions() both operate on the resized table.
  self.buckets = [None] * self.current_table_size
  self.bucket_in_use = [False] * self.current_table_size
  for item in old_buckets:
    if item is not None:
      new_bucket_num = self.hash_function(item.key)
      if self.bucket_in_use[new_bucket_num]:
        new_bucket_num = self.resolve_collisions(item.key)
      item.hash_code = new_bucket_num
      self.buckets[new_bucket_num] = item
      self.bucket_in_use[new_bucket_num] = True
```

The `resize()` method doubles the number of available buckets, but it then has to do the lengthier task of moving all of the old data into the new buckets. Since that consumes a lot of time and memory, we try to avoid doing it for as long as we can. With this, though, you should have all you need to implement your own hash table.
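The reason the old data has to be moved, rather than simply copied into the same positions, is that the bucket number depends on the table size. The short sketch below illustrates this with a hypothetical helper, `bucket_for`, and integer keys (whose `hash()` is the integer itself for small values); it is an illustration, not part of the `HashTable` class.

```python
def bucket_for(key, table_size):
  # Same idea as hash_function(): reduce the hash to a bucket index.
  return hash(key) % table_size


keys = [7, 12, 23]
before = {key: bucket_for(key, 5) for key in keys}
after = {key: bucket_for(key, 10) for key in keys}

print(before)
# Prints: {7: 2, 12: 2, 23: 3}
print(after)
# Prints: {7: 7, 12: 2, 23: 3}
```

With five buckets, keys 7 and 12 collide in bucket 2. After doubling to ten buckets, key 7 belongs in bucket 7, so leaving it in its old position would make it impossible to find.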

### Building a full hash table

Now that we've completed the `resize()` implementation, we can integrate that into our existing hash table implementation.

In [None]:
#persistent
# This is the default initial table size for the hash table.
INITIAL_TABLE_SIZE = 5

# This indicates how much the backing table will expand when we resize, e.g. if
# RESIZE_FACTOR == 2, the table will double in size when resize() is called.
RESIZE_FACTOR = 2

# What % of buckets must be full before we resize?
BUCKET_RESIZE_PERCENTAGE = 0.6666666666

class HashTable:

  def __init__(self):
    self.current_table_size = INITIAL_TABLE_SIZE
    # Make INITIAL_TABLE_SIZE buckets for TableEntry instances.
    self.buckets = [None] * INITIAL_TABLE_SIZE
    # This is a quick shorthand to allow us to check if the bucket currently is
    # being used by our hash table.
    # It stores booleans: False when the bucket is empty, and True otherwise.
    # It is more efficient to store this array of booleans than to look up if
    # elements of the buckets array are None.
    self.bucket_in_use = [False] * INITIAL_TABLE_SIZE
    self.num_buckets_in_use = 0
    self.keys = set()
    self.values = set()

  def hash_function(self, key):
    # Python provides a hash() function, which returns an integer for the given
    # argument. Our hash_function takes that and uses the modulo operator to get
    # a bucket that we can use for our hash table.
    hash_code = hash(key)
    return hash_code % self.current_table_size

  def resolve_collisions(self, key):
    # If we hit the same bucket twice during this linear resolution, we should
    # notify the user that a collision couldn't be resolved.
    bucket_num = self.hash_function(key)
    initial_bucket_num = bucket_num
    while self.bucket_in_use[bucket_num]:
      bucket_num = (bucket_num + 1) % self.current_table_size
      if bucket_num == initial_bucket_num:
        raise ValueError(
          'A hash table collision has occurred that cannot be resolved.')
    return bucket_num

  def put(self, key, value):
    if self.num_buckets_in_use >= (
        len(self.buckets) * BUCKET_RESIZE_PERCENTAGE):
      # Note the use of self.resize, here, once the number of buckets in use
      # reaches the limit we've set internally.
      self.resize()
    bucket_num = self.hash_function(key)
    if self.bucket_in_use[bucket_num]:
      bucket_num = self.resolve_collisions(key)
    self.buckets[bucket_num] = TableEntry(bucket_num, key, value)
    self.bucket_in_use[bucket_num] = True
    # Keep the count current; without this, the resize check never fires.
    self.num_buckets_in_use += 1
    self.keys.add(key)
    self.values.add(value)

  def resize(self):
    old_buckets = self.buckets
    self.current_table_size *= RESIZE_FACTOR
    # Swap in the new, empty arrays before rehashing, so that hash_function()
    # and resolve_collisions() both operate on the resized table.
    self.buckets = [None] * self.current_table_size
    self.bucket_in_use = [False] * self.current_table_size
    for item in old_buckets:
      if item is not None:
        new_bucket_num = self.hash_function(item.key)
        if self.bucket_in_use[new_bucket_num]:
          new_bucket_num = self.resolve_collisions(item.key)
        item.hash_code = new_bucket_num
        self.buckets[new_bucket_num] = item
        self.bucket_in_use[new_bucket_num] = True

In the following exercises, we will use [class inheritance](https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)) to create a new class `YourHashTable` that inherits all the methods defined above for `HashTable`. This means we can create new methods for `YourHashTable` without having to copy-paste the entire `HashTable` definition.

## Question 1

Which of these statements about a hash table are true? There may be more than one correct response. 


**a)** When resizing a hash table, you may also need to rehash the elements in the table.

**b)** A hash table needs some form of collision resolution to avoid crashes.

**c)** A perfect hash function will have 0 collisions.

**d)** Hash tables rely on hash functions, hash collision resolution, and dynamic resizing in order to dynamically store elements.

### Solution

The correct answers are **a)**, **b)**, and **d)**. 

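**c)** is false as stated for a general-purpose hash table: the bucket number depends on the number of buckets, so if you store more entries than there are buckets, a collision is guaranteed by the pigeonhole principle. A hash function can only be collision-free when it is built for a fixed set of keys known in advance and there are at least as many buckets as keys.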
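The pigeonhole argument can be checked with a small sketch. The key values and `NUM_BUCKETS` below are arbitrary choices for illustration; any six keys would produce at least one shared bucket.

```python
NUM_BUCKETS = 5
keys = [3, 8, 14, 20, 21, 27]  # Six keys, but only five buckets.

bucket_numbers = [hash(key) % NUM_BUCKETS for key in keys]
distinct_buckets = set(bucket_numbers)

# With more keys than buckets, at least two keys must share a bucket.
has_collision = len(distinct_buckets) < len(keys)
print(bucket_numbers, has_collision)
# Prints: [3, 3, 4, 0, 1, 2] True
```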

## Question 2

Let's start by implementing `contains()`, which returns `True` if a given key exists in the table and `False` otherwise.

Hash tables usually store their keys and values in some accessible structure, like a `set`. Assume our hash table has a set called `keys` keeping track of the keys that the table has been storing.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    # TODO(you): Implement
    print('This method has not been implemented.')

### Hint

Keep in mind that a set `my_set` allows you to see if it contains `key` by using `if key in my_set`.

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
test_table = YourHashTable()
test_table.put('apple', 'orange')

print(test_table.contains('apple'))
# Should print: True

print(test_table.contains('peach'))
# Should print: False

### Solution

In Python we can use `in` to check if a key is in a set. For non-Python languages, you can loop through the set's elements and compare them to `key`, returning `True` if you find a match.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

## Question 3

Next, let's work on building out a method called `get_bucket_number()`. Think of this as a better version of `hash_function`, as it allows you to get the bucket number of a key after you've handled collision resolution. This will help us with a later implementation.

Start by finding the initial bucket number, and then handle collisions if needed. You may assume collisions are resolved linearly, if you want, but your implementation can be completed without that assumption.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # TODO(you): Implement
    print('This method has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
test_table = YourHashTable()
test_table.put('apple', 'orange')

print(test_table.get_bucket_number('apple'))
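# Should print an integer from 0 to 4. The exact bucket can differ between
# runs, because Python randomizes string hashes for each process by default.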

print(test_table.get_bucket_number('peach')) 
# Should raise: ValueError

### Solution

`get_bucket_number()` should first verify the key exists, and then once you know that, you can either try to resolve it most efficiently (which would depend on how your table implementation resolves collisions) or just do a quick linear scan of the buckets. Since we don't know what method our table uses to resolve collisions, currently, let's do a linear scan.
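For reference, the more efficient option mentioned above can be sketched separately from the exercise solution. This sketch assumes collisions are resolved by linear probing and that entries are never removed (an empty bucket ends the search). `find_bucket` is a hypothetical standalone function, and it works on a plain list of `(key, value)` tuples rather than on `TableEntry` objects.

```python
def find_bucket(buckets, key):
  """Returns the index of `key`, probing linearly from its home bucket.

  `buckets` is a list whose entries are either None or (key, value) tuples.
  """
  table_size = len(buckets)
  start = hash(key) % table_size
  bucket_num = start
  while buckets[bucket_num] is not None:
    if buckets[bucket_num][0] == key:
      return bucket_num
    bucket_num = (bucket_num + 1) % table_size
    if bucket_num == start:
      break  # Wrapped around without finding the key.
  raise ValueError('The key does not exist in the table.')


# Keys 2 and 7 both hash to bucket 2 in a five-bucket table, so 7 was
# placed in the next free bucket (3) when it was inserted.
example_buckets = [None, None, (2, 'a'), (7, 'b'), None]
print(find_bucket(example_buckets, 7))
# Prints: 3
```

Starting at the home bucket means the search usually inspects only a few buckets, whereas a full scan always costs time proportional to the table size.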

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

## Question 4

Now that we can tell whether a key exists and which bucket it's mapped to, we can implement `update()`. It takes a key and a value and, if that key exists, overwrites the key's previous value with the new one. Assume that both your keys and your values are strings (`str`). You should use the same internal structures and methods given in the Lesson Overview, and you'll need your answers to the previous two questions as well. Implement `update()`.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    # TODO(you): Implement
    print('This method has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
test_table = YourHashTable()

test_table.put('apple', 'orange')
# Check that the value for 'apple' is 'orange'.
print(test_table.buckets[test_table.get_bucket_number('apple')].value)
# Should print: orange

test_table.update('apple', 'pie')
# Check that the new value for 'apple' is 'pie'.
print(test_table.buckets[test_table.get_bucket_number('apple')].value)
# Should print: pie

### Solution

For `update()`, we can leverage the `contains` and `get_bucket_number` methods to greatly simplify this method.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

## Question 5

Now that we've seen an implementation for `put()`, let's implement `get()`. You may assume the hash table stores string keys and string values. If the key is not in the table, raise an error. Keep in mind that `HashTable` tracks its keys and values in sets named `keys` and `values`, as you've seen in its implementation. You may also use any methods you've written or seen in this lesson.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  def get(self, key):
    # TODO(you): Implement
    print('This method has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
new_table = YourHashTable()
new_table.put('a', 'c')
new_table.put('b', 'c')
new_table.put('d', 'e')

print(new_table.get('a'))
# Should print: c
print(new_table.get('d'))
# Should print: e

### Solution

Given that we already have implemented `get_bucket_number()`, we can use it to simplify our `get()` code.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

## Question 6

A useful method implemented by a number of maps and dictionaries is `from_keys()`, which allows users to pass in a list of keys and a value that should be set for all of them.

```python
new_table = YourHashTable.from_keys(['a', 'b', 'c', 'd', 'e'], 'f')
```

This is equivalent to adding `('a', 'f')`, `('b', 'f')`, `('c', 'f')`, `('d', 'f')`, and `('e', 'f')` to your hash table.
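For comparison, Python's built-in `dict` offers the same idea through `dict.fromkeys()`. The snippet below only shows the built-in behavior as a point of reference; it is not the solution to this exercise.

```python
keys = ['a', 'b', 'c', 'd', 'e']
table = dict.fromkeys(keys, 'f')
print(table)
# Prints: {'a': 'f', 'b': 'f', 'c': 'f', 'd': 'f', 'e': 'f'}
```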

Implement `from_keys()`.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  @classmethod
  def from_keys(cls, key_list, default_value):
    # This creates a new table, like calling the __init__ method.
    new_table = cls()
    # TODO(you): Implement
    print('This method has not been implemented.')
    return new_table

You'll see that `@classmethod` precedes the `from_keys` method. This is a built-in Python *decorator* rather than a keyword. You can essentially ignore it; it just indicates that `from_keys` is called on the class itself, in the following way:

```python
new_table = YourHashTable.from_keys(['a', 'b', 'c', 'd', 'e'], 'f')
```

Instead of:

```python
new_table = YourHashTable()
new_table.from_keys(['a', 'b', 'c', 'd', 'e'], 'f')
```

For more information on `@classmethod`, take a look at the [official Python docs](https://docs.python.org/3/library/functions.html#classmethod).
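As a small illustration of the same pattern outside of hash tables, here is a sketch of `@classmethod` used as an alternative constructor. The `Temperature` class and its `from_fahrenheit` method are hypothetical names made up for this example.

```python
class Temperature:
  """Hypothetical class used only to illustrate @classmethod."""

  def __init__(self, celsius):
    self.celsius = celsius

  @classmethod
  def from_fahrenheit(cls, fahrenheit):
    # `cls` is the class itself, so cls(...) builds a new instance.
    return cls((fahrenheit - 32) * 5 / 9)


boiling = Temperature.from_fahrenheit(212)
print(boiling.celsius)
# Prints: 100.0
```

Because the method receives the class as `cls`, a subclass that calls `from_fahrenheit` gets an instance of the subclass, which is also why `from_keys` uses `cls()` rather than naming `HashTable` directly.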

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
new_table = YourHashTable.from_keys(['a', 'b', 'c', 'd', 'e'], 'f')
print(new_table.get('a'))
# Should print: f

print(new_table.get('c'))
# Should print: f

print(new_table.get('o'))
# Should raise: ValueError

### Solution

The `from_keys` method is implemented as a series of `put` calls.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

## Question 7

Since we've filled out our hash table, let's empty it. Implement `clear()`, which goes through a hash table and wipes out all of the entries. You may continue to use the hash table definition and any methods seen earlier in this lesson. 

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def clear(self):
    # TODO(you): Implement
    print('This method has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
new_table = YourHashTable.from_keys(['a', 'b'], 'c')
new_table.put('d', 'e')
new_table.clear()

print(len(new_table.keys))
# Should print: 0

### Solution

For `clear()`, one strategy is to go through all of the buckets and empty them out. While this is a reasonable implementation, it's somewhat inefficient. Remember the `bucket_in_use` array? We can speed this function up by resetting just *that* array, so that when new elements are added, the `put()` method overwrites the stale entries for us. Just don't forget to empty out `self.keys` and `self.values` as well.

The `__init__()` method of `HashTable` already resets all of this state (it also replaces the `buckets` array and returns the table to its initial size), so calling it clears the hash table for us.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def clear(self):
    self.__init__()

## Question 8

Next, let's implement `items()`, which returns a list of tuples with key-value pairs in the hash table. If you're not familiar with a tuple, this is how to make one:

In [None]:
key = 'apple'
value = 'orange'
my_tuple = (key, value)
print(my_tuple)

Finish the implementation for `items()`. Don't forget that a `TableEntry` item holds both the key and the value.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def items(self):
    # TODO(you): Implement
    print('This method has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
test_table = YourHashTable()
test_table.put('apple', 'orange')
test_table.put('peach', 'pear')

items = test_table.items()
items.sort()
print(items)
# Should print: [('apple', 'orange'), ('peach', 'pear')]

### Solution

One thing that hash-table-based maps generally do not guarantee is the order of the items when you iterate through them, because the items are stored in bucket order rather than alphabetically or by some other sorting criterion. Given how hash functions and collision resolution work, you may return the items in any order you like, but it's likely easiest to just iterate through the buckets.
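To illustrate the ordering point, here is a standalone sketch, separate from the solution, that places three integer keys into five buckets and then reads them back in bucket order. It assumes no collisions occur (true for the keys chosen here) and stores plain tuples rather than `TableEntry` objects.

```python
NUM_BUCKETS = 5
buckets = [None] * NUM_BUCKETS

# Insert in the order 7, 3, 11. None of these keys collide.
for key in [7, 3, 11]:
  buckets[hash(key) % NUM_BUCKETS] = (key, 'value')

keys_in_bucket_order = [entry[0] for entry in buckets if entry is not None]
print(keys_in_bucket_order)
# Prints: [11, 7, 3]
```

The keys come back in neither insertion order nor sorted order, which is why the unit test above sorts the result before printing it.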

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def items(self):
    result = []
    for i in range(len(self.buckets)):
      # Note that we check if bucket_in_use[i] is True instead of checking if
      # buckets[i] is not None. This is more efficient, and consistent with our
      # design.
      if self.bucket_in_use[i]:
        result.append((self.buckets[i].key, self.buckets[i].value))
    return result

If you prefer, you can also iterate through the `keys` set, calling `get()` on each key. It's less efficient, but it requires less code, given the other methods in the class.

In [None]:
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def items(self):
    result = []
    for key in self.keys:
      result.append((key, self.get(key)))
    return result
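
To make the efficiency trade-off concrete, here is a rough standalone sketch that counts how many buckets each approach inspects. The toy `buckets` list and the helpers `items_direct` and `items_via_lookup` are hypothetical stand-ins for `self.buckets`, the bucket-scanning `items()`, and the `get()`-based `items()`; they are not part of the lesson's class.

```python
# Toy stand-in for the table: each bucket holds a (key, value) tuple or None.
buckets = [('a', 1), None, ('b', 2), ('c', 3), None]

def items_direct(buckets):
  """Single pass: inspects each bucket exactly once."""
  inspected = 0
  result = []
  for entry in buckets:
    inspected += 1
    if entry is not None:
      result.append(entry)
  return result, inspected

def items_via_lookup(buckets):
  """For each key, rescans from bucket 0, like get_bucket_number() does."""
  keys = [entry[0] for entry in buckets if entry is not None]
  inspected = 0
  result = []
  for key in keys:
    for entry in buckets:
      inspected += 1
      if entry is not None and entry[0] == key:
        result.append(entry)
        break
  return result, inspected

direct, direct_cost = items_direct(buckets)
lookup, lookup_cost = items_via_lookup(buckets)
print(direct_cost, lookup_cost)
# Prints: 5 8
```

Both approaches return the same pairs, but the lookup-based version repeats work for every key, and the gap widens as the table grows.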

## Question 9

Your colleague is trying to finish the hash table implementation that you started. This time, they're implementing `pop()`, which removes the element with the specified key and returns its value (a string, in this case). Here is their implementation, based on your hash table code.

Their code doesn't seem to be working. What's wrong with their code, and how would you fix it?

In [None]:
#persistent
class YourHashTable(HashTable):

  def contains(self, key):
    return key in self.keys

  def get_bucket_number(self, key):
    if not self.contains(key): 
      raise ValueError('The key does not exist in the table.')
    # We've already verified that the key we're looking for exists in the table,
    # so we know we will hit it in one of these iterations.
    for bucket_num in range(self.current_table_size):
      # Once we find the key, return the bucket number. We need to check that
      # the bucket is not None before checking the key.
      if (self.buckets[bucket_num] is not None and
          self.buckets[bucket_num].key == key):
        return bucket_num

  def update(self, key, value):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    self.buckets[self.get_bucket_number(key)].value = value

  @classmethod
  def from_keys(cls, key_list, default_value):
    new_table = cls()
    for key in key_list:
      new_table.put(key, default_value)
    return new_table

  def get(self, key):
    if not self.contains(key):
      raise ValueError('Key does not exist in hash table.')
    return self.buckets[self.get_bucket_number(key)].value

  def items(self):
    result = []
    for key in self.keys:
      result.append((key, self.get(key)))
    return result

In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    # TODO(you): Find the problem in this code and fix it.
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')

    for i in range(len(self.buckets)):
      if i == self.hash_function(key):
        return self.buckets[i].value

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
new_table = TheirHashTable.from_keys(['a', 'b'], 'c')
new_table.put('d', 'e')

print(new_table.pop('b'))
# Should print: c

print(new_table.pop('b'))
# Should raise: ValueError

### Solution

There are a number of issues with this method. First, comparing `i` to `hash_function(key)` inside the loop is inefficient, because the hash is recalculated on every iteration. More importantly, since we aren't sure how this map resolves collisions (or whether any occurred), we can't guarantee that `hash_function(key)` is the bucket that our value is currently in; a collision may have moved it elsewhere. Instead, let's use the `get_bucket_number()` method that we implemented earlier in this lesson.
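
To see how a key can end up outside the bucket its hash points to, here is a small standalone sketch. It assumes linear probing as the collision strategy, which may not match the lesson's table, and `home_bucket`, `put`, and `TABLE_SIZE` are hypothetical names used only for illustration.

```python
# Hypothetical toy table using linear probing; NOT the lesson's HashTable.
TABLE_SIZE = 5

def home_bucket(key):
  # Deterministic toy hash: sum of character codes modulo the table size.
  return sum(ord(ch) for ch in key) % TABLE_SIZE

def put(buckets, key, value):
  # Start at the home bucket and step forward until a usable slot is found.
  index = home_bucket(key)
  for _ in range(TABLE_SIZE):
    if buckets[index] is None or buckets[index][0] == key:
      buckets[index] = (key, value)
      return index
    index = (index + 1) % TABLE_SIZE
  raise RuntimeError('Table is full.')

buckets = [None] * TABLE_SIZE
first = put(buckets, 'ab', 1)   # 'ab' and 'ba' have the same character sum,
second = put(buckets, 'ba', 2)  # so they share a home bucket.
print(home_bucket('ba'), second)
# Prints: 0 1
```

Here `'ba'` hashes to bucket 0 but is stored in bucket 1, so reading `buckets[home_bucket('ba')]` would return the entry for `'ab'` instead. Locating the key by comparing stored keys avoids that mistake.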


In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    return self.buckets[self.get_bucket_number(key)].value

That looks pretty good! But you may have noticed that something else is wrong: we're never actually popping, so this is identical to the `get()` method. The hash table uses `bucket_in_use` to determine whether a bucket is occupied, so we can set that entry to `False` to tell the table that the bucket is free, and remove the key from `keys` so that `contains()` no longer reports it. This will allow new data to be stored in that bucket later on.

In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    bucket_number = self.get_bucket_number(key)
    self.bucket_in_use[bucket_number] = False
    self.keys.remove(key)
    return self.buckets[bucket_number].value
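
Note that the version above leaves the old entry sitting in `buckets` and does not touch `num_buckets_in_use`. If you want `pop()` to clean those up as well, one possible approach is sketched below. `MiniTable` is a hypothetical, simplified stand-in that only mirrors the lesson's attribute names; its `put()` takes the first free bucket instead of hashing, and whether the real `HashTable` relies on `num_buckets_in_use` in this way is an assumption.

```python
class MiniTable:
  """Hypothetical stand-in that mirrors the lesson's attribute names."""

  def __init__(self, size=5):
    self.buckets = [None] * size  # (key, value) tuples or None
    self.bucket_in_use = [False] * size
    self.num_buckets_in_use = 0
    self.keys = set()

  def put(self, key, value):
    # Simplified for the sketch: take the first free bucket, no hashing.
    index = self.bucket_in_use.index(False)
    self.buckets[index] = (key, value)
    self.bucket_in_use[index] = True
    self.num_buckets_in_use += 1
    self.keys.add(key)

  def pop(self, key):
    if key not in self.keys:
      raise ValueError('The key does not exist in the table.')
    for index, entry in enumerate(self.buckets):
      if self.bucket_in_use[index] and entry[0] == key:
        # Read the value before clearing the slot.
        value = entry[1]
        self.buckets[index] = None
        self.bucket_in_use[index] = False
        self.num_buckets_in_use -= 1
        self.keys.remove(key)
        return value

table = MiniTable()
table.put('a', 'c')
table.put('b', 'c')
popped = table.pop('b')
print(popped, table.num_buckets_in_use, table.buckets[1])
# Prints: c 1 None
```

Clearing the slot means any code that checks `buckets` directly, rather than `bucket_in_use`, will also see the bucket as empty.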

## Question 10

Your coworker is going a bit above and beyond and writing a new method called `get_keys_for_value()`. It takes a value and returns an array of keys that map to that value.

It's got a few bugs. Can you identify and fix them all?

In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    bucket_number = self.get_bucket_number(key)
    self.bucket_in_use[bucket_number] = False
    self.keys.remove(key)
    return self.buckets[bucket_number].value
 
  def get_keys_for_value(self, value):
    # TODO(you): Find the bugs in this method and fix them.
    bucket_num = self.hash_function(value)
    for i in range(len(self.buckets)):
      bucket_num = (bucket_num + i) % len(self.buckets)
      # Skip buckets that aren't filled.
      if not self.buckets[bucket_num]:
        continue
      if self.buckets[bucket_num].value == value:
        return self.buckets[bucket_num].value

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
new_table = TheirHashTable.from_keys(['a', 'b'], 'c')
new_table.put('d', 'e')

keys = new_table.get_keys_for_value('c')
keys.sort()
print(keys)
# Should print: ['a', 'b']

### Solution

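This is close to the right idea, but they've done a few things that are incorrect. For starters, hashing `value` doesn't help: buckets are chosen by hashing the key, not the value, so a matching value can be in any bucket. Worse, because `bucket_num` is updated cumulatively (`bucket_num + i`), the loop can skip some buckets and revisit others. We can drop `bucket_num` entirely and just check every bucket index `i`, starting from 0.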

In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    bucket_number = self.get_bucket_number(key)
    self.bucket_in_use[bucket_number] = False
    self.keys.remove(key)
    return self.buckets[bucket_number].value

  def get_keys_for_value(self, value):
    for i in range(len(self.buckets)):
      # Skip buckets that aren't filled.
      if not self.buckets[i]:
        continue
      if self.buckets[i].value == value:
        return self.buckets[i].value

This is closer, but still not there. Recall that we should be returning an array of every key that maps to the value provided. Your coworker's code instead returns the value itself, and it returns as soon as it finds the first match. Let's fix that by appending each matching key to a `result` array and returning it after the loop.

In [None]:
class TheirHashTable(YourHashTable):

  def pop(self, key):
    if not self.contains(key):
      raise ValueError('The key does not exist in the table.')
    bucket_number = self.get_bucket_number(key)
    self.bucket_in_use[bucket_number] = False
    self.keys.remove(key)
    return self.buckets[bucket_number].value

  def get_keys_for_value(self, value):
    result = []
    for i in range(len(self.buckets)):
      # Skip buckets that aren't in use. Checking bucket_in_use (rather than
      # the entry itself) also skips entries that pop() has released.
      if not self.bucket_in_use[i]:
        continue
      # Check the stored value for a match.
      if self.buckets[i].value == value:
        result.append(self.buckets[i].key)

    return result

This way, if no key maps to the value, we simply return an empty array, which is a valid result.
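
If you need to look up keys for many different values, scanning every bucket on each call adds up. One option is to build a reverse mapping once and reuse it. The sketch below uses `collections.defaultdict` on a plain list of pairs; `invert` and the sample `items` list are hypothetical, standing in for the output of the table's `items()` method, and the approach assumes the values are hashable.

```python
from collections import defaultdict

def invert(items):
  """Maps each value to the sorted list of keys that store it."""
  keys_by_value = defaultdict(list)
  for key, value in items:
    keys_by_value[value].append(key)
  return {value: sorted(keys) for value, keys in keys_by_value.items()}

# Hypothetical stand-in for the pairs returned by items().
items = [('a', 'c'), ('b', 'c'), ('d', 'e')]
keys_by_value = invert(items)
print(keys_by_value.get('c', []))
# Prints: ['a', 'b']
print(keys_by_value.get('z', []))
# Prints: []
```

The trade-off is that the reverse mapping goes stale whenever the table changes, so it only pays off when lookups greatly outnumber updates.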