# Hash Tables - Part 2

## Open Addressing

Last time we looked at Open Chaining, where individual lists (chains) grow slowly and uniformly given a good hash function.

Open Addressing, on the other hand, is a lousy way of building a hash table, but it points to some useful tools. To understand it, though, we first have to look at a bad version of open addressing.

### Solving Open Addressing Collisions - Linear Probing

If table[h(k)] is occupied, find the next open slot in the table and put the key there.

This is not a good way to do it: find and delete have to keep walking to the next slot, and after a deletion the keys that follow must be rehashed to see whether one of them was supposed to be in the deleted slot.

Because of this, insertion, find, and deletion are now O(size of table) in the worst case instead of O(1).
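To make the walking behavior concrete, here is a minimal sketch of linear probing in Python. The class name `LinearProbingTable`, the fixed table size, and the use of Python's built-in `hash` are assumptions for illustration; deletion and resizing are left out on purpose.

```python
class LinearProbingTable:
    """Toy open-addressing table using linear probing (no deletion, no resizing)."""

    def __init__(self, size=8):
        self.slots = [None] * size  # None marks an empty slot

    def _probe(self, key):
        """Yield slot indices starting at h(k) and wrapping around the table."""
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def insert(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is None or self.slots[idx] == key:
                self.slots[idx] = key
                return idx
        raise RuntimeError("table is full")

    def find(self, key):
        for idx in self._probe(key):
            if self.slots[idx] is None:
                return False  # reached an empty slot, so the key was never placed
            if self.slots[idx] == key:
                return True
        return False  # scanned the whole table


table = LinearProbingTable(size=8)
for key in (0, 8, 16):  # all three hash to slot 0 in a table of size 8
    table.insert(key)
print(table.slots[:3])  # [0, 8, 16]
```

The three colliding keys end up in consecutive slots, which is exactly the cluster that makes later inserts and finds slow.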

So how do we fix this?

**Iterative hash function:**
    The hash function takes a key k and an iteration variable i. h(k,0) is the initial bucket, and h(k,i) is the bucket to try on the i-th retry.

**Quadratic Probing:**
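    This is where you compute h(k,i) from h(k) with two constants, e.g. h(k,i) = (h(k) + c1*i + c2*i^2) mod m. It still suffers from clustering, specifically secondary clustering: keys with the same initial bucket follow the same probe sequence.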

**Double Hashing:**
    This avoids clustering but is still a nightmare for deletion. You use a second hash function to pick the step size, so if the first position is taken, the next positions to try depend on the key: h(k,i) = (h1(k) + i*h2(k)) mod m.

In general you should use open chaining, but if you use open addressing, use double hashing.
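The three probing strategies can be compared side by side with a small sketch. The function names, the constants `c1` and `c2`, and the table size `m = 11` are all assumptions chosen for illustration, and the hash values are passed in as plain integers rather than computed from keys.

```python
def linear_probe(h, i, m):
    """i-th slot to try with linear probing."""
    return (h + i) % m


def quadratic_probe(h, i, m, c1=1, c2=1):
    """i-th slot to try with quadratic probing; c1 and c2 are illustrative constants."""
    return (h + c1 * i + c2 * i * i) % m


def double_hash_probe(h1, h2, i, m):
    """i-th slot to try with double hashing; h2 must be nonzero (ideally coprime with m)."""
    return (h1 + i * h2) % m


m = 11  # a prime table size, so any h2 in 1..m-1 visits every slot

# Two keys that share the initial bucket 3 follow the same quadratic sequence...
print([quadratic_probe(3, i, m) for i in range(4)])       # [3, 5, 9, 4]

# ...but diverge under double hashing when their second hashes differ.
print([double_hash_probe(3, 4, i, m) for i in range(4)])  # [3, 7, 0, 4]
print([double_hash_probe(3, 5, i, m) for i in range(4)])  # [3, 8, 2, 7]
```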

### So if this isn't a great method why do I need to know this? - (Buzzy Bloom's Puzzle)

Buzzy wanted a tool to hyphenate words, but about 10% of the words in English do not follow the hyphenation rules. When looking up a word in a dictionary to see if it follows the rules, every once in a while he would get a false positive or a false negative.

False positives are okay, but false negatives are bad.

So his goal is to develop an algorithm with a low chance of a false positive and a zero-percent chance of a false negative (a special word that is not caught as being special).

This problem is similar to the Set Membership Problem, which is great for:
    * Tracking People or Packets through the internet
    * if data has been requested before
    * validating state spaces for program verification
    * fast blockchain lookups
    * avoiding recommending webpages you have already read

## How does this work?

The hash table is a bit table

You need to hash each key with multiple hash functions (5-10 functions)
    * double hashing works fine

No keys are stored in the table. It just sets the bits corresponding to the hash value

Collisions are okay. This also means, though, that you cannot delete a key, as it may be sharing a bit with other keys.

The cost of inserting or looking up a key is O(k), where k is the number of hash functions. Since k is a small constant, this is pretty close to O(1).
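The pieces above (a bit table, several hashes per key, no stored keys) can be sketched as a small class. This is only an illustration under assumptions: the class name `BloomFilter`, the default sizes, and the choice to derive the k indices from one SHA-256 digest via double hashing are not from the lecture.

```python
import hashlib


class BloomFilter:
    """Toy bit-table set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=7):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit table, all zeros

    def _indices(self, key):
        """Derive k bit positions from two base hashes (double hashing)."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for idx in self._indices(key):
            self.bits[idx // 8] |= 1 << (idx % 8)  # set the bit; the key itself is not stored

    def might_contain(self, key):
        """False means definitely absent; True means probably present."""
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(key))


exceptions = BloomFilter()
exceptions.add("present")
print(exceptions.might_contain("present"))  # True
```

A key that was added always finds all of its bits set, which is why a false negative cannot happen; a key that was never added only passes if every one of its k bits happens to have been set by other keys.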

You can also pack the table pretty full
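    * Example: suppose you filled a table to 80% capacity. If you compute 10 independent hashes for a key that is not in the table, the chance that all 10 bits are one is (0.8)^10, which is about an 11% chance of a false positive
    * In general, the chance of getting a false positive is approximately

$$
\left(1 - e^{-kn/m}\right)^k
$$

where k is the number of hash functions, n is the number of keys inserted, and m is the number of bits in the table. The chance of a false negative is zero.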
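As a quick numerical check, the sketch below evaluates the 80% example and the standard approximation for the false positive rate, where k is the number of hash functions, n the number of keys inserted, and m the number of bits. The function name and the sample values of k, n, and m are assumptions picked for illustration.

```python
import math


def false_positive_rate(k, n, m):
    """Approximate false positive rate: (1 - e^(-k*n/m))^k."""
    fill = 1 - math.exp(-k * n / m)  # expected fraction of bits set to one
    return fill ** k


# The worked example: table 80% full, 10 hashes per lookup.
print(round(0.8 ** 10, 3))  # 0.107

# Illustrative sizes: the rate grows as more keys go into the same table.
for n in (50, 100, 200):
    print(n, false_positive_rate(k=7, n=n, m=1024))
```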