# Hash Table

## Definition

A hash table (hash map) is a data structure that implements the associative array abstract data type, a structure that maps keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found. During lookup, the key is hashed and the resulting hash indicates where the corresponding value is stored. Ideally, the hash function assigns each key to a unique bucket, but most hash table designs employ an imperfect hash function, which can cause hash collisions: cases where the hash function generates the same index for more than one key. Such collisions are typically accommodated in some way.
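
As a rough illustration of the lookup path described above, the sketch below turns a key into a bucket index and shows two different keys landing in the same bucket. The names `toy_hash`, `bucket_index`, and `NUM_BUCKETS` are assumptions made up for this example, and the hash function is deliberately weak.

```python
# A minimal sketch: map string keys to bucket indices.
# toy_hash, bucket_index and NUM_BUCKETS are illustrative names only.
NUM_BUCKETS = 8


def toy_hash(key: str) -> int:
    """Sum the character codes of the key (a deliberately weak hash)."""
    return sum(ord(c) for c in key)


def bucket_index(key: str) -> int:
    """Reduce the hash code to a valid index into the bucket array."""
    return toy_hash(key) % NUM_BUCKETS


print(bucket_index("ab"))  # index for "ab"
print(bucket_index("ba"))  # same index: the two keys collide
```

Because the toy hash ignores character order, any two anagrams collide, which is exactly the kind of imperfection a real table has to handle.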

![hashtable](https://upload.wikimedia.org/wikipedia/commons/thumb/7/7d/Hash_table_3_1_1_0_1_0_0_SP.svg/1920px-Hash_table_3_1_1_0_1_0_0_SP.svg.png)


## Hash functions

### Division method

$h(k) = k \bmod m$

where $k$ is the key and $m$ is the number of slots in the table.

This works well when `m` is a prime that is not too close to a power of 2 or 10. If `m` is a power of 2 or 10, the hash depends only on the low-order bits or digits of `k`.
However, finding a suitable prime is inconvenient, and division is slow.
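
To see why the choice of `m` matters, the following sketch hashes keys that are all multiples of 16 (so their low four bits are zero) with `m = 16` and with the prime `m = 17`. The helper `slots_used` and the key set are assumptions chosen for illustration.

```python
# Keys that are all multiples of 16 (their low 4 bits are zero).
keys = [16 * i for i in range(1, 9)]


def slots_used(keys, m):
    """Return the set of slots h(k) = k mod m that the keys occupy."""
    return {k % m for k in keys}


print(slots_used(keys, 16))       # power of 2: every key lands in slot 0
print(len(slots_used(keys, 17)))  # prime: the 8 keys occupy 8 different slots
```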

In [32]:
# number of slots in the table
m = 10


def division_method(value):
    # Reduce the key to an integer k, then take it modulo m.
    if isinstance(value, str):
        k = sum(ord(c) for c in value)
    elif isinstance(value, (int, float)):
        # Truncate floats so the slot index is always an integer.
        k = int(value)
    else:
        # Any other object: fall back to its identity.
        k = id(value)

    return k % m


print(division_method(2))
print(division_method(1234))
print(division_method("test"))

2
4
8


### Multiplication method

$h(k) = [(a \cdot k) \bmod 2^{w}] \gg (w - r)$

where $a$ is a random $w$-bit integer, $k$ is a $w$-bit key, and $m = 2^{r}$ is the number of slots.

This works well when $a$ is odd, $2^{w-1} < a < 2^{w}$, and $a$ is not too close to $2^{w-1}$ or $2^{w}$.
Multiplication and bit extraction are faster than division.
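
To make the bit extraction concrete, here is a small worked example. The parameters (`w = 8`, `r = 3`, and the particular `a` and `k`) are hypothetical values picked so that each step can be checked by hand; they are not taken from the notes.

```python
# Hypothetical small parameters so every step can be checked by hand.
w = 8            # word size in bits
r = 3            # the table has m = 2 ** r = 8 slots
a = 0b10110101   # 181: odd, and 2 ** (w - 1) < a < 2 ** w
k = 0b01101011   # 107: a key that fits in w bits

product = a * k                # 19367, up to 2 * w bits wide
low_word = product % 2 ** w    # keep the low w bits: 167 = 0b10100111
slot = low_word >> (w - r)     # keep the top r bits of that word: 0b101 = 5

print(product, low_word, slot)
```

The slot comes from the high bits of the low word, which mix contributions from many bits of `k`, rather than from the low bits alone as in the division method with a power-of-2 `m`.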

In [27]:
import random


# word size in bits; keys are assumed to fit in w bits
w = 64
# the table has m = 2 ** r slots
r = 4
m = 2 ** r
# Draw a once, so the same key always hashes to the same slot.
# a is odd and lies strictly between 2 ** (w - 1) and 2 ** w.
a = random.randrange(2 ** (w - 1) + 1, 2 ** w, 2)


def multiplication_method(value):
    if isinstance(value, str):
        k = sum(ord(c) for c in value)
    elif isinstance(value, (int, float)):
        k = int(value)
    else:
        k = id(value)

    # Keep the low w bits of a * k, then take the top r bits of that word.
    return ((a * k) % 2 ** w) >> (w - r)

print(multiplication_method(123))
print(multiplication_method("test"))
print(multiplication_method(10))


4
14
10


### Universal hashing

$h(k) = [(a \cdot k + b) \bmod p] \bmod m$

where $a$ and $b$ are chosen at random from $\{0, 1, 2, \dots, p - 1\}$ and $p$ is a large prime with $p > |U|$, $U$ being the universe of all keys.
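
The point of drawing $a$ and $b$ at random is that two fixed, distinct keys collide for only about a $1/m$ fraction of the possible choices. The sketch below checks this by brute force for small, hypothetical parameters (`p = 101`, `m = 10`, keys 3 and 7). It assumes $a$ ranges over $1, \dots, p - 1$, since $a = 0$ would ignore the key and send everything to the same slot.

```python
# Small hypothetical parameters so all (a, b) pairs can be enumerated.
p = 101   # a prime larger than every key used below
m = 10    # number of slots


def h(a, b, k):
    """Family member h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


def collision_fraction(k1, k2):
    """Fraction of (a, b) choices under which k1 and k2 share a slot."""
    # Assumption: a ranges over 1..p-1, because a = 0 ignores the key entirely.
    pairs = [(a, b) for a in range(1, p) for b in range(p)]
    collisions = sum(1 for a, b in pairs if h(a, b, k1) == h(a, b, k2))
    return collisions / len(pairs)


fraction = collision_fraction(3, 7)
print(fraction <= 1 / m)  # True: the keys collide for at most 1/m of the choices
```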

In [46]:
import random


# It is unclear whether a 30-digit prime is larger than the universe U of all keys, but it is sufficient for this example.
p = 128306691506178072038422964657
# number of slots in the table
m = 10
# Draw a and b once, so the same key always hashes to the same slot.
a = random.randint(0, p - 1)
b = random.randint(0, p - 1)


def universal_hashing(value):
    if isinstance(value, str):
        k = sum(ord(c) for c in value)
    elif isinstance(value, (int, float)):
        k = int(value)
    else:
        k = id(value)

    return ((a * k + b) % p) % m


print(universal_hashing(123))
print(universal_hashing("test"))
print(universal_hashing(10))

3
1
7


## Hash collisions

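For instance, in Python 2.7 the strings `'\0B'` and `'\0\0C'` both hash to 64, whereas Python 3.9 gives them different hashes: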

```shell
$ python
Python 3.9.5 (default, May 14 2021, 00:00:00) 
[GCC 10.3.1 20210422 (Red Hat 10.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> hash('\0B')
-2106417055349253107
>>> hash('\0\0C')
-8252262576489450206
>>>
$ python2.7
Python 2.7.18 (default, May 19 2021, 00:00:00) 
[GCC 10.3.1 20210422 (Red Hat 10.3.1-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> hash('\0B')
64
>>> hash('\0\0C')
64
```

The built-in function [`hash()`](https://docs.python.org/3/library/functions.html#hash) calls the object's [`__hash__`](https://docs.python.org/3/reference/datamodel.html#object.__hash__) method.
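
A minimal sketch of that relationship, using a hypothetical `Point` class (not part of the original notes): the class defines `__hash__` alongside `__eq__`, so equal points hash alike and can be used interchangeably as dictionary keys.

```python
class Point:
    """Hypothetical example class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes, so hash the same fields
        # that __eq__ compares.
        return hash((self.x, self.y))


p1 = Point(1, 2)
p2 = Point(1, 2)
print(hash(p1) == p1.__hash__())       # True: hash() delegates to __hash__
print(p1 == p2, hash(p1) == hash(p2))  # True True
print({p1: "found"}[p2])               # found
```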


Another collision example, using the division method with `m = 10`: the keys `2`, `"asdss"`, `"asd"`, and `12` all land in slot 2.

In [45]:
m = 10


def division_method(value):
    # Reduce the key to an integer k, then take it modulo m.
    if isinstance(value, str):
        k = sum(ord(c) for c in value)
    elif isinstance(value, (int, float)):
        # Truncate floats so the slot index is always an integer.
        k = int(value)
    else:
        # Any other object: fall back to its identity.
        k = id(value)

    return k % m


print(division_method(2))
print(division_method("asdss"))
print(division_method("asd"))
print(division_method(12))


2
2
2
2


## How do we deal with collisions?

### Chaining

![chaining](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5a/Hash_table_5_0_1_1_1_1_0_LL.svg/2560px-Hash_table_5_0_1_1_1_1_0_LL.svg.png)
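
The figure shows each bucket holding a list of entries, so colliding keys simply share a list. A minimal sketch of that idea follows; the class name `ChainedHashTable`, its methods, and the use of Python's built-in `hash()` with the division method are all assumptions made for illustration, not a definitive implementation.

```python
class ChainedHashTable:
    """Hypothetical minimal hash table that resolves collisions by chaining."""

    def __init__(self, m=10):
        self.m = m
        # One list (chain) per slot; colliding keys share a chain.
        self.buckets = [[] for _ in range(m)]

    def _index(self, key):
        # Assumption: Python's built-in hash() reduced with the division method.
        return hash(key) % self.m

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(m=10)
table.put(2, "two")
table.put(12, "twelve")  # in CPython, 2 and 12 share a slot when m = 10
print(table.get(2), table.get(12))  # two twelve
```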

TBC

### Open addressing

TBC


## Resources

- [Hashing with Chaining (video)](https://youtu.be/0M_kIqhwbFo)
- [Table Doubling, Karp-Rabin (video)](https://youtu.be/BRO7mVIFt08)
- [Open Addressing, Cryptographic Hashing (video)](https://youtu.be/rvdJDijO2Ro)
- [Lecture 8: Hashing I](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/MIT6_006F11_lec08.pdf)
- [Introduction to Algorithms MIT course 6.006](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/index.htm)
- [Hash Table (wiki)](https://en.wikipedia.org/wiki/Hash_table)