# Dictionaries and sets
A dictionary is a symbol table: a collection of objects in which each object is associated with a key. A dictionary can grow and shrink as entries are added and removed. A set in Python is a mutable, unordered collection of unique, hashable elements. Several operations can be performed on sets, for instance union and intersection. With a dictionary there is no need to search the collection for an element, as we would with a list; we just use its key.
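The difference between searching a list and using a key can be sketched as follows. The sample entries and the helper `find_in_list` are hypothetical names introduced only for this illustration.

```python
# Hypothetical sample data: (name, phone number) pairs.
entries = [
    ("John Doe", "555-555-5555"),
    ("Albert Einstein", "212-555-5555"),
]


def find_in_list(pairs, name):
    """Scan the list pair by pair until the name is found."""
    for key, value in pairs:
        if key == name:
            return value
    return None


# With a list we have to search for the element.
from_list = find_in_list(entries, "Albert Einstein")

# With a dictionary we go straight to the value through its key.
lookup = dict(entries)
from_dict = lookup["Albert Einstein"]

print(from_list == from_dict)
```

Both approaches return the same value; the list version has to walk through the pairs, while the dictionary version uses the key directly.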

## Hash function
Both dictionaries and sets are based on hash tables. The key is used to compute an index using a hash function. A hash function returns an integer given a key, e.g. a string, and is designed so that two different keys are unlikely to produce the same integer (a collision). The low collision probability of a hash function is a consequence of the size of the integer space used to implement it compared to the number of keys actually stored. Collisions cannot be ruled out entirely, and the hash function cannot avoid them by itself, because it must always return the same integer for the same key. It is the hash table that handles a collision, by comparing the keys for equality when two of them map to the same slot.
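To make the idea concrete, here is a minimal sketch of a hash table that resolves collisions by keeping a small list per slot (separate chaining). The class `ToyHashTable` and its methods are hypothetical names for this illustration; Python's own `dict` uses a different, more refined scheme.

```python
class ToyHashTable:
    """Illustrative hash table with separate chaining (not Python's dict)."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Reduce the hash value to a valid bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:      # same key: overwrite the value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (possibly a collision)

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashTable()
table.put(1, "one")
table.put(9, "nine")   # hash(9) % 8 == 1, so 1 and 9 share a bucket
print(table.get(1), table.get(9))
```

The keys 1 and 9 land in the same bucket, yet each lookup returns the right value because the table compares the keys themselves and not only their indexes.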

In [1]:
hash('John Doe')

-3355136106873060009

In [2]:
hash_value = hash('John Dowe')
hash_value

2938172440763455204

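String hashes are randomized in each interpreter session (unless `PYTHONHASHSEED` is set), so the values shown here will differ from run to run. The binary form of the hash value is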

In [3]:
bin(hash_value)

'0b10100011000110011111000100010010001100100011000101011011100100'

We can mask (i.e. filter) the bits, starting from the last. For example, if we want only the last three bits of the hash value, we apply the mask `0b111`; since those bits are 100, the result is 4.

In [4]:
mask = 0b111

In [5]:
hash_value & mask

4
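One way to read the mask, sketched here with the hash value printed above hard-coded so that the example is reproducible: keeping the last three bits is the same as taking the remainder modulo 8, which is how a table with eight slots could turn a hash value into an index. This is an illustration of the arithmetic, not a description of the interpreter's exact procedure.

```python
hash_value = 2938172440763455204   # value shown above for 'John Dowe'
mask = 0b111

masked = hash_value & mask
print(masked)                      # 4
print(format(masked, "03b"))       # 100

# Masking with 2**k - 1 is the same as taking the remainder modulo 2**k.
print(masked == hash_value % 8)    # True
```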

In [6]:
for c in 'John Doe':
    print(ord(c))

74
111
104
110
32
68
111
101


We implement a trivial hash function that returns the Unicode code point (as given by `ord`) of the first character of a string, for example the name of a city, masked to keep only the last three bits.

In [7]:
class City(str):
    def __hash__(self):
        return ord(self[0]) & 0b111

In [8]:
data = {
    City("Rome"): 'Italy',
    City("San Francisco"): 'USA',
    City("New York"): 'USA',
    City("Barcelona"): 'Spain',
}

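Both Barcelona and Rome share the same hash index. Three bits give eight possible values, which would be enough for four keys, but the code points of 'B' (66) and 'R' (82) have the same three low bits (010). They differ only in the fifth bit, so the mask would need at least five bits to tell the two cities apart.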

In [20]:
City('Barcelona').__hash__()

2

In [22]:
City('Rome').__hash__()

2

Python handles collisions of the values returned by the hash function by also comparing the keys for equality, so even if the keys 'Rome' and 'Barcelona' collide, the dictionary works fine all the same.

In [23]:
data[City('Rome')]

'Italy'

In [24]:
data[City('Barcelona')]

'Spain'
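The following sketch repeats the `City` class from above and checks the two conditions separately: the hash values are equal, but the keys are not, so the dictionary keeps both entries.

```python
class City(str):
    def __hash__(self):
        # Same trivial hash as above: last three bits of the first character.
        return ord(self[0]) & 0b111


rome, barcelona = City("Rome"), City("Barcelona")

print(hash(rome) == hash(barcelona))   # True: the hash values collide
print(rome == barcelona)               # False: the keys are still different

data = {rome: "Italy", barcelona: "Spain"}
print(len(data))                       # 2: both entries are kept
```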

## Dictionaries

In [1]:
phonebook = {
    "John Doe": "555-555-5555",
    "Albert Einstein" : "212-555-5555",
}

In [2]:
print(f"John Doe's phone number is {phonebook['John Doe']}")

John Doe's phone number is 555-555-5555
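A dictionary can be expanded and reduced after it has been created. The sketch below starts from the same phonebook; the entry for "Jane Roe" is a hypothetical addition used only for illustration.

```python
phonebook = {
    "John Doe": "555-555-5555",
    "Albert Einstein": "212-555-5555",
}

# Expand: assigning to a new key adds an entry.
phonebook["Jane Roe"] = "555-000-0000"   # hypothetical entry

# Reduce: del removes an entry by key.
del phonebook["John Doe"]

# Membership tests and lookups with a default avoid a KeyError.
print("John Doe" in phonebook)
print(phonebook.get("John Doe", "unknown"))
print(len(phonebook))
```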


## Sets

In [3]:
s1 = {1, 2, 3, 'a', 'b'}
s2 = {'Rome', 'Paris', 1, 4}

### Union of two sets

In [4]:
s3 = s1 | s2
s3

{1, 2, 3, 4, 'Paris', 'Rome', 'a', 'b'}
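Intersection, mentioned in the introduction, works in the same way as union. The sketch below uses the same two sets and also shows the difference operator and the removal of duplicates; the variable names `common` and `only_s1` are introduced here for illustration.

```python
s1 = {1, 2, 3, 'a', 'b'}
s2 = {'Rome', 'Paris', 1, 4}

common = s1 & s2          # intersection: elements in both sets
only_s1 = s1 - s2         # difference: elements of s1 not in s2
print(common == {1})
print(only_s1 == {2, 3, 'a', 'b'})

# Duplicates are dropped, because the elements of a set are unique.
print(len({1, 1, 2, 2, 3}))
```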

In [5]:
hash('luigi')

4459383324764636963

## References
* [What happens when you mess with hashing in python](https://www.asmeurer.com/blog/posts/what-happens-when-you-mess-with-hashing-in-python/)