## 8.5 Summary

### 8.5.1 Maps and dictionaries

A **map** is an unordered collection of **key–value pairs** with unique keys.
The same value may be associated with different keys.
Python's `dict` data type implements the map ADT.

Operation | English | Python
:-|:-|:-
new | let *m* be an empty map | `d = dict()`
size | │*m*│ | `len(d)`
membership | *key* in *m* | `key in d`
associate | let *m*(*key*) be *value* | `d[key] = value`
lookup | *m*(*key*) | `d[key]`
remove | remove *m*(*key*) | `d.pop(key)`
equality | *m1* = *m2* | `d1 == d2`
inequality | *m1* ≠ *m2* | `d1 != d2`

Looking up or removing a key that the dictionary doesn't contain raises a `KeyError`.
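The operations in the table can be tried on a small dictionary. The fruit names and quantities below are made-up example data.

```python
# Example data: a map from fruit names to quantities in stock
stock = dict()
stock["apple"] = 3       # associate: add a new pair
stock["pear"] = 3        # different keys may have the same value
stock["apple"] = 4       # associating an existing key replaces its value

print(len(stock))        # 2
print("pear" in stock)   # True
print(stock["apple"])    # 4
stock.pop("pear")        # remove the pair with key "pear"
print("pear" in stock)   # False

# Equality compares the pairs, regardless of the order they were added in
print({"a": 1, "b": 2} == {"b": 2, "a": 1})   # True

# Looking up a missing key raises KeyError
try:
    stock["pear"]
except KeyError:
    print("no pair with key 'pear'")
```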

**Dictionaries** are implemented with hash tables, described below.
Dictionaries take more memory than sequences with the same key–value pairs.
We assume all dictionary operations take constant time, except (in)equality,
which takes linear time in the size of the dictionary in the worst case.

Dictionaries are iterable:
`for key in a_dict` iterates over all keys in `a_dict` and
`for (key, value) in a_dict.items()` iterates over all key–value pairs.
While iterating over a dictionary, no key–value pair can be added or removed.
Iterating over a dictionary's keys or items takes linear time
in the size of the dictionary.
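Here is a sketch of both kinds of iteration, again with made-up data. Because pairs can't be removed while the loop runs, it shows one common workaround: first collect the keys in a list, then remove their pairs after the iteration has finished.

```python
stock = {"apple": 4, "pear": 3, "plum": 0}   # example data

for fruit in stock:                          # iterates over the keys
    print(fruit)

for (fruit, quantity) in stock.items():      # iterates over the pairs
    print(fruit, quantity)

# Removing pairs inside the loops above would raise
# RuntimeError (dictionary changed size during iteration).
# Instead, collect the keys first and remove their pairs afterwards.
sold_out = [fruit for fruit in stock if stock[fruit] == 0]
for fruit in sold_out:
    stock.pop(fruit)
print(stock)                                 # {'apple': 4, 'pear': 3}
```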

### 8.5.2 Lookup and hash tables

A map with natural numbers as keys can be implemented with a **lookup table**,
an array in which the indices are the keys and the items are the values.
If the keys are characters, Python's function `ord` can be used to obtain
their Unicode code point, which is a natural number.
Lookup tables are often used to store pre-computed values.
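As an illustration, here is a minimal sketch that counts letters with a lookup table of 26 counters. It assumes only the lowercase letters `a` to `z` matter and subtracts `ord("a")` from each code, so that the indices start at 0 rather than using the raw Unicode code points, which would need a much longer list. The function name `letter_counts` is hypothetical.

```python
def letter_counts(text: str) -> list[int]:
    """Count each lowercase letter a-z in text; other characters are ignored."""
    counts = [0] * 26                         # lookup table: one counter per letter
    for char in text:
        if "a" <= char <= "z":
            counts[ord(char) - ord("a")] += 1  # index 0 is 'a', 25 is 'z'
    return counts

counts = letter_counts("banana")
print(counts[ord("a") - ord("a")])   # 3
print(counts[ord("n") - ord("a")])   # 2
```

Accessing a counter is a single indexing operation, which is why lookup tables give constant-time access when the keys are small natural numbers.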

A **hash table with separate chaining** is a lookup table of sequences of key–value
pairs. Each sequence is called a **slot**, and the slot of each pair is computed
from its key with a **hash function**.

The **load factor** is the number of pairs (size of the map) divided by the number
of slots (size of the table), i.e. the mean number of pairs per slot.
The lower the load factor, the more memory is used, but the higher the chance
that each slot has at most one pair.
If two different keys are mapped to the same slot, a **collision** occurs.
With separate chaining, collisions are resolved by simply storing both
key–value pairs in that slot.
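To make slots, load factor and collisions concrete, the following sketch places a few integer keys into 8 slots. It assumes the slot of a key is `hash(key) % num_slots`, a common choice that this summary doesn't prescribe, and relies on CPython hashing small non-negative integers to themselves. The name `build_table` is hypothetical, and this is a teaching sketch, not how Python's `dict` is laid out internally.

```python
def build_table(pairs, num_slots):
    """Put (key, value) pairs into num_slots slots using separate chaining.

    Assumes the slot of a key is hash(key) % num_slots.
    """
    table = [[] for _ in range(num_slots)]    # each slot is a list of pairs
    for (key, value) in pairs:
        table[hash(key) % num_slots].append((key, value))
    return table

pairs = [(1, "one"), (9, "nine"), (4, "four")]
table = build_table(pairs, 8)
load_factor = len(pairs) / len(table)         # 3 pairs / 8 slots
print(load_factor)                            # 0.375
print(table[1])                               # [(1, 'one'), (9, 'nine')]: a collision
```

Keys 1 and 9 collide because both leave remainder 1 when divided by 8, so slot 1 holds two pairs even though the load factor is well below 1.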

The table of slots is implemented with a dynamic array, so that the number of
slots can increase with the size of the dictionary, keeping the load factor low.
When the table grows or shrinks, the slots of all pairs must be recomputed,
because a key's slot depends on the number of slots.

To search, add, replace or delete a value by key, we compute for the given key
the slot it must be in, and then do a linear search of the key in that slot.
With a low load factor, a hash function that reduces collisions,
and short keys, map operations take constant time,
which we assume is the usual situation. In the worst case
(all key–value pairs in the same slot), operations take linear time in the size of the map.
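The operations above can be sketched as a small class. This is a minimal teaching sketch under stated assumptions, not Python's own implementation: the class name `ChainedHashTable`, the initial 8 slots, the maximum load factor of 1 and the doubling policy are all illustrative choices, and for brevity the table only grows, never shrinks.

```python
class ChainedHashTable:
    """A minimal map using a hash table with separate chaining (teaching sketch)."""

    def __init__(self, num_slots=8, max_load=1.0):
        self.slots = [[] for _ in range(num_slots)]  # each slot: list of (key, value)
        self.size = 0                                # number of key-value pairs
        self.max_load = max_load                     # assumed resizing threshold

    def _slot(self, key):
        """Return the slot (a list) where key must be."""
        return self.slots[hash(key) % len(self.slots)]

    def __len__(self):
        return self.size

    def __contains__(self, key):
        return any(k == key for (k, _) in self._slot(key))

    def __getitem__(self, key):
        for (k, v) in self._slot(key):        # linear search in the slot
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        slot = self._slot(key)
        for index, (k, _) in enumerate(slot):
            if k == key:                      # replace an existing value
                slot[index] = (key, value)
                return
        slot.append((key, value))             # add a new pair
        self.size += 1
        if self.size / len(self.slots) > self.max_load:
            self._resize(2 * len(self.slots))

    def pop(self, key):
        slot = self._slot(key)
        for index, (k, v) in enumerate(slot):
            if k == key:
                del slot[index]
                self.size -= 1
                return v
        raise KeyError(key)

    def _resize(self, num_slots):
        """Move every pair to its recomputed slot in a table of num_slots slots."""
        old_slots = self.slots
        self.slots = [[] for _ in range(num_slots)]
        for slot in old_slots:
            for (key, value) in slot:
                self._slot(key).append((key, value))


squares = ChainedHashTable()
for n in range(20):
    squares[n] = n * n
print(len(squares), len(squares.slots))   # 20 32
print(squares[7])                         # 49
print(squares.pop(7), 7 in squares)       # 49 False
```

With 20 pairs the table doubled from 8 to 16 slots on the 9th insertion and to 32 slots on the 17th, so the load factor stays at most 1 and each slot's linear search stays short on average.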

In Python, lists and dictionaries aren't **hashable**, i.e. they can't be used as keys,
to avoid inadvertently changing a key after it has been inserted in a dictionary.
A tuple is hashable only if all its items are.
Python has a built-in hash function, named `hash`.
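Whether a value is hashable can be checked by calling `hash` on it, which raises a `TypeError` for unhashable values. The helper `is_hashable` below is hypothetical, written only for this example, and the distance data is made up.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds (hypothetical helper for this example)."""
    try:
        hash(value)
        return True
    except TypeError:
        return False

print(is_hashable((1, "a")))     # True: a tuple of hashable items
print(is_hashable((1, [2])))     # False: the tuple contains a list
print(is_hashable([1, 2]))       # False
print(is_hashable({"a": 1}))     # False

# Hashable tuples can be dictionary keys (example data)
distance = {("A", "B"): 5}
print(distance[("A", "B")])      # 5
```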

### 8.5.3 Sets

**Sets** are unordered collections without duplicate items.
Python's `set` class is implemented like a dictionary and thus
requires items to be hashable. Python's sets are iterable but not hashable.
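A short sketch of these properties: duplicates disappear, adding an existing item changes nothing, and unhashable values, including sets themselves, can't be items of a set.

```python
letters = set("banana")     # duplicates are discarded, leaving b, a and n
print(len(letters))         # 3
letters.add("n")            # already a member: the set doesn't change
print(len(letters))         # 3

try:
    letters.add(["x"])      # a list isn't hashable
except TypeError:
    print("set items must be hashable")

try:
    nested = {letters}      # a set isn't hashable either
except TypeError:
    print("a set can't be an item of another set")
```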

Operation | Maths | English | Python | Complexity (best/worst)
:-|:-|:-|:-|:-
new | let *s* be {}  |  let *s* be the empty set | `s = set()` | Θ(1)
size | │*s*│ |  |`len(s)` | Θ(1)
membership | $i \in s$  | *i* in *s* | `item in s` or `item not in s` | Θ(1)
add  |  |   add *i* to *s*  | `s.add(item)` | Θ(1)
remove   |   | remove *i* from *s*   | `s.discard(item)` | Θ(1)
union | $s1 \cup s2$  | union of *s1* and *s2* | `s1.union(s2)` | Θ(│`s1`│ + │`s2`│)
intersection | $s1 \cap s2$  |  intersection of *s1* and *s2* | `s1 & s2` or `s1.intersection(s2)` | Θ(min(│`s1`│, │`s2`│))
difference | *s1* − *s2* | | `s1 - s2` or `s1.difference(s2)` | Θ(│`s1`│)
(proper) subset  | *s1* $\subset$ *s2* and *s1* $\subseteq$ *s2* | | `s1 < s2` and `s1 <= s2` | Θ(1) / Θ(│`s1`│)
(proper) superset |  *s1* $\supset$ *s2* and *s1* $\supseteq$ *s2* | | `s1 > s2` and `s1 >= s2` | Θ(1) /  Θ(│`s2`│)
(in)equality | *s1* = *s2* and *s1* ≠ *s2* | | `s1 == s2` and `s1 != s2` | Θ(1) / Θ(│`s1`│)

Set union can also be written as `s1 | s2`.
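The operators in the table can be tried on two small sets of integers (example data). The comparisons are printed as booleans to avoid depending on the order in which a set's items are displayed.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}

union = s1 | s2                  # same as s1.union(s2)
intersection = s1 & s2           # same as s1.intersection(s2)
difference = s1 - s2             # same as s1.difference(s2)

print(union == {1, 2, 3, 4, 5})  # True
print(intersection == {3, 4})    # True
print(difference == {1, 2})      # True
print({3, 4} < s2, s2 < s2)      # True False: proper subset
print(s2 <= s2, s1 >= {1, 2})    # True True: subset and superset
print(s1 == {4, 3, 2, 1})        # True: order doesn't matter
```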

Two sets are **disjoint** if their intersection is empty.
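Python's sets have an `isdisjoint` method for this test, which gives the same answer as checking that the intersection is empty. The sets below are example data.

```python
evens = {0, 2, 4}
odds = {1, 3, 5}
print(evens.isdisjoint(odds))       # True
print(len(evens & odds) == 0)       # True: the same test via intersection
print(evens.isdisjoint({4, 5}))     # False: 4 is in both sets
```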

⟵ [Previous section](08_4_set.ipynb) | [Up](08-introduction.ipynb) | [Next section](../09_Practice-1/09-introduction.ipynb) ⟶