# Dictionaries

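A *dictionary* is a **mutable** **mapping** from **keys** to **values**. Each key maps to exactly one value, while several keys may share the same value. Historically, a dictionary came **without any guaranteed order** of the keys; since Python 3.7, it preserves the order in which the keys were inserted, but it still cannot be indexed by position.

An association between a key and its value is also called a **pair** or an **item**. Keys and values can each be of different types; however, every key must be an **immutable** object (for technical reasons) and **unique** within the dictionary.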

To create a dictionary, we can use the `{..: .., ...}` syntax and list all the pairs.

For example, let's map the English words for the integers $0$ through $2$ to their `int` equivalents. Note the style: Each pair is on its own indented line (4 spaces), a space is only on the right side of the colon, and even the last pair in the dictionary ends with a comma.

In [1]:
numbers = {
    "zero": 0,
    "one": 1,
    "two": 2,
}

As before, dictionaries are objects on their own.

In [2]:
id(numbers)

140458517723104

In [3]:
type(numbers)

dict

The [dict()](https://docs.python.org/3/library/functions.html#func-dict) built-in function gives us an alternative way to create a dictionary from a list of tuples.

In [4]:
soccer_players = [
    ("Neuer", "Goalkeeper"),
    ("Müller", "Striker"),
    ("Boateng", "Defender"),
]

In [5]:
dict(soccer_players)

{'Neuer': 'Goalkeeper', 'Müller': 'Striker', 'Boateng': 'Defender'}
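Besides a list of tuples, `dict()` also accepts keyword arguments or any iterable of pairs, for example, the result of `zip()`. The following is a minimal sketch; the variable names are chosen for illustration only.

```python
# Keyword arguments become string keys (only works for identifier-like keys).
numbers_from_kwargs = dict(zero=0, one=1, two=2)

# zip() pairs up two parallel sequences into (key, value) tuples.
words = ["zero", "one", "two"]
digits = [0, 1, 2]
numbers_from_zip = dict(zip(words, digits))

# Both approaches create dictionaries holding the same pairs.
print(numbers_from_kwargs == numbers_from_zip)  # True
```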

### Nesting

Often, dictionaries occur nested and combined with lists to provide more complex "objects".

For example, below is a dictionary that describes two distinct lists of people, programmers and mathematicians, where each person may have an arbitrary number of email addresses.

In [6]:
people = {
    "programmers": [
        {
            "name": "Alexander",
            "emails": ["alexander@whu.edu"],
        },
        {
            "name": "Guido",
            "emails": ["guido@python.org"],
        },
    ],
    "mathematicians": [
        {
            "name": "Gilbert",
            "emails": ["gilberg@mit.edu"],
        },
    ],
}

Outputting such a dictionary might not be practical. However, the Standard Library provides a [pprint](https://docs.python.org/3/library/pprint.html) module that comes with a [pprint()](https://docs.python.org/3/library/pprint.html#pprint.pprint) function that "pretty prints" dictionaries in a more readable way.

In [7]:
people

{'programmers': [{'name': 'Alexander', 'emails': ['alexander@whu.edu']},
  {'name': 'Guido', 'emails': ['guido@python.org']}],
 'mathematicians': [{'name': 'Gilbert', 'emails': ['gilberg@mit.edu']}]}

In [8]:
from pprint import pprint

In [9]:
pprint(people)

{'mathematicians': [{'emails': ['gilberg@mit.edu'], 'name': 'Gilbert'}],
 'programmers': [{'emails': ['alexander@whu.edu'], 'name': 'Alexander'},
                 {'emails': ['guido@python.org'], 'name': 'Guido'}]}
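To reach into such a nested structure, we chain look-ups: a key for each dictionary level and an integer index for each list level. The sketch below repeats the `people` example so that it runs on its own; the names `second_programmer` and `all_names` are only for illustration.

```python
people = {
    "programmers": [
        {"name": "Alexander", "emails": ["alexander@whu.edu"]},
        {"name": "Guido", "emails": ["guido@python.org"]},
    ],
    "mathematicians": [
        {"name": "Gilbert", "emails": ["gilberg@mit.edu"]},
    ],
}

# Dictionary key -> list index -> dictionary key -> list index.
second_programmer = people["programmers"][1]
first_email = second_programmer["emails"][0]
print(second_programmer["name"], first_email)  # Guido guido@python.org

# Collect the names of all people across both groups.
all_names = [person["name"] for group in people.values() for person in group]
```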


### Key Hashability

Using a mutable object as a key results in a `TypeError`. The reason for this is that dictionaries are implemented as highly optimized **[hash tables](https://en.wikipedia.org/wiki/Hash_table)** so that indexing into them (i.e., looking up keys) is a fast operation. To achieve this, each key is translated internally into an "integer address" based on the key object's value. If a key object could change its value, Python would not find it any more in the hash table as the integer address would change as well. In this context, an object that can serve as a key is called **hashable**, which is the case for Python's built-in immutable types, and the function that calculates the integer address is called a **hash function**.

Let's use a list object as a key.

In [10]:
more_numbers = {
    ["zero", "one"]: [0, 1],
}

TypeError: unhashable type: 'list'

If we really need keys composed of several (immutable) objects, we can use tuples.

In [11]:
more_numbers = {
    ("zero", "one"): [0, 1],
}
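We can check whether an object is hashable with the built-in `hash()` function, which raises a `TypeError` for unhashable objects. The helper `is_hashable()` below is a hypothetical name introduced only for this sketch. Note that a tuple is only hashable if all of its elements are hashable as well.

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(("zero", "one")))    # True: a tuple of strings
print(is_hashable(["zero", "one"]))    # False: lists are mutable
print(is_hashable(("zero", ["one"])))  # False: the tuple contains a list
```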

## Mapping $\neq$ Sequence, but Mapping $=$ Collection

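Pythonistas distinguish between two kinds of **container types**: Whereas a `list` or a `tuple` is an **(ordered) sequence** whose elements are accessed by their integer position, a `dict` has no such positional index and is summarized with similar types as an **(unordered) collection**. Before Python 3.7, the order of the keys in a `dict` was not guaranteed at all; since then, it is the order in which the keys were inserted.

While we can still use the [len()](https://docs.python.org/3/library/functions.html#len) function to obtain the number of pairs in the `dict`, we should be careful when iterating over it **if the order matters**: the only order a `dict` knows is the insertion order.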

In [12]:
len(numbers)

3

Iteration works, but relying on the order is fragile: it is the insertion order only on Python 3.7 and later and was arbitrary before. Note that we are only iterating over the *keys*.

In [13]:
for number in numbers:
    print(number)

zero
one
two


An alternative and more explicit way to iterate over the keys is to use the [keys()](https://docs.python.org/3/library/stdtypes.html#dict.keys) method on the `numbers` object. Again, the keys come in insertion order only on Python 3.7 and later.

In [14]:
for number in numbers.keys():
    print(number)

zero
one
two


To iterate over the values, we use the [values()](https://docs.python.org/3/library/stdtypes.html#dict.values) method.

In [15]:
for value in numbers.values():
    print(value)

0
1
2


To iterate over key-value pairs, we use the [items()](https://docs.python.org/3/library/stdtypes.html#dict.items) method. It returns an iterable view of tuples, where the first element is the key and the second element is the value. We can use tuple assignment in a `for` loop to unpack them.

In [16]:
for number, value in numbers.items():
    print(f"{number} -> {value}")

zero -> 0
one -> 1
two -> 2
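The objects returned by `keys()`, `values()`, and `items()` are so-called *views* and not lists: they reflect later changes to the dictionary. The sketch below uses its own small `numbers` dictionary so that it does not interfere with the cells above; wrapping a view in `list()` creates an independent snapshot.

```python
numbers = {"zero": 0, "one": 1, "two": 2}

keys_view = numbers.keys()
items_view = numbers.items()
snapshot = list(numbers.items())  # an independent copy of the current pairs

# Adding a pair afterwards is visible through the views ...
numbers["three"] = 3
print(len(keys_view))              # 4
print(("three", 3) in items_view)  # True

# ... but not in the snapshot taken before.
print(len(snapshot))               # 3
```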


The built-in function [sorted()](https://docs.python.org/3/library/functions.html#sorted) can be used to iterate over a dictionary in a sorted and, thus, deterministic order. However, this is generally not the order in which the `dict` was defined.

In [17]:
for number in sorted(numbers.keys()):
    print(number)

one
two
zero
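To sort by the values instead of the keys, we can pass the items to `sorted()` together with a `key` function that picks the second element of each pair. This is a minimal sketch with a fresh `numbers` dictionary; the name `by_value_desc` is only illustrative.

```python
numbers = {"zero": 0, "one": 1, "two": 2}

# Sort the (key, value) pairs by value, largest first.
by_value_desc = sorted(numbers.items(), key=lambda pair: pair[1], reverse=True)

for number, value in by_value_desc:
    print(f"{number} -> {value}")
```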


## Membership Testing

The boolean `in` operator checks if a given (immutable) object is a key in the dictionary. Because of the [hash table](https://en.wikipedia.org/wiki/Hash_table) implementation, this is a fast operation. As with lists, Python uses the `==` operator behind the scenes. Note that there is no fast way to look up values. To do that, we would have to iterate over all items and check each value on its own, which is the same linear search as is built into `list` type objects.

In [18]:
"one" in numbers

True

In [19]:
"ten" in numbers

False

## "Indexing" / Look-up

The indexing operator `[...]` also works with dictionaries and returns the value object for a given key. In this context, we speak of **looking up** a value.

In [20]:
numbers["two"]

2

If a key is not in the dictionary, Python raises a `KeyError`.

In [21]:
numbers["three"]

KeyError: 'three'

This can be mitigated by using the [get()](https://docs.python.org/3/library/stdtypes.html#dict.get) method with a *default* value that is returned if the key is missing.

In [22]:
numbers.get("three", "n/a")

'n/a'
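If we leave out the *default* argument, `get()` returns `None` for a missing key. Neither variant inserts anything into the dictionary. The following sketch uses its own `numbers` dictionary to illustrate this.

```python
numbers = {"zero": 0, "one": 1, "two": 2}

missing = numbers.get("three")       # no default given, so None is returned
existing = numbers.get("two", "n/a")  # the key exists, so the default is ignored

print(missing)   # None
print(existing)  # 2

# The look-ups above did not add the missing key.
print("three" in numbers)  # False
```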

As mentioned in the previous section, searching for a value instead of a key, a so-called **reverse look-up**, must be implemented manually by looping over all items (= "linear search").

In [23]:
search_term = 2

In [24]:
for value in numbers.values():
    if value == search_term:
        print("found")

found


A more Pythonic way of doing this uses the `in` operator; however, behind the scenes this is still a `for` loop.

In [25]:
search_term in numbers.values()

True
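If many reverse look-ups are needed, one option is to build a second, inverted dictionary once and then use fast key look-ups on it. This is only a sketch under the assumption that all values are hashable and unique; otherwise, later pairs silently overwrite earlier ones. The name `words_by_value` is hypothetical.

```python
numbers = {"zero": 0, "one": 1, "two": 2}

# Swap keys and values with a dictionary comprehension.
words_by_value = {value: key for key, value in numbers.items()}

print(words_by_value[2])             # two
print(words_by_value.get(3, "n/a"))  # n/a
```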

While dictionaries implement indexing with the `[...]` operator, the more general concept of slicing is *not* available.

## Mutability

As with lists, we can change parts of the `dict` object.

For example, let's map the English words to their German counterparts. This also changes the types of the values from `int` to `str`.

In [26]:
numbers["zero"] = "null"
numbers["one"] = "eins"
numbers["two"] = "zwei"

In [27]:
numbers

{'zero': 'null', 'one': 'eins', 'two': 'zwei'}

Let's add two more numbers.

In [28]:
numbers["three"] = "drei"
numbers["four"] = "vier"

In [29]:
numbers

{'zero': 'null', 'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier'}

Note that none of these operations change the memory location / identity of the `numbers` object.

In [30]:
id(numbers)

140458517723104

The `del` statement removes individual items.

In [31]:
del numbers["zero"]

In [32]:
numbers

{'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier'}

## More Dictionary Methods

Dictionaries are used internally by Python to implement many of its functionalities. Further, using dictionaries and lists often suffices to implement prototypes of many algorithms.

It is worthwhile to know about the built-in [methods](https://docs.python.org/3/library/stdtypes.html#dict) dictionaries come with.

[setdefault()](https://docs.python.org/3/library/stdtypes.html#dict.setdefault) makes it possible to set a *default* value that is used when a key look-up fails. This works a bit differently than the above [get()](https://docs.python.org/3/library/stdtypes.html#dict.get) method in that it also *inserts* the default value into the `dict` before it returns it.

In [33]:
numbers.setdefault("zero", "null")

'null'

In [34]:
numbers

{'one': 'eins', 'two': 'zwei', 'three': 'drei', 'four': 'vier', 'zero': 'null'}
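A common use of `setdefault()` is grouping: because the method returns the value stored under the key, we can insert an empty list on first sight of a key and append to it in the same line. The word list below is made up for this sketch.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

words_by_letter = {}

for word in words:
    # Insert an empty list for a new first letter, then append to it.
    words_by_letter.setdefault(word[0], []).append(word)

print(words_by_letter["a"])  # ['apple', 'avocado']
```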

[update()](https://docs.python.org/3/library/stdtypes.html#dict.update) takes the items of another dictionary and inserts them. If keys collide, they are overwritten.

In [35]:
four_to_six_in_spanish = {
    "four": "cuatro",
    "five": "cinco",
    "six": "seis",
}

In [36]:
numbers.update(four_to_six_in_spanish)

In [37]:
numbers

{'one': 'eins',
 'two': 'zwei',
 'three': 'drei',
 'four': 'cuatro',
 'zero': 'null',
 'five': 'cinco',
 'six': 'seis'}
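Since `update()` changes the dictionary in place, we may prefer to create a new, merged dictionary and leave the originals untouched. One way to do that is to unpack both dictionaries with `**` inside a new `{...}` literal; as with `update()`, pairs on the right overwrite pairs on the left. The two small dictionaries below are only for illustration.

```python
german = {"one": "eins", "four": "vier"}
spanish = {"four": "cuatro", "five": "cinco"}

# Build a new dictionary; on key collisions the right-most value wins.
merged = {**german, **spanish}

print(merged["four"])  # cuatro
print(german["four"])  # vier (the original is unchanged)
```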

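[pop()](https://docs.python.org/3/library/stdtypes.html#dict.pop) returns the value of the given key and removes that key from the dictionary. It takes an optional *default* argument that is returned if the key is missing; without it, a missing key raises a `KeyError`. [popitem()](https://docs.python.org/3/library/stdtypes.html#dict.popitem) behaves similarly but takes no key: it removes and returns a whole pair as a tuple.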

In [38]:
numbers.pop("zero")

'null'

In [39]:
numbers.pop("zero", "n/a")

'n/a'
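The sketch below contrasts the two methods on a separate dictionary. It assumes Python 3.7 or later, where `popitem()` removes the pair that was inserted last.

```python
german = {"one": "eins", "two": "zwei", "three": "drei"}

# popitem() needs no argument and returns a (key, value) tuple.
last_pair = german.popitem()
print(last_pair)  # ('three', 'drei') on Python 3.7+

# pop() without a default raises a KeyError for a missing key.
try:
    german.pop("ten")
except KeyError:
    key_error_raised = True
else:
    key_error_raised = False

print(key_error_raised)  # True
```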

[copy()](https://docs.python.org/3/library/stdtypes.html#dict.copy) creates and returns a *shallow* copy of the dictionary.

In [40]:
numbers_copy = numbers.copy()

[clear()](https://docs.python.org/3/library/stdtypes.html#dict.clear) discards all items but keeps the dictionary object alive in memory.

In [41]:
numbers.clear()

We see that `numbers_copy` is indeed a real copy of `numbers`. However, the same caveats apply as with shallow and deep copies of lists. In particular, when used as function arguments, dictionaries can lead to "weird" behavior in an application as they are mutable just like lists. See the section "Lists as Function Arguments" in the notebook on lists.

In [42]:
numbers

{}

In [43]:
numbers_copy

{'one': 'eins',
 'two': 'zwei',
 'three': 'drei',
 'four': 'cuatro',
 'five': 'cinco',
 'six': 'seis'}
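The difference between a shallow and a deep copy only shows once the values themselves are mutable. In the sketch below, the shallow copy shares the inner list with the original, whereas `copy.deepcopy()` from the Standard Library duplicates it. The dictionary `original` is made up for this example.

```python
import copy

original = {"primes": [2, 3, 5]}

shallow = original.copy()        # new dict, but the same inner list object
deep = copy.deepcopy(original)   # new dict with its own inner list

# Mutate the inner list through the original.
original["primes"].append(7)

print(shallow["primes"])  # [2, 3, 5, 7]
print(deep["primes"])     # [2, 3, 5]
```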

## Fibonacci revisited: Dictionaries for Memoization

The recursive implementation of the [Fibonacci Numbers](https://en.wikipedia.org/wiki/Fibonacci_number) took very long to complete for large numbers as the number of function calls grew exponentially.

The graph below visualizes what the problem is and also suggests a solution: Instead of re-calculating the return value of the `fibonacci` function for the same argument over and over again, it makes sense to **cache** the result and re-use it. This concept is also called **[memoization](https://en.wikipedia.org/wiki/Memoization)**.

<img src="static/fibonacci_call_graph.png" width="50%" align="left">

Below is yet another implementation of `fibonacci` that uses a **globally** defined dictionary `memo` to store intermediate results and look them up.

In [44]:
memo = {
    0: 0,
    1: 1,
}

def fibonacci(i):
    """Calculate the ith Fibonacci number.

    Args:
        i (int): index of the Fibonacci number to calculate.

    Returns:
        int
    """
    # Look up the cached value ...
    if i in memo:
        return memo[i]
    # ... or calculate and store it.
    recurse = fibonacci(i - 1) + fibonacci(i - 2)
    memo[i] = recurse
    return recurse

The Fibonacci number at index $12$ (i.e., the 13th one when counting from index $0$) is still $144$.

In [45]:
fibonacci(12)

144
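The Standard Library offers the same caching idea ready-made: the `lru_cache()` decorator in the `functools` module stores the return values of a function in a dictionary-like cache keyed by the arguments. The sketch below is an alternative to the global `memo` dictionary and not the implementation used in this notebook; the name `fibonacci_cached` is chosen to keep the two apart.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache may grow without bound
def fibonacci_cached(i):
    """Calculate the ith Fibonacci number with automatic memoization."""
    if i < 2:  # base cases: fibonacci(0) == 0 and fibonacci(1) == 1
        return i
    return fibonacci_cached(i - 1) + fibonacci_cached(i - 2)


print(fibonacci_cached(12))  # 144
```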

Now, `fibonacci` is fast even for large numbers.

In [46]:
%%timeit -n 1
fibonacci(100)

The slowest run took 307.70 times longer than the fastest. This could mean that an intermediate result is being cached.
18.4 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [47]:
%%timeit -n 1
fibonacci(1000)

The slowest run took 3151.51 times longer than the fastest. This could mean that an intermediate result is being cached.
171 µs ± 417 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


However, the implementation with the `for` loop is still more efficient regarding memory usage. Also, the recursive version exceeds Python's maximum recursion depth for very large arguments.

In [48]:
%%timeit -n 1
fibonacci(10000)

RecursionError: maximum recursion depth exceeded
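For comparison, a loop-based version needs neither a cache nor nested function calls and, therefore, is not affected by the recursion limit. The following is a minimal sketch; the name `fibonacci_iterative` is hypothetical and the code may differ in detail from the earlier `for` loop implementation referred to above.

```python
def fibonacci_iterative(i):
    """Calculate the ith Fibonacci number with a loop."""
    previous, current = 0, 1
    # Shift the pair of neighboring Fibonacci numbers i times.
    for _ in range(i):
        previous, current = current, previous + current
    return previous


print(fibonacci_iterative(12))  # 144

# Large arguments work as only two integers are kept in memory at a time.
big = fibonacci_iterative(10000)
```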

## Special Dictionary Types

The `collections` module in the Standard Library provides special types of dictionaries that behave a bit differently. See the [documentation](https://docs.python.org/3/library/collections.html) for a full overview of the module.

### Ordered Dictionaries

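[OrderedDict](https://docs.python.org/3/library/collections.html#collections.OrderedDict) can be used to create a dictionary just as all the ones above with the added feature that the dictionary remembers the order in which the keys were added. Since Python 3.7, a plain `dict` remembers the insertion order as well; an `OrderedDict` additionally treats the order as part of its value, for example, when comparing for equality.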
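The sketch below illustrates two order-related features of `OrderedDict`: comparing two `OrderedDict` objects takes the order into account, and the `move_to_end()` method re-positions an existing key. The variables `first` and `second` are made up for this example.

```python
from collections import OrderedDict

first = OrderedDict([("zero", 0), ("one", 1)])
second = OrderedDict([("one", 1), ("zero", 0)])

# Same pairs, different order: unequal as OrderedDicts, equal as plain dicts.
print(first == second)              # False
print(dict(first) == dict(second))  # True

# Move the "zero" key to the end so that both orders agree.
first.move_to_end("zero")
print(list(first))                  # ['one', 'zero']
print(first == second)              # True
```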

### Dictionaries with Default Values

[defaultdict](https://docs.python.org/3/library/collections.html#collections.defaultdict) allows us to define a factory function that creates a default value if we try to index into a dictionary and the looked up key does not yet exist.

Let's say we have a list of tuples that indicate when a national team player scored a goal in a soccer game and we want to group the goals by player and/or country.

In [49]:
goals = [
    ("Germany", "Müller", 11), ("Germany", "Klose", 23),
    ("Germany", "Kroos", 24), ("Germany", "Kroos", 26),
    ("Germany", "Khedira", 29), ("Germany", "Schürrle", 69),
    ("Germany", "Schürrle", 79), ("Brazil", "Oscar", 90),
]

Using a plain dictionary, one has to tediously check if a player has already scored a goal before. If this is not the case, we need to create a new list object first.

In [50]:
goals_by_player = {}

for _, player, minute in goals:
    if player not in goals_by_player:
        goals_by_player[player] = [minute]
    else:
        goals_by_player[player].append(minute)

goals_by_player

{'Müller': [11],
 'Klose': [23],
 'Kroos': [24, 26],
 'Khedira': [29],
 'Schürrle': [69, 79],
 'Oscar': [90]}

Using a `defaultdict`, the same can be written more concisely. Note that the [list()](https://docs.python.org/3/library/functions.html#func-list) built-in function is an alternative way to create an empty list.

In [51]:
from collections import defaultdict

In [52]:
goals_by_player = defaultdict(list)

for _, player, minute in goals:
    goals_by_player[player].append(minute)

goals_by_player

defaultdict(list,
            {'Müller': [11],
             'Klose': [23],
             'Kroos': [24, 26],
             'Khedira': [29],
             'Schürrle': [69, 79],
             'Oscar': [90]})

If we want this code to produce a plain dictionary object, we can use the built-in function [dict()](https://docs.python.org/3/library/functions.html#func-dict) for conversion.

In [53]:
goals_by_player = dict(goals_by_player)

goals_by_player

{'Müller': [11],
 'Klose': [23],
 'Kroos': [24, 26],
 'Khedira': [29],
 'Schürrle': [69, 79],
 'Oscar': [90]}

We could even get creative and use a nested factory function to group on both the country and the player level.

In [54]:
goals_by_country_and_player = defaultdict(lambda: defaultdict(list))

for country, player, minute in goals:
    goals_by_country_and_player[country][player].append(minute)

goals_by_country_and_player

defaultdict(<function __main__.<lambda>()>,
            {'Germany': defaultdict(list,
                         {'Müller': [11],
                          'Klose': [23],
                          'Kroos': [24, 26],
                          'Khedira': [29],
                          'Schürrle': [69, 79]}),
             'Brazil': defaultdict(list, {'Oscar': [90]})})
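The factory function does not have to be `list`. With `int`, which returns `0` when called without arguments, a `defaultdict` becomes a simple tally. The sketch below repeats the `goals` data so that it runs on its own and also shows a caveat: merely indexing into a `defaultdict` inserts the missing key.

```python
from collections import defaultdict

goals = [
    ("Germany", "Müller", 11), ("Germany", "Klose", 23),
    ("Germany", "Kroos", 24), ("Germany", "Kroos", 26),
    ("Germany", "Khedira", 29), ("Germany", "Schürrle", 69),
    ("Germany", "Schürrle", 79), ("Brazil", "Oscar", 90),
]

goals_per_country = defaultdict(int)  # missing keys start at int() == 0

for country, _, _ in goals:
    goals_per_country[country] += 1

print(dict(goals_per_country))  # {'Germany': 7, 'Brazil': 1}

# Caveat: a look-up with [...] creates the key as a side effect.
goals_per_country["France"]
print("France" in goals_per_country)  # True
```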

### Counters

A common task is to count the number of occurrences of elements in a list-like object.

[Counter](https://docs.python.org/3/library/collections.html#collections.Counter) provides an easy-to-use interface: it can be called with any iterable object and returns a dictionary-like object that maps each element to the number of times it occurred.

In [55]:
[x[1] for x in goals]  # this is just to show the argument for two cells below

['Müller',
 'Klose',
 'Kroos',
 'Kroos',
 'Khedira',
 'Schürrle',
 'Schürrle',
 'Oscar']

In [56]:
from collections import Counter

In [57]:
scorers = Counter(x[1] for x in goals)

In [58]:
scorers

Counter({'Müller': 1,
         'Klose': 1,
         'Kroos': 2,
         'Khedira': 1,
         'Schürrle': 2,
         'Oscar': 1})

In [59]:
scorers["Müller"]

1

Looking up a key that is not in the `Counter` returns $0$ instead of raising a `KeyError`.

In [60]:
scorers["Neuer"]

0

`Counter` objects have a [most_common()](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) method that returns a list of the elements and their counts, ordered by the number of occurrences in descending order.

In [61]:
scorers.most_common(2)

[('Kroos', 2), ('Schürrle', 2)]
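`Counter` objects can also be combined: the `+` operator adds up the counts of two `Counter` objects key by key and returns a new one. In the sketch below, the scorers from above are split by half-time for illustration; the names `first_half` and `second_half` are made up.

```python
from collections import Counter

first_half = Counter(["Müller", "Klose", "Kroos", "Kroos", "Khedira"])
second_half = Counter(["Schürrle", "Schürrle", "Oscar"])

# Add the counts key by key into a new Counter.
total = first_half + second_half

print(total["Kroos"])       # 2
print(total["Schürrle"])    # 2
print(sum(total.values()))  # 8 goals overall
```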