# 01_03: Dictionaries and sets

In [1]:
import math
import collections
import dataclasses
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as pp  

The other super-important data structure in Python is the _dictionary_ (or `dict`). While lists and tuples let us retrieve values by their numerical index, dictionaries associate _values_ with unique _keys_.

`dict`s are written with curly braces, separating _items_ with commas; each item is given as _key_, colon, _value_. For instance, here are the capitals of some of my favorite countries.

In [2]:
capitals = {'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

In [3]:
capitals

{'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

The length of a dictionary is obtained with `len`, and the empty dictionary is denoted by empty braces.

In [13]:
len(capitals), len({})

(3, 0)

Just as we do with lists, values are accessed with a bracket notation, but instead of a number, we're going to use a key.

In [4]:
capitals['Italy']

'Rome'

The same notation can be used to add items to a dictionary.

In [5]:
capitals['Spain'] = 'Madrid'

In [6]:
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid'}

Accessing a non-existent item raises a `KeyError`. To avoid that, we can check beforehand whether a key exists in a dictionary using the `in` operator.

In [7]:
capitals['Germany']

KeyError: 'Germany'

In [8]:
'Germany' in capitals, 'Italy' in capitals

(False, True)
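As an alternative to checking with `in` first, the built-in `dict.get` method returns a fallback instead of raising an error. Here is a minimal sketch; the `capitals_demo` name and the `'unknown'` fallback are only illustrative choices, not part of the notebook above.

```python
# A small stand-in dictionary (hypothetical name, so `capitals` is left untouched)
capitals_demo = {'France': 'Paris', 'Italy': 'Rome'}

# get() returns the value when the key exists...
found = capitals_demo.get('Italy')

# ...and None, or a fallback of our choosing, when it does not
missing = capitals_demo.get('Germany')
fallback = capitals_demo.get('Germany', 'unknown')

print(found, missing, fallback)
```

Note that `get` only reads: the missing key is not added to the dictionary.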

To combine two dictionaries, we can unpack them within the same braces using the double-star unpacking operator, which works in a similar way to the star operator for lists. 

In [26]:
morecapitals = {'Germany': 'Berlin', 'United Kingdom': 'London'}

In [25]:
{**capitals, **morecapitals}

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin',
 'United Kingdom': 'London'}

If a key appears in both dicts, the value from the last one wins. We can also _update_ a dict in place using another dict:

In [30]:
capitals.update(morecapitals)

In [31]:
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin',
 'United Kingdom': 'London'}
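To see what happens when the same key appears on both sides, here is a small sketch with made-up dictionaries (`old` and `new` are hypothetical names). Both unpacking and `update` should keep the value that comes last.

```python
# Two hypothetical dicts that disagree about 'Italy'
old = {'Italy': 'Turin', 'France': 'Paris'}
new = {'Italy': 'Rome'}

# Unpacking builds a new dict; the later value for 'Italy' replaces the earlier one
merged = {**old, **new}

# update() changes a dict in place, with the same rule for repeated keys
updated = dict(old)   # work on a copy so `old` stays as it was
updated.update(new)

print(merged)
print(updated)
```

The repeated key keeps the position of its first appearance; only its value changes.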

As with lists, we can delete items with `del`, this time by key.

In [64]:
del capitals['United Kingdom']

In [65]:
capitals

{'United States': 'Washington, DC',
 'France': 'Paris',
 'Italy': 'Rome',
 'Spain': 'Madrid',
 'Germany': 'Berlin'}

In fact, keys do not need to be strings: any Python object that is _hashable_ may be used as a key. _Hashable_ means that Python can compute from it an integer (its _hash_) that never changes during the object's lifetime. That's true for strings, numbers, and tuples of hashable items, but not for lists, which are mutable.

In [32]:
birthdays = {(7,15): 'Michele', (3,14): 'Albert'}

In [33]:
birthdays[(7,15)]

'Michele'

We can see the integer that Python computes for a key with `hash` (for strings, the exact value changes from one Python session to the next):

In [36]:
hash('Italy'), hash((7,15))

(2991295390957851405, -2471369287409462312)
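For contrast, the sketch below tries to use a list as a key, which should fail with a `TypeError`, and then converts the list to a tuple. The variable names are illustrative only.

```python
message = None

try:
    # Lists are mutable, hence unhashable, so this dict cannot be built
    bad = {[7, 15]: 'Michele'}
except TypeError as err:
    message = str(err)

print('TypeError:', message)

# Converting the list to a tuple gives us a valid key
good = {tuple([7, 15]): 'Michele'}
print(good[(7, 15)])
```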

Looping over a dictionary is very similar to looping over a list. However, there are three different kinds of loops you may want to write. The most straightforward syntax loops over the keys:

In [70]:
for country in capitals:
    print(country)

United States
France
Italy
Spain
Germany


You can also write this more explicitly:

In [71]:
for country in capitals.keys():
    print(country)

United States
France
Italy
Spain
Germany


In [72]:
capitals.keys()

dict_keys(['United States', 'France', 'Italy', 'Spain', 'Germany'])

In [73]:
list(capitals.keys())

['United States', 'France', 'Italy', 'Spain', 'Germany']

The other two dict loops are over the values:

In [74]:
for capital in capitals.values():
    print(capital)

Washington, DC
Paris
Rome
Madrid
Berlin


Or over keys and values together, using tuple unpacking:

In [75]:
for country, capital in capitals.items():
    print(country, capital)

United States Washington, DC
France Paris
Italy Rome
Spain Madrid
Germany Berlin


Note that `keys()`, `values()`, and `items()` do not return lists, as they would have in Python 2, but special iterable _view_ objects.

In [38]:
capitals.keys()

dict_keys(['United States', 'France', 'Italy', 'Spain', 'Germany'])

However, you can convert them to lists.

In [39]:
list(capitals.keys())

['United States', 'France', 'Italy', 'Spain', 'Germany']
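One practical difference between these objects and a list is that they follow later changes to the dictionary, while a list made from them is a frozen copy. A short sketch, using a throwaway `demo` dict that is not part of the notebook:

```python
demo = {'France': 'Paris'}

keys_view = demo.keys()            # tied to `demo`
keys_snapshot = list(demo.keys())  # independent copy taken right now

demo['Italy'] = 'Rome'             # modify the dict afterwards

print(keys_view)       # includes the key added after the view was created
print(keys_snapshot)   # still holds only the original key
```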

Beginning in Python 3.6 for the CPython interpreter, and in Python 3.7 for the language specification itself, dicts preserve insertion order. This means that when we loop over keys, values, or items, we get them in the order in which we originally inserted them.

This was not the case in previous versions of Python, and in fact the standard library defined a special object (`OrderedDict` in the `collections` module) to preserve the order. That is no longer necessary just to keep keys in order.
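The following sketch, assuming Python 3.7 or later, illustrates what "insertion order" means in two corner cases: reassigning an existing key and re-inserting a deleted one. The `ordered` dict is a made-up example.

```python
ordered = {}
ordered['Spain'] = 'Madrid'
ordered['France'] = 'Paris'
ordered['Italy'] = 'Rome'

# Reassigning an existing key changes the value but not the key's position
ordered['Spain'] = 'Madrid (updated)'

# Deleting a key and inserting it again moves it to the end
del ordered['France']
ordered['France'] = 'Paris'

key_order = list(ordered)
print(key_order)
```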

There is, however, another specialized dictionary data structure that is very useful: `defaultdict`, which you set up to return a default value (instead of raising an error) when an item has not been set.

In [55]:
capitals_default = collections.defaultdict(lambda: "I don't know!")

In [54]:
capitals_default.update(capitals)

In [53]:
capitals_default['Canada']

"I don't know!"

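A common use of `defaultdict` is grouping items without first checking whether a key exists. The sketch below uses invented data and names (`pairs`, `groups`); it also shows that a bracket lookup of a missing key stores the default in the dictionary, whereas `in` and `get` do not.

```python
import collections

# Hypothetical (country, continent) pairs to group by continent
pairs = [('France', 'Europe'), ('Italy', 'Europe'), ('Japan', 'Asia')]

groups = collections.defaultdict(list)   # missing keys start as an empty list
for country, continent in pairs:
    groups[continent].append(country)

print(dict(groups))

# `in` and get() do not trigger the default...
before_lookup = 'Africa' in groups
via_get = groups.get('Africa')
still_missing = 'Africa' not in groups

# ...but a bracket lookup creates the entry as a side effect
empty = groups['Africa']
after_lookup = 'Africa' in groups

print(before_lookup, via_get, after_lookup)
```
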
Dicts are very important in Python, since they underlie many aspects of the language itself. For instance, the methods and attributes of classes are stored internally in dicts.
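We can peek at these internal dicts with the built-in `vars` function. The `Country` class below is a hypothetical example written only for this illustration.

```python
class Country:
    """Hypothetical class used only to inspect its internal dicts."""

    def __init__(self, name, capital):
        self.name = name
        self.capital = capital

    def describe(self):
        return f'{self.capital} is the capital of {self.name}'


italy = Country('Italy', 'Rome')

# Instance attributes live in a regular dict (also available as italy.__dict__)
instance_attrs = vars(italy)
print(instance_attrs)

# Methods are stored on the class, in a read-only dict-like mapping
has_method = 'describe' in vars(Country)
print(has_method)
```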

The interface by which we access dict values using keys is also adopted in the Python data-analysis library `pandas`, which we will examine later in this course. So it's good to become familiar with it.

Last, I want to mention sets. You can think of them as _bags_ of unique items, with no particular order. The items can be of any hashable type (such as strings, numbers, and tuples), and duplicates are dropped.

In [57]:
continents = {'America', 'Europe', 'Asia', 'Oceania', 'Africa', 'Africa'}

In [58]:
continents

{'Africa', 'America', 'Asia', 'Europe', 'Oceania'}

As you see, `'Africa'` appears only once.

You can check whether an item exists in a set, add an item, remove it, or loop over the set. But there's no indexing, since a set has no defined order.

In [89]:
'Africa' in continents

True

In [95]:
continents.add('Antarctica')

In [97]:
continents.remove('Antarctica')

In [98]:
for c in continents:
    print(c)

Oceania
America
Africa
Europe
Asia
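Because sets hold unique items, they offer a quick way to remove duplicates from a list, and they support the usual set algebra. This is a brief sketch with invented data (`visited`, `all_continents`); `sorted` is used only to print in a predictable order.

```python
# A hypothetical list with repeated entries
visited = ['Europe', 'Asia', 'Europe', 'Africa', 'Asia']

# Converting to a set drops the duplicates
unique_visited = set(visited)

all_continents = {'America', 'Europe', 'Asia', 'Oceania', 'Africa'}

# Difference: items in the first set that are not in the second
not_visited = all_continents - unique_visited

# Union and intersection combine two sets
either = unique_visited | {'Oceania'}
both = unique_visited & {'Asia', 'Oceania'}

print(sorted(unique_visited))
print(sorted(not_visited))
print(sorted(either), sorted(both))
```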
