# 01_04: Comprehensions

In [1]:
import math
import collections
import dataclasses
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as pp

When we work with data in Python, there are many cases when we want to iterate over a `list` or `dict`, perform an operation on every element, and then collect all the results in a new `list` or `dict`.

We could certainly do that with a `for` loop. For instance, we can compute the first ten squares, starting with an empty list and adding elements to the list in the body of the loop. Note that to get the integers from 1 to 10 we need to write a `range` with boundaries 1 and 11.

In [2]:
squares = []
for i in range(1, 11):
    squares.append(i**2)

In [3]:
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

We can do better---we can be more _Pythonic_! Python offers a great feature, _comprehensions_, that lets us write shorter, more easily readable code.

In essence, the comprehension will be a compressed version of the `for` loop. Let's go through the steps:

* We want a list, so we have brackets.
* In front, we have the body of the loop---the square.
* In the back, we write the looping construct: `for` + variable + iterable (here a `range`).

The result is the same, but we wrote it in a single, readable line. A list comprehension also usually runs a little faster than the equivalent loop with `append`.

In [5]:
squares = [i**2 for i in range(1, 11)]

In [6]:
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

We can also _filter_ the list of elements that we are creating, adding an `if` clause. For instance, we might want to collect only the squares that are divisible by 4.

In [16]:
squares_by_four = [i**2 for i in range(1, 11) if i**2 % 4 == 0]

In [17]:
squares_by_four

[4, 16, 36, 64, 100]

Again, quick and readable. In Python 3, comprehensions largely replace the `map` and `filter` built-in functions, which are important in so-called _functional_ languages but are used less often in idiomatic Python. Both built-ins remain available, and in Python 3 they return lazy iterators rather than lists.
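For comparison, here is a minimal sketch of the same filtered list written with `map` and `filter`. The names `squares_iter` and `squares_by_four_functional` are only illustrative.

```python
# The comprehension from above, repeated so this cell is self-contained.
squares_by_four = [i**2 for i in range(1, 11) if i**2 % 4 == 0]

# The same result with map and filter. In Python 3 both return lazy
# iterators, so we wrap the final result in list() to materialize it.
squares_iter = map(lambda i: i**2, range(1, 11))
squares_by_four_functional = list(filter(lambda s: s % 4 == 0, squares_iter))

print(squares_by_four_functional == squares_by_four)  # True
```

The comprehension states the expression, the loop, and the condition in one place, which is why many Python programmers find it easier to read.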

The syntax for _dictionary comprehensions_ is also rather intuitive. For instance, let's create a dictionary that will get us the square of an integer from the integer itself.

* It's a dictionary, so we need braces.
* In front, we have the key-colon-value pair.
* In the back, the looping construct.
* We can also add an `if` clause if we want. I don't need one here.

In [18]:
squares_dict = {i: i**2 for i in range(1, 11)}

In [19]:
squares_dict

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}
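If we did want a filter, the `if` clause goes at the end, just as in a list comprehension. As a small sketch, here is a dictionary restricted to even integers (the name `even_squares_dict` is just for illustration):

```python
# Keep only the even integers as keys; the value is still the square.
even_squares_dict = {i: i**2 for i in range(1, 11) if i % 2 == 0}

print(even_squares_dict)  # {2: 4, 4: 16, 6: 36, 8: 64, 10: 100}
```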

Dict comprehensions are sometimes used to _transpose_ (that is, invert) an existing dict, swapping keys and values. Let's go back to our capitals, which we wrote as a dictionary indexed by country:

In [20]:
capitals_by_country = {'United States': 'Washington, DC', 'France': 'Paris', 'Italy': 'Rome'}

Say that instead we want the countries indexed by capital.

In [22]:
countries_by_capital = {capital: country for country, capital in capitals_by_country.items()}

In [23]:
countries_by_capital

{'Washington, DC': 'United States', 'Paris': 'France', 'Rome': 'Italy'}
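One caveat: swapping keys and values works cleanly only when the values are unique (and hashable). The sketch below uses a made-up dictionary, `continent_by_country`, to show what happens otherwise, along with one possible way to keep every entry.

```python
# Hypothetical data: two keys share the same value.
continent_by_country = {'France': 'Europe', 'Italy': 'Europe', 'Japan': 'Asia'}

# Naive inversion: a later entry overwrites an earlier one with the same value.
country_by_continent = {continent: country
                        for country, continent in continent_by_country.items()}
print(country_by_continent)  # {'Europe': 'Italy', 'Asia': 'Japan'}

# One way to keep everything: collect the countries of each continent in a list.
countries_by_continent = {}
for country, continent in continent_by_country.items():
    countries_by_continent.setdefault(continent, []).append(country)
print(countries_by_continent)  # {'Europe': ['France', 'Italy'], 'Asia': ['Japan']}
```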

Sometimes you see a "naked" comprehension, with the body and loop but without the brackets. That's actually a _generator expression_. It is normally written in parentheses, which can be dropped when it is the only argument of a function call. It's useful to generate a sequence and immediately _consume_ the elements one by one without ever storing them in a list or dict.

For instance, to take the sum of the first ten squares, we may write the interior part of our earlier list comprehension, and feed it directly to the Python built-in `sum`.

In [30]:
sum(i**2 for i in range(1, 11))

385

Doing this saves memory, because the squares are produced one at a time and the full list is never built. That is important if you deal with large amounts of data.
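To make the memory difference concrete, here is a rough sketch comparing a list comprehension with a generator expression over a million integers. The size `n` and the variable names are arbitrary choices, and `sys.getsizeof` measures only the container object, so treat the comparison as indicative rather than exact.

```python
import sys

n = 1_000_000

squares_list = [i**2 for i in range(1, n + 1)]  # all values stored at once
squares_gen = (i**2 for i in range(1, n + 1))   # values produced on demand

# The list object grows with n; the generator object stays small.
# (getsizeof reports the container only, not the integers it refers to.)
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# Both give the same sum, but the generator is exhausted after one pass.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True
print(sum(squares_gen))                   # 0, nothing left to consume
```

The last line shows the trade-off: a generator expression can be consumed only once, so build a list if you need the values again.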

As it turns out, the built-in `range`, which we used earlier to demonstrate loops, does something very similar. It never builds a list; it computes each value on demand and hands it to the loop.
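A short sketch of this behavior, using an arbitrarily large `range` (the name `big_range` is only illustrative). Note one difference from a generator expression: a `range` is a lazy _sequence_, so it supports `len`, indexing, and repeated iteration.

```python
import sys

big_range = range(1, 10**9)

# Length, indexing, and membership tests work without building a huge list.
print(len(big_range))    # 999999999
print(big_range[10])     # 11
print(500 in big_range)  # True

# The range object stores only start, stop, and step, so its size does not
# depend on how many values it spans.
print(sys.getsizeof(big_range) == sys.getsizeof(range(1, 11)))  # True

# Unlike a generator, a range can be iterated more than once.
small = range(1, 4)
print(list(small), list(small))  # [1, 2, 3] [1, 2, 3]
```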

Comprehensions are incredibly useful when we manipulate lists, dicts, and sets of data.
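As a quick recap, the forms discussed so far can be sketched side by side. The set comprehension was not shown above; it uses braces like a dict comprehension but with single values rather than key-value pairs. All names here are illustrative.

```python
numbers = range(1, 6)

# List comprehension: square brackets.
squares_list = [n**2 for n in numbers]               # [1, 4, 9, 16, 25]

# Optional filter: an if clause at the end.
odd_squares = [n**2 for n in numbers if n % 2 == 1]  # [1, 9, 25]

# Dict comprehension: braces and key: value pairs.
squares_dict = {n: n**2 for n in numbers}            # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Set comprehension: braces and single values; duplicates collapse.
remainders = {n % 3 for n in numbers}                # {0, 1, 2}

# Generator expression: no brackets needed inside a function call.
total = sum(n**2 for n in numbers)                   # 55
```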

If you don't currently use them, I'm sure that once you try them out you will quickly become addicted and start doing all sorts of acrobatics, such as nested comprehensions to make a list of lists (which you can think of as a matrix):

In [45]:
matrix = [[i*j for i in range(1, 4)] for j in range(1, 4)]

In [47]:
matrix

[[1, 2, 3], [2, 4, 6], [3, 6, 9]]

Or a single comprehension with nested loops to flatten the matrix into a list:

In [50]:
[element for row in matrix for element in row]

[1, 2, 3, 2, 4, 6, 3, 6, 9]
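The order of the `for` clauses in a flattening comprehension can be confusing at first. A helpful way to read it is that the clauses appear in the same order as the equivalent nested loops, outermost first. A minimal sketch (the name `flat` is illustrative):

```python
matrix = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Equivalent nested loops: the outer loop comes first, the inner loop second,
# exactly as the for clauses are ordered in the comprehension.
flat = []
for row in matrix:
    for element in row:
        flat.append(element)

print(flat)  # [1, 2, 3, 2, 4, 6, 3, 6, 9]
print(flat == [element for row in matrix for element in row])  # True
```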