# Comprehensions and Generators



## Table of Contents
- [1. Introduction](#1.-Introduction)
- [2. Collections of Data](#2.-Collections-of-Data)
- [3. Iterables and Iterators](#3.-Iterables-and-Iterators)
- [4. Comprehensions](#4.-Comprehensions)
- [5. Selective Inclusion in a Comprehension](#5.-Selective-Inclusion-in-a-Comprehension)
- [6. Nested Comprehensions](#6.-Nested-Comprehensions)
- [7. Set and Dictionary Comprehensions](#7.-Set-and-Dictionary-Comprehensions)
- [8. Comprehensions Evaluation](#8.-Comprehensions-Evaluation)
- [9. Generators](#9.-Generators)
- [10. Factories](#10.-Factories)
- [11. Summary](#summary)

## 1. Introduction

What is a **comprehension**?

<img src="assets/Webster_comprehension.png" alt="Meaning of Comprehension" width="500"/>

<div style="text-align:center">
    <span style="font-size:0.9em; font-weight: bold;"><b>Meaning of Comprehension.</b></span>
</div>

**Comprehensions** and **Generators** in Python are both powerful tools for managing and transforming data efficiently. 

**Comprehensions** in Python are concise and powerful constructs that allow you to create, for example, **lists**, **dictionaries**, and **sets** by performing operations on elements from an iterable while optionally applying filters, all in a single line of code. 
They enhance code readability and efficiency compared to traditional `for` loops, making them a fundamental tool for data manipulation and transformation tasks. 
Comprehensions are widely used in Python for their ability to simplify common programming tasks involving iteration and transformation of data.

**Generators**, on the other hand, offer memory-efficient iterable sequences, generating values as needed rather than storing them in memory, making them ideal for handling large datasets. 

In general, comprehensions and generators provide you with essential techniques for concise, readable, and efficient data manipulation and iteration.

## 2. Collections of Data

Comprehensions and generators are ways of transforming
collections of data.
So, let us first look at such data collections.

A collection of data can be **stored** in a **tuple**, **list**, **set**, or **dictionary**. The difference between using one or the other will depend on the properties you need to represent your data. In the following table, we briefly present the properties of each of these collections.

| Collection | Mutable | Ordered | Allows duplicates | Indexed | Representation |
|------------|:-------:|:-------:|:----------------:|:-------:|:--------------:|
| `tuple` | ✖︎ | ✔︎ | ✔︎ | ✔︎ | `(...)` |
| `list`  | ✔︎ | ✔︎ | ✔︎ | ✔︎ | `[...]` |
| `set`   | ✔︎ | ✖︎ | ✖︎ | ✖︎ | `{...}` |
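| `dict`  | ✔︎ | ✔︎ † | ✖︎ ‡ | ✔︎ * | `{key: val, ...}` |

<sup>*</sup> You access key-value pairs via a key.

<sup>†</sup> Since Python 3.7, a dictionary preserves the insertion order of its keys.

<sup>‡</sup> Keys are unique; values may repeat.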
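Some of these properties can be checked directly. The following is a minimal sketch; the names `point`, `tags`, and `ages` and their values are made up for illustration.

```python
# Hypothetical example values, chosen only to illustrate the table.
point = (1, 2, 2)                 # tuple: ordered, allows duplicates
tags = {'a', 'b', 'a'}            # set: the duplicate 'a' is dropped
ages = {'ann': 31, 'bob': 27}     # dict: values are accessed via keys

# A tuple is immutable: item assignment raises a TypeError.
try:
    point[0] = 10
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

print(tuple_is_mutable)   # False
print(len(tags))          # 2
print(ages['ann'])        # 31
```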

Additionally, we have already seen a number of built-in operations on lists, such as
* **indexing**, to access an item at a given index:
  `s[i]`, where `0 <= i < len(s)`
* **slicing**, to extract a subsequence of items:
  `s[a:b]` extracts the list of `s[i]` with `a <= i < b`,
  or `s[a:b:c]` where `c` is the step size
* **concatenation**: `s + t`
* **length**: `len(s)`
* **aggregation**: `sum(s)`, `min(s)`, `max(s)`
* **sorting**: `sorted(s)`
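
As a quick reminder, the operations listed here behave as follows on a small list. The values in `s` and `t` are arbitrary and only serve as an illustration.

```python
s = [3, 1, 4, 1, 5]
t = [9, 2]

print(s[0])                    # indexing: 3
print(s[1:4])                  # slicing: [1, 4, 1]
print(s[0:5:2])                # slicing with step size 2: [3, 4, 5]
print(s + t)                   # concatenation: [3, 1, 4, 1, 5, 9, 2]
print(len(s))                  # length: 5
print(sum(s), min(s), max(s))  # aggregation: 14 1 5
print(sorted(s))               # sorting: [1, 1, 3, 4, 5]
```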

In [1]:
from typing import Dict, List, Set, Tuple

## 3. Iterables and Iterators

What collections of data have in common is that they are _iterable_.
An **iterable** is any collection that you can step through one by one.
To traverse an *iterable* you require an **iterator**.
An *iterator* is an object that traverses your iterable and returns one element at a time. 
To obtain an iterator for a collection, we use the built-in `iter()` function.
Afterwards, each element can be accessed by repeatedly calling the built-in `next()` function on the iterator.

<img src="assets/iterable-iterator.png" alt="Iterables and Iterators" width="500"/>

<div style="text-align:center">
    <span style="font-size:0.9em; font-weight: bold;">Iterables and iterators in Python.</span>
</div>

In [2]:
# Create a list of integers, which is an iterable
iterable: List[int] = [1, 2, 3, 4, 5]
print(type(iterable))
iterable

<class 'list'>


[1, 2, 3, 4, 5]

In [3]:
from typing import Iterator 

# Create an iterator from the "iterable" list
iterator: Iterator[int] = iter(iterable)
type(iterator)

list_iterator

In [4]:
# Traverse the iterable via the iterator
num_one = next(iterator)
print(f'First iteration: {num_one}')

num_two = next(iterator)
print(f'Second iteration: {num_two}')

num_three = next(iterator)
print(f'Third iteration: {num_three}')

num_four = next(iterator)
print(f'Fourth iteration: {num_four}')

num_five = next(iterator)
print(f'Fifth iteration: {num_five}')

First iteration: 1
Second iteration: 2
Third iteration: 3
Fourth iteration: 4
Fifth iteration: 5


If the collection has been fully traversed and you call the `next()` function one more time, Python raises a `StopIteration` exception.

In [5]:
next(iterator)

StopIteration: 

Let us now use the `while` loop to iterate over our iterable in a smarter way.

In [6]:
iterable: List[int] = [1, 2, 3, 4, 5]
iterator: Iterator = iter(iterable)
i: int = 0

while i < len(iterable):
    val: int = next(iterator)
    print(val)
    i += 1

1
2
3
4
5


When using a `for` loop, you do not need to call `iter()` and `next()` yourself. The previous procedure is done under the hood by Python!
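
Roughly speaking, a `for` loop obtains an iterator with `iter()`, calls `next()` repeatedly, and stops as soon as `StopIteration` is raised. The sketch below spells this out by hand; the list `seen` is only there to record which values were visited.

```python
iterable = [1, 2, 3, 4, 5]
iterator = iter(iterable)
seen = []  # illustration only: records the visited values

while True:
    try:
        val = next(iterator)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    seen.append(val)
    print(val)
```

Unlike the `while` loop with a counter, this version does not need to know the length of the collection in advance.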

Furthermore, one *iterable* can be iterated over or traversed multiple times. Each new traversal involves a new *iterator*.
In fact, multiple iterators can be active concurrently on the same collection, as is the case with nested loops.
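
To see that iterators over the same collection are independent of each other, here is a small sketch with two hand-made iterators. The names `outer` and `inner` are chosen for illustration.

```python
letters = ('a', 'b', 'c')
outer = iter(letters)
inner = iter(letters)

first_outer = next(outer)    # 'a'
second_outer = next(outer)   # 'b'
first_inner = next(inner)    # 'a': `inner` keeps its own position

print(first_outer, second_outer, first_inner)  # a b a
```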

In [7]:
letters: Tuple = ('a', 'b', 'c')

for i in letters:
    for j in letters:
        print('{}{}'.format(i, j), end=" ")
    print()

aa ab ac 
ba bb bc 
ca cb cc 


<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Iterate over the list of words and add the words that start with 'a' to a new list.
</div>

In [8]:
words = ['age', 'bee', 'ask', 'cut', 'clean', 'zoo', 'add']
# Remove this line and add your code here

## 4. Comprehensions

A **list comprehension** is an _expression_ that constructs a list
based on some *iterable*.
The items taken from the iterable can be _transformed_ in an expression
before being collected in the list.

Suppose you want to create a list of 10 numbers where each value is the square of its index, so 0, 1, 4, 9, 16, etc.

The following program fragment does this
with a `for` statement and an auxiliary variable (`aux`).

In [9]:
aux: List[int] = []

for n in range(10):
    aux.append(n * n)

aux

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In the next example, the iterable is `range(10)`, and we collect the squares of the numbers in that range:

In [10]:
[n * n for n in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The list comprehension above defines the same list, but the comprehension is more compact, and
does not need an explicit list variable.

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Remember our first exercise in this notebook? Use a comprehension to iterate over the list of words and add the words that start with 'a' to a new list.
</div>

In [11]:
words: List[str] = ['age', 'bee', 'ask', 'cut', 'clean', 'zoo', 'add']
# Remove this line and add your code here

## 5. Selective Inclusion in a Comprehension

You can also _selectively_ include items in the constructed list,
by using an `if` clause.

The behaviour is similar to the *filter* operation we saw previously.

For instance, the numbers less than 10
that have a remainder of more than 2 when divided by 7:

In [12]:
from typing import List

def remainders() -> List[int]:
    """
    Filters all numbers not having a remainder of more than 2 when divided by 7.
    :returns: list of numbers having a remainder of more than 2 when divided by 7.
    """
    result_nums: List[int] = []
    for n in range(10):
        if n % 7 > 2:
            result_nums.append(n)
    return result_nums


print(remainders())

[3, 4, 5, 6]


Now with a comprehension:

In [13]:
[n for n in range(10) if n % 7 > 2]

[3, 4, 5, 6]

And here is the list of _squares_ of those numbers
that have a remainder of more than 2 when divided by 7.

In [14]:
[n * n for n in range(10) if n % 7 > 2]

[9, 16, 25, 36]

Note that the condition is applied to the items
_before_ they are transformed.

This is like putting an `if` statement inside the `for` loop.

In [15]:
aux: List[int] = []
    
for n in range(10):
    if n % 7 > 2:
        aux.append(n * n)

aux

[9, 16, 25, 36]

In summary, a **list comprehension**
```python
[E(v) for v in iterable if C(v)]
```
  
* takes items from an _**iterable**_:
  ```python
  for v in iterable
  ```
* _**selects**_ items based on a condition:
  ```python
  if C(v)
  ```
* _**transforms**_ the selected items using an expression:
  ```python
  E(v)
  ```
* and _**collects**_ the expression results in a list:
  ```python
  [...]
  ```

Note the order: first select, then transform
(even though you write the transformation first, and the selection last).
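
The pattern can be made concrete with named functions in the roles of `C(v)` and `E(v)`. This is a minimal sketch; the temperature readings and the helper names `is_above_freezing` and `to_fahrenheit` are hypothetical.

```python
from typing import List

# Hypothetical temperature readings in degrees Celsius.
readings_celsius: List[float] = [-5.0, 0.0, 12.5, 30.0]


def is_above_freezing(celsius: float) -> bool:
    """Plays the role of the condition C(v)."""
    return celsius > 0


def to_fahrenheit(celsius: float) -> float:
    """Plays the role of the expression E(v)."""
    return celsius * 9 / 5 + 32


warm_fahrenheit = [to_fahrenheit(c) for c in readings_celsius if is_above_freezing(c)]
print(warm_fahrenheit)  # [54.5, 86.0]
```

The condition is tested on the Celsius value, that is, before the conversion to Fahrenheit takes place.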

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Use a comprehension to create a list where all words in the given list are transformed into capital letters.
</div>

In [16]:
words: List[str] = ['age', 'bee', 'ask', 'cut', 'clean', 'zoo', 'add']
# Remove this line and add your code here

## 6. Nested Comprehensions

If you need to select _after_ transforming the items,
then you can use a _nested_ comprehension
(but do read the **warning** after the following example).

In [17]:
[sq for sq in [n * n for n in range(10)] if sq % 7 > 2]

[4, 25, 81]

Observe that the result differs from that of `[n * n for n in range(10) if n % 7 > 2]`. Why?

<div class="alert alert-info">
    <b>Evaluation of nested comprehensions</b><br>
    In a <b>nested comprehension</b>,
    the <i>inner</i> comprehension is <b>completely evaluated and stored</b>
    before it is used in the <i>outer</i> comprehension.
</div>

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    We have a list of lists. The internal lists have numbers as elements. We would like to flatten the outer list; i.e., instead of having a list of lists of numbers, we just want to have a list of numbers. Use a nested comprehension to achieve this goal.
</div>

In [18]:
lst = [[1, 2, 3], [4, 5, 6, 7], [8, 9, 10], [11], [12, 13, 14, 15]]
# Remove this line and add your code here

### Alternative Approaches for Selection _After_ Transformation

Perform the transformation (in this case: squaring)
also in the `if` clause:

In [19]:
[n * n for n in range(10) if n * n % 7 > 2]

[4, 25, 81]

A disadvantage of this approach is that every transformation is done *twice*,
which can be costly if the transformation is expensive.
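
One way to avoid the double computation, assuming Python 3.8 or later, is an assignment expression (`:=`) in the `if` clause. This is a sketch of an alternative and not part of the approach shown above; the names `sq` and `selected` are chosen for illustration.

```python
# Requires Python 3.8+ (assignment expressions).
# The square is computed once in the `if` clause, bound to `sq`,
# and then reused as the resulting item.
selected = [sq for n in range(10) if (sq := n * n) % 7 > 2]
print(selected)  # [4, 25, 81]
```

This works because the `if` clause is evaluated before the result expression at the front of the comprehension.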

## 7. Set and Dictionary Comprehensions

* [Comprehensions also work for _sets_ and _dictionaries_](https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets-and-dictionaries).
* Comprehensions can involve multiple `for` and `if` clauses
  (but always start with an expression and a `for` clause).
  
Here are some examples.
The _set_ of composite (non-prime) numbers up to 100:

In [20]:
composites: Set[int] = {i * j for i in range(2, 10 + 1) for j in range(2, 100 // i + 1)}
composites

{4,
 6,
 8,
 9,
 10,
 12,
 14,
 15,
 16,
 18,
 20,
 21,
 22,
 24,
 25,
 26,
 27,
 28,
 30,
 32,
 33,
 34,
 35,
 36,
 38,
 39,
 40,
 42,
 44,
 45,
 46,
 48,
 49,
 50,
 51,
 52,
 54,
 55,
 56,
 57,
 58,
 60,
 62,
 63,
 64,
 65,
 66,
 68,
 69,
 70,
 72,
 74,
 75,
 76,
 77,
 78,
 80,
 81,
 82,
 84,
 85,
 86,
 87,
 88,
 90,
 91,
 92,
 93,
 94,
 95,
 96,
 98,
 99,
 100}

Let us decompose this rather complex comprehension by looking at what each step does, both as a `list` and as a `set`.

In [21]:
[i for i in range(2, 10 + 1)]

[2, 3, 4, 5, 6, 7, 8, 9, 10]

In [22]:
{i for i in range(2, 10 + 1)}

{2, 3, 4, 5, 6, 7, 8, 9, 10}

Let us now add the second `for` clause. First we look at the bounds of the inner `range`, and then at the result when a list is returned.

In [23]:
for i in range(2, 10 + 1):
    print(100 // i)

50
33
25
20
16
14
12
11
10


In [24]:
for i in range(2, 10 + 1):
    print(100 // i + 1)

51
34
26
21
17
15
13
12
11


In [25]:
[j for i in range(2, 10 + 1) for j in range(2, 100 // i + 1)]

[2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10]

In [26]:
{j for i in range(2, 10 + 1) for j in range(2, 100 // i + 1)}

{2,
 3,
 4,
 5,
 6,
 7,
 8,
 9,
 10,
 11,
 12,
 13,
 14,
 15,
 16,
 17,
 18,
 19,
 20,
 21,
 22,
 23,
 24,
 25,
 26,
 27,
 28,
 29,
 30,
 31,
 32,
 33,
 34,
 35,
 36,
 37,
 38,
 39,
 40,
 41,
 42,
 43,
 44,
 45,
 46,
 47,
 48,
 49,
 50}

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    We would like to use a set comprehension to create a set with all words of length 4 that are part of a given list. Remember that sets do not contain duplicates, which is why we want to use a set as the data type.
</div>

In [27]:
lst: List[str] = ['funny', 'that', 'little', 'yoke', 'sunny', 'side', 'up', 'in', 'the', 'span',
       'of', 'the', 'lake']
# Remove this line and add your code here

A _dictionary_ that associates the numbers 13 through 32
with their squares:

In [28]:
squares: Dict[int, int] = {n: n * n for n in range(13, 32 + 1)}
squares

{13: 169,
 14: 196,
 15: 225,
 16: 256,
 17: 289,
 18: 324,
 19: 361,
 20: 400,
 21: 441,
 22: 484,
 23: 529,
 24: 576,
 25: 625,
 26: 676,
 27: 729,
 28: 784,
 29: 841,
 30: 900,
 31: 961,
 32: 1024}
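
A dictionary comprehension can also contain an `if` clause, and it can iterate over the items of another dictionary. The following sketch illustrates both; the names `odd_squares` and `roots` are made up for this example.

```python
from typing import Dict

# Squares of the odd numbers from 1 through 9 only.
odd_squares: Dict[int, int] = {n: n * n for n in range(1, 9 + 1) if n % 2 == 1}
print(odd_squares)  # {1: 1, 3: 9, 5: 25, 7: 49, 9: 81}

# Swap keys and values, so that a square can be mapped back to its root.
roots: Dict[int, int] = {square: n for n, square in odd_squares.items()}
print(roots[49])  # 7
```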

A _list_ of powers
where the base is a prime less than 10 and
exponents run from 2 through 5
(it uses `composites` defined earlier);
note the clause order `for if for`:

In [29]:
[base ** exp for base in range(2, 10 + 1) if base not in composites for exp in range(2, 5 + 1)]

[4, 8, 16, 32, 9, 27, 81, 243, 25, 125, 625, 3125, 49, 343, 2401, 16807]

To understand this comprehension, it may again be helpful to look at the individual results of each clause.
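
Another way to read a comprehension with several clauses is to write the clauses as nested statements, in the same order from left to right. The sketch below does this for the list of powers; `composites` is repeated so that the code runs on its own, and the name `powers` is chosen for illustration.

```python
from typing import List, Set

composites: Set[int] = {i * j for i in range(2, 10 + 1) for j in range(2, 100 // i + 1)}

powers: List[int] = []
for base in range(2, 10 + 1):          # first `for` clause
    if base not in composites:         # `if` clause: keep the primes only
        for exp in range(2, 5 + 1):    # second `for` clause
            powers.append(base ** exp)

print(powers)
# [4, 8, 16, 32, 9, 27, 81, 243, 25, 125, 625, 3125, 49, 343, 2401, 16807]
```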

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Use a comprehension to create a dictionary that stores the words within the list <i>lst</i> as keys and their length as values.
</div>

In [30]:
lst: List[str] = ['funny', 'that', 'little', 'yoke', 'sunny', 'side', 'up', 'in', 'the', 'span',
       'of', 'the', 'lake']
# Remove this line and add your code here

## 8. Comprehensions Evaluation

Comprehensions are completely evaluated before further use.

What if you do not need the whole list constructed by the comprehension?
Suppose, for instance, you only want the first element.
Will the whole list still be computed?

To illustrate what we mean,
we first show a version with explicit
**`for`**, **`if`**, and **`break`** statements,
that avoids computing all values in the list.
It stops when the first item has been computed.

In [31]:
aux: List[int] = []

for n in range(10):
    if n % 7 > 2:
        aux.append(n * n)
        break

print(aux)
aux[0]

[9]


9

Let's try this with our comprehension,
by extracting the first item (at index 0).

In [32]:
[n * n for n in range(20) if n % 7 > 2][0]

9

Actually, we now cannot see whether the whole list got computed.
So, let us introduce a function `trail` with a **side effect**
to make this visible.
A *side effect* is any observable effect of a function beyond returning a value, such as changing a mutable variable, performing IO (Input/Output), or raising an exception.
As a function, `trail` does nothing to its argument:
it returns _n_ unchanged.
But it also prints a dot, and this is a (visible) side effect.

In [33]:
from typing import Any

def trail(n: Any) -> Any:
    """ 
    Print a dot and return n.
    :param n: any value
    :returns: value given as parameter without any change
    """
    print('.', end='')
    return n

Let us try again.

In [34]:
[trail(n * n) for n in range(20) if n % 7 > 2][0]

...........

9

Apparently, the whole list got computed first.

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Use a comprehension to generate a list of the numbers from 1 to 10 squared, but only compute the squares until you find a square that is greater than 50. Use a function with a side effect, such as printing a dot for each computation. Finally, print the generated list of squares. <br>
<b>Hint:</b>   You may use the <code>trail</code> function
</div>

In [35]:
# Remove this line and add your code here

## 9. Generators

We can fix this by using a generator.
* A [**generator expression**](https://docs.python.org/3/reference/expressions.html#generator-expressions) is like a comprehension:
it _selectively_ takes items from an _iterable_ and
_transforms_ them.
* But a generator does not construct a list to store all items.
* A generator is **lazy**,
in the sense that a generator will not be computed completely in advance.
(In fact, a generator can be endless/infinite.)
* Instead,
a generator is only evaluated to the extent that its values are needed.
The evaluation of a generator is **demand driven**.
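
To illustrate laziness, here is a minimal sketch of an infinite generator. It uses `count` and `islice` from the standard `itertools` module; the names `all_squares` and `first_five` are chosen for illustration.

```python
from itertools import count, islice

# `count()` yields 0, 1, 2, ... without end, so this generator is infinite.
all_squares = (n * n for n in count())

# `islice` takes only the first five values; nothing beyond them is computed.
first_five = list(islice(all_squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]
```

Writing the same expression with square brackets would never finish, because a list comprehension tries to compute all items first.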

A generator is not a list, but it is itself again an _iterable_.
In fact, a generator is an _iterator_.
(A list is also an iterable, but a list is completely stored in memory.)
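
The difference between the two can be checked with the abstract base classes in `collections.abc`. This is a small sketch; the names `numbers` and `gen` are hypothetical.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
gen = (n * n for n in numbers)

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))  # True False
print(isinstance(gen, Iterable), isinstance(gen, Iterator))          # True True

gen_is_own_iterator = iter(gen) is gen
print(gen_is_own_iterator)  # True: a generator is its own iterator

first_value = next(gen)
print(first_value)  # 1
```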

Let us define a function `first`
that will only extract the first element of an iterable.
(We need this function `first`,
because we cannot extract the first item from an iterable by indexing it at 0,
like we did with the list comprehension.)

In [36]:
from typing import Iterable

def first(iterable: Iterable) -> Any:
    """ 
    Returns first item from iterable.
    :param iterable: the iterable to take the first item from
    :returns: the first element of the iterable (None if the iterable is empty).
    """
    for item in iterable:
        return item  # and ignore everything else

If we apply this to the list comprehension, we (again) see that the whole list still gets computed.

In [37]:
first([trail(n * n) for n in range(20) if n % 7 > 2])

...........

9

Now, let us apply it to the generator version of the comprehension.
Note the use of **round parentheses** instead of square brackets.
By the way, since the function call also involves round parentheses,
we don't have to repeat another pair.

In [38]:
first(trail(n * n) for n in range(20) if n % 7 > 2)

.

9

We see that now only one item got computed (the first one).

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Create the function <i>first_multiple_of_2</i>, which takes an iterable as argument and returns the first number that is a multiple of two. Build a generator that iterates over the numbers from 1 to 10 and multiplies each number by 3. 
</div>

In [39]:
# Remove this line and add your code here

### A Generator Can Be Used Only Once

Because a generator is a (special kind of) iterator,
it can be used only once.

Let us store our generator in a variable:

In [40]:
my_gen = (trail(n * n) for n in range(10) if n % 7 > 2)

You can then use this variable as an iterable.

In [41]:
first(my_gen)

.

9

In [42]:
for i in my_gen:
    print(i, end=" ")

.16 .25 .36 

Observe that the generator continued where it left off after its first (partial) use.
Once a generator is exhausted (has reached the end), it becomes useless.

In [43]:
for i in my_gen:
    print(i, end=" ")

(There is no output, because the generator was already exhausted.)
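
The same effect shows up when a generator is passed to `list` twice. The sketch below uses a fresh, hypothetical generator `gen` without `trail` to keep the output short.

```python
gen = (n * n for n in range(4))

first_pass = list(gen)    # consumes the generator completely
second_pass = list(gen)   # nothing is left, so the result is empty

print(first_pass)         # [0, 1, 4, 9]
print(second_pass)        # []

# With a default value, `next` returns that value instead of raising StopIteration.
leftover = next(gen, 'done')
print(leftover)           # done
```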

## 10. Factories

Since generators (like iterators) are not reusable,
it is more common to define a **function that returns a fresh generator** on each call.
Such a function is also known as a **factory**.
If that function is parameterized,
then you can produce _customized_ generators.

Here is an example of a parameterized factory:

In [44]:
from typing import Generator

def square_factory(m: int) -> Generator[int, None, None]:
    """ 
    Returns a generator for the squares of those numbers n in the range [0, m)
    that have a remainder of more than 2 when divided by 7.
    :param m: exclusive upper bound of the range
    :returns: a fresh generator of the selected squares
    """
    return (n * n for n in range(m) if n % 7 > 2)

The call `square_factory(10)` returns (a fresh copy of) the generator
that we considered above.

Let's try the same things again, using this factory.

In [45]:
first(square_factory(10))

9

In [46]:
for i in square_factory(10):
    print(i, end=" ")
    
print()

for i in square_factory(10):
    print(i, end=" ")

9 16 25 36 
9 16 25 36 

That looks better.
The **`for`** loop starts all over again.
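
A factory is also convenient when several aggregations need the same sequence, since each call supplies a fresh generator. The following sketch repeats `square_factory` so that it runs on its own; the names `total`, `largest`, and `shared` are chosen for illustration.

```python
from typing import Generator


def square_factory(m: int) -> Generator[int, None, None]:
    """Same factory as above, repeated so that this cell runs on its own."""
    return (n * n for n in range(m) if n % 7 > 2)


total = sum(square_factory(10))     # uses one fresh generator
largest = max(square_factory(10))   # uses another fresh generator
print(total, largest)  # 86 36

# For contrast: a single generator that is shared between two calls.
shared = square_factory(10)
first_sum = sum(shared)
second_sum = sum(shared)   # `shared` is already exhausted here
print(first_sum, second_sum)  # 86 0
```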

<div class="alert alert-success">
    <b>Do It Yourself!</b><br>
    Create a factory function to return the generator you created before: the one related to computing the multiplication of a number by 3.
</div>

In [47]:
# Remove this line and add your code here

## 11. Summary <a class="anchor" id="summary"></a>


This chapter provides a comprehensive overview of Python **comprehensions** and **generators**. 

**Comprehensions** are powerful constructs for creating lists, dictionaries, and sets by performing operations on iterable elements while optionally applying filters, enhancing code readability and efficiency. 

On the other hand, **Generators** are a memory-efficient way to create iterable sequences of data, allowing you to generate values as needed, rather than storing them in memory. 
This makes generators particularly useful for handling large datasets, for example.

This chapter also covers iterables, iterators, collections (like lists, sets, and dictionaries), selective inclusion in comprehensions, nested comprehensions, set and dictionary comprehensions, comprehension evaluation, and the one-time use limitation of generators. 
Factory functions are explored as a means to create customized generators.

Together, these concepts show how Python comprehensions and generators serve as fundamental tools for concise, efficient, and readable data manipulation and iteration.

---
This Jupyter Notebook is based on Chapter 19 (Sections 19.2-19.5) of the book Think Python. 
The extra material presented is based on Jupyter notebooks developed by Tom Verhoeff (TU/e).

---

# (End of Notebook)

&copy; 2023 - **TU/e** - Eindhoven University of Technology