# List comprehension, generators, iteration

## BProf Python course

### June 25-29, 2018

#### Judit Ács

# List comprehension

- transform any iterable into a list in one line
- syntactic sugar
- example: create a list of the first N odd numbers starting from 1

In [1]:
l = []
for i in range(10):
    l.append(2*i+1)
l

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

one-liner equivalent

In [2]:
l = [2*i+1 for i in range(10)]
l

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

## The general form of list comprehension is

~~~
[<expression> for <element> in <sequence>]
~~~

conditional expressions can be added to filter the sequence:

~~~
[<expression> for <element> in <sequence> if <condition>]
~~~

In [3]:
even = [n*n for n in range(20) if n % 2 == 0]
even

[0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

which is equivalent to

In [4]:
even = []
for n in range(20):
    if n % 2 == 0:
        even.append(n*n)
even

[0, 4, 16, 36, 64, 100, 144, 196, 256, 324]

- since this expression implements a filtering mechanism, there is no `else` clause

- an if-else clause can be used as the first expression though:

In [5]:
l = [1, 0, -2, 3, -1, -5, 0]

signum_l = [(int(n / abs(n)) if n != 0 else 0) for n in l]
signum_l

[1, 0, -1, 1, -1, -1, 0]

In [6]:
n = -3.2
int(n / abs(n)) if n != 0 else 0

-1
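
The two forms can be combined: the conditional expression at the front decides what is stored, while the trailing `if` decides which elements are kept at all. A minimal sketch reusing the list from above (the name `labels` is only for illustration):

```python
l = [1, 0, -2, 3, -1, -5, 0]

# the trailing `if` filters out the zeros; the conditional expression
# at the front labels every remaining element
labels = ["positive" if n > 0 else "negative" for n in l if n != 0]
print(labels)
```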

More than one sequence may be traversed. Is this depth-first or breadth-first traversal?

In [7]:
l1 = [1, 2, 3]
l2 = [4, 5, 6]

[(i, j) for i in l1 for j in l2]

[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]

In [8]:
[(i, j) for j in l2 for i in l1]

[(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5), (1, 6), (2, 6), (3, 6)]
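
Written as explicit loops, the leftmost `for` of the comprehension is the outermost loop, so the rightmost sequence varies fastest. A sketch of the loop equivalent of the first comprehension above (`pairs` is an illustrative name):

```python
l1 = [1, 2, 3]
l2 = [4, 5, 6]

pairs = []
for i in l1:          # leftmost `for` -> outer loop
    for j in l2:      # next `for` -> inner loop
        pairs.append((i, j))

# same result as the comprehension with the same order of `for` clauses
print(pairs == [(i, j) for i in l1 for j in l2])
```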

List comprehensions may be nested by replacing the first expression with another list comprehension:

In [9]:
matrix = [
    [1, 2, 3],
    [5, 6, 7]
]

[[e*e for e in row] for row in matrix]

[[1, 4, 9], [25, 36, 49]]
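
Nesting a comprehension is not the same as using two `for` clauses: the nested form keeps the row structure, while two `for` clauses in a single comprehension produce one flat list. A short sketch with the same matrix (the names `squared` and `flat` are illustrative):

```python
matrix = [
    [1, 2, 3],
    [5, 6, 7],
]

# nested comprehension: one inner list per row
squared = [[e*e for e in row] for row in matrix]

# single comprehension with two `for` clauses: the rows are flattened
flat = [e*e for row in matrix for e in row]

print(squared)
print(flat)
```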

## What is the type of a comprehension written with parentheses?

In [10]:
i = (i for i in range(10))
type(i)

generator

# Generator expressions

Generator expressions are a generalization of list comprehensions. They were proposed in PEP 289 (2002) and have been part of the language since Python 2.4.

Compare the memory consumption of the following two cells.

In [11]:
N = 8
s = sum([i*2 for i in range(int(10**N))])
print(s)

9999999900000000


In [12]:
s = sum(i*2 for i in range(int(10**N)))
print(s)

9999999900000000
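
One rough way to see the difference is `sys.getsizeof`, which reports the size of the container object itself (not of the elements it refers to). This is only a sketch; the exact numbers depend on the Python version and platform, so none are shown here:

```python
import sys

N = 10**5  # smaller than above so the list version stays cheap

as_list = [i*2 for i in range(N)]
as_gen = (i*2 for i in range(N))

# the list stores N references; the generator only stores its current state
print("list:     ", sys.getsizeof(as_list), "bytes")
print("generator:", sys.getsizeof(as_gen), "bytes")

# both produce the same sum
print(sum(as_list) == sum(as_gen))
```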


Generators do not generate a list in memory

In [13]:
even_numbers = (2*n for n in range(10))
even_numbers

<generator object <genexpr> at 0x7fd8e80795c8>

therefore they can only be traversed once

In [14]:
for num in even_numbers:
    print(num)

0
2
4
6
8
10
12
14
16
18


the generator is empty after the first run

In [15]:
for num in even_numbers:
    print(num)

calling `next()` on an exhausted generator raises a `StopIteration` exception

In [16]:
even_numbers = (2*n for n in range(10))

while True:
    try:
        print(next(even_numbers))
    except StopIteration:
        break

0
2
4
6
8
10
12
14
16
18


In [17]:
# next(even_numbers)  # raises StopIteration

In [18]:
l = [1, 2, 3]
liter = iter(l)
print(next(liter))
print(next(liter))
print(next(liter))
# print(next(liter))  # raises StopIteration

1
2
3


these are actually the defining properties of the **iteration protocol**

# Iteration protocol

A class satisfies the iteration protocol if:

1. it has an `__iter__` method that returns an iterator, which
2. has a `__next__` method (this method is called `next` in Python 2) that
3. raises `StopIteration` after a certain number of iterations

For loops use the iteration protocol.
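
Roughly speaking, a `for` loop calls `iter()` once and then `next()` repeatedly until `StopIteration` is raised. The sketch below spells this out by hand; it illustrates the idea and is not the interpreter's actual implementation (`items` and `collected` are illustrative names):

```python
items = ["a", "b", "c"]

# what `for item in items: print(item)` does, written out by hand
iterator = iter(items)          # calls items.__iter__()
collected = []
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:
        break                   # the for loop ends silently here
    collected.append(item)
    print(item)
```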

In [19]:
class MyIterator:
    def __init__(self):
        self.iter_no = 5
        
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.iter_no <= 0:
            raise StopIteration()
        self.iter_no -= 1
        print("Returning {}".format(self.iter_no))
        return self.iter_no
    
myiter = MyIterator()

for i in myiter:
    print(i)

Returning 4
4
Returning 3
3
Returning 2
2
Returning 1
1
Returning 0
0


In [20]:
class MyList:
    def __init__(self, elements):
        self.l = list(elements)
        
    def __iter__(self):
        return iter(self.l)
    
for e in MyList("abce"):
    print(e)

a
b
c
e
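
The two classes behave differently when they are looped over more than once. `MyIterator` returns itself from `__iter__`, so it is exhausted after one pass, while `MyList` hands out a fresh iterator each time. A minimal sketch; `Countdown` and `Letters` are hypothetical stand-ins for the two classes above, without the `print` call:

```python
class Countdown:
    """Iterator: __iter__ returns self, so the state is shared."""
    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration()
        self.remaining -= 1
        return self.remaining


class Letters:
    """Iterable: __iter__ returns a new iterator on every call."""
    def __init__(self, elements):
        self.l = list(elements)

    def __iter__(self):
        return iter(self.l)


countdown = Countdown(3)
print(list(countdown))  # [2, 1, 0]
print(list(countdown))  # [] -- already exhausted

letters = Letters("abc")
print(list(letters))    # ['a', 'b', 'c']
print(list(letters))    # ['a', 'b', 'c'] again
```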


In [21]:
gen1 = (n * n for n in range(10))
gen2 = gen1

for e in gen1:
    pass

for e in gen2:
    print(e)
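
The second loop above prints nothing: `gen2 = gen1` does not copy the generator, it only binds a second name to the same object, which the first loop has already exhausted. If two independent passes are needed, one option from the standard library is `itertools.tee`; the sketch below assumes the original generator is not used directly after the call to `tee`:

```python
from itertools import tee

gen1 = (n * n for n in range(5))
gen2 = gen1
print(gen1 is gen2)   # True: two names, one generator

# tee returns independent iterators that buffer values as needed
first, second = tee(n * n for n in range(5))

first_pass = list(first)
second_pass = list(second)
print(first_pass)
print(second_pass)
```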

# Set and dict comprehension

Sets and dictionaries can be instantiated with comprehensions too.

A comprehension between curly brackets instantiates a set:

In [22]:
fruit_list = ["apple", "plum", "apple", "pear"]

fruits = {fruit.title() for fruit in fruit_list}

type(fruits), len(fruits), fruits

(set, 3, {'Apple', 'Pear', 'Plum'})

if the expression is a key-value pair separated by a colon, the comprehension instantiates a dictionary:

In [23]:
word_list = ["apple", "plum", "pear", "apple", "apple"]
word_length = {word: len(word) for word in word_list}
type(word_length), len(word_length), word_length

(dict, 3, {'apple': 5, 'pear': 4, 'plum': 4})

In [24]:
word_list = ["apple", "plum", "pear", "avocado"]
first_letters = {word[0]: word for word in word_list}
first_letters

{'a': 'avocado', 'p': 'pear'}
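
Keys of a dictionary are unique, so in the last example `'avocado'` overwrites `'apple'` and `'pear'` overwrites `'plum'`. If every word should be kept, the values can be lists. A possible sketch that combines a set comprehension, a dict comprehension and a list comprehension (`letters` and `by_letter` are illustrative names):

```python
word_list = ["apple", "plum", "pear", "avocado"]

# set comprehension: the distinct first letters
letters = {word[0] for word in word_list}

# dict comprehension whose values are list comprehensions
by_letter = {
    letter: [word for word in word_list if word[0] == letter]
    for letter in letters
}
print(by_letter)
```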

# `yield` keyword

- if a function uses `yield` instead of `return`, it becomes a **generator function**
- `yield` temporarily gives back the execution to the caller
- the generator function continues from that point when the next element is requested
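
The hand-over between the caller and the generator function can be made visible by recording the order of events. This is only an illustrative sketch; `two_steps` and the `events` list are hypothetical names:

```python
events = []

def two_steps():
    events.append("generator: started")
    yield 1                      # pause here, hand 1 to the caller
    events.append("generator: resumed")
    yield 2                      # pause again
    events.append("generator: finished")

gen = two_steps()                # nothing in the body runs yet
events.append("caller: created generator")

for value in gen:
    events.append("caller: got {}".format(value))

print("\n".join(events))
```

The body only starts running at the first request for an element, and each `yield` suspends it until the next request.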

In [25]:
def hungarian_vowels():
    alphabet = ("a", "á", "e", "é", "i", "í", "o", "ó",
                "ö", "ő", "u", "ú", "ü", "ű")
    for vowel in alphabet:
        yield vowel

this function returns a generator object

In [26]:
type(hungarian_vowels())

generator

In [27]:
for vowel in hungarian_vowels():
    print(vowel)

a
á
e
é
i
í
o
ó
ö
ő
u
ú
ü
ű


In [28]:
gen = hungarian_vowels()

print("first iteration: {}".format(", ".join(gen)))
print("second iteration: {}".format(", ".join(gen)))

first iteration: a, á, e, é, i, í, o, ó, ö, ő, u, ú, ü, ű
second iteration: 


In [29]:
print("first iteration: {}".format(", ".join(hungarian_vowels())))
print("second iteration: {}".format(", ".join(hungarian_vowels())))

first iteration: a, á, e, é, i, í, o, ó, ö, ő, u, ú, ü, ű
second iteration: a, á, e, é, i, í, o, ó, ö, ő, u, ú, ü, ű


The `next` function returns the next element of the generator.
A `StopIteration` is raised when no more elements are left:

In [30]:
gen = hungarian_vowels()

while True:
    try:
        print("The next element is {}".format(next(gen)))
    except StopIteration:
        print("No more elements left :(")
        break

The next element is a
The next element is á
The next element is e
The next element is é
The next element is i
The next element is í
The next element is o
The next element is ó
The next element is ö
The next element is ő
The next element is u
The next element is ú
The next element is ü
The next element is ű
No more elements left :(


the generator function returns a new generator object every time it's called

In [31]:
gen1 = hungarian_vowels()
gen2 = hungarian_vowels()

print(gen1 is gen2)
print("gen1 first time:", list(gen1))
print("gen1 second time:", list(gen1))
print("gen2 first time:", list(gen2))

False
gen1 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']
gen1 second time: []
gen2 first time: ['a', 'á', 'e', 'é', 'i', 'í', 'o', 'ó', 'ö', 'ő', 'u', 'ú', 'ü', 'ű']


iterators can only be traversed forward, but we can easily wrap an iterator to have memory:

In [32]:
def iter_with_memory(orig_iter):
    prev = None
    for current in orig_iter:
        yield current, prev
        prev = current

In [33]:
for i in iter_with_memory(hungarian_vowels()):
    print(i)

('a', None)
('á', 'a')
('e', 'á')
('é', 'e')
('i', 'é')
('í', 'i')
('o', 'í')
('ó', 'o')
('ö', 'ó')
('ő', 'ö')
('u', 'ő')
('ú', 'u')
('ü', 'ú')
('ű', 'ü')


## Q. Add a `memory_size` parameter to the previous function which specifies how many of the previous elements are stored.

You can yield them in a list or, better, wrap them in a class.
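
One possible approach, not the only one, is to keep the previous elements in a `collections.deque` with `maxlen=memory_size`, which discards the oldest element automatically. The sketch yields the memory as a list, oldest element first; this layout is an assumption, as the exercise leaves it open:

```python
from collections import deque

def iter_with_memory(orig_iter, memory_size=1):
    # a deque with maxlen drops the oldest entry when it is full
    memory = deque(maxlen=memory_size)
    for current in orig_iter:
        # yield a copy so later appends do not change what the caller holds
        yield current, list(memory)
        memory.append(current)

for item in iter_with_memory("abcd", memory_size=2):
    print(item)
```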

# Exercises

Generator expressions can be particularly useful for formatted output. We will demonstrate this through a few examples.

In [34]:
numbers = [1, -2, 3, 1]

# print(", ".join(numbers))  # raises TypeError
print(", ".join(str(number) for number in numbers))

1, -2, 3, 1


In [35]:
shopping_list = ["apple", "plum", "pear"]

The goal is to print the list in the following format:

~~~
The shopping list is:
item 1: apple
item 2: plum
item 3: pear
~~~

In [36]:
shopping_list = ["apple", "plum", "pear"]

print("The shopping list is:\n{}".format(
    "\n".join("item {}: {}".format(i+1, item) for i, item in enumerate(shopping_list))))

The shopping list is:
item 1: apple
item 2: plum
item 3: pear
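
`enumerate` also accepts a `start` argument, which avoids the `i+1` arithmetic. A variant of the cell above (the name `lines` is illustrative):

```python
shopping_list = ["apple", "plum", "pear"]

# start=1 makes enumerate count from 1 instead of 0
lines = "\n".join("item {}: {}".format(i, item)
                  for i, item in enumerate(shopping_list, start=1))
print("The shopping list is:\n{}".format(lines))
```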


## Q. Print the following shopping list with quantities.

For example:

~~~
item 1: apple, quantity: 2
item 2: pear, quantity: 1
~~~

In [37]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}

print("\n".join("item {}: {}, quantity: {}".format(idx+1, item, quantity)
                for idx, (item, quantity) in enumerate(shopping_list.items())))

item 1: apple, quantity: 2
item 2: pear, quantity: 1
item 3: plum, quantity: 5


## Q. Print the same format in alphabetical order.

- Then print it in decreasing order by quantity

In [38]:
shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}
print("\n".join("item {}: {}, quantity: {}".format(idx+1, item, quantity)
                for idx, (item, quantity) in enumerate(sorted(shopping_list.items()))))

item 1: apple, quantity: 2
item 2: pear, quantity: 1
item 3: plum, quantity: 5


In [39]:
print("\n".join(
    "item {0}: {1}, quantity: {2}".format(idx+1, item, quantity) for idx, (item, quantity) in
    enumerate(sorted(shopping_list.items(), key=lambda x: -x[1]))))

item 1: plum, quantity: 5
item 2: apple, quantity: 2
item 3: pear, quantity: 1
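
Negating the key only works for numeric values. A more general sketch uses `operator.itemgetter` as the key together with `reverse=True` (`by_quantity` is an illustrative name):

```python
from operator import itemgetter

shopping_list = {
    "apple": 2,
    "pear": 1,
    "plum": 5,
}

# sort the (item, quantity) pairs by quantity, largest first
by_quantity = sorted(shopping_list.items(), key=itemgetter(1), reverse=True)

print("\n".join("item {}: {}, quantity: {}".format(idx, item, quantity)
                for idx, (item, quantity) in enumerate(by_quantity, start=1)))
```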


## Q. Print the list of students. 

In [40]:
students = [
    ["Joe", "John", "Mary"],
    ["Tina", "Tony", "Jeff", "Béla"],
    ["Pete", "Dave"],
]

## Q. Print one class-per-line and print the size of the class too

Example:
~~~
class 1, size: 3, students: Joe, John, Mary
class 2, size: 4, students: Tina, Tony, Jeff, Béla
class 3, size: 2, students: Pete, Dave
~~~

## Q. Sort the classes by size in increasing order

Example:
~~~
class 1, size: 2, students: Pete, Dave
class 2, size: 3, students: Joe, John, Mary
class 3, size: 4, students: Tina, Tony, Jeff, Béla
~~~
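
A possible solution sketch for the last two questions, using the `students` list from above. The helper name `format_classes` is hypothetical; numbering the classes after sorting follows the example output:

```python
students = [
    ["Joe", "John", "Mary"],
    ["Tina", "Tony", "Jeff", "Béla"],
    ["Pete", "Dave"],
]

def format_classes(classes):
    # one line per class; the generator expression feeds str.join directly
    return "\n".join(
        "class {}, size: {}, students: {}".format(idx, len(cl), ", ".join(cl))
        for idx, cl in enumerate(classes, start=1))

print(format_classes(students))
print()
# sorted with key=len orders the classes by size, smallest first
print(format_classes(sorted(students, key=len)))
```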

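Characters can also be sorted according to a custom alphabet: map each letter to its position and use that position as the sort key.

In [41]: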
hun_mapping = {c: i for i, c in enumerate("aábcdéúű")}

sorted(list("aááéáúűbcd"), key=lambda c: hun_mapping[c])

['a', 'á', 'á', 'á', 'b', 'c', 'd', 'é', 'ú', 'ű']
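
The same mapping can serve as a sort key for whole words by translating each word into a list of letter positions. A sketch under the assumption that every letter of every word appears in the mapping (a missing letter would raise `KeyError`); the word list is made up for illustration:

```python
hun_mapping = {c: i for i, c in enumerate("aábcdéúű")}

words = ["bab", "áb", "ab", "ád", "ba"]

# lists of positions compare element by element, like strings do
in_order = sorted(words, key=lambda w: [hun_mapping[c] for c in w])
print(in_order)
```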