# Chapter 8. Iterables

## 8.1 Comprehensions

(1) Types of comprehensions

* List comprehensions
* Set comprehensions
* Dictionary comprehensions

(2) Style

* Declarative
* Functional

(3) Benefits

* Readable
* Expressive
* Effective

### 8.1.1 List comprehensions

General syntax:

```python
[expr(item) for item in iterable]
```

In [1]:
words = "Why sometimes I have believed as many as six impossible things before breakfast".split()
words

['Why',
 'sometimes',
 'I',
 'have',
 'believed',
 'as',
 'many',
 'as',
 'six',
 'impossible',
 'things',
 'before',
 'breakfast']

In [2]:
[len(word) for word in words]

[3, 9, 1, 4, 8, 2, 4, 2, 3, 10, 6, 6, 9]

In [3]:
lengths = []
for word in words:
    lengths.append(len(word))
    
lengths

[3, 9, 1, 4, 8, 2, 4, 2, 3, 10, 6, 6, 9]

In [4]:
from math import factorial
f = [len(str(factorial(x))) for x in range(20)]
f

[1, 1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18]

In [5]:
type(f)

list

### 8.1.2 Set comprehensions

General syntax:

```python
{expr(item) for item in iterable}
```

In [7]:
{len(str(factorial(x))) for x in range(20)}

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18}

### 8.1.3 Dictionary comprehensions

General syntax:

```python
{key_expr: value_expr for item in iterable}
```

In [8]:
from pprint import pprint as pp
country2capital = {"United Kingdom": "London",
                   "Brazil": "Brasília",
                   "Morocco": "Rabat",
                   "Sweden": "Stockholm"}
capital2country = {capital: country for country, capital in country2capital.items()}
pp(capital2country)

{'Brasília': 'Brazil',
 'London': 'United Kingdom',
 'Rabat': 'Morocco',
 'Stockholm': 'Sweden'}


In [10]:
# Duplicate keys: a later value overwrites an earlier one ("hi" and "hello" are replaced by "hotel")
words = ["hi", "hello", "foxtrot", "hotel"]
{x[0]: x for x in words}

{'f': 'foxtrot', 'h': 'hotel'}

In [11]:
# Don't cram too much complexity into comprehensions!
import os
import glob
file_sizes = {os.path.realpath(p): os.stat(p).st_size for p in glob.glob('*.py')}
pp(file_sizes)

{'/home/renwei/repos/github/learning-ml/python/pluralsight-python-fundamental/exceptional.py': 380,
 '/home/renwei/repos/github/learning-ml/python/pluralsight-python-fundamental/roots.py': 883}


### 8.1.4 Filtering predicates

Optional filtering clause:

```python
[expr(item) for item in iterable if predicate(item)]
```

"Simple is better  
than complex"

"Code is written once  
But read over and over  
Fewer is clearer"

In [6]:
from math import sqrt

def is_prime(x):
    if x < 2:
        return False
    for i in range(2, int(sqrt(x) + 1)):
        if x % i == 0:
            return False
    return True

[x for x in range(101) if is_prime(x)]

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97]

In [13]:
prime_square_divisors = {x * x: (1, x, x * x) for x in range(101) if is_prime(x)}
pp(prime_square_divisors)

{4: (1, 2, 4),
 9: (1, 3, 9),
 25: (1, 5, 25),
 49: (1, 7, 49),
 121: (1, 11, 121),
 169: (1, 13, 169),
 289: (1, 17, 289),
 361: (1, 19, 361),
 529: (1, 23, 529),
 841: (1, 29, 841),
 961: (1, 31, 961),
 1369: (1, 37, 1369),
 1681: (1, 41, 1681),
 1849: (1, 43, 1849),
 2209: (1, 47, 2209),
 2809: (1, 53, 2809),
 3481: (1, 59, 3481),
 3721: (1, 61, 3721),
 4489: (1, 67, 4489),
 5041: (1, 71, 5041),
 5329: (1, 73, 5329),
 6241: (1, 79, 6241),
 6889: (1, 83, 6889),
 7921: (1, 89, 7921),
 9409: (1, 97, 9409)}


## 8.2 Iteration protocols

### 8.2.1 Iterable protocol

Iterable objects can be passed to the built-in `iter()` function to get an iterator.

```python
iterator = iter(iterable)
```

### 8.2.2 Iterator protocol

Iterator objects can be passed to the built-in `next()` function to fetch the next item.

```python
item = next(iterator)
```
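
The two protocols can also be implemented by hand. The following is a minimal sketch, not part of the original notes: `Countdown` and `CountdownIterator` are hypothetical names chosen for illustration. The iterable implements `__iter__()`, and the iterator implements `__next__()` and raises `StopIteration` when it runs out.

```python
class Countdown:
    """Hypothetical iterable that counts down from `start` to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Iterable protocol: hand out a fresh iterator on every call.
        return CountdownIterator(self.start)


class CountdownIterator:
    """Hypothetical iterator that walks through one countdown."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators are themselves iterable and return themselves.
        return self

    def __next__(self):
        # Iterator protocol: return the next item or raise StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
iterator = iter(countdown)
print(next(iterator))   # 3
print(list(iterator))   # [2, 1]
print(list(countdown))  # [3, 2, 1], because iter() builds a new iterator
```

Keeping the iterable and its iterator as separate classes lets the same `Countdown` object be traversed more than once.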

In [14]:
iterable = ['Spring', 'Summer', 'Autumn', 'Winter']
iterator = iter(iterable)

next(iterator)

'Spring'

In [15]:
next(iterator)

'Summer'

In [16]:
next(iterator)

'Autumn'

In [17]:
next(iterator)

'Winter'

In [18]:
next(iterator)

StopIteration: 

In [19]:
def first(iterable):
    iterator = iter(iterable)
    try: 
        return next(iterator)
    except StopIteration:
        raise ValueError("iterable is empty")
        
first(['1st', '2nd', '3rd'])

'1st'

In [20]:
# Sets are unordered, so which element comes back "first" is arbitrary.
first({'1st', '2nd', '3rd'})

'1st'

In [21]:
first(set())

ValueError: iterable is empty
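
A `for` loop relies on the same two built-ins. As a rough sketch (the helper name `collect` is made up for this illustration), the loop below does by hand what `for item in iterable: items.append(item)` does implicitly: call `iter()` once, call `next()` repeatedly, and stop on `StopIteration`.

```python
def collect(iterable):
    """Gather all items of `iterable` using only iter() and next()."""
    items = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # The iterator is exhausted: this is where a for loop ends.
            break
        items.append(item)
    return items


print(collect(['Spring', 'Summer', 'Autumn', 'Winter']))
# ['Spring', 'Summer', 'Autumn', 'Winter']
```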

## 8.3 Generators

### 8.3.1 Introduction

```python
def gen123():
    yield 1
    yield 2
    yield 3
    # optional return
    return
```

(1) Specify iterable sequences

All generators are iterators.

(2) Are lazily evaluated

The next value in the sequence is computed on demand.

(3) Can model infinite sequences

Such as data streams with no definite end.

(4) Are composable into pipelines

For natural stream processing.

In [22]:
def gen123():
    yield 1
    yield 2
    yield 3
    
g = gen123()
g

<generator object gen123 at 0x7f837e798308>

In [23]:
next(g)

1

In [24]:
next(g)

2

In [25]:
next(g)

3

In [26]:
next(g)

StopIteration: 

In [27]:
for v in gen123():
    print(v)

1
2
3


In [28]:
h = gen123()
i = gen123()
h

<generator object gen123 at 0x7f838c0f2830>

In [29]:
i

<generator object gen123 at 0x7f838c0f2258>

In [30]:
h is i

False

In [31]:
next(h)

1

In [32]:
next(h)

2

In [33]:
next(i)

1

In [34]:
def gen246():
    print("About to yield 2")
    yield 2
    print("About to yield 4")
    yield 4
    print("About to yield 6")
    yield 6
    print("About to return")
    
g = gen246()
next(g)

About to yield 2


2

In [35]:
next(g)

About to yield 4


4

In [36]:
next(g)

About to yield 6


6

In [37]:
next(g)

About to return


StopIteration: 

### 8.3.2 Stateful generators

(1) Generators resume execution.

(2) Can maintain state in local variables.

(3) Complex control flow

(4) Lazy execution
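
To illustrate points (1) and (2), here is a small sketch that is not from the original notes; `running_total` is a hypothetical name. The local variable `total` survives between calls to `next()` because the generator resumes where it last yielded.

```python
def running_total(iterable):
    """Yield the cumulative sum of the items seen so far."""
    total = 0  # local state, preserved while the generator is suspended
    for item in iterable:
        total += item
        yield total


totals = running_total([3, 6, 6, 2])
print(next(totals))  # 3
print(next(totals))  # 9
print(list(totals))  # [15, 17]
```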

To debug inside Jupyter, use ipdb: call `set_trace()` (imported from `IPython.core.debugger`) wherever you want a breakpoint.

At the `ipdb>` prompt, use `n` to execute the next line, `s` to step into a function, and `c` to continue execution until the next breakpoint.

In [3]:
"""Module for demonstrating generator execution."""

def take(count, iterable):
    """Take items from the front of an iterable.
    Args:
        count: The maximum number of items to retrieve.
        iterable: The source series.
        
    Yields:
        At most `count` items from `iterable`.
    """
    counter = 0
    set_trace()
    for item in iterable:
        if counter == count:
            return
        counter += 1
        yield item
        
def run_take():
    set_trace()
    items = [2, 4, 6, 8, 10]
    for item in take(3, items):
        print(item)
        
if __name__ == "__main__":
    from IPython.core.debugger import set_trace
    run_take()

> <ipython-input-3-f844a8052207>(22)run_take()
     20 def run_take():
     21     set_trace()
---> 22     items = [2, 4, 6, 8, 10]
     23     for item in take(3, items):
     24         print(item)

ipdb> n
> <ipython-input-3-f844a8052207>(23)run_take()
     21     set_trace()
     22     items = [2

In [4]:
def distinct(iterable):
    """Return unique items by eliminating the duplicates
    
    Args:
        iterable: The source series.
        
    Yields:
        unique elements in order from `iterable`.
    """
    seen = set()
    for item in iterable:
        if item in seen:
            continue
        yield item
        seen.add(item)
        
def run_distinct():
    items = [5, 7, 7, 6, 5, 5]
    for item in distinct(items):
        print(item)
        
if __name__ == "__main__":
    run_distinct()

5
7
6


In [7]:
def run_pipeline():
    items = [3, 6, 6, 2, 1, 1]
    for item in take(3, distinct(items)):
        print(item)
        
if __name__ == "__main__":
    run_pipeline()

> <ipython-input-3-f844a8052207>(14)take()
     12     counter = 0
     13     set_trace()
---> 14     for item in iterable:
     15         if counter == count:
     16             return

ipdb> c
3
6
2


### 8.3.3 Laziness and the infinite

(1) Just in time computation

(2) Infinite (or large) sequences

* Sensor readings
* Mathematical series
* Massive files

In [31]:
def lucas():
    yield 2
    a = 2
    b = 1
    while True:
        yield b
        a, b = b, a + b
        
for x in lucas():
    print(x)

2
1
3
4
7
11
18
29
47
76
123
199
322
521
843
1364
2207
3571
5778
9349
15127
24476
39603
64079
103682
167761
271443
439204
710647
1149851
1860498
3010349
4870847
7881196
12752043
20633239
33385282
54018521
87403803
141422324
228826127
370248451
599074578
969323029
1568397607
2537720636
4106118243
6643838879
10749957122
17393796001
28143753123
45537549124
73681302247
119218851371
192900153618
312119004989
505019158607
817138163596
1322157322203
2139295485799
3461452808002
5600748293801
9062201101803
14662949395604
23725150497407
38388099893011
62113250390418
100501350283429
162614600673847
263115950957276
425730551631123
688846502588399
1114577054219522
1803423556807921
2918000611027443
4721424167835364
7639424778862807
12360848946698171
20000273725560978
32361122672259149
52361396397820127
84722519070079276
137083915467899403
221806434537978679
358890350005878082
580696784543856761
939587134549734843
1520283919093591604
2459871053643326447
3980154972736918051
6440026026380244498
1042018

KeyboardInterrupt: 
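
Looping over an infinite generator never finishes on its own, which is why the cell above had to be interrupted. One way to look at just the start of the sequence, sketched here with `itertools.islice()` (covered in section 8.4.1), is to cap the number of items taken. The `lucas()` definition is repeated so the snippet runs on its own.

```python
from itertools import islice


def lucas():
    yield 2
    a = 2
    b = 1
    while True:
        yield b
        a, b = b, a + b


# islice() stops after 10 items, so the infinite generator is never exhausted.
first_ten = list(islice(lucas(), 10))
print(first_ten)  # [2, 1, 3, 4, 7, 11, 18, 29, 47, 76]
```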

### 8.3.4 Generator comprehension

(1) Similar syntax to list comprehension.

```python
(expr(item) for item in iterable)
```

(2) Creates a generator object.

(3) Concise

(4) Lazy evaluation

In [1]:
million_squares = (x * x for x in range(1, 1000001))
million_squares

<generator object <genexpr> at 0x7f9a981b9938>

In [2]:
list(million_squares)

[1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400,
 441,
 484,
 529,
 576,
 625,
 676,
 729,
 784,
 841,
 900,
 961,
 1024,
 1089,
 1156,
 1225,
 1296,
 1369,
 1444,
 1521,
 1600,
 1681,
 1764,
 1849,
 1936,
 2025,
 2116,
 2209,
 2304,
 2401,
 2500,
 2601,
 2704,
 2809,
 2916,
 3025,
 3136,
 3249,
 3364,
 3481,
 3600,
 3721,
 3844,
 3969,
 4096,
 4225,
 4356,
 4489,
 4624,
 4761,
 4900,
 5041,
 5184,
 5329,
 5476,
 5625,
 5776,
 5929,
 6084,
 6241,
 6400,
 6561,
 6724,
 6889,
 7056,
 7225,
 7396,
 7569,
 7744,
 7921,
 8100,
 8281,
 8464,
 8649,
 8836,
 9025,
 9216,
 9409,
 9604,
 9801,
 10000,
 10201,
 10404,
 10609,
 10816,
 11025,
 11236,
 11449,
 11664,
 11881,
 12100,
 12321,
 12544,
 12769,
 12996,
 13225,
 13456,
 13689,
 13924,
 14161,
 14400,
 14641,
 14884,
 15129,
 15376,
 15625,
 15876,
 16129,
 16384,
 16641,
 16900,
 17161,
 17424,
 17689,
 17956,
 18225,
 18496,
 18769,
 19044,
 19321,
 19600,
 19881,
 20164,
 20449

In [3]:
list(million_squares)

[]

In [4]:
# Summing a generator expression uses far less memory than building the full list first and summing that.
sum(x * x for x in range(1, 1000001))

333333833333500000
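
The memory difference can be made visible with `sys.getsizeof()`. This is only a rough sketch: `sys.getsizeof()` reports the size of the container object itself, not of the integers it refers to, and the exact numbers depend on the Python build, so none are shown here.

```python
import sys

squares_list = [x * x for x in range(1, 1000001)]  # all values stored at once
squares_gen = (x * x for x in range(1, 1000001))   # values produced on demand

list_size = sys.getsizeof(squares_list)  # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)    # small and independent of the range

print(list_size > gen_size)                    # True
print(sum(squares_list) == sum(squares_gen))   # True: same values either way
```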

In [8]:
sum(x * x for x in range(1, 1000001) if is_prime(x))

24693298341834533

## 8.4 Batteries included for iteration

### 8.4.1 itertools

* `chain()`
* `islice()`
* `count()`
* many more!

In [9]:
from itertools import islice, count

islice(all_primes, 1000)

NameError: name 'all_primes' is not defined

In [11]:
thousand_primes = islice((x for x in count() if is_prime(x)), 1000)
thousand_primes

<itertools.islice at 0x7f9a7a975d68>

In [12]:
list(thousand_primes)

[2,
 3,
 5,
 7,
 11,
 13,
 17,
 19,
 23,
 29,
 31,
 37,
 41,
 43,
 47,
 53,
 59,
 61,
 67,
 71,
 73,
 79,
 83,
 89,
 97,
 101,
 103,
 107,
 109,
 113,
 127,
 131,
 137,
 139,
 149,
 151,
 157,
 163,
 167,
 173,
 179,
 181,
 191,
 193,
 197,
 199,
 211,
 223,
 227,
 229,
 233,
 239,
 241,
 251,
 257,
 263,
 269,
 271,
 277,
 281,
 283,
 293,
 307,
 311,
 313,
 317,
 331,
 337,
 347,
 349,
 353,
 359,
 367,
 373,
 379,
 383,
 389,
 397,
 401,
 409,
 419,
 421,
 431,
 433,
 439,
 443,
 449,
 457,
 461,
 463,
 467,
 479,
 487,
 491,
 499,
 503,
 509,
 521,
 523,
 541,
 547,
 557,
 563,
 569,
 571,
 577,
 587,
 593,
 599,
 601,
 607,
 613,
 617,
 619,
 631,
 641,
 643,
 647,
 653,
 659,
 661,
 673,
 677,
 683,
 691,
 701,
 709,
 719,
 727,
 733,
 739,
 743,
 751,
 757,
 761,
 769,
 773,
 787,
 797,
 809,
 811,
 821,
 823,
 827,
 829,
 839,
 853,
 857,
 859,
 863,
 877,
 881,
 883,
 887,
 907,
 911,
 919,
 929,
 937,
 941,
 947,
 953,
 967,
 971,
 977,
 983,
 991,
 997,
 1009,
 1013,
 1019,


In [13]:
sum(thousand_primes)

0

In [14]:
# The islice object was exhausted by list() above, so it has to be re-created before summing.
sum(islice((x for x in count() if is_prime(x)), 1000))

3682913

### 8.4.2 Built-in tools

(1) `any()`, `all()`

In [15]:
any([False, False, True])

True

In [16]:
all([False, False, True])

False

In [17]:
any(is_prime(x) for x in range(1328, 1361))

False

In [18]:
all(name == name.title() for name in ['London', 'New York', 'Sydney'])

True

(2) `zip()`

In [23]:
sunday = [12, 14, 15, 15, 17, 21, 22, 22, 23, 22, 20, 18]
monday = [13, 14, 14, 14, 16, 20, 21, 22, 22, 21, 19, 17]

# zip() pairs up corresponding items, yielding one tuple per position.
for item in zip(sunday, monday):
    print(item)

(12, 13)
(14, 14)
(15, 14)
(15, 14)
(17, 16)
(21, 20)
(22, 21)
(22, 22)
(23, 22)
(22, 21)
(20, 19)
(18, 17)


In [26]:
# Tuple unpacking in the for loop
for sun, mon in zip(sunday, monday):
    print("average =", (sun + mon) / 2)

average = 12.5
average = 14.0
average = 14.5
average = 14.5
average = 16.5
average = 20.5
average = 21.5
average = 22.0
average = 22.5
average = 21.5
average = 19.5
average = 17.5


In [27]:
tuesday = [2, 2, 3, 7, 9, 10, 11, 12, 10, 9, 8, 8]
for temps in zip(sunday, monday, tuesday):
    print("min = {:4.1f}, max = {:4.1f}, average = {:4.1f}".format(
        min(temps), max(temps), sum(temps) / len(temps)))

min =  2.0, max = 13.0, average =  9.0
min =  2.0, max = 14.0, average = 10.0
min =  3.0, max = 15.0, average = 10.7
min =  7.0, max = 15.0, average = 12.0
min =  9.0, max = 17.0, average = 14.0
min = 10.0, max = 21.0, average = 17.0
min = 11.0, max = 22.0, average = 18.0
min = 12.0, max = 22.0, average = 18.7
min = 10.0, max = 23.0, average = 18.3
min =  9.0, max = 22.0, average = 17.3
min =  8.0, max = 20.0, average = 15.7
min =  8.0, max = 18.0, average = 14.3


In [29]:
# Lazy concatenation
from itertools import chain
temperatures = chain(sunday, monday, tuesday)

all(t > 0 for t in temperatures)

True

In [32]:
for x in (p for p in lucas() if is_prime(p)):
    print(x)

2
3
7
11
29
47
199
521
2207
3571
9349
3010349
54018521
370248451
6643838879
119218851371
5600748293801


KeyboardInterrupt: 

(3) `sum()`, `min()`, `max()`, `enumerate()`
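
A brief sketch of these four built-ins, reusing the `sunday` temperatures from the `zip()` examples above (repeated here so the snippet runs on its own):

```python
sunday = [12, 14, 15, 15, 17, 21, 22, 22, 23, 22, 20, 18]

print(sum(sunday))  # 221
print(min(sunday))  # 12
print(max(sunday))  # 23

# enumerate() pairs each item with its index, yielding (index, item) tuples.
for index, temp in enumerate(sunday[:3]):
    print(index, temp)
# 0 12
# 1 14
# 2 15
```

Like `any()` and `all()`, these functions accept any iterable, including generator expressions.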