[Iterators, Generators, module `itertools`]

<h1 style="background:#DDEEEE;padding: 15px;">Iterators</h1>

https://dbader.org/blog/python-iterators

## `iter()`

#### Syntax 1: Use the `iter(<iterable>)` function

In [19]:
iterator = iter([1, 2, 3])
while True:
    try:
        val = next(iterator)
    except StopIteration:
        break
    print(val)

1
2
3


In [20]:
iterator = iter([i for i in range(3)])     # with a list
while True:
    try:
        val = next(iterator)
    except StopIteration:
        break
    print(val)

0
1
2


In [21]:
iterator = iter((i for i in range(3)))     # with a generator expression
while True:
    try:
        val = next(iterator)
    except StopIteration:
        break
    print(val)

0
1
2


#### Syntax 2: Use a generator expression with parentheses `()`

In [22]:
# SAME syntax as list comprehensions but with parentheses -> iterator
iterator = (i for i in range(5))
while True:
    try:
        val = next(iterator)
    except StopIteration:
        break
    print(val)

0
1
2
3
4
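
The `while True` / `next()` / `StopIteration` pattern used in the cells above is what a plain `for` loop does implicitly. A minimal sketch to compare the two (the helper names `manual_loop` and `for_loop` are made up for this illustration):

```python
def manual_loop(iterable):
    """Collect items using the explicit iterator protocol."""
    collected = []
    iterator = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # fetch the next item
        except StopIteration:          # raised once the iterator is exhausted
            break
        collected.append(item)
    return collected


def for_loop(iterable):
    """Collect items with a for loop, which performs the same steps implicitly."""
    collected = []
    for item in iterable:
        collected.append(item)
    return collected


print(manual_loop(i for i in range(5)))
print(for_loop(i for i in range(5)))
```

Both calls print the same list, so the explicit form is mostly useful for seeing how iteration works.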


#### Read big data files

In [23]:
def count_capitals_in_file(filename):
    ''' Counts the number of capital letters in a file.'''
    n_capitals = 0
    iterator = iter(open(filename))
    
    while True:
        try:
            line = next(iterator)
        except StopIteration:
            break
            
        for char in line:
            if char.isupper():
                n_capitals += 1
    
    return n_capitals

count_capitals_in_file('test_file.txt')

39
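
A possible variant of the function above, sketched under two assumptions: the file is UTF-8 text, and it is acceptable to rename the function (`count_capitals` is a hypothetical name). It still reads one line at a time, but the `with` block closes the file when done and the file object is iterated directly:

```python
def count_capitals(filename):
    """Count capital letters in a file, reading one line at a time."""
    # Assumption: the file is UTF-8 encoded text.
    with open(filename, encoding="utf-8") as handle:   # closed automatically
        # A file object is its own iterator over lines, so no iter()/next() is needed.
        return sum(char.isupper() for line in handle for char in line)
```

`char.isupper()` returns a bool, and `sum()` counts each `True` as 1.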

#### Profiling runtime example 1

In [2]:
def grouper(inputs, n):
    '''
    Split inputs into groups of length n.
    If the length of inputs is not divisible by n, the last group is shorter.
    
    >>> grouper([1, 2, 3, 4, 5, 6], 2)
    [(1, 2), (3, 4), (5, 6)]
    
    >>> grouper([1, 2, 3, 4, 5, 6], 3)
    [(1, 2, 3), (4, 5, 6)]
    '''
    groups = []
    iterator = iter(inputs)
    c = True
    
    while c is True:
        group = list()
        
        for i in range(n):
            try:
                val = next(iterator)
            except StopIteration:
                c = False
                break    
            group.append(val)
            
        if len(group) > 0:
            groups.append(tuple(group))
    
    return groups

print(grouper([1, 2, 3, 4, 5, 6], 2))
print(grouper([1, 2, 3, 4, 5, 6], 3))
print(grouper([1, 2, 3, 4, 5], 3))

[(1, 2), (3, 4), (5, 6)]
[(1, 2, 3), (4, 5, 6)]
[(1, 2, 3), (4, 5)]
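
As a point of comparison, the same grouping can be sketched with `itertools.islice`, which takes up to `n` items from the iterator on each pass. This is an alternative illustration, not the version timed below; `grouper_islice` is a hypothetical name.

```python
from itertools import islice


def grouper_islice(inputs, n):
    """Split inputs into tuples of at most n items (the last one may be shorter)."""
    iterator = iter(inputs)
    groups = []
    while True:
        group = tuple(islice(iterator, n))   # take up to n items from the iterator
        if not group:                        # empty tuple: the input is exhausted
            break
        groups.append(group)
    return groups


print(grouper_islice([1, 2, 3, 4, 5, 6], 2))
print(grouper_islice([1, 2, 3, 4, 5], 3))
```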


#### Profiling runtime example 2

In [5]:
import numpy as np
from time import perf_counter

lst = np.random.randn(5000)
tic = perf_counter()
grouper(lst, 2)
toc = perf_counter()
print(f'Timer for 5000  : {toc - tic:0.4f} seconds.')

lst = np.random.randn(50000)
tic = perf_counter()
grouper(lst, 2)
toc = perf_counter()
print(f'Timer for 50000 : {toc - tic:0.4f} seconds.')

lst = np.random.randn(100000)
tic = perf_counter()
grouper(lst, 2)
toc = perf_counter()
print(f'Timer for 100000: {toc - tic:0.4f} seconds.')

Timer for 5000  : 0.0038 seconds.
Timer for 50000 : 0.0338 seconds.
Timer for 100000: 0.0606 seconds.


#### Profiling space usage

In [58]:
import sys
lst = [i for i in range(1, 10010000)]
sys.getsizeof(lst)

81528064

In [56]:
# -- WHAT IT DOES:
iterator = iter([i for i in range(1, 10010000)])
next(iterator)  # then calls it over and over

1

<h1 style="background:#DDEEEE;padding: 15px;">Generators</h1>

https://realpython.com/introduction-to-python-generators/ (tutorial)   
https://realpython.com/courses/python-generators/ (video)

## Generator expressions (like comprehensions)

In [85]:
# Same syntax as list comprehensions but with parentheses 
# -> returns an iterator

csv_gen = (row for row in open('test_file.txt', 'r'))

row_count = 0
for row in csv_gen:
    row_count += 1
    
print(f'Row count is {row_count}.')

Row count is 8.


In [83]:
# Same with a function:
# Open a file, loop through each line and yield each row instead of returning it

def csv_reader(filename):
    for row in open(filename, 'r'):
        yield row

row_count = 0
for row in csv_reader('test_file.txt'):
    row_count += 1
    
print(f'Row count is {row_count}.')

Row count is 8.


## Generator functions

- Same as any function, except that it uses `yield` instead of `return`
  - with `return`, you exit the function and its local state is lost
  - with `yield`, the function pauses and remembers its state
    - each call to `next()` resumes execution right after the last `yield` and runs until the next `yield`
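
A small sketch of what "remembering the state" means in practice: the local variable `total` keeps its value between calls to `next()`. The generator `running_total` and its amounts are made up for this illustration.

```python
def running_total():
    """Yield a running total; 'total' survives between next() calls."""
    total = 0
    for amount in (10, 20, 30):
        total += amount
        yield total        # pause here; local variables are kept


gen = running_total()
print(next(gen))   # 10
print(next(gen))   # 30
print(next(gen))   # 60
```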

#### Example 1: Sequences

In [78]:
def countdown(start):
    while start > 0:
        yield start
        start -= 1
    yield "BLASTOFF!"
    
for count in countdown(3):
    print(count)

3
2
1
BLASTOFF!


In [74]:
def evens_up_to(ceiling):
    num = 2
    while num <= ceiling:
        yield num
        num += 2

evens = evens_up_to(8)
print('type: ', type(evens))

for even in evens:
    print(even)

type:  <class 'generator'>
2
4
6
8


#### Example 2: Reading files

In [80]:
def count_capitals_in_file(filename):
    '''
    Count the number of capital letters in a file.

    >>> count_capitals_in_file('test_file.txt')
    39
    '''
    n_capitals = 0
    iterator = iter(open(filename, 'r'))

    while True:
        try:
            line = next(iterator)
        except StopIteration:
            break
            
        for char in line:
            if char.isupper():
                n_capitals += 1

    return n_capitals

**SOURCE:** https://realpython.com/lessons/creating-data-pipelines/

Dictionary comprehension:
```
dict_comprehension = {key:val for key, val in enumerate('sample')}
```

In [9]:
filename = 'generators_csv_file.csv'

# yield each line in the file
lines = (line for line in open(filename))

# iterate through the 'lines' generator from within the 'list_line_values' generator
# = split each line into a list of values
list_line_values = (s.rstrip().split(',') for s in lines)

# first line in the file: list of column names
col_names = next(list_line_values)

# iterate through file lines,
# create dictionaries and unite them with zip():
#  - 'col_names': the list of columns names
#  - 'data'     : the list of values in each line
company_dicts = (dict(zip(col_names, data)) for data in list_line_values)

# iterate through the 'company_dicts' generator,
# take the 'raisedAmt' for any 'company_dict' where 'round' == 'a'
# = get each company's series A funding amount, filter out any other raised amount
funding = (
    int(company_dict['raisedAmt'])
    for company_dict in company_dicts
    if company_dict['round'] == 'a'
)

#-- NOW we begin the iteration process with sum(): iterate through the generators
total_series_a = sum(funding)

print(f'Total series A fundraising: ${total_series_a}')

Total series A fundraising: $18500000


#### Example 3: List all numeric palindromes

In [4]:
def is_palindrome(num):
    # Skip single-digit inputs
    if (num == 0) or (num // 10 == 0):
        return False
    
    temp = num
    reversed_num = 0
    
    while temp != 0:
        # num = 121:
        #  - reversed_num  =  0 -> 1 -> 12 -> 121
        #  - temp          =  121 -> 12 -> 1 -> 0
        #  temp % 10       =>  get last digit in the 'num' sequence
        #  temp // 10      =>  remove last digit in the 'num' sequence
        reversed_num = (reversed_num * 10) + (temp % 10)
        temp = temp // 10
    
    if num == reversed_num:
        return num
    else:
        return False

In [5]:
def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

In [10]:
i, count = 0, 10   # 'count' first numeric palindromes

for num in infinite_sequence():
    pal = is_palindrome(num)
    if pal:
        i += 1
        print(pal, end=' ')
    if i == count:
        break

11 22 33 44 55 66 77 88 99 101 

## Profiling performance

In [11]:
import sys
nums_doubled_lc = [i * 2 for i in range(10000)]    # list comprehension
print(sys.getsizeof(nums_doubled_lc))
nums_doubled_gen = (i * 2 for i in range(10000))   # generator expression
print(sys.getsizeof(nums_doubled_gen))

87632
128


- If the list fits in memory, the list comprehension is usually faster to evaluate; the generator expression uses far less memory

In [17]:
import cProfile
cProfile.run('sum([i * 2 for i in range(10000)])')
cProfile.run('sum((i * 2 for i in range(10000)))')

         5 function calls in 0.001 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <string>:1(<listcomp>)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.001    0.001 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


         10005 function calls in 0.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    10001    0.001    0.000    0.001    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    0.002    0.002 <string>:1(<module>)
        1    0.000    0.000    0.002    0.002 {built-in method builtins.exec}
        1    0.001    0.001    0.002    0.002 {built-in method builtins.sum}
        1    0.000    0.00

## Advanced generator methods

https://realpython.com/lessons/advanced-generator-methods/   

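- `.send(value)`: resume the generator and pass in a value, which becomes the result of the paused `yield` expression
- `.throw()`: raise an exception inside the generator at the point where it is paused
- `.close()`: stop the generator by raising `GeneratorExit` at the point where it is paused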
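
The three methods can be tried together in a small sketch. The generator below (`echo_averager`, a hypothetical example not taken from the linked lesson) receives numbers through `.send()`, resets its state when a `ValueError` is thrown in, and prints a message when it is closed.

```python
def echo_averager():
    """Receive numbers through send() and yield the running average."""
    total, count, average = 0.0, 0, None
    try:
        while True:
            try:
                value = yield average      # send(x) makes this expression evaluate to x
            except ValueError:
                # throw(ValueError) lands here: reset the state and keep going
                total, count, average = 0.0, 0, None
                continue
            total += value
            count += 1
            average = total / count
    finally:
        print("generator closed")          # runs when close() raises GeneratorExit


avg = echo_averager()
next(avg)                  # prime the generator: run up to the first yield
print(avg.send(10))        # 10.0
print(avg.send(20))        # 15.0
avg.throw(ValueError)      # reset; the generator pauses again at yield
print(avg.send(4))         # 4.0
avg.close()                # prints "generator closed"
```

A generator has to be advanced to its first `yield` (here with `next(avg)`) before `.send()` can deliver a value other than `None`.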

<h1 style="background:#DDEEEE;padding: 15px;">yield statement</h1>

https://realpython.com/introduction-to-python-generators/#understanding-the-python-yield-statement

- When you call `next()` on a generator:
  - the code within the generator function is executed up to `yield`
  
  
- When a program sees `yield`:
  - suspend function execution 
  - return yielded value
  - save the state of the function, including:
    - variable bindings, instruction pointer, internal stack, any exception handling
    
    
- So we can resume function execution whenever we want:
  - function execution picks back up right after `yield`
  
  
- Like all iterators, generators can be exhausted -> `StopIteration` exception

In [5]:
def multi_yield():
    txt = "Print string #1"
    yield txt
    txt = "Print string #2"
    yield txt

it = multi_yield()
while True:
    try:
        txt = next(it)
    except StopIteration:
        break
    print(txt)

Print string #1
Print string #2


<h1 style="background:#DDEEEE;padding: 15px;">Module itertools</h1>

https://realpython.com/python-itertools/

## `count(start=0, step=1)`
Like `range()`, but infinite: it has no stop value

In [1]:
def evens():
    '''Generate even integers, starting with 0.'''
    n = 0
    while True:
        yield n
        n += 2

even_numbers = evens()   # separate name, so the generator function is not overwritten
list(next(even_numbers) for _ in range(5))

[0, 2, 4, 6, 8]

In [9]:
from itertools import count   # count(start=0, step=1)

evens = count(step=2)
print(list(next(evens) for _ in range(5)))

odds = count(start=1, step=2)
print(list(next(odds) for _ in range(5)))

quarters = count(start=-1, step=0.25)
print(list(next(quarters) for _ in range(10)))

[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]
[-1, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0, 1.25]


#### Emulate the behavior of the built-in `enumerate()` function:

In [11]:
list(zip([1, 2, 3], ['a', 'b', 'c']))

[(1, 'a'), (2, 'b'), (3, 'c')]

In [10]:
list(zip(count(), ['a', 'b', 'c']))

[(0, 'a'), (1, 'b'), (2, 'c')]
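
To check the equivalence directly, here is a short sketch comparing `zip(count(...), ...)` with `enumerate(..., start=...)`. The starting index 1 is chosen arbitrarily for the illustration.

```python
from itertools import count

letters = ['a', 'b', 'c']

with_count = list(zip(count(1), letters))          # count(1) supplies 1, 2, 3, ...
with_enumerate = list(enumerate(letters, start=1))

print(with_count)
print(with_enumerate)
print(with_count == with_enumerate)
```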

## `cycle(<iterable>)`
Infinitely cycles through an iterable

In [21]:
from itertools import cycle
alternating_ones = cycle([1, -1])
print(next(alternating_ones))
print(next(alternating_ones))
print(next(alternating_ones))
print(next(alternating_ones))

1
-1
1
-1
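
One common use of `cycle()` is pairing a finite sequence with a repeating one, because `zip()` stops at the shortest input. A minimal sketch; the function name, tasks, and workers are invented for the example:

```python
from itertools import cycle


def assign_round_robin(tasks, workers):
    """Pair each task with a worker, cycling through the workers in order."""
    # zip() stops when 'tasks' runs out, so the infinite cycle is safe here.
    return list(zip(tasks, cycle(workers)))


print(assign_round_robin(['t1', 't2', 't3'], ['ann', 'bob']))
```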


## `repeat(elem[, times])`
Repeats `elem` `times` times, or endlessly if `times` is omitted

In [72]:
from itertools import repeat

all_twos = repeat(2)
print(next(all_twos))

all_threes = list(repeat(3, times=5))
print(all_threes)

2
[3, 3, 3, 3, 3]


In [29]:
# List the squares of the integers from 0 to 4
print(list(map(pow, range(5), repeat(2))))

# same as:
print(list(map(pow, [0, 1, 2, 3, 4], [2, 2, 2, 2, 2])))

[0, 1, 4, 9, 16]
[0, 1, 4, 9, 16]


## `accumulate(iterable[, func, *, initial=None])`
- Creates an iterator that returns the accumulated results of `func`
- If `func` is given, it must be a function of two arguments (default: addition)
- The elements of `iterable` must be of a type that `func` accepts as arguments

In [48]:
from itertools import accumulate
from operator import mul

lst = [1, 2, 3, 4, 5, 6]
print(list(accumulate(lst, mul)))
print(list(accumulate(lst, lambda x, y: x + y)))

[1, 2, 6, 24, 120, 720]
[1, 3, 6, 10, 15, 21]
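
A few more calls that may help, sketched with made-up input values: the default `func` (addition), a running maximum with the built-in `max`, and the `initial` keyword from the signature above (this assumes Python 3.8 or later, where `initial` was added).

```python
from itertools import accumulate

values = [3, 1, 4, 1, 5]

running_sum = list(accumulate(values))               # default func: addition
running_max = list(accumulate(values, max))          # any two-argument function works
with_initial = list(accumulate(values, initial=100)) # 'initial' is emitted first

print(running_sum)    # [3, 4, 8, 9, 14]
print(running_max)    # [3, 3, 4, 4, 5]
print(with_initial)   # [100, 103, 104, 108, 109, 114]
```

With `initial`, the output has one more element than the input.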


## `islice(iterable, start, stop[, step])`
- Creates an iterator that returns selected elements from `iterable`

In [59]:
from itertools import islice
print(list(islice('ABCDEF', 2)))
print(list(islice('ABCDEF', 2, 5)))
print(list(islice('ABCDEF', 1, None, 2)))

['A', 'B']
['C', 'D', 'E']
['B', 'D', 'F']
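
Unlike slicing with `[start:stop]`, `islice()` also works on iterators that have no length, such as `count()`. A short sketch combining the two:

```python
from itertools import count, islice

# Take the first five even numbers from an infinite counter.
first_evens = list(islice(count(step=2), 5))
print(first_evens)    # [0, 2, 4, 6, 8]

# Skip the first three values, then take every other one, stopping before index 10.
sampled = list(islice(count(), 3, 10, 2))
print(sampled)        # [3, 5, 7, 9]
```

`islice()` does not accept negative values for `start`, `stop`, or `step`.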


## `permutations(iterable, r=None)`
- Finds all permutations (order matters), of `r` elements
- Default `r`: length of `iterable`

In [73]:
from itertools import permutations
lst = ['a', 'b', 'c']

print('Permutations of 3: ')
print(list(permutations(lst)))

print('\nPermutations of 2: ')
print(list(permutations(lst, 2)))

print('\nr > len(iterable): ')
print(list(permutations(lst, 4)))

Permutations of 3: 
[('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]

Permutations of 2: 
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

r > len(iterable): 
[]
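
The number of results can be checked against the formula n! / (n - r)!. The sketch below uses `math.perm`, which assumes Python 3.8 or later and returns 0 when `r` is larger than `n`, matching the empty list above.

```python
from itertools import permutations
from math import perm

letters = ['a', 'b', 'c']

# Compare the number of generated tuples with math.perm(n, r) for r = 0..4.
counts = [len(list(permutations(letters, r))) for r in range(5)]
expected = [perm(len(letters), r) for r in range(5)]

print(counts)
print(counts == expected)
```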


## `combinations(iterable, r)`
- Finds all combinations (order does not matter) of `r` elements
- `r` is required (there is no default)

In [71]:
from itertools import combinations
lst = ['Mo', 'Sa', 'No']
print(list(combinations(lst, r=2)))

[('Mo', 'Sa'), ('Mo', 'No'), ('Sa', 'No')]


## `combinations_with_replacement(iterable, r)`
- Individual elements can be repeated

In [71]:
from itertools import combinations_with_replacement
lst = ['Mo', 'Sa', 'No']
print(list(combinations_with_replacement(lst, r=2)))

[('Mo', 'Mo'), ('Mo', 'Sa'), ('Mo', 'No'), ('Sa', 'Sa'), ('Sa', 'No'), ('No', 'No')]


In [None]:
# Timing from the terminal (GNU time; the shell's built-in `time` has no -f option):
# $ /usr/bin/time -f "Memory used (kB): %M\nUser time (seconds): %U" python3 naive.py