# Generators
## Simple generator example

A generator is a function that returns a generator iterator.

A generator uses ```yield```, which temporarily suspends processing and remembers the execution state (including local variables and pending try-statements). Execution resumes from that point the next time the iterator is passed to ```next```.

In [659]:
def simpleGeneratorFun(): 
    yield 1
    yield 2
    yield 3

In [660]:
g = simpleGeneratorFun()

In [661]:
g

<generator object simpleGeneratorFun at 0x1123d0f48>

In [662]:
next(g)

1

In [663]:
next(g)

2

In [664]:
next(g)

3

In [665]:
next(g)

StopIteration: 
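To make the suspend-and-resume behaviour more concrete, here is a minimal sketch. The ```running_total``` generator is a hypothetical helper, not part of the cells above. Its local variable ```total``` survives between calls to ```next```, which is the state that ```yield``` preserves.

```python
def running_total(values):
    """Yield the cumulative sum after each value (illustrative helper)."""
    total = 0  # local state, kept alive while the generator is suspended
    for v in values:
        total += v
        yield total  # pause here; resume on the next call to next()

g = running_total([1, 2, 3])
first = next(g)    # runs the body up to the first yield
second = next(g)   # resumes after the yield; total is still 1
third = next(g)
print(first, second, third)  # 1 3 6
```

A further ```next(g)``` would raise ```StopIteration```, as in the cell above.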

Generator iterators can be used in a ```for``` loop.

In [666]:
g = simpleGeneratorFun()
for x in g:
    print(x)

1
2
3


But it can be consumed only once. A second loop over the same iterator prints nothing.

In [667]:
for x in g:
    print(x)

Generator iterators can be converted into a ```list```.

In [668]:
g = simpleGeneratorFun()
list(g)

[1, 2, 3]

Once it has been consumed, nothing more can be retrieved.

In [669]:
list(g)

[]

The generator above is equivalent to the class below.

In [670]:
class EquivalentSimpleGenerator:
    def __init__(self):
        self.i = 1
        
    def __next__(self): # support next()
        if self.i <= 3:
            result = self.i
            self.i += 1
            return result
        else:
            raise StopIteration
            
    def __iter__(self):  # support for loops and list()
        return self

In [671]:
g = EquivalentSimpleGenerator()
for x in g:
    print(x)

1
2
3


In [672]:
g = EquivalentSimpleGenerator()
list(g)

[1, 2, 3]

In [673]:
list(g)

[]

In [674]:
g = EquivalentSimpleGenerator()
g

<__main__.EquivalentSimpleGenerator at 0x1123fcda0>

In [675]:
next(g)

1

In [676]:
next(g)

2

In [677]:
next(g)

3

In [678]:
next(g)

StopIteration: 

In [679]:
g = simpleGeneratorFun()
for x in g:
    print(x)

1
2
3


In [680]:
g = simpleGeneratorFun()
list(g)

[1, 2, 3]

A generator function is a shortcut for defining and creating a generator iterator object. Don't confuse calling it with a normal function call: the call returns an iterator and does not run the function body.
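A small sketch of how calling a generator function differs from calling a normal function. The ```noisy``` function and the ```log``` list are hypothetical names used only for illustration. The body does not start running until the first ```next```.

```python
log = []

def noisy():
    log.append("body started")  # executed only once iteration begins
    yield 1

g = noisy()                  # returns a generator iterator; the body has not run
started_on_call = bool(log)  # False: nothing was logged by the call itself
value = next(g)              # now the body runs up to the first yield
print(started_on_call, value, log)  # False 1 ['body started']
```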

## Another example

In [681]:
def generator_generator_function():
    i = 0
    while True:
        if i < 10:
            yield i
            i += 1
        else:
            break

In [682]:
g = generator_generator_function()

In [683]:
for x in g:
    print(x)

0
1
2
3
4
5
6
7
8
9


In [684]:
class EquvalentGenerator:
    def __init__(self):
        self.i = 0
        
    def __next__(self):
        if self.i < 10:
            result = self.i
            self.i += 1
            return result
        else:
            raise StopIteration
                
    def __iter__(self):
        return self
        

In [685]:
g = EquvalentGenerator()
list(g)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [686]:
g = EquvalentGenerator()
for x in g:
    print(x)

0
1
2
3
4
5
6
7
8
9


## Generator comprehension

A list comprehension builds a list, which can be iterated over repeatedly:

In [687]:
iterator = [x for x in range(4)]
list(iterator)

[0, 1, 2, 3]

In [688]:
list(iterator)

[0, 1, 2, 3]

A generator comprehension (generator expression) builds a generator iterator, which can be consumed only once:

In [689]:
iterator = (x for x in range(4))
iterator

<generator object <genexpr> at 0x1123d0390>

In [690]:
list(iterator)

[0, 1, 2, 3]

In [691]:
list(iterator)

[]

## Why use generator iterators?

Generator iterators are lazy: they produce items one at a time, and only when asked.
This makes them much more memory efficient than lists when dealing with large datasets.

But generator iterators can be consumed only once.
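The trade-off can be sketched with ```sys.getsizeof```, which reports the size of the container object itself (not of the items it refers to). The value of ```n``` below is an arbitrary choice for illustration.

```python
import sys

n = 100_000
as_list = [x * x for x in range(n)]   # all items are stored at once
as_gen = (x * x for x in range(n))    # items are computed on demand

list_bytes = sys.getsizeof(as_list)   # grows with n
gen_bytes = sys.getsizeof(as_gen)     # small and independent of n
print(list_bytes > gen_bytes)         # True

total = sum(as_gen)       # consumes the generator in a single pass
leftover = list(as_gen)   # nothing remains afterwards
print(leftover)           # []
```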

# itertools

Most itertools functions return lazy iterators. Like generator iterators, they produce items on demand and can be consumed only once.

## Why use itertools?

* Using itertools can save a lot of hand-written for-loop code
* It forces you to separate the iteration pattern from the computation
* It makes the intent more explicit
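The points above can be sketched by writing the same task twice. The task itself (the squares of the first three multiples of 3) is an arbitrary example chosen for illustration.

```python
from itertools import count, islice

# Hand-written loop: iteration pattern and computation are interleaved.
manual = []
n = 1
while len(manual) < 3:
    if n % 3 == 0:
        manual.append(n * n)
    n += 1

# itertools version: "multiples of 3" and "first three" are separate steps.
multiples_of_3 = count(3, 3)                       # 3, 6, 9, ...
squares = (m * m for m in multiples_of_3)          # the computation
with_tools = list(islice(squares, 3))              # the stopping rule

print(manual, with_tools)  # [9, 36, 81] [9, 36, 81]
```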

In [692]:
from itertools import count, cycle, repeat, accumulate, chain, compress, dropwhile, filterfalse, groupby, islice, starmap, takewhile, tee, zip_longest, product, permutations, combinations, combinations_with_replacement

## Infinite iterators

### count

In [693]:
for i, x in enumerate(count(10)):
    print(x)
    if i==5:
        break

10
11
12
13
14
15


### cycle

In [694]:
for i, x in enumerate(cycle([1,2,3])):
    print(x)
    if i == 5:
        break

1
2
3
1
2
3


### repeat

In [695]:
for i, x in enumerate(repeat(1, 10)):
    print(x)

1
1
1
1
1
1
1
1
1
1


## Functions on iterables

### chain
Links several iterables into one.

In [696]:
for i , x in enumerate(chain([1,2,3], [11,22,33], [44, 55])):
    print(x)

1
2
3
11
22
33
44
55


### chain.from_iterable

In [697]:
for i , x in enumerate(chain.from_iterable([[1,2,3], [11,22,33], [44, 55]])):
    print(x)

1
2
3
11
22
33
44
55


### compress
Selects the items whose corresponding selector is truthy.

In [698]:
list(compress(['A', 'B', 'C'], [0, 1, 1]))

['B', 'C']

### filterfalse

In [699]:
def is_uppercase(s):
    return s.upper() == s

In [700]:
list(filterfalse(is_uppercase, ['A', 'B', 'c', 'C', 'D', 'e']))

['c', 'e']

Compare with ```filter```, which keeps the items for which the predicate is true:

In [701]:
list(filter(is_uppercase, ['A', 'B', 'c', 'C', 'D', 'e']))

['A', 'B', 'C', 'D']

### takewhile
Yields items until the condition first fails, then stops (like a "break").

In [702]:
list(takewhile(is_uppercase, ['A', 'B', 'c', 'C', 'D']))

['A', 'B']

### dropwhile
Drops items while the condition holds. Once the condition first fails, that item and every later item are yielded without any further checking.

In [703]:
list(dropwhile(is_uppercase, ['A', 'B', 'c', 'd', 'C', 'D']))

['c', 'd', 'C', 'D']
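The "check only until the first failure" behaviour can be sketched as a generator. This is a rough pure-Python approximation for illustration, not the actual implementation; ```dropwhile_sketch``` is a hypothetical name, and ```str.isupper``` stands in for the ```is_uppercase``` helper defined earlier.

```python
from itertools import dropwhile

def dropwhile_sketch(predicate, iterable):
    """Rough pure-Python equivalent of itertools.dropwhile (illustrative)."""
    it = iter(iterable)
    for item in it:
        if not predicate(item):
            yield item   # the first item that fails the test is kept
            break
    yield from it        # everything after it passes through unchecked

letters = ['A', 'B', 'c', 'd', 'C', 'D']
result = list(dropwhile_sketch(str.isupper, letters))
print(result)  # ['c', 'd', 'C', 'D']
```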

### groupby

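Note: this is not a global groupby. It only groups consecutive items that share a key, so if you need a global grouping, sort the data by the same key first.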

In [704]:
data = [
    'apple',
    'bed',
    'apart',
    'bird'
]
list(groupby(data, key=lambda s:s[0]))

[('a', <itertools._grouper at 0x1122929e8>),
 ('b', <itertools._grouper at 0x112292160>),
 ('a', <itertools._grouper at 0x112292e48>),
 ('b', <itertools._grouper at 0x112292f98>)]

In [705]:
data = [
    'apple',
    'apart',
    'bird',
    'bed',
    'birth'
]
list(groupby(data, key=lambda s:s[0]))

[('a', <itertools._grouper at 0x112292cf8>),
 ('b', <itertools._grouper at 0x1122927f0>)]
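A minimal sketch of the sort-then-group pattern, using the unsorted data from the first cell. Each group is turned into a list immediately, because the group iterators share the underlying iterator and are no longer usable once ```groupby``` has moved on to the next key. The ```first_letter``` helper is a hypothetical name.

```python
from itertools import groupby

def first_letter(s):
    return s[0]

data = ['apple', 'bed', 'apart', 'bird']

# Sort by the same key first so that equal keys become adjacent.
grouped = {
    key: list(group)  # materialize each group before advancing
    for key, group in groupby(sorted(data, key=first_letter), key=first_letter)
}
print(grouped)  # {'a': ['apple', 'apart'], 'b': ['bed', 'bird']}
```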

### islice

In [706]:
# Skip the first two items and take the rest (start=2, stop=None)
list(islice(['a', 'd', 'e', 'f', 'g'], 2, None))

['e', 'f', 'g']
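Because ```islice``` works on any iterator, it can also cut a finite piece out of an infinite one, which replaces the ```enumerate``` plus ```break``` pattern used in the ```count``` cell. A short sketch:

```python
from itertools import count, islice

first_six = list(islice(count(10), 6))            # stop=6
every_other = list(islice(count(10), 0, 10, 2))   # start=0, stop=10, step=2

print(first_six)    # [10, 11, 12, 13, 14, 15]
print(every_other)  # [10, 12, 14, 16, 18]
```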

### map and starmap

In [707]:
import operator as op

In [708]:
data = [
    (1, 2),
    (3, 4),
    (5, 6)
]

list(starmap(op.mul, data))

[2, 12, 30]

In [709]:
data = [1,2,3]
list(map(lambda x:x**2, data))

[1, 4, 9]

### zip_longest

In [710]:
list(zip_longest([1,2,3], [2,3], fillvalue=0))

[(1, 2), (2, 3), (3, 0)]

Compare with ```zip```, which stops at the shortest input:

In [711]:
list(zip([1,2,3], [2,3]))

[(1, 2), (2, 3)]

### accumulate

In [712]:
list(
    accumulate([1,2,3,4,5], lambda x, y: x + y)
)

[1, 3, 6, 10, 15]

In [713]:
list(
    accumulate([1,2,3,4,5], lambda x, y: x * y)
)

[1, 2, 6, 24, 120]

## tee

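Splits one iterable into several independent iterators, as an optimized alternative to copying it several times. Each returned iterator can still be consumed only once.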

In [714]:
a = [1,2,3,4,5,6,7,8]
b, c = tee(a, 2)

In [715]:
list(b)

[1, 2, 3, 4, 5, 6, 7, 8]

In [716]:
list(b) # b is consumed

[]

In [717]:
list(c)

[1, 2, 3, 4, 5, 6, 7, 8]
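A common use of ```tee``` is walking a one-shot iterator at two offsets at once. The sketch below builds overlapping pairs; ```pairwise_sketch``` is a hypothetical name (recent Python versions ship a similar ```itertools.pairwise```). After calling ```tee```, only the returned iterators should be used, not the original one.

```python
from itertools import tee

def pairwise_sketch(iterable):
    """Yield overlapping pairs (a, b), (b, c), ... using tee (illustrative)."""
    first, second = tee(iterable, 2)
    next(second, None)   # advance the second copy by one item
    return zip(first, second)

numbers = (n for n in [1, 2, 3, 4])   # a one-shot generator
pairs = list(pairwise_sketch(numbers))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```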

## reduce
While not imported from ```itertools```, ```reduce``` is often used together with itertools.

In [718]:
from functools import reduce

In [719]:
reduce(lambda x, y: x + y, [1, 2, 3, 4])

10

Compare with ```accumulate```, which yields every intermediate result:

In [720]:
list(accumulate([1, 2, 3, 4], lambda x, y: x + y))

[1, 3, 6, 10]
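The relationship between the two can be sketched side by side: ```reduce``` returns only the last value that ```accumulate``` would yield. The example also uses the optional initial value of ```reduce```, which makes it safe on an empty iterable.

```python
from functools import reduce
from itertools import accumulate
import operator as op

values = [1, 2, 3, 4]
running = list(accumulate(values, op.add))   # every intermediate sum
final = reduce(op.add, values)               # only the final sum
print(running, final)  # [1, 3, 6, 10] 10

# With an initial value, reduce also works on an empty iterable.
empty_sum = reduce(op.add, [], 0)
print(empty_sum)  # 0
```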

A memory-efficient and concise way to find the oldest person:

In [721]:
class Person:
    def __init__(self, age):
        self.age = age

In [722]:
people = (Person(age) for age in [32, 34, 29, 27, 31, 37, 18, 29])

In [723]:
def get_elder(p1, p2):
    return p1 if p1.age >= p2.age else p2

p_eldest = reduce(get_elder, people)
p_eldest.age

37
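For this particular task, the built-in ```max``` with a ```key``` function is an alternative worth knowing. It also makes a single pass over the generator without building a list. The sketch repeats the ```Person``` class so that it runs on its own.

```python
class Person:
    def __init__(self, age):
        self.age = age

people = (Person(age) for age in [32, 34, 29, 27, 31, 37, 18, 29])

# max() consumes the generator once, keeping only the current best item.
eldest = max(people, key=lambda p: p.age)
print(eldest.age)  # 37
```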

## Pipeline style

In [724]:
from functools import reduce

In [725]:
x = [1, 2, 3, 4]

squared = map(lambda x:x**2, x)

filtered = filter(lambda x:x > 4, squared)

reduce(lambda x, y: x + y, filtered)

25

For a chained pipeline style, use the PyFunctional library:

In [726]:
from functional import seq

(
    seq(1, 2, 3, 4)
    .map(lambda x: x ** 2)
    .filter(lambda x: x > 4)
    .reduce(lambda x, y: x + y)
)

25

See the PyFunctional documentation for more.

## Combinatoric iterators

In [727]:
list(product(['a', 'b', 'c'], [1,2,3]))

[('a', 1),
 ('a', 2),
 ('a', 3),
 ('b', 1),
 ('b', 2),
 ('b', 3),
 ('c', 1),
 ('c', 2),
 ('c', 3)]

In [728]:
list(permutations(['a', 'b', 'c'], 2))

[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]

In [729]:
list(combinations(['a', 'b', 'c'], 2))

[('a', 'b'), ('a', 'c'), ('b', 'c')]

In [730]:
list(combinations_with_replacement(['a', 'b', 'c'], 2))

[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
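The sizes of these results follow the usual counting formulas, which gives a quick sanity check. The sketch below assumes Python 3.8 or later for ```math.comb``` and ```math.perm```.

```python
from itertools import (
    product, permutations, combinations, combinations_with_replacement,
)
from math import comb, perm  # available in Python 3.8+

items = ['a', 'b', 'c']
n, r = len(items), 2

n_product = len(list(product(items, [1, 2, 3])))              # n * 3
n_perm = len(list(permutations(items, r)))                    # n! / (n - r)!
n_comb = len(list(combinations(items, r)))                    # n choose r
n_cwr = len(list(combinations_with_replacement(items, r)))    # (n + r - 1) choose r

print(n_product, n_perm, n_comb, n_cwr)  # 9 6 3 6
```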