# CH 14

## TOC<a id='toc'></a>
* [Ch14 Notes](#ch14_notes)

### CH14 Notes <a id='ch14_notes'></a>
[toc](#toc)
### Iterables, Iterators and Generators

* The **yield** keyword was added in Python 2.2. It allows the construction of generators, which work as iterators.
* *Every generator is an iterator*: generators fully implement the iterator interface. But an iterator (as defined in the GoF book) retrieves items from a collection, while a generator can produce items "out of thin air".
    - However, the Python community treats iterator and generator as synonyms most of the time.
* Every collection in python is **iterable**, and iterators are used internally to support:
    - for loops
    - collection types constructions and extensions
    - looping over text files line by line
    - list, dict, and set comprehensions
    - tuple unpacking
    - unpacking actual parameters with `*` in function calls

## Sentence Take #1: A Sequence of Words
* When the interpreter needs to iterate over an object x, it automatically calls `iter(x)`, which:
    1. checks whether the object implements `__iter__`, and calls that to obtain an iterator
    2. if `__iter__` is not implemented but `__getitem__` is, Python creates an iterator that attempts to fetch items in order, starting from index 0 (this is why all sequences are iterable)
    3. if that fails, Python raises `TypeError`
* In the goose-typing sense, an object is considered iterable if it implements the `__iter__` method.
    - No subclassing or registration is required, because `abc.Iterable` implements `__subclasshook__`.
* But because of the `__getitem__` fallback, as of Python 3.4 the most accurate way to check whether an object is iterable is to call `iter` on it and handle the `TypeError` exception.
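
The difference between the two checks can be sketched as follows. `Spam` and `is_iterable` are hypothetical names made up for this illustration; `Spam` deliberately implements only `__getitem__`.

```python
from collections.abc import Iterable


class Spam:
    """Hypothetical class: no __iter__, only __getitem__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of the sequence
        return index * 10


def is_iterable(obj):
    """Return True if iter(obj) succeeds (covers the __getitem__ fallback)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


spam = Spam()
print(isinstance(spam, Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(spam))           # True: iter() falls back to __getitem__
print(list(spam))                  # [0, 10, 20]
print(is_iterable(42))             # False: iter(42) raises TypeError
```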

* **iterable**: an object from which the `iter` function can obtain an iterator.
    - objects implementing an `__iter__` method returning an *iterator* are iterable
* **iterator**:
    - from the Python docs: "An object representing a stream of data." You can call `next` on it; it raises `StopIteration` when exhausted (and on every subsequent call to `next`).
    - it is required to implement an `__iter__` method that returns self, so that it can also be used as an iterable.

* The standard interface for an iterator has two methods:
   * `__next__`: returns the next available item, raising `StopIteration` when there are no more items
   * `__iter__`: returns self; this allows iterators to be used where an iterable is expected, for example in a for loop
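
To make the two-method interface concrete, here is a rough sketch of a for loop emulated by hand with `iter`, `next` and `StopIteration`. The variable names are only illustrative.

```python
s = 'ABC'
it = iter(s)          # obtain an iterator from the iterable
collected = []

while True:
    try:
        collected.append(next(it))   # calls it.__next__()
    except StopIteration:            # raised once the iterator is exhausted
        break

print(collected)      # ['A', 'B', 'C']
print(iter(it) is it) # True: an iterator's __iter__ returns itself
```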

```
from abc import abstractmethod
from collections.abc import Iterable


class Iterator(Iterable):
    __slots__ = ()
    
    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted raise StopIteration'
        raise StopIteration
    
    def __iter__(self):
        return self
        
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            if (any("__next__" in B.__dict__ for B in C.__mro__) and any("__iter__" in B.__dict__ for B in C.__mro__)):
                return True
        return NotImplemented
```

From the Python source code: "Iterators in Python aren't a matter of type but of protocol. ... Don't check the type! Use hasattr to check for both "`__iter__`" and "`__next__`" attributes instead."

There is no way to "reset" an iterator. If you need to start over, you need to call `iter(...)` on the iterable that built it.
    - this won't work if you call `iter(...)` on the iterator itself, because that just returns self
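
A small sketch of the "no reset" behaviour, using a plain list as the iterable (the names are illustrative only):

```python
words = ['alpha', 'beta', 'gamma']

it = iter(words)
first_pass = list(it)     # consumes the iterator
second_pass = list(it)    # the same iterator is exhausted, nothing left

same_object = iter(it) is it     # iter() on an iterator returns self, no reset
fresh_pass = list(iter(words))   # a new iterator from the iterable starts over

print(first_pass)    # ['alpha', 'beta', 'gamma']
print(second_pass)   # []
print(same_object)   # True
print(fresh_pass)    # ['alpha', 'beta', 'gamma']
```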

## Sentence Take #2: A Classic Iterator
* built according to the classic Iterator design pattern from the GoF book (not idiomatic Python)
* This Sentence is *iterable* because it implements the `__iter__` special method, which returns a `SentenceIterator`
* It uses two classes: `Sentence`, which is built from some text, and `SentenceIterator`, which is built from a list of words (and is returned by `Sentence.__iter__`) and holds an index that is advanced on every call to `next(...)`


* all iterators are iterables, but not all iterables are iterators
* it is tempting to implement `__next__` in addition to `__iter__` in `Sentence`, but that is a terrible idea. It is a common *anti-pattern* according to Alex Martelli
    - you need to be able to support multiple traversals
    - that is, you must be able to support multiple independent iterators, from same iterable instance; each with its own internal state.
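
The notes describe the two classes but do not show them, so here is a minimal sketch of what such a pair could look like. The regex `\w+` for `RE_WORD` is an assumption, and the details are not necessarily the book's exact code. It also shows why the iterator state lives in a separate object: two traversals of the same `Sentence` do not interfere.

```python
import re

RE_WORD = re.compile(r'\w+')  # assumed pattern: runs of word characters


class Sentence:
    """Iterable: builds a new iterator on every __iter__ call."""

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return SentenceIterator(self.words)


class SentenceIterator:
    """Iterator: holds the traversal state (the current index)."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self


s = Sentence('the quick brown fox')
it1, it2 = iter(s), iter(s)        # two independent iterators
print(next(it1), next(it1))        # the quick
print(next(it2))                   # the (it2 has its own index)
```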

## Sentence Take #3: A Generator Function

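```
import re

RE_WORD = re.compile(r'\w+')  # assumed pattern: runs of word characters


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        # generator function: each `yield` hands one word to the caller
        for word in self.words:
            yield word
        return
```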

* The last `return` is not needed; the function can just fall through.
* A generator function does not need to raise `StopIteration`; it simply exits.

### How a Generator Function Works

* A **generator function** is a Python function that has the keyword *yield* in its body. When called, such a function returns a **generator object**.
    - so a generator function is a generator factory
    - generators are iterators that produce the value passed to *yield*

In [1]:
def gen_123():
    yield 1
    yield 2
    yield 3

In [4]:
gen_obj = gen_123()

In [6]:
next(gen_obj), next(gen_obj), next(gen_obj)

(1, 2, 3)

In [7]:
next(gen_obj)

StopIteration: 

In [8]:
def gen_123():
    yield 1
    yield 2
    return 4
    yield 3

In [9]:
gen_obj = gen_123()

In [12]:
next(gen_obj)

StopIteration: 4

A *generator function* builds a *generator object* that wraps the body of the function. When we invoke `next()` on the generator object, execution advances to the next *yield* statement in the function, and the next() call evaluates to the value yielded when the body of the function is suspended. Finally, when the function body *returns*, the enclosing generator object raises `StopIteration` in accordance with the Iterator protocol.
    - the function can return a value; it is attached to the `StopIteration` exception (mostly useful when generators are used as coroutines)
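
As a sketch of where the returned value ends up: it is available as the `value` attribute of the `StopIteration` exception, and it also becomes the result of a `yield from` expression. The function names below are made up for the illustration.

```python
def gen_with_return():
    yield 1
    yield 2
    return 'done'          # becomes StopIteration.value


# Driving the generator by hand exposes the return value.
g = gen_with_return()
values = []
while True:
    try:
        values.append(next(g))
    except StopIteration as exc:
        result = exc.value
        break

print(values, result)       # [1, 2] done


def wrapper():
    # `yield from` re-yields 1 and 2, then evaluates to the return value
    inner_result = yield from gen_with_return()
    yield inner_result


print(list(wrapper()))      # [1, 2, 'done']
```

Note that a plain for loop (or `list(...)`) silently discards the return value.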

* Improvements
    1. Iterators are meant to be lazy: instead of building a list of words up front, loop over `re.finditer()` and yield each match as it is found.
    2. That loop can in turn be replaced by a generator expression over `finditer`.

### generator expression
* instead of
```
def __iter__(self):
    for match in RE_WORD.finditer(self.text):
        yield match.group()
```

just do

```
def __iter__(self):
    return ( match.group() for match in RE_WORD.finditer(self.text))
```

The first version is a generator function. The second is a regular function that returns a generator object (so while not a generator function, it is still a generator factory). In both cases, calling `iter` on the object returns a generator.

* Generator expressions are the lazy version of list comprehensions. They save memory, and often save processing too, because the consumer may stop iterating early.

* Syntax tip: when calling a function whose single argument is a generator expression, you can drop the generator expression's own parentheses and rely on the function call's parentheses.

* Generators are not only useful for traversal; they can also be used to produce objects on the fly.
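
The three points above can be illustrated with a short sketch. `numbers`, `produced` and `arithmetic_progression` are hypothetical names; the progression generator is a simplified illustration, not the book's implementation.

```python
produced = []   # records which items the source generator actually produced


def numbers(n):
    for i in range(n):
        produced.append(i)
        yield i


squares_list = [x * x for x in numbers(3)]   # eager: everything is produced now
print(produced)                              # [0, 1, 2]

produced.clear()
squares_gen = (x * x for x in numbers(3))    # lazy: nothing produced yet
print(produced)                              # []
first = next(squares_gen)                    # pulls exactly one item
print(first, produced)                       # 0 [0]

# Single-argument call: no extra parentheses around the generator expression.
total = sum(x * x for x in range(4))
print(total)                                 # 14


def arithmetic_progression(begin, step, end=None):
    """Produce values on the fly; unbounded when end is None."""
    value = begin
    while end is None or value < end:
        yield value
        value += step


print(list(arithmetic_progression(0, 0.5, 2)))   # [0, 0.5, 1.0, 1.5]
```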

### Ready to use generators
* itertools has 19 generator functions that can be combined in a variety of interesting ways.
    - `itertools.count` returns a generator that produces numbers and never stops (you can provide start and step)
    - `itertools.takewhile` returns a generator that consumes another generator and stops when a given predicate evaluates to False
* other
    - `os.walk` yields filenames while traversing a directory tree - so can do recursive file system search with simple for loop
    - filtering:
        * itertools.compress(it, selector_it) - yields the items of `it` whose corresponding item in `selector_it` is truthy
        * itertools.dropwhile(predicate, it) - drops items while the predicate is truthy; once it hits a falsy result, it stops dropping and doesn't check again
        * (builtin) filter(predicate, it) - only lets an item pass if the predicate is true for it; if predicate is None, it is basically `bool(item)`
        * itertools.islice(it, stop) or (it, start, stop, step=1)
        * itertools.takewhile(predicate, it)
    - mapping generators
        * itertools.accumulate(it, [func])
            - like reduce but returns rolling value; if no func it sums
        * (builtin) enumerate(iterable, start=0)
        * (builtin) map(func, it1, [it2, ..., itN]) - if N iterables are given, `func` must take N arguments and the iterables are consumed in parallel
        * itertools.starmap(func, it) - `it` should yield iterable items `iit`; `func(*iit)` is applied to each
    - merging generators
        - itertools.chain(it1, ..., itN) - yields all items from each iterable, in order
        - itertools.chain.from_iterable(it) - `it` should be an iterable of iterables
        - itertools.product(it1, ..., itN, repeat=1) - cartesian product (like nested for loops)
            * `repeat=k` is basically like repeating the input iterables k times
        - (builtin) zip(it1, ..., itN) - stops once the shortest one is exhausted
        - itertools.zip_longest(it1, ..., itN, fillvalue=None) - stops when the longest one is exhausted, filling gaps with `fillvalue`
    - expanding generators
        - itertools.combinations(it, out_len)
        - itertools.combinations_with_replacement(it, out_len)
        - itertools.count(start=0, step=1) - never ends
        - itertools.cycle(it) - just repeats in order, over and over
        - itertools.permutations(it, out_len=None) - default length is the length of the input
        - itertools.repeat(item, [times]) - if `times` is not given, repeats indefinitely
    - grouping iterators
        - itertools.groupby(it, key=None) - requires input sorted by key, or at the very least clustered
        - itertools.tee(it, n=2) - returns a tuple of n independent generators, each yielding the items of the input independently
        - (builtin) reversed(seq) - takes a sequence, returns an iterator over it in reverse order
    - Iterator reducing functions (not iterator producing)
        - functools.reduce(func, it, [initial])
            * builtin versions: any(), all(), max(), min(), sum()
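
A few of the functions listed above, sketched on small inputs. The `vowel` predicate and the sample data are made up for the illustration.

```python
import itertools


def vowel(c):
    """Predicate used by the filtering examples."""
    return c.lower() in 'aeiou'


word = 'Aardvark'

# Filtering
kept = list(filter(vowel, word))                    # ['A', 'a', 'a']
dropped = list(itertools.dropwhile(vowel, word))    # ['r', 'd', 'v', 'a', 'r', 'k']
taken = list(itertools.takewhile(vowel, word))      # ['A', 'a']
picked = list(itertools.compress(word, (1, 0, 1, 1, 0, 1)))  # ['A', 'r', 'd', 'a']
sliced = list(itertools.islice(word, 4, 7))         # ['v', 'a', 'r']

# Merging
flat = list(itertools.chain.from_iterable(enumerate('ABC')))
# [0, 'A', 1, 'B', 2, 'C']
padded = list(itertools.zip_longest('ABC', range(2), fillvalue='?'))
# [('A', 0), ('B', 1), ('C', '?')]

# Grouping: groupby only merges adjacent items, so sort by the key first
animals = sorted(['duck', 'eagle', 'rat', 'bear', 'bat', 'shark'], key=len)
by_length = {length: list(group)
             for length, group in itertools.groupby(animals, key=len)}
# {3: ['rat', 'bat'], 4: ['duck', 'bear'], 5: ['eagle', 'shark']}

# tee: two independent iterators over the same input
first, second = itertools.tee('AB')
print(list(first), list(second))                    # ['A', 'B'] ['A', 'B']
```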

In [15]:
list(itertools.product('AB', 'CD', repeat=2))

[('A', 'C', 'A', 'C'),
 ('A', 'C', 'A', 'D'),
 ('A', 'C', 'B', 'C'),
 ('A', 'C', 'B', 'D'),
 ('A', 'D', 'A', 'C'),
 ('A', 'D', 'A', 'D'),
 ('A', 'D', 'B', 'C'),
 ('A', 'D', 'B', 'D'),
 ('B', 'C', 'A', 'C'),
 ('B', 'C', 'A', 'D'),
 ('B', 'C', 'B', 'C'),
 ('B', 'C', 'B', 'D'),
 ('B', 'D', 'A', 'C'),
 ('B', 'D', 'A', 'D'),
 ('B', 'D', 'B', 'C'),
 ('B', 'D', 'B', 'D')]

In [16]:
list(itertools.product('AB', 'CD', 'AB', 'CD'))

[('A', 'C', 'A', 'C'),
 ('A', 'C', 'A', 'D'),
 ('A', 'C', 'B', 'C'),
 ('A', 'C', 'B', 'D'),
 ('A', 'D', 'A', 'C'),
 ('A', 'D', 'A', 'D'),
 ('A', 'D', 'B', 'C'),
 ('A', 'D', 'B', 'D'),
 ('B', 'C', 'A', 'C'),
 ('B', 'C', 'A', 'D'),
 ('B', 'C', 'B', 'C'),
 ('B', 'C', 'B', 'D'),
 ('B', 'D', 'A', 'C'),
 ('B', 'D', 'A', 'D'),
 ('B', 'D', 'B', 'C'),
 ('B', 'D', 'B', 'D')]

Cool examples:

In [2]:
import itertools

cumsum, running max, min

In [7]:
nums = [1,2,0,2, 15,-1,0,7]

In [11]:
list(itertools.accumulate(nums)), list(itertools.accumulate(nums, min)), list(itertools.accumulate(nums, max))

([1, 3, 3, 5, 20, 19, 19, 26],
 [1, 1, 0, 0, 0, -1, -1, -1],
 [1, 2, 2, 2, 15, 15, 15, 15])

running average

In [6]:
list(itertools.starmap(lambda a,b: b/a, enumerate(itertools.accumulate([1,3,5,9,10,13]),start=1)))

[1.0, 2.0, 3.0, 4.5, 5.6, 6.833333333333333]

combinations, cycles, permutations

In [19]:
list(itertools.combinations('ABCDE', r=3))

[('A', 'B', 'C'),
 ('A', 'B', 'D'),
 ('A', 'B', 'E'),
 ('A', 'C', 'D'),
 ('A', 'C', 'E'),
 ('A', 'D', 'E'),
 ('B', 'C', 'D'),
 ('B', 'C', 'E'),
 ('B', 'D', 'E'),
 ('C', 'D', 'E')]

In [21]:
list(itertools.permutations('ABCDE', r=3))

[('A', 'B', 'C'),
 ('A', 'B', 'D'),
 ('A', 'B', 'E'),
 ('A', 'C', 'B'),
 ('A', 'C', 'D'),
 ('A', 'C', 'E'),
 ('A', 'D', 'B'),
 ('A', 'D', 'C'),
 ('A', 'D', 'E'),
 ('A', 'E', 'B'),
 ('A', 'E', 'C'),
 ('A', 'E', 'D'),
 ('B', 'A', 'C'),
 ('B', 'A', 'D'),
 ('B', 'A', 'E'),
 ('B', 'C', 'A'),
 ('B', 'C', 'D'),
 ('B', 'C', 'E'),
 ('B', 'D', 'A'),
 ('B', 'D', 'C'),
 ('B', 'D', 'E'),
 ('B', 'E', 'A'),
 ('B', 'E', 'C'),
 ('B', 'E', 'D'),
 ('C', 'A', 'B'),
 ('C', 'A', 'D'),
 ('C', 'A', 'E'),
 ('C', 'B', 'A'),
 ('C', 'B', 'D'),
 ('C', 'B', 'E'),
 ('C', 'D', 'A'),
 ('C', 'D', 'B'),
 ('C', 'D', 'E'),
 ('C', 'E', 'A'),
 ('C', 'E', 'B'),
 ('C', 'E', 'D'),
 ('D', 'A', 'B'),
 ('D', 'A', 'C'),
 ('D', 'A', 'E'),
 ('D', 'B', 'A'),
 ('D', 'B', 'C'),
 ('D', 'B', 'E'),
 ('D', 'C', 'A'),
 ('D', 'C', 'B'),
 ('D', 'C', 'E'),
 ('D', 'E', 'A'),
 ('D', 'E', 'B'),
 ('D', 'E', 'C'),
 ('E', 'A', 'B'),
 ('E', 'A', 'C'),
 ('E', 'A', 'D'),
 ('E', 'B', 'A'),
 ('E', 'B', 'C'),
 ('E', 'B', 'D'),
 ('E', 'C', 'A'),
 ('E', 'C'

### Yield from

* Python 3.3 added new syntax: `yield from`
```
def chain(*iterables):
    for i in iterables:
        yield from i
```

In [68]:
def chain(*iterables):
    for i in iterables:
        for j in i:
            yield j

In [69]:
list(chain((1,2), (3,4)))

[1, 2, 3, 4]

In [25]:
temp = iter((1,2))

In [70]:
def chain(*iterables):
    for i in iterables:
        yield from i

In [71]:
list(chain((1,2), (3,4)))

[1, 2, 3, 4]

Interesting problem:
```
def f():
    def do_yield(n):
        yield n
    x=0
    while True:
        x += 1
        do_yield(x)
```

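This will not actually yield increasing values of x, because the `yield` keyword only makes the immediately enclosing function a generator, so generator behavior cannot be delegated to a plain function call. Here `f` contains no `yield` of its own, so it is an ordinary function, and each `do_yield(x)` call just builds a generator object that is discarded.

This can be "fixed" by writing `yield from do_yield(x)`, but that still requires a yield expression in the outer function, so there is no real delegation.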

In [2]:
def f():
    def do_yield(n):
        yield n
    x = 0
    while x < 10:
        x += 1
        do_yield(x)

In [4]:
g = f()

In [5]:
next(g)

TypeError: 'NoneType' object is not an iterator

In [6]:
def f():
    x = 0
    while x < 10:
        x += 1
        yield x

In [19]:
g = f()

In [21]:
next(g)

2
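
For comparison, a sketch of the `yield from` variant mentioned earlier in this section. The loop is bounded here (an assumption, so the example terminates), and the names follow the cells above.

```python
def f():
    def do_yield(n):
        yield n

    x = 0
    while x < 3:
        x += 1
        # `yield from` makes f itself a generator and re-yields
        # whatever the inner generator produces
        yield from do_yield(x)


print(list(f()))   # [1, 2, 3]
```

The outer function still needs its own yield expression; the inner function alone cannot turn `f` into a generator.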

## Closer look at iter

* `iter(x)` returns an iterator; it is called by the interpreter
* you can also call `iter()` with two arguments
    - arg 1 is a callable to be invoked repeatedly, with no args; the value *returned* by the callable is yielded by the iterator
    - arg 2 is a sentinel value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel
* ex: if `d6()` rolls a 6-sided die, then `iter(d6, 1)` will keep rolling until a 1 comes up (the 1 itself is not yielded).
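
A minimal sketch of the dice example, assuming `d6` is implemented with `random.randint` (the notes do not show its body):

```python
import random


def d6():
    """Roll a 6-sided die (assumed implementation)."""
    return random.randint(1, 6)


d6_iter = iter(d6, 1)     # callable_iterator: calls d6() until it returns 1
rolls = list(d6_iter)     # every roll before the first 1; may be empty

print(rolls)              # varies from run to run, never contains 1
print(1 in rolls)         # False: the sentinel is not yielded
```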

An actual useful example:
```
with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        process_line(line)
```

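With the `''` sentinel, this keeps processing lines until the end of the file is reached, because `readline` returns `''` only at EOF. A blank line is returned as `'\n'`, so to stop at the first blank line the sentinel would have to be `'\n'`.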
- <font color='red'> Note: `for line in fp.readlines()` is not lazy, since `readlines()` reads the whole file into a list first. Iterating over the file object directly (`for line in fp`) is lazy and runs until EOF. </font>
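
The effect of the sentinel can be checked without a real file. In this sketch `io.StringIO` stands in for the open file, and the sample text is made up.

```python
import io

text = 'first\nsecond\n\nafter blank\n'

# Sentinel '': readline() returns '' only at EOF, so every line is seen.
fp = io.StringIO(text)
until_eof = list(iter(fp.readline, ''))
print(until_eof)     # ['first\n', 'second\n', '\n', 'after blank\n']

# Sentinel '\n': stops at the first blank line.
fp = io.StringIO(text)
until_blank = list(iter(fp.readline, '\n'))
print(until_blank)   # ['first\n', 'second\n']

# Iterating over the file object itself is lazy as well and runs to EOF.
fp = io.StringIO(text)
direct = [line for line in fp]
print(direct == until_eof)   # True
```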

## Closing
* The book author's claim is that `def` should not be used for generator functions, since generators and ordinary functions are different things altogether
* moreover, while `yield` is also used for coroutines, generators and coroutines are two different concepts, so there is a need for separate syntax there as well.

### Semantics Of Generator vs Iterator
* generator objects implement both `__iter__` and `__next__` - so they are iterators from this perspective
* however, from the conceptual viewpoint, an iterator, as a design pattern in GoF, produces items by traversing a collection. Generators, on the other hand, can create items on the fly.
* In reality, Python programmers are not strict about what they call a generator and what they call an iterator, even in the docs
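
The interface side of this distinction can be checked directly. This sketch reuses the `gen_123` example from earlier in the notes, redefined here so the block is self-contained.

```python
import types
from collections.abc import Iterable, Iterator


def gen_123():
    yield 1
    yield 2
    yield 3


g = gen_123()
list_it = iter([1, 2, 3])

print(isinstance(g, Iterator))                   # True: has __iter__ and __next__
print(isinstance(g, types.GeneratorType))        # True
print(isinstance(list_it, Iterator))             # True
print(isinstance(list_it, types.GeneratorType))  # False: an iterator, not a generator
print(isinstance([1, 2, 3], Iterator))           # False: a list is only iterable
print(isinstance([1, 2, 3], Iterable))           # True
```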

