# Chapter 14: Iterables, Iterators, and Generators

This chapter is about the [Iterator pattern](https://en.wikipedia.org/wiki/Iterator_pattern), which among other things lets us *lazily* fetch items one at a time when scanning datasets that don't fit in memory. The Iterator pattern is built into Python.

The `yield` keyword, introduced in Python 2.2, allows the construction of generators, which work as iterators.

> Every generator is an iterator: generators fully implement the iterator interface. But an iterator [...] retrieves items from a collection, while a generator can produce items "out of thin air". [...] be aware that the Python community treats *iterator* and *generator* as synonyms most of the time.

Every collection in Python is *iterable*, and iterators are used internally to support:

* `for` loops
* Construction and extension of collection types
* Looping over text files line by line
* List, dict and set comprehensions
* Tuple unpacking
* Unpacking actual parameters with `*` in function calls

This chapter covers the following topics:

* How the `iter(...)` built-in function works
* How to implement the Iterator pattern
* How a generator function works in detail
* How the Iterator pattern can be replaced by a generator function or expression
* General purpose generator functions from the standard library
* Using the new `yield from` statement to combine generators
* A case study: using generator functions in a database conversion utility designed to work with large datasets
* Why generators and coroutines look alike but are very different and should not be mixed

## Sentence Take #1: A Sequence of Words

We will explore iterables by implementing a `Sentence` class.

In [1]:
# Example 14-1. sentence.py: A Sentence as a sequence of words

import re
import reprlib

RE_WORD = re.compile(r"\w+")  # raw string avoids an invalid escape sequence warning

class Sentence:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __getitem__(self, index):
        return self.words[index]
    
    def __len__(self):
        return len(self.words)
    
    def __repr__(self):
        return f"Sentence({reprlib.repr(self.text)})"


In [2]:
s = Sentence("'The time has come,' the Walrus said.")
s

Sentence("'The time ha... Walrus said.")

In [3]:
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


### Why Sequences Are Iterable: The `iter` Function

Whenever the interpreter needs to iterate over an object `x`, it automatically calls `iter(x)`. The `iter` built-in function:

1. Checks if the object implements `__iter__`, and calls that to obtain an iterator.
2. If `__iter__` is not found, but `__getitem__` is implemented, Python creates an iterator that attempts to fetch items in order (starting from index 0).
3. If that fails, Python raises `TypeError`.

This is why any Python sequence is iterable; they all implement `__getitem__`.

> As of Python 3.4, the most accurate way of checking whether an object is iterable is to call `iter(x)` and handle a `TypeError` exception if it fails.
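The fallback to `__getitem__` and the `iter(x)` check can be sketched as follows. `Countdown` and `is_iterable` are hypothetical names introduced only for this illustration; they are not part of the chapter's examples.

```python
class Countdown:
    """Hypothetical class: no __iter__, only __getitem__ with 0-based indexes."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index < 0 or index >= self.start:
            raise IndexError(index)  # tells the fallback iterator to stop
        return self.start - index


def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


# iter() finds no __iter__, so it builds an iterator on top of __getitem__
countdown_items = list(Countdown(3))

# A Countdown and a str are iterable; an int is not
checks = [is_iterable(Countdown(3)), is_iterable("abc"), is_iterable(42)]
print(countdown_items, checks)
```

Note that `Countdown` would fail an `isinstance(x, collections.abc.Iterable)` test because it has no `__iter__`, which is why calling `iter(x)` is the more accurate check.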

## Iterables Versus Iterators

It's important to be clear about the relationship between iterables and iterators: **Python obtains iterators from iterables.**

Here are the definitions:

>*__iterable__*<br>
Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method that returns an *iterator* are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that takes 0-based indexes.
>
>*__iterator__*<br>
Any object that implements the `__next__` no-argument method that returns the next item in a series or raises `StopIteration` when there are no more items. Python iterators also implement the `__iter__` method so they are *iterable* as well.

The standard interface for an iterator has two methods:

`__next__`: Returns the next available item, raising `StopIteration` when exhausted.

`__iter__`: Returns `self`; this allows iterators to be used where an iterable is expected, for example, in a `for` loop.

Because the only methods required of an iterator are `__next__` and `__iter__`, there is no way to check whether there are remaining items, other than to call `next()` and catch `StopIteration`.

It is not possible to "restart" an iterator. To start over, call `iter(...)` on the iterable that built the iterator.
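As a minimal sketch of the points above, a `for` loop can be emulated by hand with `iter` and `next`. The list `words` is just sample data for this illustration.

```python
words = ["alpha", "beta", "gamma"]

it = iter(words)  # obtain an iterator from the iterable
collected = []
while True:
    try:
        collected.append(next(it))
    except StopIteration:
        break  # the only signal that the iterator is exhausted

# An exhausted iterator stays exhausted
leftover = list(it)

# To "restart", ask the iterable for a brand-new iterator
fresh = list(iter(words))
print(collected, leftover, fresh)
```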

## Sentence Take #2: A Classic Iterator

The next `Sentence` class is built according to the classic Iterator pattern. 

Note that this is **not idiomatic Python**, which the next refactoring will make clear. The example serves to make explicit the relationship between the iterable collection and the iterator object.

In [6]:
# Example 14-4. sentence_iter.py: Sentence implemented using the Iterator pattern

import re
import reprlib

RE_WORDS = re.compile(r"\w+")  # raw string avoids an invalid escape sequence warning

class Sentence:
    """Iterables must implement __iter__"""
    def __init__(self, text):
        self.text = text
        self.words = RE_WORDS.findall(text)
        
    def __repr__(self):
        return f"Sentence({reprlib.repr(self.text)})"
    
    def __iter__(self):
        return SentenceIterator(self.words)
    

class SentenceIterator:
    """Iterators are supposed to implement both __next__ AND __iter__"""
    def __init__(self, words):
        self.words = words
        self.index = 0
        
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word
    
    def __iter__(self):
        return self
        

### Making Sentence an Iterator: Bad Idea

A common cause of errors in building iterables and iterators is to confuse the two. To be clear: 

* **iterables have an `__iter__` method that instantiates a new iterator every time.**
* **iterators implement a `__next__` method that returns individual items, and an `__iter__` method that returns `self`.**

Therefore, iterators are also iterable, but iterables are not iterators.

An iterable should never act as an iterator over itself. In other words, iterables must implement `__iter__`, but not `__next__`.

On the other hand, iterators should always be iterable. An iterator's `__iter__` should just return `self`.
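The distinction can be observed with a built-in list, used here only as a convenient iterable: each call to `iter` produces an independent iterator, and an iterator's own `__iter__` returns the iterator itself.

```python
numbers = [10, 20, 30]

# Two calls to iter() give two independent iterators over the same iterable
it1 = iter(numbers)
it2 = iter(numbers)

first_from_it1 = next(it1)
second_from_it1 = next(it1)
first_from_it2 = next(it2)  # it2 starts from the beginning, unaffected by it1

# An iterator is iterable: its __iter__ returns self
iterator_returns_self = iter(it1) is it1

# The iterable itself is not an iterator: it has no __next__
iterable_has_next = hasattr(numbers, "__next__")
print(first_from_it1, second_from_it1, first_from_it2)
```

If `Sentence` implemented `__next__` itself, two loops over the same instance would share one position, which is the kind of bug this separation avoids.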

## Sentence Take #3: A Generator Function

A Pythonic implementation of the classic Iterator pattern uses a generator function to replace the `SentenceIterator` class.

In [7]:
# Example 14-5. sentence_gen.py: Sentence implemented using a generator function.

import re
import reprlib

RE_WORD = re.compile(r"\w+")  # raw string avoids an invalid escape sequence warning

class Sentence:
    
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
        
    def __repr__(self):
        return f"Sentence({reprlib.repr(self.text)})"
    
    def __iter__(self):
        """__iter__ here is a generator function"""
        for word in self.words:
            yield word
        

### How a Generator Function Works

Any Python function that has the `yield` keyword in its body is a **generator function**: a function which, when called, returns a generator object.

In other words, a generator function is a generator factory. 

A generator function builds a generator object that wraps the body of the function. 

When `next(...)` is invoked on the generator object, execution advances to the next `yield` in the function body, and `next(...)` evaluates to the value yielded when the function body is suspended.

Finally, when the function body returns, the enclosing generator object raises `StopIteration`, in accordance with the Iterator protocol.

In [8]:
def gen_AB():
    print("start")
    yield "A"
    print("continue")
    yield "B"
    print("end")
    
for c in gen_AB():
    print("-->", c)

start
--> A
continue
--> B
end
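A generator can also be driven by hand with `next(...)`, which makes the suspension points and the final `StopIteration` visible. The function `gen_12` is a hypothetical example introduced only for this sketch.

```python
import inspect


def gen_12():
    """Hypothetical generator function with two yield points."""
    yield 1
    yield 2


g = gen_12()  # calling the function runs no body code yet; it returns a generator
is_generator = inspect.isgenerator(g)

first = next(g)   # runs the body up to the first yield
second = next(g)  # resumes and runs up to the second yield

try:
    next(g)       # the body returns, so the generator raises StopIteration
except StopIteration:
    finished = True
else:
    finished = False
print(is_generator, first, second, finished)
```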


## Sentence Take #4: A Lazy Implementation

A lazy implementation postpones producing values to the last possible moment. This saves memory and may avoid useless processing as well.

In the previous versions, `__init__` eagerly built a list of all the words in `text`, which means the entire text is processed up front, even if the caller only iterates over the first few words.

In the following implementation we use the `finditer` method of the compiled pattern, which builds an iterator over the matches of `RE_WORD` in `self.text`, yielding match objects (`re.Match` instances). Calling `.group()` on a match returns the matched text.

In [10]:
# Example 14-7. sentence_gen2.py: Sentence implemented using generator function calling re.finditer

import re
import reprlib

RE_WORD = re.compile(r"\w+")  # raw string avoids an invalid escape sequence warning

class Sentence:
    
    def __init__(self, text):
        self.text = text
    
    def __repr__(self):
        return f"Sentence({reprlib.repr(self.text)})"
    
    def __iter__(self):
        for match in RE_WORD.finditer(self.text):
            yield match.group()
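
The benefit of laziness shows up when only a few items are needed. The sketch below uses a hypothetical standalone function, `iter_words`, that mirrors the `__iter__` method above, and takes just the first three words of a long text with `itertools.islice`.

```python
import itertools
import re

RE_WORD = re.compile(r"\w+")


def iter_words(text):
    """Hypothetical standalone equivalent of Sentence.__iter__ above."""
    for match in RE_WORD.finditer(text):
        yield match.group()


# Sample data for illustration: a long text built by repetition
text = "The time has come, the Walrus said. " * 100_000

# Only the first three matches are ever computed
first_three = list(itertools.islice(iter_words(text), 3))
print(first_three)
```

An eager `findall` on the same text would build a list of 700,000 words before the first one could be used.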

## Sentence Take #5: A Generator Expression

A generator expression (genexp) can be understood as a lazy version of a list comprehension: it does not eagerly build a list, but returns a generator that will lazily produce the items on demand.

Genexps are syntactic sugar: they can always be replaced by generator functions, but are sometimes more convenient.

Ramalho's rule of thumb: **if the generator expression spans more than a couple of lines, use a generator function for readability**.
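As a small sketch of the equivalence, the list comprehension, generator expression, and generator function below all compute the same squares. The names `squares_gen`, `eager`, and `lazy` are hypothetical, chosen only for this illustration.

```python
def squares_gen(numbers):
    """Generator function equivalent to the generator expression below."""
    for n in numbers:
        yield n * n


numbers = [1, 2, 3]

eager = [n * n for n in numbers]  # list comprehension: builds the whole list now
lazy = (n * n for n in numbers)   # generator expression: computes nothing yet

lazy_first = next(lazy)           # items are produced one at a time, on demand
lazy_rest = list(lazy)            # the remaining items

from_function = list(squares_gen(numbers))
print(eager, lazy_first, lazy_rest, from_function)
```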

In [None]:
# Example 14-9. sentence_genexp.py: Sentence implemented using generator expression

import re
import reprlib

RE_WORD = re.compile(r"\w+")  # raw string avoids an invalid escape sequence warning

class Sentence:
    
    def __init__(self, text):
        self.text = text
    
    def __repr__(self):
        return f"Sentence({reprlib.repr(self.text)})"
    
    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text))

## Another Example: Arithmetic Progression Generator

TODO.

In [1]:
# Example 14-11. The ArithmeticProgression class

class ArithmeticProgression:
    
    def __init__(self, begin, step, end=None):
        self.begin = begin
        self.step = step
        self.end = end # None -> "infinite" series
        
    def __iter__(self):
        result = type(self.begin + self.step)(self.begin)
        forever = self.end is None
        index = 0
        while forever or result < self.end:
            yield result
            index += 1
            result = self.begin + self.step * index
            

**Things to note about the above implementation:**

* The first line of `__iter__` coerces the type of `self.begin` to the type of the subsequent additions (in `while` loop).
* The `index` variable is used to calculate each `result` to reduce cumulative error when working with floats.
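
Both notes can be checked with a short experiment. This is a sketch using `0.1` as a sample step; other values accumulate error differently, and multiplication reduces the error rather than eliminating it in general.

```python
step = 0.1
count = 10

# Repeated addition: each += rounds, and the rounding errors compound
accumulated = 0.0
for _ in range(count):
    accumulated += step

# begin + step * index: one multiplication and one addition per item
multiplied = 0.0 + step * count

# Type coercion as in the first line of __iter__: an int begin with a
# float step yields a float first item
coerced = type(1 + 0.5)(1)

print(accumulated == 1.0, multiplied == 1.0, repr(coerced))
```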

If the point of a class is to build a generator by implementing `__iter__`, the class can be reduced to a generator function:

In [3]:
# Example 14-12. The aritprog_gen generator function

def aritprog_gen(begin, step, end=None):
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index
        

### Arithmetic Progression with `itertools`

`itertools` has, at time of publication, 19 generator functions that can be combined in different ways. We can rewrite `aritprog_gen` using `itertools.count` and `itertools.takewhile`:

```
class count(builtins.object)
 |  count(start=0, step=1) --> count object
 |  
 |  Return a count object whose .__next__() method returns consecutive values.
 |  Equivalent to:
 |  
 |      def count(firstval=0, step=1):
 |          x = firstval
 |          while 1:
 |              yield x
 |              x += step
```

```
class takewhile(builtins.object)
 |  takewhile(predicate, iterable) --> takewhile object
 |  
 |  Return successive entries from an iterable as long as the 
 |  predicate evaluates to true for each entry.)
 ```
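
Before combining them in a function, the two building blocks can be tried on their own. In this sketch, `count` produces an endless series and `takewhile` cuts it off at the first item that fails the predicate; the start, step, and limit values are arbitrary sample values.

```python
import itertools

# count(1, 0.5) never stops by itself; takewhile ends the series
# as soon as the predicate n < 3 is false
gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, 0.5))
values = list(gen)
print(values)
```

Calling `list` directly on `itertools.count(...)` would try to build an infinite list, so the `takewhile` wrapper (or another limit such as `islice`) is what makes the series safe to materialize.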

In [6]:
# Example 14-13. aritprog_gen leveraging itertools

import itertools


def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is not None:
        ap_gen = itertools.takewhile(lambda n: n < end, ap_gen)
    return ap_gen


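Note that Example 14-13 is not a generator function: it has no `yield` in its body. It returns a generator-like iterator built by `itertools`, so it operates as a generator factory, just as a generator function does.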
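
The difference can be checked with `inspect.isgeneratorfunction`. The two functions below are hypothetical stand-ins, introduced only for this sketch: one uses `yield`, the other returns an iterator assembled from `itertools` pieces.

```python
import inspect
import itertools


def countdown_gen(n):
    """Hypothetical generator function: has yield in its body."""
    while n > 0:
        yield n
        n -= 1


def countdown_factory(n):
    """Hypothetical factory: no yield; returns an itertools iterator."""
    return itertools.takewhile(lambda x: x > 0, itertools.count(n, -1))


is_gen_fn = inspect.isgeneratorfunction(countdown_gen)
is_factory_gen_fn = inspect.isgeneratorfunction(countdown_factory)

# Callers cannot tell the difference when iterating
same_items = list(countdown_gen(3)) == list(countdown_factory(3))
print(is_gen_fn, is_factory_gen_fn, same_items)
```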

## Generator Functions in the Standard Library

### *Table 14-1. Filtering generator functions*

| Module      | Function                                                | Description                                                  |
| ----------- | ------------------------------------------------------- | ------------------------------------------------------------ |
| `itertools` | `compress(it, selector_it)`                             | Consumes two iterables in parallel; yields items from `it` whenever the corresponding item in `selector_it` is truthy. |
| `itertools` | `dropwhile(predicate, it)`                              | Consumes `it` skipping items while `predicate` computes truthy, then yields every remaining item (no further checks are made). |
| (built-in)  | `filter(predicate, it)`                                 | Applies `predicate` to each item of `it`, yielding the item if `predicate(item)` is truthy; if `predicate` is `None`, only truthy items are yielded. |
| `itertools` | `filterfalse(predicate, it)`                            | Same as `filter`, with the `predicate` logic negated: yields items whenever `predicate` computes falsy. |
| `itertools` | `islice(it, stop)` or `islice(it, start, stop, step=1)` | Yields items from a slice of `it`, similar to `s[:stop]` or `s[start:stop:step]` except it can be any iterable, and the operation is lazy. |
| `itertools` | `takewhile(predicate, it)`                              | Yields items while `predicate` computes truthy, then stops and no further checks are made. |


#### Examples

In [7]:
import itertools

def vowel(c):
    return c.lower() in "aeiou"

list(itertools.compress("Aardvark", (1, 0, 1, 1, 0, 1)))

['A', 'r', 'd', 'a']

In [8]:
list(itertools.dropwhile(vowel, "Aardvark"))

['r', 'd', 'v', 'a', 'r', 'k']

In [9]:
list(filter(vowel, "Aardvark"))

['A', 'a', 'a']

In [10]:
list(itertools.filterfalse(vowel, "Aardvark"))

['r', 'd', 'v', 'r', 'k']

In [11]:
list(itertools.islice("Aardvark", 4))

['A', 'a', 'r', 'd']

In [12]:
list(itertools.islice("Aardvark", 4, 7))

['v', 'a', 'r']

In [13]:
list(itertools.islice("Aardvark", 1, 7, 2))

['a', 'd', 'a']

In [14]:
list(itertools.takewhile(vowel, "Aardvark"))

['A', 'a']

### *Table 14-2. Mapping generator functions*

| Module      | Function                          | Description                                                  |
| ----------- | --------------------------------- | ------------------------------------------------------------ |
| `itertools` | `accumulate(it, [func])`          | Yields accumulated sums; if `func` is provided, yields the result of applying it to the first pair of items, then to the first result and next item, etc. |
| (built-in)  | `enumerate(iterable, start=0)`    | Yields 2-tuples of the form `(index, item)`, where `index` is counted from `start` and `item` is taken from the `iterable`. |
| (built-in)  | `map(func, it1, [it2, ..., itN])` | Applies `func` to each item of `it`, yielding the result; if N iterables are given, `func` must take N arguments and the iterables will be consumed in parallel. |
| `itertools` | `starmap(func, it)`               | Applies `func` to each item of `it`, yielding the result; the input iterable should yield iterable items `iit`, and `func` is applied as `func(*iit)`. |

#### Examples



In [22]:
import itertools
import operator
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]

list(itertools.accumulate(sample))

[5, 9, 11, 19, 26, 32, 35, 35, 44, 45]

In [23]:
list(itertools.accumulate(sample, min))

[5, 4, 2, 2, 2, 2, 2, 0, 0, 0]

In [24]:
list(itertools.accumulate(sample, max))

[5, 5, 5, 8, 8, 8, 8, 8, 9, 9]

In [25]:
list(itertools.accumulate(sample, operator.mul))

[5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]

In [26]:
# Factorials from 1! to 10!
list(itertools.accumulate(range(1, 11), operator.mul))

[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

In [27]:
list(enumerate("albatroz", 1))

[(1, 'a'),
 (2, 'l'),
 (3, 'b'),
 (4, 'a'),
 (5, 't'),
 (6, 'r'),
 (7, 'o'),
 (8, 'z')]

In [28]:
list(map(operator.mul, range(11), range(11)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

In [29]:
list(map(operator.mul, range(11), [2, 4, 8]))

[0, 4, 16]

In [30]:
list(map(lambda a, b: (a, b), range(11), [2, 4, 8]))

[(0, 2), (1, 4), (2, 8)]

In [33]:
# Running average
list(itertools.starmap(lambda a, b: b/a, enumerate(itertools.accumulate(sample), 1)))

[5.0,
 4.5,
 3.6666666666666665,
 4.75,
 5.2,
 5.333333333333333,
 5.0,
 4.375,
 4.888888888888889,
 4.5]