# Chapter 17. Iterators, Generators, and Classic Coroutines

## A Sequence of Words

We start with a `Sentence` class: you give its constructor a string with some text, and then you can iterate over it word by word.

In [1]:
# sentence.py

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __getitem__(self, index):
    return self.words[index]

  def __len__(self):
    return len(self.words)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)



In [2]:
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [3]:
for word in s:
  print(word)

The
time
has
come
the
Walrus
said


In [4]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In the following sections, we'll develop other `Sentence` classes that pass the tests in the example above.

In [5]:
s[0]

'The'

In [6]:
s[5]

'Walrus'

In [7]:
s[-1]

'said'

## Why Sequences Are Iterable: The `iter` function

Whenever Python needs to iterate over an object `x`, it automatically calls `iter(x)`.

The `iter` built-in function:
 1. Checks whether the object implements `__iter__`, and calls that to obtain an iterator.
 2. If `__iter__` is not implemented but `__getitem__` is, `iter()` creates an iterator that tries to fetch items by index, starting from 0.
 3. If that fails, Python raises `TypeError`, usually saying `'C' object is not iterable`, where `C` is the class of the target object.
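The lookup order above can be approximated in pure Python. This is only a rough sketch: `simple_iter` and `Letters` are hypothetical names made up for illustration, and the real `iter` is implemented in C and performs additional checks.

```python
def simple_iter(obj):
    """Approximate the lookup done by the one-argument form of iter()."""
    cls = type(obj)
    # 1. Prefer __iter__ (special methods are looked up on the class).
    if getattr(cls, '__iter__', None) is not None:
        return cls.__iter__(obj)
    # 2. Fall back to __getitem__, fetching items by index from 0.
    if hasattr(cls, '__getitem__'):
        def by_index():
            index = 0
            while True:
                try:
                    item = obj[index]
                except IndexError:
                    return  # no more items: stop iterating
                yield item
                index += 1
        return by_index()
    # 3. Neither method is available.
    raise TypeError(f'{cls.__name__!r} object is not iterable')


class Letters:
    """Hypothetical class that implements only __getitem__."""
    def __getitem__(self, index):
        return 'abc'[index]


print(list(simple_iter(Letters())))  # ['a', 'b', 'c'] via the __getitem__ fallback
print(list(simple_iter([1, 2, 3])))  # [1, 2, 3] via list.__iter__
```

Calling `simple_iter(42)` raises `TypeError`, because `int` implements neither method.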

In [8]:
class Spam:
  def __getitem__(self, i):
    print('->', i)
    raise IndexError()

In [9]:
spam_can = Spam()
iter(spam_can)

<iterator at 0x7efb25334850>

In [10]:
list(spam_can)

-> 0


[]

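In the goose-typing approach, the definition of an iterable is simpler but not as flexible: an object is considered iterable if it implements the `__iter__` method. That is why the `isinstance` check against `abc.Iterable` below returns `False` for `spam_can`, even though `iter(spam_can)` works.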

In [11]:
from collections import abc
isinstance(spam_can, abc.Iterable)

False
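Because the ABC check ignores the `__getitem__` fallback, calling `iter` directly is the more reliable way to find out whether an object is iterable. A minimal sketch, where `is_iterable` and `OnlyGetitem` are hypothetical names used only for this illustration:

```python
from collections import abc


def is_iterable(obj):
    """Return True if iter() can build an iterator from obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetitem:
    """Hypothetical class: iterable via __getitem__, but without __iter__."""
    def __getitem__(self, index):
        return 'xyz'[index]


obj = OnlyGetitem()
print(isinstance(obj, abc.Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(obj))               # True: iter() accepts __getitem__
print(is_iterable(42))                # False
```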

### Using `iter` with Callable

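We can call `iter` with two arguments to create an iterator from a function or any callable object. In this usage, the first argument must be a callable that is invoked repeatedly (with no arguments) to produce values, and the second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel.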

In [18]:
from random import randint

def d6():
  return randint(1, 6)

In [23]:
d6_iter = iter(d6, 1)  # stop as soon as d6() returns 1

In [24]:
d6_iter

<callable_iterator at 0x7efb25334160>

In [25]:
for roll in d6_iter:
  print(roll)

3
2
2
2
2
2
2
5
3
6
5
4
4


Note that the `iter` function here returns a `callable_iterator`. As usual with iterators, the `d6_iter` object becomes useless once exhausted.
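The behavior of the two-argument form can be approximated with a small generator function. This is a sketch for illustration, not the actual implementation: `call_until` and `make_scripted_d6` are hypothetical helpers, and the scripted rolls replace `randint` only to make the result repeatable.

```python
def call_until(func, sentinel):
    """Yield func() results until one equals sentinel (which is not yielded)."""
    while True:
        value = func()
        if value == sentinel:
            return
        yield value


def make_scripted_d6(values):
    """Return a zero-argument callable that replays the given rolls in order."""
    source = iter(values)

    def scripted_d6():
        return next(source)

    return scripted_d6


print(list(call_until(make_scripted_d6([3, 2, 5, 1, 6]), 1)))  # [3, 2, 5]
print(list(iter(make_scripted_d6([3, 2, 5, 1, 6]), 1)))        # [3, 2, 5]
```

In both cases the sentinel `1` ends the iteration and is never produced, so the `6` after it is never reached.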

In [None]:
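from functools import partial

# Read a binary file in fixed-size blocks until f.read returns b'' (end of file).
with open('mydata.db', 'rb') as f:
  read64 = partial(f.read, 64)  # callable that reads up to 64 bytes per call
  for block in iter(read64, b''):
    process_block(block)  # process_block is a placeholder, not defined here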
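The cell above needs a real file and a `process_block` function. The same pattern can be tried without either by using an in-memory stream; in this sketch, `io.BytesIO` and the `sizes` list are stand-ins chosen for illustration.

```python
import io
from functools import partial

data = bytes(range(150))          # 150 bytes of sample data
sizes = []

with io.BytesIO(data) as f:       # in-memory stand-in for open(..., 'rb')
    read64 = partial(f.read, 64)
    for block in iter(read64, b''):
        sizes.append(len(block))  # stand-in for process_block(block)

print(sizes)  # [64, 64, 22]
```

The last block is shorter than 64 bytes, and the next `read` returns `b''`, which matches the sentinel and ends the loop.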

## Iterables Versus Iterators

> *iterable* \
Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that accepts 0-based indexes.


Python obtains iterators from iterables.

In [26]:
s = 'ABC' # iterable
for char in s: # an iterator is created behind the scenes
  print(char)

A
B
C


In [27]:
s = 'ABC'
it = iter(s)
while True:
  try:
    print(next(it))
  except StopIteration:
    del it # release ref to `it`
    break

A
B
C


Python's standard interface for an iterator has two methods:
 - `__next__`: returns the next item in the series, raising `StopIteration` if there are no more.
 - `__iter__`: returns `self`; this allows iterators to be used where an iterable is expected, for example in a `for` loop.
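A minimal class that satisfies this two-method interface might look like the sketch below. `Countdown` is a hypothetical example, not part of the `Sentence` code in these notes.

```python
from collections import abc


class Countdown:
    """Hypothetical iterator that counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def __iter__(self):
        return self


c = Countdown(3)
print(isinstance(c, abc.Iterator))  # True: both methods are present
print(iter(c) is c)                 # True
print(list(c))                      # [3, 2, 1]
print(list(c))                      # []: the iterator is exhausted
```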

In [30]:
s = "ABC"
for char in s:
  print(char)

A
B
C


In [31]:
s = iter("ABC")
for char in s: # iter(s) == s
  print(char)

A
B
C


In [33]:
iter(s) is s

True

In [35]:
s3 = Sentence('Life of Brain')
it = iter(s3)

In [36]:
it

<iterator at 0x7efb53f98ee0>

In [37]:
next(it)

'Life'

In [38]:
next(it)

'of'

In [39]:
next(it)

'Brain'

In [40]:
next(it)

StopIteration: 

In [41]:
list(it) # Once exhausted, an iterator will always raise `StopIteration`, so the list is empty

[]

In [42]:
list(iter(s3))

['Life', 'of', 'Brain']

`Sentence` implemented using the Iterator pattern

In [43]:
# sentence_iter.py


import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    return SentenceIterator(self.words)

class SentenceIterator:

  def __init__(self, words):
    self.words = words
    self.index = 0

  def __next__(self):
    try:
      word = self.words[self.index]
    except IndexError:
      raise StopIteration()
    self.index += 1
    return word

  def __iter__(self):
    return self

In [44]:
s = Sentence('"The time has come," the Walrus said,')

In [45]:
s

Sentence('"The time ha... Walrus said,')

In [46]:
for word in s:
  print(word)

The
time
has
come
the
Walrus
said


In [47]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [49]:
isinstance(s, abc.Iterable)

True

### **Don't Make the Iterable an Iterator for Itself**

That is, don't implement `__next__` in addition to `__iter__` in the `Sentence` class.

Use the Iterator pattern
 - to access an aggregate object's contents without exposing its internal representation.
 - to support multiple traversals of aggregate objects.
 - to provide a uniform interface for traversing different aggregate structures.

To "support multiple traversals", it must be possible to obtain multiple independent iterators from the same iterable instance.
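The difference can be seen by comparing a built-in list with a class that acts as its own iterator. In this sketch, `SelfIteratingWords` is a hypothetical class written only to show the anti-pattern.

```python
class SelfIteratingWords:
    """Anti-pattern: the iterable is also its own iterator."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.words):
            raise StopIteration
        word = self.words[self.index]
        self.index += 1
        return word


words = ['a', 'b', 'c']

# A list hands out a fresh, independent iterator on each iter() call.
it1, it2 = iter(words), iter(words)
good = (next(it1), next(it1), next(it2))

# The anti-pattern shares one position across every "iterator".
shared = SelfIteratingWords(words)
it3, it4 = iter(shared), iter(shared)
bad = (next(it3), next(it3), next(it4))

print(good)          # ('a', 'b', 'a')
print(bad)           # ('a', 'b', 'c')
print(list(shared))  # []: the single shared traversal is already used up
```

With the list, `it2` starts from the beginning regardless of how far `it1` has advanced; with the anti-pattern, a second traversal is impossible without building a new object.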

In [None]:
# sentence_gen.py


import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    for word in self.words:
      yield word
    # explicit return is not necessary

Now the iterator in the above example is in fact a generator object, built automatically when the `__iter__` method is called, because `__iter__` here is a generator function.

## How a Generator Works

Any Python function that has the `yield` keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory.
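The distinction between the factory and its product can be checked with the standard `inspect` module. A short sketch, where `countdown` is a hypothetical generator function used only for illustration:

```python
import inspect


def countdown(n):
    """Hypothetical generator function."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)  # calling it builds a generator; the body has not run yet

print(inspect.isgeneratorfunction(countdown))  # True: the factory
print(inspect.isgenerator(gen))                # True: the product
print(inspect.isgeneratorfunction(gen))        # False
print(list(gen))                               # [2, 1]
```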

In [50]:
def gen_123():
  yield 1
  yield 2
  yield 3

In [51]:
gen_123

Generator objects implement the `Iterator` interface, so they are also iterable.

In [52]:
gen_123()

<generator object gen_123 at 0x7efb25289c40>

In [53]:
for i in gen_123():
  print(i)

1
2
3


In [58]:
isinstance(gen_123(), abc.Iterator)

True

In [54]:
g = gen_123()
next(g)

1

In [55]:
next(g)

2

In [56]:
next(g)

3

In [57]:
next(g)

StopIteration: 

In [59]:
def gen_AB():
  print('start')
  yield 'A'
  print('continue')
  yield 'B'
  print('end.')


To iterate, the `for` machinery does the equivalent of `g = iter(gen_AB())` to get a generator object, and then calls `next(g)` at each iteration.

In [60]:
for c in gen_AB():
  print('-->', c)

# expected:
# start
# --> A
# continue
# --> B
# end.


start
--> A
continue
--> B
end.
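Written out by hand, the `for` loop above is roughly equivalent to the `while` loop in this sketch. The `seen` list is an addition made only so the yielded values can be inspected afterwards.

```python
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')


seen = []
g = iter(gen_AB())          # for a generator, iter() returns the same object
while True:
    try:
        c = next(g)         # runs the body up to the next yield
    except StopIteration:   # raised when the body finishes, after 'end.'
        break
    print('-->', c)
    seen.append(c)
```

The third call to `next(g)` is what prints `end.`: the body resumes after the second `yield`, runs to completion, and only then raises `StopIteration`.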
