
# Chapter 17. Iterators, Generators, and Classic Coroutines

## A Sequence of Words

We'll build a `Sentence` class: you give its constructor a string with some text, and then you can iterate word by word.

In [None]:
# sentence.py

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __getitem__(self, index):
    return self.words[index]

  def __len__(self):
    return len(self.words)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)



In [None]:
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [None]:
for word in s:
  print(word)

The
time
has
come
the
Walrus
said


In [None]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In the following sections, we'll develop other `Sentence` classes that pass the tests in the example above.

In [None]:
s[0]

'The'

In [None]:
s[5]

'Walrus'

In [None]:
s[-1]

'said'

## Why Sequences Are Iterable: The `iter` function

Whenever Python needs to iterate over an object `x`, it automatically calls `iter(x)`.

The `iter` built-in fcn:
 1. Checks whether the object implements `__iter__`, and calls that to obtain an iterator.
 2. Otherwise, if `__getitem__` is implemented, `iter()` creates an iterator that tries to fetch items by index, starting from 0.
 3. If that fails, Python raises `TypeError`, usually saying `'C' object is not iterable`, where `C` is the class of the target obj.
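The three steps above can be modeled in plain Python. This is a simplified sketch, not CPython's actual implementation: `simple_iter`, `IndexIterator`, and `Letters` are hypothetical names, and the real built-in handles extra cases (for example, a class that sets `__iter__ = None` to opt out of iteration).

```python
class IndexIterator:
  """Fallback iterator: fetches obj[0], obj[1], ... until IndexError."""

  def __init__(self, obj):
    self._obj = obj
    self._index = 0

  def __iter__(self):
    return self

  def __next__(self):
    try:
      item = self._obj[self._index]
    except IndexError:
      raise StopIteration from None  # IndexError ends the iteration
    self._index += 1
    return item


def simple_iter(obj):
  """Simplified model of the one-argument iter() built-in."""
  cls = type(obj)
  if hasattr(cls, '__iter__'):       # step 1: the iteration protocol proper
    return cls.__iter__(obj)
  if hasattr(cls, '__getitem__'):    # step 2: sequence fallback, index from 0
    return IndexIterator(obj)
  raise TypeError(f"'{cls.__name__}' object is not iterable")  # step 3


class Letters:
  """Implements only __getitem__, so it is iterable via the fallback."""
  def __getitem__(self, i):
    return 'xyz'[i]  # raises IndexError for i >= 3


print(list(simple_iter(Letters())))  # ['x', 'y', 'z']
print(list(simple_iter([1, 2])))     # [1, 2]
```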

In [None]:
class Spam:
  def __getitem__(self, i):
    print('->', i)
    raise IndexError()

In [None]:
spam_can = Spam()
iter(spam_can)

<iterator at 0x7efb25334850>

In [None]:
list(spam_can)

-> 0


[]

In the goose-typing approach, the definition of an iterable is simpler but not as flexible: an obj is considered iterable if it implements the `__iter__` method.

In [None]:
from collections import abc
isinstance(spam_can, abc.Iterable)

False
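Because `iter()` also accepts objects that only implement `__getitem__`, `isinstance(x, abc.Iterable)` can give a false negative, as it does for `spam_can` above. A more reliable check is to call `iter(x)` and handle the `TypeError`. A minimal sketch; `is_iterable` and `LegacySeq` are illustrative names, not part of the standard library.

```python
from collections import abc

def is_iterable(obj):
  """Return True if iter(obj) succeeds, which is what a for loop needs."""
  try:
    iter(obj)
  except TypeError:
    return False
  return True


class LegacySeq:
  """Iterable only through the __getitem__ fallback."""
  def __getitem__(self, i):
    if i >= 3:
      raise IndexError(i)
    return i * 10


seq = LegacySeq()
print(isinstance(seq, abc.Iterable))  # False: no __iter__ method
print(is_iterable(seq))               # True: iter() falls back to __getitem__
print(list(seq))                      # [0, 10, 20]
```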

### Using `iter` with a Callable

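We can call `iter` with two arguments to create an iterator from a fcn or any callable obj. In this usage, the first argument must be a callable to be invoked repeatedly (with no arguments) to produce values, and the second argument is a *sentinel*: a marker value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel.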

In [None]:
from random import randint

def d6():
  return randint(1, 6)

In [None]:
d6_iter = iter(d6, 1) # stop when d6() returns 1

In [None]:
d6_iter

<callable_iterator at 0x7efb25334160>

In [None]:
for roll in d6_iter:
  print(roll)

3
2
2
2
2
2
2
5
3
6
5
4
4


Note that the `iter` function here returns a `callable_iterator`. As usual with iterators, the `d6_iter` object becomes useless once exhausted.
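Dice rolls are random, so the output above varies from run to run. For a repeatable illustration of the sentinel rule, the sketch below wraps an `itertools.count` iterator in a zero-argument callable (its bound `__next__` method); `numbers`, `next_number`, and `upto_4` are illustrative names. The sentinel value itself is never yielded, and the iterator stays exhausted afterwards.

```python
from itertools import count

numbers = count(1)               # 1, 2, 3, ... forever
next_number = numbers.__next__   # zero-argument callable
upto_4 = iter(next_number, 5)    # stop as soon as the callable returns 5

first_pass = list(upto_4)
second_pass = list(upto_4)
print(first_pass)   # [1, 2, 3, 4]: the sentinel 5 is not included
print(second_pass)  # []: a callable_iterator cannot be restarted
```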

In [None]:
from functools import partial

with open('mydata.db', 'rb') as f:
  read64 = partial(f.read, 64)
  for block in iter(read64, b''):
    process_block(block)

## Iterables Versus Iterators

> *iterable* \
Any object from which the `iter` built-in function can obtain an iterator. Objects implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that accepts 0-based indexes.


Python obtains iterators from iterables.

In [None]:
s = 'ABC' # iterable
for char in s: # an iterator is created behind the scenes
  print(char)

A
B
C


In [None]:
s = 'ABC'
it = iter(s)
while True:
  try:
    print(next(it))
  except StopIteration:
    del it # release ref to `it`
    break

A
B
C


Python's standard interface for an iterator has two methods:
 - `__next__` returns the next item in the series, raising `StopIteration` if there are no more.
 - `__iter__` returns `self`; this allows iterators to be used where an iterable is expected, for example in a `for` loop.
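A class needs only those two methods to satisfy the protocol. A minimal sketch (`Countdown` is a hypothetical example, not from the standard library); `abc.Iterator` recognizes it through its `__subclasshook__`, with no registration or inheritance needed.

```python
from collections import abc

class Countdown:
  """Iterator that yields n, n - 1, ..., 1 and then stops."""

  def __init__(self, n):
    self.current = n

  def __next__(self):
    if self.current <= 0:
      raise StopIteration  # no more items
    value = self.current
    self.current -= 1
    return value

  def __iter__(self):
    return self  # an iterator is its own iterable


cd = Countdown(3)
print(isinstance(cd, abc.Iterator))  # True
print(iter(cd) is cd)                # True
print(list(cd))                      # [3, 2, 1]
print(list(cd))                      # []: already exhausted
```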

In [None]:
s = "ABC"
for char in s:
  print(char)

A
B
C


In [None]:
s = iter("ABC")
for char in s: # iter(s) is s, so the for loop uses s directly
  print(char)

A
B
C


In [None]:
iter(s) is s

True

In [None]:
s3 = Sentence('Life of Brain')
it = iter(s3)

In [None]:
it

<iterator at 0x7efb53f98ee0>

In [None]:
next(it)

'Life'

In [None]:
next(it)

'of'

In [None]:
next(it)

'Brain'

In [None]:
next(it)

StopIteration: 

In [None]:
list(it) # Once exhausted, an iterator will always raise `StopIteration`

[]

In [None]:
list(iter(s3))

['Life', 'of', 'Brain']

`Sentence` implemented using the Iterator pattern

In [None]:
# sentence_iter.py


import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    return SentenceIterator(self.words)

class SentenceIterator:

  def __init__(self, words):
    self.words = words
    self.index = 0

  def __next__(self):
    try:
      word = self.words[self.index]
    except IndexError:
      raise StopIteration()
    self.index += 1
    return word

  def __iter__(self):
    return self

In [None]:
s = Sentence('"The time has come," the Walrus said,')

In [None]:
s

Sentence('"The time ha... Walrus said,')

In [None]:
for word in s:
  print(word)

The
time
has
come
the
Walrus
said


In [None]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [None]:
isinstance(s, abc.Iterable)

True

### **Don't Make the Iterable an Iterator for Itself**

That is, don't implement `__next__` in addition to `__iter__` in the `Sentence` class.

Use the Iterator pattern:
 - to access an aggregate obj's contents w/o exposing its internal representation.
 - to support multiple traversals of aggregate objs.
 - to provide a uniform interface for traversing different aggregate structures.

To "support multiple traversals", it must be possible to obtain multiple independent iterators from the same iterable instance, and each iterator must keep its own internal state.
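To see why, compare an iterable that is its own iterator with one that hands out a fresh iterator on each `iter()` call. A minimal sketch; `SelfIterWords` (the anti-pattern) and `Words` are hypothetical classes, and `Words` delegates to `iter(self.words)` for brevity instead of a custom iterator class like `SentenceIterator`. With a shared index, a nested loop over the same object breaks, and a second traversal finds nothing.

```python
class SelfIterWords:
  """Anti-pattern: the iterable is its own iterator (one shared index)."""

  def __init__(self, words):
    self.words = words
    self.index = 0

  def __iter__(self):
    return self

  def __next__(self):
    try:
      word = self.words[self.index]
    except IndexError:
      raise StopIteration from None
    self.index += 1
    return word


class Words:
  """Proper iterable: each iter() call returns a fresh iterator."""

  def __init__(self, words):
    self.words = words

  def __iter__(self):
    return iter(self.words)  # independent state per traversal


bad = SelfIterWords(['a', 'b'])
good = Words(['a', 'b'])
print([(x, y) for x in bad for y in bad])    # [('a', 'b')]
print([(x, y) for x in good for y in good])  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
print(list(bad))                             # []: the single shared iterator is used up
```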

In [None]:
# sentence_gen.py


import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    for word in self.words:
      yield word
    # explicit return is not necessary

Now the iterator in the above example is in fact a generator object, built automatically when the `__iter__` method is called, because `__iter__` here is a generator function.
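Because each call to a generator function creates a new generator object, this version supports multiple independent traversals without a separate iterator class. A quick check, restating the class from the cell above so the snippet runs on its own:

```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)

  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    for word in self.words:
      yield word


s = Sentence('"The time has come," the Walrus said,')
it1 = iter(s)      # a new generator object
it2 = iter(s)      # another one, with its own position
print(it1 is it2)  # False
print(next(it1), next(it1), next(it2))  # The time The
```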

## How a Generator Works

Any Python fcn that has the `yield` keyword in its body is a generator function: a function which, when called, returns a generator obj. In other words, a generator function is a generator factory.

In [None]:
def gen_123():
  yield 1
  yield 2
  yield 3

In [None]:
gen_123

Generator objects implement the `Iterator` interface, so they are also iterable.

In [None]:
gen_123()

<generator object gen_123 at 0x7efb25289c40>

In [None]:
for i in gen_123():
  print(i)

1
2
3


In [None]:
isinstance(gen_123(), abc.Iterator)

True

In [None]:
g = gen_123()
next(g)

1

In [None]:
next(g)

2

In [None]:
next(g)

3

In [None]:
next(g)

StopIteration: 

In [None]:
def gen_AB():
  print('start')
  yield 'A'
  print('continue')
  yield 'B'
  print('end.')


To iterate, the `for` machinery does the equivalent of `g = iter(gen_AB())` to get a generator object, and then calls `next(g)` at each iteration.

In [None]:
for c in gen_AB():
  print('-->', c)

# expected:
# start
# --> A
# continue
# --> B
# end.


start
--> A
continue
--> B
end.
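The standard library's `inspect.getgeneratorstate` makes those pauses visible. A small sketch driving `gen_AB` by hand (restated here so it runs on its own): the body does not start until the first `next`, it stays suspended at each `yield`, and the generator is closed after its body runs off the end.

```python
import inspect

def gen_AB():
  print('start')
  yield 'A'
  print('continue')
  yield 'B'
  print('end.')

g = gen_AB()
print(inspect.getgeneratorstate(g))  # GEN_CREATED: nothing printed yet
first = next(g)                      # prints 'start', then pauses at yield 'A'
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED
second = next(g)                     # prints 'continue', pauses at yield 'B'
try:
  next(g)                            # prints 'end.'; the body ends with no yield
except StopIteration:
  print('StopIteration raised')
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
```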


## Lazy Sentences

The `Iterator` interface is designed to be lazy: `next(my_iterator)` produces one item at a time. The opposite of lazy is eager.

Our `Sentence` implementations so far have not been lazy, because `__init__` eagerly builds a list of all words in the text, binding it to the `self.words` attribute.

In [2]:
# sentence_gen2.py


import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    # no self.words list: findall would eagerly process the whole text


  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    # finditer builds an iterator over the matches of
    # RE_WORD on self.text, yielding re.Match instances
    for matched in RE_WORD.finditer(self.text):
      # matched.group() extracts the matched text from
      # the re.Match instance
      yield matched.group()
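The difference between the two `re` methods is what makes this version lazy: `findall` builds the complete list up front, while `finditer` returns an iterator that searches for the next match only when asked. A rough sketch with a made-up long text (`text`, `eager`, and `lazy` are illustrative names):

```python
import itertools
import re

RE_WORD = re.compile(r'\w+')
text = 'spam ' * 100_000           # illustrative long text: 100,000 words

eager = RE_WORD.findall(text)      # scans the whole text, returns a list
lazy = RE_WORD.finditer(text)      # returns an iterator; no match found yet

# islice pulls only three matches, so only the start of the text is scanned
first_three = [m.group() for m in itertools.islice(lazy, 3)]
print(type(eager).__name__, len(eager))  # list 100000
print(first_three)                       # ['spam', 'spam', 'spam']
```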

### Lazy Generator Expression

In [8]:
# generator function
def gen_AB():
  print('start')
  yield 'A'
  print("continue")
  yield 'B'
  print('end')

In [9]:
res1 = [x * 3 for x in gen_AB()]

start
continue
end


In [10]:
for i in res1:
  print('-->', i)

--> AAA
--> BBB


In [11]:
res2 = (x*3 for x in gen_AB())
res2

<generator object <genexpr> at 0x78fb866f3680>

In [13]:
for i in res2:
  print('-->', i)

start
--> AAA
continue
--> BBB
end
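A generator expression behaves roughly like a call to an equivalent generator function. One documented difference: the iterable in the leftmost `for` clause is evaluated as soon as the generator expression is created, while the rest runs lazily. A sketch (`triple_all` is an illustrative name):

```python
def triple_all(iterable):
  """Generator function doing roughly what (x * 3 for x in iterable) does."""
  for x in iterable:
    yield x * 3

print(list(triple_all('AB')))     # ['AAA', 'BBB']
print(list(x * 3 for x in 'AB'))  # ['AAA', 'BBB']

lazy_gen = triple_all(42)  # no error yet: the body has not started
try:
  (x * 3 for x in 42)      # fails right away: 42 is evaluated and iterated at creation
except TypeError as exc:
  print('genexp:', exc)
try:
  next(lazy_gen)           # the generator function fails only on the first next()
except TypeError as exc:
  print('generator function:', exc)
```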


In [14]:
# sentence_genexp.py

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    # no self.words list: findall would eagerly process the whole text


  def __repr__(self):
    return 'Sentence(%s)' % reprlib.repr(self.text)

  def __iter__(self):
    # finditer builds an iterator over the matches of RE_WORD on
    # self.text; the genexp extracts each match's text lazily
    return (matched.group() for matched in RE_WORD.finditer(self.text))

## When to Use Generator Expressions

*iterator*
 - General term for any object that implements the `__next__` method. Iterators are designed to produce data that is consumed by the client code, i.e., the code that drives the iterator via a `for` loop or other iterative feature, or by explicitly calling `next(it)` on the iterator. In practice, most iterators we use in Python are *generators*.

*generator*
 - An iterator built by the Python compiler. To create a generator, we don't implement the `__next__` method. Instead, we use the `yield` keyword to make a *generator function*, which is a factory of *generator objects*. A *generator expression* is another way to build a generator object. Generator objects provide `__next__`, so they are iterators.
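Every generator is an iterator, but not every iterator is a generator. One way to check, using `types.GeneratorType` and `collections.abc.Iterator` (the variable names are illustrative):

```python
import types
from collections import abc

def g():
  yield 0

list_it = iter([1, 2, 3])        # list_iterator: an iterator, but not a generator
gen_obj = g()                    # built by a generator function
genexp_obj = (c for c in 'XYZ')  # built by a generator expression

for name, obj in [('list_it', list_it), ('gen_obj', gen_obj), ('genexp_obj', genexp_obj)]:
  print(name,
        isinstance(obj, abc.Iterator),         # True for all three
        isinstance(obj, types.GeneratorType))  # False only for list_it
```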

In [16]:
def g(): # generator fcn
  yield 0

In [19]:
g() # generator obj (iterator) created by generator fcn

<generator object g at 0x78fb865a5770>

In [20]:
ge = (c for c in 'XYZ') # generator exp builds a generator obj

In [21]:
ge

<generator object <genexpr> at 0x78fb865a6ea0>

In [23]:
type(g()), type(ge)

(generator, generator)

## An Arithmetic Progression Generator

In [24]:
class ArithmeticProgression:
  def __init__(self, begin, step, end=None):
    self.begin = begin
    self.step = step
    self.end = end # None -> infinite series

  def __iter__(self):
    result_type = type(self.begin + self.step)
    result = result_type(self.begin)
    forever = self.end is None

    index = 0

    while forever or result < self.end:
      yield result
      index += 1
      # Why not add step cumulatively? Computing begin + step * index
      # avoids accumulating floating-point errors (numerical stability)
      result = self.begin + self.step * index

In [25]:
ap = ArithmeticProgression(0, 1, 3)

In [26]:
list(ap)

[0, 1, 2]

In [27]:
ap = ArithmeticProgression(0, 0.5, 3)
list(ap)

[0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

In [28]:
ap = ArithmeticProgression(0, 1/3, 1)
list(ap)

[0.0, 0.3333333333333333, 0.6666666666666666]

In [29]:
from fractions import Fraction
ap = ArithmeticProgression(0, Fraction(1,3), 1)
list(ap)

[Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]

In [30]:
from decimal import Decimal
ap = ArithmeticProgression(0, Decimal('.1'), .3)
list(ap)

[Decimal('0'), Decimal('0.1'), Decimal('0.2')]

In [31]:
100 * 1.1

110.00000000000001

In [32]:
sum(1.1 for _ in range(100))

109.99999999999982

In [33]:
1000 * 1.1

1100.0

In [34]:
sum(1.1 for _ in range(1000))

1100.0000000000086

If the whole point of a class is to build a generator by implementing `__iter__`, we can replace the class with a generator function. A generator function is, after all, a generator factory.

In [35]:
def aritprog_gen(begin, step, end=None):
  result_type = type(begin + step)
  result = result_type(begin)
  forever = end is None

  index = 0

  while forever or result < end:
    yield result
    index += 1
    # Why not add step cumulatively? Computing begin + step * index
    # avoids accumulating floating-point errors (numerical stability)
    result = begin + step * index

But remember! There are plenty of ready-to-use generators in the standard library.

### Arithmetic Progression with itertools

In [38]:
import itertools

gen = itertools.count(1, .5) # returns an iterator that yields 1, 1.5, 2.0, ... forever

In [39]:
next(gen)

1

In [40]:
next(gen)

1.5

`itertools.takewhile` returns an iterator that consumes another iterable and stops as soon as a given predicate evaluates to a falsy value.

In [41]:
gen = itertools.takewhile(lambda n: n < 3, itertools.count(1, .5))

In [42]:
list(gen)

[1, 1.5, 2.0, 2.5]

In [43]:
import itertools

def aritprog_gen(begin, step, end=None):
  first = type(begin + step)(begin)
  ap_gen = itertools.count(first, step)
  if end is None:
    return ap_gen
  return itertools.takewhile(lambda n: n < end, ap_gen)

Note that `aritprog_gen` is not a generator function: it has no `yield` in its body. But it returns a lazy iterator (an `itertools.count` or `itertools.takewhile` object), just as a generator function returns a generator.
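A quick sanity check, restating the function above: `inspect.isgeneratorfunction` confirms it is an ordinary function, while the values it produces match the `ArithmeticProgression` results shown earlier for the same arguments. For an infinite series (no `end`), `itertools.islice` takes a finite slice.

```python
import inspect
import itertools
from fractions import Fraction

def aritprog_gen(begin, step, end=None):
  first = type(begin + step)(begin)
  ap_gen = itertools.count(first, step)
  if end is None:
    return ap_gen
  return itertools.takewhile(lambda n: n < end, ap_gen)

print(inspect.isgeneratorfunction(aritprog_gen))       # False: no yield in the body
print(list(aritprog_gen(0, 1, 3)))                     # [0, 1, 2]
print(list(aritprog_gen(0, Fraction(1, 3), 1)))        # [Fraction(0, 1), Fraction(1, 3), Fraction(2, 3)]
print(list(itertools.islice(aritprog_gen(1, .5), 3)))  # [1.0, 1.5, 2.0]
```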

## Generator Functions in the Standard Library

### 1. Filtering Generator Functions
They yield a subset of items produced by the input iterable, without changing the items themselves.

In [46]:
def vowel(c):
  return c.lower() in 'aeiou'

In [47]:
list(filter(vowel, 'Suwon'))

['u', 'o']

In [48]:
import itertools
list(itertools.filterfalse(vowel, 'Suwon'))

['S', 'w', 'n']

In [50]:
list(itertools.dropwhile(vowel, 'Suwon'))

['S', 'u', 'w', 'o', 'n']

In [51]:
list(itertools.takewhile(vowel, 'Suwon'))

[]
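With 'Suwon', `dropwhile` and `takewhile` look uninteresting because the first letter is not a vowel. The sketch below uses a word that starts with vowels, and also tries `itertools.compress` and `itertools.islice`, two more filtering functions from `itertools` (the input strings and selector tuple are arbitrary examples).

```python
import itertools

def vowel(c):
  return c.lower() in 'aeiou'

# dropwhile/takewhile only look at the leading run of items
print(list(itertools.dropwhile(vowel, 'Aardvark')))  # ['r', 'd', 'v', 'a', 'r', 'k']
print(list(itertools.takewhile(vowel, 'Aardvark')))  # ['A', 'a']

# compress keeps items whose matching selector is truthy
print(list(itertools.compress('Suwon', (1, 0, 1, 1, 0))))  # ['S', 'w', 'o']

# islice works like slicing, but lazily and on any iterable
print(list(itertools.islice('Suwon', 4)))        # ['S', 'u', 'w', 'o']
print(list(itertools.islice('Suwon', 1, 5, 2)))  # ['u', 'o']
```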