
# Iterables, iterators and generators

Iteration is fundamental to data processing. And when scanning datasets that don’t fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about.

Python does not have macros like Lisp, so abstracting away the Iterator pattern required changing the language: the yield keyword was added in Python 2.2 (2001). The yield keyword allows the construction of generators, which work as iterators.

Python 3 uses generators in many places. Even the range() built-in now returns a lazy, generator-like object instead of a full-blown list, as it did in Python 2. If you must build a list from range, you have to be explicit, e.g. list(range(100)).
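
As a small illustration of that laziness, the sketch below inspects a range object. The variable names `r` and `first_five` are chosen here for the example only.

```python
from collections import abc

r = range(100)

# A range is a lazy sequence: it supports len() and indexing
# without ever building a list of its items.
print(len(r), r[10], r[-1])

# It is iterable, but it is not itself an iterator.
print(isinstance(r, abc.Sequence), isinstance(r, abc.Iterator))

# A list is built only when asked for explicitly.
first_five = list(range(5))
print(first_five)
```

Because a range is a sequence rather than an iterator, it can be looped over more than once.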

Every collection in Python is iterable, and iterators are used internally to support:

- for loops;
- collection types construction and extension;
- looping over text files line by line;
- list, dict and set comprehensions;
- tuple unpacking;
- unpacking actual parameters with * in function calls.
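
The contexts listed above can be sketched in a few lines. This is only an illustration: the list `words`, the helper `join3` and the use of `io.StringIO` as a stand-in for a real text file are assumptions made for the example.

```python
import io

words = ["time", "has", "come"]

# for loop
for w in words:
    pass

# collection construction and extension
as_tuple = tuple(words)
extended = ["the"]
extended.extend(words)

# looping over a text file line by line (io.StringIO stands in for a file)
lines = [line.rstrip("\n") for line in io.StringIO("a\nb\n")]

# dict and set comprehensions
lengths = {w: len(w) for w in words}
initials = {w[0] for w in words}

# tuple unpacking
first, second, third = words

# unpacking arguments with * in a function call
def join3(a, b, c):
    return "-".join((a, b, c))

joined = join3(*words)
print(joined)
```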

## Sentence take #1: a sequence of words

We’ll start our exploration of iterables by implementing a Sentence class: you give its constructor a string with some text, and then you can iterate word by word.

In [1]:
import re
import reprlib

In [2]:
# A raw string avoids the invalid escape sequence warning for "\w".
RE_WORD = re.compile(r"\w+")

class Sentence:

  def __init__(self, text):
    self.text = text
    # findall returns a list of strings with all non-overlapping matches of the regular expression.
    self.words = RE_WORD.findall(text)

  def __getitem__(self, index):
    # self.words holds the result of .findall, so we simply return the word at the given index.
    return self.words[index]

  # To complete the sequence protocol, we implement __len__ — but it is not needed to make an iterable object.
  def __len__(self):
    return len(self.words)

  def __repr__(self):
    # generate abbreviated string representations of data structures that can be very large
    return "Sentence(%s)" % reprlib.repr(self.text)

By default, reprlib.repr limits the generated string to 30 characters.

In [3]:
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [4]:
# Sentence instances are iterable
for word in s:
  print(word)

The
time
has
come
the
Walrus
said


In [5]:
# Being iterable, Sentence objects can be used as input to build lists and other iterable types.
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [6]:
# Because Sentence is also a sequence, you can get words by index.
s[0]

'The'

In [7]:
s[5]

'Walrus'

In [8]:
s[-1]

'said'

In [9]:
s[-2]

'Walrus'

### Why sequences are iterable: the iter function

Every Python programmer knows that sequences are iterable. Now we’ll see precisely why.

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).

The iter built-in function:

- Checks whether the object implements `__iter__`, and calls that to obtain an iterator;
- If __iter__ is not implemented, but __getitem__ is implemented, Python creates an iterator that attempts to fetch items in order, starting from index 0 (zero);
- If that fails, Python raises TypeError, usually saying "'C' object is not iterable", where C is the class of the target object.

That is why any Python sequence is iterable: they all implement `__getitem__`. In fact, the standard sequences also implement `__iter__`, and yours should too, because the special handling of `__getitem__` exists for backward compatibility reasons and may be gone in the future.

This is an extreme form of duck typing: an object is considered iterable not only when it implements the special method `__iter__`, but also when it implements `__getitem__`, as long as `__getitem__` accepts int keys starting from 0.
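
The fallback and the failure case can be sketched as follows. `Squares` and `Opaque` are hypothetical classes written for this illustration; they are not part of the Sentence example.

```python
class Squares:
    """Hypothetical class with __getitem__ only: no __iter__, no __len__."""

    def __getitem__(self, index):
        if index >= 5:
            # IndexError tells the fallback iterator that there are no more items.
            raise IndexError(index)
        return index * index


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# iter() falls back to calling __getitem__ with 0, 1, 2, ...
squares = list(Squares())
print(squares)

try:
    iter(Opaque())
except TypeError as exc:
    message = str(exc)
    print(message)
```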

In the goose-typing approach, the definition for an iterable is simpler but not as flexible: an object is considered iterable if it implements the `__iter__` method. No subclassing or registration is required, because abc.Iterable implements `__subclasshook__`.

In [10]:
class Foo:
  def __iter__(self):
    pass

In [11]:
from collections import abc

In [12]:
issubclass(Foo, abc.Iterable)

True

In [13]:
f = Foo()
isinstance(f, abc.Iterable)

True

However, note that our initial Sentence class does not pass the issubclass(Sentence, abc.Iterable) test, even though it is iterable in practice.
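
A minimal sketch of that mismatch, using a trimmed-down copy of the Sentence class (only `__init__` and `__getitem__` are kept, which is an assumption made to keep the example short):

```python
import re
from collections import abc

RE_WORD = re.compile(r"\w+")


class Sentence:
    """Trimmed-down copy of the Sentence class above: __getitem__ only."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)

    def __getitem__(self, index):
        return self.words[index]


s = Sentence("Pig and Pepper")

# The ABC check looks for __iter__ only, so it reports False.
passes_abc_check = issubclass(Sentence, abc.Iterable)

# Calling iter() also tries the __getitem__ fallback, so it succeeds.
try:
    iter(s)
    iterable_in_practice = True
except TypeError:
    iterable_in_practice = False

print(passes_abc_check, iterable_in_practice)
```

For this reason, calling iter(x) and handling TypeError is a more reliable test of iterability than the isinstance or issubclass check.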

### Iterables versus iterators

It’s important to be clear about the relationship between iterables and iterators: Python obtains iterators from iterables.

Here is a simple for loop iterating over a str. The str 'ABC' is the iterable here. You don’t see it, but there is an iterator behind the curtain:

In [14]:
s = 'ABC'
for char in s:
  print(char)

A
B
C


If there were no for statement and we had to emulate the for machinery by hand with a while loop, this is what we’d have to write:

In [15]:
s = 'ABC'
# Build an iterator it from the iterable.
it = iter(s)
while True:
  try:
    # Repeatedly call next on the iterator to obtain the next item.
    print(next(it))
  except StopIteration:  # The iterator raises StopIteration when there are no further items.
    # Release reference to it — the iterator object is discarded.
    del it
    break

A
B
C


StopIteration signals that the iterator is exhausted. This exception is handled internally in for loops and other iteration contexts like list comprehensions, tuple unpacking etc.

The standard interface for an iterator has two methods:

- `__next__`: Returns the next available item, raising StopIteration when there are no more items.
- `__iter__`: Returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.

In [16]:
s3 = Sentence("Pig and Pepper")

# Obtain an iterator from s3.
it = iter(s3)

In [17]:
# next(it) fetches the next word.
next(it)

'Pig'

In [18]:
next(it)

'and'

In [19]:
next(it)

'Pepper'

In [20]:
# There are no more words, so the iterator raises a StopIteration exception.
next(it)

StopIteration

In [21]:
# Once exhausted, an iterator becomes useless.
list(it)

[]

In [23]:
# To go over the sentence again, a new iterator must be built.
list(iter(s3))

['Pig', 'and', 'Pepper']

Since the only methods required of an iterator are `__next__` and `__iter__`, there is no way to check whether there are remaining items, other than to call next() and catch StopIteration.

Also, it’s not possible to “reset” an iterator. If you need to start over, you need to call iter(…) on the iterable that built the iterator in the first place. 

Calling `iter(…)` on the iterator itself won’t help, because, as mentioned, `Iterator.__iter__` is implemented by returning self, so this will not reset a depleted iterator.
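
A short sketch of this behavior, using a plain list as the iterable (the variable names are chosen for the example):

```python
words = ["Pig", "and", "Pepper"]
it = iter(words)

# An iterator's __iter__ returns the iterator itself.
same_object = iter(it) is it

# Exhaust the iterator.
consumed = list(it)

# iter(it) is still the same depleted iterator, so nothing is produced.
after_exhaustion = list(iter(it))

# Calling iter() on the original iterable builds a fresh iterator.
fresh = list(iter(words))

print(same_object, after_exhaustion, fresh)
```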

## Sentence take #2: a classic iterator