## Iterators

Iteration is fundamental to data processing: programs apply computations to data series, from pixels to nucleotides. *If the data doesn’t fit in memory, we need to fetch the items lazily—one at a time and on demand*. That’s what an iterator does.  

Every standard collection in Python is iterable. *An iterable is an object that provides an iterator*, which Python uses to support operations like:
- `for` loops  
- List, dict, and set comprehensions  
- Unpacking assignments  
- Construction of collection instances  
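
As a small illustration of the operations listed above, the following sketch applies each of them to the same iterable. The `letters` string and the other variable names are arbitrary example values, not part of the original notes.

```python
letters = 'abc'  # any iterable works; a str is used here for brevity

# for loop
collected = []
for ch in letters:
    collected.append(ch)

# list, dict, and set comprehensions
upper_list = [ch.upper() for ch in letters]
positions = {ch: i for i, ch in enumerate(letters)}
unique = {ch for ch in letters}

# unpacking assignment
first, second, third = letters

# construction of a collection instance
as_tuple = tuple(letters)

print(collected, upper_list, positions, as_tuple)
```

In every case Python obtains an iterator from `letters` behind the scenes.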

## A Sequence of Words

In [5]:
# tag::SENTENCE_SEQ[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)  # `.findall` returns a list of strings with all
                                            # nonoverlapping matches of the regular expression.

    def __getitem__(self, index):
        return self.words[index]  # return the word at the given index.

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        """
        reprlib.repr is a utility function to generate abbreviated string
        representations of data structures that can be very large
        """
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
s = Sentence('"The time has come," the Walrus said,')
s

Sentence('"The time ha... Walrus said,')

In [6]:
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


In [7]:
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [9]:
s[0]

'The'

## Why Sequences Are Iterable: The `iter` Function

<span style="color:skyblue">***Whenever Python needs to iterate over an object `x`, it automatically calls `iter(x)`. The `iter` built-in function:***</span>
1. <span style="color:skyblue">*Checks whether the object implements `__iter__`, and calls that to obtain an iterator.*</span>
2. <span style="color:skyblue">*If `__iter__` is not implemented, but `__getitem__` is, then `iter()` creates an iterator that tries to fetch items by index, starting from 0 (zero). This is why all Python sequences are iterable: by definition, they all implement `__getitem__` (this is an extreme example of duck typing).*</span>
3. <span style="color:skyblue">*If that fails, Python raises `TypeError`, usually saying `'C'` object is not iterable, where `C` is the class of the target object.*</span>
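
The three steps above can be approximated in pure Python. This is only a rough sketch of the lookup order, not CPython's actual implementation; `iter_sketch` and `Letters` are hypothetical names introduced for illustration.

```python
def iter_sketch(obj):
    """Approximate the lookup order of iter(obj); not CPython's real code."""
    cls = type(obj)
    # Step 1: prefer __iter__ when the class implements it.
    if getattr(cls, '__iter__', None) is not None:
        return cls.__iter__(obj)
    # Step 2: fall back to __getitem__, fetching items by index from 0.
    if hasattr(cls, '__getitem__'):
        def fetch_by_index():
            index = 0
            while True:
                try:
                    item = obj[index]
                except IndexError:
                    return  # no item at this index: iteration ends
                yield item
                index += 1
        return fetch_by_index()
    # Step 3: neither method is available.
    raise TypeError(f'{cls.__name__!r} object is not iterable')


class Letters:
    """Hypothetical class that implements __getitem__ only."""
    def __getitem__(self, index):
        return 'abc'[index]


print(list(iter_sketch(Letters())))  # ['a', 'b', 'c']
print(list(iter_sketch({'x': 1})))   # ['x']
try:
    iter_sketch(42)
except TypeError as exc:
    print(exc)  # 'int' object is not iterable
```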

In [13]:
class Spam:
    def __getitem__(self, i):
        print('->', i)
        raise IndexError()
    

spam_can = Spam()
iter(spam_can)

<iterator at 0x7f2caedd93c0>

In [14]:
list(spam_can)

-> 0


[]

Although `spam_can` is iterable (its `__getitem__` could provide items), it is not recognized as such by an `isinstance` check against `abc.Iterable`.

In [15]:
from collections import abc

isinstance(spam_can, abc.Iterable)

False

`abc.Iterable` considers an object iterable if it implements the `__iter__` method:

In [16]:
class GooseSpam:
    def __iter__(self):
        pass

from collections import abc
print(f"{issubclass(GooseSpam, abc.Iterable) = }")

goose_spam_can = GooseSpam()
print(f"{isinstance(goose_spam_can, abc.Iterable) = }")

issubclass(GooseSpam, abc.Iterable) = True
isinstance(goose_spam_can, abc.Iterable) = True


**Note**: As of Python 3.10, the most accurate way to check whether an object x is iterable is to call `iter(x)` and handle a `TypeError` exception if it isn’t. This is more accurate than using `isinstance(x, abc.Iterable)`, because `iter(x)` also considers the legacy `__getitem__` method, while the Iterable ABC does not.
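
The check described in the note could be wrapped in a small helper. This is a minimal sketch; `is_iterable` and `OnlyGetitem` are hypothetical names, not part of the standard library.

```python
from collections import abc


def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class OnlyGetitem:
    """Hypothetical class relying on the legacy __getitem__ protocol."""
    def __getitem__(self, index):
        raise IndexError


legacy = OnlyGetitem()
print(is_iterable(legacy))               # True
print(isinstance(legacy, abc.Iterable))  # False
print(is_iterable(42))                   # False
```

The two checks disagree only for objects such as `legacy`, which support iteration through `__getitem__` alone.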

### Using `iter` with a Callable

We can call `iter()` with two arguments to create an iterator from a function or any callable object. In this usage, the first argument must be a callable to be invoked repeatedly (with no arguments) to produce values, and the second argument is a sentinel: a marker value which, when returned by the callable, causes the iterator to raise `StopIteration` instead of yielding the sentinel.

In [42]:
from random import randint

def d6():
    return randint(1, 6)

d6_iter = iter(d6, 1)

for roll in d6_iter:
    print(roll)

6
5
3


One useful application of the second form of `iter()` is to build a block-reader: 

```python
from functools import partial

with open('mydata.db', 'rb') as f:
    read64 = partial(f.read, 64)
    for block in iter(read64, b''):
        process_block(block)
```

## Iterables Versus Iterators

- **iterable**: Any object from which the `iter` built-in function can obtain an **iterator**. Objects implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that accepts 0-based indexes.

Python’s standard interface for an iterator has two methods:
- `__next__`: Returns the next item in the series, raising `StopIteration` if there are no more.
- `__iter__`: Returns `self`; this allows iterators to be used where an iterable is expected, for example, in a for loop.
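
To see the two methods at work, a `for` loop can be emulated by hand with `iter()` and `next()`. The sketch below uses an arbitrary example string and is only meant to illustrate the protocol.

```python
s = 'ABC'
it = iter(s)  # obtain an iterator from the iterable
items = []
while True:
    try:
        items.append(next(it))  # next(it) calls it.__next__()
    except StopIteration:       # raised when the iterator is exhausted
        break

print(items)           # ['A', 'B', 'C']
print(iter(it) is it)  # True: an iterator's __iter__ returns itself
print(list(it))        # []: an exhausted iterator stays exhausted
```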

<img src="../images/iterable-iterator.png" style="width: 50%;">


In [44]:
from collections.abc import Iterable
from abc import abstractmethod


def _check_methods(C, *methods):
    """
    traverses the `__mro__` of the class to check whether the methods
    are implemented in its base classes.
    """
    mro = C.__mro__
    for method in methods:
        for B in mro:
            if method in B.__dict__:
                if B.__dict__[method] is None:
                    return NotImplemented
                break
        else:
            return NotImplemented
    return True


class Iterator(Iterable):
    __slots__ = ()
    
    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise `StopIteration`'
        raise StopIteration
    
    def __iter__(self):
        return self
    
    @classmethod
    def __subclasshook__(cls, C):  # `__subclasshook__` supports structural type
                                   # checks with `isinstance` and `issubclass`.
        if cls is Iterator:
            return _check_methods(C, '__iter__', '__next__')
        return NotImplemented
    


## Sentence Classes with `__iter__`

### Sentence Take #2: A Classic Iterator

The next `Sentence` implementation follows the blueprint of the classic `Iterator` design pattern from the Design Patterns book.

In [49]:
"""
Sentence: iterate over words using the Iterator Pattern, take #1

WARNING: the Iterator Pattern is much simpler in idiomatic Python;
see: sentence_gen*.py.
"""

# tag::SENTENCE_ITER[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        """
        The `__iter__` method is the only addition to the previous Sentence implementation. 
        This version has no `__getitem__`, to make it clear that the class is iterable 
        because it implements `__iter__`
        """
        return SentenceIterator(self.words)  # `__iter__` fulfills the iterable protocol by 
                                            # instantiating and returning an iterator.


class SentenceIterator:

    def __init__(self, words):
        self.words = words  # holds a reference to the list of words
        self.index = 0  # determines the next word to fetch

    def __next__(self):
        try:
            word = self.words[self.index]  # Get the word at self.index
        except IndexError:
            raise StopIteration()  # If there is no word at self.index, raise StopIteration
        self.index += 1  # Increment self.index
        return word  # Return the word

    def __iter__(self):  # Implement self.__iter__
        return self
# end::SENTENCE_ITER[]

def main():
    import warnings

    word_number = 1
    with open('test.txt', 'rt', encoding='utf-8') as text_file:
        s = Sentence(text_file.read())
    for n, word in enumerate(s, 1):
        if n == word_number:
            print(word)
            break
    else:
        warnings.warn(f'last word is #{n}, {word!r}')

main()

Hello


### Don’t Make the Iterable an Iterator for Itself

<span style="color:orange">***A common cause of errors in building iterables and iterators is to confuse the two. To be clear: iterables have an `__iter__` method that instantiates a new iterator every time. Iterators implement a `__next__` method that returns individual items, and an `__iter__` method that returns `self`.***</span>

The “Applicability” section on the Iterator design pattern in the Design Patterns book says to use the Iterator pattern:
- to access an aggregate object’s contents without exposing its internal representation.
- to support multiple traversals of aggregate objects.
- to provide a uniform interface for traversing different aggregate structures (that is, to support polymorphic iteration).

To “support multiple traversals,” it must be possible to obtain multiple independent iterators from the same iterable instance, and each iterator must keep its own internal state. A proper implementation of the pattern therefore requires each call to `iter(my_iterable)` to create a new, independent iterator. That is why we need the `SentenceIterator` class in this example.
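
The difference can be observed with a built-in list, which behaves as a proper iterable. The following sketch uses an arbitrary example list; the variable names are illustrative only.

```python
words = ['alpha', 'beta', 'gamma']

# A list is an iterable: each iter() call builds a new, independent iterator.
it1 = iter(words)
it2 = iter(words)
next(it1)
next(it1)         # it1 has now consumed 'alpha' and 'beta'
print(next(it2))  # alpha: it2 keeps its own position

# An iterator's __iter__ returns self, so no independent traversal is possible.
it3 = iter(it1)
print(it3 is it1)  # True
print(next(it3))   # gamma: it3 shares the state of it1
```

If an iterable returned itself from `__iter__`, every traversal would share one position, as `it1` and `it3` do here.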

### Sentence Take #3: A Generator Function

A Pythonic implementation of the same functionality uses a generator, avoiding all the work to implement the `SentenceIterator` class.

In [None]:
"""
Sentence: iterate over words using a generator function
"""

# tag::SENTENCE_GEN[]
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words:  # Iterate over self.words.
            # Yield the current word. Explicit return is not necessary; the function can
            # just “fall through” and return automatically. Either way, a generator function 
            # doesn’t raise `StopIteration`: it simply exits when it’s done producing values
            yield word  

# done! No need for a separate iterator class!

# end::SENTENCE_GEN[]

## How a Generator Works
Any Python function that has the `yield` keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory.

In [50]:
def gen_123():
    yield 1
    yield 2
    yield 3

`gen_123` is a function object:

In [51]:
gen_123

<function __main__.gen_123()>

But when invoked, `gen_123()` returns a generator object:

In [52]:
gen_123()

<generator object gen_123 at 0x7f2c7fe13320>

Generator objects implement the `Iterator` interface, so they are also iterable:

In [53]:
for i in gen_123():
    print(i)

1
2
3


In [54]:
g = gen_123()

Because `g` is an iterator, calling `next(g)` fetches the next item produced by `yield`. When the generator function returns, the generator object raises `StopIteration`.

In [55]:
next(g)

1

In [56]:
next(g)

2

In [57]:
next(g)

3

In [58]:
next(g)

StopIteration: 

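The next example prints messages to show when each part of a generator function body runs relative to its `yield` statements: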

In [60]:
def gen_AB():
    print('start')
    yield 'A'
    print('continue')
    yield 'B'
    print('end.')

gab = gen_AB()
next(gab)

start


'A'

In [61]:
next(gab)

continue


'B'

In [62]:
next(gab)

end.


StopIteration: 

In [63]:
for c in gen_AB():
    print('-->', c)

start
--> A
continue
--> B
end.


<span style="color:green">*The generator-based `Sentence` (take #3) is more concise than the classic iterator version (take #2), but it’s not as lazy as it could be. Nowadays, laziness is considered a good trait, at least in programming languages and APIs. A lazy implementation postpones producing values to the last possible moment. This saves memory and may avoid wasting CPU cycles, too.*</span>

## Lazy Sentences

### Sentence Take #4: Lazy Generator
The `Iterator` interface is designed to be lazy: `next(my_iterator)` yields one item at a time. The opposite of lazy is eager: lazy evaluation and eager evaluation are technical terms in programming language theory.
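
The contrast between eager and lazy evaluation can be seen with two methods of a compiled regular expression: `findall` builds the complete list immediately, while `finditer` returns an iterator that produces match objects on demand. This is only a small sketch of the idea, reusing the `RE_WORD` pattern and the sample text from the earlier examples.

```python
import re

RE_WORD = re.compile(r'\w+')
text = '"The time has come," the Walrus said,'

eager = RE_WORD.findall(text)  # builds the whole list of words right away
lazy = RE_WORD.finditer(text)  # returns an iterator; matches are produced on demand

print(eager)               # ['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']
first = next(lazy)         # only now is the first match produced
print(first.group())       # The
print(next(lazy).group())  # time
```

With a very long text, the eager call pays for every match up front, whereas the lazy iterator does only as much work as the consumer asks for.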