## Iterators

Python and many other programming languages provide a unified way to process elements of a container value sequentially, called an `iterator`. An `iterator` is an object that provides sequential access to values, one by one.

The iterator abstraction has two components: a mechanism for retrieving the next element in the sequence being processed and a mechanism for signaling that the end of the sequence has been reached and no further elements remain. For any container, such as a list or range, an iterator can be obtained by calling the built-in iter function. The contents of the iterator can be accessed by calling the built-in next function.

In [13]:
primes = [2, 3, 5, 7]
type(primes)

list

In [14]:
iterator = iter(primes)

In [15]:
type(iterator)

list_iterator

In [16]:
next(iterator)

2

In [17]:
next(iterator)

3

The way that Python signals that there are no more values available is to raise a `StopIteration` exception when next is called. This exception can be handled using a try statement.

In [18]:
next(iterator)
next(iterator)
next(iterator)

StopIteration: 

In [19]:
try:
    next(iterator)
except StopIteration:
    print("No more values")

No more values
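
As an alternative to a try statement, the built-in `next` function accepts an optional second argument, which is returned instead of raising `StopIteration` once the iterator is exhausted. The following is a minimal sketch; the names `letters`, `first`, `second`, and `done` are illustrative and do not appear elsewhere in this text.

```python
letters = iter(['a', 'b'])

first = next(letters, None)    # 'a'
second = next(letters, None)   # 'b'
done = next(letters, None)     # None: the iterator is exhausted, no exception

print(first, second, done)
```

A default value is convenient when exhaustion is an expected outcome, but it only works if the default can never be mistaken for a real element of the sequence.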


An iterator maintains local state to represent its position in a sequence. Each time `next` is called, that position advances. Two separate `iterators` can track two different positions in the same sequence. However, two names for the same iterator will share a position, because they share the same value.

In [20]:
r = range(3, 13)
s = iter(r)  # 1st iterator over r
next(s)

3

In [21]:
next(s)

4

In [22]:
t = iter(r)   # 2nd iterator over r

In [23]:
next(t)

3

In [24]:
next(t)

4

In [25]:
u = t        # Alternate name for the 2nd iterator
next(u)

5

Advancing the second iterator does not affect the first. Since the last value returned from the first iterator was 4, it is positioned to return 5 next. On the other hand, the second iterator is positioned to return 6 next.

In [26]:
next(s)

5

In [27]:
next(t)

6

Calling iter on an iterator will return that iterator, not a copy. This behavior is included in Python so that a programmer can call iter on a value to get an iterator without having to worry about whether it is an iterator or a container.

In [28]:
v = iter(t)  # Another alternate name for the 2nd iterator
next(v)

7
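
One way to confirm that `iter` returns the same object rather than a copy is an identity check with `is`. The sketch below uses fresh, illustrative names (`numbers` and `same`) rather than the iterators defined above.

```python
numbers = iter(range(3, 13))
same = iter(numbers)       # calling iter on an iterator

print(same is numbers)     # True: both names refer to one iterator
next(numbers)              # advances the shared position past 3
print(next(same))          # 4
```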

The usefulness of iterators is derived from the fact that the underlying series of data for an iterator may not be represented explicitly in memory. An iterator provides a mechanism for considering each of a series of values in turn, but those elements do not all need to be stored simultaneously. Instead, when the next element is requested from an iterator, that element may be computed on demand rather than retrieved from an existing location in memory.

Ranges are able to compute the elements of a sequence lazily because the sequence represented is uniform, and any element is easy to compute from the starting and ending bounds of the range. Iterators allow for lazy generation of a much broader class of underlying sequential datasets, because they do not need to provide access to arbitrary elements of the underlying series. Instead, iterators are only required to compute the next element of the series, in order, each time another element is requested. While not as flexible as accessing arbitrary elements of a sequence (called random access), sequential access to sequential data is often sufficient for data processing applications.
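
The difference between random access and sequential access can be sketched with a very large range. The names `big` and `position` are illustrative. Neither the range nor its iterator stores the elements, but only the range supports indexing.

```python
big = range(10**12)        # describes a trillion integers; none are stored
position = iter(big)       # sequential access only

print(next(position))      # 0
print(next(position))      # 1
print(big[10**11])         # 100000000000: a range supports random access

try:
    position[10**11]       # an iterator does not
except TypeError:
    print("iterators do not support indexing")
```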

## Iterables

Any value that can produce iterators is called an iterable value. In Python, an iterable value is anything that can be passed to the built-in iter function. Iterables include sequence values such as strings and tuples, as well as other containers such as sets and dictionaries. Iterators are also iterables, because they can be passed to the iter function.
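
This definition suggests a simple test for iterability: try to pass the value to `iter`. The helper `is_iterable` below is a hypothetical function written for illustration, not part of Python.

```python
def is_iterable(value):
    """Hypothetical helper: report whether the built-in iter accepts value."""
    try:
        iter(value)
    except TypeError:       # raised by iter for non-iterable values
        return False
    return True

print(is_iterable("abc"))          # True: strings are sequences
print(is_iterable({1, 2, 3}))      # True: sets are containers
print(is_iterable(iter([1, 2])))   # True: iterators are iterable too
print(is_iterable(42))             # False: an int cannot produce an iterator
```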

Even collections that are not sequences, such as sets and dictionaries, must define an ordering over their contents when they produce iterators. A set makes no promise about that order. For dictionaries, Python's specification guarantees (since version 3.7) that keys are iterated in the order in which they were inserted.

In [29]:
d = {'one': 1, 'two': 2, 'three': 3}

In [30]:
k = iter(d)
next(k)

'one'

In [31]:
next(k)

'two'

In [32]:
v = iter(d.values())
next(v)

1

If a dictionary changes in structure because a key is added or removed, then all existing iterators over it become invalid, and advancing one of them may raise a `RuntimeError`. On the other hand, changing the value of an existing key does not change the order of the contents or invalidate iterators.

In [33]:
d.pop('two')
next(k)

RuntimeError: dictionary changed size during iteration
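
In contrast to the error above, updating the value of an existing key leaves an iterator usable. The following sketch uses an illustrative dictionary named `ages` so that it does not depend on the state of `d`.

```python
ages = {'ann': 30, 'bob': 25}   # illustrative data
keys = iter(ages)

first = next(keys)              # 'ann'
ages['bob'] = 26                # update an existing key: structure is unchanged
second = next(keys)             # 'bob': the iterator is still valid

print(first, second, ages['bob'])
```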

## Built-in Iterators

Several built-in functions take as arguments iterable values and return iterators. These functions are used extensively for lazy sequence processing.

The map function is lazy: calling it does not perform the computation required to compute elements of its result. Instead, an iterator object is created that can return results if queried using next. We can observe this fact in the following example, in which the call to print is delayed until the corresponding element is requested from the `doubled` iterator.

In [34]:
def double_and_print(x):
    print("***", x, "=>", 2 * x, "***")
    return 2 * x


s = range(3, 7)
doubled = map(double_and_print, s)  # double_and_print not yet called
next(doubled)  # double_and_print called once

*** 3 => 6 ***


6

In [35]:
next(doubled)                       # double_and_print called again

*** 4 => 8 ***


8

In [36]:
list(doubled)                       # double_and_print called twice more

*** 5 => 10 ***
*** 6 => 12 ***


[10, 12]

The `filter` function returns an iterator over the elements of an iterable for which a given function returns a true value. The `zip` and `reversed` functions also return iterators.
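
A short sketch of these three built-in functions follows. The names `digits`, `evens`, `pairs`, and `backward` are illustrative. As with `map`, no element is computed until it is requested.

```python
digits = [1, 2, 3, 4]

evens = filter(lambda x: x % 2 == 0, digits)   # lazy: nothing tested yet
pairs = zip(digits, 'abcd')                    # lazy: pairs built on demand
backward = reversed(digits)                    # iterates from last to first

print(next(evens))       # 2
print(next(pairs))       # (1, 'a')
print(list(backward))    # [4, 3, 2, 1]
```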

## For Statements

The for statement in Python operates on iterators. Objects are iterable (an interface) if they have an `__iter__` method that returns an iterator. Iterable objects can be the value of the `<expression>` in the header of a for statement:

```python
for <name> in <expression>:
    <suite>
```

To execute a for statement, Python evaluates the header `<expression>`, which must yield an iterable value. Then, the `__iter__` method is invoked on that value to obtain an iterator. Until a `StopIteration` exception is raised, Python repeatedly invokes the `__next__` method on that iterator, binds the result to the `<name>` in the for statement, and executes the `<suite>`.

In [37]:
counts = [1, 2, 3]
for item in counts:
    print(item)

1
2
3


In the above example, the counts list returns an iterator from its `__iter__()` method. The for statement then calls that iterator's `__next__()` method repeatedly, and assigns the returned value to item each time. This process continues until the iterator raises a StopIteration exception, at which point execution of the for statement concludes.

With our knowledge of iterators, we can implement the execution rule of a for statement in terms of while, assignment, and try statements.

In [38]:
items = counts.__iter__()
try:
    while True:
        item = items.__next__()
        print(item)
except StopIteration:
    pass

1
2
3


Above, the iterator returned by invoking the `__iter__` method of counts is bound to a name items so that it can be queried for each element in turn. The handling clause for the `StopIteration` exception does nothing, but handling the exception provides a control mechanism for exiting the while loop.

To use an iterator in a for loop, the iterator must also have an `__iter__` method. The Iterator types (http://docs.python.org/3/library/stdtypes.html#iterator-types) section of the Python docs suggests that an iterator have an `__iter__` method that returns the iterator itself, so that all iterators are iterable.
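
A minimal sketch of a user-defined iterator that follows this convention is shown below. The class name `Letters`, its `start` and `end` parameters, and its `current` attribute are assumptions made for illustration. The class yields consecutive letters from `start` up to, but not including, `end`.

```python
class Letters:
    """Illustrative iterator over the letters from start up to (excluding) end."""

    def __init__(self, start='a', end='e'):
        self.current = start    # tracks the position in the sequence
        self.end = end

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration  # signals that no elements remain
        result = self.current
        self.current = chr(ord(result) + 1)
        return result

    def __iter__(self):
        return self              # an iterator is its own iterable


for letter in Letters('a', 'd'):
    print(letter)                # prints a, b, c on separate lines
```

Because `__iter__` returns `self`, a `Letters` instance can be used once in a for statement. A second loop over the same instance would find it already exhausted.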

## Generators and Yield Statements

The Letters and Positives objects above require us to introduce a new field self.current into our object to keep track of progress through the sequence. With simple sequences like those shown above, this can be done easily. With complex sequences, however, it can be quite difficult for the `__next__` method to save its place in the calculation. Generators allow us to define more complicated iterations by leveraging the features of the Python interpreter.

A generator is an iterator returned by a special class of function called a generator function. Generator functions are distinguished from regular functions in that, rather than containing return statements in their body, they use `yield` statements to return elements of a series.

Generators do not use attributes of an object to track their progress through a series. Instead, they control the execution of the generator function, which runs until the next yield statement is executed each time the generator's `__next__` method is invoked. The Letters iterator can be implemented much more compactly using a generator function.
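
A minimal sketch of such a generator function follows, assuming that the Letters iterator yields consecutive lowercase letters from a start letter up to, but not including, an end letter. The name `letters_generator` and its parameters are illustrative.

```python
def letters_generator(start='a', end='e'):
    """Illustrative generator function yielding letters from start up to end."""
    current = start
    while current < end:
        yield current                      # execution pauses here until next is called
        current = chr(ord(current) + 1)    # resumes here on the following call


letters = letters_generator('a', 'd')
print(type(letters).__name__)   # generator
print(next(letters))            # a
print(list(letters))            # ['b', 'c']
```

The local name `current` plays the role that an instance attribute would play in a class-based iterator. The interpreter preserves it between calls to `next`.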