# Iterables and Iterators
### _or_
## Round and Round the Mulberry Bush

As a Python programmer you may have heard the terms _iterable_ and _iterator_ and wondered
what the difference between them is.
I started asking people that question in telephone interviews a while back, thinking
that it was a good way to discriminate between more- and less-experienced Python users.
Then I discovered that people I regard as extremely competent Pythonistas still
had some confusion about the issue, so I wrote this to try and help clear up that confusion.

The shortest way I can think of to describe the essentials of an iterable is "something you
can iterate over any number of times," whereas an iterator is "something you can iterate
over once." Many objects in Python are iterables - lists, dicts, strings, and so on.
But iterables aren't iterators!
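Here is a minimal sketch of that difference. The `colours` list is a hypothetical example of mine, not one of the names used in the rest of this article: the list can be traversed as often as you like, while an iterator obtained from it gives up its values only once.

```python
# Hypothetical names, chosen only for this illustration.
colours = ["red", "green", "blue"]   # a list is an iterable
colour_iter = iter(colours)          # an iterator over that list

first_pass = list(colours)
second_pass = list(colours)          # the list can be traversed again

first_drain = list(colour_iter)
second_drain = list(colour_iter)     # the iterator is already exhausted

print(first_pass == second_pass)     # True
print(first_drain, second_drain)     # ['red', 'green', 'blue'] []
```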

The two are closely related: each time you iterate over an iterable, the interpreter
actually creates a new iterator for the iteration, and loops over that.
The mechanism is quite simple, and understanding the details helps you write
better code.

Let's begin by defining a simple function that can be used as a loop body, and
a sample iterable (all Python containers are iterables - remember this if you
have to write a container type).

In [1]:
def do_something_with(o):
    print(f"--- {o} ---")

test_list = ["Roberta", "Tom", "Alice"]

## How Iteration Started Out

It helps to remember that

    a = x[s]

is in Python merely a (very welcome and highly comprehensible) shorthand for

    a = x.__getitem__(s)

Originally (well, certainly in Python 1.5), `for` loop iterations over an object `x` were quite simplistic.

The interpreter would internally initialise a hidden integer variable to zero,
then repeatedly index `x` using the hidden variable as an index
(by calling `x`'s `__getitem__` method with the hidden variable as an argument).

The hidden variable was incremented to produce successive values
until the `__getitem__` call produced an `IndexError` exception,
causing the loop to terminate normally.

Internally, then, an iteration like

    for i in test_list:
        do_something_with(i)

would be handled by something like the C equivalent of the following code.
(This article is for Python users so it's in Python to explain the logic.
Python is open source, so if you want to read the C source code of the actual interpreter you can).

In [3]:
_private_var = 0
while True:
    try:
        i = test_list.__getitem__(_private_var)
    except IndexError:
        break
    do_something_with(i)
    _private_var += 1       

--- Roberta ---
--- Tom ---
--- Alice ---


This was Python's original _iteration protocol_.

It was easy to understand, but worked only for objects that could be
numerically subscripted, making it possible to iterate over tuples,
lists, and other sequence types.

To iterate over a dictionary, however, required you to extract a list
of its keys and then iterate over that instead.

### Writing an Original-Style Iterable

Python's emphasis on backwards compatibility means that this protocol is still supported.

You can verify this by writing your own class whose instances obey the old protocol.

In [4]:
class Stars():
    "Class with only __init__ and __getitem__."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        print("Getting item:", index)
        if index > self.N:
            raise IndexError
        return "*" * index

s = Stars(4)

for v in s:
    do_something_with(v)

Getting item: 0
---  ---
Getting item: 1
--- * ---
Getting item: 2
--- ** ---
Getting item: 3
--- *** ---
Getting item: 4
--- **** ---
Getting item: 5


As you can see, even a modern interpreter is perfectly happy to iterate over your `Stars` instances,
old-fashioned or not.

## Enter the Iterable

To overcome the limitations of this old protocol, and specifically to allow iteration
over objects that can't be numerically indexed, a newer protocol was
introduced and objects that obeyed it were classified as _iterable_.

To follow the new protocol (_i.e._ to be an iterable) an object _must_ have an `__iter__` method.

As the next cell shows, lists have been updated to use the new protocol.

In [6]:
hasattr([], '__iter__')

True

A natural question is "what type of object does the `__iter__` call return?"

In [8]:
x = [3, 1, 2].__iter__()
x

<list_iterator at 0x110f984e0>

For the newer-style iterables the `__iter__` method returns an _iterator_
for use by this particular iteration. To execute the code

    for i in test_list: # or some other iterable
        do_something_with(i)

the interpreter uses the iterator in the following way.

In [9]:
_iterator = test_list.__iter__()
try:
    while True:
        i = _iterator.__next__()
        do_something_with(i)
except StopIteration:
    pass

--- Roberta ---
--- Tom ---
--- Alice ---


Originally, the interpreter made repeated calls to `__getitem__` until `IndexError` was raised.
In the new protocol, the interpreter instead repeatedly calls an iterator's `__next__` method
(in Python 2, its `next` method) until `StopIteration` is raised.

This removes any reliance on numerical subscripting, allowing dictionaries
and sets to become iterable (dictionaries in Python 2.2, when the iterator
protocol was introduced; the built-in `set` type followed in Python 2.4).
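As a quick sketch of what that means in practice, the snippet below drives a dictionary and a set through the new protocol by hand. The `ages` and `initials` objects are hypothetical sample data, and the ordering of the dictionary keys assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
# Hypothetical sample data, used only for this illustration.
ages = {"Roberta": 41, "Tom": 35}
initials = {"R", "T"}

key_iterator = ages.__iter__()        # a dictionary's iterator yields its keys
first_key = key_iterator.__next__()

print(first_key)                      # Roberta
print(sorted(initials.__iter__()))    # sets are unordered, so sort for display
```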

If the object has no `__iter__` method, the interpreter
simply falls back to the old protocol.
That's why the `Stars` class above (for which `__iter__` is not implemented),
functions as expected.
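The fallback can also be observed through the built-in `iter` function. In this sketch `Squares` is a hypothetical class of my own with `__getitem__` but no `__iter__`; `iter` still manages to build an iterator for it by indexing from zero until `IndexError` is raised.

```python
class Squares:
    """Hypothetical sequence-like class: __getitem__ only, no __iter__."""
    def __init__(self, n):
        self.n = n
    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)   # signals the end of the sequence
        return index * index

sq = Squares(4)
print(hasattr(sq, "__iter__"))        # False: the new protocol is not implemented
values = list(iter(sq))               # iter() falls back to the old protocol
print(values)                         # [0, 1, 4, 9]
```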

If there's no `__getitem__` method either, the interpreter just raises a `TypeError` exception,
on the not unreasonable grounds that there's no way to iterate over the given value.

In [14]:
print(hasattr(None, "__iter__"), hasattr(None, "__getitem__"))

False False


In [15]:
for i in None:
    do_something_with(i)

TypeError: 'NoneType' object is not iterable

Each time through the loop, the interpreter extracts the next value from the iterator
by calling its `__next__` method (in Python 2 the method was called `next`,
a design flaw because the name failed to mark it as a special method;
it was renamed in Python 3).
In the case above, the results of the `__next__` call are
successively bound to `i`, until `__next__` raises a `StopIteration` exception,
which is used to terminate the loop normally - the exception is caught internally by
the interpreter's `for` implementation, and not passed to the user's code.

What actually happens is the equivalent of the following code, although no
new variable is introduced into the Python namespace.

In [None]:
_ = test_list.__iter__()  # creates an iterator
while True:
    try:
        i = _.__next__()  # Python 2: _.next()
    except StopIteration: # iterator is exhausted
        break
    do_something_with(i)

The easiest way to determine if the new iteration protocol will work on an object is simply
to see whether it has an `__iter__` method. If it does, then it's an iterable.
Lists are iterables, for example:

In [None]:
hasattr(test_list, "__iter__")

So what kind of an object does a call to that method return?
A specific kind of iterator called a _list iterator_.

In [None]:
li = test_list.__iter__()
print(li, type(li))

### Recognizing Iterators

The list iterator object's attributes verify that it is indeed an iterator,
which is simply to say that it provides both the `__iter__` and `__next__`
(Python 2, remember, `next`) methods.

In [None]:
print(hasattr(li, '__iter__'), hasattr(li, '__next__'))

Note that the list itself, while it _is_ an iterable
(_i.e._ it implements the `__iter__` method, which returns an iterator),
is not itself an iterator because it has no `__next__` method.

In [None]:
print(hasattr(test_list, '__iter__'), hasattr(test_list, '__next__'))

When iterating over an iterable the interpreter calls the iterable's `__iter__()` method,
which returns an _iterator_ whose `__next__` method returns the iterator's successive
values and raises `StopIteration` when there are no more values.

Note carefully that the iterator is not the same object as the iterable,
and that each call to an iterable's `__iter__` method creates a brand-new iterator.
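A small sketch makes the independence of those iterators visible. The `letters` list is hypothetical sample data; advancing one iterator has no effect on the position of the other.

```python
letters = ["a", "b", "c"]     # hypothetical sample data
first = iter(letters)
second = iter(letters)

next(first)                   # advance only the first iterator
next(first)

print(first is second)                # False: two separate iterator objects
print(next(first), next(second))      # c a
```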

### Iterating over Iterators

Using iterators rather than iterables in nested `for` loops gives odd results, because an
inner loop can exhaust its iterator the first time through the outer loop.
To see why, start by creating two iterators from the same list: the `id` values show that they
are distinct objects, and that neither is the list itself.

In [None]:
iterator_1 = iter(test_list) # same as test_list.__iter__()
iterator_2 = iter(test_list)
print(id(test_list), id(iterator_1), id(iterator_2), sep="\n")

When you loop over an iterable, the call to the iterable's `__iter__()` method is made
automatically.
Because each call to an iterable's `__iter__` creates a new iterator, nested looping works as we
expect it to.

In [None]:
for i in test_list:
    for j in test_list:
        do_something_with(i + ":" + j)

Using iterators in the iteration gives us a rather different result, however.
This is because the first iteration of the outer loop exhausts the inner iterator,
which thereafter refuses to provide further values.

In [None]:
iterator_1 = iter(test_list)
iterator_2 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_2:
        print("inner loop")
        do_something_with(i + ":" + j)

Using the same iterator in both loops gives even odder results.
In this case, `iterator_1` has already yielded its first value when the inner loop
starts, and the inner loop then exhausts it completely. The outer loop therefore terminates
after only a single iteration because the iterator is exhausted.

In [None]:
iterator_1 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_1:
        print("inner loop")
        do_something_with(i + ":" + j)

Why do these looping constructs fail to perform as you might expect?
Whereas the iterable's `__iter__()` returns a new iterator on each call,
an iterator's `__iter__()` method simply returns `self`.

In [None]:
it_3 = iter(test_list)
print(id(test_list),
      id(it_3),
      id(iter(it_3)),
      it_3 is iter(it_3), sep="\n")  # compare the objects themselves, not their ids

### Why We Need Iterables

Technically, iterators _are_ iterables because they too have
an `__iter__` method.
Most iterators' `__iter__` methods, however, don't create a new object,
but simply return the iterator itself, which can lead to some unexpected
consequences.

You can use the built-in `iter` function which, when called with a single argument\*,
is equivalent to calling that argument's `__iter__` method.
Knowing equivalences like this is the start of understanding how the interpreter works.

Remember that calling the `iter` function (with a single argument) is
equivalent to calling the argument's `__iter__` method.

### `iter(o) == o.__iter__()`

Once you have created an iterator, remember, you can always use the `next` function to
extract a value from the iteration sequence.
Like `iter`, the `next` function simply calls its argument's `__next__` method.
Which is to say

### `next(o) == o.__next__()` 
or, in Python 2:
### `next(o) == o.next()`

This is completely in line with Python's usual practice of providing built-in functions that delegate to special methods,
allowing you to tailor the behaviour of your own objects by writing those methods.
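To watch the delegation happen, here is a sketch using a hypothetical `Countdown` class that records each special-method call in a `calls` list. Both the class and the attribute are inventions for this illustration.

```python
class Countdown:
    """Hypothetical iterator that records which special methods get called."""
    def __init__(self, start):
        self.current = start
        self.calls = []
    def __iter__(self):
        self.calls.append("__iter__")
        return self
    def __next__(self):
        self.calls.append("__next__")
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

c = Countdown(2)
same_object = iter(c) is c          # iter() calls __iter__, which returns self
values = [next(c), next(c)]         # each next() calls __next__
print(same_object, values)          # True [2, 1]
print(c.calls)                      # ['__iter__', '__next__', '__next__']
```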

### Iterators Aren't a Silver Bullet

It's possible to use iterators directly using explicit calls to `next`
rather than using the interpreter's `for`-handling logic.
But it's important to be aware that this can raise a `StopIteration` exception
that won't be handled by the iteration logic, because the exception
doesn't occur during the operation of the `for` logic but inside the loop body.

In [None]:
it_4 = iter(["one", "two", "three", "four"])
it_5 = iter(["one", "two", "three", "four", "five"])
for iterator in it_4, it_5:
    print("++ New iterator ++")
    for item_1 in iterator:
        item_2 = next(iterator)
        do_something_with(item_1+":"+item_2)
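One way round the problem, sketched here on the assumption that padding the final pair is acceptable, is to give `next` a second argument: a default that is returned instead of raising `StopIteration`. The `pairs` helper and its `filler` parameter are hypothetical names.

```python
def pairs(items, filler=None):
    """Hypothetical helper: group items in twos, padding the last pair if needed."""
    iterator = iter(items)
    result = []
    for first in iterator:
        second = next(iterator, filler)   # the default suppresses StopIteration
        result.append((first, second))
    return result

print(pairs(["one", "two", "three", "four", "five"]))
# [('one', 'two'), ('three', 'four'), ('five', None)]
```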

## Writing Your Own Iterators and Iterables

We've observed that iterators have both an `__iter__` and a `__next__` method,
while iterables such as lists have only the former.
This is why those iterables aren't iterators.
It also allows us to write functions to tell the two kinds of object apart.

In [None]:
def is_iterable(o):
    "Return True if o is an iterable that is not itself an iterator."
    return hasattr(o, "__iter__") and not hasattr(o, "__next__")

def is_iterator(o):
    "Return True if o is an iterator (it has both __iter__ and __next__)."
    return hasattr(o, "__iter__") and hasattr(o, "__next__")

In [None]:
test_it = iter(test_list)
is_iterable(test_list), is_iterator(test_list), is_iterable(test_it), is_iterator(test_it)

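This lets us investigate various object types to determine their iteration properties.
Note that `is_iterable` deliberately excludes iterators: although an iterator technically
has an `__iter__` method, it can only be iterated over once, so here "iterable" means
an object that hands out a fresh iterator each time it is asked.
Now that you understand the iteration protocols, you might be wondering how to write your
own iterables and iterators.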


### The Basic Iterator Pattern

Let's begin with iterators: despite the slightly more complex interface it's easier
to start there, because the objects you produce only have to handle a single
iteration, whereas iterables have to handle multiple, possibly simultaneous iterations.

The first example is not intended to be complex: iterate over a string, producing each character N times.
I chose this example because it isn't particularly convenient to write: essentially
a nested loop is required, but the "position in the loop" has to be maintained in
instance variables between calls to `__next__`.

In [None]:
class MyIterator:
    "An iterator to produce each character of a string N times."
    def __init__(self, s, N):
        self.s = s
        self.N = N
        self.pos = self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.pos == len(self.s):
            raise StopIteration
        result = self.s[self.pos]
        self.count += 1
        if self.count == self.N:
            self.pos += 1
            self.count = 0
        return result

The `__iter__` method seems so simple you might wonder why it's there at all.
The answer is obvious once you understand the iteration protocol: to make it iterable.

If iterators weren't iterable there'd be no way to iterate over them using the `for`
construct, since the first thing the interpreter does to iterate over an iterable
is to generate an iterator from it, by calling its `__iter__` method.

In [None]:
for s in MyIterator("abc", 2):
    do_something_with(s)

As you might expect, if you use these iterators in a nested loop you
get the usual funky results.

In [None]:
it_6 = MyIterator("*+", 3)
it_7 = MyIterator("=-", 3)
for c1 in it_6:
    print("iterating over c1:", c1)
    for c2 in it_7:
        do_something_with(c1+":"+c2)

In that case we just used the two iterators in nested loops.
The great thing about iterators is that they allow you to compute
the next value in a sequence _when it's required_, rather than having
both to precompute all the values in advance _and_ to create a container
to store them all in.

That's why the new iteration protocol was such a win for Python.
Any function that takes an iterable argument can be passed either
an iterable _or_ an iterator.
Pretty much any function that iterates over a container can be part of
a program structure where values are computed on demand.
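As a sketch of on-demand computation, the hypothetical `Naturals` class below is an iterator that never ends, so its values could not possibly be stored in a container first. The standard library's `itertools.islice` takes only as many values as it needs, and `sum` happily consumes the resulting iterator.

```python
import itertools

class Naturals:
    """Hypothetical endless iterator: 0, 1, 2, ... each computed on demand."""
    def __init__(self):
        self.n = 0
    def __iter__(self):
        return self
    def __next__(self):
        value = self.n
        self.n += 1
        return value

first_five = list(itertools.islice(Naturals(), 5))   # only five values are ever computed
total = sum(itertools.islice(Naturals(), 4))          # 0 + 1 + 2 + 3
print(first_five, total)                              # [0, 1, 2, 3, 4] 6
```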

### The Advantages of Generators

It's often inconvenient to write a class to create iterators.
The next snippet defines a so-called _generator function_: it counts as one because its body
contains at least one `yield` expression.

In [None]:
def rangedown(n):
    for i in reversed(range(n)):
        yield i

A call to the generator function returns a generator.
You'll notice that our functions classify it as an iterator, and not as a (re-usable) iterable.

In [None]:
generator = rangedown(5)

type(generator), is_iterable(generator), is_iterator(generator)

Generator functions are often a more convenient way to create iterators
than class definitions, since the iteration protocol conformance is built-in.
Just call the function and it returns a generator.
Since generators are a special case of iterators,
the job is done without any need for a class definition.

In [None]:
for x in generator:
    print(x)
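For comparison, here is a sketch of how the job done by the `MyIterator` class earlier might look as a generator function. The name `repeat_chars` is hypothetical; the point is that the nested loop can be written directly, with no position or count to maintain by hand.

```python
def repeat_chars(s, n):
    """Hypothetical generator function: yield each character of s, n times."""
    for ch in s:
        for _ in range(n):
            yield ch          # execution pauses here until the next value is requested

print(list(repeat_chars("abc", 2)))   # ['a', 'a', 'b', 'b', 'c', 'c']
```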

The next cell creates a _generator expression_, another object you may not
be familiar with.
Its value is also a generator, and therefore an iterator too.
Generator expressions are an even more compact way of producing iterators.

In [None]:
genexp = (i*2 for i in range(5))
type(rangedown), type(generator), type(genexp)

So we can iterate (once) over `genexp` just like any other iterator.

In [None]:
for o in genexp:
    print(o)

### The Basic Iterable Pattern

So the next question is, how do we produce an _iterable_?
The answer is to have its `__iter__` method produce a new iterator each time it's called.
Suppose you wanted to create _multi-iterable strings_,
a type of object I have just invented, which behave pretty much like strings except
that when you iterate over them they produce each character a number of times.

In [None]:
class MIString(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        return MyIterator(self, self.N)

The `__new__` method has to be provided because otherwise it would be inherited from `str`,
which would cause a signature mismatch; so it returns exactly what `str(value)` would
return.
The overridden `__init__`, however, adds an extra attribute to the object, and the
overridden `__iter__` method returns a newly-created `MyIterator` object.

In [None]:
[s for s in MIString("xyz", 3)]

Because the `__iter__` method always returns a new iterator, `MIString` objects are true
iterables, and work as expected in nested loops.

In [None]:
x = MIString("01", 2)
for c1 in x:
    for c2 in x:
        print(c1, c2)

Because the `__new__` method returns an instance of a `str` subclass (which is then given its
extra attribute by the `__init__` method) the objects behave exactly like strings except when iterated over.

In [None]:
print(x, x+x, 3*x)

Our functions tell us what is what.

In [None]:
is_iterable(x), is_iterator(x), is_iterable(iter(x)), is_iterator(iter(x))

One of the disadvantages of class-based iterators is the inconvenience of having
to maintain state inside the instances.
As you saw in the case of the `MyIterator` class, this led to having to write
loops manually.
In the `MIString` class we used an `__iter__` method that returned a `MyIterator` instance,
so the complexity was hidden (though without a separate definition we might have had to
implement a nested class).
Let's see how we could achieve the same end with a generator function (strictly, I suppose,
a generator method).

In [None]:
class MIString2(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        for c in str(self):
            for i in range(self.N):
                yield c

In this implementation the `__iter__` method is itself a generator function, so
calling it returns a generator (_i.e._ an iterator, as we shall shortly confirm).
The only complexity is that the function can't iterate over `self` because that would
create an infinitely recursive call to the `__iter__` method.

Of course the implementation could as easily have been external, but the internal
definition easily picks up the instance variables because the instance (`self`) is passed
to the method as its argument.
Ultimately `MIString2` works in exactly the same way as `MIString`, but its
internal logic is considerably simpler.
Don't ignore generator functions; they can be very useful indeed!

In [None]:
[c for c in MIString2("abcde", 3)]

### Python Iterables

Now let's apply our functions to determine whether we've created any iterables and/or
iterators.
You will see that nothing passes the `is_iterable` test, but both generators are iterators.
In fact a generator is just a particular form of iterator.

In [None]:
for function in is_iterable, is_iterator:
    for value in rangedown, generator, genexp:
        print(function.__name__, value, ":", function(value))

Dicts, tuples, sets and strings are all iterables.

In [None]:
is_iterable({}), is_iterable(()), is_iterable(set()), is_iterable("")
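The standard library offers related checks through the abstract base classes in `collections.abc`. This sketch uses hypothetical `sample` data. Note that, unlike the `is_iterable` function above, `Iterable` also accepts iterators, and as far as I know it does not recognise classes that only implement the old `__getitem__` protocol.

```python
from collections.abc import Iterable, Iterator

sample = [1, 2, 3]            # hypothetical sample data
sample_iter = iter(sample)

list_checks = (isinstance(sample, Iterable), isinstance(sample, Iterator))
iter_checks = (isinstance(sample_iter, Iterable), isinstance(sample_iter, Iterator))

print(list_checks)            # (True, False): a list is iterable but not an iterator
print(iter_checks)            # (True, True): an iterator counts as both
```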

Be careful with list comprehensions.
Although they are essentially generator expressions surrounded by brackets,
this doesn't mean that you can simply put brackets round a generator (or, indeed, any other
iterator) - you just get a list containing the iterator!
The secret is to omit the parentheses around the generator expression - this specific syntax
is what constitutes a comprehension.

In [None]:
[i for i in range(10)], [(i for i in range(10))]

Some functions iterate over their argument, which can therefore be either an
iterable _or_ an iterator (though remember that an iterator will be exhausted by the call).
`print` is not one of those functions, but `list` _is_.
Though this example uses list comprehensions, the same thing holds true for dict and
set comprehensions as well.

In [None]:
genexp = (i*2 for i in range(5))
print(i for i in range(5))
print((i for i in range(5)))
print([genexp])
print([(i for i in range(5))])
print([i for i in range(5)])
print(list(i for i in range(5)))
print(list(genexp))
print(list(genexp))

## In Conclusion

I hope you've enjoyed this little excursion into one of the more esoteric aspects of the
Python language, and that you'll never again be confused by the difference between an
iterable and an iterator.
<hr>

\* When you call the `iter` function with two arguments the first argument _must_ be
a callable, and the second argument is treated as a _sentinel value_.
The callable is called repeatedly until it returns
the sentinel value, at which point iteration ceases.
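A small sketch of the two-argument form, assuming an in-memory `io.StringIO` stream as stand-in input: the stream's `readline` method is the callable, and a blank line (`"\n"`) serves as the sentinel, so reading stops at the first empty line. The `stream` contents are hypothetical.

```python
import io

# Hypothetical input: two lines, a blank line, then one more line.
stream = io.StringIO("alpha\nbeta\n\ngamma\n")

# readline is called repeatedly until it returns the sentinel "\n".
lines = list(iter(stream.readline, "\n"))
print(lines)                  # ['alpha\n', 'beta\n']
```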