# Iterables and Iterators: Going Loopy With Python

__[NOTE: this code is Python 3 and will induce unexpected errors on a Python 2 interpreter]__

As a Python programmer you may have heard the terms _iterable_ and _iterator_ and wondered
what the difference between them is.
I've been asking people that question in telephone interviews recently, thinking
that it was a good way to discriminate between more- and less-experienced Python users.
Then I discovered that people I regard as extremely competent Pythonistas were still
somewhat confused about the issue, so I thought I'd try to help clear up that confusion.
(I'm also going to be asking it rather later in the interview process.)

The shortest way I can think of to describe the essence is that an iterable is something you
can iterate over any number of times, whereas an iterator is something you can iterate
over once. Many objects in Python are iterables - lists, dicts, strings, and so on.
But iterables aren't iterators!

The two are closely related: each time you iterate over an iterable, the interpreter
actually creates a new iterator from the iterable, and loops over that.
The mechanism is quite simple, but until you understand the details it
can seem a little blurry.
Let's begin by defining a simple function that can be used as a loop body, and
a sample iterable (all of Python's built-in containers are iterables - remember this if you have to write a container type).

In [1]:
def do_something_with(o):
    print("---", o, "---")

test_list = ["Roberta", "Tom", "Alice"]

## Iteration History

Originally (well, certainly in Python 1.5), `for` loop iterations over an object `x` were quite simplistic.
The interpreter would internally create a hidden integer variable,
then repeatedly index `x` using the hidden variable as an index
(by calling `x`'s `__getitem__` method with the hidden variable as an argument),
incrementing it to produce successive values
until the call produced an `IndexError` exception, thereby causing the loop to terminate.

Internally, then, iterations like

    for i in test_list:
        do_something_with(i)

were handled by something like the C equivalent of the following code.
Because this article is for Python users,
rather than show you the C code (Python _is_ open source, remember),
I transcribed it into Python so you can understand the logic.

In [2]:
_private_var = 0
while True:
    try:
        i = test_list.__getitem__(_private_var)
    except IndexError:
        break
    do_something_with(i)
    _private_var += 1       

--- Roberta ---
--- Tom ---
--- Alice ---


The mechanism, which we can think of as the _old iteration protocol_, was easy to understand
but only worked for objects that could be numerically indexed (tuples, lists, and other
sequence types).

### Writing an Old-Style Iterable

To maintain backwards compatibility, objects that support only this old protocol still work today.
It's easy to verify this by creating your own class whose instances obey the old protocol.

In [3]:
class Stars():
    "Class with only __init__ and __getitem__."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        print("Getting item:", index)
        if index > self.N:
            raise IndexError
        return "*" * index

s = Stars(4)

for v in s:
    do_something_with(v)

Getting item: 0
---  ---
Getting item: 1
--- * ---
Getting item: 2
--- ** ---
Getting item: 3
--- *** ---
Getting item: 4
--- **** ---
Getting item: 5


As you can see, even a modern interpreter is perfectly happy to iterate over your `Stars` instances,
old-fashioned as they are.

## Enter the Iterable

To overcome the limitations of this old protocol, and specifically to allow iteration
over objects that can't be numerically indexed, a newer protocol was
introduced, which works with any _iterable_.

The protocol is quite simple, but not well understood.
When you write code like the following to iterate over an iterable
such as a list,

    for i in test_list: # or some other iterable
        do_something_with(i)

the interpreter begins by calling the iterable's `__iter__` method
to create an _iterator_.
If the object has no `__iter__` method, the interpreter
simply falls back to the old protocol, as the `Stars` class above (which has no `__iter__`)
demonstrated.
If there's no `__getitem__` method either, the interpreter just raises a `TypeError` exception,
on the not unreasonable grounds that there's no way to iterate over the given value.

In [4]:
for i in None:
    do_something_with(i)

TypeError: 'NoneType' object is not iterable

In [5]:
oi = dir(None)
print("__iter__" in oi, "__getitem__" in oi)

False False


Each time through the loop, the interpreter extracts the next value from the iterator
by calling its `__next__` method. (In Python 2 the method was called `next`, a design
flaw because the name didn't mark it as a special method; it was renamed in Python 3.)
In the case above, the results of the `__next__` call are
successively bound to `i`, until `__next__` raises a `StopIteration` exception,
which is used to terminate the loop normally - the exception is caught internally by
the interpreter's `for` implementation, and not passed to the user's code.

What actually happens is the equivalent of the following code, although no
new variable is introduced into the Python namespace.

In [6]:
_ = test_list.__iter__()  # creates an iterator
while True:
    try:
        i = _.__next__()  # Python 2: _.next()
    except StopIteration: # iterator is exhausted
        break
    do_something_with(i)

--- Roberta ---
--- Tom ---
--- Alice ---


The easiest way to determine if the new iteration protocol will work on an object is simply
to see whether it has an `__iter__` method. If it does, then it's an iterable.
Lists are iterables, for example:

In [7]:
hasattr(test_list, "__iter__")

True
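Checking for `__iter__` misses objects that support only the old protocol, such as the `Stars` class above. The documentation for `collections.abc.Iterable` notes that the only reliable way to tell whether an object is iterable is to call `iter()` on it. Here is a minimal sketch of that approach; the helper name `can_iterate` is hypothetical, and a cut-down copy of `Stars` is repeated so the cell runs on its own.

```python
class Stars:
    "Old-protocol iterable: __getitem__ only, no __iter__."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        if index > self.N:
            raise IndexError
        return "*" * index

def can_iterate(o):
    "Hypothetical helper: True if a for loop would accept o."
    try:
        iter(o)  # tries __iter__ first, then falls back to __getitem__
    except TypeError:
        return False
    return True

s = Stars(3)
print(hasattr(s, "__iter__"))  # False: there is no __iter__ method
print(can_iterate(s))          # True: iter() falls back to __getitem__
print(can_iterate(None))       # False: neither method is available
print(list(s))                 # ['', '*', '**', '***']
```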

So what kind of an object does a call to that method return?
A specific kind of iterator called a _list iterator_.

In [8]:
li = test_list.__iter__()
print(li, type(li))

<list_iterator object at 0x10aea05f8> <class 'list_iterator'>


### Recognizing Iterators

The list iterator object's attributes verify that it is indeed an iterator,
which is simply to say that it provides both the `__iter__` and `__next__` methods.

In [9]:
print(hasattr(li, '__iter__'), hasattr(li, '__next__'))

True True


Note that the list itself, while it _is_ an iterable
(_i.e._ it implements the `__iter__` method, which returns an iterator),
is not itself an iterator because it has no `__next__` method.

In [10]:
print(hasattr(test_list, '__iter__'), hasattr(test_list, '__next__'))

True False


When iterating over an iterable the interpreter calls the iterable's `__iter__()` method,
which returns an _iterator_ whose `__next__` method returns the iterator's successive
values and raises `StopIteration` when there are no more values.

Note carefully that the iterator is not the same object as the iterable,
and that each call to an iterable's `__iter__` method creates a brand-new iterator.

### Iterating over Iterators

First, note that each call to `iter` on the list creates a distinct iterator object,
separate from the list itself, as the three different `id` values below show.

In [11]:
iterator_1 = iter(test_list) # same as test_list.__iter__()
iterator_2 = iter(test_list)
print(id(test_list), id(iterator_1), id(iterator_2), sep="\n")

4478105928
4483498056
4483498168


When you loop over an iterable, the call to the iterable's `__iter__()` method is made
automatically.
Because each call to an iterable's `__iter__` creates a new iterator, nested looping works as we
expect it to.

In [12]:
for i in test_list:
    for j in test_list:
        do_something_with(i + ":" + j)

--- Roberta:Roberta ---
--- Roberta:Tom ---
--- Roberta:Alice ---
--- Tom:Roberta ---
--- Tom:Tom ---
--- Tom:Alice ---
--- Alice:Roberta ---
--- Alice:Tom ---
--- Alice:Alice ---


Using iterators in the nested loops gives us a rather different result, however.
This is because the first iteration of the outer loop exhausts the inner iterator,
which thereafter refuses to provide further values.

In [13]:
iterator_1 = iter(test_list)
iterator_2 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_2:
        print("inner loop")
        do_something_with(i + ":" + j)

outer loop
inner loop
--- Roberta:Roberta ---
inner loop
--- Roberta:Tom ---
inner loop
--- Roberta:Alice ---
outer loop
outer loop


Using the same iterator in both loops gives even odder results.
In this case the outer loop has already taken `"Roberta"` from `iterator_1` before the
inner loop starts, and the inner loop then exhausts it completely. The outer loop therefore
terminates after only a single iteration, because the iterator is exhausted.

In [14]:
iterator_1 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_1:
        print("inner loop")
        do_something_with(i + ":" + j)

outer loop
inner loop
--- Roberta:Tom ---
inner loop
--- Roberta:Alice ---


Why do these looping constructs fail to perform as you might expect?
Whereas the iterable's `__iter__()` returns a new iterator on each call,
an iterator's `__iter__()` method simply returns `self`.

In [15]:
it_3 = iter(test_list)
print(id(test_list),
      id(it_3),
      id(iter(it_3)),
      it_3 is iter(it_3), sep="\n")

4478105928
4483637320
4483637320
True


### Why We Need Iterables

Technically, iterators _are_ iterables because they too have
an `__iter__` method.
Most iterators' `__iter__` methods, however, don't create a new object,
but simply return the iterator itself, which can lead to some unexpected
consequences.

You can use the built-in `iter` function which, when called with a single argument\*,
is equivalent to calling that argument's `__iter__` method (strictly, `iter` also falls back
to the old `__getitem__` protocol when there is no `__iter__`, which is how `Stars` instances
are iterated).
Knowing equivalences like this is the start of understanding how the interpreter works.

### `iter(o) == o.__iter__()`

Once you have created an iterator, remember, you can always use the `next` function to
extract a value from the iteration sequence.
Like `iter`, the `next` function simply calls its argument's `__next__` method,
which is to say

### `next(o) == o.__next__()` 
or, in Python 2:
### `next(o) == o.next()`

This is completely in line with Python's usual practice of providing built-in functions that call the standard
special methods (`len` calls `__len__`, `repr` calls `__repr__`, and so on), allowing you to tailor
the behavior of your own objects by writing those methods.
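To watch this dispatch happen, here is a sketch using a hypothetical `Countdown` iterator whose special methods announce themselves when called. The built-in `iter` and `next` functions, and the `for` statement, all end up in these methods.

```python
class Countdown:
    "Hypothetical iterator counting down from start to 1, logging protocol calls."
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        print("__iter__ called")
        return self  # an iterator's __iter__ returns the iterator itself

    def __next__(self):
        print("__next__ called")
        if self.current <= 0:
            raise StopIteration  # signals exhaustion to for, list(), etc.
        self.current -= 1
        return self.current + 1

c = Countdown(2)
print(iter(c) is c)  # logs "__iter__ called", then prints True
print(next(c))       # logs "__next__ called", then prints 2

# The for statement calls __iter__ once, then __next__ until StopIteration
for n in Countdown(2):
    print(n)
```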

### Iterators Aren't a Silver Bullet

It's possible to consume iterators directly with explicit calls to `next`
rather than relying on the interpreter's `for`-handling logic.
But it's important to be aware that this can raise a `StopIteration` exception
that won't be handled by the iteration logic, because the exception is raised
inside the loop body rather than by the `for` statement's own request for the next value.

In [16]:
it_4 = iter(["one", "two", "three", "four"])
it_5 = iter(["one", "two", "three", "four", "five"])
for iterator in it_4, it_5:
    print("++ New iterator ++")
    for item_1 in iterator:
        item_2 = next(iterator)
        do_something_with(item_1+":"+item_2)

++ New iterator ++
--- one:two ---
--- three:four ---
++ New iterator ++
--- one:two ---
--- three:four ---


StopIteration: 
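If you really do need to pull extra values inside the loop body, one option (a sketch, not the only approach) is to give `next` a default as its second argument; it returns that value instead of raising `StopIteration` once the iterator is exhausted. Another is to pass the same iterator to `zip` twice, which quietly stops when it can't complete a pair, discarding any leftover item.

```python
words = ["one", "two", "three", "four", "five"]

# Option 1: next() with a default never raises StopIteration
iterator = iter(words)
pairs = []
for item_1 in iterator:
    item_2 = next(iterator, None)  # None once the iterator is exhausted
    pairs.append((item_1, item_2))
print(pairs)  # [('one', 'two'), ('three', 'four'), ('five', None)]

# Option 2: zip() pulls alternately from the same iterator
iterator = iter(words)
print(list(zip(iterator, iterator)))  # [('one', 'two'), ('three', 'four')]
```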

## Writing Your Own Iterators and Iterables

We've observed that iterators have both an `__iter__` and a `__next__` method,
while iterables only have the former.
This is why iterables aren't iterators.
It also allows us to write functions to identify both types of object.

In [17]:
def is_iterable(o):
    "Return True if o is an iterable but not an iterator."
    return hasattr(o, "__iter__") and not hasattr(o, "__next__")

def is_iterator(o):
    "Return True if o is an iterator."
    return hasattr(o, "__iter__") and hasattr(o, "__next__")

In [18]:
test_it = iter(test_list)
is_iterable(test_list), is_iterator(test_list), is_iterable(test_it), is_iterator(test_it)

(True, False, False, True)

This lets us investigate various object types to determine their iteration properties.
Note that `is_iterable` deliberately returns `False` for iterators: they are technically
iterable, since they have an `__iter__` method, but iterables aren't iterators.
Now that you understand the iteration protocols, you might be wondering how to write your
own iterables and iterators.
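The standard library draws a similar line with the abstract base classes in `collections.abc`. Unlike our `is_iterable`, its `Iterable` check counts iterators as iterable too, matching the point that iterators are technically iterables. This sketch repeats `test_list` so that it runs on its own.

```python
from collections.abc import Iterable, Iterator

test_list = ["Roberta", "Tom", "Alice"]
test_it = iter(test_list)

# A list is an Iterable but not an Iterator
print(isinstance(test_list, Iterable), isinstance(test_list, Iterator))  # True False

# Its list iterator is both: every Iterator also counts as an Iterable
print(isinstance(test_it, Iterable), isinstance(test_it, Iterator))  # True True

# A generator is an Iterator too
gen = (n for n in range(3))
print(isinstance(gen, Iterator))  # True
```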


### The Basic Iterator Pattern

Let's begin with iterators: despite the slightly more complex interface it's easier
to start there, because the objects you produce only have to handle a single
iteration, whereas iterables have to handle multiple, possibly simultaneous iterations.

The first example is not intended to be complex: iterate over a string, producing each character N times.
I chose this example because it isn't particularly convenient to write: essentially
a nested loop is required, but the "position in the loop" has to be maintained in
instance variables between calls to `__next__`.

In [19]:
class MyIterator:
    "An iterator to produce each character of a string N times."
    def __init__(self, s, N):
        self.s = s
        self.N =  N
        self.pos = self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.pos == len(self.s):
            raise StopIteration
        result = self.s[self.pos]
        self.count += 1
        if self.count == self.N:
            self.pos += 1
            self.count = 0
        return result

The `__iter__` method seems so simple you might wonder why it's there at all.
The answer is obvious once you understand the iteration protocol: to make it iterable.

If iterators weren't iterable there'd be no way to iterate over them using the `for`
construct, since the first thing the interpreter does to iterate over an iterable
is to generate an iterator from it, by calling its `__iter__` method.

In [20]:
for s in MyIterator("abc", 2):
    do_something_with(s)

--- a ---
--- a ---
--- b ---
--- b ---
--- c ---
--- c ---
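The reason iterators need `__iter__` is easy to demonstrate. The hypothetical `NextOnly` class below has a `__next__` method but no `__iter__`: the `next` function is happy to use it, but a `for` loop fails, because its first step is to call `iter()` on the object.

```python
class NextOnly:
    "Hypothetical class with __next__ but no __iter__ (and no __getitem__)."
    def __init__(self, n):
        self.n = n
    def __next__(self):
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        return self.n

obj = NextOnly(3)
print(next(obj))  # 2: next() only needs __next__
try:
    for value in obj:  # for starts by calling iter(obj), which fails
        print(value)
except TypeError as exc:
    print("TypeError:", exc)
```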


As you might expect, if you use these iterators in a nested loop you
get the usual funky results.

In [21]:
it_6 = MyIterator("*+", 3)
it_7 = MyIterator("=-", 3)
for c1 in it_6:
    print("iterating over c1:", c1)
    for c2 in it_7:
        do_something_with(c1+":"+c2)

iterating over c1: *
--- *:= ---
--- *:= ---
--- *:= ---
--- *:- ---
--- *:- ---
--- *:- ---
iterating over c1: *
iterating over c1: *
iterating over c1: +
iterating over c1: +
iterating over c1: +


In that case we just used the two iterators in nested loops.
The great thing about iterators is that they allow you to compute
the next value in a sequence _when it's required_, rather than having
to precompute all the values in advance _and_ create a container
to store them all in.

That's why the new iteration protocol was such a win for Python.
Any function that takes an iterable argument can be passed either
an iterable _or_ an iterator.
Pretty much any function that iterates over a container can be part of
a program structure where values are computed on demand.
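A minimal sketch of on-demand computation using the standard library's `itertools.count`, an iterator that never ends, together with `itertools.islice` to take only as many values as are needed. The variable names are illustrative.

```python
from itertools import count, islice

# count(1) is an endless iterator: 1, 2, 3, ...
# No container is built; each square is computed only when requested.
squares = (n * n for n in count(1))

# islice takes just the first five values from the endless sequence
print(list(islice(squares, 5)))  # [1, 4, 9, 16, 25]

# sum() accepts a container (an iterable) or an iterator alike
print(sum([1, 2, 3]), sum(iter([1, 2, 3])))  # 6 6
```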

### The Advantages of Generators

It's often inconvenient to write a class to create iterators.
The next snippet defines a so-called _generator function_: a function whose body
contains at least one `yield` expression.

In [22]:
def rangedown(n):
    for i in reversed(range(n)):
        yield i

A call to the generator function returns a generator.
You'll notice that this is an iterator, but not an iterable (in the sense tested by our `is_iterable` function).

In [23]:
generator = rangedown(5)

type(generator), is_iterable(generator), is_iterator(generator)

(generator, False, True)

Generator functions are often a more convenient way to create iterators
than class definitions, since the iteration protocol conformance is built-in.
Just call the function and it returns a generator.
Since generators are a special case of iterators,
the job is done without any need for a class definition.

In [24]:
for x in generator:
    print(x)

4
3
2
1
0
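Because a generator is an iterator, you can also drive it by hand with `next`, just as the `for` loop does internally. This sketch repeats the `rangedown` definition so that it runs on its own.

```python
def rangedown(n):
    for i in reversed(range(n)):
        yield i

gen = rangedown(2)
print(next(gen))  # 1
print(next(gen))  # 0
try:
    next(gen)     # the generator's body has finished
except StopIteration:
    print("StopIteration")

# An exhausted generator stays exhausted: iterating again yields nothing
print(list(gen))  # []
# Calling the generator function again gives a fresh generator
print(list(rangedown(2)))  # [1, 0]
```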


The next cell creates a _generator expression_, another object you may not
be familiar with.
Its value is also a generator, and therefore an iterator too.
Generator expressions are an even more concise way of producing iterators.

In [25]:
genexp = (i*2 for i in range(5))
type(rangedown), type(generator), type(genexp)

(function, generator, generator)

So we can iterate (once) over `genexp` just like any other iterator.

In [26]:
for o in genexp:
    print(o)

0
2
4
6
8
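A generator expression behaves roughly like a generator function that is defined and then called immediately. The hypothetical `doubled` function below yields the same values as the expression above. One difference worth knowing: the outermost iterable of a generator expression (here `range(5)`) is evaluated at once, when the expression is created, rather than on the first call to `__next__`.

```python
genexp = (i * 2 for i in range(5))

def doubled(n):
    "Hypothetical generator function yielding the same values as genexp."
    for i in range(n):
        yield i * 2

gen = doubled(5)
print(type(genexp) is type(gen))  # True: both are generator objects
print(list(genexp), list(gen))    # [0, 2, 4, 6, 8] [0, 2, 4, 6, 8]
```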


### The Basic Iterable Pattern

So the next question is, how do we produce an _iterable_?
The answer is to have its `__iter__` method produce a new iterator each time it's called.
Suppose you wanted to create _multi-iterable strings_,
a type of object I have just invented, which behave pretty much like strings except
that when you iterate over them they produce each character a number of times.

In [27]:
class MIString(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        return MyIterator(self, self.N)

The `__new__` method has to be provided because otherwise it would be inherited from `str`,
which would cause a signature mismatch; it discards `N` and creates an `MIString` holding the
same value that `str(value)` would produce.
The overridden `__init__`, however, adds an extra attribute to the object, and the
overridden `__iter__` method returns a newly-created `MyIterator` object.

In [28]:
[s for s in MIString("xyz", 3)]

['x', 'x', 'x', 'y', 'y', 'y', 'z', 'z', 'z']

Because the `__iter__` method always returns a new iterator, `MIString` objects are true
iterables, and work as expected in nested loops.

In [29]:
x = MIString("01", 2)
for c1 in x:
    for c2 in x:
        print(c1, c2)

0 0
0 0
0 1
0 1
0 0
0 0
0 1
0 1
1 0
1 0
1 1
1 1
1 0
1 0
1 1
1 1


Because the `__new__` method returns a genuine string object (which the `__init__`
method then gives an extra `N` attribute), the objects behave exactly like strings except when iterated over.

In [30]:
print(x, x+x, 3*x)

01 0101 010101


Our functions tell us what is what.

In [31]:
is_iterable(x), is_iterator(x), is_iterable(iter(x)), is_iterator(iter(x))

(True, False, False, True)

One of the disadvantages of class-based iterators is the inconvenience of having
to maintain state inside the instances.
As you saw in the case of the `MyIterator` class, this led to having to track
the loop state manually.
In the `MIString` class we used an `__iter__` method that returned a `MyIterator` instance,
so the complexity was hidden (though without a separate definition we might have had to
implement a nested class).
Let's see how we could achieve the same end with a generator function (strictly, I suppose,
a generator method).

In [32]:
class MIString2(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        for c in str(self):
            for i in range(self.N):
                yield c

In this implementation `__iter__` is itself a generator function (it contains a `yield`),
so each call to it returns a new generator (_i.e._ an iterator, as we shall shortly confirm).
The only complexity is that the method can't iterate over `self` because that would
create an infinitely recursive call to the `__iter__` method, so it iterates over `str(self)` instead.

Of course the implementation could as easily have been an external generator function, but as a
method it picks up the instance variables directly, because the instance is passed in as `self`.
Ultimately `MIString2` works in exactly the same way as `MIString`, but its
internal logic is considerably simpler.
Don't ignore generator functions: they can be very useful indeed!

In [33]:
[c for c in MIString2("abcde", 3)]

['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'e', 'e', 'e']
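The claim that `__iter__` returns an iterator can be checked directly. This sketch repeats the `MIString2` definition so it runs on its own, and confirms that each call to `iter` produces a separate generator object.

```python
class MIString2(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        # A generator method: each call returns a fresh generator
        for c in str(self):
            for _ in range(self.N):
                yield c

m = MIString2("ab", 2)
it_a, it_b = iter(m), iter(m)
print(type(it_a).__name__)        # generator
print(it_a is it_b)               # False: a new generator per call
print(hasattr(it_a, "__next__"))  # True: generators are iterators
print(list(it_a), list(it_b))     # ['a', 'a', 'b', 'b'] ['a', 'a', 'b', 'b']
```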

### Python Iterables

Now let's apply our functions to determine whether we've created any iterables and/or
iterators.
You will see that nothing is iterable, but both generators are iterators.
In fact a generator is just a particular form of iterator.

In [34]:
for function in is_iterable, is_iterator:
    for value in rangedown, generator, genexp:
        print(function.__name__, value, ":", function(value))

is_iterable <function rangedown at 0x109f8c158> : False
is_iterable <generator object rangedown at 0x10b3fd5a0> : False
is_iterable <generator object <genexpr> at 0x10b3fd8b8> : False
is_iterator <function rangedown at 0x109f8c158> : False
is_iterator <generator object rangedown at 0x10b3fd5a0> : True
is_iterator <generator object <genexpr> at 0x10b3fd8b8> : True


Dicts, tuples, sets and strings are all iterables.

In [35]:
is_iterable({}), is_iterable(()), is_iterable(set()), is_iterable("")

(True, True, True, True)

Be careful with list comprehensions.
Although they are essentially generator expressions surrounded by brackets,
this doesn't mean that you can simply put brackets round a generator (or, indeed, any other
iterator) - you just get a list containing the iterator!
The secret is to omit the parentheses around the generator expression - this specific syntax
is what constitutes a comprehension.

In [36]:
[i for i in range(10)], [(i for i in range(10))]

([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [<generator object <genexpr> at 0x10b3fdaf8>])

Some functions iterate over their argument, which can therefore be either an
iterable _or_ an iterator (though remember that an iterator will be exhausted by the call).
`print` is not one of those functions, but `list` _is_.
Though this example uses list comprehensions, the same thing holds true for dict and
set comprehensions as well.

In [37]:
genexp = (i*2 for i in range(5))
print(i for i in range(5))
print((i for i in range(5)))
print([genexp])
print([(i for i in range(5))])
print([i for i in range(5)])
print(list(i for i in range(5)))
print(list(genexp))
print(list(genexp))

<generator object <genexpr> at 0x10b3fd678>
<generator object <genexpr> at 0x10b3fdbd0>
[<generator object <genexpr> at 0x10b3fd708>]
[<generator object <genexpr> at 0x10b3fd9d8>]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
[0, 2, 4, 6, 8]
[]


## In Conclusion

I hope you've enjoyed this little excursion into one of the more esoteric aspects of the
Python language, and that you'll never again be confused by the difference between an
iterable and an iterator.
<hr>

\* When you call the `iter` function with two arguments the first argument _must_ be
a callable, and the second argument is treated as a _sentinel value_.
The callable is called repeatedly until it returns
the sentinel value, at which point iteration ceases.
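As a small sketch of the two-argument form, the cell below uses an in-memory `io.StringIO` buffer standing in for a real file. Its `readline` method returns an empty string at end of input, so `""` works as the sentinel.

```python
import io

buffer = io.StringIO("first line\nsecond line\n")

# iter() keeps calling buffer.readline until it returns the sentinel ""
for line in iter(buffer.readline, ""):
    print(line.rstrip())
```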