# Iterables and Iterators
### _or_




# Round and Round the Mulberry Bush


### Steve Holden

As a Python programmer you may have heard the terms _iterable_ and _iterator_ and wondered
what the difference between them is.
I started asking people that question in telephone interviews a while back, thinking
that it was a good way to discriminate between more- and less-experienced Python users.
Then I discovered that people I regard as extremely competent Pythonistas still
had some confusion about the issue, so I wrote this to try and help clear up that confusion.

The shortest way I can think of to describe the essentials of an iterable is "something you
can iterate over any number of times," whereas an iterator is "something you can iterate
over once." Many objects in Python are iterables - lists, dicts, strings, and so on.
But iterables aren't iterators!

The two are closely related: each time you iterate over an iterable, the interpreter
actually creates a new iterator for the iteration, and loops over that.
The mechanism is quite simple, and understanding the details helps you write
better code.
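
A quick way to see the difference, using a standalone list called `names` (an illustrative name, not from the original notebook). Looping over the list twice gives the same items both times, while looping twice over a single iterator made from it gives nothing the second time:

```python
names = ["Roberta", "Tom", "Alice"]

# A list is an iterable: every loop asks it for a brand-new iterator.
list_pass_1 = [name for name in names]
list_pass_2 = [name for name in names]
print(list_pass_1 == list_pass_2)  # True

# An iterator remembers its position, so it can only be consumed once.
names_iter = iter(names)
iter_pass_1 = [name for name in names_iter]
iter_pass_2 = [name for name in names_iter]
print(iter_pass_1, iter_pass_2)    # ['Roberta', 'Tom', 'Alice'] []
```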

Let's begin by defining a simple function that can be used as a loop body, and
a sample iterable (all Python containers are iterables - remember this if you
have to write a container type).

In [None]:
def do_something_with(o):
    "Acts as a proxy for real work of any kind."
    print("---", o, "---")

test_list = ["Roberta", "Tom", "Alice"]
do_something_with(test_list)

As you'd expect, if you iterate over an object you can do something with each item in that object.

In [None]:
for item in test_list:
    do_something_with(item)

## Iteration History

### Back before you could iterate over dictionaries ... (v1.5.2?)

It helps to remember that

    a = x[s]

is in Python merely a (very welcome and highly comprehensible) shorthand for

    a = x.__getitem__(s)
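
To see the equivalence in action, here is a hypothetical `Echo` class whose `__getitem__` simply reports the subscript it receives. (Strictly, the interpreter looks the method up on the object's type rather than the instance, but for ordinary classes the result is the same.)

```python
class Echo:
    "Hypothetical class that reports the subscript its __getitem__ receives."
    def __getitem__(self, key):
        return f"__getitem__ called with {key!r}"

x = Echo()
a = x[3]               # the subscription syntax...
b = x.__getitem__(3)   # ...is shorthand for this method call
print(a)       # __getitem__ called with 3
print(a == b)  # True
```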

Originally (well, certainly in Python 1.5), `for` loop iterations over an object `x` were quite simplistic.

The interpreter would internally initialise a hidden integer variable to zero,
then repeatedly index `x` using the hidden variable as an index
(by calling `x`'s `__getitem__` method with the hidden variable as an argument).

The hidden variable was incremented to produce successive values
until the `__getitem__` call produced an `IndexError` exception,
causing the loop to terminate normally.

Internally, then, an iteration like

    for i in test_list:
        do_something_with(i)

would be handled by something like the C equivalent of the following code.
(This article is for Python users, so the logic is explained in Python.
Python is open source, so you can read the C source code of the actual interpreter if you want to.)

In [None]:
# How "for i in test_list" used to work (and still can)
_private_var = 0
while True:
    try:
        i = test_list.__getitem__(_private_var)
    except IndexError:
        break
    do_something_with(i)
    _private_var += 1       

This was Python's original iteration protocol.

It was easy to understand, but worked only for objects that could be numerically subscripted, making it possible to iterate over tuples, lists, and other sequence types.

Iterating over a dictionary, however, required you to extract a list of its keys and then iterate over that instead.
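
The reason is that positional indexing, which the old protocol relies on, doesn't work on a dict: `ages[0]` is treated as a key lookup and raises `KeyError` rather than `IndexError`. Copying the keys into a list restores positional indexing. A small sketch (the `ages` data is made up for illustration):

```python
ages = {"Roberta": 34, "Tom": 27}

# A dict treats 0 as a key, not a position, so this is a failed key lookup.
try:
    ages[0]
    numeric_indexing_works = True
except KeyError:
    numeric_indexing_works = False
print(numeric_indexing_works)  # False

# The old workaround: extract the keys into a list, which *can* be indexed by position.
keys = list(ages.keys())
for position in range(len(keys)):
    print(keys[position], ages[keys[position]])
```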

### Writing an _Old-Style_ Iterable

Python's emphasis on backwards compatibility means that this protocol is still supported.

You can verify this by writing your own class whose instances obey the old protocol.

In [None]:
class Stars():
    "Class with only __init__ and __getitem__."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        if index >= self.N:
            raise IndexError
        print("Getting item:", index)  # trace print
        return "*" * index

s = Stars(3)

for v in s:
    do_something_with(v)

As you can see, even a modern interpreter is perfectly happy to iterate over your `Stars` instances,
old-fashioned or not.
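
The fallback isn't limited to `for` statements. Other consumers of iterables, such as `list()` and the `in` operator (for a class with no `__contains__` method), also fall back on `__getitem__`. A quick check, using a hypothetical `QuietStars` copy of `Stars` without the trace print:

```python
class QuietStars:
    "Hypothetical copy of Stars without the trace print."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        if index >= self.N:
            raise IndexError
        return "*" * index

print(list(QuietStars(3)))     # ['', '*', '**']
print("**" in QuietStars(3))   # True: membership test falls back to __getitem__
print("***" in QuietStars(3))  # False: IndexError ends the search first
```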

## Enter the _modern-day_ Iterable

To overcome the limitations of this old protocol, and specifically to allow iteration
over objects that can't be numerically indexed, a newer protocol was
introduced and objects that obeyed it were classified as _iterable_.

To use the new protocol (_i.e._ to be an iterable), an object _must_ have an `__iter__` method.
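
A minimal sketch of a new-style iterable, using a hypothetical `Team` class that keeps its members in a set (which can't be numerically indexed) and returns an iterator from `__iter__`:

```python
class Team:
    "Hypothetical new-style iterable: it has __iter__ but no __getitem__."
    def __init__(self, *members):
        self._members = set(members)  # a set can't be numerically indexed
    def __iter__(self):
        # Hand back an iterator over the members, in a predictable order.
        return iter(sorted(self._members))

team = Team("Tom", "Alice", "Roberta")
print([member for member in team])  # ['Alice', 'Roberta', 'Tom']
```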

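Lists, like the other built-in containers, implement the new protocol. An object that supports neither protocol can't be iterated over at all, as the next cell shows by deliberately trying to loop over `None` (expect a `TypeError`).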

In [None]:
for i in None:
    do_something_with(i)

#### "is not iterable?"

## How does Python “Know” Something is Iterable?

## What can we iterate over, but not subscript?

Here we use a little Python trickery to find the methods that a tuple (which can be subscripted) and a generator (which can't) have in common. Subtracting the methods that every object inherits from `object` leaves us with the prime suspect for this new iteration capability.

In [None]:
def g(): yield 42

set(dir(g())) & set(dir(tuple())) - set(dir(object))

## So What does `__iter__` do?

A natural question is "what type of object does the `__iter__` call return?"

In [None]:
tli = test_list.__iter__()
type(tli)

#### It returns an _iterator_ - in this case a list iterator

For the newer-style iterables the `__iter__` method returns an _iterator_
for use by this particular iteration. To execute the code

    for i in test_list: # or some other iterable
        do_something_with(i)

the interpreter uses the iterator as shown in the next section.

#### Incidentally, this is why you can't iterate over `None`

In [None]:
print("__iter__" in dir(None), "__getitem__" in dir(None))

## How Iteration works today (mostly)

Originally, the interpreter made repeated calls to `__getitem__` until `IndexError` was raised. In the new protocol, the interpreter instead repeatedly calls an iterator's `__next__` method until `StopIteration` is raised.

This removes any reliance on numerical subscripting, allowing dictionaries and sets to become iterable. Dictionaries became iterable in Python 2.2, which introduced the iterator protocol (PEP 234); the built-in `set` type followed in Python 2.4.

If the object has no `__iter__` method, the interpreter simply falls back to the old protocol. That's why the `Stars` class above, which doesn't implement `__iter__`, works as expected.

If there's no `__getitem__` method either, the interpreter just raises a `TypeError` exception, on the not unreasonable grounds that there's no way to iterate over the given value.
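
A class that defines neither method shows the failure. Looping over an instance of the hypothetical `Opaque` class below produces the same kind of message you get for `None`:

```python
class Opaque:
    "Hypothetical class with neither __iter__ nor __getitem__."

try:
    for item in Opaque():
        print(item)
except TypeError as exc:
    message = str(exc)
print(message)  # 'Opaque' object is not iterable
```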

Each time through the loop, the interpreter extracts the next value from the iterator
by calling its `__next__` method.
In the code below, the results of the `__next__` calls are
successively bound to `i` until `__next__` raises a `StopIteration` exception,
which terminates the loop normally: the exception is caught internally by
the interpreter's `for` implementation and never passed to the user's code.
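
Stepping an iterator by hand with the built-in `next()` function (which calls the iterator's `__next__` method) shows the same sequence of events that the `for` loop handles for you:

```python
it = iter(["Roberta", "Tom"])
first = next(it)   # next(it) calls it.__next__()
second = next(it)
print(first, second)  # Roberta Tom

try:
    next(it)       # nothing left to produce
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)   # True
```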

What actually happens is the equivalent of the following code, although the real
interpreter doesn't bind the iterator to any name you can see in your namespace.

In [None]:
def iterate_over(something):
    _i = something.__iter__()  # creates an iterator
    while True:
        try:
            i = _i.__next__()
        except StopIteration: # iterator is exhausted
            break
        do_something_with(i)

In [None]:
iterate_over(test_list)

## If objects have no `__iter__` method ...
###### ... Python still attempts to fall back to `__getitem__`

In [None]:
hasattr(Stars, "__iter__")

Remember, Python was still happy to iterate over a `Stars` object.

## Recognizing Iterators and Iterables

In [None]:
# Iterators have both __iter__ and __next__
print(hasattr(tli, '__iter__'), hasattr(tli, '__next__'))

The list iterator object's attributes verify that it is indeed an iterator,
which is simply to say that it provides both the `__iter__` and `__next__`
methods.

In [None]:
# Iterables only have __iter__ (or possibly __getitem__)
print(hasattr(test_list, '__iter__'), hasattr(test_list, '__next__'))

Note that the list itself, while it _is_ an iterable
(_i.e._ it implements the `__iter__` method, which returns an iterator),
is not itself an iterator because it has no `__next__` method.

## A Quick Piece of Shorthand

## _`iter(thing)`_
## is (almost) the same as
## _`thing.__iter__()`_

## This is the easy way to create an iterator from an iterable!
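
The equivalence is close but not exact: `iter()` also applies the `__getitem__` fallback described earlier, while calling `__iter__()` directly only works if the method exists. A sketch using a hypothetical `OldStars` class (a trimmed copy of `Stars`):

```python
class OldStars:
    "Hypothetical trimmed copy of Stars: __getitem__ only, no trace print."
    def __init__(self, N):
        self.N = N
    def __getitem__(self, index):
        if index >= self.N:
            raise IndexError
        return "*" * index

names = ["Roberta", "Tom", "Alice"]
# For a new-style iterable, both spellings produce the same kind of iterator.
print(type(iter(names)) is type(names.__iter__()))  # True

old = OldStars(3)
print(list(iter(old)))           # ['', '*', '**']: iter() falls back to __getitem__
print(hasattr(old, "__iter__"))  # False: old.__iter__() would raise AttributeError
```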

The rest of the notebook is there to encourage play!

## Iterating over Iterables _vs_ Iterators

In [None]:
# Create two distinct iterators
iterator_1 = iter(test_list) # same as test_list.__iter__()
iterator_2 = iter(test_list)
print(id(iterator_1), id(iterator_2), sep="\n")
print(iterator_1 is iterator_2)
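
Because each call to `iter()` produces a separate iterator, each one keeps track of its own position, so advancing one with `next()` leaves the other untouched (`names` here is a standalone copy of `test_list`):

```python
names = ["Roberta", "Tom", "Alice"]
iterator_1, iterator_2 = iter(names), iter(names)
a = next(iterator_1)
b = next(iterator_1)
c = next(iterator_2)  # iterator_2 still starts from the beginning
print(a, b, c)        # Roberta Tom Roberta
```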

### Nested iterations over iterables

In [None]:
for i in test_list:
    for j in test_list:
        do_something_with(f'{i} : {j}')

### Nested iterations over two separate iterators

In [None]:
iterator_1 = iter(test_list)
iterator_2 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_2:
        print("inner loop")
        do_something_with(i + ":" + j)

### Nested iterations over _the same_ iterator

In [None]:
iterator_1 = iter(test_list)
for i in iterator_1:
    print("outer loop")
    for j in iterator_1:
        print("inner loop")
        do_something_with(i + ":" + j)
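
In the two cells above, the inner loop produces output only during the first pass of the outer loop: once an iterator is exhausted, later loops over it find nothing left, and sharing one iterator means the inner loop also consumes the outer loop's items. If every pairing is wanted, one option (a sketch, not the only approach) is to create a fresh inner iterator on each outer pass, which is exactly what looping over the list itself does. The standard library's `itertools.product` produces the same pairings:

```python
from itertools import product

names = ["Roberta", "Tom", "Alice"]
pairs = []
for i in iter(names):
    for j in iter(names):  # a fresh inner iterator on every pass of the outer loop
        pairs.append(f"{i}:{j}")
print(len(pairs))  # 9

# itertools.product builds the same pairings, in the same order.
print(pairs == [f"{i}:{j}" for i, j in product(names, repeat=2)])  # True
```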

### Using Iterators Doesn't Mean No Issues

In [None]:
it_4 = iter(["one", "two", "three", "four"])
it_5 = iter(["five", "six", "seven"])
for iterator in it_4, it_5:
    print("++ New iterator ++")
    for item_1 in iterator:
        item_2 = next(iterator)
        do_something_with(f'{item_1} : {item_2}')
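
The problem in this cell is the odd-length `it_5`: when `item_1` is `"seven"`, `next(iterator)` inside the loop body finds nothing left and raises `StopIteration`. The `for` statement only catches the `StopIteration` from its own `__next__` calls, so this one escapes as an error. One way round it, sketched here with a hypothetical `pairs_of` generator, is the optional second argument of the built-in `next()`, which is returned instead of raising when the iterator is exhausted. Inside a generator the default matters even more, because since Python 3.7 a stray `StopIteration` there is converted into a `RuntimeError`:

```python
def pairs_of(items, fill=None):
    "Hypothetical helper: yield successive pairs, padding an odd last item with fill."
    it = iter(items)
    for first in it:
        # next() with a default returns fill instead of raising StopIteration.
        yield first, next(it, fill)

print(list(pairs_of(["one", "two", "three", "four"])))  # [('one', 'two'), ('three', 'four')]
print(list(pairs_of(["five", "six", "seven"])))         # [('five', 'six'), ('seven', None)]
```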

## Writing Your Own Iterators and Iterables

In [None]:
def is_iterable(o):
    "Return True if o is an iterable."
    return hasattr(o, "__iter__") and not hasattr(o, "__next__")

def is_iterator(o):
    "Return True if o is an iterator."
    return hasattr(o, "__iter__") and hasattr(o, "__next__")

In [None]:
test_it = iter(test_list)
print(is_iterable(test_list), is_iterator(test_list))
print(is_iterable(test_it), is_iterator(test_it))
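
The standard library offers a comparable check through the abstract base classes in `collections.abc`, though it draws the line a little differently from `is_iterable` above: every `Iterator` also counts as an `Iterable`, and a class that only implements the old `__getitem__` protocol isn't recognised as `Iterable` at all, even though it can be looped over. A short sketch (`OldStyle` is a hypothetical class):

```python
from collections.abc import Iterable, Iterator

names = ["Roberta", "Tom", "Alice"]
names_iter = iter(names)
print(isinstance(names, Iterable), isinstance(names, Iterator))            # True False
print(isinstance(names_iter, Iterable), isinstance(names_iter, Iterator))  # True True

class OldStyle:
    "Hypothetical class that supports only the old __getitem__ protocol."
    def __getitem__(self, index):
        if index >= 2:
            raise IndexError
        return index

print(isinstance(OldStyle(), Iterable), list(OldStyle()))  # False [0, 1]
```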

### The Basic Iterator Pattern

In [None]:
class MyIterator:
    "An iterator to produce each character of a string N times."
    def __init__(self, s, N):
        self.s = s
        self.N = N
        self.pos = self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.pos >= len(self.s):
            raise StopIteration
        result = self.s[self.pos]
        self.count += 1
        if self.count == self.N:
            self.pos += 1
            self.count = 0
        return result

In [None]:
for s in MyIterator("abc", 2):
    do_something_with(s)

In [None]:
it_6 = MyIterator("*+", 3)
it_7 = MyIterator("=-", 3)
for c1 in it_6:
    print("iterating over c1:", c1)
    for c2 in it_7:
        do_something_with(c1+":"+c2)
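
Because `MyIterator.__iter__` returns `self`, the inner loop over `it_7` uses it up during the first pass of the outer loop, so later outer passes find nothing left. A stripped-down hypothetical iterator, `CountUp`, shows the same behaviour:

```python
class CountUp:
    "Hypothetical minimal iterator: produces 0 .. limit-1, then stops."
    def __init__(self, limit):
        self.limit = limit
        self.current = 0
    def __iter__(self):
        return self  # an iterator is its own iterator
    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

counter = CountUp(3)
same = iter(counter) is counter
first_pass = list(counter)
second_pass = list(counter)  # already exhausted
print(same, first_pass, second_pass)  # True [0, 1, 2] []
```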

## The Basic Iterable Pattern

In [None]:
class MIString(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        return MyIterator(self, self.N)

In [None]:
[s for s in MIString("xyz", 3)]

## A Short Example

In [None]:
x = MIString("01", 2)
for c1 in x:
    for c2 in x:
        print(c1, c2)

In [None]:
is_iterable(x), is_iterator(x), is_iterable(iter(x)), is_iterator(iter(x))

In [None]:
class MIString2(str):
    def __new__(cls, value, N):
        return str.__new__(cls, value)
    def __init__(self, value, N):
        self.N = N
    def __iter__(self):
        for c in str(self):
            for i in range(self.N):
                yield c

In [None]:
[c for c in MIString2("abcde", 3)]
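
`MIString2` replaces the separate iterator class with a generator function: each call to `__iter__` returns a new generator object, which is an iterator in its own right. A sketch with a hypothetical `Repeater` class (a plain class rather than a `str` subclass) confirms that every iteration gets its own generator, so nested loops over one object behave:

```python
class Repeater:
    "Hypothetical iterable whose __iter__ is a generator function."
    def __init__(self, text, n):
        self.text = text
        self.n = n
    def __iter__(self):
        for c in self.text:
            for _ in range(self.n):
                yield c

r = Repeater("ab", 2)
gen_1, gen_2 = iter(r), iter(r)
print(gen_1 is gen_2)            # False: each call creates a new generator
print(list(gen_1), list(gen_2))  # ['a', 'a', 'b', 'b'] ['a', 'a', 'b', 'b']

# Independent generators mean nested loops over the same object work as expected.
zero_one = Repeater("01", 1)
combos = [a + b for a in zero_one for b in zero_one]
print(combos)  # ['00', '01', '10', '11']
```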

### Python Iterables

In [None]:
is_iterable({}), is_iterable(()), is_iterable(set()), is_iterable("")

## Any others?
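
A few more built-in objects to try, checked with a standalone copy of the notebook's `is_iterable` function. Note that `enumerate()` returns an iterator, so it fails this particular test even though a `for` loop accepts it happily:

```python
def is_iterable(o):
    "Standalone copy of the notebook's is_iterable test."
    return hasattr(o, "__iter__") and not hasattr(o, "__next__")

candidates = {
    "range": range(3),
    "bytes": b"abc",
    "dict keys view": {}.keys(),
    "frozenset": frozenset(),
    "enumerate object": enumerate([]),
}
results = {label: is_iterable(obj) for label, obj in candidates.items()}
for label, verdict in results.items():
    print(f"{label}: {verdict}")
```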