# Iterators and Iterables

**Iteration is the repetition of some kind of process over and over again.** Python’s for loop gives us an easy way to iterate over various objects. Often, you’ll iterate over a list, but we can also iterate over other Python objects such as strings and dictionaries.

In [None]:
# Iterating over a list
ez_list = [1, 2, 3]
for i in ez_list:
    print(i)

1
2
3


In [None]:
# Iterating over a string
ez_string = "Generators"
for s in ez_string:
    print(s)

G
e
n
e
r
a
t
o
r
s


In [None]:
# Iterating over a dictionary
ez_dict = {1 : "First", 2 : "Second"}
for key, value in ez_dict.items():
    print(key, value)

1 First
2 Second


In each of the above examples, the for loop iterates over the object we give it. The code above used a list, a string, and a dictionary, but you can iterate over tuples and sets as well. Each loop prints the items in the order the object produces them. For example, you can confirm that the items of ez_list are printed in the same order in which they appear in the list.

**We refer to any object that can support iteration as an iterable.**

### What defines an iterable?

Iterables support something called the **Iterator Protocol**. The technical definition of the Iterator Protocol is out of the scope of this article, but it can be thought of as **a set of requirements an object must meet to be used in a for loop**. That is to say: lists, strings and dictionaries all follow the Iterator Protocol, therefore we can use them in for loops. Conversely, objects that do not follow the protocol cannot be used in a for loop. **One example of an object that does not follow the protocol is an integer**. If we try to give an integer to a for loop, Python will raise an error.

In [None]:
number = 12345
for n in number:
    print(n)

TypeError: 'int' object is not iterable

An integer is just a single number, not a sequence. You may argue that the “first” number in number is 1, but a digit is not the same as the first item in a sequence. It doesn’t make sense to ask “What’s after 1?” of number, since Python treats an integer as a single entity. Therefore, one of the requirements for an iterable is that it can tell the for loop which item comes next. For example, a list tells the for loop that the next item is the one at index+1 from the current one (index 1 comes after index 0). An iterable must also signal to the for loop when to stop iterating. This signal usually comes when we reach the end of a sequence (i.e., the end of a list or string).

We will explore the specific functions that make something iterable later in this article; for now, the important thing to know is that iterables describe how a for loop should traverse their contents. Generators are iterables themselves. As you’ll see later, for loops are one of the main ways we use a generator, so generators must support iteration. We’ll delve into how to create our own generators in the next section.

> - Iteration is the idea of repeating some process over a sequence of items. In Python, iteration is usually related to the for loop.
> - An iterable is an object that supports iteration.
> - To be an iterable, an object must describe two things to a for loop:
>     - What item comes next in the iteration.
>     - When the loop should stop iterating.
> - Generators are iterables.

## Iterators

<p data-id="e676dedd66b7a59576952d0c5743e38f">An iterator is an object that allows you to iterate over a
container. The iterator in Python is implemented via two distinct
methods: <strong>__iter__</strong> and <strong>__next__</strong>. The <strong>__iter__</strong>
method is required for your container to provide iteration support. It
must return an iterator object. If you want to create an
iterator object yourself, you also need to define <strong>__next__</strong>,
which returns the next item in the container, and have <strong>__iter__</strong>
return the iterator object itself.</p>

To make things extra clear, let’s go over a couple of definitions:

<ul>
<li>iterable - an object that has the __iter__ method defined</li>
<li>iterator - an object that has both __iter__ and __next__
defined where __iter__ will return the iterator object and
__next__ will return the next element in the iteration.</li>
</ul>
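The two definitions above can be checked directly on a built-in list. The following is a minimal sketch; the variable names are chosen here for illustration and are not part of the original text.

```python
from collections import abc

numbers = [1, 2, 3]
numbers_iterator = iter(numbers)

# A list defines __iter__ but not __next__: it is an iterable, not an iterator.
list_has_iter = hasattr(numbers, "__iter__")
list_has_next = hasattr(numbers, "__next__")

# The object returned by iter() defines both methods: it is an iterator.
iterator_has_iter = hasattr(numbers_iterator, "__iter__")
iterator_has_next = hasattr(numbers_iterator, "__next__")

print(list_has_iter, list_has_next)          # True False
print(iterator_has_iter, iterator_has_next)  # True True

# The abstract base classes in collections.abc report the same distinction.
print(isinstance(numbers, abc.Iterable))           # True
print(isinstance(numbers, abc.Iterator))           # False
print(isinstance(numbers_iterator, abc.Iterator))  # True
```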

<p>As with most magic methods (the methods with double-underscores), you
should not call __iter__ or __next__ directly. Instead you can
use a <strong>for</strong> loop or list comprehension and Python will call the
methods for you automatically. There are cases when you may need to
call them, but you can do so with Python’s built-ins: <strong>iter</strong> and
<strong>next</strong>.</p>
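As a small illustration of the built-ins mentioned above, here is a sketch that drives an iterator by hand. It also uses the optional second argument of next, which is returned instead of raising StopIteration once the iterator is exhausted. The variable names are illustrative only.

```python
letters = iter("ab")  # iter() calls the string's __iter__ for us

first = next(letters)   # next() calls the iterator's __next__
second = next(letters)

# With a default value, next() does not raise StopIteration
# when there is nothing left to return.
third = next(letters, None)

print(first, second, third)  # a b None
```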

<p>Before we move on, I want to mention Sequences. Python 3 has several
sequence types such as list, tuple and range. The list is an iterable,
but not an iterator because it does not implement __next__. This
can be easily seen in the following example:</p>

In [None]:
my_list = [1, 2, 3]
next(my_list)

TypeError: 'list' object is not an iterator

When we tried to call next on the list in the example above, we received a TypeError and were informed that the list object is not an iterator. But we can get an iterator from it! Let’s see how:

In [None]:
iter(my_list)

<list_iterator at 0x7f8acc362780>

In [None]:
list_iterator = iter(my_list)
next(list_iterator)

1

In [None]:
next(list_iterator)

2

In [None]:
next(list_iterator)

3

In [None]:
next(list_iterator)

StopIteration: 

To get an iterator from the list, just pass it to Python’s built-in iter function. Then you can call next on the result until the iterator runs out of items and StopIteration gets raised. Let’s try getting an iterator from the list and iterating over it with a loop:

In [None]:
for item in iter(my_list):
    print(item)

1
2
3


In [None]:
for item in my_list:
    print(item)

1
2
3


When you use a loop to iterate over the iterator, you don’t need to call next and you also don’t have to worry about the StopIteration exception being raised.
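One difference between looping over the list and looping over its iterator is easy to miss: the iterator can only be consumed once, while the list hands out a fresh iterator on every loop. A minimal sketch, with variable names chosen for illustration:

```python
my_list = [1, 2, 3]
list_iterator = iter(my_list)

# The first pass consumes the iterator.
first_pass = [item for item in list_iterator]

# The second pass finds it already exhausted and produces nothing.
second_pass = [item for item in list_iterator]

# Looping over the list itself builds a new iterator each time.
list_again = [item for item in my_list]

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
print(list_again)   # [1, 2, 3]
```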

### Creating Your Own Iterators

<p data-id="ef127564d4724bf7d20c938e5f00004d">Occasionally you will want to create your own custom iterators. Python
makes this very easy to do. As mentioned in the previous section, all
you need to do is implement the __iter__ and __next__ methods in
your class. Let’s create an iterator that can iterate over a string of
words:</p>

In [None]:
import re
RE_WORD = re.compile(r'\w+')


class SentenceIterator:
    def __init__(self, text):
        """
        Constructor
        """
        self.words = RE_WORD.findall(text)
        self.index = 0

    def __iter__(self):
        """
        Returns itself as an iterator
        """
        return self

    def __next__(self):
        """
        Returns the next word in the sequence or 
        raises StopIteration
        """
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

if __name__ == '__main__':
    sentence = SentenceIterator('Danes je lep dan.')
    for item in sentence:
        print(item)

Danes
je
lep
dan


<p data-id="31f71620b602daa3e2a9d4893bad2247">For this example, we only needed three methods in our class. In our
initialization, we pass in a string of text, split it into a list of
words, and store that list in an instance attribute. We also initialize
an index so we always know where we are in the list. The __iter__ method just
returns itself, which is all it really needs to do. The __next__
method is the meatiest part of this class. Here we try to fetch the word
at the current index and raise StopIteration if the index is past the end
of the list. Otherwise we increment the index and return the word.</p>

Let’s take a moment to create an infinite iterator. An infinite iterator is one that can iterate forever. You will need to be careful when calling these as they will cause an infinite loop if you don’t make sure to put a bound on them.

In [None]:
class Doubler:
    """
    An infinite iterator
    """
    def __init__(self):
        """
        Constructor
        """
        self.number = 0

    def __iter__(self):
        """
        Returns itself as an iterator
        """
        return self

    def __next__(self):
        """
        Increments the number each time next is called
        and returns its square (despite the class name).
        """
        self.number += 1
        return self.number * self.number

if __name__ == '__main__':
    doubler = Doubler()
    count = 0

    for number in doubler:
        print(number)
        if count > 5:
            break
        count += 1

1
4
9
16
25
36
49


In this piece of code, we don’t pass anything to our iterator. We just instantiate it. Then to make sure we don’t end up in an infinite loop, we add a counter before we start iterating over our custom iterator. Finally we start iterating and break out when the counter goes above 5.
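A manual counter is one way to bound an infinite iterator. Another option, sketched below, is itertools.islice from the standard library, which stops after a fixed number of items. The Squares class is a hypothetical stand-in written for this example so that the snippet is self-contained.

```python
from itertools import islice


class Squares:
    """Hypothetical infinite iterator that yields 1, 4, 9, 16, ..."""

    def __init__(self):
        self.number = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Never raises StopIteration, so the iterator is infinite.
        self.number += 1
        return self.number * self.number


# islice takes at most seven items and then stops on its own.
first_seven = list(islice(Squares(), 7))
print(first_seven)  # [1, 4, 9, 16, 25, 36, 49]
```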

## Advanced topics - Iterables, Iterators

Iteration is fundamental to data processing. And when scanning datasets that don’t fit
in memory, we need a way to fetch the items lazily, that is, one at a time and on demand.
This is what the Iterator pattern is about. This section shows how the Iterator pattern
is built into the Python language so you never need to implement it by hand.

The yield keyword allows the construction of generators, which
work as iterators.

> Every generator is an iterator: generators fully implement the
iterator interface. But an iterator—as defined in the GoF book—
retrieves items from a collection, while a generator can produce
items “out of thin air.” That’s why the Fibonacci sequence generator
is a common example: an infinite series of numbers cannot
be stored in a collection. However, be aware that the Python
community treats iterator and generator as synonyms most of the
time.

Python 3 uses generators in many places. Even the range() built-in now returns a
generator-like object instead of full-blown lists like before. If you must build a list
from range, you have to be explicit (e.g., list(range(100))).
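To make the "generator-like" description more concrete, here is a short sketch of how a range object behaves. It produces its numbers on demand, but unlike a generator it also supports len, indexing and membership tests, and it can be traversed more than once.

```python
numbers = range(100)

length = len(numbers)           # no list is built to answer this
tenth = numbers[10]             # indexing works like a sequence
contains_fifty = 50 in numbers  # membership test

# range is iterable, but it is not itself an iterator: it has no __next__.
is_iterator = hasattr(numbers, "__next__")

# It can be traversed repeatedly, unlike a generator.
first_sum = sum(numbers)
second_sum = sum(numbers)

# Building a real list has to be requested explicitly.
small_list = list(range(5))

print(length, tenth, contains_fifty, is_iterator)  # 100 10 True False
print(first_sum, second_sum)                       # 4950 4950
print(small_list)                                  # [0, 1, 2, 3, 4]
```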

Every collection in Python is iterable, and iterators are used internally to support:
- for loops
- Collection types construction and extension
- Looping over text files line by line
- List, dict, and set comprehensions
- Tuple unpacking
- Unpacking actual parameters with * in function calls
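Several of the contexts listed above can be tried with one small custom iterable. The sketch below uses a hypothetical Countdown class and a hypothetical add function, both written only for this illustration; every line after the class definition obtains an iterator from the same object behind the scenes.

```python
class Countdown:
    """Hypothetical iterable that counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Delegate to the iterator of a range object.
        return iter(range(self.start, 0, -1))


def add(a, b, c):
    """Hypothetical helper used to show argument unpacking."""
    return a + b + c


three = Countdown(3)

as_list = list(three)             # collection construction
extended = [0]
extended.extend(three)            # collection extension
squares = [n * n for n in three]  # list comprehension
first, second, third = three      # tuple unpacking
total = add(*three)               # unpacking arguments with *

print(as_list)               # [3, 2, 1]
print(extended)              # [0, 3, 2, 1]
print(squares)               # [9, 4, 1]
print(first, second, third)  # 3 2 1
print(total)                 # 6
```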

### Sentence Take #1: A Sequence of Words

We’ll start our exploration of iterables by implementing a Sentence class: you give its
constructor a string with some text, and then you can iterate word by word. The first
version will implement the sequence protocol, and it’s iterable because all sequences are
iterable, as we’ve seen before, but now we’ll see exactly why.

In [None]:
import re
import reprlib

In [None]:
RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid escape sequence warning

In [None]:
class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    def __getitem__(self, index):
        return self.words[index]
    def __len__(self):
        return len(self.words)
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

- re.findall returns a list with all nonoverlapping matches of the regular
expression, as a list of strings.
- self.words holds the result of .findall, so we simply return the word at the
given index.
- To complete the sequence protocol, we implement \__len__—but it is not needed
to make an iterable object.
- reprlib.repr is a utility function to generate abbreviated string representations
of data structures that can be very large.

By default, reprlib.repr limits the generated string to 30 characters. The following
cells show how Sentence is used.

In [None]:
# A sentence is created from a string.
s = Sentence('"The time has come," the Walrus said,')

In [None]:
# Note the output of __repr__ using ... generated by reprlib.repr.
s

Sentence('"The time ha... Walrus said,')

In [None]:
# Sentence instances are iterable; we’ll see why in a moment.
for word in s:
    print(word)

The
time
has
come
the
Walrus
said


In [None]:
# Being iterable, Sentence objects can be used as input to build lists and other
# iterable types.
list(s)

['The', 'time', 'has', 'come', 'the', 'Walrus', 'said']

In [None]:
len(s)

7

Later on, we’ll develop other Sentence classes that behave the same way in the cells
above. However, this first implementation is different from all the
others because it’s also a sequence, so you can get words by index:

In [None]:
s[0]

'The'

In [None]:
s[-1]

'said'

Every Python programmer knows that sequences are iterable. Now we’ll see precisely
why.

#### Why Sequences Are Iterable: The iter Function

Whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).
The iter built-in function:
1. Checks whether the object implements `__iter__`, and calls that to obtain an iterator.
2. If `__iter__` is not implemented, but `__getitem__` is implemented, Python creates
an iterator that attempts to fetch items in order, starting from index 0 (zero).
3. If that fails, Python raises TypeError, usually saying “'C' object is not iterable,” where
C is the class of the target object.

That is why any Python sequence is iterable: they all implement `__getitem__`. In fact,
the standard sequences also implement `__iter__`, and yours should too, because the
special handling of `__getitem__` exists for backward compatibility reasons and may be
gone in the future (although it is not deprecated as I write this).
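The `__getitem__` fallback can be observed in isolation. The sketch below uses a hypothetical Letters class, written for this illustration, that defines neither `__iter__` nor `__len__`; iteration still works because the iterator created by iter requests index 0, 1, 2, and so on until `__getitem__` raises IndexError.

```python
class Letters:
    """Hypothetical class that only implements __getitem__."""

    def __init__(self, text):
        self._text = text
        self.requested = []  # record which indexes were asked for

    def __getitem__(self, index):
        self.requested.append(index)
        return self._text[index]  # raises IndexError past the end


letters = Letters("abc")
collected = list(letters)

print(collected)          # ['a', 'b', 'c']
print(letters.requested)  # [0, 1, 2, 3]
```

The final request for index 3 is the one that raises IndexError, which the fallback iterator turns into the end of the iteration.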

This is an extreme form of duck typing: an object is considered iterable not only
when it implements the special method `__iter__`, but also when it implements
`__getitem__`, as long as `__getitem__` accepts int keys starting from 0.

In the goose-typing approach, the definition for an iterable is simpler but not as flexible:
an object is considered iterable if it implements the `__iter__` method. No subclassing
or registration is required, because abc.Iterable implements the `__subclasshook__`
special method. Here is a demonstration:

In [None]:
class Foo:
    def __iter__(self):
        pass

In [None]:
from collections import abc

In [None]:
issubclass(Foo, abc.Iterable)

True

In [None]:
f = Foo()

In [None]:
isinstance(f, abc.Iterable)

True

However, note that our initial Sentence class does not pass the issubclass(Sentence,
abc.Iterable) test, even though it is iterable in practice.

> As of Python 3.4, the most accurate way to check whether an object
x is iterable is to call iter(x) and handle a TypeError exception
if it isn’t. This is more accurate than using isinstance(x,
abc.Iterable), because iter(x) also considers the legacy
`__getitem__` method, while the Iterable ABC does not.
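The advice in the note above can be wrapped in a small function. This is only a sketch: is_iterable and the Pair class are hypothetical names introduced for the illustration, not part of any library.

```python
from collections import abc


def is_iterable(obj):
    """Hypothetical helper: True if iter(obj) succeeds."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class Pair:
    """Hypothetical class that implements only __getitem__."""

    def __init__(self, first, second):
        self._items = (first, second)

    def __getitem__(self, index):
        return self._items[index]


pair = Pair("a", "b")

print(is_iterable([1, 2, 3]))          # True
print(is_iterable(12345))              # False
print(is_iterable(pair))               # True: found through __getitem__
print(isinstance(pair, abc.Iterable))  # False: the ABC only looks for __iter__
print(list(pair))                      # ['a', 'b']
```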

Explicitly checking whether an object is iterable may not be worthwhile if right after the
check you are going to iterate over the object. After all, when the iteration is attempted
on a noniterable, the exception Python raises is clear enough: TypeError: 'C' object
is not iterable. If you can do better than just raising TypeError, then do so in a
try/except block instead of doing an explicit check. The explicit check may make sense
if you are holding on to the object to iterate over it later; in this case, catching the error
early may be useful.

The next section makes explicit the relationship between iterables and iterators.

### Iterables Versus Iterators

iterable: Any object from which the iter built-in function can obtain an iterator. Objects
implementing an `__iter__` method returning an iterator are iterable. Sequences are always iterable, as are objects implementing a `__getitem__` method that takes
0-based indexes.

It’s important to be clear about the relationship between iterables and iterators: Python
obtains iterators from iterables.

Here is a simple for loop iterating over a str. The str 'ABC' is the iterable here. You
don’t see it, but there is an iterator behind the curtain:

In [None]:
s = 'ABC'

for char in s:
    print(char)

A
B
C


If there were no for statement and we had to emulate the for machinery by hand with
a while loop, this is what we’d have to write:

In [None]:
s = 'ABC'
it = iter(s) # Build an iterator it from the iterable.

while True:
    try:
        print(next(it)) # Repeatedly call next on the iterator to obtain the next item.
    except StopIteration:  # The iterator raises StopIteration when there are no further items.
        del it # Release reference to it—the iterator object is discarded.
        break # Exit the loop.

A
B
C


StopIteration signals that the iterator is exhausted. This exception is handled internally
in for loops and other iteration contexts like list comprehensions, tuple unpacking,
etc.

The standard interface for an iterator has two methods:

- `__next__`: Returns the next available item, raising StopIteration when there are no more
items.
- `__iter__`: Returns self; this allows iterators to be used where an iterable is expected, for
example, in a for loop.

Back to our first Sentence class, you can clearly see how the iterator is
built by iter(…) and consumed by next(…) using the Python console:

In [None]:
s3 = Sentence('Pig and Pepper')

In [None]:
it = iter(s3)

In [None]:
it

<iterator at 0x7f39acbdd3c8>

In [None]:
next(it)

'Pig'

In [None]:
next(it)

'and'

In [None]:
next(it)

'Pepper'

In [None]:
next(it)

StopIteration: 

In [None]:
list(it)

[]

In [None]:
list(iter(s3))

['Pig', 'and', 'Pepper']

Because the only methods required of an iterator are `__next__` and `__iter__`, there is
no way to check whether there are remaining items, other than to call next() and catch
StopIteration. Also, it’s not possible to “reset” an iterator. If you need to start over,
you need to call iter(…) on the iterable that built the iterator in the first place. Calling
iter(…) on the iterator itself won’t help because, as mentioned, `Iterator.__iter__` is implemented by returning self, so this will not reset a depleted iterator.

To wrap up this section, here is a definition for iterator:

iterator: Any object that implements the `__next__` no-argument method that returns the
next item in a series or raises StopIteration when there are no more items. Python
iterators also implement the `__iter__` method so they are iterable as well.
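The point that an iterator cannot be reset can be verified with a plain list standing in for the iterable. A minimal sketch, with variable names chosen for illustration:

```python
words = ["Pig", "and", "Pepper"]
it = iter(words)

# Calling iter() on an iterator returns the very same object.
same_object = iter(it) is it

consumed = list(it)  # drains the iterator

# Asking the depleted iterator for an iterator does not rewind it.
after_exhaustion = list(iter(it))

# Going back to the original iterable is the only way to start over.
fresh = list(iter(words))

print(same_object)       # True
print(consumed)          # ['Pig', 'and', 'Pepper']
print(after_exhaustion)  # []
print(fresh)             # ['Pig', 'and', 'Pepper']
```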

This first version of Sentence was iterable thanks to the special treatment the iter(…)
built-in gives to sequences. Now we’ll implement the standard iterable protocol.

### Sentence Take #2: A Classic Iterator

The next Sentence class is built according to the classic Iterator design pattern following
the blueprint in the GoF book. Note that this is not idiomatic Python, as the next refactorings
will make very clear. But it serves to make explicit the relationship between
the iterable collection and the iterator object.

The implementation below shows a Sentence that is iterable because it implements
the `__iter__` special method, which builds and returns a SentenceIterator.
This is how the Iterator design pattern is described in the original Design Patterns book.

We are doing it this way here just to make clear the crucial distinction between an iterable
and an iterator and how they are connected.

In [None]:
import re
import reprlib
RE_WORD = re.compile(r'\w+')  # raw string avoids an invalid escape sequence warning

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    def __iter__(self):
        return SentenceIterator(self.words)
    
class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word
    def __iter__(self):
        return self

- The `__iter__` method is the only addition to the previous Sentence implementation. This version has no `__getitem__`, to make it clear that the class is iterable because it implements `__iter__`.
- `__iter__` fulfills the iterable protocol by instantiating and returning an iterator.
- SentenceIterator holds a reference to the list of words.
- self.index is used to determine the next word to fetch.
- Get the word at self.index.
- If there is no word at self.index, raise StopIteration.
- Increment self.index.
- Return the word.
- Implement `self.__iter__`.

Note that implementing `__iter__` in SentenceIterator is not actually needed for this
example to work, but it’s the right thing to do: iterators are supposed to implement
both `__next__` and `__iter__`, and doing so makes our iterator pass the
issubclass(SentenceIterator, abc.Iterator) test. If we had subclassed SentenceIterator from
abc.Iterator, we’d inherit the concrete `abc.Iterator.__iter__` method.
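The subclassing alternative mentioned above might look like the following sketch. WordIterator is a hypothetical name used here to avoid clashing with the SentenceIterator class in the text; it defines only `__next__` and relies on abc.Iterator for `__iter__`.

```python
from collections import abc


class WordIterator(abc.Iterator):
    """Hypothetical iterator that inherits __iter__ from abc.Iterator."""

    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration() from None
        self.index += 1
        return word


it = WordIterator(["Pig", "and", "Pepper"])

# The inherited __iter__ returns self, as an iterator's __iter__ should.
returns_self = iter(it) is it
collected = list(it)

print(returns_self)                            # True
print(collected)                               # ['Pig', 'and', 'Pepper']
print(issubclass(WordIterator, abc.Iterator))  # True
```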

A common cause of errors in building iterables and iterators is to confuse the two. To
be clear: iterables have an `__iter__` method that instantiates a new iterator every time.
Iterators implement a `__next__` method that returns individual items, and an `__iter__`
method that returns self.

Therefore, iterators are also iterable, but iterables are not iterators.

> An iterable should never act as an iterator over itself. In other
words, iterables must implement `__iter__`, but not `__next__`.
On the other hand, for convenience, iterators should be iterable.
An iterator’s `__iter__` should just return self.
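One practical reason for keeping the two roles separate is that an iterable can then serve several independent traversals at once. The sketch below uses a hypothetical Words class, written for this illustration, whose `__iter__` returns a new iterator on every call.

```python
class Words:
    """Hypothetical iterable: each __iter__ call builds a new iterator."""

    def __init__(self, text):
        self.words = text.split()

    def __iter__(self):
        return iter(self.words)


sentence = Words("Pig and Pepper")

it1 = iter(sentence)
it2 = iter(sentence)

# Advancing one iterator does not move the other.
a = next(it1)
b = next(it1)
c = next(it2)

# Nested loops over the same iterable also work, because each
# loop gets its own iterator.
pairs = [(x, y) for x in sentence for y in sentence]

print(a, b, c)     # Pig and Pig
print(len(pairs))  # 9
```

If Words returned self from `__iter__` and kept its own index, the two traversals would share one position and the nested loop would produce fewer pairs.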