# Implementing Iteration

## Agenda

1. Review: Iteration
2. Details: *iterables*, *iterators*, `iter`, and `next`
3. Implementing iterators with classes
4. Implementing iterators with *generators* and `yield`

## 1. Review: Iteration

*Iteration* simply refers to the process of accessing — one by one — the items stored in some container. The order of the items, and whether or not the iteration is comprehensive, depends on the container.

In Python, we typically perform iteration using the `for` loop.

In [1]:
# e.g., iterating over a list
l = [2**x for x in range(10)]
for n in l:
    print(n)

1
2
4
8
16
32
64
128
256
512


In [2]:
# e.g., iterating over the key-value pairs in a dictionary
d = {x:2**x for x in range(10)} #Dictionary comprehension
for k,v in d.items():
    print(k, '=>', v)

0 => 1
1 => 2
2 => 4
3 => 8
4 => 16
5 => 32
6 => 64
7 => 128
8 => 256
9 => 512


## 2. Details: *iterables*, *iterators*, `iter`, and `next`

An *iterable* is anything that can be iterated over, i.e., anything we can give to a `for` loop.

We can iterate over anything that is *iterable*. Intuitively, if something can be used as the source of items in a `for` loop, it is iterable.
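
One way to make this intuition concrete is to ask Python for an iterator with the built-in `iter` function and see whether the call succeeds. The helper name `is_iterable` below is hypothetical and used only for illustration.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds, i.e. obj could feed a for loop."""
    try:
        iter(obj)
        return True
    except TypeError:
        # iter() raises TypeError for objects that do not support iteration
        return False

print(is_iterable([1, 2, 3]))   # list       -> True
print(is_iterable('abc'))       # string     -> True
print(is_iterable({'a': 1}))    # dictionary -> True
print(is_iterable(42))          # int        -> False
```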

But how does a `for` loop really work? (Review time!)

In [3]:
l = [2**x for x in range(10)]

In [4]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512


In [6]:
dir(l)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [7]:
it = l.__iter__()

In [8]:
type(it)

list_iterator

An iterable's `__iter__` method returns an *iterator*. This is a separate object (from the container itself) that can be used to iterate over the container's contents.

In [9]:
dir(it)  # different attributes than l: the iterator is a separate object of a different class (list_iterator)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__length_hint__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

In [20]:
it.__next__()

StopIteration: 

An iterator's `__next__` method returns the "next" item from the associated container. It raises a `StopIteration` exception when there are no more items.

Once an iterator is exhausted, it stays exhausted. To iterate over the list again, create a new iterator.

In [22]:
it = l.__iter__() #Deconstructed for-loop
while True:
    try:
        print(it.__next__())
    except StopIteration:
        break

1
2
4
8
16
32
64
128
256
512


In [23]:
it = iter(l) #Same as calling l.__iter__()

In [24]:
next(it)

1

Instead of calling `__iter__` and `__next__` directly, we would typically use the built-in `iter` and `next` functions.

In [25]:
it = iter(l)
while True:
    try:
        print(next(it))
    except StopIteration:
        break

1
2
4
8
16
32
64
128
256
512


In [26]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512


In [27]:
it1 = iter(l)
it2 = iter(l)
it3 = iter(l)
next(it2)
next(it3)
next(it3)
while True:
    try:
        print(next(it1), next(it2), next(it3))
    except StopIteration:
        break

1 2 4
2 4 8
4 8 16
8 16 32
16 32 64
32 64 128
64 128 256
128 256 512


Note we can use multiple iterators concurrently — each one keeps track of its own progress through the underlying container.

In [29]:
d = {'a': 1, 'b': 2, 'c': 3}
it = iter(d)
print(next(it))
d['d'] = 4       # adding a key changes the dictionary's size
print(next(it))  # the iterator detects the change and raises RuntimeError
print(next(it))

c


RuntimeError: dictionary changed size during iteration

In [30]:
d = {'a': 1, 'b': 2, 'c': 3}
for k in d:
    print(k)
    d['d'] = 4

c


RuntimeError: dictionary changed size during iteration

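An iterator's behavior is not always well defined if the container is modified during iteration. For a dictionary, as shown above, Python raises a `RuntimeError` as soon as the iterator notices that the size has changed.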
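
With a list, by contrast, no error is raised and the result can be surprising. The following is a minimal sketch (the variable names `items` and `leftover` are made up for illustration) of what happens in CPython when elements are removed from a list while it is being iterated over, followed by the common workaround of iterating over a copy.

```python
items = [2, 4, 6, 8]
for x in items:
    # Removing an element shifts the remaining ones left, so the iterator's
    # internal index skips the element that moved into the vacated slot.
    items.remove(x)
leftover = list(items)
print(leftover)  # [4, 8]: two elements were never visited

# Iterating over a copy keeps the iterator independent of the removals.
items = [2, 4, 6, 8]
for x in list(items):
    items.remove(x)
print(items)     # []
```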

In [33]:
l = [2**x for x in range(10)]
it = iter(l)
for x in it:
    print(x)

1
2
4
8
16
32
64
128
256
512


In [34]:
dir(it)  # an iterator has its own __iter__ method, so calling iter on it simply returns the iterator itself

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__length_hint__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

Another detail: iterators are themselves also iterables.

... which means that iterator objects must also implement the `__iter__` method (in addition to `__next__`).
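
A short sketch of what this means in practice, using illustrative variable names: calling `iter` on an iterator hands back the same object, so anything that consumes an iterable (a `for` loop, `list`, and so on) continues from wherever the iterator currently is.

```python
numbers = [10, 20, 30, 40]
it = iter(numbers)

# iter() on an iterator returns the very same object
same = iter(it) is it
print(same)                 # True

# iter() on the list, by contrast, builds a new iterator on every call
distinct = iter(numbers) is not iter(numbers)
print(distinct)             # True

next(it)                    # consume 10
rest = list(it)             # list() picks up where the iterator left off
print(rest)                 # [20, 30, 40]
```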

## 3. Implementing iterators with classes

In [81]:
class MyIterator:
    def __init__(self, max):
        self.max = max
        self.curr = 0
        
    # the following methods are required for iterator objects
    
    def __next__(self):
        if self.curr < self.max:
            self.curr +=1
            return self.curr-1
        else:
            raise StopIteration()
    
    def __iter__(self):
        return self

In [82]:
it = MyIterator(10)

In [83]:
next(it)

0

In [84]:
it = MyIterator(10)
while True:
    try:
        print(next(it))
    except StopIteration:
        break

0
1
2
3
4
5
6
7
8
9


In [85]:
it = MyIterator(10)
for i in it:
    print(i)

0
1
2
3
4
5
6
7
8
9


In [94]:
for i in it:  # prints nothing: the previous loop already exhausted this iterator
    print(i)

For a container type, we need to implement an `__iter__` method that returns an iterator.

In [115]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        class ArrayListIterator: #Scope is only within the iter method
            def __init__(self,lst):
                self.idx = 0
                self.lst = lst

            def __iter__(self):
                return self

            def __next__(self):
                if self.idx < len(self.lst.data):
                    self.idx +=1
                    return self.lst.data[self.idx-1]
                else:
                    raise StopIteration()

        return ArrayListIterator(self)

In [116]:
l = ArrayList()
for x in range(10):
    l.append(2**x)

In [117]:
it = iter(l)

In [118]:
type(it)

__main__.ArrayList.__iter__.<locals>.ArrayListIterator

In [119]:
next(it)

1

In [114]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512
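
The practical difference between a container and an iterator shows up on the second pass. The sketch below uses the hypothetical names `CountUp` and `CountUpIterator` as stand-ins for any container/iterator pair: the iterator keeps track of progress and is used up after one pass, while the container hands out a fresh iterator on every `iter` call.

```python
class CountUpIterator:
    """Iterator: tracks progress and is exhausted after one pass."""
    def __init__(self, limit):
        self.limit = limit
        self.curr = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.curr >= self.limit:
            raise StopIteration
        self.curr += 1
        return self.curr - 1


class CountUp:
    """Iterable: stores no progress; returns a new iterator per iter() call."""
    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        return CountUpIterator(self.limit)


it = CountUpIterator(3)
print(list(it), list(it))   # [0, 1, 2] []

c = CountUp(3)
print(list(c), list(c))     # [0, 1, 2] [0, 1, 2]
```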


## 4. Implementing iterators with generators and `yield`

What's a "generator"?

In [120]:
g = (2**x for x in range(10))

In [121]:
type(g)

generator

In [122]:
len(g)

TypeError: object of type 'generator' has no len()

In [125]:
dir(g)  # generators are a special kind of iterator: they have both __iter__ and __next__

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_yieldfrom',
 'send',
 'throw']

In [126]:
for x in g:
    print(x)

1
2
4
8
16
32
64
128
256
512


A generator computes its values lazily; creating one does not compute or store any of the values up front.

In [140]:
%timeit -n 10000 [x for x in range(1000)]

10000 loops, best of 3: 54.8 µs per loop


In [138]:
%timeit -n 10000 (x for x in range(1000))

10000 loops, best of 3: 979 ns per loop


The generator object doesn't actually get populated with all the values the comprehension indicates (as a list would). Rather, it lazily computes the values as we need them (when its `__next__` method is called).
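
The laziness can be observed directly by recording when each value is computed. In this sketch, `noisy_square` and `calls` are hypothetical names introduced only to log the calls.

```python
calls = []

def noisy_square(x):
    calls.append(x)      # record each time a value is actually computed
    return x * x

g = (noisy_square(x) for x in range(5))
print(calls)             # []: nothing has been computed yet

first = next(g)
print(first, calls)      # 0 [0]

second = next(g)
print(second, calls)     # 1 [0, 1]
```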

In [141]:
sum([x for x in range(10**3)])

499500

In [142]:
sum(x for x in range(10**3))

499500

In [144]:
%timeit 1024 in [2**x for x in range(100)]

10000 loops, best of 3: 57.7 µs per loop


In [147]:
%timeit 1024 in (2**x for x in range(100))  # lazy: stops generating values as soon as 1024 is found

100000 loops, best of 3: 6.08 µs per loop


If a function takes an iterable, we can pass it a generator. This can be much more memory-efficient (and possibly more time-efficient) than passing the function a list with the same items (as would be returned by the generator).
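
A rough way to see the memory difference is `sys.getsizeof`, which reports the footprint of the container object itself (not of the integers it refers to). The exact byte counts depend on the Python version and platform, so this sketch only compares them; the variable names are illustrative.

```python
import sys

n = 100_000
as_list = [x for x in range(n)]
as_gen = (x for x in range(n))

# The list stores n references; the generator stores only its paused state.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes > gen_bytes)        # True

# A function that accepts an iterable gives the same answer with either one.
totals_match = sum(as_list) == sum(as_gen)
print(totals_match)                  # True
```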

In [148]:
def foo():
    yield 1

In [150]:
g = foo()

In [151]:
type(g)

generator

In [153]:
next(g)

StopIteration: 

In [156]:
def foo():
    print('foo was called')
    yield 1  # the body does not run when foo() is called; the call only creates a generator object

In [157]:
g = foo()

A generator object can also be created by calling a *generator function*. A generator function is any function that contains the `yield` keyword.

In [158]:
next(g)

foo was called


1

The body of a generator function is not actually run to yield a value when it is first called! It is only run for the first time when `next` is invoked on the returned generator object.

In [159]:
def foo():
    print('L1')
    yield 1
    print('L2')
    yield 2
    print('L3')
    yield 3
    print('L4')

In [160]:
g = foo()

In [164]:
next(g)

L4


StopIteration: 

In [166]:
for x in foo():
    print('*', x, '*')

L1
* 1 *
L2
* 2 *
L3
* 3 *
L4


Each `next` call on a generator object will run the body of the generator function up to the next `yield` statement (if any). The value specified in the `yield` statement will be the return value of `next`. If there is no remaining `yield` statement, the function will run until it exits and a `StopIteration` exception will be raised.

... which is just what we need to support iteration with a `for` loop!
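
Because the function body is suspended between `next` calls, its local variables survive from one call to the next. As a sketch of what this allows, the hypothetical generator function `powers_of_two` below describes an unbounded sequence, and `itertools.islice` from the standard library takes only the first ten items.

```python
from itertools import islice

def powers_of_two():
    """Yield 1, 2, 4, 8, ... without end."""
    value = 1
    while True:
        yield value      # pause here and hand value to the caller
        value *= 2       # execution resumes here on the following next() call

first_ten = list(islice(powers_of_two(), 10))
print(first_ten)         # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```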

In [167]:
def foo():
    for x in range(10):
        yield 2**x

In [168]:
for x in foo():
    print(x)

1
2
4
8
16
32
64
128
256
512


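A *coroutine* is a function whose execution can be suspended and later resumed where it left off, possibly many times. A generator function behaves this way: each `yield` suspends it, and each `next` call resumes it.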
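
The `send` method that appears in the `dir(g)` listing earlier takes this one step further: it resumes the generator and passes a value in at the same time. A minimal sketch, assuming a hypothetical `running_total` generator function:

```python
def running_total():
    """Receive numbers via send() and yield the sum so far."""
    total = 0
    while True:
        received = yield total   # suspend; resume when the caller sends a value
        total += received

acc = running_total()
next(acc)                # run up to the first yield (which yields 0)
a = acc.send(5)
b = acc.send(10)
print(a, b)              # 5 15
```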

In [169]:
def foo():  # the yield is never reached, but its mere presence makes foo a generator function
    if False:
        yield 10

In [170]:
foo()

<generator object foo at 0x0000020C80D4B728>

In [185]:
class ArrayList:
    def __init__(self):
        self.data = []
        
    def append(self, val):
        self.data.append(None)
        self.data[len(self.data)-1] = val
        
    def __iter__(self):
        for i in range(len(self.data)):
            yield self.data[i]
        # The two lines above replace the hand-written ArrayListIterator class
        # from section 3: the generator object returned by this method already
        # provides __iter__ and __next__.

In [186]:
l = ArrayList()
for x in range(10):
    l.append(2**x)

In [189]:
for x in l:
    print(x)

1
2
4
8
16
32
64
128
256
512


In [193]:
g = (x for x in range(10))
list(g)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [194]:
next(g)

StopIteration: 
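
As the last two cells show, a generator object can be consumed only once. A brief sketch of the usual remedy, with the hypothetical generator function `squares`: call the generator function again to obtain a fresh generator object.

```python
def squares(n):
    for x in range(n):
        yield x * x

g = squares(4)
first_pass = list(g)
second_pass = list(g)          # g is already exhausted
print(first_pass, second_pass) # [0, 1, 4, 9] []

# Calling the generator function again builds a new generator object
fresh = list(squares(4))
print(fresh)                   # [0, 1, 4, 9]
```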