<div style="color:red;background-color:black">
Diamond Light Source

<h1 style="color:red;background-color:antiquewhite"> Python Fundamentals: Iterators and Generators</h1>  

©2000-20 Chris Seddon 
</div>

## 1
Execute the following cell to activate styling for this tutorial

In [1]:
from IPython.display import HTML
HTML(f"<style>{open('my.css').read()}</style>")

## 2
In this tutorial we will investigate iterators and generators.

Iterators are defined as objects of any class that has the following two methods defined:
<pre>
__iter__
__next__</pre>

Generators are a special class of functions that simplify the task of writing iterators. Regular functions compute a value and return it, but generators return an iterator that returns a stream of values.

Let's start with a simple iterator class that computes Fibonacci numbers:

In [2]:
class Fibonacci:
    def __init__(self):
        self.x,self.y = 0,1
        
    def __iter__(self):
        return self  # the object on which to call next() - usually ourself

    def __next__(self):
        if self.x > 100:
            raise StopIteration     # indicate end of iteration
        
        self.x, self.y = self.y, self.x + self.y
        return self.x

## 3
Using the above class we can instantiate an iterator.  We can then use the iterator to produce a list of Fibonacci numbers.  

Instead of calling "\__next__" directly, the Python documentation recommends using the built-in function "next" to produce the Fibonacci numbers:

In [3]:
# create an iterator
iterator = Fibonacci()

# create a stream of Fibonacci numbers
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )
print( next(iterator) )

1
1
2
3
5
8
13
21
34
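
The built-in "next" also accepts an optional second argument: a default that is returned instead of raising "StopIteration" once the iterator is exhausted. The sketch below assumes a small three-value iterator (built from a list purely for illustration) rather than the Fibonacci class:

```python
# A small illustrative iterator; the list contents are arbitrary
iterator = iter([1, 1, 2])

values = []
while True:
    value = next(iterator, None)   # None comes back once the iterator is exhausted
    if value is None:
        break
    values.append(value)

print(values)   # [1, 1, 2]
```

This form is only safe when the default (here None) can never be a genuine value produced by the iterator.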


## 4 
Python has special support for iterators.  Every "for" loop works with iterators.  In the following example the data 5, 6, 2, 8 is actually a tuple.  A tuple is a built-in iterable: the "for" loop first calls the tuple's "\__iter__" method to obtain an iterator and then calls that iterator's "\__next__" method repeatedly.  We can't see the code for the tuple iterator, because it is built into the interpreter.

In [4]:
for n in 5, 6, 2, 8:
    print(n)

5
6
2
8
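
As a quick check of what the "for" loop works with here, the following sketch inspects the tuple directly. The tuple itself has "\__iter__" but no "\__next__"; the built-in "iter" returns the separate iterator object that actually supplies the values. The names "data" and "it" are chosen for this illustration only:

```python
data = (5, 6, 2, 8)

# The tuple has __iter__ but no __next__
print(hasattr(data, "__iter__"))   # True
print(hasattr(data, "__next__"))   # False

# iter() asks the tuple for a separate iterator object
it = iter(data)
print(hasattr(it, "__next__"))     # True
print(next(it))                    # 5
print(next(it))                    # 6
```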


## 5
We can use our iterator in a "for" loop.  Note that the iterator terminates only after it has returned the first Fibonacci number greater than 100, because the test in "\__next__" is made before the next number is computed:

In [5]:
for n in Fibonacci():
    print(n)

1
1
2
3
5
8
13
21
34
55
89
144


## 6
If we repeat the above exercise but this time add trace statements in our class, we can see that the "for" loop automatically calls the iterator methods:

In [6]:
class Fibonacci:
    def __init__(self):
        print("__init__")
        self.x,self.y = 0,1
        
    def __iter__(self):
        print("__iter__")
        return self  # the object on which to call next() - usually ourself

    def __next__(self):
        print("__next__")
        if self.x > 100:
            print("raising StopIteration exception")
            raise StopIteration     # indicate end of iteration
        
        self.x, self.y = self.y, self.x + self.y
        return self.x

for n in Fibonacci():
    print(n)

__init__
__iter__
__next__
1
__next__
1
__next__
2
__next__
3
__next__
5
__next__
8
__next__
13
__next__
21
__next__
34
__next__
55
__next__
89
__next__
144
__next__
raising StopIteration exception


## 7
Now we can see what is happening.  The Python interpreter translates the loop into a series of calls to our iterator.  
First the iterator is instantiated and "\__init__" is called.  Then the "for" loop calls the iterator's "\__iter__" method once to determine which object to iterate over.  This is normally the same object as we have just instantiated and hence the method returns "self".  

The main part of the "for" loop calls "\__next__" repeatedly until a "StopIteration" exception is raised.  The loop terminates by catching the exception raised in:
<pre>if self.x > 100:
    raise StopIteration</pre>

Thus the for loop is equivalent to the following code:

In [7]:
try:
    fib = Fibonacci()
    it = fib.__iter__()   # named "it" to avoid shadowing the built-in iter()

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    f = it.__next__()
    print(f, end=", ")

    # this 13th call raises StopIteration, so the print below never runs
    f = it.__next__()
    print(f, end=", ")

except StopIteration:
    pass

__init__
__iter__
__next__
1, __next__
1, __next__
2, __next__
3, __next__
5, __next__
8, __next__
13, __next__
21, __next__
34, __next__
55, __next__
89, __next__
144, __next__
raising StopIteration exception
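
The unrolled calls can also be written as a general loop. The following is a minimal sketch of roughly what a "for" loop does for any iterable; "run_loop" is a hypothetical helper introduced only for this illustration and is not how the interpreter is actually implemented:

```python
def run_loop(iterable):
    """Hypothetical helper: mimics `for item in iterable: collected.append(item)`."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends quietly
        collected.append(item)         # the loop body
    return collected

print(run_loop((5, 6, 2, 8)))   # [5, 6, 2, 8]
```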


## 8
Now let's turn our attention to generators.  Any function that contains a "yield" statement is a generator function; calling it returns a generator object, which is a special type of iterator.

Here is an example:

In [8]:
def powers():
    x = 1
    while x < 1000:
        x = x * 2
        yield x
    return

## 9
Generators are simpler than hand-written iterators because they don't involve writing a class.  Instead, when a generator function is called, the Python interpreter creates and returns a generator object, which is already an iterator.  This is all done inside the interpreter, so there is no class code for us to see.  

Nevertheless we can check that an iterator is indeed created.  We can use the built-in function "hasattr" to check that the object returned by calling the generator has "\__iter__" and "\__next__" methods:

In [9]:
# calling the function produces a generator object, which is also an iterator
g = powers()

# check that g has both iterator methods
print("Does g have an '__iter__' function?", hasattr(g, "__iter__"))
print("Does g have an '__next__' function?", hasattr(g, "__next__"))

Does g have an '__iter__' function? True
Does g have an '__next__' function? True
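
The standard library offers more direct checks than "hasattr". The sketch below uses "inspect.isgeneratorfunction" and the abstract base classes in "collections.abc"; it repeats the "powers" definition so that it runs on its own:

```python
import inspect
from collections.abc import Generator, Iterator

def powers():
    x = 1
    while x < 1000:
        x = x * 2
        yield x

g = powers()

print(inspect.isgeneratorfunction(powers))  # True: powers is a generator function
print(type(g).__name__)                     # generator
print(isinstance(g, Generator))             # True
print(isinstance(g, Iterator))              # True: every generator is an iterator
print(g.__iter__() is g)                    # True: __iter__ returns the object itself
```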


## 10
Let's use our generator in a loop.  The same considerations apply as per our Fibonacci iterator:

In [10]:
for n in powers():
    print(n)

2
4
8
16
32
64
128
256
512
1024


## 11
The mechanics of a generator are quite simple.  Every time we call "next", directly or indirectly through a loop, code inside the generator is executed until we reach the yield statement.  It is the yield statement that returns the next value from the generator.  
If we call "next" again, the generator continues execution from the yield statement until it hits a yield statement again.  This will happen repeatedly in our example because the yield statement is in a loop.

Recall that we are really working with an iterator in memory that was instantiated by calling the generator.  The yield statement in the generator corresponds to the return statement in an iterator's "\__next__" method.

Execution continues like this until there are no yield statements left and the generator reaches its return statement.  At this point the underlying iterator raises a "StopIteration" exception and this terminates the generator.  Any further calls will generate "StopIteration" exceptions.

Let's put some trace calls into our generator to confirm this behaviour:

In [11]:
def powers():
    x = 1
    while x < 1000:
        x = x * 2
        yield x
        print("statement after yield")
    print("about to terminate")
    return

for n in powers():
    print(n)

2
statement after yield
4
statement after yield
8
statement after yield
16
statement after yield
32
statement after yield
64
statement after yield
128
statement after yield
256
statement after yield
512
statement after yield
1024
statement after yield
about to terminate
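
To make the correspondence between a generator and a hand-written iterator concrete, here is a sketch of a class that is intended to behave like the "powers" generator. The class name "Powers" is an assumption made for this illustration; it is not what the interpreter creates internally:

```python
class Powers:
    """Hand-written iterator intended to behave like the powers() generator."""
    def __init__(self):
        self.x = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.x >= 1000:          # generator: the while condition fails
            raise StopIteration     # generator: reaching "return"
        self.x = self.x * 2
        return self.x               # generator: "yield x"

print(list(Powers()))   # [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

The class has to store "x" on "self" between calls, whereas the generator simply keeps its local variables alive while it is suspended.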


## 12
We can also use generators in a comprehension.  In the following example the expression: <pre>(sqrt(x) for x in range(10))</pre>
defines a generator expression.  The round brackets are used to differentiate this comprehension from the other types of comprehension (list, dict and set comprehensions).  

Note that a comprehension always uses the keywords "for" and "in".  That is what distinguishes comprehensions from tuples, lists, dicts and sets.

Consider the following generator comprehension:

In [12]:
from math import sqrt

roots = (sqrt(x) for x in range(10))

## 13
A generator comprehension, like its generator counterpart, only yields values when we ask it to do so:

In [13]:
print( next(roots) )
print( next(roots) )
print( next(roots) )
print( next(roots) )

0.0
1.0
1.4142135623730951
1.7320508075688772
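
One way to observe this on-demand behaviour is to log every call made while the values are computed. In the sketch below, "traced_sqrt" and the "calls" list are hypothetical helpers added only for this illustration; a list comprehension is included for contrast:

```python
from math import sqrt

calls = []   # records which inputs have actually been processed

def traced_sqrt(x):
    """Hypothetical wrapper around sqrt that logs each call."""
    calls.append(x)
    return sqrt(x)

eager = [traced_sqrt(x) for x in range(5)]   # list comprehension: runs immediately
print(calls)        # [0, 1, 2, 3, 4]

calls.clear()
lazy = (traced_sqrt(x) for x in range(5))    # generator expression: nothing runs yet
print(calls)        # []

first = next(lazy)  # only now is the first value computed
print(calls)        # [0]
```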


## 14
Of course we can use the generator comprehension as the iterable in a loop; "next" will be called automatically:

In [14]:
for n in roots:
    print(n)

2.0
2.23606797749979
2.449489742783178
2.6457513110645907
2.8284271247461903
3.0


## 15
Note that the generator continued from where it had left off.  Earlier, we consumed 4 values from the generator comprehension and inspecting the code we see there were only 10 values available.  Hence the above code generated the last 6 values.  

Now our generator comprehension is exhausted.  If we try to call next again, the generator comprehension will not be able to return anything and will raise a "StopIteration" exception:

In [15]:
try:
    print( next(roots) )
except StopIteration as e:
    print("no more values available")

no more values available


## 16
This leads to the concept that generators and generator comprehensions get consumed.  We can see similar behaviour if we use the built-in function sum on our generator expression (sum calls "next" until the comprehension is exhausted).

In [16]:
from math import sqrt

roots = (sqrt(x) for x in range(10))

print(sum(roots))   # consume the generator comprehension
print(sum(roots))   # the generator is now empty

19.30600052603572
0
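
If the values are needed more than once, two common workarounds are to build a fresh generator for each pass or to store the values in a list. The following is a sketch under that assumption; "make_roots" is a hypothetical factory function introduced for this example:

```python
from math import sqrt

def make_roots():
    """Hypothetical factory: returns a fresh generator expression on each call."""
    return (sqrt(x) for x in range(10))

# Option 1: build a new generator for every pass
first_total = sum(make_roots())
second_total = sum(make_roots())
print(first_total == second_total)   # True

# Option 2: store the values in a list, which can be traversed repeatedly
stored = list(make_roots())
print(sum(stored) == sum(stored))    # True
print(len(stored))                   # 10
```

The list trades memory for reusability, so it is only appropriate when the full set of values fits comfortably in memory.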


## 17
The same applies to our "powers" generator that we defined earlier:

In [17]:
def powers():
    x = 1
    while x < 1000:
        x = x * 2
        yield x
    return

g = powers()
print(sum(g))   # consume the generator
print(sum(g))   # the generator is now empty

2046
0


## 18
Iterators and generators only return values when asked.  This is called lazy evaluation.  Structures like lists and tuples evaluate all the components immediately (eager evaluation).

With lazy evaluation it is possible to define an infinite generator - one that will return values forever.  Of course we need to be careful when using an infinite generator in a loop - there has to be a way of avoiding looping forever.  
Let's modify our "powers" generator such that it never returns.  It will then be an infinite generator: 

In [18]:
def powers():
    x = 1
    while True:
        x = x * 2
        yield x

## 19
Now we can call "next" on this generator as many times as we like.  It will always return another value:

In [19]:
g = powers()

print("generating from a while loop")
x = 0
while x < 1000:
    x = next(g)
    print(x)
    
print("generating from a for loop")
for n in range(10):
    print(next(g)) 

generating from a while loop
2
4
8
16
32
64
128
256
512
1024
generating from a for loop
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
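
The standard library module "itertools" provides ready-made ways of taking a finite number of values from an infinite generator. The sketch below uses "islice" and "takewhile"; it repeats the infinite "powers" definition so that it runs on its own:

```python
from itertools import islice, takewhile

def powers():
    x = 1
    while True:
        x = x * 2
        yield x

# islice stops after a fixed number of values
first_five = list(islice(powers(), 5))
print(first_five)    # [2, 4, 8, 16, 32]

# takewhile stops at the first value that fails the test
small = list(takewhile(lambda p: p < 1000, powers()))
print(small)         # [2, 4, 8, 16, 32, 64, 128, 256, 512]
```

Note that "takewhile" has to consume the first failing value (1024 here) in order to decide to stop, so that value is lost if the same generator object is used afterwards.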


## 20
Python already provides many built-in iterables and iterators.  Lists, tuples and dicts can be iterated over in loops.  Furthermore "range" is a lazy iterable that computes its values on demand (although strictly it is a sequence rather than a generator).  And there are many, many more.

Finally, realise that generators execute code and then wait until they are asked to continue.  We can use this behaviour to run several generators concurrently.  In fact this idea underlies the "asyncio" module in Python's standard library.

Let's define 3 generators and create a primitive round-robin scheduler to call "next" on each generator in turn.  We will then have a simple concurrent system:

In [20]:
import time

def squares():
    n = 1
    while True:
        yield f"square({n}) = {n**2}"
        n += 1

def cubes():
    n = 1
    while True:
        yield f"cube({n}) = {n**3}"
        n += 3
        
def quads():
    n = 10
    while True:
        yield f"quad({n}) = {n**4}"
        n -= 1

generators = []
generators.append( squares() )
generators.append( cubes() )
generators.append( quads() )

# generators allow us to interleave different calculations (one at a time, not truly in parallel)
# create a round robin scheduler
t = 0
while t < 20:
    for g in generators:
        print(next(g))
        time.sleep(0.5)
        t += 0.5

square(1) = 1
cube(1) = 1
quad(10) = 10000
square(2) = 4
cube(4) = 64
quad(9) = 6561
square(3) = 9
cube(7) = 343
quad(8) = 4096
square(4) = 16
cube(10) = 1000
quad(7) = 2401
square(5) = 25
cube(13) = 2197
quad(6) = 1296
square(6) = 36
cube(16) = 4096
quad(5) = 625
square(7) = 49
cube(19) = 6859
quad(4) = 256
square(8) = 64
cube(22) = 10648
quad(3) = 81
square(9) = 81
cube(25) = 15625
quad(2) = 16
square(10) = 100
cube(28) = 21952
quad(1) = 1
square(11) = 121
cube(31) = 29791
quad(0) = 0
square(12) = 144
cube(34) = 39304
quad(-1) = 1
square(13) = 169
cube(37) = 50653
quad(-2) = 16
square(14) = 196
cube(40) = 64000
quad(-3) = 81
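
The scheduler above relies on every generator being infinite. As a sketch of one possible extension, the version below keeps the generators in a queue and drops each one when it raises "StopIteration". The names "countdown" and "round_robin" are hypothetical and exist only for this illustration:

```python
from collections import deque

def countdown(name, start):
    """Hypothetical finite task: yields a few messages and then finishes."""
    for n in range(start, 0, -1):
        yield f"{name}: {n}"

def round_robin(tasks):
    """Yield one value from each generator in turn, dropping any that finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            value = next(task)
        except StopIteration:
            continue              # this task is finished; do not re-queue it
        queue.append(task)        # still running; send it to the back of the queue
        yield value

results = list(round_robin([countdown("a", 3), countdown("b", 1), countdown("c", 2)]))
print(results)
# ['a: 3', 'b: 1', 'c: 2', 'a: 2', 'c: 1', 'a: 1']
```

Because "round_robin" is itself a generator, the scheduling is lazy too: no task runs until a value is requested.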
