# Iterators and Generators

If you’ve ever struggled with handling huge amounts of data (who hasn’t?!) and had your machine run out of memory, then you’ll love the concept of iterators and generators in Python.

Rather than loading all the data into memory in one go, it would be better to work with it in pieces, dealing only with the data that is required at that moment, right? This can reduce the load on your computer’s memory tremendously. And this is what iterators and generators do!

So far, you have seen loops like this one:

In [None]:
a = [1,2,3]
for x in a:
    print(x)

1
2
3


This type of iteration is known as definite iteration, which means going through the same code a predefined number of times. It is especially useful when you need to process the items of a data stream one by one in a loop.

When you use a while or for loop to repeat a piece of code several times, you’re actually running an iteration. That’s the name given to the process itself.

In Python, if your iteration process requires going through the values or items in a data collection one item at a time, then you’ll need another piece to complete the puzzle. You’ll need an iterator.

Iterators take responsibility for two main actions:

- Returning the data from a stream or container one item at a time
- Keeping track of the current and visited items

In Python, an iterator is an object that allows you to iterate over collections of data, such as lists, tuples, dictionaries, and sets.

Python iterators implement the iterator design pattern, which allows you to traverse a container and access its elements. The iterator pattern decouples the iteration algorithms from container data structures.

A Python object is considered an iterator when it implements two special methods collectively known as the iterator protocol. These two methods make Python iterators work. So, if you want to create custom iterator classes, then you must implement the following methods:

| Method | Description |
| --- | --- |
| `.__iter__()` | Called to initialize the iterator. It must return an iterator object. |
| `.__next__()` | Called to iterate over the iterator. It must return the next value in the data stream, or raise `StopIteration` when there are no more values. |

An iterator is an object representing a stream of data. Iterators implement something known as the iterator protocol in Python. What is that?

Well, the iterator protocol allows us to loop over items in an iterable using two methods: `__iter__()` and `__next__()`. All iterables and iterators have the `__iter__()` method, which returns an iterator.

An iterator keeps track of its current position in the iterable.

But what sets iterables and iterators apart is the `__next__()` method, which only iterators have. It returns the next value from the iterable whenever it is asked for one.
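One way to check this distinction is with the abstract base classes in the standard library's `collections.abc` module: an `isinstance` check against `Iterator` looks for both `__iter__` and `__next__`, while `Iterable` only needs `__iter__`. A small sketch:

```python
from collections.abc import Iterable, Iterator

sample = ['AI', 'Internet of Things', 'Deep learning']
it = iter(sample)

# A list is iterable but is not itself an iterator (it has no __next__)
print(isinstance(sample, Iterable), isinstance(sample, Iterator))  # True False

# The object returned by iter() is both an iterable and an iterator
print(isinstance(it, Iterable), isinstance(it, Iterator))  # True True
```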

Let's create a simple iterable, a list, and an iterator from it using the `__iter__()` method:

In [None]:
sample = ['AI', 'Internet of Things', 'Deep learning']
# generating an iterator
it = sample.__iter__()
print(it)
# iterables do not have __next__() method
sample.__next__()

<list_iterator object at 0x7851c5eb7a30>


AttributeError: 'list' object has no attribute '__next__'

Iterators are also iterables, but not vice versa. An iterator is its own iterator: calling `__iter__()` on it returns the same object.

In [None]:
sample = ['AI', 'Internet of Things', 'Deep learning']

it = sample.__iter__()
print(type(it))
itit = it.__iter__()
print(type(itit))
print(itit.__next__())
print(itit.__next__())
print(itit.__next__())

<class 'list_iterator'>
<class 'list_iterator'>
AI
Internet of Things
Deep learning
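Because an iterator returns itself from `__iter__()`, it carries a single position: once a loop has consumed it, a second loop over the same iterator gets nothing, whereas each loop over the list starts from a fresh iterator. A minimal sketch:

```python
sample = ['AI', 'Internet of Things', 'Deep learning']
it = iter(sample)

# An iterator's __iter__() hands back the very same object
print(iter(it) is it)  # True

# The first pass consumes every item...
first_pass = [x for x in it]
# ...so a second pass over the same iterator finds nothing left
second_pass = [x for x in it]
print(first_pass)   # ['AI', 'Internet of Things', 'Deep learning']
print(second_pass)  # []

# A list creates a new iterator each time, so it can be looped over repeatedly
print([x for x in sample] == [x for x in sample])  # True
```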


In [None]:
# iterator
it = iter(sample)

# next values
print(next(it))
print(next(it))
print(next(it))

AI
Internet of Things
Deep learning


In [None]:
print(next(it))

StopIteration

We get an error! If we try to access the next value after reaching the end of an iterable, a `StopIteration` exception is raised, which simply says “you can’t go further!”.

In [None]:
it = iter(sample)
while True:
    # this will execute till an error is raised
    try:
        val = next(it)
    # when we reach end of the list, error is raised and we break out of the loop
    except StopIteration:
        break
    print(val)

AI
Internet of Things
Deep learning


**If you take a step back, you will realize that this is precisely how the for-loop works under the hood. What we did manually with the loop above, the for-loop does automatically. That is why for-loops are preferred for looping over iterables: they handle the `StopIteration` exception for us.**

**Whenever we iterate over an iterable, the for-loop first obtains an iterator from it using `iter()` and then fetches the subsequent items by calling `next()` on that iterator until `StopIteration` is raised.**
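To see this in action, here is a hypothetical wrapper class, `Traced`, that records every protocol call the `for` statement makes to it. The names `Traced` and `calls` are made up for illustration; this is a sketch, not part of any library:

```python
class Traced:
    """Hypothetical wrapper that logs each iterator-protocol call."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0
        self.calls = []  # record of protocol methods invoked

    def __iter__(self):
        self.calls.append('__iter__')
        return self

    def __next__(self):
        self.calls.append('__next__')
        if self.index >= len(self.items):
            raise StopIteration  # tells the for-loop to stop
        value = self.items[self.index]
        self.index += 1
        return value


t = Traced(['AI', 'Internet of Things', 'Deep learning'])
for item in t:
    print(item)

print(t.calls)
# ['__iter__', '__next__', '__next__', '__next__', '__next__']
```

`__iter__` is called exactly once, and `__next__` is called once per item plus one final time, which raises `StopIteration` and ends the loop.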

Now that we know how Python iterators work, we can dive deeper and create one ourselves from scratch to get a better understanding of how things work.

I am going to create a simple iterator that produces the even numbers:

In [None]:
class Sequence():
    def __init__(self):
        self.num = 2
    def __iter__(self):
        return self
    def __next__(self):
        val = self.num
        self.num += 2
        return val

The `__init__()` method initializes each new object; Python calls it when you call the class to create an instance. It is used to assign any initial values that the object will need later. Here, I have initialized the `num` attribute with 2.

The `__iter__()` and `__next__()` methods are what make this class an iterator.

The `__iter__()` method returns the iterator object and is called when iteration begins. Since the object is itself an iterator, it returns itself.

The `__next__()` method returns the current value and updates the state for the next call. We increment `num` by 2 since we only want even numbers.

In [None]:
it = Sequence()
print(next(it))
print(next(it))
print(next(it))
print(next(it))
print(next(it))

2
4
6
8
10


Since I did not specify any condition that ends the sequence, the iterator will keep returning values forever. But we can easily add a stop condition:

In [None]:
class Sequence():
    def __init__(self):
        self.num = 2
    def __iter__(self):
        return self
    def __next__(self):
        val = self.num
        if val>10:
            raise StopIteration
        self.num += 2
        return val

I just included an `if` statement that raises `StopIteration` as soon as the value exceeds 10.

In [None]:
it = Sequence()
for i in it:
    print(i)

2
4
6
8
10


**Generators** are also iterators, but they are much more elegant. Using a generator, we can achieve the same thing as an iterator without having to write the `__iter__()` and `__next__()` methods in a class.

Instead, we can use a simple function to achieve the same task as an iterator:

In [None]:
# fibonacci sequence using a generator
def fib():
    prev, curr = 0, 1
    # keep going while prev is below 5
    while prev < 5:
        value = prev
        # Calculate the next number in the sequence using tuple unpacking.
        prev, curr = curr, prev + curr
        # yield the value
        yield value

Normal functions return values using the `return` keyword, whereas generator functions produce values using the `yield` keyword.

This is what sets generator functions apart from normal functions. Apart from this, a generator function is written just like a normal one, but calling it does not run its body: it returns a generator object instead.

The `yield` keyword works like a normal `return`, but with additional functionality: it remembers the state of the function.

So the next time `next()` is called on the generator, it doesn't start from scratch but resumes from where it left off after the last `yield`.
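A small sketch makes this pause-and-resume behaviour visible. The hypothetical generator `countdown` below appends a message to a `log` list each time part of its body runs, so we can see exactly when execution happens:

```python
log = []

def countdown(n):
    log.append('started')  # runs only on the first next(), not at call time
    while n > 0:
        log.append(f'yielding {n}')
        yield n            # execution pauses here until the next next()
        log.append(f'resumed after {n}')
        n -= 1
    log.append('finished')  # falling off the end raises StopIteration

gen = countdown(2)
print(log)        # [] : calling countdown() ran none of its body
print(next(gen))  # 2
print(log)        # ['started', 'yielding 2']
print(next(gen))  # 1
print(log)        # ['started', 'yielding 2', 'resumed after 2', 'yielding 1']
```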

In [None]:
# generator object
gen=fib()
print(gen)
# values
print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))
print(next(gen))

<generator object fib at 0x7851b18480b0>
0
1
1
2
3


Generator objects have the type `generator`, which is a special kind of iterator, so they are also lazy: they won't compute a value until `next()` explicitly asks for one.

**Calling `fib()` creates a generator object but does not run any of the function body yet. The `prev` and `curr` variables are initialized only when `next()` is called for the first time.**

**Each time `next()` is called on the object, the generator function runs until it reaches a `yield`, returns that value, and remembers the state of the function. So the next time `next()` is called, the function picks up from where it left off and resumes from there.**

The function will keep generating values every time `next()` asks for one until `prev` reaches 5, at which point the loop ends and a `StopIteration` exception is raised.
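Continuing the `fib()` example (redefined here so the snippet runs on its own), a sixth call to `next()` finds the loop condition false, so the function body ends and Python raises `StopIteration`:

```python
def fib():
    prev, curr = 0, 1
    # keep going while prev is below 5
    while prev < 5:
        value = prev
        prev, curr = curr, prev + curr
        yield value

gen = fib()
values = [next(gen) for _ in range(5)]
print(values)  # [0, 1, 1, 2, 3]

try:
    next(gen)  # prev is now 5, so the while-loop exits
except StopIteration:
    print('StopIteration: the generator is exhausted')
```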

You don’t have to write a function every time you want to create a generator. You can instead use a generator expression, much like a list comprehension. The only difference is that, unlike a list comprehension, a generator expression is enclosed in parentheses instead of square brackets, like the one below:

In [None]:
squared_gen = (x*x for x in range(2,5))
print(squared_gen)

<generator object <genexpr> at 0x7851b1848200>


Generator expressions are still lazy, so you need `next()` to get values out of them. However, as you know by now, a for-loop is usually the better option for retrieving all of the values:

In [None]:
for i in squared_gen:
    print(i)

4
9
16


Generator expressions are very useful for simple cases because they are easy to read and understand. But their readability decreases rapidly as the logic becomes more complex. This is where you will find yourself going back to generator functions, which provide greater flexibility for writing more sophisticated logic.
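One place where generator expressions shine is as the sole argument to a function: the extra parentheses can be dropped, and the function consumes the values lazily. When the logic grows beyond one expression, the same computation can be written as a generator function. A sketch comparing the two (`even_squares` is a hypothetical name):

```python
# Generator expression passed directly to sum(): no extra parentheses needed
total = sum(x * x for x in range(10) if x % 2 == 0)
print(total)  # 120

# The same computation as a generator function, which has room for more logic
def even_squares(limit):
    for x in range(limit):
        if x % 2 == 0:
            yield x * x

print(sum(even_squares(10)))  # 120
```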

**We use iterators because they can save us a ton of memory. A lazy iterator, such as a generator, doesn't compute its items when it is created, but only when they are requested.**

If we compare a list containing 10 million items with a generator that produces the same items, the difference in their sizes is shocking:

In [None]:
import sys
# list comprehension
mylist = [i for i in range(10000000)]
print('Size of list in memory',sys.getsizeof(mylist))
# generator expression
mygen = (i for i in range(10000000))
print('Size of generator in memory',sys.getsizeof(mygen))

Size of list in memory 89095160
Size of generator in memory 104


For the same number of items, the list takes about 89 MB while the generator takes only 104 bytes. That is the beauty of lazy iterators.
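`sys.getsizeof` only measures the container object itself: for the list, that is its internal array of pointers, not the integer objects it references. A rougher but fuller comparison can be made with the standard library's `tracemalloc` module. This sketch (with a hypothetical helper `peak_bytes`) records the peak memory traced while summing one million numbers from a list versus from a generator; exact numbers vary by Python version and platform, so only the ordering is meaningful:

```python
import tracemalloc

def peak_bytes(compute):
    """Return the peak traced memory (in bytes) while running compute()."""
    tracemalloc.start()
    compute()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()  # also discards the collected traces
    return peak

n = 1_000_000
# Measure the generator first so its peak cannot be inflated by the list run
gen_peak = peak_bytes(lambda: sum(i for i in range(n)))
list_peak = peak_bytes(lambda: sum([i for i in range(n)]))

print(f'generator peak: {gen_peak:,} bytes')
print(f'list peak:      {list_peak:,} bytes')
```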

And not just that, you could use iterators to read text from a file line-by-line instead of reading everything in one go. This will again save you a lot of memory especially if the file is huge.

Here, let’s use generators to read a file iteratively. We can create a simple generator expression that reads the file lazily, that is, one line at a time:

In [None]:
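file = "sample.txt"  # assumes this file exists in the working directory
with open(file) as f:
    # generator expression: each line is read only when it is requested
    lines = (line for line in f)
    print(lines)
    # print the first three lines
    print(next(lines))
    print(next(lines))
    print(next(lines))

FileNotFoundError: [Errno 2] No such file or directory: 'sample.txt'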
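Since `sample.txt` may not exist on your machine, here is a self-contained sketch that first writes a small temporary file (using the standard library's `tempfile` module) and then reads it back lazily. The `with` block makes sure the file is closed, which a bare `open()` inside a generator expression does not guarantee:

```python
import os
import tempfile

# Write a small throwaway file so the example does not depend on sample.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first line\nsecond line\nthird line\n')
    path = tmp.name

try:
    with open(path) as f:
        # File objects iterate over lines; nothing is read until requested
        lines = (line.rstrip('\n') for line in f)
        first = next(lines)
        second = next(lines)
        print(first)   # first line
        print(second)  # second line
finally:
    os.remove(path)  # clean up the temporary file
```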

Python's `for x in a` loop looks very different from a C-style for loop, where we loop over an index variable:
```C++
for (size_t i = 0; i < 3; i++) {
    printf("%zu\n", i);
}
```

In Python, we can instead loop over something called a `range`:

In [None]:
for i in range(3):
    print(i)

0
1
2


or over other data types, such as a dictionary:

In [None]:
d = {"hello": 1, "goodbye": 2}
for k,v in d.items():
    print(k, ' : ', v)

hello  :  1
goodbye  :  2


The key to using this sort of syntax is the concept of an [iterator](https://wiki.python.org/moin/Iterator). Iterators are common in object-oriented programming (not just in Python), but you probably haven't seen them before if you've only used procedural languages such as C.

An object is **iterable** if it implements the `__iter__` method, which is expected to return an iterator object.
An object is an **iterator** if it implements the `__next__` method (along with an `__iter__` method that returns the iterator itself), where `__next__` either
1. returns the next element of the iterable object
2. raises the `StopIteration` exception if there are no more elements to iterate over

## A Basic Iterator

What if we want to replicate `range`?  

In [None]:
r = range(3)
type(r)

range

We can produce an iterator using the `iter` function:

In [None]:
ri = iter(r)
type(ri)

range_iterator

We can explicitly step through the iterator using the `next` function. Each call returns the next value, and once the values run out, `next` raises `StopIteration`.

In [None]:
next(ri)

0

In [None]:
r = range(1,5,2)
ri = iter(r)
print(next(ri))
print(next(ri))


1
3


In [None]:
class my_range_iterator(object):
    def __init__(self, start, stop, stride):
        self.state = start
        self.stop = stop
        self.stride = stride

    def __iter__(self):
        return self  # iterators return themselves from __iter__

    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration  # signals "the end"
        ret = self.state # we'll return current state
        self.state += self.stride # increment state
        return ret


# an iterable
class my_range(object):
    def __init__(self, start, stop, stride=1):
        self.start = start
        self.stop = stop
        self.stride = stride

    def __iter__(self):
        return my_range_iterator(self.start, self.stop, self.stride)

In [None]:
r = my_range(0,3)
type(r)

__main__.my_range

In [None]:
ri = iter(r)
type(ri)

__main__.my_range_iterator

In [None]:
next(ri)

0

In [None]:
for i in my_range(0,3):
    print(i)

0
1
2


In [None]:
for i in range(0,3):
    print(i)

0
1
2


You can also create classes that are both iterators and iterables:

In [None]:
# an iterable and iterator
class my_range2(object):
    def __init__(self, start, stop, stride=1):
        self.start = start
        self.stop = stop
        self.stride = stride
        self.state = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration  # signals "the end"
        ret = self.state # we'll return current state
        self.state += self.stride # increment state
        return ret
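The practical difference between `my_range` (an iterable that hands out a new iterator each time) and `my_range2` (its own iterator) shows up when you loop twice. The sketch below restates the classes compactly so it runs on its own:

```python
class my_range_iterator(object):
    def __init__(self, start, stop, stride):
        self.state, self.stop, self.stride = start, stop, stride

    def __iter__(self):
        return self

    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration
        ret = self.state
        self.state += self.stride
        return ret


class my_range(object):
    """Iterable: every call to iter() builds a fresh iterator."""
    def __init__(self, start, stop, stride=1):
        self.start, self.stop, self.stride = start, stop, stride

    def __iter__(self):
        return my_range_iterator(self.start, self.stop, self.stride)


class my_range2(object):
    """Iterable and iterator: iter() returns self, so the state is shared."""
    def __init__(self, start, stop, stride=1):
        self.stop, self.stride, self.state = stop, stride, start

    def __iter__(self):
        return self

    def __next__(self):
        if self.state >= self.stop:
            raise StopIteration
        ret = self.state
        self.state += self.stride
        return ret


r1 = my_range(0, 3)
print(list(r1), list(r1))  # [0, 1, 2] [0, 1, 2]

r2 = my_range2(0, 3)
print(list(r2), list(r2))  # [0, 1, 2] []
```

A class like `my_range2` is simpler, but it can only be traversed once; `my_range` needs a second class, but every loop starts from the beginning.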

## Using Iterators for Computation

Let's now use iterators for something more interesting - computing the Fibonacci numbers.

In [None]:
class FibonacciIterator(object):
    def __init__(self):
        self.a = 0 # current number
        self.b = 1 # next number

    def __iter__(self):
        return self

    def __next__(self):
        ret = self.a
        self.a, self.b = self.b, self.a + self.b # advance the iteration
        return ret

In [None]:
for i in FibonacciIterator():
    if i > 1000:
        break
    print(i)

0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987


Note that we never raise a `StopIteration` exception - the iterator will just keep going if we let it.

### Exercise

Define `FibonacciIterator` so it will iterate over all Fibonacci numbers until they are greater than a parameter `n`.

In [None]:
## Your code here
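One possible solution, sketched here: store the bound `n` and raise `StopIteration` once the current number exceeds it.

```python
class FibonacciIterator(object):
    def __init__(self, n):
        self.n = n  # upper bound (inclusive)
        self.a = 0  # current number
        self.b = 1  # next number

    def __iter__(self):
        return self

    def __next__(self):
        if self.a > self.n:
            raise StopIteration  # every remaining number is greater than n
        ret = self.a
        self.a, self.b = self.b, self.a + self.b  # advance the iteration
        return ret

for i in FibonacciIterator(20):
    print(i)
```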


## Generators

Often, a more elegant way to define an iterator is to use a [generator](https://wiki.python.org/moin/Generators).

This is a special kind of iterator defined using a function instead of a class that implements the `__iter__` and `__next__` methods.

See [this post](https://nvie.com/posts/iterators-vs-generators/) for more discussion.

In [None]:
def my_range3(state, stop, stride=1):
    while state < stop:
        yield state
        state += stride


Note that we use the `def` keyword instead of the `class` keyword for our declaration. The `yield` keyword produces the successive values of the iteration.

In [None]:
r = my_range3(0,3)
type(r)

generator

In [None]:
ri = iter(r)
type(ri)

generator

In [None]:
next(ri)

0

In [None]:
for i in my_range3(0,3):
    print(i)

0
1
2


Our Fibonacci example re-written using a generator:

In [None]:
def FibonacciGenerator():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b

In [None]:
for i in FibonacciGenerator():
    if i > 1000:
        break
    print(i)


0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987


### Exercise

Define `FibonacciGenerator` so it will iterate over all Fibonacci numbers until they are greater than a parameter `n`.

In [None]:
## Your code here


In [None]:
def FibonacciGenerator(n):
    a = 0
    b = 1
    while a <= n:  # stop only once the numbers are greater than n
        yield a
        a, b = b, a + b

for i in FibonacciGenerator(1000):
    print(i)

0
1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
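Instead of writing `break` inside the loop or adding a parameter, the stopping rule can also be applied from the outside with `itertools.takewhile`, which yields items only while a predicate holds, or `itertools.islice`, which takes a fixed number of items. A sketch (the unbounded generator is repeated so the block runs on its own):

```python
from itertools import islice, takewhile

def FibonacciGenerator():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Keep values while they are <= 1000; stops at the first value above 1000
small = list(takewhile(lambda v: v <= 1000, FibonacciGenerator()))
print(small[-1])  # 987

# Or take a fixed number of values from the infinite generator
print(list(islice(FibonacciGenerator(), 10)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```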


## Iteration tools

Some useful tools for iterators that come in handy are:

`zip` - iterates over multiple iterables simultaneously

In [None]:
for i, a in zip([0,1,2], ['a', 'b', 'c']):
    print(i,a)

0 a
1 b
2 c
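`zip` stops as soon as the shortest input is exhausted, so extra items in longer inputs are silently dropped. For instance:

```python
letters = ['a', 'b', 'c', 'd']
pairs = list(zip(range(2), letters))
print(pairs)  # [(0, 'a'), (1, 'b')]
```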


`reversed` - iterates in reverse order

In [None]:
for i in reversed(range(3)):
    print(i)

2
1
0
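`reversed` needs an object that knows its length and supports indexing (a sequence such as a list, tuple, string or `range`), or one that defines `__reversed__`. A one-shot generator has neither, so it has to be materialized first. A sketch:

```python
squares = (x * x for x in range(4))

try:
    reversed(squares)  # generators do not support reversed()
except TypeError as err:
    print('TypeError:', err)

# Materialize the values into a list, which can be reversed
backwards = list(reversed(list(x * x for x in range(4))))
print(backwards)  # [9, 4, 1, 0]
```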


`enumerate` - returns the iteration step count along with each value

In [None]:
for i, a in enumerate('abc'):
    print(i, a)

0 a
1 b
2 c
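`enumerate` also accepts a `start` argument, which is handy for 1-based numbering:

```python
numbered = list(enumerate('abc', start=1))
for i, a in numbered:
    print(i, a)
# 1 a
# 2 b
# 3 c
```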


### Exercise

Implement your own versions of `zip` and `enumerate` using generators

In [None]:
## Your code here


In [None]:
def my_zip(a, b):
    ai = iter(a)
    bi = iter(b)
    while True:
        try:
            yield next(ai), next(bi)
        except StopIteration:  # one of the inputs is exhausted
            return

for i, a in my_zip(range(3), 'abc'):
    print(i,a)

0 a
1 b
2 c
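The two-argument version generalizes to any number of iterables with `*iterables`. This sketch (the name `my_zip_n` is hypothetical) stops, like `zip`, at the first exhausted input:

```python
def my_zip_n(*iterables):
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return  # like zip(), yield nothing when given no inputs
    while True:
        items = []
        for it in iterators:
            try:
                items.append(next(it))
            except StopIteration:
                return  # stop at the first exhausted input
        yield tuple(items)

for row in my_zip_n(range(3), 'abc', [True, False, True, False]):
    print(row)
# (0, 'a', True)
# (1, 'b', False)
# (2, 'c', True)
```

A generator expression such as `tuple(next(it) for it in iterators)` would not work here: since Python 3.7, a `StopIteration` raised inside a generator is turned into a `RuntimeError` (PEP 479), which is why the sketch uses an explicit loop.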


In [None]:
def my_enumerate(a):
    ct = 0
    for x in a:
        yield ct, x
        ct = ct + 1

for i, a in my_enumerate('abcd'):
    print(i, a)

0 a
1 b
2 c
3 d


## The Itertools Package

A useful package for dealing with iterators is the [itertools package](https://docs.python.org/3.8/library/itertools.html).  Here are a few examples - click on the link to see what else the package provides.

`product` gives the equivalent of a nested for-loop

In [None]:
from itertools import product

for i, j in product(range(2), range(3)):
    print(i, j)

0 0
0 1
0 2
1 0
1 1
1 2


In [None]:
# equivalent to:
for i in range(2):
    for j in range(3):
        print(i,j)

0 0
0 1
0 2
1 0
1 1
1 2
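`product` also takes a `repeat` keyword, which combines an iterable with itself; `product(range(2), repeat=3)` is the same as `product(range(2), range(2), range(2))`:

```python
from itertools import product

bits = list(product(range(2), repeat=3))
print(len(bits))  # 8
print(bits[:3])   # [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
```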


`repeat` just repeats a value

In [None]:
from itertools import repeat

for i in repeat(10, 5):
    print(i)

10
10
10
10
10


`permutations` yields every ordering of the input's elements

In [None]:
from itertools import permutations

for p in permutations(range(3)):
    print(p)

(0, 1, 2)
(0, 2, 1)
(1, 0, 2)
(1, 2, 0)
(2, 0, 1)
(2, 1, 0)
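`permutations` also accepts an optional length `r`; with `r=2` it yields the ordered pairs of elements taken from distinct positions:

```python
from itertools import permutations

pairs = list(permutations(range(3), 2))
print(pairs)
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```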


### Exercise

Implement your own version of `product` and `repeat` using generators.

In [None]:
## Your code here


In [None]:
def my_product(a, b):
    b = list(b)  # materialize b so it can be re-iterated for every x
    for x in a:
        for y in b:
            yield x, y

for x, y in my_product(range(2), range(3)):
    print(x,y)

0 0
0 1
0 2
1 0
1 1
1 2
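`itertools.product` accepts any number of inputs. One way to extend the two-argument version is recursion over the remaining inputs. The sketch below (hypothetical name `my_product_n`) first turns every input into a tuple so each can be re-iterated; the `itertools` documentation notes that `product` likewise consumes its input iterables before it starts:

```python
def my_product_n(*iterables):
    pools = [tuple(it) for it in iterables]  # materialize each input once

    def helper(depth):
        if depth == len(pools):
            yield ()  # an empty tuple completes each combination
            return
        for x in pools[depth]:
            for rest in helper(depth + 1):
                yield (x,) + rest

    return helper(0)

print(list(my_product_n(range(2), 'ab')))
# [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
```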


In [None]:
def my_repeat(n, ct):
    for i in range(ct):
        yield n

for x in my_repeat(10, 5):
    print(x)

10
10
10
10
10
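`itertools.repeat` also works without a count, in which case it repeats forever. A sketch of the same behaviour (with a hypothetical `my_repeat2`, where `ct=None` means "no limit"), bounded here with `itertools.islice`:

```python
from itertools import islice

def my_repeat2(value, ct=None):
    if ct is None:
        while True:  # no count given: repeat forever
            yield value
    else:
        for _ in range(ct):
            yield value

print(list(islice(my_repeat2('x'), 4)))  # ['x', 'x', 'x', 'x']
print(list(my_repeat2('y', 2)))          # ['y', 'y']
```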


## Iterators for Scientific Computing

One way you might use an iterator in scientific computing is when implementing an iterative algorithm.

Here is an example of the power method, which finds the eigenvalue of largest magnitude of a matrix together with its eigenvector.

In [None]:
import numpy as np

def PowerMethodGenerator(A, x):

    def RayleighQuotient(A, x):
        """
        x^T A x / x^T x
        """
        return np.dot(x, A @ x) / np.dot(x,x)

    x = x / np.linalg.norm(x)
    rq_prev = np.inf
    rq = RayleighQuotient(A, x)

    while True:
        # yield state: RayleighQuotient, x, and difference from previous iteration
        yield rq, x, np.abs(rq - rq_prev)

        # compute next iteration
        x = A @ x
        x = x / np.linalg.norm(x)
        rq_prev = rq
        rq = RayleighQuotient(A, x)


In [None]:
# here's a version that uses the generator in a while-loop

n = 100
A = np.random.randn(n, n)
A = A @ A.T + 5 # constant increases eigenvalue in constant vector direction
x0 = np.random.randn(n)

solver = PowerMethodGenerator(A, x0)
tol = 1e-4

while True:
    rq, x, eps = next(solver)
    print(rq, eps)
    if eps < tol:
        break

89.81253501091724 inf
242.85910417948966 153.04656916857243
444.85274716457826 201.9936429850886
574.2764337134453 129.423686548867
605.3333719186537 31.05693820520844
611.5407739473036 6.207402028649881
612.9106617975037 1.3698878502001435
613.2485808429876 0.3379190454838863
613.3392058160957 0.09062497310810613
613.3650279556526 0.02582213955690804
613.3727234642167 0.0076955085640975085
613.3750960037759 0.002372539559132747
613.375846586167 0.0007505823911060361
613.3760887640708 0.00024217790382863313
613.376168092833 7.932876224003849e-05


If we decide that we're not satisfied with convergence yet, we can resume where we left off

In [None]:
tol = 1e-6

while True:
    rq, x, eps = next(solver)
    print(rq, eps)
    if eps < tol:
        break



613.3761943849217 2.6292088705304195e-05
613.3762031806232 8.795701432973146e-06
613.3762061456915 2.965068347293709e-06
613.3762071517388 1.006047227747331e-06
613.3762074950506 3.4331185361224925e-07


You can do the same thing with for-loops

In [None]:
# here's a version that uses the generator in a for-loop

n = 100
A = np.random.randn(n, n)
A = (A @ A.T) + 5 # constant increases eigenvalue in constant vector direction
x0 = np.random.randn(n)

solver = PowerMethodGenerator(A, x0)
tol = 1e-4

for rq, x, eps in solver:
    print(rq, eps)
    if eps < tol:
        break

tol = 1e-6
print('\nresuming iteration after decreasing tolerance\n')
for rq, x, eps in solver:
    print(rq, eps)
    if eps < tol:
        break

98.54240658228312 inf
263.89481933418114 165.35241275189802
323.65196672338845 59.75714738920732
390.0128085671451 66.36084184375665
484.9275155474049 94.91470698025978
563.0011106312606 78.07359508385576
599.1416496541569 36.14053902289629
611.5563409103167 12.414691256159813
615.457347021376 3.901006111059246
616.6819177841095 1.2245707627334923
617.078052353799 0.3961345696894796
617.2115459496376 0.13349359583867226
617.2585339675145 0.04698801787685625
617.275774701485 0.017240733970538713
617.2823363892841 0.006561687799035099
617.2849102204586 0.0025738311745726605
617.2859439103643 0.0010336899056255788
617.2863664706344 0.0004225602701808384
617.2865414495121 0.00017497887768058717
617.2866145761685 7.312665638892213e-05

resuming iteration after decreasing tolerance

617.2866453354011 3.075923257256363e-05
617.2866583321916 1.2996790474062436e-05
617.286663841021 5.508829417522065e-06
617.2866661810857 2.34006472510373e-06
617.2866671766113 9.95525624603033e-07
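Because the generator never stops on its own, a loop that waits for `eps < tol` could run forever if convergence stalls. One safeguard is to cap the number of steps with `itertools.islice`. The sketch below restates `PowerMethodGenerator` and checks it on a matrix whose answer is known: for the diagonal matrix `diag(3, 1)` the dominant eigenvalue is 3 with eigenvector `[1, 0]`. The names `max_iter` and `step` are illustrative assumptions:

```python
from itertools import islice

import numpy as np

def PowerMethodGenerator(A, x):
    def RayleighQuotient(A, x):
        """x^T A x / x^T x"""
        return np.dot(x, A @ x) / np.dot(x, x)

    x = x / np.linalg.norm(x)
    rq_prev = np.inf
    rq = RayleighQuotient(A, x)
    while True:
        yield rq, x, np.abs(rq - rq_prev)
        x = A @ x
        x = x / np.linalg.norm(x)
        rq_prev = rq
        rq = RayleighQuotient(A, x)

A = np.diag([3.0, 1.0])   # known dominant eigenpair: 3 and [1, 0]
x0 = np.array([1.0, 1.0])
tol = 1e-10
max_iter = 100            # safety cap on the number of iterations

for step, (rq, x, eps) in enumerate(islice(PowerMethodGenerator(A, x0), max_iter)):
    if eps < tol:
        break

print(step, rq, x)
```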
