# Iterables, Generators and Yield in Python

****
## About this notebook: 

Notebook prepared by **Jesus Perez Colino** Version 0.1, First Released: 01/12/2014, Alpha.  

- This work is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/deed.en_US). This work is offered for free, with the hope that it will be useful.


- **Summary**: This notebook is a brief introduction to the concepts of **iterables** and **generators**, and to the keyword **yield**, in Python 2.7.
****

## Iterables

When you create a list, you can read its items one by one. Reading items one by one is called iteration:

In [1]:
mylist = [1, 2, 3]
for i in mylist:
    print (i)

1
2
3


`mylist` is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

In [2]:
mylist = [x*x for x in range(3)]
for i in mylist:
    print (i)

0
1
4


Everything you can use "for... in..." on is an iterable: lists, strings, files... Lists and other in-memory containers are handy because you can read them as many times as you wish, but they store all their values in memory, which is not always what you want when you have a lot of values.
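As a rough sketch of what "iterable" means in practice, the snippet below spells out by hand what a `for` loop does with a list, using the built-in functions `iter()` and `next()`. The variable names are only illustrative, and the code is written to run on both Python 2.7 and Python 3.

```python
# What a for loop does behind the scenes, written out step by step.
mylist = [1, 2, 3]

iterator = iter(mylist)   # ask the iterable for an iterator
first = next(iterator)    # 1
second = next(iterator)   # 2
third = next(iterator)    # 3

# A list can be read again: every for loop starts from a fresh iterator.
first_pass = [i for i in mylist]
second_pass = [i for i in mylist]
print(first_pass == second_pass)  # True
```

Because the list keeps all of its values, both passes see the same items.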

Here is an example that builds a large list which we may not want to keep in memory (a million integers, on the order of 10 MB or more):

In [28]:
# Build and return a list
def firstn(n):
    num, nums = 0, []
    while num < n:
        nums.append(num)
        num += 1
    return nums

sum_of_first_n = sum(firstn(1000000))
print(sum_of_first_n)


499999500000


## Generators

A problem with lists is that they can easily grow very big. In Python 2, `range(1000000)` creates an actual list of 1 million elements. If you only need to deal with them one at a time, this can be a huge source of inefficiency (or of running out of memory). And if you only need the first few values, calculating them all is a waste.

A **generator** is something that you can iterate over (usually with a `for` loop) but whose values are produced only as needed (lazily).

Unlike normal functions that return a value and exit, **generator functions** automatically suspend and resume their execution and state around the point of value generation. Because of that, they are often a useful alternative to both computing an entire series of values up front and manually saving and restoring state in classes. The state that generator functions retain when they are suspended includes both their code location, and their entire local scope. Hence, their local variables retain information between results, and make it available when the functions are resumed.



In [8]:
mygenerator = (x*x for x in range(3))
for i in mygenerator:
    print (i)
    

0
1
4


It is just the same, except that you used () instead of [], which makes it a generator expression rather than a list comprehension. Here is another example:

In [34]:
# list comprehension
doubles = [2 * n for n in range(50)]

# same as the list comprehension above
doubles = list(2 * n for n in range(50))

However, you cannot run `for i in mygenerator` a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish by calculating 4, one value at a time.

In the next example, we implement the generator pattern ourselves:
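The single-use behaviour can be checked with a minimal sketch like the following, which consumes the same generator expression twice (`first_pass` and `second_pass` are illustrative names):

```python
mygenerator = (x * x for x in range(3))

first_pass = list(mygenerator)    # consumes every value
second_pass = list(mygenerator)   # the generator is already exhausted

print(first_pass)    # [0, 1, 4]
print(second_pass)   # []
```

To iterate again, you have to create a new generator.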

In [35]:
# Using the generator pattern (an iterator class)
class firstn(object):
    def __init__(self, n):
        self.n = n
        self.num = 0  # next value to hand out

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def next(self):
        # Python 2 iterator protocol; Python 3 calls this method __next__
        if self.num < self.n:
            cur, self.num = self.num, self.num + 1
            return cur
        else:
            raise StopIteration()


sum_of_first_n = sum(firstn(1000000))
print(sum_of_first_n)

499999500000


Furthermore, this is a pattern that we would have to repeat over and over for many similar constructs. Imagine writing all that just to get an iterator. That is where **yield** comes to help us.

## Yield

The main difference between **generator functions** and *normal functions* is that a generator **yields** a value rather than returning one: the `yield` statement suspends the function and sends a value back to the caller, but retains enough state to enable the function to resume from where it left off. When resumed, the function continues execution immediately after the last `yield` that ran. From the function's perspective, this allows its code to produce a series of values over time, rather than computing them all at once and sending them back in something like a list.

In [1]:
def gensquares(N):
    for i in range(N):
        yield i ** 2 # Resume here later


This function yields a value, and so returns to its caller, each time through the loop; when it is resumed, its prior state is restored, including the last values of its variables i and N, and control picks up again immediately after the yield statement. For example, when it’s used in the body of a for loop, the first iteration starts the function and gets its first result; thereafter, control returns to the function after its yield statement each time through the loop:

In [3]:
for i in gensquares(5): # Resume the function 
    print(i) # Print last yielded value

0
1
4
9
16
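The suspend-and-resume behaviour can also be observed without a `for` loop. The sketch below drives the same `gensquares` function by hand with the built-in `next()`, which is roughly what the loop does internally. When the function body finishes, `next()` raises `StopIteration`, which a `for` loop catches silently.

```python
def gensquares(N):
    for i in range(N):
        yield i ** 2  # execution pauses here until the next request


gen = gensquares(3)
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 4

try:
    next(gen)  # the loop inside gensquares has ended
except StopIteration:
    print("exhausted")
```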


Python provides generator functions as a convenient shortcut for building iterators. Let us rewrite the above iterator as a generator function:

In [33]:
# a generator that yields items instead of returning a list
def firstn(n):
    num = 0
    while num < n:
        yield num
        num += 1

sum_of_first_n = sum(firstn(1000000))

Note that the expression of the number generation logic is clear and natural. It is very similar to the implementation that built a list in memory, but has the memory usage characteristics of the iterator implementation. `yield` is a keyword that is used like `return`, except that calling the function returns a generator.

In [18]:
def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i*i

mygenerator = createGenerator() # create a generator
print('Object type:', mygenerator) # mygenerator is an object!



('Object type:', <generator object createGenerator at 0x105c39690>)


In [20]:
for i in mygenerator:
    print (i)

0
1
4


This example is trivial, but the technique is handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that **when you call the function, the code you have written in the function body does not run**. The function **only returns the generator object**. Your code then runs each time the `for` loop asks the generator for a value.

Now the hard part:

The first time the `for` loop asks the generator object for a value, it runs the code in your function from the beginning until it hits `yield`, then returns the first value. Each subsequent request resumes the function right after that `yield`, runs the loop you have written one more time, and returns the next value, until there is no value left to return.

The generator is considered empty once the function runs without hitting `yield` again. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.
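To make this order of execution visible, here is a small sketch that records what happens and when. The function `traced` and the list `events` are hypothetical names introduced only for this illustration.

```python
events = []


def traced():
    events.append("body started")
    for i in range(2):
        events.append("yielding %d" % i)
        yield i
    events.append("body finished")  # no more yield: the generator is empty


gen = traced()                      # nothing in the body has run yet
events.append("generator created")
values = list(gen)                  # now the body runs, one yield at a time

print(values)
for event in events:
    print(event)
```

Note that "generator created" is recorded before "body started": calling the function only builds the generator object, and the body starts running on the first request for a value.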

### Why generator functions?

Generators can be better in terms of both memory use and performance in larger programs. They allow functions to avoid doing all the work up front, which is especially useful when the result lists are large or when it takes a lot of computation to produce each value. Generators distribute the time required to produce the series of values among loop iterations.

Moreover, for more advanced uses, generators can provide a simpler alternative to manually saving the state between iterations in class objects—with generators, variables accessible in the function’s scopes are saved and restored automatically.
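As an illustration of avoiding work up front, the sketch below takes only the first five values from the `firstn` generator using `itertools.islice` from the standard library. The `calls` dictionary is a hypothetical counter added here just to show how many values were actually computed.

```python
from itertools import islice

calls = {"count": 0}  # illustrative counter, not part of the original firstn


def firstn(n):
    num = 0
    while num < n:
        calls["count"] += 1
        yield num
        num += 1


# Ask for five values out of a potential million.
first_five = list(islice(firstn(1000000), 5))

print(first_five)      # [0, 1, 2, 3, 4]
print(calls["count"])  # far fewer than 1000000
```

A version that built the whole list first would have computed all one million values before returning any of them.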

### Why generator expressions?

Just like generator functions, generator expressions are a memory-space optimization: they do not require the entire result list to be constructed all at once, as the square-bracketed list comprehension does. Also like generator functions, they divide the work of results production into smaller time slices: they yield results in piecemeal fashion, instead of making the caller wait for the full set to be created in a single call.
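A rough way to see the memory difference is `sys.getsizeof`, as sketched below. Keep in mind that it reports only the size of the container object itself (not of the integers it refers to), so the numbers are an indication rather than a full measurement, and they vary by platform and Python version.

```python
import sys

squares_list = [x * x for x in range(100000)]   # all values stored at once
squares_gen = (x * x for x in range(100000))    # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

print(list_size)  # grows with the number of elements
print(gen_size)   # small and independent of the number of elements
```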

On the other hand, generator expressions may also run slightly slower than list comprehensions in practice, so they are probably best used only for very large result sets, or applications that cannot wait for full results generation. A more authoritative statement about performance, though, requires timing both forms on the workload at hand.
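One possible way to compare the two forms is the standard `timeit` module, as in the minimal sketch below. The statement being timed and the repetition count are arbitrary choices for illustration, and the results depend on the machine, the Python version and the size of the data, so no particular outcome is assumed here.

```python
import timeit

# Time the same computation written both ways, 1000 runs each.
list_time = timeit.timeit("sum([x * x for x in range(1000)])", number=1000)
gen_time = timeit.timeit("sum(x * x for x in range(1000))", number=1000)

print("list comprehension:   %.4f s" % list_time)
print("generator expression: %.4f s" % gen_time)
```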