# Generators 

In this tutorial we will go over generators and how they can be advantageous over lists, especially when we are doing a very memory intensive operation. Additionally, some find generators more readable, since they are a bit less verbose and don't deal with list initializations.

To show this advantage, let's take the following example. Below is a function that takes in a list as an argument, loops through that list, squares each value, appends it to a new list, `result`, and returns the new list.

#### A Toy example

In [1]:
def square_list(in_list): 
    result = []
    for i in in_list: 
        result.append(i**2)
    return result

test_list = [1, 2, 3, 4, 5, 6]
square_list(test_list)

[1, 4, 9, 16, 25, 36]

Simple. Let's convert this function to a generator, which will be a lot less verbose. We will get rid of all instances of the `result` output list completely, and replace the appending with one keyword, `yield`. The `yield` keyword is what makes the function a generator.

#### The Generator 

In [2]:
def gen_func_square(in_list): 
    for i in in_list: 
        #yield the next result 
        yield (i**2)
        
gen_func_square(test_list)

<generator object gen_func_square at 0x00000139E9686518>

Interesting. We did not get what we expected. Calling the function returned a generator object, not a list. This is because a generator function yields its results one at a time, and the generator is waiting for us to ask for one. We ask for a result by creating the generator object and calling the built-in `next` function on it. Each call to `next` runs the function body until it reaches the next `yield`, which here means one pass through the loop.
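To make the "waiting" behaviour concrete, here is a minimal sketch. The `calls` list and the `logged_square` function are hypothetical names added only for illustration; they record which inputs the loop body has touched so far.

```python
calls = []  # hypothetical log of which inputs have been processed so far

def logged_square(in_list):
    for i in in_list:
        calls.append(i)  # record that the loop body ran for this value
        yield i**2

gen = logged_square([1, 2, 3])
calls_before = list(calls)     # still empty: the body has not started running

first = next(gen)              # runs the body up to the first yield
calls_after_one = list(calls)  # only the first input has been processed

second = next(gen)             # resumes after the yield, stops at the next one
print(calls_before, first, calls_after_one, second, calls)
```

Nothing is logged when the generator object is created; each call to `next` advances the loop by exactly one value.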

This matters because the generator object never holds the full output list in memory. If we know our output will be huge (think machine learning data, or IoT applications where memory is precious), we can use a generator to draw values one at a time and save that memory.
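As a rough illustration of the difference, the sketch below compares the size of a list of one million squares with the size of the equivalent generator object, using `sys.getsizeof` from the standard library. Note that `sys.getsizeof` reports only the container itself (not the integers stored in the list), so the list figure is an underestimate; the exact numbers depend on your Python version and platform.

```python
import sys

def gen_func_square(in_list):
    for i in in_list:
        yield i**2

n = 1_000_000
squares_list = [i**2 for i in range(n)]   # all values built up front
squares_gen = gen_func_square(range(n))   # nothing computed yet

list_bytes = sys.getsizeof(squares_list)  # container only, not the ints inside
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of n
print(list_bytes, gen_bytes)
```

The generator's size stays the same no matter how large `n` gets, because it only stores where it is in the loop.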

#### Passing through the generator

In [3]:
#create the generator object, assign to variable 
gen_object = gen_func_square(test_list)

#sequentially pass through generator object with next 
print (next(gen_object))
print (next(gen_object))
print (next(gen_object))
print (next(gen_object))
print (next(gen_object))
print (next(gen_object))

1
4
9
16
25
36


#### Stop Iteration

But what if I were to call `next` one more time, after we have passed through the entire list?

In [4]:
print (next(gen_object))

StopIteration: 

This StopIteration exception means that the generator has been exhausted and it is out of values. 
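If you are stepping through a generator by hand and would rather not crash at the end, there are two common ways to handle exhaustion. The sketch below shows both: catching `StopIteration` explicitly, and passing a default value as the second argument to `next`. The variable names `exhausted` and `fallback` are just for illustration.

```python
def gen_func_square(in_list):
    for i in in_list:
        yield i**2

gen_object = gen_func_square([1, 2])
next(gen_object)
next(gen_object)  # the generator is now out of values

# Option 1: catch the exception explicitly
try:
    next(gen_object)
    exhausted = False
except StopIteration:
    exhausted = True

# Option 2: give next() a default to return instead of raising
fallback = next(gen_object, None)
print(exhausted, fallback)
```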

Now this was a nice toy example, but manually stepping through the iteration is pretty cumbersome and doesn't really save any time. We can also pass through the generator using a for loop, which calls `next` for us behind the scenes. The nice thing about this is that the for loop knows when to stop: it handles the StopIteration exception itself, so we never see it.

#### Loop through a generator 

In [5]:
#need to recreate gen object since it was previously exhausted
gen_object = gen_func_square(test_list)

for square in gen_object: 
    print (square)

1
4
9
16
25
36
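For intuition, the for loop above behaves roughly like the hand-written `while` loop below. This is a simplified sketch of what Python does for us, not the exact implementation; the `collected` list is only there so we can inspect the result.

```python
def gen_func_square(in_list):
    for i in in_list:
        yield i**2

gen_object = gen_func_square([1, 2, 3])

collected = []
while True:
    try:
        square = next(gen_object)  # ask for the next value
    except StopIteration:
        break                      # generator exhausted: leave the loop quietly
    collected.append(square)

print(collected)
```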


## A Larger Example

Okay, that was just an introduction to generators, but to behold their true power, we need to be doing more memory intensive tasks. Let's see an example where I take the average of a vector that is 100 values long and append that average to a running list. I do this 1 million times, something that can be quite common in programming projects at scale.

In [33]:
import time 
import random
import numpy as np
from pympler import asizeof #memory profiler 

#loop through length of how_long
#calculate mean of random vector 
#append that mean to a running list 

def mean_list(how_long):    
    means = []
    for i in range(how_long):
        #create vector of 100 random numbers range=[0, 100]
        vec = np.random.uniform(0, 100, size=100)
        avg = np.mean(vec)
        means.append(avg)
    return means 

#do the same with a generator 
def mean_gen(how_long):    
    for i in range(how_long):
        vec = np.random.uniform(0, 100, size=100)
        yield np.mean(vec)

You don't need to worry too much about the code below to understand what it is doing. Basically I am running the list function defined above, timing it, and probing how much memory the returned list object takes. 

In [23]:
#here I am just timing how long it takes to run the function 
start_time = time.perf_counter() #time.clock() was removed in Python 3.8
test_avg = mean_list(1000000)
end_time = time.perf_counter()

#and the memory it uses 
byte_size = asizeof.asizeof(test_avg)
MB = byte_size/1000000

print ('process took {} seconds'.format(end_time-start_time))
print ('process used {} megabytes'.format(MB))

process took 12.05374617283951 seconds
process used 40.697464 megabytes


Wow. Ok so 12 seconds and 41 megabytes to do the 1 million iterations. That's a lot. Now let's try the generator. Note that this is the same exact test code, but I just swap out mean_list with mean_gen. 

In [28]:
#here I am just timing how long it takes to run the function 
start_time = time.perf_counter()
test_avg = mean_gen(1000000)
end_time = time.perf_counter()

#and the memory it uses 
byte_size = asizeof.asizeof(test_avg)
MB = byte_size/1000000.

print ('process took {} seconds'.format(end_time-start_time))
print ('process used {} megabytes'.format(MB))

process took 9.283950623739656e-05 seconds
process used 0 megabytes


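Both the time and memory usage are essentially negligible! Keep in mind, though, that calling `mean_gen` only creates the generator object; none of the averages have been computed yet. To actually do the work we have to pass through the generator, here with a for loop.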

In [36]:
start_time = time.perf_counter()
test_avg = mean_gen(1000000)
for i in test_avg: 
    pass
end_time = time.perf_counter()
print ('process took {} seconds'.format(end_time-start_time))

11.759528296296253


Note that this took roughly the same amount of time, but used way less memory. One drawback of not holding the entire list in memory is that we cannot index into a generator the way we do a list, like `my_list[3]` to grab the 4th element. We can still quickly get a list from the generator by calling `list()` on it, but then we lose the nice properties of a generator, since it is a list again.

In [40]:
mean_gen_toList = list(mean_gen(10000))
mean_gen_toList[5064]

50.452362059533428
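If you only need one particular position and do not want to build the whole list, one option is `itertools.islice` from the standard library. The sketch below uses the toy squaring generator from earlier. Keep in mind that this still has to step through every earlier value, and those values are consumed in the process.

```python
from itertools import islice

def gen_func_square(in_list):
    for i in in_list:
        yield i**2

gen_object = gen_func_square(range(10))

# skip the first three values, then take the next one (index 3)
fourth = next(islice(gen_object, 3, None))

# the generator has moved on: the next value is the one at index 4
following = next(gen_object)
print(fourth, following)
```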

In this way, generators are most useful when we have a memory intensive task and do not need to hold on to the full results of our processing. An example is in machine learning, when we may need to loop through our data many times to train a model. We can still keep the results at the end with logging techniques (outside the scope of this tutorial), but we don't really care about how each individual data value was processed; just the end result.
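As a small sketch of "keeping only the end result", the example below computes the overall mean of all the per-vector averages without ever storing them in a list. To keep it dependency-free, this version of `mean_gen` uses the standard library `random` module instead of NumPy, and the `vec_len` parameter is an assumption added for illustration.

```python
import random

def mean_gen(how_long, vec_len=100):
    # stdlib stand-in for the NumPy version above (an assumption for this sketch)
    for _ in range(how_long):
        vec = [random.uniform(0, 100) for _ in range(vec_len)]
        yield sum(vec) / vec_len

total = 0.0
count = 0
for avg in mean_gen(10_000):
    total += avg   # fold each value into the running total...
    count += 1     # ...and then let it go

overall_mean = total / count
print(overall_mean)
```

Only `total` and `count` survive each pass through the loop, so memory use stays flat no matter how many iterations we run.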