# PYTHON GENERATORS

## Basics:

Common 'functions' or 'subroutines' return the control flow of the application to the point from which they were called.


In [2]:
def common_function(n):
    i = 0
    while i < n:
        return i

In [3]:
c = common_function(8)
print c
cc = common_function
print type(cc)
print cc(8)

0
<type 'function'>
0


If we call it again, everything starts from scratch.

In [3]:
print cc(8)

0


## PEP 255 : Simple generators (2001)

Introduces the basic syntax and the 'yield' statement.


The idea:
----------
Provide a mechanism for a function to return a value while maintaining its current local state, so that it can be resumed later.

In [4]:
def gen(n):
    i = 0
    while i < n:
        yield i
        i+=1    

In [5]:
g = gen(8)
print g

<generator object gen at 0x7f4db82d4e60>


Let's ask it for some values

In [6]:
print g.next()
print g.next()
print g.next()

0
1
2


The local state of the generator is kept, and it is resumed on the *next* call.

In [7]:
print g.next()
print g.next()
print g.next()

3
4
5


One more time!

In [8]:
print g.next()
print g.next()
print g.next()

6
7


StopIteration: 

### OUCH!
What is happening? What are those weird 'next' and 'StopIteration'?
A step-by-step example:

In [12]:
def simple():
    print "I'm simple :P"
    yield 1
    yield 2
    yield 3

Behind the scenes, calling the generator function executes none of its body; it just returns a generator object, which conforms to the iterator protocol.
The body runs only when the *next()* method of the generator is called, and each call executes it **until** a *yield*, a *return* or *the end* of the body is reached.

In [13]:
s = simple()

In [14]:
dir(simple())

['__class__',
 '__delattr__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__iter__',
 '__name__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'next',
 'send',
 'throw']

In [15]:
# internal print is called
# first yield returned
print s.next()
print next(s) ## Follows the iterator protocol
print s.next()

I'm simple :P
1
2
3


next(*iterator*[, default]) https://docs.python.org/2/library/functions.html#next
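The optional *default* argument is handy when exhaustion is expected: instead of raising StopIteration, next() returns the default. A minimal sketch (the `countdown` name is only illustrative; it uses the built-in next() and a parenthesised print so it should run on Python 2.6+ as well as Python 3):

```python
def countdown(n):
    # Yield n, n-1, ..., 1 and then finish.
    while n > 0:
        yield n
        n -= 1

c = countdown(2)
first = next(c)            # first value produced by the generator
second = next(c)           # second value
third = next(c, "done")    # generator exhausted: the default is returned
print(first)
print(second)
print(third)
```

Without the default, the third call would raise StopIteration exactly as in the cells above.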

What is going on here ?
--------------------------
As said, the generator "freezes" its local state, which includes the local bindings, the instruction pointer and the internal stack. As the PEP says:

> enough information is saved so that the next time .next() is invoked, the function can
> proceed exactly as if the yield statement were just another external call.
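
One way to see that frozen state is that every call to the generator function gets its own independent frame. A small sketch, with a hypothetical `counter` generator, in which two generators built from the same function advance without disturbing each other:

```python
def counter(start):
    # 'value' is a local binding that survives between resumptions.
    value = start
    while True:
        yield value
        value += 1

a = counter(0)
b = counter(100)

a_values = [next(a), next(a), next(a)]   # advances only a
b_values = [next(b)]                     # b still starts from its own state

# The suspended frame is reachable from the generator object.
a_locals = a.gi_frame.f_locals["value"]
print(a_values)
print(b_values)
print(a_locals)
```

The `gi_frame` attribute is the same one listed in the dir() output above; inspecting it is meant for exploration, not something production code should rely on.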


In [16]:
print s.next()

StopIteration: 

StopIteration
---------------
So each time we call the *next* method on the generator, it gives us the next element, and if there are no more elements it raises a StopIteration, just as the iterator protocol requires.
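
This is the same protocol a *for* loop relies on. Roughly, and as a sketch rather than the interpreter's exact implementation, a loop over a generator behaves like the manual version below (`squares` is a made-up example):

```python
def squares(n):
    for i in range(n):
        yield i * i

# What we normally write.
with_for = []
for value in squares(4):
    with_for.append(value)

# Approximately what the loop does for us.
by_hand = []
iterator = iter(squares(4))
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break            # exhaustion ends the loop silently
    by_hand.append(value)

print(with_for)
print(by_hand)
```

That is why StopIteration is rarely seen in everyday code: *for*, list() and sum() all catch it for us.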

Return
-------
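A generator can also contain a *return* statement. It simply means 'Hey, I'm done!', as in any function, but then a StopIteration is raised to the caller, indicating that the generator is exhausted.
Note that return != raise StopIteration: an explicit raise is an ordinary exception that a try/except inside the generator can catch, whereas a return cannot be caught.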

In [14]:
def gen_except():
    try:
        raise StopIteration
    except StopIteration:
        yield 42

# The StopIteration exception is captured by the except clause.
print list(gen_except())

[42]
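
For comparison, a plain *return* ends the generator early and is not an exception that a try/except inside the generator could catch. A short sketch (the `until_negative` helper is hypothetical):

```python
def until_negative(numbers):
    # Yield items until the first negative one, then stop for good.
    for n in numbers:
        if n < 0:
            return       # the caller sees StopIteration on the next request
        yield n

result = list(until_negative([3, 1, -1, 7]))
print(result)
```

The trailing 7 is never produced, because the generator finished when it hit the negative number.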


Restrictions
--------------
 - ~~A yield statement is not allowed in the try clause of a try/finally construct.
    There's no guarantee the generator will ever be resumed, hence no guarantee that the finally
    block will ever get executed~~. Lifted in Python 2.5 by PEP 342, which added the .close() method.

 - A generator cannot be resumed while it is actively running
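
With *close()* available, a *finally* block around a *yield* runs even when the generator is abandoned halfway. A sketch under that assumption (Python 2.5 or later; `guarded` and the `log` list are illustrative names):

```python
def guarded(log):
    try:
        yield 1
        yield 2
    finally:
        # Runs on normal exhaustion, on an error, or when close() is called.
        log.append("cleanup")

log = []
g = guarded(log)
first = next(g)      # the generator is now suspended inside the try block
g.close()            # raises GeneratorExit at the paused yield
print(first)
print(log)
```

After close() the generator is finished, so any further request for a value raises StopIteration.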

In [18]:
def restriction1():
    yield 1
    try:
        yield 1/0
    finally:
        print "Leñe!"
        

r = restriction1()
print list(r)

Leñe!


ZeroDivisionError: integer division or modulo by zero

In [19]:
def restriction2():
    i = r2.next()
    yield i

r2 = restriction2()
print list(r2)

ValueError: generator already executing

Exception propagation
------------------------
If an exception is raised by, or passes through, a generator function, then the exception is passed on to the caller in the usual way, and subsequent attempts to resume the generator function raise StopIteration, terminating generator's useful life.


In [20]:
def zero_div():
    return 1/0

def gen_numbers():
    yield 1
    yield zero_div()
    yield 3

g = gen_numbers()
g.next()
try:
    g.next()
except ZeroDivisionError:
    pass

# the generator is now exhausted: this raises StopIteration
g.next()

    

StopIteration: 

In [18]:
def odd_numbers():
    """ Yield an infinite sequence of odd numbers """
    last = 0
    while 1:
        if last % 2:
            yield last
        last += 1 
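
Because the sequence never ends, it cannot be turned into a list directly; something has to bound it. One option is itertools.islice, sketched here with a copy of the generator above so the snippet stands alone:

```python
import itertools

def odd_numbers():
    """ Yield an infinite sequence of odd numbers """
    last = 0
    while 1:
        if last % 2:
            yield last
        last += 1

# Take only the first five values; the generator is never run past them.
first_five = list(itertools.islice(odd_numbers(), 5))
print(first_five)
```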

In [21]:
def print_hexas(generator):
    for i in xrange(18):
        print generator.next()
    print "..."

def hexa():
    i = 0
    while 1:
        yield "{:010x}".format(i)
        i += 1

h =  hexa()

print_hexas(h)

import itertools

def hexa():
    for i in itertools.count():
        yield "{:010x}".format(i)

h = ("{:010x}".format(i) for i in itertools.count())
print_hexas(h)


0000000000
0000000001
0000000002
0000000003
0000000004
0000000005
0000000006
0000000007
0000000008
0000000009
000000000a
000000000b
000000000c
000000000d
000000000e
000000000f
0000000010
0000000011
...
0000000000
0000000001
0000000002
0000000003
0000000004
0000000005
0000000006
0000000007
0000000008
0000000009
000000000a
000000000b
000000000c
000000000d
000000000e
000000000f
0000000010
0000000011
...


In [20]:
import time
from functools import wraps

def time_log(func):
    @wraps(func)
    def wrapper(*args, **kw):
        start = time.time()
        result = func(*args, **kw)
        print "TIME: ", time.time() - start
        # Hand the wrapped function's result back to the caller
        return result
    return wrapper
    

## PEP 289 : Generator expressions

The idea:
----------

Generator expressions are a high-performance, memory-efficient generalization of list comprehensions and generators.


In [21]:
print "SUM: ", sum([x*x for x in range(10)])

SUM:  285


This builds a full list of squares in memory, iterates over those values and, once the reference is no longer needed, deletes the list. The generator expression below produces the same sum without ever building the list:

In [22]:
print "SUM: ", sum(x*x for x in range(10))

SUM:  285
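
The difference shows up in the size of the objects involved. A rough sketch using sys.getsizeof, which measures only the container itself (not the integers it references), so the numbers are indicative rather than a full memory profile:

```python
import sys

n = 100000
squares_list = [x * x for x in range(n)]   # every value exists at once
squares_gen = (x * x for x in range(n))    # values are produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size > gen_size)

# Both feed sum() equally well; the generator can only be consumed once.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

The generator's size stays the same however large `n` gets, while the list grows with it.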


In [23]:
@time_log
def summation():
    return sum([x*x for x in range(100000000)])

summation()

TIME:  32.2357950211


In [24]:
@time_log
def summation_gen():
    return sum(x*x for x in range(100000000))

summation_gen()

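NameError: name 'time_log' is not defined

(The *time_log* cell had not been run in this session, so no timing was recorded for the generator version.)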

In [25]:
import dis
dis.dis(lambda: sum([x*x for x in range(100000000)]))

  2           0 LOAD_GLOBAL              0 (sum)
              3 BUILD_LIST               0
              6 LOAD_GLOBAL              1 (range)
              9 LOAD_CONST               1 (100000000)
             12 CALL_FUNCTION            1
             15 GET_ITER            
        >>   16 FOR_ITER                16 (to 35)
             19 STORE_FAST               0 (x)
             22 LOAD_FAST                0 (x)
             25 LOAD_FAST                0 (x)
             28 BINARY_MULTIPLY     
             29 LIST_APPEND              2
             32 JUMP_ABSOLUTE           16
        >>   35 CALL_FUNCTION            1
             38 RETURN_VALUE        


In [26]:
import dis
dis.dis(lambda: sum(x*x for x in range(100000000)))

  2           0 LOAD_GLOBAL              0 (sum)
              3 LOAD_CONST               1 (<code object <genexpr> at 0x7f7f792d8930, file "<ipython-input-26-3f3c232968ae>", line 2>)
              6 MAKE_FUNCTION            0
              9 LOAD_GLOBAL              1 (range)
             12 LOAD_CONST               2 (100000000)
             15 CALL_FUNCTION            1
             18 GET_ITER            
             19 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             25 RETURN_VALUE        


In [27]:
#%load_ext memory_profiler
#%load_ext line_profiler

In [28]:
#%mprun -f summation_gen summation_gen()

In [25]:
import zipfile
import re
import itertools
import datetime
import collections
import pprint

def gen_zip_open(filename):
    subs_zip = zipfile.ZipFile(filename)
    for zip_name in subs_zip.namelist():
        yield subs_zip.open(zip_name)

def gen_reader(sources):
    for source in sources:
        for item in source:
            yield item

def gen_grouper_v1(source):
    """ Group subtitle lines into [index, time, speech] blocks separated by blank lines """
    group = []
    speech = []
    count = 0
    for line in source:
        if line in ('\n', '\r\n'):
            group.append(speech)
            yield group
            count = 0
            group = []
            speech = []
            continue
        elif count == 0 or count == 1:
            group.append(line.strip())
        else:
            speech.append(line.strip())
        count += 1

def gen_grouper(source):
    """ Group subtitle lines into [index, time, speech] packs; stops when the source runs out """
    while 1:
        pack = []
        speech = []
        pack.append(source.next().strip())
        pack.append(source.next().strip())
        for line in source:
            if line in ('\n', '\r\n'):
                pack.append(speech)
                yield pack
                break
            speech.append(line.strip())

            
def gen_time_extract(time_string):
    to_datetime = lambda t: datetime.datetime.strptime(t.strip(), "%H:%M:%S,%f") 
    start_end = (to_datetime(time) for time in time_string.split("-->"))
    start = start_end.next()
    yield start
    end = start_end.next()
    yield end
    yield end - start
    
def gen_subs_mapper(lines):
    """
    1
    00:02:28,546 --> 00:02:31,344
    Estación comando, habla ST-321.

    2
    00:02:31,448 --> 00:02:32,847
    Código de aprobación azul.
    """
    for line, time, speech in lines:
        tt = gen_time_extract(time)
        yield {
            "line": int(line),
            "start": next(tt),
            "end": next(tt),
            "diff": next(tt),
            "speech": speech
        }

def stats(dicts):
    total = 0
    father = 0
    biggest_string = None
    longest = None
    phrases = collections.Counter()
    words = collections.Counter()
    for dic in dicts:
        phrases.update(dic["speech"])
        words.update(itertools.chain.from_iterable(line.split() for line in dic["speech"]))
        father += sum(1 for papa in gen_grep("padre", dic["speech"]))
        total += 1
        longest = dic if longest is None or longest["diff"] < dic["diff"] else longest
    return { "total": total, "father": father, "longest": longest, "phrases": phrases, "words": words }
        
def gen_grep(pattern, lines):
    compiled = re.compile(pattern)
    for line in lines:
        if compiled.search(line): yield line
    
        
subs_filename = './return_of_the_jedi.zip'
#
subs_open = gen_zip_open(subs_filename)
#
lines_stream = gen_reader(subs_open)
#
groups = gen_grouper(lines_stream)
#
speech_dicts = gen_subs_mapper(groups)
#
results = stats(speech_dicts)

print "Total lines: ", results["total"]
print "Longest (in time): "
pprint.pprint(results["longest"])
print "Father: ", results["father"]
print "5 Most common phrases", results["phrases"].most_common(5)
print "15 Most common words", results["words"].most_common(15)


Total lines:  964
Longest (in time): 
{'diff': datetime.timedelta(0, 6, 963000),
 'end': datetime.datetime(1900, 1, 1, 0, 10, 13, 566000),
 'line': 74,
 'speech': ['No entregar\xe9 mi adorno preferido.'],
 'start': datetime.datetime(1900, 1, 1, 0, 10, 6, 603000)}
Father:  22
5 Most common phrases [('Bien.', 4), ('-Te amo.', 2), ('Debe permitir que hable.', 2), ('\xa1No se muevan!', 2), ('\xa1Ap\xfantala a la cubierta!', 2)]
15 Most common words [('de', 150), ('que', 116), ('la', 110), ('a', 81), ('el', 72), ('No', 70), ('no', 51), ('tu', 47), ('un', 47), ('en', 45), ('lo', 39), ('te', 36), ('para', 32), ('y', 32), ('es', 31)]
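
The same pipeline shape can be tried without the zip file. The sketch below feeds a few made-up subtitle lines (not taken from the real file) through simplified stand-ins for the grouping and mapping stages; `group_blocks` and `to_records` are illustrative names, not the functions above:

```python
SAMPLE = [
    "1\n",
    "00:00:01,000 --> 00:00:02,500\n",
    "Hello.\n",
    "\n",
    "2\n",
    "00:00:03,000 --> 00:00:04,000\n",
    "Hello again,\n",
    "father.\n",
    "\n",
]

def group_blocks(lines):
    # Collect stripped lines until a blank separator, then emit the block.
    block = []
    for line in lines:
        if line.strip():
            block.append(line.strip())
        elif block:
            yield block
            block = []
    if block:            # last block when the input lacks a trailing blank line
        yield block

def to_records(blocks):
    # First line is the index, second the timing, the rest is the speech.
    for block in blocks:
        yield {"line": int(block[0]), "time": block[1], "speech": block[2:]}

# Nothing is processed until the final consumer asks for items.
records = to_records(group_blocks(iter(SAMPLE)))
speeches = [(r["line"], " ".join(r["speech"])) for r in records]
print(speeches)
```

Each stage pulls one item at a time from the stage before it, so only one subtitle block is held in memory at any moment.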


## PEP 342 : Coroutines via enhanced generators

The idea:
----------
Well, now *yield* can accept values...
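
A minimal sketch of what that means, using the *send()* method that PEP 342 added (the `running_average` name is illustrative). The value passed to send() becomes the result of the paused *yield* expression:

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand back the current average and wait for the next value.
        value = yield average
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)                 # prime it: run up to the first yield
results = [avg.send(10), avg.send(20), avg.send(30)]
print(results)
```

The initial next() call is needed because a value can only be sent to a generator that is already paused at a *yield*.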

...but coroutines are a topic big enough for a complete talk of their own... **¿?¿?**?