**About this notebook**

This notebook/presentation was prepared for the 2017 edition of http://python.g-node.org, the renowned Advanced Scientific Programming in Python summer school (Happy 10th Anniversary!!). I gratefully acknowledge the efforts of the entire Python community, whose great documentation I drew on heavily to create this notebook; a list of sources can be found at the end of the notebook. If I have missed anyone, apologies, let me know and I'll add you to the list!

Although you should be able to run the notebook straight out of the box, bear in mind that it was designed to work in conjunction with the following nbextensions:
* RISE by https://github.com/damianavila/RISE (enables the slideshow)
* Runtools by https://github.com/ipython-contrib/jupyter_contrib_nbextensions/wiki/Runtools (runs the entire notebook regardless of exceptions thrown on the way, as we will be covering recovering from errors)

The repository also contains exercises, with and without solutions, which I borrowed from last year's edition of the summer school.

I hope you enjoy it! By all means get in touch! :)

Etienne

<center><a href="https://twitter.com/@etienneroesch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-twitter fa-stack-1x fa-inverse"></i></span></a> <a href="http://etienneroes.ch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-home fa-stack-1x fa-inverse"></i></span></a> <a href="http://github.com/eroesch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-github fa-stack-1x fa-inverse"></i></span></a></center>

In [1]:
import sys
print('Python version ' + sys.version)

Python version 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]


# Iterators, generators, decorators, and context managers

Etienne B. Roesch   |   University of Reading

http://etienneroes.ch

## I am...

* an old-fashioned software engineer
* a cognitive scientist, and passionate interdisciplinarian
    * perception and experience
    * methods (EEG-fMRI, ERG, psychophysics)
    * modelling
* an increasingly bigger data person
  (soon Google Cloud Platform certified)

## Take-home message
* _Iterators_ are arcane mechanisms that support loops, and everything else;

* _Generators_ are kinds of iterators that provide a level of optimisation and interactivity;

* _Decorators_ are a mechanism to incrementally power-up existing code;

* _Context managers_ are semantically related to decorators, to manage resources properly.

## Iterators

An _iterable_ is any Python object that can be used with a _for_ loop; an _iterator_ is the object that actually produces the values, one at a time.

Iterators implement the _iterator protocol_, which describes two special methods, \_\_iter\_\_ and \_\_next\_\_, used to step through a collection of objects. In Python 3, you find them everywhere, e.g. files, i/o.

<font size="4">https://docs.python.org/3.6/whatsnew/2.2.html#pep-234-iterators</font>

In [2]:
import numpy as np
nums = np.arange(2)    # ndarray contains [0, 1]

In [3]:
for x in nums:
    print(x, end=" ")

0 1 

In [4]:
iter(nums)             # ndarray is an iterable

<iterator at 0x1157caf28>

In [5]:
it = iter(nums)
it.__next__()          # One way to iterate

0

In [6]:
next(it)    # Another way to iterate

1

In [7]:
next(it)    # Raises StopIteration exception

StopIteration: 

<font size="4">http://www.scipy-lectures.org/intro/language/exceptions.html#exceptions</font>
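Under the hood, a _for_ loop does roughly what the cells above did by hand: it calls iter() once, then next() repeatedly until StopIteration is raised. A minimal sketch of that equivalence; the helper name `manual_for` is made up for illustration and is not part of Python:

```python
def manual_for(iterable, action):
    """Roughly what `for x in iterable: action(x)` does behind the scenes."""
    it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            x = next(it)         # fetch the next item
        except StopIteration:    # the iterator is exhausted: leave the loop
            break
        action(x)

collected = []
manual_for([0, 1], collected.append)
print(collected)
```

This is why the StopIteration exception never surfaces in an ordinary loop: the loop machinery catches it for you.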

Leonardo Filius Bonacci (1175-1245), aka Leonardo Fibonacci, defined the _recurrence relation_ that now bears his name and fuels conspiracy theories.

<center>$F_{n} = F_{n-1} + F_{n-2}$ given $F_{0} = 0, F_{1} = 1$</center>

<center>![noimg](picts/FibonacciSpiral.png)</center>

In [8]:
class Fib:
    '''Iterator to calculate the Fibonacci series'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):    # defines initial conditions
        self.a = 0
        self.b = 1
        return self

    def __next__(self):    # defines behaviour for next()
        fib = self.a
        if fib > self.max:
            raise StopIteration # is caught when in _for_ loop
        temp_b = self.a + self.b
        self.a = self.b
        self.b = temp_b
        return fib

# 33rd degree in Freemason Antient & Accepted Scottish Rite
for i in Fib(33):
    print(i, end=' ')   # literally calls the __next__() method

0 1 1 2 3 5 8 13 21 

## Generators

Generators (_generator-iterators_, as they are called) are a mechanism to simplify this process.

Python provides the _yield_ keyword to define generators, which takes care of \_\_iter\_\_ and \_\_next\_\_ for you.

<font size="4">https://www.python.org/dev/peps/pep-0255/</font>

In [9]:
def fib_without_generator_iterator(max):
    numbers = []          # Needs to return a list of values
    a, b = 0, 1           # a = 0  and  b = 1
    while a < max:
        numbers.append(a)
        a, b = b, a + b   # Evaluate right-hand side and assign
    return numbers        # Returns full list of numbers

for i in fib_without_generator_iterator(33):
    print(i, end=" ")     # iterates through array of values

0 1 1 2 3 5 8 13 21 

In real-life problems, this way of doing things is problematic because it forces us to compute all the numbers up front and to store them all in memory at once.

<pre><code>yield expression_list</code></pre>

_yield_ does something similar to _return_:
* _return_ gives back control to the caller function, and returns some content;
* _yield_ freezes execution temporarily, stores current context, and returns some content to .\_\_next\_\_()'s caller;

_yield_ saves local state and variables, instruction pointer and internal evaluation stack; i.e. enough information so that .\_\_next\_\_() behaves like an external call.

In [10]:
def fib_with_yield(max):
    '''fib function using yield'''
    a, b = 0, 1          # a = 0  and  b = 1
    while a < max:
        yield a          # freezes execution, returns current a
        a, b = b, a + b  # at once: a = b  and  b = old a + b

for i in fib_with_yield(33):
    print(i, end=" ")

0 1 1 2 3 5 8 13 21 

In [11]:
my_masonic_secret = fib_with_yield(33)
my_masonic_secret

<generator object fib_with_yield at 0x11583ebf8>

In [12]:
next(my_masonic_secret)

0

In [13]:
next(my_masonic_secret)

1

In [14]:
next(my_masonic_secret)

1

In [15]:
next(my_masonic_secret)

2

... and so on.
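To make the "frozen execution" idea tangible, the standard library's inspect.getgeneratorstate() reports where a generator currently stands. A small sketch, assuming a toy `countdown` generator invented for this illustration:

```python
import inspect

def countdown(n):
    '''Toy generator: yields n, n-1, ..., 1'''
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]      # body has not started yet
first = next(gen)                              # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))  # frozen at the yield
second = next(gen)
leftover = list(gen)                           # runs the body to its end
states.append(inspect.getgeneratorstate(gen))  # nothing left to resume
print(first, second, leftover, states)
```

Between two calls to next(), the local variable n simply sits in the suspended frame, waiting to be resumed.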

Python's _list comprehension_, with [..], computes everything at once and can take a lot of memory.

In [16]:
squares = [i**2 for i in range(10)]
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator expressions, with (..), are computed _on demand_.

In [17]:
squares = (i**2 for i in range(10))
squares

<generator object <genexpr> at 0x11583ee08>

<font size="4">https://www.python.org/dev/peps/pep-0289/</font>
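A rough way to see the difference is to compare the size of the two objects. This is only a sketch: sys.getsizeof() measures the container itself, not the integers it refers to, and the exact numbers depend on your Python build.

```python
import sys

n = 100000
squares_list = [i**2 for i in range(n)]   # all values computed and stored
squares_gen = (i**2 for i in range(n))    # nothing computed yet

size_list = sys.getsizeof(squares_list)   # grows with n
size_gen = sys.getsizeof(squares_gen)     # small, independent of n
print(size_list, size_gen)

total = sum(squares_gen)                  # values are produced one by one
remaining = list(squares_gen)             # a generator can be consumed once
print(total, remaining)
```

Note the trade-off: once consumed, the generator expression is exhausted, whereas the list can be traversed again.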

On-demand calculation is important for the streamed processing of large amounts of data, where the size of the data is uncertain, values of parameters are changing, etc., or when the processing steps might take a long time, yield errors or enter infinite loops.

Generators are also an easier way to handle callbacks, and can be used to simulate concurrency.

<font size="4">https://www.python.org/dev/peps/pep-0342/</font>

<font size="5">Bash pipeline to count the number of characters per line, omitting whitespace, in a given file:</font>

In [18]:
!sed 's/ˆ//g' ./custom.css | tr -d ' ' | awk '{ printf "%i ", length($0); }'
# Noticed the magic "!"? Type %lsmagic in a cell to learn more
# https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/

5 41 15 14 21 1 28 11 1 

<font size="5">or equivalent* processing pipeline using native generators (alt. use http://toolz.readthedocs.io/):</font>

In [19]:
my_custom_css = open("./custom.css")
line_stripped = (line.replace(" ", "") for line in my_custom_css)
size_line = (len(line) for line in line_stripped)
for i in size_line:
    print(i, end=" ")

6 42 16 15 22 2 29 12 2 

<font size="4">*: not exactly the same because _awk_'s length($0) does not count the end-of-line character, whereas Python's len(line) does.</font>

<font size="6">In your research, you may only need to analyse one single .csv file.

More likely, you will be faced with increasingly bigger and more complex data, leaning towards so-called "big data", whatever that actually is.</font>

<font size="6">This data won't fit in your workspace, may be "live" and constantly changing, and will require real-time or batch analysis methods; e.g., you won't be able to store raw data, but will have to filter it, compute "metrics", like averages, standard deviations, to then make a decision about what to do with the data.</font>

<font size="6">You'll enter the realm of big data techniques, which will attempt to decouple data handling from analysis, and pipeline steps of preprocessing to ease analyses proper.</font>

<font size="5"> Keywords: dataflow, processing pipelines and stream processors, MapReduce, lambda & kappa architectures, Dremel; e.g., Hadoop.</font>
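As a toy illustration of "filter it, then compute metrics", here is a sketch of a generator pipeline that never holds the whole data set in memory. The names `readings` and `running_stats` and the sample values are invented for this example; the update rule is Welford's online algorithm, one common choice for a running mean and standard deviation.

```python
import math

def readings():
    '''Stand-in for a live data source; -1 marks an invalid sample'''
    for x in [2, 4, -1, 4, 4, 5, 5, 7, 9]:
        yield x

def running_stats(stream):
    '''Yield (count, mean, population std) after each incoming value'''
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count          # update the mean incrementally
        m2 += delta * (x - mean)       # running sum of squared deviations
        yield count, mean, math.sqrt(m2 / count)

valid = (x for x in readings() if x >= 0)   # filtering step
for count, mean, std in running_stats(valid):
    pass                                    # only the latest metrics are kept
print(count, mean, std)
```

Each stage pulls one value at a time from the stage before it, so the same code would work unchanged on a source that never ends.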

You can simulate concurrency by interacting with instantiated (currently alive) generator functions.

In [20]:
def receiver():
    while True:
        item = yield
        print("I'm currently processing:", item)
        
recv = receiver()   # Instantiate function
next(recv)          # Starts function, alt. recv.send(None)
recv.send("Hello")  # Use Python's .send() to communicate..
recv.send("World")  # ..with the instantiated object

I'm currently processing: Hello
I'm currently processing: World


In [21]:
recv.close()        # Obviously, clean up after yourself
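Communication goes both ways: .send() also returns whatever the generator yields next. A minimal sketch, assuming a made-up `threshold_detector` that answers each number it receives with a decision:

```python
def threshold_detector(limit):
    '''Receives numbers; answers whether each one exceeds limit'''
    verdict = None
    while True:
        value = yield verdict      # hand back last verdict, wait for input
        verdict = value > limit

detector = threshold_detector(10)
primed = next(detector)            # advance to the first yield
low = detector.send(3)             # resumes the body, returns next verdict
high = detector.send(42)
detector.close()
print(primed, low, high)
```

The first next() is needed because a freshly created generator has not reached a _yield_ yet, so there is nothing to receive the value.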

### Generator-iterator cheatsheet

<font size="5"><pre><code>
def my_generator():
    ...
    item = yield
    ...
    value = do_something(item)
    ...
    yield value   # return value
   
   
gen = my_generator()

next(gen)                 # Starts generator and advances to _yield_
value = gen.send(item)    # Sends and receives stuff
gen.close()               # Terminates
gen.throw(exc, val, tb)   # Raises exception
result = yield from gen   # Delegates to gen; result is its return value
</code></pre></font>

<font size="4">https://www.python.org/dev/peps/pep-0342/</font>
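The last line of the cheatsheet deserves a sketch of its own. With _yield from_ (Python 3.3+), an outer generator hands over control to an inner one; values sent to the outer generator go straight to the inner one, and the inner generator's _return_ value becomes the result of the expression. The names `collector` and `batches` below are hypothetical.

```python
def collector():
    '''Gathers items until it receives None, then returns them'''
    received = []
    while True:
        item = yield
        if item is None:
            return received        # becomes the value of `yield from`
        received.append(item)

def batches(results):
    '''Delegates to a fresh collector for each batch'''
    while True:
        batch = yield from collector()
        results.append(batch)

results = []
gen = batches(results)
next(gen)                          # start: advances into collector's yield
gen.send("a")
gen.send("b")
gen.send(None)                     # ends the first batch
gen.send("c")
gen.send(None)                     # ends the second batch
gen.close()
print(results)
```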

## Decorators

Functions are objects themselves.

In [22]:
def shout(word="hello world"):
    return word.capitalize() + "!"
print(shout())

Hello world!


In [23]:
yell = shout
print(yell())

Hello world!


In [24]:
del shout
try:     # this is how you catch an Exception
    print(shout())      # This won't work
except NameError as e:
    print(e)  
print(yell())           # But this still works

name 'shout' is not defined
Hello world!


Therefore, functions can be defined inside other functions.

In [25]:
def languaging():
    def whisper(word="Hello world"):
        return word.lower() + "..."
    print(whisper())
languaging()

hello world...


In [26]:
try:
    print(whisper())      # is outside the scope!
except NameError as e:
    print(e)

name 'whisper' is not defined


In [27]:
def languaging(type="shout"):
    def shout(word="hello world"):
        return word.capitalize() + "!"
    
    def whisper(word="hello world"):
        return word.lower() + "..."
    
    if type == "shout":
        return shout
    else:
        return whisper

speak = languaging()
print(speak)

<function languaging.<locals>.shout at 0x115861488>


In [28]:
print(speak())

Hello world!


In [29]:
print(languaging("whisper")())

hello world...


If functions, _as objects_, can be returned, they can also be arguments!

In [30]:
def my_good_old_analysis():
    print("Ah, the way we've always done analysis.")
my_good_old_analysis()

Ah, the way we've always done analysis.


In [31]:
def deprecated(my_function):
    def wrapper():
        print("!!! You should not be using this function.")
        my_function()
        print("!!! Please, don't do it.")
    return wrapper
my_good_old_analysis = deprecated(my_good_old_analysis)
my_good_old_analysis()

!!! You should not be using this function.
Ah, the way we've always done analysis.
!!! Please, don't do it.


And this is *exactly* what _decorators_ do!

In [32]:
def deprecated(my_function):
    def wrapper():
        print("!!! You should not be using this function.")
        my_function()
        print("!!! Please, don't do it")
    return wrapper

@deprecated  # <-- ain't this a pretty decorator?
def my_even_older_analysis():
    print("Aaaaah, please kill me.")

my_even_older_analysis()

!!! You should not be using this function.
Aaaaah, please kill me.
!!! Please, don't do it
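The wrapper above takes no arguments and replaces the decorated function's name and docstring with its own. A common refinement, sketched here with a made-up `add` function, is to forward \*args and \*\*kwargs and to apply functools.wraps from the standard library:

```python
import functools

def deprecated(my_function):
    @functools.wraps(my_function)             # keep __name__ and __doc__
    def wrapper(*args, **kwargs):             # accept any arguments
        print("!!! You should not be using this function.")
        return my_function(*args, **kwargs)   # forward arguments and result
    return wrapper

@deprecated
def add(a, b=0):
    '''Add two numbers.'''
    return a + b

result = add(1, b=2)
print(result, add.__name__, add.__doc__)
```

Without functools.wraps, add.\_\_name\_\_ would read "wrapper", which makes debugging and documentation tools rather confusing.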


<font size="5">Some built-in Python decorators ease _abstraction_ (exposing only relevant information) and _encapsulation_ (combining data and functions in a usable unit). See: https://docs.python.org/3.6/howto/descriptor.html</font>

In [33]:
class My_class:
    def __init__(self,x):
        self.x = x

    @property                # Built-in Python decorator
    def x(self):             # x is publicly accessible 
        return self.__x      # __x is "private" (name-mangled)

    @x.setter                # ".setter" decorator, provided by property
    def x(self, x):
        if x < 0:            # Implementation is hidden from end-users
            self.__x = 0     # __x actually stores the data 
        elif x > 1000:       # "__" is a warning to end-users
            self.__x = 1000  # that things under the hood might
        else:                # change in future releases
            self.__x = x
            
my_instance = My_class(10000)
print( my_instance.x )

1000


In [34]:
def bread(my_function):
    def wrapper():
        print(r" /''''''\ ")   # raw string: "\ " is not a valid escape
        my_function()
        print(r" \______/ ")
    return wrapper

def ingredients(my_function):
    def wrapper():
        print("@Tomatoes@")
        my_function()
        print("~~Salad~~")
    return wrapper

In [35]:
@bread         # Order matters
@ingredients   #
def sandwich(food="---Ham---"):
    print(food)
sandwich()

 /''''''\ 
@Tomatoes@
---Ham---
~~Salad~~
 \______/ 
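Why does order matter? Stacked decorators are applied bottom-up: the one closest to the function wraps it first, and the one on top wraps the result. A sketch with two made-up decorators, `outer` and `inner`, that record their calls in a list instead of printing:

```python
calls = []

def outer(func):
    def wrapper():
        calls.append("outer in")
        func()
        calls.append("outer out")
    return wrapper

def inner(func):
    def wrapper():
        calls.append("inner in")
        func()
        calls.append("inner out")
    return wrapper

@outer
@inner
def filling():
    calls.append("ham")

filling()
stacked = list(calls)

def plain_filling():
    calls.append("ham")

del calls[:]                              # reset the log
outer(inner(plain_filling))()             # the same thing, written by hand
by_hand = list(calls)
print(stacked == by_hand, stacked)
```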


## Context managers

Context managers are semantically related to decorators.

They aim primarily to help you manage resources properly, i.e., groom your memory, avoid consumer bottlenecks, clean up after yourself, keep (database) connections alive, and other sensible things.

In [36]:
files = []
for x in range(100000):
    files.append(open("how_to_mess_up_my_memory.txt", "w"))
#.. at this point of the notebook, I have used up all my file
# descriptors and won't be able to open any more files

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-36-bfc1ac5e0354>", line 3, in <module>
    files.append(open("how_to_mess_up_my_memory.txt", "w"))
OSError: [Errno 24] Too many open files: 'how_to_mess_up_my_memory.txt'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 1821, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'OSError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//anaconda/lib/python3.6/site-packages/IPython/core/ultratb.py", line 1132, in get_records
  File "//anaconda/lib/python3.6/site-packages/IPython/core/ultratb.py", line 313, in

OSError: [Errno 24] Too many open files: 'how_to_mess_up_my_memory.txt'

<font size="5">In real life, you are dealing with finite resources. When you allocate some resource to a particular task, you need to make sure you use only what you need, and when you are done, you release it for other tasks or people to use.</font>

In [37]:
print(my_custom_css)       # Remember me? (see Section Generators)
if not my_custom_css.closed:
    print("Clean up, or you'll mess up your memory!")

<_io.TextIOWrapper name='./custom.css' mode='r' encoding='UTF-8'>
Clean up, or you'll mess up your memory!


In [38]:
my_custom_css.close()     # Always clean up after yourself!
del my_custom_css
my_custom_css

NameError: name 'my_custom_css' is not defined

That's primarily what _context managers_ do for you.

In [39]:
# First, I need to clean up the mess I created by opening
# 100K files, otherwise I won't be able to open files
files = []

#for name in dir():
#    if not name.startswith('files'):
#        del globals()[name]

In [40]:
with open("./custom.css") as my_custom_css:
    for line in my_custom_css:
        print(len(line), end=" ")

7 49 19 18 25 2 30 13 2 

In [41]:
if not my_custom_css.closed:
    print("Clean up, or you'll mess up your memory!")
else:
    print("It's already closed! Ain't that wonderful?")

It's already closed! Ain't that wonderful?


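That's all there is to it: the _with_.._as_ statement binds a resource to a variable for the duration of a block, and releases the resource when the block ends (the name itself lives on, as the cell above shows).

On entering and leaving the block, it automatically calls the "management" methods \_\_enter\_\_ and \_\_exit\_\_ for you.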

You'll find context managers for files, locks, threads, database connections, and you can implement your own.
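The important guarantee is that the clean-up code runs even when something goes wrong inside the block. A minimal sketch, using a toy `Resource` class invented for this illustration, that logs when it is entered and exited:

```python
class Resource:
    '''Toy context manager that records what happens to it'''
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                      # bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")       # runs even if the block raised
        return False                     # False: let the exception through

res = Resource()
try:
    with res as r:
        r.events.append("work")
        raise ValueError("something went wrong")
except ValueError as e:
    res.events.append("caught: " + str(e))
print(res.events)
```

Returning a true value from \_\_exit\_\_ would swallow the exception instead, which is rarely what you want.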

In [42]:
class File():
    def __init__(self, filename, mode):
        self.filename = filename
        self.mode = mode

    def __enter__(self):
        self.open_file = open(self.filename, self.mode)
        return self.open_file

    def __exit__(self, *args):
        self.open_file.close()

files = []
for _ in range(100000):
    with File('that_shouldnt_mess_up_my_memory.txt', 'w') as myfile:
        files.append(myfile)
len(files)

100000

In [43]:
for i in range(len(files)):
    if not files[i].closed:
        print("Arrg, files[%i] is not closed!" % i)
        # Hopefully, there is no output to this cell!! :)
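Everything covered so far comes together in contextlib.contextmanager from the standard library: a _decorator_ that turns a _generator_ into a _context manager_. A sketch of the File class above rewritten that way; the name `opened` and the temporary file are assumptions made for this example.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def opened(filename, mode):
    f = open(filename, mode)     # runs when entering the with block
    try:
        yield f                  # the value bound by `as`
    finally:
        f.close()                # runs when leaving the block, even on error

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with opened(path, "w") as handle:
    handle.write("hello")
print(handle.closed)

with opened(path, "r") as handle_again:
    content = handle_again.read()
print(content)
```

The code before the _yield_ plays the role of \_\_enter\_\_, and the code in the _finally_ clause plays the role of \_\_exit\_\_.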

## Take-home message
* _Iterators_ are arcane mechanisms that support loops, and everything else;

* _Generators_ are kinds of iterators that provide a level of optimisation and interactivity;

* _Decorators_ are a mechanism to incrementally power-up existing code;

* _Context managers_ are semantically related to decorators, to manage resources properly.

## Grateful acknowledgements and sources of inspiration

* The Python Documentation: https://docs.python.org/
* Generators: http://intermediatepythonista.com/python-generators
* Decorators: http://sametmax.com/ for some examples
* Context managers: https://jeffknupp.com/blog/2016/03/07/python-with-context-managers/
* Zbigniew for last year's exercises!

<center><font size="9">Thanks for your attention!</font></center>

<center><a href="https://twitter.com/@etienneroesch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-twitter fa-stack-1x fa-inverse"></i></span></a> <a href="http://etienneroes.ch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-home fa-stack-1x fa-inverse"></i></span></a> <a href="http://github.com/eroesch"><span class="fa-stack fa-lg">
<i class="fa fa-circle fa-stack-2x"></i>
<i class="fa fa-github fa-stack-1x fa-inverse"></i></span></a></center>