<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
     Advanced Python
</h1>
<h2 style="color: white; position: absolute; top:36%; left:10%;">
    Iterators, Generators, Context Managers, and Decorators
</h2>


<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:58%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:63%; left:10%;">
    Data Scientist
</h3>
</div>

# Decorator Factories

A common pattern is to use *decorator factories* rather than decorators directly.  This allows us to parameterize the decoration of a function in some manner. To accept these parameters, we wrap our decorator in another function that takes them and returns the decorator.  Let's write a decorator factory that will write the output of a function to a file.

In [1]:
from functools import wraps
from datetime import datetime

def log_results(filename):
    def decorator(func):
        @wraps(func)
        def inner(*args, **kws):
            result = func(*args, **kws)
            with open(filename, 'a') as fh:
                now = datetime.utcnow().isoformat()
                name = func.__name__
                fh.write(f"{now}\t{name}\t{args}\t{kws}\t{repr(result)}\n")
            return result
        return inner
    return decorator

logfile = "ine-lesson.log"

In [2]:
@log_results(logfile)
def myfunc(a, b, c=None):
    "Return a formatted display of arguments"
    outstr = f"a={a}; b={b}; c={c}"
    return outstr

In [3]:
# Make sure the docstring of the decorated function remains
help(myfunc)

Help on function myfunc in module __main__:

myfunc(a, b, c=None)
    Return a formatted display of arguments



In [4]:
myfunc(3, 6, "Flowerpot")

'a=3; b=6; c=Flowerpot'

In [5]:
myfunc(4, 5, "Gardenhose")

'a=4; b=5; c=Gardenhose'

We might use the decorator with another function. In this case, it logs to the same file, but we could equally pass a different filename as the parameter.

In [6]:
@log_results(logfile)
def hypotenuse(a, b):
    "Calculate the hypotenuse of a right triangle"
    from math import sqrt
    c = sqrt(a**2 + b**2)
    return c

In [7]:
hypotenuse(3, 4), hypotenuse(11, 15)

(5.0, 18.601075237738275)

In [8]:
# Let's read the log
with open(logfile) as fi:
    print(fi.read())

2021-03-04T23:08:35.795322	myfunc	(3, 6, 'Flowerpot')	{}	'a=3; b=6; c=Flowerpot'
2021-03-04T23:08:42.153743	myfunc	(4, 5, 'Gardenhose')	{}	'a=4; b=5; c=Gardenhose'
2021-03-04T23:09:30.016644	hypotenuse	(3, 4)	{}	5.0
2021-03-04T23:09:30.017101	hypotenuse	(11, 15)	{}	18.601075237738275
2021-03-04T23:36:25.773006	myfunc	(3, 6, 'Flowerpot')	{}	'a=3; b=6; c=Flowerpot'
2021-03-04T23:36:26.414269	myfunc	(4, 5, 'Gardenhose')	{}	'a=4; b=5; c=Gardenhose'
2021-03-04T23:36:27.895755	hypotenuse	(3, 4)	{}	5.0
2021-03-04T23:36:27.896583	hypotenuse	(11, 15)	{}	18.601075237738275



When we use a function-based approach, we need to nest several levels.  There is the factory function, inside that is the (parameterized) decorator, and inside that is the wrapper function that replaces the decorated function.
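The nesting becomes easier to follow when the factory is applied by hand instead of with the `@` syntax. The following is a minimal sketch, not the lesson's own code: `log_calls`, `double`, and the temporary log path are hypothetical names chosen for illustration, and the log line is simplified.

```python
import os
import tempfile
from functools import wraps

def log_calls(filename):            # level 1: the factory, takes the parameter
    def decorator(func):            # level 2: the decorator, takes the function
        @wraps(func)
        def inner(*args, **kws):    # level 3: the wrapper, takes the call arguments
            result = func(*args, **kws)
            with open(filename, "a") as fh:
                fh.write(f"{func.__name__}\t{result!r}\n")
            return result
        return inner
    return decorator

def double(x):
    "Return twice the argument"
    return 2 * x

# Equivalent to writing @log_calls(path) above the definition of double
path = os.path.join(tempfile.mkdtemp(), "sketch.log")   # hypothetical log location
decorator = log_calls(path)     # calling the factory returns a decorator
double = decorator(double)      # calling the decorator returns the wrapper

value = double(21)
with open(path) as fh:
    logged = fh.read()
```

Each level runs at a different time: the factory when the parameter is supplied, the decorator when the function is defined, and the wrapper on every call.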

# Decorator Classes

You can use a class to define a decorator factory.  This class will need to define the `__call__()` method, and may define any supporting methods that are useful.  Since it is a decorator factory, presumably there should be an `__init__()` method to process the parameterization.  The point to understand is that calling the class produces an instance, and that instance acts as a decorator (and hence needs to be made callable as something that transforms a function in some manner).
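Before the fuller example, here is a stripped-down sketch of those two steps performed separately. The class `prefix_with` and the function `greet` are hypothetical and exist only to show when `__init__()` and `__call__()` run.

```python
from functools import wraps

class prefix_with:                      # hypothetical name, for illustration only
    def __init__(self, prefix):
        # Runs when the class is called with the parameter
        self.prefix = prefix

    def __call__(self, func):
        # Runs when the instance is applied to the function
        @wraps(func)
        def inner(*args, **kws):
            return f"{self.prefix}{func(*args, **kws)}"
        return inner

instance = prefix_with(">> ")   # step 1: calling the class produces an instance

def greet(name):
    "Return a greeting"
    return f"hello {name}"

greet = instance(greet)         # step 2: the instance transforms the function
message = greet("world")
```

Writing `@prefix_with(">> ")` above the definition of `greet` performs both steps in one line.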

In [9]:
from collections import defaultdict

class logfile(object):
    # Class-level attribute
    _logged_funcs = defaultdict(list)
    
    def __init__(self, filename):
        # Keep track of the name of the log file used
        self._logfile = filename
    
    @classmethod
    def _note_func(cls, logfile, func):
        cls._logged_funcs[logfile].append(func)
    
    def __call__(self, func):
        self._note_func(self._logfile, func)
        
        @wraps(func)
        def inner(*args, **kws):
            result = func(*args, **kws)
            now = datetime.utcnow().isoformat()
            name = func.__name__
            with open(self._logfile, 'a') as fh:
                fh.write(f"{now}\t{name}\t{args}\t{kws}\t{repr(result)}\n")
            return result
        return inner
    
    @classmethod
    def get_registry(cls):
        return dict(cls._logged_funcs)

In [10]:
@logfile('ine-lesson-en.log')
def g():
    "A function that says hello"
    return 'hello from function g'
g()

'hello from function g'

In [11]:
help(g)

Help on function g in module __main__:

g()
    A function that says hello



In [12]:
logfile.get_registry()

{'ine-lesson-en.log': [<function __main__.g()>]}

In [13]:
@logfile('ine-lesson-en.log')
def h():
    return "bye from function h"
h()

'bye from function h'

In [14]:
@logfile('ine-lesson-es.log')
def i():
    return "Hola de la función i"
i()

'Hola de la función i'

In [15]:
@logfile('ine-lesson-es.log')
def j():
    return "Adiós de la función j"
j()

'Adiós de la función j'

In [16]:
g(), h(), i(), j()

('hello from function g',
 'bye from function h',
 'Hola de la función i',
 'Adiós de la función j')

In [17]:
logfile.get_registry()

{'ine-lesson-en.log': [<function __main__.g()>, <function __main__.h()>],
 'ine-lesson-es.log': [<function __main__.i()>, <function __main__.j()>]}

In [18]:
!cat ine-lesson*.log

2021-03-04T23:17:19.002112	g	()	{}	'hello from function g'
2021-03-04T23:19:00.472819	h	()	{}	'bye from function h'
2021-03-04T23:19:49.045045	g	()	{}	'hello from function g'
2021-03-04T23:19:49.050641	h	()	{}	'bye from function h'
2021-03-04T23:36:35.478895	g	()	{}	'hello from function g'
2021-03-04T23:36:37.447330	h	()	{}	'bye from function h'
2021-03-04T23:36:40.811252	g	()	{}	'hello from function g'
2021-03-04T23:36:40.812397	h	()	{}	'bye from function h'
2021-03-04T23:19:15.430406	i	()	{}	'Hola de la función i'
2021-03-04T23:19:20.179805	j	()	{}	'Adiós de la función j'
2021-03-04T23:19:49.051021	i	()	{}	'Hola de la función i'
2021-03-04T23:19:49.051341	j	()	{}	'Adiós de la función j'
2021-03-04T23:36:38.566427	i	()	{}	'Hola de la función i'
2021-03-04T23:36:39.416805	j	()	{}	'Adiós de la función j'
2021-03-04T23:36:40.812789	i	()	{}	'Hola de la función i'
2021-03-04T23:36:40.812876	j	()	{}	'Adiós de la función j'
2021-03-04T23:08:35.795322	myfunc	(3, 6, 'Flowerpot')	{}	'a=3; b=6; 

# Memoization

Suppose we have a computationally intensive function, `f()`, that calculates some result (in this case, a number).
We have to call this function many times, but do not want to wait forever each time it recalculates a result.

In [19]:
import time

def f(a, b):
    # <expensive number crunching here>
    out = a + b
    time.sleep(1.5)
    return out

In [20]:
%timeit f(3, 5)

1.5 s ± 379 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


Assuming our function is *pure*, it would be nice to cache the results of our previous calls. This way, when `f()` is called again with the same arguments, it will simply return the cached result instead of recalculating the answer.

This sort of caching is called *memoization*.  Let's define a decorator, implemented as a class, that will memoize any function that we decorate. Note that there is a faster, better tested, and more flexible version of this particular decorator available as `functools.lru_cache` in the standard library.  As of Python 3.9, there is also `functools.cache` (an unbounded variant, equivalent to `lru_cache(maxsize=None)`).

In [21]:
class Memoizer(object):
    def __init__(self, func):
        self.cache = {}
        self.func = func
        
    def __call__(self, *args, **kwargs):
        # Build the keyword part of the key from a sorted tuple of items, so
        # that the order in which keyword arguments are passed does not matter.
        # A sorted tuple is much smaller in memory than a frozenset, even
        # though a frozenset is slightly faster to construct.
        # We key on repr() strings because args or kwargs may not be hashable.
        # This assumes repr() gives distinct strings for distinct values.
        key = (repr(args), repr(tuple(sorted(kwargs.items()))))
        if key in self.cache:
            return self.cache[key]
        self.cache[key] = self.func(*args, **kwargs)
        return self.cache[key]
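The choice of `repr()` strings as keys matters once arguments are unhashable. The sketch below uses a condensed, hypothetical copy of the class (`ReprMemoizer`) so that it runs on its own. It contrasts that copy with `functools.lru_cache`, which hashes the arguments themselves.

```python
from functools import lru_cache

class ReprMemoizer:                     # hypothetical, condensed copy of the class above
    def __init__(self, func):
        self.cache = {}
        self.func = func

    def __call__(self, *args, **kwargs):
        key = (repr(args), repr(tuple(sorted(kwargs.items()))))
        if key not in self.cache:
            self.cache[key] = self.func(*args, **kwargs)
        return self.cache[key]

calls = []

@ReprMemoizer
def total(values, scale=1):
    calls.append(values)                # record each real computation
    return sum(values) * scale

first = total([1, 2, 3], scale=2)       # a list is unhashable, but its repr is a string
second = total([1, 2, 3], scale=2)      # same repr, so this is served from the cache

@lru_cache(maxsize=None)
def hashed_total(values):
    return sum(values)

try:
    hashed_total([1, 2, 3])
    lru_error = None
except TypeError as err:                # lru_cache needs hashable arguments
    lru_error = err
```

One consequence of this keying is that `total([1, 2, 3], 2)` and `total([1, 2, 3], scale=2)` are stored under different keys, even though they compute the same thing.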

In [22]:
import sys

@Memoizer
def f(a, b, mod=sys.maxsize):
    # <expensive number crunching here>
    out = (a + b) % mod
    time.sleep(1.5)
    return out

In [23]:
%time f(3, 5)

CPU times: user 0 ns, sys: 2.01 ms, total: 2.01 ms
Wall time: 1.5 s


8

In [24]:
%time f(6, 7, mod=37)

CPU times: user 0 ns, sys: 3.02 ms, total: 3.02 ms
Wall time: 1.5 s


13

In [25]:
%time f(3, 5)

CPU times: user 53 µs, sys: 11 µs, total: 64 µs
Wall time: 78.7 µs


8

In [26]:
%time f(6, 7, mod=37)

CPU times: user 68 µs, sys: 0 ns, total: 68 µs
Wall time: 84.2 µs


13

In [27]:
%time f(6, 7, mod=111)

CPU times: user 3.76 ms, sys: 0 ns, total: 3.76 ms
Wall time: 1.5 s


13

In [28]:
f.cache

{('(3, 5)', '()'): 8,
 ('(6, 7)', "(('mod', 37),)"): 13,
 ('(6, 7)', "(('mod', 111),)"): 13}
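For comparison, here is a minimal sketch of the same idea using the standard-library `functools.lru_cache` mentioned earlier. The function `add_mod` and the counter `call_count` are hypothetical. The counter stands in for the 1.5 second sleep so that the effect of the cache can be checked without timing anything.

```python
from functools import lru_cache

call_count = 0

@lru_cache(maxsize=None)        # maxsize=None means the cache is unbounded
def add_mod(a, b, mod=2**31 - 1):
    global call_count
    call_count += 1             # stands in for the expensive number crunching
    return (a + b) % mod

results = [add_mod(3, 5), add_mod(6, 7, mod=37), add_mod(3, 5), add_mod(6, 7, mod=37)]
info = add_mod.cache_info()     # named tuple: hits, misses, maxsize, currsize
```

After these four calls the function body has run only twice. `cache_info()` reports the hits and misses, and `add_mod.cache_clear()` empties the cache.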