# Lesson 7: Advanced stuff, decorators, \*args, \**kwargs, list comprehensions, generators, generator expressions and the itertools module.

# Chapters:
Chapter 10: Decorators (and args/kwargs) <br>
Chapter 11: Advanced Iterations (generators, comprehensions) <br>
Itertools module <br>
Author: Jurre Hageman <br>

## Decorators

Decorators are an advanced topic, but many modules and frameworks (like Flask) make use of them, so a basic understanding of decorators is important. <br>
A decorator is a function that takes another function and extends its behaviour without changing the code of that function. Python supports the use of decorators with special syntactic sugar that simplifies their use. Let's start with some basic concepts behind decorators:

Functions are objects in Python:

In [1]:
def my_function():
    print("OK")

print(type(my_function))

<class 'function'>


It is also possible to pass a function as an argument to another function and invoke it there:

In [2]:
def func1():
    print('two')


def func2(f):
    print('one')
    f()

    
func2(func1)

one
two


We can also define nested functions:

In [3]:
def func1():
    print('one')
    def func2():
        print('two')
    func2()

func1()

one
two


However, due to the scoping rules we cannot invoke func2 from the outer scope:

In [4]:
def func1():
    print('one')
    def func2():
        print('two')


# func2() here would raise a NameError: func2 only exists inside func1

But we can also return a function from another function without invoking the second function:

In [5]:
def func1():
    print('one')
    def func2():
        print('two')
    return func2


x = func1()

one


The variable x now contains func2. We can invoke func2 as follows:

In [6]:
x()

two


If you understand the above concepts, we can continue to decorators. We first write a nested function. The inner function is a wrapper function:

In [7]:
def decorate_function(function):
   def function_wrapper(name):
       return "DNA is composed of {}".format(function(name))
   return function_wrapper


def get_message(seq):
   return "the nucleotides {}".format(seq)


get_message = decorate_function(get_message)

print(get_message("ATCG"))

DNA is composed of the nucleotides ATCG


A lot is happening here. 'decorate_function' contains an inner function, 'function_wrapper'. 'decorate_function' also takes another function (get_message) as an argument (this becomes the parameter 'function' in the function header). The inner 'function_wrapper' invokes the function that 'decorate_function' received as an argument and augments its behaviour (it adds text to the string). Note that 'decorate_function' does not invoke 'function_wrapper' itself. It just returns it.

Let's now describe the order of events when the code runs. <br>
First the two functions are defined:

In [8]:
def decorate_function(function):
   def function_wrapper(name):
       return "DNA is composed of {}".format(function(name))
   return function_wrapper


def get_message(seq):
   return "the nucleotides {}".format(seq)

Next, the following code is executed:

In [9]:
get_message = decorate_function(get_message)
print(get_message)

<function decorate_function.<locals>.function_wrapper at 0x10896ac80>


The name get_message now refers to a new function object. 'decorate_function' is invoked with the original get_message function as its argument. Note that the original get_message function is not invoked yet; it is only bound to the parameter 'function' of 'decorate_function'. Inside 'decorate_function' is the nested function 'function_wrapper'. It takes some text as an argument and invokes the original get_message whenever 'function_wrapper' itself gets invoked. But 'function_wrapper' is not invoked yet. It is returned as a function object, and the assignment rebinds the name get_message to it. The original function is still reachable from inside the wrapper. So we can now invoke 'function_wrapper', which in turn calls the original get_message function:

In [10]:
print(get_message("ATCG"))

DNA is composed of the nucleotides ATCG


Now the get_message function is decorated by 'decorate_function'. Note that the behaviour of 'get_message' has changed but not its code!

The above pattern is so common in Python that the language provides some syntactic sugar for it. We can rewrite the above as:

In [11]:
def decorate_function(function):
   def function_wrapper(name):
       return "DNA is composed of {}".format(function(name))
   return function_wrapper


@decorate_function
def get_message(seq):
   return "the nucleotides {}".format(seq)


print(get_message("ATCG"))

DNA is composed of the nucleotides ATCG


This reads as: 'add the functionality of decorate_function to get_message'. And that is exactly what happened.
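
One side effect of this pattern is that the decorated function takes on the wrapper's name, as the `function_wrapper` in the printed function object earlier shows. The following is a minimal sketch of how `functools.wraps` from the standard library can preserve the original name and docstring. The name `decorate_function_wrapped` is hypothetical and only used for this illustration:

```python
import functools


def decorate_function_wrapped(function):
    # functools.wraps copies __name__, __doc__ and similar metadata
    # from the original function onto the wrapper
    @functools.wraps(function)
    def function_wrapper(name):
        return "DNA is composed of {}".format(function(name))
    return function_wrapper


@decorate_function_wrapped
def get_message(seq):
    """Return a short description of the nucleotides."""
    return "the nucleotides {}".format(seq)


print(get_message("ATCG"))    # DNA is composed of the nucleotides ATCG
print(get_message.__name__)   # get_message
```

Without `functools.wraps`, the last line would print `function_wrapper` instead.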

## \*args and \**kwargs

\*args and \**kwargs are sometimes called magic variables. They can be very handy, so it is important to understand them. Remember from the lesson about functions that a function call contains arguments and the function header contains parameters:

In [12]:
def my_func(param):
    print(param)
    
arg = "hello"
my_func(arg)

hello


However, if the number of arguments does not match the number of parameters an error occurs:

In [13]:
def my_func(param):
    print(param)
    
arg1 = "hello"
arg2 = "world"
my_func(arg1, arg2)

TypeError: my_func() takes 1 positional argument but 2 were given

However, sometimes you do not know the number of arguments to expect. The \*args notation in the function header accepts any number of positional arguments. Only the \* is significant, so you could also write \*blablabla, but do not do that as \*args is used by convention:

In [None]:
def my_func(*args):
    print(args)
    
arg1 = "hello"
arg2 = "world"

my_func(arg1) #1 argument
my_func(arg1, arg2) #2 arguments
my_func() #0 arguments

So now we do not have an error anymore. \*args in the function header accepts any number of positional arguments, which are available in the function as a tuple named args. <br>
Invoking my_func without arguments results in an empty tuple. <br>
However, we can also use \*args as an argument in the function call instead of as a parameter in the function header. This will unpack the tuple into separate positional arguments:

In [None]:
def my_func(param1, param2):
    print(param1)
    print(param2)
    
arg1 = "hello"
arg2 = "world"
args = (arg1, arg2)
my_func(*args)


Thus \*args in the function call UNPACKS argument lists while \*args in the function header PACKS an argument list!
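
A small sketch that shows both directions side by side. The function name `total` is made up for this illustration:

```python
def total(*args):
    # In the header: all positional arguments are packed into the tuple args
    return sum(args)


numbers = [1, 2, 3]

print(total(1, 2, 3))    # 6
print(total(*numbers))   # 6, the list is unpacked into three arguments
print(total())           # 0, args is an empty tuple
```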

Remember from a previous lesson that it was possible to use positional arguments and keyword arguments:

In [None]:
def my_func(val1, val2):
    print(val1)
    print(val2)
    
my_func(val2=3, val1=4)

However, as with positional arguments, this function MUST receive exactly 2 arguments. We can use \**kwargs to accept any number (including 0) of keyword arguments:

In [None]:
def my_func(**kwargs): #Note that **kwargs is a parameter in a function header
    print(kwargs)
    
my_func(val1=1, val2=2, val3=3, val4=4) #4 keyword arguments
my_func() #0 keyword arguments

The keyword arguments will be __packed__ in a dictionary. Likewise, we can also use \**kwargs as argument in a function call to __unpack__ a dictionary:

In [None]:
def my_func(val1, val2, val3, val4):
    print(val1)
    print(val2)
    print(val3)
    print(val4)

kwargs = {'val1' : 1, 'val2' : 2, 'val3' : 3, 'val4' : 4}
my_func(**kwargs) #Note that **kwargs is now an argument in a function call
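
The packing and unpacking of keyword arguments can be sketched in one small example. The function `describe_sequence` and the dictionary `settings` are hypothetical names chosen for this illustration:

```python
def describe_sequence(**kwargs):
    # In the header: all keyword arguments are packed into the dict kwargs
    parts = []
    for key, value in kwargs.items():
        parts.append("{}={}".format(key, value))
    return ", ".join(parts)


settings = {"name": "seq1", "length": 4}

print(describe_sequence(name="seq1", length=4))   # name=seq1, length=4
print(describe_sequence(**settings))              # same result, the dict is unpacked
print(describe_sequence() == "")                  # True, kwargs is an empty dict
```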


Using this information we can now write a generic function that accepts any number and type of arguments:

In [None]:
def accept_all(*args, **kwargs):
    if args:
        for arg in args:
            print(arg)
    if kwargs:
        for kwarg in kwargs:
            print(kwarg)

accept_all("bla", 10, 15, ['small', 'middle', 'big'], {"one": 1, 'two': 2}, naam="Jurre", age="40")

However there are some rules: <br>
In the function header: first \*args and then \**kwargs. <br>
In the function call: positional arguments must come first and then keyword arguments. <br>
These are not all the rules. For a thorough overview:

Quoted from Mark Lutz Learning Python: <br>
- In a function call, all nonkeyword arguments (name) must appear first, followed
by all keyword arguments (name=value), followed by the \*name form, and, finally,
the \**name form, if used.
- In a function header, arguments must appear in the same order: normal arguments
(name), followed by any default arguments (name=value), followed by the
\*name form if present, followed by \**name, if used.
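
\*args and \**kwargs are also what makes a decorator reusable: the wrapper from the decorator section accepted exactly one argument, so it could only decorate one-argument functions. A minimal sketch of a wrapper that passes any arguments on unchanged follows. The names `announce` and `calc_ratio` are hypothetical and chosen for this illustration:

```python
def announce(function):
    # The wrapper packs whatever it receives and unpacks it again
    # when calling the original function
    def wrapper(*args, **kwargs):
        print("calling {} with {} and {}".format(function.__name__, args, kwargs))
        return function(*args, **kwargs)
    return wrapper


@announce
def calc_ratio(numerator, denominator=1):
    return numerator / denominator


print(calc_ratio(10, denominator=4))
# calling calc_ratio with (10,) and {'denominator': 4}
# 2.5
```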

## Generators

Another more advanced concept is the generator function. Generators are used to save memory. 
A generator function looks like a normal function, except that instead of returning a single value, it yields as many values as it needs to, one at a time. Calling the generator function returns a generator object. Each time Python needs a value, it resumes the generator, which runs until the next yield and then pauses with its state saved so that it can continue when the next value is required. Because values are produced on demand rather than stored all at once, this can save a lot of memory. Let's start with a simple generator function:

In [14]:
def my_generator():
    yield 1
    yield 2
    yield 3

print(my_generator())

<generator object my_generator at 0x1089831a8>


Invoking the function does not run its body. It only returns a generator object, and printing that object shows some object information.
To get the values we need to iterate over the generator object:

In [15]:
for i in my_generator():
    print(i)

1
2
3


So how did this work? To understand this you need to understand a bit about the iteration protocol. Suppose we have a list with 3 elements. We can easily iterate over the elements using a for loop but we can do the same with next if we make an iterator from the list:

In [16]:
my_list = ['a', 'b', 'c']
for i in my_list:
    print(i)

my_list2 = ['d', 'e', 'f']
iter_object = iter(my_list2)
print(next(iter_object))
print(next(iter_object))
print(next(iter_object))
print(next(iter_object))

a
b
c
d
e
f


StopIteration: 

All went fine until the end was reached. The last print statement raised a StopIteration exception. Now back to the generator:

In [17]:
def my_generator():
    yield 1
    yield 2
    yield 3

generator_object = my_generator()
print(next(generator_object))
print(next(generator_object))
print(next(generator_object))
print(next(generator_object))

1
2
3


StopIteration: 

So what happened here:
- Each time next() is called on the generator object (either explicitly with next or implicitly in a for loop), the generator resumes execution from where it last yielded, not from the beginning of the function.
- If a generator function executes a return statement or reaches the end of its body, a StopIteration exception is raised.
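
The two points above also explain why a for loop over a generator ends without an error. Roughly, a for loop can be thought of as the following while loop. This is a simplified sketch of the iteration protocol, not the actual implementation, and the list `collected` is only there to record what was seen:

```python
def my_generator():
    yield 1
    yield 2
    yield 3


# Roughly what "for value in my_generator(): print(value)" does behind the scenes
iterator = iter(my_generator())
collected = []
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # the for loop catches this exception and simply stops
    collected.append(value)
    print(value)
```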


So why would this be useful? It is all about memory usage! Let's take a real example. Let's write a simple function to check if a number is a prime number:

In [20]:
import math

def is_prime(number):
    ''' check if this is a prime number'''
    if number > 1:
        if number == 2: #2 is the only even prime number
            return True
        if number % 2 == 0:
            return False
        for current in range(3, int(math.sqrt(number) + 1), 2): 
            if number % current == 0: 
                return False
        return True
    return False #0 and negative numbers are not prime

print(is_prime(3))
print(is_prime(4))
print(is_prime(-17))

True
False
False


Using this function we can generate a list of all prime numbers below the integer 100:

In [21]:
def get_primes(n):
    primes = []
    num=0
    while num < n:
        if is_prime(num):
            primes.append(num)
        num += 1
    return primes

print(get_primes(100))

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


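This works as expected, but the list takes up a considerable amount of memory when large numbers are involved (note that sys.getsizeof measures the list object itself, not the integers stored in it):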

In [22]:
import sys
print(sys.getsizeof(get_primes(100000)), "bytes")

77848 bytes


This is where the generator function shines:

In [23]:
def get_primes_from_generator_function(n):
    num=0
    while num < n:
        if is_prime(num):
            yield num
        num += 1


for i in get_primes_from_generator_function(100):
    print(i, end= " ")

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

So what is the deal here? We get the same prime numbers. But if you look closely, we did not receive a list. The for loop and the while loop inside the generator work in tandem: each iteration of the for loop resumes the generator, which runs until the next yield. The function 'remembers' its state between calls! And what about memory usage if large numbers are involved?

In [24]:
print(sys.getsizeof(get_primes_from_generator_function(100000)), "bytes")

88 bytes


The resulting generator object is much smaller than the list object! Using a generator function we can even safely write a while True loop to retrieve prime numbers.
What will be the first prime number above the integer 1000?

In [26]:
def get_next_prime_from_generator_function(num):
    '''prime number generator from a number till infinity...'''
    while True:
        if is_prime(num):
            yield num
        num += 1

num = 1000
my_generator_object = get_next_prime_from_generator_function(num)
print(next(my_generator_object))

1009


Get the subsequent 10 prime numbers:

In [27]:
for i in range(10):
    print(next(my_generator_object))

1013
1019
1021
1031
1033
1039
1049
1051
1061
1063


Remember that we can safely write the while True loop because a generator function pauses at each yield and 'remembers' its state, so values are only computed when requested. This will save a lot of memory!

The last concept about generators is how to reset a generator object. Have a look at the first generator example:
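
Instead of calling next() in a loop, a fixed number of values can also be taken from an infinite generator with `itertools.islice` from the standard library. The sketch below is self-contained, so it uses a compact trial-division `is_prime` and a hypothetical generator name `primes_from` rather than the functions defined above:

```python
import itertools


def is_prime(number):
    # Compact trial-division check, assumed to give the same results as is_prime above
    if number < 2:
        return False
    return all(number % divisor for divisor in range(2, int(number ** 0.5) + 1))


def primes_from(num):
    # Infinite generator: yields every prime greater than or equal to num
    while True:
        if is_prime(num):
            yield num
        num += 1


# islice stops after five items, so the infinite loop is never a problem
first_five = list(itertools.islice(primes_from(1000), 5))
print(first_five)   # [1009, 1013, 1019, 1021, 1031]
```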

In [3]:
def my_generator():
    yield 1
    yield 2
    yield 3

for i in my_generator():
    print(i)

1
2
3


Here we created one generator object and the for loop implicitly called next() on it until StopIteration was raised. So how can we make different objects and 'reset' the generator? The following code will explain:

In [9]:
x = my_generator()
print('x', next(x)) #prints 1
print('x', next(x)) #prints 2
y = my_generator() #new generator object, 'resets' 
print(x is y) #False, not the same object
print('y', next(y)) #prints 1


x 1
x 2
False
y 1
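
A related point, sketched below under the same `my_generator` definition: a generator object that has been consumed completely stays empty. Iterating over it a second time yields nothing, so a fresh object is needed to start over:

```python
def my_generator():
    yield 1
    yield 2
    yield 3


gen = my_generator()
first_pass = list(gen)    # consumes all three values
second_pass = list(gen)   # the same object has nothing left to yield

print(first_pass)             # [1, 2, 3]
print(second_pass)            # []
print(list(my_generator()))   # [1, 2, 3], a fresh object starts over
```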


## Comprehensions: List comprehensions

Comprehensions are compact one-liners for adding (modified) items to lists, dictionaries and sets without a classical for loop pattern. This makes your code more compact. In addition, comprehensions are often faster than the equivalent classical for loop. Let's have a look at some examples:

In [11]:
x = [i for i in range(10)]
print(x)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


The above example is the simplest list comprehension possible, but nothing special, as list(range(10)) would have yielded the same result. But list comprehensions can do more:

In [13]:
x = [i * i for i in range(10)]
print(x)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [14]:
dna = ['a', 't', 'c', 'g']
dna_up = [i.upper() for i in dna]
print(dna_up)

['A', 'T', 'C', 'G']


Doing this task with a classical for loop takes more lines of code:

In [20]:
dna = ['a', 't', 'c', 'g']
dna_up = []
for i in dna:
    dna_up.append(i.upper())
print(dna_up)

['A', 'T', 'C', 'G']


These are examples where list comprehensions shine: small tasks that would take more lines using a for loop. You can filter results with an if clause:

In [19]:
nucleotides = ['a', 't', 'c', 'g', 'u']
dna_up = [i.upper() for i in nucleotides if i != 'u']
rna_up = [i.upper() for i in nucleotides if i != 't']
print(dna_up)
print(rna_up)       

['A', 'T', 'C', 'G']
['A', 'C', 'G', 'U']


And instead of calling built-in functions or methods you can also call your own functions:

In [29]:
def calc_gc(seq):
    G = seq.count('G')
    C = seq.count('C')
    GC = (G + C) / len(seq) * 100
    return GC

sequences = ["GAATT", "GCGCGC", "ATATATTA"]
gc_perc = [calc_gc(i) for i in sequences]
print(gc_perc)

[20.0, 100.0, 0.0]


Once you understand list comprehensions you may get a bit too enthusiastic:

In [26]:
print(''.join([{'A':'T', 'C':'G', 'T':'A', 'G':'C'}[i] for i in "GGACCCTTT"])[::-1])

AAAGGGTCC


Reverse complement in one line of code. However, the code above is difficult to understand and is therefore a bad idea. List comprehensions are great but do not make them too complex!

Another example with a proper use of comprehensions:

In [34]:
import math
nums = [1, -5, -8, 10, 11, 3, 16, -5, 10, 25, 36, 81, 99]
res = [math.sqrt(i) for i in nums if i > 0]
print(res)

[1.0, 3.1622776601683795, 3.3166247903554, 1.7320508075688772, 4.0, 3.1622776601683795, 5.0, 6.0, 9.0, 9.9498743710662]


Now that you have seen some examples, let's look a bit closer at the architecture of list comprehensions:

All comprehensions have this architecture: <br>
`expression for element in iterable <optional test>`

In the expression part, you can return any type you like:
tuples, lists, dicts, objects

In [35]:
x = "atcg"
y = [(i, i.upper()) for i in x]
print(y)
    

[('a', 'A'), ('t', 'T'), ('c', 'C'), ('g', 'G')]


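The optional test at the end can only be an if clause, which filters items. An if combined with an else is a conditional expression, and it belongs in the expression part at the front: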

In [40]:
nums = [1, 3, -7, 9, -12, 16]
print([math.sqrt(i) if i > 0 else "<0 encountered" for i in nums])

[1.0, 1.7320508075688772, '<0 encountered', 3.0, '<0 encountered', 4.0]
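
The difference between a filtering if at the end and an if/else in the expression part is easy to mix up. A short sketch of both, plus a combination of the two, using made-up variable names:

```python
nums = [1, 3, -7, 9, -12, 16]

# Filter: the trailing "if" drops items, so the result can be shorter
filtered = [i for i in nums if i > 0]

# Conditional expression: every item is kept, but its value is chosen by if/else
replaced = [i if i > 0 else 0 for i in nums]

# Both can be combined in one comprehension
combined = ["big" if i > 5 else "small" for i in nums if i > 0]

print(filtered)   # [1, 3, 9, 16]
print(replaced)   # [1, 3, 0, 9, 0, 16]
print(combined)   # ['small', 'small', 'big', 'big']
```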


Finally, it is worth knowing that you can nest the for loops of a comprehension:

In [44]:
bases = ['g', 'a', 't', 'c']
codons = [i+j+k for i in bases for j in bases for k in bases]
print(codons)

['ggg', 'gga', 'ggt', 'ggc', 'gag', 'gaa', 'gat', 'gac', 'gtg', 'gta', 'gtt', 'gtc', 'gcg', 'gca', 'gct', 'gcc', 'agg', 'aga', 'agt', 'agc', 'aag', 'aaa', 'aat', 'aac', 'atg', 'ata', 'att', 'atc', 'acg', 'aca', 'act', 'acc', 'tgg', 'tga', 'tgt', 'tgc', 'tag', 'taa', 'tat', 'tac', 'ttg', 'tta', 'ttt', 'ttc', 'tcg', 'tca', 'tct', 'tcc', 'cgg', 'cga', 'cgt', 'cgc', 'cag', 'caa', 'cat', 'cac', 'ctg', 'cta', 'ctt', 'ctc', 'ccg', 'cca', 'cct', 'ccc']
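
To read a nested comprehension, it can help to write it out as classical loops: the leftmost for is the outermost loop. A sketch that checks the two forms give the same list (`codons_loop` is a name made up for this comparison):

```python
bases = ['g', 'a', 't', 'c']

# The same nesting written out as classical for loops
codons_loop = []
for i in bases:
    for j in bases:
        for k in bases:
            codons_loop.append(i + j + k)

codons = [i + j + k for i in bases for j in bases for k in bases]

print(codons_loop == codons)   # True
print(len(codons_loop))        # 64
```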


## Comprehensions: dict comprehensions

In addition to list comprehensions, it is possible to write dictionary comprehensions. Here is an example:

In [48]:
x = {i:i*i for i in range(1, 11)}
print(x)

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100}


It is possible for dict comprehensions to use a filter as well:

In [50]:
x = {i:i*i for i in range(1, 11) if i%2 == 0}
print(x)

{2: 4, 4: 16, 6: 36, 8: 64, 10: 100}


Another example:

In [65]:
bases = "a t c g u".split()
base_names = "adenine thymine cytosine guanine uracil".split()
base_info = dict(zip(bases, base_names))
print(base_info)
base_info_upper = {i.upper():j[0].upper() + j[1:] for (i, j) in base_info.items() if i!='u'}
print(base_info_upper)

{'a': 'adenine', 't': 'thymine', 'c': 'cytosine', 'g': 'guanine', 'u': 'uracil'}
{'A': 'Adenine', 'T': 'Thymine', 'C': 'Cytosine', 'G': 'Guanine'}


In addition to list and dict comprehensions, set comprehensions also exist but won't be covered here. Tuple comprehensions do not exist: parentheses around a comprehension create a generator expression (see the next section), which can be passed to tuple() if a tuple is needed.
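
For completeness, a very short sketch of what a set comprehension looks like, and of how a tuple can still be built in one line by passing a generator expression (covered in the next section) to tuple(). The variable names are made up for this illustration:

```python
seq = "GGACCCTTT"

# Set comprehension: curly braces without key:value pairs, duplicates collapse
unique_bases = {base.lower() for base in seq}

# Parentheses alone would give a generator expression; tuple() consumes it
squares = tuple(i * i for i in range(5))

print(sorted(unique_bases))   # ['a', 'c', 'g', 't']
print(squares)                # (0, 1, 4, 9, 16)
```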

## Generator expressions

Now that you have seen some generator functions and list comprehensions in action, it is time for another concept: generator expressions. These combine the syntax of list comprehensions with the lazy behaviour of generator functions. Like generator functions, generator expressions save memory. Like list comprehensions, generator expressions reduce the number of code lines. They are very powerful for repetitive tasks where sequences with a lot of items are used. Here is an example of a generator expression:

In [67]:
x = (i*i for i in range(1, 11) if i%2 == 0)
print(x)

<generator object <genexpr> at 0x108bd39e8>


Like a generator function, this only generates a generator object. You need to loop through the object to get the result:

In [71]:
x = (i*i for i in range(1, 11) if i%2 == 0)
for item in x:
    print(item)

4
16
36
64
100


You can turn it into a list:

In [73]:
x = (i*i for i in range(1, 11) if i%2 == 0)
print(list(x))

[4, 16, 36, 64, 100]


But remember that this will not help you much to reduce memory usage. You could just as well use a list comprehension.
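
Generator expressions pay off when the values are consumed one by one, for example by sum(). The sketch below passes a generator expression directly to sum() and then compares container sizes, with the caveat that sys.getsizeof only measures the container object and not the items it refers to:

```python
import sys

# A generator expression can be passed directly to a function that consumes an iterable
total = sum(i * i for i in range(1, 11) if i % 2 == 0)
print(total)   # 220

# Compare the size of a full list with the size of a generator object
as_list = [i * i for i in range(100000)]
as_gen = (i * i for i in range(100000))
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))   # True
```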

## The Itertools module

One module that makes a lot of use of lazy iterators is the itertools module. Suppose we have some substances that we run on an HPLC machine: A, B, C, D. One substance will hit the detector first, then another, then yet another, etc. What are all possible orders in which they can arrive (thus order matters)?
We can explore this using the itertools.permutations function. Here it is in action: 

In [88]:
import itertools
substances = "A B C D".split()
possibilities = itertools.permutations(substances)
print(possibilities)

<itertools.permutations object at 0x108b1f830>


As you can see, permutations returned an iterator object that, like a generator, produces its items on demand to save memory.

In [82]:
for i in possibilities:
    print(i)

('A', 'B', 'C', 'D')
('A', 'B', 'D', 'C')
('A', 'C', 'B', 'D')
('A', 'C', 'D', 'B')
('A', 'D', 'B', 'C')
('A', 'D', 'C', 'B')
('B', 'A', 'C', 'D')
('B', 'A', 'D', 'C')
('B', 'C', 'A', 'D')
('B', 'C', 'D', 'A')
('B', 'D', 'A', 'C')
('B', 'D', 'C', 'A')
('C', 'A', 'B', 'D')
('C', 'A', 'D', 'B')
('C', 'B', 'A', 'D')
('C', 'B', 'D', 'A')
('C', 'D', 'A', 'B')
('C', 'D', 'B', 'A')
('D', 'A', 'B', 'C')
('D', 'A', 'C', 'B')
('D', 'B', 'A', 'C')
('D', 'B', 'C', 'A')
('D', 'C', 'A', 'B')
('D', 'C', 'B', 'A')


And here we have all the possibilities as tuples. Yes, this explodes, as the number of possible permutations of k objects from a set of n can be written as nPk = n!/(n-k)!. With k = n this gives n!, so four items: 24, 5 items: 120, 6 items: 720. We do need a lazy iterator here!
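
The formula can be checked against itertools with a few lines. This sketch uses math.factorial and the optional second argument of itertools.permutations, which sets the length k of each permutation; the variable `first_two` is made up for this illustration:

```python
import itertools
import math

substances = "A B C D".split()

# All orderings of all four substances: n! = 24
print(len(list(itertools.permutations(substances))))   # 24
print(math.factorial(len(substances)))                 # 24

# Orderings of only the first two substances to arrive: n!/(n-k)! = 12
first_two = list(itertools.permutations(substances, 2))
print(len(first_two))                                  # 12
print(math.factorial(4) // math.factorial(4 - 2))      # 12
```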

Suppose there is a big chance that two substances will finish at the same time. What are the possible combinations using the list of substances? Note that order does not matter here.

In [95]:
combinations = itertools.combinations(substances, 2)
for i in combinations:
    print(i)

('A', 'B')
('A', 'C')
('A', 'D')
('B', 'C')
('B', 'D')
('C', 'D')


And if 3 will finish at the same time?

In [96]:
combinations = itertools.combinations(substances, 3)
for i in combinations:
    print(i)

('A', 'B', 'C')
('A', 'B', 'D')
('A', 'C', 'D')
('B', 'C', 'D')


What are the possible genotypes (Cartesian product) of a male with genotype ABO breeding with a female with genotype ABO?

In [101]:
male = "ABO"
female = "ABO"
genotypes = itertools.product(male, female)
for i in genotypes:
    print(i)

('A', 'A')
('A', 'B')
('A', 'O')
('B', 'A')
('B', 'B')
('B', 'O')
('O', 'A')
('O', 'B')
('O', 'O')
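
itertools.product also accepts a repeat argument, which takes the Cartesian product of one iterable with itself. As a sketch, this reproduces the list of codons that was built earlier with a nested list comprehension:

```python
import itertools

bases = ['g', 'a', 't', 'c']

# repeat=3 is the Cartesian product of bases with itself three times
codons = ["".join(triplet) for triplet in itertools.product(bases, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['ggg', 'gga', 'ggt', 'ggc']
```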


Another example from itertools. Suppose we have a circular genome of "TACG". In reality, the 5'- T is connected to the 3'-G. Can we mimic this in Python? The answer is yes:

In [104]:
seq = 'TACG'
circular_genome = itertools.cycle(seq)
print(next(circular_genome))
print(next(circular_genome))
print(next(circular_genome))
print(next(circular_genome))
print(next(circular_genome))
print(next(circular_genome))

T
A
C
G
T
A
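
Because cycle never ends, looping over it directly with a for loop would run forever. A minimal sketch of taking a fixed number of bases from the circular genome with itertools.islice instead of repeated next() calls (`two_rounds` is a name made up for this illustration):

```python
import itertools

seq = 'TACG'

# islice takes a fixed number of items from the endless cycle iterator
two_rounds = "".join(itertools.islice(itertools.cycle(seq), 2 * len(seq)))
print(two_rounds)   # TACGTACG
```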


Finally, have a look at the itertools module yourself, as there are many more valuable functions.