# Advanced concepts in Python

## Agenda

- List comprehensions and Generator expressions
- Iterables, Iterators and Generators
- Co-routines, Futures and asyncio
- Parallel task processing


## List Comprehensions

List comprehensions are a tool for transforming one list (or any other iterable) into a new list. During this transformation, elements can be conditionally included in the new list and each element can be transformed as needed.

### Example - 1: Creating a list of Unicode code points from a string

In [4]:
# Normal way using for loop
symbols = '§$€£¢'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))
print(codes)

[167, 36, 8364, 163, 162]


In [7]:
# Using List comprehension

symbols = '§$€£¢'
codes = [ord(code) for code in symbols]
print(codes)

[167, 36, 8364, 163, 162]


**Note** : *In Python 3, listcomps no longer leak their loop variables into the enclosing scope.*
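The note above can be checked with a small sketch. It assumes Python 3; the names `x`, `y` and the sample string are made up for illustration. A comprehension's loop variable lives in its own scope, while a plain `for` loop still rebinds the name in the surrounding scope.

```python
# Python 3: the comprehension has its own scope for the loop variable.
x = 'outer value'
codes = [ord(x) for x in 'ABC']   # this `x` is local to the comprehension

print(x)       # the outer `x` is untouched
print(codes)

# A regular for loop, by contrast, leaves its variable behind.
for y in 'ABC':
    pass
print(y)       # `y` still holds the last item of the loop
```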

### Listcomps versus map and filter

In [19]:
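numbers = [3, 5, 1, 13, 10, 20, 43, 32, 65, 75, 90]

# Double every odd number, first with map and filter ...
doubled = list(map(lambda n: n * 2, filter(lambda n: n % 2 == 1, numbers)))
print(doubled)

# ... then with a list comprehension
doubled = [n * 2 for n in numbers if n % 2 == 1]
print(doubled)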

[6, 10, 2, 26, 86, 130, 150]
[6, 10, 2, 26, 86, 130, 150]


### Cartesian product

In [23]:
colors = ['black', 'white']
sizes = ['S', 'M', 'L']

# Regular way
combinations = []
for color in colors:
    for size in sizes:
        combinations.append((color, size))
print(combinations)

# Using List comprehension

combinations = [(color, size) for color in colors for size in sizes]
print(combinations)

[('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]
[('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]


In [35]:
fizzbuzz = [
    'fizzbuzz' if n % 3 == 0 and n % 5 == 0
    else 'fizz' if n % 3 == 0
    else 'buzz' if n % 5 == 0
    else n
    for n in range(100)
]

print(fizzbuzz)

['fizzbuzz', 1, 2, 'fizz', 4, 'buzz', 'fizz', 7, 8, 'fizz', 'buzz', 11, 'fizz', 13, 14, 'fizzbuzz', 16, 17, 'fizz', 19, 'buzz', 'fizz', 22, 23, 'fizz', 'buzz', 26, 'fizz', 28, 29, 'fizzbuzz', 31, 32, 'fizz', 34, 'buzz', 'fizz', 37, 38, 'fizz', 'buzz', 41, 'fizz', 43, 44, 'fizzbuzz', 46, 47, 'fizz', 49, 'buzz', 'fizz', 52, 53, 'fizz', 'buzz', 56, 'fizz', 58, 59, 'fizzbuzz', 61, 62, 'fizz', 64, 'buzz', 'fizz', 67, 68, 'fizz', 'buzz', 71, 'fizz', 73, 74, 'fizzbuzz', 76, 77, 'fizz', 79, 'buzz', 'fizz', 82, 83, 'fizz', 'buzz', 86, 'fizz', 88, 89, 'fizzbuzz', 91, 92, 'fizz', 94, 'buzz', 'fizz', 97, 98, 'fizz']


In [None]:
def some_function(n):
    # Same logic as the chained conditional expression, as a regular function
    if n % 3 == 0 and n % 5 == 0:
        return 'fizzbuzz'
    elif n % 3 == 0:
        return 'fizz'
    elif n % 5 == 0:
        return 'buzz'
    else:
        return n

fizzbuzz = [some_function(n) for n in range(100)]

### Generator expressions


Generator expressions can be used to build tuples, arrays and other types of sequences. For that purpose they are preferable to a listcomp because they save memory: items are yielded one at a time through the iterator protocol instead of building a whole intermediate list first.
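A rough way to see the memory difference is to compare the size of the two objects. This is only a sketch: `sys.getsizeof` reports the size of the container object itself, not of the integers it refers to, and the exact numbers depend on the Python build.

```python
import sys

n = 100_000
squares_list = [i * i for i in range(n)]   # all items are built up front
squares_gen = (i * i for i in range(n))    # nothing is computed until consumed

print(sys.getsizeof(squares_list))  # grows with n
print(sys.getsizeof(squares_gen))   # small, independent of n

# A genexp can feed a constructor or a reducing function directly,
# without an intermediate list.
total = sum(i * i for i in range(10))
print(total)
```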


In [39]:
symbols = '§$€£¢'
codes = tuple(ord(code) for code in symbols)
print(codes)

(167, 36, 8364, 163, 162)


### Tuple unpacking

In [53]:
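coordinates = (0.32, 0.45)
x, y = coordinates            # unpack a tuple into two names
print(x, y)

coordinates = [0.32, 0.45]
x, y = coordinates[0], coordinates[1]   # same result, by explicit indexing

numbers = (30, 20)
print(divmod(*numbers))       # * unpacks the tuple into positional arguments

a, b, *rest = range(20)       # *rest collects the remaining items in a list
print(a, b, rest)

a, *middle, b = range(20)     # the starred name can appear in any position
print(a, middle, b)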

0.32 0.45
(1, 10)
0 1 [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
0 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] 19


In [None]:
def some_fun():
    return (1, 2, 3, 4, 5)   # placeholder values

a, *_ = some_fun()   # keep the first item, ignore the rest

## Iterables, Iterators and Generators

The iterator pattern is a way to load data into memory lazily, one item at a time. Iterators are crucial when processing datasets that are too large to fit in memory at once.

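**An iterator in Python is an object that implements `__next__` (returning the next item or raising `StopIteration`) and an `__iter__` method that returns the iterator itself.**

**Objects implementing an `__iter__` method that returns an iterator are iterable.**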
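The distinction between an iterable and an iterator can be checked with the abstract base classes in `collections.abc`. This is a small sketch; the list `numbers` is just sample data.

```python
from collections import abc

numbers = [1, 3, 5]
it = iter(numbers)

is_iterable = isinstance(numbers, abc.Iterable)        # a list is iterable ...
list_is_iterator = isinstance(numbers, abc.Iterator)   # ... but not an iterator
it_is_iterator = isinstance(it, abc.Iterator)
returns_self = iter(it) is it                          # an iterator's __iter__ returns itself

first_pass = list(it)        # consumes the iterator
second_pass = list(it)       # an exhausted iterator yields nothing more
fresh_pass = list(numbers)   # the iterable can hand out a new iterator any time

print(is_iterable, list_is_iterator, it_is_iterator, returns_self)
print(first_pass, second_pass, fresh_pass)
```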

### Iterables - What is behind a for loop

for item in container:
    do_something(item)

or 

[do_something(item) for item in container]


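The `__iter__` method is what the `for` loop calls behind the scenes: it asks the container for an iterator and then pulls items from that iterator until it is exhausted.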
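Written out by hand, a `for` loop behaves roughly like the `while` loop below. This is a sketch of the protocol, not the interpreter's actual implementation; `container` and `seen` are illustrative names.

```python
container = ['a', 'b', 'c']
seen = []

it = iter(container)          # calls container.__iter__()
while True:
    try:
        item = next(it)       # calls it.__next__()
    except StopIteration:
        break                 # a for loop handles this exception silently
    seen.append(item)         # stands in for the loop body

print(seen)
```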

In [3]:
some_list = []   # avoid naming a variable `list`, which would shadow the built-in
print(dir(some_list))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [19]:
some_list = [1,3,5,7]
it = iter(some_list)
next(it)
next(it)
next(it)

5

**Example:** Sentence implementation using Iterator pattern

In [78]:
import re
import reprlib

RE_WORD = re.compile(r'\w+')  # raw string, so the backslash reaches the regex engine unchanged
text = 'this is sample sentence to implement iterator'

    
class Sentence:   #Iterable
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        return SentenceIterator(self.words)
    
class SentenceIterator: #Iterator
    def __init__(self, words):
        self.words = words
        self.index = 0
    
    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index+=1
        return word
    
    def __iter__(self):
        return self

In [82]:
from collections import abc

sentence = Sentence(text)
print(issubclass(SentenceIterator, abc.Iterator))
for word in sentence:
    print(word)


True
this
is
sample
sentence
to
implement
iterator


### Generators

Any Python function that has the `yield` keyword in its body is a generator function. Calling it returns a generator object instead of running the body.

**Example 2:** Sentence using Generator function


In [83]:
class Sentence:   #Iterable
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)
    
    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
    
    def __iter__(self):
        for word in self.words:
            yield word
        return


In [85]:
sentence = Sentence(text)
for word in sentence:
    print(word)

this
is
sample
sentence
to
implement
iterator


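Here, instead of implementing a separate iterator class, we use the `yield` keyword: `__iter__` becomes a generator function, and the generator object it returns is itself an iterator.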
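The same idea can be pushed one step further so that no word list is stored at all. The class below is a hypothetical variant (the name `LazySentence` is not part of the original example) that relies on `re.finditer`, which produces matches one at a time.

```python
import re
import reprlib

RE_WORD = re.compile(r'\w+')


class LazySentence:
    """Hypothetical lazy variant of Sentence: words are found on demand."""

    def __init__(self, text):
        self.text = text      # only the text is kept, no list of words

    def __repr__(self):
        return 'LazySentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        # finditer yields match objects lazily, one per word
        for match in RE_WORD.finditer(self.text):
            yield match.group()


sentence = LazySentence('this is sample sentence to implement iterator')
words = list(sentence)
print(words)
```

Because `__iter__` creates a new generator on every call, the sentence can still be iterated more than once.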

The next examples demonstrate the behaviour of generator functions and generator expressions.

**Example 3:** Generator function 

In [6]:
def gen_123():
    yield 1
    yield 2
    yield 3

g = gen_123()
print(next(g))
print(next(g))
print(next(g))
print(next(g))
    

1
2
3


StopIteration: 

**Example 4:** Generator expression

In [21]:
def gen_AB():
    print('Start')
    yield 'A'
    print('Continue')
    yield 'B'
    print('End')


In [22]:
res1 = [x*3 for x in gen_AB()]

Start
Continue
End


In [13]:
for i in res1:
    print('-->{}'.format(i))

-->AAA
-->BBB


In [23]:
res2 = (x*3 for x in gen_AB())

In [24]:
print(res2)

<generator object <genexpr> at 0x7f342db2e468>


In [25]:
for i in res2:
    print('-->{}'.format(i))

Start
-->AAA
Continue
-->BBB
End


### Generator functions provided by the standard library (built-ins and itertools)

- compress
- dropwhile
- filter
- islice
- takewhile
- accumulate
- enumerate
- map
- starmap
- chain
- product
- zip
- zip_longest
- count
- cycle
- combinations
- repeat
- permutations
- groupby
- reversed
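A few of the `itertools` functions from the list, applied to small made-up inputs. Each call returns a lazy iterator, so the results are wrapped in `list()` here only to display them.

```python
import itertools

# islice + count: take a finite slice of an infinite counter
first_three = list(itertools.islice(itertools.count(10), 3))

# takewhile: stop at the first item that fails the predicate
small = list(itertools.takewhile(lambda n: n < 5, [1, 3, 5, 2]))

# accumulate: running totals
running = list(itertools.accumulate([1, 2, 3, 4]))

# chain: iterate over several iterables back to back
chained = list(itertools.chain('AB', [1, 2]))

# product: the same Cartesian product as a nested listcomp
pairs = list(itertools.product(['black', 'white'], ['S', 'M']))

# groupby: groups consecutive items with the same key,
# so the input is sorted by that key first
words = sorted(['kiwi', 'fig', 'plum', 'pear', 'yam'], key=len)
by_length = {k: list(g) for k, g in itertools.groupby(words, key=len)}

print(first_three, small, running, chained)
print(pairs)
print(by_length)
```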




In [31]:
list_1 = [1, 5, 7, 8]
list_2 = [10, 40, 49, 90]
for index, val in enumerate(list_1):
    print(index, val)
for val1, val2 in zip(list_1, list_2):
    print(val1, val2)


0 1
1 5
2 7
3 8
1 10
5 40
7 49
8 90


## Co-routines, Futures, asyncio

### PEP 342 — Coroutines via Enhanced Generators

- `.send()` - send a value into a generator; `yield` becomes an expression
- `.throw()` - raise an exception inside a generator
- `.close()` - terminate a generator
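The three methods can be seen together in a running-average coroutine. This is a minimal sketch; the name `averager` and the choice of `ValueError` as a "reset" signal are assumptions made for the example.

```python
from inspect import getgeneratorstate


def averager():
    """Coroutine that yields the running average of the values sent to it."""
    total, count, average = 0.0, 0, None
    while True:
        try:
            term = yield average          # value passed in by .send()
        except ValueError:
            # .throw(ValueError) lands here; this sketch treats it as a reset
            total, count, average = 0.0, 0, None
        else:
            total += term
            count += 1
            average = total / count


avg = averager()
next(avg)                      # prime the coroutine: run to the first yield
a1 = avg.send(10)
a2 = avg.send(20)
reset = avg.throw(ValueError)  # handled inside; returns the next yielded value
a3 = avg.send(4)
avg.close()                    # raises GeneratorExit at the paused yield
state = getgeneratorstate(avg)

print(a1, a2, reset, a3, state)
```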

### PEP 380 - Syntax for delegating to a subgenerator

- Allows a generator to `return` a value
- Introduces `yield from` to delegate to a subgenerator
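Both points can be shown in a few lines. In this sketch (the names `subgen` and `delegator` are illustrative), the value returned by the subgenerator becomes the value of the `yield from` expression in the delegating generator.

```python
def subgen():
    yield 1
    yield 2
    return 'done'                      # travels inside StopIteration


def delegator():
    result = yield from subgen()       # re-yields 1 and 2, then captures 'done'
    yield result


items = list(delegator())
print(items)
```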

### Basic co-routine

In [33]:
def basic_coro():
    print("started and waiting for input ...")
    x = yield
    print("I got {}".format(x))
    print("I am going to finish now ...")
b = basic_coro()
print(b)

<generator object basic_coro at 0x7f342db2ea98>


In [34]:
print(next(b))
print(b.send(2))

started and waiting for input ...
None
I got 2
I am going to finish now ...


StopIteration: 

We can also inspect the state of a co-routine using `inspect.getgeneratorstate`.

In [36]:
from inspect import getgeneratorstate
c = basic_coro()
print(getgeneratorstate(c))
next(c)
print(getgeneratorstate(c))
c.send(10)


GEN_CREATED
started and waiting for input ...
GEN_SUSPENDED
I got 10
I am going to finish now ...


StopIteration: 

In [37]:
print(getgeneratorstate(c))


GEN_CLOSED


## Parallel processing

### Count how many numbers fall within a given range in each row

In [38]:
import numpy as np
from time import time

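# Prepare data
# Note: RandomState(100) creates a separate generator that is discarded here,
# so np.random is not actually seeded and the data differs between runs.
np.random.RandomState(100)
arr = np.random.randint(0, 10, size=[200000, 5])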
data = arr.tolist()
data[:10]

[[0, 3, 6, 4, 8],
 [6, 3, 9, 8, 4],
 [7, 5, 7, 2, 2],
 [9, 3, 3, 3, 1],
 [1, 6, 4, 6, 3],
 [8, 0, 2, 6, 3],
 [1, 0, 1, 1, 3],
 [4, 8, 5, 3, 7],
 [4, 4, 6, 7, 6],
 [1, 7, 4, 8, 8]]

In [9]:
# Solution without parallelization

def howmany_within_range(row, minimum, maximum):
    """Returns how many numbers in `row` lie between `minimum` and `maximum` (inclusive)"""
    count = 0
    for n in row:
        if minimum <= n <= maximum:
            count = count + 1
    return count

results = []
for row in data:
    results.append(howmany_within_range(row, minimum=4, maximum=8))

print(results[:10])
# The values differ between runs because the data is not seeded

[2, 1, 4, 3, 2, 1, 4, 2, 2, 3]


In [11]:
# Parallelizing using Pool.apply()

import multiprocessing as mp

print(mp.cpu_count())

# Step 1: Init multiprocessing.Pool()
pool = mp.Pool(mp.cpu_count())

# Step 2: `pool.apply` the `howmany_within_range()`
# Note: pool.apply() blocks until each result is ready, so rows are processed one at a time
results = [pool.apply(howmany_within_range, args=(row, 4, 8)) for row in data]

# Step 3: Don't forget to close
pool.close()    

print(results[:10])
# The values differ between runs because the data is not seeded

4
[2, 1, 4, 3, 2, 1, 4, 2, 2, 3]


In [15]:
# Parallel processing with Pool.apply_async() without callback function

import multiprocessing as mp
pool = mp.Pool(mp.cpu_count())

results = []

# call apply_async() without callback
result_objects = [pool.apply_async(howmany_within_range, args=(row, 4, 8)) for row in data]

# result_objects is a list of pool.ApplyResult objects
results = [r.get() for r in result_objects]

pool.close()
pool.join()
print(results[:10])
# The values differ between runs because the data is not seeded

[2, 1, 4, 3, 2, 1, 4, 2, 2, 3]
