# Functional programming - Parallel Processing

Based on the Real Python course "[Functional Programming in Python](https://realpython.com/courses/functional-programming-python/)".

While Python is not a dedicated functional programming language, we can apply some of its fundamentals to make sure our functions have no side effects:
1. Start with a solid data structure: use immutable data types
2. ...

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Use-immutable-Data-Structures:-(Named)-Tuples" data-toc-modified-id="Use-immutable-Data-Structures:-(Named)-Tuples-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Use immutable Data Structures: (Named) Tuples</a></span></li><li><span><a href="#Functional-Programming-Primitives" data-toc-modified-id="Functional-Programming-Primitives-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Functional Programming Primitives</a></span><ul class="toc-item"><li><span><a href="#filter()" data-toc-modified-id="filter()-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>filter()</a></span></li><li><span><a href="#map()" data-toc-modified-id="map()-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>map()</a></span></li><li><span><a href="#reduce()" data-toc-modified-id="reduce()-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>reduce()</a></span></li></ul></li><li><span><a href="#Parallel-Processing" data-toc-modified-id="Parallel-Processing-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Parallel Processing</a></span><ul class="toc-item"><li><span><a href="#multiprocessing" data-toc-modified-id="multiprocessing-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>multiprocessing</a></span></li></ul></li></ul></div>

In [1]:
import collections
import sys
from pprint import pprint

In [4]:
print(sys.executable)
print(sys.version)

C:\Users\r2d4\miniconda3\envs\py3\python.exe
3.8.3 (default, May 19 2020, 06:50:17) [MSC v.1916 64 bit (AMD64)]


In [6]:
from csv import reader
with open('lps_2020-09-25.csv',mode='r') as infile:
    d = dict(reader(infile, dialect="excel"))

ValueError: dictionary update sequence element #0 has length 1; 2 is required

## Use immutable Data Structures: (Named) Tuples

You can work with (named) tuples instead of dictionaries or dataframes.

- immutable
- make sure the keys are consistent for all instances



In [2]:
# We create a new class `Records` using the collections namedtuple
Records = collections.namedtuple("Records", [
    "quantity",
    "artist",
    "album",
    "genre",
    "preis",
    "monat",
])

print(Records)

<class '__main__.Records'>


In [44]:
# Create new instances
rec_1 = Records(quantity=1, 
                 artist="Year Of The Knife",
                 album="Ultimate Aggression",
                 genre="Hardcore",
                 preis=30.0,
                 monat="Oct 19"
                )

rec_2 = Records(quantity=1, 
                 artist="Undeath",
                 album="Lesions Of A Different Kind",
                 genre="Death Metal",
                 preis=20.0,
                 monat="Oct 19"
                )

In [4]:
print(rec_1.artist)
print(rec_2.preis)

Year Of The Knife
20.0


**Note**: If we collected our instances in a list of Records, the individual instances would be immutable, but the list itself would be mutable (e.g. we could delete an instance with `del records[0]` or append a new one). _Mixing mutable and immutable data structures is dangerous!_

In [46]:
# Make a tuple of Records 
records = (rec_1, rec_2)

pprint(records)

(Records(quantity=1, artist='Year Of The Knife', album='Ultimate Aggression', genre='Hardcore', preis=30.0, monat='Oct 19'),
 Records(quantity=1, artist='Undeath', album='Lesions Of A Different Kind', genre='Death Metal', preis=20.0, monat='Oct 19'))
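
To make the immutability claim concrete, here is a minimal sketch. It uses a hypothetical two-field `Rec` namedtuple as a stand-in for `Records`, so the names are illustrative only. It shows that field assignment and item deletion are rejected, and that `_replace()` returns a new instance instead of changing the original.

```python
import collections

# Hypothetical stand-in for the Records class above, reduced to two fields
Rec = collections.namedtuple("Rec", ["album", "preis"])

rec = Rec(album="Ultimate Aggression", preis=30.0)

# Assigning to a field of a namedtuple instance raises AttributeError
try:
    rec.preis = 25.0
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance and leaves the original untouched
discounted = rec._replace(preis=25.0)

# A tuple of records rejects item deletion, unlike a list
recs = (rec, discounted)
try:
    del recs[0]
    deleted = True
except TypeError:
    deleted = False

print(mutated, deleted, rec.preis, discounted.preis)
```

If a changed record is needed, `_replace()` is probably the closest thing to an "update" that still fits the no-side-effects mindset.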


## Functional Programming Primitives
### filter()

The built-in filter() takes a function (or None) and an iterable and returns an iterator (a filter object) that yields those items of the iterable for which function(item) is true. If the function is None, it yields the items that are themselves true.

In [7]:
# filter() returns a filter object
filter(lambda x: x.monat == "Oct 19", records)

<filter at 0x211faa78be0>

In [8]:
# Iterator basics: How to step through the items one at a time with next()
oct_recs = filter(lambda x: x.monat == "Oct 19", records)
next(oct_recs)

Records(quantity=1, artist='Year Of The Knife', album='Ultimate Aggression', genre='Hardcore', preis=30.0, monat='Oct 19')

In [9]:
# Iterator basics: How to get them all at once, wrap into list() or tuple() ...
tuple(filter(lambda x: x.monat == "Oct 19", records))

(Records(quantity=1, artist='Year Of The Knife', album='Ultimate Aggression', genre='Hardcore', preis=30.0, monat='Oct 19'),
 Records(quantity=1, artist='Undeath', album='Lesions Of A Different Kind', genre='Death Metal', preis=20.0, monat='Oct 19'))

Note: Compared to a classic for loop
- we avoid side effects such as printing or calling other functions inside the loop body
- the code is very declarative; there is no need to spell out the loop
- we can easily chain this code (actually it is already a chain made of highly declarative, simple, reusable building blocks)

Even better would be to write:
    
```
def oct_filter(x):
    return x.monat == "Oct 19"

tuple(filter(oct_filter, records))
```

But take care: in everyday usage, list comprehensions (or even better, generator expressions) are certainly the more pythonic way to write code. Functional programming should be applied for a reason, e.g. parallelization.
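
For comparison, the sketch below puts `filter()` next to the equivalent generator expression. The `Rec` namedtuple and the third entry are made up for illustration and are not part of the original data.

```python
import collections

# Hypothetical two-field stand-in for the Records class
Rec = collections.namedtuple("Rec", ["album", "monat"])

recs = (
    Rec(album="Ultimate Aggression", monat="Oct 19"),
    Rec(album="Lesions Of A Different Kind", monat="Oct 19"),
    Rec(album="Some Other Album", monat="Nov 19"),  # made-up entry for illustration
)

# Functional style: filter() with a predicate function
via_filter = tuple(filter(lambda r: r.monat == "Oct 19", recs))

# Comprehension style: a generator expression wrapped in tuple()
via_genexp = tuple(r for r in recs if r.monat == "Oct 19")

print(via_filter == via_genexp)
```

Both variants are lazy until `tuple()` consumes them, so the choice is mostly one of readability.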

### map()

The built-in map() takes a function and one or more iterables and returns an iterator that computes the function using arguments from each of the passed iterables. (It stops when the shortest iterable is exhausted.) In other words, it maps a function onto each of the original items.
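
A small sketch of the multi-iterable case, with made-up numbers: `map()` pulls one argument from each iterable per call and stops as soon as the shortest one runs out.

```python
from operator import mul

quantities = [1, 2, 3]
prices = [30.0, 20.0]  # one element shorter on purpose

# mul(1, 30.0), mul(2, 20.0); the third quantity has no partner and is dropped
totals = list(map(mul, quantities, prices))

print(totals)  # [30.0, 40.0]
```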

In [25]:
# Example: Return new collection of named tuples with doubled prices

Doubled = collections.namedtuple("Doubled", ["album", "d_price"])

doubled = tuple(map(
    lambda x: Doubled(album = x.album.upper(), d_price = x.preis * 2),
    records
))

doubled

(Doubled(album='ULTIMATE AGGRESSION', d_price=60.0),
 Doubled(album='LESIONS OF A DIFFERENT KIND', d_price=40.0))

Again: The comprehension code is of course much more pythonic and easier to read than using map():

```
tuple(Doubled(album = x.album.upper(), d_price = x.preis * 2) for x in records)
```

But to get to the point / mindset of functional programming:
When we first apply the filter function and then the map function, we have a clearly spelled-out series of steps that we can chain together. Also note that we have transformed the 'album' entries without touching the original (immutable) data.
We could reuse all these single building blocks and data steps and have no side effects whatsoever.
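
The chaining idea could be sketched as follows. `Rec`, `is_hardcore` and `shout_album` are hypothetical names chosen for this example. Each step is a small named function, and the source tuple stays untouched.

```python
import collections

# Hypothetical three-field stand-in for the Records class
Rec = collections.namedtuple("Rec", ["album", "genre", "preis"])

recs = (
    Rec("Ultimate Aggression", "Hardcore", 30.0),
    Rec("Lesions Of A Different Kind", "Death Metal", 20.0),
)

def is_hardcore(rec):
    """Predicate for the filter step."""
    return rec.genre == "Hardcore"

def shout_album(rec):
    """Transformation for the map step; returns a new record."""
    return rec._replace(album=rec.album.upper())

# Step 1: filter, step 2: map; nothing runs until tuple() consumes the chain
hardcore_shouted = tuple(map(shout_album, filter(is_hardcore, recs)))

print(hardcore_shouted)
print(recs[0].album)  # the original record is unchanged
```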

### reduce()

Not a built-in! Has to be imported

```
from functools import reduce
```

reduce() applies a function of two arguments cumulatively to the items of a sequence (the values), so as to reduce the sequence to a single value (the accumulator).

Example:
```
reduce(lambda x, y: x + y, [1, 2, 3])
```
calculates ((1+2)+3)
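
To see the intermediate accumulator values, `itertools.accumulate()` can be used as a stand-in, since it yields every partial result instead of only the last one. The sketch below also shows the optional initial value; the numbers are chosen only for illustration.

```python
from functools import reduce
from itertools import accumulate
from operator import add

numbers = [1, 2, 3]

# accumulate() exposes each step of the fold: 1, (1+2), ((1+2)+3)
steps = list(accumulate(numbers, add))

# reduce() returns only the final accumulator
total = reduce(add, numbers)

# With an initial value the fold starts from it: (((10+1)+2)+3)
total_from_ten = reduce(add, numbers, 10)

# The initial value also makes reduce() safe on an empty sequence
empty_total = reduce(add, [], 0)

print(steps, total, total_from_ten, empty_total)
```

Without an initial value, `reduce()` raises a `TypeError` on an empty sequence, which is one reason to pass one.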


(Note: For simplicity's sake, the dict-based examples below use (mutable) dicts as accumulators rather than immutable data structures.)

In [36]:
from functools import reduce

# Very simple example: Calculate the total price
reduce(lambda total_preis, rec: total_preis + rec.preis, records, 0)  # The final arg is the optional initial value

50.0

In [47]:
# Better use case: populate a dict with the albums grouped by genre
genre_dict = {"Hardcore": [], "Death Metal": [], "Crossover": []}

def reducer(acc, val):
    acc[val.genre].append(val.album)
    return acc

records_by_genre = reduce(
    reducer,
    records,
    genre_dict
)

records_by_genre

{'Hardcore': ['Ultimate Aggression'],
 'Death Metal': ['Lesions Of A Different Kind'],
 'Crossover': []}

In [52]:
# On a side note: A safer way would be to generate the genre_dict directly from the entries

from collections import defaultdict

records_by_genre = reduce(
    reducer,
    records,
    defaultdict(list)
)

dict(records_by_genre)

{'Hardcore': ['Ultimate Aggression'],
 'Death Metal': ['Lesions Of A Different Kind']}
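
One caveat: the `reducer` above appends to the accumulator in place, so the dict passed as initial value is modified as a side effect. A minimal sketch of a side-effect-free alternative follows. `Rec` and `pure_reducer` are hypothetical names, and building a new dict per step trades some speed for purity.

```python
import collections
from functools import reduce

# Hypothetical two-field stand-in for the Records class
Rec = collections.namedtuple("Rec", ["album", "genre"])

recs = (
    Rec("Ultimate Aggression", "Hardcore"),
    Rec("Lesions Of A Different Kind", "Death Metal"),
)

def pure_reducer(acc, rec):
    """Return a new dict instead of mutating the accumulator."""
    # Tuples as values keep the grouped albums immutable as well
    albums = acc.get(rec.genre, ()) + (rec.album,)
    return {**acc, rec.genre: albums}

initial = {}
by_genre = reduce(pure_reducer, recs, initial)

print(by_genre)
print(initial)  # still empty: the initial value was not touched
```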

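Again, a more pythonic way _could_ be the following, but now things are not so clear anymore. Note that `itertools.groupby()` only groups consecutive items, so the records have to be sorted by genre first:

```
import itertools

def by_genre(rec):
    return rec.genre

# Sort first: groupby() starts a new group whenever the key changes
records_by_genre = {
    genre: [rec.album for rec in group]
    for genre, group in itertools.groupby(sorted(records, key=by_genre), key=by_genre)
}
```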

## Parallel Processing
### multiprocessing

...