# Outline

* Intro / motivation to dive into functional programming (FP)
* Spark
* Intro to FP in Scala

# How can FP help you produce better code?

* Relieve some of the burden of flow control
* Control side effects
* Improve error handling
* Build more reliable programs
* Ease the burden of testing by using the type system to ensure correctness

# Example: calculate the histogram / PDF (relative frequency of each element)

In [7]:
from typing import Sequence, Any, Tuple
def pdf(xs: Sequence[Any]) -> Sequence[Tuple[Any, float]]:
    # TODO: implement
    raise NotImplementedError("!")

# Imperative style

In [8]:
from collections import defaultdict
def pdf(xs: Sequence[Any]) -> Sequence[Tuple[Any, float]]:
    """Return a list of (element, probability) tuples."""
    freq = defaultdict(int)  # missing keys start at 0
    total = 0
    # Count occurrences of each element and the overall number of elements
    for x in xs:
        freq[x] += 1
        total += 1
    # Normalize each count in place to a relative frequency
    for k in freq:
        freq[k] /= total
    return list(freq.items())

xs = ['man', 'woman', 'person', 'camera', 'tv']
print(pdf(xs))

[('man', 0.2), ('woman', 0.2), ('person', 0.2), ('camera', 0.2), ('tv', 0.2)]


# Functional primitives
map & reduce

![map](https://d33wubrfki0l68.cloudfront.net/f0494d020aa517ae7b1011cea4c4a9f21702df8b/2577b/diagrams/functionals/map.png)

![reduce](https://d33wubrfki0l68.cloudfront.net/9c239e1227c69b7a2c9c2df234c21f3e1c74dd57/eec0e/diagrams/functionals/reduce.png)
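
The two primitives pictured above can be sketched with Python built-ins alone. This is a minimal illustration using `map` and `functools.reduce`; the variable names are made up for the example and do not come from any library used later.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: apply a function to every element independently
squares = list(map(lambda x: x * x, numbers))

# reduce: fold the elements into a single value, one pair at a time
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Because `map` treats each element on its own, the per-element work has no ordering constraints; `reduce` is where the results are combined.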

# Histogram with map / reduce

In [9]:
from functional import seq
import numpy as np
xs = np.random.choice(['a', 'b', 'c'], size=10)
print(xs)
counts = seq(xs).map(lambda x: (x, 1)).reduce_by_key(lambda x, y: x + y)
counts  

['b' 'c' 'c' 'b' 'a' 'b' 'b' 'a' 'b' 'a']


| 0 | 1 |
|---|---|
| b | 5 |
| c | 2 |
| a | 3 |


In [10]:
total = counts.map(lambda x: x[1]).sum()
result_pdf = counts.map(lambda x: (x[0], x[1] / total)).sorted().list()
result_pdf

[('a', 0.3), ('b', 0.5), ('c', 0.2)]
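
To make the `reduce_by_key` step less of a black box, here is a rough standard-library sketch of the idea: fold over `(key, value)` pairs and combine values that share a key. The helper `reduce_by_key_sketch` is a hypothetical name written for this illustration; it is not PyFunctional's implementation and makes no claim about how that library works internally.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List, Tuple

def reduce_by_key_sketch(pairs: Iterable[Tuple[Any, Any]],
                         func: Callable[[Any, Any], Any]) -> List[Tuple[Any, Any]]:
    """Combine the values of all pairs that share a key using func."""
    def step(acc: Dict[Any, Any], pair: Tuple[Any, Any]) -> Dict[Any, Any]:
        key, value = pair
        # Return a new dict rather than mutating the accumulator
        return {**acc, key: func(acc[key], value) if key in acc else value}
    return list(reduce(step, pairs, {}).items())

xs = ['b', 'c', 'c', 'b', 'a', 'b', 'b', 'a', 'b', 'a']
pairs = map(lambda x: (x, 1), xs)
print(reduce_by_key_sketch(pairs, lambda x, y: x + y))
# [('b', 5), ('c', 2), ('a', 3)]
```

Copying the dict on every step keeps `step` free of side effects at the cost of extra allocations; a production version would likely trade that purity for speed.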

# Putting it all together: PDF functional style

In [11]:
def pdf(xs: Sequence[Any]) -> Sequence[Tuple[Any, float]]:
    counts = seq(xs).map(lambda x: (x, 1)).reduce_by_key(lambda x, y: x + y)
    total = counts.map(lambda x: x[1]).sum()
    result_pdf = counts.map(lambda x: (x[0], x[1] / total)).sorted().list()
    return result_pdf

In [12]:
print(pdf(xs))

[('a', 0.3), ('b', 0.5), ('c', 0.2)]


# Comparison

## Imperative

```
    freq = defaultdict(lambda: 0)
    total = 0
    for x in xs:
        freq[x] += 1
        total += 1
    for k,v in freq.items():
        freq[k] /= total
    return list(freq.items())  
```
## Functional
```
    counts = seq(xs).map(lambda x: (x, 1)).reduce_by_key(lambda x, y: x + y)
    total = counts.map(lambda x: x[1]).sum()
    result_pdf = counts.map(lambda x: (x[0], x[1] / total)).sorted().list()
    return result_pdf
```

Advantages of the functional version:

* Fewer side effects
* Less mutable state, more immutability
* Referential transparency through pure functions
* Open to scaling and parallelization
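
The parallelization point can be illustrated with a small sketch. It assumes the input is already split into chunks and swaps in `collections.Counter` for the counting step; `count_chunk`, `merge`, and `pdf_chunked` are hypothetical names for this example. Since each chunk is counted by a pure function and the merge is associative, the `map` step could in principle be handed to separate workers, although this sketch runs sequentially.

```python
from collections import Counter
from functools import reduce
from typing import Any, List, Sequence, Tuple

def count_chunk(chunk: Sequence[Any]) -> Counter:
    """Pure function: the result depends only on the chunk passed in."""
    return Counter(chunk)

def merge(a: Counter, b: Counter) -> Counter:
    """Associative merge that returns a new Counter and mutates neither input."""
    return a + b

def pdf_chunked(chunks: Sequence[Sequence[Any]]) -> List[Tuple[Any, float]]:
    partial_counts = map(count_chunk, chunks)           # independent per chunk
    counts = reduce(merge, partial_counts, Counter())   # combine partial results
    total = sum(counts.values())
    return sorted((k, v / total) for k, v in counts.items())

chunks = [['b', 'c', 'c', 'b', 'a'], ['b', 'b', 'a', 'b', 'a']]
print(pdf_chunked(chunks))  # [('a', 0.3), ('b', 0.5), ('c', 0.2)]
```

How the data is split does not change the result, which is the property that frameworks such as Spark rely on when distributing this kind of computation.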
