# Functional Programming for Data Science
---
**Neal Ó Riain**

PyData Dublin, 30$^\mathsf{th}$ August 2018

# `$ whoami`
---

<img src="imgs/me.jpg" width="35%" align="right"> 
 
 * Former Astrophysicist (🔭, 🚀, 🌝)
 
 
<br> 
 
 
 * Current Data Scientist at Amazon.
 
 
 <br> 
 
 
 * ((Semi-) Pragmatic) Functional Programmer.

# Outline

* What is FP and why would I use it?
 
 
<br> 
 
* Some FP primitives in Python

<br> 
 
* Example



<center>
<H1> What is Functional Programming? </H1>
</center>

<center>
<H1> What is a <em>Function</em>? </H1>
</center>

```C
#include <stdio.h>

main()
{
        printf("hello, world\n");
}
```

<center>
<H1>Structured Programming</H1>
</center>

# Structured Data

![combine](imgs/data-structures.png)

# Structured Code

```python
def problem():
    s1 = sub_problem1()
    s2 = sub_problem2()
    return s1 + s2

def sub_problem1():
    ...  # body elided on the slide

def sub_problem2():
    ...

def sub_sub_problem1():
    ...
```

<center>
<H1>Modularity!</H1>
<img src="imgs/lego.gif" height="500px" width="500px">
</center>


<center>
<H1>Modularity!</H1>
</center>

<center>
Re-usable
</center>

<center>
Easier to code
</center>

<center>
Debug-able
</center>

```python
def problem():
    s1 = sub_problem1()
    s2 = sub_problem2()
    return s1 + s2

def sub_problem1():
    ...  # body elided on the slide

def sub_problem2():
    ...

def sub_sub_problem1():
    ...
```

<center>
<img src="imgs/glue.jpg" width="500px">
</center>

# Glue

* Purity

<br>

* Static Analysis

<br>

* Laziness

<br>

* Higher Order Functions


<center>
<h2>FP in Python</h2>
</center>

<center>
<h2>Purity</h2>
</center>

In [1]:
flag = True

def not_pure(x):
    if flag:
        return x / 10
    return x * 10

In [2]:
def not_pure_either(x):
    with open('test.txt') as f:
        print(x, f.read())
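
One way to make the two functions above pure is to pass everything they depend on as arguments and to return a value instead of printing. The sketch below assumes that is acceptable for the caller; the names `scale_by_flag` and `format_with_text` are illustrative and do not come from the talk.

```python
def scale_by_flag(x: float, flag: bool) -> float:
    # The result depends only on the arguments, never on a global.
    if flag:
        return x / 10
    return x * 10


def format_with_text(x: object, text: str) -> str:
    # No file access and no printing: the caller does the I/O,
    # hands the text in, and receives the formatted string back.
    return f"{x} {text}"


print(scale_by_flag(50, True))
print(format_with_text(1, "from a file"))
```

The file still has to be read somewhere, but that side effect now lives at the edge of the program rather than inside the function.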

In [3]:
data = [1, 2, 3, 4]
n = 5

def scale(lst):
    for i in range(len(lst)):
        lst[i] = lst[i] * n
    return lst

result = scale(data)

In [4]:
print(result)
print(data)

[5, 10, 15, 20]
[5, 10, 15, 20]


In [5]:
data = [1, 2, 3, 4]

def scale(lst, s=5):
    return [s * v for v in lst]

result = scale(data)
print(result)
print(data)

[5, 10, 15, 20]
[1, 2, 3, 4]


<center>
<H1>Static Analysis</H1>
</center>

In [6]:
data = [1, 2, 3, 4]

def scale(lst, s=5):
    return [s * v for v in lst]

result = scale(data)
print(result)
print(data)

[5, 10, 15, 20]
[1, 2, 3, 4]


In [7]:
scale(['1', '2', '3', '4'])

['11111', '22222', '33333', '44444']

In [8]:
from typing import List

Vector = List[int]

def scale(lst: Vector, s: int=5) -> Vector:
    return [s * v for v in lst]


In [9]:
scale(['1', '2', '3', '4'])

['11111', '22222', '33333', '44444']

![mypy](imgs/mypy.png)
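
Type hints are not enforced by the interpreter, which is why the annotated `scale` still accepts a list of strings; a checker such as mypy has to be run over the source separately. A minimal sketch showing that the annotations are only metadata (`scale` is repeated here so the block is self-contained):

```python
from typing import List

Vector = List[int]


def scale(lst: Vector, s: int = 5) -> Vector:
    return [s * v for v in lst]


# The hints are stored on the function object...
print(scale.__annotations__)

# ...but nothing checks them at runtime, so this call still succeeds.
# A static checker would be expected to flag the argument type.
print(scale(['1', '2']))
```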

<center>
<h2>Laziness</h2>
</center>

```python
for char in 'python':
    
for value in [1, 2, 3, 4]:

for key in {'A': 1, 'B': 2}:
```
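
Each of the loops above relies on the iterator protocol: `iter()` asks the object for an iterator and `next()` pulls one value at a time. A rough sketch of what a `for` loop does under the hood (the variable names are illustrative):

```python
letters = iter('python')   # ask the string for an iterator
first = next(letters)      # 'p'
second = next(letters)     # 'y'

# A for loop is roughly this: call next() until StopIteration is raised.
rest = []
while True:
    try:
        rest.append(next(letters))
    except StopIteration:
        break

print(first, second, rest)
```

Values are produced only when `next()` is called, which is what makes lazy evaluation possible.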


<center>
$$g(f(x))$$
</center>

In [10]:
from typing import Iterator

def numbers(x: int = 0) -> Iterator[int]:
    while True:
        yield x
        x += 1
        
n = numbers()

print(next(n))
print(next(n))
print(next(n))
print(next(n))

0
1
2
3


In [11]:
from itertools import takewhile

def predicate(x: int) -> bool:
    p = x**2 + 10 * x + 50
    return p < 1000

odd_nums = (x for x in numbers() if x % 2)
nums_lt = takewhile(predicate, odd_nums)
sum(nums_lt)

169
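
An infinite generator can only be consumed lazily. Besides `takewhile`, the standard library's `itertools.islice` takes a fixed number of items. A small sketch, using `itertools.count` as a stand-in for the `numbers()` generator above:

```python
from itertools import count, islice

# Infinite generator expression: nothing is computed yet.
squares = (x * x for x in count())

# Only the first five values are ever evaluated here.
first_five = list(islice(squares, 5))
print(first_five)

# The generator remembers where it stopped.
following = next(squares)
print(following)
```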

<center>
    <H1> Higher Order Functions</H1>
</center>

In [12]:
names = ['alice', 'bob', 'eve'] 

capitalised = []
for name in names:
    capitalised.append(str.capitalize(name))

print(capitalised)

['Alice', 'Bob', 'Eve']


```python
data = [values] 

output = []
for value in data:
        output.append(function(value))
```

<center>
<pre>loop_and_append(function, data)</pre>
</center>
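
The `loop_and_append` pattern above can be written out as an ordinary function, which is essentially what `map` provides (apart from `map` being lazy). A minimal sketch:

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar('A')
B = TypeVar('B')


def loop_and_append(function: Callable[[A], B], data: Iterable[A]) -> List[B]:
    # Apply `function` to every value and collect the results in order.
    output = []
    for value in data:
        output.append(function(value))
    return output


names = ['alice', 'bob', 'eve']
print(loop_and_append(str.capitalize, names))
```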

In [13]:

list(map(str.capitalize, ['alice', 'bob', 'eve']))


['Alice', 'Bob', 'Eve']

# Filter

<br>

```python
data = [values] 

output = []
for value in data:
    if predicate(value):
        output.append(value)
```

In [14]:

list(filter(lambda x: x > 10, [2, 57, 41, 5, 92, 84, 2.3]))

[57, 41, 92, 84]
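
The same exercise works for `filter`: the loop with a predicate becomes a function that takes the predicate as an argument. A sketch with the hypothetical name `loop_and_keep`:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar('T')


def loop_and_keep(predicate: Callable[[T], bool], data: Iterable[T]) -> List[T]:
    # Keep only the values for which the predicate is true.
    output = []
    for value in data:
        if predicate(value):
            output.append(value)
    return output


values = [2, 57, 41, 5, 92, 84, 2.3]
print(loop_and_keep(lambda x: x > 10, values))
```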

# Reduce


<center>
$g(f,\; [x_1, x_2, x_3],\;i) \rightarrow f(f(f(i,\;x_1),\;x_2),\;x_3)$
</center>

In [15]:
from functools import reduce
from operator import add

reduce(add, [1, 2, 3, 4], 0) #sum

10

In [16]:
from functools import reduce
from operator import mul

reduce(mul, [1, 2, 3, 4], 1) #factorial

24

In [17]:
from functools import reduce
from operator import mul, add

reduce(add, map(mul, [1, 2, 3, 4], [2, 3, 4, 5])) #dot product

40
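
`functools.reduce` folds from the left: it starts with the initial value and combines it with each element in turn. The reductions above can be reproduced with a hand-written fold; `fold_left` is an illustrative name, not a library function.

```python
from operator import add, mul, sub


def fold_left(function, data, initial):
    # Carry an accumulator through the data, left to right.
    acc = initial
    for value in data:
        acc = function(acc, value)
    return acc


total = fold_left(add, [1, 2, 3, 4], 0)
product = fold_left(mul, [1, 2, 3, 4], 1)
dot = fold_left(add, map(mul, [1, 2, 3, 4], [2, 3, 4, 5]), 0)

# Subtraction is not associative, so it exposes the direction
# of the fold: ((0 - 1) - 2) - 3
left_to_right = fold_left(sub, [1, 2, 3], 0)

print(total, product, dot, left_to_right)
```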

# Currying

<center>
$f(x, y, z) \rightarrow f(x)(y)(z)$
</center>

In [18]:
from toolz import curry

def add_and_scale(x: int, y: int, z: int) -> int:
    return (x + y) * z

add_and_scale = curry(add_and_scale)

add_and_scale(10)(20)(2)

60
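
Without a library, currying amounts to nesting one-argument functions. The sketch below writes the curried form by hand; unlike `toolz.curry`, it only accepts one argument per call. The names are illustrative.

```python
def add_and_scale_curried(x):
    # Each call captures one argument and returns a function
    # that waits for the next one.
    def take_y(y):
        def take_z(z):
            return (x + y) * z
        return take_z
    return take_y


add_10 = add_and_scale_curried(10)      # x is fixed
add_10_and_20 = add_10(20)              # x and y are fixed
result = add_10_and_20(2)               # (10 + 20) * 2
print(result)
```

Partially applied functions such as `add_10` are what make curried functions convenient building blocks for `map` and composition.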

# Composition

<center>
$g(f(x)) \rightarrow (g \circ f)(x)$
</center>

In [19]:
from toolz import compose, curry

def add(x: int, y: int) -> int:
    return (x + y)

def scale(x: int, y: int) -> int:
    return x * y

scale2 = curry(scale)(2)

add_and_scale2 = compose(scale2, add)

add_and_scale2(10, 20)

60
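
Composition itself is a small higher order function. A standard-library sketch, assuming the same right-to-left order as `toolz.compose` (the names `compose2` and `compose_all` are illustrative):

```python
from functools import reduce


def compose2(g, f):
    # (g . f)(x) = g(f(x)); the inner function may take several arguments.
    return lambda *args, **kwargs: g(f(*args, **kwargs))


def compose_all(*functions):
    # Fold compose2 over the functions; the rightmost one runs first.
    return reduce(compose2, functions)


add_then_double = compose_all(lambda x: x * 2, lambda x, y: x + y)
print(add_then_double(10, 20))

clean = compose_all(str.upper, str.strip)
print(clean('  hi  '))
```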

<center>
<H1>Example</H1>
</center>

In [20]:
from glob import glob
import random 

In [21]:
random.sample(glob('lyrics/billboard/*'), 20)

['lyrics/billboard/send_one_your_love.txt',
 'lyrics/billboard/take_me_to_heart.txt',
 'lyrics/billboard/the_heat_is_on.txt',
 'lyrics/billboard/drops_of_jupiter_tell_me.txt',
 'lyrics/billboard/bubbly.txt',
 'lyrics/billboard/new_flame.txt',
 'lyrics/billboard/nothing_compares_2_u.txt',
 'lyrics/billboard/players_anthem.txt',
 'lyrics/billboard/handy_man.txt',
 'lyrics/billboard/another_one_bites_the_dust.txt',
 'lyrics/billboard/i_hate_this_part.txt',
 'lyrics/billboard/rainy_dayz.txt',
 'lyrics/billboard/wake_me_up_when_september_ends.txt',
 'lyrics/billboard/my_band.txt',
 'lyrics/billboard/before_you_walk_out_of_my_life__like_this_and_like_that.txt',
 'lyrics/billboard/not_over_you.txt',
 'lyrics/billboard/then_came_you.txt',
 'lyrics/billboard/wannabe.txt',
 'lyrics/billboard/im_a_fool.txt',
 'lyrics/billboard/sensitivity.txt']

In [22]:
from string import punctuation
from collections import defaultdict
punc = str.maketrans({p:None for p in punctuation})

In [23]:
def wordcount_imp(directory):

    d = defaultdict(int)
    for f in glob(directory):
        # Close each file once its lines have been counted.
        with open(f, 'r') as lines:
            for line in lines:
                words = [w.lower().translate(punc) for w in line.split()]
                for w in words:
                    d[w] += 1

    return {k:d[k] for k in d.keys() if len(k) >= 4}
    
words = wordcount_imp('lyrics/billboard/*')
sorted(words.items(), key=lambda x: x[1], reverse=True)[:10]

[('that', 14231),
 ('your', 13709),
 ('love', 13204),
 ('dont', 10315),
 ('know', 9993),
 ('like', 9597),
 ('just', 8528),
 ('with', 8173),
 ('baby', 7913),
 ('what', 7140)]

In [24]:
from toolz.curried import mapcat, frequencies, keyfilter, map

def stem(word: str) -> str:
    return word.lower().translate(punc)

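def keep_word(word: str) -> bool:
    # keyfilter keeps the keys for which this returns True.
    return len(word) >= 4

def freqs(items: List) -> dict:
    d = defaultdict(int)
    for i in items:
        d[i] += 1
    return d
    
# Steps in the order they run: paths -> lines -> words -> counts.
workflow = (glob,
            mapcat(open),
            mapcat(str.split),
            map(stem),
            freqs,
            keyfilter(keep_word))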

wordcount_f = compose(*reversed(workflow))
words = wordcount_f('lyrics/billboard/*')
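
The same pipeline shape can be tried without `toolz` or the lyrics files. The sketch below is an assumption-laden stand-in: it uses only the standard library, a hypothetical `pipeline` helper that applies steps left to right, `collections.Counter` in place of `freqs`, and two made-up lines of text instead of the corpus.

```python
from collections import Counter
from functools import reduce
from string import punctuation

punc = str.maketrans({p: None for p in punctuation})


def pipeline(*steps):
    # Feed the data through each step in turn, left to right.
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


wordcount = pipeline(
    lambda lines: (w for line in lines for w in line.split()),    # lines -> words
    lambda words: (w.lower().translate(punc) for w in words),     # normalise
    Counter,                                                      # count
    lambda counts: {k: v for k, v in counts.items() if len(k) >= 4},
)

lines = ["Love, love me do", "You know I love you"]
counts = wordcount(lines)
print(counts)
```

Each step is a plain function of one argument, so steps can be tested, reordered, or replaced independently.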

In [25]:
from toolz.dicttoolz import merge_with, valmap

billboard = wordcount_f('lyrics/billboard/*')
dylan = wordcount_f('lyrics/dylan/*')

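# Compute each total once rather than once per word.
dylan_total = sum(dylan.values())
billboard_total = sum(billboard.values())

# Relative frequency in Dylan minus relative frequency in Billboard.
m = merge_with(sum, valmap(lambda x: x / dylan_total, dylan),
                    valmap(lambda x: -x / billboard_total, billboard))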
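
The quantity built above is the difference in relative word frequency between the two corpora. A standard-library sketch on toy counts (the dictionaries are made-up data and `relative` and `difference` are illustrative names):

```python
def relative(counts):
    # Convert raw counts to fractions of the corpus total.
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def difference(a, b):
    # Positive scores lean towards corpus a, negative towards corpus b.
    ra, rb = relative(a), relative(b)
    return {w: ra.get(w, 0.0) - rb.get(w, 0.0) for w in ra.keys() | rb.keys()}


dylan_toy = {'wind': 3, 'love': 1}
billboard_toy = {'love': 3, 'baby': 1}

scores = difference(dylan_toy, billboard_toy)
ranked = sorted(scores, key=scores.get)
print(ranked)
```

Sorting by score puts the words most characteristic of the second corpus first and those most characteristic of the first corpus last, which is how the two word lists in the talk are selected.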

In [26]:
def col_print(l, cols=5, width=12):

    group = zip(*[l[i::cols] for i in range(cols)])
    for row in group:
        print(''.join(word.ljust(width) for word in row))

In [27]:
print('\nDylan:')
col_print(sorted(m, key=m.get)[-40:])
print('\nBillboard:')
col_print(sorted(m, key=m.get)[:40])


Dylan:
hand        more        gone        many        woman       
blues       poor        wind        looked      them        
long        train       river       broken      home        
behind      hard        says        might       mama        
people      lonesome    theres      town        door        
will        must        dead        road        their       
went        been        said        lord        where       
there       from        down        they        well        

Billboard:
love        yeah        baby        know        want        
cause       dont        like        girl        wanna       
make        this        what        need        right       
your        take        feel        youre       just        
give        life        cant        gotta       lets        
thats       dance       really      keep        stop        
real        never       time        good        think       
heart       show        shit        wont        hold        


![contact](imgs/contact-card.png)