# Introduction to Functional Programming in Python

## WHAT is Functional Programming
Of the many programming paradigms, two are covered here: **Imperative Programming** and **Functional Programming**.

### Imperative Programming

> A language is imperative because each statement is a command, which changes the state in some way.$^{[1]}$

Examples: Python, Java, etc.

### Functional Programming

> In a functional language, we replace the state—the changing values of variables—with a simpler notion of evaluating functions

**Note: Python is a hybrid language; it supports both imperative and functional programming.**
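To make the contrast concrete, here is a minimal sketch that computes the same sum of squares in both styles. The function names are illustrative only and do not come from any library.

```python
def sum_of_squares_imperative(numbers):
    # Imperative style: a running total is mutated step by step.
    total = 0
    for n in numbers:
        total += n * n
    return total


def sum_of_squares_functional(numbers):
    # Functional style: the result is described as a composition of functions,
    # with no variable that changes value along the way.
    return sum(map(lambda n: n * n, numbers))


values = [1, 2, 3, 4]
print(sum_of_squares_imperative(values))  # 30
print(sum_of_squares_functional(values))  # 30
```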

## WHY use Functional Programming in Data Analysis

1. Close to **mathematical formalisms**.
2. Often more **expressive** and, thanks to lazy evaluation, more memory-**efficient**.
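As a small illustration of the first point, set-builder notation such as $\{x^2 \mid x \in \{0, \dots, 9\},\ x \text{ even}\}$ translates almost symbol for symbol into a set comprehension. A minimal sketch (the variable name is illustrative):

```python
# Read as: "the set of x squared, for x in 0..9, such that x is even".
squares_of_evens = {x ** 2 for x in range(10) if x % 2 == 0}

print(sorted(squares_of_evens))  # [0, 4, 16, 36, 64]
```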

## WHEN & WHERE to use Functional Programming (esp. for Data Analysis)

## HOW



### List, Set, and Dict Comprehensions (with `zip`)

In [4]:
import numpy as np

def f():
    yield np.random.randint(4)
    
def f1():
    return np.random.randint(4)

In [18]:
## calling a generator function does not run its body or return the value;
## it returns a lazy generator object, which is more memory efficient
f()

<generator object f at 0x7ffaa44d5840>

In [14]:
## its values are only produced when we "force" it,
## for example with a list comprehension
[
    i for i in f()
]

[3]
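The generator above yields a single value, so the laziness is easy to miss. The sketch below uses a hypothetical `countdown` generator with several values to show that items are computed one at a time and that a generator can only be consumed once.

```python
from itertools import islice


def countdown(start):
    # Hypothetical helper: yields start, start - 1, ..., 1, one value at a time.
    while start > 0:
        yield start
        start -= 1


gen = countdown(5)
print(next(gen))             # 5  (only the first value has been computed)
print(list(islice(gen, 2)))  # [4, 3]  (take two more, still lazily)
print(list(gen))             # [2, 1]  (forcing consumes whatever is left)
print(list(gen))             # []  (the generator is now exhausted)
```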

In [None]:
## the same idea in Spark: chained map calls are lazy,
## and collect() forces the computation and retrieves the data
## rdd.map(lambda x: x+1).map(lambda x: x/2).collect() 
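The same "nothing runs until forced" behaviour can be observed in plain Python, without Spark. This is only an analogy under that assumption: chained `map` calls play the role of the transformations and `list()` plays the role of `collect()`. The `calls` list and `add_one` helper are illustrative.

```python
calls = []


def add_one(x):
    calls.append(x)  # record every actual invocation
    return x + 1


# Building the pipeline does not call add_one yet.
pipeline = map(lambda x: x / 2, map(add_one, [1, 2, 3]))
print(calls)     # []

# Forcing the pipeline runs both steps for every element.
result = list(pipeline)
print(result)    # [1.0, 1.5, 2.0]
print(calls)     # [1, 2, 3]
```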

In [6]:
## compare with this regular function, which returns the result directly
f1()

2

In [19]:
# list comprehension
a = [
    x for x in range(20, 30, 2)
]
a

[20, 22, 24, 26, 28]

In [20]:
# set comprehension
b = {x for x in range(20, 30)}
b

{20, 21, 22, 23, 24, 25, 26, 27, 28, 29}

In [26]:
## zip accepts iterables of different lengths;
## it stops as soon as the shortest one is exhausted

c = [1, 2, 3, 4]
d = ['a', 'b', 'c']
e = {
    k: v for k, v in zip(c, d)
}

e

{1: 'a', 2: 'b', 3: 'c'}

In [24]:
list(zip(c, d))

[(1, 'a'), (2, 'b'), (3, 'c')]
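Because `zip` truncates silently, the `4` in `c` is simply dropped. When that is not what you want, `itertools.zip_longest` pads the shorter input instead. A minimal sketch reusing the same `c` and `d`:

```python
from itertools import zip_longest

c = [1, 2, 3, 4]
d = ['a', 'b', 'c']

# zip stops at the shorter input, so the 4 is dropped.
print(dict(zip(c, d)))                         # {1: 'a', 2: 'b', 3: 'c'}

# zip_longest pads the shorter input, with None unless fillvalue is given.
print(dict(zip_longest(c, d)))                 # {1: 'a', 2: 'b', 3: 'c', 4: None}
print(dict(zip_longest(c, d, fillvalue='?')))  # {1: 'a', 2: 'b', 3: 'c', 4: '?'}
```

On Python 3.10 and later, `zip(c, d, strict=True)` is another option: it raises a `ValueError` when the lengths differ.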

### Higher-Order Functions

**Map**

Apply a function to each element of an iterable and return the transformed results.

In [28]:
some_list = [
    'a',
    'b',
    'c',
    'c',
    'b'
]

# encoding strings as numbers
# note that map is lazy: nothing is executed until the result is "forced", here by list()
list(
    map(lambda x: 1 if x == 'a' else 2 if x == 'b' else 3, some_list)
)

[1, 2, 3, 3, 2]
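A nested conditional expression gets hard to read as the number of cases grows. One alternative, sketched here with a hypothetical lookup table `codes`, is to map through a dictionary and fall back to a default value:

```python
some_list = ['a', 'b', 'c', 'c', 'b']

# Hypothetical lookup table; anything not listed falls back to 3.
codes = {'a': 1, 'b': 2}

encoded = map(lambda x: codes.get(x, 3), some_list)
print(type(encoded).__name__)  # map  (still lazy, nothing computed yet)

encoded_list = list(encoded)
print(encoded_list)            # [1, 2, 3, 3, 2]
```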

**Filter**

Keep only the elements of an iterable for which a boolean function (a predicate) returns `True`.

In [33]:
list(
    filter(
        lambda x: x == 'c',
        some_list
    )
)

['c', 'c']
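For comparison, the same selection can be written as a list comprehension with an `if` clause. The sketch below also shows `filter(None, ...)`, which keeps only the truthy items.

```python
some_list = ['a', 'b', 'c', 'c', 'b']

with_filter = list(filter(lambda x: x == 'c', some_list))
with_comprehension = [x for x in some_list if x == 'c']
print(with_filter == with_comprehension)  # True

# Passing None instead of a function keeps only the truthy items.
truthy = list(filter(None, [0, 1, '', 'a', None, 2]))
print(truthy)                             # [1, 'a', 2]
```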

**Sorted & Reversed**

`sorted`: returns a new sorted list built from the items of an iterable.

`reversed`: returns an iterator over the items of a sequence in reverse order.

In [34]:
print(sorted([3, 1, 19, 7, 4]))

[1, 3, 4, 7, 19]


In [35]:
print(list(reversed(range(10))))
print(list(reversed(['a', 'b', 'c', 'd'])))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
['d', 'c', 'b', 'a']
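`sorted` also accepts a `key` function and a `reverse` flag, which is where it becomes a higher-order function. A short sketch on an illustrative list of words:

```python
words = ['pear', 'fig', 'banana', 'kiwi']

by_alphabet = sorted(words)
by_length = sorted(words, key=len)                    # ties keep their original order
by_length_desc = sorted(words, key=len, reverse=True)

print(by_alphabet)      # ['banana', 'fig', 'kiwi', 'pear']
print(by_length)        # ['fig', 'pear', 'kiwi', 'banana']
print(by_length_desc)   # ['banana', 'pear', 'kiwi', 'fig']
print(words)            # ['pear', 'fig', 'banana', 'kiwi']  (the input is not modified)

# reversed returns a lazy iterator, so it has to be forced to see the items.
print(list(reversed(words)))  # ['kiwi', 'banana', 'fig', 'pear']
```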


## Exercise!

1. Create a list of the integers from 0 to 5372, keep only the even numbers, and replace each value that is divisible by 6 with 'TADA'.

In [52]:
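# one possible solution: filter the even numbers, then map the replacement
# (range yields plain Python ints, so no NumPy array is needed here)
list(
    map(
        lambda x: 'TADA' if (x % 6 == 0) else x,
        filter(
            lambda x: x % 2 == 0,
            range(5373)
        )
    )
)[:20]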

['TADA',
 2,
 4,
 'TADA',
 8,
 10,
 'TADA',
 14,
 16,
 'TADA',
 20,
 22,
 'TADA',
 26,
 28,
 'TADA',
 32,
 34,
 'TADA',
 38]
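The same exercise can also be written with generator expressions, so that no intermediate list is built and only the requested items are computed. This is an alternative sketch, not the only valid answer; the variable names are illustrative.

```python
from itertools import islice

# Both steps are lazy generator expressions.
evens = (x for x in range(5373) if x % 2 == 0)
labelled = ('TADA' if x % 6 == 0 else x for x in evens)

# islice pulls just the first five items through the pipeline.
first_five = list(islice(labelled, 5))
print(first_five)  # ['TADA', 2, 4, 'TADA', 8]
```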

2. Create pairs that map a random integer key (below 10) to a random float (up to 7000), sum the values for every key, then sort the result.

In [66]:
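# TODO: this partial solution sums all values at once;
# the per-key sum and the sorting step are still missing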
import numpy as np
from functools import reduce
from itertools import groupby

reduce(
    lambda a, b: a+b,
    map(
        lambda x: x[1],
        [
            (
                np.random.randint(10),
                np.random.uniform(0, 7000)  # low and high given explicitly
            ) for i in range(1000)
        ]
    )
)

## arie will give the answer later

3365888.021834589
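One way to finish the per-key sum and the sort is sketched below on a small hand-written list of pairs, so the result is easy to check by eye. The data and variable names are illustrative; with random data, only `pairs` would change.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative (key, value) pairs standing in for the random ones.
pairs = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0), (2, 1.0)]

# groupby only groups adjacent items, so sort by key first.
by_key = sorted(pairs, key=itemgetter(0))
totals = {
    key: sum(value for _, value in group)
    for key, group in groupby(by_key, key=itemgetter(0))
}
print(totals)  # {1: 12.5, 2: 6.0, 3: 7.0}

# Sort the (key, total) pairs by total, largest first.
ranked = sorted(totals.items(), key=itemgetter(1), reverse=True)
print(ranked)  # [(1, 12.5), (3, 7.0), (2, 6.0)]
```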

## Pandas!

In [67]:
import pandas as pd
from sklearn.datasets import load_iris

In [68]:
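# note: load_iris() orders its features as sepal length, sepal width,
# petal length, petal width (see load_iris()['feature_names']), so with
# the labels below the column named 'petal_length' holds the sepal length
# values, and so on
data = pd.DataFrame(
    load_iris()['data'],
    columns=[
        'petal_length',
        'petal_width',
        'sepal_length',
        'sepal_width'
    ]
)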

In [69]:
data.head()

|   | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


### Apply

Apply a function along an axis: with `axis=0` (the default) the function receives each column, and with `axis=1` it receives each row.

In [72]:
data.apply(
    lambda row: row + 3,
    axis=1
).head()

|   | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|
| 0 | 8.1 | 6.5 | 4.4 | 3.2 |
| 1 | 7.9 | 6.0 | 4.4 | 3.2 |
| 2 | 7.7 | 6.2 | 4.3 | 3.2 |
| 3 | 7.6 | 6.1 | 4.5 | 3.2 |
| 4 | 8.0 | 6.6 | 4.4 | 3.2 |
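Adding a constant gives the same result on either axis, so it does not reveal what `apply` passes to the function. The sketch below uses a tiny illustrative frame where the two axes give visibly different results.

```python
import pandas as pd

frame = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (the default): the function receives each column as a Series.
column_sums = frame.apply(lambda col: col.sum())
print(column_sums.tolist())  # [6, 60]

# axis=1: the function receives each row as a Series.
row_sums = frame.apply(lambda row: row['a'] + row['b'], axis=1)
print(row_sums.tolist())     # [11, 22, 33]
```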


### Applymap

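Apply a function to each element of the DataFrame, that is, **per cell**. In pandas 2.1 and later, `applymap` is deprecated in favor of `DataFrame.map`.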

In [20]:
data.applymap(
    lambda element: element + 3
).head()

|   | petal_length | petal_width | sepal_length | sepal_width |
|---|---|---|---|---|
| 0 | 8.1 | 6.5 | 4.4 | 3.2 |
| 1 | 7.9 | 6.0 | 4.4 | 3.2 |
| 2 | 7.7 | 6.2 | 4.3 | 3.2 |
| 3 | 7.6 | 6.1 | 4.5 | 3.2 |
| 4 | 8.0 | 6.6 | 4.4 | 3.2 |
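For plain arithmetic, a cell-by-cell lambda is not needed because DataFrame operators are already element-wise. Cell-wise application is mostly useful for functions that only work on scalars. The sketch below assumes a small illustrative frame and picks `DataFrame.map` when it exists, falling back to `applymap` on older pandas versions.

```python
import pandas as pd

frame = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})

# Element-wise arithmetic is vectorized, so no lambda is required.
shifted = frame + 3
print(shifted['a'].tolist())  # [4.0, 5.0]

# For a scalar-only function, go cell by cell.
# DataFrame.map is the newer name; older pandas only has applymap.
cellwise = frame.map if hasattr(frame, 'map') else frame.applymap
labels = cellwise(lambda v: 'big' if v > 2 else 'small')
print(labels['a'].tolist())   # ['small', 'small']
print(labels['b'].tolist())   # ['big', 'big']
```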


### Map

In [21]:
data.petal_length.map(
    lambda x: x ** 2
).head()

0    26.01
1    24.01
2    22.09
3    21.16
4    25.00
Name: petal_length, dtype: float64
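`Series.map` works on a single column and accepts either a function or a dictionary. With a dictionary, values without a matching key become missing. A minimal sketch with an illustrative Series and a hypothetical lookup table:

```python
import pandas as pd

brands = pd.Series(['uniqlo', 'nudie', 'skelly', 'unknown'])

# Hypothetical lookup table, for illustration only.
groups = {'uniqlo': 'A', 'nudie': 'B'}

# Keys that are not in the dictionary map to a missing value.
grouped = brands.map(groups)
print(grouped.isna().tolist())  # [False, False, True, True]

# A function is applied to every element instead.
upper = brands.map(str.upper)
print(upper.tolist())           # ['UNIQLO', 'NUDIE', 'SKELLY', 'UNKNOWN']
```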

### Assign

[docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.assign.html)

**Creates new columns**

Pay attention: `assign` returns a new DataFrame and leaves the original unchanged, whereas some other operations (e.g. `data['col'] = ...`) modify the DataFrame in place.

In [32]:
data.assign(
    is_petal_large=lambda x: x.petal_length > 5
).head()

|   | petal_length | petal_width | sepal_length | sepal_width | is_petal_large |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | True |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | False |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | False |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | False |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | False |
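The difference between `assign` and in-place item assignment can be checked directly. A minimal sketch on an illustrative frame (the column names are made up for the example):

```python
import pandas as pd

frame = pd.DataFrame({'price': [10.0, 20.0]})

# assign returns a new DataFrame; `frame` keeps its original columns.
with_double = frame.assign(doubled=lambda d: d['price'] * 2)
columns_after_assign = list(frame.columns)
print(columns_after_assign)       # ['price']
print(list(with_double.columns))  # ['price', 'doubled']

# Item assignment, by contrast, changes the object in place.
frame['halved'] = frame['price'] / 2
print(list(frame.columns))        # ['price', 'halved']
```

Because `assign` does not touch its input, calls can be chained safely, which is what makes it fit the functional style.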


# Exercises!

Work through each of the numbered tasks below.
1. Generate 1,000 user IDs in sequence and save them to the `users` variable.
2. Generate `user_transactions`: 10,000 transactions, each with a random user_id, brand_name, n_cloth_purchase (up to 20), and price (float up to 280,000).

In [73]:
import numpy as np

In [74]:
cloth_brands = [
    'uniqlo',
    'nudie',
    'pull&bear',
    'skelly'
]
users = [
    # 1000 users with the prefix `user_`
    'user_{}'.format(i) for i in range(1000)
]

# data format
# (user_id, brand_name, n_cloth_purchase, price)
# note: randint(20) draws integers from 0 to 19, and uniform(280000) sets
# low=280000 while high keeps its default of 1.0, so prices fall between
# 1 and 280,000
np.random.seed(42)
user_transactions = [
    (
        np.random.choice(users),
        np.random.choice(cloth_brands),
        np.random.randint(20),
        np.random.uniform(280000)
    )
    for x in range(10000)
]
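The argument order of the random functions is easy to get wrong: `np.random.uniform` takes `low` and then `high`, and integer draws exclude the upper bound. The sketch below spells the bounds out using NumPy's `default_rng` Generator API, which is an assumption on my part since the cells above use the legacy `np.random` functions. The seeded values will differ from the ones shown in this notebook.

```python
import numpy as np

# Assumption: the Generator API instead of the legacy np.random functions.
rng = np.random.default_rng(42)

# low and high are given explicitly, so prices lie in [0, 280000).
prices = rng.uniform(0, 280_000, size=5)

# The upper bound is exclusive, so this draws integers from 1 to 20.
counts = rng.integers(1, 21, size=5)

print(prices.shape, counts.shape)  # (5,) (5,)
```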

In [75]:
df = pd.DataFrame(
    user_transactions, 
    columns=[
        'user_id', 'brand_name', 'n_cloth_purchase', 'price'
    ]
)

In [76]:
df.head()

|   | user_id | brand_name | n_cloth_purchase | price |
|---|---|---|---|---|
| 0 | user_102 | skelly | 14 | 75042.428287 |
| 1 | user_700 | uniqlo | 6 | 155167.275034 |
| 2 | user_214 | pull&bear | 10 | 37471.545359 |
| 3 | user_99 | skelly | 2 | 274236.362182 |
| 4 | user_769 | skelly | 11 | 17206.180028 |


3. Construct a price segment: 'low' if the price is more than 1 standard deviation below the mean, 'mid' if it is within 1 standard deviation of the mean, and 'high' otherwise.
4. Construct the total price, where total_price = n_cloth_purchase * price.
5. Construct a user segment the same way as the price segment, but in the format 'F_[].M_[]' (F for frequency, M for monetary).

In [97]:
def calculate_segment(series):
    # 'low' below mean - 1 std dev, 'high' at or above mean + 1 std dev,
    # 'mid' in between (both bounds must hold, hence `and`)
    mean = np.mean(series)
    std_dev = np.std(series)

    return [
        'low' if (x < (mean-std_dev))
        else 'mid' if
        (
            (x >= (mean-std_dev)) and
            (x < (mean+std_dev))
        )
        else 'high'
        for x in series
    ]

def calculate_user_segment(d):
    # frequency comes from the item count, monetary from the price
    frequency = calculate_segment(d['n_cloth_purchase'])
    monetary = calculate_segment(d['price'])

    return [
        'F_{}.M_{}'.format(f, m) for f, m in zip(
            frequency,
            monetary
        )
    ]

In [98]:
df.assign(
    price_segment=lambda x: calculate_segment(x['price']),
    total_price=lambda x: x['n_cloth_purchase'] * x['price'],
    # both segments are computed per transaction
    user_segment=lambda d: calculate_user_segment(d)
).head()

|   | user_id | brand_name | n_cloth_purchase | price | price_segment | total_price | user_segment |
|---|---|---|---|---|---|---|---|
| 0 | user_102 | skelly | 14 | 75042.428287 | mid | 1050594.0 | F_mid.M_mid |
| 1 | user_700 | uniqlo | 6 | 155167.275034 | mid | 931003.7 | F_mid.M_mid |
| 2 | user_214 | pull&bear | 10 | 37471.545359 | low | 374715.5 | F_mid.M_low |
| 3 | user_99 | skelly | 2 | 274236.362182 | high | 548472.7 | F_low.M_high |
| 4 | user_769 | skelly | 11 | 17206.180028 | low | 189268.0 | F_mid.M_low |
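The result above labels each transaction. A per-user variant would first aggregate the transactions of every user and only then apply the low / mid / high rule. The sketch below shows one way to do that on a tiny hand-written table so the result can be checked by hand. The `segment` helper, the `tx` data and the `frequency` / `monetary` column names are illustrative assumptions, and `np.select` is swapped in as a vectorized form of the same rule.

```python
import numpy as np
import pandas as pd


def segment(values):
    # Same rule as above: one standard deviation around the mean.
    mean, std = values.mean(), values.std(ddof=0)
    return np.select(
        [values < mean - std, values >= mean + std],
        ['low', 'high'],
        default='mid',
    )


# Illustrative transactions for three users.
tx = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'n_cloth_purchase': [1, 1, 2, 5, 5, 10],
    'price': [10.0, 10.0, 50.0, 100.0, 100.0, 100.0],
})

per_user = (
    tx.assign(total_price=lambda d: d['n_cloth_purchase'] * d['price'])
      .groupby('user_id')
      .agg(frequency=('n_cloth_purchase', 'sum'), monetary=('total_price', 'sum'))
      .assign(user_segment=lambda d: [
          'F_{}.M_{}'.format(f, m)
          for f, m in zip(segment(d['frequency']), segment(d['monetary']))
      ])
)

print(per_user['user_segment'].tolist())
# ['F_mid.M_mid', 'F_mid.M_mid', 'F_high.M_high']
```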


# References
[1] Functional Python Programming, Steven F. Lott.

[2] https://julien.danjou.info/python-and-functional-programming/

[3] https://arithmox.ai/pythonic-functional-programming-arithmox/

[4] https://github.com/sfermigier/awesome-functional-python

[5] https://docs.python.org/3/howto/functional.html