In [1]:
import re
import numpy as np
import pandas as pd

In [2]:
i = 0
for i in range(4):
    i += 1
    print(i)

1
2
3
4


# Functional Paradigm Intro

What other paradigms have we already worked with?

> <b> Procedural Programming </b>
- Instructions are procedures.

> <b> Object-Oriented Programming </b>
- Instructions are grouped into objects, together with the state they operate on.

> <b> Functional Programming </b>
- No shared state is kept. A program is just a series of functions being evaluated.
- The result depends entirely on the input, as in math, where <code>f(x) = y</code>.
- Because functions are values like any other, you can also <b>pass functions as arguments</b>, which turns out to be very useful.
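
To make the last two bullets concrete, here is a minimal sketch. The names `double` and `apply_twice` are hypothetical and exist only for this illustration.

```python
def double(x):
    # Pure function: the result depends only on the input, no outside state.
    return x * 2

def apply_twice(func, value):
    # Higher-order function: it receives another function as an argument.
    return func(func(value))

result = apply_twice(double, 5)
print(result)  # 20
```

Because `double` never reads or changes anything outside itself, calling it with the same input always gives the same output.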


## Function definition

```python
def function_name(arg1):
    something = arg1 + 10
    return something
```

## Functions as variables

In [3]:
soma_1 = lambda x: x + 1

In [None]:
def soma_1_c(x):
    return x + 1

In [4]:
soma_1(10)

11

In [8]:
soma_2 = lambda x: x + 2

In [9]:
somar_n = lambda x, n: x + n

In [10]:
soma_1 = lambda x: somar_n(x, 1)

In [11]:
soma_1(10)

11
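
Writing one lambda per increment gets repetitive. One possible generalization, sketched here with the hypothetical name `make_adder`, is a function that builds and returns other functions:

```python
def make_adder(n):
    # Returns a new function that remembers n (a closure).
    def adder(x):
        return x + n
    return adder

add_1 = make_adder(1)
add_2 = make_adder(2)
print(add_1(10))  # 11
print(add_2(10))  # 12
```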

## Mapping concept

In [12]:
lista_exemplo = [10, 12, 34, 23, 2, 6, 7]

In [13]:
def div_2(x):
    return x/2

### Now, how to apply that function to all elements of this list?

In [14]:
div_2(lista_exemplo)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [15]:
div_2(np.array(lista_exemplo))

array([ 5. ,  6. , 17. , 11.5,  1. ,  3. ,  3.5])

In [16]:
new_list = []

for item in lista_exemplo:
    new_list.append(div_2(item))
    
new_list 

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [17]:
[div_2(item) for item in lista_exemplo]

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [18]:
for i in map(div_2, lista_exemplo):
    print(i)

5.0
6.0
17.0
11.5
1.0
3.0
3.5


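`map` is `lazy`. When you run `map(function, my_list)`, it doesn't execute anything; it only stores what it needs to do the work. Each value is computed when you iterate over the result, and once a value has been consumed it is gone, so a `map` object can only be traversed once.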
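
One way to observe this behaviour is to record every call the mapped function receives. The sketch below assumes a hypothetical helper `div_2_logged` that halves its input and appends it to a `calls` list.

```python
calls = []

def div_2_logged(x):
    # Halves x and records the call so we can see when it actually runs.
    calls.append(x)
    return x / 2

lazy_result = map(div_2_logged, [10, 12, 34])
calls_before = list(calls)       # [] : nothing has been computed yet

first = next(lazy_result)        # computes only the first element
calls_after_next = list(calls)   # [10]

rest = list(lazy_result)         # computes the remaining elements
print(calls_before, first, calls_after_next, rest)
```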

In [19]:
list(map(div_2, lista_exemplo))

[5.0, 6.0, 17.0, 11.5, 1.0, 3.0, 3.5]

In [20]:
resultado_map = map(div_2, lista_exemplo)
for i in resultado_map:
    print(i)

5.0
6.0
17.0
11.5
1.0
3.0
3.5


In [21]:
list(resultado_map)

[]

In [23]:
resultado_map = map(div_2, lista_exemplo)
list(resultado_map)[:2]

[5.0, 6.0]

In [24]:
list(resultado_map)

[]

### Lazy evaluation

Functional programming allows for not computing the whole result at once.

Functions like `map` return only a Python object (an iterator). At that point nothing has been calculated yet. The values are computed as soon as you ask for them.
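
If only the first few results are needed, `itertools.islice` can take them without forcing the rest. This is a small sketch; `numbers` is just a sample list for the illustration.

```python
from itertools import islice

def div_2(x):
    return x / 2

numbers = [10, 12, 34, 23, 2, 6, 7]
lazy_result = map(div_2, numbers)

first_two = list(islice(lazy_result, 2))  # consumes only two items
remaining = list(lazy_result)             # the rest is still available
print(first_two)   # [5.0, 6.0]
print(remaining)   # [17.0, 11.5, 1.0, 3.0, 3.5]
```

By contrast, `list(lazy_result)[:2]` computes every element before slicing, which leaves the iterator empty.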

In [26]:
lista_telefones = [
    19999571559, '(21) 2412-0107', '(34) 99762-1166', '91-4002-8282',
    '(19) 3542-1820', '(19) 3561-9525', '(34) 3333-5802'
]
pattern = r'[0-9]+'

In [33]:
''.join(['1', '2', '3'])

'123'

In [34]:
lista_dds = list(map(lambda x: ''.join(re.findall(pattern, str(x)))[:2], lista_telefones))
print(lista_dds)

['19', '21', '34', '91', '19', '19', '34']

In [28]:
for ddd in map(lambda x: ''.join(re.findall(pattern, str(x)))[:2], lista_telefones):
    print(ddd)

19
21
34
91
19
19
34


In [29]:
set_dds = set(map(lambda x: ''.join(re.findall(pattern, str(x)))[:2], lista_telefones))
print(set_dds)

{'19', '21', '91', '34'}


## Filter

`filter` removes elements from a list (or any iterable, i.e. anything you can loop over) using a function that returns `True` or `False`. Like `map`, `filter` returns a lazy Python object; when you ask for the results, it drops every item for which your function returned `False`.
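
A minimal sketch of that behaviour, using a hypothetical predicate `is_even` and a sample list:

```python
def is_even(n):
    # Predicate: returns True for the items we want to keep.
    return n % 2 == 0

numbers = [10, 12, 34, 23, 2, 6, 7]
evens_lazy = filter(is_even, numbers)   # nothing is evaluated yet
evens = list(evens_lazy)                # [10, 12, 34, 2, 6]
evens_again = list(evens_lazy)          # [] because the iterator is exhausted

# The equivalent list comprehension gives the same result.
same_as_comprehension = [n for n in numbers if is_even(n)]
```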

In [36]:
lista_telefones = [
    19999571559, '(21) 2412-0107', '(34) 99762-1166', '91-4002-8282',
    '(19) 3542-1820', '(19) 3561-9525', '(34) 3333-5802'
]
def extrair_ddd(telefone):
    '''
    Receives a phone number and returns its DDD (area code)
    
    telefone (str or int): Phone number whose first two numeric digits are the DDD
    '''
    return ''.join(re.findall(pattern, str(telefone)))[:2]

In [37]:
lista_ddd_19 = list(filter(lambda x: True if extrair_ddd(x) == '19' else False, lista_telefones))
print(lista_ddd_19)

[19999571559, '(19) 3542-1820', '(19) 3561-9525']


In [38]:
filtro_19 = filter(lambda x: True if extrair_ddd(x) == '19' else False, lista_telefones)
for telefone in filtro_19:
    print(telefone)

19999571559
(19) 3542-1820
(19) 3561-9525


In [48]:
map_19 = filter(lambda x: extrair_ddd(x) == '19', lista_telefones)
print(map_19)

<filter object at 0x0000020500488C10>


In [49]:
for i in map_19:
    print(i)

19999571559
(19) 3542-1820
(19) 3561-9525


In [45]:
list(map_19)

[]

In [40]:
[telefone for telefone in lista_telefones if extrair_ddd(telefone) == '19']

[19999571559, '(19) 3542-1820', '(19) 3561-9525']

## Reduce

`reduce` brings the idea of an `accumulator`. Imagine you have a function that sums a pair of arguments. `reduce` (from the `functools` module) treats the first argument of your function as an `accumulator`: it walks through your iterable, repeatedly applying the function to the accumulator and the next item, and stores the result back in the accumulator. When no initial value is given, the first item of the iterable is the starting accumulator.

For example, take the list `[1, 4, 6, 8]`.

If you apply the following function:
```python
def sum_two_elements(a,b):
    return a+b
```

as 
```python
reduce( sum_two_elements, [1,4,6,8] )
```

The steps it performs are:
```python
# No initial value is passed, so the first element seeds the accumulator.
a = 1   # accumulator (first element of the list)
b = 4   # next value
a + b   # 5, which becomes the new accumulator

a = 5   # accumulator
b = 6   # next value
a + b   # 11

a = 11  # accumulator
b = 8   # next value
a + b   # 19, the final result returned by reduce
```
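
`reduce` also accepts an optional third argument that sets the starting accumulator explicitly. When it is omitted, the first element of the iterable plays that role. The sketch below shows both cases; the values `100` and `0` are arbitrary choices for the illustration.

```python
from functools import reduce

def sum_two_elements(a, b):
    return a + b

without_start = reduce(sum_two_elements, [1, 4, 6, 8])      # 19
with_start = reduce(sum_two_elements, [1, 4, 6, 8], 100)    # 119
empty_with_start = reduce(sum_two_elements, [], 0)          # 0

# With an empty iterable and no initial value there is nothing to return.
try:
    reduce(sum_two_elements, [])
    empty_error = None
except TypeError as exc:
    empty_error = type(exc).__name__
```

Passing an initial value is a reasonable habit whenever the iterable might be empty.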

In [54]:
from functools import reduce

#### Example 1: Numbers

In [55]:
def somar_ab(a,b):
    print(f'a={a}, b={b}')
    return a+b

In [56]:
lista_numeros = [1, 2, 3, 4, 5]
reduce(somar_ab, lista_numeros)

a=1, b=2
a=3, b=3
a=6, b=4
a=10, b=5


15

In [61]:
def comp_ab(x,y):
    print(f'a={x}, b={y}')
    if x > y:
        return x
    else:
        return y

reduce(comp_ab, [2, 10, 25, 1, -10, 13, 40, 20])

a=2, b=10
a=10, b=25
a=25, b=1
a=25, b=-10
a=25, b=13
a=25, b=40
a=40, b=20


40

#### Example 2: Strings

In [62]:
lista_letras = ['P', 'e', 'd', 'r', 'o']

In [63]:
reduce(lambda x, y: x + y, lista_letras)

'Pedro'

In [64]:
lista_nomes = ['Amapá', 'Roraima', 'Pará', 'Piauí', 'Maranhão']
reduce(lambda x, y: x if len(x) > len(y) else y, lista_nomes)

'Maranhão'

#### Example 3: Booleans

In [67]:
lista_bool = [True, True, False, True, True]
reduce(lambda x, y: True if x and y else False, lista_bool)

False

In [68]:
lista_bool = [True, False, False, False, False]
reduce(lambda x, y: True if x or y else False, lista_bool)

True

#### Example 4: Accumulator

In [69]:
from itertools import accumulate

In [70]:
lista_numeros = [1, 2, 3, 4, 5]
list(accumulate(lista_numeros, lambda x, y: x + y))

[1, 3, 6, 10, 15]

In [71]:
lista_nomes = ['Amapá', 'Roraima', 'Pará', 'Piauí', 'Maranhão']
list(accumulate(lista_nomes, lambda x, y: x if len(x) > len(y) else y))

['Amapá', 'Roraima', 'Roraima', 'Roraima', 'Maranhão']

---
# Mapping on Pandas

> <code> df['col_name'].apply() </code>

## Example 1

In [72]:
file_address = 'http://www.statsci.org/data/general/sleep.txt'
tb_sleep = pd.read_csv(file_address, sep='\t')

In [73]:
tb_sleep.head()

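| | Species | BodyWt | BrainWt | NonDreaming | Dreaming | TotalSleep | LifeSpan | Gestation | Predation | Exposure | Danger |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Africanelephant | 6654.0 | 5712.0 | NaN | NaN | 3.3 | 38.6 | 645.0 | 3 | 5 | 3 |
| 1 | Africangiantpouchedrat | 1.0 | 6.6 | 6.3 | 2.0 | 8.3 | 4.5 | 42.0 | 3 | 1 | 3 |
| 2 | ArcticFox | 3.385 | 44.5 | NaN | NaN | 12.5 | 14.0 | 60.0 | 1 | 1 | 1 |
| 3 | Arcticgroundsquirrel | 0.92 | 5.7 | NaN | NaN | 16.5 | NaN | 25.0 | 5 | 2 | 3 |
| 4 | Asianelephant | 2547.0 | 4603.0 | 2.1 | 1.8 | 3.9 | 69.0 | 624.0 | 3 | 5 | 4 |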


In [74]:
tb_sleep['brain_wt_kg'] = tb_sleep['BrainWt']/1000

In [75]:
tb_sleep['ratio_brain'] = tb_sleep['brain_wt_kg']/tb_sleep['BodyWt']

In [76]:
tb_sleep.head()

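| | Species | BodyWt | BrainWt | NonDreaming | Dreaming | TotalSleep | LifeSpan | Gestation | Predation | Exposure | Danger | brain_wt_kg | ratio_brain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Africanelephant | 6654.0 | 5712.0 | NaN | NaN | 3.3 | 38.6 | 645.0 | 3 | 5 | 3 | 5.712 | 0.000858 |
| 1 | Africangiantpouchedrat | 1.0 | 6.6 | 6.3 | 2.0 | 8.3 | 4.5 | 42.0 | 3 | 1 | 3 | 0.0066 | 0.0066 |
| 2 | ArcticFox | 3.385 | 44.5 | NaN | NaN | 12.5 | 14.0 | 60.0 | 1 | 1 | 1 | 0.0445 | 0.013146 |
| 3 | Arcticgroundsquirrel | 0.92 | 5.7 | NaN | NaN | 16.5 | NaN | 25.0 | 5 | 2 | 3 | 0.0057 | 0.006196 |
| 4 | Asianelephant | 2547.0 | 4603.0 | 2.1 | 1.8 | 3.9 | 69.0 | 624.0 | 3 | 5 | 4 | 4.603 | 0.001807 |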


In [78]:
def maior_1p(ratio):
    if ratio > 0.01:
        return 'Pesado'
    else:
        return 'Leve'

In [79]:
maior_1p(tb_sleep['ratio_brain'])

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [81]:
tb_sleep['heavy_brain'] = tb_sleep['ratio_brain'].apply(maior_1p)

In [82]:
tb_sleep.head()

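| | Species | BodyWt | BrainWt | NonDreaming | Dreaming | TotalSleep | LifeSpan | Gestation | Predation | Exposure | Danger | brain_wt_kg | ratio_brain | heavy_brain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Africanelephant | 6654.0 | 5712.0 | NaN | NaN | 3.3 | 38.6 | 645.0 | 3 | 5 | 3 | 5.712 | 0.000858 | Leve |
| 1 | Africangiantpouchedrat | 1.0 | 6.6 | 6.3 | 2.0 | 8.3 | 4.5 | 42.0 | 3 | 1 | 3 | 0.0066 | 0.0066 | Leve |
| 2 | ArcticFox | 3.385 | 44.5 | NaN | NaN | 12.5 | 14.0 | 60.0 | 1 | 1 | 1 | 0.0445 | 0.013146 | Pesado |
| 3 | Arcticgroundsquirrel | 0.92 | 5.7 | NaN | NaN | 16.5 | NaN | 25.0 | 5 | 2 | 3 | 0.0057 | 0.006196 | Leve |
| 4 | Asianelephant | 2547.0 | 4603.0 | 2.1 | 1.8 | 3.9 | 69.0 | 624.0 | 3 | 5 | 4 | 4.603 | 0.001807 | Leve |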


In [84]:
def score_risco(P, E, D):
    '''
    Computes a risk score as the mean of the two highest risk scores.
    P (int): Predation score
    E (int): Exposure score
    D (int): Danger score
    '''
    lista_scores = [P,E,D]
    max_score = max(lista_scores)
    lista_scores.remove(max_score)
    score_risco = (max(lista_scores)+max_score)/2
    return score_risco

In [86]:
tb_sleep[['Predation', 'Exposure', 'Danger']].apply(sum, axis = 1)

0     11
1      7
2      3
3     10
4     12
      ..
57     7
58     7
59    11
60     4
61     5
Length: 62, dtype: int64

In [85]:
tb_sleep[['Predation', 'Exposure', 'Danger']].apply(score_risco)

TypeError: score_risco() missing 2 required positional arguments: 'E' and 'D'

In [93]:
def score_risco_p(row):
    return score_risco(row['Predation'], 
                       row['Exposure'], 
                       row['Danger'])
score_risco_p(2)

TypeError: 'int' object is not subscriptable

In [None]:
tb_sleep.apply(lambda x: score_risco(x['Predation'], x['Exposure'], x['Danger']), axis = 1)

In [None]:
tb_sleep[['Predation', 'Exposure', 'Danger']].apply(lambda x: score_risco(*x), axis = 1)

In [88]:
def somar_ab(a, b):
    return a + b

In [89]:
somar_ab(1, 2)

3

In [90]:
dupla = (1,2)
somar_ab(dupla)

TypeError: somar_ab() missing 1 required positional argument: 'b'

In [91]:
somar_ab(*dupla)

3

In [92]:
tb_sleep[['Predation', 'Exposure', 'Danger']].apply(lambda x: score_risco(*x), axis = 1)

0     4.0
1     3.0
2     1.0
3     4.0
4     4.5
     ... 
57    3.0
58    2.5
59    4.0
60    1.5
61    2.0
Length: 62, dtype: float64

## Example 2

In [94]:
lista_telefones = [
    19999571559, '(21) 2412-0107', '(34) 99762-1166', '91-4002-8282',
    '(19) 3542-1820', '(19) 3561-9525', '(34) 3333-5802'
]
tb_telefone = pd.DataFrame(lista_telefones, columns=['telefones'])
tb_telefone

| | telefones |
|---|---|
| 0 | 19999571559 |
| 1 | (21) 2412-0107 |
| 2 | (34) 99762-1166 |
| 3 | 91-4002-8282 |
| 4 | (19) 3542-1820 |
| 5 | (19) 3561-9525 |
| 6 | (34) 3333-5802 |


In [95]:
pattern = r'[0-9]+'
def extrair_ddd(telefone):
    '''
    Extracts the DDD (area code) from a phone number
    telefone (str or numeric): phone number whose first two numeric digits are the DDD
    '''
    return ''.join(re.findall(pattern, str(telefone)))[:2]

In [96]:
extrair_ddd(1935613870)

'19'

In [97]:
extrair_ddd(tb_telefone)

'01'

In [98]:
str(tb_telefone)

'         telefones\n0      19999571559\n1   (21) 2412-0107\n2  (34) 99762-1166\n3     91-4002-8282\n4   (19) 3542-1820\n5   (19) 3561-9525\n6   (34) 3333-5802'

In [99]:
tb_telefone['DDD'] = tb_telefone['telefones'].apply(extrair_ddd)

In [100]:
tb_telefone

| | telefones | DDD |
|---|---|---|
| 0 | 19999571559 | 19 |
| 1 | (21) 2412-0107 | 21 |
| 2 | (34) 99762-1166 | 34 |
| 3 | 91-4002-8282 | 91 |
| 4 | (19) 3542-1820 | 19 |
| 5 | (19) 3561-9525 | 19 |
| 6 | (34) 3333-5802 | 34 |


## Apply functions with arguments

In [104]:
def maior_custom(valor, patamar):
    if valor > patamar:
        return 'Pesado'
    else:
        return 'Leve'

In [106]:
tb_sleep['ratio_brain'].apply(maior_custom)

TypeError: maior_custom() missing 1 required positional argument: 'patamar'

In [105]:
tb_sleep['ratio_brain'].apply(lambda x: maior_custom(x, 0.005))

0       Leve
1     Pesado
2     Pesado
3     Pesado
4       Leve
       ...  
57    Pesado
58    Pesado
59    Pesado
60      Leve
61      Leve
Name: ratio_brain, Length: 62, dtype: object

In [111]:
tb_sleep['ratio_brain'].apply(maior_custom, args = (0.005,))

0       Leve
1     Pesado
2     Pesado
3     Pesado
4       Leve
       ...  
57    Pesado
58    Pesado
59    Pesado
60      Leve
61      Leve
Name: ratio_brain, Length: 62, dtype: object
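
Another option for fixing the extra argument, sketched here with plain `map` so it runs without pandas, is `functools.partial`. The name `maior_005` and the sample `ratios` list are assumptions made for this illustration.

```python
from functools import partial

def maior_custom(valor, patamar):
    if valor > patamar:
        return 'Pesado'
    else:
        return 'Leve'

# partial freezes patamar, leaving a one-argument function.
maior_005 = partial(maior_custom, patamar=0.005)

ratios = [0.000858, 0.0066, 0.013146]
labels = list(map(maior_005, ratios))
print(labels)  # ['Leve', 'Pesado', 'Pesado']
```

Since `maior_005` is an ordinary callable that takes a single value, it could also be passed directly to `.apply` on a column.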
