- *Method chaining*
	- `.assign()`
	- `.pipe()`
- `for` vs. *list comprehension*
- __Using `.apply()`__
- `.apply(axis=0)`
- Progress bar with `tqdm`
- Pandas Profiling for data exploration and data-quality checks
- Displaying the DataFrame with `style`
- Copying and pasting into Excel with `.to_clipboard()`
- Rolling-window functions (*window functions*)
- Wrap-up of the 2nd project

In [3]:
import pandas as pd
import numpy as np

# Using apply

In [4]:
df_aux = pd.DataFrame({'A': np.arange(10, 101, 10), 
                       'B': np.arange(1, 11, 1)})

In [5]:
df_aux['A'] + df_aux['B']

0     11
1     22
2     33
3     44
4     55
5     66
6     77
7     88
8     99
9    110
dtype: int32

In [6]:
df_aux

|   | A | B |
|---|---|---|
| 0 | 10 | 1 |
| 1 | 20 | 2 |
| 2 | 30 | 3 |
| 3 | 40 | 4 |
| 4 | 50 | 5 |
| 5 | 60 | 6 |
| 6 | 70 | 7 |
| 7 | 80 | 8 |
| 8 | 90 | 9 |
| 9 | 100 | 10 |


### Using `.apply()` with `lambda` (anonymous function)

In [7]:
A = print(1)  # print() displays 1, but its return value is None

1


In [8]:
print(A)

None
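The two cells above show that `print` writes to the screen but returns `None`. A minimal sketch of the same behaviour with a hypothetical function `show` (not part of the original notebook): any Python function without a `return` statement returns `None`. This is why an `apply` whose lambda only calls `print` ends up with `None` for every column or row.

```python
def show(value):
    # Hypothetical helper: displays the value, but has no return statement
    print(value)


returned = show(1)       # prints 1
print(returned is None)  # prints True
```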


In [26]:
df_aux.apply(lambda x: print(x), axis=0)  # axis=0: x is each column, as a Series

0     10
1     20
2     30
3     40
4     50
5     60
6     70
7     80
8     90
9    100
Name: A, dtype: int32
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
Name: B, dtype: int32


A    None
B    None
dtype: object
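With `axis=0` (the default), `apply` hands the lambda one column at a time, as the printed output above shows. To get a useful result the lambda has to *return* a value; a minimal sketch, where column sums and ranges are chosen only as illustrations:

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})

# axis=0: the lambda receives each column as a Series; each return value
# becomes one entry of the result, indexed by the column name
col_sums = df_aux.apply(lambda col: col.sum(), axis=0)
col_ranges = df_aux.apply(lambda col: col.max() - col.min(), axis=0)

print(col_sums)    # A 550, B 55
print(col_ranges)  # A 90, B 9
```

For a plain sum, `df_aux.sum()` gives the same numbers directly; `apply` is only needed when the per-column logic is custom.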

In [14]:
df_aux.apply(lambda x: print(x), axis=1)  # axis=1: x is each row, as a Series

A    10
B     1
Name: 0, dtype: int32
A    20
B     2
Name: 1, dtype: int32
A    30
B     3
Name: 2, dtype: int32
A    40
B     4
Name: 3, dtype: int32
A    50
B     5
Name: 4, dtype: int32
A    60
B     6
Name: 5, dtype: int32
A    70
B     7
Name: 6, dtype: int32
A    80
B     8
Name: 7, dtype: int32
A    90
B     9
Name: 8, dtype: int32
A    100
B     10
Name: 9, dtype: int32


0    None
1    None
2    None
3    None
4    None
5    None
6    None
7    None
8    None
9    None
dtype: object

In [17]:
df_aux.apply(lambda x: x['A'] + x['B'], axis=1)

0     11
1     22
2     33
3     44
4     55
5     66
6     77
7     88
8     99
9    110
dtype: int32

In [27]:
df_aux['B'] % 2 == 0

0    False
1     True
2    False
3     True
4    False
5     True
6    False
7     True
8    False
9     True
Name: B, dtype: bool
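The boolean mask above is the same condition used inside `apply` in the next cells. It can also be used on its own to keep only the rows where it is `True` (boolean indexing); a short sketch:

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})

# True where B is even
mask = df_aux['B'] % 2 == 0

# Indexing a DataFrame with a boolean Series keeps only the True rows
even_rows = df_aux[mask]
print(even_rows)
```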

### Using `.apply()` with `lambda` and `if-else`

In [30]:
df_aux.apply(lambda x: x['A'] + x['B'] if x['B'] % 2 == 0 else 0, axis=1)

0      0
1     22
2      0
3     44
4      0
5     66
6      0
7     88
8      0
9    110
dtype: int64
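The same if-else rule can be written without `apply`, operating on whole columns at once. A minimal sketch using `np.where(condition, value_if_true, value_if_false)`, a swapped-in vectorized technique that the original notebook does not use:

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})

# Same condition as in the lambda: B is even
is_even = df_aux['B'] % 2 == 0

# np.where picks A + B where the condition holds, 0 elsewhere;
# it returns a NumPy array, so wrap it back into a Series with the same index
result = pd.Series(np.where(is_even, df_aux['A'] + df_aux['B'], 0),
                   index=df_aux.index)
print(result)
```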

### Using `apply` with our own function (*UDF, user-defined function*)

In [31]:
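def soma_se_par(row):
    # With apply(axis=1), row is one line of the DataFrame as a Series:
    # return A + B when B is even, otherwise 0
    if row['B'] % 2 == 0:
        return row['B'] + row['A']
    else:
        return 0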

In [34]:
df_aux.apply(soma_se_par, axis=1)

0      0
1     22
2      0
3     44
4      0
5     66
6      0
7     88
8      0
9    110
dtype: int64
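Because `apply(axis=1)` passes one row at a time as a Series, the UDF can be checked directly on a single row before applying it to the whole DataFrame. A sketch; this copy names the parameter `row`, since what the function receives is a row, not a DataFrame:

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})


def soma_se_par(row):
    # row is one line of the DataFrame, delivered as a Series
    if row['B'] % 2 == 0:
        return row['B'] + row['A']
    return 0


first_row = df_aux.iloc[0]      # A=10, B=1 (odd)
second_row = df_aux.iloc[1]     # A=20, B=2 (even)
print(soma_se_par(first_row))   # 0
print(soma_se_par(second_row))  # 22
```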

## Comparing the performance of *list comprehensions* and `.apply()`

In [35]:
%timeit [a + b if b % 2 == 0 else 0 for a, b in zip(df_aux['A'], df_aux['B'])]
%timeit df_aux.apply(soma_se_par, axis=1)

11.7 µs ± 92.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
309 µs ± 7.11 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [36]:
309 / 11.7

26.410256410256412
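According to the measurements above, on this 10-row DataFrame the list comprehension ran about 26× faster than `apply(axis=1)`. `%timeit` only exists in IPython/Jupyter; to repeat the comparison in a plain Python script, the standard-library `timeit` module can be used. A sketch, where the helper names `with_list_comprehension` and `with_apply` are hypothetical and absolute timings will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})


def soma_se_par(row):
    # A + B when B is even, otherwise 0
    if row['B'] % 2 == 0:
        return row['A'] + row['B']
    return 0


def with_list_comprehension(df):
    return [a + b if b % 2 == 0 else 0 for a, b in zip(df['A'], df['B'])]


def with_apply(df):
    return df.apply(soma_se_par, axis=1).tolist()


n = 200  # number of calls per measurement
t_list = timeit.timeit(lambda: with_list_comprehension(df_aux), number=n) / n
t_apply = timeit.timeit(lambda: with_apply(df_aux), number=n) / n

print(f"list comprehension: {t_list * 1e6:.1f} µs per loop")
print(f"apply(axis=1):      {t_apply * 1e6:.1f} µs per loop")
print(f"ratio: {t_apply / t_list:.1f}x")
```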

### Creating a larger DataFrame and testing performance

In [37]:
df_aux_maior = pd.concat([df_aux]*5000, ignore_index=True)
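The cell above stacks 5,000 copies of the 10-row frame. A quick sketch checking the resulting shape and what `ignore_index=True` changes (the name `repeated` is only illustrative):

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})

# ignore_index=True builds a fresh 0..n-1 index for the result
df_aux_maior = pd.concat([df_aux] * 5000, ignore_index=True)
print(df_aux_maior.shape)        # (50000, 2)
print(df_aux_maior.index[-1])    # 49999

# Without it, the original labels 0..9 repeat 5000 times
repeated = pd.concat([df_aux] * 5000)
print(repeated.index.is_unique)  # False
```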

In [38]:
df_aux_maior

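|   | A | B |
|---|---|---|
| 0 | 10 | 1 |
| 1 | 20 | 2 |
| 2 | 30 | 3 |
| 3 | 40 | 4 |
| 4 | 50 | 5 |
| ... | ... | ... |
| 49995 | 60 | 6 |
| 49996 | 70 | 7 |
| 49997 | 80 | 8 |
| 49998 | 90 | 9 |
| 49999 | 100 | 10 |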


In [39]:
%timeit [a + b if b % 2 == 0 else 0 for a, b in zip(df_aux_maior['A'], df_aux_maior['B'])]
%timeit df_aux_maior.apply(soma_se_par, axis=1)

12.5 ms ± 312 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
549 ms ± 14.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [40]:
549 / 12.5

43.92
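On the 50,000-row frame the gap grew from about 26× to about 44×. A vectorized version of the same rule (`np.where`, which the original notebook does not use) usually beats the list comprehension as well, because it avoids a Python-level loop over the rows. A sketch to measure that on your own machine; the helper names are hypothetical and no timings are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})
df_aux_maior = pd.concat([df_aux] * 5000, ignore_index=True)


def with_list_comprehension(df):
    # Python-level loop over 50,000 (a, b) pairs
    return [a + b if b % 2 == 0 else 0 for a, b in zip(df['A'], df['B'])]


def with_np_where(df):
    # Vectorized: condition and both branches evaluated on whole columns
    return np.where(df['B'] % 2 == 0, df['A'] + df['B'], 0)


n = 5  # number of calls per measurement
t_list = timeit.timeit(lambda: with_list_comprehension(df_aux_maior), number=n) / n
t_where = timeit.timeit(lambda: with_np_where(df_aux_maior), number=n) / n

print(f"list comprehension: {t_list * 1e3:.2f} ms per loop")
print(f"np.where:           {t_where * 1e3:.2f} ms per loop")
```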

### One more example

In [45]:
df_aux['B']**2 + df_aux['A']

0     11
1     24
2     39
3     56
4     75
5     96
6    119
7    144
8    171
9    200
dtype: int32

In [42]:
df_aux.apply(lambda x: x['B']**2 + x['A'], axis=1)

0     11
1     24
2     39
3     56
4     75
5     96
6    119
7    144
8    171
9    200
dtype: int32
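For completeness, the same formula written as a list comprehension, checked against the vectorized and `apply` results. A sketch; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df_aux = pd.DataFrame({'A': np.arange(10, 101, 10),
                       'B': np.arange(1, 11, 1)})

# Vectorized: whole-column arithmetic
vectorized = df_aux['B']**2 + df_aux['A']

# Row by row with apply
with_apply = df_aux.apply(lambda x: x['B']**2 + x['A'], axis=1)

# Plain Python loop over (a, b) pairs
with_list = [b**2 + a for a, b in zip(df_aux['A'], df_aux['B'])]

print(vectorized.tolist())
```

All three give the same values; they differ only in how much work happens in Python-level loops.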

In [46]:
%timeit df_aux['B']**2 + df_aux['A']
%timeit df_aux.apply(lambda x: x['B']**2 + x['A'], axis=1)

161 µs ± 4.48 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
343 µs ± 10.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [44]:
round(343 / 161, 2)

2.13