# Advanced Python studies

- Advanced data manipulation and cleaning techniques.

- Data organization.

In [1]:
import pandas as pd

## 1.0 Lambda functions

A regular function is defined as follows:

In [2]:
def calculation(a, b):
    c = a + b
    return c

In [3]:
calculation(5, 15)

20

To define the same function as a lambda, we write:

In [4]:
calculation_lambda = lambda a, b: a + b

In [5]:
calculation_lambda(5, 15)

20

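A lambda is limited to a single expression, so it suits short, simple functions that are typically used once, inline (for example as an argument to another function), rather than reused across the code.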
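
A typical inline use is the `key` argument of the built-in `sorted`: defining a named function with `def` just to sort once would be overkill. A minimal sketch with made-up data:

```python
# made-up (name, age) pairs, used only for illustration
people = [('Ashley', 19), ('Shania', 45), ('Tanish', 27)]

# the lambda returns the age (second item), which sorted uses as the sort key
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Ashley', 19), ('Tanish', 27), ('Shania', 45)]
```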

## 2.0 The map function

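The built-in map function applies a function to every element of an iterable (a list, a tuple, a pandas Series, etc.). It returns a lazy map object rather than a list, so the results have to be materialized, for example with list().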

- map(function, iterable)
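
map also accepts several iterables: the function then receives one item from each, and iteration stops at the shortest one. A small sketch with made-up numbers:

```python
# the lambda receives one item from each list per call
totals = list(map(lambda a, b: a + b, [5, 10, 15], [15, 20, 25]))
print(totals)  # [20, 30, 40]

# iteration stops when the shortest iterable is exhausted
short = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))
print(short)  # [11, 22]
```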

#### map on a list

In [6]:
col = ['Anastacia', 'Julicana', 'Anitta', 'Vega', 'Chun Li', 'Clotilde']

# function: lowercases the string (a true snake_case conversion would also replace spaces with '_')
snakecase = lambda x: x.lower()

map(snakecase, col)

<map at 0x12d5b6f4d90>

In [7]:
list(map(snakecase, col))

['anastacia', 'julicana', 'anitta', 'vega', 'chun li', 'clotilde']
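
Because the map object is lazy, it can be consumed only once: a second list() over the same object returns an empty list. A list comprehension is an equivalent, eager alternative. A sketch reusing a few of the names above:

```python
col = ['Anastacia', 'Julicana', 'Anitta']
lowered = map(lambda x: x.lower(), col)

first_pass = list(lowered)   # consumes the iterator
second_pass = list(lowered)  # already exhausted, so it is empty
print(first_pass)   # ['anastacia', 'julicana', 'anitta']
print(second_pass)  # []

# a list comprehension builds the list immediately
comprehension = [name.lower() for name in col]
```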

#### map on a DataFrame column

In [8]:
data = {'Name': ['Ashley', 'Shania', 'Tanish', 'Roxie'],
        'Age': ['19', '45', '27', '22'],
        'Height (cm)': [180, 160, 165, 157]}

data = pd.DataFrame(data)

In [9]:
data

|   | Name | Age | Height (cm) |
|---|---|---|---|
| 0 | Ashley | 19 | 180 |
| 1 | Shania | 45 | 160 |
| 2 | Tanish | 27 | 165 |
| 3 | Roxie | 22 | 157 |


In [10]:
conversion_meter = lambda x: x / 100

list(map(conversion_meter, data['Height (cm)']))

[1.8, 1.6, 1.65, 1.57]

In [11]:
data['Height (m)'] = list(map(conversion_meter, data['Height (cm)']))

In [12]:
data

|   | Name | Age | Height (cm) | Height (m) |
|---|---|---|---|---|
| 0 | Ashley | 19 | 180 | 1.8 |
| 1 | Shania | 45 | 160 | 1.6 |
| 2 | Tanish | 27 | 165 | 1.65 |
| 3 | Roxie | 22 | 157 | 1.57 |
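
A pandas Series also has its own `map` method, which returns a Series aligned with the original index, so no `list()` call is needed; for plain arithmetic, a vectorized expression gives the same values without calling a Python function per element. A sketch on a small frame rebuilt from the heights above (the column names `via map` and `vectorized` are made up for illustration):

```python
import pandas as pd

heights = pd.DataFrame({'Height (cm)': [180, 160, 165, 157]})

# Series.map applies the function element by element and returns a Series
heights['via map'] = heights['Height (cm)'].map(lambda x: x / 100)

# vectorized division: same values, computed on the whole column at once
heights['vectorized'] = heights['Height (cm)'] / 100

print(heights)
```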


A regular function defined with def can also be passed to map; it does not have to be a lambda. The lambda is simply more convenient for short, simple operations.
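
For instance, the same centimetre-to-metre conversion written as a named function (`to_meters` is a hypothetical name used only for this sketch):

```python
def to_meters(height_cm):
    """Convert a height from centimetres to metres."""
    return height_cm / 100

heights_cm = [180, 160, 165, 157]

# map receives the function object itself, without calling it
heights_m = list(map(to_meters, heights_cm))
print(heights_m)  # [1.8, 1.6, 1.65, 1.57]
```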

## 3.0 The apply method

The apply method works like map, but it is a pandas method rather than a Python built-in: it is called directly on a Series (such as a single column) or on a DataFrame, and it returns a pandas object, so no list() conversion is needed.
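
On a Series, the function receives one value at a time; on a DataFrame with `axis=1`, it receives one row (itself a Series) at a time, so it can combine columns. A sketch with made-up data:

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Ashley', 'Shania'],
                       'Height (cm)': [180, 160]})

# Series.apply: the lambda receives one value at a time
upper_names = people['Name'].apply(lambda x: x.upper())

# DataFrame.apply with axis=1: the lambda receives one row at a time
labels = people.apply(lambda row: f"{row['Name']} ({row['Height (cm)']} cm)", axis=1)

print(upper_names.tolist())  # ['ASHLEY', 'SHANIA']
print(labels.tolist())       # ['Ashley (180 cm)', 'Shania (160 cm)']
```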

In [13]:
data['Height (mm)'] = data['Height (m)'].apply(lambda x: int(x * 1000))
data

|   | Name | Age | Height (cm) | Height (m) | Height (mm) |
|---|---|---|---|---|---|
| 0 | Ashley | 19 | 180 | 1.8 | 1800 |
| 1 | Shania | 45 | 160 | 1.6 | 1600 |
| 2 | Tanish | 27 | 165 | 1.65 | 1650 |
| 3 | Roxie | 22 | 157 | 1.57 | 1570 |
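
Note that `int()` truncates instead of rounding. With the heights above this makes no difference, but a binary floating-point product can land just below a whole number, and truncation then loses a unit. A sketch with an illustrative value (0.29 is not from the dataset), comparing truncation with rounding before the cast:

```python
import pandas as pd

lengths_m = pd.Series([0.29, 1.8])

# 0.29 * 100 evaluates to 28.999999999999996, so int() truncates it to 28
truncated = lengths_m.apply(lambda x: int(x * 100))

# rounding before casting gives the expected whole number of centimetres
rounded = (lengths_m * 100).round().astype(int)

print(truncated.tolist())  # [28, 180]
print(rounded.tolist())    # [29, 180]
```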


# Practical examples with apply and lambda

In [14]:
path = "https://raw.githubusercontent.com/lucasquemelli/ds_ao_dev/main/data_raw.csv"

data_raw = pd.read_csv(path)

In [15]:
data_raw.head()

|   | Unnamed: 0 | id | product_name | product_type | price | datetime | style_id | color_id | Fit | Composition | More sustainable materials | Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1024256001 | Slim Jeans | men_jeans_slim | $ 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 1 | 1 | 1024256001 | Slim Jeans | men_jeans_slim | $ 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Shell: Cotton 99%, Spandex 1% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 2 | 2 | 1024256001 | Slim Jeans | men_jeans_slim | $ 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 3 | 3 | 1024256001 | Slim Jeans | men_jeans_slim | $ 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Shell: Cotton 99%, Spandex 1% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 4 | 4 | 1024256001 | Slim Jeans | men_jeans_slim | $ 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
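
An `Unnamed: 0` column usually appears when a CSV was saved together with its index (for example by `to_csv` with the default `index=True`) and then read back. Assuming that is what happened with this file, `index_col=0` turns that column back into the index. A self-contained sketch using an in-memory CSV with made-up rows:

```python
import io
import pandas as pd

# made-up CSV whose first column is an unnamed saved index
csv_text = ",id,price\n0,1024256001,$ 19.99\n1,1024256002,$ 24.99\n"

with_extra = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra.columns))  # ['Unnamed: 0', 'id', 'price']

# index_col=0 uses the first column as the index instead of a data column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))  # ['id', 'price']
```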


#### Removing the dollar sign from a column

In [16]:
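# keep an untouched copy of the raw data for the next examples
df = data_raw.copy()

# remove the '$' and the space that follows it
data_raw['price'] = data_raw['price'].apply(lambda x: x.replace('$', '').strip())
data_raw.head()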

|   | Unnamed: 0 | id | product_name | product_type | price | datetime | style_id | color_id | Fit | Composition | More sustainable materials | Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1024256001 | Slim Jeans | men_jeans_slim | 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 1 | 1 | 1024256001 | Slim Jeans | men_jeans_slim | 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Shell: Cotton 99%, Spandex 1% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 2 | 2 | 1024256001 | Slim Jeans | men_jeans_slim | 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 3 | 3 | 1024256001 | Slim Jeans | men_jeans_slim | 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Shell: Cotton 99%, Spandex 1% |  | The model is 185cm/6'1" and wears a size 31/32 |
| 4 | 4 | 1024256001 | Slim Jeans | men_jeans_slim | 19.99 | 2022-01-21 14:52:23 | 1024256 | 1 | Slim fit | Pocket lining: Polyester 65%, Cotton 35% |  | The model is 185cm/6'1" and wears a size 31/32 |
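
The cleaned prices are still text. A vectorized alternative to apply uses the `.str` accessor and then converts the column to numbers. A sketch on made-up sample values:

```python
import pandas as pd

prices = pd.Series(['$ 19.99', '$ 24.99'])  # made-up sample values

# regex=False treats '$' literally (in a regular expression it means "end of string")
numeric = prices.str.replace('$', '', regex=False).str.strip().astype(float)
print(numeric.tolist())  # [19.99, 24.99]
```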


#### Removing the dollar sign with a condition on the same column

In [17]:
data_raw['price'].isna().sum()

0

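Fortunately, there was no missing value (NA) in the 'price' column. If there were, the lambda above would fail: missing values are stored as the float NaN, which has no replace method, so apply would raise an AttributeError. The fix is a condition on the same column, so that replace is only called on non-null values. On the untouched copy df, we would write: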

In [19]:
df['price'].apply(lambda x: x.replace('$', '').strip() if pd.notnull(x) else x)

0        19.99
1        19.99
2        19.99
3        19.99
4        19.99
         ...  
2001     19.99
2002     19.99
2003     19.99
2004     19.99
2005     19.99
Name: price, Length: 2006, dtype: object
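
To see the failure and the fix side by side, here is a sketch on a made-up Series with one missing value (built with `dtype=object` so each element, including NaN, reaches the lambda as is). The `.str` accessor is shown as a vectorized alternative that skips missing values on its own:

```python
import numpy as np
import pandas as pd

prices = pd.Series(['$ 19.99', np.nan, '$ 5.00'], dtype=object)  # made-up values

# without a condition, replace is called on the float NaN and raises
failed = False
try:
    prices.apply(lambda x: x.replace('$', ''))
except AttributeError:
    failed = True

# with the condition, NaN is passed through unchanged
cleaned = prices.apply(lambda x: x.replace('$', '').strip() if pd.notnull(x) else x)

# the .str methods return NaN for missing entries instead of raising
vectorized = prices.str.replace('$', '', regex=False).str.strip()
```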

#### Removing the dollar sign with a condition on another column

We will now remove the dollar sign from the price column only when the color column contains 'gray'.
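
One way to express a condition on another column is a row-wise apply (`axis=1`), where the lambda sees the whole row. This is a minimal sketch under stated assumptions: the rows are made up, and `color_name` is a hypothetical text column, since the columns shown above only hold a numeric `color_id`:

```python
import pandas as pd

# made-up rows; 'color_name' is a hypothetical column
products = pd.DataFrame({'price': ['$ 19.99', '$ 24.99', '$ 9.99'],
                         'color_name': ['Dark gray', 'Black', 'Light gray']})

# axis=1: the lambda receives one row at a time and can read any column
products['price'] = products.apply(
    lambda row: row['price'].replace('$', '').strip()
    if 'gray' in row['color_name'].lower()
    else row['price'],
    axis=1,
)
print(products['price'].tolist())  # ['19.99', '$ 24.99', '9.99']
```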

In [20]:
df

Unnamed: 0.1,Unnamed: 0,id,product_name,product_type,price,datetime,style_id,color_id,Fit,Composition,More sustainable materials,Size
0,0,1024256001,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,1024256,1,Slim fit,"Pocket lining: Polyester 65%, Cotton 35%",,"The model is 185cm/6'1"" and wears a size 31/32"
1,1,1024256001,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,1024256,1,Slim fit,"Shell: Cotton 99%, Spandex 1%",,"The model is 185cm/6'1"" and wears a size 31/32"
2,2,1024256001,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,1024256,1,Slim fit,"Pocket lining: Polyester 65%, Cotton 35%",,"The model is 185cm/6'1"" and wears a size 31/32"
3,3,1024256001,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,1024256,1,Slim fit,"Shell: Cotton 99%, Spandex 1%",,"The model is 185cm/6'1"" and wears a size 31/32"
4,4,1024256001,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,1024256,1,Slim fit,"Pocket lining: Polyester 65%, Cotton 35%",,"The model is 185cm/6'1"" and wears a size 31/32"
...,...,...,...,...,...,...,...,...,...,...,...,...
2001,2001,985197006,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,985197,6,Slim fit,"Shell: Cotton 99%, Spandex 1%",,"The model is 187cm/6'2"" and wears a size 31/32"
2002,2002,985197006,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,985197,6,Slim fit,"Pocket lining: Polyester 65%, Cotton 35%",,"The model is 187cm/6'2"" and wears a size 31/32"
2003,2003,985197006,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,985197,6,Slim fit,"Shell: Cotton 99%, Spandex 1%",,"The model is 187cm/6'2"" and wears a size 31/32"
2004,2004,985197006,Slim Jeans,men_jeans_slim,$ 19.99,2022-01-21 14:52:23,985197,6,Slim fit,"Pocket lining: Polyester 65%, Cotton 35%",,"The model is 187cm/6'2"" and wears a size 31/32"
