# Standardization and normalization

### StandardScaler
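- Rescales the data so that each feature has mean 0 and standard deviation 1
- Tends to handle outliers better than min-max scaling (although the mean and standard deviation are still sensitive to them) and makes convergence easier for many models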
<br><br>

### MinMaxScaler
- Normalizes the data to lie between a minimum and a maximum value.
- The default range is 0 to 1, but it can be changed with the `feature_range` parameter (default `feature_range=(0, 1)`)
<br><br>
### MaxAbsScaler
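- Scales each column by dividing all of its values by that column's maximum absolute value, so the results fall between -1 and 1
- Works very well with sparse data (mostly zeros), because it does not shift or center the values and zeros stay zero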

### RobustScaler
- Ideal when the data has many outliers
- Centers on the median and uses the interquartile range (Q3 - Q1) as the basis for rescaling
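
The four rescaling rules described above can be written out by hand and compared against scikit-learn. The sketch below is only an illustration: the array `x` is a hypothetical toy column (not the Titanic data), and each manual formula uses the default settings of the corresponding scaler.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Hypothetical toy column (one feature, five samples); 100.0 acts as an outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# StandardScaler: (x - mean) / std, using the population std (ddof=0).
standard = (x - x.mean()) / x.std()

# MinMaxScaler: (x - min) / (max - min), which maps to [0, 1] by default.
minmax = (x - x.min()) / (x.max() - x.min())

# MaxAbsScaler: x / max(|x|); there is no centering, so zeros stay zero.
maxabs = x / np.abs(x).max()

# RobustScaler: (x - median) / (Q3 - Q1).
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)

# Compare each manual formula with the scikit-learn result.
pairs = {
    "StandardScaler": (standard, StandardScaler()),
    "MinMaxScaler": (minmax, MinMaxScaler()),
    "MaxAbsScaler": (maxabs, MaxAbsScaler()),
    "RobustScaler": (robust, RobustScaler()),
}
matches = {
    name: bool(np.allclose(manual, scaler.fit_transform(x)))
    for name, (manual, scaler) in pairs.items()
}
print(matches)
```

In this toy column the outlier compresses the min-max result for the other four values into a narrow band near 0, while the robust result keeps them spread around 0.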

In [1]:
import pandas as pd

In [2]:
# Loading and previewing the dataset
titanic = pd.read_csv('train2.csv')
titanic.head(2)

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked,Titulos
0,0,3,male,22.0,1,0,7.25,S,Mr
1,1,1,female,38.0,1,0,71.2833,C,Mrs


### Summary statistics for this dataset

#### Pclass and Age are on very different scales, which can hurt the model

In [3]:
titanic.describe()

Unnamed: 0,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,891.0,891.0,891.0
mean,0.383838,2.308642,29.430535,0.523008,0.381594,32.204208
std,0.486592,0.836071,13.551396,1.102743,0.806057,49.693429
min,0.0,1.0,0.42,0.0,0.0,0.0
25%,0.0,2.0,21.0,0.0,0.0,7.9104
50%,0.0,3.0,28.724891,0.0,0.0,14.4542
75%,1.0,3.0,36.75,1.0,0.0,31.0
max,1.0,3.0,80.0,8.0,6.0,512.3292


# Using StandardScaler

In [4]:
from sklearn.preprocessing import StandardScaler

### Instantiating and fitting

In [5]:
scaler = StandardScaler()
scaler = scaler.fit(titanic[['Age']])

### Transforming the data and adding it to the DataFrame

In [6]:
titanic['Age_Scaler'] = scaler.transform(titanic[['Age']])

### Comparing the columns statistically
- The difference between them shows up in the mean, the standard deviation, and the other statistics

In [7]:
titanic[['Age','Age_Scaler']].describe()

Unnamed: 0,Age,Age_Scaler
count,891.0,891.0
mean,29.430535,-1.594933e-16
std,13.551396,1.000562
min,0.42,-2.141981
25%,21.0,-0.622465
50%,28.724891,-0.05210091
75%,36.75,0.5404297
max,80.0,3.733775
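
The standard deviation of `Age_Scaler` appears as 1.000562 rather than exactly 1. A likely explanation is that `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas `describe()` reports the sample standard deviation (`ddof=1`), so the two differ by a factor of sqrt(n / (n - 1)). The sketch below illustrates this on a small hypothetical sample, not on the Titanic data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample standing in for the Age column (not the Titanic data).
ages = pd.DataFrame({'Age': [22.0, 38.0, 26.0, 35.0, 54.0, 2.0]})
n = len(ages)

scaler = StandardScaler().fit(ages[['Age']])
scaled = pd.Series(scaler.transform(ages[['Age']]).ravel())

# Population std (ddof=0): this is what StandardScaler normalizes to 1.
pop_std = scaled.std(ddof=0)

# Sample std (ddof=1): the pandas default, which describe() reports.
sample_std = scaled.std()

# The two differ by sqrt(n / (n - 1)).
expected_ratio = np.sqrt(n / (n - 1))
print(pop_std, sample_std, expected_ratio)
```

For n = 891 that factor is sqrt(891 / 890) ≈ 1.000562, which matches the value shown in the table.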


# Using MinMaxScaler

In [8]:
from sklearn.preprocessing import MinMaxScaler

### Instantiating and fitting

In [9]:
scaler = MinMaxScaler(feature_range=(0,2))
scaler = scaler.fit(titanic[['Age']])

### Transforming the data and adding it to the DataFrame

In [10]:
titanic['Age_minmax'] = scaler.transform(titanic[['Age']])

### Viewing the data and comparing it with the previous scaler and the original values

In [11]:
titanic[['Age','Age_Scaler','Age_minmax']].describe()

Unnamed: 0,Age,Age_Scaler,Age_minmax
count,891.0,891.0,891.0
mean,29.430535,-1.594933e-16,0.729091
std,13.551396,1.000562,0.340573
min,0.42,-2.141981,0.0
25%,21.0,-0.622465,0.517215
50%,28.724891,-0.05210091,0.711357
75%,36.75,0.5404297,0.913043
max,80.0,3.733775,2.0
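
With `feature_range=(0, 2)`, each value is first mapped to [0, 1] and then stretched to the requested range. As a rough check, the sketch below applies that formula to the minimum, quartiles, and maximum of `Age` taken from the `describe()` output. It assumes that fitting on just these five values is equivalent here, because they include the same minimum and maximum as the full column.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Min, 25%, 50%, 75%, and max of Age, copied from the describe() output.
age = np.array([[0.42], [21.0], [28.724891], [36.75], [80.0]])
lo, hi = 0, 2  # same range as feature_range=(0, 2)

scaler = MinMaxScaler(feature_range=(lo, hi)).fit(age)

# Manual formula: scale to [0, 1], then stretch and shift to [lo, hi].
manual = (age - age.min()) / (age.max() - age.min()) * (hi - lo) + lo

print(np.round(manual.ravel(), 6))
print(np.allclose(manual, scaler.transform(age)))
```

The manual results line up with the `Age_minmax` column of the table (0.0, 0.517215, 0.711357, 0.913043, 2.0).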


# Using MaxAbsScaler

In [12]:
from sklearn.preprocessing import MaxAbsScaler

### Instantiating and fitting

In [13]:
scaler = MaxAbsScaler()
scaler = scaler.fit(titanic[['Age']])

### Transforming the data and adding it to the DataFrame

In [14]:
titanic['Age_maxabs'] = scaler.transform(titanic[['Age']])

### Viewing the data and comparing it with the previous scalers and the original values

In [15]:
titanic[['Age','Age_Scaler','Age_minmax','Age_maxabs']].describe()

Unnamed: 0,Age,Age_Scaler,Age_minmax,Age_maxabs
count,891.0,891.0,891.0,891.0
mean,29.430535,-1.594933e-16,0.729091,0.367882
std,13.551396,1.000562,0.340573,0.169392
min,0.42,-2.141981,0.0,0.00525
25%,21.0,-0.622465,0.517215,0.2625
50%,28.724891,-0.05210091,0.711357,0.359061
75%,36.75,0.5404297,0.913043,0.459375
max,80.0,3.733775,2.0,1.0
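
Because `MaxAbsScaler` only divides and never shifts the data, zeros stay zero, which is why it suits sparse input. The sketch below illustrates this with a small hypothetical SciPy sparse matrix (not part of the Titanic data): the output is still sparse and has the same number of stored non-zero entries.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical sparse matrix: mostly zeros, two feature columns.
X = sparse.csr_matrix(np.array([
    [0.0, -4.0],
    [2.0, 0.0],
    [0.0, 0.0],
    [8.0, 1.0],
]))

# Each column is divided by its maximum absolute value (8 and 4 here).
X_scaled = MaxAbsScaler().fit_transform(X)

print(sparse.issparse(X_scaled))  # the result stays in a sparse format
print(X.nnz, X_scaled.nnz)        # same number of stored non-zeros
print(X_scaled.toarray())
```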


# Using RobustScaler

In [16]:
from sklearn.preprocessing import RobustScaler

### Instantiating and fitting

In [17]:
scaler = RobustScaler()
scaler = scaler.fit(titanic[['Age']])

### Transforming the data and adding it to the DataFrame

In [18]:
titanic['Age_robust'] = scaler.transform(titanic[['Age']])

### Viewing the data and comparing it with the previous scalers and the original values

In [19]:
titanic[['Age','Age_Scaler','Age_minmax', "Age_robust"]].describe()

Unnamed: 0,Age,Age_Scaler,Age_minmax,Age_robust
count,891.0,891.0,891.0,891.0
mean,29.430535,-1.594933e-16,0.729091,0.044803
std,13.551396,1.000562,0.340573,0.860406
min,0.42,-2.141981,0.0,-1.797136
25%,21.0,-0.622465,0.517215,-0.490469
50%,28.724891,-0.05210091,0.711357,0.0
75%,36.75,0.5404297,0.913043,0.509531
max,80.0,3.733775,2.0,3.255562
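
With default settings, `RobustScaler` subtracts the median and divides by the interquartile range, which is why the 50% row of `Age_robust` is exactly 0. The sketch below reproduces the `Age_robust` column from the minimum, quartiles, and maximum of `Age` in the table. It assumes that fitting on these five values is equivalent here, since their median and quartiles coincide with those of the full column.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Min, 25%, 50%, 75%, and max of Age, copied from the describe() output.
age = np.array([[0.42], [21.0], [28.724891], [36.75], [80.0]])

scaler = RobustScaler().fit(age)

# Manual formula: (x - median) / (Q3 - Q1).
q1, median, q3 = np.percentile(age, [25, 50, 75])
manual = (age - median) / (q3 - q1)

print(scaler.center_, scaler.scale_)  # fitted median and interquartile range
print(np.round(manual.ravel(), 6))
print(np.allclose(manual, scaler.transform(age)))
```

The manual results agree with the `Age_robust` column (-1.797136, -0.490469, 0.0, 0.509531, 3.255562).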
