This notebook is based on the article [All about Feature Scaling](https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35) by Baijayanta Roy, published in Towards Data Science.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # pyplot is the plotting module conventionally aliased as plt
#%matplotlib inline

In [2]:
df = pd.DataFrame({'weight': [15, 18, 12, 10],
                   'price': [1, 3, 2, 5]},
                  index=['Orange', 'Apple', 'Banana', 'Grape'])
df

|        | weight | price |
|--------|--------|-------|
| Orange | 15     | 1     |
| Apple  | 18     | 3     |
| Banana | 12     | 2     |
| Grape  | 10     | 5     |


In [3]:
weight_range = df.weight.max() - df.weight.min()
price_range = df.price.max() - df.price.min()
weight_range,price_range

(8, 4)

Note that the weight and price columns have different ranges (8 and 4, respectively).
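Different ranges matter mostly for methods that combine features, such as Euclidean distance: the column with the wider spread dominates. The sketch below is a minimal illustration using only NumPy and the four fruits above; `feature_share_of_distance` is a hypothetical helper written for this example, not a library function. It measures what fraction of the summed squared pairwise distance each column contributes on the raw data.

```python
import numpy as np

# Rows: Orange, Apple, Banana, Grape; columns: weight, price (same data as df)
X = np.array([[15, 1], [18, 3], [12, 2], [10, 5]], dtype=float)

def feature_share_of_distance(data):
    """Hypothetical helper: fraction of the summed squared Euclidean distance
    (over all unordered row pairs) contributed by each column."""
    i, j = np.triu_indices(len(data), k=1)   # every unordered pair of rows
    sq_diff = (data[i] - data[j]) ** 2       # shape: (n_pairs, n_features)
    per_feature = sq_diff.sum(axis=0)
    return per_feature / per_feature.sum()

share = feature_share_of_distance(X)
print(share.round(3))  # → [0.808 0.192]
```

On the unscaled data, weight accounts for roughly 81% of the total squared distance, so a distance-based model would largely ignore price.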

### Min-Max Scaler

In [4]:
from sklearn.preprocessing import MinMaxScaler

In [5]:
scaler = MinMaxScaler()

In [6]:
# fit_transform returns a NumPy array, so restore the original labels
df_1 = pd.DataFrame(scaler.fit_transform(df),
                    columns=df.columns,
                    index=df.index)
display(df, df_1)

|        | weight | price |
|--------|--------|-------|
| Orange | 15     | 1     |
| Apple  | 18     | 3     |
| Banana | 12     | 2     |
| Grape  | 10     | 5     |


|        | weight | price |
|--------|--------|-------|
| Orange | 0.625  | 0.0   |
| Apple  | 1.0    | 0.5   |
| Banana | 0.25   | 0.25  |
| Grape  | 0.0    | 1.0   |


In [7]:
weight_range = df_1.weight.max() - df_1.weight.min()
price_range = df_1.price.max() - df_1.price.min()
weight_range,price_range

(1.0, 1.0)

After scaling, both columns have the same range (1.0).
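Min-max scaling maps each column to [0, 1] with x' = (x - min) / (max - min), so every column's range becomes exactly 1. A minimal NumPy sketch of that formula follows; `min_max_scale` is a hypothetical name, and unlike `MinMaxScaler` it does not guard against constant columns (where max equals min).

```python
import numpy as np

# Rows: Orange, Apple, Banana, Grape; columns: weight, price
X = np.array([[15, 1], [18, 3], [12, 2], [10, 5]], dtype=float)

def min_max_scale(data):
    """Rescale each column to [0, 1] via (x - min) / (max - min).
    Assumes no column is constant (max > min)."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)

X_scaled = min_max_scale(X)
print(X_scaled)
print(X_scaled.max(axis=0) - X_scaled.min(axis=0))  # → [1. 1.]
```

For this data the result should match the `df_1` table above, e.g. Orange's weight is (15 - 10) / (18 - 10) = 0.625.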

### Max Abs Scaler

In [8]:
from sklearn.preprocessing import MaxAbsScaler

In [9]:
scaler = MaxAbsScaler()

In [10]:
# Restore the original column names and index on the scaled array
df_2 = pd.DataFrame(scaler.fit_transform(df),
                    columns=df.columns,
                    index=df.index)
display(df, df_2)

|        | weight | price |
|--------|--------|-------|
| Orange | 15     | 1     |
| Apple  | 18     | 3     |
| Banana | 12     | 2     |
| Grape  | 10     | 5     |


|        | weight   | price |
|--------|----------|-------|
| Orange | 0.833333 | 0.2   |
| Apple  | 1.0      | 0.6   |
| Banana | 0.666667 | 0.4   |
| Grape  | 0.555556 | 1.0   |


In [11]:
weight_range = df_2.weight.max() - df_2.weight.min()
price_range = df_2.price.max() - df_2.price.min()
weight_range,price_range

(0.4444444444444444, 0.8)
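`MaxAbsScaler` divides each column by its largest absolute value, x' = x / max|x|, without shifting the data. With all-positive columns like these, the smallest value is not mapped to 0, which is why the two ranges still differ: (18 - 10) / 18 ≈ 0.444 for weight and (5 - 1) / 5 = 0.8 for price. A minimal NumPy sketch of the formula; `max_abs_scale` is a hypothetical name and assumes no column is all zeros.

```python
import numpy as np

# Rows: Orange, Apple, Banana, Grape; columns: weight, price
X = np.array([[15, 1], [18, 3], [12, 2], [10, 5]], dtype=float)

def max_abs_scale(data):
    """Divide each column by its maximum absolute value (no centering).
    Assumes no column is all zeros."""
    return data / np.abs(data).max(axis=0)

X_scaled = max_abs_scale(X)
print(X_scaled.round(6))
# For all-positive columns, range = (max - min) / max
print(X_scaled.max(axis=0) - X_scaled.min(axis=0))
```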

### Robust Scaler

Let's add an outlier to the dataset.

In [12]:
dfr = pd.DataFrame({'WEIGHT': [15, 18, 12, 10, 50],
                    'PRICE': [1, 3, 2, 5, 20]},
                   index=['Orange', 'Apple', 'Banana', 'Grape', 'Jackfruit'])
dfr

|           | WEIGHT | PRICE |
|-----------|--------|-------|
| Orange    | 15     | 1     |
| Apple     | 18     | 3     |
| Banana    | 12     | 2     |
| Grape     | 10     | 5     |
| Jackfruit | 50     | 20    |


In [13]:
from sklearn.preprocessing import RobustScaler

In [14]:
scaler = RobustScaler()

In [15]:
df_3 = pd.DataFrame(scaler.fit_transform(dfr),
                    columns=['weight', 'price'],
                    index=dfr.index)
# Compare against dfr (the data that was actually scaled), not df
display(dfr, df_3)

|           | WEIGHT | PRICE |
|-----------|--------|-------|
| Orange    | 15     | 1     |
| Apple     | 18     | 3     |
| Banana    | 12     | 2     |
| Grape     | 10     | 5     |
| Jackfruit | 50     | 20    |


|           | weight    | price     |
|-----------|-----------|-----------|
| Orange    | 0.0       | -0.666667 |
| Apple     | 0.5       | 0.0       |
| Banana    | -0.5      | -0.333333 |
| Grape     | -0.833333 | 0.666667  |
| Jackfruit | 5.833333  | 5.666667  |


In [16]:
weight_range = df_3.weight.max() - df_3.weight.min()
price_range = df_3.price.max() - df_3.price.min()
weight_range,price_range

(6.666666666666666, 6.333333333333334)
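`RobustScaler` centers each column on its median and divides by its interquartile range (IQR, Q3 - Q1, with the default `quantile_range=(25.0, 75.0)`), so a single extreme value barely moves the statistics used for scaling. The NumPy sketch below reproduces that formula for `dfr`; `robust_scale` is a hypothetical name, and it assumes NumPy's default linear interpolation in `np.percentile`, which should match the table above for this data. For contrast, it also applies min-max scaling to the weight column, where the outlier takes the top of [0, 1] and squeezes the other four fruits into [0, 0.2].

```python
import numpy as np

# dfr data: Orange, Apple, Banana, Grape, Jackfruit; columns: WEIGHT, PRICE
X = np.array([[15, 1], [18, 3], [12, 2], [10, 5], [50, 20]], dtype=float)

def robust_scale(data, q_low=25.0, q_high=75.0):
    """Center each column on its median and divide by its interquartile range.
    Assumes the IQR of every column is nonzero."""
    median = np.median(data, axis=0)
    q1, q3 = np.percentile(data, [q_low, q_high], axis=0)
    return (data - median) / (q3 - q1)

robust = robust_scale(X)
print(robust.round(6))

# Min-max on the same weight column: Jackfruit (50) becomes 1.0
w = X[:, 0]
minmax_weight = (w - w.min()) / (w.max() - w.min())
print(minmax_weight)
```

Here the weight median is 15 and its IQR is 18 - 12 = 6, so the regular fruits stay within about ±1 while Jackfruit lands at (50 - 15) / 6 ≈ 5.83, clearly flagged as an outlier.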