## Feature Scaling

Feature scaling normalizes or standardizes numerical features so that they share a similar scale. Without it, a feature with a large numeric range (for example, a salary in the tens of thousands next to an age in the tens) can dominate the others and bias the model.


Additional reasons for transformation:

1. To approximate more closely a theoretical distribution that has nice statistical properties.
2. To spread the data out more evenly.
3. To make the data distribution more symmetric.
4. To make relationships between variables more linear.
5. To make the variance of the data more constant (homoscedasticity).
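
As a small illustration of reasons 2 and 3, the sketch below applies a log transformation to a right-skewed sample and compares a moment-based skewness before and after. The sample values and the `skewness` helper are assumptions made for this example; they are not part of the dataset used later.

```python
import numpy as np


def skewness(values):
    """Moment-based skewness: mean of the cubed z-scores (0 for symmetric data)."""
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))


# Hypothetical right-skewed sample: most values are small, one is very large.
raw = np.array([1, 2, 2, 3, 3, 4, 5, 8, 20, 100], dtype=np.float64)
logged = np.log(raw)  # valid because every value is strictly positive

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

The skewness after the transformation is closer to 0, which means the transformed sample is more symmetric than the raw one.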

#### There are four commonly used ways to scale features.
1. __Min Max Scaling__: 
Scales each feature to the range [0, 1], so that its minimum maps to 0 and its maximum maps to 1. This is useful when all features have to be on the same positive scale. However, min-max scaling is sensitive to outliers: a single extreme value sets the range and compresses the remaining values into a narrow interval.
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

2. __Standardization__:
Scales each feature to have a mean of 0 and a variance of 1, where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.
$$X_{stand} = \frac{X - \mu}{\sigma}$$

3. __Normalizing__: 
Scales each sample (row) to have a norm of 1. For instance, with 3 independent variables every sample will lie on the unit sphere.

4. __Log Transformation__:
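Takes the logarithm of the data, which compresses large values and reduces right skew. The logarithm is only defined for strictly positive values, so it cannot be applied directly to standardized data (which contains negative values) or to min-max scaled data (which contains 0); it is usually applied to the raw values, optionally followed by one of the scalings above.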
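
The four transformations above can be written out by hand in NumPy, which makes it explicit which ones work per feature (column) and which per sample (row). This is a minimal sketch on a made-up array `X_toy`; the variable names are assumptions for illustration, and the log is applied to the raw positive values here.

```python
import numpy as np

# Hypothetical data: 4 samples (rows) x 2 strictly positive features (columns).
X_toy = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0],
                  [4.0, 800.0]])

# 1. Min-max scaling, per feature: (X - X_min) / (X_max - X_min)
X_min, X_max = X_toy.min(axis=0), X_toy.max(axis=0)
X_norm = (X_toy - X_min) / (X_max - X_min)

# 2. Standardization, per feature: (X - mu) / sigma
mu, sigma = X_toy.mean(axis=0), X_toy.std(axis=0)
X_stand = (X_toy - mu) / sigma

# 3. Normalizing, per sample: divide each row by its Euclidean (L2) norm
row_norms = np.linalg.norm(X_toy, axis=1, keepdims=True)
X_unit = X_toy / row_norms

# 4. Log transformation (requires strictly positive values)
X_log = np.log(X_toy)

for name, values in [("min-max", X_norm), ("standardized", X_stand),
                     ("unit norm", X_unit), ("log", X_log)]:
    print(name)
    print(values)
```

Note that `np.std` uses the population standard deviation by default, which is also what `StandardScaler` uses.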

Scaling inputs to unit norms is a common operation for text classification or clustering for instance.
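
One reason unit norms are convenient for text is that, once every row has norm 1, a plain dot product between two rows equals their cosine similarity, so document length no longer matters. The sketch below shows this on made-up term-count vectors; the counts are an assumption for illustration only.

```python
import numpy as np

# Hypothetical term-count vectors for three short documents (rows).
counts = np.array([[2.0, 0.0, 1.0],
                   [4.0, 0.0, 2.0],   # same word proportions as the first, twice as long
                   [0.0, 3.0, 0.0]])  # shares no words with the other two

# Scale every row to unit Euclidean (L2) norm.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)

# For unit-norm rows, the matrix of dot products is the cosine-similarity matrix.
cosine = unit @ unit.T
print(np.round(cosine, 3))
```

The first two documents differ only in length, so their similarity is 1, while the third shares no terms with them and has similarity 0.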

For most applications, standardization is recommended. Min-max scaling is often recommended for neural networks, and normalizing is recommended when clustering, e.g. with KMeans.
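
Outliers are worth checking before choosing between these options. The sketch below scales a made-up column containing one extreme value and measures how much room the remaining values are left with; the data and variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical feature column: four typical values and one outlier.
column = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max scaling: the outlier defines the maximum of the range.
minmax = (column - column.min()) / (column.max() - column.min())

# Standardization: the outlier inflates both the mean and the standard deviation.
standardized = (column - column.mean()) / column.std()

# Spread of the four typical values after each scaling.
typical_spread_minmax = minmax[:4].max() - minmax[:4].min()
typical_spread_standardized = standardized[:4].max() - standardized[:4].min()

print(np.round(minmax, 3))
print(np.round(standardized, 3))
print(typical_spread_minmax, typical_spread_standardized)
```

After min-max scaling, the four typical values occupy only about 3% of the [0, 1] range. Standardization does not bound the output, but the outlier still shrinks the spread of the typical values.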

In [1]:
import pandas as pd
import numpy as np

from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler

df = pd.read_csv('Data.csv').dropna()
print(df)
X = df[["Age", "Salary"]].values.astype(np.float64)

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
5   France  35.0  58000.0       Yes
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes


**Exercise**: apply the [`StandardScaler()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html), [`Normalizer()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html), and [`MinMaxScaler()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) transformers to the data `X`.

In [4]:
# fit_transform returns a new, transformed array; X itself is left unchanged,
# so the returned arrays are the ones to print.
standard_scaler = StandardScaler()
normalizer = Normalizer()
min_max_scaler = MinMaxScaler()

print("Standardization")
X_standardized = standard_scaler.fit_transform(X)  # each column: mean 0, variance 1
print(X_standardized)

print("Normalizing")
X_normalized = normalizer.fit_transform(X)  # each row: unit Euclidean norm
print(X_normalized)

print("MinMax Scaling")
X_minmax = min_max_scaler.fit_transform(X)  # each column: range [0, 1]
print(X_minmax)
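
One way to check the results of the exercise is to verify the property each scaler is meant to guarantee. The sketch below is self-contained: it hard-codes the `Age` and `Salary` values from the table printed earlier instead of reading `Data.csv`, and it only uses the three scikit-learn classes already imported in this notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler

# Age and Salary columns copied from the DataFrame printed earlier.
X = np.array([[44, 72000], [27, 48000], [30, 54000], [38, 61000],
              [35, 58000], [48, 79000], [50, 83000], [37, 67000]],
             dtype=np.float64)

X_standardized = StandardScaler().fit_transform(X)
X_normalized = Normalizer().fit_transform(X)
X_minmax = MinMaxScaler().fit_transform(X)

# Standardization: every column has mean ~0 and standard deviation ~1.
print(X_standardized.mean(axis=0), X_standardized.std(axis=0))

# Normalizing: every row has Euclidean norm 1.
row_norms = np.linalg.norm(X_normalized, axis=1)
print(row_norms)

# Min-max scaling: every column spans exactly [0, 1].
print(X_minmax.min(axis=0), X_minmax.max(axis=0))
```

Because `Normalizer` works row by row and `Salary` is about three orders of magnitude larger than `Age`, every normalized row ends up very close to (0, 1). This suggests that normalizing rows is a poor fit for columns measured in very different units, unless the columns are rescaled first.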
