# Feature Scaling

- Feature scaling refers to the techniques used to normalise the range of the independent variables in our data, in other words, to bring the feature values onto a similar scale.
- Variables with a larger magnitude or value range dominate those with a smaller magnitude or value range.
- The scale of the features is therefore an important consideration when building machine learning models.
- Feature scaling is generally the last step in the data preprocessing pipeline, performed just before training the machine learning algorithm.
- Linear scaling methods such as standardisation preserve the shape of the original distribution.
- After standardisation, the minimum and maximum values of the different variables may still vary.
- Outliers are preserved: a value that is an outlier before scaling is still an outlier afterwards.
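
To make the point about large-magnitude variables concrete, here is a minimal sketch using made-up data (the ages and incomes below are hypothetical, not taken from any data set in this notebook). It compares Euclidean distances before and after standardising each column.

```python
import numpy as np

# Hypothetical data: column 0 is age in years, column 1 is income.
X = np.array([
    [25.0, 50_000.0],
    [60.0, 51_000.0],
    [26.0, 58_000.0],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances from row 0: income differences swamp age differences.
d_raw_01 = euclidean(X[0], X[1])  # age differs by 35, income by 1,000
d_raw_02 = euclidean(X[0], X[2])  # age differs by 1, income by 8,000

# Standardise each column (population std, ddof=0, the NumPy default).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, both features contribute on a comparable footing.
d_std_01 = euclidean(X_std[0], X_std[1])
d_std_02 = euclidean(X_std[0], X_std[2])

print("raw:   ", d_raw_01, d_raw_02)
print("scaled:", d_std_01, d_std_02)
```

On the raw data the distance is essentially the income difference, so the 35-year age gap hardly registers. After scaling, the two distances are of similar size because age and income now carry comparable weight.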

## Importance of Feature Scaling in some ML algorithms.
- Gradient descent converges faster when features are on similar scales.
- Support Vector Machines (SVM).
- K-means clustering.
- Principal Component Analysis (PCA).

## Various Feature Scaling Techniques. 

- Standardisation.
- Normalisation (min-max scaling).

## Standardisation.

- Standardisation involves centering the variable mean at zero, and standardising the variance to 1.

#### z = (x - x_mean) / x_std

#### Standardisation:
- Centers the mean at 0.
- Scales the variance to 1.
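
Because standardisation only shifts and rescales the values, it should leave the shape of the distribution and the relative position of outliers unchanged. The sketch below checks this on a small hypothetical sample (the numbers are made up for illustration), using the population standard deviation (`ddof=0`) as an assumption.

```python
import pandas as pd

# Hypothetical right-skewed sample with one large outlier (50.0).
s = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 50.0])

# Standardise: subtract the mean, divide by the population std (ddof=0).
z = (s - s.mean()) / s.std(ddof=0)

# Skewness is unaffected by shifting and positive rescaling,
# so the shape of the distribution is preserved.
skew_before = s.skew()
skew_after = z.skew()

print("mean after:", z.mean())           # approximately 0
print("std after: ", z.std(ddof=0))      # approximately 1
print("skewness:  ", skew_before, skew_after)
print("largest value is still at index", z.idxmax())
```

The standardised series has mean 0 and unit variance, yet its skewness equals that of the original, and the outlier is still the most extreme value.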

In [32]:
## Worked example: how the standard scaler works, step by step.
import pandas as pd

ex = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
ex

Unnamed: 0,x
0,1
1,2
2,3
3,4
4,5


In [33]:
ex["x"].mean()

3.0

In [34]:
ex["x"].std()

1.5811388300841898

In [38]:
ex["x_sc"] = (ex["x"] - ex["x"].mean()) / ex["x"].std()

## x_sc = (x - x_mean) / x_std, where pandas' .std() is the sample standard deviation (ddof=1).

ex

Unnamed: 0,x,x_sc
0,1,-1.264911
1,2,-0.632456
2,3,0.0
3,4,0.632456
4,5,1.264911


In [40]:
## All of the above can be done in a couple of lines using sklearn.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

ex["x_sc_sk"] = sc.fit_transform(ex[["x"]])

ex

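## The output values of x_sc and x_sc_sk do not match because pandas' .std() uses the sample standard deviation (ddof=1, dividing by n - 1), whereas sklearn's StandardScaler uses the population standard deviation (ddof=0, dividing by n). The mean is the same in both cases.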

Unnamed: 0,x,x_sc,x_sc_sk
0,1,-1.264911,-1.414214
1,2,-0.632456,-0.707107
2,3,0.0,0.0
3,4,0.632456,0.707107
4,5,1.264911,1.414214
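
The gap between the manual result and the sklearn result comes down to the `ddof` argument of the standard deviation. As a minimal sketch, the manual calculation can be repeated with `ddof=0` to reproduce the `StandardScaler` output (the column name `x_sc_pop` is introduced here only for illustration):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

ex = pd.DataFrame({"x": [1, 2, 3, 4, 5]})

# pandas defaults to the sample standard deviation (ddof=1).
sample_std = ex["x"].std()
# StandardScaler uses the population standard deviation (ddof=0).
population_std = ex["x"].std(ddof=0)

# Manual standardisation with the population standard deviation.
ex["x_sc_pop"] = (ex["x"] - ex["x"].mean()) / population_std

# sklearn returns an (n, 1) array; ravel() flattens it to one column.
ex["x_sc_sk"] = StandardScaler().fit_transform(ex[["x"]]).ravel()

print(sample_std, population_std)
print(ex)
```

With `ddof=0` the two columns agree. For large data sets the difference between the two conventions is negligible, but on five rows it is clearly visible.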


In [47]:
### Example with a data set.
import pandas as pd

In [48]:
df = pd.read_csv("titanic.csv")

df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [49]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [50]:
df["Age"]

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: Age, Length: 891, dtype: float64

In [51]:
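## Fill missing ages with the median. Assigning the result back avoids the
## chained in-place fillna, which recent pandas versions warn about.
df["Age"] = df["Age"].fillna(df["Age"].median())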

In [52]:
### Standardisation: we use StandardScaler from the sklearn library to do all of the above in one shot.

from sklearn.preprocessing import StandardScaler

### Create the scaler.

sc = StandardScaler()

### Fit the scaler on Age and transform it in a single step.

df['Age_sc'] = sc.fit_transform(df[["Age"]])

df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_sc
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S,-0.565736
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,0.663861
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S,-0.258337
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,0.433312
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,0.433312
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,-0.181487
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,-0.796286
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,28.0,1,2,W./C. 6607,23.4500,,S,-0.104637
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,-0.258337


## Min Max Scaling (CNN -- Deep Learning Techniques)
- Min Max Scaling scales the values to the range 0 to 1.

#### X_scaled = (X - X.min) / (X.max - X.min)

In [53]:
### Normalisation: we use MinMaxScaler from the sklearn library.

from sklearn.preprocessing import MinMaxScaler

## Create the scaler.
min_max = MinMaxScaler()

## Fit the scaler on Age and transform it in a single step.
df["Age_mm"] = min_max.fit_transform(df[["Age"]])

df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Age_sc,Age_mm
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,-0.565736,0.271174
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,0.663861,0.472229
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,-0.258337,0.321438
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,0.433312,0.434531
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,0.433312,0.434531
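
As a cross-check, the min-max formula can be applied by hand and compared with `MinMaxScaler`. The sketch below uses a small hypothetical `Age` column rather than the Titanic file, so the values are chosen only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages, chosen only for illustration.
ages = pd.DataFrame({"Age": [0.42, 22.0, 38.0, 80.0]})

# Manual min-max scaling: (X - X.min) / (X.max - X.min)
age_min = ages["Age"].min()
age_max = ages["Age"].max()
manual = (ages["Age"] - age_min) / (age_max - age_min)

# The same transformation with sklearn (default feature_range is (0, 1)).
sk = MinMaxScaler().fit_transform(ages[["Age"]]).ravel()

print(manual.tolist())
print(sk.tolist())
```

The smallest value maps to 0, the largest to 1, and everything else falls in between. Because the result depends directly on the minimum and maximum, a single extreme value will compress the remaining values into a narrow band.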
