# SCALING 

Feature scaling is required because:
1.	Regression coefficients are directly influenced by the scale of the features.
2.	Features with a larger scale dominate features with a smaller scale.
3.	Gradient descent converges faster when the features are scaled.
4.	Some algorithms run faster when the features are scaled.
5.	Some algorithms are based on Euclidean distances, which are very sensitive to feature scales.
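To make point 5 concrete, here is a small sketch with made-up (age, income) values; the points `a`, `b`, `c` and the helper `euclidean` are hypothetical and only illustrate how a large-scale feature can decide which point is "nearest".

```python
import numpy as np

# Hypothetical points described by (age in years, income in dollars).
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])  # very different age, similar income
c = np.array([26.0, 58_000.0])  # similar age, different income


def euclidean(p, q):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((p - q) ** 2)))


# Unscaled: the income column dominates the distance.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Min-max normalize each column over these three points.
data = np.vstack([a, b, c])
scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print("raw:    nearest to a is", "b" if raw_ab < raw_ac else "c")
print("scaled: nearest to a is", "b" if scaled_ab < scaled_ac else "c")
```

Without scaling, `b` looks nearest to `a` even though their ages differ by 35 years; after scaling, both features contribute on a comparable scale and `c` becomes the nearest point.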


We can use different scaling techniques to scale the input dataset, for example one of the following:

1.	Normalization (very sensitive to outliers)
2.	Standardization (less sensitive to outliers)

## 1. Normalization

![1_14ReN_YksSci2ZX5Q2q4tg.png](attachment:1_14ReN_YksSci2ZX5Q2q4tg.png)

In the above equation:

* Xmax and Xmin are the maximum and minimum values of the feature column.
* X always lies between the minimum and maximum values, so the normalized value always lies between 0 and 1.
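For reference, the min-max formula written out, assuming the image shows the standard form implied by the Xmax and Xmin definitions. The worked line uses the first TV value (230.1) and assumes the TV column of Advertising.csv ranges from 0.7 to 296.4; those bounds are not printed in this notebook, but they reproduce the value 0.775786 in the output of the `MinMaxScaler` cell.

```latex
X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}},
\qquad X_{\min} \le X \le X_{\max} \;\Rightarrow\; 0 \le X_{\text{norm}} \le 1

\text{TV, row 0: } \frac{230.1 - 0.7}{296.4 - 0.7} = \frac{229.4}{295.7} \approx 0.7758
```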

In [2]:
import pandas as pd
import numpy as np

dataset = pd.read_csv("Advertising.csv")
dataset.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [3]:
X = dataset.iloc[:, :].values
X

array([[230.1,  37.8,  69.2,  22.1],
       [ 44.5,  39.3,  45.1,  10.4],
       [ 17.2,  45.9,  69.3,   9.3],
       [151.5,  41.3,  58.5,  18.5],
       [180.8,  10.8,  58.4,  12.9],
       [  8.7,  48.9,  75. ,   7.2],
       [ 57.5,  32.8,  23.5,  11.8],
       [120.2,  19.6,  11.6,  13.2],
       [  8.6,   2.1,   1. ,   4.8],
       [199.8,   2.6,  21.2,  10.6],
       [ 66.1,   5.8,  24.2,   8.6],
       [214.7,  24. ,   4. ,  17.4],
       [ 23.8,  35.1,  65.9,   9.2],
       [ 97.5,   7.6,   7.2,   9.7],
       [204.1,  32.9,  46. ,  19. ],
       [195.4,  47.7,  52.9,  22.4],
       [ 67.8,  36.6, 114. ,  12.5],
       [281.4,  39.6,  55.8,  24.4],
       [ 69.2,  20.5,  18.3,  11.3],
       [147.3,  23.9,  19.1,  14.6],
       [218.4,  27.7,  53.4,  18. ],
       [237.4,   5.1,  23.5,  12.5],
       [ 13.2,  15.9,  49.6,   5.6],
       [228.3,  16.9,  26.2,  15.5],
       [ 62.3,  12.6,  18.3,   9.7],
       [262.9,   3.5,  19.5,  12. ],
       [142.9,  29.3,  12.6,  15. ],
 

In [4]:
from sklearn.preprocessing import MinMaxScaler
norm = MinMaxScaler()

X = norm.fit_transform(X)
pd.DataFrame(X, columns=dataset.columns).head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 0.775786 | 0.762097 | 0.605981 | 0.807087 |
| 1 | 0.148123 | 0.792339 | 0.394019 | 0.346457 |
| 2 | 0.0558 | 0.925403 | 0.60686 | 0.30315 |
| 3 | 0.509976 | 0.832661 | 0.511873 | 0.665354 |
| 4 | 0.609063 | 0.217742 | 0.510994 | 0.444882 |

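The same column-wise formula can be written by hand with NumPy, which makes it clear what `MinMaxScaler` computes with its default (0, 1) range. This is a sketch, not sklearn's code: `min_max_normalize` is a hypothetical helper, it expects a 2-D array, and mapping constant columns to 0 is a choice made here to avoid dividing by zero.

```python
import numpy as np


def min_max_normalize(X):
    """Column-wise (X - Xmin) / (Xmax - Xmin) for a 2-D array.

    Constant columns (Xmax == Xmin) get a range of 1, so they map to 0
    instead of producing a division by zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero
    return (X - col_min) / col_range


# First three rows of the Advertising data shown above (TV, radio, newspaper, sales).
sample = np.array([
    [230.1, 37.8, 69.2, 22.1],
    [44.5, 39.3, 45.1, 10.4],
    [17.2, 45.9, 69.3, 9.3],
])
result = min_max_normalize(sample)
print(np.round(result, 4))
```

Because the minimum and maximum are taken over only these three rows, the numbers differ from the full-dataset output above; each column still spans exactly 0 to 1.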

## 2. Standardization

![1_YRWWEdutwJ8i-n7VmgxrKg.png](attachment:1_YRWWEdutwJ8i-n7VmgxrKg.png)

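* Standardization rescales each feature using its mean and standard deviation, so the standardized feature has mean 0 and standard deviation 1.
* The standardized values are not bounded; for roughly normally distributed data, about 99.7% of them fall between -3 and 3.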
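Written out, assuming the image shows the usual z-score formula, with the standard deviation taken as the population form (dividing by n), which is the convention sklearn's `StandardScaler` uses. The second line is the normal-distribution fact behind the "-3 to 3" rule.

```latex
z = \frac{X - \mu}{\sigma},
\qquad \mu = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu)^2}

P(-3 \le Z \le 3) \approx 0.997 \quad \text{for } Z \sim \mathcal{N}(0, 1)
```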
    
Algorithms SENSITIVE to feature scaling:

1. Linear and Logistic Regression
2. Neural Networks
3. Support Vector Machines
4. K-Means Clustering
5. K-Nearest Neighbors
6. Principal Component Analysis

Algorithms INSENSITIVE to feature scaling:

1. Classification and Regression Trees
2. Random Forest Regression
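One way to see why trees are insensitive: a threshold split depends only on the ordering of values within a feature, and positive rescaling (min-max or standardization) preserves that ordering. The sketch below uses a hypothetical brute-force helper, `best_split_partition`, that picks the single threshold minimizing squared error in the style of a CART regression split; it is not sklearn's implementation, and the data are made up.

```python
import numpy as np


def best_split_partition(x, y):
    """Boolean mask of samples sent left by the best single threshold on x,
    chosen to minimize the total squared error of the two children."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_sse, best_threshold = np.inf, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbors
    return x <= best_threshold


# Hypothetical feature and target values, for illustration only.
x = np.array([5.0, 120.0, 40.0, 300.0, 15.0, 250.0, 80.0])
y = np.array([1.0, 3.0, 2.0, 6.0, 1.5, 5.5, 2.5])

x_minmax = (x - x.min()) / (x.max() - x.min())  # normalization
x_standard = (x - x.mean()) / x.std()           # standardization

raw_mask = best_split_partition(x, y)
print(raw_mask)
print(np.array_equal(raw_mask, best_split_partition(x_minmax, y)))
print(np.array_equal(raw_mask, best_split_partition(x_standard, y)))
```

The same samples land in the same child whether the feature is raw, normalized, or standardized; only the numeric threshold changes.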


In [5]:
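from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Start from the raw values: X was overwritten with normalized data in In [4]
X_std = scaler.fit_transform(dataset.values)
pd.DataFrame(X_std, columns=dataset.columns).head()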

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 0.969852 | 0.981522 | 1.778945 | 1.552053 |
| 1 | -1.197376 | 1.082808 | 0.669579 | -0.696046 |
| 2 | -1.516155 | 1.528463 | 1.783549 | -0.907406 |
| 3 | 0.05205 | 1.217855 | 1.286405 | 0.86033 |
| 4 | 0.394182 | -0.841614 | 1.281802 | -0.215683 |

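A hand-written z-score makes the `StandardScaler` computation explicit. This is a sketch with a hypothetical helper, `standardize`, using the population standard deviation (ddof=0), the same convention as `StandardScaler`. It also checks that standardizing min-max-normalized data gives the same z-scores as standardizing the raw data, because z-scores cancel any positive rescaling and shift of a column.

```python
import numpy as np


def standardize(X):
    """Column-wise z-scores (X - mean) / std for a 2-D array.

    Uses the population standard deviation (ddof=0). Constant columns get
    std 1 so they map to 0 instead of producing a division by zero.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # NumPy's default ddof=0
    std[std == 0] = 1.0  # avoid division by zero
    return (X - mean) / std


# First five rows of the Advertising data shown earlier (TV, radio, newspaper, sales).
sample = np.array([
    [230.1, 37.8, 69.2, 22.1],
    [44.5, 39.3, 45.1, 10.4],
    [17.2, 45.9, 69.3, 9.3],
    [151.5, 41.3, 58.5, 18.5],
    [180.8, 10.8, 58.4, 12.9],
])
z = standardize(sample)

# Min-max normalize first, then standardize: the z-scores are unchanged.
normalized = (sample - sample.min(axis=0)) / (sample.max(axis=0) - sample.min(axis=0))

print(np.round(z, 3))
print(np.allclose(standardize(normalized), z))
```

The z-scores here are computed from only five rows, so they differ from the full-dataset output above; each column still has mean 0 and standard deviation 1.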