### Author : Sanjoy Biswas
### Topic : Feature Scaling : Min-Max Scaler | Standardization
### Email : sanjoy.eee32@gmail.com

#### Feature Scaling
Feature scaling is a technique for bringing the independent features in a dataset onto a comparable, fixed range. It is performed during data pre-processing to handle features whose magnitudes, values, or units vary widely. Without feature scaling, a machine learning algorithm tends to give more weight to features with larger numeric values and less weight to features with smaller values, regardless of the units involved.

Example: Without feature scaling, an algorithm can treat 3000 meters as greater than 5 km simply because 3000 is the larger number. That is not true, and the algorithm can give wrong predictions as a result. Feature scaling brings all values to the same magnitude and so addresses this issue.
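To make the unit problem concrete, here is a minimal sketch in plain Python. The conversion table `TO_METERS` and the helper `to_meters` are hypothetical names introduced only for this illustration.

```python
# Raw magnitudes with the units ignored: 3000 (meters) vs 5 (kilometers).
raw_a, raw_b = 3000, 5
print(raw_a > raw_b)  # compares bare numbers, so 3000 looks "bigger"

# Hypothetical lookup table: how many meters one unit represents.
TO_METERS = {"m": 1, "km": 1000}

def to_meters(value, unit):
    """Return the value expressed in meters."""
    return value * TO_METERS[unit]

a = to_meters(3000, "m")
b = to_meters(5, "km")
print(a > b)  # on a common scale, 3000 m is smaller than 5000 m
```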

#### Examples of Algorithms Where Feature Scaling Matters

1. K-Means
2. K-Nearest Neighbors
3. Principal Component Analysis
4. Gradient Descent

Mainly <b>distance-based algorithms</b> are affected by feature scaling.

Note: Naive Bayes, Linear Discriminant Analysis, and tree-based algorithms are not affected by feature scaling.
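The sketch below illustrates why distance-based methods are sensitive to scale. It uses four hypothetical points (not taken from the dataset in this notebook) and a hypothetical helper `nearest`. The nearest neighbor of the query changes once both columns are min-max scaled.

```python
import numpy as np

# Hypothetical points: columns are [age in years, salary].
query = np.array([30.0, 50_000.0])
candidates = np.array([
    [31.0, 52_000.0],   # A: almost the same age, slightly higher salary
    [60.0, 50_500.0],   # B: twice the age, almost the same salary
    [45.0, 90_000.0],   # C: far away on both features
])

def nearest(point, others):
    """Index of the row in `others` closest to `point` (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(others - point, axis=1)))

# Unscaled: salary differences (thousands) swamp age differences (tens).
nearest_raw = nearest(query, candidates)

# Min-max scale each column using the range of all four points.
all_points = np.vstack([query, candidates])
col_min = all_points.min(axis=0)
col_range = all_points.max(axis=0) - col_min
scaled = (all_points - col_min) / col_range
nearest_scaled = nearest(scaled[0], scaled[1:])

print(nearest_raw, nearest_scaled)
```

On the raw values the closest candidate is B, because a 500 difference in salary counts for far less than a 2000 difference, while the 30-year age gap barely registers. After scaling, A becomes the closest.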

#### Techniques to perform Feature Scaling
Consider the two most important ones:

<b>Min-Max Normalization:</b> This technique re-scales the values of a feature so that they lie between 0 and 1.
![min-max-normalisation.jpg](attachment:min-max-normalisation.jpg)
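Written out, min-max scaling maps each value `x` to `(x - min) / (max - min)`. As a minimal sketch, the formula can be applied by hand with NumPy to the Age column of the dataset used later in this notebook. Using `np.nanmin` and `np.nanmax` to skip the missing value is an assumption of this sketch.

```python
import numpy as np

# Age column from the dataset used in this notebook (NaN marks the missing value).
age = np.array([44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37], dtype=float)

# x' = (x - min) / (max - min), ignoring NaN when finding min and max.
age_min, age_max = np.nanmin(age), np.nanmax(age)
age_scaled = (age - age_min) / (age_max - age_min)

print(age_scaled.round(8))
```

The smallest age (27) maps to 0, the largest (50) maps to 1, and the missing entry stays NaN.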

<b>Standardization:</b> This technique re-scales a feature so that its distribution has a mean of 0 and a variance of 1.
![standardisation.jpg](attachment:standardisation.jpg)
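Standardization maps each value `x` to `(x - mean) / std`. A minimal sketch with NumPy on the Salary column of the dataset used later in this notebook follows. It assumes the population standard deviation (`ddof=0`, NumPy's default) and skips the missing value with `np.nanmean` and `np.nanstd`.

```python
import numpy as np

# Salary column from the dataset used in this notebook (NaN marks the missing value).
salary = np.array(
    [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000, 83000, 67000],
    dtype=float,
)

# z = (x - mean) / std, with mean and std computed over the non-missing values.
salary_mean = np.nanmean(salary)
salary_std = np.nanstd(salary)  # population standard deviation (ddof=0)
salary_z = (salary - salary_mean) / salary_std

print(salary_z.round(8))
```

The resulting column has mean 0 and standard deviation 1 over its non-missing entries.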

#### The Big Question – Normalize or Standardize?
Normalization vs. standardization is an eternal question among machine learning newcomers. Let me elaborate on the answer in this section.

Normalization is a good choice when you know that your data does not follow a Gaussian distribution. It is useful for algorithms that do not assume any particular distribution of the data, such as K-Nearest Neighbors and Neural Networks.

Standardization, on the other hand, can be helpful when the data follows a Gaussian distribution, although this is not a strict requirement. Also, unlike normalization, standardization does not have a bounding range. So outliers are not squeezed into a fixed interval, and standardization is generally less distorted by them than min-max scaling, although extreme values still shift the mean and standard deviation.
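The difference in bounding behavior can be seen on a small hypothetical array with one extreme value. This is a sketch for illustration only. The numbers are made up and are not part of the dataset used in this notebook.

```python
import numpy as np

# Hypothetical feature: four ordinary values and one extreme value.
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Min-max normalization: results are forced into [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: results are centered on 0 with no fixed bounds.
z_score = (values - values.mean()) / values.std()

print(min_max.round(3))
print(z_score.round(3))
```

Min-max scaling pins the extreme value at 1 and squeezes the remaining values close to 0, whereas the z-scores are not forced into a fixed interval.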

### Import Libraries

In [1]:
import numpy as np
import pandas as pd

### Import Dataset

In [3]:
data_set = pd.read_csv(r'F:\Feature Engineering\Feature Scaling\feature_scaling.csv')
data_set

Unnamed: 0,Country,Age,Salary,Purchased
0,France,44.0,72000.0,No
1,Spain,27.0,48000.0,Yes
2,Germany,30.0,54000.0,No
3,Spain,38.0,61000.0,No
4,Germany,40.0,,Yes
5,France,35.0,58000.0,Yes
6,Spain,,52000.0,No
7,France,48.0,79000.0,Yes
8,Germany,50.0,83000.0,No
9,France,37.0,67000.0,Yes
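If the CSV file is not available, the table shown above can be rebuilt inline. This sketch types in the values from the output above by hand. It also counts the missing entries per column, since Age and Salary each contain one.

```python
import numpy as np
import pandas as pd

# Same values as the table above, typed in by hand (NaN marks missing entries).
data_set = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Count missing values per column before scaling.
missing_per_column = data_set.isna().sum()
print(missing_per_column)
```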


In [4]:
# Keep only the numeric feature columns (Age and Salary).
x = data_set.iloc[:, 1:3]

In [5]:
x

Unnamed: 0,Age,Salary
0,44.0,72000.0
1,27.0,48000.0
2,30.0,54000.0
3,38.0,61000.0
4,40.0,
5,35.0,58000.0
6,,52000.0
7,48.0,79000.0
8,50.0,83000.0
9,37.0,67000.0


### Min-Max Scaler

In [6]:
from sklearn.preprocessing import MinMaxScaler

In [13]:
mns = MinMaxScaler(feature_range = (0,1))

In [14]:
x_after_min_max_scaler = mns.fit_transform(x)

In [15]:
x_after_min_max_scaler

array([[0.73913043, 0.68571429],
       [0.        , 0.        ],
       [0.13043478, 0.17142857],
       [0.47826087, 0.37142857],
       [0.56521739,        nan],
       [0.34782609, 0.28571429],
       [       nan, 0.11428571],
       [0.91304348, 0.88571429],
       [1.        , 1.        ],
       [0.43478261, 0.54285714]])
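After fitting, the scaler keeps the per-column minimum and maximum it learned, and `inverse_transform` maps scaled values back to the original units. The sketch below is self-contained, so it re-creates the Age and Salary values as a NumPy array instead of reading the CSV. That substitution is an assumption made for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Age and Salary values from the table above (NaN marks missing entries).
x = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000], [40, np.nan],
    [35, 58000], [np.nan, 52000], [48, 79000], [50, 83000], [37, 67000],
], dtype=float)

mns = MinMaxScaler(feature_range=(0, 1))
scaled = mns.fit_transform(x)

# Statistics learned during fit; NaN entries are ignored when fitting.
print(mns.data_min_)  # per-column minimum
print(mns.data_max_)  # per-column maximum

# Map the scaled values back to the original units.
restored = mns.inverse_transform(scaled)
```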

### Standardization

In [16]:
from sklearn.preprocessing import StandardScaler

In [19]:
standard_scaler = StandardScaler()

In [21]:
x_after_standardisation = standard_scaler.fit_transform(x)

In [22]:
x_after_standardisation

array([[ 0.71993143,  0.71101276],
       [-1.62367514, -1.36437583],
       [-1.21009751, -0.84552869],
       [-0.10722383, -0.24020701],
       [ 0.16849459,         nan],
       [-0.52080146, -0.49963059],
       [        nan, -1.01847774],
       [ 1.27136827,  1.31633443],
       [ 1.54708669,  1.66223253],
       [-0.24508304,  0.27864014]])
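As a quick sanity check, each standardized column should have a mean close to 0 and a standard deviation close to 1. The sketch below copies the array printed above and uses `np.nanmean` and `np.nanstd` so that the missing entries are skipped.

```python
import numpy as np

# Standardized output copied from the cell above.
x_after_standardisation = np.array([
    [ 0.71993143,  0.71101276],
    [-1.62367514, -1.36437583],
    [-1.21009751, -0.84552869],
    [-0.10722383, -0.24020701],
    [ 0.16849459,      np.nan],
    [-0.52080146, -0.49963059],
    [     np.nan, -1.01847774],
    [ 1.27136827,  1.31633443],
    [ 1.54708669,  1.66223253],
    [-0.24508304,  0.27864014],
])

# Column-wise statistics over the non-missing values.
column_means = np.nanmean(x_after_standardisation, axis=0)
column_stds = np.nanstd(x_after_standardisation, axis=0)

print(column_means.round(6))
print(column_stds.round(6))
```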