# **Normalization Vs Standardization**

## **Normalization** (`MinMaxScaler`)

`MinMaxScaler` is a data normalization technique that rescales the features of a dataset to fit within a specified range (usually `[0, 1]`).

- Normalization brings all the features of a dataset to the same scale or range. This is important when working with machine learning algorithms that are sensitive to the scale of the input features. For example, `Linear Regression` models trained with gradient descent or regularization and `Neural Networks` often benefit from feature scaling.

$$X_{norm} = \frac{X-X_{min}}{X_{max}-X_{min}}$$

The MinMaxScaler works by subtracting the minimum value of each feature and then dividing by the range (i.e., the difference between the maximum and minimum values) to transform each feature into the range `[0, 1]`. This is done independently for each feature, so each feature will have the same scale.
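
The formula can be checked by hand. The following is a minimal sketch, assuming a small made-up array `X` (not part of the wine data used later): it applies the min-max formula column by column with NumPy and compares the result with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 3 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [4.0, 500.0]])

# Column-wise minimum and maximum (one value per feature)
X_min = X.min(axis=0)
X_max = X.max(axis=0)

# Apply X_norm = (X - X_min) / (X_max - X_min) to every column
X_norm = (X - X_min) / (X_max - X_min)

# Compare with scikit-learn's scaler using its default range [0, 1]
X_sklearn = MinMaxScaler().fit_transform(X)
print(X_norm)
print(np.allclose(X_norm, X_sklearn))
```

This manual version divides by zero if a column is constant, so it is only meant to illustrate the arithmetic.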

Here's an example usage of MinMaxScaler:

In [1]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Create a sample dataset
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit the scaler to the data and transform the data
X_scaled = scaler.fit_transform(X)

# Print the scaled data
print(X_scaled)

[[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]
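
The target range does not have to be `[0, 1]`. As a small sketch on the same sample array, the `feature_range` parameter of `MinMaxScaler` selects another interval, and `inverse_transform` maps the scaled values back to the original units. The range `(-1, 1)` is just an illustrative choice.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# feature_range sets the target interval; the default is (0, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)
print(X_scaled)

# inverse_transform undoes the scaling using the fitted min and max
X_restored = scaler.inverse_transform(X_scaled)
print(np.allclose(X_restored, X))
```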


**Wine dataset**

In [29]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

wine_url = 'https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv'
df_wine = pd.read_csv(wine_url)

display(df_wine)

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit the scaler to the data and transform the data
x_scaler = scaler.fit_transform(df_wine[['Wine', 'Alcohol', 'Malic.acid']])

# Name the scaled columns after the three columns that were transformed
df_wine_scaled = pd.DataFrame(x_scaler, columns=['Wine', 'Alcohol', 'Malic.acid'])
display(df_wine_scaled)


Unnamed: 0,Wine,Alcohol,Malic.acid,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
0,1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,1,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,3,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,3,13.40,3.91,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,3,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,3,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


Unnamed: 0,Wine,Alcohol,Malic.acid
0,0.0,0.842105,0.191700
1,0.0,0.571053,0.205534
2,0.0,0.560526,0.320158
3,0.0,0.878947,0.239130
4,0.0,0.581579,0.365613
...,...,...,...
173,1.0,0.705263,0.970356
174,1.0,0.623684,0.626482
175,1.0,0.589474,0.699605
176,1.0,0.563158,0.365613


In [4]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

wine_url = 'https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv'
df_wine = pd.read_csv(wine_url)

print("Original Dataset \n")
display(df_wine)

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit the scaler to the data and transform the data
x_scaler = scaler.fit_transform(df_wine[['Alcohol', 'Malic.acid']])

df_wine_scaled = pd.DataFrame(x_scaler, columns=['Alcohol', 'Malic.acid'])
df_wine_scaled = pd.concat([df_wine_scaled, df_wine.drop(columns=['Alcohol', 'Malic.acid'])], axis=1)

print("Transformed Dataset \n")
display(df_wine_scaled)


Original Dataset 



Unnamed: 0,Wine,Alcohol,Malic.acid,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
0,1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,1,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,3,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,3,13.40,3.91,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,3,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,3,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


Transformed Dataset 



Unnamed: 0,Alcohol,Malic.acid,Wine,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
0,0.842105,0.191700,1,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,0.571053,0.205534,1,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,0.560526,0.320158,1,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,0.878947,0.239130,1,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,0.581579,0.365613,1,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,0.705263,0.970356,3,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,0.623684,0.626482,3,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,0.589474,0.699605,3,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,0.563158,0.365613,3,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


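The `sklearn.preprocessing` module provides the following transformers and helper functions, of which this section covers `MinMaxScaler` and `StandardScaler`:

- `MinMaxScaler`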
- `StandardScaler`
- `OneHotEncoder`
- `LabelEncoder`
- `LabelBinarizer`
- `Binarizer`
- `FunctionTransformer`
- `KBinsDiscretizer`
- `KernelCenterer`
- `MultiLabelBinarizer`
- `MaxAbsScaler`
- `QuantileTransformer`
- `Normalizer`
- `OrdinalEncoder`
- `PowerTransformer`
- `RobustScaler`
- `SplineTransformer`
- `add_dummy_feature`
- `PolynomialFeatures`
- `binarize`
- `normalize`
- `scale`
- `robust_scale`
- `maxabs_scale`
- `minmax_scale`
- `label_binarize`
- `quantile_transform`
- `power_transform`

## **Standardization** (`StandardScaler`), or Z-Score Normalization

Standardization transforms each feature so that it has the mean and standard deviation of a standard normal distribution. It shifts and rescales the values but does not change the shape of their distribution:

- `Mean`: $\mu = 0$
- `Standard deviation`: $\sigma = 1$

$$z = \frac{x-\mu}{\sigma}$$
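
As a quick check of the z-score formula, here is a minimal sketch that standardizes a small made-up array `X` (a hypothetical example, not the wine data) by hand and compares it with `StandardScaler`. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: 3 samples, 2 features
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

# Per-feature mean and population standard deviation (ddof=0)
mu = X.mean(axis=0)
sigma = X.std(axis=0)

# Apply z = (x - mu) / sigma to every column
Z = (X - mu) / sigma

Z_sklearn = StandardScaler().fit_transform(X)
print(Z.mean(axis=0))  # approximately 0 for each feature
print(Z.std(axis=0))   # approximately 1 for each feature
print(np.allclose(Z, Z_sklearn))
```

Note that pandas' `.std()` defaults to the sample standard deviation (`ddof=1`), so a standardized column inspected with pandas may report a value slightly different from 1.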



In [6]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

wine_url = 'https://gist.githubusercontent.com/tijptjik/9408623/raw/b237fa5848349a14a14e5d4107dc7897c21951f5/wine.csv'
df_wine = pd.read_csv(wine_url)

print("Original Dataset \n")
display(df_wine)


# Create a StandardScaler object
scaler = StandardScaler()

# Fit the scaler to the data and transform the data
x_scaler = scaler.fit_transform(df_wine[['Alcohol', 'Malic.acid']])

df_wine_scaled = pd.DataFrame(x_scaler, columns=['Alcohol', 'Malic.acid'])
df_wine_scaled = pd.concat([df_wine_scaled, df_wine.drop(columns=['Alcohol', 'Malic.acid'])], axis=1)

print("Transformed Dataset \n")
display(df_wine_scaled)


Original Dataset 



Unnamed: 0,Wine,Alcohol,Malic.acid,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
0,1,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,1,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,1,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,1,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,3,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,3,13.40,3.91,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,3,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,3,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


Transformed Dataset 



Unnamed: 0,Alcohol,Malic.acid,Wine,Ash,Acl,Mg,Phenols,Flavanoids,Nonflavanoid.phenols,Proanth,Color.int,Hue,OD,Proline
0,1.518613,-0.562250,1,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065
1,0.246290,-0.499413,1,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050
2,0.196879,0.021231,1,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185
3,1.691550,-0.346811,1,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480
4,0.295700,0.227694,1,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,0.876275,2.974543,3,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740
174,0.493343,1.412609,3,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750
175,0.332758,1.744744,3,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835
176,0.209232,0.227694,3,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840


### When to use Standardization (`StandardScaler`) and when to use Normalization (`MinMaxScaler`)

- Use Standardization (`StandardScaler`) when the data is normally distributed or approximately so, or when you want to compare the relative importance of different features.
  
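- Use Normalization (`MinMaxScaler`) when the data is not normally distributed, or when the algorithm requires features to lie in a bounded range on a similar scale. Note that `MinMaxScaler` is sensitive to outliers: a single extreme value sets the minimum or maximum and compresses all the other values, so `RobustScaler` is usually a better fit for data with outliers.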

- When using machine learning algorithms that rely on Euclidean distance, or deep learning techniques trained with gradient descent, scaling the features matters: it keeps large-valued features from dominating the distance and helps gradient descent converge to a minimum faster.
  
- Normalization and standardization are two techniques used for feature scaling.
  
- Feature scaling also helps `linear` and `logistic regression`, especially when they are regularized or fitted with gradient descent.
 
- Algorithms such as `KNN`, `K-means clustering`, `linear regression`, `logistic regression`, and `artificial neural networks` (including `deep learning` models) generally benefit from scaling.
  
- `Decision trees`, `random forests`, `XGBoost`, and other tree-based boosting techniques do not require scaling because each split compares a single feature against a threshold, so the magnitude of the values does not significantly affect the resulting trees.
  
- For `deep learning` models such as `CNNs` and `ANNs`, normalization is often used.
  
- For images, pixel values are typically normalized to the range between `0` and `1`, for example by dividing 8-bit values by `255`.

- Models built with libraries such as `TensorFlow` and `Keras` are commonly fed inputs scaled to between `0` and `1`.

- **Normalization helps neural networks learn weights quickly.**

- In most machine learning scenarios, standardization performs well.
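
As a rough illustration of how the choice of scaler interacts with outliers, the sketch below scales a made-up single feature (five ordinary values plus one extreme value) with `MinMaxScaler`, `StandardScaler`, and `RobustScaler`. The data and the `results` dictionary are hypothetical and only serve the comparison.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Hypothetical feature: five ordinary values and one extreme outlier
x = np.array([[10.0], [11.0], [12.0], [13.0], [14.0], [1000.0]])

results = {}
for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    name = type(scaler).__name__
    # Fit each scaler on the same column and flatten the result for printing
    results[name] = scaler.fit_transform(x).ravel()
    print(name, np.round(results[name], 3))

# Spread (max - min) of the five ordinary values after scaling
for name, scaled in results.items():
    print(name, "spread of ordinary values:", np.round(np.ptp(scaled[:5]), 3))
```

With this data, the five ordinary values should end up squeezed into a very narrow band under `MinMaxScaler` and `StandardScaler`, because the outlier inflates the range and the standard deviation. `RobustScaler` centers on the median and scales by the interquartile range, so the ordinary values keep a usable spread.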