# TASK - 6: Feature Scaling: Normalization and Standardization

# Feature Scaling
This article covers the two most common approaches to feature scaling:

Rescaling, or min-max normalization: the data is scaled into a fixed range, usually [0, 1], or more generally [a, b] (often [-1, 1]).

Standardization, or Z-score normalization: the data is scaled so that its mean is 0 and its variance is 1.

In [1]:
import numpy as np
import pandas as pd

___
##  Rescaling (min-max normalization)

Min-max scaling is a common first choice for scaling. For each feature, the feature's minimum is subtracted from every value, and the result is divided by the feature's range (its maximum minus its minimum). The default output range is [0, 1].
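
Written as a formula, the description above might look like the following. The symbols are assumptions introduced here for illustration: $x$ is an original value, $x'$ the scaled value, and $x_{\min}$, $x_{\max}$ the minimum and maximum of the same feature. The second line generalizes to a target range $[a, b]$.

$$
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
\qquad\text{and}\qquad
x' = a + \frac{(x - x_{\min})\,(b - a)}{x_{\max} - x_{\min}}
$$

Setting $a = 0$ and $b = 1$ in the second form gives back the first.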

In [2]:
ds = np.array([1.0, 12.4, 3.9, 10.4])

In [3]:
n_ds = (ds - min(ds)) / (max(ds) - min(ds))   # np.min() & np.max() can also be used
n_ds  # This yields an array where the lowest value is now 0.0 and the highest is 1.0

array([0.        , 1.        , 0.25438596, 0.8245614 ])

In [6]:
# Normalization In given range of x & y.
x = 0
y = 1.5 
ns_ds = x+((ds - min(ds))*(y - x)/(max(ds) - min(ds))) 
ns_ds

array([0.        , 1.5       , 0.38157895, 1.23684211])

In [5]:
from sklearn.preprocessing import MinMaxScaler as mms  # Normalization
ds = ds.reshape(-1, 1) # Column
scaler = mms(feature_range=(0, 1.5))
n_ds = scaler.fit_transform(ds)
n_ds

array([[0.        ],
       [1.5       ],
       [0.38157895],
       [1.23684211]])
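
The example above scales a single column. Because the scaling is done per feature, a dataset with several columns needs the minimum and maximum of each column separately. The sketch below is one way to do that with NumPy and to compare the result with `MinMaxScaler`; the matrix `X` and the names `lo`, `hi`, `col_min`, and `col_max` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: 4 samples (rows) x 2 features (columns) on very different scales.
X = np.array([[1.0, 200.0],
              [12.4, 800.0],
              [3.9, 500.0],
              [10.4, 1000.0]])

lo, hi = 0.0, 1.5  # target range, same as feature_range=(0, 1.5) above

# axis=0 takes the min/max down each column, so every feature is scaled independently.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = lo + (X - col_min) * (hi - lo) / (col_max - col_min)

# MinMaxScaler also works column by column.
X_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(X)

print(np.allclose(X_manual, X_sklearn))  # True
```

After scaling, both columns span the same range even though the raw values differ by two orders of magnitude.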

##  Standardization (Z-score normalization)

In rescaling, we normalized our dataset based on its minimum and maximum values.
The mean and standard deviation of the rescaled data are, however, _not standard_: the mean is not 0 and the standard deviation is not 1.

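StandardScaler rescales each column to have a mean of 0 and a standard deviation of 1. It standardizes a feature by subtracting the feature's mean and dividing by its standard deviation. This is only a linear shift and rescale, so it does not make a feature normally distributed; if the original distribution is far from normal, the standardized features may be less comparable than the zero mean and unit standard deviation suggest.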
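
As a formula, the standardization described above could be written as follows. The symbols are assumptions introduced here: $x$ is an original value, $\mu$ the mean of the feature, $\sigma$ its standard deviation, and $z$ the standardized value.

$$
z = \frac{x - \mu}{\sigma}
$$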

In [58]:
print("Mean ---> ",np.mean(n_ds))
print("Std. ---> ",np.std(n_ds))

Mean --->  0.8571428571428571
Std. --->  0.6247448458762822


In [35]:
dtst = np.array([2.4, 6.2, 1.8, 9.0]).reshape(-1, 1)
scaler = mms(feature_range=(0, 1.5))
n_dtst = scaler.fit_transform(dtst)
print(n_dtst)
print(np.mean(n_dtst))
print(np.std(n_dtst))

[[0.125     ]
 [0.91666667]
 [0.        ]
 [1.5       ]]
0.6354166666666665
0.6105090942538584


In [37]:
ds = np.array([1.0, 2.0, 3.0, 3.0, 3.0, 2.0, 1.0])
st_ds = (ds - np.average(ds)) / (np.std(ds))
st_ds

array([-1.37198868, -0.17149859,  1.02899151,  1.02899151,  1.02899151,
       -0.17149859, -1.37198868])

In [47]:
from sklearn.preprocessing import StandardScaler as stsc
ds = np.array([1.0, 2.0, 3.0, 3.0, 3.0, 2.0, 1.0]).reshape(-1, 1)
scaler = stsc()
st_ds = scaler.fit_transform(ds)
st_ds

array([[-1.37198868],
       [-0.17149859],
       [ 1.02899151],
       [ 1.02899151],
       [ 1.02899151],
       [-0.17149859],
       [-1.37198868]])
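
The manual result and the `StandardScaler` result agree because both divide by the population standard deviation (`ddof=0`, the default of `np.std`). pandas uses the sample standard deviation (`ddof=1`) by default, so a manual standardization written with a pandas `.std()` call would give slightly different numbers. The sketch below illustrates the difference; the variable names are chosen for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

values = np.array([1.0, 2.0, 3.0, 3.0, 3.0, 2.0, 1.0])

# Population standard deviation (divide by n): the NumPy default, ddof=0.
std_population = np.std(values)
# Sample standard deviation (divide by n - 1): the pandas default, ddof=1.
std_sample = pd.Series(values).std()

z_manual = (values - values.mean()) / std_population
z_sklearn = StandardScaler().fit_transform(values.reshape(-1, 1)).ravel()

print(np.allclose(z_manual, z_sklearn))  # True: both divide by the population std
print(std_population < std_sample)       # True: dividing by n - 1 gives a larger value
```

Passing `ddof=0` to the pandas call, or `ddof=1` to `np.std`, makes the two libraries agree.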

In [53]:
#print("Mean ---> ",np.mean(st_ds))
print("Mean ---> ",np.format_float_positional(np.mean(st_ds), trim='-'))
print("Std. ---> ",np.std(st_ds))

Mean --->  0.00000000000000003172065784643304
Std. --->  1.0


The mean is extremely close to 0 (about 3.17 * 10^{-17}, which is floating-point rounding error rather than a real offset), and the standard deviation is 1.
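
Because of this rounding error, a check such as `mean == 0` would fail. A tolerance-based comparison is usually a safer way to confirm the result, as in the minimal sketch below. It also uses `inverse_transform` to show that the original values can be recovered from the fitted scaler; the name `restored` and the tolerance `1e-12` are choices made for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ds = np.array([1.0, 2.0, 3.0, 3.0, 3.0, 2.0, 1.0]).reshape(-1, 1)
scaler = StandardScaler()
st_ds = scaler.fit_transform(ds)

# Compare with a tolerance instead of testing for exact equality with 0 and 1.
print(np.isclose(st_ds.mean(), 0.0, atol=1e-12))  # True
print(np.isclose(st_ds.std(), 1.0))               # True

# inverse_transform undoes the scaling using the mean and scale learned in fit.
restored = scaler.inverse_transform(st_ds)
print(np.allclose(restored, ds))                  # True
```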