# **Feature Transformation and Scaling Techniques**

Feature Scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is performed during the data preprocessing step.

[Importance of Feature Scaling](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html)

[sklearn.preprocessing: Preprocessing and Normalization](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing)

[Feature Scaling Techniques](https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/)

In [35]:
import pandas as pd

data = pd.read_csv("CE802_Ass_2019_Data.csv") # Load Dataset.
data = data.iloc[:, :-2]
data

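| | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10 | F11 | F12 | F13 | F14 | F15 | F16 | F17 | F18 | F19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 16 | 2.02 | 0.52 | -2.35 | -1.98 | -0.70 | 85 | 6 | -2.07 | -0.07 | 1.08 | 15 | -0.63 | -3.49 | -1.68 | 0.02 | 15.3 |
| 1 | 0 | 0 | 86 | -0.90 | 2.75 | 0.14 | 0.83 | -0.06 | 107 | 1 | -0.86 | 0.17 | 1.06 | -8 | -1.21 | 0.34 | 0.36 | 0.61 | 10.1 |
| 2 | 1 | 1 | 165 | 0.73 | 1.05 | 0.10 | 2.57 | -1.65 | 41 | 5 | 0.08 | 0.04 | 0.42 | -6 | -0.46 | -0.62 | 1.67 | 2.60 | 11.0 |
| 3 | 1 | 1 | 191 | -1.50 | 0.79 | 0.33 | 1.24 | 1.35 | 17 | 2 | -0.85 | 1.74 | 1.74 | 15 | 0.47 | 0.63 | 0.08 | 0.19 | 6.3 |
| 4 | 1 | 1 | 13 | 0.25 | -1.19 | -0.90 | 2.67 | 0.22 | 12 | 8 | 0.68 | -0.39 | 1.25 | 25 | -0.09 | -2.41 | -0.53 | -0.77 | 10.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 1 | 1 | 138 | 1.36 | 0.40 | 1.05 | 4.00 | 0.87 | 36 | 16 | 0.27 | -0.93 | 0.70 | 23 | 0.23 | -1.24 | -0.65 | 1.26 | 20.9 |
| 496 | 0 | 0 | 102 | 2.06 | 0.09 | 0.84 | 3.63 | 1.22 | 57 | -2 | -0.32 | -0.38 | 0.56 | -6 | -0.08 | -1.29 | -0.03 | 0.47 | 19.6 |
| 497 | 1 | 0 | 211 | 0.18 | 1.71 | 0.30 | 1.22 | 1.16 | 47 | -49 | -0.21 | -0.77 | 0.78 | 7 | -0.27 | -0.45 | -0.89 | 0.00 | 5.9 |
| 498 | 1 | 0 | 94 | -0.86 | 1.06 | 0.66 | 1.25 | 1.25 | 23 | 5 | -0.10 | 0.68 | 0.11 | -8 | 0.42 | 1.58 | -0.79 | -1.20 | 6.4 |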


In [36]:
data.info() # Data Summary.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 19 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   F1      500 non-null    int64  
 1   F2      500 non-null    int64  
 2   F3      500 non-null    int64  
 3   F4      500 non-null    float64
 4   F5      500 non-null    float64
 5   F6      500 non-null    float64
 6   F7      500 non-null    float64
 7   F8      500 non-null    float64
 8   F9      500 non-null    int64  
 9   F10     500 non-null    int64  
 10  F11     500 non-null    float64
 11  F12     500 non-null    float64
 12  F13     500 non-null    float64
 13  F14     500 non-null    int64  
 14  F15     500 non-null    float64
 15  F16     500 non-null    float64
 16  F17     500 non-null    float64
 17  F18     500 non-null    float64
 18  F19     500 non-null    float64
dtypes: float64(13), int64(6)
memory usage: 74.3 KB


# **Normalization (MinMax Scaler)**

[sklearn.preprocessing.MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)

Transform features by scaling each feature to a given range. The MinMaxScaler estimator scales and translates each feature individually so that it lies in the given range on the training set, by default between 0 and 1.

> # **$X_{Scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$**









In [37]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(data)
data_scaled

array([[0.        , 0.        , 0.03880597, ..., 0.24295775, 0.46998536,
        0.4077135 ],
       [0.        , 0.        , 0.24776119, ..., 0.60211268, 0.55636896,
        0.26446281],
       [1.        , 1.        , 0.48358209, ..., 0.83274648, 0.8477306 ,
        0.2892562 ],
       ...,
       [1.        , 0.        , 0.62089552, ..., 0.38204225, 0.4670571 ,
        0.14876033],
       [1.        , 0.        , 0.27164179, ..., 0.39964789, 0.29136164,
        0.16253444],
       [1.        , 1.        , 0.30149254, ..., 0.55809859, 0.46559297,
        0.49035813]])
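
The min-max formula can be checked by hand on a small array. The sketch below uses a hypothetical toy matrix `X` (not the dataset loaded above) and compares a manual column-wise computation with `MinMaxScaler` under its default `feature_range=(0, 1)`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (hypothetical): two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [4.0, 500.0]])

# Manual min-max scaling, computed column by column.
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_manual = (X - X_min) / (X_max - X_min)

# The same transformation with scikit-learn (default feature_range=(0, 1)).
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

After scaling, each column has minimum 0 and maximum 1, regardless of its original units.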

# **Standardization (Standard Scaler)**

[sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)

Standardize features by removing the mean (so that each feature has mean 0) and scaling to unit variance. The standard score of a sample $X$ is calculated as:

> # **$Z = \frac{X - \mu}{\sigma}$**

where $\mu$ is the mean of the training samples and $\sigma$ is the standard deviation of the training samples.

Centering and Scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Both mean and standard deviation are then stored to be used on later data using transform.
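
The fit-then-reuse behaviour described above can be sketched as follows. The arrays `X_train` and `X_test` are synthetic stand-ins (an assumption for illustration, not the dataset used in this notebook); the point is that the statistics stored in `mean_` and `scale_` come from the training data only and are then applied to later data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): a training set and a later "unseen" set.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
X_test = rng.normal(loc=50.0, scale=10.0, size=(20, 2))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from X_train
X_test_scaled = scaler.transform(X_test)        # reuses the stored training statistics

# Manual version of transform(): Z = (X - mu) / sigma with the training mu and sigma.
X_test_manual = (X_test - scaler.mean_) / scaler.scale_

print(scaler.mean_, scaler.scale_)
print(np.allclose(X_test_manual, X_test_scaled))
```

Note that `scale_` is the population standard deviation (`ddof=0`), so it matches `X_train.std(axis=0)` rather than the sample standard deviation.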



In [38]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
data_scaled

array([[-1.41846685, -1.01207287, -1.09898488, ..., -1.64115528,
         0.1034851 ,  1.03416374],
       [-1.41846685, -1.01207287, -0.10254763, ...,  0.44259201,
         0.69442788,  0.13875406],
       [ 0.70498652,  0.98807114,  1.02200299, ...,  1.78068464,
         2.68760777,  0.29372881],
       ...,
       [ 0.70498652, -1.01207287,  1.67680461, ..., -0.83421393,
         0.08345314, -0.58446145],
       [ 0.70498652, -1.01207287,  0.01133092, ..., -0.73206945,
        -1.11846438, -0.49836437],
       [ 0.70498652,  0.98807114,  0.15367909, ...,  0.18723082,
         0.07343716,  1.55074625]])

# **MaxAbsScaler**

[sklearn.preprocessing.MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler)

Scale each feature by its maximum absolute value. That is, the MaxAbs scaler takes the maximum absolute value of each column and divides every value in the column by it. This maps the data to the range [-1, +1].
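
As a minimal sketch of this operation, the example below divides each column of a hypothetical toy array `X` by its maximum absolute value and compares the result with `MaxAbsScaler`.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy data (hypothetical) with positive and negative values.
X = np.array([[ 1.0, -50.0],
              [-4.0,  25.0],
              [ 2.0,  10.0]])

# Manual version: divide each column by its maximum absolute value.
max_abs = np.abs(X).max(axis=0)
X_manual = X / max_abs

X_sklearn = MaxAbsScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```

Because the data is only divided and never shifted, zeros stay zero and the sign of every value is preserved.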

In [39]:
from sklearn.preprocessing import MaxAbsScaler
scaler = MaxAbsScaler()
data_scaled = scaler.fit_transform(data)
data_scaled

array([[ 0.        ,  0.        ,  0.04733728, ..., -0.54901961,
         0.00549451,  0.41576087],
       [ 0.        ,  0.        ,  0.25443787, ...,  0.11764706,
         0.16758242,  0.27445652],
       [ 1.        ,  1.        ,  0.48816568, ...,  0.54575163,
         0.71428571,  0.29891304],
       ...,
       [ 1.        ,  0.        ,  0.62426036, ..., -0.29084967,
         0.        ,  0.16032609],
       [ 1.        ,  0.        ,  0.27810651, ..., -0.25816993,
        -0.32967033,  0.17391304],
       [ 1.        ,  1.        ,  0.30769231, ...,  0.03594771,
        -0.00274725,  0.49728261]])

# **Robust Scaler**

[sklearn.preprocessing.RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler)

**Scale features using statistics that are robust to outliers.**

The previous feature scaling techniques rely on statistics such as the mean, minimum, and maximum of each feature, all of which are sensitive to outliers. A few extreme values can shift the mean or stretch the range between the minimum and the maximum, so the bulk of the scaled data may end up compressed into a narrow interval.

> # **$X_{Scaled} = \frac{X - X_{median}}{Q3 - Q1}$**

The Inter-Quartile Range ($IQR$) is the difference between the third and first quartiles of the variable: $IQR = Q3 - Q1$.

This scaler removes the median and scales the data according to the quantile range (defaults to the $IQR$). The $IQR$ is the range between the $1^{st}$ quartile ($25^{th}$ percentile) and the $3^{rd}$ quartile ($75^{th}$ percentile).

**The statistics used by the Robust Scaler are not sensitive to outliers.** The outliers themselves remain in the transformed data.

1.   Robust Scaler removes the median from the data.
2.   Robust Scaler scales the data by the Inter-Quartile Range ($IQR$).
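
The two steps listed above (remove the median, divide by the $IQR$) can be sketched on a hypothetical single-feature array containing one large outlier. The comparison with `StandardScaler` is only there to illustrate the effect of the outlier on mean-based scaling.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

# Toy feature (hypothetical): five ordinary values and one large outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

# Manual version: subtract the median, then divide by IQR = Q3 - Q1.
median = np.median(x, axis=0)
q1, q3 = np.percentile(x, [25, 75], axis=0)
x_manual = (x - median) / (q3 - q1)

x_robust = RobustScaler().fit_transform(x)
x_standard = StandardScaler().fit_transform(x)

print(np.allclose(x_manual, x_robust))
print(x_robust.ravel())    # the ordinary values stay well spread out
print(x_standard.ravel())  # the ordinary values are squeezed together
```

With `RobustScaler` the five ordinary values keep a usable spread, while with `StandardScaler` they collapse into almost the same value because the outlier dominates the mean and standard deviation. In both cases the outlier is still present after scaling.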

In [40]:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
data_scaled = scaler.fit_transform(data)
data_scaled

array([[-1.        , -1.        , -0.64122137, ..., -1.20930233,
         0.08921933,  1.05970149],
       [-1.        , -1.        ,  0.07124682, ...,  0.37209302,
         0.52788104,  0.28358209],
       [ 0.        ,  0.        ,  0.87531807, ...,  1.3875969 ,
         2.00743494,  0.41791045],
       ...,
       [ 0.        , -1.        ,  1.34351145, ..., -0.59689922,
         0.07434944, -0.34328358],
       [ 0.        , -1.        ,  0.15267176, ..., -0.51937984,
        -0.81784387, -0.26865672],
       [ 0.        ,  0.        ,  0.25445293, ...,  0.17829457,
         0.0669145 ,  1.50746269]])

# **Quantile Transformer Scaler**

[sklearn.preprocessing.QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)

**Transform features using quantiles information.**

The Quantile Transformer Scaler maps each variable to a uniform distribution (the default) or a normal distribution using its quantiles. Because the mapping is based on quantiles rather than raw magnitudes, it also reduces the effect of outliers.

This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers, i.e., this is a robust preprocessing scheme. The transformation is applied to each feature independently.

A few points regarding the Quantile Transformer Scaler:

1.   It estimates the cumulative distribution function of the variable.
2.   It uses the cumulative distribution function to map the values to a uniform distribution.
3.   It maps the obtained values to the desired output distribution using the associated quantile function.
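
A short sketch of both output distributions follows. The skewed sample `X` is synthetic (an assumption for illustration), and `n_quantiles` is set to the number of samples because it cannot exceed it.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Synthetic, right-skewed feature (hypothetical data).
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(500, 1))

# Default behaviour: map to a uniform distribution on [0, 1].
qt_uniform = QuantileTransformer(n_quantiles=500, output_distribution="uniform")
X_uniform = qt_uniform.fit_transform(X)

# Alternative: map to a standard normal distribution.
qt_normal = QuantileTransformer(n_quantiles=500, output_distribution="normal")
X_normal = qt_normal.fit_transform(X)

print(X_uniform.min(), X_uniform.max())
print(X_normal.mean(), X_normal.std())
```

The transformation is monotonic, so the ordering of the values within a feature is preserved even though the distances between them change.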

In [41]:
from sklearn.preprocessing import QuantileTransformer
scaler = QuantileTransformer()
data_scaled = scaler.fit_transform(data)
data_scaled



array([[0.        , 0.        , 0.09418838, ..., 0.04408818, 0.55711423,
        0.87775551],
       [0.        , 0.        , 0.53507014, ..., 0.66432866, 0.75350701,
        0.63827655],
       [1.        , 1.        , 0.84168337, ..., 0.95591182, 0.99398798,
        0.69639279],
       ...,
       [1.        , 0.        , 0.9258517 , ..., 0.18937876, 0.55310621,
        0.30961924],
       [1.        , 0.        , 0.58116232, ..., 0.22344689, 0.13827655,
        0.34669339],
       [1.        , 1.        , 0.64729459, ..., 0.58917836, 0.54609218,
        0.91583166]])

# **Power Transformer Scaler**

[sklearn.preprocessing.PowerTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html#sklearn.preprocessing.PowerTransformer)

Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance) or other situations where normality is desired.

At present, PowerTransformer supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.
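
As a rough illustration, the sketch below applies both methods to a synthetic, strictly positive, right-skewed sample (hypothetical data, chosen so that Box-Cox is applicable). The helper `skewness` is defined here for the example and is not part of scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic, strictly positive, right-skewed feature (hypothetical data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

def skewness(a):
    """Sample skewness (third standardized moment); helper for this example."""
    a = np.asarray(a).ravel()
    return np.mean((a - a.mean()) ** 3) / a.std() ** 3

# Yeo-Johnson works for any real values; standardize=True is the default.
pt_yj = PowerTransformer(method="yeo-johnson")
X_yj = pt_yj.fit_transform(X)

# Box-Cox requires strictly positive input, which holds for this sample.
pt_bc = PowerTransformer(method="box-cox")
X_bc = pt_bc.fit_transform(X)

print("fitted lambdas:", pt_yj.lambdas_, pt_bc.lambdas_)
print("skewness before:", skewness(X))
print("skewness after :", skewness(X_yj), skewness(X_bc))
```

The fitted parameter for each feature is stored in `lambdas_`. With the default `standardize=True`, the output is also rescaled to zero mean and unit variance.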

In [42]:
from sklearn.preprocessing import PowerTransformer
# method can be 'yeo-johnson' (the default) or 'box-cox'.
# Box-Cox requires strictly positive data, so Yeo-Johnson is used here.
scaler = PowerTransformer(method='yeo-johnson')
data_scaled = scaler.fit_transform(data)
data_scaled

array([[-1.41846685, -1.01207287, -1.39729118, ..., -1.67602682,
         0.13492508,  1.11371656],
       [-1.41846685, -1.01207287,  0.16056943, ...,  0.45863061,
         0.71044795,  0.38365554],
       [ 0.70498652,  0.98807114,  1.05221182, ...,  1.74191575,
         2.53940344,  0.5296906 ],
       ...,
       [ 0.70498652, -1.01207287,  1.44309475, ..., -0.83108089,
         0.11496598, -0.48435394],
       [ 0.70498652, -1.01207287,  0.27099103, ..., -0.72544269,
        -1.13267158, -0.35902027],
       [ 0.70498652,  0.98807114,  0.40061656, ...,  0.20687762,
         0.10497309,  1.44306524]])