# **Feature Transformation and Scaling Techniques**

Feature scaling is a method used to normalize the range of the independent variables (features) of a dataset. In data processing it is also known as data normalization, and it is generally performed during the data preprocessing step.

> [**Importance of Feature Scaling**](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html)

> [**sklearn.preprocessing: Preprocessing and Normalization**](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing)

> [**Feature Scaling Techniques**](https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/)

In [1]:
# Import Library.
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

# Load Dataset.
data = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
    header=None,
)

data.columns = [
    "Class Label",
    "Alcohol",
    "Malic Acid",
    "Ash",
    "Alkalinity of Ash",
    "Magnesium",
    "Total Phenols",
    "Flavanoids",
    "Nonflavanoid Phenols",
    "Proanthocyanidins",
    "Color Intensity",
    "Hue",
    "OD280/OD315 of Diluted Wines",
    "Proline",
]

data = data.iloc[:, 1:]
data.head()

| | Alcohol | Malic Acid | Ash | Alkalinity of Ash | Magnesium | Total Phenols | Flavanoids | Nonflavanoid Phenols | Proanthocyanidins | Color Intensity | Hue | OD280/OD315 of Diluted Wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |


In [2]:
# Data Summary.
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Alcohol                       178 non-null    float64
 1   Malic Acid                    178 non-null    float64
 2   Ash                           178 non-null    float64
 3   Alkalinity of Ash             178 non-null    float64
 4   Magnesium                     178 non-null    int64  
 5   Total Phenols                 178 non-null    float64
 6   Flavanoids                    178 non-null    float64
 7   Nonflavanoid Phenols          178 non-null    float64
 8   Proanthocyanidins             178 non-null    float64
 9   Color Intensity               178 non-null    float64
 10  Hue                           178 non-null    float64
 11  OD280/OD315 of Diluted Wines  178 non-null    float64
 12  Proline                       178 non-null    int64  
dtypes: fl

In [3]:
# Split the dataset into training and test set.
from sklearn.model_selection import train_test_split

X_train, X_test = train_test_split(data, test_size=0.2, random_state=42)

# **Normalization ($MinMax$ $Scaler$)**

> [**sklearn.preprocessing.MinMaxScaler**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)

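Transform features by scaling each feature to a given range. The $MinMaxScaler$ estimator scales and translates each feature individually so that it lies in the given range on the training set, which is $[0, 1]$ by default. Test-set values that fall below the training minimum or above the training maximum map outside this range.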

> # **$X_{Scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$**
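The formula above can be checked by hand on a small array. The following is a minimal sketch, assuming a hypothetical toy array `X_toy` (not part of the wine dataset), that applies the formula column by column with NumPy and compares the result with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two features on very different scales.
X_toy = np.array([[1.0, 200.0], [2.0, 400.0], [4.0, 1000.0]])

# Manual min-max scaling, computed per column (axis=0).
col_min = X_toy.min(axis=0)
col_max = X_toy.max(axis=0)
manual = (X_toy - col_min) / (col_max - col_min)

# The same computation through scikit-learn.
sk = MinMaxScaler().fit_transform(X_toy)

print(manual)
print("Matches MinMaxScaler:", np.allclose(manual, sk))
```

Each column is scaled with its own minimum and maximum, so both features end up in $[0, 1]$ regardless of their original units.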

In [4]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X = scaler.fit_transform(X_train)
print(X)

[[0.87105263 0.16089613 0.71657754 ... 0.07317073 0.25274725 0.30102443]
 [0.39473684 0.94093686 0.68449198 ... 0.27642276 0.15384615 0.18676123]
 [0.35263158 0.03665988 0.39572193 ... 0.45528455 0.54945055 0.30102443]
 ...
 [0.88157895 0.19959267 0.54545455 ... 0.58536585 0.63369963 1.        ]
 [0.43684211 0.13034623 0.48128342 ... 0.3902439  0.28937729 0.17100079]
 [0.34473684 0.31771894 0.58823529 ... 0.2601626  0.77289377 0.12608353]]


In [None]:
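# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)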

# **Standardization ($Standard$ $Scaler$)**

> [**sklearn.preprocessing.StandardScaler**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)

Standardize features by removing the mean (i.e., $mean = 0$) and scaling to unit variance. The standard score is calculated as:

> # **$Z = \frac{X - \mu}{\sigma}$**

where $\mu$ is the mean of the training samples and $\sigma$ is the standard deviation of the training samples.

Centering and Scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Both the mean and standard deviation are then stored to be used on later data using transform.
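As a rough illustration of how the stored statistics are reused, the sketch below fits a `StandardScaler` on a hypothetical `train` array and then standardizes a hypothetical `new` sample by hand using the training mean and standard deviation. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` stores in `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data and one later sample.
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
new = np.array([[4.0, 30.0]])

scaler = StandardScaler().fit(train)

# Per-feature statistics computed from the training data only.
mu = train.mean(axis=0)
sigma = train.std(axis=0)  # population standard deviation (ddof=0)

# Standardize the later sample with the training statistics.
manual_new = (new - mu) / sigma

print("mean_ :", scaler.mean_)
print("scale_:", scaler.scale_)
print("Matches transform:", np.allclose(manual_new, scaler.transform(new)))
```

The later sample is never used to estimate $\mu$ or $\sigma$, which is why the test set is passed to `transform` rather than `fit_transform`.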

In [6]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X_train)
print(X)

[[ 1.66529275 -0.60840587  1.21896194 ... -1.65632857 -0.87940904
  -0.24860607]
 [-0.54952506  2.7515415   1.00331502 ... -0.58463272 -1.25462095
  -0.72992237]
 [-0.74531007 -1.14354109 -0.93750727 ...  0.35845962  0.2462267
  -0.24860607]
 ...
 [ 1.714239   -0.44172441  0.06884503 ...  1.04434496  0.56585166
   2.69572196]
 [-0.35374006 -0.7399965  -0.36244882 ...  0.01551695 -0.74044166
  -0.79631083]
 [-0.78201975  0.06709269  0.35637426 ... -0.67036839  1.09392769
  -0.98551793]]


In [None]:
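# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)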

# **$MaxAbsScaler$**

> [**sklearn.preprocessing.MaxAbsScaler**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler)

Scale each feature by its maximum absolute value. That is, the $MaxAbs$ scaler finds the maximum absolute value of each column in the training set and divides every value in that column by it. This scales the training data to the range $[-1, +1]$. Because the data is neither shifted nor centered, zero entries stay zero and sparsity is preserved.
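A minimal sketch of this operation, assuming a hypothetical toy array `X_toy` that contains a negative value and a zero, compares the manual division with `MaxAbsScaler`.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical toy data with a negative value and a zero entry.
X_toy = np.array([[-4.0, 1.0], [2.0, 0.0], [1.0, 5.0]])

# Maximum absolute value of each column.
max_abs = np.abs(X_toy).max(axis=0)

# Divide each column by its maximum absolute value (no centering).
manual = X_toy / max_abs

sk = MaxAbsScaler().fit_transform(X_toy)

print(manual)
print("Matches MaxAbsScaler:", np.allclose(manual, sk))
```

The sign of each value is kept and the zero entry remains zero, since the data is only divided and never shifted.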

In [8]:
from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()
X = scaler.fit_transform(X_train)
print(X)

[[0.96695887 0.28965517 0.83591331 ... 0.33333333 0.49       0.42663219]
 [0.84490897 0.95       0.81733746 ... 0.47953216 0.4225     0.33290239]
 [0.83412003 0.18448276 0.6501548  ... 0.60818713 0.6925     0.42663219]
 ...
 [0.9696561  0.32241379 0.73684211 ... 0.70175439 0.75       1.        ]
 [0.85569791 0.2637931  0.6996904  ... 0.56140351 0.515      0.31997414]
 [0.8320971  0.42241379 0.76160991 ... 0.46783626 0.845      0.28312864]]


In [None]:
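# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)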

# **Robust Scaler**

> [**sklearn.preprocessing.RobustScaler**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler)

**Scale features using statistics that are robust to outliers.**

Each of the previous scaling techniques relies on statistics such as the mean, maximum, or minimum of a feature, and all of these statistics are sensitive to outliers. A few extreme values can shift the mean or stretch the minimum-to-maximum range, so after scaling, the bulk of the data may end up compressed into a narrow interval.
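To make this sensitivity concrete, here is a small sketch on a hypothetical single feature `x` with five typical values and one extreme outlier. It compares how much spread the typical values keep after `MinMaxScaler` and after `RobustScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Hypothetical feature: five typical values plus one extreme outlier.
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])

minmax = MinMaxScaler().fit_transform(x)
robust = RobustScaler().fit_transform(x)

# Spread (max - min) of the five typical values after each scaling.
spread_minmax = minmax[:5].max() - minmax[:5].min()
spread_robust = robust[:5].max() - robust[:5].min()

print("MinMaxScaler spread of typical values:", spread_minmax)
print("RobustScaler spread of typical values:", spread_robust)
```

With min-max scaling the outlier defines the maximum, so the typical values are squeezed close to zero. The robust scaler keeps them well separated, although the outlier itself is still present in the output.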

> # **$X_{Scaled} = \frac{X - X_{median}}{Q3 - Q1}$**

The Inter-Quartile Range ($IQR$) is the difference between the third and first quartiles of the variable: $IQR = Q3 - Q1$.

This scaler removes the median and scales the data according to the quantile range (defaults to the $IQR$). The $IQR$ is the range between the $1^{st}$ quartile ($25^{th}$ percentile) and the $3^{rd}$ quartile ($75^{th}$ percentile).

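**The centering and scaling statistics of the Robust Scaler are not sensitive to outliers** (the outliers themselves are still present in the transformed data):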

1.   Robust Scaler removes the median from the data.
2.   Robust Scaler scales the data by the Inter-Quartile Range ($IQR$).
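The two steps listed above (subtract the median, then divide by the $IQR$) can be reproduced by hand. The sketch below assumes a hypothetical toy column `X_toy` and compares the manual result with the `center_` and `scale_` attributes that `RobustScaler` stores after fitting.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Hypothetical toy column with one outlier.
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Step 1: the median of each column.
median = np.median(X_toy, axis=0)

# Step 2: the inter-quartile range of each column.
q1, q3 = np.percentile(X_toy, [25, 75], axis=0)
iqr = q3 - q1

manual = (X_toy - median) / iqr

scaler = RobustScaler().fit(X_toy)

print("center_:", scaler.center_, "scale_:", scaler.scale_)
print(manual.ravel())
print("Matches RobustScaler:", np.allclose(manual, scaler.transform(X_toy)))
```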

In [10]:
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()
X = scaler.fit_transform(X_train)
print(X)

[[ 0.98884758 -0.12828947  1.03030303 ... -1.17037037 -0.61163227
   0.        ]
 [-0.35687732  2.39144737  0.84848485 ... -0.42962963 -0.81425891
  -0.33701336]
 [-0.47583643 -0.52960526 -0.78787879 ...  0.22222222 -0.00375235
   0.        ]
 ...
 [ 1.01858736 -0.00328947  0.06060606 ...  0.6962963   0.16885553
   2.0615921 ]
 [-0.23791822 -0.22697368 -0.3030303  ... -0.01481481 -0.53658537
  -0.38349797]
 [-0.49814126  0.37828947  0.3030303  ... -0.48888889  0.45403377
  -0.51597908]]


In [None]:
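# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)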

# **Quantile Transformer Scaler**

> [**sklearn.preprocessing.QuantileTransformer**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer)

**Transform features using quantiles information.**

The Quantile Transformer Scaler is a non-linear transformation that maps each variable onto a target distribution based on the ranks of its values. Because extreme values are pulled toward the bulk of the distribution, it also deals with outliers.

This method transforms the features to follow a uniform distribution (the default, `output_distribution="uniform"`) or a normal distribution (`output_distribution="normal"`). Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers, i.e., this is a robust preprocessing scheme. The transformation is applied to each feature independently.

**A few points regarding the Quantile Transformer Scaler:**

1.   It estimates the cumulative distribution function of the variable from the training data.
2.   It uses the cumulative distribution function to map the values to a uniform distribution on $[0, 1]$.
3.   It maps the obtained values to the desired output distribution using the associated quantile function.
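The difference between the two output distributions can be seen on synthetic data. The sketch below assumes a hypothetical right-skewed feature `x` drawn from an exponential distribution, and sets `n_quantiles` to the number of samples, since it cannot usefully exceed that.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Hypothetical right-skewed feature (synthetic, not the wine data).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=(200, 1))

# n_quantiles is set to the sample count to avoid the "too many quantiles" warning.
qt_uniform = QuantileTransformer(n_quantiles=200, output_distribution="uniform")
qt_normal = QuantileTransformer(n_quantiles=200, output_distribution="normal")

u = qt_uniform.fit_transform(x)  # values spread over [0, 1]
z = qt_normal.fit_transform(x)   # values follow an approximately normal shape

print("uniform output range:", u.min(), u.max())
print("normal output mean:", z.mean())
```

With the default uniform output, the transformed values lie in $[0, 1]$, which matches the output of the cell below. Passing `output_distribution="normal"` is needed to obtain a Gaussian-like result.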

In [12]:
from sklearn.preprocessing import QuantileTransformer

scaler = QuantileTransformer()
X = scaler.fit_transform(X_train)
print(X)

[[0.95744681 0.32978723 0.91134752 ... 0.04609929 0.26950355 0.4964539 ]
 [0.34042553 0.9858156  0.84397163 ... 0.29432624 0.15602837 0.27304965]
 [0.27659574 0.04964539 0.13120567 ... 0.60992908 0.4893617  0.4964539 ]
 ...
 [0.9751773  0.4929078  0.55319149 ... 0.84397163 0.64539007 1.        ]
 [0.38297872 0.21276596 0.31914894 ... 0.4893617  0.30141844 0.23404255]
 [0.25531915 0.64539007 0.67730496 ... 0.27304965 0.86879433 0.14893617]]


In [None]:
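# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)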

# **Power Transformer Scaler**

> [**sklearn.preprocessing.PowerTransformer**](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html#sklearn.preprocessing.PowerTransformer)

Power transforms are a family of parametric, monotonic transformations applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance) or other situations where normality is desired.

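Currently, $PowerTransformer$ supports the **Box-Cox** transform, which requires strictly positive data, and the **Yeo-Johnson** transform, which accepts both positive and negative data. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood. By default, zero-mean, unit-variance normalization is then applied to the transformed data.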
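As an illustrative sketch, the code below applies both methods to a hypothetical strictly positive, right-skewed feature `x` (synthetic log-normal data, not the wine dataset) and reports the fitted parameters in `lambdas_` together with the skewness before and after. It uses `scipy.stats.skew` only to measure the effect.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Hypothetical strictly positive, right-skewed feature.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

yj = PowerTransformer(method="yeo-johnson")
bc = PowerTransformer(method="box-cox")  # valid here because x > 0

x_yj = yj.fit_transform(x)
x_bc = bc.fit_transform(x)

# One lambda is estimated per feature by maximum likelihood.
print("lambda (Yeo-Johnson):", yj.lambdas_)
print("lambda (Box-Cox):", bc.lambdas_)

print("skewness before:", skew(x[:, 0]))
print("skewness after Yeo-Johnson:", skew(x_yj[:, 0]))
print("skewness after Box-Cox:", skew(x_bc[:, 0]))
```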

In [14]:
from sklearn.preprocessing import PowerTransformer

# method: "yeo-johnson" (default) or "box-cox" (requires strictly positive data).
scaler = PowerTransformer(method="yeo-johnson")
X = scaler.fit_transform(X_train)
print(X)

[[ 1.64748053 -0.51062513  1.22815618 ... -1.69276625 -0.90739595
  -0.04293367]
 [-0.54211128  1.88663311  1.00528748 ... -0.5712274  -1.24094609
  -0.67033838]
 [-0.74051811 -1.66387873 -0.94115643 ...  0.37491807  0.19605565
  -0.04293367]
 ...
 [ 1.69479997 -0.24810801  0.05450034 ...  1.04231668  0.53592188
   1.98256983]
 [-0.34453499 -0.74457239 -0.37588826 ...  0.03488343 -0.77936539
  -0.77214116]
 [-0.77781294  0.38047054  0.34441975 ... -0.65902822  1.12177325
  -1.08943917]]


In [None]:
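# Apply the scaler fitted on the training set to the test set (no refitting).
X_test_scaled = scaler.transform(X_test)
print(X_test_scaled)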