## Standardization and Normalization in Machine Learning

Standardization and normalization are feature scaling techniques that transform the numerical features of a dataset to a comparable scale. This matters for machine learning algorithms that are sensitive to the scale of their inputs, such as those based on distance calculations (e.g., K-Nearest Neighbors, Support Vector Machines with an RBF kernel, and K-Means clustering).
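To see why scale matters for distance-based methods, here is a minimal sketch with three made-up clients described by (income, age). The values and the `euclidean` helper are illustrative assumptions, not taken from the dataset used later.

```python
import numpy as np

# Hypothetical clients as (income, age); the numbers are invented for illustration.
a = np.array([66000.0, 59.0])
b = np.array([66500.0, 19.0])  # similar income, very different age
c = np.array([60000.0, 58.0])  # different income, almost the same age

def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))

d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# On the raw features, income (tens of thousands) swamps age (tens),
# so the 40-year age gap between a and b barely affects the distance.
print("distance a-b:", d_ab)
print("distance a-c:", d_ac)
```

On the raw values, `a` ends up closer to `b` than to `c`, because the income column dominates the sum of squared differences. Scaling both features first lets age contribute to the distance as well.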

### Standardization (Z-score normalization)

Standardization rescales each feature so that it has a mean of 0 and a standard deviation of 1. It works best when the feature is roughly Gaussian (normal), but it can also be applied to data that is not normally distributed. It changes only the location and scale of a feature, not the shape of its distribution, and it does not bound the values to a fixed range.

The formula for standardization is:

$$x' = \frac{x - \mu}{\sigma}$$

Where:
- $x'$ is the standardized value.
- $x$ is the original value.
- $\mu$ is the mean of the feature.
- $\sigma$ is the standard deviation of the feature.

In your notebook, you used `sklearn.preprocessing.StandardScaler` to perform standardization on your data.
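The standardization formula can be checked by hand against `StandardScaler`. The sketch below uses a small made-up matrix (`X_demo` is an assumption, not the credit data) and relies on the fact that `StandardScaler` uses the population standard deviation, which is also the default of `np.std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small made-up feature matrix: rows are samples, columns are features.
X_demo = np.array([[1.0, 200.0],
                   [2.0, 300.0],
                   [3.0, 400.0],
                   [4.0, 500.0]])

# Apply x' = (x - mu) / sigma column by column.
mu = X_demo.mean(axis=0)
sigma = X_demo.std(axis=0)  # population std (ddof=0), matching StandardScaler
X_manual = (X_demo - mu) / sigma

# Same transformation through scikit-learn.
X_sklearn = StandardScaler().fit_transform(X_demo)

print("manual and sklearn agree:", np.allclose(X_manual, X_sklearn))
print("column means:", X_manual.mean(axis=0))
print("column stds: ", X_manual.std(axis=0))
```

After the transformation each column has mean 0 and standard deviation 1, up to floating-point rounding.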

### Normalization (Min-Max scaling)

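Normalization rescales each feature to a fixed range, usually [0, 1]. This technique is useful when you need the values bounded within a specific range or when the data does not follow a Gaussian distribution. Because the range is set by the minimum and maximum, a single extreme outlier can compress the remaining values into a narrow band.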

The formula for normalization is:

$$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$$

Where:
- $x'$ is the normalized value.
- $x$ is the original value.
- $x_{min}$ is the minimum value of the feature.
- $x_{max}$ is the maximum value of the feature.

While you used standardization in your notebook, normalization is another common scaling technique that you may need depending on the requirements of your machine learning model. For Min-Max scaling, you can use `sklearn.preprocessing.MinMaxScaler`.
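The Min-Max formula can be verified the same way. This is a minimal sketch on a made-up matrix (`X_demo` is an assumption, not the credit data), comparing a hand-written version of the formula with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small made-up feature matrix: rows are samples, columns are features.
X_demo = np.array([[1.0, 200.0],
                   [2.0, 300.0],
                   [3.0, 400.0],
                   [4.0, 500.0]])

# Apply x' = (x - x_min) / (x_max - x_min) column by column.
x_min = X_demo.min(axis=0)
x_max = X_demo.max(axis=0)
X_manual = (X_demo - x_min) / (x_max - x_min)

# Same transformation through scikit-learn.
X_sklearn = MinMaxScaler(feature_range=(0, 1)).fit_transform(X_demo)

print("manual and sklearn agree:", np.allclose(X_manual, X_sklearn))
print("column minimums:", X_manual.min(axis=0))
print("column maximums:", X_manual.max(axis=0))
```

Each column now has its smallest value mapped to 0 and its largest to 1. Note that the hand-written version would divide by zero for a constant column, so it is only a sketch of the formula.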

In [1]:
#Import Libs
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler

In [2]:
#Read the Dataframe
df = pd.read_csv('/content/credit_data.csv')
print(df)

      i#clientid        income        age         loan  c#default
0              1  66155.925095  59.017015  8106.532131          0
1              2  34415.153966  48.117153  6564.745018          0
2              3  57317.170063  63.108049  8020.953296          0
3              4  42709.534201  45.751972  6103.642260          0
4              5  66952.688845  18.584336  8770.099235          1
...          ...           ...        ...          ...        ...
1995        1996  59221.044874  48.518179  1926.729397          0
1996        1997  69516.127573  23.162104  3503.176156          0
1997        1998  44311.449262  28.017167  5522.786693          1
1998        1999  43756.056605  63.971796  1622.722598          0
1999        2000  69436.579552  56.152617  7378.833599          0

[2000 rows x 5 columns]


In [3]:
#Split the DataFrame into features (X: income, age, loan) and class labels (Y: default)
X = df.iloc[:,1:4].values
Y = df.iloc[:, 4].values

In [4]:
#Example of Standardization with Scaler
scaler = StandardScaler()
X = scaler.fit_transform(X)
print(X)

[[ 1.45393393  1.33686061  1.20281942]
 [-0.76217555  0.53663921  0.69642695]
 [ 0.83682073  1.63720692  1.17471147]
 ...
 [-0.07122592 -0.93901609  0.35420081]
 [-0.11000289  1.7006195  -0.92675625]
 [ 1.682986    1.12656872  0.96381038]]


In [5]:
#Reset X to the original, unscaled features
X = df.iloc[:,1:4].values

In [6]:
#Example of Normalization with Scaler
scaler = MinMaxScaler(feature_range=(0,1))
X = scaler.fit_transform(X)
print(X)

[[0.9231759  0.95743135 0.58883739]
 [0.28812165 0.86378597 0.47682695]
 [0.74633429 0.99257918 0.58262011]
 ...
 [0.48612202 0.69109837 0.40112895]
 [0.47500998 1.         0.1177903 ]
 [0.98881367 0.93282208 0.53597028]]
