## **Scaling**
In most cases, the numerical features of a dataset do not share a common range. In real life, it makes no sense to expect the age and income columns to have the same range. But from a machine learning point of view, how can these two columns be compared?

Scaling solves this problem. After scaling, the continuous features share a comparable range. Scaling is not mandatory for many algorithms (tree-based models, for example, are largely insensitive to it), but it is usually still worth applying. Algorithms based on distance calculations, such as k-NN or k-Means, do need scaled continuous features as input; otherwise the feature with the largest range dominates the distance.
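
To make the point about distance-based algorithms concrete, here is a minimal sketch with NumPy. The four people and their age and income values are made up for illustration, and `euclidean` is a hypothetical helper, not part of any library.

```python
import numpy as np

# Hypothetical people described by (age in years, income in dollars).
people = np.array([
    [25.0, 50_000.0],   # A
    [60.0, 51_000.0],   # B: very different age, similar income
    [26.0, 58_000.0],   # C: similar age, somewhat different income
    [40.0, 90_000.0],   # D: widens the income range
])


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# On the raw values, income differences swamp age differences.
raw_ab = euclidean(people[0], people[1])
raw_ac = euclidean(people[0], people[2])

# Min-max scale each column to [0, 1] so both features contribute comparably.
col_min = people.min(axis=0)
col_max = people.max(axis=0)
scaled = (people - col_min) / (col_max - col_min)

scaled_ab = euclidean(scaled[0], scaled[1])
scaled_ac = euclidean(scaled[0], scaled[2])

print(f"raw:    d(A,B)={raw_ab:.1f}  d(A,C)={raw_ac:.1f}")
print(f"scaled: d(A,B)={scaled_ab:.3f}  d(A,C)={scaled_ac:.3f}")
```

On the raw data, A looks closer to B only because their incomes are similar; after scaling, A is closer to C, whose age and income are both reasonably near A's.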

Basically, there are two common ways of scaling:


1.   Normalization
2.   Standardization



**Normalization**



![Min-max normalization formula](https://miro.medium.com/max/168/1*D3ORMiW9A7GoTezFYbL8LA.png)

Normalization (or min-max normalization) scales all values into a fixed range between 0 and 1. This transformation does not change the shape of the feature's distribution, but it is sensitive to outliers: the minimum and maximum define the new range, so a single extreme value squeezes all other values into a narrow band. Therefore, it is recommended to handle outliers before normalization.

Example:

```
   value  normalized
0      2        0.23
1     45        0.63
2    -23        0.00
3     85        1.00
4     28        0.47
5      2        0.23
6     35        0.54
7    -12        0.10
```
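
The table above can be reproduced with a few lines of pandas. This is a minimal sketch; the variable names (`values`, `normalized`, `table`) are chosen here for illustration.

```python
import pandas as pd

# The same eight values as in the example table.
values = pd.Series([2, 45, -23, 85, 28, 2, 35, -12], name="value")

# Min-max normalization: subtract the minimum, divide by the range.
normalized = (values - values.min()) / (values.max() - values.min())

table = pd.DataFrame({"value": values, "normalized": normalized.round(2)})
print(table)
```

The smallest value (-23) maps to 0 and the largest (85) maps to 1; everything else falls in between.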


**Standardization**

Standardization (or z-score normalization) rescales each feature to have a mean of 0 and a standard deviation of 1. Unlike min-max normalization, the result is not confined to a fixed range, so features may still span different ranges after standardization. Because it relies on the mean and standard deviation rather than on the minimum and maximum, it is less affected by outliers.

In the standardization formula below, the **mean is shown as μ** and the **standard deviation is shown as σ**.

![Standardization formula](https://miro.medium.com/max/82/1*BcNLM9loyAR3YQLt2hDqqg.png)


Example:

```
   value  standardized
0      2         -0.52
1     45          0.70
2    -23         -1.23
3     85          1.84
4     28          0.22
5      2         -0.52
6     35          0.42
7    -12         -0.92
```
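
The standardized column above can be reproduced in the same way. This sketch assumes the sample standard deviation (`ddof=1`), which is the default of pandas' `Series.std()` and matches the numbers in the table; the variable names are illustrative.

```python
import pandas as pd

# The same eight values as in the example table.
values = pd.Series([2, 45, -23, 85, 28, 2, 35, -12], name="value")

# z-score: subtract the mean, divide by the standard deviation.
# Series.std() uses the sample standard deviation (ddof=1) by default.
standardized = (values - values.mean()) / values.std()

table = pd.DataFrame({"value": values, "standardized": standardized.round(2)})
print(table)
```

After the transformation the column has a mean of 0 and a (sample) standard deviation of 1, while the values are no longer restricted to a fixed interval.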



In [0]:
import pandas as pd
import numpy as np

In [0]:
df=pd.read_csv('/content/heart.csv')

In [3]:
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


**Normalization**

In [0]:
# Scale a single column with the min-max formula
df['normalized_age'] = (df['age'] - df['age'].min()) / (df['age'].max() - df['age'].min())

In [5]:
df.normalized_age.head()

0    0.708333
1    0.166667
2    0.250000
3    0.562500
4    0.583333
Name: normalized_age, dtype: float64

In [0]:
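from sklearn.preprocessing import MinMaxScaler

# Scale every column of the DataFrame to [0, 1]; the result is a NumPy array.
# df already contains normalized_age, so the last output column repeats the scaled age.
scaler = MinMaxScaler()
x = scaler.fit_transform(df)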

In [10]:
print(x)

[[0.70833333 1.         1.         ... 0.33333333 1.         0.70833333]
 [0.16666667 1.         0.66666667 ... 0.66666667 1.         0.16666667]
 [0.25       0.         0.33333333 ... 0.66666667 1.         0.25      ]
 ...
 [0.8125     1.         0.         ... 1.         0.         0.8125    ]
 [0.58333333 1.         0.         ... 1.         0.         0.58333333]
 [0.58333333 0.         0.33333333 ... 0.66666667 0.         0.58333333]]
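
`MinMaxScaler` returns a plain NumPy array, so the column names are lost. If keeping the labels matters, the same per-column formula can be applied directly in pandas. The sketch below is one possible way to do it; `min_max_scale` is a hypothetical helper, and `sample` holds only the five rows shown by `df.head()`, so its scaled values differ from those computed on the full dataset.

```python
import pandas as pd

# A few numeric columns from the five rows shown by df.head() above.
sample = pd.DataFrame({
    "age": [63, 37, 41, 56, 57],
    "trestbps": [145, 130, 130, 120, 120],
    "chol": [233, 250, 204, 236, 354],
})


def min_max_scale(frame):
    """Scale every column of a numeric DataFrame to the [0, 1] range.

    A constant column has a range of zero, so it would produce NaN values.
    """
    return (frame - frame.min()) / (frame.max() - frame.min())


scaled = min_max_scale(sample)
print(scaled.round(3))
```

The result is still a DataFrame with the original column names, which makes it easier to inspect than the raw array.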


**Standardization**


In [0]:
data=pd.read_csv('/content/heart.csv')

In [12]:
data.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1
3,56,1,1,120,236,0,1,178,0,0.8,2,0,2,1
4,57,0,0,120,354,0,1,163,1,0.6,2,0,2,1


In [0]:
# Standardize a single column; pandas' std() uses the sample standard deviation (ddof=1)
data['standardized_age'] = (data['age'] - data['age'].mean()) / data['age'].std()

In [14]:
data.standardized_age.head()

0    0.950624
1   -1.912150
2   -1.471723
3    0.179877
4    0.289984
Name: standardized_age, dtype: float64

In [0]:
from sklearn import preprocessing

# scale() standardizes every column and returns a NumPy array (column names are lost)
x = preprocessing.scale(data)

In [17]:
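# Compare column 0 (age) with the manual result above. The values differ slightly
# because scale() divides by the population standard deviation (ddof=0),
# while pandas' std() defaults to the sample standard deviation (ddof=1).
s = pd.DataFrame(x)
s.head()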

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
0,0.952197,0.681005,1.973123,0.763956,-0.256334,2.394438,-1.005832,0.015443,-0.696631,1.087338,-2.274579,-0.714429,-2.148873,0.914529,0.952197
1,-1.915313,0.681005,1.002577,-0.092738,0.072199,-0.417635,0.898962,1.633471,-0.696631,2.122573,-2.274579,-0.714429,-0.512922,0.914529,-1.915313
2,-1.474158,-1.468418,0.032031,-0.092738,-0.816773,-0.417635,-1.005832,0.977514,-0.696631,0.310912,0.976352,-0.714429,-0.512922,0.914529,-1.474158
3,0.180175,0.681005,0.032031,-0.663867,-0.198357,-0.417635,0.898962,1.239897,-0.696631,-0.206705,0.976352,-0.714429,-0.512922,0.914529,0.180175
4,0.290464,-1.468418,-0.938515,-0.663867,2.08205,-0.417635,0.898962,0.583939,1.435481,-0.379244,0.976352,-0.714429,-0.512922,0.914529,0.290464
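
The scikit-learn result for the first row (0.952197) is slightly larger than the pandas result (0.950624). The difference comes from the standard deviation: `preprocessing.scale` uses the population estimate (`ddof=0`), while `Series.std()` defaults to the sample estimate (`ddof=1`). The sketch below illustrates this with pandas and NumPy only; the five ages are the ones shown by `data.head()`, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The five ages shown by data.head() above.
ages = pd.Series([63, 37, 41, 56, 57], name="age")
n = len(ages)

# pandas default: sample standard deviation (divides by n - 1).
sample_z = (ages - ages.mean()) / ages.std()

# Population standard deviation (divides by n), as scikit-learn's scale() does.
population_z = (ages - ages.mean()) / ages.std(ddof=0)

# The two standard deviations differ only by a constant factor sqrt((n - 1) / n).
std_ratio = ages.std(ddof=0) / ages.std()

print(pd.DataFrame({"sample_z": sample_z, "population_z": population_z}))
print("std ratio:", std_ratio, "expected:", np.sqrt((n - 1) / n))
```

The factor approaches 1 as the number of rows grows, which is why the two age columns above agree to about two decimal places. Passing `ddof=0` to `std()` should make the manual calculation match the scikit-learn output.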
