# Feature Scaling 
## Absolute Maximum scaling

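- Find the absolute maximum value of each feature
- Divide every value by that maximum

### Assume X is the feature matrix

```X[scaled] = X[i] / max(|X|)```

The result lies in [-1, 1]. Note that cell `In [10]` below computes `(X[i] - max(|X|)) / max(|X|)` instead, which is the same quantity shifted down by 1, so its output falls in [-1, 0] for these non-negative columns.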

In [1]:
import numpy as np

In [3]:
import pandas as pd

In [4]:
df = pd.read_csv('HousePriceData.csv')

df.head()

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | 8450 | 60 |
| 1 | 9600 | 20 |
| 2 | 11250 | 60 |
| 3 | 9550 | 70 |
| 4 | 14260 | 60 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   LotArea     1460 non-null   int64
 1   MSSubClass  1460 non-null   int64
dtypes: int64(2)
memory usage: 22.9 KB


In [7]:
df.describe()

|  | LotArea | MSSubClass |
|---|---|---|
| count | 1460.0 | 1460.0 |
| mean | 10516.828082 | 56.89726 |
| std | 9981.264932 | 42.300571 |
| min | 1300.0 | 20.0 |
| 25% | 7553.5 | 20.0 |
| 50% | 9478.5 | 50.0 |
| 75% | 11601.5 | 70.0 |
| max | 215245.0 | 190.0 |


In [8]:
max_vals = df.abs().max()


In [9]:
max_vals

LotArea       215245
MSSubClass       190
dtype: int64

In [10]:
(df - max_vals) / max_vals

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | -0.960742 | -0.684211 |
| 1 | -0.955400 | -0.894737 |
| 2 | -0.947734 | -0.684211 |
| 3 | -0.955632 | -0.631579 |
| 4 | -0.933750 | -0.684211 |
| ... | ... | ... |
| 1455 | -0.963219 | -0.684211 |
| 1456 | -0.938791 | -0.894737 |
| 1457 | -0.957992 | -0.631579 |
| 1458 | -0.954856 | -0.894737 |
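
As a rough sketch, max-abs scaling in its usual form is a plain division by the column maximum. The example below uses the first five rows from `df.head()` and the column maxima printed in `In [9]`; the variable names are chosen here for illustration only. It also shows that the `(df - max_vals) / max_vals` expression from `In [10]` is the same result shifted down by 1.

```python
import numpy as np

# First five rows of the notebook's data (LotArea, MSSubClass), from df.head().
X = np.array([
    [8450, 60],
    [9600, 20],
    [11250, 60],
    [9550, 70],
    [14260, 60],
], dtype=float)

# Column-wise max(|X|) over the full dataset, as printed in cell In [9].
max_abs = np.array([215245.0, 190.0])

# Max-abs scaling: divide each column by its absolute maximum.
X_scaled = X / max_abs

# The expression used in cell In [10]; equal to X_scaled - 1.
X_shifted = (X - max_abs) / max_abs

print(X_scaled[0])   # first row, roughly [0.039258, 0.315789]
print(X_shifted[0])  # first row, roughly [-0.960742, -0.684211]
```

scikit-learn also ships a `MaxAbsScaler` class for the plain-division form, should a fitted transformer be preferable to a manual division.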


## Min-Max Scaling
- Find the minimum and maximum values of each feature
- Subtract the minimum from each value, then divide by the difference between the maximum and the minimum

### The final equation
```X[scaled] = (X[i] - X[min]) / (X[max] - X[min])```

- Sensitive to outliers, since a single extreme value sets `X[max]` or `X[min]`
- Output range: [0, 1]

In [11]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

scaled_data = scaler.fit_transform(df)

scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | 0.03342 | 0.235294 |
| 1 | 0.038795 | 0.0 |
| 2 | 0.046507 | 0.235294 |
| 3 | 0.038561 | 0.294118 |
| 4 | 0.060576 | 0.235294 |
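
To connect the Min-Max equation to the numbers above, here is a small sketch that applies it by hand to the first five `LotArea` values, taking `min` and `max` from the `df.describe()` output. The variable names are illustrative. It also scales the 75th percentile to show the outlier sensitivity: because the maximum (215245) is far above the bulk of the data, three quarters of the values end up below about 0.05.

```python
import numpy as np

# First five LotArea values from df.head().
lot_area = np.array([8450, 9600, 11250, 9550, 14260], dtype=float)

# Column statistics taken from the df.describe() output.
lot_min, lot_max = 1300.0, 215245.0
lot_q3 = 11601.5  # 75th percentile

# Min-Max scaling: (x - min) / (max - min).
scaled = (lot_area - lot_min) / (lot_max - lot_min)
print(scaled)  # roughly [0.03342, 0.038795, 0.046507, 0.038561, 0.060576]

# Where the 75th percentile lands after scaling (outlier effect).
q3_scaled = (lot_q3 - lot_min) / (lot_max - lot_min)
print(q3_scaled)  # roughly 0.048
```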


## Normalization
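Similar to Min-Max Scaling, except that `X[mean]` is subtracted in place of `X[min]`; the denominator stays the same (this is often called mean normalization):

```X[scaled] = (X[i] - X[mean]) / (X[max] - X[min])```

Note that scikit-learn's `Normalizer`, used in the cell below, does not apply this formula. It rescales each row (sample) to unit norm (L2 by default), which is why the `LotArea` values in its output are all close to 1.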

In [14]:
from sklearn.preprocessing import Normalizer

scaler = Normalizer()

scaled_data = scaler.fit_transform(df)

scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | 0.999975 | 0.0071 |
| 1 | 0.999998 | 0.002083 |
| 2 | 0.999986 | 0.005333 |
| 3 | 0.999973 | 0.00733 |
| 4 | 0.999991 | 0.004208 |
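
The output above comes from dividing each row by its own L2 norm, not from a mean-based formula. The sketch below, with illustrative variable names, reproduces the `Normalizer` numbers by hand for the first five rows and, for comparison, computes mean normalization of `LotArea` as `(x - mean) / (max - min)` using the `df.describe()` statistics.

```python
import numpy as np

# First five rows of the notebook's data (LotArea, MSSubClass), from df.head().
X = np.array([
    [8450, 60],
    [9600, 20],
    [11250, 60],
    [9550, 70],
    [14260, 60],
], dtype=float)

# Row-wise L2 normalization: each row is divided by its Euclidean length.
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
row_normalized = X / row_norms
print(row_normalized[0])  # roughly [0.999975, 0.0071]

# Mean normalization of LotArea, with statistics from df.describe().
lot_mean, lot_min, lot_max = 10516.828082, 1300.0, 215245.0
mean_normalized = (X[:, 0] - lot_mean) / (lot_max - lot_min)
print(mean_normalized)
```

Row-wise normalization depends only on the values within a row, so it needs no fitted statistics; mean normalization works per column and needs the column's mean, minimum, and maximum.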


## Standardization
Based on the central tendency (mean) and variance of the data

- Calculate the `mean` and `standard deviation` of each feature
- Subtract the mean from each value and divide by the standard deviation

```X[scaled] = (X[i] - X[mean]) / σ```

In [19]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaled_data = scaler.fit_transform(df)

scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | -0.207142 | 0.073375 |
| 1 | -0.091886 | -0.872563 |
| 2 | 0.07348 | 0.073375 |
| 3 | -0.096897 | 0.309859 |
| 4 | 0.375148 | 0.073375 |
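
The standardization formula can be checked by hand against the output above. One detail matters: `df.describe()` reports the sample standard deviation (`ddof=1`), while `StandardScaler` divides by the population standard deviation (`ddof=0`). The sketch below, with illustrative variable names, converts between the two before applying the formula to the first five `LotArea` values.

```python
import numpy as np

# First five LotArea values from df.head().
lot_area = np.array([8450, 9600, 11250, 9550, 14260], dtype=float)

# Statistics from df.describe(); its std uses ddof=1 (sample std).
n = 1460
lot_mean = 10516.828082
sample_std = 9981.264932

# Convert to the population std (ddof=0), which StandardScaler uses.
population_std = sample_std * np.sqrt((n - 1) / n)

# Standardization: (x - mean) / sigma.
z = (lot_area - lot_mean) / population_std
print(z)  # roughly [-0.207142, -0.091886, 0.07348, -0.096897, 0.375148]
```

With 1460 rows the two standard deviations differ by well under 0.1%, but the difference is visible in the sixth decimal place.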


## Robust Scaling
- Find the median of each feature
- Find the IQR (interquartile range, `Q3 - Q1`)

#### Calculate
```X[scaled] = (X[i] - X[median]) / IQR```

In [20]:
from sklearn.preprocessing import RobustScaler

scaler = RobustScaler()

scaled_data = scaler.fit_transform(df)

scaled_df = pd.DataFrame(scaled_data, columns=df.columns)

scaled_df.head()

|  | LotArea | MSSubClass |
|---|---|---|
| 0 | -0.254076 | 0.2 |
| 1 | 0.030015 | -0.6 |
| 2 | 0.437624 | 0.2 |
| 3 | 0.017663 | 0.4 |
| 4 | 1.181201 | 0.2 |
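
As a quick check of the robust scaling formula, the sketch below applies it by hand to the first five `LotArea` values, taking the median and quartiles from the `df.describe()` output (the 50%, 25%, and 75% rows). Variable names are illustrative, and the default 25th to 75th percentile range of `RobustScaler` is assumed.

```python
import numpy as np

# First five LotArea values from df.head().
lot_area = np.array([8450, 9600, 11250, 9550, 14260], dtype=float)

# Median and quartiles from the df.describe() output.
lot_median = 9478.5
lot_q1, lot_q3 = 7553.5, 11601.5

# Interquartile range: Q3 - Q1.
iqr = lot_q3 - lot_q1

# Robust scaling: (x - median) / IQR.
scaled = (lot_area - lot_median) / iqr
print(scaled)  # roughly [-0.254076, 0.030015, 0.437624, 0.017663, 1.181201]
```

Because the median and IQR are barely affected by the single very large lot (215245), the bulk of the data keeps a usable spread, unlike the Min-Max result.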
