### **Feature transformation (Mathematical Transformation)**

---


The basic idea behind all these transformations is to bring the distribution of the data closer to a normal distribution.


#### **1. Function Transformer**


<img src="../assets/skewed_dist.png" />


##### **1. log transform**


It helps to normalize right skewed data.

<img src="../assets/log_trans.png" />


It does so by pulling large values closer to small ones.
The logarithm converts multiplicative differences into additive ones: since log(ab) = log(a) + log(b), every tenfold increase in x adds the same constant to log(x).
Hence, large values are reduced significantly, while small values are reduced only slightly.


In [1]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# Plain log transform: requires strictly positive values
ft = FunctionTransformer(func=np.log)

# log(1 + x): safe when the data contains zeros, since log1p(0) = 0
ft = FunctionTransformer(func=np.log1p)
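
To make the multiplicative-to-additive point concrete, here is a minimal sketch on made-up data. The arrays and the names `log_ft`, `log1p_ft`, and `steps` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical right-skewed feature; each value is 10x the previous one.
X = np.array([[1.0], [10.0], [100.0], [1000.0]])

# Plain log: equal ratios in X become equal differences in log(X).
log_ft = FunctionTransformer(func=np.log, inverse_func=np.exp)
X_log = log_ft.fit_transform(X)
steps = np.diff(X_log.ravel())  # every step equals ln(10)

# log1p: same idea, but log(1 + 0) = 0, so zeros do not become -inf.
X_with_zero = np.array([[0.0], [9.0], [99.0], [999.0]])
log1p_ft = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)
X_log1p = log1p_ft.fit_transform(X_with_zero)

print(steps)
print(X_log1p.ravel())
```

Passing `inverse_func` is optional; it is included here so that `inverse_transform` can map values back to the original scale.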

---


##### **2. reciprocal transform**


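Reciprocal transformations are used for data that is very strongly right skewed, for example where the variance grows with the third or fourth power of the mean. Typical candidates are time periods and rates. The transform is undefined at zero and reverses the order of positive values, so the largest input becomes the smallest output.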


In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

# np.reciprocal truncates on integer arrays, so pass float data
ft = FunctionTransformer(func=np.reciprocal)
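
A short sketch of the reciprocal transform in use, assuming a made-up column of task durations. The data and the names `recip_ft`, `X_rate`, and `int_result` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Hypothetical task durations in seconds; floats, strictly positive.
X = np.array([[0.5], [1.0], [2.0], [4.0], [50.0]])

# The reciprocal is its own inverse, so it doubles as inverse_func.
recip_ft = FunctionTransformer(func=np.reciprocal, inverse_func=np.reciprocal)
X_rate = recip_ft.fit_transform(X)

print(X_rate.ravel())  # the largest duration becomes the smallest rate

# Pitfall: on an integer array NumPy performs integer division,
# so every value with absolute value above 1 collapses to 0.
int_result = np.reciprocal(np.array([1, 2, 4]))
print(int_result)
```

Casting the column to float before the transform avoids the integer pitfall shown in the last lines.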

---


##### **3. square transform (polynomial transform)**


Used for left-skewed data. If squaring does not help, try higher powers such as the cube or the fourth power.


In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

ft = FunctionTransformer(func=np.square)
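
The effect on left skew can be checked numerically. The sketch below uses a synthetic Beta(5, 1) sample, chosen only because it piles up near 1 and has a long left tail; the sample, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

# Synthetic left-skewed sample (illustrative assumption, not real data).
rng = np.random.default_rng(0)
X = rng.beta(5, 1, size=(10_000, 1))

square_ft = FunctionTransformer(func=np.square)
X_sq = square_ft.fit_transform(X)

# Negative skewness indicates a left tail; closer to 0 is more symmetric.
skew_before = skew(X.ravel())
skew_after = skew(X_sq.ravel())
print(f"skewness before: {skew_before:.2f}, after squaring: {skew_after:.2f}")
```

If the skewness is still clearly negative after squaring, the same check can be repeated with a higher power.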

---


##### **4. sqrt transform**


Used to reduce the right skewness of a distribution.
It is a milder correction than the log transform and requires non-negative values.


In [None]:
from sklearn.preprocessing import FunctionTransformer
import numpy as np

ft = FunctionTransformer(func=np.sqrt)
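
To see that the square root is a weaker correction than the log, the sketch below compares skewness on a synthetic lognormal sample. The sample, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import FunctionTransformer

# Synthetic right-skewed, strictly positive sample (illustrative only).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(10_000, 1))

X_sqrt = FunctionTransformer(func=np.sqrt).fit_transform(X)
X_log = FunctionTransformer(func=np.log).fit_transform(X)

skew_raw = skew(X.ravel())
skew_sqrt = skew(X_sqrt.ravel())
skew_log = skew(X_log.ravel())
print(f"raw: {skew_raw:.2f}, sqrt: {skew_sqrt:.2f}, log: {skew_log:.2f}")
```

On this kind of data the square root leaves some right skew behind, while the log removes nearly all of it.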

---


#### **2. Power Transformer**


##### **1. Box-Cox transform**


<img src="../assets/box_cox.png"/>


Here lambda is an unknown parameter that is estimated from the data so that the transformed values approximate a normal distribution as closely as possible.

> Applicable only for strictly positive data (x > 0)


Lambda is typically searched over the range [-5, 5] and can be estimated through:

- Maximum likelihood
- Bayesian Statistics


In [None]:
from sklearn.preprocessing import PowerTransformer

pt = PowerTransformer(method="box-cox")
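
A minimal usage sketch on synthetic data follows. scikit-learn's `PowerTransformer` estimates lambda by maximum likelihood and exposes it as `lambdas_` after fitting; the lognormal sample and the `raised` flag are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Synthetic strictly positive, right-skewed sample (illustrative only).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(10_000, 1))

# standardize=True is the default: output has zero mean and unit variance.
pt = PowerTransformer(method="box-cox")
X_bc = pt.fit_transform(X)

print("estimated lambda:", pt.lambdas_)  # one value per feature
print("mean, std:", X_bc.mean(), X_bc.std())

# Box-Cox rejects non-positive input with a ValueError.
try:
    PowerTransformer(method="box-cox").fit(np.array([[-1.0], [2.0], [3.0]]))
    raised = False
except ValueError:
    raised = True
print("rejected negative input:", raised)
```

For lognormal data the estimated lambda should land close to 0, which corresponds to a plain log transform.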

---


##### **2. Yeo-Johnson transform**


<img src="../assets/yeo_johnson.png"/>


Applicable to all values of x: positive, zero, or negative.


In [4]:
from sklearn.preprocessing import PowerTransformer

pt = PowerTransformer(method="yeo-johnson")
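
The sketch below applies Yeo-Johnson to a synthetic sample that contains negative values, which Box-Cox would reject. The shifted lognormal data, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed sample shifted so that some values are negative.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(10_000, 1)) - 1.0

pt = PowerTransformer(method="yeo-johnson")
X_yj = pt.fit_transform(X)

# inverse_transform maps the transformed values back to the original scale.
X_back = pt.inverse_transform(X_yj)

print("contains negatives:", bool((X < 0).any()))
print("estimated lambda:", pt.lambdas_)
print(f"skewness before: {skew(X.ravel()):.2f}, after: {skew(X_yj.ravel()):.2f}")
```

In a modelling pipeline the transformer would normally be fitted on the training split only and then reused to transform the test split.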

---
