In [2]:
from sklearn.preprocessing import PowerTransformer
import numpy as np

# Small sample dataset (4 samples, 3 strictly positive features)
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# Applying Power Transform
scaler = PowerTransformer(method='yeo-johnson')  # or method='box-cox' for Box-Cox
transformed_data = scaler.fit_transform(data)

print("Transformed Data for yeo-johnson:")
print(transformed_data)


# Applying the Box-Cox transform (requires strictly positive data)
scaler = PowerTransformer(method='box-cox')
transformed_data = scaler.fit_transform(data)

print("Transformed Data for box-cox :")
print(transformed_data)

Transformed Data for yeo-johnson:
[[-1.42437537 -1.40883551 -1.39869432]
 [-0.34726726 -0.36844754 -0.38171452]
 [ 0.51085366  0.50130354  0.49459072]
 [ 1.26078896  1.2759795   1.28581812]]
Transformed Data for box-cox :
[[-1.4532513  -1.42437537 -1.40883552]
 [-0.30530445 -0.34726726 -0.36844752]
 [ 0.5267224   0.51085366  0.50130355]
 [ 1.23183334  1.26078896  1.27597949]]


# Explanation

**Purpose:**

* To reduce skewness in data, making its distribution closer to normal.
* To stabilize variance, which helps algorithms that are sensitive to differences in variance.
* To improve the performance of models such as linear regression and logistic regression, which tend to work better when the data is approximately normally distributed.

## Box-Cox Transformation:

* It is defined only for strictly positive values.
* The transformation raises the data to a power, and the parameter λ (lambda) controls how strong the transformation is. `PowerTransformer` estimates λ for each feature by maximum likelihood.

### Formula


![Box-Cox transformation formula](images/box-cox-transform.png "Box-Cox transformation")
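
As a rough sketch of the standard Box-Cox definition, the function below applies `(x**lam - 1) / lam` for a nonzero λ and `log(x)` when λ is zero, then compares the result with `PowerTransformer`. The helper name `box_cox` is made up for this illustration. Passing `standardize=False` is assumed here so that scikit-learn returns the raw transform without rescaling to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def box_cox(x, lam):
    """Hypothetical helper: Box-Cox transform of strictly positive values."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox is only defined for strictly positive data")
    if abs(lam) < 1e-12:
        # The limit of (x**lam - 1) / lam as lam approaches 0 is log(x)
        return np.log(x)
    return (x ** lam - 1.0) / lam


x = np.array([[1.0], [4.0], [7.0], [10.0]])

# standardize=False returns the raw transform, without zero-mean/unit-variance scaling
pt = PowerTransformer(method="box-cox", standardize=False)
sk_result = pt.fit_transform(x)

lam = pt.lambdas_[0]  # lambda fitted for the single feature
manual_result = box_cox(x, lam)

print("fitted lambda:", lam)
print("manual matches sklearn:", np.allclose(manual_result, sk_result))
```

With the default `standardize=True`, the same values are additionally rescaled, which is why the notebook output above is centered around zero.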


## Yeo-Johnson Transformation:

* It is an extension of Box-Cox that also handles zero and negative values, so it can be applied to data that is not strictly positive.

![Yeo-Johnson transformation formula](images/yeo-johnson-transform.png "Yeo-Johnson transformation")
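
The piecewise Yeo-Johnson definition can be sketched in the same way. This is a minimal illustration that assumes the standard form of the transform, with separate branches for non-negative and negative inputs. The helper name `yeo_johnson` is hypothetical, and `standardize=False` is again used so that the comparison with scikit-learn is on the raw transform.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer


def yeo_johnson(x, lam):
    """Hypothetical helper: Yeo-Johnson transform for any real-valued data."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0

    # Non-negative branch: Box-Cox applied to (x + 1)
    if abs(lam) < 1e-12:
        out[pos] = np.log1p(x[pos])
    else:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam

    # Negative branch: mirrored Box-Cox of (1 - x) with exponent (2 - lam)
    if abs(lam - 2.0) < 1e-12:
        out[~pos] = -np.log1p(-x[~pos])
    else:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    return out


# Data containing negative values and zero, which Box-Cox cannot handle
x = np.array([[-3.0], [-1.0], [0.0], [2.0], [5.0], [10.0]])

pt = PowerTransformer(method="yeo-johnson", standardize=False)
sk_result = pt.fit_transform(x)

lam = pt.lambdas_[0]
manual_result = yeo_johnson(x, lam)

print("fitted lambda:", lam)
print("manual matches sklearn:", np.allclose(manual_result, sk_result))
```

Note that with λ = 1 both branches reduce to the identity, so the transform leaves the data unchanged.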

## Why Apply a Power Transform in ML?
* Improved Model Performance: Some algorithms, such as linear regression and logistic regression, can perform poorly on highly skewed data. Transforming the data can make the relationships between features and the target variable more linear, which these models capture more easily.
* Handling Outliers: By compressing the long tail of a highly skewed distribution, the transformation pulls extreme values closer to the rest of the data, so they have less influence on the fitted model.
* Improving Homoscedasticity: Homoscedasticity means that the residuals of a regression model have constant variance. Power transforms can help bring the data closer to this condition.
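
To make the effect on skewness concrete, here is a small sketch that measures sample skewness before and after the transform. The synthetic log-normal data, the seed, and the use of `scipy.stats.skew` are assumptions chosen for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Synthetic right-skewed feature (log-normal), generated only for illustration
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

pt = PowerTransformer(method="yeo-johnson")  # standardize=True by default
x_transformed = pt.fit_transform(x)

skew_before = skew(x.ravel())
skew_after = skew(x_transformed.ravel())

print(f"skewness before: {skew_before:.3f}")
print(f"skewness after:  {skew_after:.3f}")
```

A skewness close to zero after the transform indicates a more symmetric distribution. The exact numbers depend on the sample drawn.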

## When to Avoid:
* When your data is already well-behaved (close to normal).
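* For tree-based models (like decision trees), which split on feature thresholds and are therefore largely unaffected by monotonic transformations of the features, the transformation may not significantly improve performance.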
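
The point about decision trees can be illustrated with a small sketch. It fits the same tree on a raw feature and on its Box-Cox-transformed version, then compares the training predictions. The toy data, the `max_depth` value, and the seed are arbitrary assumptions. The check only shows that the predictions agree for this example; it is not a general proof.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer
from sklearn.tree import DecisionTreeRegressor

# Toy right-skewed feature with well-separated values (illustrative only)
X = np.exp(np.linspace(0.0, 5.0, 50)).reshape(-1, 1)
y = np.sqrt(X.ravel())

# Box-Cox is monotonic, so it preserves the ordering of the feature values
X_bc = PowerTransformer(method="box-cox").fit_transform(X)

tree_raw = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree_bc = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_bc, y)

pred_raw = tree_raw.predict(X)
pred_bc = tree_bc.predict(X_bc)

print("same predictions:", np.allclose(pred_raw, pred_bc))
```

Because the tree only depends on how thresholds partition the ordered feature values, an order-preserving transform leads to the same partition of the training samples in this example.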

# Methods for dealing with imbalanced data