# What is PowerTransformer in Machine Learning?
In machine learning, `PowerTransformer` is a data preprocessing technique that transforms the features of a dataset to make them more Gaussian-like (closer to normally distributed).

# Box-Cox:
The Box-Cox transformation is appropriate only for strictly positive data; it cannot handle zero or negative values. It does not assume that the data is already normally distributed. Instead, it estimates a power parameter (lambda) by maximum likelihood so that the transformed data is as close to normal as possible. Because of the positivity requirement, Box-Cox is more restrictive than the Yeo-Johnson transformation.
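
As a minimal sketch of the formula behind Box-Cox for a fixed lambda (y = (x^lambda - 1) / lambda when lambda is not 0, and y = ln(x) when lambda is 0), the helper below can be compared against `scipy.special.boxcox`. The name `box_cox_manual` and the sample values are made up for illustration. The last lines show that `PowerTransformer` raises an error when the data contains a zero.

```python
import numpy as np
from scipy.special import boxcox
from sklearn.preprocessing import PowerTransformer

def box_cox_manual(x, lam):
    """Box-Cox transform for a fixed lambda (illustrative helper, not a library function)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

# Made-up strictly positive sample
x_pos = np.array([0.5, 1.0, 2.0, 4.0])
manual_half = box_cox_manual(x_pos, 0.5)
manual_zero = box_cox_manual(x_pos, 0.0)

# Compare the hand-written formula with SciPy's implementation
matches_scipy = bool(
    np.allclose(manual_half, boxcox(x_pos, 0.5))
    and np.allclose(manual_zero, boxcox(x_pos, 0.0))
)
print("Matches scipy.special.boxcox:", matches_scipy)

# PowerTransformer rejects data containing zero when method='box-cox'
try:
    PowerTransformer(method='box-cox').fit(np.array([[0.0], [1.0], [2.0]]))
    rejected_zero = False
except ValueError as err:
    rejected_zero = True
    print("Rejected:", err)
```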

# Yeo-Johnson:
The Yeo-Johnson transformation is a more flexible transformation that can be applied to both positive and negative data, including zero values.
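
To illustrate how Yeo-Johnson copes with zeros and negatives, here is a rough sketch of its piecewise formula, checked against `PowerTransformer` with `standardize=False`. The helper name `yeo_johnson_manual` and the shifted gamma sample are assumptions made for this example.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

def yeo_johnson_manual(x, lam):
    """Yeo-Johnson transform for a fixed lambda (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative values: Box-Cox applied to x + 1
    if lam != 0:
        out[pos] = ((x[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(x[pos])
    # Negative values: mirrored Box-Cox with power 2 - lambda
    if lam != 2:
        out[~pos] = -(((-x[~pos] + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

# Skewed sample shifted so that it contains negatives, plus an explicit zero
rng = np.random.default_rng(0)
mixed = rng.gamma(1, 2, size=(200, 1)) - 1.0
mixed[0, 0] = 0.0

# standardize=False returns the raw power transform, so it can be compared directly
pt_yj = PowerTransformer(method='yeo-johnson', standardize=False)
sk_out = pt_yj.fit_transform(mixed)
lam_hat = pt_yj.lambdas_[0]

manual_out = yeo_johnson_manual(mixed[:, 0], lam_hat)
yj_matches = bool(np.allclose(manual_out, sk_out[:, 0]))
print("Fitted lambda:", lam_hat)
print("Manual formula matches PowerTransformer:", yj_matches)
```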

In [1]:
import numpy as np
import pandas as pd

In [2]:
from sklearn.preprocessing import PowerTransformer

In [14]:
#Generate some random data with a skewed distribution
data=np.random.gamma(1,2,size=(100,1))
#Instantiate the PowerTransformer object
pt=PowerTransformer(method='yeo-johnson')
#Fit the PowerTransformer to the data and transform it
transformed_data=pt.fit_transform(data)
#Print the original and transformed data to compare
print("Original:\n",data[:5])
print("Transformed:\n",transformed_data[:5])

Original:
 [[0.71069899]
 [0.04283512]
 [2.1587481 ]
 [2.13571748]
 [1.45459053]]
Transformed:
 [[-0.61238526]
 [-1.6224089 ]
 [ 0.45720392]
 [ 0.44550901]
 [ 0.0397613 ]]
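
The first five rows alone do not show whether the transform reduced the skew. One way to check, sketched here with a seeded generator and a larger sample (both are assumptions of this example, not part of the cell above), is to compare the sample skewness before and after the transform using `scipy.stats.skew`.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Seeded generator so the comparison is reproducible
rng = np.random.default_rng(42)
skewed = rng.gamma(1, 2, size=(1000, 1))

pt_check = PowerTransformer(method='yeo-johnson')
normalized = pt_check.fit_transform(skewed)

# Skewness near 0 indicates a roughly symmetric distribution
skew_before = skew(skewed[:, 0])
skew_after = skew(normalized[:, 0])
print("Skewness before:", skew_before)
print("Skewness after: ", skew_after)

# With the default standardize=True the output has zero mean and unit variance
print("Mean:", normalized.mean(), "Std:", normalized.std())
```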


In [17]:
#Generate some random data with a skewed distribution
data=np.random.gamma(1,2,size=(100,1))
#Instantiate the PowerTransformer object
pt=PowerTransformer(method='box-cox')
#Fit the PowerTransformer to the data and transform it
transformed_data=pt.fit_transform(data)
#Print the original and transformed data to compare
print("Original:\n",data[:5])
print("Transformed:\n",transformed_data[:5])

Original:
 [[2.02197692]
 [3.23350098]
 [0.91149435]
 [1.39512333]
 [0.14349466]]
Transformed:
 [[ 0.31532854]
 [ 0.77440276]
 [-0.35710089]
 [-0.01348193]
 [-1.50933673]]
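
After fitting, the estimated power parameter for each feature is stored in the `lambdas_` attribute, and `inverse_transform` maps transformed values back to the original scale. The sketch below uses a seeded sample of its own (an assumption of this example) to show both.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Seeded, strictly positive sample
rng = np.random.default_rng(0)
positive = rng.gamma(1, 2, size=(100, 1))

pt_bc = PowerTransformer(method='box-cox')
encoded = pt_bc.fit_transform(positive)

# One fitted lambda per feature column
fitted_lambda = pt_bc.lambdas_[0]
print("Fitted lambda:", fitted_lambda)

# Undo the standardization and the power transform
recovered = pt_bc.inverse_transform(encoded)
round_trip_ok = bool(np.allclose(recovered, positive))
print("Round trip recovers the input:", round_trip_ok)
```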


In [5]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer


#Create a synthetic dataset
x=np.random.normal(loc=100,scale=10,size=(1000,5))
y=np.random.normal(loc=100,scale=10,size=1000)

#Split the data into training and testing sets
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

In [6]:
#Fit and transform the data using Box-Cox method
boxcox_transformer =PowerTransformer(method='box-cox',standardize=True)
x_train_bc=boxcox_transformer.fit_transform(x_train)
x_test_bc=boxcox_transformer.transform(x_test)

#Fit and transform the data using Yeo-Johnson method
yeojohnson_transformer =PowerTransformer(method='yeo-johnson',standardize=True)
x_train_yj=yeojohnson_transformer.fit_transform(x_train)
x_test_yj=yeojohnson_transformer.transform(x_test)
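
The fit-on-train, transform-on-test pattern above can also be delegated to a scikit-learn `Pipeline`, which fits the transformer only on the data passed to `fit`. This is a sketch under assumptions: the synthetic features, the target, and the choice of `LinearRegression` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

# Synthetic skewed features and a target that depends on them (illustrative only)
rng = np.random.default_rng(42)
features = rng.gamma(2, 2, size=(500, 3))
target = np.log1p(features).sum(axis=1) + rng.normal(scale=0.1, size=500)

f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)

# The pipeline fits PowerTransformer on the training data only,
# then reuses the fitted lambdas when predicting on the test data
model = make_pipeline(PowerTransformer(method='yeo-johnson'), LinearRegression())
model.fit(f_train, t_train)

predictions = model.predict(f_test)
r2 = model.score(f_test, t_test)
print("Test R^2:", r2)
```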

# Tips

In [7]:
df=pd.read_csv("C:\\Users\\user\\Datasets\\tips.csv")
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [23]:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer

lb=LabelEncoder()
df['sex']=lb.fit_transform(df['sex'])
df['smoker']=lb.fit_transform(df['smoker'])
df['day']=lb.fit_transform(df['day'])
df['time']=lb.fit_transform(df['time'])

x=df.drop(columns=['total_bill'])
y=df['total_bill']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
x_train

Unnamed: 0,tip,sex,smoker,day,time,size
228,2.72,1,0,1,0,2
208,2.03,1,1,1,0,2
96,4.00,1,1,0,0,2
167,4.50,1,0,2,0,4
84,2.03,1,0,3,1,2
...,...,...,...,...,...,...
106,4.06,1,1,1,0,2
14,3.02,0,0,2,0,2
92,1.00,0,1,0,0,2
179,3.55,1,1,2,0,2


In [34]:
yj=PowerTransformer(method='yeo-johnson',standardize=True)
x_train_yj=yj.fit_transform(x_train)
#Reuse the lambdas fitted on the training set; do not refit on the test set
x_test_yj=yj.transform(x_test)
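
The cell above applies the power transform to every column, including the label-encoded categorical ones. A possible alternative, sketched here, is to use `ColumnTransformer` so that only the numeric columns are power-transformed while the categorical columns are one-hot encoded. The small DataFrame `toy` is a hypothetical stand-in shaped like the tips table, and the choice of `OneHotEncoder` is an assumption rather than the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

# Hypothetical rows shaped like the tips data (values are illustrative)
toy = pd.DataFrame({
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    'size': [2, 3, 3, 2, 4, 4],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female', 'Male'],
    'day': ['Sun', 'Sun', 'Sat', 'Sat', 'Fri', 'Fri'],
})

# Power-transform the numeric columns, one-hot encode the categorical ones
preprocess = ColumnTransformer(
    [
        ('power', PowerTransformer(method='yeo-johnson'), ['tip', 'size']),
        ('onehot', OneHotEncoder(handle_unknown='ignore'), ['sex', 'day']),
    ],
    sparse_threshold=0,  # always return a dense array
)

prepared = preprocess.fit_transform(toy)
print(prepared.shape)
```

In a real workflow, `preprocess` would be fitted on the training split and then applied to the test split with `transform`, as in the cell above.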

# Covid_toy



In [35]:
df=pd.read_csv("C:\\Users\\user\\Datasets\\covid_toy.csv")
df.head()

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,Male,103.0,Mild,Kolkata,No
1,27,Male,100.0,Mild,Delhi,Yes
2,42,Male,101.0,Mild,Delhi,No
3,31,Female,98.0,Mild,Kolkata,No
4,65,Female,101.0,Mild,Mumbai,No


In [36]:
lb=LabelEncoder()
df['gender']=lb.fit_transform(df['gender'])
df['cough']=lb.fit_transform(df['cough'])
df['city']=lb.fit_transform(df['city'])
df['has_covid']=lb.fit_transform(df['has_covid'])

In [37]:
df.head()

Unnamed: 0,age,gender,fever,cough,city,has_covid
0,60,1,103.0,0,2,0
1,27,1,100.0,0,1,1
2,42,1,101.0,0,1,0
3,31,0,98.0,0,2,0
4,65,0,101.0,0,3,0


In [38]:
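#Split the covid data first (assuming has_covid is the target); the earlier x_train/x_test came from the tips data
x,y=df.drop(columns=['has_covid']),df['has_covid']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
yj=PowerTransformer(method='yeo-johnson',standardize=True)
x_train_yj=yj.fit_transform(x_train)
x_test_yj=yj.transform(x_test)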