<H1>Scale and Transform</H1>

<p>Extracted from https://pycaret.gitbook.io/docs/get-started/preprocessing/scale-and-transform</p>

<H2>Normalize</H2>
<p>Normalization is a technique often applied as part of data preparation for machine learning. Its goal is to rescale the values of numeric columns in the dataset without distorting differences in the ranges of values or losing information. Several normalization methods are available; by default, PyCaret uses zscore.</p>
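<p>To make the zscore idea concrete, the following is a minimal standard-library sketch, not PyCaret's own implementation. The helper name <code>zscore</code> is hypothetical, and it assumes the population standard deviation (the convention scikit-learn's StandardScaler uses). The sample values are the HP column of the first five pokemon rows shown in this section.</p>

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to zero mean and unit variance (hypothetical helper)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# HP values from the first five rows of the pokemon dataset
hp = [45, 60, 80, 80, 39]
hp_scaled = zscore(hp)
print([round(v, 3) for v in hp_scaled])
```

<p>After scaling, the column has mean 0 and standard deviation 1, while the ordering and relative spacing of the values are unchanged.</p>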



In [1]:
# load dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# init setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', normalize = True)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


Unnamed: 0,Description,Value
0,Session id,7308
1,Target,Legendary
2,Target type,Binary
3,Original data shape,"(800, 13)"
4,Transformed data shape,"(800, 47)"
5,Transformed train set shape,"(560, 47)"
6,Transformed test set shape,"(240, 47)"
7,Numeric features,9
8,Categorical features,3
9,Rows with missing values,48.2%


In [3]:
clf1.get_config('dataset_transformed')

Unnamed: 0,#,Name,Type 1_Rock,Type 1_Psychic,Type 1_Grass,Type 1_Ice,Type 1_Bug,Type 1_Ghost,Type 1_Fire,Type 1_Normal,...,Type 2_Normal,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
585,0.781253,-0.295599,4.443055,-0.295599,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,0.673147,0.561387,1.750035,1.904545,-0.395311,0.292034,-1.470555,1.000544,False
635,1.009614,-0.295599,-0.225070,3.382964,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-1.194647,-0.938144,-1.502037,-0.746244,-0.551891,-0.251045,-0.791483,1.000544,False
166,-0.998060,-0.295599,-0.225070,-0.295599,3.125577,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-0.962211,-0.938144,-0.913567,-0.249222,-0.739788,-0.251045,-0.791483,-0.787662,False
395,-0.003738,-0.295599,-0.225070,-0.295599,-0.319941,5.487359,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-1.111634,-0.750702,-0.882595,-0.746244,-0.708472,-0.794124,-0.621715,-0.191593,False
110,-1.235936,-0.295599,-0.225070,-0.295599,3.125577,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-0.904101,-0.375820,-1.192316,0.247801,-0.395311,-0.975151,-0.961251,-1.383731,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97,-1.293026,0.000000,-0.225070,-0.295599,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-1.070128,-1.500468,-0.418013,0.910499,-0.865053,-1.699256,-0.961251,-1.383731,False
593,0.814555,0.000000,-0.225070,-0.295599,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,-0.239997,0.561387,0.820872,0.413476,-1.021634,-0.794124,-0.961251,1.000544,False
445,0.181805,0.000000,-0.225070,-0.295599,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,2.735845,...,-0.042295,-0.198490,0.336457,0.201429,-0.414896,-0.551891,-0.432071,0.091311,0.404475,False
504,0.438711,0.000000,-0.225070,-0.295599,-0.319941,-0.182237,-0.299156,-0.211604,-0.27735,-0.365518,...,-0.042295,0.465614,0.486410,0.851844,-0.249222,0.418910,-0.251045,0.566661,0.404475,False


<H2>Feature Transform</H2>
<p>While normalization rescales the data within new limits to reduce the impact of magnitude in the variance, feature transformation is a more radical technique. Transformation changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. Two transformation methods are available: yeo-johnson and quantile.</p>
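<p>As an illustration of how a power transform reshapes a distribution, here is a minimal standard-library sketch of the Yeo-Johnson formula. It is not PyCaret's implementation: the function names, the sample data, and the fixed value of lambda are all assumptions made for the example. In practice lambda is typically estimated from the data (for instance by maximum likelihood) rather than chosen by hand.</p>

```python
import math


def yeo_johnson(x, lmbda):
    """Apply the Yeo-Johnson transform to one value for a given lambda."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -(((-x + 1.0) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


# Made-up, right-skewed sample (illustrative only, not from the dataset)
sample = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
lmbda = 0.0  # assumption: fixed by hand; lambda = 0 reduces to log(x + 1)
transformed = [yeo_johnson(v, lmbda) for v in sample]

skew_before = skewness(sample)
skew_after = skewness(transformed)
print(skew_before > skew_after)
```

<p>The transformed sample is less skewed than the original, which is the sense in which the transformation moves the data toward an approximately normal shape. With lambda equal to 1 the transform leaves values unchanged.</p>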


In [4]:
# load dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# init setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', transformation = True)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


Unnamed: 0,Description,Value
0,Session id,6119
1,Target,Legendary
2,Target type,Binary
3,Original data shape,"(800, 13)"
4,Transformed data shape,"(800, 47)"
5,Transformed train set shape,"(560, 47)"
6,Transformed test set shape,"(240, 47)"
7,Numeric features,9
8,Categorical features,3
9,Rows with missing values,48.2%


In [5]:
clf1.get_config('dataset_transformed')

Unnamed: 0,#,Name,Type 1_Dragon,Type 1_Fire,Type 1_Grass,Type 1_Water,Type 1_Psychic,Type 1_Normal,Type 1_Dark,Type 1_Ground,...,Type 2_Bug,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
419,97.569907,0.009222,0.029706,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,115.239906,12.951246,22.166751,10.386552,11.735519,11.159619,23.620991,2.248061,True
43,17.884885,0.009216,-0.000000,0.037133,-0.000000,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,102.271579,12.372263,19.946227,10.095788,9.668201,10.728862,22.300318,0.875349,False
740,147.125520,0.009216,-0.000000,-0.000000,0.065618,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,79.293105,11.760969,18.080853,8.258802,8.634145,8.448823,14.942530,3.907452,False
762,150.276231,0.009216,-0.000000,-0.000000,-0.000000,0.094462,-0.000000,-0.0,-0.0,-0.0,...,-0.0,76.113787,10.209891,15.891132,9.276189,8.390061,8.824011,13.464861,3.907452,False
636,131.446619,0.009216,-0.000000,-0.000000,-0.000000,-0.000000,0.048273,-0.0,-0.0,-0.0,...,-0.0,85.485766,11.207163,14.316929,9.792268,9.361113,10.025106,15.471320,3.388711,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
77,28.525795,0.009220,-0.000000,-0.000000,0.065618,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,100.156286,12.951246,24.393235,9.474469,10.549736,9.232857,17.948929,0.875349,False
87,31.161728,0.009220,-0.000000,-0.000000,-0.000000,0.094462,-0.000000,-0.0,-0.0,-0.0,...,-0.0,113.906972,14.101958,19.781441,14.667612,11.735519,9.772189,10.568855,0.875349,False
6,4.233238,0.009220,-0.000000,0.037133,-0.000000,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,106.305979,12.788850,21.233856,10.271691,10.928106,10.025106,22.300318,0.875349,False
457,103.262938,0.009220,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,-0.0,-0.0,...,-0.0,58.097526,9.092983,10.774628,8.016729,6.150548,7.611425,11.867821,2.838550,False


<H2>Target Transform</H2>
<p>Target transformation is similar to feature transformation, except that it changes the shape of the distribution of the target variable instead of the features. This functionality is only available in the pycaret.regression module.</p>
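<p>One practical consequence of transforming the target is that a model trained on the transformed values predicts on the transformed scale, so predictions have to be mapped back with the inverse transform. The sketch below illustrates that round trip with the standard library only. It is not PyCaret's implementation: the function names, the fixed lambda, and the stand-in "model" (the mean of the transformed target) are assumptions, and the formulas cover non-negative targets only. The prices are the first five rows of the diamond dataset shown in this section.</p>

```python
import math

LMBDA = 0.25  # hypothetical value; normally estimated from the training target


def forward(y, lmbda=LMBDA):
    """Yeo-Johnson transform, restricted here to non-negative targets."""
    if y < 0:
        raise ValueError("this sketch only handles non-negative targets")
    if abs(lmbda) < 1e-12:
        return math.log1p(y)
    return ((y + 1.0) ** lmbda - 1.0) / lmbda


def inverse(t, lmbda=LMBDA):
    """Map a transformed (non-negative) value back to the original scale."""
    if abs(lmbda) < 1e-12:
        return math.expm1(t)
    return (lmbda * t + 1.0) ** (1.0 / lmbda) - 1.0


# Price values from the first five rows of the diamond dataset
prices = [5169, 3470, 3183, 4370, 3171]
transformed = [forward(p) for p in prices]

# Stand-in for a trained model: predict the mean of the transformed target.
pred_transformed = sum(transformed) / len(transformed)

# The prediction must be inverted before it can be read as a price.
pred_price = inverse(pred_transformed)
recovered = [inverse(t) for t in transformed]
print(round(pred_price, 2))
```

<p>Because the forward transform is monotonic, inverting it recovers the original prices, and the back-transformed prediction falls within the range of the observed prices.</p>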

In [6]:
# load dataset
from pycaret.datasets import get_data
diamond = get_data('diamond')

# init setup
from pycaret.regression import *
reg1 = setup(data = diamond, target = 'Price', transform_target = True)

Unnamed: 0,Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
0,1.1,Ideal,H,SI1,VG,EX,GIA,5169
1,0.83,Ideal,H,VS1,ID,ID,AGSL,3470
2,0.85,Ideal,H,SI1,EX,EX,GIA,3183
3,0.91,Ideal,E,SI1,VG,VG,GIA,4370
4,0.83,Ideal,G,SI1,EX,EX,GIA,3171


Unnamed: 0,Description,Value
0,Session id,7423
1,Target,Price
2,Target type,Regression
3,Original data shape,"(6000, 8)"
4,Transformed data shape,"(6000, 29)"
5,Transformed train set shape,"(4200, 29)"
6,Transformed test set shape,"(1800, 29)"
7,Ordinal features,1
8,Numeric features,1
9,Categorical features,6


In [8]:
reg1.get_config('dataset_transformed')

Unnamed: 0,Carat Weight,Cut_Ideal,Cut_Very Good,Cut_Good,Cut_Signature-Ideal,Cut_Fair,Color_F,Color_G,Color_I,Color_H,...,Polish_EX,Polish_VG,Polish_ID,Polish_G,Symmetry_EX,Symmetry_VG,Symmetry_ID,Symmetry_G,Report,Price
983,0.89,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.652143
4287,1.07,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.680242
3861,2.01,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.714139
5292,1.50,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.700994
5303,1.01,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.665982
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3661,1.26,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.676303
5418,1.04,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.674544
1159,1.73,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.718315
991,1.51,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.695741
