<H1>Scale and Transform</H1>

<p>Extracted from https://pycaret.gitbook.io/docs/get-started/preprocessing/scale-and-transform </p>

<H2>Normalize</H2>
<p>Normalization is a technique often applied as part of data preparation for machine learning. Its goal is to rescale the values of numeric columns to a common scale without distorting differences in the ranges of values or losing information. Several normalization methods are available (zscore, minmax, maxabs and robust, selected with the normalize_method parameter of setup); by default, PyCaret uses zscore. </p>
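<p>To make the zscore idea concrete, here is a minimal sketch in plain Python, independent of PyCaret. It assumes the population standard deviation (as scikit-learn's StandardScaler does) and, for brevity, uses only the HP values of the first five rows from the data preview; the helper name zscore is made up for this illustration and is not part of the PyCaret API.</p>

```python
from statistics import fmean, pstdev


def zscore(values):
    """Rescale values to mean 0 and standard deviation 1 (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# HP values of the first five rows shown in the data preview (illustration only)
hp = [45, 60, 80, 80, 39]
hp_scaled = zscore(hp)
print([round(v, 3) for v in hp_scaled])
```

<p>After scaling, the values have a mean of 0 and a standard deviation of 1, which is why the numeric columns in the transformed dataset are centered around zero.</p>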



In [1]:
# load dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# init setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', normalize = True)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


Unnamed: 0,Description,Value
0,Session id,2406
1,Target,Legendary
2,Target type,Binary
3,Original data shape,"(800, 13)"
4,Transformed data shape,"(800, 47)"
5,Transformed train set shape,"(560, 47)"
6,Transformed test set shape,"(240, 47)"
7,Numeric features,9
8,Categorical features,3
9,Rows with missing values,48.2%


In [2]:
# inspect the dataset after normalization has been applied
clf1.get_config('dataset_transformed')

Unnamed: 0,#,Name,Type 1_Rock,Type 1_Bug,Type 1_Ghost,Type 1_Normal,Type 1_Electric,Type 1_Psychic,Type 1_Grass,Type 1_Water,...,Type 2_Bug,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
766,1.594658,-0.299156,3.815174,-0.306186,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.589362,-0.447697,0.328119,0.111341,-0.833488,-0.942417,-0.664741,1.618118,False
600,0.830616,-0.299156,-0.262111,3.265986,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-1.020270,-0.941671,-0.778234,-0.109538,-0.987006,-0.418367,-0.870086,0.997045,False
669,1.163660,-0.299156,-0.262111,-0.306186,4.531938,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.523068,-0.371701,-1.177751,-0.425080,0.701683,-0.418367,-0.425173,0.997045,False
342,-0.281163,-0.299156,-0.262111,3.265986,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.274467,-0.181712,-0.163593,-0.582851,-0.772082,0.105683,0.601549,-0.245102,False
474,0.277175,-0.299156,-0.262111,-0.306186,-0.220656,2.759599,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.688802,-0.561691,-0.378718,-0.929947,-0.864192,-0.558113,0.601549,0.375971,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
748,1.511397,0.000000,-0.262111,-0.306186,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.895969,-0.941671,0.051531,0.837087,-1.140523,-1.221910,-1.349223,1.618118,False
105,-1.339068,0.000000,-0.262111,-0.306186,-0.220656,-0.362372,-0.220656,3.654993,-0.323321,-0.399275,...,-0.073389,0.413328,0.578247,-0.163593,-0.109538,0.026208,1.503150,-0.014484,-1.487249,False
485,0.326152,0.000000,-0.262111,-0.306186,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,0.554202,-0.105716,0.328119,1.341954,0.210428,1.538087,-1.178102,0.375971,False
328,-0.330140,0.000000,-0.262111,-0.306186,-0.220656,-0.362372,-0.220656,-0.273598,-0.323321,-0.399275,...,-0.073389,-0.440201,-0.751681,0.205191,0.363775,-0.526454,-0.593050,-0.596293,-0.245102,False


<H2>Feature Transform</H2>
<p>While normalization rescales the data to a new range to reduce the impact of differing magnitudes on the variance, feature transformation is a more radical technique. It changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. Two transformation methods are available: yeo-johnson and quantile. </p>
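<p>As a rough illustration of how a power transform reshapes values, the sketch below implements the Yeo-Johnson formula in plain Python for a fixed lambda. In practice lambda is estimated from the training data; the value 0.5 used here is a hypothetical choice for illustration only, and the function name yeo_johnson is not part of the PyCaret API.</p>

```python
import math


def yeo_johnson(x, lam):
    """Yeo-Johnson power transform of a single value for a fixed lambda."""
    if x >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(x)                  # limit case lambda = 0
        return ((x + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-x)                    # limit case lambda = 2
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Attack values of the first five rows shown in the data preview (illustration only)
attack = [49, 62, 82, 100, 52]
lam = 0.5  # hypothetical lambda, not the value PyCaret would estimate
attack_transformed = [yeo_johnson(v, lam) for v in attack]
print([round(v, 3) for v in attack_transformed])
```

<p>With lambda below 1, large values are compressed more than small ones, which pulls in a long right tail. The transform is monotonic, so the ordering of the values is preserved.</p>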


In [3]:
# load dataset
from pycaret.datasets import get_data
pokemon = get_data('pokemon')

# init setup
from pycaret.classification import *
clf1 = setup(data = pokemon, target = 'Legendary', transformation = True)

Unnamed: 0,#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
0,1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
1,2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
2,3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,4,Charmander,Fire,,309,39,52,43,60,50,65,1,False


Unnamed: 0,Description,Value
0,Session id,8421
1,Target,Legendary
2,Target type,Binary
3,Original data shape,"(800, 13)"
4,Transformed data shape,"(800, 47)"
5,Transformed train set shape,"(560, 47)"
6,Transformed test set shape,"(240, 47)"
7,Numeric features,9
8,Categorical features,3
9,Rows with missing values,48.2%


In [4]:
# inspect the dataset after the feature transformation has been applied
clf1.get_config('dataset_transformed')

Unnamed: 0,#,Name,Type 1_Poison,Type 1_Ghost,Type 1_Normal,Type 1_Electric,Type 1_Fire,Type 1_Water,Type 1_Grass,Type 1_Bug,...,Type 2_Ghost,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
483,111.677763,0.009408,0.02228,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,...,-0.0,99.195122,12.434366,23.553487,11.526854,9.008013,7.608441,23.260506,3.026236,False
624,134.874237,0.009408,-0.00000,0.023517,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,...,-0.0,99.769697,9.560155,15.838387,16.749148,10.160094,9.377590,11.774133,3.643345,False
445,105.022634,0.009408,-0.00000,-0.000000,0.085591,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,...,-0.0,89.031301,11.028620,22.248898,10.910403,8.083261,7.558720,20.850232,3.026236,False
535,119.839521,0.009408,-0.00000,-0.000000,-0.000000,0.039608,-0.000000,-0.000000,-0.000000,-0.0,...,-0.0,105.015929,8.914119,18.752323,14.488240,10.581624,9.444015,23.618256,3.026236,False
7,4.274739,0.009408,-0.00000,-0.000000,-0.000000,-0.000000,0.042084,-0.000000,-0.000000,-0.0,...,-0.0,120.493672,10.964607,29.081396,14.746152,11.527596,8.657072,26.039076,0.897701,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
357,90.394058,0.009413,-0.00000,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,-0.0,...,-0.0,97.896795,11.092133,14.792208,11.354661,9.937943,9.541992,22.535093,2.372845,False
78,29.564667,0.009413,-0.00000,-0.000000,-0.000000,-0.000000,-0.000000,0.098302,-0.000000,-0.0,...,-0.0,77.351443,8.009634,13.699060,8.284273,7.757314,9.207437,20.658130,0.897701,False
75,28.637399,0.009413,-0.00000,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,0.060657,-0.0,...,-0.0,71.623992,8.914119,20.547122,8.284273,8.954610,5.672057,14.282701,0.897701,False
309,81.899536,0.009413,-0.00000,-0.000000,-0.000000,-0.000000,-0.000000,-0.000000,0.060657,-0.0,...,-0.0,70.789117,9.712942,13.699060,10.910403,7.032878,7.558720,13.062194,2.372845,False


<H2>Target Transform</H2>
<p>Target transformation works like feature transformation, except that it changes the shape of the distribution of the target variable rather than of the features. It is only available in the pycaret.regression module.</p>
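<p>A model trained on a transformed target predicts on the transformed scale, so predictions have to be mapped back with the inverse transform before they can be read as prices. The sketch below shows that round trip in plain Python with a Yeo-Johnson transform for non-negative values. The lambda of 0.25 and the function names forward and inverse are assumptions made for this illustration; they do not reflect the method or parameters PyCaret selects internally.</p>

```python
import math


def forward(y, lam):
    """Yeo-Johnson transform of a non-negative target value."""
    if abs(lam) < 1e-12:
        return math.log1p(y)
    return ((y + 1.0) ** lam - 1.0) / lam


def inverse(t, lam):
    """Map a transformed value back to the original target scale."""
    if abs(lam) < 1e-12:
        return math.expm1(t)
    return (lam * t + 1.0) ** (1.0 / lam) - 1.0


# Price values of the first five rows shown in the data preview (illustration only)
prices = [5169, 3470, 3183, 4370, 3171]
lam = 0.25  # hypothetical lambda, chosen only for illustration
transformed = [forward(p, lam) for p in prices]
recovered = [inverse(t, lam) for t in transformed]
print([round(t, 3) for t in transformed])
print([round(r, 2) for r in recovered])
```

<p>The recovered values match the original prices up to floating-point error, which is the property that lets a model be trained on the transformed target and still report results in the original units.</p>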

In [5]:
# load dataset
from pycaret.datasets import get_data
diamond = get_data('diamond')

# init setup
from pycaret.regression import *
reg1 = setup(data = diamond, target = 'Price', transform_target = True)

Unnamed: 0,Carat Weight,Cut,Color,Clarity,Polish,Symmetry,Report,Price
0,1.1,Ideal,H,SI1,VG,EX,GIA,5169
1,0.83,Ideal,H,VS1,ID,ID,AGSL,3470
2,0.85,Ideal,H,SI1,EX,EX,GIA,3183
3,0.91,Ideal,E,SI1,VG,VG,GIA,4370
4,0.83,Ideal,G,SI1,EX,EX,GIA,3171


Unnamed: 0,Description,Value
0,Session id,3054
1,Target,Price
2,Target type,Regression
3,Original data shape,"(6000, 8)"
4,Transformed data shape,"(6000, 29)"
5,Transformed train set shape,"(4200, 29)"
6,Transformed test set shape,"(1800, 29)"
7,Numeric features,1
8,Categorical features,6
9,Preprocess,True


In [6]:
# inspect the dataset after the target transformation has been applied
reg1.get_config('dataset_transformed')

Unnamed: 0,Carat Weight,Cut_Very Good,Cut_Ideal,Cut_Good,Cut_Signature-Ideal,Cut_Fair,Color_E,Color_H,Color_G,Color_D,...,Polish_EX,Polish_VG,Polish_ID,Polish_G,Symmetry_EX,Symmetry_VG,Symmetry_G,Symmetry_ID,Report,Price
989,0.90,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.589869
299,0.91,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.582185
587,1.05,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.611007
4316,0.94,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,2.594898
40,1.01,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,2.589728
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2853,2.07,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.633565
1451,2.10,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.645747
1197,1.20,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.603601
2294,1.78,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,2.650795
