#  <span style="color:orange">Regression Tutorial (REG102) - Level Intermediate</span>

**Date Updated: May 02, 2020**

# 1.0 Tutorial Objective
Welcome to the regression tutorial (#REG102). This tutorial assumes that you have completed __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. If you haven't used PyCaret before and this is your first tutorial, we strongly recommend that you go back and work through the beginner tutorial first to learn the basics of working in PyCaret.

In this tutorial we will use the `pycaret.regression` module to learn:

* **Normalization:**  How to normalize and scale the dataset
* **Transformation:**  How to apply transformations that make the data linear and approximately normal
* **Target Transformation:**  How to apply transformations to the target variable
* **Combine Rare Levels:**  How to combine rare levels in categorical features
* **Bin Numeric Variables:**  How to bin numeric variables and transform numeric features into categorical ones using Sturges' rule
* **Model Ensembling and Stacking:**  How to boost model performance using several ensembling techniques such as Bagging, Boosting, Voting and Generalized Stacking.
* **Tuning Hyperparameters of Ensemblers:**  How to tune hyperparameters of ensemblers
* **Save / Load Experiment:**  How to save the entire experiment

Read Time: Approx. 60 Minutes


## 1.1 Installing PyCaret
If you haven't installed PyCaret yet, please follow the link to the __[Beginner's Tutorial](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ for installation instructions.

## 1.2 Pre-Requisites
- Python 3.x
- Latest version of PyCaret
- Internet connection to load data from PyCaret's repository
- Completion of Regression Tutorial (REG101) - Level Beginner

## 1.3 For Google Colab Users:
If you are running this notebook on Google Colab, run the following code at the top of your notebook to display interactive visuals.<br/>
<br/>
`from pycaret.utils import enable_colab` <br/>
`enable_colab()`

## 1.4 See Also:
- __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__
- __[Regression Tutorial (REG103) - Level Expert](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Expert%20-%20REG103.ipynb)__

# 2.0 Brief Overview of Techniques Covered in This Tutorial
Before we dive into the practical execution of the techniques listed in Section 1, it is important to understand what these techniques are and when to use them. Most of these techniques primarily benefit linear and other parametric algorithms, although performance gains in tree-based models are not unusual either. The explanations below are brief, and we recommend additional reading for a deeper and more thorough understanding of each technique.

- **Normalization:** Normalization / scaling (often used interchangeably with standardization) transforms the actual values of numeric variables in a way that provides helpful properties for machine learning. Many algorithms such as Linear Regression, Support Vector Machines and K Nearest Neighbors assume that all features are centered around zero and have variances of the same order of magnitude. If one feature has a variance that is orders of magnitude larger than that of the other features, it can dominate the model, which may then fail to learn from the remaining features correctly and perform poorly. __[Read more](https://sebastianraschka.com/Articles/2014_about_feature_scaling.html#z-score-standardization-or-min-max-scaling)__ <br/>
<br/>
- **Transformation:** While normalization rescales the data to remove the impact of differences in magnitude, transformation is a more radical technique: it changes the shape of the distribution so that the transformed data can be represented by a normal or approximately normal distribution. In general, you should transform the data when using algorithms that assume normality or a Gaussian distribution, such as Linear Regression, Lasso Regression and Ridge Regression. __[Read more](https://en.wikipedia.org/wiki/Power_transform)__<br/>
<br/>
- **Target Transformation:** This is similar to the `transformation` technique explained above, except that it is applied only to the target variable. __[Read more](https://scikit-learn.org/stable/auto_examples/compose/plot_transformed_target.html)__ to understand the effects of transforming the target variable in regression.<br/>
<br/>
- **Combine Rare Levels:** Sometimes categorical features have levels that occur very rarely. Such levels may introduce noise into the dataset because there are too few samples to learn from. One way to deal with rare levels in categorical features is to combine them into a new class. <br/>
<br/>
- **Bin Numeric Variables:** Binning or discretization is the process of transforming numerical variables into categorical features. An example in this experiment is `Carat Weight`, a continuous numeric variable that can be discretized into intervals. Binning may improve the accuracy of a predictive model by reducing noise or non-linearity in the data. PyCaret automatically determines the number and size of the bins using Sturges' rule. __[Read more](https://www.vosesoftware.com/riskwiki/Sturgesrule.php)__<br/>
<br/>
- **Model Ensembling and Stacking:** Ensemble modeling is a process where multiple diverse models are created to predict the outcome. This is achieved either by using many different modeling algorithms or using different samples of training datasets. The ensemble model then aggregates the predictions of each base model resulting in one final prediction for the unseen data. The motivation for using ensemble models is to reduce the generalization error of the prediction. As long as the base models are diverse and independent, the prediction error of the model decreases when the ensemble approach is used. The two most common methods in ensemble learning are `Bagging` and `Boosting`. Stacking is also a type of ensemble learning where predictions from multiple models are used as input features for a meta model that predicts the final outcome. __[Read more](https://blog.statsbot.co/ensemble-learning-d1dcd548e936)__<br/>
<br/>
- **Tuning Hyperparameters of Ensemblers:** Similar to hyperparameter tuning for a single machine learning model, we will also learn how to tune hyperparameters for an ensemble model.
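
The first two techniques can be illustrated with a minimal, dependency-free sketch. It shows z-score standardization and the Box-Cox power transform together with its inverse. The inverse is what lets a transformed target be mapped back to the original `Price` scale. This is illustrative only. The minimum and maximum values come from the data profile discussed in section 4, while the values in between are hypothetical. The Box-Cox λ is also fixed by hand here. In practice λ is estimated from the data; for example, scikit-learn's `PowerTransformer(method='box-cox')` fits it by maximum likelihood. We have not confirmed exactly how PyCaret chooses λ.

```python
import math
import statistics


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def box_cox(values, lam):
    """Box-Cox power transform for strictly positive values with a fixed lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inv_box_cox(values, lam):
    """Invert box_cox(), e.g. to map transformed target predictions back to prices."""
    if lam == 0:
        return [math.exp(v) for v in values]
    return [(lam * v + 1) ** (1 / lam) for v in values]


# Minimum and maximum come from this tutorial's data profile; the middle values are hypothetical.
carat = [0.75, 1.10, 1.50, 2.91]
prices = [2184.0, 5000.0, 12000.0, 101561.0]

print([round(z, 3) for z in z_score(carat)])        # centred, unit-variance feature
print([round(t, 3) for t in box_cox(prices, 0.0)])   # lambda = 0 is a log transform
print(inv_box_cox(box_cox(prices, 0.5), 0.5))        # round trip back to (approximately) the prices
```

Standardization only shifts and rescales the values, so the shape of the distribution is unchanged. The power transform, in contrast, compresses the long right tail of `Price`.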

# 3.0 Dataset for the Tutorial

For this tutorial we will be using the same dataset that was used in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__.

#### Dataset Acknowledgements:
This case was prepared by Greg Mills (MBA ’07) under the supervision of Phillip E. Pfeifer, Alumni Research Professor of Business Administration. Copyright (c) 2007 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved.

The original dataset and description can be __[found here.](https://github.com/DardenDSC/sarah-gets-a-diamond)__ 

# 4.0 Getting the Data

You can download the data from the original source __[found here](https://github.com/DardenDSC/sarah-gets-a-diamond)__ and load it using the pandas `read_csv()` function, or you can use PyCaret's data repository to load the data using the `get_data()` function (this requires an internet connection).

```python
from pycaret.datasets import get_data
dataset = get_data('diamond', profile=True)
```



Notice that when the `profile` parameter is set to `True`, `get_data()` displays a data profile for exploratory data analysis. Several of the pre-processing steps discussed in section 2 above will be performed in this experiment based on this analysis. Let's summarize how the profile has helped us make critical pre-processing choices with the data.

- **Missing Values:** There are no missing values in the data. However, we still need imputers in our pipeline in case new unseen data has missing values (not applicable in this case). When you execute the `setup()` function, imputers are created and stored in the pipeline automatically. By default, it uses a mean imputer for numeric features and a constant imputer for categorical features. This can be changed using the `numeric_imputation` and `categorical_imputation` parameters in `setup()`. <br/>
<br/>
- **Combine Rare Levels:** Notice the distribution of the `Clarity` feature in the dataset. It has 7 distinct classes, of which `FL` appears only 4 times. Similarly, in the `Cut` feature the `Fair` level appears only `2.1%` of the time in the training dataset. We will use the `combine_rare_levels` parameter (together with `rare_level_threshold`) in `setup()` to combine the rare levels. <br/>
<br/>
- **Data Scale / Range:** Notice how significantly the scale / range of `Carat Weight` differs from that of the `Price` variable. `Carat Weight` ranges from 0.75 to 2.91, while `Price` ranges from 2,184 all the way up to 101,561. We will deal with this problem by using the `normalize` parameter in `setup()`. <br/>
<br/>
- **Target Transformation:** The target variable `Price` is not normally distributed; it is right-skewed with high kurtosis. We will use the `transform_target` parameter in `setup()` to apply a non-linear power transformation to the target variable (Box-Cox by default, as reported in the setup grid below). <br/>
<br/>
- **Bin Numeric Features:** `Carat Weight` is the only numeric feature. When looking at its histogram, the distribution seems to have natural breaks. Binning will convert it into a categorical feature and create several levels using Sturges' rule. This will help remove the noise for linear algorithms. <br/>
<br/>
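
The rare-level rule described above can be sketched as follows. This is a minimal illustration that assumes levels whose relative frequency is strictly below `rare_level_threshold` (0.05 in the setup used in this tutorial) are merged into one new level. The function name and the `"Other"` label are hypothetical, and PyCaret's own label for the combined level may differ. The sample is made-up data, not the actual `Clarity` column.

```python
from collections import Counter


def combine_rare_levels(values, threshold=0.05, other_label="Other"):
    """Replace every level whose relative frequency is strictly below `threshold`."""
    counts = Counter(values)
    n = len(values)
    rare = {level for level, count in counts.items() if count / n < threshold}
    return [other_label if v in rare else v for v in values]


# Hypothetical clarity grades: FL (2%) and IF (3%) fall below the 5% threshold.
clarity = ["SI1"] * 50 + ["VS2"] * 45 + ["IF"] * 3 + ["FL"] * 2
combined = combine_rare_levels(clarity, threshold=0.05)
print(Counter(combined))
```

Merging the two rare grades produces a single level with 5 samples. That gives the model slightly more data per level than either grade had on its own.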

```python
# check the shape of the data
dataset.shape
```

```
(6000, 8)
```

In order to demonstrate the `predict_model()` function on unseen data, a sample of 600 records has been withheld from the original dataset to be used for predictions. This should not be confused with the train/test split, as this particular split is performed to simulate a real-life scenario. Another way to think about this is that these 600 records were not available at the time the machine learning experiment was performed.

```python
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index).reset_index(drop=True)
data.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions ' + str(data_unseen.shape))
```

```
Data for Modeling: (5400, 8)
Unseen Data For Predictions (600, 8)
```


# 5.0 Setting up Environment in PyCaret

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to initialize the environment in PyCaret using `setup()`. No additional parameters were passed in that example, because we did not perform any pre-processing steps other than the essential ones that PyCaret performs automatically. In this example we will take it to the next level by customizing the pre-processing pipeline using `setup()`. Let's look at how to implement all the steps discussed in section 4 above. The setup below also sets `remove_multicollinearity = True`, which drops features whose correlation with another feature exceeds `multicollinearity_threshold` (0.95 here).

```python
from pycaret.regression import *
```

```python
exp_reg102 = setup(data = data, target = 'Price', session_id=123,
                  normalize = True, transformation = True, transform_target = True, 
                  combine_rare_levels = True, rare_level_threshold = 0.05,
                  remove_multicollinearity = True, multicollinearity_threshold = 0.95, 
                  bin_numeric_features = ['Carat Weight']) 
```

```
Setup Succesfully Completed!
```

| Description | Value |
|---|---|
| session_id | 123 |
| Transform Target | True |
| Transform Target Method | box-cox |
| Original Data | (5400, 8) |
| Missing Values | False |
| Numeric Features | 1 |
| Categorical Features | 6 |
| Ordinal Features | False |
| High Cardinality Features | False |
| High Cardinality Method | |


Note that this is the same setup grid that was shown in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. The only difference is that the customization parameters we passed to `setup()` are now reported as enabled (for example, `Transform Target` is `True`). Also notice that the `session_id` is the same as the one used in the beginner tutorial, so randomization is held constant across the two experiments. Any improvements we see in this experiment are therefore due solely to the pre-processing steps taken in `setup()` or to the modeling techniques used in later sections of this tutorial.
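
For the `bin_numeric_features = ['Carat Weight']` option, the number of bins is given by Sturges' rule, `ceil(log2(n) + 1)`. For n = 5400 this gives 14 bins. The sketch below computes the bin count and assigns equal-width bins. It relies on several assumptions. First, n = 5400 is simply the number of rows in `data`, which is not necessarily the number of rows PyCaret bins after its internal train/test split. Second, PyCaret may place the bin edges differently from the equal-width scheme used here. The function names are hypothetical.

```python
import math


def sturges_bins(n):
    """Number of bins by Sturges' rule: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)


def equal_width_bins(values, n_bins):
    """Assign each value an integer bin label 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        labels.append(min(idx, n_bins - 1))  # the maximum value belongs to the last bin
    return labels


k = sturges_bins(5400)  # 5400 = rows in `data`; PyCaret may bin a smaller training split
print(k)
# Hypothetical carat weights spanning the 0.75 to 2.91 range from the data profile.
print(equal_width_bins([0.75, 1.0, 1.5, 2.0, 2.91], 4))
```

After binning, each carat weight becomes a category label. A linear model can then learn a separate effect for each interval rather than a single slope.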

# 6.0 Comparing All Models

Similar to __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we will also begin this tutorial with `compare_models()`. We will then compare the results below with those of the previous experiment.

```python
compare_models()
```

| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| CatBoost Regressor | 683.6248 | 1985415.174 | 1388.2471 | 0.9801 | 0.0729 | 0.0539 |
| Light Gradient Boosting Machine | 759.1324 | 2863935.6793 | 1643.4677 | 0.9726 | 0.0794 | 0.0581 |
| Huber Regressor | 921.9673 | 3436974.1745 | 1837.1226 | 0.966 | 0.0958 | 0.07 |
| Random Forest | 858.166 | 3508114.3173 | 1837.6291 | 0.9654 | 0.0909 | 0.0663 |
| Ridge Regression | 932.4246 | 3559808.6784 | 1865.617 | 0.965 | 0.0956 | 0.0707 |
| Support Vector Machine | 868.4623 | 3667101.4697 | 1884.777 | 0.9639 | 0.0866 | 0.0632 |
| Bayesian Ridge | 936.3514 | 3672909.6744 | 1892.0171 | 0.9638 | 0.0956 | 0.0707 |
| Random Sample Consensus | 937.3268 | 3676395.5162 | 1894.3052 | 0.9637 | 0.0957 | 0.0708 |
| Linear Regression | 941.937 | 3743608.7484 | 1907.7687 | 0.9632 | 0.0958 | 0.0709 |
| Extra Trees Regressor | 959.8065 | 4595932.2641 | 2076.1553 | 0.9553 | 0.1047 | 0.0754 |


For the purpose of comparison we will use the `RMSLE` score. Notice how drastically a few of the algorithms (mostly linear) have improved after we performed a few pre-processing steps in `setup()`. 
- Linear Regression RMSLE improved from `0.7215` to `0.0958`
- Random Sample Consensus RMSLE improved from `0.5725` to `0.0957`
- Support Vector Machine RMSLE improved from `0.7137` to `0.0866`
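
RMSLE compares predictions and actual values on a log scale. As a result, it reflects relative errors rather than absolute ones, which suits a target like `Price` that spans two orders of magnitude. The sketch below uses the standard textbook definition based on `log(1 + y)`. PyCaret's own implementation may differ in details, for instance in how negative predictions are handled.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; assumes non-negative targets and predictions."""
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# A 10% over-prediction costs about the same on a cheap and on an expensive diamond.
print(round(rmsle([2_500], [2_750]), 4))
print(round(rmsle([100_000], [110_000]), 4))
```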

At this point you should also notice that while the transformations have improved the performance of several linear algorithms, they have also adversely affected the performance of tree-based algorithms, although to a lesser extent. For example, the RMSLE of Random Forest has worsened (increased) from `0.0818` in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ to `0.0909`. As you progress in your machine learning journey, it is important to build an intuition for the effects that different pre-processing methods are likely to have on different types of models.

To see results for all of the models from the previous tutorial refer to Section 7 in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__.

# 7.0 Create a Model

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to create a model using the `create_model()` function. Now we will learn about a few other parameters that may come in handy. In this section, we will create all the models using 5-fold cross validation. Notice how the `fold` parameter is passed inside `create_model()` to achieve this.
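
To make `fold = 5` concrete, the sketch below generates the train/test index splits of 5-fold cross validation. Each row is used for evaluation exactly once, and the Mean and SD rows in the score grids summarize the per-fold scores. This is a plain, unshuffled split that follows the same scheme as scikit-learn's `KFold` with `shuffle=False`. Whether PyCaret shuffles before splitting is not covered in this tutorial, and the 12-row example is hypothetical.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation without shuffling."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more row
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for fold, (train_idx, test_idx) in enumerate(k_fold_indices(12, n_folds=5)):
    print(fold, "train:", len(train_idx), "test:", test_idx)
```

Fewer folds means fewer model fits, so training is faster. The trade-off is that each model is trained on a smaller share of the data.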

### 7.1 Create Model (change fold to 5)

```python
dt = create_model('dt', fold = 5)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 1022.2892 | 5372470.0 | 2317.8589 | 0.9515 | 0.1091 | 0.0785 |
| 1 | 979.3522 | 3747603.0 | 1935.8727 | 0.9625 | 0.1058 | 0.0772 |
| 2 | 952.2433 | 4448655.0 | 2109.1835 | 0.955 | 0.1079 | 0.0771 |
| 3 | 941.3846 | 3010308.0 | 1735.0238 | 0.9676 | 0.1028 | 0.0759 |
| 4 | 1072.631 | 6859900.0 | 2619.1411 | 0.9337 | 0.1122 | 0.0809 |
| Mean | 993.58 | 4687787.0 | 2143.416 | 0.9541 | 0.1076 | 0.0779 |
| SD | 48.3917 | 1337306.0 | 305.8676 | 0.0116 | 0.0031 | 0.0017 |


### 7.2 Create Model (round to 2 decimal places)

```python
dt = create_model('dt', fold = 5, round = 2)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 1022.29 | 5372469.87 | 2317.86 | 0.95 | 0.11 | 0.08 |
| 1 | 979.35 | 3747603.26 | 1935.87 | 0.96 | 0.11 | 0.08 |
| 2 | 952.24 | 4448654.96 | 2109.18 | 0.95 | 0.11 | 0.08 |
| 3 | 941.38 | 3010307.76 | 1735.02 | 0.97 | 0.1 | 0.08 |
| 4 | 1072.63 | 6859900.23 | 2619.14 | 0.93 | 0.11 | 0.08 |
| Mean | 993.58 | 4687787.22 | 2143.42 | 0.95 | 0.11 | 0.08 |
| SD | 48.39 | 1337305.7 | 305.87 | 0.01 | 0.0 | 0.0 |


Notice how passing the `round` parameter to `create_model()` has rounded the evaluation metrics to 2 decimal places.

# 8.0 Tune a Model

In the previous tutorial __[Regression (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to automatically tune the hyperparameters of a model using pre-defined grids. Here we will introduce the `n_iter` parameter in `tune_model()`. `n_iter` is the number of iterations in a random grid search. In each iteration, one combination of hyperparameter values is randomly sampled from the pre-defined grid and evaluated. By default, the parameter is set to `10`, which means that at most 10 combinations are evaluated to find the best hyperparameters. Increasing the value may improve performance, but it will also increase the training time. See the example below:

```python
tuned_knn = tune_model('knn')
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 2295.3619 | 27569510.0 | 5250.6674 | 0.6973 | 0.3138 | 0.1658 |
| 1 | 3278.1215 | 69530550.0 | 8338.4983 | 0.4671 | 0.3926 | 0.2086 |
| 2 | 2612.2383 | 45751620.0 | 6763.9943 | 0.5804 | 0.3406 | 0.1872 |
| 3 | 2710.565 | 33536800.0 | 5791.0966 | 0.63 | 0.3437 | 0.1894 |
| 4 | 2501.2823 | 34307010.0 | 5857.2187 | 0.5721 | 0.3426 | 0.1903 |
| 5 | 2482.5015 | 50180650.0 | 7083.8304 | 0.5717 | 0.3222 | 0.1753 |
| 6 | 2754.5034 | 51158320.0 | 7152.5047 | 0.5194 | 0.3441 | 0.177 |
| 7 | 2412.1881 | 30156580.0 | 5491.501 | 0.6186 | 0.3471 | 0.1755 |
| 8 | 2752.7112 | 42372210.0 | 6509.3939 | 0.5839 | 0.3294 | 0.1693 |
| 9 | 2816.9196 | 48687550.0 | 6977.6465 | 0.5364 | 0.3594 | 0.2074 |


```python
tuned_knn2 = tune_model('knn', n_iter = 25)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 2055.7274 | 21786300.0 | 4667.5794 | 0.7608 | 0.2835 | 0.152 |
| 1 | 3001.9837 | 53107040.0 | 7287.4574 | 0.593 | 0.3708 | 0.2114 |
| 2 | 2402.6227 | 38956880.0 | 6241.545 | 0.6427 | 0.3296 | 0.1829 |
| 3 | 2466.3722 | 28376180.0 | 5326.9294 | 0.6869 | 0.3225 | 0.1779 |
| 4 | 2288.9036 | 27180910.0 | 5213.5314 | 0.661 | 0.3299 | 0.1892 |
| 5 | 2044.4439 | 28514840.0 | 5339.9288 | 0.7566 | 0.2793 | 0.1632 |
| 6 | 2449.2604 | 44067900.0 | 6638.3661 | 0.586 | 0.3185 | 0.1581 |
| 7 | 2353.9007 | 28475320.0 | 5336.2275 | 0.6399 | 0.3442 | 0.1803 |
| 8 | 2322.0156 | 28354440.0 | 5324.8886 | 0.7215 | 0.2818 | 0.1544 |
| 9 | 2286.2674 | 29713540.0 | 5451.0123 | 0.7171 | 0.3154 | 0.1858 |


Notice how two tuned K-Nearest Neighbors models were created based on the `n_iter` parameter. In `tuned_knn`, the `n_iter` parameter was left at its default value, resulting in an RMSLE of `0.3436`. In `tuned_knn2`, `n_iter` was set to `25` and the RMSLE improved to `0.3176`. Observe the differences between the hyperparameters of `tuned_knn` and `tuned_knn2` below:

```python
# tuned_knn with default n_iter (10)
plot_model(tuned_knn, plot = 'parameter')
```

| Parameter | Value |
|---|---|
| algorithm | kd_tree |
| leaf_size | 30 |
| metric | minkowski |
| metric_params | |
| n_jobs | |
| n_neighbors | 24 |
| p | 2 |
| weights | distance |


```python
# tuned_knn2 with n_iter = 25
plot_model(tuned_knn2, plot = 'parameter')
```

| Parameter | Value |
|---|---|
| algorithm | ball_tree |
| leaf_size | 20 |
| metric | minkowski |
| metric_params | |
| n_jobs | |
| n_neighbors | 6 |
| p | 2 |
| weights | distance |
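
The role of `n_iter` can be illustrated with a minimal random search. The sketch draws up to `n_iter` distinct combinations from a grid, scores each one, and keeps the best. It loosely mirrors scikit-learn's `RandomizedSearchCV`, which we assume (but have not confirmed here) that `tune_model()` relies on. The K Nearest Neighbors grid and `toy_score` are hypothetical stand-ins; a real search would score each candidate with cross-validated RMSLE.

```python
import itertools
import random


def random_search(param_grid, score_fn, n_iter=10, seed=123):
    """Evaluate up to n_iter distinct random combinations from the grid; lower score is better."""
    keys = sorted(param_grid)
    combos = list(itertools.product(*(param_grid[k] for k in keys)))
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for values in rng.sample(combos, min(n_iter, len(combos))):  # sampling without replacement
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical K Nearest Neighbors grid (320 combinations).
knn_grid = {
    "n_neighbors": list(range(1, 41)),
    "weights": ["uniform", "distance"],
    "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
}


def toy_score(params):
    """Hypothetical stand-in for a cross-validated RMSLE."""
    return abs(params["n_neighbors"] - 6) + (0 if params["weights"] == "distance" else 1)


print(random_search(knn_grid, toy_score, n_iter=10))
print(random_search(knn_grid, toy_score, n_iter=25))
```

With only 10 or 25 of the 320 combinations evaluated, the search may miss the best setting. Raising `n_iter` increases the chance of finding it, at the cost of more model fits.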


# 9.0 Ensemble a Model

Ensembling is another common technique for improving model performance. Ensemble methods combine the predictions of multiple models to improve overall performance. This section covers several ensembling techniques, starting with Bagging and Boosting __[(Read More)](https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205)__. We will use the `ensemble_model()` function in PyCaret, which ensembles a trained base estimator using the method defined in the `method` parameter.
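
Bagging can be summarized in a few lines. Draw bootstrap samples (rows sampled with replacement) from the training data, fit one base model per sample, and average their predictions. The sketch below is a simplified, single-feature illustration that uses a hypothetical 1-nearest-neighbour base learner and made-up data. It is not PyCaret's implementation, which (as the printed `BaggingRegressor` later in this section shows) wraps scikit-learn's `BaggingRegressor`.

```python
import random
import statistics


def fit_1nn(xs, ys):
    """Hypothetical base learner: 1-nearest-neighbour regression on one numeric feature."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagging_fit(xs, ys, fit, n_estimators=10, seed=123):
    """Fit `n_estimators` base models on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return statistics.fmean(m(x) for m in models)

    return predict


# Toy data: carat weight versus a hypothetical noisy price.
data_rng = random.Random(0)
carats = [0.75 + 0.1 * i for i in range(20)]
prices = [10_000 * c + data_rng.gauss(0, 1_500) for c in carats]
bagged = bagging_fit(carats, prices, fit_1nn, n_estimators=10)
print(round(bagged(1.3), 1))
```

Each base model sees a slightly different sample, so their individual errors differ. Averaging those predictions reduces variance, which is why high-variance learners such as deep decision trees tend to benefit most from bagging.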

```python
# let's create a simple decision tree model that we will use for ensembling
dt = create_model('dt')
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 993.0505 | 3423624.0 | 1850.3037 | 0.9624 | 0.1091 | 0.0793 |
| 1 | 1076.9302 | 8859405.0 | 2976.4753 | 0.9321 | 0.1065 | 0.0754 |
| 2 | 962.3708 | 3770901.0 | 1941.8808 | 0.9654 | 0.115 | 0.0813 |
| 3 | 964.2332 | 3221884.0 | 1794.9608 | 0.9645 | 0.1039 | 0.0755 |
| 4 | 932.1042 | 4174581.0 | 2043.1791 | 0.9479 | 0.11 | 0.0765 |
| 5 | 857.1385 | 3962157.0 | 1990.5167 | 0.9662 | 0.0949 | 0.0681 |
| 6 | 976.6257 | 3234178.0 | 1798.382 | 0.9696 | 0.1035 | 0.076 |
| 7 | 873.9425 | 2532181.0 | 1591.2828 | 0.968 | 0.105 | 0.0743 |
| 8 | 966.2217 | 4063628.0 | 2015.8442 | 0.9601 | 0.1053 | 0.0775 |
| 9 | 1221.8533 | 10417400.0 | 3227.6004 | 0.9008 | 0.1214 | 0.0855 |


### 9.1 Bagging

```python
bagged_dt = ensemble_model(dt)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 895.6712 | 3304529.0 | 1817.8364 | 0.9637 | 0.0951 | 0.071 |
| 1 | 947.5378 | 6664797.0 | 2581.6268 | 0.9489 | 0.091 | 0.065 |
| 2 | 781.5122 | 1897056.0 | 1377.3366 | 0.9826 | 0.0876 | 0.065 |
| 3 | 845.8462 | 2063521.0 | 1436.4959 | 0.9772 | 0.0904 | 0.0674 |
| 4 | 924.6376 | 6335825.0 | 2517.1065 | 0.921 | 0.0999 | 0.07 |
| 5 | 887.7038 | 5238008.0 | 2288.6694 | 0.9553 | 0.0912 | 0.0662 |
| 6 | 918.6015 | 4297063.0 | 2072.9359 | 0.9596 | 0.0914 | 0.0664 |
| 7 | 862.5879 | 2756716.0 | 1660.3361 | 0.9651 | 0.0942 | 0.069 |
| 8 | 860.2581 | 3445217.0 | 1856.1295 | 0.9662 | 0.0924 | 0.0657 |
| 9 | 908.5273 | 3620214.0 | 1902.6861 | 0.9655 | 0.1037 | 0.0733 |


```python
# check the parameters of bagged_dt
print(bagged_dt)
```

```
BaggingRegressor(base_estimator=DecisionTreeRegressor(ccp_alpha=0.0,
                                                      criterion='mse',
                                                      max_depth=None,
                                                      max_features=None,
                                                      max_leaf_nodes=None,
                                                      min_impurity_decrease=0.0,
                                                      min_impurity_split=None,
                                                      min_samples_leaf=1,
                                                      min_samples_split=2,
                                                      min_weight_fraction_leaf=0.0,
                                                      presort='deprecated',
                                                      random_state=123,
                                                      splitter='best'),
                 bootstrap=Tr
```

Notice how ensembling has improved the `RMSLE` from `0.1075` to `0.0937`. In the above example we used the default parameters of `ensemble_model()`, which uses the `Bagging` method. Let's try `Boosting` by changing the `method` parameter in `ensemble_model()`. See the example below:

### 9.2 Boosting

```python
boosted_dt = ensemble_model(dt, method = 'Boosting')
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 927.5377 | 3087624.0 | 1757.1637 | 0.9661 | 0.1018 | 0.0735 |
| 1 | 997.7161 | 7267713.0 | 2695.8697 | 0.9443 | 0.0979 | 0.0709 |
| 2 | 840.7781 | 2395993.0 | 1547.8997 | 0.978 | 0.0913 | 0.0687 |
| 3 | 887.3461 | 2374569.0 | 1540.9638 | 0.9738 | 0.0928 | 0.0694 |
| 4 | 919.7914 | 4235460.0 | 2058.0233 | 0.9472 | 0.1009 | 0.0729 |
| 5 | 873.9482 | 4502171.0 | 2121.8321 | 0.9616 | 0.0934 | 0.0673 |
| 6 | 917.1107 | 3184243.0 | 1784.4448 | 0.9701 | 0.096 | 0.0691 |
| 7 | 915.7112 | 3111298.0 | 1763.8871 | 0.9607 | 0.0989 | 0.0727 |
| 8 | 855.2621 | 2787064.0 | 1669.4501 | 0.9726 | 0.0979 | 0.0706 |
| 9 | 1017.3759 | 5049684.0 | 2247.1502 | 0.9519 | 0.1115 | 0.0777 |
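
Boosting, by contrast, builds models sequentially so that each new model concentrates on what the previous ones got wrong. For a dependency-free illustration, the sketch below uses least-squares gradient boosting with decision stumps: each stump is fitted to the current residuals and then added with a learning rate. This is a different boosting variant from AdaBoost, which reweights training samples instead of fitting residuals. We assume that PyCaret's `method = 'Boosting'` wraps scikit-learn's `AdaBoostRegressor`, but this tutorial does not show that. The data is a toy step function.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump on one feature: a split threshold and two leaf means."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all xs identical: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_estimators=10, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
for n in (1, 3, 10):
    model = gradient_boost(xs, ys, n_estimators=n)
    print(n, round(model(0), 4), round(model(9), 4))
```

Each additional stump halves the remaining error on this toy data. That illustrates why more estimators can help, although on real data too many estimators may overfit.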


Notice how easy it is to ensemble models in PyCaret. By simply changing the `method` parameter you can do bagging or boosting which would otherwise have taken multiple lines of code. Note that `ensemble_model()` will by default build `10` estimators. This can be changed using the `n_estimators` parameter. Increasing the number of estimators can sometimes improve the results. See the example below:

```python
bagged_dt2 = ensemble_model(dt, n_estimators=50)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 852.5026 | 2699914.0 | 1643.1416 | 0.9704 | 0.0917 | 0.0681 |
| 1 | 908.2262 | 6538294.0 | 2557.0088 | 0.9499 | 0.0868 | 0.0622 |
| 2 | 770.4685 | 1965380.0 | 1401.9201 | 0.982 | 0.086 | 0.064 |
| 3 | 850.562 | 1978513.0 | 1406.5963 | 0.9782 | 0.0886 | 0.0673 |
| 4 | 889.6931 | 5307195.0 | 2303.7351 | 0.9338 | 0.0977 | 0.0687 |
| 5 | 858.2552 | 4662363.0 | 2159.2506 | 0.9602 | 0.0874 | 0.0638 |
| 6 | 890.1514 | 3680827.0 | 1918.548 | 0.9654 | 0.0895 | 0.0653 |
| 7 | 836.2295 | 2550333.0 | 1596.9763 | 0.9677 | 0.0913 | 0.0672 |
| 8 | 839.4603 | 3087126.0 | 1757.022 | 0.9697 | 0.0916 | 0.0656 |
| 9 | 904.317 | 3529446.0 | 1878.682 | 0.9664 | 0.1032 | 0.073 |


Notice how increasing the `n_estimators` parameter has improved the result. The `bagged_dt` model with the default `10` estimators resulted in an RMSLE of `0.0937`, whereas `bagged_dt2` with `n_estimators = 50` improved the RMSLE to `0.0914`.

You can also use the `tune_model()` function to automatically tune the `n_estimators` parameter of an ensemble. See the example below, where we create a tuned bagging ensemble of decision trees by passing `ensemble = True` and `method = 'Bagging'`.

```python
tuned_bagged_dt = tune_model('dt', ensemble = True, method = 'Bagging', n_iter = 100)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 816.7963 | 2212970.0 | 1487.6056 | 0.9757 | 0.088 | 0.0659 |
| 1 | 894.954 | 5696451.0 | 2386.7239 | 0.9563 | 0.0888 | 0.0643 |
| 2 | 785.6359 | 2476067.0 | 1573.5522 | 0.9773 | 0.0863 | 0.0639 |
| 3 | 831.1297 | 1890625.0 | 1375.0 | 0.9791 | 0.0901 | 0.0668 |
| 4 | 881.0989 | 6263806.0 | 2502.7596 | 0.9219 | 0.0991 | 0.0673 |
| 5 | 853.6744 | 4425015.0 | 2103.572 | 0.9622 | 0.0868 | 0.0624 |
| 6 | 881.2739 | 3536073.0 | 1880.4448 | 0.9668 | 0.0891 | 0.065 |
| 7 | 853.6338 | 2753383.0 | 1659.3321 | 0.9652 | 0.0922 | 0.0677 |
| 8 | 876.2253 | 3470113.0 | 1862.8241 | 0.9659 | 0.0978 | 0.0677 |
| 9 | 892.1219 | 3560105.0 | 1886.8241 | 0.9661 | 0.1031 | 0.0726 |


```python
# check the parameters of tuned Decision Tree with bagging
print(tuned_bagged_dt)
```

```
BaggingRegressor(base_estimator=DecisionTreeRegressor(ccp_alpha=0.0,
                                                      criterion='mse',
                                                      max_depth=19,
                                                      max_features=None,
                                                      max_leaf_nodes=None,
                                                      min_impurity_decrease=0.0,
                                                      min_impurity_split=None,
                                                      min_samples_leaf=2,
                                                      min_samples_split=2,
                                                      min_weight_fraction_leaf=0.0,
                                                      presort='deprecated',
                                                      random_state=123,
                                                      splitter='best'),
                 bootstrap=True
```

Notice that `tuned_bagged_dt` is a decision tree wrapped inside a `BaggingRegressor`. Our first bagging ensemble, `bagged_dt`, used the default values and resulted in an RMSLE of `0.0937`. That improved to `0.0914` when we increased the `n_estimators` parameter to `50`.

After tuning the decision tree with `ensemble = True` inside the `tune_model()` function, the RMSLE did not improve (the fold scores above average to about `0.0921`). However, tuning did produce a different model, in which `n_estimators` is set to `220`. Trying different values of `n_iter` in `tune_model()` and different `n_estimators` in `ensemble_model()` is one way to search for the best hyperparameters.

### 9.3 Blending

Blending is another common technique for ensembling that can be used in PyCaret. It creates multiple models and then averages the individual predictions to form a final prediction. If no list is passed, PyCaret uses all of the models available in the model library by default. Let's see the example below:

```python
blend_all = blend_models()
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 2102.6159 | 17121400.0 | 4137.8018 | 0.812 | 0.1759 | 0.1311 |
| 1 | 2322.8391 | 32255100.0 | 5679.357 | 0.7528 | 0.1879 | 0.1342 |
| 2 | 2140.1846 | 19777170.0 | 4447.1533 | 0.8186 | 0.1794 | 0.1387 |
| 3 | 2159.5781 | 16103700.0 | 4012.9412 | 0.8223 | 0.1803 | 0.1352 |
| 4 | 1844.2475 | 12505210.0 | 3536.2706 | 0.844 | 0.1685 | 0.1284 |
| 5 | 2168.1827 | 26663290.0 | 5163.6506 | 0.7724 | 0.1795 | 0.129 |
| 6 | 2270.0503 | 23548730.0 | 4852.703 | 0.7788 | 0.1878 | 0.1355 |
| 7 | 1967.7881 | 14015600.0 | 3743.7408 | 0.8228 | 0.1782 | 0.1362 |
| 8 | 2184.4463 | 19948860.0 | 4466.4149 | 0.8041 | 0.1818 | 0.1295 |
| 9 | 2049.9526 | 18544290.0 | 4306.3085 | 0.8234 | 0.1796 | 0.1358 |


Now that we have created a voting regressor using the `blend_models()` function, the model stored in the variable `blend_all` is just like any other model that you would create using `create_model()` or `tune_model()`. You can use this model for predictions on unseen data using `predict_model()` in the same way you would do for any other model. Notice that since we didn't pass the list of specific models for voting, it uses all of the models in the model library by default. The next example will show how to pass a specific set of models for blending.

```python
"""
we will create 4 specific models to be passed into blend_models().
Note that verbose is set to False to avoid printing score grid of individual models.
"""

huber = create_model('huber', verbose = False)
dt = create_model('dt', verbose = False)
lightgbm = create_model('lightgbm', verbose = False)
ridge = create_model('ridge', verbose = False)
```

```python
blend_specific = blend_models(estimator_list = [huber,dt,lightgbm,ridge])
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 724.703 | 1499644.0 | 1224.5996 | 0.9835 | 0.0796 | 0.0603 |
| 1 | 823.2155 | 3699458.0 | 1923.3974 | 0.9716 | 0.0757 | 0.0567 |
| 2 | 719.0551 | 1519591.0 | 1232.7167 | 0.9861 | 0.0789 | 0.0604 |
| 3 | 757.7202 | 1765456.0 | 1328.7045 | 0.9805 | 0.0764 | 0.0588 |
| 4 | 685.6732 | 1644508.0 | 1282.3838 | 0.9795 | 0.0778 | 0.0567 |
| 5 | 717.38 | 2732449.0 | 1653.0121 | 0.9767 | 0.0754 | 0.0558 |
| 6 | 822.2998 | 2401442.0 | 1549.6586 | 0.9774 | 0.0843 | 0.0623 |
| 7 | 771.8973 | 2060155.0 | 1435.324 | 0.9739 | 0.0832 | 0.0608 |
| 8 | 735.0788 | 1787834.0 | 1337.0992 | 0.9824 | 0.0767 | 0.0595 |
| 9 | 857.4162 | 3488763.0 | 1867.8229 | 0.9668 | 0.091 | 0.0664 |


Notice that blending these 4 models improved the RMSLE to `0.0799`, the best result among the ensembles built so far in this tutorial.
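
At prediction time, blending as described above amounts to averaging the base models' predictions. The sketch below shows that averaging step for one diamond, using two hypothetical fitted models whose errors partly cancel. The optional `weights` argument is included only for illustration: scikit-learn's `VotingRegressor` accepts weights, but this tutorial uses equal weights. This is not PyCaret's implementation.

```python
def blend_predict(models, features, weights=None):
    """Average the predictions of already-fitted models (optionally weighted)."""
    preds = [m(features) for m in models]
    if weights is None:
        return sum(preds) / len(preds)
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


# Two hypothetical fitted models for a diamond whose true price is 1,000:
def model_a(features):
    return 1_100.0  # over-predicts by 100


def model_b(features):
    return 940.0  # under-predicts by 60


print(blend_predict([model_a, model_b], None))                  # 1020.0: errors partly cancel
print(blend_predict([model_a, model_b], None, weights=[1, 3]))  # 980.0
```

Averaging helps most when the base models make different kinds of errors. That is one reason a blend of diverse models, such as linear and tree-based ones, can outperform each of its members.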

### 9.4 Stacking

Stacking is another popular ensembling technique, but it is less commonly implemented due to practical difficulties. Stacking combines multiple models via a meta-model. Several base models are trained to predict the outcome, and a meta-model then uses their predictions as input (together with the original features when `restack = True`) to produce the final prediction. The implementation of `stack_models()` is based on Wolpert, D. H. (1992b). Stacked generalization __[(Read More)](https://www.sciencedirect.com/science/article/abs/pii/S0893608005800231)__.

Let's see an example below using the models we created in section 9.3 above.

```python
stack_1 = stack_models([huber,dt,lightgbm,ridge])
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 678.7452 | 1484744.0 | 1218.5006 | 0.9837 | 0.0747 | 0.0559 |
| 1 | 771.3192 | 3733328.0 | 1932.1821 | 0.9714 | 0.0709 | 0.0539 |
| 2 | 778.6981 | 2590366.0 | 1609.4612 | 0.9762 | 0.0771 | 0.0582 |
| 3 | 696.6871 | 1551801.0 | 1245.7132 | 0.9829 | 0.0697 | 0.0546 |
| 4 | 646.0917 | 1361240.0 | 1166.7218 | 0.983 | 0.0698 | 0.053 |
| 5 | 651.2433 | 1389252.0 | 1178.6652 | 0.9881 | 0.0685 | 0.0515 |
| 6 | 807.1091 | 2815671.0 | 1677.9961 | 0.9735 | 0.08 | 0.0591 |
| 7 | 743.5922 | 2031598.0 | 1425.3412 | 0.9743 | 0.0758 | 0.0564 |
| 8 | 710.6207 | 1568959.0 | 1252.5809 | 0.9846 | 0.073 | 0.0568 |
| 9 | 700.6332 | 1575735.0 | 1255.283 | 0.985 | 0.0805 | 0.0589 |


Stacking the same 4 models has improved the RMSLE to `0.0740` from the `0.0799` achieved using `blend_models()`. By default, the meta model (the final model that generates predictions) is Linear Regression, which can be changed using the `meta_model` parameter. See the example below, in which we use `ridge` as the meta model:

In [24]:
stack_1 = stack_models([huber,dt,lightgbm], meta_model = ridge)

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 687.2751 | 1465322.0 | 1210.5049 | 0.9839 | 0.078 | 0.0578 |
| 1 | 801.778 | 3344191.0 | 1828.7129 | 0.9744 | 0.0737 | 0.0556 |
| 2 | 692.4283 | 1412630.0 | 1188.5412 | 0.987 | 0.0762 | 0.0581 |
| 3 | 704.8674 | 1542448.0 | 1241.9535 | 0.983 | 0.0707 | 0.0548 |
| 4 | 622.1424 | 1245232.0 | 1115.8997 | 0.9845 | 0.0715 | 0.0528 |
| 5 | 694.1466 | 1762068.0 | 1327.4289 | 0.985 | 0.0716 | 0.054 |
| 6 | 818.2428 | 2730477.0 | 1652.4156 | 0.9743 | 0.0823 | 0.0609 |
| 7 | 754.2589 | 2204336.0 | 1484.7007 | 0.9721 | 0.0786 | 0.0583 |
| 8 | 697.487 | 1690329.0 | 1300.1264 | 0.9834 | 0.075 | 0.0573 |
| 9 | 790.7905 | 2420769.0 | 1555.8821 | 0.9769 | 0.0876 | 0.0637 |
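For readers who want to see the same idea outside PyCaret, scikit-learn's `StackingRegressor` exposes the meta model through its `final_estimator` parameter. The sketch below, on synthetic data, keeps the base models fixed and swaps a Linear Regression meta model for a Ridge one. It is an analogue, not what `stack_models()` runs internally; note also that scikit-learn's default `final_estimator` is `RidgeCV`, not Linear Regression, so the defaults differ. The data and model settings are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import HuberRegressor, LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=1)

# Base models loosely mirroring huber and dt from the tutorial (assumed settings)
base = [
    ("huber", HuberRegressor(max_iter=1000)),
    ("dt", DecisionTreeRegressor(max_depth=4, random_state=1)),
]

# Same base models, two different meta models ("final_estimator" in scikit-learn terms)
stack_lr = StackingRegressor(estimators=base, final_estimator=LinearRegression(), cv=5)
stack_ridge = StackingRegressor(estimators=base, final_estimator=Ridge(alpha=1.0), cv=5)

stack_lr.fit(X, y)
stack_ridge.fit(X, y)
```

`StackingRegressor` clones the estimators it is given, so sharing the `base` list between the two stacks is safe.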


Before we wrap up this section, there is one more `stack_models()` parameter we haven't seen yet: `restack`. This parameter controls whether the raw data is exposed to the meta model. When set to `True` (the default), the meta model receives the raw data along with the predictions of all the base-level models. See the example below with `restack` set to `False`.

```python
stack_2 = stack_models([huber, dt, lightgbm, ridge], restack=False)
```

| Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| 0 | 675.121 | 1442102.0 | 1200.8753 | 0.9842 | 0.0769 | 0.0569 |
| 1 | 790.5722 | 3357276.0 | 1832.2872 | 0.9743 | 0.0731 | 0.0553 |
| 2 | 695.7298 | 1538085.0 | 1240.1955 | 0.9859 | 0.0759 | 0.0576 |
| 3 | 692.9396 | 1464492.0 | 1210.1622 | 0.9838 | 0.0699 | 0.0541 |
| 4 | 606.9323 | 1191251.0 | 1091.4444 | 0.9851 | 0.0704 | 0.0519 |
| 5 | 743.4778 | 4222940.0 | 2054.9793 | 0.964 | 0.0738 | 0.0537 |
| 6 | 826.4114 | 2848602.0 | 1687.7801 | 0.9732 | 0.0821 | 0.0606 |
| 7 | 745.8111 | 2221266.0 | 1490.3913 | 0.9719 | 0.078 | 0.0574 |
| 8 | 699.4137 | 1737362.0 | 1318.0902 | 0.9829 | 0.0752 | 0.0571 |
| 9 | 774.1966 | 2523665.0 | 1588.6047 | 0.976 | 0.0877 | 0.0624 |
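In scikit-learn's `StackingRegressor`, the `passthrough` parameter plays a role similar to `restack`: when `True`, the meta model is trained on the base-model predictions plus the original features. The sketch below, on synthetic data, uses `transform()` to show the width of the matrix the meta model receives in each case. Treating `passthrough` as the counterpart of `restack` is an assumption made for illustration, not a statement about PyCaret's internals.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with 3 raw features (assumption)
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=2)
base = [("ridge", Ridge()), ("dt", DecisionTreeRegressor(max_depth=3, random_state=2))]

# passthrough=True: meta model sees base predictions + raw features (like restack=True)
with_raw = StackingRegressor(estimators=base, passthrough=True).fit(X, y)
# passthrough=False: meta model sees base predictions only (like restack=False)
preds_only = StackingRegressor(estimators=base, passthrough=False).fit(X, y)

# transform() returns the matrix that is fed to the meta model
print(with_raw.transform(X).shape)    # (200, 5): 2 predictions + 3 raw features
print(preds_only.transform(X).shape)  # (200, 2): 2 predictions only
```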


# 10.0 Predict on Test / Hold-Out Sample

In section 9.4 above we mentioned that stacking is less commonly implemented than other ensembling techniques because of practical difficulties. To understand why, imagine that the model deployed in production is a stacking ensembler of 4 models plus a meta model (similar to `stack_1` created in section 9.4 above). To generate a prediction on an unseen dataset, every data point has to be predicted by all 4 base models, and all of these predictions are then passed to the meta model to generate the final prediction. As the stacking ensembler grows, this becomes code intensive and hard to maintain in production.
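To make that burden concrete, here is a hand-rolled sketch of what serving a stack involves when it is not wrapped in a single object: every base model and the meta model must be kept, loaded and called in the right order. The synthetic data, the model choices and the `predict_stack` helper are hypothetical and only illustrate the data flow; PyCaret hides these steps behind `predict_model()`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption); the first 5 rows play the role of "unseen" data
X_train, y_train = make_regression(n_samples=400, n_features=4, noise=5.0, random_state=3)
X_new = X_train[:5]

# Every one of these objects has to be shipped to production
base_models = [Ridge().fit(X_train, y_train),
               DecisionTreeRegressor(max_depth=4, random_state=3).fit(X_train, y_train)]

# Meta model trained on out-of-fold base predictions plus the raw features
train_meta = np.column_stack([cross_val_predict(m, X_train, y_train, cv=5) for m in base_models])
meta_model = LinearRegression().fit(np.hstack([train_meta, X_train]), y_train)

def predict_stack(X):
    """Hypothetical helper: each row goes through every base model, then the meta model."""
    level0 = np.column_stack([m.predict(X) for m in base_models])
    return meta_model.predict(np.hstack([level0, X]))

print(predict_stack(X_new).shape)  # (5,)
```

Adding a base model means retraining the meta model and updating every place that calls the serving code, which is the maintenance cost described above.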

In __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we saw how to use a trained model to generate predictions on a test / hold-out or unseen dataset. In this example we will see that generating predictions with a stacking ensembler in PyCaret is no different. For the purposes of illustration, we will use the `stack_1` model created in section 9.4 above for the remaining part of this tutorial.

```python
predict_model(stack_1);
```

| Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
|---|---|---|---|---|---|---|
| Stacking Regressor | 730.0421 | 1997308.0 | 1413.2616 | 0.9818 | 0.0738 | 0.056 |


The RMSLE on the hold-out sample is **`0.0738`**, compared to the mean CV RMSLE of **`0.0740`** in section 9.4 above. We will complete the rest of this experiment using the stacking ensembler stored in the `stack_1` variable.
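The CV figure of `0.0740` is the mean of the ten per-fold RMSLE values in the first stacking table of section 9.4. The arithmetic below reproduces it from those values and shows how close the hold-out result is.

```python
# Per-fold RMSLE values copied from the first stacking table in section 9.4
fold_rmsle = [0.0747, 0.0709, 0.0771, 0.0697, 0.0698,
              0.0685, 0.0800, 0.0758, 0.0730, 0.0805]

cv_mean = sum(fold_rmsle) / len(fold_rmsle)
holdout_rmsle = 0.0738  # hold-out RMSLE reported by predict_model(stack_1)

print(round(cv_mean, 4))                  # 0.074
print(round(holdout_rmsle - cv_mean, 4))  # -0.0002
```

A gap this small between the hold-out score and the CV mean suggests the CV estimate was not overly optimistic for this split.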

# 11.0 Finalize Model for Deployment

In __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned about the purpose of `finalize_model()` and how to use it. In this tutorial we will finalize the stacking ensembler which is no different than finalizing a single model.

```python
final_stack_1 = finalize_model(stack_1)
```
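As covered in the beginner tutorial, `finalize_model()` refits the model on the complete dataset passed to `setup()`, including the hold-out sample. The sketch below shows the general idea with scikit-learn on synthetic data: clone the estimator (same hyperparameters, no fitted state) and refit it on the training and hold-out rows combined. The 70/30 split and the Ridge model are assumptions for illustration; this is not PyCaret's internal code.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and an assumed 70/30 train / hold-out split
X, y = make_regression(n_samples=1000, n_features=5, noise=5.0, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)

# Model fitted on the training rows only and evaluated on the hold-out rows
model = Ridge().fit(X_train, y_train)

# "Finalize": same hyperparameters, refit on train + hold-out combined
final_model = clone(model).fit(
    np.vstack([X_train, X_test]), np.concatenate([y_train, y_test])
)
```

After this step the hold-out rows are no longer unseen, which is why the tutorial keeps a separate `data_unseen` set for the final check.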

# 12.0 Predict on Unseen Data

We will now use `final_stack_1` to generate predictions on `data_unseen`, the variable created at the beginning of the tutorial that holds 10% (600 samples) of the original dataset, which was never exposed to PyCaret (see section 5 for an explanation).
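Section 5 is not part of this excerpt. A hold-out split like the one described for `data_unseen` is commonly produced with pandas as sketched below: sample 90% of the rows for modelling and set the rest aside. The stand-in DataFrame and the `random_state` value are hypothetical; only the 600-row hold-out size comes from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original 6,000-row dataset
dataset = pd.DataFrame({"Carat Weight": np.random.default_rng(0).uniform(0.5, 3.0, 6000)})

# 90% of the rows go to modelling; random_state is an arbitrary assumption
data = dataset.sample(frac=0.9, random_state=0)
# The remaining 10% are never passed to setup()
data_unseen = dataset.drop(data.index)

# Reset the indices only after the drop, so the drop still matches the original labels
data = data.reset_index(drop=True)
data_unseen = data_unseen.reset_index(drop=True)

print(data.shape, data_unseen.shape)  # (5400, 1) (600, 1)
```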

```python
unseen_predictions = predict_model(final_stack_1, data=data_unseen, round=0)
unseen_predictions.head()
```

| | Carat Weight | Cut | Color | Clarity | Polish | Symmetry | Report | Price | Label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.53 | Ideal | E | SI1 | ID | ID | AGSL | 12791 | 12823.0 |
| 1 | 1.5 | Fair | F | SI1 | VG | VG | GIA | 10450 | 9859.0 |
| 2 | 1.01 | Good | E | SI1 | G | G | GIA | 5161 | 5146.0 |
| 3 | 2.51 | Very Good | G | VS2 | VG | VG | GIA | 34361 | 32988.0 |
| 4 | 1.01 | Good | I | SI1 | VG | VG | GIA | 4238 | 4140.0 |


A `Label` column has been appended to `data_unseen`; it holds the values predicted by the `final_stack_1` model. We also used the `round` parameter of `predict_model()` to round the predictions (here to 0 decimal places).
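Because `data_unseen` still contains the true `Price`, the predictions can be checked directly. The sketch below recomputes the absolute percentage error (the per-row quantity that MAPE averages) for the five preview rows shown above, using pandas. Five rows are far too few to judge the model; a real check would use all 600 rows of `unseen_predictions`.

```python
import pandas as pd

# The five preview rows shown above: actual Price vs. predicted Label
preview = pd.DataFrame({
    "Price": [12791, 10450, 5161, 34361, 4238],
    "Label": [12823.0, 9859.0, 5146.0, 32988.0, 4140.0],
})

# Absolute percentage error per row
preview["APE"] = (preview["Price"] - preview["Label"]).abs() / preview["Price"]
print(preview["APE"].round(4).tolist())
```

In this preview the largest relative miss is on the second row (the `Fair` cut stone), about 5.7%.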

# 13.0 Save the Experiment

In __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__ we learned how to save and load a model. In this tutorial we will learn how to save the entire experiment, including all of the outputs and models that we have built. Saving an experiment is as simple as saving a model.

```python
save_experiment('Experiment_123 08Feb2020')
```

Experiment Succesfully Saved


# 14.0 Loading the Saved Experiment

To load a saved experiment at a future date in the same or an alternative environment, we would use the `load_experiment()` function.

```python
saved_experiment = load_experiment('Experiment_123 08Feb2020')
```

| | Object |
|---|---|
| 0 | Regression Setup Config |
| 1 | X_training Set |
| 2 | y_training Set |
| 3 | X_test Set |
| 4 | y_test Set |
| 5 | Transformation Pipeline |
| 6 | Target Inverse Transformer |
| 7 | Compare Models Score Grid |
| 8 | Decision Tree Regressor |
| 9 | Decision Tree Regressor Score Grid |


Notice that `load_experiment()` loaded the entire experiment, including all of the intermediate outputs, into the variable `saved_experiment` (only the first 10 of the saved objects are listed above). You can access specific items the same way you would access list elements in Python. In the example below, we access our final stacking ensembler and store it in the `final_stack_1_loaded` variable.

```python
final_stack_1_loaded = saved_experiment[46]
```

```python
new_prediction = predict_model(final_stack_1_loaded, data=data_unseen, round=0)
new_prediction.head()
```

| | Carat Weight | Cut | Color | Clarity | Polish | Symmetry | Report | Price | Label |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.53 | Ideal | E | SI1 | ID | ID | AGSL | 12791 | 12823.0 |
| 1 | 1.5 | Fair | F | SI1 | VG | VG | GIA | 10450 | 9859.0 |
| 2 | 1.01 | Good | E | SI1 | G | G | GIA | 5161 | 5146.0 |
| 3 | 2.51 | Very Good | G | VS2 | VG | VG | GIA | 34361 | 32988.0 |
| 4 | 1.01 | Good | I | SI1 | VG | VG | GIA | 4238 | 4140.0 |


Notice that the results of `unseen_predictions` and `new_prediction` are identical.
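PyCaret handles the persistence behind `save_experiment()` and `load_experiment()` internally, and this tutorial does not show how. As a rough illustration of the general idea (saving an ordered collection of objects to one file and reading it back for positional access), here is a sketch using Python's standard-library `pickle` module. The file name and list contents are hypothetical and do not reflect PyCaret's on-disk format.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical experiment: an ordered list of intermediate objects
experiment = [
    {"name": "setup config", "train_size": 0.7},
    [0.0747, 0.0709, 0.0771],   # e.g. part of a score grid
    "final model placeholder",  # stands in for a fitted model
]

# Hypothetical file location in the system temp directory
path = Path(tempfile.gettempdir()) / "experiment_demo.pkl"
with open(path, "wb") as f:
    pickle.dump(experiment, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# Items come back in the same order, so positional indexing works like any list
final_model_loaded = loaded[2]
print(final_model_loaded)  # final model placeholder
```

Because access is positional, an index such as `46` is only meaningful for the experiment that produced it; a different sequence of steps would place the final model elsewhere.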

# 15.0 Wrap-Up / Next Steps?

We have covered a lot of new concepts in this tutorial. Most importantly, we have seen how to use exploratory data analysis to customize a pipeline in `setup()`, which improved the results considerably compared to what we saw earlier in __[Regression Tutorial (REG101) - Level Beginner](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Beginner%20-%20REG101.ipynb)__. We have also learned how to perform and tune ensembling in PyCaret.

There are, however, a few more advanced topics to cover in `pycaret.regression`, including interpreting more complex tree-based models using Shapley values, advanced ensembling techniques such as a multi-layer stacknet, and more pre-processing pipeline methods. We will cover all of this in our next and final tutorial in the `pycaret.regression` series.

See you in the next tutorial. Follow the link to __[Regression Tutorial (REG103) - Level Expert](https://github.com/pycaret/pycaret/blob/master/Tutorials/Regression%20Tutorial%20Level%20Expert%20-%20REG103.ipynb)__.