# PyCaret Tutorial (Regression Use Case)

The Flash team is excited to share a short tutorial on how to use PyCaret.

Before jumping in, we recommend taking a look at this [README](README.md) to get more familiar with PyCaret and its pros and cons.
With that said, let's dig into a small example where we predict a house's price from a few of its features, such as the neighborhood, the number of bedrooms, and the house's age.

We'll do the following:
1. Pre-process our data (bonus demo of the `ydata-profiling` library)
2. Compare the performance of several ML algorithms
3. Run some hyper-parameter tuning
4. Evaluate and visualize our final model's performance



## Import dependencies


In [None]:
from pycaret.datasets import get_data
from pycaret.regression import *
from ydata_profiling import ProfileReport


## Load dataset

For this tutorial, we'll use one of the sample datasets available in PyCaret.

In [None]:
housing_data = get_data('house')

In [None]:
profile = ProfileReport(housing_data, title="Profiling Report", explorative=True)
profile.to_file("housing_data_report.html")

Out of the 81 available columns, we will keep a handful of relevant ones:

In [None]:
COLS_TO_KEEP = [
    'MSZoning', 'Neighborhood', 'LotFrontage', 'GrLivArea',
    'GarageType', 'GarageCars', 'GarageArea',
    'BedroomAbvGr', 'TotRmsAbvGrd', 'KitchenAbvGr', 'FullBath',
    'HeatingQC', 'CentralAir', 'Electrical', 'Fireplaces',
    'YearBuilt', 'OverallQual', 'OverallCond', 'YrSold', 'SalePrice',
]

In [None]:
data = housing_data[COLS_TO_KEEP]

## Model comparison

In [None]:
regression_setup = setup(
    data=data,
    target='SalePrice',
    train_size=0.8,          # hold out 20% of the rows as a test set
    fold_strategy='kfold',
    fold=5,                  # 5-fold cross-validation on the training set
    categorical_features=['MSZoning', 'Neighborhood', 'GarageType', 'HeatingQC', 'CentralAir', 'Electrical'],
    max_encoding_ohe=10,     # one-hot encode categoricals with at most 10 unique values
    numeric_features=['LotFrontage', 'GrLivArea', 'GarageCars', 'BedroomAbvGr', 'TotRmsAbvGrd', 'KitchenAbvGr', 'FullBath', 'Fireplaces', 'YearBuilt', 'YrSold', 'OverallQual', 'OverallCond', 'GarageArea'],
    session_id=123,          # fixed seed for reproducibility
)
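
The `max_encoding_ohe` setting can be pictured as a simple cardinality rule: as we understand the PyCaret documentation, categorical columns with at most that many unique values are one-hot encoded, and the rest go through a different encoder. The sketch below only illustrates that rule in plain Python; the column values and the `split_by_cardinality` helper are made up for illustration and are not part of PyCaret.

```python
# Toy illustration of the cardinality rule behind `max_encoding_ohe`.
# The column values below are invented; they are not the real dataset.
MAX_ENCODING_OHE = 10  # same threshold as in the setup() call

toy_columns = {
    "CentralAir": ["Y", "N", "Y", "Y"],
    "MSZoning": ["RL", "RM", "FV", "RL"],
    # 12 distinct hypothetical neighborhood labels
    "Neighborhood": [f"N{i}" for i in range(12)],
}


def split_by_cardinality(columns, threshold):
    """Return (one_hot, other) lists of column names by unique-value count."""
    one_hot, other = [], []
    for name, values in columns.items():
        if len(set(values)) <= threshold:
            one_hot.append(name)
        else:
            other.append(name)
    return one_hot, other


one_hot_cols, other_cols = split_by_cardinality(toy_columns, MAX_ENCODING_OHE)
print("one-hot encoded:", one_hot_cols)
print("encoded another way:", other_cols)
```

With a threshold of 10, the two low-cardinality toy columns are one-hot encoded, while the 12-value column is not.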

In [None]:
best_model = compare_models()

In [None]:
evaluate_model(best_model)

## Hyper-parameter tuning

We will fine-tune the hyper-parameters of the three best-performing models from the comparison and keep the one that performs best:

In [None]:
GBR = create_model('gbr', verbose=False)
LGBM = create_model('lightgbm', verbose=False)
ET = create_model('et', verbose=False)

In [None]:
tuned_GBR = tune_model(GBR, optimize='RMSE')
tuned_LGBM = tune_model(LGBM, optimize='RMSE')
tuned_ET = tune_model(ET, optimize='RMSE')
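
Choosing among the tuned models comes down to comparing their cross-validated RMSE, where lower is better. Here is a minimal sketch of that selection step. The numbers are placeholders, not results from this notebook; in practice they would be read from the score grid shown by each `tune_model` call.

```python
# Placeholder cross-validated RMSE values, NOT results from this notebook.
# Replace them with the mean RMSE reported for each tuned model.
cv_rmse = {
    "tuned_GBR": 3.0,
    "tuned_LGBM": 2.0,
    "tuned_ET": 1.0,
}

# RMSE is an error metric, so the best model is the one with the lowest value.
best_name = min(cv_rmse, key=cv_rmse.get)
print(f"Lowest RMSE: {best_name} ({cv_rmse[best_name]})")
```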

## Analyze model

In [None]:
plot_model(tuned_ET, plot = 'residuals')


In [None]:
plot_model(tuned_ET, plot = 'error')



In [None]:
plot_model(tuned_ET, plot = 'feature')


In [None]:
evaluate_model(tuned_ET)

## Prediction
The `predict_model` function adds a `prediction_label` column to the input dataframe. When `data` is `None` (the default), it scores the test set created by the `setup` function.

In [None]:
holdout_pred = predict_model(tuned_ET)
holdout_pred.head()
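
To make the hold-out scores concrete, the sketch below computes RMSE and MAE by hand from a list of actual values and a list of predictions. It uses made-up numbers; on real output, we assume the two lists would come from the `SalePrice` and `prediction_label` columns of `holdout_pred`. The `rmse` and `mae` helpers are our own, not PyCaret functions.

```python
import math

# Made-up actual and predicted sale prices, for illustration only.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAE:  {mae(actual, predicted):.2f}")
```

Because RMSE squares each error before averaging, the single 20,000 miss in this toy data pulls RMSE above MAE.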

In [None]:
save_model(tuned_ET, 'pycaret_regression_tutorial')