
<center>
<img src="https://media.geeksforgeeks.org/wp-content/uploads/20200524202456/pycaret-300x184.PNG" width='400'>
    </center>


<center>PyCaret is an open-source, low-code machine learning library in Python that lets you go from preparing your data to deploying your model within minutes in the notebook environment of your choice.</center>

<br>
<br>





* Step #0. Installing PyCaret + Import Library
* Step #1. Getting the Data
* Step #2. Setting up Environment in PyCaret
* Step #3. Comparing All Models
* Step #4. Create a Model
  * Step #4.1. AdaBoost Regressor
  * Step #4.2. Light Gradient Boosting Machine
  * Step #4.3. Decision Tree
* Step #5. Tune a Model
  * Step #5.1. AdaBoost Regressor
  * Step #5.2. Light Gradient Boosting Machine
  * Step #5.3. Decision Tree
* Step #6. Plot a Model
  * Step #6.1. Residual Plot
  * Step #6.2. Prediction Error Plot
  * Step #6.3. Feature Importance Plot
* Step #7. Predict on test / hold-out Sample
* Step #8. Finalize Model for Deployment
* Step #9. Predict on unseen data
* Step #10. Saving the model
* Step #11. Loading the saved model
* Step #12. References

<h3>Step #0. Installing PyCaret + Import Library</h3>

In [None]:
!pip install pycaret
import numpy as np
import pandas as pd


<h3>Step #1. Getting the Data</h3>

In [None]:
car_prices = pd.read_csv('../input/used-car-auction-prices/car_prices.csv', nrows=10000) #nrows=400000
#car_prices['Date']= pd.to_datetime(car_prices['saledate']).apply(lambda x: x.date())
#car_prices["Date"] = pd.to_datetime(car_prices["Date"], format = '%Y-%m-%d')
car_prices.head()

In [None]:
#check the shape of data
car_prices.shape

In [None]:
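# Sample 90% for modeling, then drop those rows by their original index labels.
# The indexes are reset only afterwards, so the two sets cannot overlap.
data = car_prices.sample(frac=0.9, random_state=786)
data_unseen = car_prices.drop(data.index).reset_index(drop=True)
data = data.reset_index(drop=True)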

print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
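
A hold-out split of this kind is easy to sanity-check on a toy frame. The sketch below is only an illustration: the `toy` frame and its values are made up, and it simply confirms that sampling 90% of the rows and dropping the sampled index labels leaves two parts that share no rows.

```python
import pandas as pd

# Hypothetical toy data standing in for car_prices
toy = pd.DataFrame({"sellingprice": range(100, 120)})

# Sample 90% for modeling; keep the original index labels for now
modeling = toy.sample(frac=0.9, random_state=786)

# Drop the sampled labels to get the remaining 10%
unseen = toy.drop(modeling.index)

# The two parts must not share any row
overlap = set(modeling.index) & set(unseen.index)
print(len(modeling), len(unseen), len(overlap))

# Only now is it safe to reset both indexes
modeling = modeling.reset_index(drop=True)
unseen = unseen.reset_index(drop=True)
```

If `overlap` is not empty, some rows ended up in both sets and the "unseen" predictions later on would be too optimistic.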

<h3>Step #2. Setting up Environment in PyCaret</h3>

The setup() function initializes the PyCaret environment and creates the transformation pipeline that prepares the data for modeling and deployment. setup() must be called before any other PyCaret function. It takes two mandatory parameters: a pandas DataFrame and the name of the target column.

In [None]:
from pycaret.regression import *


In [None]:
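exp_reg101 = setup(data = data,
                   target = 'sellingprice',
                   numeric_imputation = 'mean',  # fill missing numeric values with the column mean
                   categorical_features = ['make','model','trim','body','transmission','vin','state','color','interior','seller','saledate'],
                   silent = True,  # skip the interactive data-type confirmation prompt
                   session_id = 123)  # seed for reproducibility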
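
As a rough illustration of what `numeric_imputation = 'mean'` asks for, the sketch below fills a missing numeric value with the column mean using plain pandas. The toy frame and its `odometer` values are made up, and PyCaret performs this step inside its own pipeline, so treat this as an approximation of the idea rather than its implementation.

```python
import pandas as pd

# Hypothetical toy column with one missing value
toy = pd.DataFrame({"odometer": [10000.0, None, 30000.0, 20000.0]})

# mean() skips missing values by default
col_mean = toy["odometer"].mean()

# Replace the missing entry with that mean
imputed = toy["odometer"].fillna(col_mean)
print(imputed.tolist())
```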

<h3>Step #3. Comparing All Models</h3>

Comparing all models is the recommended starting point for modeling once the setup is complete. This function trains every model in the model library and scores each one using k-fold cross validation. The output is a score grid showing the average MAE, MSE, RMSE, R2, RMSLE and MAPE across the folds (10 by default) for all available models.

In [None]:
compare_models()
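
To make the columns of the score grid concrete, here is a minimal sketch that computes the same six metrics by hand using only the standard library. The function name `regression_metrics` and the toy numbers are made up for illustration; PyCaret's own implementation may differ in details such as how edge cases are handled.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, R2, RMSLE and MAPE for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot
    # RMSLE compares log(1 + value), so it needs non-negative inputs
    log_sq = [(math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)]
    rmsle = math.sqrt(sum(log_sq) / n)
    # MAPE is undefined when an actual value is zero
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RMSLE": rmsle, "MAPE": mape}

# Toy selling prices and predictions
actual = [10000.0, 20000.0, 30000.0]
predicted = [11000.0, 19000.0, 33000.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower is better for every metric except R2, where values closer to 1 indicate a better fit.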

<h3>Step #4. Create a Model</h3>

While compare_models() is a powerful function and often the starting point of an experiment, it is meant for ranking models rather than for producing the ones you will keep working with. PyCaret's recommended workflow is to call compare_models() right after setup() to identify the top performers and shortlist a few candidates for further experimentation. The function that actually lets you create a model is, unimaginatively, called create_model(). It trains a single model and scores it using k-fold cross validation. As with compare_models(), the output is a score grid showing MAE, MSE, RMSE, R2, RMSLE and MAPE by fold.
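
The "by fold" part of the score grid can be pictured with a small hand-rolled example. The sketch below is an assumption-laden illustration, not PyCaret's code: `kfold_indices` is a made-up helper that builds contiguous, unshuffled folds, and the "model" simply predicts the mean of the training rows.

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    fold_sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train, valid
        start += size

# 20 toy target values
prices = [float(p) for p in range(1000, 1020)]

fold_mae = []
for train_idx, valid_idx in kfold_indices(len(prices), n_folds=5):
    # "Model": predict the mean of the training rows
    train_mean = sum(prices[i] for i in train_idx) / len(train_idx)
    # Score on the rows held out for this fold
    mae = sum(abs(prices[i] - train_mean) for i in valid_idx) / len(valid_idx)
    fold_mae.append(mae)

# One row per fold, plus the mean across folds
mean_mae = sum(fold_mae) / len(fold_mae)
print(fold_mae, mean_mae)
```

Each row is used for validation exactly once, which is why the mean across folds is a steadier estimate than a single train/test split.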

<h4>Step #4.1. AdaBoost Regressor</h4>

In [None]:
ada = create_model('ada')


In [None]:
# the trained model object is stored in the variable 'ada'
print(ada)

<h4>Step #4.2. Light Gradient Boosting Machine</h4>

In [None]:
lightgbm = create_model('lightgbm')


<h4>Step #4.3. Decision Tree</h4>

In [None]:
dt = create_model('dt')


<h3>Step #5. Tune a Model</h3>

When a model is created with create_model(), it uses the default hyperparameters. To tune them, use the tune_model() function. It automatically tunes the hyperparameters of a model over a pre-defined search space and scores each candidate using k-fold cross validation. The output is a score grid showing MAE, MSE, RMSE, R2, RMSLE and MAPE by fold.
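
The general idea of searching a pre-defined space can be sketched in a few lines. Everything below is hypothetical: the search space, the `cv_error` stand-in (a real run would train and cross-validate a model there) and the choice of 10 candidates are assumptions made for illustration, and this is not PyCaret's implementation.

```python
import random

# Hypothetical search space for a tree-based model
search_space = {
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}

def cv_error(params):
    """Stand-in for a cross-validated error; lower is better."""
    return abs(params["max_depth"] - 6) + abs(params["learning_rate"] - 0.1)

rng = random.Random(123)  # fixed seed so the search is repeatable
best_params, best_error = None, float("inf")

for _ in range(10):
    # Draw one random combination from the search space
    candidate = {name: rng.choice(values) for name, values in search_space.items()}
    error = cv_error(candidate)
    # Keep the candidate only if it beats the best seen so far
    if error < best_error:
        best_params, best_error = candidate, error

print(best_params, best_error)
```

A random search like this trades completeness for speed: it may miss the best combination, but it costs the same however large the space grows.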

<h4>Step #5.1. AdaBoost Regressor</h4>

In [None]:
tuned_ada = tune_model(ada)


In [None]:
# the tuned model object is stored in the variable 'tuned_ada'
print(tuned_ada)

<h4>Step #5.2. Light Gradient Boosting Machine</h4>

In [None]:
tuned_lightgbm = tune_model(lightgbm)


<h4>Step #5.3. Decision Tree</h4>

In [None]:
tuned_dt = tune_model(dt)


<h3>Step #6. Plot a Model</h3>

Before finalizing a model, you can use the plot_model() function to analyze its performance from different angles, such as a residuals plot, a prediction error plot or a feature importance plot. This function takes a trained model object and returns a plot based on the test / hold-out set.
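
Both the residuals plot and the prediction error plot are built from the same two ingredients: actual and predicted values. The sketch below computes residuals by hand on made-up numbers. The residual is taken here as actual minus predicted, which is an assumption, since some libraries use the opposite sign.

```python
# Toy actual and predicted selling prices
actual = [12000.0, 8500.0, 15300.0, 9900.0]
predicted = [11500.0, 9000.0, 15000.0, 10400.0]

# Residual: how far each prediction is from the truth
residuals = [a - p for a, p in zip(actual, predicted)]

# A mean residual far from zero suggests systematic over- or under-prediction
mean_residual = sum(residuals) / len(residuals)

# Largest miss in absolute terms
largest_miss = max(abs(r) for r in residuals)

print(residuals, mean_residual, largest_miss)
```

In a healthy residuals plot the points scatter around zero without an obvious pattern; a trend or funnel shape usually means the model is missing some structure in the data.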

<h4>Step #6.1. Residual Plot</h4>

In [None]:
# with no plot argument, the residuals plot is drawn
plot_model(tuned_lightgbm)


<h4>Step #6.2. Prediction Error Plot</h4>

In [None]:
plot_model(tuned_lightgbm, plot = 'error')


<h4>Step #6.3. Feature Importance Plot</h4>

In [None]:
plot_model(tuned_lightgbm, plot='feature')


In [None]:
# interactive widget for browsing the available plots of a model
evaluate_model(tuned_lightgbm)


<h3>Step #7. Predict on test / hold-out Sample</h3>

Before finalizing the model, it is advisable to perform one final check by predicting the test/hold-out set and reviewing the evaluation metrics.

In [None]:
predict_model(tuned_lightgbm);


<h3>Step #8. Finalize Model for Deployment</h3>

Model finalization is the last step in the experiment. A typical PyCaret workflow starts with setup(), followed by compare_models() to shortlist a few candidate models (based on the metric of interest), and then applies techniques such as hyperparameter tuning, ensembling and stacking. Once the best model is chosen, finalize_model() refits it on the complete dataset, including the test/hold-out sample, so that the model is trained on all available data before deployment.

In [None]:
final_lightgbm = finalize_model(tuned_lightgbm)


In [None]:
#Final Light Gradient Boosting Machine parameters for deployment
print(final_lightgbm)

In [None]:
predict_model(final_lightgbm);
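
Keep in mind that once a model has been finalized, the hold-out rows have been used for training, so hold-out metrics computed afterwards are no longer an independent check. The toy sketch below exaggerates the effect with a made-up 1-nearest-neighbour "model" that memorizes its training data; the helper names and numbers are hypothetical and only illustrate the principle.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Predict with the target of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

def holdout_mae(train_x, train_y, test_x, test_y):
    """Mean absolute error of the 1-NN model on the test rows."""
    errors = [abs(y - nearest_neighbor_predict(train_x, train_y, x))
              for x, y in zip(test_x, test_y)]
    return sum(errors) / len(errors)

# Toy data: 6 training rows and 2 hold-out rows
x_train, y_train = [1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]
x_hold, y_hold = [7, 8], [70, 80]

# Honest check: the model has never seen the hold-out rows
mae_before = holdout_mae(x_train, y_train, x_hold, y_hold)

# After "finalizing": the model is refit on training and hold-out rows together
mae_after = holdout_mae(x_train + x_hold, y_train + y_hold, x_hold, y_hold)

print(mae_before, mae_after)
```

The error drops to zero only because the model has already seen the answers, which is why a truly unseen dataset is needed for the final check.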

<h3>Step #9. Predict on unseen data</h3>

The predict_model() function is also used to predict on the unseen dataset.

In [None]:
unseen_predictions = predict_model(final_lightgbm, data=data_unseen)
unseen_predictions.to_csv('predicted.csv',index = False)
unseen_predictions.head()

A Label column is added to the data_unseen set; it holds the value predicted by the final_lightgbm model. If you want the predictions to be rounded, pass the round parameter to predict_model().

In [None]:
from matplotlib import pyplot as plt

plt.rcParams["figure.figsize"] = [18, 6]

# Actual vs. predicted selling price, one point per row of the unseen set
unseen_predictions.sellingprice.plot(linewidth = 3, label = 'Actual', color = 'red')
unseen_predictions.Label.plot(linewidth = 2, label = 'Predicted', color = 'blue', linestyle = '--')
plt.legend(fontsize = 'large')
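
A single number can complement the plot. The sketch below computes the mean absolute error between the actual and predicted columns and rounds the predictions with plain pandas. The `toy_predictions` frame is a made-up stand-in for `unseen_predictions`, assuming it has `sellingprice` and `Label` columns as above.

```python
import pandas as pd

# Hypothetical stand-in for unseen_predictions
toy_predictions = pd.DataFrame({
    "sellingprice": [12000, 8500, 15300],
    "Label": [11500.4, 9000.2, 15000.9],
})

# Absolute error per row, then the mean across rows
abs_error = (toy_predictions["sellingprice"] - toy_predictions["Label"]).abs()
mae = abs_error.mean()

# Round the predictions to whole currency units
toy_predictions["Label_rounded"] = toy_predictions["Label"].round(0)

print(mae)
print(toy_predictions)
```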

<h3>Step #10. Saving the model</h3>

In [None]:
save_model(final_lightgbm,'Final Lightgbm Model 30June2021')


<h3>Step #11. Loading the saved model</h3>

In [None]:
saved_final_lightgbm = load_model('Final Lightgbm Model 30June2021')


In [None]:
new_prediction = predict_model(saved_final_lightgbm, data=data_unseen)


In [None]:
new_prediction.head()
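
A quick way to gain confidence in a save/load cycle is to check that the restored object predicts exactly what the original did. The sketch below shows that round trip with the standard library's pickle module on a made-up dict "model"; it illustrates the principle only and is not how PyCaret stores its pipeline.

```python
import os
import pickle
import tempfile

# Hypothetical "model": coefficients of a straight line
model = {"slope": 2.5, "intercept": 100.0}

def predict(m, x):
    """Apply the line stored in m to a single input."""
    return m["slope"] * x + m["intercept"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pkl")
    # Save the model to disk
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load it back into a new object
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give identical predictions
same = predict(model, 4.0) == predict(restored, 4.0)
print(same)
```

The same check applies here: predictions from `saved_final_lightgbm` on `data_unseen` should match those made by `final_lightgbm` before saving.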


<h3>Step #12. References</h3>

This section provides more resources on the topic if you are looking to go deeper.

* https://pycaret.org/
* https://www.pycaret.org/tutorials/html/REG101.html