![](https://cdn.wealthygorilla.com/wp-content/uploads/2019/12/The-Best-Selling-Books-of-All-Time.jpg)

# What is this analysis all about?

**This quick tutorial shows how to use PyCaret and Pandas to perform a high-level EDA before stepping on the accelerator and starting the ML routine to predict price. It also showcases how the Pandas Profiling package helps reveal interesting relationships between the response variable and the predictor variables.**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Importing main packages

In [None]:
import pandas as pd
import numpy as np
import seaborn as sb

# Ingesting the dataset for analysis

In [None]:
d1 = pd.read_csv('/kaggle/input/amazon-top-50-bestselling-books-2009-2019/bestsellers with categories.csv')

In [None]:
d1.head()

# Installing PyCaret and associated options

Installing the PyCaret package is pretty straightforward. There are two options:
* `pip install pycaret` installs only the packages that are required to run basic analysis and ML activities
* `pip install pycaret[full]` installs all the optional dependencies, whether or not you end up needing them

I usually go with the full install option.

In [None]:
pip install pycaret[full]

Once PyCaret is installed, you just need to import the module for the ML task of your choice and set up the environment.

You may ask why I picked regression, since the data could also have been used to classify the Genre. That is a perfectly fine thought; however, for demonstration purposes I picked regression as the problem statement. At this stage I will not be initiating an ML routine, as my interest lies in a quick EDA.

Hence, I used the regression module.

After importing the module with `from pycaret.regression import *`, you set up the basic environment for further analysis by defining:
* `data`: the name of the DataFrame that you created with the pandas package
* `target`: the target variable. My interest lies in the price, so I picked `Price`; this will also help me later understand its relationship with the other variables
* `profile`: setting this to `True` activates the data profiling routine provided by the Pandas Profiling package
* `session_id`: purely optional; it fixes the random seed so that results are reproducible
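To make the role of the optional session id more concrete, here is a minimal sketch in plain Python of a seeded train/hold-out split. It is an illustration only, not PyCaret's actual implementation: the helper name `seeded_split`, the 70/30 ratio and the toy rows are all assumptions.

```python
import random


def seeded_split(rows, train_fraction=0.7, seed=123):
    """Shuffle rows with a fixed seed and split them into train and hold-out parts."""
    rng = random.Random(seed)   # private generator, global random state is untouched
    shuffled = list(rows)       # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))          # stand-in for the rows of a dataset
train_a, holdout_a = seeded_split(rows, seed=123)
train_b, holdout_b = seeded_split(rows, seed=123)

# The same seed reproduces the same split on every run.
print(train_a == train_b and holdout_a == holdout_b)
```

Using a fixed seed like this is what lets you rerun a notebook and compare results on the same split.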

# **Setting up the environment and activating the Pandas Profiler**

In [None]:
from pycaret.regression import *
exp_reg101 = setup(data=d1, target='Price', profile=True, session_id=123)

# Conclusion

That's about it. This is how you can use PyCaret and Pandas Profiling to perform a quick EDA. During the setup process you can also explicitly define the data types if you want to; however, that functionality is out of the scope of this tutorial.

If you liked this quick analysis, don't forget to like and share it.

# **Let's take this to the next level!**

We will perform a regression ML routine in which we try to predict the price of a book from the available variables. We will also see if we can fine-tune the model, possibly by blending, ensembling, etc. We will perform all these activities in the easiest-to-understand way by utilizing the PyCaret Python library.
So let's get started!

During the EDA process we already set up the environment, so the next step is to build the model. PyCaret lets you train and compare multiple models with a single line of code and sort them by a metric of your choice; by default it sorts the models by the R2 metric.

In [None]:
my_mod = compare_models(sort = "RMSLE")
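For readers unfamiliar with the RMSLE metric used for sorting, here is a minimal sketch of how it is commonly computed, using only the standard library. The prices below are made up for illustration and do not come from the dataset.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared logarithmic error for non-negative values."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2   # log1p(x) = log(1 + x), safe at x = 0
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


actual_prices = [8.0, 11.0, 15.0, 6.0]      # made-up actual prices
predicted_prices = [9.0, 10.0, 18.0, 6.0]   # made-up predictions

print(round(rmsle(actual_prices, predicted_prices), 4))
```

Because the error is measured on a log scale, RMSLE penalizes relative rather than absolute differences, and lower values are better.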

Based on my metric of choice, PyCaret has highlighted the two best-performing models for me: the Extra Trees Regressor and the Bayesian Ridge model. The next step is to create these two models and see if we can further improve their performance.

# **Creating & Tuning Models**

In [None]:
et1 = create_model('et')

In [None]:
tuned_et1 = tune_model(et1,optimize = 'RMSLE',fold = 5)

In [None]:
br1 = create_model('br')

In [None]:
tuned_br1 = tune_model(br1,optimize = 'RMSLE',fold = 5)
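The `fold = 5` argument above asks for 5-fold cross-validation. As a rough sketch of the idea (not PyCaret's internal code), the hypothetical helper `k_fold_indices` below splits row indices into five validation folds, each paired with the remaining rows as training data.

```python
def k_fold_indices(n_rows, n_folds=5):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // n_folds] * n_folds
    for i in range(n_rows % n_folds):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_rows))
        yield train, validation
        start = stop


folds = list(k_fold_indices(12, n_folds=5))
for train, validation in folds:
    print(len(train), validation)
```

Every row is used for validation exactly once, which gives a more stable performance estimate than a single split.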

Bayesian Ridge looks better than the Extra Trees Regressor model, so I will move ahead with the BR model. One last step before model finalization: the plot_model() function can be used to analyze performance across different aspects, such as the Residuals Plot, Prediction Error, Feature Importance, etc.

In [None]:
plot_model(tuned_br1)

In [None]:
plot_model(tuned_br1, plot = 'error')

In [None]:

plot_model(tuned_br1, plot='feature')
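A residuals plot is built from the differences between actual and predicted values. The sketch below computes them by hand on made-up numbers; the sign convention (actual minus predicted) is an assumption, and some plotting tools use the opposite sign.

```python
actual = [8.0, 11.0, 15.0, 6.0]       # made-up actual prices
predicted = [9.0, 10.0, 18.0, 6.0]    # made-up model predictions

# Residual = actual - predicted (assumed convention for this sketch).
residuals = [a - p for a, p in zip(actual, predicted)]
mean_residual = sum(residuals) / len(residuals)

print(residuals)
print(mean_residual)
```

Residuals scattered evenly around zero suggest the model is not systematically over- or under-predicting.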

I can also run the whole evaluation in one go by calling the evaluate_model command, which is quite handy when you want to see all the information in one place.

In [None]:
evaluate_model(tuned_br1)

Now let's see how this model predicts on the hold-out data set.

In [None]:

predict_model(tuned_br1);

Better than a coin toss; for demonstration purposes we will go with this one. However, in real life you may consider ensembling or blending options to come up with a better-performing model.
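One simple form of blending is to average the predictions of several models row by row. The sketch below illustrates that idea on made-up numbers; the helper `blend_predictions` is hypothetical and is not how PyCaret implements its own blending.

```python
def blend_predictions(*model_predictions):
    """Average the predictions of several models row by row."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


et_predictions = [10.0, 14.0, 7.0]   # made-up Extra Trees predictions
br_predictions = [12.0, 12.0, 9.0]   # made-up Bayesian Ridge predictions

blended = blend_predictions(et_predictions, br_predictions)
print(blended)
```

Averaging tends to help when the individual models make different kinds of errors.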

# **Finalizing the Model**

In [None]:
final_bayesianridge = finalize_model(tuned_br1)

In [None]:
print(final_bayesianridge)

In [None]:
predict_model(final_bayesianridge)

# **Saving the Finalized Model**

In [None]:
save_model(final_bayesianridge,'Final BR1 Model 30Jun2021')

In [None]:
saved_final_bayesianridge = load_model('Final BR1 Model 30Jun2021')
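The general save-and-reload idea can be sketched with the standard `pickle` module. This is only an illustration of the round trip, not PyCaret's implementation; the dictionary below is a hypothetical stand-in for a fitted model.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: just its learned parameters.
fitted_model = {"name": "toy_mean_model", "mean_price": 10.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toy_model.pkl")

    # Save the object to disk ...
    with open(path, "wb") as f:
        pickle.dump(fitted_model, f)

    # ... and load it back, as you would in a later session.
    with open(path, "rb") as f:
        restored_model = pickle.load(f)

print(restored_model == fitted_model)
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.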

In [None]:
prediction_new = predict_model(saved_final_bayesianridge, data=d1)

In [None]:
prediction_new.head()

# **Final Conclusion**

This tutorial has covered the entire machine learning pipeline, from data ingestion and pre-processing through model training, hyperparameter tuning, and prediction to saving the model for later use. We have completed all of these steps in fewer than 10 commands, which are naturally constructed and very intuitive to remember.

A final word of caution: this is just a beginner-level introduction to the ML workflow. We can build more intricate and sophisticated ML flows using the various other options available within this awesome tool. Till we meet next time, happy data munging!