30 Nov 2023

# 👋 PyCaret Binary Classification Tutorial

PyCaret is an open-source Python library that automates machine learning workflows. It is an end-to-end machine learning and model management tool that shortens the experiment cycle and makes you more productive.

PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more.

### Index:

This notebook covers:
- Installation.
- Classification Module.
    - Setup.
    - Compare Models.
    - Analyze Model.
    - Save Model.

## Installation


In [2]:
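# install the core package
!pip install pycaret
# install the optional extras: `full` pulls in every optional dependency,
# the remaining lines install individual groups of them
!pip install pycaret[full]
!pip install pycaret[analysis]
!pip install pycaret[models]
!pip install pycaret[tuner]
!pip install pycaret[mlops]
!pip install pycaret[parallel]
!pip install pycaret[test]

# check installed version (printed explicitly, since a cell only displays its last expression)
import pycaret
print(pycaret.__version__)

import pandas as pd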

Collecting pycaret
  Downloading pycaret-3.2.0-py3-none-any.whl (484 kB)
Collecting category-encoders>=2.4.0 (from pycaret)
  Downloading category_encoders-2.6.3-py2.py3-none-any.whl (81 kB)
Collecting deprecation>=2.1.0 (from pycaret)
  Downloading deprecation-2.1.0-py2.py3-none-any.whl (11 kB)
Collecting kaleido>=0.2.1 (from pycaret)
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
Collecting matplotlib<=3.6,>=3.3.0 (from pycaret)
  Downloading matplotlib-3.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.8 MB)

Collecting Flask==2.2.3 (from pycaret[full])
  Downloading Flask-2.2.3-py3-none-any.whl (101 kB)
Collecting Werkzeug<3.0,>=2.2 (from pycaret[full])
  Downloading werkzeug-2.3.8-py3-none-any.whl (242 kB)
Collecting boto3>=1.24.56 (from pycaret[full])
  Downloading boto3-1.33.6-py3-none-any.whl (139 kB)
Collecting evidently<0.3,>=0.1.45.dev0 (from pycaret[full])
  Downloading evidently-0.2.8-py3-none-any.whl (12.1 MB)
Collecting explainerdashboard>=0.3.8 (from pycaret[full])
  Downloading e
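
The installed version can also be checked without importing the package itself. The following is a minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name chosen for illustration, not part of PyCaret.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages this notebook imports
for name in ["pycaret", "pandas"]:
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Returning `None` instead of raising keeps the check usable as a guard before the heavier `import pycaret`.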



## Classification Module

PyCaret’s Classification Module is a supervised machine learning module that is used for classifying elements into groups.

It provides several pre-processing features that prepare the data for modeling through the setup function. It has over 18 ready-to-use algorithms and several plots to analyze the performance of trained models.

In [4]:
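# Load the dataset (the CSV file must exist in the working directory)
dataset = pd.read_csv('casos_lectors_dummies.csv')
dataset.head()

# Drop the recommended-book, score, and user-id columns
data = dataset.drop(['llibre_recomanat', 'score', 'id_usuari'], axis=1)
data.head()

# Reload `dataset` from a second file; this overwrites the frame loaded above
dataset = pd.read_csv('dataset.csv')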

FileNotFoundError: ignored

## Setup
This function initializes the training environment and creates the transformation pipeline. The `setup` function must be called before executing any other function in PyCaret. It has only two required parameters, `data` and `target`; all other parameters are optional.

In [None]:
# import pycaret classification and init setup
from pycaret.classification import *

s = setup(dataset, target='Cluster', session_id=123, preprocess=False)

Once setup has executed successfully, it displays an information grid containing experiment-level information.

- **Session id:** A pseudo-random number distributed as a seed in all functions for later reproducibility. If no `session_id` is passed, a random number is automatically generated that is distributed to all functions.

- **Target type:** Binary, Multiclass, or Regression. The Target type is automatically detected.

- **Label Encoding:** When the Target variable is of type string (i.e. 'Yes' or 'No') instead of 1 or 0, it automatically encodes the label into 1 and 0 and displays the mapping (0 : No, 1 : Yes) for reference. In this tutorial, no label encoding is required since the target variable is of numeric type.

- **Original data shape:** Shape of the original data prior to any transformations.

- **Transformed train set shape:** Shape of the transformed train set.

- **Transformed test set shape:** Shape of the transformed test set.

- **Numeric features:** The number of features considered as numerical.

- **Categorical features:** The number of features considered as categorical.
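
Two of the items above, label encoding and the seeded train/test split, can be illustrated with a small pure-Python sketch. This is not PyCaret's internal code: `encode_labels` and `seeded_split` are hypothetical helpers, the 70/30 split fraction is an assumption made for the example, and the split here is a plain shuffle with no stratification.

```python
import random


def encode_labels(labels):
    """Map each distinct label to an integer, in sorted order of the labels."""
    classes = sorted(set(labels))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


def seeded_split(rows, train_size=0.7, seed=123):
    """Shuffle row indices with a fixed seed, then cut into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(rows) * train_size)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Toy string target, standing in for a 'Yes'/'No' column
target = ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No", "Yes", "No"]
encoded, mapping = encode_labels(target)
print(mapping)  # {'No': 0, 'Yes': 1}

# The same seed yields the same split on every call
train_a, test_a = seeded_split(encoded, seed=123)
train_b, test_b = seeded_split(encoded, seed=123)
print(len(train_a), len(test_a))  # 7 3
print(train_a == train_b)         # True
```

Fixing the seed is what makes the reported train and test shapes, and every score computed from them, repeatable between runs.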

In [None]:
# import ClassificationExperiment and init the class
from pycaret.classification import ClassificationExperiment
exp = ClassificationExperiment()

# check the type of exp
type(exp)

# init setup on exp
exp.setup(dataset, target='Cluster', session_id=123, preprocess=False)

## Compare Models

This function trains and evaluates the performance of all the estimators available in the model library using cross-validation. The output of this function is a scoring grid with average cross-validated scores. Metrics evaluated during CV can be accessed using the `get_metrics` function. Custom metrics can be added or removed using the `add_metric` and `remove_metric` functions.
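
To make "average cross-validated scores" concrete, here is a minimal pure-Python sketch of k-fold evaluation. It assumes a toy majority-class predictor and contiguous, unshuffled folds; all function names are hypothetical, and PyCaret's own fold strategy is whatever was configured in `setup`.

```python
from statistics import mean


def k_fold_indices(n_rows, k):
    """Split range(n_rows) into k contiguous folds of near-equal size."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_class(labels):
    """Toy 'model': always predict the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validated_accuracy(labels, k=5):
    """Return the per-fold accuracies and their mean for the toy model."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        test = [labels[i] for i in test_idx]
        prediction = majority_class(train)
        scores.append(sum(y == prediction for y in test) / len(test))
    return scores, mean(scores)


labels = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
fold_scores, average = cross_validated_accuracy(labels, k=5)
print(fold_scores)  # [1.0, 0.5, 0.5, 0.5, 1.0]
print(average)      # 0.7
```

Each row of the scoring grid is this kind of mean, computed once per estimator and per metric.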

In [None]:
# compare baseline models
best = compare_models(include = ['dt', 'catboost'])
best

In [None]:
# compare models using OOP
exp.compare_models(include = ['dt', 'catboost'])

Notice that the output of the functional and OOP APIs is consistent. The rest of the functions in this notebook are shown using the functional API only.

## Analyze Model

In [None]:
# check docstring to see available plots
# help(plot_model)

In [None]:
evaluate_model(best)

## Save Model

To save the entire pipeline to disk for later use, call PyCaret's `save_model` function.

In [None]:
# save pipeline (written to 'DecisionTree_Pipeline.pkl')
save_model(best, 'DecisionTree_Pipeline')

import pickle

# `best` is the trained model
# `fitxer` is the path of the file where the model will be saved;
# note that this stores only the estimator, not the preprocessing pipeline
fitxer = '/content/DecisionTree_Pipeline'

with open(fitxer, 'wb') as archivo:
    pickle.dump(best, archivo)
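
The manual `pickle` step above can be checked with a round trip: dump an object, load it back, and compare. The sketch below assumes a plain dictionary as a stand-in for the trained model and writes to a temporary directory; the file name is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model; any picklable Python object behaves the same way
model_stub = {"name": "decision_tree", "max_depth": 3}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "model_stub.pkl"  # hypothetical file name

    # Write the object to disk in binary mode
    with open(path, "wb") as handle:
        pickle.dump(model_stub, handle)

    # Read it back, also in binary mode
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == model_stub)  # True
```

The restored object is an equal copy rather than the same object, which is what makes a saved model usable in a fresh session.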

In [None]:
# load pipeline
loaded_best_pipeline = load_model('DecisionTree_Pipeline')
loaded_best_pipeline

In [None]:
from pycaret.classification import plot_model
plot_model(loaded_best_pipeline, plot = 'tree')  # visualize the decision tree
plot_model(loaded_best_pipeline, plot = 'feature')  # plot feature importance

---