# PyCaret

https://pycaret.org/

There is low code for everything nowadays, so why not for AI? PyCaret is a low-code machine learning library that can help you with:

* Exploratory Data Analysis
* Data Preprocessing
* Model Training
* Model Explainability
* MLOps

So let's give it a test run.

Make sure you set up your virtual environment before running this code. Quick reminder (on Windows):

```Shell
python -m venv venv
./venv/Scripts/activate
```

**Note**

I had some issues installing PyCaret in a Python 3.11 virtual environment. I fixed it by installing Python 3.10 (making sure not to overwrite the default 3.11 installation) and building a virtual environment from that version of Python. You can have two virtual environments in the same folder as long as you give them different names. Something like this:

```Shell
&'C:\Python 3.10\python.exe' -m venv venv_caret
.\venv_caret\Scripts\activate
```
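
If you are not sure which Python your active environment uses, a small check like the one below can tell you before you start installing. It is only a sketch: `MAX_TESTED` and `is_tested_version` are names made up for this example, and the 3.10 limit comes from the note above, not from the PyCaret documentation. Newer PyCaret releases may well support newer Python versions.

```python
import sys

# Assumption: the newest Python version that worked for the author (see the note above).
MAX_TESTED = (3, 10)


def is_tested_version(version=None):
    """Return True if the (major, minor) version is at most MAX_TESTED."""
    major_minor = tuple(version or sys.version_info)[:2]
    return major_minor <= MAX_TESTED


print(sys.version)
print("At or below the version that worked for the author:", is_tested_version())
```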

In [None]:
!pip install pandas numpy
!pip install pycaret

A [quickstart](https://pycaret.gitbook.io/docs/get-started/quickstart) sounds nice. The following is a copy of the code; for the explanations you'll need to visit the website.

In [None]:
# load sample dataset
from pycaret.datasets import get_data
data = get_data('diabetes')

In [None]:
from pycaret.classification import *
s = setup(data, target = 'Class variable', session_id = 123)

In [None]:
# from pycaret.classification import ClassificationExperiment
# s = ClassificationExperiment()
# s.setup(data, target = 'Class variable', session_id = 123)

# --> we'll be working with the functional API, not the OOP API

In [None]:
# functional API
best = compare_models()

In [None]:
print(best)

Do you see all the different metrics that are measured? We'll get deeper into them in some of the later chapters.
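
To take some of the mystery out of columns such as Accuracy, Recall, Prec. and F1, here is a minimal sketch of how those four numbers can be computed by hand for a binary problem. The function name `classification_metrics` and the label lists are made up for illustration; PyCaret calculates its metrics internally and with cross-validation, so this is not how the table above is produced.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels, only to show the calculation.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```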

In [None]:
# functional API
evaluate_model(best)

In [None]:
# functional API
predict_model(best)

In [None]:
# functional API
predictions = predict_model(best, data=data)
predictions.head()


In [None]:
# functional API
save_model(best, 'my_best_pipeline')

In [None]:
# functional API
loaded_model = load_model('my_best_pipeline')
print(loaded_model)

# The actual exercise

There was not much exercise in the part before this. We also didn't complete the Quickstart, but more copy-pasting would not have helped us any further.

What would help us, and the world, much more is to solve heart failure. Or just help predict it. We'll be using a [Kaggle](https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset) dataset.

## Step 1: import dependencies

You need pandas and pycaret. Import them.

In [None]:
# DELETE

import pandas as pd
from pycaret.classification import *

## Step 2: Download and import data

Download the data from the link above and import it as a pandas DataFrame. It's also stored in the `files` folder.

In [None]:
# DELETE

df = pd.read_csv('files/heart.csv')
df.head()

Look at the types. Make a list with all the column names that contain categorical features.

In [None]:
df.dtypes

In [None]:
# DELETE

cat_features = ['sex', 'cp', 'fbs', 'restecg', 'exang', 'thal']
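
If the dtypes alone don't settle it (categorical features are often stored as integers), counting the distinct values per column can point you to likely candidates. The sketch below uses a tiny made-up DataFrame instead of the real file, and the helper `low_cardinality_columns` and its threshold are assumptions for this example. Treat the result as a starting point and check it against the dataset description.

```python
import pandas as pd

# Hypothetical miniature of the heart dataset; the values are made up.
sample = pd.DataFrame({
    "age": [52, 53, 70, 61, 62, 58],
    "sex": [1, 1, 1, 1, 0, 0],
    "cp": [0, 1, 2, 0, 1, 2],
    "chol": [212, 203, 174, 201, 294, 248],
    "target": [0, 0, 0, 1, 1, 1],
})


def low_cardinality_columns(frame, max_unique, exclude=()):
    """Return the columns with at most max_unique distinct values."""
    counts = frame.nunique()
    return [col for col in frame.columns
            if col not in exclude and counts[col] <= max_unique]


# The target is what we predict, so it is left out of the feature list.
candidates = low_cardinality_columns(sample, max_unique=3, exclude=("target",))
print(candidates)
```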

## Step 3: Train and evaluate model

Set up an experiment first. Make sure to pass the list of categorical features.

In [None]:
# DELETE

experiment = setup(df, target='target', categorical_features=cat_features)


Now that the experiment is set up, we can use it to compare the different models. Save the result in a variable!

In [None]:
# DELETE

best_model = compare_models()

## Step 4: Test model

Now that you have tested a lot of models, test the best model. Use only the bottom five lines of the data to test on.

In [None]:
# DELETE

# Without the target column, which is how you would normally use a model:
# predict_model(best_model, data=df.drop('target', axis=1).tail())
predict_model(best_model, data=df.tail())
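
Because the target column is still in there, you can compare the predictions with the real answers. The sketch below does that on a made-up stand-in for the returned DataFrame. It assumes the prediction columns are called `prediction_label` and `prediction_score`, as in recent PyCaret versions (older versions used different names), so check the output of the cell above first.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame that predict_model returns;
# the values are made up for illustration only.
predictions = pd.DataFrame({
    "target": [1, 0, 1, 0, 0],
    "prediction_label": [1, 0, 0, 0, 0],
    "prediction_score": [0.91, 0.88, 0.56, 0.97, 0.73],
})

# Mark each row as correct or not, then count the hits.
predictions["correct"] = predictions["target"] == predictions["prediction_label"]
n_correct = int(predictions["correct"].sum())
print(f"{n_correct} of {len(predictions)} rows predicted correctly")
```

Keep in mind that the bottom five rows may have been part of the training data, since `setup` splits the data randomly. A good score here is therefore on the optimistic side.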

## Step 5: Save the model

Save it in a pickle file.

In [None]:
# DELETE

save_model(best_model, 'files/heart_model')

And you may feel bad for your teacher having to look all this up, but [don't](https://youtu.be/sL-4rWuEiVw?si=wr5YAFCrg1LlSkcP).