<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# PyCaret - AutoML Classification
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/PyCaret/PyCaret_automl_classification.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/open_in_naas.svg"/></a>

**Tags:** #automl #pandas #snippet #classification #dataframe #visualize #pycaret #operations

**Author:** [Minura Punchihewa](https://www.linkedin.com/in/minurapunchihewa/)

## Input

### Install PyCaret

In [1]:
! pip install pycaret

### Import libraries

In [2]:
import pandas as pd
from pycaret.classification import (setup, compare_models, evaluate_model, predict_model,
                                    finalize_model, save_model, load_model, create_docker)

### Variables

In [3]:
csv_path = "https://raw.githubusercontent.com/MinuraPunchihewa/pycaret-automl/main/data/iris.csv"
target_column = 'variety'

## Model

### Read the CSV from path

In [4]:
df = pd.read_csv(csv_path)

### View a sample of the data

In [5]:
df.head()

|   | sepal.length | sepal.width | petal.length | petal.width | variety |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |


### Set up the dataset

In [6]:
# setup() must be called before any other PyCaret function
# it can configure many transformation steps; by default it performs
# missing-value imputation, one-hot encoding and a train-test split
# press Enter to confirm the inferred data types and continue
grid = setup(data=df, target=target_column)

|   | Description | Value |
|---|---|---|
| 0 | session_id | 7445 |
| 1 | Target | variety |
| 2 | Target Type | Multiclass |
| 3 | Label Encoded | Setosa: 0, Versicolor: 1, Virginica: 2 |
| 4 | Original Data | (150, 5) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 4 |
| 7 | Categorical Features | 0 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |
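
The Label Encoded row shows that `setup()` maps the string target to integer codes, and `setup()` also holds out part of the data for testing. The standard-library sketch below mimics both steps under stated assumptions: classes are encoded in sorted order (which matches the mapping in the table), and the split is a plain shuffled 70/30 cut, assuming PyCaret's default `train_size=0.7`. The helpers `label_encode` and `train_test_split` are hypothetical names for illustration, not PyCaret's API.

```python
import random

def label_encode(labels):
    """Map each distinct class to an integer code, in sorted class order."""
    classes = sorted(set(labels))
    mapping = {cls: code for code, cls in enumerate(classes)}
    return [mapping[label] for label in labels], mapping

def train_test_split(rows, train_size=0.7, seed=0):
    """Shuffle a copy of the rows and cut at train_size (assumed default; no stratification)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]

# 150 labels, 50 per class, like the iris target column
labels = ["Setosa"] * 50 + ["Versicolor"] * 50 + ["Virginica"] * 50
codes, mapping = label_encode(labels)
print(mapping)  # {'Setosa': 0, 'Versicolor': 1, 'Virginica': 2}

train, test = train_test_split(list(zip(range(150), codes)))
print(len(train), len(test))  # 105 45
```

Fixing the seed makes the split reproducible, which is the same role the `session_id` value plays in the setup output.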


### Train and compare all supported models

In [7]:
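# trains every supported estimator with k-fold cross-validation (10 folds by default)
# and returns the best one, ranked by Accuracy by default
best_model = compare_models()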

|   | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| knn | K Neighbors Classifier | 0.9809 | 0.9916 | 0.9806 | 0.9852 | 0.9805 | 0.9712 | 0.9736 | 0.01 |
| nb | Naive Bayes | 0.9727 | 0.9976 | 0.9722 | 0.9791 | 0.9723 | 0.9589 | 0.9622 | 0.007 |
| lr | Logistic Regression | 0.9718 | 0.9976 | 0.9694 | 0.978 | 0.971 | 0.9572 | 0.9608 | 0.337 |
| qda | Quadratic Discriminant Analysis | 0.9718 | 1.0 | 0.9722 | 0.9784 | 0.9714 | 0.9576 | 0.9611 | 0.007 |
| lda | Linear Discriminant Analysis | 0.9718 | 1.0 | 0.9722 | 0.9784 | 0.9714 | 0.9576 | 0.9611 | 0.007 |
| et | Extra Trees Classifier | 0.9718 | 1.0 | 0.9722 | 0.9784 | 0.9714 | 0.9576 | 0.9611 | 0.072 |
| rf | Random Forest Classifier | 0.9627 | 0.9988 | 0.9611 | 0.9736 | 0.96 | 0.9433 | 0.9498 | 0.087 |
| ada | Ada Boost Classifier | 0.9618 | 1.0 | 0.9639 | 0.9714 | 0.9615 | 0.9428 | 0.9475 | 0.036 |
| dt | Decision Tree Classifier | 0.9536 | 0.9652 | 0.95 | 0.9663 | 0.9505 | 0.9294 | 0.937 | 0.007 |
| gbc | Gradient Boosting Classifier | 0.9536 | 0.9609 | 0.9528 | 0.9667 | 0.9509 | 0.9297 | 0.9373 | 0.105 |
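
Each row above is a mean over cross-validation folds, and the table is sorted by that mean. The sketch below shows only the bookkeeping behind such a ranking: the hypothetical `k_fold_indices` helper yields plain, non-stratified contiguous folds (a simplification chosen for brevity), and `rank_models` averages per-fold scores and sorts best-first. The scores in the usage lines are made-up placeholders, not results from this notebook.

```python
import statistics

def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, valid_idx) pairs for contiguous, non-overlapping folds."""
    # spread the remainder so the first (n_samples % n_folds) folds get one extra row
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size

def rank_models(fold_scores):
    """Average each model's per-fold scores and return (name, mean) pairs, best first."""
    means = {name: statistics.mean(scores) for name, scores in fold_scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

# 105 rows, e.g. the training part of a 70/30 split of 150 rows
folds = list(k_fold_indices(105, n_folds=10))
print([len(valid) for _, valid in folds])  # [11, 11, 11, 11, 11, 10, 10, 10, 10, 10]

placeholder_scores = {"model_a": [1.0, 0.5], "model_b": [0.5, 0.25]}
print(rank_models(placeholder_scores))  # [('model_a', 0.75), ('model_b', 0.375)]
```

Every row lands in exactly one validation fold, so each model's mean score is computed on data it was not trained on in that round.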


### Report the best model

In [8]:
print(best_model)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=-1, n_neighbors=5, p=2,
                     weights='uniform')
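
The printed parameters describe a uniform-weight k-nearest-neighbours classifier: `n_neighbors=5`, and `metric='minkowski'` with `p=2`, which is the ordinary Euclidean distance. A minimal pure-Python sketch of that prediction rule follows; it is not scikit-learn's implementation (there is no tree-based neighbour search, and tie-breaking may differ). The training rows are the five Setosa rows from `df.head()` above plus a few illustrative rows, assumed for this example, for the other two classes.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance between two equal-length vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(X_train, y_train, x, n_neighbors=5, p=2):
    """Uniform weights: each of the n_neighbors closest rows casts one vote."""
    order = sorted(range(len(X_train)), key=lambda i: minkowski(X_train[i], x, p))
    votes = Counter(y_train[i] for i in order[:n_neighbors])
    # tie-breaking here is simplistic and may differ from scikit-learn's
    return votes.most_common(1)[0][0]

X_train = [
    # the five Setosa rows shown by df.head()
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (4.6, 3.1, 1.5, 0.2), (5.0, 3.6, 1.4, 0.2),
    # illustrative rows for the other two classes (assumed values)
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
]
y_train = ["Setosa"] * 5 + ["Versicolor"] * 3 + ["Virginica"] * 3

print(knn_predict(X_train, y_train, (5.0, 3.4, 1.5, 0.2)))                 # Setosa
print(knn_predict(X_train, y_train, (6.8, 3.1, 4.8, 1.5), n_neighbors=3))  # Versicolor
```

Because k-NN "training" only stores the rows, its cost moves to prediction time, which fits the very short training time (TT) reported for `knn` in the comparison table.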


### Evaluate the model using a number of different plots

In [9]:
# click the different plot types to explore them
# some plots may not be available for every dataset or model
evaluate_model(best_model)

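
One of the plots `evaluate_model()` offers is a confusion matrix, and the comparison table above reports Accuracy and Kappa. The sketch below computes all three by hand on a small, made-up set of labels: Accuracy is the share of matching labels, and Cohen's kappa rescales it against the agreement expected by chance from the class frequencies. The helper names are hypothetical and are not PyCaret functions.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both in the order of labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """kappa = (p_o - p_e) / (1 - p_e); undefined when chance agreement p_e is 1."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # chance agreement: both sequences independently pick the same class
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# made-up labels for illustration only
labels = ["Setosa", "Versicolor", "Virginica"]
y_true = ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"]
y_pred = ["Setosa", "Setosa", "Versicolor", "Virginica", "Virginica", "Virginica"]

print(confusion_matrix(y_true, y_pred, labels))  # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(round(accuracy(y_true, y_pred), 4), round(cohen_kappa(y_true, y_pred), 4))  # 0.8333 0.75
```

Kappa sits below Accuracy here because a third of the agreement (p_e = 1/3) would be expected even from guessing with the same class frequencies.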

### Make predictions on new data

In [10]:
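# without `data`, predict_model scores the hold-out set created by setup()
holdout_predictions = predict_model(best_model)
# to score unseen data, pass a DataFrame with the same feature columns (the label column is not needed):
# predict_model(best_model, data=new_data)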

### Finalize model

In [11]:
# retrains the model on the entire dataset, including the hold-out set
# the model's hyperparameters are left unchanged
final_model = finalize_model(best_model)

## Output

### Save model as a pickle file

In [12]:
save_model(final_model, 'classification_model')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[], target='variety',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_stra...
                 ('fix_perfect', Remove_100(target='variety')),
                 ('clean_names', Clean_Colum_Names()),
                 ('feature_select', 'passthrough'), ('fix_multi', 'passthrough'),
                 ('dfs', 'passthrough'), ('pca', 'passthrough'),
        

### Load saved model from pickle file

In [13]:
model = load_model('classification_model')

Transformation Pipeline and Model Successfully Loaded
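
The save and load cells above persist the whole transformation pipeline together with the model. The round trip can be sketched with the standard library's `pickle`; the `save`/`load` helpers and the `<name>.pkl` naming are assumptions made for illustration, and a plain dict stands in for the real pipeline object.

```python
import os
import pickle
import tempfile

def save(obj, model_name, directory):
    """Pickle obj to <directory>/<model_name>.pkl (assumed naming) and return the path."""
    path = os.path.join(directory, model_name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

def load(model_name, directory):
    """Read back the object written by save()."""
    path = os.path.join(directory, model_name + ".pkl")
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as tmp:
    # stand-in for the fitted pipeline: a few step names plus one hyperparameter
    pipeline = {"steps": ["dtypes", "imputer", "clean_names"], "n_neighbors": 5}
    path = save(pipeline, "classification_model", tmp)
    restored = load("classification_model", tmp)
    print(os.path.basename(path), restored == pipeline)  # classification_model.pkl True
```

Unpickling can execute arbitrary code, so only load model files from sources you trust.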


### Create Dockerfile for model

In [14]:
# also creates a requirements.txt file for dependencies
create_docker('classification_model')

Writing requirements.txt
Writing Dockerfile
Dockerfile and requirements.txt successfully created.
To build image you have to run --> !docker image build -f "Dockerfile" -t IMAGE_NAME:IMAGE_TAG .
        
