# **Multiclass Classification Tutorial (Random Forest)**
![](https://skappal7.files.wordpress.com/2021/07/cropped-cropped-cropped-9datadojo-1.png)

This case study addresses the need to develop a **Go To Market Strategy** from data variables that reflect the potential performance of an organization. The expected outcome is a model that predicts the organization's current maturity/performance level. Based on that prediction, the organization and its stakeholders can design a performance improvement plan.

For this analysis we will use PyCaret, a low-code machine learning and data analytics library. The first step is to install it. Installation is easy and takes only a few minutes. Follow the instructions below:

In [None]:
pip install pycaret
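
PyCaret's API has changed between major releases, so it can help to record which version the install step produced. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration and is not part of PyCaret.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the PyCaret version if the install succeeded, otherwise None
print(installed_version("pycaret"))
```

The calls used later in this notebook (`enable_colab`, and the `Label` and `Score` prediction columns) appear to come from the 2.x line of PyCaret, so a newer release may behave differently.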

Because we are running this code in Google Colab, we need to enable Colab mode so that all charts and diagnostics are displayed throughout the analysis.

In [None]:
from pycaret.utils import enable_colab 
enable_colab()

Colab mode enabled.
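
If the same notebook may also run outside Colab, the call can be guarded. This is a minimal sketch, assuming that Colab preloads the `google.colab` module at start-up; the helper name `running_in_colab` is hypothetical, and the `try`/`except` covers PyCaret versions where `enable_colab` is not available.

```python
import sys


def running_in_colab():
    """Return True when the google.colab module is already loaded in this session."""
    return "google.colab" in sys.modules


if running_in_colab():
    try:
        # Assumption: this helper exists in the installed PyCaret version
        from pycaret.utils import enable_colab
        enable_colab()
    except ImportError:
        print("enable_colab is not available in this PyCaret version")
else:
    print("Not running in Colab; skipping enable_colab()")
```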


In [None]:
import pandas as pd
import numpy as np

In [None]:
data = pd.read_csv('/content/Analytics Maturity Data.csv')

In [None]:
data.head()

Unnamed: 0,AHT,NTT,Sentiment,Complaints,Repeats,Level,Mat_L
0,332.9,109.3,1.2,0.01,0.02,Mid_Maturity,3
1,350.9,138.4,1.2,0.04,0.02,Mid_Maturity,3
2,358.6,125.0,1.3,0.03,0.02,Mid_Maturity,3
3,409.1,174.1,1.1,0.05,0.02,Mid_Maturity,3
4,389.1,156.3,1.1,0.04,0.03,Mid_Maturity,3
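
In the rows shown, `Mat_L` looks like a numeric encoding of `Level`. If that holds for the whole file, leaving it among the features would leak the target, which is presumably why it is excluded in the `setup` call later on. The sketch below checks whether two columns are one-to-one recodings of each other, using plain Python and only the five rows displayed above; the function name `is_recoding` is made up for illustration.

```python
def is_recoding(pairs):
    """Return True if every first value maps to exactly one second value and vice versa."""
    forward, backward = {}, {}
    for a, b in pairs:
        # setdefault stores the first partner seen and returns it on later visits
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


# (Level, Mat_L) pairs copied from the data.head() output above
head_rows = [("Mid_Maturity", 3)] * 5

print(is_recoding(head_rows))
```

Five rows from a single class are not real evidence. On the full data, something like `data.groupby('Level')['Mat_L'].nunique()` would give a more convincing answer.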


In [None]:
from pycaret.classification import *

In [None]:
s = setup(data, target = 'Level', fix_imbalance = True, ignore_features = ['Mat_L'], profile=True, log_experiment = True, experiment_name = 'GTM', session_id = 112)

Summarize dataset:   0%|          | 0/20 [00:00<?, ?it/s]

Generate report structure:   0%|          | 0/1 [00:00<?, ?it/s]

Render HTML:   0%|          | 0/1 [00:00<?, ?it/s]
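
The `fix_imbalance = True` argument asks PyCaret to resample the training data so that minority classes are better represented (to our knowledge, SMOTE is the default in the 2.x releases). Whether that is needed depends on the class distribution. The sketch below shows the kind of check we have in mind; the label counts are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of rows belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Hypothetical target column: 6 Mid, 3 Low, 1 Nascent (made-up counts)
labels = ["Mid_Maturity"] * 6 + ["Low_Maturity"] * 3 + ["Nascent_Level"] * 1

for label, share in class_shares(labels).items():
    print(f"{label}: {share:.0%}")
```

On the real DataFrame, `data['Level'].value_counts(normalize=True)` gives the same information in one line.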



In [None]:
best = compare_models(sort = 'AUC', n_select = 5)
compare_model_results = pull()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
lr,Logistic Regression,0.9853,0.9999,0.9599,0.9884,0.9853,0.9746,0.9754,0.464
et,Extra Trees Classifier,0.9823,0.9978,0.9211,0.9771,0.9789,0.9695,0.9705,0.471
lightgbm,Light Gradient Boosting Machine,0.9793,0.9976,0.9198,0.9766,0.9768,0.9649,0.9663,0.179
rf,Random Forest Classifier,0.9734,0.9975,0.9161,0.9755,0.9733,0.9548,0.9559,0.525
lda,Linear Discriminant Analysis,0.8792,0.9962,0.8999,0.9696,0.9084,0.8167,0.8391,0.023
qda,Quadratic Discriminant Analysis,0.9853,0.9955,0.9174,0.9816,0.9828,0.9746,0.9752,0.023
gbc,Gradient Boosting Classifier,0.9764,0.9926,0.9174,0.977,0.9758,0.9597,0.9607,0.496
dt,Decision Tree Classifier,0.9734,0.9809,0.8923,0.9714,0.9715,0.9546,0.9557,0.025
knn,K Neighbors Classifier,0.9499,0.9772,0.8981,0.9497,0.9477,0.9133,0.9151,0.125
nb,Naive Bayes,0.8055,0.9705,0.8255,0.9284,0.848,0.7102,0.7385,0.024
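
The choice of `sort = 'AUC'` matters because the ranking is not the same for every metric. The sketch below re-ranks the models using the Accuracy and AUC values copied from the table above; the helper `ranking` is hypothetical and is written in plain Python so it runs without the dataset.

```python
# (model id, Accuracy, AUC) copied from the comparison table above
results = [
    ("lr", 0.9853, 0.9999),
    ("et", 0.9823, 0.9978),
    ("lightgbm", 0.9793, 0.9976),
    ("rf", 0.9734, 0.9975),
    ("lda", 0.8792, 0.9962),
    ("qda", 0.9853, 0.9955),
    ("gbc", 0.9764, 0.9926),
    ("dt", 0.9734, 0.9809),
    ("knn", 0.9499, 0.9772),
    ("nb", 0.8055, 0.9705),
]


def ranking(rows, column):
    """Return model ids ordered from best to worst on the given column index."""
    return [row[0] for row in sorted(rows, key=lambda r: r[column], reverse=True)]


by_accuracy = ranking(results, 1)
by_auc = ranking(results, 2)

print("lda rank by AUC:     ", by_auc.index("lda") + 1)
print("lda rank by Accuracy:", by_accuracy.index("lda") + 1)
```

Linear Discriminant Analysis sits mid-table on AUC but near the bottom on accuracy, so the top five selected by `n_select` would differ under another sort key. With the pulled DataFrame, `compare_model_results.sort_values('Accuracy', ascending=False)` gives the same re-ranking.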


In [None]:
len(best)  # number of models returned by compare_models (set by n_select)
print(best[:5])

[LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=1000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=112, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False), ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0, class_weight=None,
                     criterion='gini', max_depth=None, max_features='auto',
                     max_leaf_nodes=None, max_samples=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=-1,
                     oob_score=False, random_state=112, verbose=0,
                     warm_start=False), LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
               importance_type='split', learning_rate=0.1, max_d

In [None]:
rf = create_model('rf')

Unnamed: 0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.9706,1.0,0.9875,0.9853,0.9751,0.9498,0.9515
1,0.9706,1.0,0.9875,0.9853,0.9751,0.9498,0.9515
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0
4,0.9706,0.998,0.9868,0.9733,0.9709,0.9505,0.952
5,1.0,1.0,1.0,1.0,1.0,1.0,1.0
6,0.9706,0.9982,0.75,0.9439,0.9566,0.9492,0.9507
7,0.9118,0.991,0.7118,0.9412,0.9257,0.8515,0.8528
8,0.9706,1.0,0.9868,0.9853,0.9751,0.9511,0.9528
9,0.9697,0.9878,0.75,0.9409,0.9549,0.9461,0.9481
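
The table above lists the ten cross-validation folds individually. A quick way to summarize them is to average a metric across folds, sketched below with the standard library and the Accuracy column copied from the table.

```python
from statistics import mean, pstdev

# Accuracy per fold, copied from the create_model('rf') output above
fold_accuracy = [
    0.9706, 0.9706, 1.0, 1.0, 0.9706,
    1.0, 0.9706, 0.9118, 0.9706, 0.9697,
]

mean_accuracy = mean(fold_accuracy)
spread = pstdev(fold_accuracy)  # population standard deviation across folds

print(f"mean accuracy: {mean_accuracy:.4f}")
print(f"spread:        {spread:.4f}")
```

The mean comes out at roughly 0.973, in line with the 0.9734 reported for the Random Forest Classifier in the model comparison table, which suggests that table shows fold averages. Fold 7 (accuracy 0.9118) is the main source of the spread.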


In [None]:
evaluate_model(rf)

interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

In [None]:
predict_model(rf);

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Random Forest Classifier,0.9863,0.9997,0.9442,0.9864,0.9863,0.9765,0.9766
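
Recall on the hold-out set (0.9442) is noticeably lower than accuracy (0.9863). One plausible explanation, assuming PyCaret reports macro-averaged recall for multiclass problems, is that a small class is predicted less well than the large ones: macro averaging gives every class equal weight regardless of size. The sketch below illustrates the effect on invented labels; none of these values come from the dataset.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall (each class counts equally)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Made-up example: 8 rows of a large class (all correct),
# 2 rows of a small class (one correct, one missed)
y_true = ["Mid_Maturity"] * 8 + ["Nascent_Level"] * 2
y_pred = ["Mid_Maturity"] * 8 + ["Nascent_Level", "Mid_Maturity"]

print("accuracy:    ", accuracy(y_true, y_pred))
print("macro recall:", macro_recall(y_true, y_pred))
```

A single missed row in the small class barely moves accuracy but halves that class's recall, which pulls the macro average down.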


In [None]:
final_rf = finalize_model(rf)

In [None]:
print(final_rf)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=-1, oob_score=False, random_state=112, verbose=0,
                       warm_start=False)


In [None]:
unseen = pd.read_csv('/content/Unseen Data.csv')

In [None]:
unseen.head()

Unnamed: 0,AHT,NTT,Sentiment,Complaints,Repeats
0,779,169,8,0.01,0.05
1,103,91,2,0.01,0.01
2,609,205,22,0.01,0.05
3,364,215,-1,0.01,0.02
4,429,93,8,0.01,0.03


In [None]:
unseen_predictions = predict_model(final_rf, data=unseen)
unseen_predictions.head()

Unnamed: 0,AHT,NTT,Sentiment,Complaints,Repeats,Label,Score
0,779,169,8,0.01,0.05,Low_Maturity,0.91
1,103,91,2,0.01,0.01,Nascent_Level,0.78
2,609,205,22,0.01,0.05,Low_Maturity,0.93
3,364,215,-1,0.01,0.02,Mid_Maturity,0.86
4,429,93,8,0.01,0.03,Low_Maturity,0.48
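
The `Score` column, which we take to be the model's probability for the predicted class, varies a lot across these rows. A simple follow-up is to flag predictions below a confidence threshold for manual review. The sketch below does this on the five rows shown above; the threshold of 0.5 and the helper `low_confidence` are assumptions for illustration.

```python
# (row index, Label, Score) copied from the prediction output above
predictions = [
    (0, "Low_Maturity", 0.91),
    (1, "Nascent_Level", 0.78),
    (2, "Low_Maturity", 0.93),
    (3, "Mid_Maturity", 0.86),
    (4, "Low_Maturity", 0.48),
]

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the source


def low_confidence(rows, threshold):
    """Return the row indices whose score falls below the threshold."""
    return [idx for idx, _label, score in rows if score < threshold]


flagged = low_confidence(predictions, THRESHOLD)
print(flagged)
```

With the DataFrame itself, the equivalent filter is `unseen_predictions[unseen_predictions['Score'] < 0.5]`.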


In [None]:
save_model(final_rf,'Final RF Model 23JUL2021')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True,
                                       features_todrop=['Mat_L'], id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[], target='Level',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric...
                  RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                         class_weight=None, criterion='gini',
                                         max_depth=None, max_features='auto',
                                         max_l

In [None]:
saved_final_rf = load_model('Final RF Model 23JUL2021')

Transformation Pipeline and Model Successfully Loaded


In [None]:
new_prediction = predict_model(saved_final_rf, data=unseen)
new_prediction.head()

Unnamed: 0,AHT,NTT,Sentiment,Complaints,Repeats,Label,Score
0,779,169,8,0.01,0.05,Low_Maturity,0.91
1,103,91,2,0.01,0.01,Nascent_Level,0.78
2,609,205,22,0.01,0.05,Low_Maturity,0.93
3,364,215,-1,0.01,0.02,Mid_Maturity,0.86
4,429,93,8,0.01,0.03,Low_Maturity,0.48
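
The rows above match the predictions made before saving, which is the behaviour we want from a save and load round trip. The check can be automated, as sketched below on the `Label` and `Score` values copied from both outputs; the helper `same_predictions` is hypothetical.

```python
import math


def same_predictions(first, second, tol=1e-9):
    """Return True if both runs give the same labels and near-identical scores."""
    if len(first) != len(second):
        return False
    return all(
        label_a == label_b and math.isclose(score_a, score_b, abs_tol=tol)
        for (label_a, score_a), (label_b, score_b) in zip(first, second)
    )


# (Label, Score) from the in-memory model and from the reloaded model
before_save = [
    ("Low_Maturity", 0.91), ("Nascent_Level", 0.78), ("Low_Maturity", 0.93),
    ("Mid_Maturity", 0.86), ("Low_Maturity", 0.48),
]
after_load = [
    ("Low_Maturity", 0.91), ("Nascent_Level", 0.78), ("Low_Maturity", 0.93),
    ("Mid_Maturity", 0.86), ("Low_Maturity", 0.48),
]

print(same_predictions(before_save, after_load))
```

On the full DataFrames, `unseen_predictions.equals(new_prediction)` performs a stricter version of the same comparison.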
