## Model Analysis
Based on our [observation of the dataset](model_selection.ipynb), we decided to use `KNN`, `Gradient Boosting` and `Random Forest` as our classification algorithms. Now we test each model and measure its performance.

In [1]:
# importing the required libraries

import pandas as pd
import numpy as np
import joblib as jb
# train/test split prepared in the model_generation module
from model_generation import X_test, X_train, y_test, y_train
from sklearn.metrics import classification_report
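The `model_generation` module is not shown in this notebook. The supports in the classification reports (5460 training rows, 2340 test rows) are consistent with a 70/30 split of 7800 rows, so the following is a hypothetical sketch of such a split with scikit-learn's `train_test_split`. The stand-in random data, the feature count and the `random_state` are assumptions, not the project's actual values.

```python
# Hypothetical sketch of a 70/30 split; the real model_generation module is not shown.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in data: 7800 rows (5460 + 2340), 5 numeric features, 4 classes labelled 0-3
X = rng.normal(size=(7800, 5))
y = rng.integers(0, 4, size=7800)

# test_size=0.3 keeps 30% of the rows for the held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(len(y_train), len(y_test))  # 5460 2340
```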

In [2]:
# loading the pre-trained models
gb = jb.load('models/GBC')
rfc = jb.load('models/RFC')
knn = jb.load('models/KNN')
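The loaded objects expose `best_score_` and `best_params_`, which suggests they are fitted `GridSearchCV` (or similar search) objects persisted with `joblib.dump`. That is an assumption, since the saving code is not shown. The sketch below fits a small stand-in search on synthetic data, saves it to a temporary file instead of the notebook's `models/` folder, and checks that the reloaded object behaves identically.

```python
import os
import tempfile

import joblib as jb
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Small stand-in search; the real grids used for the project are not shown
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [5, 15]}, cv=3)
search.fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "KNN")  # same file naming as 'models/KNN'
    jb.dump(search, path)            # pickle the fitted search object
    restored = jb.load(path)         # load it back, as the notebook does

print(restored.best_params_ == search.best_params_)      # True
print((restored.predict(X) == search.predict(X)).all())  # True
```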

### K-Nearest Neighbours

In [3]:
# best mean cross-validated accuracy (on the training folds, not the test set)
knn.best_score_

0.4769230769230769
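`best_score_` is not measured on `X_test`: for a scikit-learn search object it is the mean score of the best parameter combination across the cross-validation folds of the training data, which is why it differs from the test accuracy reported further down. The sketch below, on synthetic data with an assumed grid, recomputes it from `cv_results_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=1)
search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [5, 15, 30], "weights": ["uniform", "distance"]},  # assumed grid
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# cv_results_ calls the held-out folds "test", but they are validation folds of the training data
best = search.best_index_
fold_scores = [search.cv_results_[f"split{i}_test_score"][best] for i in range(5)]
print(np.isclose(np.mean(fold_scores), search.best_score_))  # True
```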

In [4]:
# best parameters
knn.best_params_

{'algorithm': 'ball_tree', 'n_neighbors': 30, 'weights': 'uniform'}
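With `weights='uniform'`, each prediction is an unweighted majority vote among the 30 nearest training points; `algorithm='ball_tree'` only changes how those neighbours are searched, not which ones are found. The following is a minimal brute-force sketch of that rule on synthetic data (the helper `knn_predict_uniform` is hypothetical), compared against `KNeighborsClassifier` with the selected parameters.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier


def knn_predict_uniform(X_train, y_train, X_query, k):
    """Brute-force kNN with Euclidean distance and an unweighted majority vote."""
    # Pairwise distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Labels of the k closest training points for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote; argmax returns the smallest label when counts tie
    return np.array([np.bincount(row).argmax() for row in votes])


X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, random_state=0,
)
X_tr, y_tr, X_q = X[:300], y[:300], X[300:]

mine = knn_predict_uniform(X_tr, y_tr, X_q, k=30)
sk = KNeighborsClassifier(n_neighbors=30, weights="uniform", algorithm="ball_tree")
sk_pred = sk.fit(X_tr, y_tr).predict(X_q)
print((mine == sk_pred).mean())  # fraction of queries where both agree
```

Agreement should be complete or nearly so; exact ties in distance at the 30th neighbour could in principle make the two differ on a query.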

In [5]:
# generating a classification report

print("Performance of train data on the model \n",classification_report(y_train, knn.predict(X_train)))
print("Performance of test data on the model \n",classification_report(y_test, knn.predict(X_test)))

Performance of train data on the model 
               precision    recall  f1-score   support

           0       0.48      0.43      0.45      1342
           1       0.41      0.29      0.34      1230
           2       0.49      0.63      0.55      1347
           3       0.63      0.68      0.65      1541

    accuracy                           0.52      5460
   macro avg       0.50      0.51      0.50      5460
weighted avg       0.51      0.52      0.51      5460

Performance of test data on the model 
               precision    recall  f1-score   support

           0       0.39      0.35      0.37       556
           1       0.30      0.19      0.23       576
           2       0.45      0.60      0.51       572
           3       0.61      0.68      0.64       636

    accuracy                           0.46      2340
   macro avg       0.44      0.46      0.44      2340
weighted avg       0.44      0.46      0.45      2340
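Each row of the report is derived from the confusion matrix: precision divides the correct predictions for a class by all predictions of that class, recall divides them by all true members of that class, and F1 is their harmonic mean. The small hand-made label arrays below are illustrative only; the sketch checks the manual computation against `precision_recall_fscore_support`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Illustrative labels for four classes (0-3), not taken from the project data
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 3, 1])
y_pred = np.array([0, 1, 1, 2, 2, 2, 3, 3, 0, 1])

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
tp = np.diag(cm)                       # correctly classified samples per class
precision = tp / cm.sum(axis=0)        # correct / all predicted as that class
recall = tp / cm.sum(axis=1)           # correct / all truly in that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)               # number of true samples per class

print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), support)
```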



### Random Forest

In [6]:
# best mean cross-validated accuracy (on the training folds, not the test set)
rfc.best_score_

0.5293040293040293

In [7]:
# best parameters
rfc.best_params_

{'criterion': 'gini',
 'max_depth': 8,
 'max_features': 'log2',
 'n_estimators': 500}
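`max_features='log2'` limits how many features each tree considers at every split: scikit-learn resolves it to `max(1, int(log2(n_features)))`. The sketch below uses synthetic data with an assumed 10 features (the project's feature count is not shown here) and reads the resolved value from each fitted tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 10 features; only 5 trees to keep it fast
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=5, max_features="log2", max_depth=8, random_state=0)
rf.fit(X, y)

# "log2" becomes max(1, int(log2(n_features))) candidate features per split
expected = max(1, int(np.log2(X.shape[1])))
print(expected)                                    # 3
print([tree.max_features_ for tree in rf.estimators_])  # [3, 3, 3, 3, 3]
```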

In [8]:
# generating a classification report

print("Performance of train data on the model \n",classification_report(y_train, rfc.predict(X_train)))
print("Performance of test data on the model \n",classification_report(y_test, rfc.predict(X_test)))

Performance of train data on the model 
               precision    recall  f1-score   support

           0       0.55      0.61      0.58      1342
           1       0.56      0.41      0.47      1230
           2       0.62      0.63      0.63      1347
           3       0.69      0.77      0.73      1541

    accuracy                           0.62      5460
   macro avg       0.61      0.61      0.60      5460
weighted avg       0.61      0.62      0.61      5460

Performance of test data on the model 
               precision    recall  f1-score   support

           0       0.44      0.49      0.46       556
           1       0.44      0.30      0.36       576
           2       0.57      0.61      0.59       572
           3       0.66      0.74      0.70       636

    accuracy                           0.54      2340
   macro avg       0.53      0.54      0.53      2340
weighted avg       0.53      0.54      0.53      2340
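The two summary rows average the per-class scores differently: `macro avg` gives every class equal weight, while `weighted avg` weights each class by its support. A short sketch with illustrative labels, checked against `f1_score`:

```python
import numpy as np
from sklearn.metrics import f1_score

# Illustrative labels for four classes (0-3), not taken from the project data
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 3, 3, 1, 3])

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per class
support = np.bincount(y_true)                       # true samples per class

macro = per_class.mean()                           # every class counts equally
weighted = np.average(per_class, weights=support)  # classes weighted by support
print(round(macro, 3), round(weighted, 3))
```

Applying the same formulas to the rounded Random Forest test F1 scores gives (0.46 + 0.36 + 0.59 + 0.70) / 4 ≈ 0.528 for the macro average and 1245.8 / 2340 ≈ 0.532 for the weighted average, both matching the 0.53 in the report.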



### Gradient Boosting

In [9]:
# best mean cross-validated accuracy (on the training folds, not the test set)
gb.best_score_

0.530952380952381

In [10]:
# best parameters
gb.best_params_

{'max_depth': 3, 'n_estimators': 100}
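The selected values coincide with scikit-learn's own defaults for `GradientBoostingClassifier` (`max_depth=3`, `n_estimators=100`), so for these two parameters the search did not move away from the library defaults. A quick check, reusing the values printed above:

```python
from sklearn.ensemble import GradientBoostingClassifier

defaults = GradientBoostingClassifier().get_params()
selected = {"max_depth": 3, "n_estimators": 100}  # gb.best_params_ reported above

# True when the grid search settled on the library defaults for both parameters
print(all(defaults[name] == value for name, value in selected.items()))  # True
```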

In [11]:
# generating a classification report

print("Performance of train data on the model \n",classification_report(y_train, gb.predict(X_train)))
print("Performance of test data on the model \n",classification_report(y_test, gb.predict(X_test)))

Performance of train data on the model 
               precision    recall  f1-score   support

           0       0.52      0.53      0.53      1342
           1       0.50      0.43      0.46      1230
           2       0.64      0.61      0.62      1347
           3       0.67      0.77      0.71      1541

    accuracy                           0.59      5460
   macro avg       0.58      0.58      0.58      5460
weighted avg       0.59      0.59      0.59      5460

Performance of test data on the model 
               precision    recall  f1-score   support

           0       0.45      0.45      0.45       556
           1       0.41      0.32      0.36       576
           2       0.57      0.60      0.58       572
           3       0.65      0.76      0.70       636

    accuracy                           0.54      2340
   macro avg       0.52      0.53      0.52      2340
weighted avg       0.53      0.54      0.53      2340



## Conclusion
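By analysing the cross-validation scores and the classification reports, we observe that the `Gradient Boosting` model gives us the **best cross-validated accuracy, 0.53 (53%)**, narrowly ahead of `Random Forest` (0.529). On the held-out test set both models reach an accuracy of 0.54, while `KNN` trails at 0.46.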
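The comparison can be collected into one table. The sketch below only copies numbers already printed in this notebook (best cross-validated scores and the rounded train/test accuracies from the reports); the extra `train_test_gap` column is a derived convenience for spotting overfitting, not something computed in the original analysis.

```python
import pandas as pd

# Values copied from the outputs above
results = pd.DataFrame(
    {
        "cv_best_score": [0.4769230769230769, 0.5293040293040293, 0.530952380952381],
        "train_accuracy": [0.52, 0.62, 0.59],
        "test_accuracy": [0.46, 0.54, 0.54],
    },
    index=["KNN", "Random Forest", "Gradient Boosting"],
)
# Difference between training and test accuracy for each model
results["train_test_gap"] = results["train_accuracy"] - results["test_accuracy"]

print(results.sort_values("cv_best_score", ascending=False))
print(results["cv_best_score"].idxmax())  # Gradient Boosting
```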

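We also note that class `3`, which corresponds to Segmentation class 'D', has the highest precision, recall and F1-score for every model, on both the training and the test data. A likely reason is that 'D' is the most frequent class in the training set (1541 of 5460 samples, about 28%), although it is not a majority.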
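The class shares and a simple majority-class baseline can be computed from the supports in the reports alone. The sketch rebuilds label arrays with exactly those counts and scores scikit-learn's `DummyClassifier(strategy="most_frequent")`, which always predicts class 3; the all-zero feature matrices are placeholders, since the dummy model ignores features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Class supports copied from the classification reports above
train_counts = [1342, 1230, 1347, 1541]
test_counts = [556, 576, 572, 636]
y_train = np.repeat(np.arange(4), train_counts)
y_test = np.repeat(np.arange(4), test_counts)

# Share of each class in the training set
print(np.round(np.bincount(y_train) / len(y_train), 3))  # [0.246 0.225 0.247 0.282]

# Baseline that always predicts the most frequent training class (class 3)
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(y_train), 1)), y_train)
print(round(baseline.score(np.zeros((len(y_test), 1)), y_test), 3))  # 0.272
```

All three models clear this baseline test accuracy of about 0.27, so their advantage on class 3 is not explained by always predicting the largest class.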