## Model Evaluation  

### Introduction
Our project requires a model that can count how many cars are on a road at any given time.  
We identified YOLO as the best fit for this task and narrowed our options down to two specific models: YOLO V4 and YOLOX + SAHI.  
This notebook evaluates how these two models perform.  
We also wanted to tune the threshold parameter, so we evaluated each model with the following threshold values: [0.2, 0.4, 0.6, 0.8]  
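To make the role of the threshold concrete, here is a minimal sketch. It assumes the threshold is a minimum confidence score that a detection must reach before it is counted, which the text above does not state explicitly. The `Detection` type, the `count_cars` helper and the example scores are hypothetical and are not taken from either model's API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record: a class label and a confidence score."""
    label: str
    score: float


def count_cars(detections, threshold):
    """Count detections labelled 'car' whose score is at least `threshold`."""
    return sum(1 for d in detections if d.label == "car" and d.score >= threshold)


# Made-up detections for a single image (illustration only).
detections = [
    Detection("car", 0.91),
    Detection("car", 0.55),
    Detection("car", 0.25),
    Detection("truck", 0.80),
]

# Count the cars at each threshold used in this evaluation.
counts = {t: count_cars(detections, t) for t in [0.2, 0.4, 0.6, 0.8]}
print(counts)
```

Under this assumption, raising the threshold can only keep the count the same or lower it, so a threshold that is too high risks undercounting.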

### Data Labelling  
To evaluate the models, we first needed labelled ground-truth data.  
We therefore chose 100 random images of roads in Singapore and manually counted the cars in each image.  

### Metrics
After labelling the data, we decided to evaluate the models on two metrics:   
1) Time taken per image  
2) Accuracy (using the Mean Squared Error between the predicted and the labelled car counts)
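
For reference, the usual definition of Mean Squared Error is given below. The notation is an assumption introduced here: $y_i$ is the labelled car count for image $i$, $\hat{y}_i$ is the count predicted by the model, and $n$ is the number of images (100 in this evaluation).

$$
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
$$

Because the errors are squared, a few images with large miscounts weigh more heavily than many images that are off by one.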

### Evaluation Procedure  
We ran each model, at each threshold, on all 100 images and recorded the predicted car count.  
We also recorded how long each model took to process each image.  
All the data was collected and saved in modelEval.csv.
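
The collection step could be sketched roughly as follows. This is not the code that produced modelEval.csv: the `evaluate` function, the `dummy_counter` stand-in and the column ordering are assumptions made for illustration, and a real run would pass functions that wrap the actual detectors.

```python
import time

import pandas as pd


def evaluate(images, actual_counts, models, thresholds):
    """Run every model at every threshold on every image.

    `models` maps a model name to a callable (image, threshold) -> car count.
    Returns one row per image with the predicted count and the elapsed time.
    """
    data = {"Actual Values": list(actual_counts)}
    for name, count_fn in models.items():
        for t in thresholds:
            counts, times = [], []
            for image in images:
                start = time.perf_counter()
                counts.append(count_fn(image, t))
                times.append(time.perf_counter() - start)
            data[f"{name} (Threshold={t})"] = counts
            data[f"Time {name} (Threshold={t})"] = times
    return pd.DataFrame(data)


def dummy_counter(image, threshold):
    """Stand-in for a real detector: 'counts' the characters of a string."""
    return len(image)


results = evaluate(
    images=["ab", "abc"],
    actual_counts=[2, 3],
    models={"Dummy": dummy_counter},
    thresholds=[0.2, 0.4],
)
print(results)
# results.to_csv("modelEval.csv") would then save the table to disk.
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring short durations.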

### Results

In [1]:
import pandas as pd
modelEval = pd.read_csv("modelEval.csv", index_col = 0)
modelEval

| | Actual Values | YOLOXv4 (Threshold=0.2) | YOLOXv4 (Threshold=0.4) | YOLOXv4 (Threshold=0.6) | YOLOXv4 (Threshold=0.8) | YOLOX+SAHI (Threshold=0.2) | YOLOX+SAHI (Threshold=0.4) | YOLOX+SAHI (Threshold=0.6) | YOLOX+SAHI (Threshold=0.8) | Time Yolo4 (Threshold=0.2) | Time Yolo4 (Threshold=0.4) | Time Yolo4 (Threshold=0.6) | Time Yolo4 (Threshold=0.8) | Time YOLOX+SAHI (Threshold=0.2) | Time YOLOX+SAHI (Threshold=0.4) | Time YOLOX+SAHI (Threshold=0.6) | Time YOLOX+SAHI (Threshold=0.8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 13 | 8 | 8 | 6 | 7 | 10 | 11 | 11 | 10 | 4.365235 | 3.937865 | 3.875859 | 3.922535 | 1.287637 | 1.531856 | 1.335031 | 1.842232 |
| 2 | 26 | 23 | 22 | 20 | 21 | 22 | 25 | 22 | 22 | 4.109791 | 3.731660 | 3.750964 | 3.647583 | 1.775996 | 2.193995 | 1.958537 | 1.608863 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3.731220 | 4.190937 | 4.280349 | 3.909865 | 2.021194 | 1.688491 | 1.705768 | 1.511814 |
| 4 | 18 | 13 | 11 | 12 | 12 | 16 | 16 | 15 | 13 | 4.113969 | 4.074654 | 4.052475 | 3.776970 | 1.868822 | 1.801301 | 1.767688 | 1.692448 |
| 5 | 5 | 3 | 0 | 0 | 0 | 3 | 3 | 3 | 2 | 4.076686 | 4.316481 | 4.044921 | 3.827088 | 1.490469 | 1.784822 | 1.979990 | 1.257689 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 96 | 22 | 20 | 15 | 15 | 15 | 20 | 20 | 19 | 22 | 4.131295 | 4.215780 | 3.982743 | 3.670906 | 1.831209 | 1.355306 | 2.132994 | 1.747664 |
| 97 | 14 | 10 | 11 | 10 | 8 | 11 | 12 | 11 | 9 | 4.311541 | 4.071036 | 3.991782 | 3.877842 | 2.189771 | 1.213278 | 1.926241 | 1.760868 |
| 98 | 22 | 19 | 16 | 20 | 17 | 20 | 19 | 18 | 18 | 3.994668 | 4.076193 | 4.207943 | 4.673704 | 1.781043 | 2.104058 | 1.840192 | 1.728157 |
| 99 | 9 | 4 | 3 | 2 | 2 | 7 | 6 | 7 | 7 | 4.009657 | 4.060544 | 3.792525 | 3.800953 | 1.896519 | 1.558372 | 1.329246 | 2.177950 |


In [2]:
modelEval.columns

Index(['Actual Values', 'YOLOXv4 (Threshold=0.2)', 'YOLOXv4 (Threshold=0.4)',
       'YOLOXv4 (Threshold=0.6)', 'YOLOXv4 (Threshold=0.8)',
       'YOLOX+SAHI (Threshold=0.2)', 'YOLOX+SAHI (Threshold=0.4)',
       'YOLOX+SAHI (Threshold=0.6)', 'YOLOX+SAHI (Threshold=0.8)',
       'Time Yolo4 (Threshold=0.2)', 'Time Yolo4 (Threshold=0.4)',
       'Time Yolo4 (Threshold=0.6)', 'Time Yolo4 (Threshold=0.8)',
       'Time YOLOX+SAHI (Threshold=0.2)', 'Time YOLOX+SAHI (Threshold=0.4)',
       'Time YOLOX+SAHI (Threshold=0.6)', 'Time YOLOX+SAHI (Threshold=0.8)'],
      dtype='object')

### Time Evaluation  

Average time taken per image by each model at each threshold:

In [3]:
# Select every timing column and average it over all images
time_columns = [col for col in modelEval.columns if col.startswith("Time")]
modelEval[time_columns].mean().to_frame("Average Time Taken")

| | Average Time Taken |
|---|---|
| Time Yolo4 (Threshold=0.2) | 3.981402 |
| Time Yolo4 (Threshold=0.4) | 4.031642 |
| Time Yolo4 (Threshold=0.6) | 4.018719 |
| Time Yolo4 (Threshold=0.8) | 3.993737 |
| Time YOLOX+SAHI (Threshold=0.2) | 1.76023 |
| Time YOLOX+SAHI (Threshold=0.4) | 1.769867 |
| Time YOLOX+SAHI (Threshold=0.6) | 1.793881 |
| Time YOLOX+SAHI (Threshold=0.8) | 1.74557 |


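Therefore, YOLOX + SAHI is roughly 2.3 times faster on average than YOLO V4 (about 1.77 versus about 4.01 per image), and the threshold has little effect on the running time of either model.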
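
The size of the gap can be quantified with a short calculation. The sketch below re-enters the averages from the table above by hand so that it runs on its own; the variable names are chosen here for illustration.

```python
import pandas as pd

# Average time per configuration, copied from the table above.
avg_time = pd.Series({
    "Time Yolo4 (Threshold=0.2)": 3.981402,
    "Time Yolo4 (Threshold=0.4)": 4.031642,
    "Time Yolo4 (Threshold=0.6)": 4.018719,
    "Time Yolo4 (Threshold=0.8)": 3.993737,
    "Time YOLOX+SAHI (Threshold=0.2)": 1.76023,
    "Time YOLOX+SAHI (Threshold=0.4)": 1.769867,
    "Time YOLOX+SAHI (Threshold=0.6)": 1.793881,
    "Time YOLOX+SAHI (Threshold=0.8)": 1.74557,
})

# Split the configurations by model and average over the four thresholds.
is_sahi = avg_time.index.str.contains("SAHI")
yolo4_mean = avg_time[~is_sahi].mean()
sahi_mean = avg_time[is_sahi].mean()

# Ratio of the two means: how many times faster YOLOX + SAHI is.
speedup = yolo4_mean / sahi_mean
print(round(yolo4_mean, 2), round(sahi_mean, 2), round(speedup, 2))
```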

### Accuracy Evaluation

We evaluate the accuracy of both models by calculating the Mean Squared Error (MSE) of each model's counts against the actual values:

In [4]:
def mse(model, column, y):
    # Mean squared error between the predicted counts in `column` and the true counts in `y`
    return ((model[column] - model[y])**2).mean()
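
As a small worked example, the same calculation can be applied to just the first five rows shown in the results table. The `sample` DataFrame below is re-typed from those rows, so the numbers it yields describe only those five images and not the full 100-image result.

```python
import pandas as pd


def mse(model, column, y):
    # Mean squared error between the predicted counts in `column` and the true counts in `y`
    return ((model[column] - model[y]) ** 2).mean()


# First five rows of the results table (threshold 0.2 for both models).
sample = pd.DataFrame({
    "Actual Values": [13, 26, 0, 18, 5],
    "YOLOXv4 (Threshold=0.2)": [8, 23, 0, 13, 3],
    "YOLOX+SAHI (Threshold=0.2)": [10, 22, 0, 16, 3],
})

# YOLOX + SAHI errors: -3, -4, 0, -2, -2 -> squares 9, 16, 0, 4, 4 -> mean 6.6
sahi_mse = mse(sample, "YOLOX+SAHI (Threshold=0.2)", "Actual Values")

# YOLO V4 errors: -5, -3, 0, -5, -2 -> squares 25, 9, 0, 25, 4 -> mean 12.6
yolo4_mse = mse(sample, "YOLOXv4 (Threshold=0.2)", "Actual Values")

print(sahi_mse, yolo4_mse)
```

In these five rows every error is zero or negative, meaning both models undercount.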

In [7]:
columns = ['YOLOXv4 (Threshold=0.2)', 'YOLOXv4 (Threshold=0.4)',
       'YOLOXv4 (Threshold=0.6)', 'YOLOXv4 (Threshold=0.8)',
       'YOLOX+SAHI (Threshold=0.2)', 'YOLOX+SAHI (Threshold=0.4)',
       'YOLOX+SAHI (Threshold=0.6)', 'YOLOX+SAHI (Threshold=0.8)']

# MSE of each model/threshold column against the labelled counts
MSE = [mse(modelEval, column, "Actual Values") for column in columns]

In [8]:
pd.DataFrame({"MSE": MSE}, index = columns)

| | MSE |
|---|---|
| YOLOXv4 (Threshold=0.2) | 8.86 |
| YOLOXv4 (Threshold=0.4) | 19.24 |
| YOLOXv4 (Threshold=0.6) | 23.16 |
| YOLOXv4 (Threshold=0.8) | 25.1 |
| YOLOX+SAHI (Threshold=0.2) | 2.79 |
| YOLOX+SAHI (Threshold=0.4) | 4.27 |
| YOLOX+SAHI (Threshold=0.6) | 8.19 |
| YOLOX+SAHI (Threshold=0.8) | 9.65 |


Therefore, YOLOX + SAHI has a lower MSE than YOLO V4 at every threshold (2.79 to 9.65 versus 8.86 to 25.1).  
For both models the MSE rises as the threshold increases, and the best-performing configuration is YOLOX + SAHI with threshold = 0.2 (MSE = 2.79).

Since YOLOX + SAHI outperforms YOLO V4 on both metrics, we will use YOLOX + SAHI.  
Threshold value chosen = 0.2.
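
The same choice can be read off programmatically. The sketch below re-enters the MSE values from the table above by hand so that it runs on its own; `mse_table` and `best` are names chosen here for illustration.

```python
import pandas as pd

# MSE per configuration, copied from the table above.
mse_table = pd.Series({
    "YOLOXv4 (Threshold=0.2)": 8.86,
    "YOLOXv4 (Threshold=0.4)": 19.24,
    "YOLOXv4 (Threshold=0.6)": 23.16,
    "YOLOXv4 (Threshold=0.8)": 25.1,
    "YOLOX+SAHI (Threshold=0.2)": 2.79,
    "YOLOX+SAHI (Threshold=0.4)": 4.27,
    "YOLOX+SAHI (Threshold=0.6)": 8.19,
    "YOLOX+SAHI (Threshold=0.8)": 9.65,
}, name="MSE")

# idxmin returns the index label of the smallest value.
best = mse_table.idxmin()
print(best, mse_table[best])
```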