# Evaluation

This notebook contains the code used to assess performance on each of our tasks.

## Motion Detection

Here, the performance of our motion detector is assessed.

The module takes a frame as input and outputs a corresponding mask in which background pixels are 0 and foreground pixels are 1.

We evaluate the detector against ground-truth masks annotated specifically for this purpose on frames from the project's public image database.

The metric we use is simply Pixel Accuracy, the ratio of pixels correctly classified over all pixels. A value of 1 thus corresponds to 100% accuracy (higher is better).
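As a minimal illustration of the metric, the sketch below computes Pixel Accuracy on two toy masks. The helper name `pixel_accuracy` and the toy arrays are hypothetical and are not part of the project code.

```python
import numpy as np

def pixel_accuracy(pred, truth):
    """Fraction of pixels where the predicted mask equals the ground truth."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean(pred == truth))

# Toy 2x3 masks (made up): 0 = background, 1 = foreground
pred = np.array([[0, 1, 1],
                 [0, 0, 1]])
truth = np.array([[0, 1, 0],
                  [0, 0, 1]])

# 5 of the 6 pixels agree, so the accuracy is 5/6
print(pixel_accuracy(pred, truth))
```

Taking the mean of the element-wise comparison avoids hard-coding the image resolution.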

In [4]:
import os
import cv2
from motion_detection import MotionDetector
import numpy as np

label_ids = [
	[115, 156, 212, 275, 320, 368, 376, 430, 492, 550, 600, 668, 725, 773, 815, 873, 940, 994, 1055, 1100, 1165, 1205, 1275, 1385, 1494],
	[112, 167, 200, 260, 300, 344, 406, 482, 533, 649, 708, 761, 954, 988, 1120, 1165, 1203, 1313, 1345, 1378, 1401, 1425, 1469, 1499]
]

md = MotionDetector()

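def eval_md_accuracy(scene):
	"""
	Mean pixel accuracy of the motion detector over the annotated frames of a scene.

	params:
		scene: scene id (1 for inside, 2 for outside)
	"""
	accuracy = 0
	for frame_id in label_ids[scene - 1]:
		# Read frame and predict
		img_path = os.path.join('data', 'img_5_{}'.format(scene), 'img_5_{}_{}.jpg'.format(scene, str(frame_id).zfill(4)))
		img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
		if img is None:
			# cv2.imread returns None instead of raising when a file cannot be read
			raise FileNotFoundError(img_path)
		pred = md.detect(img)

		# Read ground truth and rescale it from {0, 255} to {0, 1}
		mask_path = os.path.join('data', 'bb_img_5_{}'.format(scene), 'seg_5_{}_{}.png'.format(scene, str(frame_id).zfill(4)))
		mask_img = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
		if mask_img is None:
			raise FileNotFoundError(mask_path)
		mask_img = mask_img / 255

		# Pixel accuracy: share of pixels where prediction and ground truth agree
		accuracy += np.mean(pred == mask_img)

	return accuracy / len(label_ids[scene - 1])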
	
inside_acc = eval_md_accuracy(1) * 100
outside_acc = eval_md_accuracy(2) * 100
print('Inside accuracy: {}%'.format(inside_acc.round(2)))
print('Outside accuracy: {}%'.format(outside_acc.round(2)))
print('Total accuracy: {}%'.format(((inside_acc + outside_acc) / 2).round(2)))

Using cache found in /Users/sacha/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-12-2 Python-3.9.0 torch-1.13.0 CPU

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
Adding AutoShape... 


Inside accuracy: 86.82%
Outside accuracy: 98.71%
Total accuracy: 92.76%


## Object Detection

Here, the performance of our object detector is assessed.

The model takes as input a frame, and outputs a list of 5-value vectors.
Each vector contains the x1, y1, x2, y2 coordinates of the bounding box and the 5th value is the class (0 = person, 1 = ball).

Following the literature, we use the mAP (mean Average Precision) to evaluate our model. We compute the AP for each of the two classes and then average the two values.
Again, higher is better.
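The notebook does not include the mAP computation itself, so the following is only a sketch of one common way to compute it, not the project's implementation. It assumes several things that are not stated above: each prediction carries a confidence score (AP needs one to rank detections, while the 5-value vectors described here do not include it), a detection counts as correct when its IoU with an unmatched ground-truth box is at least 0.5, and AP is the area under the all-point interpolated precision-recall curve. All names and toy boxes are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, truths, iou_thr=0.5):
    """
    AP for a single class.
    preds:  list of (image_id, score, box); the score is an assumed confidence
    truths: dict mapping image_id to a list of ground-truth boxes
    """
    n_truth = sum(len(boxes) for boxes in truths.values())
    if n_truth == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in truths.items()}
    tp = []
    # Visit predictions from most to least confident
    for img, _, box in sorted(preds, key=lambda p: -p[1]):
        ious = [iou(box, t) for t in truths.get(img, [])]
        best = int(np.argmax(ious)) if ious else -1
        hit = best >= 0 and ious[best] >= iou_thr and not matched[img][best]
        if hit:
            matched[img][best] = True  # each ground-truth box is matched once
        tp.append(1.0 if hit else 0.0)
    cum_tp = np.cumsum(np.array(tp))
    recall = cum_tp / n_truth
    precision = cum_tp / (np.arange(len(tp)) + 1)
    # Pad the curve, take the decreasing precision envelope, integrate over recall
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    return float(np.sum((mrec[1:] - mrec[:-1]) * mpre[1:]))

# Toy data (made up). Class 0 = person: one correct box and one false positive.
person_truths = {"a": [(0, 0, 10, 10)]}
person_preds = [("a", 0.9, (0, 0, 10, 10)), ("a", 0.8, (20, 20, 30, 30))]
# Class 1 = ball: two ground-truth balls, only one of them detected.
ball_truths = {"a": [(50, 50, 60, 60)], "b": [(5, 5, 15, 15)]}
ball_preds = [("a", 0.7, (50, 50, 60, 60))]

ap_person = average_precision(person_preds, person_truths)
ap_ball = average_precision(ball_preds, ball_truths)
mean_ap = (ap_person + ap_ball) / 2
print(ap_person, ap_ball, mean_ap)  # 1.0 0.5 0.75
```

In the toy data the low-confidence false positive does not reduce the person AP because it is ranked after the correct detection, while the missed ball caps the ball AP at the recall actually reached.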

Once again, the reference data is the project's public image database, annotated for this purpose.

## Ball Tracking

Here, the performance of the object tracker is assessed.

At each frame, the tracker outputs the position (x, y) and the depth z of each ball.

The ball center on the x and y axes is taken to be the center of the bounding box. For the accuracy of this estimate, refer to the Object Detection section, which evaluates those bounding boxes.
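For illustration, the center of a bounding box can be derived from a detection vector as sketched below. The helper `box_center` and the sample detection are hypothetical; only the x1, y1, x2, y2, class layout comes from the Object Detection section.

```python
def box_center(detection):
    """Center (x, y) of a detection given as [x1, y1, x2, y2, class]."""
    x1, y1, x2, y2 = detection[:4]
    return ((x1 + x2) / 2, (y1 + y2) / 2)

# Made-up detection of a ball (class 1)
detection = [100, 40, 140, 80, 1]
print(box_center(detection))  # (120.0, 60.0)
```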

***TODO*** For the depth, we record some frames, measure the physical depth, and then average the distance between the predicted and ground-truth depths over all annotated images.
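Since this evaluation is still to be done, the following is only a sketch of how the averaged distance could be computed once measurements exist. It interprets "distance" as the absolute difference between predicted and measured depth. The function name, the depth values, and the unit (meters) are assumptions made for illustration, not recorded results.

```python
import numpy as np

def mean_depth_error(pred_depths, true_depths):
    """Mean absolute difference between predicted and measured depths."""
    pred = np.asarray(pred_depths, dtype=float)
    true = np.asarray(true_depths, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("need one measured depth per prediction")
    return float(np.mean(np.abs(pred - true)))

# Placeholder values in meters (assumed unit), purely illustrative
pred_depths = [2.1, 3.4, 5.0]
true_depths = [2.0, 3.0, 5.5]

# Absolute errors are 0.1, 0.4 and 0.5, so the mean is about 0.33
print(round(mean_depth_error(pred_depths, true_depths), 2))
```

Unlike the two accuracy metrics above, lower is better for this error.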