# Evaluation

This notebook contains the code used to assess performance on each of our tasks.

## Motion detection

Here, the performance of our motion detector is assessed.

The module takes a frame as input and outputs a corresponding mask in which background pixels are 0 and foreground pixels are 1.

We evaluate our model against annotations made specifically for this purpose on the public image database of the project.

The metric we use is Pixel Accuracy: the number of correctly classified pixels divided by the total number of pixels. A value of 1 thus corresponds to 100% accuracy (higher is better).
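As a minimal sketch of the metric (not the evaluation code itself), the snippet below computes Pixel Accuracy on two toy binary masks. The function name `pixel_accuracy` and the mask values are hypothetical and chosen only for illustration.

```python
import numpy as np

def pixel_accuracy(pred_mask, gt_mask):
    """Fraction of pixels on which two binary masks agree."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must have the same shape")
    # Mean of the element-wise agreement = correct pixels / all pixels
    return float(np.mean(pred == gt))

# Toy 2x3 masks (hypothetical values): 5 of the 6 pixels agree
pred = np.array([[0, 1, 1],
                 [0, 0, 1]])
gt = np.array([[0, 1, 0],
               [0, 0, 1]])
toy_accuracy = pixel_accuracy(pred, gt)
print(round(toy_accuracy, 4))  # 0.8333
```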

In [2]:
import os
import cv2
from motion_detection import MotionDetector
import numpy as np

label_ids = [
	[115, 156, 212, 275, 320, 368, 376, 430, 492, 550, 600, 668, 725, 773, 815, 873, 940, 994, 1055, 1100, 1165, 1205, 1275, 1385, 1494],
	[112, 167, 200, 260, 300, 344, 406, 482, 533, 649, 708, 761, 954, 988, 1120, 1165, 1203, 1313, 1345, 1378, 1401, 1425, 1469, 1499]
]
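def eval_md_accuracy(scene):
	"""
	Mean pixel accuracy of the motion detector over the annotated frames of a scene.

	params:
		scene: scene id (1 for inside, 2 for outside)
	returns:
		mean pixel accuracy, between 0 and 1
	"""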
	accuracy = 0
	for id in label_ids[scene - 1]:
		# Read frame and predict
		img_path = os.path.join('data', 'img_5_{}'.format(scene), 'img_5_{}_{}.jpg'.format(scene, str(id).zfill(4)))
		img = cv2.imread(img_path, cv2.IMREAD_COLOR)
		pred = md.detect(img)['mask']

		# Read ground truth
		img_path = os.path.join('data', 'bb_img_5_{}'.format(scene), 'seg_5_{}_{}.png'.format(scene, str(id).zfill(4)))
		mask_img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE) / 255

		# Calculate accuracy
		# Fraction of pixels on which prediction and ground truth agree (no hard-coded frame size)
		acc = np.mean(pred == mask_img)
		accuracy += acc

	return accuracy / len(label_ids[scene - 1])

md = MotionDetector(model='yolov5x')

inside_acc = eval_md_accuracy(1) * 100
outside_acc = eval_md_accuracy(2) * 100
print('Inside accuracy: {}%'.format(inside_acc.round(2)))
print('Outside accuracy: {}%'.format(outside_acc.round(2)))
print('Total accuracy: {}%'.format(((inside_acc + outside_acc) / 2).round(2)))

Using cache found in /home/sach/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-11-30 Python-3.6.9 torch-1.10.1 CUDA:0 (NVIDIA GeForce RTX 3080, 10015MiB)

Fusing layers... 
YOLOv5x summary: 444 layers, 86705005 parameters, 0 gradients
Adding AutoShape... 

Inside accuracy: 89.15%
Outside accuracy: 98.99%
Total accuracy: 94.07%


## Object Detection

Here, the performance of our object detector is assessed.

The model takes a frame as input and outputs one 6-value vector per detected object.
Each vector contains the x1, y1, x2, y2 coordinates of the bounding box, followed by the confidence score and the class (0 = person, 1 = ball).

Following the literature, we evaluate our model with the mAP (mean Average Precision): we compute the AP of each class at an IoU threshold of 0.5 and average the two values.
Again, higher is better.

Once again, the reference data are annotations made for this purpose on the public image database of the project.
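Average Precision rests on deciding whether a predicted box matches a ground-truth box, which is usually done by thresholding their Intersection over Union (IoU). The sketch below illustrates that single step on two hypothetical boxes in the (x1, y1, x2, y2) format used here. The helper `box_iou` is written for illustration only and is not what `torchmetrics` runs internally.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical boxes: two 10x10 squares overlapping on a 5x10 strip
pred_box = (0, 0, 10, 10)
gt_box = (5, 0, 15, 10)
iou = box_iou(pred_box, gt_box)   # 50 / 150
is_match = iou >= 0.5             # counted as a match only at IoU >= 0.5
print(round(iou, 3), is_match)    # 0.333 False
```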

#### Accuracy

In [3]:
import os
import cv2
from motion_detection import MotionDetector
import pandas as pd
import torch
import time
from torchmetrics.detection.mean_ap import MeanAveragePrecision
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

label_ids = [
	[115, 156, 212, 275, 320, 368, 376, 430, 492, 550, 600, 668, 725, 773, 815, 873, 940, 994, 1055, 1100, 1165, 1205, 1275, 1385, 1494],
	[112, 167, 200, 260, 300, 344, 406, 482, 533, 649, 708, 761, 954, 988, 1120, 1165, 1203, 1313, 1345, 1378, 1401, 1425, 1469, 1499]
]

def eval_map(scene, debug_inf_time=False, rgb=False):
	# Ids of the annotated frames for this scene
	ids = label_ids[scene - 1]

	# Read the ground-truth boxes (data/box_5_<scene>.txt) as CSV
	df = pd.read_csv(os.path.join('data', 'box_5_{}.txt'.format(scene)), header=None, sep=', ', engine='python')
	df.columns = ['id', 'x1', 'y1', 'x2', 'y2', 'class']
	df.replace({'class': {'person': 0, 'ball': 1}}, inplace=True)
	df['id'] = df['id'].apply(lambda x: int(x.split('_')[-1]))

	preds = []
	target = []
	inf_time = []

	for id in ids:
		# Read frame and predict
		img_path = os.path.join('data', 'img_5_{}'.format(scene), 'img_5_{}_{}.jpg'.format(scene, str(id).zfill(4)))
		img = cv2.imread(img_path, cv2.IMREAD_COLOR if rgb else cv2.IMREAD_GRAYSCALE)
		t = time.time()
		pred = md.detect(img)['boxes']
		inf_time.append(time.time() - t)
		# pred is a tensor of shape (N, 6) where N is the number of bounding boxes, and the 6 values are (x1, y1, x2, y2, conf, class)

		boxes = pred[:, :4]
		scores = pred[:, 4]
		labels = pred[:, 5].int()

		preds.append({'boxes': boxes, 'scores': scores, 'labels': labels})

		# Get ground truth
		gt = df[df['id'] == id]
		gt = torch.from_numpy(gt[['x1', 'y1', 'x2', 'y2', 'class']].values).to(device)

		boxes = gt[:, :4]
		labels = gt[:, 4].int()

		target.append({'boxes': boxes, 'labels': labels})
		
	metric = MeanAveragePrecision(iou_thresholds=[0.5], class_metrics=True).to(device)
	metric.update(preds, target)
	results = metric.compute()

	if debug_inf_time:
		print('Average inference time: {}s'.format(np.mean(inf_time)))

	return (results['map_50'].item(), *results['map_per_class'].tolist())

md = MotionDetector(model='yolov5n')
rgb = True
inside_map, inside_map_person, inside_map_ball = eval_map(1, rgb=rgb)
outside_map, outside_map_person, outside_map_ball = eval_map(2, rgb=rgb)
print('Inside mAP: {}%'.format(round(inside_map * 100, 2)))
print('Inside mAP (person): {}%'.format(round(inside_map_person * 100, 2)))
print('Inside mAP (ball): {}%'.format(round(inside_map_ball * 100, 2)))
print('Outside mAP: {}%'.format(round(outside_map * 100, 2)))
print('Outside mAP (person): {}%'.format(round(outside_map_person * 100, 2)))
print('Outside mAP (ball): {}%'.format(round(outside_map_ball * 100, 2)))

Using cache found in /home/nvdia/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-12-3 Python-3.6.9 torch-1.10.0 CUDA:0 (NVIDIA Tegra X2, 7850MiB)

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
Adding AutoShape... 



#### Speed

This cell compares the inference speed of the different detection models we use.
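The timing pattern can be sketched independently of the detector. In the snippet below, `time_calls` and `fake_detect` are hypothetical names: `fake_detect` merely sleeps and stands in for a real `detect` call, and the untimed warm-up pass is an assumption added here, not something the notebook cell does.

```python
import time

def time_calls(fn, inputs, warmup=1):
    """Return (mean latency in ms, calls per second) of fn over inputs."""
    # Warm-up calls are executed but not timed, since first calls
    # often include one-off setup cost
    for x in inputs[:warmup]:
        fn(x)
    durations = []
    for x in inputs:
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(x)
        durations.append(time.perf_counter() - start)
    mean_ms = 1000.0 * sum(durations) / len(durations)
    return mean_ms, 1000.0 / mean_ms

def fake_detect(frame):
    # Hypothetical stand-in for a detector call
    time.sleep(0.002)

mean_ms, fps = time_calls(fake_detect, list(range(5)))
print('{:.2f}ms ({:.2f}fps)'.format(mean_ms, fps))
```

When timing on a GPU, a call may return before the device has finished its work. Calling `torch.cuda.synchronize()` before reading the clock is a common way to account for this.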

In [3]:
import time

import cv2
import numpy as np
from motion_detection import MotionDetector

path = "data/img_5_1/img_5_1_"
nb_frames = 500

avg_times = dict()
for model in ['yolov5n', 'yolov5s']: 
    md = MotionDetector(model=model)
    inf_times = []
    for i in range(0, nb_frames):
        img = cv2.imread(path + str(i).zfill(4) + ".jpg", cv2.IMREAD_GRAYSCALE)
        t = time.time()
        md.detect(img)
        inf_times.append(time.time() - t)
    avg_times[model] = (np.mean(inf_times) * 1000).round(2)

print('Average inference time for models:')
for model, avg_time in avg_times.items():
    print('{}: {}ms ({}fps)'.format(model, avg_time, round(1000 / avg_time, 2)))

Using cache found in /Users/sacha/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-12-2 Python-3.9.0 torch-1.13.0 CPU

Fusing layers... 
[Errno 2] No such file or directory: '/Users/sacha/miniconda3/envs/tfe/lib/python3.9/site-packages/certifi-2022.9.24.dist-info/METADATA'
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
Adding AutoShape... 
Using cache found in /Users/sacha/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-12-2 Python-3.9.0 torch-1.13.0 CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v7.0/yolov5s.pt to yolov5s.pt...
100%|██████████| 14.1M/14.1M [00:03<00:00, 3.72MB/s]

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


Average inference time for models:
yolov5n: 115.31ms (8.67fps)
yolov5s: 217.68ms (4.59fps)


## Ball Tracking

Here, the performance of the object tracker is assessed.

At each frame, the tracker outputs the position (x, y) and the depth z of the ball(s).

The position of the ball along the x and y axes is taken to be the center of its bounding box. For the accuracy of this estimate, refer to the Object Detection section, which evaluates those bounding boxes.
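A minimal sketch of that center computation, assuming boxes in the (x1, y1, x2, y2) format used by the detector. The helper name `box_center` and the box values are hypothetical.

```python
def box_center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

# Hypothetical ball bounding box, in pixels
ball_box = (220.0, 320.0, 260.0, 350.0)
center = box_center(ball_box)
print(center)  # (240.0, 335.0)
```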

***TODO*** For the depth, we will record some frames, measure the physical depth, and then average the distance between the predicted and ground-truth depths over all annotated images.
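One possible reading of this planned metric is the mean absolute error between predicted and measured depths. The sketch below assumes that reading. The function name, the depth values and their unit (metres) are all hypothetical, since no depth data has been recorded yet.

```python
import numpy as np

def mean_abs_depth_error(pred_depths, true_depths):
    """Mean absolute difference between predicted and measured depths."""
    pred = np.asarray(pred_depths, dtype=float)
    true = np.asarray(true_depths, dtype=float)
    if pred.shape != true.shape or pred.size == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return float(np.mean(np.abs(pred - true)))

# Hypothetical depths in metres, one value per annotated frame
pred_z = [1.9, 3.2, 4.0]
true_z = [2.0, 3.0, 4.5]
depth_mae = mean_abs_depth_error(pred_z, true_z)
print(round(depth_mae, 3))  # 0.267
```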