# Evaluation

This notebook contains the codes for performance assessment of our different tasks.

## Motion detection

Here, the performance of our motion detector is assessed.

The module takes as input a frame, and outputs a corresponding mask where the background is 0, and the foreground is 1.

We evaluate our model on annotations done specifically in the public image database of the project.

The metric we use is simply Pixel Accuracy, the ratio of pixels correctly classified over all pixels. A value of 1 thus corresponds to 100% accuracy (higher is better).

In [3]:
import os
import cv2
from motion_detection import MotionDetector
import numpy as np

label_ids = [
	[115, 156, 212, 275, 320, 368, 376, 430, 492, 550, 600, 668, 725, 773, 815, 873, 940, 994, 1055, 1100, 1165, 1205, 1275, 1385, 1494],
	[112, 167, 200, 260, 300, 344, 406, 482, 533, 649, 708, 761, 954, 988, 1120, 1165, 1203, 1313, 1345, 1378, 1401, 1425, 1469, 1499]
]

md = MotionDetector()

def eval_md_accuracy(scene):
	"""
	params:
		scene: scene id (1 for inside, 2 for outside)
	"""
	accuracy = 0
	for id in label_ids[scene - 1]:
		# Read frame and predict
		img_path = os.path.join('data', 'img_5_{}'.format(scene), 'img_5_{}_{}.jpg'.format(scene, str(id).zfill(4)))
		img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
		pred = md.detect(img)['mask']

		# Read ground truth
		img_path = os.path.join('data', 'bb_img_5_{}'.format(scene), 'seg_5_{}_{}.png'.format(scene, str(id).zfill(4)))
		mask_img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE) / 255

		# Calculate accuracy
		acc = np.sum(pred == mask_img) / (720 * 1280)
		accuracy += acc

	return accuracy / len(label_ids[scene - 1])
	
inside_acc = eval_md_accuracy(1) * 100
outside_acc = eval_md_accuracy(2) * 100
print('Inside accuracy: {}%'.format(inside_acc.round(2)))
print('Outside accuracy: {}%'.format(outside_acc.round(2)))
print('Total accuracy: {}%'.format(((inside_acc + outside_acc) / 2).round(2)))

Using cache found in /home/sach/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-11-30 Python-3.6.9 torch-1.10.1 CUDA:0 (NVIDIA GeForce RTX 3080, 10015MiB)

Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
Adding AutoShape... 


Inside accuracy: 86.83%
Outside accuracy: 98.71%
Total accuracy: 92.77%


## Object Detection

Here, the performance of our object detector is assessed.

The model takes as input a frame, and outputs a list of 5-value vectors.
Each vector contains the x1, y1, x2, y2 coordinates of the bounding box and the 5th value is the class (0 = person, 1 = ball).

Following the literature, we use the mAP (mean Average Precision) for evaluating our model. We compute the AP for both classes and then averaged.
Higher is better, again.

Once again, the reference data is the public image database of the project annotated for this purpose.

In [1]:
import cv2
from motion_detection import MotionDetector
import pandas as pd
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

md = MotionDetector()

label_ids = [
	[115, 156, 212, 275, 320, 368, 376, 430, 492, 550, 600, 668, 725, 773, 815, 873, 940, 994, 1055, 1100, 1165, 1205, 1275, 1385, 1494],
	[112, 167, 200, 260, 300, 344, 406, 482, 533, 649, 708, 761, 954, 988, 1120, 1165, 1203, 1313, 1345, 1378, 1401, 1425, 1469, 1499]
]

iou_threshold = 0.5

def iou(bb1, bb2):
	"""
	params:
		bb1: bounding box 1
		bb2: bounding box 2
	"""
	# Calculate the intersection
	x1 = max(bb1[0], bb2[0])
	y1 = max(bb1[1], bb2[1])
	x2 = min(bb1[2], bb2[2])
	y2 = min(bb1[3], bb2[3])
	intersection = max(0, x2 - x1) * max(0, y2 - y1)

	# Calculate the union
	area1 = (bb1[2] - bb1[0]) * (bb1[3] - bb1[1])
	area2 = (bb2[2] - bb2[0]) * (bb2[3] - bb2[1])
	union = area1 + area2 - intersection

	return intersection / union

def calculate_ap(preds, gt):
	"""
	Computes the AP for a given class between two sets of bounding boxes

	params:
		preds: predicted bounding boxes (x1, y1, x2, y2, conf)
		gt: ground truth bounding boxes (x1, y1, x2, y2)
	"""
	# Sort predictions by confidence
	preds = preds.sort_values(by='confidence', ascending=False)

	# Calculate the number of true positives and false positives
	tp = []
	fp = []
	for _, pred in preds.iterrows():
		max_iou = 0
		for _, bb in gt.iterrows():
			max_iou = max(max_iou, iou(pred['bb'], bb['bb']))

		if max_iou >= iou_threshold:
			tp.append(1)
			fp.append(0)
		else:
			tp.append(0)
			fp.append(1)

	# Calculate the precision and recall
	tp = np.array(tp)
	fp = np.array(fp)
	precision = np.cumsum(tp) / (np.cumsum(tp) + np.cumsum(fp))
	recall = np.cumsum(tp) / len(gt)

	# Calculate the AP
	ap = 0
	for i in range(1, len(precision)):
		ap += (recall[i] - recall[i - 1]) * precision[i]

	return ap
	

def eval_map(scene):
	# Read reference data and build dictionary
	ids = label_ids[scene - 1]

	# Read file box_5_1.txt as csv
	df = pd.read_csv(os.path.join('data', 'box_5_{}.txt'.format(scene)), header=None, sep=', ', engine='python')
	df.columns = ['id', 'x1', 'y1', 'x2', 'y2', 'class']
	df.replace({'class': {'person': 0, 'ball': 1}}, inplace=True)
	df['id'] = df['id'].apply(lambda x: int(x.split('_')[-1]))

	# APs for each class
	APs = [0, 0]

	for id in ids:
		# Read frame and predict
		img_path = os.path.join('data', 'img_5_{}'.format(scene), 'img_5_{}_{}.jpg'.format(scene, str(id).zfill(4)))
		img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
		pred = md.detect(img)

		# Get ground truth
		gt = df[df['id'] == id]
		gt = torch.tensor(gt[['x1', 'y1', 'x2', 'y2', 'class']].values).to(device)

		# Calculate the average precision for each class
		for class_id, class_name in enumerate(['person', 'ball']):
			# Get the predicted bounding boxes for class c
			pred_c = pred[class_name + '_boxes']
			print(pred_c)

			# Get the ground truth bounding boxes for class c
			gt_c = gt[gt[:, 4] == class_id][:, :4]

			# Calculate the average precision for class c
			APs[class_id] += calculate_ap(pred_c, gt_c)
		

eval_map(1)

Using cache found in /home/sach/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-11-30 Python-3.6.9 torch-1.10.1 CUDA:0 (NVIDIA GeForce RTX 3080, 10015MiB)





Fusing layers... 
YOLOv5n summary: 213 layers, 1867405 parameters, 0 gradients
Adding AutoShape... 


tensor([[0.00000e+00, 7.14041e+00, 2.04462e+02, 6.69243e+02, 2.72891e-01, 0.00000e+00]], device='cuda:0')


AttributeError: 'Tensor' object has no attribute 'sort_values'

In [6]:
from torchmetrics.detection.mean_ap import MeanAveragePrecision

preds = [
  dict(
    boxes=torch.tensor([[258.0, 41.0, 606.0, 285.0]]),
    scores=torch.tensor([0.536]),
    labels=torch.tensor([0]),
  )
]
target = [
  dict(
    boxes=torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
    labels=torch.tensor([0]),
  )
]
metric = MeanAveragePrecision()
metric.update(preds, target)
from pprint import pprint
pprint(metric.compute())

tensor(1.)


## Ball Tracking

Here, the performance of the object tracker is assessed.

The tracker outputs at each frame the position and depth ((x,y) and z) for the ball(s).

The ball center on the x and y axes is supposed to be the center of the bounding box. For performance about this, refer to Object Detection, which evaluates those bounding boxes.

***TODO*** For the depth, we record some frames and measure the physical depth, and then average the distance between the predicted and ground truth depth on all annotated images. 