# Exercise 3.1

Welcome to Exercise 3.1. In this exercise, we will learn how to use the DeepSORT algorithm with the YOLOv5 model to track objects in video.



## Overview

DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric) is an online multi-object tracking algorithm developed as an extension of SORT. It improves on SORT by adding appearance features computed by a deep neural network, which raises tracking accuracy in difficult situations such as crowded scenes, occlusion, and objects entering or leaving the frame.

DeepSORT was introduced in 2017 in the paper "Simple Online and Realtime Tracking with a Deep Association Metric" by Wojke, Bewley, and Paulus. Its main improvement over SORT is a deep neural network that produces an appearance feature vector for each detection. Matching on these vectors helps keep object identities distinct even when an object's position or shape changes.

In the field of autonomous vehicles, DeepSORT can be used in object tracking systems to identify and track other vehicles, pedestrians, and obstacles on the road. The continuous and accurate tracking capability of DeepSORT allows autonomous vehicles to make safer movement decisions, avoid collisions, and maintain an efficient travel path.


## Learning Objectives
After completing this exercise, learners will gain knowledge of:
- DeepSORT

## Related Knowledge
- Python
- DeepSORT

## Prerequisites
To complete this exercise, you will need the following knowledge:
- Basic programming skills in Python

## Problem Statement
**Objective**: Apply DeepSORT together with a YOLO model to track objects in a video.

**Requirements**:
- Input: video
- Output: video with object tracking boxes

## Instructions

Below are detailed instructions to help you understand the process of applying DeepSORT and YOLO to video.


### Libraries

In [21]:
import numpy as np
import torch
import cv2
import math
import time
from PIL import Image
from deep_sort_realtime.deepsort_tracker import DeepSort
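import pathlib
import platform

# Weights saved on Linux/macOS pickle PosixPath objects, which cannot be created on Windows.
# Alias PosixPath to WindowsPath on Windows only; the original class is kept in `temp`.
temp = pathlib.PosixPath
if platform.system() == 'Windows':
    pathlib.PosixPath = pathlib.WindowsPath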

### Load Detection Model

Only objects that the detection model can detect can be tracked. The labs in Chapter 2 showed how to train models for different purposes. Load the weights of whichever model suits the following task.

**Exercise 1**: Complete the following code by filling in [...]


In [None]:
# force_reload: download the repository again instead of using the cached copy,
# which avoids stale-cache conflicts when loading a new model
model = torch.hub.load('ultralytics/yolov5', [...], path=[...], force_reload=[...])

### Initialize DeepSORT
The `DeepSort` constructor accepts the following parameters:

1. **`max_age`**:
   - **Meaning**: The maximum number of consecutive frames a track can go without a matching detection before it is removed from the tracker.
   - **Explanation**: If a track goes unmatched for more than `max_age` consecutive frames, it is considered lost and deleted.

2. **`n_init`**:
   - **Meaning**: The number of consecutive frames in which a new object must be detected before its track is confirmed.
   - **Explanation**: A new track starts out as tentative. It is confirmed only after it has been matched in `n_init` consecutive frames; a tentative track that misses a detection is discarded.

3. **`nms_max_overlap`**:
   - **Meaning**: The threshold for Non-Maximum Suppression (NMS), an algorithm that removes redundant overlapping bounding boxes covering the same object.
   - **Explanation**: `nms_max_overlap` is the largest overlap two bounding boxes may have while both are kept. Boxes that overlap more than this are treated as duplicates, and only the higher-confidence one survives. The default value is `1.0`, which effectively disables NMS: boxes can overlap completely without being removed.

4. **`max_cosine_distance`**:
   - **Meaning**: The maximum cosine distance between appearance features (embedding vectors) at which a detection can still be matched to an existing track.
   - **Explanation**: Cosine distance measures how dissimilar two vectors are: `0` means they point in the same direction, and larger values mean they are less alike. A low `max_cosine_distance` therefore demands high similarity before a match is accepted. The `deep_sort_realtime` default is `0.2`; small values such as `0.2` or `0.3` mean that only detections closely resembling a track are matched to it.
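
To make `n_init` and `max_age` concrete, here is a minimal sketch of a track's life cycle. `ToyTrack` is a hypothetical class written for this illustration, not part of `deep_sort_realtime`; it only models the tentative, confirmed, and deleted states, and the real library may differ in details.

```python
class ToyTrack:
    """Hypothetical track holding only the state needed to show n_init and max_age."""

    def __init__(self, n_init, max_age):
        self.n_init = n_init
        self.max_age = max_age
        self.hits = 1               # the detection that created the track
        self.time_since_update = 0  # consecutive frames without a match
        self.state = "tentative"

    def step(self, matched):
        """Advance one frame; `matched` says whether a detection was associated."""
        if self.state == "deleted":
            return self.state
        if matched:
            self.hits += 1
            self.time_since_update = 0
            if self.state == "tentative" and self.hits >= self.n_init:
                self.state = "confirmed"
        else:
            self.time_since_update += 1
            if self.state == "tentative":
                # A miss during the initialization phase drops the track.
                self.state = "deleted"
            elif self.time_since_update > self.max_age:
                # Lost for more than max_age consecutive frames.
                self.state = "deleted"
        return self.state


track = ToyTrack(n_init=3, max_age=2)
history = [track.step(matched) for matched in [True, True, False, False, False]]
print(history)
# ['tentative', 'confirmed', 'confirmed', 'confirmed', 'deleted']
```

With `n_init=3` the track is confirmed on its third detection, and with `max_age=2` it survives two missed frames and is deleted on the third.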

These parameters can be adjusted based on the data and specific problem you are working on to optimize the performance of the object tracking system.
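
The effect of `nms_max_overlap` can be illustrated with a small greedy NMS sketch. The helper names `iou` and `greedy_nms` are hypothetical, and intersection over union (IoU) is used here as a stand-in overlap measure; the library's own overlap computation may be defined differently.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_overlap):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap any kept box by more than the threshold.
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]

print(greedy_nms(boxes, scores, max_overlap=0.5))  # [0, 2]
print(greedy_nms(boxes, scores, max_overlap=1.0))  # [0, 1, 2]
```

Because an overlap ratio can never exceed `1.0`, a threshold of `1.0` keeps every box, which is why that value amounts to switching NMS off.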


**Exercise 2**: Complete the following code by filling in [...]

In [None]:
# Initialize DeepSORT
object_tracker = DeepSort(max_age=[...],
                          n_init=[...],
                          nms_max_overlap=[...],
                          max_cosine_distance=[...])
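
When choosing a value for `max_cosine_distance`, it can help to see the gating rule in isolation. The sketch below uses plain Python and toy 3-dimensional vectors; `cosine_distance` and `is_match_candidate` are hypothetical helpers for illustration only. The actual tracker compares a detection against a gallery of stored features per track and combines this test with motion information.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_match_candidate(track_feature, detection_feature, max_cosine_distance):
    """A detection is a candidate only if its appearance is close enough to the track's."""
    return cosine_distance(track_feature, detection_feature) <= max_cosine_distance


# Toy embeddings; real appearance embeddings have many more dimensions.
track_embedding = [1.0, 0.0, 0.0]
similar_detection = [0.9, 0.1, 0.0]
different_detection = [0.0, 1.0, 0.0]

print(is_match_candidate(track_embedding, similar_detection, 0.3))    # True
print(is_match_candidate(track_embedding, different_detection, 0.3))  # False
```

Lowering the threshold makes identity switches less likely but also makes it harder to re-associate an object whose appearance has changed.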

### Input processing functions for the DeepSORT algorithm

`score_frame(frame)`: takes a frame as input and returns the class labels and bounding-box coordinates of the objects detected in it.

`classes`: the mapping from class index to class name, taken from the YOLO model (`model.names`). Depending on the model version, it is a list or a dictionary.

`class_to_label(x)`: takes a class index `x` and returns the corresponding class name.

`plot_boxes(results, height, width, confidence=0.3)`: despite its name, this function does not draw anything. It filters the model's predictions by confidence and converts each remaining bounding box into the `([left, top, width, height], confidence, class)` tuple that the tracker consumes.
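
The coordinate conversion is the step most likely to go wrong, so here is a dependency-free sketch of it. It assumes each prediction row holds normalized `(x1, y1, x2, y2, confidence)` values in the range 0 to 1, as YOLOv5's `results.xyxyn` provides; the helper names `to_deepsort_detection` and `filter_detections` are hypothetical and are not the exercise solution itself.

```python
def to_deepsort_detection(row, label, width, height):
    """Convert a normalized (x1, y1, x2, y2, conf) row to ([left, top, w, h], conf, label)."""
    x1 = int(row[0] * width)
    y1 = int(row[1] * height)
    x2 = int(row[2] * width)
    y2 = int(row[3] * height)
    conf = float(row[4])
    # The tracker expects the top-left corner plus width and height, not two corners.
    return ([x1, y1, x2 - x1, y2 - y1], conf, label)


def filter_detections(rows, labels, width, height, confidence=0.3):
    """Keep rows whose confidence reaches the threshold and convert them."""
    return [
        to_deepsort_detection(row, label, width, height)
        for row, label in zip(rows, labels)
        if row[4] >= confidence
    ]


rows = [
    [0.25, 0.5, 0.75, 0.75, 0.9],  # confident detection
    [0.0, 0.0, 0.5, 0.5, 0.1],     # below the confidence threshold
]
labels = ["car", "person"]

print(filter_detections(rows, labels, width=640, height=480))
# [([160, 240, 320, 120], 0.9, 'car')]
```

If the model output is already in pixel coordinates, the multiplication by `width` and `height` must be dropped.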


**Exercise 3**: Complete the following code by filling in [...]


In [None]:
# Run the model on a frame and return the class labels and box coordinates of the detected objects
def score_frame(frame):
	results = [...]
	labels, cord = [...]
	return [...]

classes = model.names

# Based on the class index, returns the corresponding class name
def class_to_label(x):
	return [...]

# Complete the function to filter and prepare information about detected objects
def plot_boxes(results, height, width, confidence=0.3):
	labels, cord = [...]
	detections = []

	for i in range(len(labels)):
		row = cord[i]
		if [...]:
			x1, y1, x2, y2 = [...]
			conf = [...]
			class_label = [...]
			# DeepSort expects ([left, top, width, height], confidence, class)
			detections.append(
				([x1, y1, int(x2-x1), int(y2-y1)], conf, class_label))

	return detections

### Apply DeepSORT and YOLO to Video

**Exercise 4**: Complete the following code by filling in [...]

In [None]:
import cv2

input_video_path = [...]  # Path to input video
output_video_path = [...]  # Path to output video

# Open input video
cap = cv2.VideoCapture(input_video_path)
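if not cap.isOpened():
    # Raise instead of calling exit(), which would shut down a notebook kernel
    raise RuntimeError("Could not open input video.")

# Get video parameters
fps = cap.get(cv2.CAP_PROP_FPS)  # keep fractional rates such as 29.97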
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Create video writer to save output video
fourcc = cv2.VideoWriter_fourcc(*'mp4v') 
out = cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))
    
while cap.isOpened():
	ret, frame = cap.read()
	if not ret:
		break

	# Predict objects in the frame

	results = score_frame([...])
	detections = plot_boxes([...])
	tracks = object_tracker.update_tracks(detections, frame=frame)

	# Draw bounding boxes and track IDs onto the frame
	for track in tracks:
		# Skip tentative tracks that have not yet been confirmed
		if not track.is_confirmed():
			continue
		bbox = track.to_tlbr()  # Bounding box as (x1, y1, x2, y2)
		track_id = track.track_id  # Unique ID of this track
		x1, y1, x2, y2 = map(int, bbox)
		cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
		cv2.putText(frame, f"ID: {track_id}", (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)


	# Record frames to output video
	out.write(frame)

cap.release()
out.release()
cv2.destroyAllWindows()
