SOULIER\
François\
SCIA - 2024

# Multi-object IOU Tracker (Bounding-Box Tracker)

In [1]:
import os
import shutil
import pandas as pd
import numpy as np

from src.trackers import Trackers
from src.utils.video import VideoUtils
from src.utils.tracker_type import TrackerType
from src.utils.tracker_utils import TrackerUtils

1. Load the detections (det) stored in a text file formatted like the MOT challenge data. Each line represents one object instance and contains 10 values:
* frame: frame number
* id: unique ID that assigns the object to a trajectory (set to -1 in a detection file, as no ID is assigned yet)
* bb_left, bb_top, bb_width, bb_height: bounding box position in 2D image coordinates, i.e. the top-left corner as well as the width and height
* conf: detection confidence score
* x, y, z: world coordinates; they are ignored for the 2D challenge and can be filled with -1

In [2]:
column_names = ['frame', 'id', 'bb_left', 'bb_top', 'bb_width', 'bb_height', 'conf', 'x', 'y', 'z']

det_df = pd.read_csv('data/det.txt', sep=',', header=None)
det_df.columns = column_names
det_df.head()

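| | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -1 | 1689 | 385 | 146.62 | 332.71 | 67.567 | -1 | -1 | -1 |
| 1 | 1 | -1 | 1303 | 503 | 61.514 | 139.59 | 29.439 | -1 | -1 | -1 |
| 2 | 1 | -1 | 1258 | 569 | 40.123 | 91.049 | 19.601 | -1 | -1 | -1 |
| 3 | 1 | -1 | 31 | 525 | 113.37 | 257.27 | 17.013 | -1 | -1 | -1 |
| 4 | 1 | -1 | 1800 | 483 | 94.66 | 214.81 | 11.949 | -1 | -1 | -1 |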


2. Implement IoU for tracking
* Compute a similarity score using the Jaccard index (intersection-over-union) for each pair of bounding boxes
* Create a similarity matrix that stores the IoU for all boxes

The IoU and the similarity matrix are implemented in the `TrackerUtils` class, in the file `src/utils/tracker_utils.py`. Let's print the full content of this file:

In [3]:
%pycat src/utils/tracker_utils.py
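
Independently of the `TrackerUtils` source, the two bullet points of step 2 can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `iou` and `similarity_matrix` are hypothetical, and boxes are assumed to be given as `(bb_left, bb_top, bb_width, bb_height)` tuples as in the detection file.

```python
import numpy as np


def iou(box_a, box_b):
    """Jaccard index of two boxes given as (left, top, width, height).

    Hypothetical helper for illustration only.
    """
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlap rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def similarity_matrix(tracks, detections):
    """IoU of every (track box, detection box) pair.

    Rows index the tracks, columns index the detections.
    """
    sim = np.zeros((len(tracks), len(detections)))
    for i, track_box in enumerate(tracks):
        for j, det_box in enumerate(detections):
            sim[i, j] = iou(track_box, det_box)
    return sim
```

For two frames, `tracks` would hold the last known box of each track and `detections` the boxes of the new frame; the resulting matrix is what the association steps below consume.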


3. Associate the detections to tracks in a greedy manner using the IoU and a threshold sigma_iou. Each track is assigned the detection with the highest intersection-over-union to its last known object position (i.e. the previous detection of the track).
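
The greedy rule of step 3 can be sketched as follows. This is an illustrative version under stated assumptions, not the code behind `TrackerType.GREEDY`: the function name `greedy_match` is hypothetical, the input is a tracks-by-detections IoU matrix, tracks are visited in row order, and the default value of `sigma_iou` is arbitrary.

```python
import numpy as np


def greedy_match(sim, sigma_iou=0.5):
    """Greedy track-to-detection association (hypothetical helper).

    sim[i, j] is the IoU between track i and detection j. Each track, in
    row order, takes the still-unassigned detection with the highest IoU,
    provided that IoU is at least sigma_iou.
    Returns a list of (track_index, detection_index) pairs.
    """
    sim = np.asarray(sim, dtype=float)
    matches = []
    used = set()  # detections already claimed by an earlier track
    for i in range(sim.shape[0]):
        best_j, best_score = -1, sigma_iou
        for j in range(sim.shape[1]):
            if j not in used and sim[i, j] >= best_score:
                best_j, best_score = j, sim[i, j]
        if best_j >= 0:
            matches.append((i, best_j))
            used.add(best_j)
    return matches
```

Detections left unmatched would typically start new tracks, and tracks left unmatched would be ended or kept alive for a few frames; how `Trackers` handles these cases is not shown in this notebook.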

In [4]:
trackers = Trackers(df=det_df.copy())
greedy_df = trackers.track(TrackerType.GREEDY)
greedy_df.head(20)

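| | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1689 | 385 | 146.62 | 332.71 | 67.567 | -1 | -1 | -1 |
| 1 | 1 | 1 | 1303 | 503 | 61.514 | 139.59 | 29.439 | -1 | -1 | -1 |
| 2 | 1 | 2 | 1258 | 569 | 40.123 | 91.049 | 19.601 | -1 | -1 | -1 |
| 3 | 1 | 3 | 31 | 525 | 113.37 | 257.27 | 17.013 | -1 | -1 | -1 |
| 4 | 1 | 4 | 1800 | 483 | 94.66 | 214.81 | 11.949 | -1 | -1 | -1 |
| 5 | 2 | 0 | 1689 | 385 | 146.62 | 332.71 | 66.725 | -1 | -1 | -1 |
| 6 | 2 | 1 | 1312 | 503 | 61.514 | 139.59 | 36.614 | -1 | -1 | -1 |
| 7 | 2 | 4 | 1744 | 476 | 123.42 | 280.06 | 16.976 | -1 | -1 | -1 |
| 8 | 2 | 2 | 1254 | 537 | 52.0 | 118.0 | 15.979 | -1 | -1 | -1 |
| 9 | 2 | 3 | 55 | 542 | 94.66 | 214.81 | 9.3326 | -1 | -1 | -1 |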


5. Develop an interface to check the tracking results, i.e. to see whether the tracker properly keeps track of objects by associating the correct IDs in the video stream
* Draw a rectangular bounding box around each detected object in the images
* Draw the assigned ID next to each tracked object
* Draw the trajectory (tracking path) in the image
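
The drawing steps listed above can be illustrated with a dependency-light sketch. How `VideoUtils` renders frames is not shown in this notebook; a real implementation would more likely rely on a library such as OpenCV (`cv2.rectangle`, `cv2.putText`, `cv2.polylines`). The helpers `draw_box` and `trajectories` below are hypothetical and only use NumPy; drawing the ID text is left out because it needs a font renderer.

```python
import numpy as np


def draw_box(image, left, top, width, height, color=(0, 255, 0), thickness=2):
    """Draw a rectangle outline in place on an (H, W, 3) uint8 image.

    Hypothetical helper; the box is clipped to the image borders.
    """
    h, w = image.shape[:2]
    x1, y1 = max(0, int(round(left))), max(0, int(round(top)))
    x2 = min(w, int(round(left + width)))
    y2 = min(h, int(round(top + height)))
    if x1 >= x2 or y1 >= y2:
        return image  # box lies entirely outside the image
    t = thickness
    image[y1:min(y1 + t, y2), x1:x2] = color  # top edge
    image[max(y2 - t, y1):y2, x1:x2] = color  # bottom edge
    image[y1:y2, x1:min(x1 + t, x2)] = color  # left edge
    image[y1:y2, max(x2 - t, x1):x2] = color  # right edge
    return image


def trajectories(rows):
    """Group box centers by track ID, ordered by frame.

    rows: iterable of (frame, id, bb_left, bb_top, bb_width, bb_height).
    Returns {id: [(center_x, center_y), ...]}; each list is the path that
    would be drawn as a polyline for that track.
    """
    paths = {}
    for frame, track_id, left, top, width, height in sorted(rows):
        center = (left + width / 2.0, top + height / 2.0)
        paths.setdefault(track_id, []).append(center)
    return paths
```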

Setup path before export.

In [5]:
if os.path.exists('outputs/'):
    shutil.rmtree('outputs/')
os.mkdir('outputs/')
os.mkdir('outputs/videos/')

Export the video.

In [6]:
VideoUtils.export_video_with_tracking(greedy_df, 'data/video_iou/', 'outputs/videos/greedy.avi', 30, (1920, 1080))

The video can be found at `./outputs/videos/greedy.avi`.

## Hungarian Algorithm assignment

1. Integrate the Hungarian algorithm to find the optimal assignment
* Apply the Hungarian algorithm using an existing library (e.g. the function `linear_sum_assignment` from the SciPy library for Python)
* Use the previously computed similarity matrix (IoU) as the input
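
The step above can be sketched with `scipy.optimize.linear_sum_assignment`. This is a minimal illustration, not the code behind `TrackerType.HUNGARIAN`: the function name `hungarian_match` is hypothetical, and filtering the assigned pairs by a `sigma_iou` threshold afterwards is an assumption. Since the solver minimizes a cost, the IoU matrix is turned into a cost with `1 - IoU`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def hungarian_match(sim, sigma_iou=0.5):
    """Optimal track-to-detection assignment (hypothetical helper).

    sim[i, j] is the IoU between track i and detection j. The assignment
    maximizes the total IoU; pairs below sigma_iou are then discarded
    (assumed gating step).
    Returns a list of (track_index, detection_index) pairs.
    """
    sim = np.asarray(sim, dtype=float)
    if sim.size == 0:
        return []
    # linear_sum_assignment minimizes cost, so convert similarity to cost.
    rows, cols = linear_sum_assignment(1.0 - sim)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if sim[i, j] >= sigma_iou]
```

With the matrix `[[0.9, 0.8], [0.85, 0.1]]`, a per-track greedy choice would give detection 0 to track 0 and leave track 1 with a poor match, whereas the optimal assignment pairs track 0 with detection 1 and track 1 with detection 0, for a higher total IoU.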

In [7]:
trackers.reset()
hungarian_df = trackers.track(TrackerType.HUNGARIAN)
hungarian_df.head(20)

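| | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1689 | 385 | 146.62 | 332.71 | 67.567 | -1 | -1 | -1 |
| 1 | 1 | 1 | 1303 | 503 | 61.514 | 139.59 | 29.439 | -1 | -1 | -1 |
| 2 | 1 | 2 | 1258 | 569 | 40.123 | 91.049 | 19.601 | -1 | -1 | -1 |
| 3 | 1 | 3 | 31 | 525 | 113.37 | 257.27 | 17.013 | -1 | -1 | -1 |
| 4 | 1 | 4 | 1800 | 483 | 94.66 | 214.81 | 11.949 | -1 | -1 | -1 |
| 5 | 2 | 0 | 1689 | 385 | 146.62 | 332.71 | 66.725 | -1 | -1 | -1 |
| 6 | 2 | 1 | 1312 | 503 | 61.514 | 139.59 | 36.614 | -1 | -1 | -1 |
| 7 | 2 | 4 | 1744 | 476 | 123.42 | 280.06 | 16.976 | -1 | -1 | -1 |
| 8 | 2 | 2 | 1254 | 537 | 52.0 | 118.0 | 15.979 | -1 | -1 | -1 |
| 9 | 2 | 3 | 55 | 542 | 94.66 | 214.81 | 9.3326 | -1 | -1 | -1 |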


Save the video at `./outputs/videos/hungarian.avi`.

In [8]:
VideoUtils.export_video_with_tracking(hungarian_df, 'data/video_iou/', 'outputs/videos/hungarian.avi', 30, (1920, 1080))

2. Save the tracking results in a txt file.
The file name must exactly match the sequence name. The file format should be the same as that of the ground truth file (gt.txt): a CSV text file containing one object instance per line, each line holding 10 values. Update the id column (2nd value) with the unique ID assigned to the track. The 7th value (conf) acts as a flag (1).

In [9]:
hungarian_df.to_csv('outputs/hungarian_tracking.txt', index=False, header=False)

The Hungarian tracking file is now exported to `./outputs/hungarian_tracking.txt`.
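
The export cell writes the DataFrame as it is, so the conf column still carries the detection scores. If the evaluation expects the 7th value to be the flag 1, a normalization step before export could look like the sketch below. The helper `to_mot_csv` and the constant `MOT_COLUMNS` are hypothetical; forcing conf to 1 is an assumption about what the format requires.

```python
import pandas as pd

# Column order of the 10 values per line, as used for the detection file.
MOT_COLUMNS = ['frame', 'id', 'bb_left', 'bb_top', 'bb_width', 'bb_height',
               'conf', 'x', 'y', 'z']


def to_mot_csv(df):
    """Return the tracking results as headerless CSV text (hypothetical helper).

    Keeps exactly the 10 expected columns in order, stores IDs as integers
    and sets conf to the flag value 1 (assumption).
    """
    out = df[MOT_COLUMNS].copy()
    out['id'] = out['id'].astype(int)
    out['conf'] = 1
    return out.to_csv(index=False, header=False)
```

The returned text can then be written to a file named after the sequence, or a path can be passed to `to_csv` directly instead of returning a string.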

## Hungarian Algorithm with Kalman Filter assignment
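
As background for this section, here is a minimal sketch of a constant-velocity Kalman filter on the box centroid, which can predict where a track should be in the next frame before the IoU matching. It is an assumption-laden illustration, not the filter used by `TrackerType.HUNGARIAN_KALMAN`: the class name `CentroidKalman`, the state layout `(cx, cy, vx, vy)` and all noise values are hypothetical.

```python
import numpy as np


class CentroidKalman:
    """Constant-velocity Kalman filter on a 2D centroid (hypothetical sketch)."""

    def __init__(self, cx, cy, dt=1.0, process_var=1.0, meas_var=10.0):
        # State: position and velocity of the centroid.
        self.x = np.array([cx, cy, 0.0, 0.0])
        # Large initial uncertainty: the velocity is unknown at first.
        self.P = np.eye(4) * 100.0
        # Transition: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Only the position is measured.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var  # process noise (assumed value)
        self.R = np.eye(2) * meas_var     # measurement noise (assumed value)

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted centroid."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx, cy):
        """Correct the state with a measured centroid; returns the estimate."""
        z = np.array([cx, cy], dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a tracker, each track would call `predict()` once per frame, the predicted position would replace the last known box when computing the IoU matrix, and `update()` would be called with the centroid of the matched detection.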

In [10]:
trackers.reset()
hungarian_kalman_df = trackers.track(TrackerType.HUNGARIAN_KALMAN)
hungarian_kalman_df.head(20)

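| | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1689 | 385 | 146.62 | 332.71 | 67.567 | -1 | -1 | -1 |
| 1 | 1 | 1 | 1303 | 503 | 61.514 | 139.59 | 29.439 | -1 | -1 | -1 |
| 2 | 1 | 2 | 1258 | 569 | 40.123 | 91.049 | 19.601 | -1 | -1 | -1 |
| 3 | 1 | 3 | 31 | 525 | 113.37 | 257.27 | 17.013 | -1 | -1 | -1 |
| 4 | 1 | 4 | 1800 | 483 | 94.66 | 214.81 | 11.949 | -1 | -1 | -1 |
| 5 | 2 | 0 | 1689 | 385 | 146.62 | 332.71 | 66.725 | -1 | -1 | -1 |
| 6 | 2 | 1 | 1312 | 503 | 61.514 | 139.59 | 36.614 | -1 | -1 | -1 |
| 7 | 2 | 4 | 1744 | 476 | 123.42 | 280.06 | 16.976 | -1 | -1 | -1 |
| 8 | 2 | 2 | 1254 | 537 | 52.0 | 118.0 | 15.979 | -1 | -1 | -1 |
| 9 | 2 | 3 | 55 | 542 | 94.66 | 214.81 | 9.3326 | -1 | -1 | -1 |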


In [11]:
VideoUtils.export_video_with_tracking(hungarian_kalman_df, 'data/video_iou/', 'outputs/videos/hungarian_kalman.avi', 30, (1920, 1080))

## Tracker with Neural Networks

Load all the images into a single tensor.

In [12]:
video = VideoUtils.load_video_images('data/video_iou/')

Launch the NN-based tracker.

In [13]:
trackers.reset()
nn_df = trackers.track(TrackerType.NN_HUNGARIAN_KALMAN, video)

Save tracks in a txt file `./outputs/nn_tracking.txt`

In [14]:
nn_df.to_csv('outputs/nn_tracking.txt', index=False, header=False)

Export the video with the tracks.

In [15]:
VideoUtils.export_video_with_tracking(nn_df, 'data/video_iou/', 'outputs/videos/nn.avi', 30, (1920, 1080))

## YOLOv5 Tracker

Compute new tracks with a YOLOv5 (pretrained) model.

In [18]:
yolo_tracks_df = TrackerUtils.get_yolo_tracks(video)
yolo_tracks_df.head(20)

Using cache found in /Users/francois.soulier/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-1-28 Python-3.11.4 torch-2.0.1 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


| | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -1 | 1705.473022 | 396.843658 | 143.03064 | 331.744904 | 0.77368 | -1 | -1 | -1 |
| 1 | 0 | -1 | 258.216187 | 460.156891 | 94.228821 | 251.144806 | 0.755457 | -1 | -1 | -1 |
| 2 | 0 | -1 | 0.236366 | 340.628021 | 117.29068 | 560.05957 | 0.740257 | -1 | -1 | -1 |
| 3 | 0 | -1 | 111.752823 | 503.87384 | 89.405518 | 238.817505 | 0.643121 | -1 | -1 | -1 |
| 4 | 0 | -1 | 1249.333862 | 537.460327 | 57.606323 | 115.732727 | 0.566582 | -1 | -1 | -1 |
| 5 | 0 | -1 | 1288.902832 | 458.722229 | 62.128906 | 192.073242 | 0.514799 | -1 | -1 | -1 |
| 6 | 0 | -1 | 860.369812 | 524.143982 | 41.987 | 102.561584 | 0.355275 | -1 | -1 | -1 |
| 7 | 0 | -1 | 1883.506714 | 388.177368 | 36.493286 | 191.629639 | 0.355202 | -1 | -1 | -1 |
| 8 | 1 | -1 | 264.011169 | 458.857178 | 93.119568 | 250.533752 | 0.79317 | -1 | -1 | -1 |
| 9 | 1 | -1 | 0.0 | 321.430389 | 95.370262 | 581.912964 | 0.779198 | -1 | -1 | -1 |
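
The table above uses the same 10-column layout as the detection file. YOLOv5 reports each detection as `(x1, y1, x2, y2, conf, cls)` with corner coordinates, so a conversion to top-left corner plus width and height is needed. The sketch below shows one way to do it; it is not the body of `TrackerUtils.get_yolo_tracks`, and the helper `xyxy_to_mot` and its `keep_class` filter are hypothetical.

```python
import pandas as pd

MOT_COLUMNS = ['frame', 'id', 'bb_left', 'bb_top', 'bb_width', 'bb_height',
               'conf', 'x', 'y', 'z']


def xyxy_to_mot(frame, detections, keep_class=None):
    """Convert one frame of (x1, y1, x2, y2, conf, cls) rows to MOT-style rows.

    Hypothetical helper. keep_class optionally keeps a single class index
    (in the COCO labels used by the pretrained models, 0 is 'person');
    None keeps every detection.
    """
    rows = []
    for x1, y1, x2, y2, conf, cls in detections:
        if keep_class is not None and int(cls) != keep_class:
            continue
        # id is -1 because no track has been assigned yet;
        # the world coordinates x, y, z are unused and set to -1.
        rows.append([frame, -1, x1, y1, x2 - x1, y2 - y1, conf, -1, -1, -1])
    return pd.DataFrame(rows, columns=MOT_COLUMNS)
```

Concatenating the per-frame DataFrames (for example with `pd.concat`) would give a detection table that can be passed to `Trackers` like `det_df`.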


Compute the matching on the YOLOv5 detections with the Hungarian + Kalman tracker used earlier.

In [20]:
trackers = Trackers(df=yolo_tracks_df.copy())
tracked_yolo_df = trackers.track(TrackerType.HUNGARIAN_KALMAN)

Export video with new tracks.

In [21]:
VideoUtils.export_video_with_tracking(tracked_yolo_df, 'data/video_iou/', 'outputs/videos/yolo.avi', 30, (1920, 1080))

Export txt file with new tracks.

In [None]:
tracked_yolo_df.to_csv('outputs/yolo_tracking.txt', index=False, header=False)