# Aprendizaje Profundo para Procesamiento de Señales de Imagen y Vídeo (APPIV)
## Master in Data Science
## Universidad Autónoma de Madrid

The objective of this practice is to implement and evaluate a multi-object tracker. For this purpose, the [MOT16](https://motchallenge.net/data/MOT16/) dataset will be used.

Authors:
- Héctor Mejia Vallejo
- Ignacio Cordova Pou

Let's do it!

# 1. Setup

## 1.1 Install required modules

We are going to install:

* `gdown` to download the material
* `tqdm` in its latest version
* `PyTorch`, `torchvision`, and `torchaudio`
* the latest version of `py-motmetrics`, directly from the repo
* `scikit-learn` for the linear assignment

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

!pip3 install -q gdown
!pip3 install -q tqdm
!pip3 install -q torch torchvision torchaudio
!pip3 install -q git+https://github.com/cheind/py-motmetrics.git
!pip3 install -q scikit-learn


## 1.2 Installing code and data

We will download the required dataset (MOT16) and our custom tracker code from Google Drive using `gdown`. Then, we will place it in Kaggle's working directory (`/kaggle/working/`) to begin the experiments.

In [3]:
import gdown, os, subprocess, zipfile
from tqdm.autonotebook import tqdm

# The path for the material
working_dir = '/kaggle/working/'  
data_dir = '/kaggle/input/mot-data/data/'
url_tracker = 'https://drive.google.com/drive/folders/1R0xgOiVEpf8_EwviQ72NZ2_9b_2wCcYj?usp=share_link'

In [4]:
def download_gdrive(url, check_dir):
    if not os.path.exists(check_dir):
        print("Downloading the material.")
        _ = gdown.download_folder(url, quiet=True)
        print("Complete!")
    else:
        print("Material is already downloaded!")

In [5]:
download_gdrive(url_tracker, "deep_tracker/")

Downloading the material.
Complete!


## 1.3 Defining macros

The following cell defines all the important macros in a single place, so that different experiments and settings can be changed easily:

In [6]:
import sys

# General environment macros
# =======================
working_dir = '/kaggle/working/deep_tracker/'  
model_dir=working_dir+'models/'

# add the path to the system so we can import the tracker
sys.path.append(os.path.join(working_dir,'src/'))

# Dataset macros
# =======================
mode = "test" # or mode = "train"
seq_name = f"MOT16-{mode}" #'MOT16-train', 'MOT16-02'
data_dir = os.path.join(data_dir, 'MOT16')

# Object detection macros
# =======================
obj_detect_model_file = os.path.join(working_dir, "models/faster_rcnn_fpn.model")
obj_detect_nms_thresh = 0.5
num_classes=2 
objectness_thresh = 0.5

# Object tracker macros
# =======================
iou_thresh = 0.5
t_missing = 5
alpha = 0.3
beta = 0.7
output_dir_tracker1 = os.path.join(working_dir, f'output_baseline_{mode}')
output_dir_tracker2 = os.path.join(working_dir, f'output_la_{mode}')
output_dir_tracker3 = os.path.join(working_dir, f'output_deep_{mode}')
output_dir_tracker4 = os.path.join(working_dir, f'output_deep_la_{mode}')


## 1.4 Imports

Run the following cell to import all the necessary packages for this practice:

In [7]:
import torch, torchvision, torchaudio
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import motmetrics as mm
import time

from scipy import spatial
from tqdm.autonotebook import tqdm
from torch.utils.data import DataLoader
from tracker.data_track import MOT16Sequences
from tracker.detector.data_obj_detect import MOT16ObjDetect
from tracker.detector.object_detector import FRCNN_FPN
from tracker.utils import (evaluate_obj_detect, obj_detect_transforms)
from tracker.utils import get_mot_accum
from tracker.utils import evaluate_mot_accums

# Importing all of our trackers
from tracker.tracker import deep_tracker, iou_tracker

## 1.5 Sanity Checks and seed

We will check a few things before running the tracker:
- The PyTorch version
- The device available for DNN computations
- The seed, set for reproducibility

In [8]:
print(torch.__version__)
print(torchvision.__version__)
print(torchaudio.__version__)

2.0.0
0.15.1
2.0.1


In [9]:
# See if GPU is available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(device)

cuda


In [10]:
seed = 12345 #seed to allow repeatable results
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
np.random.seed(seed)
torch.backends.cudnn.deterministic = True

# 2. Dataset
The dataset was already installed when `gdown` was executed. Still, we will make sure that the dataset is inside the directory before we run the experiments. If the directories show up empty, the dataset was not properly installed in our working directory:

In [11]:
#list the contents of the 'train' directory
train_dir = os.path.join(data_dir,'train')
print('Train directory:')
!ls $train_dir

#list the contents of the 'test' directory
test_dir = os.path.join(data_dir,'test')
print('Test directory:')
!ls $test_dir

Train directory:
MOT16-02  MOT16-04  MOT16-05  MOT16-09
Test directory:
MOT16-10  MOT16-11  MOT16-13


# 3. Object Detector
The detector used in this practice is Faster R-CNN with a ResNet-50 FPN backbone. It is unchanged from the baseline implementation of this practice. Let's instantiate the detector first:

In [12]:
# object detector
obj_detect = FRCNN_FPN(num_classes=num_classes, 
                       nms_thresh=obj_detect_nms_thresh)

# Load the pre-trained weights
obj_detect_state_dict = torch.load(obj_detect_model_file,map_location=lambda storage, loc: storage)
obj_detect.load_state_dict(obj_detect_state_dict)

# Set to evaluation and send to device
obj_detect.eval()
obj_detect.to(device)

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:03<00:00, 28.3MB/s]


FRCNN_FPN(
  (transform): GeneralizedRCNNTransform(
      Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
      Resize(min_size=(800,), max_size=1333, mode='bilinear')
  )
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=1e-05)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=1e-05)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=1e-05)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=1e-05)
          (relu)

# 4. Single test run for the MOT


We have developed 4 versions of an MOT tracker. All of them will be defined, instantiated, and executed in the following code cells inside this section (section 4).

First, we are going to define a function to run all of our trackers:

In [13]:
def run_tracker(sequences, tracker, output_dir):
    """Runs a given tracker with the given sequences.
    
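    Parameters:
    ===========
    - sequences: list
        The image sequences
    - tracker: Any subclass of AbstractTracker
        The tracker to execute
    - output_dir: str
        Directory where the per-sequence result files are written

    Returns:
    ========
    - results_seq: dict
        Tracking results keyed by sequence name
    - mot_accums: list
        Evaluation accumulators returned by get_mot_accum, one per
        sequence that has ground truth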
    """
    time_total = 0
    mot_accums = []
    results_seq = {}

    for seq in sequences:
        print(f"Tracking: {seq}")
        now = time.time()

        # restart tracker state for each sequence
        tracker.reset()

        #load data
        data_loader = DataLoader(seq, batch_size=1, shuffle=False)

        #run tracker
        for frame in tqdm(data_loader):
            tracker.step(frame)

        #keep results
        results = tracker.get_results()
        results_seq[str(seq)] = results

        #perform evaluation
        if seq.no_gt:
            print("No GT evaluation data available.")
        else:
            mot_accums.append(get_mot_accum(results, seq)) #compute and store eval metrics 

        time_total += time.time() - now

        print(f"Tracks found: {len(results)}")
        print(f"Runtime for {seq}: {time.time() - now:.1f} s.")

        #save results to output directory
        seq.write_results(results, output_dir)
        
    return results_seq, mot_accums

def evaluate_tracker(mot_accums, sequences):
    if mot_accums:
        return evaluate_mot_accums(mot_accums,
                            [str(s) for s in sequences if not s.no_gt],
                            generate_overall=True)

We only need to get the MOT sequences once:

In [14]:
# Instantiate the MOT sequences given the sequence name and dir
sequences = MOT16Sequences(seq_name, data_dir)
print('Loaded {:d} sequences for {:s}'.format(len(sequences),seq_name))

Loaded 3 sequences for MOT16-test


We are ready to execute!

## 4.1 Baseline IoU tracker

This tracker is an extension of the original provided in the APPIV tutorial. However, it introduces some changes:

- Added an objectness threshold (`det_thresh`) that discards detections with low confidence.
- Added an IoU score threshold (`iou_thresh`), to avoid assigning detections with low IoU to existing tracks.
- Added a `t_missing` parameter: a track that misses a detection is not eliminated immediately, but put on hold. It is eliminated only after it has been missing for `t_missing` timesteps.
- Fixed the `np.nanargmin` assignment. Since `np.nanargmin` only searches, for each track, for the bounding box with the lowest IoU distance, a bounding box could be assigned to more than one track in the same timestep. We remove that possibility by running `np.nanargmin` on a copy of the distance matrix and assigning NaN to the columns (bounding boxes) that were already assigned to a track.
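
The last fix can be illustrated with a minimal sketch. This is not the code from `iou_tracker.py`: the function name `greedy_assign`, the parameter `max_dist`, and the convention that rows are tracks and columns are detections are assumptions made for illustration. We also assume that the IoU threshold is applied as a maximum IoU distance of `1 - iou_thresh`.

```python
import numpy as np

def greedy_assign(dist, max_dist=0.5):
    """Greedily match tracks (rows) to detections (columns).

    dist[i, j] is assumed to be the IoU distance (1 - IoU) between
    track i and detection j. Returns {track_index: detection_index}.
    """
    work = np.asarray(dist, dtype=float).copy()  # keep the caller's matrix intact
    work[work > max_dist] = np.nan               # gate out pairs with too little overlap
    matches = {}
    for t in range(work.shape[0]):
        if np.all(np.isnan(work[t])):            # no candidate left for this track
            continue
        d = int(np.nanargmin(work[t]))           # closest remaining detection
        matches[t] = d
        work[:, d] = np.nan                      # detection d can no longer be reused
    return matches

# Hypothetical distances: tracks 0 and 1 both prefer detection 0.
dist = np.array([[0.1, 0.6],
                 [0.2, 0.3],
                 [0.9, 0.8]])
print(greedy_assign(dist))  # {0: 0, 1: 1}
```

Without the column masking, track 1 would also pick detection 0. Track 2 stays unmatched because all of its distances exceed the threshold.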

Let's test it out:

In [15]:
baseline_iou_tracker = iou_tracker.TrackerIoUAssignment(obj_detect, 
                                                        t_missing=t_missing, 
                                                        det_thresh=objectness_thresh, 
                                                        iou_thresh=iou_thresh)

In [16]:
results_seq_base, mot_accums_base = run_tracker(sequences, 
                                                baseline_iou_tracker, 
                                                output_dir_tracker1)

Tracking: MOT16-10


  0%|          | 0/654 [00:00<?, ?it/s]

Tracks found: 55
Runtime for MOT16-10: 110.1 s.
Writing predictions to: /kaggle/working/deep_tracker/output_baseline_test/MOT16-10.txt
Tracking: MOT16-11


  0%|          | 0/900 [00:00<?, ?it/s]

Tracks found: 49
Runtime for MOT16-11: 123.9 s.
Writing predictions to: /kaggle/working/deep_tracker/output_baseline_test/MOT16-11.txt
Tracking: MOT16-13


  0%|          | 0/750 [00:00<?, ?it/s]

Tracks found: 75
Runtime for MOT16-13: 110.3 s.
Writing predictions to: /kaggle/working/deep_tracker/output_baseline_test/MOT16-13.txt


In [17]:
_ = evaluate_tracker(mot_accums_base, sequences)

          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML   FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-10 36.4% 55.3% 27.1% 45.4% 92.6%  57 16 27 14  463  7008 160  113 40.6% 0.139 134  26  20
MOT16-11 54.4% 71.9% 43.7% 58.0% 95.3%  75 18 29 28  271  3962  30   30 54.8% 0.080  42  11  23
MOT16-13 45.3% 60.8% 36.0% 54.4% 91.9% 110 41 36 33  559  5303 130  116 48.5% 0.134 154  34  60
OVERALL  44.7% 62.3% 34.8% 52.0% 93.2% 242 75 92 75 1293 16273 320  259 47.3% 0.119 330  71 103


## 4.2 Linear Assignment tracker

This tracker is an extension of our baseline tracker, tested above. The changes are the following:

- Replaced the `np.nanargmin` assignment with a globally optimal detection assignment using the Hungarian algorithm, provided in `scipy.optimize.linear_sum_assignment`.
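
A minimal sketch of this kind of assignment is shown below. It is an illustration under assumptions, not the code of `TrackerLinearAssignment`: the function name `hungarian_assign`, the parameter `max_dist`, and the choice to reject matches above the threshold after solving are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_assign(dist, max_dist=0.5, big=1e6):
    """Match tracks (rows) to detections (columns) at minimum total cost.

    dist[i, j] is assumed to be the distance between track i and
    detection j. Returns {track_index: detection_index}.
    """
    # linear_sum_assignment rejects NaN entries, so replace them with a large cost.
    cost = np.nan_to_num(np.asarray(dist, dtype=float), nan=big)
    rows, cols = linear_sum_assignment(cost)
    # Discard pairs that the solver had to accept but that are too far apart.
    return {int(r): int(c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist}

# Hypothetical distances where a greedy, row-by-row choice is suboptimal.
dist = np.array([[0.10, 0.20],
                 [0.15, 0.90]])
print(hungarian_assign(dist))  # {0: 1, 1: 0}
```

A row-by-row greedy choice would give detection 0 to track 0 and leave track 1 with a distance of 0.90, above the threshold. The optimal assignment matches both tracks, with a total cost of 0.35.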

Let's give it a go:

In [18]:
la_tracker = iou_tracker.TrackerLinearAssignment(obj_detect, 
                                     t_missing=t_missing, 
                                     det_thresh=objectness_thresh, 
                                     iou_thresh=iou_thresh)

In [19]:
results_seq_la, mot_accums_la = run_tracker(sequences, 
                                            la_tracker, 
                                            output_dir_tracker2)

Tracking: MOT16-10


  0%|          | 0/654 [00:00<?, ?it/s]

Tracks found: 90
Runtime for MOT16-10: 87.8 s.
Writing predictions to: /kaggle/working/deep_tracker/output_la_test/MOT16-10.txt
Tracking: MOT16-11


  0%|          | 0/900 [00:00<?, ?it/s]

Tracks found: 57
Runtime for MOT16-11: 120.7 s.
Writing predictions to: /kaggle/working/deep_tracker/output_la_test/MOT16-11.txt
Tracking: MOT16-13


  0%|          | 0/750 [00:00<?, ?it/s]

Tracks found: 65
Runtime for MOT16-13: 102.0 s.
Writing predictions to: /kaggle/working/deep_tracker/output_la_test/MOT16-13.txt


In [20]:
_ = evaluate_tracker(mot_accums_la, sequences)

          IDF1   IDP   IDR  Rcll  Prcn  GT  MT PT ML   FP   FN  IDs   FM  MOTA  MOTP  IDt IDa IDm
MOT16-10 31.6% 32.0% 31.2% 82.6% 84.7%  57  38 18  1 1921 2235  540  191 63.4% 0.153  462  53  28
MOT16-11 41.7% 44.1% 39.5% 81.0% 90.3%  75  42 28  5  821 1790   88   43 71.4% 0.083  100  20  41
MOT16-13 40.5% 39.7% 41.4% 86.8% 83.1% 110  85 22  3 2050 1535  473  164 65.1% 0.139  464  20  65
OVERALL  37.5% 37.9% 37.0% 83.6% 85.5% 242 165 68  9 4792 5560 1101  398 66.2% 0.129 1026  93 134


## 4.3 IoU + CNN features tracker

This tracker is an extension of the baseline tracker. It accepts a deep Convolutional Neural Network (CNN) and computes features for the image patches corresponding to the bounding boxes. Those features are used to build a distance matrix by calculating the cosine distance for every detection-track pair in each timestep.

The changes to the baseline are:
- Implemented a new Track specification (class name is DeepTrack) that has an attribute to store a feature vector.
- Implemented a function "get_vector", to get latent feature vectors, using the CNN and an image patch.
- Implemented a function "get_patches_features" that computes latent features for every image patch, corresponding to all detections in the current image.
- Implemented a function "cosine_matrix" to compute the cosine distance for every pair detection-track using the feature vectors from the CNN.
- Modified the function "data_association" to compute the distance matrix as a weighted sum of the form:
$$
D = \alpha \cdot \mathrm{CosineDistance} + \beta \cdot \mathrm{IoUDistance}
$$
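
The weighted sum can be sketched as follows. This is a simplified illustration, not the code of `cosine_matrix` or `data_association`: the helper names `iou_distance` and `compound_distance`, the `(x1, y1, x2, y2)` box format, and the use of `scipy.spatial.distance.cdist` are assumptions. The default weights mirror the `alpha` and `beta` macros defined earlier.

```python
import numpy as np
from scipy.spatial.distance import cdist

def iou_distance(track_boxes, det_boxes):
    """Return 1 - IoU for every (track, detection) pair.

    Boxes are assumed to be (x1, y1, x2, y2) with positive area.
    """
    t = np.asarray(track_boxes, dtype=float)[:, None, :]  # shape (T, 1, 4)
    d = np.asarray(det_boxes, dtype=float)[None, :, :]    # shape (1, D, 4)
    iw = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    ih = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = iw * ih
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return 1.0 - inter / (area_t + area_d - inter)

def compound_distance(track_feats, det_feats, track_boxes, det_boxes,
                      alpha=0.3, beta=0.7):
    """D = alpha * CosineDistance + beta * IoUDistance, shape (T, D)."""
    cos = cdist(np.asarray(track_feats, dtype=float),
                np.asarray(det_feats, dtype=float), metric="cosine")
    return alpha * cos + beta * iou_distance(track_boxes, det_boxes)

# Hypothetical 2-D features and boxes for one track and two detections.
D = compound_distance(
    track_feats=[[1.0, 0.0]],
    det_feats=[[1.0, 0.0], [0.0, 1.0]],
    track_boxes=[[0, 0, 10, 10]],
    det_boxes=[[5, 0, 15, 10], [20, 20, 30, 30]],
)
print(D.shape)  # (1, 2)
```

In this example the first detection has the same appearance as the track and an IoU of 1/3, so its distance is 0.7 · 2/3 ≈ 0.47. The second detection differs in both appearance and position, so its distance is 1.0.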

To run it, we first need to define the CNN that will extract the features from the image patches. This time we will use the small version of EfficientNetV2:

In [21]:
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

# Initialize model with the best available weights
weights = EfficientNet_V2_S_Weights.DEFAULT
model = efficientnet_v2_s(weights=weights)
model.to(device)
model.classifier = model.classifier[:-1] 
model.eval()

Downloading: "https://download.pytorch.org/models/efficientnet_v2_s-dd5fe13b.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_v2_s-dd5fe13b.pth
100%|██████████| 82.7M/82.7M [00:01<00:00, 77.3MB/s]


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 24, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): FusedMBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
        )
        (stochastic_depth): StochasticDepth(p=0.0, mode=row)
      )
      (1): FusedMBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (1): BatchNorm2d(24, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
  

Then we run the tracker:

In [22]:
deeps_tracker = deep_tracker.DeepSuperTracker(obj_detect, 
                                           model,
                                           weights,
                                           t_missing=t_missing, 
                                           det_thresh=objectness_thresh, 
                                           iou_thresh=iou_thresh,
                                           alpha=alpha,
                                           beta=beta)

In [23]:
results_seq_dt, mot_accums_dt = run_tracker(sequences, 
                                            deeps_tracker, 
                                            output_dir_tracker3)

Tracking: MOT16-10


  0%|          | 0/654 [00:00<?, ?it/s]

Tracks found: 55
Runtime for MOT16-10: 487.1 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_test/MOT16-10.txt
Tracking: MOT16-11


  0%|          | 0/900 [00:00<?, ?it/s]

Tracks found: 51
Runtime for MOT16-11: 399.6 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_test/MOT16-11.txt
Tracking: MOT16-13


  0%|          | 0/750 [00:00<?, ?it/s]

Tracks found: 76
Runtime for MOT16-13: 484.3 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_test/MOT16-13.txt


In [24]:
_ = evaluate_tracker(mot_accums_dt, sequences)

          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML   FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-10 40.2% 58.3% 30.7% 49.0% 93.1%  57 18 25 14  469  6553 157  120 44.1% 0.137 128  26  19
MOT16-11 54.6% 72.1% 44.0% 58.0% 95.0%  75 18 29 28  286  3964  26   33 54.7% 0.080  37  10  21
MOT16-13 44.1% 60.1% 34.8% 53.0% 91.6% 110 37 45 28  565  5467 127  106 47.1% 0.131 148  31  55
OVERALL  45.7% 63.1% 35.8% 52.9% 93.1% 242 73 99 70 1320 15984 310  259 48.1% 0.118 313  67  95


In [25]:
_.loc["OVERALL", "mota"]

0.48067340861514873

## 4.4 IoU + CNN features + linear assignment tracker
This tracker extends DeepSuperTracker, implementing the Hungarian algorithm with the same compound distance:

$$
D = \alpha \cdot \mathrm{CosineDistance} + \beta \cdot \mathrm{IoUDistance},
$$

for the assignment of detections to the tracks. The changes are:

- Replaced the `np.nanargmin` assignment with a globally optimal detection assignment using the Hungarian algorithm, provided in `scipy.optimize.linear_sum_assignment`.


In [26]:
deep_la_tracker = deep_tracker.DeepHungarianTracker(obj_detect, 
                                                   model,
                                                   weights,
                                                   t_missing=t_missing, 
                                                   det_thresh=objectness_thresh, 
                                                   iou_thresh=iou_thresh,
                                                   alpha=alpha,
                                                   beta=beta)

In [27]:
results_seq_dl, mot_accums_dl = run_tracker(sequences, 
                                            deep_la_tracker, 
                                            output_dir_tracker4)

Tracking: MOT16-10


  0%|          | 0/654 [00:00<?, ?it/s]

Tracks found: 98
Runtime for MOT16-10: 498.9 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_la_test/MOT16-10.txt
Tracking: MOT16-11


  0%|          | 0/900 [00:00<?, ?it/s]

Tracks found: 67
Runtime for MOT16-11: 399.6 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_la_test/MOT16-11.txt
Tracking: MOT16-13


  0%|          | 0/750 [00:00<?, ?it/s]

Tracks found: 70
Runtime for MOT16-13: 495.4 s.
Writing predictions to: /kaggle/working/deep_tracker/output_deep_la_test/MOT16-13.txt


In [28]:
_ = evaluate_tracker(mot_accums_dl, sequences)

          IDF1   IDP   IDR  Rcll  Prcn  GT  MT PT ML   FP   FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-10 34.7% 35.2% 34.2% 82.6% 85.0%  57  38 18  1 1877 2240 494  191 64.1% 0.153 400  60  26
MOT16-11 39.8% 42.7% 37.3% 80.8% 92.6%  75  42 27  6  611 1807  83   42 73.5% 0.082  91  20  32
MOT16-13 49.2% 48.2% 50.3% 86.9% 83.3% 110  86 21  3 2029 1525 331  155 66.6% 0.137 340  19  59
OVERALL  41.2% 41.9% 40.6% 83.6% 86.3% 242 166 66 10 4517 5572 908  388 67.6% 0.128 831  99 117
