# **Mapping and Perception for an autonomous robot (0510-7951)**
---
* **Exercise 4-section 2: Detection, Segmentation and Multi Object Tracking**

In this exercise we provide you with a baseline multi-object tracker on the [MOT16](https://motchallenge.net/data/MOT16/) dataset. Your task is to analyze detection and tracking results based on different techniques from the lectures. 
We will use the [MOT17Det Dataset](https://arxiv.org/pdf/1603.00831.pdf). See the [webpage](https://motchallenge.net/) for video sequences with ground truth annotation.

* **Goals**

1. Experience with [**MOT challenge Dataset**](https://motchallenge.net/) and pytorch!

2. Object Detection with [**Faster R-CNN** ]( https://www.learnopencv.com/faster-r-cnn-object-detection-with-pytorch/)

3. Instance segmentation based on [**Mask-RCNN**](https://arxiv.org/abs/1703.06870)

4. Object Detection with YOLOv5 [**YOLOv5 in pytorch**]( https://pytorch.org/hub/ultralytics_yolov5/)

5. Object Detection (Faster R-CNN) + Multiple Object(ID) Tracking with Simple Online and Realtime Tracking [**SORT**](https://arxiv.org/abs/1602.00763)

6. Multi Target Tracking: Run given baseline of Multiple Object(ID) Tracking and compare to current state-of-the-art multi-object tracker [**Tracktor++**](https://arxiv.org/abs/1903.05625)


* **Please copy to the report:**
1. Outputs: images, tables, scores, etc.
2. Code ("TODO" section)
3. Performance, analysis and your explanations. 
4. Attach the completed notebook to the report package. 
* ## Setup

 Download and extract project data to your Google Drive

1.   Install Google Drive on your desktop.
2.   Save this notebook to your Google Drive by clicking `Save a copy in Drive` from the `File` menu.
3.   Download the attached **exercise.rar** file to your desktop and extract it into the `Colab Notebooks` folder in your Google Drive.
4. Go to the [**MOT 16 dataset**](https://motchallenge.net/data/MOT16/) page, scroll to the download section at the bottom and choose "Get all data", then copy the MOT16 dataset to `\exercise\data\MOT16`.
5.   Wait until Google Drive has finished synchronising. (This might take a while.)


* **TODO**

1. Student name: Nadav Magal

2. ID: 303010326

#### Connect the notebook to your Google Drive

In [1]:
# !pip freeze

In [2]:
from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [3]:
# terminal command
# !wget -P /content/gdrive/MyDrive/MPAR/fourth_project/exercise/data https://motchallenge.net/data/MOT16.zip


In [4]:
# !unzip /content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16.zip -d /content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16/

In [5]:
!ls "/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16/train"

MOT16-02  MOT16-04  MOT16-05  MOT16-09	MOT16-10  MOT16-11  MOT16-13


In [6]:
root_dir = "/content/gdrive/MyDrive/MPAR/fourth_project/exercise"

The `root_dir` path points to the directory and the content in your Google Drive.

In [7]:
# !ls "gdrive/My Drive/Colab Notebooks/perception"
# !ls "gdrive/My Drive/Colab Notebooks/perception/src/tracker"
!ls -l "/content/gdrive/MyDrive/MPAR/fourth_project/exercise"
!ls -l "/content/gdrive/MyDrive/MPAR/fourth_project/exercise/src/tracker"

total 16
drwx------ 3 root root 4096 Jun 27 15:51 data
drwx------ 2 root root 4096 Jun 27 15:51 models
drwx------ 2 root root 4096 Jun 27 15:51 output
drwx------ 2 root root 4096 Jun 27 15:51 src
total 35
-rw------- 1 root root 10825 Jun 28 15:46 data_obj_detect.py
-rw------- 1 root root  8516 Jun 28 15:39 data_track.py
-rw------- 1 root root     0 Nov 20  2019 __init__.py
-rw------- 1 root root   658 Nov 20  2019 object_detector.py
drwx------ 2 root root  4096 Jun 27 15:51 __pycache__
-rw------- 1 root root  1874 Nov 21  2019 tracker.py
-rw------- 1 root root  8700 Jul  3 10:13 utils.py


In [8]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import os
import sys
sys.path.append(os.path.join(root_dir, 'src'))

!pip install tqdm lap
!pip install https://github.com/timmeinhardt/py-motmetrics/archive/fix_pandas_deprecating_warnings.zip

Collecting lap
  Downloading https://files.pythonhosted.org/packages/bf/64/d9fb6a75b15e783952b2fec6970f033462e67db32dc43dfbb404c14e91c2/lap-0.4.0.tar.gz (1.5MB)
Building wheels for collected packages: lap
  Building wheel for lap (setup.py) ... done
  Created wheel for lap: filename=lap-0.4.0-cp37-cp37m-linux_x86_64.whl size=1590135 sha256=5a2c5bbd677b1e694e75c1eada6515def9dee57049d50c84d5c5bd4ce00e8fc6
  Stored in directory: /root/.cache/pip/wheels/da/3e/af/eddcd6ffaa27df8d0ddac573758f8953c4e57c64c4c8c8b7d0
Successfully built lap
Installing collected packages: lap
Successfully installed lap-0.4.0
Building wheels for collected packages: motmetrics
  Building wheel for motmetrics (setup.py) ... done
  Created wheel for motmetrics: filename=motmetrics-1.1.3-cp37-none-any.whl size=134200 sha256=adbcf958d45a451b1c5d958cee8584cb84fe2c8fc712fbc561c377076e8d9c90
  Stored in di

In [9]:
import matplotlib.pyplot as plt
import numpy as np
import time
from tqdm.autonotebook import tqdm

import torch
from torch.utils.data import DataLoader

from tracker.data_track import MOT16Sequences
from tracker.data_obj_detect import MOT16ObjDetect
from tracker.object_detector import FRCNN_FPN ##FasterRCNN
from tracker.tracker import Tracker
from tracker.utils import (plot_sequence, evaluate_mot_accums, get_mot_accum,
                           evaluate_obj_detect, obj_detect_transforms)

import motmetrics as mm
mm.lap.default_solver = 'lap'



# MOT16 dataset

The MOT16 challenge provides 7 train and 7 test video sequences with multiple objects (pedestrians) per frame. It includes many challenging scenarios with camera movement, high crowdedness and object occlusions. See the [webpage](https://motchallenge.net/data/MOT16/) for video sequences with ground truth annotation.

The `MOT16Sequences` dataset class makes it possible to load single sequences, e.g., `seq_name = 'mot16_02'`, or the entire train/test set, e.g., `seq_name = 'mot16_train'`.

In [10]:
# !ls "gdrive/My Drive/Colab Notebooks/perception/data/MOT16/train"
# !ls "gdrive/My Drive/Colab Notebooks/perception/data/MOT16/test"
!ls "/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16/train"
!ls "/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16/test"

MOT16-02  MOT16-04  MOT16-05  MOT16-09	MOT16-10  MOT16-11  MOT16-13
MOT16-01  MOT16-03  MOT16-06  MOT16-07	MOT16-08  MOT16-12  MOT16-14


In order to compare the tracking performance of different trackers without the effect of the object detector, the MOTChallenge provides a precomputed set of public object detections. Trackers are then evaluated on their capabilities to form tracks with the provided set. However, we want to allow you to improve on the object detections as well. Therefore, we participate in the MOT16 challenge with private detections.

## Instance segmentations

We provide the instance segmentations for the sequences `02`, `05`, `09` and `11`. These can be used for example to train a method which improves the bounding box position in occluded situations. See the original MOTS [webpage](https://www.vision.rwth-aachen.de/page/mots) for more info.

In [11]:
from tracker.data_track import MOT16Sequences
from tracker.data_obj_detect import MOT16ObjDetect
from tracker.object_detector import FRCNN_FPN #FasterRCNN
from tracker.tracker import Tracker

In [None]:
seq_name = 'MOT16-05' #TODO: select the sequence number (#) closest to the last digit of your ID
data_dir = os.path.join(root_dir, 'data/MOT16')
print(data_dir)
sequences = MOT16Sequences(seq_name, data_dir, load_seg=True)

/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16


1.1 TODO: Select an image from the sequence according to the last digit of your ID x 10. Copy the results to the report

1.2 TODO: Display the image + Bounding boxes(GT)

1.3 TODO: Display the mask image of segmentation (GT)



In [None]:
seq = sequences[0]
#1.1 TODO: Select an image from the sequence according to the last digit of your ID x 10. Copy results to the report
frame = seq[60] # select the image according to the last digit of your ID x 10
img=frame['img']
gt=frame['gt']
# 1.2 TODO: Display the image + Bounding boxes(GT)
# hint- gt_id, Bounding boxes in gt.items()
img_np = img.mul(255).permute(1, 2, 0).byte().numpy()
# img is a (C, H, W) tensor, so read the size from the (H, W, C) array
height, width, _ = img_np.shape

dpi = 96
fig, ax = plt.subplots(1, dpi=dpi)
fig.set_size_inches(width / dpi, height / dpi)
ax.set_axis_off()
ax.imshow(img_np)

for j, t in gt.items():
  t_i = t
  ax.add_patch(
      plt.Rectangle(
          (t_i[0], t_i[1]),
          t_i[2] - t_i[0],
          t_i[3] - t_i[1],
          fill=False,
          linewidth=1.0
      ))

  ax.annotate(j, (t_i[0] + (t_i[2] - t_i[0]) / 2.0, t_i[1] + (t_i[3] - t_i[1]) / 2.0),
              weight='bold', fontsize=6, ha='center', va='center')

plt.axis('off')
# plt.tight_layout()
plt.show()

# TODO# * (1.2)

In [None]:
seg_img = frame['seg_img']
# 1.3 TODO: Display the mask image of segmentation (GT)
from skimage import io, color

plt.figure()
img_np = img.mul(255).permute(1, 2, 0).byte().numpy()
io.imshow(color.label2rgb(seg_img, img_np, colors=[(255, 0, 0), (0, 0, 255), (0, 255, 0)], alpha=0.01, bg_label=0, bg_color=None))
plt.show(block=False)

2.1 TODO: Implement instance segmentation based on Mask R-CNN (in PyTorch)

2.2 Run it on the same image (copy results to the report) and compare the results to the given instance segmentation GT 

Bonus- use IoU/Dice score metrics(+3)
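
For the bonus, one possible way to score a predicted mask against a ground-truth mask is sketched here. It assumes both masks are binary arrays of the same shape; the function name `mask_iou_dice` is illustrative and not part of the exercise code. If the ground-truth `seg_img` stores one integer label per instance (an assumption), a per-instance mask could be formed as `seg_img == label` before calling it.

```python
import numpy as np

def mask_iou_dice(pred_mask, gt_mask):
    """Return (IoU, Dice) for two binary masks of identical shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    # Two empty masks agree perfectly by convention.
    iou = intersection / union if union > 0 else 1.0
    dice = 2 * intersection / total if total > 0 else 1.0
    return float(iou), float(dice)

# Toy example: two overlapping 2x2 squares on a 4x4 grid.
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 0:2] = True
iou, dice = mask_iou_dice(pred, gt)
print(f"IoU={iou:.3f}, Dice={dice:.3f}")
```

Both scores lie in [0, 1]; Dice is never smaller than IoU for the same pair of masks.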

In [None]:
import os
from os.path import exists, join, basename, splitext

import random
import PIL
import torchvision
import cv2
import numpy as np
import torch
torch.set_grad_enabled(False)
  
import time
import matplotlib
import matplotlib.pylab as plt
plt.rcParams["axes.grid"] = False

#2.1 TODO: Implement instance segmentation based on Mask R-CNN (in PyTorch)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model = model.eval().cuda()

In [None]:
coco_names = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]

In [None]:
COLORS = np.random.uniform(0, 255, size=(len(coco_names), 3))

In [None]:
def get_outputs(image, model, threshold):
    with torch.no_grad():
        # forward pass of the image through the model
        outputs = model(image)
    
    # get all the scores (torchvision returns detections sorted by descending score)
    scores = outputs[0]['scores'].detach().cpu().numpy()
    # number of detections whose score is above the threshold
    thresholded_preds_count = int((scores > threshold).sum())
    # binarize the masks; squeeze only the channel axis so a single detection keeps shape (1, H, W)
    masks = (outputs[0]['masks']>0.5).squeeze(1).detach().cpu().numpy()
    # discard masks for objects which are below threshold
    masks = masks[:thresholded_preds_count]
    # get the bounding boxes, in (x1, y1), (x2, y2) format
    boxes = [[(int(i[0]), int(i[1])), (int(i[2]), int(i[3]))]  for i in outputs[0]['boxes'].detach().cpu()]
    # discard bounding boxes below threshold value
    boxes = boxes[:thresholded_preds_count]
    # get the class labels of the kept detections
    labels = [coco_names[int(i)] for i in outputs[0]['labels'][:thresholded_preds_count]]
    return masks, boxes, labels

In [None]:
def draw_segmentation_map(image, masks, boxes, labels):
    alpha = 1 
    beta = 0.6 # transparency for the segmentation map
    gamma = 0 # scalar added to each sum
    # convert the input (PIL image or array) into a NumPy array once, before the loop;
    # the image stays in RGB because the result is displayed with matplotlib
    image = np.array(image)
    for i in range(len(masks)):
        red_map = np.zeros_like(masks[i]).astype(np.uint8)
        green_map = np.zeros_like(masks[i]).astype(np.uint8)
        blue_map = np.zeros_like(masks[i]).astype(np.uint8)
        # apply a random color mask to each object
        color = COLORS[random.randrange(0, len(COLORS))]
        red_map[masks[i] == 1], green_map[masks[i] == 1], blue_map[masks[i] == 1]  = color
        # combine all the masks into a single image
        segmentation_map = np.stack([red_map, green_map, blue_map], axis=2)
        # apply mask on the image
        cv2.addWeighted(image, alpha, segmentation_map, beta, gamma, image)
        # draw the bounding boxes around the objects
        cv2.rectangle(image, boxes[i][0], boxes[i][1], color=color, 
                      thickness=2)
        # put the label text above the objects
        cv2.putText(image , labels[i], (boxes[i][0][0], boxes[i][0][1]-10), 
                    cv2.FONT_HERSHEY_SIMPLEX, 1, color, 
                    thickness=2, lineType=cv2.LINE_AA)
    
    return image

In [None]:
img = frame['img']
import torchvision.transforms as transforms

#2.2 TODO: Run it on the same image (copy results to the report) and compare the results to the given instance segmentation GT
#hint use: torchvision.transforms.functional.to_tensor(pil_image_single).cuda()
# https://debuggercafe.com/instance-segmentation-with-pytorch-and-mask-r-cnn/
trans_to_pil = transforms.ToPILImage()
image_pil = trans_to_pil(img)
transform = transforms.Compose([transforms.ToTensor()])
img = transform(image_pil)
image = img.unsqueeze(0).cuda()

masks, boxes, labels = get_outputs(image, model, 0.965)
instance_seg_image = draw_segmentation_map(img_np, masks, boxes, labels)
plt.figure()
plt.imshow(instance_seg_image)
plt.show(block=False)

# Object detector

We provide you with an object detector pretrained on the MOT challenge training set. This detector can be used and improved to generate the framewise detections necessary for the subsequent tracking and data association step.

The object detector is a [Faster R-CNN](https://arxiv.org/abs/1506.01497) with a Resnet50 feature extractor. We trained the native PyTorch implementation of Faster-RCNN. For more information check out the corresponding PyTorch [webpage](https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html).

The pretrained Faster R-CNN ResNet-50 model that we are going to use expects the input image tensor to be in the form [n, c, h, w] and have a min size of 800px, where:

* n is the number of images
* c is the number of channels (3 for RGB images)
* h is the height of the image
* w is the width of the image

The model will return:

* Bounding boxes [x0, y0, x1, y1] for all predicted objects, of shape (N, 4), where N is the number of objects the model detects in the image.
* Labels of all the predicted objects.
* Scores of each predicted label.

In [None]:
# !ls "gdrive/My Drive/Colab Notebooks/perception/models"
!ls "/content/gdrive/MyDrive/MPAR/fourth_project/exercise/models"


## Configuration

In [None]:
obj_detect_model_file = os.path.join(root_dir, 'models/faster_rcnn_fpn.model')
obj_detect_nms_thresh = 0.3


In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

# object detector
obj_detect = FRCNN_FPN(num_classes=2, nms_thresh=obj_detect_nms_thresh)
obj_detect_state_dict = torch.load(obj_detect_model_file,
                                   map_location=lambda storage, loc: storage)
obj_detect.load_state_dict(obj_detect_state_dict)
obj_detect.eval()
obj_detect.to(device)



* **Faster R-CNN results**

3.1 Run object detection (Faster R-CNN) on the same image and compare results to given detection GT 

Bonus: use IoU/Dice score metrics (+3)
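
For the bonus, a minimal sketch of how detections might be scored against ground-truth boxes is given here. It assumes boxes in `[x0, y0, x1, y1]` format and a ground-truth dictionary mapping ids to boxes (as in `gt.items()`), with tensors already converted to plain lists. The helper names `box_iou_dice` and `best_match_per_gt` are illustrative and not part of the exercise code.

```python
def box_iou_dice(box_a, box_b):
    """Return (IoU, Dice) for two boxes in [x0, y0, x1, y1] format."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    dice = 2 * inter / (area_a + area_b) if (area_a + area_b) > 0 else 0.0
    return iou, dice

def best_match_per_gt(gt_boxes, det_boxes):
    """For every ground-truth id, return (IoU, Dice) of its best-overlapping detection."""
    result = {}
    for gt_id, gt_box in gt_boxes.items():
        pairs = [box_iou_dice(gt_box, det) for det in det_boxes]
        # Dice grows with IoU, so the tuple maximum is the best pair under both.
        result[gt_id] = max(pairs, default=(0.0, 0.0))
    return result

# Toy example: one partially detected pedestrian and one missed pedestrian.
gt_boxes = {1: [0, 0, 10, 10], 2: [50, 50, 60, 60]}
det_boxes = [[0, 0, 10, 5], [100, 100, 110, 110]]
match_scores = best_match_per_gt(gt_boxes, det_boxes)
for gt_id, (iou, dice) in match_scores.items():
    print(f"GT {gt_id}: IoU={iou:.2f}, Dice={dice:.2f}")
```

A ground-truth box whose best IoU is 0 corresponds to a missed detection.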

In [None]:
# TODO 3.1 Run object detection on the same image and compare results to given detection GT
np_img_for_detection = img_np.copy()
bboxes, scores = obj_detect.detect(image)
print(bboxes)
for cur_bbox in bboxes:
  # OpenCV expects integer pixel coordinates
  x0, y0, x1, y1 = (int(v) for v in cur_bbox)
  np_img_for_detection = cv2.rectangle(np_img_for_detection, (x0, y0), (x1, y1), (255,0,0), 2)
plt.figure()
plt.imshow(np_img_for_detection)


Next, run the following evaluation of the object detector on the training set. You should obtain an evaluation result in the following form:


3.2 TODO: Copy your results to report

AP:  Prec:  Rec:  TP:  FP: 

3.3 What can we learn from the performance?

3.4 Describe pros & cons of the algorithm (at least one of each), based on the attached paper
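
To interpret the reported numbers, it can help to recall how precision and recall follow from the raw counts. The sketch below uses the standard definitions; how `evaluate_obj_detect` computes them internally is not shown in this notebook, and the counts in the example are made up for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Illustrative counts only, not results from this notebook.
prec, rec = precision_recall(tp=80, fp=20, fn=40)
print(f"Prec={prec:.3f}, Rec={rec:.3f}")
```

A detector with high precision but low recall reports few false boxes yet misses many pedestrians, which is worth keeping in mind when answering 3.3.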


In [None]:
data_dir = os.path.join(root_dir, 'data/MOT16/train')
dataset_test = MOT16ObjDetect(data_dir,
                              obj_detect_transforms(train=False))
def collate_fn(batch):
    return tuple(zip(*batch))
data_loader_test = DataLoader(
    dataset_test, batch_size=1, shuffle=False, num_workers=4,
    collate_fn=collate_fn)

if False:
  evaluate_obj_detect(obj_detect, data_loader_test)
#3.2 TODO: Copy your results to report

#AP: Prec: Rec: TP: FP:

#3.3 what can we learn from the performance?


In [None]:
# **YOLOv5 Detector results**

4.1 TODO: Implement object detection based on YOLOv5 in PyTorch (pretrained model). 

4.2 Run the algorithm on your image. 

What can we learn from the results? Which detector architecture has better performance?

4.3 Describe pros & cons of the algorithm (at least one of each). Use the attached paper

In [None]:
!pip install -qr https://raw.githubusercontent.com/ultralytics/yolov5/master/requirements.txt  # install dependencies


In [None]:
import torch
# 4.1 TODO: Implement object detection based on YOLOV5 in PyTorch (pretrained model).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

In [None]:
# 4.2 Run the algorithm on your image.
#TODO#
results = model('/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16/train/MOT16-05/img1/000060.jpg')
results.show()

# Multi-object tracking

A. In this section you will run detection and MOT on your training folder

1. Run object detection with Faster R-CNN and save the results to a json file

2. Load the json file and run Multiple Object (ID) Tracking with the Simple Online and Realtime Tracking (SORT) algorithm

In [None]:
from os.path import join

In [None]:
import sys
MOT_PATH = '/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/MOT16'
motdata = join(MOT_PATH,'train/MOT16-05/img1/')
sys.path.append(motdata)

In [None]:

import matplotlib.pylab as plt
import cv2

list_motdata = os.listdir(motdata)  
list_motdata.sort()

img_ex_path = motdata + list_motdata[0]
img_ex_origin = cv2.imread(img_ex_path)
img_ex = cv2.cvtColor(img_ex_origin, cv2.COLOR_BGR2RGB)

plt.imshow(img_ex)
plt.axis('off')
plt.show()

In [None]:
# Import required packages/modules first

from PIL import Image
import numpy as np
import torch
import torchvision
from torchvision import transforms as T

In [None]:
# Download the pretrained Faster R-CNN model from torchvision
##TODO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, min_size=800)
model.eval()

In [None]:
# Define the class names given by PyTorch's official Docs

COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
    'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A',
    'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
    'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
    'bottle', 'N/A', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
    'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
    'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table',
    'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
    'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book',
    'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]


In [None]:
# Define a function that returns the prediction result of the model

def get_prediction(img_path, threshold):
  img = Image.open(img_path) # Load the image
  transform = T.Compose([T.ToTensor()]) # Define the PyTorch transform
  img = transform(img) # Apply the transform to the image
  with torch.no_grad():
    pred = model([img]) # Pass the image to the model
  pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in list(pred[0]['labels'].numpy())] # Class names
  pred_boxes = [[(i[0], i[1]), (i[2], i[3])] for i in list(pred[0]['boxes'].detach().numpy())] # Bounding boxes
  pred_score = list(pred[0]['scores'].detach().numpy()) # Prediction scores, sorted in descending order
  # Count the detections above the threshold; this also handles the case where none qualify
  num_kept = sum(1 for x in pred_score if x > threshold)
  pred_boxes = pred_boxes[:num_kept]
  pred_class = pred_class[:num_kept]
  return pred_boxes, pred_class,pred_score

1. The image is loaded from the image path.
2. The image is converted to an image tensor using PyTorch’s transforms.
3. The image is passed through the model to get the predictions.
4. Classes and box coordinates are obtained, but only predictions with score > threshold are kept.
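
The filtering step can be sketched independently of the model. The version below is a hypothetical helper (not part of the exercise code) that keeps every detection whose score exceeds the threshold; it does not rely on the scores being sorted and returns empty lists when nothing qualifies.

```python
def keep_above_threshold(boxes, labels, scores, threshold):
    """Keep detections with score > threshold, for any score ordering."""
    kept = [(b, l, s) for b, l, s in zip(boxes, labels, scores) if s > threshold]
    if not kept:
        return [], [], []
    kept_boxes, kept_labels, kept_scores = (list(x) for x in zip(*kept))
    return kept_boxes, kept_labels, kept_scores

# Toy detections standing in for a model output.
boxes = [[0, 0, 1, 1], [2, 2, 3, 3], [4, 4, 5, 5]]
labels = ['person', 'car', 'person']
scores = [0.95, 0.85, 0.40]

confident = keep_above_threshold(boxes, labels, scores, threshold=0.8)
nothing = keep_above_threshold(boxes, labels, scores, threshold=0.99)
print(confident)
print(nothing)
```

Raising the threshold trades recall for precision: fewer boxes survive, but those that do are more likely to be correct.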

In [None]:
# Define an API function for object detection

def object_detection_api(img_path, threshold=0.5, rect_th=3, text_size=1.5, text_th=3):
 
    img = cv2.imread(img_path)  # Read image with cv2
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # Convert to RGB

    boxes, pred_cls,pred_score = get_prediction(img_path, threshold)  # Get predictions

    for i in range(len(boxes)):
        # OpenCV expects integer pixel coordinates
        top_left = (int(boxes[i][0][0]), int(boxes[i][0][1]))
        bottom_right = (int(boxes[i][1][0]), int(boxes[i][1][1]))
        cv2.rectangle(img, top_left, bottom_right, color=(0, 255, 0),
                      thickness=rect_th)  # Draw Rectangle with the coordinates
        cv2.putText(img, pred_cls[i], top_left, cv2.FONT_HERSHEY_SIMPLEX, text_size, (0, 255, 0),
                    thickness=text_th)  # Write the prediction class
    plt.figure(figsize=(15, 20))  # display the output image
    plt.imshow(img)
    plt.xticks([])
    plt.yticks([])
    plt.show(block=False)

1. The prediction is obtained from the `get_prediction` method.
2. For each prediction, the bounding box is drawn and the class label is written with OpenCV.
3. The final image is displayed.

In [None]:
# Example: After detection
object_detection_api(img_ex_path,threshold=0.8)

The picture above is an example of applying a detection network (in our case, Faster R-CNN).
Since the dataset we are using is intended for tracking, you can see that most of the detected classes are 'person'.
We need prediction results (bounding box coordinates, class labels, prediction scores) for at least 100 images.
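
One way to lay out these per-image predictions for storage is a dictionary keyed by image file name, where each value is a list of detection records with the keys `bbox`, `labels` and `scores`. The sketch below only illustrates this layout and a JSON round trip; the helper name `make_frame_entry` and the sample values are made up.

```python
import json

def make_frame_entry(boxes, label_ids, scores):
    """Build the list of detection records stored under one image file name."""
    return [
        {"bbox": [float(v) for v in box], "labels": int(label), "scores": float(score)}
        for box, label, score in zip(boxes, label_ids, scores)
    ]

detections = {
    "000001.jpg": make_frame_entry([[10.0, 20.0, 50.0, 120.0]], [1], [0.97]),
    "000002.jpg": make_frame_entry([], [], []),  # a frame without detections
}

# Serialize and restore to confirm the structure survives the round trip.
serialized = json.dumps(detections, sort_keys=True)
restored = json.loads(serialized)
print(restored["000001.jpg"][0]["bbox"])
```

Converting values to plain `float` and `int` before dumping avoids serialization errors for NumPy types such as `np.float32`.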

In [None]:
# save json file 
json_detection_fp = r'/content/gdrive/MyDrive/MPAR/fourth_project/exercise/data/data.json'

if False:
  import glob
  import json

  json_dict = {}

  path_to_data = join(MOT_PATH,'train/MOT16-05/img1/*jpg') # fill the correct path

  for file_ in sorted(glob.glob(path_to_data)):
    print('start processing ' + file_.split('/')[-1])
    boxes, pred_cls, score = get_prediction(file_, threshold=0.8)
    frames_list = []
    for i in range(len(boxes)):
      frames_list.append({'bbox' : [np.float64(boxes[i][0][0]), np.float64(boxes[i][0][1]), np.float64(boxes[i][1][0]) ,np.float64(boxes[i][1][1])],
                          'labels' : COCO_INSTANCE_CATEGORY_NAMES.index(pred_cls[i]) ,
                          'scores' : np.float64(score[i]) 
                          })
    json_dict[file_.split('/')[-1]]  =  frames_list

  with open(json_detection_fp, 'w') as outfile:
      json.dump(json_dict, outfile,sort_keys=True)
  print(json_dict)

**Object ID Tracking with SORT**

Simple Online and Realtime Tracking (SORT) algorithm for object ID tracking
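
SORT predicts each track's box with a Kalman filter and then assigns detections to the predicted boxes by solving an assignment problem on IoU. The sketch below illustrates only the assignment idea, and it deliberately swaps in a simpler greedy matching instead of the Hungarian algorithm, with no motion model. The function names and the `iou_min` default are illustrative and are not the API of the `sort` package.

```python
def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_associate(tracks, detections, iou_min=0.3):
    """Match track ids to detection indices, highest IoU first.

    tracks: dict {track_id: box}; detections: list of boxes.
    Returns (matches, unmatched_track_ids, unmatched_detection_indices).
    """
    candidates = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for overlap, tid, j in candidates:
        if overlap < iou_min:
            break  # all remaining pairs overlap even less
        if tid not in used_tracks and j not in used_dets:
            matches[tid] = j
            used_tracks.add(tid)
            used_dets.add(j)
    unmatched_tracks = [tid for tid in tracks if tid not in used_tracks]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched_tracks, unmatched_dets

# Two existing tracks, three detections in the new frame.
tracks = {1: [0, 0, 10, 10], 2: [50, 50, 60, 60]}
detections = [[52, 50, 62, 60], [1, 0, 11, 10], [200, 200, 210, 210]]
matches, lost, new = greedy_associate(tracks, detections)
print(matches, lost, new)
```

Matched detections keep the track's ID, unmatched detections would start new tracks, and unmatched tracks would eventually be dropped.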

In [None]:
MOT_PATH

In [None]:
# Git clone: SORT Algorithm
if False:
  !cd "{MOT_PATH}";git clone https://github.com/abewley/sort.git

# path of the cloned repository; defined outside the guard so later cells can use it
sort = join(MOT_PATH,'sort/')
sys.path.append(sort)

In [None]:
# requirement for sort
!cd "{sort}";pip install -r requirements.txt

In [None]:
# Optional: if error occurs, you might need to re-install scikit-image and imgaug
if False:
  !pip uninstall scikit-image
  !pip uninstall imgaug
  !pip install imgaug
  !pip install -U scikit-image

  import skimage
  print(skimage.__version__)

In [None]:
print(json_detection_fp)

In [None]:
# load json file
import json
import collections
from pprint import pprint
from sort import *

jsonpath=json_detection_fp # load the saved json file
with open(jsonpath) as data_file:    
   data = json.load(data_file)
odata = collections.OrderedDict(sorted(data.items()))

In [None]:
# Let's check out the loaded json file

pprint(odata)

In [None]:
img_path = motdata    # img root path

# Making new directory for saving results
save_path = join(MOT_PATH,'save/')
!mkdir "{save_path}"

In [None]:
mot_tracker = Sort()      # Tracker using SORT Algorithm

In [None]:
for key in odata.keys():   
    arrlist = []
    det_img = cv2.imread(os.path.join(img_path, key))
    overlay = det_img.copy()
    det_result = data[key] 
    
    for info in det_result:
        bbox = info['bbox']
        labels = info['labels']
        scores = info['scores']
        templist = bbox+[scores]
        
        if labels == 1: # label 1 is a person in MS COCO Dataset
            arrlist.append(templist)
            
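    # SORT expects an (N, 5) array; pass an empty (0, 5) array for frames without person detections
    dets = np.array(arrlist) if arrlist else np.empty((0, 5))
    track_bbs_ids = mot_tracker.update(dets)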
    
    mot_imgid = key.replace('.jpg','')
    newname = save_path + mot_imgid + '_SORT.jpg'
    print(mot_imgid)
    
    for j in range(track_bbs_ids.shape[0]):  
        ele = track_bbs_ids[j, :]
        x = int(ele[0])
        y = int(ele[1])
        x2 = int(ele[2])
        y2 = int(ele[3])
        track_label = str(int(ele[4])) 
        cv2.rectangle(det_img, (x, y), (x2, y2), (0, 255, 255), 4)
        cv2.putText(det_img, '#'+track_label, (x+5, y-10), 0,0.6,(0,255,255),thickness=2)
        
    cv2.imwrite(newname,det_img)

It's all done!

Finally, you get a sequence of images with a tracking ID for every detected person.

Prepare 'MOT_SORT.gif' for demo experience (at least 100 frames).

B. Baseline tracker and MOT analysis

In this section we provide you with a simple baseline tracker which predicts object detections for each frame and generates tracks by assigning current detections to previous detections.

## Configuration

In [None]:
seed = 12345
seq_name = 'MOT16-05' # TODO fill the correct index
data_dir = os.path.join(root_dir, 'data/MOT16')
output_dir = os.path.join(root_dir, 'output')

## Setup

In [None]:
use_dice = True

In [None]:
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
np.random.seed(seed)
torch.backends.cudnn.deterministic = True

# dataset
sequences = MOT16Sequences(seq_name, data_dir)

# tracker
class TrackerIoUAssignment(Tracker):

    def data_association(self, boxes, scores):
        if self.tracks:
            track_ids = [t.id for t in self.tracks]
            track_boxes = np.stack([t.box.numpy() for t in self.tracks], axis=0)
            
            distance = mm.distances.iou_matrix(track_boxes, boxes.numpy(), max_iou=0.5) 
            if use_dice:
                # iou_matrix returns the IoU distance d = 1 - IoU, so the
                # Dice distance is 1 - 2*IoU/(1 + IoU) = d / (2 - d)
                distance = np.divide(distance, 2 - distance)
            # print(distance)
            # update existing tracks
            remove_track_ids = []
            for t, dist in zip(self.tracks, distance):
                if np.isnan(dist).all():
                    remove_track_ids.append(t.id)
                else:
                    match_id = np.nanargmin(dist)
                    t.box = boxes[match_id]
            self.tracks = [t for t in self.tracks
                           if t.id not in remove_track_ids]

            # add new tracks
            new_boxes = []
            new_scores = []
            for i, dist in enumerate(np.transpose(distance)):
                if np.isnan(dist).all():
                    new_boxes.append(boxes[i])
                    new_scores.append(scores[i])
            self.add(new_boxes, new_scores)

        else:
            self.add(boxes, scores)
        

tracker = TrackerIoUAssignment(obj_detect)

## Run baseline tracker

In [None]:
time_total = 0
mot_accums = []
results_seq = {}
for seq in sequences:
    tracker.reset()
    now = time.time()

    print(f"Tracking: {seq}")

    data_loader = DataLoader(seq, batch_size=1, shuffle=False)

    for frame in tqdm(data_loader):
        tracker.step(frame)
    results = tracker.get_results()
    results_seq[str(seq)] = results

    if seq.no_gt:
        print(f"No GT evaluation data available.")
    else:
        mot_accums.append(get_mot_accum(results, seq))

    time_total += time.time() - now

    print(f"Tracks found: {len(results)}")
    print(f"Runtime for {seq}: {time.time() - now:.1f} s.")

    if use_dice:
        cur_output_dir = output_dir + '/dice/'
    else:
        cur_output_dir = output_dir

    os.makedirs(cur_output_dir, exist_ok=True)
    seq.write_results(results, os.path.join(cur_output_dir))
    
print(f"Runtime for all sequences: {time_total:.1f} s.")
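
To inspect the saved results, a small parser can help. The sketch below assumes that `seq.write_results` follows the MOTChallenge text format described in the MOT16 paper, i.e. one comma-separated line per box with the fields frame, id, bb_left, bb_top, bb_width, bb_height, conf, followed by three world-coordinate fields. This should be checked against the actual file; the helper names and the example line are made up.

```python
from collections import namedtuple

MotRow = namedtuple("MotRow", "frame track_id left top width height conf")

def parse_mot_line(line):
    """Parse the first seven fields of one MOTChallenge-style result line."""
    fields = line.strip().split(",")
    frame, track_id = int(fields[0]), int(fields[1])
    left, top, width, height, conf = (float(v) for v in fields[2:7])
    return MotRow(frame, track_id, left, top, width, height, conf)

def to_corner_box(row):
    """Convert (left, top, width, height) to [x1, y1, x2, y2]."""
    return [row.left, row.top, row.left + row.width, row.top + row.height]

# Made-up example line, not taken from the notebook output.
row = parse_mot_line("1,3,794.2,247.5,71.2,174.8,-1,-1,-1,-1")
print(row.frame, row.track_id, to_corner_box(row))
```

Grouping the parsed rows by `frame` shows which track IDs are alive in the first frames.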


**Multi object tracker analysis**

5.1  Go over the given baseline tracker. What is the main metric for the association between the new detections and existing tracks?

5.2   Open the saved txt file and explain the results for the initial frames according to the [MOT challenge paper](https://arxiv.org/pdf/1603.00831.pdf)  

5.3 Analyze the tracking results based on the [**Evaluation Measures**](https://motchallenge.net/results/3D_MOT_2015/?chl=3&orderBy=IDF1&orderStyle=DESC&det=Public) metrics.  

5.4  Change the given baseline tracker metric to Dice score and compare the tracking results to 5.3. Which metric has better performance?

5.5  Compare the tracking results (baseline tracker) to the given Tracktor++ performance. Why do you think [**Tracktor++**](https://arxiv.org/abs/1903.05625) achieves better results? What are the main differences between the trackers (based on the paper)?

5.6  Describe pros & cons of the SORT and Tracktor++ trackers (at least one pro and one con for each). Use the attached paper
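
For question 5.4 it may help to note that, for a single pair of boxes, Dice and IoU are tied by Dice = 2 * IoU / (1 + IoU). The sketch below checks this numerically and shows that both scores rank candidates in the same order, so a nearest-match rule alone picks the same detection under either metric; differences can only come from where a gating threshold falls. The function names are illustrative.

```python
def dice_from_iou(iou):
    """Dice coefficient implied by an IoU value in [0, 1]."""
    return 2 * iou / (1 + iou)

def iou_from_dice(dice):
    """Inverse mapping: IoU implied by a Dice value in [0, 1]."""
    return dice / (2 - dice)

# Overlaps of one track with four candidate detections (made-up values).
ious = [0.1, 0.5, 0.25, 0.9]
dices = [dice_from_iou(v) for v in ious]

# Both metrics agree on which candidate overlaps most.
best_by_iou = max(range(len(ious)), key=lambda i: ious[i])
best_by_dice = max(range(len(dices)), key=lambda i: dices[i])
print(best_by_iou, best_by_dice)
```

Because the mapping is strictly increasing, an IoU gate of 0.5 corresponds to a Dice gate of 2/3; applying the same numeric gate to both metrics is what can change the tracking results.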

In [None]:
evaluate_mot_accums(mot_accums,
                     [str(s) for s in sequences if not s.no_gt],
                     generate_overall=True)

The current state-of-the-art multi-object tracker [Tracktor++](https://arxiv.org/abs/1903.05625) achieves the following tracking results on the `MOT16-train` sequences:

              IDF1   IDP   IDR  Rcll  Prcn  GT  MT  PT  ML  FP    FN IDs   FM  MOTA  MOTP
    MOT16-02 45.8% 78.3% 32.4% 41.3% 99.8%  62   9  32  21  18 10909  59   68 40.9% 0.080
    MOT16-04 71.1% 90.3% 58.6% 64.7% 99.8%  83  32  29  22  71 16785  22   29 64.5% 0.096
    MOT16-05 64.0% 86.6% 50.7% 57.5% 98.1% 133  32  65  36  75  2942  37   59 55.8% 0.144
    MOT16-09 54.6% 69.4% 45.0% 64.3% 99.1%  26  11  13   2  31  1903  22   31 63.3% 0.086
    MOT16-10 64.3% 75.7% 55.9% 72.4% 98.0%  57  28  26   3 189  3543  71  125 70.4% 0.148
    MOT16-11 63.3% 77.0% 53.7% 69.0% 98.9%  75  24  33  18  73  2924  26   26 68.0% 0.081
    MOT16-13 73.6% 85.1% 64.8% 74.2% 97.6% 110  60  39  11 213  3000  62   90 71.9% 0.132
    OVERALL  65.0% 84.0% 53.1% 62.6% 99.1% 546 196 237 113 670 42006 299  428 61.7% 0.106

For your final submission you should focus on improving `MOTA`.
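
As a reminder of what drives this score, MOTA combines false negatives, false positives and identity switches relative to the total number of ground-truth boxes. The sketch below implements that definition; the counts in the example are illustrative, since the table above does not list the number of ground-truth boxes.

```python
def mota(num_false_negatives, num_false_positives, num_id_switches, num_gt_boxes):
    """MOTA = 1 - (FN + FP + IDSW) / GT, where GT counts all ground-truth boxes."""
    if num_gt_boxes <= 0:
        raise ValueError("num_gt_boxes must be positive")
    errors = num_false_negatives + num_false_positives + num_id_switches
    return 1.0 - errors / num_gt_boxes

# Illustrative counts only, not results from this notebook.
example = mota(num_false_negatives=300, num_false_positives=50,
               num_id_switches=10, num_gt_boxes=1000)
print(f"MOTA = {example:.1%}")
```

In the Tracktor++ table the FN column is far larger than FP and IDs, which suggests that recovering missed detections is the most direct lever on MOTA. Note that MOTA can become negative when the number of errors exceeds the number of ground-truth boxes.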

## Visualize tracking results

In [None]:
plot_sequence(results_seq['MOT16-05'],
              [s for s in sequences if str(s) == 'MOT16-05'][0],
              first_n_frames=2)

# Notes 

*   Experiment and debug on a single train sequence. If something works on a single sequence, evaluate all train sequences to check the generalization of your improvement.
*   Remember to split the training set into multiple sets with different sequences if you train something and want to avoid overfitting.
*   Sometimes the execution of a cell gets stuck. If this happens, just abort the execution and restart the cell.
*   If the notebook warns you that currently no GPU hardware acceleration is available, try again later and focus on some debugging or experiments that can be done with the CPU only.
