# Multiple object tracker


Create a multi-person tracker based on the YOLOv5 detector and the SORT tracker.

* Download [MOT17](https://motchallenge.net/data/MOT17/) data for a single video.
* Read the video, draw the ground-truth bounding boxes along with their track IDs (in green), and save the new video to *output.mp4*.
* Create a person detector based on YOLO to get bounding boxes, e.g. [YOLO-NAS](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md).
* Create a simple tracker to get a track_id for each detected bounding box. You may use [norfair](https://github.com/tryolabs/norfair) or the legacy [SORT](https://arxiv.org/abs/1602.00763) tracker.
* Append the new bounding boxes along with their track IDs (in red) to the video and save it to *output.mp4*.




### Download MOT17 data and labels

In [1]:
!export MPLBACKEND=TKAgg

In [2]:
# Full data split into frames
# We don't really need it for this task, but some images will be used for tests
# !wget https://motchallenge.net/data/MOT17Det.zip
# !unzip MOT17Det.zip

# Ground-truth bounding boxes
!wget https://motchallenge.net/data/MOT17Labels.zip
!unzip MOT17Labels.zip

# File with the video
# Warning: the video resolution in this copy is smaller than in the original video
!wget https://motchallenge.net/sequenceVideos/MOT17-09-SDP-raw.webm

in_video = '/content/MOT17-09-SDP-raw.webm'

--2024-03-04 17:51:41--  https://motchallenge.net/data/MOT17Labels.zip
Resolving motchallenge.net (motchallenge.net)... 131.159.19.34, 2a09:80c0:18::1034
Connecting to motchallenge.net (motchallenge.net)|131.159.19.34|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10107022 (9.6M) [application/zip]
Saving to: ‘MOT17Labels.zip’


2024-03-04 17:51:43 (8.09 MB/s) - ‘MOT17Labels.zip’ saved [10107022/10107022]

Archive:  MOT17Labels.zip
  inflating: test/MOT17-01-DPM/seqinfo.ini  
  inflating: test/MOT17-01-FRCNN/seqinfo.ini  
  inflating: test/MOT17-01-SDP/seqinfo.ini  
  inflating: test/MOT17-03-DPM/seqinfo.ini  
  inflating: test/MOT17-03-FRCNN/seqinfo.ini  
  inflating: test/MOT17-03-SDP/seqinfo.ini  
  inflating: test/MOT17-06-DPM/seqinfo.ini  
  inflating: test/MOT17-06-FRCNN/seqinfo.ini  
  inflating: test/MOT17-06-SDP/seqinfo.ini  
  inflating: test/MOT17-07-DPM/seqinfo.ini  
  inflating: test/MOT17-07-FRCNN/seqinfo.ini  
  inflating: test/MOT17-07-SDP/seqi

### Look at GT labels

All frame numbers, target IDs and bounding boxes are 1-based. The world coordinates x, y, z are ignored for the 2D challenge.


In [3]:
import pandas as pd
#https://github.com/dendorferpatrick/MOTChallengeEvalKit/tree/master/MOT
gt = '/content/train/MOT17-09-SDP/gt/gt.txt'
labels = pd.read_csv(gt,
                     sep=',',
                     names=["frame", "id", "bb_left", "bb_top", "bb_width", "bb_height", "conf", "x", "y", "z"])

print(labels.iloc[[2]])
labels.head()

   frame  id  bb_left  bb_top  bb_width  bb_height  conf  x    y   z
2      3   1      264     449       102        263     1  1  1.0 NaN


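|   | frame | id | bb_left | bb_top | bb_width | bb_height | conf | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 260 | 450 | 102 | 262 | 1 | 1 | 1.0 | NaN |
| 1 | 2 | 1 | 262 | 449 | 102 | 263 | 1 | 1 | 1.0 | NaN |
| 2 | 3 | 1 | 264 | 449 | 102 | 263 | 1 | 1 | 1.0 | NaN |
| 3 | 4 | 1 | 266 | 448 | 102 | 264 | 1 | 1 | 1.0 | NaN |
| 4 | 5 | 1 | 268 | 448 | 102 | 264 | 1 | 1 | 1.0 | NaN |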
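
The `bb_left`, `bb_top`, `bb_width` and `bb_height` columns describe each box by its top-left corner and its size. A minimal sketch of turning one such row into corner coordinates for drawing is shown below. The helper name `mot_to_xyxy` and its `scale` parameter are hypothetical, introduced only for illustration, and the sketch ignores the one-pixel offset of the 1-based convention.

```python
def mot_to_xyxy(left, top, width, height, scale=1.0):
    """Convert a MOT-style (left, top, width, height) box to integer corners.

    `scale` is a hypothetical parameter for videos whose resolution differs
    from the resolution of the annotations (e.g. 0.5 for a half-size video).
    """
    x1 = int(left * scale)
    y1 = int(top * scale)
    x2 = int((left + width) * scale)
    y2 = int((top + height) * scale)
    return x1, y1, x2, y2


# First ground-truth row shown above: frame 1, id 1
print(mot_to_xyxy(260, 450, 102, 262))             # annotation resolution
print(mot_to_xyxy(260, 450, 102, 262, scale=0.5))  # half-resolution video
```

Note that `bb_top` is the upper edge of the box, so the lower edge is `bb_top + bb_height`.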


### Create person detector based on YOLO model

In [54]:
import torch

class PersonDetector:
  """Detects persons with a pretrained YOLOv5s model loaded via torch.hub."""

  def __init__(self, conf=0.):
    # Minimum confidence a detection needs in order to be kept
    self.conf = conf
    self.yolov5 = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

  def __call__(self, imgs):
    # Returns, for each image, a list of [x1, y1, x2, y2, confidence] rows
    out = []
    tensors = self.yolov5(imgs).xyxy
    for xyxy_conf_class in tensors:
      # Column 4 is the confidence, column 5 the class index (0 is "person")
      out.append([row[0:5] for row in xyxy_conf_class.tolist() if int(row[5]) == 0 and row[4] >= self.conf])
    return out

### Smoke test for your PersonDetector

In [7]:
detector = PersonDetector()
imgs = ['/content/train/MOT17-09/img1/000001.jpg','/content/train/MOT17-09/img1/000001.jpg']
bboxes_with_persons = detector(imgs)
print("Persons",bboxes_with_persons)


Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-3-4 Python-3.10.12 torch-2.1.0+cu121 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


Persons [[[1702.4849853515625, 388.0796203613281, 1849.8570556640625, 726.8543090820312], [255.37469482421875, 455.0041198730469, 353.5467834472656, 708.5975952148438], [1250.3433837890625, 539.3751220703125, 1309.13671875, 656.4786987304688], [1290.06884765625, 462.34735107421875, 1359.756591796875, 655.0717163085938], [0.0, 327.29327392578125, 90.8331527709961, 902.8020629882812], [114.92710876464844, 499.80279541015625, 202.2029571533203, 743.3346557617188], [1883.2677001953125, 382.0382385253906, 1919.7115478515625, 577.444091796875], [22.578678131103516, 439.1407470703125, 122.76732635498047, 829.257080078125], [860.9136352539062, 523.9013061523438, 901.0994262695312, 622.630126953125]], [[1702.4849853515625, 388.0796203613281, 1849.8570556640625, 726.8543090820312], [255.37469482421875, 455.0041198730469, 353.5467834472656, 708.5975952148438], [1250.3433837890625, 539.3751220703125, 1309.13671875, 656.4786987304688], [1290.06884765625, 462.34735107421875, 1359.756591796875, 655.0

### Clone SORT
https://github.com/abewley/sort.git

or

https://github.com/tryolabs/norfair

In [8]:
# SORT
!git clone  https://github.com/abewley/sort.git
!pip install -r sort/requirements.txt

Cloning into 'sort'...
remote: Enumerating objects: 208, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 208 (delta 2), reused 2 (delta 1), pack-reused 202
Receiving objects: 100% (208/208), 1.21 MiB | 8.02 MiB/s, done.
Resolving deltas: 100% (74/74), done.
Collecting filterpy==1.4.5 (from -r sort/requirements.txt (line 1))
  Downloading filterpy-1.4.5.zip (177 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.0/178.0 kB 1.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting scikit-image==0.17.2 (from -r sort/requirements.txt (line 2))
  Downloading scikit-image-0.17.2.tar.gz (29.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 29.8/29.8 MB 31.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting lap==0.4.0 (from -r sort/requirements.txt (line 3))
  Downloa

In [11]:
# execfile('sort/sort.py')

usage: colab_kernel_launcher.py [-h] [--display] [--seq_path SEQ_PATH] [--phase PHASE]
                                [--max_age MAX_AGE] [--min_hits MIN_HITS]
                                [--iou_threshold IOU_THRESHOLD]
colab_kernel_launcher.py: error: unrecognized arguments: -f /root/.local/share/jupyter/runtime/kernel-a5dc42f4-181a-4397-9e53-f4c4b4b71d18.json


SystemExit: 2

In [22]:
import sys
sys.path.insert(0, '/content/sort')

In [83]:
gt = labels[labels['frame'] == 1][['bb_left', 'bb_top', 'bb_width', 'bb_height']]
gt

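|   | bb_left | bb_top | bb_width | bb_height |
|---|---|---|---|---|
| 0 | 260 | 450 | 102 | 262 |
| 3543 | 1686 | 387 | 171 | 345 |
| 3743 | 1886 | 327 | 156 | 404 |
| 3963 | 1253 | 533 | 63 | 129 |
| 4345 | 1292 | 459 | 70 | 202 |
| 4737 | -348 | 235 | 477 | 695 |
| 5041 | 1035 | 174 | 136 | 532 |
| 5566 | 116 | 522 | 84 | 230 |
| 6091 | 234 | 395 | 21 | 440 |
| 6616 | 1682 | 470 | 65 | 122 |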
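
SORT associates detections with existing tracks by intersection over union (its `--iou_threshold` option appears in the usage message above). The same measure gives a quick check of how well a detection matches a ground-truth box. The sketch below assumes boxes in `(x1, y1, x2, y2)` form. The function name `iou` is illustrative and is not taken from the SORT code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (may be empty)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Ground-truth row 0 of frame 1: left=260, top=450, width=102, height=262
gt_box = (260, 450, 260 + 102, 450 + 262)
# Second detection from the smoke test above, rounded to whole pixels
det_box = (255, 455, 354, 709)
print(f"IoU = {iou(gt_box, det_box):.2f}")
```

Both boxes here are in the coordinates of the full-resolution image, so no rescaling is needed before comparing them.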


### Place main code here

You can use this code as a template:

https://github.com/Gan4x4/CV-HSE2019/blob/master/video/exercise.ipynb

In [86]:
# from sort import Sort  # left disabled: Sort must already be defined in this session
import cv2
import multiprocessing as mp
import numpy as np

detector = PersonDetector() # YOLOv5 must be inside

# Open the input video

# Read the video frame by frame

# Draw GT and predicted BB on each frame along with track_id

# Save the video with bounding boxes to the `output.mp4` file

MAX_FRAMES = 1000
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
fps = 30.0
width = 960
height = 540

def read_video(q):
  # Does not work if the capture is created in the main thread
  stream = cv2.VideoCapture('/content/MOT17-09-SDP-raw.webm')
  fps = stream.get(cv2.CAP_PROP_FPS)
  width  = int(stream.get(cv2.CAP_PROP_FRAME_WIDTH))
  height = int(stream.get(cv2.CAP_PROP_FRAME_HEIGHT))
  total_frames = int(stream.get(cv2.CAP_PROP_FRAME_COUNT))
  total_frames = min(total_frames, MAX_FRAMES)
  print(fourcc, fps, width, height)
  i = 0
  while i < total_frames:
    ret, frame = stream.read()
    if ret:
      print(f'read video: frame {i}')
      q.put(frame)
      i += 1
    else:
      break

  q.put(None)
  print('reading done')
  stream.release()

def write_video(q):
  writer = cv2.VideoWriter('/content/output.mp4', fourcc=fourcc, fps=fps, frameSize=(width, height))
  i = 0
  while True:
    print(f'write video: frame {i}')
    frame = q.get()
    if frame is None:
      break
    writer.write(frame)
    i += 1

  print('writing done')
  writer.release()

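def process_video(in_q, out_q, detector, tracker):
  i = 0
  while True:
    print(f'process video: frame {i}')
    frame = in_q.get()
    if frame is None:
      break

    # Run the detector before drawing, so the GT overlay cannot affect detections
    detections = detector([frame])[0]

    # GT frame numbers are 1-based
    gt = labels[labels['frame'] == i + 1][['id', 'bb_left', 'bb_top', 'bb_width', 'bb_height']]
    for _, row in gt.iterrows():
      # GT boxes are (left, top, width, height) at the original 1920x1080 resolution;
      # this copy of the video is 960x540, hence the division by 2.
      # Coordinates can be negative when a box extends beyond the frame border.
      x1 = int(row['bb_left']) // 2
      y1 = int(row['bb_top']) // 2
      x2 = int(row['bb_left'] + row['bb_width']) // 2
      y2 = int(row['bb_top'] + row['bb_height']) // 2
      # Ground truth in green (OpenCV colors are BGR)
      cv2.rectangle(frame, (x1,y1), (x2,y2), color=(0,255,0), thickness=1)
      cv2.putText(frame, str(int(row['id'])), (x1,y1), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color=(0,255,0), thickness=1)

    # reshape keeps the (N, 5) shape even when nothing was detected
    trackings = tracker.update(np.array(detections).reshape(-1, 5))
    for tracking in trackings:
      x1 = int(tracking[0])
      y1 = int(tracking[1])
      x2 = int(tracking[2])
      y2 = int(tracking[3])
      track_id = int(tracking[4])
      # Predictions in red (OpenCV colors are BGR)
      cv2.rectangle(frame, (x1,y1), (x2,y2), color=(0,0,255), thickness=1)
      cv2.putText(frame, str(track_id), (x1,y1), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color=(0,0,255), thickness=1)
    out_q.put(frame)
    i += 1

  out_q.put(None)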

in_q = mp.Queue()
read_video_ps = mp.Process(target=read_video, args=(in_q,))
read_video_ps.start()

out_q = mp.Queue()
write_video_ps = mp.Process(target=write_video, args=(out_q,))
write_video_ps.start()

tracker = Sort()
print(tracker)
# main process
process_video(in_q, out_q, detector, tracker)

read_video_ps.join()
write_video_ps.join()



Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2024-3-4 Python-3.10.12 torch-2.1.0+cu121 CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape... 


1983148141 30.0 960 540
write video: frame 0
read video: frame 0
read video: frame 1
read video: frame 2
read video: frame 3
read video: frame 4
read video: frame 5
read video: frame 6
read video: frame 7
read video: frame 8
<__main__.Sort object at 0x7970c26ba4a0>
process video: frame 0
read video: frame 9
read video: frame 10
read video: frame 11
read video: frame 12
read video: frame 13
read video: frame 14
read video: frame 15
read video: frame 16
read video: frame 17
read video: frame 18
read video: frame 19
read video: frame 20
read video: frame 21
read video: frame 22
read video: frame 23
read video: frame 24
read video: frame 25
read video: frame 26
read video: frame 27
read video: frame 28
read video: frame 29
read video: frame 30
read video: frame 31
read video: frame 32
read video: frame 33
read video: frame 34
read video: frame 35
read video: frame 36
read video: frame 37
read video: frame 38
read video: frame 39
read video: frame 40
read video: frame 41
read video: frame 4

## Test the speed of your code and place a brief conclusion here:

The objects were detected fairly accurately, although the track_id assignments are somewhat unstable. The pipeline also runs very slowly. It might help to have process_video handle several frames per iteration; I will try that.
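
One way to try the batching idea: `PersonDetector.__call__` already accepts a list of images, so `process_video` could pull several frames from the queue and call the detector once per batch. Below is a minimal sketch of the queue-draining part only, assuming the `None` end-of-stream sentinel used above. The names `take_batch` and `batch_size` are hypothetical.

```python
import queue


def take_batch(q, batch_size):
    """Collect up to `batch_size` items from `q`, stopping at the None sentinel.

    Returns (items, done), where `done` is True once the sentinel was seen.
    """
    items = []
    while len(items) < batch_size:
        item = q.get()
        if item is None:
            return items, True
        items.append(item)
    return items, False


# Stand-in for the frame queue: integers instead of images
q = queue.Queue()
for frame in [0, 1, 2, 3, 4, None]:
    q.put(frame)

done = False
batches = []
while not done:
    batch, done = take_batch(q, batch_size=2)
    if batch:
        batches.append(batch)  # here the detector would be called once per batch
print(batches)
```

The tracker would still have to be updated one frame at a time and in order, since each update depends on the previous state; only the detector call is batched.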

# Ideas for Extra work

* Measure the overall speed of your tracker in FPS (frames per second)
* Calculate tracking metrics (MOTA, MOTP)
* Try a different tracker
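
The first two ideas can be sketched briefly. Throughput is the number of processed frames divided by the elapsed wall-clock time, and MOTA is commonly defined as 1 - (FN + FP + IDSW) / GT, where GT is the total number of ground-truth boxes over all frames. The helper names below are hypothetical, and the counts in the example are made up purely to show the arithmetic; they are not results from this notebook.

```python
import time


def measure_fps(process_frame, frames):
    """Return frames per second for calling `process_frame` on each frame."""
    start = time.perf_counter()
    n = 0
    for frame in frames:
        process_frame(frame)
        n += 1
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Dummy workload standing in for detection + tracking of one frame
fps = measure_fps(lambda frame: sum(range(1000)), range(100))
print(f"{fps:.1f} FPS")

# Made-up counts, only to illustrate the formula
print(mota(false_negatives=20, false_positives=10, id_switches=5, num_gt=100))
```

Counting FN, FP and ID switches requires matching tracker output to ground truth per frame, which this sketch leaves out.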