# IOU Tracker

IOUTracker tracks objects across consecutive frames using only their Intersection-Over-Union (IOU) overlap. The core idea comes from the article (http://elvera.nue.tu-berlin.de/files/1517Bochinski2017.pdf). It assumes a strong existing detector and a high frame rate, so that the bounding boxes of the same object overlap heavily between consecutive frames. Under this assumption, you can track objects with only the localization and IOU information. Because it relies on nothing else, the algorithm can run at a very high frame rate and provides a foundation for more complicated calculations built on top of it.

Such an algorithm also needs to be evaluated. The evaluation in this implementation follows two articles: the MOT16 benchmark (https://arxiv.org/abs/1603.00831) and Multi-Target Tracker (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.8335&rep=rep1&type=pdf).

This implementation uses the MOT17Det dataset (https://motchallenge.net/data/MOT17Det/) as an example.

* For more information, please refer to https://github.com/jiankaiwang/ioutracker.
* For more example videos, please refer to .
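
As a concrete illustration of the IOU measure that the tracker relies on, here is a minimal sketch that computes it for two boxes in the `[x, y, w, h]` layout used by the MOT labels. The function name `iou_xywh` is hypothetical and is not part of the ioutracker package.

```python
def iou_xywh(box_a, box_b):
    """Return the IOU of two boxes given as [x, y, w, h] (top-left corner plus size)."""
    ax1, ay1, aw, ah = box_a[:4]
    bx1, by1, bw, bh = box_b[:4]
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlap rectangle, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# The same pedestrian in frames 1 and 2 of MOT17-10 (values taken from the labels shown later).
print(iou_xywh([1368, 394, 74, 226], [1366, 394, 75, 229]))
```

At a high frame rate the box barely moves between frames, so the value for this pair is close to 1 (roughly 0.95), which is what makes matching by IOU alone feasible.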

In [1]:
import os
import logging

try:
  # you must install ioutracker first
  from ioutracker import loadLabel, outputAsFramesToVideo, IOUTracker
  from ioutracker import EvaluateOnMOTDatasets, ExampleEvaluateMOTDatasets
  logging.warning("Load ioutracker from the installed package.")
except Exception as e:
  import sys
  modulePaths = [os.path.join("..")]
  for path in modulePaths: sys.path.append(path)
  from ioutracker.dataloaders.MOTDataLoader import loadLabel
  from ioutracker.inference.MOTDet17Main import outputAsFramesToVideo
  from ioutracker.src.IOUTracker import IOUTracker
  from ioutracker.metrics.MOTMetrics import EvaluateOnMOTDatasets, ExampleEvaluateMOTDatasets
  logging.warning("Load ioutracker from the relative path.")



## Data Preprocessing

You can download the dataset with the shell script `./ioutracker/dataloaders/MOTDataDownloader.sh` in the git repository.

In [2]:
SUB_DATASET = "train"
VERSION = "MOT17Det"
LOCAL_PATH = os.path.join("/", "tmp", "MOT")

# you can change the path pointing to the dataset
LABEL_PATH = os.path.join(LOCAL_PATH, "{}".format(VERSION + "Labels"), SUB_DATASET)
assert os.path.exists(LABEL_PATH), "{} is not found.".format(LABEL_PATH)

# you can change the path pointing to the dataset
FRAME_PATH = os.path.join(LOCAL_PATH, "{}".format(VERSION), SUB_DATASET)
assert os.path.exists(FRAME_PATH), "{} is not found.".format(FRAME_PATH)

## Sample Selection

In [3]:
totalSamples = next(os.walk(os.path.join(LABEL_PATH)))[1]
print(totalSamples)

['MOT17-02', 'MOT17-04', 'MOT17-05', 'MOT17-09', 'MOT17-10', 'MOT17-11', 'MOT17-13']


In [4]:
SAMPLE = "MOT17-10"
LABEL_FILE_PATH = os.path.join(LABEL_PATH, SAMPLE, "gt", "gt.txt")
assert os.path.exists(LABEL_FILE_PATH), "{} is not found.".format(LABEL_FILE_PATH)

FRAME_FILE_PATH = os.path.join(FRAME_PATH, SAMPLE, "img1")
assert os.path.exists(FRAME_FILE_PATH), "{} is not found.".format(FRAME_FILE_PATH)

## Output the tracking result on the video

In [5]:
FRAME_FPS = {"MOT17-13": 25, "MOT17-11": 30, "MOT17-10": 30, "MOT17-09": 30,
             "MOT17-05": 14, "MOT17-04": 30, "MOT17-02": 30}
assert SAMPLE in FRAME_FPS, "{} was not found.".format(SAMPLE)
fps = FRAME_FPS[SAMPLE]
print("Sample {} with FPS: {}".format(SAMPLE, fps))

Sample MOT17-10 with FPS: 30


Create the output folder if it does not exist yet.

In [6]:
tracking_output = os.path.join(LOCAL_PATH, "tracking_output")
if not os.path.exists(tracking_output):
  try:
    os.mkdir(tracking_output)
    print("Created the output path: {}".format(tracking_output))
  except Exception as e:
    raise Exception("Can't create the folder. ({})".format(e))

This example writes the tracking result for the consecutive frames to a video file. Note that the flag `plotting` must be set to `True` if you want to output the video.

In [7]:
outputAsFramesToVideo(detection_conf=0.2,
                      iou_threshold=0.2,
                      min_t=fps,
                      track_min_conf=0.5,
                      labelFilePath=LABEL_FILE_PATH,
                      frameFilePath=FRAME_FILE_PATH,
                      trackingOutput=tracking_output,
                      fps=fps,
                      outputFileName=SAMPLE,
                      plotting=True)

100%|██████████| 653/653 [01:05<00:00, 10.04it/s]


Total time cost: 144.47552704811096


You can then move the video to the desired path, for example:

```sh
mv /tmp/MOT/tracking_output/tracking_MOT17-10.mp4 ~/Desktop/
```

## A simple example from scratch

Load the MOT data first via the API `loadLabel`. You can also use `help` to inspect its parameters.

In [8]:
LABELS, DFPERSONS = loadLabel(src=LABEL_FILE_PATH, format_style="metrics_dict")

In [9]:
help(loadLabel)

Help on function loadLabel in module ioutracker.dataloaders.MOTDataLoader:

loadLabel(src, is_path=True, load_Pedestrian=True, load_Static_Person=True, visible_thresholde=0, format_style='onlybbox')
    LoadLabel: Load a label file in the csv format.
    
    Args:
      src: the MOT label file path (available when is_path is True)
      is_path: True or False for whether the src is the file path or not
      load_Pedestrian: whether to load the pedestrian data or not
      load_Static_Person: whether to load the statuc person data or not
      visible_thresholde: the threshold for filtering the invisible person data
      format_style: provides different styles in the lists,
                    "onlybbox" (func: formatBBoxAndVis), "onlybbox_dict" (func: formatBBoxAndVis),
                    "metrics" (func: formatForMetrics), "metrics_dict" (func: formatForMetrics)
    
    Returns:
      objects_in_frames: a list contains the person detection information per frames



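`LABELS` is a dictionary whose keys are frame IDs and whose values are the lists of object detections in that frame. Each detection is a list that starts with the localization and visibility, `[x, y, w, h, visibility]`. With `format_style="metrics_dict"`, the output below carries a sixth value, which appears to be the ground-truth object ID.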

In [10]:
len(LABELS[1]), LABELS[1]

(20,
 [[1368.0, 394.0, 74.0, 226.0, 1.0, 1.0],
  [1478.0, 418.0, 74.0, 176.0, 1.0, 2.0],
  [680.0, 407.0, 67.0, 199.0, 1.0, 3.0],
  [1232.0, 412.0, 36.0, 112.0, 1.0, 4.0],
  [470.0, 432.0, 72.0, 176.0, 1.0, 5.0],
  [730.0, 421.0, 55.0, 173.0, 0.67857, 6.0],
  [550.0, 436.0, 60.0, 170.0, 1.0, 8.0],
  [960.0, 416.0, 39.0, 110.0, 0.82342, 12.0],
  [1170.0, 418.0, 38.0, 104.0, 1.0, 13.0],
  [590.0, 427.0, 55.0, 177.0, 0.6439600000000001, 15.0],
  [1112.0, 418.0, 32.0, 81.0, 1.0, 18.0],
  [398.0, 435.0, 28.0, 69.0, 0.62857, 22.0],
  [635.0, 426.0, 52.0, 137.0, 0.6430100000000001, 23.0],
  [1196.0, 418.0, 28.0, 94.0, 0.55172, 26.0],
  [843.0, 426.0, 20.0, 67.0, 0.7395, 27.0],
  [825.0, 433.0, 21.0, 57.0, 0.63636, 28.0],
  [858.0, 432.0, 21.0, 71.0, 1.0, 38.0],
  [809.0, 429.0, 19.0, 61.0, 0.8129, 39.0],
  [780.0, 427.0, 21.0, 67.0, 0.72727, 40.0],
  [1594.0, 423.0, 41.0, 107.0, 1.0, 61.0]])
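
To make the layout of each detection concrete, the following sketch converts a row to corner coordinates and filters rows by visibility. The helper names `to_corners` and `filter_by_visibility` are hypothetical and only illustrate the `[x, y, w, h, visibility, ...]` layout; they are not part of ioutracker.

```python
def to_corners(detection):
    """Convert [x, y, w, h, ...] to (x1, y1, x2, y2)."""
    x, y, w, h = detection[:4]
    return (x, y, x + w, y + h)


def filter_by_visibility(detections, min_visibility):
    """Keep the detections whose fifth value (visibility) is at least min_visibility."""
    return [d for d in detections if d[4] >= min_visibility]


# Three rows copied from the frame 1 output above.
frame_1 = [
    [1368.0, 394.0, 74.0, 226.0, 1.0, 1.0],
    [730.0, 421.0, 55.0, 173.0, 0.67857, 6.0],
    [1196.0, 418.0, 28.0, 94.0, 0.55172, 26.0],
]

visible = filter_by_visibility(frame_1, min_visibility=0.6)
print(len(visible))            # 2
print(to_corners(visible[0]))  # (1368.0, 394.0, 1442.0, 620.0)
```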

`DFPERSONS` is a pandas DataFrame of the processed labels with the unnecessary columns filtered out.

In [11]:
DFPERSONS.head(5)

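| | fid | uid | bX | bY | bW | bH | conf | class | visible |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1368 | 394 | 74 | 226 | 1 | 1 | 1.0 |
| 1 | 2 | 1 | 1366 | 394 | 75 | 229 | 1 | 1 | 1.0 |
| 2 | 3 | 1 | 1365 | 394 | 76 | 232 | 1 | 1 | 1.0 |
| 3 | 4 | 1 | 1363 | 394 | 77 | 235 | 1 | 1 | 1.0 |
| 4 | 5 | 1 | 1362 | 394 | 78 | 238 | 1 | 1 | 1.0 |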


You can instantiate an IOUTracker to start the algorithm.

In [12]:
help(IOUTracker)

Help on class IOUTracker in module ioutracker.src.IOUTracker:

class IOUTracker(builtins.object)
 |  IOUTracker(detection_conf=0.2, iou_threshold=0.5, min_t=1, track_min_conf=0.2, assignedTID=True)
 |  
 |  IOUTracker implements the IOU tracker algorithm details.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, detections, returnFinishedTrackers=False)
 |      Runs the IOU tracker algorithm across the consecutive frames.
 |      
 |      Args:
 |        detections: a list contains multiple detections per frame, each detection
 |                    keeps [[bX, bY, bWidth, bHeight, visible], [], []]
 |      
 |        returnFinishedTrackers: a bool for returning finished trackers
 |      
 |      Returns:
 |        detectionMapping: a list contains multiple dictionary-structure objects
 |                          representing each detection, the order of those objects
 |                          is the same to the detection, the prototype is like
 |      
 |                        

Create an `IOUTracker` instance that implements the IOU tracker algorithm. In this example, we use the default ID increment mechanism to get the tracker ID for each box.

In [13]:
iouTracks = IOUTracker(detection_conf=0.2,
                       iou_threshold=0.2,
                       min_t=fps,
                       track_min_conf=0.5, 
                       assignedTID=True)

for frameIdx in range(1, len(LABELS), 1):
  if frameIdx % 10 == 0: print('.', end='')
  
  # iou tracker
  detectionMapping, _ = iouTracks(LABELS[frameIdx])

  if frameIdx % (fps * 5) == 0:
    print("Frame: {}".format(frameIdx))
    for bboxIdx in range(len(LABELS[frameIdx])):
      print("BBox: {}, and its Track ID: {}".format(LABELS[frameIdx][bboxIdx], detectionMapping[bboxIdx]["tid"]))
    print("...")

...............Frame: 150
BBox: [999.0, 357.0, 67.0, 167.0, 1.0, 2.0], and its Track ID: 24
BBox: [899.0, 353.0, 45.0, 127.0, 1.0, 4.0], and its Track ID: 4
BBox: [805.0, 345.0, 40.0, 124.0, 1.0, 7.0], and its Track ID: 51
BBox: [1796.0, 320.0, 194.0, 589.0, 0.64103, 9.0], and its Track ID: 1
BBox: [60.0, 351.0, 107.0, 243.0, 1.0, 11.0], and its Track ID: 71
BBox: [592.0, 354.0, 59.0, 141.0, 0.5493, 12.0], and its Track ID: 59
BBox: [890.0, 357.0, 40.0, 119.0, 0.21950999999999998, 13.0], and its Track ID: 82
BBox: [-12.0, 392.0, 73.0, 195.0, 0.7973, 16.0], and its Track ID: 79
BBox: [922.0, 354.0, 38.0, 104.0, 0.41026, 18.0], and its Track ID: 81
BBox: [336.0, 372.0, 49.0, 131.0, 1.0, 23.0], and its Track ID: 70
BBox: [1100.0, 336.0, 62.0, 169.0, 1.0, 24.0], and its Track ID: 72
BBox: [999.0, 353.0, 30.0, 115.0, 0.034483, 26.0], and its Track ID: -1
BBox: [535.0, 368.0, 31.0, 103.0, 0.98798, 29.0], and its Track ID: 75
BBox: [513.0, 376.0, 28.0, 87.0, 0.75862, 30.0], and its Track ID: 

In this implementation, the IOUTracker is designed to take object detection results frame by frame rather than a whole video, and it keeps the information of each track. You can access the active tracks via the `get_active_tracks()` method and the finished tracks via the `get_finished_tracks()` method.

You can also access the attribute `tid` of each track to get the corresponding track ID. In the next cell, `assignedTID` is set to `False` and the variable `tid_count` is used to assign a unique ID to each track.

In [14]:
iouTracks = IOUTracker(detection_conf=0.2,
                       iou_threshold=0.2,
                       min_t=fps,
                       track_min_conf=0.5, 
                       assignedTID=False)

tid_count = 1

for label in range(1, len(LABELS), 1):
  # feed the detections of one frame into the tracker
  iouTracks.read_detections_per_frame(detections=LABELS[label])

  active_tracks = iouTracks.get_active_tracks()
  finished_tracks = iouTracks.get_finished_tracks()

  if label % 50 == 0:
    print("Frame {} tracker Info: active {}, finished {}".format(label, len(active_tracks), len(finished_tracks)))

  # simple way to assign the tracker ID
  for act_track in active_tracks:
    if not act_track.tid:
      # the track has no ID yet, so assign the next unique one
      act_track.tid = tid_count
      tid_count += 1

Frame 50 tracker Info: active 19, finished 3
Frame 100 tracker Info: active 15, finished 13
Frame 150 tracker Info: active 20, finished 17
Frame 200 tracker Info: active 21, finished 22
Frame 250 tracker Info: active 20, finished 27
Frame 300 tracker Info: active 22, finished 31
Frame 350 tracker Info: active 16, finished 40
Frame 400 tracker Info: active 17, finished 45
Frame 450 tracker Info: active 14, finished 54
Frame 500 tracker Info: active 16, finished 57
Frame 550 tracker Info: active 20, finished 61
Frame 600 tracker Info: active 20, finished 66
Frame 650 tracker Info: active 15, finished 71
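
The frame-by-frame behaviour described above, with active tracks growing and finished tracks accumulating, can be sketched as a small greedy matcher. This is a minimal sketch in the spirit of the referenced article, not the ioutracker implementation. The class `SketchIOUTracker` and its `update` method are hypothetical. The parameter names mirror the `IOUTracker` signature shown earlier, but the meaning given to them here is an assumption: a minimum detection score, a minimum IOU for a match, a minimum track length, and a minimum best score per track.

```python
def iou_xywh(a, b):
    """IOU of two boxes given as [x, y, w, h, ...]."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


class SketchIOUTracker:
    """Hypothetical greedy IOU tracker; not the ioutracker implementation."""

    def __init__(self, detection_conf=0.2, iou_threshold=0.5, min_t=1, track_min_conf=0.2):
        self.detection_conf = detection_conf    # assumed: minimum score of a usable detection
        self.iou_threshold = iou_threshold      # assumed: minimum IOU to extend a track
        self.min_t = min_t                      # assumed: minimum track length in frames
        self.track_min_conf = track_min_conf    # assumed: minimum best score within a track
        self.active = []    # each track is a list of [x, y, w, h, score] boxes
        self.finished = []

    def _keep(self, track):
        # A finished track is kept only if it is long enough and has one confident box.
        return len(track) >= self.min_t and max(b[4] for b in track) >= self.track_min_conf

    def update(self, detections):
        """Consume the detections of one frame."""
        candidates = [d for d in detections if d[4] >= self.detection_conf]
        still_active = []
        for track in self.active:
            # Greedily pick the detection that overlaps most with the track's last box.
            best = max(candidates, key=lambda d: iou_xywh(track[-1], d), default=None)
            if best is not None and iou_xywh(track[-1], best) >= self.iou_threshold:
                track.append(best)
                candidates.remove(best)
                still_active.append(track)
            elif self._keep(track):
                self.finished.append(track)
        # Every unmatched detection starts a new track.
        self.active = still_active + [[d] for d in candidates]


tracker = SketchIOUTracker(iou_threshold=0.2, min_t=2, track_min_conf=0.5)
frames = [
    [[0, 0, 10, 10, 0.9]],
    [[1, 0, 10, 10, 0.8]],
    [[50, 50, 10, 10, 0.9]],  # the first object disappears and a new one appears
]
for dets in frames:
    tracker.update(dets)
print(len(tracker.active), len(tracker.finished))  # 1 1
```

Under these assumptions, setting `min_t=fps` as in the cells above would discard tracks shorter than roughly one second of video.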


## Metrics

`ExampleEvaluateMOTDatasets` helps you evaluate each dataset or video. Because no detector is available here, we use the ground truth data as the predictions to test the functionality.

Note that this step takes a long time to run.

In [15]:
Predictions, _ = loadLabel(src=LABEL_FILE_PATH, is_path=True, load_Pedestrian=True, load_Static_Person=True,
    visible_thresholde=0.2, format_style="metrics_dict")

In [16]:
evalMOT = ExampleEvaluateMOTDatasets(LABEL_FILE_PATH, predictions=Predictions, printOnScreen=True)

100%|██████████| 653/653 [06:39<00:00,  1.64it/s]

TP: 11703
FP:     0
FN:     0
GT: 11703
Fragment Number:    131
SwitchID Number:    283
Recall: 100.000%
Precision: 100.000%
Accuracy: 100.000%
F1 Score: 1.000
MOTA: 0.975818
Total trajectories: 60
MT Number: 60, Ratio: 100.000%
PT Number: 0, Ratio: 0.000%
ML Number: 0, Ratio: 0.000%
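
The figures above can be cross-checked by hand. The MOT16 benchmark defines MOTA as 1 - (FN + FP + IDSW) / GT, so even with perfect detections the 283 identity switches keep MOTA below 1. The following sketch recomputes the reported values from the counts printed above. The function names are hypothetical and are not part of the ioutracker API.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from the detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mota(fp, fn, id_switches, gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, as defined in the MOT16 benchmark."""
    return 1.0 - (fp + fn + id_switches) / gt


# Counts taken from the evaluation output above.
precision, recall, f1 = detection_metrics(tp=11703, fp=0, fn=0)
print(precision, recall, f1)                                   # 1.0 1.0 1.0
print(round(mota(fp=0, fn=0, id_switches=283, gt=11703), 6))   # 0.975818
```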





The `EvaluateOnMOTDatasets` class helps you evaluate multiple datasets.

In [17]:
evalMOTData = EvaluateOnMOTDatasets()

# it is simple to pass the whole package of results into the multiple-dataset evaluator
evalMOTData(evalMOT)
evalMOTRes = evalMOTData.getResults()

Metric mota: Value 0.9758181662821499
Metric recall: Value 1.0
Metric precision: Value 1.0
Metric accuracy: Value 1.0
Metric f1score: Value 1.0
Metric rateMT: Value 1.0
Metric ratePT: Value 0.0
Metric rateML: Value 0.0
Metric TP: Value 23406
Metric FP: Value 0
Metric FN: Value 0
Metric GT: Value 23406
Metric numFragments: Value 131
Metric numSwitchID: Value 283
Metric numMT: Value 60
Metric numPT: Value 0
Metric numML: Value 0
Metric numTraj: Value 60


You can also evaluate the metrics on each MOT dataset and then summarize them.

In [18]:
evalMOTData = EvaluateOnMOTDatasets()
for key, _ in FRAME_FPS.items():
  LABEL_FILE_PATH = os.path.join(LABEL_PATH, key, "gt", "gt.txt")
  assert os.path.exists(LABEL_FILE_PATH), "{} is not found.".format(LABEL_FILE_PATH)
  print("Sample: {}".format(key))
  
  # predictions is set to None here, so the ground truth is used as the prediction
  evalMOT = ExampleEvaluateMOTDatasets(LABEL_FILE_PATH, predictions=None, printOnScreen=True)
  evalMOTData(evalMOT)
  print("", end="\n\n")
evalMOTRes = evalMOTData.getResults()