# IOU Tracker

IOUTracker implements an algorithm that tracks objects across consecutive frames based on the Intersection-over-Union (IOU) of their bounding boxes. The core concept follows the article (http://elvera.nue.tu-berlin.de/files/1517Bochinski2017.pdf). The algorithm assumes a strong object detector and a high frame rate, so that the same object overlaps heavily between consecutive frames. Under these assumptions, objects can be tracked using only the box locations and their IOU, without any image information. Because it is so simple, the tracker itself runs at a very high frame rate and provides a foundation for more sophisticated methods built on top of it.
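Concretely, the IOU of two boxes is the area of their intersection divided by the area of their union. The following minimal sketch computes it for boxes in the `[x, y, w, h]` layout (top-left corner plus width and height) used by the MOT labels in this notebook; the helper `iou_xywh` is hypothetical and not part of the `ioutracker` API.

```python
def iou_xywh(box_a, box_b):
  """Return the Intersection-over-Union of two boxes given as [x, y, w, h].

  (x, y) is the top-left corner; extra trailing fields (e.g. visibility, ID) are ignored.
  """
  ax, ay, aw, ah = box_a[:4]
  bx, by, bw, bh = box_b[:4]
  # width and height of the overlapping rectangle (0 if the boxes do not overlap)
  inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
  inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
  inter = inter_w * inter_h
  union = aw * ah + bw * bh - inter
  return inter / union if union > 0 else 0.0


# two 10x10 boxes shifted by 5 pixels: overlap 5 * 10 = 50, union 100 + 100 - 50 = 150
print(iou_xywh([0, 0, 10, 10], [5, 0, 10, 10]))  # 0.3333333333333333
```

A tracker built on this measure simply links a box in the current frame to the box in the previous frame with which it has the highest IOU, provided that IOU exceeds a threshold.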

Such an algorithm also requires evaluation. The evaluation in this implementation follows two articles: the MOT16 benchmark (https://arxiv.org/abs/1603.00831) and Multi-Target Tracker (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.8335&rep=rep1&type=pdf).

This implementation uses the MOT17Det dataset (https://motchallenge.net/data/MOT17Det/) as an example.

* For more information, please refer to https://github.com/jiankaiwang/ioutracker.
* For more example videos, please refer to .

In [1]:
import os
import tqdm
import time
import logging

try:
  # you must install ioutracker first
  from ioutracker import loadLabel, outputAsFramesToVideo, IOUTracker
  from ioutracker import EvaluateOnMOTDatasets, ExampleEvaluateMOTDatasets, EvaluateByFrame
  logging.warning("Load ioutracker from the installed package.")
except ImportError:
  # fall back to the package located in the parent folder (e.g. when running inside the repository)
  import sys
  modulePaths = [os.path.join("..")]
  for path in modulePaths: sys.path.append(path)
  from ioutracker.dataloaders.MOTDataLoader import loadLabel
  from ioutracker.inference.MOTDet17Main import outputAsFramesToVideo
  from ioutracker.src.IOUTracker import IOUTracker
  from ioutracker.metrics.MOTMetrics import EvaluateOnMOTDatasets, ExampleEvaluateMOTDatasets, EvaluateByFrame
  logging.warning("Load ioutracker from the relative path.")



## Data Preprocessing

You can download the dataset with the shell script `./ioutracker/dataloaders/MOTDataDownloader.sh` in the git repository.

In [2]:
SUB_DATASET = "train"
VERSION = "MOT17Det"
LOCAL_PATH = os.path.join("/", "tmp", "MOT")

# you can change the path pointing to the dataset
LABEL_PATH = os.path.join(LOCAL_PATH, VERSION + "Labels", SUB_DATASET)
assert os.path.exists(LABEL_PATH), "{} is not found.".format(LABEL_PATH)

# you can change the path pointing to the dataset
FRAME_PATH = os.path.join(LOCAL_PATH, VERSION, SUB_DATASET)
assert os.path.exists(FRAME_PATH), "{} is not found.".format(FRAME_PATH)

## Sample Selection

In [3]:
totalSamples = next(os.walk(LABEL_PATH))[1]
print(totalSamples)

['MOT17-13', 'MOT17-09', 'MOT17-11', 'MOT17-10', 'MOT17-04', 'MOT17-05', 'MOT17-02']


In [4]:
SAMPLE = "MOT17-04"
LABEL_FILE_PATH = os.path.join(LABEL_PATH, SAMPLE, "gt", "gt.txt")
assert os.path.exists(LABEL_FILE_PATH), "{} is not found.".format(LABEL_FILE_PATH)

FRAME_FILE_PATH = os.path.join(FRAME_PATH, SAMPLE, "img1")
assert os.path.exists(FRAME_FILE_PATH), "{} is not found.".format(FRAME_FILE_PATH)

## Output the tracking result to a video

In [5]:
FRAME_FPS = {"MOT17-13": 25, "MOT17-11": 30, "MOT17-10": 30, "MOT17-09": 30,
             "MOT17-05": 14, "MOT17-04": 30, "MOT17-02": 30}
assert SAMPLE in list(FRAME_FPS.keys()), "{} was not found.".format(SAMPLE)
fps = FRAME_FPS[SAMPLE]
print("Sample {} with FPS: {}".format(SAMPLE, fps))

Sample MOT17-04 with FPS: 30


Check whether the output folder exists, and create it if it does not.

In [6]:
tracking_output = os.path.join(LOCAL_PATH, "tracking_output")
if not os.path.exists(tracking_output):
  try:
    os.makedirs(tracking_output)
    print("Created the output path: {}".format(tracking_output))
  except OSError as e:
    raise OSError("Can't create the folder. ({})".format(e))

This example shows how to write the tracking results over consecutive frames to a video file. Note that the flag `plotting` must be set to `True` to output the video.

In [7]:
outputAsFramesToVideo(detection_conf=0.2,
                      iou_threshold=0.2,
                      min_t=fps,
                      track_min_conf=0.5,
                      labelFilePath=LABEL_FILE_PATH,
                      frameFilePath=FRAME_FILE_PATH,
                      trackingOutput=tracking_output,
                      fps=fps,
                      outputFileName=SAMPLE,
                      plotting=True)

100%|██████████| 1049/1049 [01:39<00:00, 10.58it/s]


Total time cost: 192.13720297813416


You can then move the video to another location, for example:

```sh
mv /tmp/MOT/tracking_output/tracking_MOT17-04.mp4 ~/Desktop/
```

## A simple example from scratch

First, load the MOT labels via the API `loadLabel`. You can use `help` to inspect its parameters.

In [8]:
LABELS, DFPERSONS = loadLabel(src=LABEL_FILE_PATH, format_style="metrics_dict")

In [9]:
help(loadLabel)

Help on function loadLabel in module ioutracker.dataloaders.MOTDataLoader:

loadLabel(src, is_path=True, load_Pedestrian=True, load_Static_Person=True, visible_thresholde=0, format_style='onlybbox')
    LoadLabel: Load a label file in the csv format.
    
    Args:
      src: the MOT label file path (available when is_path is True)
      is_path: True or False for whether the src is the file path or not
      load_Pedestrian: whether to load the pedestrian data or not
      load_Static_Person: whether to load the statuc person data or not
      visible_thresholde: the threshold for filtering the invisible person data
      format_style: provides different styles in the lists,
                    "onlybbox" (func: formatBBoxAndVis), "onlybbox_dict" (func: formatBBoxAndVis),
                    "metrics" (func: formatForMetrics), "metrics_dict" (func: formatForMetrics)
    
    Returns:
      objects_in_frames: a list contains the person detection information per frames



`LABELS` is a dictionary whose key is the frame ID and whose value is the list of objects in that frame. Each object is a list holding its location, visibility, and object ID: `[x, y, w, h, visibility, uid]`.
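For illustration, the following sketch builds a structure with the same shape as `LABELS` directly from MOT ground-truth rows. It is not the `loadLabel` implementation: the function `parse_mot_gt`, the assumed column order (frame, id, left, top, width, height, flag, class, visibility, as in the MOT16 `gt.txt` format), the class codes (1 for pedestrian, 7 for static person), and the rule that objects below `min_visibility` are dropped are all assumptions. The flag column is ignored here.

```python
import csv
import io
from collections import defaultdict

# Assumed MOT16/17 ground-truth columns:
# frame, id, bb_left, bb_top, bb_width, bb_height, flag, class, visibility
PEDESTRIAN, STATIC_PERSON = 1, 7


def parse_mot_gt(lines, classes=(PEDESTRIAN, STATIC_PERSON), min_visibility=0.0):
  """Group ground-truth rows into {frame_id: [[x, y, w, h, visibility, uid], ...]}."""
  frames = defaultdict(list)
  for row in csv.reader(lines):
    if not row:
      continue
    frame, uid, x, y, w, h, _flag, cls, vis = (float(v) for v in row[:9])
    # keep only the requested classes that are visible enough (threshold rule assumed)
    if int(cls) not in classes or vis < min_visibility:
      continue
    frames[int(frame)].append([x, y, w, h, vis, uid])
  return dict(frames)


sample = io.StringIO(
  "1,1,1363,569,103,241,1,1,0.86014\n"
  "1,2,371,410,80,239,1,1,1\n"
  "2,1,1362,568,103,241,1,1,0.86173\n"
  "2,9,10,10,5,5,0,3,0.5\n"  # class 3 (car) is filtered out
)
labels = parse_mot_gt(sample)
print(labels[1][0])  # [1363.0, 569.0, 103.0, 241.0, 0.86014, 1.0]
```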

In [10]:
len(LABELS[1]), LABELS[1]

(46,
 [[1363.0, 569.0, 103.0, 241.0, 0.86014, 1.0],
  [371.0, 410.0, 80.0, 239.0, 1.0, 2.0],
  [103.0, 549.0, 83.0, 251.0, 1.0, 3.0],
  [1734.0, 457.0, 76.0, 213.0, 0.9838600000000001, 4.0],
  [1098.0, 980.0, 78.0, 208.0, 0.48325, 5.0],
  [632.0, 761.0, 100.0, 251.0, 0.31903000000000004, 6.0],
  [687.0, 206.0, 77.0, 112.0, 0.7922600000000001, 45.0],
  [678.0, 291.0, 57.0, 121.0, 0.57603, 46.0],
  [796.0, 149.0, 60.0, 175.0, 1.0, 60.0],
  [1789.0, 206.0, 65.0, 184.0, 1.0, 61.0],
  [1487.0, 71.0, 53.0, 145.0, 0.76535, 62.0],
  [1636.0, 265.0, 66.0, 181.0, 1.0, 63.0],
  [1557.0, 254.0, 60.0, 192.0, 0.26705, 65.0],
  [1104.0, 212.0, 56.0, 181.0, 1.0, 66.0],
  [1031.0, 133.0, 80.0, 183.0, 0.9436399999999999, 67.0],
  [228.0, 401.0, 63.0, 203.0, 1.0, 68.0],
  [264.0, 365.0, 60.0, 221.0, 0.61542, 69.0],
  [872.0, 122.0, 58.0, 180.0, 1.0, 70.0],
  [912.0, 129.0, 48.0, 161.0, 0.61224, 71.0],
  [1040.0, -39.0, 46.0, 123.0, 0.67742, 72.0],
  [996.0, -41.0, 47.0, 121.0, 0.60109, 73.0],
  [1449.0, 

`DFPERSONS` is a pandas DataFrame containing the processed person annotations, with unnecessary columns filtered out.

In [11]:
DFPERSONS.head(5)

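|   | fid | uid | bX | bY | bW | bH | conf | class | visible |
|---|-----|-----|----|----|----|----|------|-------|---------|
| 0 | 1 | 1 | 1363 | 569 | 103 | 241 | 1 | 1 | 0.86014 |
| 1 | 2 | 1 | 1362 | 568 | 103 | 241 | 1 | 1 | 0.86173 |
| 2 | 3 | 1 | 1362 | 568 | 103 | 241 | 1 | 1 | 0.86173 |
| 3 | 4 | 1 | 1362 | 568 | 103 | 241 | 1 | 1 | 0.86173 |
| 4 | 5 | 1 | 1362 | 568 | 103 | 241 | 1 | 1 | 0.86173 |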


You can instantiate an IOUTracker to start the algorithm.

In [12]:
help(IOUTracker)

Help on class IOUTracker in module ioutracker.src.IOUTracker:

class IOUTracker(builtins.object)
 |  IOUTracker(detection_conf=0.2, iou_threshold=0.5, min_t=1, track_min_conf=0.2, assignedTID=True)
 |  
 |  IOUTracker implements the IOU tracker algorithm details.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, detections, returnFinishedTrackers=False, runPreviousVersion=False)
 |      Runs the IOU tracker algorithm across the consecutive frames.
 |      
 |      Args:
 |        detections: a list contains multiple detections per frame, each detection
 |                    keeps [[bX, bY, bWidth, bHeight, visible], [], []]
 |      
 |        returnFinishedTrackers: a bool for returning finished trackers
 |      
 |        runPreviousVersion: whether to run the previous version of IOUTracker algorithm.
 |      
 |      Returns:
 |        detectionMapping: a list contains multiple dictionary-structure objects
 |                          representing each detection, the order of tho

Create an `IOUTracker` instance that runs the IOU tracker algorithm. In this example, we use the default auto-increment mechanism (`assignedTID=True`) to obtain a tracker ID for each box.

In [13]:
iouTracks = IOUTracker(detection_conf=0.2,
                       iou_threshold=0.2,
                       min_t=fps,
                       track_min_conf=0.5, 
                       assignedTID=True)

allFrameIds = list(LABELS.keys())
start, end = min(allFrameIds), max(allFrameIds) + 1

for frameIdx in range(start, end, 1):
  # apply the IOUTracker algorithm
  # you can set runPreviousVersion to choose the latest or previous version
  # the latest version saves lots of time to return the tracker ID
  detectionMapping, finished = iouTracks(LABELS[frameIdx], returnFinishedTrackers=True, runPreviousVersion=False)

  if frameIdx % (fps * 5) == 0:
    print("Frame: {}".format(frameIdx))
    for bboxIdx in range(len(LABELS[frameIdx])):
      print("BBox: {}, and its Track ID: {}".format(LABELS[frameIdx][bboxIdx], detectionMapping[bboxIdx]["tid"]))
    print("...")

Frame: 150
BBox: [1358.0, 567.0, 103.0, 242.0, 0.8623, 1.0], and its Track ID: 1
BBox: [421.0, 256.0, 75.0, 220.0, 1.0, 2.0], and its Track ID: 2
BBox: [101.0, 543.0, 83.0, 253.0, 0.95735, 3.0], and its Track ID: 3
BBox: [1721.0, 457.0, 76.0, 209.0, 0.88033, 4.0], and its Track ID: 4
BBox: [1048.0, 913.0, 71.0, 273.0, 0.61314, 5.0], and its Track ID: 5
BBox: [629.0, 761.0, 100.0, 251.0, 0.31903000000000004, 6.0], and its Track ID: 6
BBox: [687.0, 206.0, 77.0, 112.0, 0.726, 45.0], and its Track ID: 57
BBox: [678.0, 291.0, 57.0, 121.0, 0.57603, 46.0], and its Track ID: 8
BBox: [795.0, 149.0, 60.0, 174.0, 0.5852, 60.0], and its Track ID: 9
BBox: [1490.0, 69.0, 52.0, 147.0, 0.74528, 62.0], and its Track ID: 11
BBox: [1313.0, 177.0, 49.0, 157.0, 0.30772, 63.0], and its Track ID: 12
BBox: [1283.0, 172.0, 62.0, 179.0, 0.8222200000000001, 65.0], and its Track ID: 43
BBox: [596.0, 176.0, 60.0, 159.0, 0.40665999999999997, 66.0], and its Track ID: 7
BBox: [861.0, 203.0, 68.0, 192.0, 0.9165, 67.0]

Frame: 600
BBox: [1575.0, 589.0, 95.0, 234.0, 0.9270799999999999, 1.0], and its Track ID: 1
BBox: [472.0, -59.0, 60.0, 182.0, 0.30055, 2.0], and its Track ID: 2
BBox: [646.0, 546.0, 79.0, 253.0, 0.40565999999999997, 3.0], and its Track ID: 3
BBox: [1710.0, 455.0, 89.0, 210.0, 0.82475, 4.0], and its Track ID: 4
BBox: [991.0, 939.0, 76.0, 249.0, 0.568, 5.0], and its Track ID: 5
BBox: [639.0, 757.0, 70.0, 250.0, 0.25856999999999997, 6.0], and its Track ID: 6
BBox: [687.0, 206.0, 77.0, 112.0, 0.7922600000000001, 45.0], and its Track ID: 57
BBox: [678.0, 291.0, 57.0, 121.0, 0.57603, 46.0], and its Track ID: 68
BBox: [801.0, 146.0, 61.0, 170.0, 1.0, 60.0], and its Track ID: 9
BBox: [972.0, -74.0, 49.0, 141.0, 0.47183, 63.0], and its Track ID: 12
BBox: [916.0, -83.0, 53.0, 152.0, 0.40087, 65.0], and its Track ID: 43
BBox: [620.0, 66.0, 40.0, 142.0, 0.32236, 66.0], and its Track ID: 90
BBox: [235.0, 264.0, 61.0, 184.0, 1.0, 68.0], and its Track ID: 16
BBox: [168.0, 247.0, 58.0, 206.0, 1.0, 69.

Frame: 1050
BBox: [1575.0, 591.0, 100.0, 239.0, 0.45578, 1.0], and its Track ID: 1
BBox: [587.0, 536.0, 57.0, 256.0, 0.98551, 3.0], and its Track ID: 3
BBox: [1721.0, 457.0, 76.0, 211.0, 0.8796299999999999, 4.0], and its Track ID: 4
BBox: [875.0, 933.0, 73.0, 253.0, 0.58268, 5.0], and its Track ID: 5
BBox: [639.0, 757.0, 70.0, 250.0, 0.25856999999999997, 6.0], and its Track ID: 6
BBox: [687.0, 206.0, 77.0, 112.0, 0.7922600000000001, 45.0], and its Track ID: 57
BBox: [678.0, 291.0, 57.0, 121.0, 0.57603, 46.0], and its Track ID: 68
BBox: [793.0, 140.0, 71.0, 170.0, 0.8992, 60.0], and its Track ID: 9
BBox: [631.0, 66.0, 39.0, 137.0, 0.21685, 66.0], and its Track ID: 109
BBox: [485.0, 174.0, 47.0, 166.0, 0.60329, 68.0], and its Track ID: 153
BBox: [516.0, 171.0, 71.0, 188.0, 0.92945, 69.0], and its Track ID: 84
BBox: [753.0, 74.0, 51.0, 161.0, 0.8205100000000001, 74.0], and its Track ID: 96
BBox: [731.0, 35.0, 58.0, 164.0, 0.5211100000000001, 75.0], and its Track ID: 23
BBox: [553.0, -47.0
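The frame-by-frame processing with auto-incremented IDs shown above can be illustrated with a minimal, self-contained sketch of the greedy IOU tracking loop described in the Bochinski et al. paper. This is not the `ioutracker` implementation: the class `GreedyIOUTracker`, its methods, and the assumed correspondence of its thresholds to `detection_conf` (σ_l), `iou_threshold` (σ_IOU), `min_t` (t_min), and `track_min_conf` (σ_h) are for illustration only. Following the `help(IOUTracker)` output, the fifth element of each detection (`visible` in this notebook) is treated as its score.

```python
def iou_xywh(a, b):
  """IOU of two boxes given as [x, y, w, h, ...]; extra fields are ignored."""
  iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
  ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
  inter = iw * ih
  union = a[2] * a[3] + b[2] * b[3] - inter
  return inter / union if union > 0 else 0.0


class GreedyIOUTracker:
  """Minimal sketch of the greedy IOU tracker described by Bochinski et al. (2017).

  sigma_l:   minimum detection score; weaker detections are discarded
  sigma_iou: minimum IOU between a track's last box and a detection to extend the track
  t_min:     minimum number of boxes a track needs to be kept
  sigma_h:   a kept track must contain at least one detection with this score
  """

  def __init__(self, sigma_l=0.2, sigma_iou=0.2, t_min=2, sigma_h=0.5):
    self.sigma_l, self.sigma_iou = sigma_l, sigma_iou
    self.t_min, self.sigma_h = t_min, sigma_h
    self.active, self.finished = [], []
    self.next_tid = 1  # auto-incremented track ID

  def _keep(self, track):
    return len(track["boxes"]) >= self.t_min and track["max_score"] >= self.sigma_h

  def step(self, detections):
    """Process one frame; each detection is [x, y, w, h, score, ...]."""
    dets = [d for d in detections if d[4] >= self.sigma_l]
    still_active = []
    for track in self.active:
      last = track["boxes"][-1]
      if dets:
        best = max(dets, key=lambda d: iou_xywh(last, d))
        if iou_xywh(last, best) >= self.sigma_iou:
          track["boxes"].append(best)
          track["max_score"] = max(track["max_score"], best[4])
          still_active.append(track)
          dets.remove(best)  # a detection can extend only one track
          continue
      # no matching detection: the track ends and is kept only if long and confident enough
      if self._keep(track):
        self.finished.append(track)
    # detections left unmatched start new tracks
    for d in dets:
      still_active.append({"tid": self.next_tid, "boxes": [d], "max_score": d[4]})
      self.next_tid += 1
    self.active = still_active

  def flush(self):
    """Close all remaining active tracks at the end of the sequence."""
    self.finished.extend(t for t in self.active if self._keep(t))
    self.active = []


tracker = GreedyIOUTracker(t_min=2)
frames = [
  [[0, 0, 10, 10, 0.9], [50, 50, 10, 10, 0.9]],
  [[1, 0, 10, 10, 0.9], [51, 50, 10, 10, 0.9]],
  [[2, 0, 10, 10, 0.9]],
]
for detections in frames:
  tracker.step(detections)
tracker.flush()
print(sorted((t["tid"], len(t["boxes"])) for t in tracker.finished))  # [(1, 3), (2, 2)]
```

The second object disappears in the third frame, so its track ends there and is kept because it is long enough (2 boxes) and confident enough; the first track is closed by `flush()` at the end of the sequence.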

In this implementation, `IOUTracker` is designed to consume object detection results frame by frame rather than a whole video at once, and it keeps the state of every track. You can access the active tracks via the `get_active_tracks()` method and the finished tracks via the `get_finished_tracks()` method.

Alternatively, you can assign custom tracker IDs yourself through the `tid` attribute of each track; in this case, the parameter `assignedTID` must be set to `False`. In the next cell, the variable `tid_count` is used to assign a unique ID to each new track.

In [14]:
iouTracks = IOUTracker(detection_conf=0.2,
                       iou_threshold=0.2,
                       min_t=fps,
                       track_min_conf=0.5, 
                       assignedTID=False)

tid_count = 1

for label in range(1, len(LABELS), 1):
  # feed this frame's detections into the tracker
  iouTracks.read_detections_per_frame(detections=LABELS[label])

  active_tracks = iouTracks.get_active_tracks()
  finished_tracks = iouTracks.get_finished_tracks()

  if label % 50 == 0:
    print("Frame {} tracker Info: active {}, finished {}".format(label, len(active_tracks), len(finished_tracks)))
  
  # simple way to assign the tracker ID: give each new (unassigned) track the next ID
  for act_track in active_tracks:
    if not act_track.tid:
      # assign a track ID (used to pick the drawing color)
      act_track.tid = tid_count
      tid_count += 1

Frame 50 tracker Info: active 39, finished 2
Frame 100 tracker Info: active 37, finished 7
Frame 150 tracker Info: active 40, finished 7
Frame 200 tracker Info: active 41, finished 7
Frame 250 tracker Info: active 43, finished 9
Frame 300 tracker Info: active 45, finished 11
Frame 350 tracker Info: active 47, finished 11
Frame 400 tracker Info: active 46, finished 13
Frame 450 tracker Info: active 45, finished 17
Frame 500 tracker Info: active 44, finished 17
Frame 550 tracker Info: active 44, finished 19
Frame 600 tracker Info: active 44, finished 21
Frame 650 tracker Info: active 43, finished 27
Frame 700 tracker Info: active 40, finished 32
Frame 750 tracker Info: active 36, finished 34
Frame 800 tracker Info: active 43, finished 34
Frame 850 tracker Info: active 44, finished 35
Frame 900 tracker Info: active 46, finished 35
Frame 950 tracker Info: active 49, finished 36
Frame 1000 tracker Info: active 48, finished 37


## Metrics

`ExampleEvaluateMOTDatasets` evaluates a single dataset (one video) at a time. Because no detector is available here, we use the ground-truth data as the predictions to test the functionality.
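With the ground truth used as predictions, every box should match itself, so the detection counts should show no false positives or false negatives. A minimal sketch of per-frame counting, assuming the usual MOT convention that a prediction matches a ground-truth box when their IOU is at least 0.5; matching is done greedily here for brevity (the MOT evaluation uses an optimal Hungarian assignment), and the helpers `iou_xywh` and `count_frame` are hypothetical:

```python
def iou_xywh(a, b):
  """IOU of two boxes given as [x, y, w, h, ...]; extra fields are ignored."""
  iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
  ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
  inter = iw * ih
  union = a[2] * a[3] + b[2] * b[3] - inter
  return inter / union if union > 0 else 0.0


def count_frame(gt_boxes, pred_boxes, threshold=0.5):
  """Greedily match predictions to ground-truth boxes in one frame and return (TP, FP, FN)."""
  # all candidate pairs, best IOU first
  pairs = sorted(
    ((iou_xywh(g, p), gi, pi) for gi, g in enumerate(gt_boxes) for pi, p in enumerate(pred_boxes)),
    reverse=True,
  )
  used_gt, used_pred = set(), set()
  for overlap, gi, pi in pairs:
    if overlap < threshold:
      break  # remaining pairs overlap even less
    if gi in used_gt or pi in used_pred:
      continue  # each box may be matched at most once
    used_gt.add(gi)
    used_pred.add(pi)
  tp = len(used_gt)
  return tp, len(pred_boxes) - tp, len(gt_boxes) - tp


gt = [[0, 0, 10, 10], [50, 50, 10, 10]]
print(count_frame(gt, gt))  # (2, 0, 0): identical predictions give no FP and no FN
print(count_frame(gt, [[1, 0, 10, 10], [200, 200, 10, 10]]))  # (1, 1, 1)
```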

**Notice that this step takes a long time to process.**

In [15]:
Predictions, _ = loadLabel(src=LABEL_FILE_PATH, is_path=True, load_Pedestrian=True, load_Static_Person=True,
    visible_thresholde=0.2, format_style="metrics_dict")

In [16]:
evalMOT = ExampleEvaluateMOTDatasets(LABEL_FILE_PATH, predictions=Predictions, printOnScreen=True)

100%|██████████| 1049/1049 [36:06<00:00,  2.07s/it]

TP: 45298
FP:     0
FN:     0
GT: 45298
Fragment Number:     78
SwitchID Number:     94
Recall: 100.000%
Precision: 100.000%
Accuracy: 100.000%
F1 Score: 1.000
MOTA: 0.997925
FN: 0
FP: 0
IDSW: 94
GT: 45298
Total trajectories: 84
MT Number: 84, Ratio: 100.000%
PT Number: 0, Ratio: 0.000%
ML Number: 0, Ratio: 0.000%
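
The MT/PT/ML counts above classify each ground-truth trajectory by how much of its life span is covered by a tracker. Following the definitions in the MOT16 paper (mostly tracked: at least 80%; mostly lost: at most 20%; partially tracked: everything in between), a minimal sketch with a hypothetical `track_quality` helper and a hypothetical input layout looks like this:

```python
def track_quality(coverage):
  """Classify ground-truth trajectories by the fraction of their frames that were tracked.

  coverage: {gt_id: (tracked_frames, total_frames)}
  """
  counts = {"MT": 0, "PT": 0, "ML": 0}
  for tracked, total in coverage.values():
    ratio = tracked / total
    if ratio >= 0.8:
      counts["MT"] += 1  # mostly tracked
    elif ratio <= 0.2:
      counts["ML"] += 1  # mostly lost
    else:
      counts["PT"] += 1  # partially tracked
  return counts


print(track_quality({1: (100, 100), 2: (50, 100), 3: (10, 100)}))  # {'MT': 1, 'PT': 1, 'ML': 1}
```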





Alternatively, you can evaluate a tracking result directly against the ground truth. **Note that the tracking result must contain the tracking ID.**

In [17]:
def EvaluateFrameByFrame(groundTruth, predictions, printOnScreen=False):
  evalFrame = EvaluateByFrame(requiredTracking=False)  
  
  gtFKeys = list(groundTruth.keys())
  predFKeys = list(predictions.keys())
  numTNOnFrame = 0
  
  ttlFid = list(set(gtFKeys + predFKeys))
  ttlFid = sorted(ttlFid)
  start = ttlFid[0]
  end = ttlFid[-1] + 1
  logging.warning("start {} and end {}".format(start, end))
  
  for fid in tqdm.trange(start, end, 1):
    if fid not in gtFKeys and fid not in predFKeys:
      # skip the true negative
      numTNOnFrame += 1
      continue
    
    GTFrameInfo = groundTruth[fid] if fid in gtFKeys else [[]]
    prediction = predictions[fid] if fid in predFKeys else [[]]
    
    evalFrame.evaluateOnPredsWithTrackerID(GTFrameInfo, prediction)
    
  metaRes = evalFrame.getMetricsMeta(printOnScreen=printOnScreen)
  cmRes = evalFrame.getCM(printOnScreen=printOnScreen)
  motaRes = evalFrame.getMOTA(printOnScreen=printOnScreen)
  trajRes = evalFrame.getTrackQuality(printOnScreen=printOnScreen)
  
  if printOnScreen: print("The true negatives: {} frames".format(numTNOnFrame))
  
  return metaRes, cmRes, motaRes, trajRes
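
The tracking ID is required because identity metrics such as IDSW compare assignments across frames. A simplified sketch of ID-switch counting in the spirit of the CLEAR MOT definition, assuming the per-frame ground-truth-to-prediction matches have already been computed (the function `count_id_switches` and its input layout are hypothetical, not part of `EvaluateByFrame`):

```python
def count_id_switches(frames):
  """Count identity switches over a sequence.

  frames: list of per-frame dicts {gt_id: matched_pred_id}. A switch is counted when a
  ground-truth object is matched to a different predicted ID than the last time it was matched.
  """
  last_match = {}
  switches = 0
  for matches in frames:
    for gt_id, pred_id in matches.items():
      if gt_id in last_match and last_match[gt_id] != pred_id:
        switches += 1
      last_match[gt_id] = pred_id
  return switches


seq = [{1: "a", 2: "b"}, {1: "a", 2: "c"}, {1: "a"}, {1: "a", 2: "c"}]
print(count_id_switches(seq))  # 1: object 2 changes from "b" to "c" once
```

Note that an object missing from a frame (object 2 in the third frame) does not reset its last match, so reappearing with the same ID is not counted as a switch.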

In [18]:
import copy

def transformUID(dicts, dataType="int"):
  # deep copy: dict.copy() is shallow, so the nested per-object lists of the input would be modified too
  dictCopy = copy.deepcopy(dicts)
  for key in dictCopy:
    for obj in dictCopy[key]:
      # index 5 of each object holds its ID (uid)
      if dataType in ["int", "integer"]:
        obj[5] = int(obj[5])
      elif dataType in ["string", "str"]:
        obj[5] = str(obj[5])
  return dictCopy

In [19]:
groundTruth, _ = loadLabel(src=LABEL_FILE_PATH, is_path=True, load_Pedestrian=True, load_Static_Person=True,
    visible_thresholde=0.2, format_style="metrics_dict")

predictions, _ = loadLabel(src=LABEL_FILE_PATH, is_path=True, load_Pedestrian=True, load_Static_Person=True,
    visible_thresholde=0.2, format_style="metrics_dict")

# transform the datatype of UID
groundTruth = transformUID(groundTruth, dataType="int")
predictions = transformUID(predictions, dataType="int")

evalMOT = EvaluateFrameByFrame(groundTruth=groundTruth, predictions=predictions, printOnScreen=True)

100%|██████████| 1050/1050 [16:33<00:00,  1.06it/s]

TP: 45348
FP:     0
FN:     0
GT: 45348
Fragment Number:     78
SwitchID Number:      0
Recall: 100.000%
Precision: 100.000%
Accuracy: 100.000%
F1 Score: 1.000
MOTA: 1.000000
FN: 0
FP: 0
IDSW: 0
GT: 45348
Total trajectories: 84
MT Number: 84, Ratio: 100.000%
PT Number: 0, Ratio: 0.000%
ML Number: 0, Ratio: 0.000%
The true negatives: 0 frames





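The two evaluations above look similar, but they are different. The first takes the detection results, applies the IOU Tracker algorithm to them, and then evaluates the metrics, so its ID switches (IDSW: 94) are introduced by the tracking step. The second takes a complete result that already includes both the detections and the tracking IDs and evaluates it directly; since the predictions here are the ground truth itself, no ID switches occur.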
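The MOTA values printed above can be checked by hand with the MOT16 definition MOTA = 1 - (FN + FP + IDSW) / GT. This small sketch (the helper `mota` is hypothetical) plugs in the counts reported by the two evaluations:

```python
def mota(fn, fp, idsw, gt):
  """MOTA = 1 - (FN + FP + IDSW) / GT, as defined in the MOT16 benchmark."""
  return 1.0 - (fn + fp + idsw) / gt


# counts taken from the two evaluation outputs above
print(round(mota(fn=0, fp=0, idsw=94, gt=45298), 6))  # 0.997925
print(mota(fn=0, fp=0, idsw=0, gt=45348))  # 1.0
```

With no misses or false positives, the 94 ID switches alone account for the first evaluation's MOTA of 0.997925.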

The `EvaluateOnMOTDatasets` class helps you evaluate multiple datasets.

In [20]:
evalMOTData = EvaluateOnMOTDatasets()

# it is simple to pass the whole package of results into the multiple-dataset evaluator
evalMOTData(evalMOT)
evalMOTRes = evalMOTData.getResults()

Metric mota: Value 1.0
Metric recall: Value 0.9999999999997795
Metric precision: Value 0.9999999999997795
Metric accuracy: Value 0.9999999999997795
Metric f1score: Value 0.9999999949997795
Metric rateMT: Value 1.0
Metric ratePT: Value 0.0
Metric rateML: Value 0.0
Metric TP: Value 90696
Metric FP: Value 0
Metric FN: Value 0
Metric GT: Value 90696
Metric numFragments: Value 78
Metric numSwitchID: Value 0
Metric numMT: Value 84
Metric numPT: Value 0
Metric numML: Value 0
Metric numTraj: Value 84


You can also evaluate the metrics on each MOT dataset and then summarize them.

In [21]:
evalMOTData = EvaluateOnMOTDatasets()
for key in FRAME_FPS:
  LABEL_FILE_PATH = os.path.join(LABEL_PATH, key, "gt", "gt.txt")
  assert os.path.exists(LABEL_FILE_PATH), "{} is not found.".format(LABEL_FILE_PATH)
  print("Sample: {}".format(key))
  
  # setting predictions to None makes the evaluator use the ground truth as the predictions
  evalMOT = ExampleEvaluateMOTDatasets(LABEL_FILE_PATH, predictions=None, printOnScreen=True)
  evalMOTData(evalMOT)
  print("", end="\n\n")
evalMOTRes = evalMOTData.getResults()
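
When several datasets are combined, one common approach is to sum the raw counts across datasets and recompute MOTA from the totals, which weights each dataset by its number of ground-truth objects. The sketch below illustrates that approach with made-up counts; the function `summarize` and its input layout are hypothetical, and this notebook does not show how `EvaluateOnMOTDatasets` combines ratio metrics internally.

```python
def summarize(per_dataset):
  """Sum raw counts across datasets and recompute MOTA from the totals.

  per_dataset: list of dicts with keys "FN", "FP", "IDSW", "GT".
  """
  totals = {k: sum(d[k] for d in per_dataset) for k in ("FN", "FP", "IDSW", "GT")}
  totals["MOTA"] = 1.0 - (totals["FN"] + totals["FP"] + totals["IDSW"]) / totals["GT"]
  return totals


res = summarize([
  {"FN": 10, "FP": 5, "IDSW": 5, "GT": 100},  # MOTA 0.8 on its own
  {"FN": 0, "FP": 0, "IDSW": 0, "GT": 900},   # MOTA 1.0 on its own
])
print(res["MOTA"])  # 0.98
```

Averaging the two per-dataset MOTA values (0.8 and 1.0) would instead give 0.9, so the choice of aggregation matters when datasets differ in size.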