<a href="https://colab.research.google.com/github/freida20git/bird-detection-tracking/blob/main/Tracking_metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MOT metrics for evaluating how well the birds are tracked**

NOTE:
These metrics do not compare ID numbers directly. The tracker evaluation does not care whether your predicted IDs are 1, 100, or 9999. What is measured is how consistently a predicted ID is matched to the same ground-truth ID across frames, based on bounding-box overlap (IoU).

So it is not a problem if your predictions use different ID numbers than the ground truth.
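
To illustrate the note, here is a minimal sketch (not the motmetrics implementation) that counts identity switches from per-frame matches. The function name `count_id_switches` and the toy matches are assumptions made up for illustration. Relabeling the predicted IDs with any one-to-one mapping leaves the count unchanged.

```python
def count_id_switches(matches_per_frame):
    """Count how often a ground-truth ID changes its matched predicted ID.

    matches_per_frame: list of dicts {gt_id: pred_id}, one dict per frame.
    """
    last_pred = {}  # gt_id -> predicted ID it was last matched to
    switches = 0
    for matches in matches_per_frame:
        for gt_id, pred_id in matches.items():
            if gt_id in last_pred and last_pred[gt_id] != pred_id:
                switches += 1
            last_pred[gt_id] = pred_id
    return switches

# Toy example: GT bird 1 is tracked as prediction 7, then switches to 8.
matches = [{1: 7, 2: 9}, {1: 7, 2: 9}, {1: 8, 2: 9}]

# The same tracks with completely different predicted ID numbers.
relabel = {7: 100, 8: 9999, 9: 1}
relabeled = [{g: relabel[p] for g, p in m.items()} for m in matches]

print(count_id_switches(matches), count_id_switches(relabeled))  # 1 1
```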

In [None]:
!pip install ultralytics==8.0.122

Tracking annotations (predictions and ground truth) in JSON format (created in the tracker.py notebook):


In [None]:
!gdown 1sAuzIr_w0MD-SzqDtwEvlHtlEGoGQIt-
!gdown 1TkpvxrVR8Uik4VV_Kw5RPjRn3_LoE49H

Downloading...
From: https://drive.google.com/uc?id=1sAuzIr_w0MD-SzqDtwEvlHtlEGoGQIt-
To: /content/pred_annotations.json
100% 348k/348k [00:00<00:00, 68.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1TkpvxrVR8Uik4VV_Kw5RPjRn3_LoE49H
To: /content/annotations_gt.json
100% 355k/355k [00:00<00:00, 46.6MB/s]


In [None]:
pred_annotations ="/content/pred_annotations.json"
gt_annotations = "/content/annotations_gt.json"

In [None]:
!pip install ultralytics
!pip install motmetrics
import random
import cv2
import torch
import ultralytics
ultralytics.checks()
from ultralytics import YOLO
from IPython.display import Image
from collections import defaultdict
import numpy as np
import motmetrics as mm
import pandas as pd

Ultralytics YOLOv8.0.122 🚀 Python-3.11.12 torch-2.6.0+cu124 CPU
Setup complete ✅ (2 CPUs, 12.7 GB RAM, 41.2/107.7 GB disk)


In [None]:
import json

def convert_json_to_mot(json_path, output_path):
    with open(json_path, "r") as f:
        data = json.load(f)

    with open(output_path, "w") as f_out:
        for frame in data:
            frame_number = frame["frame_number"]
            for obj in frame["objects"]:
                track_id = obj["track_id"]
                confidence = obj["confidence"]
                class_id = obj["class_id"]

                # Extract bounding box
                x1, y1 = obj["bbox"]["x1"], obj["bbox"]["y1"]
                x2, y2 = obj["bbox"]["x2"], obj["bbox"]["y2"]

                # Convert to MOT format (x, y, width, height)
                width, height = x2 - x1, y2 - y1

                # Write in MOT format
                f_out.write(f"{frame_number},{track_id},{x1},{y1},{width},{height},{confidence},{class_id},1\n")

# Example usage: write both annotation files in MOT format,
# reusing the paths defined earlier in the notebook
convert_json_to_mot(gt_annotations, "gt_mot.txt")
convert_json_to_mot(pred_annotations, "pred_annotations.txt")
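
For reference, the converter above reads the fields `frame_number`, `objects`, `track_id`, `confidence`, `class_id`, and `bbox` from each frame record. The sketch below uses invented values and a hypothetical helper, `object_to_mot_line`, to show the line that would be written for a single object.

```python
# Hypothetical frame record with the fields read by convert_json_to_mot;
# the numeric values are invented for illustration.
frame = {
    "frame_number": 1,
    "objects": [
        {
            "track_id": 3,
            "confidence": 0.9,
            "class_id": 14,
            "bbox": {"x1": 100, "y1": 50, "x2": 160, "y2": 90},
        }
    ],
}

def object_to_mot_line(frame_number, obj):
    """Format one object as frame,id,x,y,width,height,conf,class,visibility."""
    box = obj["bbox"]
    width, height = box["x2"] - box["x1"], box["y2"] - box["y1"]
    return (f"{frame_number},{obj['track_id']},{box['x1']},{box['y1']},"
            f"{width},{height},{obj['confidence']},{obj['class_id']},1")

mot_line = object_to_mot_line(frame["frame_number"], frame["objects"][0])
print(mot_line)  # 1,3,100,50,60,40,0.9,14,1
```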

In [None]:
def iou_distance(boxA, boxB):
    """Return 1 - IoU for two boxes given as (x, y, width, height)."""
    xA, yA, wA, hA = boxA
    xB, yB, wB, hB = boxB
    x1, y1 = max(xA, xB), max(yA, yB)
    x2, y2 = min(xA + wA, xB + wB), min(yA + hA, yB + hB)
    inter_area = max(0, x2 - x1) * max(0, y2 - y1)
    union_area = wA * hA + wB * hB - inter_area
    if union_area <= 0:
        return 1.0  # Degenerate boxes: treat as no overlap
    return 1 - inter_area / float(union_area)

def compute_tracking_metrics(gt_file, pred_file):
    acc = mm.MOTAccumulator(auto_id=True)

    # Load GT and prediction files (MOT format, no header row)
    gt_data = pd.read_csv(gt_file, header=None)
    pred_data = pd.read_csv(pred_file, header=None)

    # Evaluate every frame that appears in either file
    frames = sorted(set(gt_data[0]) | set(pred_data[0]))

    for frame in frames:
        gt_frame = gt_data[gt_data[0] == frame]
        pred_frame = pred_data[pred_data[0] == frame]

        gt_ids = gt_frame[1].tolist()
        pred_ids = pred_frame[1].tolist()

        # Columns 2..5 hold x, y, width, height
        gt_boxes = gt_frame.iloc[:, 2:6].values.tolist()
        pred_boxes = pred_frame.iloc[:, 2:6].values.tolist()

        # Cost matrix: rows = ground-truth objects, columns = predicted objects,
        # each cell = distance (1 - IoU). No IoU threshold is applied here, so
        # even non-overlapping pairs (distance 1.0) remain eligible for matching.
        distance_matrix = [[iou_distance(gt, pred) for pred in pred_boxes] for gt in gt_boxes]
        acc.update(gt_ids, pred_ids, distance_matrix)

    mh = mm.metrics.create()
    summary = mh.compute(acc, metrics=['idf1', 'mota', 'motp', 'num_switches'], name="Tracking")
    print(summary)
    return summary

compute_tracking_metrics("/content/gt_mot.txt", "/content/pred_annotations.txt")

              idf1      mota      motp  num_switches
Tracking  0.984369  0.978836  0.132966             1




1. IDF1 (Identity F1 Score)
Measures how well the predicted identities match the ground truth identities.

2. MOTA (Multiple Object Tracking Accuracy)
Measures overall tracking performance by penalizing missed detections, false positives, and identity switches.

3. MOTP (Multiple Object Tracking Precision)
Measures the localization precision of tracked objects.
As computed here, it is the average distance (1 - IoU) between matched predictions and ground truth across all frames, so lower values mean tighter boxes.
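
As a rough sketch of how the first two scores are defined, the standard formulas can be written directly from error counts. The function names and the counts below are hypothetical and are not taken from this notebook's data; motmetrics computes these quantities internally from the accumulated events.

```python
def mota(num_misses, num_false_positives, num_switches, num_gt_objects):
    """MOTA = 1 - (FN + FP + IDSW) / total number of ground-truth objects."""
    errors = num_misses + num_false_positives + num_switches
    return 1.0 - errors / num_gt_objects

def idf1(idtp, idfp, idfn):
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example_mota = mota(num_misses=5, num_false_positives=4,
                    num_switches=1, num_gt_objects=500)
example_idf1 = idf1(idtp=90, idfp=10, idfn=10)

print(round(example_mota, 3), round(example_idf1, 3))  # 0.98 0.9
```

Because every error type is summed in the numerator, MOTA can drop below zero when the tracker makes more errors than there are ground-truth objects.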

Interpretation of Results:

High IDF1 → The tracker maintains object identities correctly.

High MOTA → The tracker has fewer false positives, false negatives, and ID switches.

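Low MOTP → The tracker's bounding boxes are well aligned with the ground truth. MOTP here is the mean 1 - IoU distance over matched pairs, so 0.133 corresponds to an average IoU of about 0.867.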
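
To make the distance behind MOTP concrete, the sketch below computes 1 - IoU for two toy boxes and converts the reported MOTP back to a mean IoU. The `gate` helper and its 0.5 threshold are assumptions and are not part of this notebook's evaluation, which applies no threshold. motmetrics treats NaN entries in the distance matrix as pairs that must not be matched, so gating of this kind is one way to keep barely overlapping boxes from being paired.

```python
import math

def iou_distance(box_a, box_b):
    """Return 1 - IoU for boxes given as (x, y, width, height)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    inter_w = max(0, min(xa + wa, xb + wb) - max(xa, xb))
    inter_h = max(0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return 1.0 if union <= 0 else 1.0 - inter / union

def gate(distance, max_distance=0.5):
    """Hypothetical gating step: mark pairs beyond max_distance as NaN."""
    return distance if distance <= max_distance else math.nan

gt_box = (0, 0, 10, 10)
pred_box = (5, 0, 10, 10)            # shifted right by half a box width
d = iou_distance(gt_box, pred_box)   # intersection 50, union 150
print(round(d, 3), gate(d))          # 0.667 nan

# MOTP reported above is a mean distance, so mean IoU = 1 - MOTP.
mean_iou = 1 - 0.132966
print(round(mean_iou, 3))            # 0.867
```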



# Convert to COCO format for mAP calculation:

In [None]:
import json
from pathlib import Path

# Load your original JSON
with open("/content/annotations_gt.json") as f:
    data = json.load(f)

# COCO template
coco_gt = {
    "images": [],
    "annotations": [],
    "categories": [{"id": 14, "name": "bird"}]  # Add all classes here
}

# Convert each frame
for frame in data:  # Assuming `data` is a list of frames
    image_id = frame["frame_number"]
    coco_gt["images"].append({
        "id": image_id,
        "file_name": f"frame_{image_id:04d}.jpg",  # Match frame names
        "width": 1920,  # Set your video width
        "height": 1080,  # Set your video height
    })

    for obj in frame["objects"]:
        x1, y1, x2, y2 = obj["bbox"]["x1"], obj["bbox"]["y1"], obj["bbox"]["x2"], obj["bbox"]["y2"]
        coco_gt["annotations"].append({
            # Unique ID per annotation, starting at 1: COCOeval treats a
            # matched ground-truth ID of 0 as "no match"
            "id": len(coco_gt["annotations"]) + 1,
            "image_id": image_id,
            "category_id": obj["class_id"],
            "bbox": [x1, y1, x2 - x1, y2 - y1],  # COCO: [x, y, width, height]
            "area": (x2 - x1) * (y2 - y1),
            "iscrowd": 0,
        })

# Save COCO JSON
with open("coco_gt.json", "w") as f:
    json.dump(coco_gt, f)

In [None]:
# Load trained YOLO model predictions
with open("pred_annotations.json") as f:
    pred_data = json.load(f)

coco_preds = []
for frame in pred_data:
    for obj in frame["objects"]:
        x1, y1, x2, y2 = obj["bbox"]["x1"], obj["bbox"]["y1"], obj["bbox"]["x2"], obj["bbox"]["y2"]
        coco_preds.append({
            "image_id": frame["frame_number"],
            "category_id": 14, #because it detected birds only
            "bbox": [x1, y1, x2 - x1, y2 - y1],
            "score": obj["confidence"],  # Critical for mAP calculation
        })

with open("coco_preds.json", "w") as f:
    json.dump(coco_preds, f)
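
Both conversion cells above repeat the same corner-to-COCO box arithmetic. As a small sketch, it could be factored into helpers like the ones below; `xyxy_to_coco_bbox` and `make_coco_detection` are hypothetical names, and the sample values are invented.

```python
def xyxy_to_coco_bbox(x1, y1, x2, y2):
    """Convert corner coordinates to COCO's [x, y, width, height]."""
    return [x1, y1, x2 - x1, y2 - y1]

def make_coco_detection(image_id, obj, category_id=14):
    """Build one COCO results entry from an object in the tracker JSON."""
    box = obj["bbox"]
    return {
        "image_id": image_id,
        "category_id": category_id,  # 14 = bird, as used in this notebook
        "bbox": xyxy_to_coco_bbox(box["x1"], box["y1"], box["x2"], box["y2"]),
        "score": obj["confidence"],
    }

# Invented object in the same layout as the prediction JSON.
sample_obj = {"confidence": 0.8,
              "bbox": {"x1": 10, "y1": 20, "x2": 50, "y2": 100}}
detection = make_coco_detection(image_id=7, obj=sample_obj)
print(detection["bbox"])  # [10, 20, 40, 80]
```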

In [None]:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Load ground truth and predictions
coco_gt = COCO("coco_gt.json")
coco_pred = coco_gt.loadRes("coco_preds.json")

# Evaluate
coco_eval = COCOeval(coco_gt, coco_pred, "bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # Prints mAP scores

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.32s).
Accumulating evaluation results...
DONE (t=0.05s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.725
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.969
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.860
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.616
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.793
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.246
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.770
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets