## Drone follow me using Kalman Filters

## Task 1: Video Library

In [None]:
# !conda install -c conda-forge pytube -y

In [None]:
from pytube import YouTube
import numpy as np
import cv2
from matplotlib import pyplot as plt
from tqdm import tqdm
import pandas as pd

We use the Pytube Python library to download the videos from YouTube. Pytube is a lightweight, dependency-free Python library for downloading YouTube videos. We download only the videos and not their captions, since captions are not used in this assignment.

Official Docs: https://pytube.io/en/latest/user/quickstart.html

In [None]:
# Build a YouTube object from the URL and download its highest-resolution stream
def Download(link, output_path, filename):
    try:
        youtubeObject = YouTube(link)
        stream = youtubeObject.streams.get_highest_resolution()  # highest-resolution progressive (video + audio) stream
        stream.download(output_path=output_path, filename=filename)  # save to output_path/filename
    except Exception as e:
        print(f"An error has occurred: {e}")
        return  # do not report success if the download failed
    print("Download is completed successfully")

In [None]:
# Download Video 1:
link = input("Enter the YouTube video URL: ") # input the link and press enter
Download(link, output_path = 'inputs/', filename = 'video1.mp4') # begin downloading once we hit 'enter'

Enter the YouTube video URL:  https://www.youtube.com/watch?v=WeF4wpw7w9k


Download is completed successfully


In [None]:
# Download Video 2:
link = input("Enter the YouTube video URL: ")
Download(link, output_path = 'inputs/', filename = 'video2.mp4')

Enter the YouTube video URL:  https://www.youtube.com/watch?v=2NFwY15tRtA


Download is completed successfully


In [None]:
# Download Video 3:
link = input("Enter the YouTube video URL: ")
Download(link, output_path = 'inputs/', filename = 'video3.mp4')

Enter the YouTube video URL:  https://www.youtube.com/watch?v=5dRramZVu2Q


Download is completed successfully


## Task 2: Object Detection

We use the Ultralytics API to instantiate a YOLOv8m (the medium-size version of YOLOv8) object detection model pretrained on the COCO dataset.

Drone imagery contains many small objects. Models that preserve more spatial detail from the input image (for example, through lower-stride, higher-resolution feature maps) typically fare better on such objects, since that detail can be essential for detecting and correctly classifying them. We chose YOLOv8m as a balance between accuracy and cost: it has more capacity than the nano and small variants, while its parameter count (about 25.9M, see the summary below) is still small enough to run inference on a local system.
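
To make the small-object argument concrete, here is a minimal sketch of the grid arithmetic, assuming the standard YOLOv8 detection head, which predicts on feature maps with strides 8, 16 and 32. The helper names are hypothetical. At a 640x640 input this gives 80x80, 40x40 and 20x20 grids (8,400 candidate locations in total), and a 16-pixel car spans only half a cell on the stride-32 map.

```python
# Hypothetical helpers illustrating detection-grid arithmetic.
# Assumes a square input and YOLOv8-style detection strides of 8, 16 and 32.

def grid_sizes(imgsz=640, strides=(8, 16, 32)):
    """Side length (in cells) of the feature map at each stride."""
    return {s: imgsz // s for s in strides}


def total_predictions(imgsz=640, strides=(8, 16, 32)):
    """Number of grid cells (candidate box centres) summed over all strides."""
    return sum(n * n for n in grid_sizes(imgsz, strides).values())


def cells_spanned(object_px, stride):
    """How many feature-map cells an object of the given side length covers."""
    return object_px / stride


if __name__ == "__main__":
    print(grid_sizes())         # {8: 80, 16: 40, 32: 20}
    print(total_predictions())  # 8400
    for s in (8, 16, 32):
        print(s, cells_spanned(16, s))  # a 16 px car covers 2.0, 1.0, 0.5 cells
```

Only the stride-8 map gives such an object more than one cell, which is why preserving high-resolution features matters for drone footage.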

In [None]:
from ultralytics import YOLO

In [None]:
# Load a COCO-pretrained YOLOv8m model
model = YOLO('yolov8m.pt')

# Display model information
model.info()

YOLOv8m summary: 295 layers, 25902640 parameters, 0 gradients, 79.3 GFLOPs


(295, 25902640, 0, 79.3204224)

#### Visualize predictions on local computer

The following code uses the COCO-pretrained YOLOv8m model to track objects in a specified input video.

The tracking results are visualized by annotating each frame with bounding boxes, class labels and confidence scores for the detected objects. The annotated frame is then displayed in a separate window; pressing 'q' stops playback.

In [None]:
import cv2
from ultralytics import YOLO


# Open the video file
video_path = "inputs/video1.mp4"
cap = cv2.VideoCapture(video_path)

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 tracking on the frame, persisting tracks between frames
        results = model.track(frame, persist=True)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow("YOLOv8 Tracking", annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    else:
        # Break the loop if the end of the video is reached
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()


0: 352x640 (no detections), 129.3ms
Speed: 3.4ms preprocess, 129.3ms inference, 5.1ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 142.3ms
Speed: 0.8ms preprocess, 142.3ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 137.0ms
Speed: 0.9ms preprocess, 137.0ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 motor, 139.4ms
Speed: 1.3ms preprocess, 139.4ms inference, 2.8ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 128.7ms
Speed: 0.9ms preprocess, 128.7ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 122.0ms
Speed: 0.9ms preprocess, 122.0ms inference, 0.1ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 126.3ms
Speed: 0.9ms preprocess, 126.3ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 131.1ms
Speed: 1.0ms preprocess, 131.
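
The `Speed:` lines above report per-frame latency for each processing stage. Below is a small sketch that sums the three stages and converts them into an effective frame rate; the `parse_speed` and `effective_fps` helpers are hypothetical and assume the log format shown above. At roughly 130 to 140 ms per frame, local tracking runs at about 7 FPS, which is below real time for typical 25 to 30 FPS video.

```python
import re

# Matches the Ultralytics per-frame timing line, e.g.
# "Speed: 0.8ms preprocess, 142.3ms inference, 0.5ms postprocess per image ..."
SPEED_RE = re.compile(
    r"Speed: ([\d.]+)ms preprocess, ([\d.]+)ms inference, ([\d.]+)ms postprocess"
)


def parse_speed(line):
    """Return (preprocess, inference, postprocess) in ms, or None if no match."""
    m = SPEED_RE.search(line)
    return tuple(float(g) for g in m.groups()) if m else None


def effective_fps(lines):
    """Mean total per-frame latency (ms) and the implied frames per second."""
    totals = [sum(t) for t in map(parse_speed, lines) if t is not None]
    if not totals:
        raise ValueError("no 'Speed:' lines found")
    mean_ms = sum(totals) / len(totals)
    return mean_ms, 1000.0 / mean_ms


if __name__ == "__main__":
    log = [
        "Speed: 0.8ms preprocess, 142.3ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)",
        "Speed: 0.9ms preprocess, 128.7ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)",
    ]
    mean_ms, fps = effective_fps(log)
    print(f"{mean_ms:.1f} ms/frame -> {fps:.1f} FPS")  # 136.7 ms/frame -> 7.3 FPS
```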

#### Save the video files with predictions from YOLOv8m pretrained on the COCO dataset

We now run tracking on our three input videos with the COCO-pretrained YOLOv8m model and save the annotated results; this model serves as our baseline. We improve on this detector in the next subsection.

In [None]:
results = model.track(source='inputs/video1.mp4', persist=True, save=True)
# with stream=True, track() would instead return a generator, so results would not accumulate in RAM



errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 146.3ms
video 1/1 (frame 2/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 130.8ms
video 1/1 (frame 3/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 126.2ms
video 1/1 (frame 4/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 121.4ms
video 1/1 (frame 5/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 1 person, 121.8ms
video 1/1 (frame 6/1347) /Users/rishienandhan/Drone/inputs/

In [None]:
model = YOLO('yolov8m.pt') # re-initialize the model to reset the tracker state before processing the next video
results = model.track(source='inputs/video2.mp4', persist=True, save=True)

video 1/1 (frame 1/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 person, 1 train, 141.4ms
video 1/1 (frame 2/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 person, 1 boat, 131.8ms
video 1/1 (frame 3/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 person, 128.8ms
video 1/1 (frame 4/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 person, 120.8ms
video 1/1 (frame 5/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 person, 119.1ms
video 1/1 (frame 6/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x6

In [None]:
model = YOLO('yolov8m.pt') # re-initialize the model to reset the tracker state before processing the next video
results = model.track(source='inputs/video3.mp4', persist=True, save=True)

video 1/1 (frame 1/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 4 persons, 2 trucks, 147.9ms
video 1/1 (frame 2/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 person, 1 motorcycle, 2 trucks, 138.6ms
video 1/1 (frame 3/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 3 persons, 1 motorcycle, 2 trucks, 136.9ms
video 1/1 (frame 4/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 2 persons, 1 motorcycle, 2 trucks, 136.6ms
video 1/1 (frame 5/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 2 persons, 2 trucks,

We saved the output videos with predictions from the COCO-pretrained YOLOv8m model to the 'outputs/COCO_pretrain/' folder of a shared drive, which can be accessed at this link: https://drive.google.com/drive/folders/1Bd1D3jyy9EzcCr2YJHxMZ4rRgvGimuCh?usp=sharing.

In video1.mp4, notice the many erroneous detections (e.g. cake, boat, knife, train, clock) alongside only intermittent detections of the car, which is our object of interest. The car detection also flickers across frames; this is likely because the car is extremely small in the drone imagery, so the detector fails to pick it up consistently from frame to frame. The same pattern can be observed in the output videos for video2.mp4 and video3.mp4.
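
One way to quantify this flicker is to record, for each frame, whether the target was detected, and then compute the detection rate and the longest run of consecutive missed frames. The sketch below is not part of the original pipeline; `detection_stats` is a hypothetical helper and the input sequence is illustrative, not measured from the videos. Long gaps are exactly what a tracker has to bridge between detections.

```python
def detection_stats(detected):
    """Summarise a per-frame detection sequence.

    detected: iterable of bools, True if the target was detected in that frame.
    Returns (detection_rate, longest_gap, num_gaps), where a gap is a maximal
    run of consecutive frames without a detection.
    """
    detected = list(detected)
    if not detected:
        raise ValueError("need at least one frame")

    longest_gap = current_gap = num_gaps = 0
    for hit in detected:
        if hit:
            current_gap = 0
        else:
            if current_gap == 0:
                num_gaps += 1  # a new run of misses starts here
            current_gap += 1
            longest_gap = max(longest_gap, current_gap)

    rate = sum(detected) / len(detected)
    return rate, longest_gap, num_gaps


if __name__ == "__main__":
    # Illustrative sequence only (not measured from the videos above)
    frames = [False, False, False, False, True, True, False, True, False, False, False]
    rate, longest, n = detection_stats(frames)
    print(f"rate={rate:.2f}, longest gap={longest} frames, gaps={n}")  # rate=0.27, longest gap=4 frames, gaps=3
```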

To improve our object detection model, we now train it further on the VisDrone dataset, starting from the COCO-pretrained weights.


#### Train YOLOv8m on VisDrone dataset

[Bonus Points Question] We now train our YOLOv8m model on the VisDrone dataset, which contains annotated objects from 10 classes, including pedestrian, bicycle, car and truck. We use 6,471 images for training, 548 for validation and 1,610 for testing. Each image contains many annotated objects across these classes (the 548 validation images alone contain 38,759 instances, per the training log below).
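
Ultralytics expects one YOLO-format label file per image (`class x_center y_center width height`, normalised to [0, 1]), whereas VisDrone ships comma-separated annotations. Below is a minimal sketch of the conversion for a single annotation line, assuming the standard VisDrone-DET layout `bbox_left,bbox_top,bbox_width,bbox_height,score,object_category,truncation,occlusion`, where category 0 marks ignored regions, 1 to 10 are the ten object classes and 11 is "others". The function name is hypothetical, and dropping categories 0 and 11 (and zero-score rows) is our assumption, not necessarily what produced the label cache used here.

```python
def visdrone_to_yolo(line, img_w, img_h):
    """Convert one VisDrone-DET annotation line to a YOLO label string.

    Returns None for rows that are skipped (ignored regions, 'others',
    or zero-score rows). Class ids are shifted from 1..10 to 0..9.
    """
    fields = [int(v) for v in line.strip().split(",")[:8]]  # tolerate a trailing comma
    left, top, w, h, score, category = fields[:6]
    if score == 0 or not 1 <= category <= 10:
        return None  # assumption: drop ignored regions (0) and 'others' (11)

    cls = category - 1
    x_c = (left + w / 2) / img_w  # box centre, normalised by image width
    y_c = (top + h / 2) / img_h   # box centre, normalised by image height
    return f"{cls} {x_c:.6f} {y_c:.6f} {w / img_w:.6f} {h / img_h:.6f}"


if __name__ == "__main__":
    # Illustrative annotation: a car (category 4) in a 1000x500 image
    print(visdrone_to_yolo("100,50,200,100,1,4,0,0", 1000, 500))  # 3 0.200000 0.200000 0.200000 0.200000
```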

Training was performed on Colab Pro with a Tesla V100 GPU, using an image size of 640x640 for 100 epochs; the full run took about 8 hours. The dataset is configured through a YAML file, VisDrone.yaml (kept in the parent directory), which specifies the dataset paths, class names and other relevant settings. We train the model with the Ultralytics API.
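
The training cell below passes `batch=-1`, which asks Ultralytics AutoBatch to choose the batch size; its log reports batch-size 23 at about 60% of the V100's memory. The sketch below reproduces that number from the logged profiling table, assuming AutoBatch fits a straight line to GPU memory versus batch size and solves for the batch that fills 60% of free memory. This is our reconstruction of the arithmetic, not the library's source code.

```python
import numpy as np

# Values copied from the AutoBatch log of the training run below (GB).
total, reserved, allocated = 15.77, 0.30, 0.23
free = total - (reserved + allocated)  # 15.24 GB, as logged
batch_sizes = np.array([1, 2, 4, 8, 16])
mem_gb = np.array([0.810, 1.229, 1.829, 3.142, 6.463])

# Assumed procedure: fit memory = slope * batch + intercept ...
slope, intercept = np.polyfit(batch_sizes, mem_gb, deg=1)

# ... then take the largest batch whose predicted memory stays within 60% of free memory
target_fraction = 0.60
batch = int((free * target_fraction - intercept) / slope)

predicted = np.polyval([slope, intercept], batch) + reserved + allocated
print(batch, f"{predicted:.2f}G/{total}G ({predicted / total:.0%})")  # prints: 23 9.50G/15.77G (60%)
```

The fitted slope, roughly 0.37 GB per 640x640 image, is what limits the batch to 23 on a 16 GB card.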

The training results across the 100 epochs appear in the outputs of the cells below.
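
Each validation row in the log reports precision (P), recall (R), mAP50 and mAP50-95. A prediction counts as a true positive when its IoU with an unmatched ground-truth box of the same class reaches a threshold: mAP50 uses a single threshold of 0.5, while mAP50-95 averages AP over the ten thresholds 0.50, 0.55, ..., 0.95, which is why it is consistently lower than mAP50 in the log. A minimal sketch of the IoU side of this (helper names are hypothetical; boxes are `(x1, y1, x2, y2)` in pixels; class matching and confidence ranking are omitted):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """IoU thresholds at which `pred` would match `gt` (ignoring class and score)."""
    overlap = iou(pred, gt)
    return [t for t in THRESHOLDS if overlap >= t]


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (0, 0, 10, 6.8)  # slightly too short: IoU = 0.68
    print(round(iou(pred, gt), 2), thresholds_passed(pred, gt))  # 0.68 [0.5, 0.55, 0.6, 0.65]
```

For a small drone-view car, a few pixels of localisation error can push IoU below the stricter thresholds, so mAP50-95 penalises small objects more heavily than mAP50.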


In [None]:
# Re-load the COCO-pretrained YOLOv8m model
model = YOLO('yolov8m.pt')

# Display model information
model.info()

Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8m.pt to 'yolov8m.pt'...


100%|██████████| 49.7M/49.7M [00:00<00:00, 293MB/s]


YOLOv8m summary: 295 layers, 25902640 parameters, 0 gradients, 79.3 GFLOPs


(295, 25902640, 0, 79.3204224)

In [None]:
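# Train the model on the VisDrone dataset for 100 epochs
# batch=-1 lets Ultralytics AutoBatch pick the batch size from the available GPU memory
# outputs are saved under <project>/<name>, here '/content/drive/My Drive/Drone/train'
results = model.train(data='VisDrone.yaml', epochs=100, imgsz=640,
                      device=0, verbose=True, plots=True, batch=-1,
                      project='/content/drive/My Drive/Drone')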


Ultralytics YOLOv8.1.42 🚀 Python-3.10.12 torch-2.2.1+cu121 CUDA:0 (Tesla V100-SXM2-16GB, 16151MiB)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=yolov8m.pt, data=VisDrone.yaml, epochs=100, time=None, patience=100, batch=-1, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=/content/drive/My Drive/Drone, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show

100%|██████████| 6.23M/6.23M [00:00<00:00, 192MB/s]


[34m[1mAMP: [0mchecks passed ✅
[34m[1mAutoBatch: [0mComputing optimal batch size for imgsz=640
[34m[1mAutoBatch: [0mCUDA:0 (Tesla V100-SXM2-16GB) 15.77G total, 0.30G reserved, 0.23G allocated, 15.24G free
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
    25862110       79.09         0.810         33.79         120.1        (1, 3, 640, 640)                    list
    25862110       158.2         1.229         28.06         61.66        (2, 3, 640, 640)                    list
    25862110       316.4         1.829         29.77         61.94        (4, 3, 640, 640)                    list
    25862110       632.8         3.142         33.75         74.93        (8, 3, 640, 640)                    list
    25862110        1266         6.463         55.22         89.94       (16, 3, 640, 640)                    list
[34m[1mAutoBatch: [0mUsing batch-size 23 for CUDA:0 9.50G/15.77G (60%) ✅


[34m[1mtrain: [0mScanning /content/datasets/VisDrone/VisDrone2019-DET-train/labels.cache... 6471 images, 0 backgrounds, 0 corrupt: 100%|██████████| 6471/6471 [00:00<?, ?it/s]






[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))


[34m[1mval: [0mScanning /content/datasets/VisDrone/VisDrone2019-DET-val/labels.cache... 548 images, 0 backgrounds, 0 corrupt: 100%|██████████| 548/548 [00:00<?, ?it/s]


Plotting labels to /content/drive/My Drive/Drone/train/labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m SGD(lr=0.01, momentum=0.9) with parameter groups 77 weight(decay=0.0), 84 weight(decay=0.0005390625), 83 bias(decay=0.0)
[34m[1mTensorBoard: [0mmodel graph visualization added ✅
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to [1m/content/drive/My Drive/Drone/train[0m
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      1/100      15.6G      1.455      1.565     0.9884        445        640: 100%|██████████| 282/282 [05:44<00:00,  1.22s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:13<00:00,  1.12s/it]


                   all        548      38759      0.402      0.294      0.287      0.169

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      2/100      12.7G      1.408      1.117     0.9451        763        640: 100%|██████████| 282/282 [05:21<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.409      0.307      0.295      0.171

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      3/100      15.2G      1.447       1.12     0.9517        453        640: 100%|██████████| 282/282 [05:17<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


                   all        548      38759      0.406      0.315      0.304      0.178

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      4/100      15.7G      1.448      1.089     0.9525        705        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:14<00:00,  1.18s/it]


                   all        548      38759      0.384      0.298       0.29      0.166

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      5/100      14.3G      1.412       1.05      0.946        493        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


                   all        548      38759      0.446      0.317       0.32      0.185

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      6/100      13.9G      1.391      1.028     0.9397        639        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759       0.44      0.339      0.335      0.196

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      7/100      15.2G      1.373      1.005     0.9352        530        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.428       0.34      0.333      0.197

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      8/100      12.8G       1.35      0.982     0.9308        232        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]

                   all        548      38759       0.45      0.344      0.346      0.204

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      9/100      12.1G      1.335     0.9609     0.9262        719        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]


                   all        548      38759       0.48      0.348      0.361      0.212

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     10/100      13.8G      1.322     0.9441     0.9241        879        640: 100%|██████████| 282/282 [05:09<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759       0.47      0.348       0.36      0.214

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     11/100      13.5G       1.31     0.9346     0.9207        575        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.11it/s]


                   all        548      38759      0.488      0.354      0.364      0.215

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     12/100      13.6G      1.303     0.9202       0.92        412        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.00it/s]


                   all        548      38759      0.475      0.364      0.369      0.219

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     13/100      14.8G      1.295     0.9103     0.9173        875        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.485      0.368      0.377      0.226

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     14/100      12.6G      1.284      0.896     0.9166        618        640: 100%|██████████| 282/282 [05:07<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.502      0.369      0.385       0.23

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     15/100      13.1G      1.284     0.8943     0.9135        586        640: 100%|██████████| 282/282 [05:23<00:00,  1.15s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.06it/s]


                   all        548      38759      0.499      0.379       0.39      0.233

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     16/100      13.5G      1.267     0.8809     0.9127        692        640: 100%|██████████| 282/282 [05:20<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.07it/s]


                   all        548      38759      0.486      0.374      0.383      0.229

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     17/100      13.5G      1.259      0.875     0.9107        579        640: 100%|██████████| 282/282 [05:12<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.506      0.383      0.394      0.235

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     18/100      12.3G      1.258     0.8654     0.9079        890        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.503      0.393      0.399      0.239

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     19/100      14.7G      1.248     0.8562     0.9082        723        640: 100%|██████████| 282/282 [05:21<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


                   all        548      38759      0.511      0.385      0.398      0.239

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     20/100      14.7G      1.242     0.8448      0.907        781        640: 100%|██████████| 282/282 [05:25<00:00,  1.15s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.11it/s]


                   all        548      38759      0.513      0.394      0.402      0.241

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     21/100      14.9G      1.242     0.8481     0.9055        756        640: 100%|██████████| 282/282 [05:21<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.09it/s]


                   all        548      38759      0.524      0.389      0.404      0.244

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     22/100      13.5G      1.234     0.8353     0.9028        615        640: 100%|██████████| 282/282 [05:19<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.06it/s]


                   all        548      38759      0.519      0.388      0.401      0.242

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     23/100      15.5G      1.231     0.8308     0.9018       1023        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]

                   all        548      38759      0.503      0.387      0.402      0.242

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     24/100      14.6G      1.217     0.8151     0.9011        438        640: 100%|██████████| 282/282 [05:23<00:00,  1.15s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.519      0.383        0.4       0.24

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     25/100      16.1G      1.221     0.8184     0.8987        438        640: 100%|██████████| 282/282 [05:24<00:00,  1.15s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.528      0.401      0.414       0.25

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     26/100        13G      1.211     0.8102        0.9        570        640: 100%|██████████| 282/282 [05:18<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.06s/it]


                   all        548      38759      0.523      0.403      0.417      0.253

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     27/100        13G      1.215     0.8096     0.8996        608        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.12it/s]


                   all        548      38759      0.514      0.409       0.42      0.254

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     28/100        16G      1.207     0.8044     0.8977       1038        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.10it/s]


                   all        548      38759      0.523      0.406      0.418      0.251

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     29/100      16.4G      1.201      0.798     0.8971        823        640: 100%|██████████| 282/282 [05:12<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.528      0.392      0.412      0.248

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     30/100      14.1G      1.198     0.7891     0.8942        556        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.527      0.404       0.42      0.254

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     31/100      13.2G      1.193     0.7838     0.8945        696        640: 100%|██████████| 282/282 [05:06<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.529      0.406      0.417      0.251

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     32/100      15.3G      1.186     0.7799     0.8945        787        640: 100%|██████████| 282/282 [05:04<00:00,  1.08s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]


                   all        548      38759      0.538      0.403      0.425      0.257

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     33/100      15.7G      1.192      0.782     0.8947        504        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.17it/s]


                   all        548      38759      0.545      0.397      0.421      0.254

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     34/100      15.6G      1.186     0.7688     0.8928        573        640: 100%|██████████| 282/282 [05:06<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.525      0.405      0.422      0.256

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     35/100      16.1G      1.183     0.7719     0.8918        517        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.14it/s]

                   all        548      38759      0.532      0.404      0.421      0.254

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     36/100      13.7G      1.177     0.7617     0.8912        949        640: 100%|██████████| 282/282 [05:09<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.00it/s]


                   all        548      38759      0.538      0.406      0.425      0.257

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     37/100      12.7G      1.173     0.7588      0.889        664        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.528      0.413      0.426      0.258

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     38/100      15.1G      1.162     0.7496      0.887        758        640: 100%|██████████| 282/282 [05:12<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.02s/it]


                   all        548      38759       0.54      0.405      0.426      0.259

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     39/100      16.1G      1.162     0.7481     0.8872        663        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.533      0.414      0.429      0.259

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     40/100      14.7G      1.168     0.7473     0.8861        863        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:13<00:00,  1.15s/it]


                   all        548      38759      0.535      0.409      0.427      0.258

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     41/100      14.7G      1.162     0.7382     0.8859        651        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]

                   all        548      38759      0.529      0.417      0.431      0.259

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     42/100      12.3G      1.153     0.7337     0.8843        519        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]

                   all        548      38759      0.537      0.407      0.426      0.259

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     43/100      15.3G      1.157     0.7349     0.8856        477        640: 100%|██████████| 282/282 [05:14<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.541      0.412       0.43      0.261

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     44/100      13.5G      1.155     0.7282     0.8842        748        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759      0.541      0.413      0.431       0.26

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     45/100      14.3G      1.151     0.7265     0.8848        678        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.537      0.415      0.434      0.263

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     46/100      14.4G      1.143     0.7157     0.8826        672        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.538      0.417      0.434      0.264

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     47/100        16G      1.142     0.7146     0.8816        560        640: 100%|██████████| 282/282 [05:17<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


                   all        548      38759      0.535      0.414       0.43      0.259

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     48/100      16.3G      1.138      0.712     0.8817        682        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.18it/s]


                   all        548      38759      0.549      0.416      0.435      0.263

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     49/100      14.3G      1.135     0.7082     0.8804        516        640: 100%|██████████| 282/282 [05:12<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:13<00:00,  1.14s/it]


                   all        548      38759       0.54      0.419      0.434      0.263

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     50/100      15.6G      1.142     0.7101     0.8793        633        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.05it/s]


                   all        548      38759      0.552      0.415       0.44      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     51/100      13.7G      1.129     0.6968     0.8776        566        640: 100%|██████████| 282/282 [05:11<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759      0.539      0.428      0.441      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     52/100      16.1G       1.13     0.6968     0.8798        630        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759      0.553      0.419      0.439      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     53/100      14.3G      1.124     0.6921     0.8779        702        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.543      0.417      0.436      0.264

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     54/100      11.3G      1.126      0.692     0.8772        522        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759      0.546      0.427      0.441       0.27

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     55/100      13.6G      1.117     0.6849     0.8761        776        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.548      0.428      0.442      0.269

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     56/100      13.3G      1.121      0.683     0.8761        813        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.05s/it]


                   all        548      38759      0.545      0.426       0.44      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     57/100      12.3G      1.114     0.6772     0.8754        597        640: 100%|██████████| 282/282 [05:09<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.536      0.426      0.437      0.265

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     58/100      12.9G      1.104     0.6702     0.8735        598        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.18it/s]


                   all        548      38759      0.543       0.42      0.437      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     59/100      13.9G      1.101     0.6625     0.8746        900        640: 100%|██████████| 282/282 [05:08<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]

                   all        548      38759      0.546      0.426       0.44      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     60/100      15.4G      1.098      0.661     0.8731        345        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.12it/s]


                   all        548      38759      0.544      0.422      0.438      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     61/100      15.4G      1.098      0.658      0.872        468        640: 100%|██████████| 282/282 [05:08<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.542      0.423      0.438      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     62/100      12.5G      1.104     0.6617     0.8713        690        640: 100%|██████████| 282/282 [05:09<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]

                   all        548      38759      0.548      0.427      0.439      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     63/100        14G      1.097      0.656     0.8697        737        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759      0.549      0.427      0.442      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     64/100      13.7G      1.098     0.6561     0.8704        672        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.13it/s]


                   all        548      38759      0.541       0.42      0.435      0.265

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     65/100      15.9G      1.092     0.6499     0.8723        782        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.539      0.422      0.439      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     66/100      15.9G      1.092     0.6477     0.8699        522        640: 100%|██████████| 282/282 [05:20<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.02s/it]

                   all        548      38759      0.548      0.425       0.44      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     67/100      13.3G      1.082     0.6419      0.868        669        640: 100%|██████████| 282/282 [05:20<00:00,  1.14s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.03s/it]


                   all        548      38759      0.546      0.424       0.44      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     68/100      12.9G      1.088     0.6416     0.8684        465        640: 100%|██████████| 282/282 [05:14<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]


                   all        548      38759      0.557      0.418      0.439      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     69/100      15.6G      1.071     0.6315     0.8667        768        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759      0.551      0.419      0.441      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     70/100      12.8G      1.076     0.6318     0.8673        279        640: 100%|██████████| 282/282 [05:07<00:00,  1.09s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.02it/s]


                   all        548      38759       0.55      0.423      0.439      0.265

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     71/100      13.9G      1.075     0.6301     0.8671        537        640: 100%|██████████| 282/282 [05:08<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.16it/s]


                   all        548      38759      0.554      0.424       0.44      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     72/100      11.8G      1.073     0.6263     0.8658        578        640: 100%|██████████| 282/282 [05:05<00:00,  1.08s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.03s/it]

                   all        548      38759      0.546      0.426      0.441      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     73/100      12.8G      1.065     0.6196     0.8642        523        640: 100%|██████████| 282/282 [05:04<00:00,  1.08s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.06it/s]


                   all        548      38759      0.542      0.431      0.446      0.271

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     74/100      14.4G      1.064     0.6165     0.8639        498        640: 100%|██████████| 282/282 [05:19<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.545      0.426      0.445      0.271

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     75/100      15.3G       1.06     0.6141     0.8642       1043        640: 100%|██████████| 282/282 [05:18<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.12it/s]


                   all        548      38759       0.55       0.43      0.443      0.269

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     76/100      16.4G      1.056     0.6113     0.8626        645        640: 100%|██████████| 282/282 [05:16<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:14<00:00,  1.19s/it]


                   all        548      38759      0.545      0.425       0.44      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     77/100      13.6G      1.059     0.6105     0.8641        776        640: 100%|██████████| 282/282 [05:17<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.00s/it]


                   all        548      38759      0.552      0.419      0.438      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     78/100      13.6G      1.055     0.6076     0.8638        766        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.12it/s]


                   all        548      38759      0.565      0.413      0.435      0.264

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     79/100      13.8G      1.045      0.598     0.8616        388        640: 100%|██████████| 282/282 [05:13<00:00,  1.11s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.551      0.421      0.437      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     80/100      14.8G      1.053     0.6051     0.8621        838        640: 100%|██████████| 282/282 [05:31<00:00,  1.17s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.01s/it]


                   all        548      38759      0.562      0.421      0.439      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     81/100      16.2G      1.052     0.6028     0.8618        873        640: 100%|██████████| 282/282 [05:28<00:00,  1.16s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.00s/it]


                   all        548      38759      0.549      0.422      0.438      0.265

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     82/100      15.8G       1.04      0.593     0.8588        857        640: 100%|██████████| 282/282 [05:28<00:00,  1.17s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.08it/s]


                   all        548      38759      0.557      0.423      0.439      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     83/100      16.4G      1.037     0.5893       0.86        841        640: 100%|██████████| 282/282 [05:31<00:00,  1.17s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.04s/it]


                   all        548      38759      0.562      0.417       0.44      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     84/100      16.3G      1.038     0.5888     0.8579        599        640: 100%|██████████| 282/282 [05:36<00:00,  1.19s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759       0.55      0.426      0.441      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     85/100      14.1G       1.04     0.5891     0.8585        579        640: 100%|██████████| 282/282 [05:22<00:00,  1.15s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.09it/s]


                   all        548      38759      0.561      0.417      0.439      0.266

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     86/100      13.6G      1.033      0.583     0.8588        719        640: 100%|██████████| 282/282 [05:14<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.03s/it]


                   all        548      38759      0.554      0.421       0.44      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     87/100      16.3G      1.031     0.5808     0.8588        546        640: 100%|██████████| 282/282 [05:10<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.11it/s]


                   all        548      38759      0.542      0.428      0.441      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     88/100      13.1G      1.031     0.5777     0.8573        357        640: 100%|██████████| 282/282 [05:11<00:00,  1.10s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.05s/it]


                   all        548      38759      0.556      0.422       0.44      0.267

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     89/100        12G      1.027     0.5771     0.8557        779        640: 100%|██████████| 282/282 [05:15<00:00,  1.12s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.01it/s]


                   all        548      38759       0.55      0.426      0.441      0.268

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     90/100      13.6G      1.019      0.571     0.8556        753        640: 100%|██████████| 282/282 [05:17<00:00,  1.13s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.559      0.422       0.44      0.267
Closing dataloader mosaic
[34m[1malbumentations: [0mBlur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     91/100      12.3G      1.018     0.5576     0.8587        501        640: 100%|██████████| 282/282 [04:58<00:00,  1.06s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.06it/s]


                   all        548      38759      0.553      0.418      0.436      0.264

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     92/100      13.5G      1.005     0.5459     0.8554        315        640: 100%|██████████| 282/282 [04:39<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.03it/s]


                   all        548      38759      0.549       0.42      0.435      0.263

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     93/100      13.2G     0.9957     0.5412     0.8529        179        640: 100%|██████████| 282/282 [04:39<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.08it/s]


                   all        548      38759      0.545      0.422      0.435      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     94/100        13G     0.9934     0.5371     0.8525        327        640: 100%|██████████| 282/282 [04:40<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.20it/s]


                   all        548      38759      0.543      0.423      0.434      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     95/100      13.1G     0.9887     0.5347     0.8529        384        640: 100%|██████████| 282/282 [04:36<00:00,  1.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.12it/s]


                   all        548      38759      0.543       0.42      0.433      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     96/100      13.8G     0.9864     0.5304     0.8521        219        640: 100%|██████████| 282/282 [04:36<00:00,  1.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.05it/s]


                   all        548      38759      0.548      0.418      0.434      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     97/100      12.9G     0.9805     0.5261     0.8519        308        640: 100%|██████████| 282/282 [04:39<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:11<00:00,  1.04it/s]


                   all        548      38759      0.546       0.42      0.433      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     98/100      14.1G     0.9829     0.5273       0.85        445        640: 100%|██████████| 282/282 [04:40<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.09it/s]

                   all        548      38759      0.544      0.419      0.433      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


     99/100      13.4G      0.978     0.5232     0.8503        250        640: 100%|██████████| 282/282 [04:39<00:00,  1.01it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:10<00:00,  1.14it/s]


                   all        548      38759       0.55      0.419      0.433      0.262

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


    100/100      13.1G     0.9777     0.5221     0.8514        333        640: 100%|██████████| 282/282 [04:37<00:00,  1.02it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:12<00:00,  1.05s/it]


                   all        548      38759      0.552       0.42      0.434      0.262

100 epochs completed in 9.061 hours.
Optimizer stripped from /content/drive/My Drive/Drone/train/weights/last.pt, 52.0MB
Optimizer stripped from /content/drive/My Drive/Drone/train/weights/best.pt, 52.0MB

Validating /content/drive/My Drive/Drone/train/weights/best.pt...
Ultralytics YOLOv8.1.42 🚀 Python-3.10.12 torch-2.2.1+cu121 CUDA:0 (Tesla V100-SXM2-16GB, 16151MiB)
Model summary (fused): 218 layers, 25845550 parameters, 0 gradients, 78.7 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 12/12 [00:53<00:00,  4.44s/it]


                   all        548      38759      0.543      0.427      0.445      0.271
            pedestrian        548       8844      0.579      0.446      0.485       0.23
                people        548       5125      0.627      0.327      0.384      0.154
               bicycle        548       1287      0.335      0.209      0.184     0.0834
                   car        548      14064      0.771      0.785      0.818      0.595
                   van        548       1975      0.526       0.48      0.474      0.339
                 truck        548        750      0.517      0.393      0.417      0.286
              tricycle        548       1045      0.445      0.352      0.348      0.197
       awning-tricycle        548        532      0.307      0.188       0.18      0.113
                   bus        548        251      0.761      0.594      0.654       0.48
                 motor        548       4886      0.565      0.493      0.503      0.235
Speed: 0.2ms preproce
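
The `all` row of the validation table appears to be an unweighted (macro) mean over the ten classes rather than an average weighted by instance count. The sketch below checks this arithmetic using only the per-class numbers printed above. The instance-weighted average is included purely for comparison and is not a metric Ultralytics reports. The names `PER_CLASS`, `macro_average` and `instance_weighted_average` are hypothetical helpers.

```python
# Per-class validation metrics copied from the best.pt validation table above:
# class name -> (instances, mAP50, mAP50-95)
PER_CLASS = {
    "pedestrian": (8844, 0.485, 0.23),
    "people": (5125, 0.384, 0.154),
    "bicycle": (1287, 0.184, 0.0834),
    "car": (14064, 0.818, 0.595),
    "van": (1975, 0.474, 0.339),
    "truck": (750, 0.417, 0.286),
    "tricycle": (1045, 0.348, 0.197),
    "awning-tricycle": (532, 0.18, 0.113),
    "bus": (251, 0.654, 0.48),
    "motor": (4886, 0.503, 0.235),
}


def macro_average(table, index):
    """Unweighted mean over classes of the metric stored at position `index`."""
    values = [row[index] for row in table.values()]
    return sum(values) / len(values)


def instance_weighted_average(table, index):
    """Mean over classes weighted by each class's ground-truth instance count."""
    total = sum(row[0] for row in table.values())
    return sum(row[0] * row[index] for row in table.values()) / total


if __name__ == "__main__":
    print(f"macro mAP50              = {macro_average(PER_CLASS, 1):.3f}")
    print(f"macro mAP50-95           = {macro_average(PER_CLASS, 2):.3f}")
    print(f"instance-weighted mAP50  = {instance_weighted_average(PER_CLASS, 1):.3f}")
```

The macro means come out at about 0.445 and 0.271, matching the `all` row. The instance-weighted mAP50 is noticeably higher because `car`, the best-detected class, accounts for over a third of all instances. Weak classes such as `bicycle` and `awning-tricycle` therefore pull the reported overall mAP down more than their share of instances would suggest.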

The 'VisDrone_train_plots' folder in the shared drive contains the training results, such as the confusion matrix and the loss and accuracy plots. The results.csv file in this folder records the training metrics for every epoch, such as box loss, classification loss, precision, recall and mAP, for the training and validation sets.
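
The `best.pt` weights are the checkpoint with the highest "fitness" score over training. The sketch below recovers that choice from the validation columns of results.csv. It assumes Ultralytics' default detection fitness of 0.1 × mAP50 + 0.9 × mAP50-95, the YOLOv8 column names `metrics/mAP50(B)` and `metrics/mAP50-95(B)`, and space-padded header names. Verify all three against your installed version. The functions `fitness`, `load_val_metrics` and `best_epoch` are hypothetical helpers and use only the standard library.

```python
import csv


def fitness(map50, map50_95):
    """Assumed Ultralytics default detection fitness: weights mAP50-95 heavily."""
    return 0.1 * map50 + 0.9 * map50_95


def load_val_metrics(path):
    """Read (epoch, mAP50, mAP50-95) tuples from an Ultralytics results.csv."""
    rows = []
    with open(path, newline="") as f:
        for raw in csv.DictReader(f):
            # YOLOv8 pads header names and values with spaces, so strip them;
            # skip any overflow fields (key None) from malformed rows.
            row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
            rows.append((
                int(float(row["epoch"])),
                float(row["metrics/mAP50(B)"]),
                float(row["metrics/mAP50-95(B)"]),
            ))
    return rows


def best_epoch(rows):
    """Return the (epoch, mAP50, mAP50-95) tuple with the highest fitness."""
    return max(rows, key=lambda r: fitness(r[1], r[2]))
```

Usage would look like `best_epoch(load_val_metrics('VisDrone_train_plots/results.csv'))`. Consider only the epochs printed in the log above, for example 54 (0.441, 0.27), 73 (0.446, 0.271), 74 (0.445, 0.271) and 100 (0.434, 0.262). Among these, the rule picks epoch 73, whose metrics are close to the final `best.pt` validation row. This illustrates why `best.pt` and `last.pt` can differ.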

Next, we use the YOLOv8m model, pre-trained on COCO and fine-tuned on the VisDrone dataset, to run predictions on our input videos.

In [None]:
# Load the model trained on VisDrone:
from ultralytics import YOLO

model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt')  # load the custom-trained weights

# Display model information
model.info()

Model summary: 295 layers, 25862110 parameters, 0 gradients, 79.1 GFLOPs


(295, 25862110, 0, 79.0946304)

#### Save the video files with predictions using YOLOv8m (medium), pre-trained on COCO and trained on VisDrone for 100 epochs

In [None]:
model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt') # re-initialize the model to reset tracking IDs when handling multiple videos sequentially
results = model.track(source='inputs/video1.mp4', persist=True, save=True)
# note: passing stream=True would return a generator and avoid accumulating all inference results in RAM



errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 134.3ms
video 1/1 (frame 2/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 137.5ms
video 1/1 (frame 3/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 135.5ms
video 1/1 (frame 4/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 1 motor, 113.9ms
video 1/1 (frame 5/1347) /Users/rishienandhan/Drone/inputs/video1.mp4: 352x640 (no detections), 123.3ms
video 1/1 (frame 6/1347) /Users/rishienandhan/Drone/inputs/v
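
The warning in this output appears because `track()` is called without `stream=True`, so every `Results` object for the whole video is kept in memory. The sketch below shows a streaming alternative. It assumes the Ultralytics `Results.boxes.id` attribute (which is `None` on frames without track IDs) and `Results.boxes.xywh` (centre x, centre y, width, height in pixels). It also collects the box centre of every track ID per frame, which are the kind of 2D position measurements a Kalman filter can consume. `collect_track_centroids` and `track_video` are hypothetical helpers, not part of the library.

```python
from collections import defaultdict


def collect_track_centroids(results):
    """Map track ID -> list of (frame_index, cx, cy) from an iterable of Results.

    frame_index is 0-based and counts every frame yielded by `results`.
    """
    tracks = defaultdict(list)
    for frame_idx, r in enumerate(results):
        boxes = r.boxes
        # boxes.id is None on frames where the tracker assigned no IDs
        if boxes is None or boxes.id is None:
            continue
        for tid, (cx, cy, _w, _h) in zip(boxes.id.tolist(), boxes.xywh.tolist()):
            tracks[int(tid)].append((frame_idx, cx, cy))
    return dict(tracks)


def track_video(weights, source):
    """Track objects frame by frame without keeping every Results object in RAM."""
    from ultralytics import YOLO  # lazy import: the helper above needs no dependency

    model = YOLO(weights)  # a fresh model starts with a fresh tracker and new IDs
    # stream=True makes track() return a generator, consumed one frame at a time
    results = model.track(source=source, persist=True, save=True, stream=True)
    return collect_track_centroids(results)
```

Because the generator is consumed inside `collect_track_centroids`, only the small per-track centroid lists stay in memory. The annotated video is still written because `save=True` is kept.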

In [None]:
model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt') # re-initialize the model to reset tracking IDs when handling multiple videos sequentially
results = model.track(source='inputs/video2.mp4', persist=True, save=True)




errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 pedestrian, 1 motor, 121.3ms
video 1/1 (frame 2/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 pedestrian, 1 motor, 114.4ms
video 1/1 (frame 3/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 pedestrian, 111.4ms
video 1/1 (frame 4/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 pedestrian, 116.7ms
video 1/1 (frame 5/490) /Users/rishienandhan/Drone/inputs/video2.mp4: 352x640 1 pedestrian, 115.6ms
video 1/1 (frame 6/490) /Users/rishienandhan/Drone/inp

In [None]:
model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt') # re-initialize the model to reset tracking IDs when handling multiple videos sequentially
results = model.track(source='/Users/rishienandhan/Drone/inputs/video3.mp4', persist=True, save=True)




errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 pedestrian, 135.5ms
video 1/1 (frame 2/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 pedestrian, 132.4ms
video 1/1 (frame 3/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 pedestrian, 130.2ms
video 1/1 (frame 4/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 pedestrian, 125.8ms
video 1/1 (frame 5/2250) /Users/rishienandhan/Drone/inputs/video3.mp4: 384x640 1 pedestrian, 130.3ms
video 1/1 (frame 6/2250) /Users/rishienandhan/Drone/inputs/video3.m

We save the videos annotated with predictions from YOLOv8m (medium size) trained on VisDrone to the 'outputs/VisDrone_train_results/' folder in the shared drive, which can be accessed using this link: https://drive.google.com/drive/folders/1Bd1D3jyy9EzcCr2YJHxMZ4rRgvGimuCh?usp=sharing.

With the newly trained model, we get fewer erroneous detections and, more importantly, pick up the important objects (cars and people) more consistently across video frames. In video1.mp4, for example, the car is detected more stably and is picked up consistently across frames, which should help with tracking in the next step.

The same pattern is observed in the other output videos. Training on the VisDrone dataset therefore yields more stable and continuous detections, which in turn supports more reliable object tracking.


## Task 3: Kalman Filters

The following function creates and initializes a Kalman Filter for tracking an object's centroid in 2D. The state vector is [x, y, vx, vy] (position in pixels and velocity in pixels per frame), the measurement is the observed centroid [x, y], and motion between frames is modeled as constant velocity.

We then integrate this Kalman Filter with the object detections obtained in Task 2 to track the pedestrians and vehicles in the videos. The filtered position estimates smooth out noisy detections and are used to plot each object's trajectory. We focus on detecting and tracking cars and people across video frames.
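For reference, the matrices set in `create_kalman_filter` below describe a constant-velocity model with a time step of one frame. Writing them out together with the standard predict and update equations makes explicit what filterpy's `predict()` and `update()` compute each frame. This is the textbook form; filterpy computes the covariance update in the equivalent, numerically more stable Joseph form.

```latex
\begin{aligned}
&\mathbf{x}_t = \begin{bmatrix} x_t \\ y_t \\ \dot{x}_t \\ \dot{y}_t \end{bmatrix},\quad
\mathbf{z}_t = \begin{bmatrix} x_t \\ y_t \end{bmatrix},\quad
F = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},\quad
H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \\
&Q = 0.01\,I_4,\quad R = 0.1\,I_2,\quad P_0 = 100\,I_4 \\[4pt]
&\text{Predict:}\quad \hat{\mathbf{x}}_{t|t-1} = F\,\hat{\mathbf{x}}_{t-1|t-1},\qquad
P_{t|t-1} = F\,P_{t-1|t-1}\,F^\top + Q \\
&\text{Update:}\quad \mathbf{y}_t = \mathbf{z}_t - H\,\hat{\mathbf{x}}_{t|t-1},\qquad
S_t = H\,P_{t|t-1}\,H^\top + R,\qquad
K_t = P_{t|t-1}\,H^\top S_t^{-1} \\
&\phantom{\text{Update:}}\quad \hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + K_t\,\mathbf{y}_t,\qquad
P_{t|t} = (I - K_t H)\,P_{t|t-1}
\end{aligned}
```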

In [None]:
# !conda install -c conda-forge filterpy -y

In [None]:
import cv2
from filterpy.kalman import KalmanFilter
import numpy as np

# Define Kalman Filter model for 2D object tracking
# State: [x, y, vx, vy] (centroid position and velocity); measurement: [x, y]
def create_kalman_filter(dim_x, dim_z):
    kf = KalmanFilter(dim_x=dim_x, dim_z=dim_z)

    # State transition matrix: constant-velocity motion model with dt = 1 frame
    kf.F = np.array([[1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0],
                     [0, 0, 0, 1]], dtype=float)

    # Measurement function: only the centroid position is observed
    kf.H = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0]], dtype=float)

    # Process noise covariance (filterpy initializes Q as the identity)
    kf.Q *= 0.01

    # Measurement noise covariance (filterpy initializes R as the identity)
    kf.R *= 0.1

    # Initial state as floats, so assigned pixel coordinates are not truncated,
    # and a large initial covariance to reflect high initial uncertainty
    kf.x = np.zeros((dim_x, 1))
    kf.P = np.eye(dim_x) * 100

    return kf

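To make the filterpy calls concrete, here is a minimal NumPy-only sketch of the predict/update cycle using the same F, H, Q, R and initial P as `create_kalman_filter`. It is an illustration under those assumptions, not filterpy's implementation; the helper names `kf_predict` and `kf_update` are hypothetical, and the measurements are synthetic.

```python
import numpy as np

# Same constant-velocity model as create_kalman_filter (dt = 1 frame)
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01
R = np.eye(2) * 0.1


def kf_predict(x, P):
    """Propagate the state and covariance one frame ahead (hypothetical helper)."""
    x = F @ x
    P = F @ P @ F.T + Q
    return x, P


def kf_update(x, P, z):
    """Correct the predicted state with a centroid measurement z of shape (2, 1)."""
    y = z - H @ x                       # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P


# Synthetic centroid moving +2 px/frame in x and +1 px/frame in y
x = np.array([[100.0], [50.0], [0.0], [0.0]])  # first detection, zero initial velocity
P = np.eye(4) * 100
for t in range(1, 21):
    z = np.array([[100.0 + 2 * t], [50.0 + t]])
    x, P = kf_predict(x, P)
    x, P = kf_update(x, P, z)

print(np.round(x.ravel(), 2))  # estimated [x, y, vx, vy] after 20 frames
```

Although the filter starts with zero velocity, the large initial covariance lets the first few detections pull the velocity estimate toward the true motion within a couple of frames.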
In the following cell, we integrate the Kalman Filter into our object detection pipeline to estimate the state of each tracked object in every frame.
The steps are:

- Initialize the YOLO model with pre-trained weights from VisDrone dataset training.
- Create dictionaries to store Kalman filters (`kalman_filters`) for each object ID and past trajectories (`trajectories`) for each object ID.
- Define the number of past frames to keep track of (`num_past_frames`).
- Enter a loop to process each frame of the video until the end of the video is reached or the user presses 'q' to exit.
- Perform object detection and tracking on the frame using the YOLO model.
- Extract bounding boxes, object IDs, and predicted classes from the detection results.
- Iterate over each detected object in the frame: run its Kalman filter's predict step, then correct the prediction with the detected centroid (update step) to obtain the filtered state estimate.
- Store the estimated position in the object's list of past estimates so its trajectory can be drawn.
- Visualize the object trajectory and bounding box on the frame by drawing lines and rectangles.
- Annotate the frame with the object class and ID using `cv2.putText`.
- Terminate the loop and release the video capture object when the 'q' key is pressed.

The inline comments in the code provide additional details.
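The trajectory bookkeeping in the steps above (keep the last `num_past_frames` estimates per object ID, then connect them) can be sketched in plain Python. This variant swaps the list-plus-`pop(0)` pattern for `collections.deque(maxlen=...)`, which discards the oldest point automatically, and pairs up consecutive points so each segment joins two different estimates. The names `record_position` and `trail_segments` are hypothetical; in the real loop each segment would be passed to `cv2.line`.

```python
from collections import defaultdict, deque

num_past_frames = 10

# One bounded history per object ID; appending beyond maxlen drops the oldest point
trajectories = defaultdict(lambda: deque(maxlen=num_past_frames))


def record_position(trajectories, object_id, x, y):
    """Append an estimated centroid, rounded to integer pixels, to the object's history."""
    trajectories[int(object_id)].append((int(round(x)), int(round(y))))


def trail_segments(points):
    """Return (start, end) pairs between consecutive points, oldest first."""
    pts = list(points)
    return list(zip(pts[:-1], pts[1:]))


# Simulate 15 frames for object 7 moving right by 3 px per frame
for t in range(15):
    record_position(trajectories, 7, 100 + 3 * t, 200)

segments = trail_segments(trajectories[7])
print(len(trajectories[7]), len(segments))  # 10 9
# In the OpenCV loop each pair would be drawn with, e.g.:
# cv2.line(frame, start, end, (255, 0, 0), 2)
```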

In [None]:

# Load the model trained on VisDrone
from ultralytics import YOLO
model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt')

# Open the video file
video_path = "inputs/video3.mp4"
cap = cv2.VideoCapture(video_path)

# Initialize dictionary to store Kalman Filters for each object ID
kalman_filters = {}

# Initialize dictionary to store past trajectories for each object ID
trajectories = {}

# Define the number of past frames to keep track of
num_past_frames = 10

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break

    # Perform object detection using YOLOv8 pretrained on COCO and trained on VisDrone
    results = model.track(frame, persist=True)

    # Iterate over each detection result
    for result in results:
        # Skip results without tracker IDs (e.g. no detections in this frame)
        if result.boxes.id is None:
            continue

        # Extract bounding boxes, object IDs and predicted classes as NumPy/Python types
        detections = result.boxes.xyxy.cpu().numpy()  # xyxy format
        object_ids = result.boxes.id.int().cpu().tolist()  # plain ints, usable as dict keys
        classes = result.boxes.cls.int().cpu().tolist()

        # For each detected object, run the Kalman Filter predict/update cycle
        for detection, object_id, obj_class in zip(detections, object_ids, classes):
            x1, y1, x2, y2 = detection  # Bounding box coordinates
            center_x = (x1 + x2) / 2  # Calculating center x
            center_y = (y1 + y2) / 2  # Calculating center y

            # If object_id is new, initialize a Kalman Filter at the detected position
            if object_id not in kalman_filters:
                kalman_filters[object_id] = create_kalman_filter(dim_x=4, dim_z=2)
                kalman_filters[object_id].x[:2] = np.array([[center_x], [center_y]])

            kf = kalman_filters[object_id]

            # Predict the current state (position and velocity) from the previous estimate
            kf.predict()

            # Correct the prediction with the observed centroid
            kf.update(np.array([[center_x], [center_y]]))

            # Filtered position estimate as plain integers for OpenCV
            est_x, est_y = int(kf.x[0, 0]), int(kf.x[1, 0])

            # Draw the estimated centroid (green) and the detected bounding box (red)
            cv2.circle(frame, (est_x, est_y), 5, (0, 255, 0), -1)
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)

            # Store the estimate in this object's trajectory, keeping only num_past_frames points
            trajectories.setdefault(object_id, []).append((est_x, est_y))
            if len(trajectories[object_id]) > num_past_frames:
                trajectories[object_id].pop(0)

            # Draw the trajectory by connecting consecutive past estimates (blue)
            points = trajectories[object_id]
            for start, end in zip(points[:-1], points[1:]):
                cv2.line(frame, start, end, (255, 0, 0), 2)

            # Draw object class and ID on the frame
            label = f'{result.names[obj_class]} {object_id}'
            cv2.putText(frame, label, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)


    # Display the frame with object trajectories and bounding boxes
    cv2.imshow('Frame', frame)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release video capture and close windows
cap.release()
cv2.destroyAllWindows()



0: 384x640 1 pedestrian, 130.8ms
Speed: 1.2ms preprocess, 130.8ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)



  cv2.circle(frame, (int(estimated_position[0]), int(estimated_position[1])), 5, (0, 255, 0), -1) # green color chosen
  trajectories[int(object_id)].append((int(estimated_position[0]), int(estimated_position[1])))


0: 384x640 1 pedestrian, 318.5ms
Speed: 20.0ms preprocess, 318.5ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 168.5ms
Speed: 0.8ms preprocess, 168.5ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 175.4ms
Speed: 1.1ms preprocess, 175.4ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 148.2ms
Speed: 0.7ms preprocess, 148.2ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 148.5ms
Speed: 0.8ms preprocess, 148.5ms inference, 0.4ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 162.9ms
Speed: 0.7ms preprocess, 162.9ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 pedestrians, 161.8ms
Speed: 0.8ms preprocess, 161.8ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 pedestrians, 139.1ms
Speed: 0.8ms preprocess, 139.1ms inferen

#### Save the predictions as a video file:

We wrap the code above in a function that writes the predictions to a video file, including bounding boxes, class labels, state estimates, and trajectories.

In [None]:
def track_objects_outputVideo(video_path, output_filename):
    # Import pretrained model on VisDrone
    model = YOLO('/Users/rishienandhan/Drone/runs_VisDrone/train/weights/best.pt')

    # Open the video file
    cap = cv2.VideoCapture(video_path)

    # Get the video frame dimensions and frame rate
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)  # keep as float so rates such as 29.97 are not truncated

    # Define the codec and create VideoWriter object
    out = cv2.VideoWriter(output_filename, cv2.VideoWriter_fourcc(*'mp4v'), fps, (frame_width, frame_height))

    # Initialize dictionary to store Kalman Filters for each object ID
    kalman_filters = {}

    # Initialize dictionary to store past trajectories for each object ID
    trajectories = {}

    # Define the number of past frames to keep track of
    num_past_frames = 10

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # Perform object detection using YOLOv8 pretrained on COCO and trained on VisDrone
        results = model.track(frame, persist=True)

        # Iterate over each detection result
        for result in results:
            # Skip results without tracker IDs (e.g. no detections in this frame)
            if result.boxes.id is None:
                continue

            # Extract bounding boxes, object IDs and predicted classes as NumPy/Python types
            detections = result.boxes.xyxy.cpu().numpy()  # xyxy format
            object_ids = result.boxes.id.int().cpu().tolist()  # plain ints, usable as dict keys
            classes = result.boxes.cls.int().cpu().tolist()

            # For each detected object, run the Kalman Filter predict/update cycle
            for detection, object_id, obj_class in zip(detections, object_ids, classes):
                x1, y1, x2, y2 = detection  # Bounding box coordinates
                center_x = (x1 + x2) / 2  # Calculating center x
                center_y = (y1 + y2) / 2  # Calculating center y

                # If object_id is new, initialize a Kalman Filter at the detected position
                if object_id not in kalman_filters:
                    kalman_filters[object_id] = create_kalman_filter(dim_x=4, dim_z=2)
                    kalman_filters[object_id].x[:2] = np.array([[center_x], [center_y]])

                kf = kalman_filters[object_id]

                # Predict the current state (position and velocity) from the previous estimate
                kf.predict()

                # Correct the prediction with the observed centroid
                kf.update(np.array([[center_x], [center_y]]))

                # Filtered position estimate as plain integers for OpenCV
                est_x, est_y = int(kf.x[0, 0]), int(kf.x[1, 0])

                # Draw the estimated centroid (green) and the detected bounding box (red)
                cv2.circle(frame, (est_x, est_y), 5, (0, 255, 0), -1)
                cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)

                # Store the estimate in this object's trajectory, keeping only num_past_frames points
                trajectories.setdefault(object_id, []).append((est_x, est_y))
                if len(trajectories[object_id]) > num_past_frames:
                    trajectories[object_id].pop(0)

                # Draw the trajectory by connecting consecutive past estimates (blue)
                points = trajectories[object_id]
                for start, end in zip(points[:-1], points[1:]):
                    cv2.line(frame, start, end, (255, 0, 0), 2)

                # Draw object class and ID on the frame
                label = f'{result.names[obj_class]} {object_id}'
                cv2.putText(frame, label, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

        # Write the frame to the output video file
        out.write(frame)


    # Release video capture and close windows
    cap.release()
    out.release()
    cv2.destroyAllWindows()


In [None]:
# Call the function to output predictions for video1.mp4
video_path = "inputs/video1.mp4"
output_filename = "outputs/Final_predictions/video1_op.mp4"
track_objects_outputVideo(video_path, output_filename)


0: 352x640 (no detections), 144.7ms
Speed: 1.3ms preprocess, 144.7ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 136.2ms
Speed: 0.7ms preprocess, 136.2ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 131.2ms
Speed: 0.9ms preprocess, 131.2ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 motor, 130.3ms
Speed: 1.3ms preprocess, 130.3ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 145.6ms
Speed: 0.8ms preprocess, 145.6ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 133.4ms
Speed: 1.0ms preprocess, 133.4ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 (no detections), 134.8ms
Speed: 1.8ms preprocess, 134.8ms inference, 0.2ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 139.3ms
Speed: 0.9ms preprocess, 139.

  cv2.circle(frame, (int(estimated_position[0]), int(estimated_position[1])), 5, (0, 255, 0), -1) # green color chosen
  trajectories[int(object_id)].append((int(estimated_position[0]), int(estimated_position[1])))


0: 352x640 1 car, 320.8ms
Speed: 18.0ms preprocess, 320.8ms inference, 0.4ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 137.3ms
Speed: 1.0ms preprocess, 137.3ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 138.3ms
Speed: 0.6ms preprocess, 138.3ms inference, 0.4ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 135.3ms
Speed: 0.9ms preprocess, 135.3ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 141.8ms
Speed: 0.7ms preprocess, 141.8ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 150.0ms
Speed: 0.7ms preprocess, 150.0ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 137.6ms
Speed: 0.7ms preprocess, 137.6ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 car, 131.8ms
Speed: 0.9ms preprocess, 131.8ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)


In [None]:
# Call the function to output predictions for video2.mp4
video_path = "inputs/video2.mp4"
output_filename = "outputs/Final_predictions/video2_op.mp4"
track_objects_outputVideo(video_path, output_filename)


0: 352x640 1 pedestrian, 1 motor, 138.1ms
Speed: 1.2ms preprocess, 138.1ms inference, 0.8ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 1 motor, 132.4ms
Speed: 0.8ms preprocess, 132.4ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 122.6ms
Speed: 0.7ms preprocess, 122.6ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 128.1ms
Speed: 0.8ms preprocess, 128.1ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 139.1ms
Speed: 0.9ms preprocess, 139.1ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 129.7ms
Speed: 0.9ms preprocess, 129.7ms inference, 0.7ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 125.4ms
Speed: 0.9ms preprocess, 125.4ms inference, 0.4ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 133.5ms
Speed: 0.8ms preprocess,

  cv2.circle(frame, (int(estimated_position[0]), int(estimated_position[1])), 5, (0, 255, 0), -1) # green color chosen
  trajectories[int(object_id)].append((int(estimated_position[0]), int(estimated_position[1])))


0: 352x640 1 pedestrian, 320.0ms
Speed: 10.7ms preprocess, 320.0ms inference, 0.4ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 138.8ms
Speed: 0.8ms preprocess, 138.8ms inference, 0.5ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 138.5ms
Speed: 0.8ms preprocess, 138.5ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 121.6ms
Speed: 0.6ms preprocess, 121.6ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 116.4ms
Speed: 0.7ms preprocess, 116.4ms inference, 0.4ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 120.5ms
Speed: 0.6ms preprocess, 120.5ms inference, 0.6ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 125.4ms
Speed: 0.7ms preprocess, 125.4ms inference, 0.3ms postprocess per image at shape (1, 3, 352, 640)

0: 352x640 1 pedestrian, 130.7ms
Speed: 0.7ms preprocess, 130.7ms inference

In [None]:
# Call the function to output predictions for video3.mp4
video_path = "inputs/video3.mp4"
output_filename = "outputs/Final_predictions/video3_op.mp4"
track_objects_outputVideo(video_path, output_filename)


0: 384x640 1 pedestrian, 184.1ms
Speed: 1.0ms preprocess, 184.1ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)



  cv2.circle(frame, (int(estimated_position[0]), int(estimated_position[1])), 5, (0, 255, 0), -1) # green color chosen
  trajectories[int(object_id)].append((int(estimated_position[0]), int(estimated_position[1])))


0: 384x640 1 pedestrian, 352.0ms
Speed: 11.9ms preprocess, 352.0ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 199.1ms
Speed: 0.9ms preprocess, 199.1ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 191.6ms
Speed: 0.8ms preprocess, 191.6ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 178.8ms
Speed: 1.0ms preprocess, 178.8ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 187.8ms
Speed: 0.8ms preprocess, 187.8ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 1 pedestrian, 163.9ms
Speed: 1.0ms preprocess, 163.9ms inference, 0.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 pedestrians, 152.4ms
Speed: 0.9ms preprocess, 152.4ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 pedestrians, 126.9ms
Speed: 0.7ms preprocess, 126.9ms inferen

We save the predictions and state estimates as video files in the 'outputs/Final_predictions/' folder of the shared drive, which can be accessed using this link: https://drive.google.com/drive/folders/1Bd1D3jyy9EzcCr2YJHxMZ4rRgvGimuCh?usp=sharing.


In each output video, the green dot is the Kalman Filter's estimate of the object's state (the centroid of the detected object) at the current frame, obtained by predicting from the previous estimate and correcting that prediction with the current detection. The red rectangle is the bounding box from the object detector, and the blue trail shows the object's state estimates over its past 10 frames, which we call its trajectory.


**NOTE:** Although the blue trail represents the object's past state estimates over its last 10 frames, it is not a reliable picture of the object's actual path. Since both the object and the camera (drone) move, the blue points capture motion relative to the camera, which makes them look jittery, and they should not be read as the object's real-world trajectory.

The more important output is the green dot (the object's centroid), which the Kalman Filter produces through recursive state estimation (RSE): at each frame it predicts the state from the previous estimate and corrects that prediction with the new detection. Strictly, the dot is the filtered estimate of the current state (St); the estimate of the next state (St+1) follows from applying the motion model once more (the predict step) before the next detection arrives.
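Since the predict step is just a multiplication by the transition matrix F, a one-step-ahead estimate of St+1 can be computed from the filtered state without touching the filter. A minimal NumPy sketch under that assumption follows; `one_step_ahead` is a hypothetical helper and the state values are made up. With a filterpy filter the equivalent would be `kf.F @ kf.x`.

```python
import numpy as np

# Constant-velocity transition matrix from create_kalman_filter (dt = 1 frame)
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)


def one_step_ahead(x_filtered, steps=1):
    """Project a filtered state [x, y, vx, vy] forward without any measurement."""
    x = np.asarray(x_filtered, dtype=float).reshape(4, 1)
    for _ in range(steps):
        x = F @ x
    return x


# Hypothetical filtered state at frame t: centroid (320, 180), moving (+4, -2) px/frame
x_t = np.array([[320.0], [180.0], [4.0], [-2.0]])
x_next = one_step_ahead(x_t)
print(x_next[:2].ravel())  # [324. 178.]
```

To visualize St+1 instead of St, the tracking loop could draw the green dot at the first two entries of `one_step_ahead(kf.x)`.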


Each detected object keeps a unique ID across frames (assigned by the Ultralytics tracker, which uses BoT-SORT by default), which lets us track multiple objects in the same frame simultaneously. Every tracked object has its own bounding box and state estimate. As the code shows, we maintain a dictionary of Kalman Filters keyed by object ID, so there is one Kalman Filter per tracked object (e.g. cars and people).
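Keying the per-object dictionaries by a plain Python `int` matters: Ultralytics returns track IDs in `result.boxes.id` as a PyTorch tensor, and PyTorch tensors hash by object identity, so a tensor key from a later frame will not match the same ID stored earlier. Below is a small sketch of a registry keyed by `int` that also drops tracks unseen for a while. `TrackRegistry`, `max_missed` and the pruning policy are hypothetical additions, not part of the notebook's code; the stand-in factory `dict` replaces `create_kalman_filter` so the sketch runs on its own.

```python
class TrackRegistry:
    """Per-ID storage for filters, keyed by int track ID (hypothetical helper)."""

    def __init__(self, factory, max_missed=30):
        self.factory = factory          # callable returning a new filter object
        self.max_missed = max_missed    # frames a track may go unseen before removal
        self.filters = {}
        self.last_seen = {}

    def get(self, object_id, frame_idx):
        """Return the filter for object_id, creating it the first time the ID appears."""
        key = int(object_id)            # works for ints and 0-d tensors alike
        if key not in self.filters:
            self.filters[key] = self.factory()
        self.last_seen[key] = frame_idx
        return self.filters[key]

    def prune(self, frame_idx):
        """Drop tracks whose last detection is more than max_missed frames old."""
        stale = [k for k, seen in self.last_seen.items()
                 if frame_idx - seen > self.max_missed]
        for k in stale:
            del self.filters[k]
            del self.last_seen[k]
        return stale


# In the notebook the factory could be: lambda: create_kalman_filter(dim_x=4, dim_z=2)
registry = TrackRegistry(factory=dict, max_missed=5)
a = registry.get(3, frame_idx=0)
b = registry.get(3, frame_idx=1)        # same ID on a later frame -> same filter object
registry.get(8, frame_idx=4)
dropped = registry.prune(frame_idx=7)   # ID 3 unseen for 6 frames (> 5) is dropped; ID 8 kept
print(a is b, dropped)                  # True [3]
```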


To address false positives in object detection, we can use some of the following methods:
1. **Temporal Consistency with Tracking:** The tracker, especially when using methods like Kalman filters, provides temporal consistency by predicting the position of objects in subsequent frames based on their previous positions. By leveraging this temporal information, we can verify the consistency of detections over time. If a detection is inconsistent with the predicted trajectory, it is more likely to be a false positive and can be discarded.

2. **Dynamically Adjusting Detection Thresholds:** Object detectors such as YOLOv8 output a confidence score with each bounding box. By setting the detection threshold for each class individually, based on the precision required for that class, we can filter out low-confidence detections.

3. **Non-Maximum Suppression (NMS):** NMS is a technique used to suppress multiple overlapping bounding boxes for the same object. It retains only the bounding box with the highest confidence score while suppressing the others. This helps remove redundant detections and reduce false positives.
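The first two ideas can be sketched in NumPy. `passes_gate` implements temporal consistency as a chi-square gate on the squared Mahalanobis distance between a detection and the filter's predicted measurement, and `filter_by_class_conf` applies per-class confidence thresholds after detection. The function names, the class IDs and threshold values, and the 99% gate (about 9.21 for 2 degrees of freedom) are illustrative assumptions. NMS is not reimplemented here, since YOLOv8 already applies it internally (its IoU threshold is exposed through the `iou` argument).

```python
import numpy as np

H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
R = np.eye(2) * 0.1
GATE_99_2DOF = 9.21  # chi-square 99% quantile for 2 degrees of freedom


def passes_gate(x_pred, P_pred, z, gate=GATE_99_2DOF):
    """Accept centroid z only if it is statistically consistent with the prediction."""
    y = z.reshape(2, 1) - H @ x_pred               # innovation
    S = H @ P_pred @ H.T + R                       # innovation covariance
    d2 = (y.T @ np.linalg.solve(S, y)).item()      # squared Mahalanobis distance
    return d2 <= gate, d2


def filter_by_class_conf(classes, confs, thresholds, default=0.25):
    """Return indices of detections whose confidence clears their class threshold."""
    return [i for i, (c, p) in enumerate(zip(classes, confs))
            if p >= thresholds.get(int(c), default)]


# Predicted track at (320, 180) with moderately uncertain position
x_pred = np.array([[320.0], [180.0], [4.0], [-2.0]])
P_pred = np.diag([25.0, 25.0, 4.0, 4.0])

ok_near, d_near = passes_gate(x_pred, P_pred, np.array([326.0, 176.0]))
ok_far, d_far = passes_gate(x_pred, P_pred, np.array([400.0, 100.0]))
print(ok_near, round(d_near, 2), ok_far)

# Hypothetical per-class thresholds: stricter for class 0, looser for class 3
keep = filter_by_class_conf(classes=[0, 3, 0, 5], confs=[0.4, 0.3, 0.7, 0.2],
                            thresholds={0: 0.5, 3: 0.25})
print(keep)
```

In the tracking loop, a detection that fails the gate for its track could be left out of `update()` for that frame, so a single spurious box does not drag the state estimate away.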
