### Object Detection and Tracking in Video Using YOLO and ByteTrack

This notebook demonstrates object detection and tracking in a video using a YOLO (You Only Look Once) model and the ByteTrack algorithm. It loads pre-trained YOLO weights, runs the model on each frame of the video, and passes the detections to ByteTrack so that each object is followed across frames. Each annotated frame is then displayed in a window, with bounding boxes and labels drawn on the detected objects.
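To give some intuition for what "tracking the detected objects across frames" involves, the sketch below pairs boxes from a previous frame with boxes from the current frame by their overlap (IoU). It is a deliberately simplified stand-in, not ByteTrack itself: ByteTrack additionally uses motion prediction and a two-stage association over high- and low-confidence detections. The function names `box_iou` and `greedy_match` and the `iou_threshold` value are hypothetical and chosen only for illustration.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def greedy_match(track_boxes, detection_boxes, iou_threshold=0.3):
    """Greedily pair tracks with detections, highest IoU first.

    Returns a dict mapping track index -> detection index. This is an
    illustrative simplification, not ByteTrack's actual association step.
    """
    candidates = []
    for t, track_box in enumerate(track_boxes):
        for d, detection_box in enumerate(detection_boxes):
            iou = box_iou(track_box, detection_box)
            if iou >= iou_threshold:
                candidates.append((iou, t, d))

    candidates.sort(reverse=True)  # best overlaps are assigned first
    matches, used_detections = {}, set()
    for _, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)
    return matches


# Two existing tracks and three new detections (the last one is a new object).
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
matches = greedy_match(tracks, detections)
```

Here each track is paired with the detection that overlaps it most, and the third detection stays unmatched, which a real tracker would treat as a candidate for a new track.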

In [None]:
import supervision as sv
from ultralytics import YOLO
import cv2

# Configuration
CONFIG = {
    "VIDEO_PATH": "../data/videos/input/badminton_test.mp4",  # Path to the input video
    "MODEL_PATH": "../weights/badminton_best.pt",  # Path to the YOLO model weights
    "CONFIDENCE_THRESHOLD": 0.5,  # Confidence threshold for detections
    "DEVICE": "cuda:0",  # Device to run the model on (e.g., 'cuda:0' for GPU or 'cpu' for CPU)
    "DISPLAY_RESOLUTION": (1280, 720),  # Resolution for displaying the annotated video
    "FPS_MONITOR_ENABLED": True,  # Enable or disable FPS monitoring
}

# Load YOLO model
model = YOLO(CONFIG["MODEL_PATH"], task='detect')

# Initialize FPS monitor
fps_monitor = sv.FPSMonitor()

# Initialize tracker and annotators
tracker = sv.ByteTrack()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Get video frame generator
frame_generator = sv.get_video_frames_generator(source_path=CONFIG["VIDEO_PATH"])

# Process and display each frame of the video
for frame in frame_generator:
    # Perform prediction with YOLO model
    results = model.predict(frame, conf=CONFIG["CONFIDENCE_THRESHOLD"], device=CONFIG["DEVICE"])[0]
    detections = sv.Detections.from_ultralytics(results)

    # Update detections with tracker
    detections = tracker.update_with_detections(detections)

    # Annotate the frame with bounding boxes and labels
    annotated_image = box_annotator.annotate(scene=frame.copy(), detections=detections)
    annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

    # Update the FPS estimate and draw it on the frame
    if CONFIG["FPS_MONITOR_ENABLED"]:
        fps_monitor.tick()
        fps = fps_monitor.fps
        cv2.putText(annotated_image, f"FPS: {fps:.2f}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Resize the annotated image for display
    annotated_image = cv2.resize(annotated_image, CONFIG["DISPLAY_RESOLUTION"])

    # Display the annotated frame in a window
    cv2.imshow("Annotated Video", annotated_image)

    # Exit the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Close all OpenCV display windows
cv2.destroyAllWindows()
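
The loop above draws labels without passing any custom text. To make the tracking visible, one option is to build label strings that include the tracker ID. The helper below is a minimal sketch under that assumption; `build_labels` is a hypothetical name and the label format is arbitrary.

```python
def build_labels(class_names, tracker_ids, confidences):
    """Build one label per detection, e.g. '#3 person 0.87'.

    All three arguments are equal-length sequences, one entry per detection.
    """
    return [
        f"#{tracker_id} {class_name} {confidence:.2f}"
        for class_name, tracker_id, confidence
        in zip(class_names, tracker_ids, confidences)
    ]


# Stand-alone example with plain Python values.
example_labels = build_labels(["person", "person"], [1, 2], [0.912, 0.5])
```

Inside the loop, this could be called after `tracker.update_with_detections(...)` with `detections.data["class_name"]`, `detections.tracker_id`, and `detections.confidence`, and the result passed to `label_annotator.annotate(...)` through its `labels` argument. Whether `class_name` is present in `detections.data` depends on the installed `supervision` version, so treat that part as an assumption to verify.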


0: 384x640 2 persons, 27.4ms
Speed: 4.8ms preprocess, 27.4ms inference, 1.9ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 21.3ms
Speed: 4.4ms preprocess, 21.3ms inference, 6.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 19.2ms
Speed: 2.3ms preprocess, 19.2ms inference, 2.5ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 11.0ms
Speed: 2.4ms preprocess, 11.0ms inference, 1.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 11.9ms
Speed: 2.3ms preprocess, 11.9ms inference, 1.3ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 14.9ms
Speed: 2.6ms preprocess, 14.9ms inference, 3.1ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 13.5ms
Speed: 1.8ms preprocess, 13.5ms inference, 2.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 2 persons, 12.2ms
Speed: 1.9ms preprocess, 12.2ms inference, 1.6ms postprocess per image at shape (