# **Vehicle Speed Estimation with YOLOv8**

This Jupyter Notebook demonstrates how to use YOLOv8 to estimate vehicle speeds in video sequences. Our goal is to accurately detect vehicles and estimate their speed, which matters for applications ranging from traffic monitoring to automated speed-limit enforcement. Building on YOLOv8's state-of-the-art object detection, we go beyond detection and calculate the speed of each identified vehicle across the video frames.

The approach uses YOLOv8 to detect vehicles in each frame and tracks them across frames. It then projects their positions from the camera view onto a top-down view of the road measured in meters, and estimates each vehicle's speed from the distance it travels and the elapsed time given by the frame rate. The notebook walks through the complete pipeline: setup, defining the region of interest and the perspective transformation, vehicle detection with YOLOv8, tracking, speed estimation, and visualization of the results.

## **GPU Status Check and Mounting Drive**

We begin by checking the availability and status of the GPU, which the computationally intensive video processing and YOLOv8 inference depend on. The `nvidia-smi` command gives a snapshot of the GPU model, memory usage, and active processes, confirming that the runtime is ready. We then mount Google Drive, where the input video is stored and the output video will be saved.

In [None]:
!nvidia-smi

from google.colab import drive
drive.mount('/content/drive')

## **Installing Dependencies**

Next, we install the Python packages the project depends on: supervision (video I/O, tracking, and annotation utilities) and ultralytics (YOLOv8).

In [None]:
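# supervision is pinned because later releases renamed or removed APIs used below
# (e.g. ByteTrack's track_thresh, sv.Color.red(), sv.BoundingBoxAnnotator).
!pip install -q supervision==0.18.0 ultralytics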

## **Importing Required Libraries**

In this section, we import all the necessary libraries and modules required for our vehicle speed estimation project.

In [4]:
import cv2
import numpy as np
import supervision as sv
from tqdm import tqdm
from ultralytics import YOLO
from collections import defaultdict, deque

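## **Input Video and Configuration Parameters**

The input video, which shows a variety of vehicle scenarios, comes from Kaggle and is stored on Google Drive. Below we set the source and output video paths, the confidence threshold for keeping detections, the IoU threshold used by non-max suppression, the YOLOv8 weights to load (yolov8x.pt, the largest YOLOv8 detection model), and the inference image size.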

In [50]:
SOURCE_VIDEO_PATH = "/content/drive/MyDrive/demo_small.mp4"
TARGET_VIDEO_PATH = "/content/drive/MyDrive/demo_result.mp4"
CONFIDENCE_THRESHOLD = 0.3
IOU_THRESHOLD = 0.5
MODEL_NAME = "yolov8x.pt"
MODEL_RESOLUTION = 1280

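## **Source and Target Points for the Perspective Transformation**

For accurate speed estimation, we must account for the perspective distortion in the footage. SOURCE lists the four corners of the monitored road section in the video frame, in pixels, ordered top-left, top-right, bottom-right, bottom-left; a corner may lie outside the frame, as the negative x-coordinate shows. TARGET is the rectangle those corners map to in a top-down view. Because the speed calculation later multiplies by 3.6 to convert meters per second to kilometers per hour, TARGET_WIDTH and TARGET_HEIGHT are interpreted as the real-world width and length of the road section in meters, here 25 m by 250 m.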

In [51]:
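# Corners of the road section in the video frame (pixels), in order:
# top-left, top-right, bottom-right, bottom-left.
SOURCE = np.array([
  [168, 162],
  [981, 162],
  [1051, 543],
  [-228, 543]
])

# Real-world size of that road section (treated as meters by the speed calculation)
TARGET_WIDTH = 25
TARGET_HEIGHT = 250

# The same corners in the top-down view, in the same order as SOURCE
TARGET = np.array([
    [0, 0],
    [TARGET_WIDTH - 1, 0],
    [TARGET_WIDTH - 1, TARGET_HEIGHT - 1],
    [0, TARGET_HEIGHT - 1],
])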

## **Reading the First Frame**

To check the detection zone visually, we take the first frame of the source video from a frame generator.

In [52]:
frame_generator = sv.get_video_frames_generator(source_path=SOURCE_VIDEO_PATH)
frame_iterator = iter(frame_generator)
frame = next(frame_iterator)

## **Visualizing the Detection Zone**

Before proceeding with vehicle detection and speed estimation, it's beneficial to visualize the detection zone within our video frame. This step involves drawing a polygon on the first frame of the video, corresponding to the SOURCE points we defined earlier. The polygon outlines the area of interest on the road where vehicles will be detected and tracked. This visualization aids in verifying that our defined zone accurately represents the portion of the roadway we intend to analyze.

The code snippet below copies the first video frame, draws a red polygon to denote the detection zone using the coordinates specified in SOURCE, and then displays the annotated frame. This allows us to visually confirm the detection zone's placement before applying the detection and speed estimation algorithms.

In [None]:
annotated_frame = frame.copy()
annotated_frame = sv.draw_polygon(scene=annotated_frame, polygon=SOURCE, color=sv.Color.red(), thickness=4)
sv.plot_image(annotated_frame)
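Later, the same polygon is passed to sv.PolygonZone, and polygon_zone.trigger is used to keep only detections whose anchor point falls inside it. As a minimal sketch of the underlying geometric test (not necessarily how supervision implements it), the even-odd ray-casting rule below checks made-up bottom-center anchor points against a copy of the SOURCE corners. point_in_polygon is an illustrative helper, not part of supervision.

```python
import numpy as np


def point_in_polygon(point, polygon):
    """Even-odd ray-casting test: True if `point` lies inside `polygon`.

    Points exactly on an edge may be classified either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Same corner points as SOURCE above, repeated so the snippet runs on its own
source = np.array([[168, 162], [981, 162], [1051, 543], [-228, 543]])

# Hypothetical bottom-center anchor points of three detections
anchors = np.array([[600, 400], [100, 200], [600, 600]])
mask = np.array([point_in_polygon(p, source) for p in anchors])
print(mask)  # [ True False False]
```

The second point sits left of the slanted left edge and the third lies below the bottom edge, so only the first detection would be kept.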

## **Implementing Perspective Transformation**

To address perspective distortion, we define the ViewTransformer class. It computes a 3×3 perspective (homography) matrix from the four SOURCE/TARGET point pairs with cv2.getPerspectiveTransform and then maps arbitrary points from the camera view to the target view with cv2.perspectiveTransform, which is what makes speeds comparable across different parts of the frame.

In [47]:
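class ViewTransformer:
    """Maps points from the camera view (source) to the top-down view (target)."""

    def __init__(self, source: np.ndarray, target: np.ndarray) -> None:
        source = source.astype(np.float32)
        target = target.astype(np.float32)
        # 3x3 perspective (homography) matrix computed from the four point pairs
        self.m = cv2.getPerspectiveTransform(source, target)

    def transform_points(self, points: np.ndarray) -> np.ndarray:
        # Nothing to transform, e.g. when a frame has no detections
        if points.size == 0:
            return points
        # cv2.perspectiveTransform expects an (N, 1, 2) float array
        reshaped_points = points.reshape(-1, 1, 2).astype(np.float32)
        transformed_points = cv2.perspectiveTransform(reshaped_points, self.m)
        return transformed_points.reshape(-1, 2)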
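ViewTransformer delegates the math to OpenCV. To make explicit what cv2.getPerspectiveTransform computes, here is a NumPy-only sketch: a homography with its bottom-right entry fixed to 1 has eight unknowns, and each of the four SOURCE/TARGET point pairs contributes two linear equations. perspective_matrix and apply_perspective are illustrative names, not OpenCV functions; for a non-degenerate quadrilateral the result should agree with OpenCV's matrix up to floating-point error.

```python
import numpy as np


def perspective_matrix(source, target):
    """Solve for the 3x3 homography H (with H[2, 2] = 1) mapping 4 source points to 4 target points."""
    a, b = [], []
    for (x, y), (u, v) in zip(source, target):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v,
        # rearranged so that each equation is linear in h0..h7
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_perspective(points, m):
    """Map (N, 2) points through m using homogeneous coordinates."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))]) @ m.T
    return homogeneous[:, :2] / homogeneous[:, 2:3]


source = np.array([[168, 162], [981, 162], [1051, 543], [-228, 543]], dtype=float)
target = np.array([[0, 0], [24, 0], [24, 249], [0, 249]], dtype=float)

m = perspective_matrix(source, target)
print(np.allclose(apply_perspective(source, m), target, atol=1e-6))  # True

# The same 10-pixel vertical step covers more road near the top of the frame
near = apply_perspective(np.array([[600.0, 500.0], [600.0, 510.0]]), m)
far = apply_perspective(np.array([[600.0, 170.0], [600.0, 180.0]]), m)
near_step = abs(near[1, 1] - near[0, 1])
far_step = abs(far[1, 1] - far[0, 1])
print(f"near: {near_step:.2f} m, far: {far_step:.2f} m")
```

The last lines show why the transformation matters: a vertical step near the top of the frame corresponds to more road than the same step near the bottom, so raw pixel displacement would misstate speeds.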


## **Initializing the Perspective View Transformer**

With the ViewTransformer class defined, we instantiate view_transformer using the SOURCE and TARGET points specified earlier. This object transforms the coordinates of detected vehicles from the camera's perspective view to a top-down (bird's-eye) view of the road. In that view, equal distances on the road correspond to equal distances in the coordinates, so we can measure how far a vehicle has moved and compute its speed without perspective distortion.

The source argument is the polygon in the video frame that delimits the detection zone, and target is the corresponding rectangle in the transformed view, which fixes the scale used for distance measurements.

In [48]:
view_transformer = ViewTransformer(source=SOURCE, target=TARGET)
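Because TARGET is measured in meters, positions in the transformed view translate directly into distances. Below is a minimal sketch of the speed calculation, assuming one y-position sample per frame for a single vehicle; estimate_speed_kmh is a hypothetical helper and the constant-speed track is made-up data.

```python
from collections import deque


def estimate_speed_kmh(positions_m, fps):
    """Estimate speed from consecutive per-frame positions (meters) along the road.

    N samples taken on consecutive frames span N - 1 frame intervals.
    """
    if len(positions_m) < 2:
        return None  # not enough history to measure a displacement
    distance_m = abs(positions_m[-1] - positions_m[0])
    elapsed_s = (len(positions_m) - 1) / fps
    return distance_m / elapsed_s * 3.6  # m/s -> km/h


fps = 25
history = deque(maxlen=fps)  # keep roughly one second of positions
for frame_index in range(fps):
    # Hypothetical vehicle moving a constant 0.5 m per frame (12.5 m/s)
    history.append(100.0 + 0.5 * frame_index)

print(round(estimate_speed_kmh(history, fps), 2))  # 45.0
```

Here 25 samples span 24 frame intervals (0.96 s) over which the vehicle covers 12 m. Dividing by N / fps instead would report 43.2 km/h for this input, a slight underestimate.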

## **Vehicle Detection, Tracking, and Speed Estimation**

This section outlines the comprehensive process of detecting vehicles, tracking their movements across frames, and estimating their speeds using the YOLOv8 model, combined with custom tracking and annotation tools.

### **Model Initialization and Video Preparation**

**Model Loading:** Load the YOLOv8 weights named in MODEL_NAME; Ultralytics downloads them automatically on first use.

**Video Information:** video_info holds metadata of the source video, such as frame rate, resolution, and total frame count, which the tracker, annotators, and output writer rely on.

**Frame Generator:** frame_generator yields the video frames one by one for sequential processing.

### **Tracking and Annotation Configuration**

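**ByteTrack Initialization:** An instance of ByteTrack, byte_track, is created to track vehicles across frames, using the video's frame rate and CONFIDENCE_THRESHOLD as its detection threshold (track_thresh).

**Annotator Setup:** bounding_box_annotator, label_annotator, and trace_annotator are configured with a line thickness and text scale computed from the video resolution, so that detected vehicles, their trajectories (the last two seconds), and estimated speeds are drawn legibly.

**Detection Zone:** polygon_zone defines the area of interest for vehicle detection within the video frames, based on the SOURCE coordinates.

**Position History:** coordinates stores, for each tracker ID, the most recent transformed y-positions, up to one second's worth of frames.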

### **Per-Frame Processing**

**Vehicle Detection:** Each frame is passed through the YOLOv8 model, and detections with confidence at or below CONFIDENCE_THRESHOLD or belonging to the COCO person class (class ID 0) are discarded.

**Zone Filtering**: Detections outside the predefined polygon zone are excluded.

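**Non-Max Suppression:** Among overlapping boxes whose IoU exceeds IOU_THRESHOLD, only the most confident one is kept, so each vehicle yields a single detection.

**Tracking:** ByteTrack associates detections across frames and assigns each vehicle a persistent tracker ID.

**Speed Estimation:** The bottom-center point of each tracked box is projected into the top-down view, and its y-coordinate (the distance along the road, in meters) is stored. Once at least half a second of history is available, the speed is the distance between the oldest and newest stored positions divided by the elapsed time, multiplied by 3.6 to convert m/s to km/h.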

**Annotation:** Each frame is annotated with bounding boxes, labels indicating speed, and traces showing vehicles' paths.

**Output:** Annotated frames are compiled into a new video, saved to TARGET_VIDEO_PATH.
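As a self-contained illustration of the filtering steps listed above, here is a NumPy sketch of confidence filtering, class filtering, and greedy non-max suppression on made-up boxes. Two assumptions: VEHICLE_CLASS_IDS relies on the COCO class indices of the pretrained YOLOv8 weights (2 car, 3 motorcycle, 5 bus, 7 truck), which is stricter than the notebook's own rule of dropping only class 0 (person); and the NMS here is class-agnostic for simplicity. box_iou and greedy_nms are illustrative helpers, not supervision functions.

```python
import numpy as np

# Assumed "vehicle" subset of COCO class IDs: car, motorcycle, bus, truck
VEHICLE_CLASS_IDS = [2, 3, 5, 7]


def box_iou(box, boxes):
    """IoU between one [x1, y1, x2, y2] box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return intersection / (area + areas - intersection)


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of boxes kept by greedy, class-agnostic non-max suppression."""
    order = np.argsort(scores)[::-1]  # highest confidence first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop remaining boxes that overlap the kept box too much
        order = rest[box_iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep


boxes = np.array([
    [100, 100, 200, 200],  # car
    [105, 102, 205, 198],  # near-duplicate of the first car box
    [300, 100, 380, 160],  # second car
    [400, 300, 420, 360],  # person
    [500, 120, 560, 170],  # low-confidence truck
], dtype=float)
scores = np.array([0.9, 0.8, 0.85, 0.95, 0.2])
class_ids = np.array([2, 2, 2, 0, 7])

# Confidence and class filtering, then NMS on the survivors
mask = (scores > 0.3) & np.isin(class_ids, VEHICLE_CLASS_IDS)
kept = np.flatnonzero(mask)[greedy_nms(boxes[mask], scores[mask], iou_threshold=0.5)]
print(kept)  # [0 2]
```

The person box fails the class filter, the truck fails the confidence filter, and the near-duplicate car box (IoU of about 0.87 with the first box) is removed by NMS.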

In [None]:
model = YOLO(MODEL_NAME)
video_info = sv.VideoInfo.from_video_path(video_path=SOURCE_VIDEO_PATH)
frame_generator = sv.get_video_frames_generator(source_path=SOURCE_VIDEO_PATH)

# Tracker initialization
byte_track = sv.ByteTrack(frame_rate=video_info.fps, track_thresh=CONFIDENCE_THRESHOLD)

# Annotators configuration
thickness = sv.calculate_dynamic_line_thickness(resolution_wh=video_info.resolution_wh)
text_scale = sv.calculate_dynamic_text_scale(resolution_wh=video_info.resolution_wh)
bounding_box_annotator = sv.BoundingBoxAnnotator(thickness=thickness)
label_annotator = sv.LabelAnnotator(text_scale=text_scale, text_thickness=thickness, text_position=sv.Position.BOTTOM_CENTER)
trace_annotator = sv.TraceAnnotator(thickness=thickness, trace_length=int(video_info.fps) * 2, position=sv.Position.BOTTOM_CENTER)
polygon_zone = sv.PolygonZone(polygon=SOURCE, frame_resolution_wh=video_info.resolution_wh)
# Per-tracker history of transformed y-positions, about one second long
coordinates = defaultdict(lambda: deque(maxlen=int(video_info.fps)))

# Open target video
with sv.VideoSink(TARGET_VIDEO_PATH, video_info) as sink:
    # Loop over source video frames
    for frame in tqdm(frame_generator, total=video_info.total_frames):
        result = model(frame, imgsz=MODEL_RESOLUTION, verbose=False)[0]
        detections = sv.Detections.from_ultralytics(result)

        # Filter out detections by class and confidence
        detections = detections[detections.confidence > CONFIDENCE_THRESHOLD]
        detections = detections[detections.class_id != 0]

        # Filter out detections outside the zone
        detections = detections[polygon_zone.trigger(detections)]

        # Refine detections using non-max suppression
        detections = detections.with_nms(IOU_THRESHOLD)

        # Pass detections through the tracker
        detections = byte_track.update_with_detections(detections=detections)
        points = detections.get_anchors_coordinates(anchor=sv.Position.BOTTOM_CENTER)

        # Project anchor points into the top-down (target) view; keep floats
        # instead of truncating positions to whole meters
        points = view_transformer.transform_points(points=points)

        # Store detections position
        for tracker_id, [_, y] in zip(detections.tracker_id, points):
            coordinates[tracker_id].append(y)

        # Format labels
        labels = []
        for tracker_id in detections.tracker_id:
            if len(coordinates[tracker_id]) < video_info.fps / 2:
                labels.append(f"#{tracker_id}")
            else:
                # Calculate speed from the oldest and newest stored positions
                coordinate_start = coordinates[tracker_id][0]
                coordinate_end = coordinates[tracker_id][-1]
                distance = abs(coordinate_end - coordinate_start)
                # N samples from consecutive frames span N - 1 frame intervals
                time = (len(coordinates[tracker_id]) - 1) / video_info.fps
                speed = distance / time * 3.6
                labels.append(f"#{tracker_id} {int(speed)} km/h")

        # Annotate frame
        annotated_frame = frame.copy()
        annotated_frame = trace_annotator.annotate(scene=annotated_frame, detections=detections)
        annotated_frame = bounding_box_annotator.annotate(scene=annotated_frame, detections=detections)
        annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)

        # Add frame to target video
        sink.write_frame(annotated_frame)
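The tracking step above relies on ByteTrack, which combines Kalman-filter motion prediction with a two-stage association of high- and low-confidence detections. Purely to illustrate what keeping vehicle identities across frames means, here is a toy tracker that greedily matches each new box to the previous frame's box with the highest IoU. GreedyIoUTracker is a hypothetical class, not ByteTrack: it has no motion model and forgets any track that goes unmatched for a single frame.

```python
from itertools import count

import numpy as np


def iou_matrix(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) arrays of [x1, y1, x2, y2] boxes."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)


class GreedyIoUTracker:
    """Toy tracker: carries IDs forward by greedy IoU matching with the previous frame."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.prev_boxes = np.empty((0, 4))
        self.prev_ids = []
        self._next_id = count(1)

    def update(self, boxes):
        boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
        ids = [None] * len(boxes)
        if len(self.prev_boxes) and len(boxes):
            ious = iou_matrix(boxes, self.prev_boxes)
            # Visit (new box, previous box) pairs from highest to lowest IoU
            for flat in np.argsort(ious, axis=None)[::-1]:
                i, j = np.unravel_index(flat, ious.shape)
                if ious[i, j] < self.iou_threshold:
                    break
                if ids[i] is None and self.prev_ids[j] not in ids:
                    ids[i] = self.prev_ids[j]
        # Unmatched detections start new tracks
        ids = [next(self._next_id) if t is None else t for t in ids]
        self.prev_boxes, self.prev_ids = boxes, ids
        return ids


tracker = GreedyIoUTracker()
print(tracker.update([[0, 0, 10, 10], [50, 50, 60, 60]]))  # [1, 2]
print(tracker.update([[52, 51, 62, 61], [1, 0, 11, 10]]))  # [2, 1]
print(tracker.update([[200, 200, 210, 210]]))              # [3]
```

In the second frame the boxes arrive in a different order, yet each keeps its ID because it overlaps its own box from the previous frame; the unrelated box in the third frame starts a new track.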
