# YOLO_V5m Model

In this project, the pre-trained YOLO-V5m model serves as the baseline for comparing vehicle detection quality.

The pre-trained YOLO-V5m model is used to detect vehicles in a highway video clip. Later, in YOLO_SwinV2.ipynb, the modified YOLO model is trained and tested on the same highway video clip so that the two models can be compared.

## How To Run

Running this notebook in Google Colab is recommended, but it also works in a local environment.

**To run this notebook in Google Colab:**
- Download the whole project folder (enhanced_vehicle_detection) from GitHub.
- Place it in MyDrive in Google Drive.
    - If the project folder is placed in a different path in Google Drive, the paths for the input video and outputs need to be edited accordingly.
- All set! You can now run the cells.

**To run this notebook in a local environment:**
- Fork or clone the GitHub repository.
- Run `pip install -r app/requirements.txt` to install all required libraries.
- Since the code requires video conversion, make sure to install **ffmpeg**:
    - macOS: `brew install ffmpeg`
    - Ubuntu/Linux: `sudo apt install ffmpeg`
    - Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
- All set! You can now run the cells.

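## Set Up YOLO-V5

The cell below clones the YOLOv5 repository and installs all libraries required to load and use the YOLO-V5 model. It only needs to be run once per session. Note that `%cd yolov5` changes the working directory, so the relative local paths used later (`../data` and `../outputs`) are resolved from inside the cloned `yolov5` folder.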

In [None]:
!git clone -q https://github.com/ultralytics/yolov5
%cd yolov5
!pip -q install -r requirements.txt opencv-python-headless==4.10.0.84

## Import Necessary Libraries

In [None]:
import cv2, torch, numpy as np, matplotlib.pyplot as plt
from collections import defaultdict
from IPython.display import HTML
from base64 import b64encode
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Load the Pre-Trained YOLO-V5m Model and Set Up Configurations

In [None]:
# Check if running in Google Colab
import os

IN_COLAB = 'COLAB_GPU' in os.environ or 'google.colab' in str(get_ipython())

if IN_COLAB:
    # For Google Colab:
    # Mount Google Drive in Colab
    from google.colab import drive
    drive.mount('/content/drive')
    
    # Set the video path to your Google Drive location
    # Update this path to match where you stored the video in Google Drive
    video_path = '/content/drive/MyDrive/enhanced_vehicle_detection/data/rainy_highway_video.mp4'
else:
    # For local environment: use relative path
    video_path = '../data/rainy_highway_video.mp4'

# Verify the video file exists
if not os.path.exists(video_path):
    raise FileNotFoundError(f"Video not found at: {video_path}\n"
                            f"Please ensure the video is at the correct location.")

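# Set the FPS (also used as the number of frames a track may go unseen before it is removed),
# the confidence score threshold, and the IoU matching threshold for tracking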
FPS = 25
CONF_THRESH = 0.4
IOU_MATCH_THRESH = 0.3

# Vehicle classes to keep from the pre-trained (COCO) classes
VEHICLE_CLASSES = {"car", "truck", "bus"}

# Loading the pre-trained YOLO-V5m
yolo_model = torch.hub.load('ultralytics/yolov5', 'yolov5m', pretrained=True)

# Filter out weak detections by setting the model's confidence threshold to CONF_THRESH,
# and set the model's internal NMS IoU threshold to 0.45
yolo_model.conf = CONF_THRESH
yolo_model.iou = 0.45
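As an optional alternative to filtering by class name after inference, YOLOv5's torch.hub model (the AutoShape wrapper that also exposes the `conf` and `iou` attributes set above) appears to provide a `classes` attribute that restricts NMS output to a list of class IDs. Treat that attribute as an assumption to verify against your YOLOv5 version. The sketch below only derives the IDs from a names mapping; `vehicle_class_ids` and the `coco_excerpt` dict (the first eight COCO class names) are hypothetical illustrations, not part of the notebook.

```python
def vehicle_class_ids(names, wanted):
    """Return the sorted class IDs whose names are in `wanted`.

    `names` may be a dict {id: name} (recent YOLOv5 versions) or a plain
    list of names (older versions); both are handled.
    """
    pairs = names.items() if isinstance(names, dict) else enumerate(names)
    return sorted(cls_id for cls_id, name in pairs if name in wanted)


# Hypothetical excerpt of the COCO class names used by the pre-trained weights
coco_excerpt = {0: "person", 1: "bicycle", 2: "car", 3: "motorcycle",
                4: "airplane", 5: "bus", 6: "train", 7: "truck"}

ids = vehicle_class_ids(coco_excerpt, {"car", "truck", "bus"})
print(ids)  # [2, 5, 7]

# Assumed usage in this notebook (verify the attribute exists first):
# yolo_model.classes = vehicle_class_ids(yolo_model.names, VEHICLE_CLASSES)
```

Keeping the name check inside `detect_vehicles` is still harmless if this is applied; it simply becomes redundant.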

## Helper Functions

The following helper functions detect vehicles, compute IoU, and update the tracks (and their trace lines) for each detection.

- **detect_vehicles(frame)**: Converts each BGR frame to RGB, passes it to YOLO-V5m, and keeps only detections whose class is in `VEHICLE_CLASSES`. For each detection it returns the bounding box, confidence score, and class label, which are later drawn on the output video.
- **iou_xyxy(boxA, boxB)**: Computes the Intersection over Union (IoU) of two boxes in (x1, y1, x2, y2) format. During tracking, each detection is compared against the existing tracks and the best IoU is checked against `IOU_MATCH_THRESH`:
    - If the best IoU is greater than the threshold, the detection is assigned to that track. Otherwise, a new track with a new track ID is created for the detection.
- **update_tracks(detections, flow, frame_index)**: Matches detections to existing tracks by IoU, updates each matched track's bounding box, trace of previous center points, and last-seen frame index, creates new tracks for unmatched detections, and removes tracks that have not been seen for more than `FPS` frames. The optical flow is passed in but is not currently used for matching.
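The IoU matching rule can be checked by hand on a few boxes. The sketch below uses the same IoU formula as the notebook's `iou_xyxy` (renamed `box_iou` here to avoid shadowing it) and the same threshold of 0.3; the box coordinates are made-up values. It shows that a small shift between frames keeps the track, while a vehicle that moves half a box width between frames gets a new ID.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format (same formula as iou_xyxy)."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    if inter == 0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)


def match_decision(track_box, det_box, thresh=0.3):
    """Return 'same track' if the IoU exceeds `thresh`, otherwise 'new track'."""
    return "same track" if box_iou(track_box, det_box) > thresh else "new track"


track_box = (100, 100, 200, 200)   # box currently stored for a track (hypothetical)
small_move = (110, 105, 210, 205)  # same vehicle shifted by (10, 5) pixels
large_move = (150, 150, 250, 250)  # shifted by half a box in x and y

# IoU = 8550 / 11450, about 0.747 -> same track
print(round(box_iou(track_box, small_move), 3), match_decision(track_box, small_move))
# IoU = 2500 / 17500, about 0.143 -> new track
print(round(box_iou(track_box, large_move), 3), match_decision(track_box, large_move))
```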


In [None]:
def detect_vehicles(frame):
  # Pass RGB images to YOLO-V5
  results = yolo_model(frame[:, :, ::-1])

  detections = []

  for *xyxy, conf, cls in results.xyxy[0].cpu().numpy():
    # Obtain bbox coordinates
    x1, y1, x2, y2 = map(int, xyxy)
    cls_id = int(cls)
    # Obtain class name
    cls_name = results.names[cls_id]

    if cls_name not in VEHICLE_CLASSES:
      continue

    detections.append(
        {
            "bbox": [x1, y1, x2, y2],
            "conf": float(conf),
            "class": cls_name
        }
    )

  return detections

In [None]:
def iou_xyxy(boxA, boxB):
  """
  IoU between two boxes in (x1,y1,x2,y2) format.
  """
  xA = max(boxA[0], boxB[0])
  yA = max(boxA[1], boxB[1])
  xB = min(boxA[2], boxB[2])
  yB = min(boxA[3], boxB[3])

  interW = max(0, xB - xA)
  interH = max(0, yB - yA)
  interArea = interW * interH

  if interArea == 0:
      return 0.0

  areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
  areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])

  return interArea / float(areaA + areaB - interArea)

In [None]:
def update_tracks(detections, flow, frame_index):
  """
  detections: list from detect_vehicles
  flow: dense optical flow from prev->current frame (HxWx2)
  frame_index: current frame number (int)
  """
  global tracks, next_track_id

  # Initially, no detections are assigned, all tracks are unmatched
  unmatched_tracks = set(tracks.keys())
  det_to_track = {}

  # IoU-based matching
  for det_idx, det in enumerate(detections):
    box_det = det["bbox"]
    best_iou = 0
    best_track = None

    for tid in unmatched_tracks:
      box_tr = tracks[tid]["bbox"]
      iou_val = iou_xyxy(box_det, box_tr)
      if iou_val > best_iou:
        best_iou = iou_val
        best_track = tid

    if best_iou > IOU_MATCH_THRESH:
      det_to_track[det_idx] = best_track
      unmatched_tracks.remove(best_track)

  # Update matched tracks (bbox, trace, last_seen)
  # Note: `flow` is not used yet; matching relies on IoU only

  for det_idx, track_id in det_to_track.items():
    x1, y1, x2, y2 = detections[det_idx]["bbox"]

    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2

    tracks[track_id]["bbox"] = (x1, y1, x2, y2)
    tracks[track_id]["trace"].append((cx, cy))
    tracks[track_id]["last_seen"] = frame_index

  # Create new tracks for unmatched detections
  for det_idx, det in enumerate(detections):
    if det_idx in det_to_track:
      continue

    x1, y1, x2, y2 = det["bbox"]
    cx = (x1 + x2) // 2
    cy = (y1 + y2) // 2

    tracks[next_track_id] = {
      "bbox": (x1, y1, x2, y2),
      "trace": [(cx, cy)],
      "last_seen": frame_index
    }
    next_track_id += 1

  # Remove tracks that disappeared
  max_missing_frames = FPS
  tracks = {
    tid: tr for tid, tr in tracks.items()
    if frame_index - tr["last_seen"] <= max_missing_frames
  }

## Main Run

Using the helper functions and configurations above, the main code below will:

- Read the input video frame by frame and initialize the video writer for the output
- Detect vehicles in each frame using the pre-trained YOLO-V5m model
- Collect inference metrics for each frame including:
    - Number of detections
    - Confidence scores for all detections
    - Detection counts by class (car, truck, bus)
- Compute dense optical flow between consecutive frames using the Farneback method
- Update tracking for each detection by matching detections with existing tracks using IoU threshold
- Draw bounding boxes, track IDs, and tracking lines on each frame for visualization
- Write the annotated frames to an output video file
- Convert the output video to MP4 format using ffmpeg and display it in the notebook

### Farneback Optical Flow

The Farneback method is a dense optical flow algorithm that estimates motion vectors for every pixel between two consecutive frames. Unlike sparse methods (e.g., Lucas-Kanade) that track specific feature points, Farneback computes a complete motion field across the entire image.

#### How it works:
- Approximates the neighborhood of each pixel with a quadratic polynomial (polynomial expansion)
- Estimates displacement by analyzing how these polynomials change between frames
- Uses a multi-scale pyramid approach to handle large motions
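For intuition, the idealized case behind polynomial expansion can be written out. Each neighborhood is modeled as a quadratic polynomial with a symmetric matrix $A$, a vector $\mathbf{b}$ and a scalar $c$. If the second frame is exactly the first frame shifted by a displacement $\mathbf{d}$, matching the coefficients gives $\mathbf{d}$ directly. This is a simplified sketch: in practice the implementation (roughly) averages the coefficients of both frames and solves for $\mathbf{d}$ over a weighted window (`winsize`) rather than pixel by pixel.

$$
f_1(\mathbf{x}) \approx \mathbf{x}^\top A_1 \mathbf{x} + \mathbf{b}_1^\top \mathbf{x} + c_1, \qquad f_2(\mathbf{x}) = f_1(\mathbf{x} - \mathbf{d})
$$

$$
f_2(\mathbf{x}) = \mathbf{x}^\top A_1 \mathbf{x} + (\mathbf{b}_1 - 2A_1\mathbf{d})^\top \mathbf{x} + \mathbf{d}^\top A_1 \mathbf{d} - \mathbf{b}_1^\top \mathbf{d} + c_1
$$

$$
A_2 = A_1, \quad \mathbf{b}_2 = \mathbf{b}_1 - 2A_1\mathbf{d} \quad \Longrightarrow \quad \mathbf{d} = -\tfrac{1}{2}\,A_1^{-1}(\mathbf{b}_2 - \mathbf{b}_1)
$$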

#### Parameters used in cv2.calcOpticalFlowFarneback:
- **pyr_scale=0.5**: Each pyramid level is half the size of the previous
- **levels=3**: Number of pyramid levels, including the original image (with pyr_scale=0.5, the coarsest level is 1/4 of the original size)
- **winsize=15**: Averaging window size (larger = smoother but less precise)
- **iterations=3**: Number of iterations at each pyramid level
- **poly_n=5**: Size of pixel neighborhood for polynomial expansion
- **poly_sigma=1.2**: Standard deviation of Gaussian for polynomial expansion

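The resulting flow field describes per-pixel motion between consecutive frames. In this notebook it is computed for every frame and passed to `update_tracks`, but matching currently relies on IoU alone; the flow is available as motion context that could be used to make tracking motion-aware.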
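To make `pyr_scale` and `levels` concrete, the sketch below lists the approximate size of each pyramid level, assuming (as OpenCV's documentation describes) that `levels` counts the original image and each level is scaled by `pyr_scale` relative to the previous one. The 1280x720 resolution is a hypothetical example, and OpenCV's exact rounding and early stopping for tiny images may differ.

```python
def pyramid_sizes(width, height, pyr_scale=0.5, levels=3):
    """Approximate (width, height) of each Farneback pyramid level, finest first.

    Assumes `levels` includes the original image and each level is
    `pyr_scale` times the size of the previous one.
    """
    sizes = []
    scale = 1.0
    for _ in range(levels):
        sizes.append((round(width * scale), round(height * scale)))
        scale *= pyr_scale
    return sizes


print(pyramid_sizes(1280, 720))  # [(1280, 720), (640, 360), (320, 180)]
```

This is why the pyramid helps with large motions: a 40-pixel displacement at full resolution shrinks to roughly 10 pixels at the coarsest of three levels, which the local polynomial fit can handle before being refined at finer levels.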

In [None]:
cap = cv2.VideoCapture(video_path)

ret, frame = cap.read()
if not ret:
  raise RuntimeError("Couldn't read the video")

h, w = frame.shape[:2]
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 15

out = cv2.VideoWriter(
  "YOLO_V5m_highway_with_detection.avi",
  cv2.VideoWriter_fourcc(*"XVID"),
  fps,
  (w, h)
)

old_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
tracks = {}
next_track_id = 0

# Collect inference metrics
metrics = {
    "frame_indices": [],
    "detections_per_frame": [],
    "confidence_scores": [],
    "class_counts": {"car": [], "truck": [], "bus": []},
}

cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
frame_index = 0

while True:
  ret, frame = cap.read()
  if not ret:
      break

  frame_index += 1

  dets = detect_vehicles(frame)

  # Collect metrics for this frame
  metrics["frame_indices"].append(frame_index)
  metrics["detections_per_frame"].append(len(dets))

  # Count detections by class and collect confidence scores
  class_count = {"car": 0, "truck": 0, "bus": 0}
  for det in dets:
    metrics["confidence_scores"].append(det["conf"])
    class_count[det["class"]] += 1

  for cls in class_count:
    metrics["class_counts"][cls].append(class_count[cls])

  frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

  # Calculate dense optical flow (Farneback) between the previous and current grayscale frames
  flow = cv2.calcOpticalFlowFarneback(old_gray, frame_gray,
                                      None,
                                      0.5,  # pyr_scale
                                      3,    # levels
                                      15,   # winsize
                                      3,    # iterations
                                      5,    # poly_n
                                      1.2,  # poly_sigma
                                      0)    # flags

  update_tracks(dets, flow, frame_index)

  # Draw the bounding box, track ID, and trace line for each active track
  for track_id, track in tracks.items():
    x1, y1, x2, y2 = track["bbox"]

    # Draw bounding box
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0,255,0), 2)

    # Write track ID
    cv2.putText(frame, f"ID: {track_id}", (x1, y1 - 7), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,225,0), 2)

    # Draw track line
    if len(track["trace"]) > 1:
      line = np.array(track["trace"], dtype=np.int32).reshape(-1,1,2)
      cv2.polylines(frame, [line], False, (0,225,255), 2)


  out.write(frame)

  old_gray = frame_gray.copy()

cap.release()
out.release()

print("Vehicle detection video (YOLO-V5m)")

# Set output path based on environment
if IN_COLAB:
    output_dir = '/content/drive/MyDrive/enhanced_vehicle_detection/outputs/YOLO_V5m'
else:
    output_dir = '../outputs/YOLO_V5m'
# Create the output directory in either environment
os.makedirs(output_dir, exist_ok=True)
output_path_part_b = os.path.join(output_dir, 'highway_with_detection.mp4')

# Convert the XVID .avi to H.264 .mp4 so it can be played in the notebook
!ffmpeg -y -i YOLO_V5m_highway_with_detection.avi -vcodec libx264 -crf 23 -pix_fmt yuv420p "{output_path_part_b}" >/dev/null 2>&1

with open(output_path_part_b, 'rb') as f:
    part_b_mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(part_b_mp4).decode()
HTML(f'<video width=640 controls><source src="{data_url}" type="video/mp4"></video>')
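The `!ffmpeg ... >/dev/null 2>&1` call discards ffmpeg's output, so a failed conversion only surfaces later as a missing-file error when the MP4 is opened. A minimal alternative sketch, using `subprocess.run` with the arguments as a list (so paths containing spaces need no quoting) and raising with the tail of ffmpeg's stderr on failure. The function names are hypothetical; the flags are the same ones used above.

```python
import subprocess


def ffmpeg_cmd(src, dst, crf=23):
    """Build the same ffmpeg conversion used above as an argument list."""
    return ["ffmpeg", "-y", "-i", src,
            "-vcodec", "libx264", "-crf", str(crf),
            "-pix_fmt", "yuv420p", dst]


def convert_to_mp4(src, dst, crf=23):
    """Run ffmpeg and raise RuntimeError with the end of its stderr if it fails."""
    result = subprocess.run(ffmpeg_cmd(src, dst, crf), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg failed:\n{result.stderr[-2000:]}")
    return dst


# Example usage in this notebook:
# convert_to_mp4("YOLO_V5m_highway_with_detection.avi", output_path_part_b)
```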

## Inference Metrics Visualization

The following visualizations show the inference metrics collected during vehicle detection. These metrics can be compared with the YOLO-SwinV2 model to evaluate performance differences.

- **Detections per Frame**: Number of vehicles detected in each frame over time
- **Confidence Score Distribution**: Histogram showing the distribution of detection confidence scores
- **Class Distribution**: Breakdown of detections by vehicle class (car, truck, bus)
- **Summary Statistics**: Overall metrics including total detections and mean confidence
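The confidence summary boils down to a few lines of arithmetic. A stdlib-only sketch (the function name and sample scores are hypothetical) that mirrors what the cell below computes, including returning 0 for an empty list. Note that `np.std` defaults to the population standard deviation (`ddof=0`, dividing by N), which `statistics.pstdev` matches; `statistics.stdev` would divide by N - 1 and give a slightly larger value.

```python
import statistics


def confidence_summary(scores):
    """Mean, population std, min and max of a list of confidence scores.

    statistics.pstdev divides by N, matching np.std's default (ddof=0).
    An empty list yields zeros, as in the notebook's summary cell.
    """
    if not scores:
        return {"mean": 0.0, "std": 0.0, "min": 0.0, "max": 0.0}
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


summary = confidence_summary([0.5, 0.7, 0.9])  # hypothetical scores
print(summary)
```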


In [None]:
# Set visualization output directory based on environment
if IN_COLAB:
    viz_dir = '/content/drive/MyDrive/enhanced_vehicle_detection/outputs/YOLO_V5m/visualizations'
else:
    viz_dir = '../outputs/YOLO_V5m/visualizations'

# Create visualization directory if it doesn't exist
if not os.path.exists(viz_dir):
    os.makedirs(viz_dir)
    print(f"Created directory: {viz_dir}")

# Detections per Frame
fig1, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(metrics["frame_indices"], metrics["detections_per_frame"], color='#2E86AB', linewidth=1.5)
ax1.fill_between(metrics["frame_indices"], metrics["detections_per_frame"], alpha=0.3, color='#2E86AB')
ax1.set_xlabel('Frame Index')
ax1.set_ylabel('Number of Detections')
ax1.set_title('Detections per Frame')
ax1.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'detections_per_frame.png'), dpi=150, bbox_inches='tight')
plt.show()

# Confidence Score Distribution
fig2, ax2 = plt.subplots(figsize=(10, 6))
if metrics["confidence_scores"]:
    ax2.hist(metrics["confidence_scores"], bins=30, color='#A23B72', edgecolor='white', alpha=0.8)
    ax2.axvline(np.mean(metrics["confidence_scores"]), color='#F18F01', linestyle='--', 
                linewidth=2, label=f'Mean: {np.mean(metrics["confidence_scores"]):.3f}')
    ax2.legend()
ax2.set_xlabel('Confidence Score')
ax2.set_ylabel('Frequency')
ax2.set_title('Confidence Score Distribution')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'confidence_distribution.png'), dpi=150, bbox_inches='tight')
plt.show()

# Class Distribution (Stacked Area)
fig3, ax3 = plt.subplots(figsize=(10, 6))
frames = metrics["frame_indices"]
car_counts = metrics["class_counts"]["car"]
truck_counts = metrics["class_counts"]["truck"]
bus_counts = metrics["class_counts"]["bus"]
ax3.stackplot(frames, car_counts, truck_counts, bus_counts, 
              labels=['Car', 'Truck', 'Bus'],
              colors=['#2E86AB', '#A23B72', '#F18F01'], alpha=0.8)
ax3.legend(loc='upper right')
ax3.set_xlabel('Frame Index')
ax3.set_ylabel('Count')
ax3.set_title('Detection Count by Class')
ax3.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'class_distribution_over_time.png'), dpi=150, bbox_inches='tight')
plt.show()

# Total Class Distribution (Pie Chart)
fig4, ax4 = plt.subplots(figsize=(8, 8))
total_cars = sum(car_counts)
total_trucks = sum(truck_counts)
total_buses = sum(bus_counts)
sizes = [total_cars, total_trucks, total_buses]
labels = [f'Car\n({total_cars})', f'Truck\n({total_trucks})', f'Bus\n({total_buses})']
colors = ['#2E86AB', '#A23B72', '#F18F01']
explode = (0.05, 0.05, 0.05)
ax4.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax4.set_title('Total Detection Distribution by Class')
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'class_distribution_pie.png'), dpi=150, bbox_inches='tight')
plt.show()

# Combined Summary Plot
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('YOLO-V5m Inference Metrics', fontsize=16, fontweight='bold')

axes[0, 0].plot(metrics["frame_indices"], metrics["detections_per_frame"], color='#2E86AB', linewidth=1.5)
axes[0, 0].fill_between(metrics["frame_indices"], metrics["detections_per_frame"], alpha=0.3, color='#2E86AB')
axes[0, 0].set_xlabel('Frame Index')
axes[0, 0].set_ylabel('Number of Detections')
axes[0, 0].set_title('Detections per Frame')
axes[0, 0].grid(True, alpha=0.3)

if metrics["confidence_scores"]:
    axes[0, 1].hist(metrics["confidence_scores"], bins=30, color='#A23B72', edgecolor='white', alpha=0.8)
    axes[0, 1].axvline(np.mean(metrics["confidence_scores"]), color='#F18F01', linestyle='--', 
                linewidth=2, label=f'Mean: {np.mean(metrics["confidence_scores"]):.3f}')
    axes[0, 1].legend()
axes[0, 1].set_xlabel('Confidence Score')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Confidence Score Distribution')
axes[0, 1].grid(True, alpha=0.3)

axes[1, 0].stackplot(frames, car_counts, truck_counts, bus_counts, 
              labels=['Car', 'Truck', 'Bus'],
              colors=['#2E86AB', '#A23B72', '#F18F01'], alpha=0.8)
axes[1, 0].legend(loc='upper right')
axes[1, 0].set_xlabel('Frame Index')
axes[1, 0].set_ylabel('Count')
axes[1, 0].set_title('Detection Count by Class')
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
axes[1, 1].set_title('Total Detection Distribution by Class')

plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'metrics_summary.png'), dpi=150, bbox_inches='tight')
plt.show()

# Print Summary Statistics
total_frames = len(metrics["frame_indices"])
total_detections = sum(metrics["detections_per_frame"])
avg_detections = np.mean(metrics["detections_per_frame"])
avg_confidence = np.mean(metrics["confidence_scores"]) if metrics["confidence_scores"] else 0
std_confidence = np.std(metrics["confidence_scores"]) if metrics["confidence_scores"] else 0

print()
print("YOLO-V5m Summary Statistics")
print(f"\nDetection Statistics")
print(f"   Total Frames Processed:  {total_frames}")
print(f"   Total Detections:        {total_detections}")
print(f"   Avg Detections/Frame:    {avg_detections:.2f}")
print(f"\nConfidence Statistics")
print(f"   Mean Confidence:         {avg_confidence:.4f}")
print(f"   Std Confidence:          {std_confidence:.4f}")
print(f"   Min Confidence:          {min(metrics['confidence_scores']) if metrics['confidence_scores'] else 0:.4f}")
print(f"   Max Confidence:          {max(metrics['confidence_scores']) if metrics['confidence_scores'] else 0:.4f}")

print(f"\nVisualizations saved to '{viz_dir}/':")
print("  - detections_per_frame.png")
print("  - confidence_distribution.png")
print("  - class_distribution_over_time.png")
print("  - class_distribution_pie.png")
print("  - metrics_summary.png")


## Exporting Inference Metrics

In [None]:
import json
import os

# Export metrics to JSON for comparison with YOLO-SwinV2
metrics_export = {
    "model_name": "YOLO-V5m (Pre-trained)",
    "config": {
        "conf_threshold": CONF_THRESH,
        "iou_match_threshold": IOU_MATCH_THRESH,
        "vehicle_classes": list(VEHICLE_CLASSES)
    },
    "summary": {
        "total_frames": len(metrics["frame_indices"]),
        "total_detections": sum(metrics["detections_per_frame"]),
        "avg_detections_per_frame": float(np.mean(metrics["detections_per_frame"])),
        "mean_confidence": float(np.mean(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "std_confidence": float(np.std(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "min_confidence": float(min(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "max_confidence": float(max(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "total_cars": sum(metrics["class_counts"]["car"]),
        "total_trucks": sum(metrics["class_counts"]["truck"]),
        "total_buses": sum(metrics["class_counts"]["bus"])
    },
    "per_frame_data": {
        "frame_indices": metrics["frame_indices"],
        "detections_per_frame": metrics["detections_per_frame"],
        "class_counts": metrics["class_counts"]
    }
}

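# Use the same output location as the video and visualizations for this environment
if IN_COLAB:
    output_dir = "/content/drive/MyDrive/enhanced_vehicle_detection/outputs/YOLO_V5m"
else:
    output_dir = "../outputs/YOLO_V5m"

# Create output directory if it doesn't exist
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
    print(f"Created directory: {output_dir}")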

metrics_path = os.path.join(output_dir, "YOLO_V5m_metrics.json")
with open(metrics_path, "w") as f:
    json.dump(metrics_export, f, indent=2)

print(f"Metrics exported to '{metrics_path}'")
print("\nYou can load this file in YOLO_SwinV2.ipynb to compare performance.")
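The exported file is meant to be loaded in YOLO_SwinV2.ipynb for comparison. A minimal sketch of such a comparison, assuming the YOLO-SwinV2 notebook writes a JSON file with the same `summary` keys as `metrics_export` above; the function names and the second file path are hypothetical.

```python
import json

# Summary fields exported by this notebook (keys of metrics_export["summary"])
DEFAULT_KEYS = ("total_detections", "avg_detections_per_frame",
                "mean_confidence", "std_confidence")


def load_metrics(path):
    """Read a metrics JSON file written by an export cell like the one above."""
    with open(path) as f:
        return json.load(f)


def compare_summaries(metrics_a, metrics_b, keys=DEFAULT_KEYS):
    """Return {key: (value_a, value_b, value_b - value_a)} for each summary key."""
    a, b = metrics_a["summary"], metrics_b["summary"]
    return {k: (a[k], b[k], b[k] - a[k]) for k in keys}


# Example usage (the YOLO-SwinV2 path is a hypothetical placeholder):
# baseline = load_metrics("../outputs/YOLO_V5m/YOLO_V5m_metrics.json")
# candidate = load_metrics("<path to the YOLO-SwinV2 metrics JSON>")
# for key, (v5, swin, diff) in compare_summaries(baseline, candidate).items():
#     print(f"{key:28s} {v5:10.4f} {swin:10.4f} {diff:+10.4f}")
```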
