# Vehicle Detection with YOLO-SwinV2 Model

This notebook implements a modified YOLO model with SwinV2-Tiny as the backbone for vehicle detection. The model is trained on the AAU RainSnow dataset (vehicles in rainy/snowy conditions) and evaluated on the same highway video as the pre-trained YOLO-V5m model.

**Key Features:**
- SwinV2-Tiny backbone replacing the original CSPDarknet backbone
- Training on weather-degraded vehicle data for improved detection in adverse conditions
- Metrics collection for comparison with the pre-trained YOLO-V5m baseline


## How To Run

It is recommended to run this notebook in Google Colab. However, it is implemented so that it can also be run in a local environment.

**To run this notebook in Google Colab:**
- Download the whole project folder (enhanced_vehicle_detection) from GitHub.
- Place it in MyDrive in Google Drive.
    - If the project folder is placed in a different path in Google Drive, the paths for the input video and outputs need to be edited accordingly.
- All set! You can now run the cells.

**To run this notebook in a local environment:**
- Fork or clone the GitHub repository.
- Run `pip install -r app/requirements.txt` to install all required libraries.
- Since the code requires video conversion, make sure to install **ffmpeg**:
    - macOS: `brew install ffmpeg`
    - Ubuntu/Linux: `sudo apt install ffmpeg`
    - Windows: Download from [ffmpeg.org](https://ffmpeg.org/download.html)
- All set! You can now run the cells.
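
Before running the video cells locally, it can help to confirm that `ffmpeg` is actually reachable on the `PATH`. The following is a minimal sketch using only the standard library; the helper name `has_executable` is hypothetical and not part of this project.

```python
import shutil

def has_executable(name: str) -> bool:
    """Return True if `name` resolves to an executable on the current PATH."""
    return shutil.which(name) is not None

# Warn early instead of failing later in the video conversion step
if not has_executable("ffmpeg"):
    print("ffmpeg not found; install it before running the video conversion cells.")
```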

## Setup YOLO V5

The code below installs all libraries required to load and use the YOLO-V5 model. It only needs to be run once while using this notebook.

In [1]:
!git clone -q https://github.com/ultralytics/yolov5
%cd yolov5

!pip install -q -r requirements.txt opencv-python-headless==4.10.0.84 timm

/content/yolov5

## Clean YOLO V5 directory

If any old patches have been applied to the original YOLO V5 files, discard them and restore the original files.

In [2]:
%cd yolov5
!git status

# Reset modified core files (safe and important)
!git checkout -- models/yolo.py models/common.py

[Errno 2] No such file or directory: 'yolov5'
/content/yolov5
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean


## Import Necessary Libraries

In [None]:
import cv2, torch, numpy as np, matplotlib.pyplot as plt
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision.ops import nms
from collections import defaultdict
from IPython.display import HTML, display
from base64 import b64encode
import timm
import sys
import json, shutil
import os, pandas as pd, glob
import random
from pathlib import Path
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Environment Setup

Set up paths depending on whether the notebook runs in Google Colab or in a local environment.

In [None]:
# Check if running in Google Colab
IN_COLAB = 'COLAB_GPU' in os.environ or 'google.colab' in str(get_ipython())

if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')

    # Paths for Colab
    DATA_ROOT = Path('/content/drive/MyDrive/enhanced_vehicle_detection/data/training_data_vehicles_in_rain')
    VIDEO_PATH = Path('/content/drive/MyDrive/enhanced_vehicle_detection/data/rainy_highway_video.mp4')
    PROJECT_ROOT = Path('/content/drive/MyDrive/enhanced_vehicle_detection')
else:
    # Paths for local environment
    DATA_ROOT = Path('../data/training_data_vehicles_in_rain')
    VIDEO_PATH = Path('../data/rainy_highway_video.mp4')
    PROJECT_ROOT = Path('../')

# Set the random seed for reproducibility
SEED = 42

def set_seed(seed):
    """Set random seeds for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # For multi-GPU
    torch.cuda.manual_seed_all(seed)

    # For deterministic behavior
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    # Set environment variable for additional reproducibility
    os.environ['PYTHONHASHSEED'] = str(seed)

set_seed(SEED)
print(f"Random seed set to: {SEED}")
# ============================================================================

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
print(f"Data root: {DATA_ROOT}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Random seed set to: 42
Using device: cuda
Data root: /content/drive/MyDrive/DL/enhanced_vehicle_detection/data/training_data_vehicles_in_rain


In [None]:
%%writefile models/swinv2_backbone.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm
from models.common import Conv

class SwinV2Backbone(nn.Module):
    """
    Single-output SwinV2 backbone for YOLOv5.
    - Takes 3xHxW RGB images (any size).
    - Resizes to expected size for SwinV2.
    - Uses timm SwinV2 Tiny.
    - Grabs the last stage feature map.
    - Projects to c2 channels and resizes output for YOLO.
    """
    def __init__(self, c1, c2, model_name="swinv2_tiny_window16_256", out_index=3, pretrained=True):
        super().__init__()
        
        # Expected input size for the SwinV2 model (256 for swinv2_tiny_window16_256)
        self.expected_size = 256

        # features_only=True for returning a list of feature maps
        self.swin = timm.create_model(
            model_name,
            pretrained=pretrained,
            features_only=True,
            out_indices=(out_index,)
        )
        in_ch = self.swin.feature_info.channels()[0]

        # Project Swin channels for c2 channels used by YOLO head
        self.proj = Conv(in_ch, c2, k=1, s=1)

    def forward(self, x):
        # Store original size
        _, _, orig_h, orig_w = x.shape
        
        # SwinV2 requires fixed input size
        if orig_h != self.expected_size or orig_w != self.expected_size:
            x = F.interpolate(x, size=(self.expected_size, self.expected_size), 
                            mode='bilinear', align_corners=False)
        
        # Extract the tensor from the list with 1 tensor
        feats = self.swin(x)
        f = feats[0]

        # Convert NHWC to NCHW as YOLO expects
        if f.dim() == 4:
            f = f.permute(0, 3, 1, 2).contiguous()

        # Project to output channels
        out = self.proj(f)
        
        # Resize output to match expected stride-32 feature map size
        expected_out_h = orig_h // 32
        expected_out_w = orig_w // 32
        _, _, out_h, out_w = out.shape
        
        if out_h != expected_out_h or out_w != expected_out_w:
            out = F.interpolate(out, size=(expected_out_h, expected_out_w), 
                              mode='bilinear', align_corners=False)
        
        return out

Overwriting models/swinv2_backbone.py
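
The shape bookkeeping in `SwinV2Backbone.forward` can be traced without loading the model. The sketch below is illustrative only: the helper `swin_backbone_shapes` is hypothetical, and it assumes the last SwinV2-Tiny stage has 768 channels at stride 32 (the real code reads the channel count from `feature_info` instead of hard-coding it).

```python
def swin_backbone_shapes(h, w, c2=256, expected_size=256, stride=32, swin_channels=768):
    """Trace (C, H, W) shapes through SwinV2Backbone.forward for one image."""
    # Input is resized to the fixed resolution the pretrained model expects
    resized = (3, expected_size, expected_size)
    # Last-stage feature map (assumed stride 32, assumed 768 channels)
    swin_out = (swin_channels, expected_size // stride, expected_size // stride)
    # 1x1 Conv projection changes only the channel count
    projected = (c2, swin_out[1], swin_out[2])
    # Output is resized to the stride-32 grid of the ORIGINAL input
    final = (c2, h // stride, w // stride)
    return {"resized": resized, "swin_out": swin_out,
            "projected": projected, "final": final}

shapes = swin_backbone_shapes(640, 640)
print(shapes["swin_out"], shapes["final"])  # (768, 8, 8) (256, 20, 20)
```

Under these assumptions, a 640x640 input yields a 20x20 output map that is upsampled from an 8x8 map, so it carries no more spatial detail than the 8x8 features. At 256x256 input the final resize is a no-op.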


In [None]:
# Set yolo.py path based on environment
if IN_COLAB:
    file_path = "/content/yolov5/models/yolo.py"
else:
    # Assuming we're already in the yolov5 directory after %cd yolov5
    file_path = "models/yolo.py"

insert_line = "from models.swinv2_backbone import SwinV2Backbone\n"

# Read file
with open(file_path, "r") as f:
    lines = f.readlines()

# Only insert the SwinV2Backbone import if it is not already present
if insert_line not in "".join(lines):

    new_lines = []
    inserted = False

    for line in lines:
        # Before the common import, insert our Swin import
        if line.startswith("from models.common import"):
            new_lines.append(insert_line)
            inserted = True

        new_lines.append(line)

    if inserted:
        with open(file_path, "w") as f:
            f.writelines(new_lines)
        print("Successfully inserted SwinV2Backbone import before models.common.")
    else:
        print("Could not find 'from models.common import (' in yolo.py.")
else:
    print("SwinV2Backbone import already present; no changes made.")

In [None]:
yolo_path = Path("models/yolo.py")
text = yolo_path.read_text()
import_line = "from models.swinv2_backbone import SwinV2Backbone\n"

# Patch parse_model to handle SwinV2Backbone
snippet = """        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]
"""

replacement = """        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        elif m is SwinV2Backbone:
            # args: [c2, model_name, out_index, pretrained]
            c1, c2 = ch[f], args[0]
            args = [c1, c2, *args[1:]]
        else:
            c2 = ch[f]
"""

if snippet not in text:
    raise RuntimeError("Expected snippet not found in models/yolo.py. YOLOv5 version mismatch?")
text = text.replace(snippet, replacement)

yolo_path.write_text(text)
print("Patched models/yolo.py for SwinV2Backbone")

✅ Patched models/yolo.py for SwinV2Backbone
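
To make the patched `parse_model` branch concrete, the sketch below mimics how the YAML arguments are rewritten before the module is constructed. The function `rewrite_swin_args` is a hypothetical stand-in for the inserted branch, not code from YOLOv5; it assumes the channel list starts as `[3]` (the RGB input) and that the backbone reads from layer `-1`.

```python
def rewrite_swin_args(ch, f, args):
    """Mimic the patched branch: prepend the input channel count c1 to the YAML args."""
    c1, c2 = ch[f], args[0]
    return c2, [c1, c2, *args[1:]]

ch = [3]  # channels of the input image
yaml_args = [256, "swinv2_tiny_window16_256", 3, True]  # as written in the model YAML
c2, ctor_args = rewrite_swin_args(ch, -1, yaml_args)
print(ctor_args)  # [3, 256, 'swinv2_tiny_window16_256', 3, True]
```

The rewritten list lines up with the constructor signature `SwinV2Backbone(c1, c2, model_name, out_index, pretrained)`, and `c2` is what later layers see as this layer's output channel count.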


In [None]:
swin_yaml = r"""
# YOLOv5 + SwinV2 Tiny backbone (single-scale detection)
nc: 3

# Single scale anchors - only one set for single-scale detection
anchors:
  - [30,61, 62,45, 59,119]

depth_multiple: 1.0
width_multiple: 1.0

backbone:
  # args: [c2, model_name, out_index, pretrained]
  # c1 is automatically set from previous layer channels
  - [-1, 1, SwinV2Backbone, [256, 'swinv2_tiny_window16_256', 3, true]]

head:
  # C3 args: [c2] - n is automatically handled by the repeat count (2nd value)
  - [0, 1, C3, [256]]
  # Single-scale Detect: input from layer 1 only, uses single anchor set
  - [[1], 1, Detect, [nc, anchors]]
"""

Path("models/yolov5m_swinv2.yaml").write_text(swin_yaml)
print("Created models/yolov5m_swinv2.yaml")

✅ Created models/yolov5m_swinv2.yaml
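
As a rough sanity check on this configuration, the size of the single-scale `Detect` output can be worked out by hand. The sketch below is an illustration with a hypothetical helper, assuming the backbone output sits at stride 32 and that each anchor predicts 4 box values, 1 objectness score, and `nc` class scores.

```python
def detect_output_shape(img_size, nc, anchors_per_scale, stride=32):
    """Return (channels, grid_h, grid_w, total_predictions) for one Detect scale."""
    no = nc + 5                        # box (4) + objectness (1) + class scores
    channels = anchors_per_scale * no  # output channels of the Detect conv
    grid = img_size // stride          # cells per side at this stride
    return channels, grid, grid, anchors_per_scale * grid * grid

print(detect_output_shape(256, nc=3, anchors_per_scale=3))  # (24, 8, 8, 192)
```

With three anchors and three classes at 256x256, the model proposes 192 candidate boxes per image. For comparison, a standard three-scale YOLOv5 at 640x640 proposes 25,200, which is one reason small or distant vehicles may be harder for this single-scale design.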


In [None]:
dataset_yaml = f"""
train: /content/data/AAU_YOLO/images/train
val: /content/data/AAU_YOLO/images/val
nc: 3
names: ['car', 'truck', 'bus']
"""

Path("data/vehicles.yaml").write_text(dataset_yaml)
print("data/vehicles.yaml written")

✅ data/vehicles.yaml written


## Process data

In [None]:
json_path = DATA_ROOT/"aauRainSnow-rgb.json"

output_root = PROJECT_ROOT/"data/AAU_YOLO"
output_root.mkdir(parents=True, exist_ok=True)

# Create folder structure
for folder in ["images/train","images/val","labels/train","labels/val"]:
    (output_root/folder).mkdir(parents=True, exist_ok=True)

print("Output folders created:", output_root)

✔ Output folders created: /content/drive/MyDrive/DL/enhanced_vehicle_detection/data/AAU_YOLO


## Load JSON

In [None]:
print("Loading JSON:", json_path)
with open(json_path) as f:
    coco = json.load(f)

images = {im["id"]: im for im in coco["images"]}

# Map AAU classes to YOLO classes
VEHICLE_MAP = {3:0, 6:1, 8:2}  # car=0, truck=1, bus=2

# Collect annotations per image
anns = {}
for ann in coco["annotations"]:
    if ann["category_id"] in VEHICLE_MAP:
        anns.setdefault(ann["image_id"], []).append(ann)

Loading JSON: /content/drive/MyDrive/DL/enhanced_vehicle_detection/data/training_data_vehicles_in_rain/aauRainSnow-rgb.json
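
The grouping step above can be illustrated on a tiny hand-made sample. The annotations below are made up for illustration and are not taken from the AAU RainSnow file; only the `VEHICLE_MAP` values come from the cell above.

```python
VEHICLE_MAP = {3: 0, 6: 1, 8: 2}  # same mapping as in the cell above

# Made-up COCO-style annotations (illustrative only)
toy = {
    "annotations": [
        {"image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]},
        {"image_id": 1, "category_id": 1, "bbox": [0, 0, 5, 5]},  # not in VEHICLE_MAP: dropped
        {"image_id": 2, "category_id": 8, "bbox": [50, 60, 70, 80]},
    ]
}

grouped = {}
for ann in toy["annotations"]:
    if ann["category_id"] in VEHICLE_MAP:
        grouped.setdefault(ann["image_id"], []).append(ann)

print({img_id: len(a) for img_id, a in grouped.items()})  # {1: 1, 2: 1}
```

One consequence of this design is that images without any mapped vehicle annotation never appear in `anns`, so they are excluded from both the training and validation sets.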


## Train/Val Split

In [21]:
image_ids = list(anns.keys())
np.random.shuffle(image_ids)

split = int(0.8 * len(image_ids))
train_ids = image_ids[:split]
val_ids = image_ids[split:]

print(f"Train images: {len(train_ids)}, Val images: {len(val_ids)}")

Train images: 1597, Val images: 400
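
The split above depends on NumPy's global random state, so re-running only this cell without re-seeding produces a different partition. One possible alternative, shown here as a sketch rather than the notebook's method, is to shuffle with a private generator; `split_ids` and the `demo_*` names are hypothetical.

```python
import random

def split_ids(ids, train_frac=0.8, seed=42):
    """Shuffle a copy of `ids` with a private RNG and cut it into train/val lists."""
    ids = sorted(ids)                 # fixed starting order
    random.Random(seed).shuffle(ids)  # local generator, unaffected by other random calls
    cut = int(train_frac * len(ids))
    return ids[:cut], ids[cut:]

demo_train, demo_val = split_ids(range(1997))
print(len(demo_train), len(demo_val))  # 1597 400
```

Calling `split_ids` twice with the same seed returns the same partition regardless of what other cells have done to the global random state.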


In [22]:
def coco_to_yolo(img_id, img_dir, lbl_dir):
    info = images[img_id]

    src_img = DATA_ROOT / info["file_name"]
    fname = Path(info["file_name"]).name

    # Copy image
    dst_img = img_dir / fname
    shutil.copy2(src_img, dst_img)

    w, h = info["width"], info["height"]

    # Create YOLO label
    dst_lbl = lbl_dir / Path(fname).with_suffix(".txt")

    lines = []
    for ann in anns[img_id]:
        cls = VEHICLE_MAP[ann["category_id"]]
        x, y, bw, bh = ann["bbox"]

        xc = (x + bw/2) / w
        yc = (y + bh/2) / h
        bw /= w
        bh /= h

        lines.append(f"{cls} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")

    dst_lbl.write_text("\n".join(lines))
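
The box conversion inside `coco_to_yolo` can be checked on a single box. COCO stores `[x_min, y_min, width, height]` in pixels, while YOLO expects the box center and size normalized by the image dimensions. The helper below is a standalone sketch with made-up numbers, not a function from the notebook.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to YOLO (xc, yc, w, h) in [0, 1]."""
    x, y, bw, bh = bbox
    return ((x + bw / 2) / img_w, (y + bh / 2) / img_h, bw / img_w, bh / img_h)

# Made-up box on a 640x480 image
xc, yc, w, h = coco_bbox_to_yolo([100, 120, 200, 100], img_w=640, img_h=480)
print(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")  # 0 0.312500 0.354167 0.312500 0.208333
```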

In [None]:
def process(ids, split):
    img_dir = output_root / f"images/{split}"
    lbl_dir = output_root / f"labels/{split}"

    for img_id in tqdm(ids, desc=f"Processing {split}"):
        coco_to_yolo(img_id, img_dir, lbl_dir)

process(train_ids, "train")
process(val_ids, "val")

print("Dataset conversion completed.")

Processing train: 100%|██████████| 1597/1597 [48:49<00:00,  1.83s/it]
Processing val: 100%|██████████| 400/400 [12:38<00:00,  1.90s/it]

✔ Dataset conversion completed.





In [None]:
dataset_yaml = f"""
train: {output_root}/images/train
val: {output_root}/images/val

nc: 3
names: ['car','truck','bus']
"""

# Write vehicles.yaml to yolov5 data directory
if IN_COLAB:
    vehicles_yaml_path = Path("/content/yolov5/data/vehicles.yaml")
else:
    vehicles_yaml_path = Path("data/vehicles.yaml")

vehicles_yaml_path.write_text(dataset_yaml)
print(f"vehicles.yaml created at: {vehicles_yaml_path}")

✔ vehicles.yaml created


In [None]:
# Set yolo.py path based on environment
if IN_COLAB:
    file_path = "/content/yolov5/models/yolo.py"
else:
    file_path = "models/yolo.py"

# Verify SwinV2Backbone import was added
with open(file_path, "r") as f:
    lines = f.readlines()

print("Checking for SwinV2Backbone in yolo.py:")
for line in lines:
    if "SwinV2Backbone" in line:
        print(f"{line.strip()}")

from models.swinv2_backbone import SwinV2Backbone

        elif m is SwinV2Backbone:



## Training: YOLO with SwinV2 Backbone

This notebook trains a **modified YOLO model with SwinV2-Tiny backbone** replacing the original CSPDarknet backbone.

### Key Architecture Change:
| Component | Original YOLOv5 | YOLO-SwinV2 (This Project) |
|-----------|----------------|---------------------------|
| Backbone | CSPDarknet53 | **SwinV2-Tiny** (ImageNet pretrained) |
| Neck | FPN + PAN | Single-scale C3 |
| Head | Multi-scale Detect | Single-scale Detect |

### Training Strategy:
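The SwinV2 backbone is initialized from **ImageNet-pretrained** weights via the `timm` library, so it starts with general-purpose visual features. The detection head (C3 + Detect) is randomly initialized and learned from scratch on the AAU RainSnow dataset. Because the training command passes no `--freeze` option, the backbone weights are also updated during training.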

---

### Option A: Fine-tune Standard YOLOv5m
> **Note**: The cells immediately below fine-tune standard YOLOv5m (CSPDarknet backbone), NOT SwinV2.
> **Skip cells 26-31 and go directly to Cell 32** for training with the SwinV2 backbone.

- COCO-pretrained weights, fine-tuned on AAU RainSnow data
- Standard YOLOv5 architecture (no SwinV2 backbone)

### Option B: Train YOLO-SwinV2 from Scratch
- Custom architecture with SwinV2-Tiny backbone
- Trained only on AAU RainSnow (~1,600 images)
- Requires many more epochs for good results
- Experimental architecture comparison


In [None]:
# OPTION A: FINE-TUNE PRETRAINED YOLOv5m ON AAU RAINSNOW
# This uses transfer learning: COCO pretrained weights + fine-tuning on rainy data
# The model already knows how to detect vehicles; we only adapt it to rainy conditions

# Change to the yolov5 directory (skip if we are already inside it)
import os
yolov5_dir = "/content/yolov5" if IN_COLAB else "yolov5"
if Path.cwd().name != "yolov5":
    os.chdir(yolov5_dir)
print(f"Working directory: {os.getcwd()}")

# Create the dataset config for fine-tuning.
# The labels use 3 custom class IDs (car=0, truck=1, bus=2), not the COCO IDs
# (car=2, bus=5, truck=7), so training runs with nc=3.

# Create vehicles dataset YAML
vehicles_finetune_yaml = f"""
# AAU RainSnow dataset for fine-tuning (3 custom classes)
train: {output_root}/images/train
val: {output_root}/images/val

# We use 3 classes for our custom training
nc: 3
names: ['car', 'truck', 'bus']
"""

Path("data/vehicles_finetune.yaml").write_text(vehicles_finetune_yaml)
print("Created data/vehicles_finetune.yaml for fine-tuning")

# Training hyperparameters for fine-tuning
FINETUNE_EPOCHS = 20
FINETUNE_BATCH = 16
FINETUNE_IMG_SIZE = 640   # Standard YOLOv5 size

print(f"\nFine-tuning Configuration:")
print(f"  - Base model: YOLOv5m (pretrained on COCO)")
print(f"  - Fine-tune dataset: AAU RainSnow")
print(f"  - Epochs: {FINETUNE_EPOCHS}")
print(f"  - Image size: {FINETUNE_IMG_SIZE}")
print(f"  - Batch size: {FINETUNE_BATCH}")


In [None]:
# RUN FINE-TUNING (Option A)
# Key difference: --weights yolov5m.pt (pretrained) instead of '' (from scratch)

!python train.py \
  --img {FINETUNE_IMG_SIZE} \
  --batch {FINETUNE_BATCH} \
  --epochs {FINETUNE_EPOCHS} \
  --data data/vehicles_finetune.yaml \
  --weights yolov5m.pt \
  --project YOLO_Finetuned \
  --name rainsnow_finetuned \
  --cache \
  --exist-ok

print("\nFine-tuning complete!")
print("Weights saved to: YOLO_Finetuned/rainsnow_finetuned/weights/best.pt")


## Training Metrics Visualization

YOLOv5 automatically logs the following metrics during training:
- **Training Losses**: Box loss, Objectness loss, Classification loss
- **Validation Losses**: Same three losses on validation set
- **Evaluation Metrics**: Precision, Recall, mAP@0.5, mAP@0.5:0.95

These are saved to `results.csv` in the training output folder.
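
The sketch below shows one way to read such a file with only the standard library. It assumes, as the loading cell that follows also does, that the column names are padded with whitespace and need stripping, and that the `epoch` column is zero-based. The two rows are made-up values for illustration, not results from this project.

```python
import csv
import io

# Made-up two-epoch sample in the style of results.csv (values are illustrative only)
sample = (
    "               epoch,      train/box_loss,     metrics/mAP_0.5\n"
    "                   0,             0.09000,             0.10000\n"
    "                   1,             0.07000,             0.25000\n"
)

reader = csv.reader(io.StringIO(sample))
header = [name.strip() for name in next(reader)]  # drop padding around column names
rows = [dict(zip(header, map(float, row))) for row in reader]

best = max(rows, key=lambda r: r["metrics/mAP_0.5"])
print(header)
print(f"best mAP@0.5 = {best['metrics/mAP_0.5']:.2f} at epoch {int(best['epoch']) + 1}")
```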


In [None]:
# LOAD AND VISUALIZE TRAINING METRICS

# Find the training results
results_files = sorted(glob.glob("YOLO_Finetuned/rainsnow_finetuned*/results.csv"))

if not results_files:
    results_files = sorted(glob.glob("YOLO_SwinV2/rainsnow_swinv2*/results.csv"))

if results_files:
    results_path = results_files[-1]
    print(f"Loading training results from: {results_path}")
    
    # Load results
    df = pd.read_csv(results_path)
    # Remove whitespace from column names
    df.columns = df.columns.str.strip()
    
    print(f"\nAvailable metrics: {list(df.columns)}")
    print(f"Total epochs: {len(df)}")
    
    # Create comprehensive visualization
    fig, axes = plt.subplots(2, 3, figsize=(18, 10))
    fig.suptitle('YOLOv5 Training Metrics', fontsize=16, fontweight='bold')
    
    epochs = df.index + 1
    
    # 1. Training Losses
    ax1 = axes[0, 0]
    if 'train/box_loss' in df.columns:
        ax1.plot(epochs, df['train/box_loss'], label='Box Loss', color='#E63946', linewidth=2)
        ax1.plot(epochs, df['train/obj_loss'], label='Objectness Loss', color='#2E86AB', linewidth=2)
        ax1.plot(epochs, df['train/cls_loss'], label='Class Loss', color='#F4A261', linewidth=2)
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training Losses')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # 2. Validation Losses
    ax2 = axes[0, 1]
    if 'val/box_loss' in df.columns:
        ax2.plot(epochs, df['val/box_loss'], label='Box Loss', color='#E63946', linewidth=2)
        ax2.plot(epochs, df['val/obj_loss'], label='Objectness Loss', color='#2E86AB', linewidth=2)
        ax2.plot(epochs, df['val/cls_loss'], label='Class Loss', color='#F4A261', linewidth=2)
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Loss')
    ax2.set_title('Validation Losses')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # 3. Combined Train vs Val Loss
    ax3 = axes[0, 2]
    if 'train/box_loss' in df.columns and 'val/box_loss' in df.columns:
        total_train = df['train/box_loss'] + df['train/obj_loss'] + df['train/cls_loss']
        total_val = df['val/box_loss'] + df['val/obj_loss'] + df['val/cls_loss']
        ax3.plot(epochs, total_train, label='Training Loss', color='#2E86AB', linewidth=2)
        ax3.plot(epochs, total_val, label='Validation Loss', color='#E63946', linewidth=2)
    ax3.set_xlabel('Epoch')
    ax3.set_ylabel('Total Loss')
    ax3.set_title('Training vs Validation Loss')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # 4. Precision & Recall
    ax4 = axes[1, 0]
    if 'metrics/precision' in df.columns:
        ax4.plot(epochs, df['metrics/precision'], label='Precision', color='#2A9D8F', linewidth=2)
        ax4.plot(epochs, df['metrics/recall'], label='Recall', color='#E76F51', linewidth=2)
    ax4.set_xlabel('Epoch')
    ax4.set_ylabel('Score')
    ax4.set_title('Precision & Recall')
    ax4.set_ylim(0, 1)
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    # 5. mAP Scores
    ax5 = axes[1, 1]
    if 'metrics/mAP_0.5' in df.columns:
        ax5.plot(epochs, df['metrics/mAP_0.5'], label='mAP@0.5', color='#264653', linewidth=2)
        ax5.plot(epochs, df['metrics/mAP_0.5:0.95'], label='mAP@0.5:0.95', color='#E9C46A', linewidth=2)
    ax5.set_xlabel('Epoch')
    ax5.set_ylabel('mAP')
    ax5.set_title('Mean Average Precision (mAP)')
    ax5.set_ylim(0, 1)
    ax5.legend()
    ax5.grid(True, alpha=0.3)
    
    # 6. Learning Rate
    ax6 = axes[1, 2]
    lr_cols = [c for c in df.columns if 'lr' in c.lower()]
    if lr_cols:
        for col in lr_cols:
            ax6.plot(epochs, df[col], label=col.split('/')[-1], linewidth=2)
    ax6.set_xlabel('Epoch')
    ax6.set_ylabel('Learning Rate')
    ax6.set_title('Learning Rate Schedule')
    ax6.legend()
    ax6.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    # Save the plot
    train_viz_dir = os.path.dirname(results_path)
    plt.savefig(os.path.join(train_viz_dir, 'training_metrics_summary.png'), dpi=150, bbox_inches='tight')
    
    # Also save to output directory
    if 'OUTPUT_DIR' in dir():
        os.makedirs(os.path.join(OUTPUT_DIR, 'visualizations'), exist_ok=True)
        plt.savefig(os.path.join(OUTPUT_DIR, 'visualizations', 'training_metrics.png'), dpi=150, bbox_inches='tight')
    
    plt.show()
    
    # Print summary statistics
    print(f"\n")
    print("           TRAINING SUMMARY")
    print(f"\n")
    if 'metrics/mAP_0.5' in df.columns:
        best_epoch = df['metrics/mAP_0.5'].idxmax() + 1
        print(f"Best mAP@0.5: {df['metrics/mAP_0.5'].max():.4f} (Epoch {best_epoch})")
        print(f"Best mAP@0.5:0.95: {df['metrics/mAP_0.5:0.95'].max():.4f}")
    if 'metrics/precision' in df.columns:
        print(f"Best Precision: {df['metrics/precision'].max():.4f}")
        print(f"Best Recall: {df['metrics/recall'].max():.4f}")
    if 'val/box_loss' in df.columns:
        print(f"\nFinal Validation Losses:")
        print(f"  - Box Loss: {df['val/box_loss'].iloc[-1]:.4f}")
        print(f"  - Obj Loss: {df['val/obj_loss'].iloc[-1]:.4f}")
        print(f"  - Cls Loss: {df['val/cls_loss'].iloc[-1]:.4f}")
    print(f"\n")
    
else:
    print("No training results found. Run training first!")


In [None]:
# EXPORT TRAINING METRICS TO JSON

if results_files:
    # Create training metrics export
    training_metrics_export = {
        "model_name": "YOLOv5m Fine-tuned on AAU RainSnow" if 'Finetuned' in results_path else "YOLO-SwinV2",
        "training_config": {
            "epochs": len(df),
            "base_weights": "yolov5m.pt (COCO pretrained)" if 'Finetuned' in results_path else "scratch",
            "dataset": "AAU RainSnow",
        },
        "best_metrics": {
            "mAP_0.5": float(df['metrics/mAP_0.5'].max()) if 'metrics/mAP_0.5' in df.columns else None,
            "mAP_0.5_0.95": float(df['metrics/mAP_0.5:0.95'].max()) if 'metrics/mAP_0.5:0.95' in df.columns else None,
            "precision": float(df['metrics/precision'].max()) if 'metrics/precision' in df.columns else None,
            "recall": float(df['metrics/recall'].max()) if 'metrics/recall' in df.columns else None,
            "best_epoch": int(df['metrics/mAP_0.5'].idxmax() + 1) if 'metrics/mAP_0.5' in df.columns else None,
        },
        "final_losses": {
            "train_box_loss": float(df['train/box_loss'].iloc[-1]) if 'train/box_loss' in df.columns else None,
            "train_obj_loss": float(df['train/obj_loss'].iloc[-1]) if 'train/obj_loss' in df.columns else None,
            "train_cls_loss": float(df['train/cls_loss'].iloc[-1]) if 'train/cls_loss' in df.columns else None,
            "val_box_loss": float(df['val/box_loss'].iloc[-1]) if 'val/box_loss' in df.columns else None,
            "val_obj_loss": float(df['val/obj_loss'].iloc[-1]) if 'val/obj_loss' in df.columns else None,
            "val_cls_loss": float(df['val/cls_loss'].iloc[-1]) if 'val/cls_loss' in df.columns else None,
        },
        "per_epoch_data": {
            "epochs": list(range(1, len(df) + 1)),
            "train_loss": (df['train/box_loss'] + df['train/obj_loss'] + df['train/cls_loss']).tolist() if 'train/box_loss' in df.columns else [],
            "val_loss": (df['val/box_loss'] + df['val/obj_loss'] + df['val/cls_loss']).tolist() if 'val/box_loss' in df.columns else [],
            "mAP_0.5": df['metrics/mAP_0.5'].tolist() if 'metrics/mAP_0.5' in df.columns else [],
            "precision": df['metrics/precision'].tolist() if 'metrics/precision' in df.columns else [],
            "recall": df['metrics/recall'].tolist() if 'metrics/recall' in df.columns else [],
        }
    }
    
    # Save to training directory
    metrics_json_path = os.path.join(os.path.dirname(results_path), 'training_metrics.json')
    with open(metrics_json_path, 'w') as f:
        json.dump(training_metrics_export, f, indent=2)
    print(f"Training metrics exported to: {metrics_json_path}")
    
    # Also save to output directory
    if 'OUTPUT_DIR' in dir():
        output_metrics_path = os.path.join(OUTPUT_DIR, 'training_metrics.json')
        with open(output_metrics_path, 'w') as f:
            json.dump(training_metrics_export, f, indent=2)
        print(f"Also saved to: {output_metrics_path}")
else:
    print("No training results to export.")


In [None]:
# SET WEIGHTS PATH FOR INFERENCE
# This cell is for Option A (standard YOLOv5m fine-tuning)
# Skip this cell if you're using YOLO-SwinV2 (go to Cell 35 instead)

# Find the fine-tuned weights (Option A - standard YOLOv5m)
finetuned_weights = sorted(glob.glob("YOLO_Finetuned/rainsnow_finetuned*/weights/best.pt"))

if finetuned_weights:
    WEIGHTS_PATH = finetuned_weights[-1]
    print(f"Using fine-tuned YOLOv5m weights: {WEIGHTS_PATH}")
    print("(Note: This uses standard CSPDarknet backbone, not SwinV2)")
else:
    # Fallback to SwinV2 weights
    swinv2_weights = sorted(glob.glob("YOLO_SwinV2/rainsnow_swinv2*/weights/best.pt"))
    if swinv2_weights:
        WEIGHTS_PATH = swinv2_weights[-1]
        print(f"Fine-tuned weights not found. Using SwinV2 weights: {WEIGHTS_PATH}")
    else:
        raise FileNotFoundError("No trained weights found! Run training first.")

# Set output directory (derived from PROJECT_ROOT so it follows the paths set in Environment Setup)
OUTPUT_DIR = str(PROJECT_ROOT / 'outputs' / 'YOLO_Finetuned')

os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(os.path.join(OUTPUT_DIR, 'visualizations'), exist_ok=True)
print(f"Output directory: {OUTPUT_DIR}")


---
## Option B - Train YOLO-SwinV2
---

## Train YOLO-SwinV2

**This is the main training section for the modified YOLO with SwinV2 backbone.**

### Architecture Details:
- **Backbone**: SwinV2-Tiny (`swinv2_tiny_window16_256`) - pretrained on ImageNet
- **Detection Head**: Custom single-scale head trained on AAU RainSnow
- **Input Size**: 256x256 (required by SwinV2 window size)

### Why SwinV2 Backbone?
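- **Shifted Window Attention**: Self-attention is computed within local windows, so its cost scales linearly with image size instead of quadratically as in global attention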
- **Hierarchical Features**: Multi-scale representations like CNNs
- **Transfer Learning**: ImageNet pretraining provides strong visual features


In [None]:
# TRAIN YOLO-SwinV2 MODEL

# Change to the yolov5 directory (skip if we are already inside it)
import os
yolov5_dir = "/content/yolov5" if IN_COLAB else "yolov5"
if Path.cwd().name != "yolov5":
    os.chdir(yolov5_dir)
print(f"Working directory: {os.getcwd()}")

# ============================================================================
# TRAINING HYPERPARAMETERS - Adjust these for your time/quality tradeoff
SWINV2_EPOCHS = 20
SWINV2_BATCH = 16
SWINV2_IMG_SIZE = 256    # Required for SwinV2 window16_256

print(f"\nYOLO-SwinV2 Training Configuration:")
print(f"  - Backbone: SwinV2-Tiny (ImageNet pretrained)")
print(f"  - Epochs: {SWINV2_EPOCHS}")
print(f"  - Batch size: {SWINV2_BATCH}")
print(f"  - Image size: {SWINV2_IMG_SIZE}x{SWINV2_IMG_SIZE}")
print(f"  - Dataset: AAU RainSnow (vehicles in adverse weather)")

# Train the model
!python train.py \
  --img {SWINV2_IMG_SIZE} \
  --batch {SWINV2_BATCH} \
  --epochs {SWINV2_EPOCHS} \
  --data data/vehicles.yaml \
  --cfg models/yolov5m_swinv2.yaml \
  --weights '' \
  --project YOLO_SwinV2 \
  --name rainsnow_swinv2 \
  --cache \
  --exist-ok

print("\nYOLO-SwinV2 training complete!")
print("Weights saved to: YOLO_SwinV2/rainsnow_swinv2/weights/best.pt")

/content/yolov5

## Video Inference with Trained Model

Run the trained YOLO-SwinV2 model on the highway video to detect vehicles. The trained weights from the best epoch will be used.


In [None]:
# VIDEO INFERENCE WITH YOLO-SwinV2 MODEL
# This uses the trained YOLO model with SwinV2-Tiny backbone

# Find the latest YOLO-SwinV2 training run
swinv2_weights = sorted(glob.glob("YOLO_SwinV2/rainsnow_swinv2*/weights/best.pt"))

if swinv2_weights:
    WEIGHTS_PATH = swinv2_weights[-1]
    print(f"Using YOLO-SwinV2 weights: {WEIGHTS_PATH}")
    print("(SwinV2-Tiny backbone, trained on AAU RainSnow)")
else:
    raise FileNotFoundError("No YOLO-SwinV2 weights found! Run Cell 33 (training) first.")

# Set output directory (derived from PROJECT_ROOT so it follows the paths set in Environment Setup)
OUTPUT_DIR = str(PROJECT_ROOT / 'outputs' / 'YOLO_SwinV2')

os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(os.path.join(OUTPUT_DIR, 'visualizations'), exist_ok=True)

# Run YOLOv5 detect.py on the video
output_detect_dir = os.path.join(OUTPUT_DIR, 'detect_output')

!python detect.py \
    --weights {WEIGHTS_PATH} \
    --source {VIDEO_PATH} \
    --img 256 \
    --conf-thres 0.25 \
    --iou-thres 0.45 \
    --project {OUTPUT_DIR} \
    --name detect_output \
    --exist-ok \
    --save-txt \
    --save-conf

print(f"\nDetection complete!")
print(f"Results saved to: {output_detect_dir}")
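
With `--save-txt` and `--save-conf`, `detect.py` writes one text file per frame into a `labels` subfolder, with one detection per line in the form `class xc yc w h conf` (coordinates normalized to the frame size). The sketch below converts such a line back to pixel corners for later post-processing. The helper name and the sample line are made up for illustration.

```python
def parse_detection_line(line, img_w, img_h):
    """Parse 'cls xc yc w h conf' (normalized) into (class id, pixel box x1 y1 x2 y2, confidence)."""
    cls, xc, yc, w, h, conf = line.split()
    # Scale normalized center/size back to pixels
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    box = (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)
    return int(cls), box, float(conf)

# Made-up label line for a 1280x720 frame
print(parse_detection_line("0 0.5 0.5 0.25 0.5 0.9", 1280, 720))
# (0, (480.0, 180.0, 800.0, 540.0), 0.9)
```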


## Display Detection Video

Convert the output video to MP4 format and display it in the notebook.


In [None]:
# Checking what files were created
detect_output_dir = os.path.join(OUTPUT_DIR, 'detect_output')
print(f"Checking directory: {detect_output_dir}")

if os.path.exists(detect_output_dir):
    all_files = os.listdir(detect_output_dir)
    print(f"\nFiles in detect_output:")
    for f in all_files:
        filepath = os.path.join(detect_output_dir, f)
        size = os.path.getsize(filepath) if os.path.isfile(filepath) else 0
        print(f"  - {f} ({size / 1024:.1f} KB)")
else:
    print("detect_output directory not found!")

# Also checking the main output dir
print(f"\nFiles in OUTPUT_DIR ({OUTPUT_DIR}):")
if os.path.exists(OUTPUT_DIR):
    for f in os.listdir(OUTPUT_DIR):
        filepath = os.path.join(OUTPUT_DIR, f)
        if os.path.isfile(filepath):
            size = os.path.getsize(filepath)
            print(f"  - {f} ({size / 1024:.1f} KB)")


In [None]:
# Display detection video - using the file from detect_output
detect_video = os.path.join(OUTPUT_DIR, 'detect_output', 'rainy_highway_video.mp4')

if os.path.exists(detect_video):
    file_size_mb = os.path.getsize(detect_video) / 1024 / 1024
    print(f"Detection video found: {detect_video}")
    print(f"Size: {file_size_mb:.1f} MB")
    
    if file_size_mb > 25:
        compressed_video = os.path.join(OUTPUT_DIR, 'detection_compressed.mp4')
        print("\nCreating compressed version for display...")
        os.system(f'ffmpeg -y -i "{detect_video}" -vcodec libx264 -crf 28 -preset fast -vf scale=640:-2 "{compressed_video}" -loglevel error')
        
        if os.path.exists(compressed_video) and os.path.getsize(compressed_video) > 1000:
            print(f"Compressed: {os.path.getsize(compressed_video)/1024/1024:.1f} MB")
            video_to_display = compressed_video
        else:
            video_to_display = detect_video
    else:
        video_to_display = detect_video
    
    print("\nYOLO-SwinV2 Vehicle Detection Result:\n")
    with open(video_to_display, 'rb') as f:
        video_data = f.read()
    data_url = "data:video/mp4;base64," + b64encode(video_data).decode()
    display(HTML(f'''
        <video width="800" controls autoplay muted>
            <source src="{data_url}" type="video/mp4">
            Your browser does not support the video tag.
        </video>
    '''))
else:
    print(f"Video not found at: {detect_video}")
    # List what's available
    print("\nAvailable files in OUTPUT_DIR:")
    for f in os.listdir(OUTPUT_DIR):
        print(f"  - {f}")


## Enhanced Detection with Tracking and Optical Flow

Apply the same post-processing pipeline as YOLO-V5m:
- **IoU-based tracking**: Match detections across frames using Intersection over Union
- **Farneback Optical Flow**: Compute dense motion between consecutive frames (the flow field is passed to `update_tracks`, which currently matches on IoU only)
- **Track management**: Create, update, and remove tracks with unique IDs
- **Visualization**: Draw bounding boxes with track IDs and motion trails
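
The IoU-based matching step can be illustrated with a small standalone example. This is a simplified sketch that uses plain tuples instead of the notebook's track dictionaries; `greedy_match` is a hypothetical name and the boxes are made up for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def greedy_match(detections, tracks, thresh=0.3):
    """Assign each detection to the unclaimed track with the highest IoU."""
    free = set(tracks)
    assignment = {}
    for det_idx, box in enumerate(detections):
        best_id, best_iou = None, 0.0
        for tid in free:
            score = iou(box, tracks[tid])
            if score > best_iou:
                best_id, best_iou = tid, score
        if best_iou > thresh:
            assignment[det_idx] = best_id
            free.remove(best_id)
    return assignment


tracks = {0: (100, 100, 200, 200), 1: (300, 100, 400, 200)}
detections = [(110, 105, 210, 205), (500, 500, 560, 560)]
print(greedy_match(detections, tracks))  # {0: 0}
```

The first detection overlaps track 0 with an IoU of about 0.75 and inherits its ID. The second overlaps nothing, so in the full pipeline it would start a new track.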


In [None]:
# ENHANCED DETECTION WITH TRACKING (Same pipeline as YOLO-V5m)

yolov5_path = '/content/yolov5' if IN_COLAB else 'yolov5'
sys.path.insert(0, yolov5_path)


# DETECTION QUALITY SETTINGS
FPS = 25

# Detect the model type used
is_swinv2 = 'YOLO_SwinV2' in WEIGHTS_PATH if 'WEIGHTS_PATH' in dir() else True

if is_swinv2:
    # Settings for YOLO-SwinV2 model
    CONF_THRESH = 0.20        # Lower threshold (detection head trained from scratch)
    NMS_IOU_THRESH = 0.30     # More aggressive NMS to reduce overlaps
    MIN_BOX_AREA = 300        
    print("Using settings for YOLO-SwinV2 model (SwinV2-Tiny backbone)")
else:
    # Settings for standard fine-tuned YOLOv5m
    CONF_THRESH = 0.40        
    NMS_IOU_THRESH = 0.45     
    MIN_BOX_AREA = 400        
    print("Using settings for fine-tuned YOLOv5m (CSPDarknet backbone)")

IOU_MATCH_THRESH = 0.3    # For tracking between frames
MAX_DETECTIONS = 30       # Maximum detections per frame

VEHICLE_CLASSES = {0: 'car', 1: 'truck', 2: 'bus'}

print(f"\nDetection Quality Settings:")
print(f"  - Confidence threshold: {CONF_THRESH}")
print(f"  - NMS IoU threshold: {NMS_IOU_THRESH}")
print(f"  - Min box area: {MIN_BOX_AREA} pixels")
print(f"  - Max detections/frame: {MAX_DETECTIONS}")

# Load the trained model
print("\nLoading trained YOLO-SwinV2 model...")
model = torch.hub.load(yolov5_path, 'custom', path=WEIGHTS_PATH, source='local')
model.conf = CONF_THRESH
model.iou = NMS_IOU_THRESH
print(f"Model loaded from: {WEIGHTS_PATH}")

# Helper Functions

def detect_vehicles(frame):
    """Run inference on a frame and return filtered detections."""
    results = model(frame[:, :, ::-1])  # BGR to RGB
    
    raw_detections = results.xyxy[0].cpu()
    
    if len(raw_detections) == 0:
        return []
    
    # Extract boxes, scores, and classes
    boxes = raw_detections[:, :4]
    scores = raw_detections[:, 4]
    classes = raw_detections[:, 5].int()
    
    # Filter by vehicle classes
    vehicle_mask = torch.zeros(len(classes), dtype=torch.bool)
    for cls_id in VEHICLE_CLASSES.keys():
        vehicle_mask |= (classes == cls_id)
    
    if not vehicle_mask.any():
        return []
    
    boxes = boxes[vehicle_mask]
    scores = scores[vehicle_mask]
    classes = classes[vehicle_mask]
    
    # Filter by minimum box area
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_mask = areas >= MIN_BOX_AREA
    boxes = boxes[area_mask]
    scores = scores[area_mask]
    classes = classes[area_mask]
    
    if len(boxes) == 0:
        return []
    
    # Apply additional NMS to remove remaining overlaps
    keep_indices = nms(boxes, scores, NMS_IOU_THRESH)
    
    # Limit max detections (keep highest confidence)
    if len(keep_indices) > MAX_DETECTIONS:
        # Sort by score and keep top N
        sorted_idx = torch.argsort(scores[keep_indices], descending=True)
        keep_indices = keep_indices[sorted_idx[:MAX_DETECTIONS]]
    
    detections = []
    for idx in keep_indices:
        # Convert to plain Python ints for the OpenCV drawing calls
        x1, y1, x2, y2 = (int(v) for v in boxes[idx].tolist())
        detections.append({
            "bbox": [x1, y1, x2, y2],
            "conf": float(scores[idx]),
            "class": VEHICLE_CLASSES[int(classes[idx])]
        })
    
    return detections

def iou_xyxy(boxA, boxB):
    """Compute IoU between two boxes in [x1, y1, x2, y2] format."""
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    
    interW = max(0, xB - xA)
    interH = max(0, yB - yA)
    interArea = interW * interH
    
    if interArea == 0:
        return 0.0
    
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    
    return interArea / float(areaA + areaB - interArea)

def update_tracks(detections, flow, frame_index):
    """Update tracks by greedy IoU matching against the current detections.

    `flow` is accepted but not currently used in the matching.
    """
    global tracks, next_track_id
    
    unmatched_tracks = set(tracks.keys())
    det_to_track = {}
    
    # IoU-based matching
    for det_idx, det in enumerate(detections):
        box_det = det["bbox"]
        best_iou = 0
        best_track = None
        
        for tid in unmatched_tracks:
            box_tr = tracks[tid]["bbox"]
            iou_val = iou_xyxy(box_det, box_tr)
            if iou_val > best_iou:
                best_iou = iou_val
                best_track = tid
        
        if best_iou > IOU_MATCH_THRESH:
            det_to_track[det_idx] = best_track
            unmatched_tracks.remove(best_track)
    
    # Update matched tracks
    for det_idx, track_id in det_to_track.items():
        x1, y1, x2, y2 = detections[det_idx]["bbox"]
        cx = (x1 + x2) // 2
        cy = (y1 + y2) // 2
        
        tracks[track_id]["bbox"] = (x1, y1, x2, y2)
        tracks[track_id]["conf"] = detections[det_idx]["conf"]
        tracks[track_id]["class"] = detections[det_idx]["class"]
        tracks[track_id]["trace"].append((cx, cy))
        tracks[track_id]["last_seen"] = frame_index
    
    # Create new tracks for unmatched detections
    for det_idx, det in enumerate(detections):
        if det_idx in det_to_track:
            continue
        
        x1, y1, x2, y2 = det["bbox"]
        cx = (x1 + x2) // 2
        cy = (y1 + y2) // 2
        
        tracks[next_track_id] = {
            "bbox": (x1, y1, x2, y2),
            "conf": det["conf"],
            "class": det["class"],
            "trace": [(cx, cy)],
            "last_seen": frame_index
        }
        next_track_id += 1
    
    # Remove tracks that haven't been seen recently
    max_missing_frames = FPS
    tracks_to_remove = [tid for tid, tr in tracks.items() 
                       if frame_index - tr["last_seen"] > max_missing_frames]
    for tid in tracks_to_remove:
        del tracks[tid]

# Process Video with Tracking

print(f"\nProcessing video with tracking: {VIDEO_PATH}")
cap = cv2.VideoCapture(str(VIDEO_PATH))

if not cap.isOpened():
    raise RuntimeError(f"Could not open video: {VIDEO_PATH}")

# Video properties
fps = int(cap.get(cv2.CAP_PROP_FPS)) or FPS
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

print(f"Video: {width}x{height} @ {fps}fps, {total_frames} frames")

# Output paths
output_avi = os.path.join(OUTPUT_DIR, 'YOLO_SwinV2_with_tracking.avi')
output_mp4 = os.path.join(OUTPUT_DIR, 'YOLO_SwinV2_with_tracking.mp4')

# Video writer
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter(output_avi, fourcc, fps, (width, height))

# Initialize tracking
tracks = {}
next_track_id = 0

# Metrics collection
metrics = {
    "frame_indices": [],
    "detections_per_frame": [],
    "confidence_scores": [],
    "class_counts": {"car": [], "truck": [], "bus": []},
}

# Read first frame for optical flow
ret, frame = cap.read()
if not ret:
    raise RuntimeError("Couldn't read the video")

old_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
cap.set(cv2.CAP_PROP_POS_FRAMES, 0)

frame_index = 0
pbar = tqdm(total=total_frames, desc='Processing with tracking')

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    frame_index += 1
    
    # Detect vehicles
    dets = detect_vehicles(frame)
    
    # Collect metrics
    metrics["frame_indices"].append(frame_index)
    metrics["detections_per_frame"].append(len(dets))
    
    class_count = {"car": 0, "truck": 0, "bus": 0}
    for det in dets:
        metrics["confidence_scores"].append(det["conf"])
        class_count[det["class"]] += 1
    
    for cls in class_count:
        metrics["class_counts"][cls].append(class_count[cls])
    
    # Compute optical flow
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        old_gray, frame_gray, None,
        0.5,   # pyr_scale
        3,     # levels
        15,    # winsize
        3,     # iterations
        5,     # poly_n
        1.2,   # poly_sigma
        0      # flags
    )
    
    # Update tracks
    update_tracks(dets, flow, frame_index)
    
    # Class-specific colors (BGR format)
    CLASS_COLORS = {
        'car': (0, 255, 0),      # Green
        'truck': (0, 165, 255),   # Orange
        'bus': (255, 0, 255),     # Magenta
    }
    
    # Draw tracks
    for track_id, track in tracks.items():
        x1, y1, x2, y2 = track["bbox"]
        conf = track["conf"]
        cls_name = track["class"]
        
        color = CLASS_COLORS.get(cls_name, (0, 255, 0))
        
        # Draw bounding box with thicker line
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, 3)
        
        # Draw label background for better visibility
        label = f"ID:{track_id} {cls_name} {conf:.2f}"
        (text_w, text_h), baseline = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.6, 2)
        cv2.rectangle(frame, (x1, y1 - text_h - 10), (x1 + text_w + 4, y1), color, -1)
        cv2.putText(frame, label, (x1 + 2, y1 - 5), 
                   cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
        
        # Draw tracking trail (last 30 points only)
        trace = track["trace"][-30:]  # Limit trail length
        if len(trace) > 1:
            pts = np.array(trace, dtype=np.int32).reshape(-1, 1, 2)
            cv2.polylines(frame, [pts], False, (0, 255, 255), 2)
    
    out.write(frame)
    old_gray = frame_gray.copy()
    pbar.update(1)

pbar.close()
cap.release()
out.release()

# Convert to MP4
print("\nConverting to MP4...")
os.system(f'ffmpeg -y -i "{output_avi}" -vcodec libx264 -crf 23 -pix_fmt yuv420p "{output_mp4}" -loglevel error')

if os.path.exists(output_mp4) and os.path.getsize(output_mp4) > 1000:
    print(f"Video saved to: {output_mp4}")
    os.remove(output_avi)
else:
    print(f"Saved as AVI: {output_avi}")

# Print summary
print(f"\n")
print("           YOLO-SwinV2 DETECTION SUMMARY (WITH TRACKING)")
print(f"\n")
print(f"Total frames processed: {frame_index}")
print(f"Total detections: {sum(metrics['detections_per_frame'])}")
print(f"Avg detections/frame: {np.mean(metrics['detections_per_frame']):.2f}")
if metrics["confidence_scores"]:
    print(f"Mean confidence: {np.mean(metrics['confidence_scores']):.4f}")
print(f"Total cars: {sum(metrics['class_counts']['car'])}")
print(f"Total trucks: {sum(metrics['class_counts']['truck'])}")
print(f"Total buses: {sum(metrics['class_counts']['bus'])}")
print(f"Unique tracks created: {next_track_id}")
print(f"\n")


In [None]:
# Display the video with tracking
print("YOLO-SwinV2 Vehicle Detection with Tracking:\n")

if os.path.exists(output_mp4):
    file_size_mb = os.path.getsize(output_mp4) / 1024 / 1024
    print(f"Video: {output_mp4} ({file_size_mb:.1f} MB)")
    
    # Compress for display if large
    if file_size_mb > 20:
        compressed = os.path.join(OUTPUT_DIR, 'tracking_compressed.mp4')
        os.system(f'ffmpeg -y -i "{output_mp4}" -vcodec libx264 -crf 28 -preset fast -vf scale=640:-2 "{compressed}" -loglevel error')
        if os.path.exists(compressed) and os.path.getsize(compressed) > 1000:
            video_to_show = compressed
            print(f"Compressed for display: {os.path.getsize(compressed)/1024/1024:.1f} MB")
        else:
            video_to_show = output_mp4
    else:
        video_to_show = output_mp4
    
    with open(video_to_show, 'rb') as f:
        video_data = f.read()
    data_url = "data:video/mp4;base64," + b64encode(video_data).decode()
    display(HTML(f'''
        <video width="800" controls autoplay muted>
            <source src="{data_url}" type="video/mp4">
        </video>
    '''))
else:
    print(f"Video not found: {output_mp4}")


## Metrics Visualization and Comparison with YOLO-V5m

Visualize inference metrics and compare with the pre-trained YOLO-V5m baseline.
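
Besides the plots, the two `summary` dictionaries can be compared numerically. The sketch below computes the relative change of each metric against the baseline. `compare_summaries` is a hypothetical helper, and the numbers are made up to exercise the function; they are not measured results.

```python
def compare_summaries(baseline, candidate, keys):
    """Return {key: (baseline value, candidate value, relative change in %)}."""
    rows = {}
    for key in keys:
        base, cand = baseline[key], candidate[key]
        # Relative change is undefined when the baseline is zero
        change = (cand - base) / base * 100 if base else float("nan")
        rows[key] = (base, cand, change)
    return rows


# Made-up numbers purely for illustration
v5m = {"total_detections": 1000, "mean_confidence": 0.50}
swin = {"total_detections": 1200, "mean_confidence": 0.40}

for key, (base, cand, change) in compare_summaries(v5m, swin, list(v5m)).items():
    print(f"{key:<20} {base:>10} {cand:>10} {change:>+8.1f}%")
```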


In [None]:
# METRICS VISUALIZATION AND COMPARISON

viz_dir = os.path.join(OUTPUT_DIR, 'visualizations')
os.makedirs(viz_dir, exist_ok=True)

# Load YOLO-V5m metrics for comparison
yolov5m_metrics_path = PROJECT_ROOT / 'outputs' / 'YOLO_V5m' / 'YOLO_V5m_metrics.json'
yolov5m_metrics = None

if os.path.exists(yolov5m_metrics_path):
    with open(yolov5m_metrics_path) as f:
        yolov5m_metrics = json.load(f)
    print(f"Loaded YOLO-V5m metrics from: {yolov5m_metrics_path}")
else:
    print(f"YOLO-V5m metrics not found at: {yolov5m_metrics_path}")
    print("   Run YOLO_V5m.ipynb first for comparison.")

# Individual Metric Plots

# Detections per Frame
fig1, ax1 = plt.subplots(figsize=(12, 5))
ax1.plot(metrics["frame_indices"], metrics["detections_per_frame"], 
         color='#E63946', linewidth=1.5, label='YOLO-SwinV2')
ax1.fill_between(metrics["frame_indices"], metrics["detections_per_frame"], 
                 alpha=0.3, color='#E63946')
if yolov5m_metrics:
    ax1.plot(yolov5m_metrics['per_frame_data']['frame_indices'], 
             yolov5m_metrics['per_frame_data']['detections_per_frame'],
             color='#2E86AB', linewidth=1.5, alpha=0.7, label='YOLO-V5m')
ax1.set_xlabel('Frame Index')
ax1.set_ylabel('Number of Detections')
ax1.set_title('Detections per Frame Comparison')
ax1.legend()
ax1.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'detections_per_frame_comparison.png'), dpi=150)
plt.show()

# Confidence Score Distribution
fig2, ax2 = plt.subplots(figsize=(10, 6))
if metrics["confidence_scores"]:
    ax2.hist(metrics["confidence_scores"], bins=30, color='#E63946', 
             edgecolor='white', alpha=0.7, label='YOLO-SwinV2')
    ax2.axvline(np.mean(metrics["confidence_scores"]), color='#E63946', 
                linestyle='--', linewidth=2, 
                label=f'SwinV2 Mean: {np.mean(metrics["confidence_scores"]):.3f}')

if yolov5m_metrics and yolov5m_metrics['summary']['mean_confidence'] > 0:
    ax2.axvline(yolov5m_metrics['summary']['mean_confidence'], color='#2E86AB', 
                linestyle='--', linewidth=2,
                label=f'V5m Mean: {yolov5m_metrics["summary"]["mean_confidence"]:.3f}')
ax2.set_xlabel('Confidence Score')
ax2.set_ylabel('Frequency')
ax2.set_title('Confidence Score Distribution')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'confidence_distribution.png'), dpi=150)
plt.show()

# Class Distribution Over Time
fig3, ax3 = plt.subplots(figsize=(12, 5))
frames = metrics["frame_indices"]
ax3.stackplot(frames, 
              metrics["class_counts"]["car"],
              metrics["class_counts"]["truck"],
              metrics["class_counts"]["bus"],
              labels=['Car', 'Truck', 'Bus'],
              colors=['#2E86AB', '#A23B72', '#F18F01'], alpha=0.8)
ax3.legend(loc='upper right')
ax3.set_xlabel('Frame Index')
ax3.set_ylabel('Count')
ax3.set_title('YOLO-SwinV2: Detection Count by Class')
ax3.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig(os.path.join(viz_dir, 'class_distribution_over_time.png'), dpi=150)
plt.show()

# Model Comparison Summary
if yolov5m_metrics:
    fig4, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    # Total Detections
    models = ['YOLO-V5m', 'YOLO-SwinV2']
    total_dets = [yolov5m_metrics['summary']['total_detections'],
                  sum(metrics['detections_per_frame'])]
    colors = ['#2E86AB', '#E63946']
    axes[0].bar(models, total_dets, color=colors)
    axes[0].set_ylabel('Total Detections')
    axes[0].set_title('Total Detections Comparison')
    for i, v in enumerate(total_dets):
        axes[0].text(i, v + 50, str(v), ha='center', fontweight='bold')
    
    # Mean Confidence
    mean_confs = [yolov5m_metrics['summary']['mean_confidence'],
                  np.mean(metrics['confidence_scores']) if metrics['confidence_scores'] else 0]
    axes[1].bar(models, mean_confs, color=colors)
    axes[1].set_ylabel('Mean Confidence')
    axes[1].set_title('Mean Confidence Comparison')
    axes[1].set_ylim(0, 1)
    for i, v in enumerate(mean_confs):
        axes[1].text(i, v + 0.02, f'{v:.3f}', ha='center', fontweight='bold')
    
    # Class Distribution
    x = np.arange(3)
    bar_width = 0.35  # Named bar_width so the video frame `width` is not overwritten
    v5m_classes = [yolov5m_metrics['summary']['total_cars'],
                   yolov5m_metrics['summary']['total_trucks'],
                   yolov5m_metrics['summary']['total_buses']]
    swin_classes = [sum(metrics['class_counts']['car']),
                    sum(metrics['class_counts']['truck']),
                    sum(metrics['class_counts']['bus'])]
    axes[2].bar(x - bar_width/2, v5m_classes, bar_width, label='YOLO-V5m', color='#2E86AB')
    axes[2].bar(x + bar_width/2, swin_classes, bar_width, label='YOLO-SwinV2', color='#E63946')
    axes[2].set_xticks(x)
    axes[2].set_xticklabels(['Car', 'Truck', 'Bus'])
    axes[2].set_ylabel('Count')
    axes[2].set_title('Detection Count by Class')
    axes[2].legend()
    
    plt.tight_layout()
    plt.savefig(os.path.join(viz_dir, 'model_comparison.png'), dpi=150)
    plt.show()

# Print Comparison Summary

print(f"\n")
print("                    MODEL COMPARISON SUMMARY")
print(f"\n")
print(f"{'Metric':<30} {'YOLO-V5m':>15} {'YOLO-SwinV2':>15}")
print(f"\n")

if yolov5m_metrics:
    print(f"{'Total Frames':<30} {yolov5m_metrics['summary']['total_frames']:>15} {len(metrics['frame_indices']):>15}")
    print(f"{'Total Detections':<30} {yolov5m_metrics['summary']['total_detections']:>15} {sum(metrics['detections_per_frame']):>15}")
    print(f"{'Avg Detections/Frame':<30} {yolov5m_metrics['summary']['avg_detections_per_frame']:>15.2f} {np.mean(metrics['detections_per_frame']):>15.2f}")
    print(f"{'Mean Confidence':<30} {yolov5m_metrics['summary']['mean_confidence']:>15.4f} {np.mean(metrics['confidence_scores']) if metrics['confidence_scores'] else 0:>15.4f}")
    print(f"{'Total Cars':<30} {yolov5m_metrics['summary']['total_cars']:>15} {sum(metrics['class_counts']['car']):>15}")
    print(f"{'Total Trucks':<30} {yolov5m_metrics['summary']['total_trucks']:>15} {sum(metrics['class_counts']['truck']):>15}")
    print(f"{'Total Buses':<30} {yolov5m_metrics['summary']['total_buses']:>15} {sum(metrics['class_counts']['bus']):>15}")
else:
    print(f"{'Total Frames':<30} {'N/A':>15} {len(metrics['frame_indices']):>15}")
    print(f"{'Total Detections':<30} {'N/A':>15} {sum(metrics['detections_per_frame']):>15}")
    print(f"{'Avg Detections/Frame':<30} {'N/A':>15} {np.mean(metrics['detections_per_frame']):>15.2f}")

print(f"\n")
print(f"Visualizations saved to: {viz_dir}")


## Export YOLO-SwinV2 Metrics

Save the inference metrics to JSON for future reference and comparison.
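
The export cell wraps NumPy results in `float(...)` before writing them, since values such as `np.float32` or `np.int64` are not JSON-serializable by default. An alternative, sketched below under that assumption, is a `default=` hook that converts anything exposing `tolist()` (as NumPy arrays and scalars do). The helper names and the toy payload are hypothetical; only the `summary` / `per_frame_data` layout mirrors the notebook.

```python
import json
import os
import tempfile


def to_builtin(obj):
    """Fallback for json.dump: convert NumPy-style values to Python types."""
    if hasattr(obj, "tolist"):  # NumPy arrays and scalars provide tolist()
        return obj.tolist()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def save_metrics(metrics, path):
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2, default=to_builtin)


def load_metrics(path):
    with open(path) as f:
        return json.load(f)


# Toy payload with placeholder values
example = {
    "summary": {"total_frames": 3, "total_detections": 6},
    "per_frame_data": {"frame_indices": [1, 2, 3], "detections_per_frame": [2, 1, 3]},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.json")
    save_metrics(example, path)
    restored = load_metrics(path)
print(restored == example)  # True
```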


In [None]:
# Export YOLO-SwinV2 metrics to JSON
metrics_export = {
    "model_name": "YOLO-SwinV2 (Trained on AAU RainSnow)",
    "config": {
        "conf_threshold": CONF_THRESH,
        "iou_match_threshold": IOU_MATCH_THRESH,
        "vehicle_classes": list(VEHICLE_CLASSES.values()),
        "weights_path": WEIGHTS_PATH,
    },
    "summary": {
        "total_frames": len(metrics["frame_indices"]),
        "total_detections": sum(metrics["detections_per_frame"]),
        "avg_detections_per_frame": float(np.mean(metrics["detections_per_frame"])),
        "mean_confidence": float(np.mean(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "std_confidence": float(np.std(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "min_confidence": float(min(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "max_confidence": float(max(metrics["confidence_scores"])) if metrics["confidence_scores"] else 0,
        "total_cars": sum(metrics["class_counts"]["car"]),
        "total_trucks": sum(metrics["class_counts"]["truck"]),
        "total_buses": sum(metrics["class_counts"]["bus"]),
        "unique_tracks": next_track_id,
    },
    "per_frame_data": {
        "frame_indices": metrics["frame_indices"],
        "detections_per_frame": metrics["detections_per_frame"],
        "class_counts": metrics["class_counts"]
    }
}

metrics_path = os.path.join(OUTPUT_DIR, "YOLO_SwinV2_metrics.json")
with open(metrics_path, "w") as f:
    json.dump(metrics_export, f, indent=2)

print(f"Metrics exported to: {metrics_path}")
print(f"\nYou can compare these metrics with YOLO-V5m results.")
print(f"\nFiles saved:")
print(f"  - Video with tracking: {output_mp4}")
print(f"  - Metrics JSON: {metrics_path}")
print(f"  - Visualizations: {viz_dir}/")
