# Drone detection — moving-camera preprocessing + video benchmark (YOLO11)

This notebook builds on your existing project and focuses on **preprocessing experiments** that remain valid under:

- moving cameras (no static background assumptions)
- no hard-coded ROIs
- realistic CCTV-like clutter

You asked for:

- **imgsz kept at 640** (trained at 640)
- **video input inference** (not folder-based images)
- ability to run **each method individually** and visually inspect detections in (near) real time
- **logging** per method to compare recall/precision (when annotations exist) and FPS
- optional **ROI-zoom** preprocessing to help tiny drones without retraining

Notebook structure is strictly:

1) Pre-processing
2) YOLO11 inference
3) Post-processing + evaluation

---

## Why you saw 384 earlier
Your footage is mostly wide-frame (for example, 16:9). A common speed trick is to run inference at 640×384, which matches the aspect ratio and reduces padding.

But **you trained at imgsz=640**, so here we keep **imgsz=640** and let Ultralytics letterbox as needed.
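
The 640×384 figure can be reproduced with a little arithmetic. The sketch below is only an illustration: `letterbox_shape` is a hypothetical helper, the 1920×1080 frame size is an assumed example, and the stride of 32 is an assumption about the model. The padding Ultralytics actually applies depends on its own settings.

```python
import math

def letterbox_shape(src_w, src_h, imgsz=640, stride=32):
    """Return ((new_w, new_h), (padded_w, padded_h)) for a frame scaled so its
    longer side equals imgsz, then padded up to a multiple of stride."""
    scale = imgsz / max(src_w, src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    padded_w = math.ceil(new_w / stride) * stride
    padded_h = math.ceil(new_h / stride) * stride
    return (new_w, new_h), (padded_w, padded_h)

# Hypothetical 1920x1080 (16:9) frame
resized, padded = letterbox_shape(1920, 1080)
print(resized, padded)   # (640, 360) (640, 384)

# A 16 px wide drone in the source shrinks by the same scale factor
print(16 * 640 / 1920)   # about 5.3 px
```

The second print shows why tiny drones are hard at this resolution: the downscale that fits the frame into 640 also shrinks every object by the same factor.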

## Why not only `yolo` CLI commands
The `yolo detect predict` CLI is great for baseline runs, but it does **not** easily let you:

- inject custom preprocessing per frame
- do ROI-zoom (crop → re-infer → map boxes back)
- collect per-frame logs and optional GT evaluation

So this notebook uses the Ultralytics **Python API** for flexibility, and keeps a short CLI note for reference.


In [20]:
# Cell 1 — Imports
import os
import time
from pathlib import Path

import cv2
import numpy as np
import pandas as pd

from ultralytics import YOLO

import torch
import ultralytics
print(f"PyTorch: {torch.__version__}")
print(f"Ultralytics: {ultralytics.__version__}")

## Configuration

Set your **model weights** and **video path** here.

Important: paths are not guessed. If a path is empty or invalid, the notebook will stop with a clear error.


In [21]:
# Cell 2 — Config (edit these)

# --- REQUIRED ---
MODEL_WEIGHTS = r"S:\IntelliJ\Projects\ES_Drone_Detection\runs\detect\yolo11\drone_finetune_full_mixed4\weights\best.pt"
VIDEO_PATH    = r"S:\IntelliJ\Projects\ES_Drone_Detection\video_test\GOPR5844_002.mp4"

# If a text annotation sits next to the video with the same stem (e.g., swarm_dji_phantom.txt),
# the notebook auto-loads it for GT evaluation.
# You can also set the path explicitly.
VIDEO_ANN_PATH = r""  # optional; leave empty to auto-detect the sidecar .txt next to VIDEO_PATH

# --- Output ---
OUT_DIR = Path("video_benchmark_outputs")
OUT_DIR.mkdir(parents=True, exist_ok=True)

# --- YOLO inference params ---
IMGSZ = 640           # keep 640 (trained at 640)
CONF = 0.25
IOU_NMS = 0.7
DEVICE = 0            # 0 for GPU, 'cpu' for CPU
HALF = False          # set True for FP16 inference on GPU (faster); keep False on CPU
MAX_DET = 30         # cap detections per frame (speed + stability)

# --- Display ---
# 'window' => cv2.imshow (best for local runs)
# 'inline' => renders frames inline (Colab-friendly, slower)
DISPLAY_MODE = 'window'
DISPLAY_MAX_FPS = 60  # throttles UI refresh only (not inference)

# --- Logging ---
LOG_EVERY_N_FRAMES = 5

# --- Ground-truth evaluation ---
# Supports either:
#   A) VIDEO_ANN_PATH in WOSDETC-like format: frame_id num_objs (x y w h label)+
#   B) Standard YOLO frame labels via extracted frames (optional; not required)
IOU_MATCH = 0.5

GT_FRAMES_DIR = r""  # optional (standard YOLO frames)
GT_LABELS_DIR = r""  # optional (standard YOLO labels)

# --- Optional: ROI-zoom (crop-based) ---
# WARNING: any crop-based zoom that uses YOLO to propose ROIs adds extra inference.
# To reduce the speed hit, you can make it CONDITIONAL.
ROI_MODE = "off"        # 'off' | 'conditional' | 'always'
ROI_COARSE_CONF = 0.08   # lower conf to get candidates for ROI proposal
ROI_PAD_FRAC = 0.40      # expand crop by this fraction around each box
ROI_MAX_CROPS = 4        # cap crops per frame
ROI_MIN_BOX_AREA_FRAC = 0.00002  # ignore ultra-tiny coarse boxes (often noise)
ROI_MERGE_IOU = 0.55     # NMS when merging full-frame + crop detections

# Conditional ROI triggers (only used when ROI_MODE='conditional')
ROI_TRIGGER_IF_NO_DETS = True
ROI_TRIGGER_MAX_FULLFRAME_DETS = 0  # if >0, trigger ROI when full-frame det count <= this
ROI_TRIGGER_IF_SMALL = True
ROI_TRIGGER_SMALL_AREA_FRAC = 0.00008  # trigger ROI if smallest det area <= this


def _require_path(p, name):
    if not p or str(p).strip() == "":
        raise ValueError(f"{name} is empty. Set it to a valid path.")
    if not os.path.exists(p):
        raise FileNotFoundError(f"{name} does not exist: {p}")

_require_path(MODEL_WEIGHTS, "MODEL_WEIGHTS")
_require_path(VIDEO_PATH, "VIDEO_PATH")

# Auto-detect sidecar annotation if not explicitly set
if not VIDEO_ANN_PATH:
    sidecar = str(Path(VIDEO_PATH).with_suffix('.txt'))
    if os.path.exists(sidecar):
        VIDEO_ANN_PATH = sidecar

print("OK: config loaded")
print("MODEL_WEIGHTS:", MODEL_WEIGHTS)
print("VIDEO_PATH:", VIDEO_PATH)
print("VIDEO_ANN_PATH:", VIDEO_ANN_PATH if VIDEO_ANN_PATH else "(none)")


OK: config loaded
MODEL_WEIGHTS: S:\IntelliJ\Projects\ES_Drone_Detection\runs\detect\yolo11\drone_finetune_full_mixed4\weights\best.pt
VIDEO_PATH: S:\IntelliJ\Projects\ES_Drone_Detection\video_test\GOPR5844_002.mp4
VIDEO_ANN_PATH: S:\IntelliJ\Projects\ES_Drone_Detection\video_test\GOPR5844_002.txt


## Pre-processing methods (moving-camera safe)

All methods below are **per-frame** (no background model, no ROI assumptions), so they stay valid with strong camera motion.

You can run any method alone, or combine it with ROI-zoom.


In [22]:
# Cell 3 — Pre-processing functions

def to_ycrcb_clahe(bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """CLAHE on luminance (Y) only: boosts contrast without color artifacts."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    y, cr, cb = cv2.split(ycrcb)
    clahe = cv2.createCLAHE(clipLimit=float(clip_limit), tileGridSize=tuple(tile_grid))
    y2 = clahe.apply(y)
    out = cv2.merge([y2, cr, cb])
    return cv2.cvtColor(out, cv2.COLOR_YCrCb2BGR)


def gray_world_wb(bgr):
    """Simple gray-world white balance: reduces camera color cast."""
    b, g, r = cv2.split(bgr.astype(np.float32))
    mb, mg, mr = b.mean(), g.mean(), r.mean()
    m = (mb + mg + mr) / 3.0
    b *= (m / (mb + 1e-6))
    g *= (m / (mg + 1e-6))
    r *= (m / (mr + 1e-6))
    out = cv2.merge([b, g, r])
    return np.clip(out, 0, 255).astype(np.uint8)


def unsharp_mask(bgr, amount=0.5, blur_ksize=3):
    """Mild edge boost. Keep amount low to avoid false positives."""
    k = int(blur_ksize)
    if k % 2 == 0:
        k += 1
    blur = cv2.GaussianBlur(bgr, (k, k), 0)
    sharp = cv2.addWeighted(bgr, 1.0 + float(amount), blur, -float(amount), 0)
    return sharp


def light_bilateral(bgr, d=5, sigma_color=35, sigma_space=35):
    """Edge-preserving denoise. Use small parameters for speed."""
    return cv2.bilateralFilter(bgr, int(d), float(sigma_color), float(sigma_space))


def preprocess_pipeline(name, bgr):
    if name == "baseline":
        return bgr

    if name == "clahe_y":
        return to_ycrcb_clahe(bgr)

    if name == "clahe_y + unsharp":
        x = to_ycrcb_clahe(bgr)
        return unsharp_mask(x, amount=0.5, blur_ksize=3)

    if name == "wb + clahe_y":
        x = gray_world_wb(bgr)
        return to_ycrcb_clahe(x)

    if name == "clahe_y + bilateral + unsharp":
        x = to_ycrcb_clahe(bgr)
        x = light_bilateral(x)
        return unsharp_mask(x, amount=0.4, blur_ksize=3)

    # PIPELINES is defined just below; it is looked up only when this error is raised
    raise ValueError(f"Unknown pipeline: {name!r}. Valid: {PIPELINES}")

PIPELINES = [
    "baseline",
    "clahe_y",
    "clahe_y + unsharp",
    "wb + clahe_y",
    "clahe_y + bilateral + unsharp",
]

print("Available pipelines:", PIPELINES)


Available pipelines: ['baseline', 'clahe_y', 'clahe_y + unsharp', 'wb + clahe_y', 'clahe_y + bilateral + unsharp']


## YOLO11 model load

Loads your trained YOLO11n weights. We keep **imgsz=640** for predict.


In [23]:
# Cell 4 — Load model
model = YOLO(MODEL_WEIGHTS)
print("Model loaded:", MODEL_WEIGHTS)


Model loaded: S:\IntelliJ\Projects\ES_Drone_Detection\runs\detect\yolo11\drone_finetune_full_mixed4\weights\best.pt


## Baseline CLI note

Ultralytics CLI is useful for **baseline** runs, but it needs real model and video paths.

This notebook uses the **Python API** because it gives per-frame hooks for preprocessing, optional ROI-zoom, visualization, and logging.


## Post-processing and optional ROI-zoom

### ROI-zoom (crop-based) idea

When drones are tiny, letterboxing into 640 can shrink them further. ROI-zoom tries to help without retraining:

1) **Coarse pass** on full frame at low conf to propose candidate regions.
2) Crop around candidates (with padding), resize each crop to 640, run YOLO again.
3) Map crop detections back to original frame and merge with NMS.

This does **not** assume a static background or fixed ROIs. It does cost extra compute, so benchmark FPS alongside accuracy.
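
The coordinate bookkeeping in steps 2 and 3 can be sketched in isolation. This is a minimal illustration, not the notebook's implementation: `padded_crop` and `crop_box_to_full` are hypothetical names, the boxes are made-up numbers, and it assumes the detector reports boxes in the pixel coordinates of the crop it was given.

```python
def padded_crop(box, img_w, img_h, pad_frac=0.4):
    """Expand an (x1, y1, x2, y2) box by pad_frac of its size per side,
    clamp to the image, and snap to integer pixels."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * pad_frac
    pad_y = (y2 - y1) * pad_frac
    return (
        int(max(0, x1 - pad_x)),
        int(max(0, y1 - pad_y)),
        int(min(img_w, x2 + pad_x)),
        int(min(img_h, y2 + pad_y)),
    )

def crop_box_to_full(box_in_crop, crop_origin):
    """Shift a box from crop coordinates back to full-frame coordinates."""
    x1, y1, x2, y2 = box_in_crop
    ox, oy = crop_origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

# Hypothetical coarse candidate in a 1920x1080 frame
crop = padded_crop((100, 200, 120, 210), img_w=1920, img_h=1080)
print(crop)  # (92, 196, 128, 214)

# A detection found inside that crop, expressed in crop pixels
print(crop_box_to_full((8, 4, 28, 14), crop_origin=crop[:2]))  # (100, 200, 120, 210)
```

Snapping the crop to integer pixels before slicing means the offset added back is exactly the offset that was cut, so mapped boxes do not drift by a fraction of a pixel.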


In [24]:
# Cell 6 — Helpers: IoU + NMS + ROI crops


def box_iou_xyxy(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1 = max(ax1, bx1)
    iy1 = max(ay1, by1)
    ix2 = min(ax2, bx2)
    iy2 = min(ay2, by2)
    iw = max(0.0, ix2 - ix1)
    ih = max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = max(0.0, ax2-ax1) * max(0.0, ay2-ay1)
    area_b = max(0.0, bx2-bx1) * max(0.0, by2-by1)
    union = area_a + area_b - inter + 1e-9
    return inter / union


def nms_xyxy_merge(dets, iou_thr=0.55):
    """dets: list of (x1,y1,x2,y2,conf,cls). returns filtered list."""
    if not dets:
        return []
    dets = sorted(dets, key=lambda d: d[4], reverse=True)
    keep = []
    used = [False]*len(dets)
    for i, di in enumerate(dets):
        if used[i]:
            continue
        keep.append(di)
        xi1, yi1, xi2, yi2, ci, cls_i = di
        for j in range(i+1, len(dets)):
            if used[j]:
                continue
            dj = dets[j]
            if int(dj[5]) != int(cls_i):
                continue
            iou = box_iou_xyxy((xi1,yi1,xi2,yi2), (dj[0],dj[1],dj[2],dj[3]))
            if iou >= float(iou_thr):
                used[j] = True
    return keep


def build_crops_from_dets(dets, img_w, img_h, pad_frac=0.4, max_crops=6, min_area_frac=0.00002):
    """Create crop rectangles (x1,y1,x2,y2) from detections.
    dets: (x1,y1,x2,y2,conf,cls)
    """
    if not dets:
        return []
    denom = float(img_w * img_h) + 1e-9
    # sort by confidence and keep top candidates
    dets = sorted(dets, key=lambda d: d[4], reverse=True)

    crops = []
    for d in dets:
        x1,y1,x2,y2,conf,cls = d
        area_frac = ((x2-x1)*(y2-y1))/denom
        if area_frac < float(min_area_frac):
            continue
        bw = (x2-x1)
        bh = (y2-y1)
        pad_x = bw * float(pad_frac)
        pad_y = bh * float(pad_frac)
        cx1 = max(0.0, x1 - pad_x)
        cy1 = max(0.0, y1 - pad_y)
        cx2 = min(float(img_w), x2 + pad_x)
        cy2 = min(float(img_h), y2 + pad_y)
        # Avoid degenerate crops
        if (cx2 - cx1) < 10 or (cy2 - cy1) < 10:
            continue
        crops.append((cx1, cy1, cx2, cy2))
        if len(crops) >= int(max_crops):
            break

    # Merge overlapping crops (simple)
    merged = []
    for c in crops:
        added = False
        for k in range(len(merged)):
            m = merged[k]
            iou = box_iou_xyxy(m, c)
            if iou >= 0.2:
                merged[k] = (
                    min(m[0], c[0]),
                    min(m[1], c[1]),
                    max(m[2], c[2]),
                    max(m[3], c[3]),
                )
                added = True
                break
        if not added:
            merged.append(c)

    return merged


def map_crop_dets_to_full(dets_crop, crop_x1, crop_y1):
    """Map detections from crop coords back to full-frame coords."""
    out = []
    for x1,y1,x2,y2,conf,cls in dets_crop:
        out.append((x1+crop_x1, y1+crop_y1, x2+crop_x1, y2+crop_y1, conf, cls))
    return out


def match_detections_to_gt(pred_boxes, gt_boxes, iou_thr=0.5):
    """pred_boxes: list (cls, conf, x1,y1,x2,y2)
       gt_boxes: list (cls, x1,y1,x2,y2)
    """
    gt_used = [False]*len(gt_boxes)
    tp = 0
    fp = 0
    ious = []

    pred_boxes = sorted(pred_boxes, key=lambda x: x[1], reverse=True)

    for p in pred_boxes:
        pcls, conf, px1, py1, px2, py2 = p
        best_iou = 0.0
        best_j = -1
        for j, g in enumerate(gt_boxes):
            if gt_used[j]:
                continue
            gcls, gx1, gy1, gx2, gy2 = g
            if int(pcls) != int(gcls):
                continue
            iou = box_iou_xyxy((px1,py1,px2,py2), (gx1,gy1,gx2,gy2))
            if iou > best_iou:
                best_iou = iou
                best_j = j
        if best_iou >= float(iou_thr) and best_j >= 0:
            gt_used[best_j] = True
            tp += 1
            ious.append(best_iou)
        else:
            fp += 1

    fn = sum(1 for u in gt_used if not u)
    return tp, fp, fn, ious


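## Optional ground-truth evaluation

When ground truth is available, the notebook scores:

- TP / FP / FN per frame at IoU = `IOU_MATCH`
- precision / recall / F1 over the video segment

The runner in Cell 8 scores against the sidecar annotation file (`VIDEO_ANN_PATH`). The standard YOLO per-frame label loaders below (`GT_FRAMES_DIR`, `GT_LABELS_DIR`) are defined for completeness but are not called by `run_video`.

Without ground truth, the notebook still logs detection stats and FPS.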
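
For reference, the segment-level metrics follow from the accumulated counts as sketched below. The counts are hypothetical and `precision_recall_f1` is an illustrative name. The small `eps` term avoids division by zero when a segment has no detections or no ground truth.

```python
def precision_recall_f1(tp, fp, fn, eps=1e-9):
    """Aggregate detection metrics from total TP / FP / FN counts."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Hypothetical totals accumulated over a segment
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # 0.800 0.667 0.727
```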


In [25]:
# Cell 7 — Ground truth loaders (optional)

# 1) Sidecar video annotations (recommended for your current benchmark)
# Format (WOSDETC-like):
#   frame_id num_objs  x y w h label  x y w h label ...
# Example:
#   0 2 233 137 16 14 drone 450 248 11 12 drone


def load_sidecar_annotations(txt_path, class_names=None):
    """Returns dict[int frame_idx] -> list[(cls, x1,y1,x2,y2)] in PIXELS."""
    if not txt_path or not os.path.exists(txt_path):
        return {}

    # Map string labels to class indices.
    # If model is single-class, everything becomes 0.
    name_to_idx = None
    if class_names and isinstance(class_names, dict):
        # ultralytics may store dict idx->name
        name_to_idx = {v: int(k) for k, v in class_names.items()}
    elif class_names and isinstance(class_names, (list, tuple)):
        name_to_idx = {str(v): i for i, v in enumerate(class_names)}

    out = {}
    with open(txt_path, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            parts = line.split()
            # Expected: frame_id, num_objs, then 5 tokens (x y w h label) per object
            if len(parts) < 2:
                continue  # skip malformed lines
            frame_id = int(parts[0])
            n = int(parts[1])
            expect = 2 + 5*n
            if len(parts) < expect:
                # skip malformed lines
                continue
            boxes = []
            j = 2
            for _ in range(n):
                x, y, w, h = (float(v) for v in parts[j:j+4])
                lab = parts[j+4]
                j += 5
                # label -> cls
                if name_to_idx is None:
                    cls = 0
                else:
                    cls = name_to_idx.get(lab, 0)
                x1 = x
                y1 = y
                x2 = x + w
                y2 = y + h
                boxes.append((cls, x1, y1, x2, y2))
            out[frame_id] = boxes
    return out


# 2) Standard YOLO per-frame labels (optional)

def _gt_yolo_enabled():
    return bool(GT_FRAMES_DIR and GT_LABELS_DIR and os.path.exists(GT_LABELS_DIR))


def yolo_txt_to_boxes(label_txt_path, img_w, img_h):
    # returns list of (cls, x1,y1,x2,y2)
    if not os.path.exists(label_txt_path):
        return []
    out = []
    with open(label_txt_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            cls, xc, yc, w, h = line.split()
            cls = int(float(cls))
            xc, yc, w, h = map(float, (xc, yc, w, h))
            x1 = (xc - w/2) * img_w
            y1 = (yc - h/2) * img_h
            x2 = (xc + w/2) * img_w
            y2 = (yc + h/2) * img_h
            out.append((cls, x1, y1, x2, y2))
    return out


def frame_idx_to_yolo_label_path(frame_idx):
    # expects frame_000001.jpg -> frame_000001.txt
    stem = f"frame_{frame_idx:06d}.txt"
    return str(Path(GT_LABELS_DIR) / stem)


## Video inference runner (one method at a time)

This is the main loop:

- reads video frames
- applies the selected preprocessing
- runs YOLO inference
- optional ROI-zoom
- draws boxes + FPS overlay
- logs per-frame stats and optional GT scores

Outputs:

- annotated MP4
- CSV log per run
- one-line summary metrics printed at the end
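
The smoothed FPS figure is a sliding-window average over frame completion times. A minimal sketch of that idea, using made-up timestamps and a hypothetical `rolling_fps` helper:

```python
from collections import deque

def rolling_fps(timestamps, window=30):
    """FPS after each frame, averaged over at most `window` recent timestamps."""
    hist = deque(maxlen=window)
    out = []
    for t in timestamps:
        hist.append(t)
        if len(hist) >= 2:
            # (frames in window - 1) intervals span hist[-1] - hist[0] seconds
            out.append((len(hist) - 1) / max(1e-9, hist[-1] - hist[0]))
        else:
            out.append(0.0)  # not enough history yet
    return out

# Hypothetical completion times: one frame every 0.5 s
print(rolling_fps([0.0, 0.5, 1.0, 1.5], window=3))  # [0.0, 2.0, 2.0, 2.0]
```

Because the timestamps are taken once per loop iteration, the figure reflects end-to-end throughput (including display and video writing), not model latency alone.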


In [26]:
# Cell 8 — Video benchmark runner

from collections import deque


def draw_dets(frame_bgr, dets, class_names=None):
    out = frame_bgr.copy()
    for x1, y1, x2, y2, conf, cls in dets:
        x1, y1, x2, y2 = map(int, [x1, y1, x2, y2])
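        cls_i = int(cls)
        # class_names may be a dict (idx -> name), a list/tuple, or None
        if isinstance(class_names, dict):
            name = class_names.get(cls_i, cls_i)
        elif class_names and cls_i < len(class_names):
            name = class_names[cls_i]
        else:
            name = cls_i
        label = f"{name}:{conf:.2f}"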
        cv2.rectangle(out, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(out, label, (x1, max(0, y1-6)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1, cv2.LINE_AA)
    return out


def _inline_show(bgr):
    # Inline display for Colab/Jupyter. Slower.
    from IPython.display import display, clear_output
    import matplotlib.pyplot as plt
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    plt.figure(figsize=(12, 7))
    plt.imshow(rgb)
    plt.axis('off')
    clear_output(wait=True)
    display(plt.gcf())
    plt.close()


def _predict_on_bgr(bgr, conf, iou):
    # Ultralytics accepts numpy arrays. Keep calls centralized.
    res = model.predict(
        source=bgr,
        imgsz=IMGSZ,
        conf=conf,
        iou=iou,
        device=DEVICE,
        half=HALF,
        max_det=MAX_DET,
        verbose=False,
    )[0]

    dets = []
    if res.boxes is None or len(res.boxes) == 0:
        return dets

    xyxy = res.boxes.xyxy.cpu().numpy()
    confs = res.boxes.conf.cpu().numpy()
    clss = res.boxes.cls.cpu().numpy()
    for (x1, y1, x2, y2), c, k in zip(xyxy, confs, clss):
        dets.append((float(x1), float(y1), float(x2), float(y2), float(c), float(k)))
    return dets


def _dets_area_frac(dets, img_w, img_h):
    if not dets:
        return []
    a = []
    denom = float(img_w * img_h) + 1e-9
    for x1,y1,x2,y2,conf,cls in dets:
        a.append(((x2-x1)*(y2-y1))/denom)
    return a


def run_video(method_name, max_frames=None, start_frame=0, save_video=True):
    if method_name not in PIPELINES:
        raise ValueError(f"Unknown method: {method_name}. Valid: {PIPELINES}")

    cap = cv2.VideoCapture(VIDEO_PATH)
    if not cap.isOpened():
        raise RuntimeError(f"Failed to open video: {VIDEO_PATH}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT) or 0)
    fps_src = float(cap.get(cv2.CAP_PROP_FPS) or 0.0)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH) or 0)
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT) or 0)

    # Seek
    if start_frame > 0:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(start_frame))

    # Output paths
    stamp = time.strftime('%Y%m%d_%H%M%S')
    roi_tag = ROI_MODE
    tag = f"{Path(VIDEO_PATH).stem}__{method_name.replace(' ', '_').replace('+', 'plus')}__roi{roi_tag}__{stamp}"
    out_video_path = OUT_DIR / f"{tag}.mp4"
    out_frame_csv = OUT_DIR / f"{tag}__frame_log.csv"
    out_summary_csv = OUT_DIR / f"{tag}__summary.csv"

    writer = None

    # Sliding window of frame-completion timestamps for smoothed FPS reporting
    t_hist = deque(maxlen=30)

    # Class names
    try:
        class_names = model.names  # usually dict
    except Exception:
        class_names = None

    # Sidecar GT
    sidecar_gt = load_sidecar_annotations(VIDEO_ANN_PATH, class_names=class_names)

    # Metrics accumulators
    tot_tp = tot_fp = tot_fn = 0
    tot_frames_scored = 0

    frame_logs = []

    # Video writer
    if save_video:
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        writer = cv2.VideoWriter(str(out_video_path), fourcc, fps_src if fps_src > 0 else 30.0, (w, h))

    # UI pacing
    last_ui_t = 0.0

    frame_idx = int(cap.get(cv2.CAP_PROP_POS_FRAMES) or start_frame)
    processed = 0

    while True:
        ok, frame = cap.read()
        if not ok:
            break

        # frame_idx is current frame number *before* incrementing
        this_frame_idx = frame_idx
        frame_idx += 1

        if max_frames is not None and processed >= int(max_frames):
            break

        t0 = time.perf_counter()

        # 1) preprocess per-frame (moving-camera-safe)
        pre = preprocess_pipeline(method_name, frame)

        # 2) full-frame inference
        dets_full = _predict_on_bgr(pre, conf=CONF, iou=IOU_NMS)

        # Decide whether to trigger ROI zoom
        do_roi = False
        if ROI_MODE == 'always':
            do_roi = True
        elif ROI_MODE == 'conditional':
            if ROI_TRIGGER_IF_NO_DETS and len(dets_full) == 0:
                do_roi = True
            if ROI_TRIGGER_MAX_FULLFRAME_DETS > 0 and len(dets_full) <= ROI_TRIGGER_MAX_FULLFRAME_DETS:
                do_roi = True
            if ROI_TRIGGER_IF_SMALL and len(dets_full) > 0:
                areas = _dets_area_frac(dets_full, w, h)
                if areas and min(areas) <= ROI_TRIGGER_SMALL_AREA_FRAC:
                    do_roi = True

        dets = dets_full

        # 3) optional ROI zoom
        if do_roi and ROI_MODE != 'off':
            # coarse pass uses SAME model but lower conf to propose ROIs
            dets_coarse = _predict_on_bgr(pre, conf=ROI_COARSE_CONF, iou=IOU_NMS)

            # Build crops from coarse boxes
            crops = build_crops_from_dets(
                dets_coarse,
                img_w=w,
                img_h=h,
                pad_frac=ROI_PAD_FRAC,
                max_crops=ROI_MAX_CROPS,
                min_area_frac=ROI_MIN_BOX_AREA_FRAC,
            )

            dets_zoom_all = []
            for (cx1, cy1, cx2, cy2) in crops:
                # Snap to integer pixels once so the slice and the mapping offset agree
                ix1, iy1, ix2, iy2 = int(cx1), int(cy1), int(cx2), int(cy2)
                crop = pre[iy1:iy2, ix1:ix2]
                if crop.size == 0:
                    continue
                dets_crop = _predict_on_bgr(crop, conf=CONF, iou=IOU_NMS)
                # Map crop dets back to full-frame coords
                dets_mapped = map_crop_dets_to_full(dets_crop, ix1, iy1)
                dets_zoom_all.extend(dets_mapped)

            # Merge full-frame + zoom detections with NMS
            dets = nms_xyxy_merge(dets_full + dets_zoom_all, iou_thr=ROI_MERGE_IOU)

        t1 = time.perf_counter()
        # Despite the name, this covers preprocessing + full-frame inference + any ROI passes
        infer_ms = (t1 - t0) * 1000.0

        # 4) optional GT scoring from sidecar annotations
        tp = fp = fn = None
        if sidecar_gt:
            gt = sidecar_gt.get(this_frame_idx, [])
            # Convert dets to evaluator format: (cls, conf, x1,y1,x2,y2)
            preds_eval = [(int(d[5]), float(d[4]), float(d[0]), float(d[1]), float(d[2]), float(d[3])) for d in dets]
            tp, fp, fn, _ = match_detections_to_gt(preds_eval, gt, iou_thr=IOU_MATCH)
            tot_tp += tp
            tot_fp += fp
            tot_fn += fn
            tot_frames_scored += 1

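        # FPS (smoothed over the last few frames)
        t_hist.append(t1)
        fps_inst = 0.0
        if len(t_hist) >= 2:
            fps_inst = (len(t_hist) - 1) / max(1e-9, (t_hist[-1] - t_hist[0]))

        # 5) overlay + display
        vis = draw_dets(frame, dets, class_names=class_names)
        # Draw the FPS on the frame itself so it also appears in the saved video
        cv2.putText(vis, f"{method_name} | ROI={ROI_MODE} | FPS~{fps_inst:.1f}", (10, 24),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2, cv2.LINE_AA)

        # UI throttle
        now = time.perf_counter()
        if DISPLAY_MODE == 'window':
            if DISPLAY_MAX_FPS <= 0:
                show = True
            else:
                show = (now - last_ui_t) >= (1.0 / float(DISPLAY_MAX_FPS))
            if show:
                last_ui_t = now
                # Fixed window name: cv2.imshow opens a new window for every distinct name,
                # so the changing FPS value must not be part of it
                cv2.imshow(f"{method_name} | ROI={ROI_MODE}", vis)
                # ESC to exit
                if cv2.waitKey(1) & 0xFF == 27:
                    break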
        elif DISPLAY_MODE == 'inline':
            if DISPLAY_MAX_FPS <= 0:
                show = True
            else:
                show = (now - last_ui_t) >= (1.0 / float(DISPLAY_MAX_FPS))
            if show:
                last_ui_t = now
                _inline_show(vis)

        if writer is not None:
            writer.write(vis)

        if (processed % LOG_EVERY_N_FRAMES) == 0:
            frame_logs.append({
                'frame_idx': this_frame_idx,
                'method': method_name,
                'roi_mode': ROI_MODE,
                'roi_triggered': int(do_roi),
                'num_det': len(dets),
                'infer_ms': infer_ms,
                'fps_inst': fps_inst,
                'tp': tp,
                'fp': fp,
                'fn': fn,
            })

        processed += 1

    cap.release()
    if writer is not None:
        writer.release()
    if DISPLAY_MODE == 'window':
        cv2.destroyAllWindows()

    # Summary
    precision = recall = f1 = None
    if tot_frames_scored > 0:
        precision = tot_tp / (tot_tp + tot_fp + 1e-9)
        recall = tot_tp / (tot_tp + tot_fn + 1e-9)
        f1 = 2*precision*recall / (precision + recall + 1e-9)

    df_frames = pd.DataFrame(frame_logs)
    df_frames.to_csv(out_frame_csv, index=False)

    summary = {
        'video': str(VIDEO_PATH),
        'method': method_name,
        'roi_mode': ROI_MODE,
        'frames_processed': int(processed),
        'frames_scored': int(tot_frames_scored),
        'precision@0.5': precision,
        'recall@0.5': recall,
        'f1@0.5': f1,
        'out_video': str(out_video_path) if save_video else '',
        'frame_log_csv': str(out_frame_csv),
    }

    pd.DataFrame([summary]).to_csv(out_summary_csv, index=False)
    print("Saved:", out_frame_csv)
    print("Saved:", out_summary_csv)
    if save_video:
        print("Saved:", out_video_path)

    return summary


## Run one method (interactive benchmark)

Set `METHOD` and `ROI_MODE`, then run the cell.

Tips:

- Start with `max_frames=600` for fast iteration.
- Use `start_frame` to jump into a busy segment.
- ROI-zoom is slower; benchmark it separately.
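
To pick a `start_frame` from a playback position, a rough conversion is enough. The helper below is a hypothetical convenience, not part of the notebook, and it assumes a constant frame rate.

```python
def seconds_to_frame(seconds, fps):
    """Approximate frame index for a playback position, assuming constant frame rate."""
    return int(round(seconds * fps))

# Hypothetical: jump 20 s into a 30 fps clip
print(seconds_to_frame(20, 30.0))  # 600
```

The source frame rate can be read with `cap.get(cv2.CAP_PROP_FPS)`, as `run_video` already does, and the result passed as `start_frame`.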


In [27]:
# Cell 9 — Run a single method

# Choose preprocessing method
METHOD = "baseline"  # choose from PIPELINES

# ROI_MODE controls crop-based zoom:
#   'off'         => no extra inference (fastest)
#   'conditional' => only triggers ROI when full-frame looks weak (recommended for speed)
#   'always'      => always does extra crop inference (slowest)
ROI_MODE = "off"  # edit here per run

summary = run_video(
    method_name=METHOD,
    max_frames=600,     # set None to run full video
    start_frame=0,
    save_video=True,
)

pd.DataFrame([summary])


Saved: video_benchmark_outputs\GOPR5844_002__baseline__roioff__20260116_140659__frame_log.csv
Saved: video_benchmark_outputs\GOPR5844_002__baseline__roioff__20260116_140659__summary.csv
Saved: video_benchmark_outputs\GOPR5844_002__baseline__roioff__20260116_140659.mp4


| | video | method | roi_mode | frames_processed | frames_scored | precision@0.5 | recall@0.5 | f1@0.5 | out_video | frame_log_csv |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S:\IntelliJ\Projects\ES_Drone_Detection\video_... | baseline | off | 497 | 497 | 0.940199 | 0.634529 | 0.757697 | video_benchmark_outputs\GOPR5844_002__baseline... | video_benchmark_outputs\GOPR5844_002__baseline... |


## Batch run (all methods)

Runs all pipelines (baseline + preprocess variants) for a short segment and produces a summary table.

Recommended: keep `max_frames` small (for example, 300–1000) while exploring.


In [None]:
# Cell 10 — Batch run all methods (short segment)

ROI_MODE = "off"  # keep fast for batch scans

summaries = []
for m in PIPELINES:
    print("Running:", m)
    s = run_video(method_name=m, max_frames=300, start_frame=0, save_video=False)
    summaries.append(s)

df = pd.DataFrame(summaries)
df.sort_values(by=['f1@0.5','recall@0.5','precision@0.5'], ascending=False)
