# Drone detection video benchmark (YOLO) — baseline vs adaptive ROI (v4)

This notebook benchmarks drone detection on a video while drawing overlays on **every video frame** and running YOLO at a configurable inference rate (e.g., **5 FPS**).
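A minimal sketch of the frame-sampling rule, mirroring the `infer_stride` computation in `run_benchmark()` (Cell 10). `inference_frame_ids` is a hypothetical helper for illustration, not part of the notebook:

```python
from typing import List


def inference_frame_ids(video_fps: float, infer_fps: float, n_frames: int) -> List[int]:
    """Return the frame indices that receive a YOLO pass when sampling at about infer_fps."""
    if video_fps <= 0:
        video_fps = 30.0  # same fallback the runner uses when CAP_PROP_FPS reports 0
    stride = max(1, int(round(video_fps / float(infer_fps))))
    return [fid for fid in range(n_frames) if fid % stride == 0]


print(inference_frame_ids(30.0, 5, 20))  # [0, 6, 12, 18]
```

Because the stride is rounded to an integer, the effective inference rate is `fps / stride`. For example, 29.97 fps with `INFER_FPS = 5` still gives a stride of 6, which is about 4.995 inference steps per second.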

It is designed for your exact workflow:

- **Baseline** exists and is selected by:
  - `PREPROCESS_MODE = "None"` and `VERIFY_MODE = "None"`
- **Adaptive ROI** features are selectable and (mostly) stackable:
  - Guided ROI (crop around previous detected drone)
  - Motion-based ROIs (1–3 moving-object crops after a no-drone streak)
  - Verification passes using the **same YOLO weights**:
    - `"DoublePassHighConf"` / `"DoublePassHighRes"` / `"DoublePassTiled"`
  - Tiny-object rescue mode (tiled inference when the object is tiny)

Hard rules enforced:

- **4-out-of-5 confirmation** is always enabled (common to all modes).
- **Continuity gating** is selectable (3 methods) and used inside confirmation windows.
- **One inference step = one hit** no matter how many ROIs are inferred in that step.
- **mAP50** is computed on **inference frames only** (the annotation label is always exactly `drone`).
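Because only inference frames are scored, the ground truth has to be filtered to those same frames; otherwise every skipped frame counts as a missed drone, and recall would be capped near `1/infer_stride` when the drone is visible throughout. A minimal sketch, assuming GT in the `frame_id -> (x, y, w, h) or None` form returned by `read_annotation_file` (Cell 5); `gt_on_inference_frames` is a hypothetical name:

```python
from typing import Dict, Iterable, Optional, Tuple

BoxXYWH = Tuple[int, int, int, int]


def gt_on_inference_frames(gt: Dict[int, Optional[BoxXYWH]],
                           infer_ids: Iterable[int]) -> Dict[int, BoxXYWH]:
    """Keep drone boxes only for frames that were actually inferred; empty frames are dropped."""
    ids = set(infer_ids)
    return {fid: box for fid, box in gt.items() if box is not None and fid in ids}


gt = {0: (10, 10, 20, 20), 1: (12, 10, 20, 20), 2: None, 6: (30, 10, 20, 20)}
print(gt_on_inference_frames(gt, [0, 6, 12]))  # {0: (10, 10, 20, 20), 6: (30, 10, 20, 20)}
```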

Outputs are written to `video_benchmark_outputs/` in your project root.

Last generated: 2026-01-23 10:10:35


## What the main building blocks mean

### 1) Preprocessing (selects what image YOLO sees)
- **None**: YOLO runs on the full frame (baseline).
- **Guided ROI**: if YOLO found a drone previously, crop a 640×640 region centered on that detection and run YOLO on it.
- **Motion ROIs**: after a no-drone streak, find up to 3 moving regions and run YOLO on each crop (still counts as 1 hit for the step).
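The ROI path has two details worth making explicit: the crop is shifted rather than shrunk at the frame border, and boxes found in a crop are mapped back to full-frame coordinates by adding the crop origin. A sketch that mirrors `crop_roi_center` (Cell 10) and the one-hit-per-step rule; the helper names `crop_centered`, `to_full_frame` and `step_hit` are hypothetical:

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]


def crop_centered(frame: np.ndarray, cx: int, cy: int, size: int):
    """Square crop centered on (cx, cy), shifted (not shrunk) so it stays inside the frame."""
    H, W = frame.shape[:2]
    x1 = min(max(cx - size // 2, 0), max(W - size, 0))
    y1 = min(max(cy - size // 2, 0), max(H - size, 0))
    return frame[y1:y1 + size, x1:x1 + size], (x1, y1)


def to_full_frame(box: Box, offset: Tuple[int, int]) -> Box:
    """Map an xyxy box from crop coordinates back to full-frame coordinates."""
    ox, oy = offset
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def step_hit(per_roi_boxes: List[List[Box]]) -> bool:
    """One inference step = one hit, however many ROIs produced detections."""
    return any(len(boxes) > 0 for boxes in per_roi_boxes)


frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
crop, off = crop_centered(frame, cx=1900, cy=50, size=640)
print(crop.shape, off)                                  # (640, 640, 3) (1280, 0)
print(to_full_frame((10, 20, 30, 40), off))             # (1290, 20, 1310, 40)
print(step_hit([[], [(0, 0, 5, 5)], [(1, 1, 4, 4)]]))   # True
```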

### 2) Verification (optional extra pass using the SAME YOLO weights)
This is **not** the 4/5 confirmation. This is a second YOLO pass used to suppress false positives.

`VERIFY_MODE` options:
- **None**: no extra pass (baseline when combined with PREPROCESS_MODE=None).
- **DoublePassHighConf**: re-run YOLO on a crop around the detected box using a higher confidence threshold.
- **DoublePassHighRes**: re-run YOLO on a crop around the detected box using a larger `imgsz` (more detail, slower).
- **DoublePassTiled**: re-run YOLO on tiles (slicing) to help tiny objects.
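A sketch of what a `DoublePassHighConf` check could look like. The detector is injected as a callable so the example runs without YOLO; the callable's signature, the crop padding factor `pad` and the overlap requirement `min_iou` are assumptions, and only the default `high_conf=0.60` comes from `VERIFY_HIGH_CONF` in Cell 2:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]
# Assumed detector signature: (crop_xyxy, conf_threshold) -> [(box_xyxy in full-frame coords, conf), ...]
Detector = Callable[[Box, float], List[Tuple[Box, float]]]


def iou(a: Box, b: Box) -> float:
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def verify_high_conf(box: Box, detect: Detector, W: int, H: int,
                     pad: float = 2.0, high_conf: float = 0.60, min_iou: float = 0.3) -> bool:
    """Second pass on a padded crop around `box`; keep `box` only if a high-confidence detection overlaps it."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * pad / 2
    half_h = (box[3] - box[1]) * pad / 2
    crop = (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(W, int(cx + half_w)), min(H, int(cy + half_h)))
    return any(c >= high_conf and iou(b, box) >= min_iou for b, c in detect(crop, high_conf))


def fake_detect(crop: Box, thr: float) -> List[Tuple[Box, float]]:
    # Stand-in for YOLO: always reports one box at 0.72 confidence.
    return [((100, 100, 120, 118), 0.72)]


print(verify_high_conf((98, 99, 121, 119), fake_detect, W=1920, H=1080))  # True
```

Wired into the notebook, the callable would presumably wrap `yolo_predict_xyxy` on the cropped image plus the offset mapping. The IoU check keeps the verifier from confirming a different object that happens to sit in the same crop.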

### 3) Confirmation (always-on, common to all)
A detection becomes a confirmed event if:
- it is a hit in **≥4 of the last 5 inference steps**, AND
- continuity gating passes (selectable method)

We also track a separate **warning window** that includes small detections (for later filtering).
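A minimal sketch of the N-of-M confirmation window using a bounded `deque`. Two behaviors here are assumptions rather than confirmed notebook behavior: a hit that fails the continuity gate is recorded as a miss, and no event fires until the window is full. `Confirmer` is a hypothetical class name:

```python
from collections import deque


class Confirmer:
    """N-of-M confirmation: fires when >= `required` of the last `window` steps are gated hits."""

    def __init__(self, window: int = 5, required: int = 4):
        self.hist = deque(maxlen=window)  # oldest step drops out automatically
        self.required = required

    def step(self, hit: bool, continuity_ok: bool) -> bool:
        # Assumption: a detection that fails the continuity gate counts as a miss for this step.
        self.hist.append(bool(hit and continuity_ok))
        return len(self.hist) == self.hist.maxlen and sum(self.hist) >= self.required


c = Confirmer(window=5, required=4)
hits = [True, True, False, True, True, True]
print([c.step(h, True) for h in hits])  # [False, False, False, False, True, True]
```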


## What each confidence threshold does (practical)

- `BASE_CONF_FULL`: minimum confidence for detections from full-frame YOLO and motion ROI YOLO.
  - Lower → higher recall, more false positives (birds).
- `GUIDED_ROI_CONF`: minimum confidence when using guided ROI (previous detection crop).
  - Often set higher because the search area is smaller.
- `LOW_CONF_DETECTION`: if the best detection confidence is below this, the detection is considered **suspect**.
  - Suspect detections are forced into a verification pass on the **same inference step** (confirmation guided ROI).

Two important flags:
- `ENABLE_CONFIRMATION_GUIDED_ROI_ALL`:
  - If True, **every** detection must be verified by a secondary pass before it can be counted.
  - Default False.
- `FORCE_VERIFY_SMALL_OR_LOWCONF`:
  - Default True; automatically disabled in baseline mode.
  - Forces verification when the detection is tiny or low confidence.
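The thresholds and flags above combine into a per-detection decision. A sketch under stated assumptions: default values are taken from Cell 2, the function names are hypothetical, and "tiny" is assumed to mean either side below 25 px (the notebook's exact tiny test is not shown in this section):

```python
from typing import Tuple

Box = Tuple[int, int, int, int]


def conf_threshold(source: str, base_conf: float = 0.25, guided_conf: float = 0.40) -> float:
    # Guided-ROI crops use GUIDED_ROI_CONF; full-frame and motion-ROI inputs use BASE_CONF_FULL.
    return guided_conf if source == "GuidedROI" else base_conf


def needs_verification(box: Box, conf: float, baseline: bool,
                       verify_all: bool = False, force_small_or_lowconf: bool = True,
                       low_conf: float = 0.40, tiny_wh: int = 25) -> bool:
    """Decide whether a detection needs a same-step verification pass before it can count."""
    if baseline:
        return False  # baseline never runs a second pass
    if verify_all:
        return True   # ENABLE_CONFIRMATION_GUIDED_ROI_ALL
    w, h = box[2] - box[0], box[3] - box[1]
    suspect = conf < low_conf or w < tiny_wh or h < tiny_wh
    return force_small_or_lowconf and suspect


print(conf_threshold("GuidedROI"))                               # 0.4
print(needs_verification((0, 0, 40, 40), 0.35, baseline=False))  # True (low confidence)
print(needs_verification((0, 0, 12, 12), 0.80, baseline=False))  # True (tiny box)
print(needs_verification((0, 0, 40, 40), 0.80, baseline=False))  # False
```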


## How many frames does motion detection need to buffer?

If the camera is mostly static (your case), **3–5 inference frames** are typically enough.
- More frames (7–9) can reduce noise, but they increase lag and can smear motion.
- The more important knob for leaves/trees is **temporal consistency**: we only keep motion pixels that appear in **multiple** of the buffer differences.

This notebook defaults to 5 buffered inference frames and requires motion to appear in at least 3 of the 4 differences (tunable).
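The temporal-consistency rule can be expressed with plain NumPy: count, per pixel, how many consecutive-pair differences exceed a threshold, and keep pixels with at least `min_hits` hits. This sketch skips the blur, stabilization and morphology steps of `extract_motion_rois` (Cell 7) and assumes grayscale input; `persistent_motion` is a hypothetical name:

```python
import numpy as np


def persistent_motion(frames, diff_thresh: int = 25, min_hits: int = 3) -> np.ndarray:
    """Mask (0/255) of pixels that changed by more than diff_thresh in >= min_hits consecutive-pair diffs."""
    g = [f.astype(np.int16) for f in frames]  # int16 avoids uint8 wrap-around when subtracting
    hits = np.zeros(g[0].shape, dtype=np.uint16)
    for a, b in zip(g[:-1], g[1:]):
        hits += (np.abs(b - a) > diff_thresh).astype(np.uint16)
    return (hits >= min_hits).astype(np.uint8) * 255


frames = [np.zeros((2, 2), dtype=np.uint8) for _ in range(5)]
for i, f in enumerate(frames):
    f[0, 0] = 50 * i          # steady change: active in all 4 diffs
frames[2][1, 1] = 200         # one-frame flicker: active in only 2 diffs
mask = persistent_motion(frames)
print(mask[0, 0], mask[1, 1])  # 255 0
```

With 5 buffered frames there are 4 differences, so `min_hits=3` matches the default described above.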


In [None]:
# Cell 1 — Imports
import os, time, math, json
from dataclasses import dataclass
from typing import List, Tuple, Dict, Optional

import cv2
import numpy as np
import pandas as pd

from ultralytics import YOLO


In [None]:
# Cell 2 — Config (your real paths + benchmark options)

# --- Your real paths (keep these exactly as you use them) ---
MODEL_WEIGHTS = r"S:\IntelliJ\Projects\ES_Drone_Detection\runs\detect\yolo26\drone_finetune_full_mixed\weights\best.pt"
VIDEO_PATH    = r"S:\IntelliJ\Projects\ES_Drone_Detection\video_test\gopro_006.mp4"

# Annotation path: prefer your local file; else fall back to the /mnt/data copy of gopro_006.txt.
ANNOTATION_PATH_PRIMARY  = r"S:\IntelliJ\Projects\ES_Drone_Detection\video_test\gopro_006.txt"
ANNOTATION_PATH_FALLBACK = r"/mnt/data/gopro_006.txt"
ANNOTATION_PATH = ANNOTATION_PATH_PRIMARY if os.path.exists(ANNOTATION_PATH_PRIMARY) else ANNOTATION_PATH_FALLBACK

# --- Output root folder (MUST be project-root/video_benchmark_outputs/; the path is relative, so launch the notebook from the project root) ---
OUTPUT_ROOT = "video_benchmark_outputs"
os.makedirs(OUTPUT_ROOT, exist_ok=True)

# --- Inference policy ---
INFER_FPS = 5          # inference steps per second (e.g., 5)
ROI_SIZE = 640         # crop size fed to YOLO
MAX_FULLFRAME_DETECTIONS = 3
MAX_ROI_DETECTIONS = 1

# --- Core mode selection ---
# Baseline = PREPROCESS_MODE="None" and VERIFY_MODE="None"
PREPROCESS_MODE = "Auto"  # "None" | "Auto"
USE_GUIDED_ROI = True     # stackable
USE_MOTION_ROIS = True    # stackable

# Motion ROI method (choose ONE — not stackable)
MOTION_METHOD = "ORB_Affine_StabilizedDiff"  # see "Motion ROI method options" below; implemented in Cell 7

# Continuity gate (choose ONE)
CONTINUITY_GATE_MODE = "ExpandedIoU"  # "ExpandedIoU" | "CenterDistance" | "Kalman"

# Verification pass using SAME YOLO weights (choose ONE)
VERIFY_MODE = "None"  # "None" | "DoublePassHighConf" | "DoublePassHighRes" | "DoublePassTiled"

# --- Drawing / debug toggles ---
DRAW_GT_BOX = True
DRAW_CONTINUITY_DEBUG = True
DRAW_GUIDED_ROI_BOX = True
DRAW_MOTION_ROI_BOXES = True

EXPORT_MOTION_DEBUG_IMAGES = True
EXPORT_TILED_DEBUG_IMAGES = True

# --- Motion buffering & filtering ---
MOTION_BUFFER_INFER_FRAMES = 5      # buffered inference frames used for motion detection
NO_DRONE_STREAK_FOR_MOTION = 5      # trigger motion ROIs after this many inference misses
MIN_MOVER_W = 25
MIN_MOVER_H = 25
MAX_MOVER_ROIS = 3

# Suppress leaves/trees: require motion to persist across multiple diffs
MOTION_MIN_HITS = 3   # minimum number of diff-frames a pixel must be active in
MOTION_GLOBAL_COVERAGE_THRESH = 0.35  # if motion mask covers >35% of frame, treat as global motion and discard motion ROIs

# --- Confidence thresholds ---
BASE_CONF_FULL     = 0.25
GUIDED_ROI_CONF    = 0.40
LOW_CONF_DETECTION = 0.40

# Verification thresholds / sizes
VERIFY_HIGH_CONF = 0.60
VERIFY_HIGHRES_IMGSZ = 1280

# Tiny-object rescue (tiled inference)
ENABLE_TINY_RESCUE = True
TINY_RESCUE_MIN_WH = 25               # if detected box smaller than this, rescue can trigger
TINY_RESCUE_TILED_TILE = 640
TINY_RESCUE_TILED_OVERLAP = 0.20

# --- Confirmation windows ---
# Hard rule (see notes above): hits in >=4 of the last 5 inference steps
CONFIRM_WINDOW = 5
CONFIRM_REQUIRED_HITS = 4

# Big-confirm window: only detections >=25x25 on ORIGINAL FRAME, unless verified by confirmation guided ROI
CONFIRM_MIN_WH_BIG = 25

# Warning window: tracks detections >=15x15 (mix of small+big) for later filtering
WARNING_MIN_WH = 15

# Confirmation guided ROI (secondary inference on same step)
ENABLE_CONFIRMATION_GUIDED_ROI_ALL = False  # option 8 (default off)
FORCE_VERIFY_SMALL_OR_LOWCONF = True        # option 7 (forced unless baseline)

# --- Continuity parameters ---
# ExpandedIoU
EXPAND_IOU_K = 3.0          # expand previous box by this factor before IoU test
EXPAND_IOU_THRESH = 0.10    # IoU threshold after expansion

# CenterDistance
CENTERDIST_ALPHA = 2.5      # scales with sqrt(area_prev)
CENTERDIST_BETA = 30.0      # pixels slack

# Kalman
KALMAN_Q = 10.0             # process noise
KALMAN_R = 50.0             # measurement noise
KALMAN_GATE_MAHALANOBIS = 6.0

# --- Speed / IO ---
MAX_EXPORT_DEBUG_IMAGES = 500   # safety cap


## Motion ROI method options (choose ONE)

Static-camera-friendly:
- `MOG2` (adaptive background model)
- `KNN`
- `RunningAverage`
- `TemporalMedian`
- `FrameDifference`

More tolerant of camera shake / mild motion:
- `ORB_Affine_StabilizedDiff` (align frames using ORB features → warp → absdiff)
- `ECC_Affine_StabilizedDiff` (align using ECC optimization → warp → absdiff; slower)
- `OpticalFlowMagnitude`
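- `KLT_PointCluster` (tracks points, clusters residual motion; listed but not yet implemented in Cell 7, so selecting it yields no motion ROIs and the step falls back to full-frame YOLO)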

Notes:
- If your camera moves a lot, prefer stabilized diff / flow methods.
- If the motion mask covers more than `MOTION_GLOBAL_COVERAGE_THRESH` of the frame (i.e., the whole scene moved), the notebook discards motion ROIs and runs full-frame YOLO instead.
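A sketch of the global-motion fallback decision, mirroring the coverage test in `extract_motion_rois` (Cell 7) with the default `MOTION_GLOBAL_COVERAGE_THRESH = 0.35`; `choose_inputs` is a hypothetical helper:

```python
import numpy as np


def choose_inputs(mask, rois, coverage_thresh: float = 0.35):
    """Return ("FullFrame", []) for global motion or no usable ROI, else ("MotionROI", rois)."""
    if mask is None or not rois:
        return "FullFrame", []
    coverage = np.count_nonzero(mask) / mask.size
    if coverage > coverage_thresh:
        return "FullFrame", []  # whole scene moved (shake or pan): motion ROIs are not trustworthy
    return "MotionROI", list(rois)


m = np.zeros((100, 100), dtype=np.uint8)
m[:50, :] = 255                             # 50% of the frame "moving"
print(choose_inputs(m, [(0, 0, 10, 10)]))   # ('FullFrame', [])
```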


In [None]:
# Cell 5 — Helpers (annotation parsing, geometry, NMS)

def read_annotation_file(path: str) -> Dict[int, Optional[Tuple[int,int,int,int]]]:
    '''
    Annotation format (from your example GOPR5844_002.txt):
      - "frame_id 0"                          => no drone
      - "frame_id 1 x y w h drone"            => drone box
    Returns dict: frame_id -> (x,y,w,h) or None
    '''
    gt = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line=line.strip()
            if not line:
                continue
            parts=line.split()
            if len(parts)==2:
                fid=int(parts[0]); has=int(parts[1])
                gt[fid]=None
            elif len(parts)>=7:
                fid=int(parts[0]); has=int(parts[1])
                if has==0:
                    gt[fid]=None
                else:
                    x=int(float(parts[2])); y=int(float(parts[3])); w=int(float(parts[4])); h=int(float(parts[5]))
                    # label is always exactly 'drone' by your rule
                    gt[fid]=(x,y,w,h)
    return gt

def xywh_to_xyxy(box):
    x,y,w,h=box
    return (x, y, x+w, y+h)

def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def expand_box_xyxy(b, k, W, H):
    x1,y1,x2,y2=b
    cx=(x1+x2)/2; cy=(y1+y2)/2
    w=(x2-x1)*k; h=(y2-y1)*k
    nx1=int(clamp(cx-w/2, 0, W-1))
    ny1=int(clamp(cy-h/2, 0, H-1))
    nx2=int(clamp(cx+w/2, 0, W-1))
    ny2=int(clamp(cy+h/2, 0, H-1))
    return (nx1,ny1,nx2,ny2)

def box_wh_xyxy(b):
    x1,y1,x2,y2=b
    return (max(0,x2-x1), max(0,y2-y1))

def box_area_xyxy(b):
    w,h=box_wh_xyxy(b)
    return w*h

def iou_xyxy(a,b):
    ax1,ay1,ax2,ay2=a
    bx1,by1,bx2,by2=b
    ix1=max(ax1,bx1); iy1=max(ay1,by1)
    ix2=min(ax2,bx2); iy2=min(ay2,by2)
    iw=max(0,ix2-ix1); ih=max(0,iy2-iy1)
    inter=iw*ih
    if inter<=0: return 0.0
    areaA=(ax2-ax1)*(ay2-ay1); areaB=(bx2-bx1)*(by2-by1)
    return inter/(areaA+areaB-inter+1e-9)

def nms_xyxy(boxes, scores, iou_thresh=0.5):
    if len(boxes)==0:
        return []
    idx=np.argsort(scores)[::-1]
    keep=[]
    while idx.size>0:
        i=idx[0]
        keep.append(i)
        if idx.size==1:
            break
        rest=idx[1:]
        ious=np.array([iou_xyxy(boxes[i], boxes[j]) for j in rest])
        idx=rest[ious < iou_thresh]
    return keep


In [None]:
# Cell 6 — YOLO helpers

def load_yolo(weights_path: str) -> YOLO:
    model = YOLO(weights_path)
    return model

def yolo_predict_xyxy(model: YOLO, img_bgr: np.ndarray, conf: float, imgsz: int, max_det: int):
    '''
    Returns: boxes_xyxy (list of tuples), confs (list of floats)
    '''
    res = model.predict(
        source=img_bgr,
        conf=conf,
        imgsz=imgsz,
        max_det=max_det,
        verbose=False
    )[0]
    boxes=[]
    confs=[]
    if res.boxes is None or len(res.boxes)==0:
        return boxes, confs
    xyxy = res.boxes.xyxy.cpu().numpy()
    cs   = res.boxes.conf.cpu().numpy()
    # class name filtering: label is always drone in GT; but model might have multiple classes.
    # We keep ALL predictions and treat "best" as the highest conf prediction.
    for b,c in zip(xyxy, cs):
        x1,y1,x2,y2 = map(float,b)
        boxes.append((int(x1),int(y1),int(x2),int(y2)))
        confs.append(float(c))
    return boxes, confs


In [None]:
# Cell 7 — Motion ROI extraction (buffered inference frames → 1..3 ROIs)

def _to_gray_blur(img):
    g=cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    g=cv2.GaussianBlur(g,(5,5),0)
    return g

def _orb_affine(prev_g, curr_g):
    orb=cv2.ORB_create(800)
    kp1, des1 = orb.detectAndCompute(prev_g, None)
    kp2, des2 = orb.detectAndCompute(curr_g, None)
    if des1 is None or des2 is None or len(kp1)<8 or len(kp2)<8:
        return None
    bf=cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches=bf.match(des1, des2)
    matches=sorted(matches, key=lambda m: m.distance)[:80]
    if len(matches)<8:
        return None
    pts1=np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2=np.float32([kp2[m.trainIdx].pt for m in matches])
    M, inl = cv2.estimateAffinePartial2D(pts1, pts2, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    return M

def _ecc_affine(prev_g, curr_g):
    # ECC can be slower but stable when it converges
    warp=np.eye(2,3, dtype=np.float32)
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 1e-4)
    try:
        cc, warp = cv2.findTransformECC(prev_g, curr_g, warp, cv2.MOTION_AFFINE, criteria, None, 5)
        return warp
    except Exception:
        return None

def _motion_mask_from_pair(prev_bgr, curr_bgr, method):
    prev_g=_to_gray_blur(prev_bgr); curr_g=_to_gray_blur(curr_bgr)
    H,W=curr_g.shape

    if method == "FrameDifference":
        diff=cv2.absdiff(prev_g, curr_g)
        _,mask=cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        return mask

    if method == "ORB_Affine_StabilizedDiff":
        M=_orb_affine(prev_g, curr_g)
        if M is None:
            return None
        warped=cv2.warpAffine(prev_g, M, (W,H), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
        diff=cv2.absdiff(warped, curr_g)
        _,mask=cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        return mask

    if method == "ECC_Affine_StabilizedDiff":
        M=_ecc_affine(prev_g, curr_g)
        if M is None:
            return None
        warped=cv2.warpAffine(prev_g, M, (W,H), flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_REPLICATE)
        diff=cv2.absdiff(warped, curr_g)
        _,mask=cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        return mask

    if method == "OpticalFlowMagnitude":
        flow=cv2.calcOpticalFlowFarneback(prev_g, curr_g, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag,_=cv2.cartToPolar(flow[...,0], flow[...,1])
        mask=(mag>1.5).astype(np.uint8)*255
        return mask

    # Methods that need a background model will be handled elsewhere
    return None

def extract_motion_rois(buffer_frames: List[np.ndarray], method: str,
                        min_w: int, min_h: int, max_rois: int,
                        min_hits: int, global_coverage_thresh: float):
    '''
    buffer_frames: list of BGR frames sampled at inference times, newest last
    Returns: rois_xyxy (list), debug_mask (uint8), global_motion_flag (bool)
    '''
    if len(buffer_frames) < 2:
        return [], None, False

    H,W=buffer_frames[-1].shape[:2]

    # Background-model methods (static camera) — build a model over buffer frames and produce mask for the last frame.
    if method in ("MOG2","KNN","RunningAverage","TemporalMedian"):
        frames=buffer_frames
        if method=="TemporalMedian":
            stack=np.stack([f.astype(np.uint8) for f in frames], axis=0)
            med=np.median(stack, axis=0).astype(np.uint8)
            diff=cv2.absdiff(med, frames[-1])
            g=_to_gray_blur(diff)
            _,mask=cv2.threshold(g, 25, 255, cv2.THRESH_BINARY)
        elif method=="RunningAverage":
            avg=frames[0].astype(np.float32)
            alpha=0.1
            for fr in frames[1:]:
                cv2.accumulateWeighted(fr.astype(np.float32), avg, alpha)
            bg=cv2.convertScaleAbs(avg)
            diff=cv2.absdiff(bg, frames[-1])
            g=_to_gray_blur(diff)
            _,mask=cv2.threshold(g, 25, 255, cv2.THRESH_BINARY)
        elif method=="MOG2":
            fgbg=cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
            mask=None
            for fr in frames:
                mask=fgbg.apply(fr)
        elif method=="KNN":
            fgbg=cv2.createBackgroundSubtractorKNN(history=500, dist2Threshold=400.0, detectShadows=True)
            mask=None
            for fr in frames:
                mask=fgbg.apply(fr)
        else:
            mask=None
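        if mask is None:
            return [], None, False
        if method in ("MOG2","KNN"):
            # detectShadows=True labels shadows 127; keep only definite foreground (255)
            _,mask=cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
        debug_mask=mask.copy()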
    else:
        # Pairwise mask accumulation (robust to leaves by requiring persistence)
        masks=[]
        for i in range(len(buffer_frames)-1):
            m=_motion_mask_from_pair(buffer_frames[i], buffer_frames[i+1], method)
            if m is None:
                continue
            masks.append(m)
        if len(masks)==0:
            return [], None, False

        acc=np.zeros_like(masks[0], dtype=np.uint16)
        for m in masks:
            acc += (m>0).astype(np.uint16)
        # keep pixels that appear in at least min_hits diffs
        keep=(acc >= min_hits).astype(np.uint8)*255
        debug_mask=keep.copy()

    # Morphology to reduce speckle (leaves) and merge regions
    k=cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    m=cv2.morphologyEx(debug_mask, cv2.MORPH_OPEN, k, iterations=1)
    m=cv2.morphologyEx(m, cv2.MORPH_CLOSE, k, iterations=2)

    coverage = float(np.count_nonzero(m)) / float(H*W + 1e-9)
    if coverage > global_coverage_thresh:
        return [], m, True

    # Contours → ROIs
    cnts,_=cv2.findContours(m, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    rois=[]
    scored=[]
    for c in cnts:
        x,y,w,h=cv2.boundingRect(c)
        if w < min_w or h < min_h:
            continue
        area=cv2.contourArea(c)
        extent=area / float(w*h + 1e-9)   # leaves often have low extent (wispy)
        if extent < 0.15:
            continue
        rois.append((x,y,x+w,y+h))
        scored.append(area)

    if len(rois)==0:
        return [], m, False

    # pick up to max_rois by motion area
    order=np.argsort(scored)[::-1]
    rois=[rois[i] for i in order[:max_rois]]
    return rois, m, False


In [None]:
# Cell 8 — Tiled inference (slicing) for tiny-object rescue / verification

def tiled_inference(model: YOLO, frame_bgr: np.ndarray, conf: float, imgsz: int,
                    tile: int, overlap: float, max_det_per_tile: int,
                    debug_draw: bool=False):
    '''
    Runs YOLO on overlapping tiles and merges boxes with NMS.
    Returns: merged_boxes_xyxy, merged_confs, debug_image (optional), tiles_xyxy
    '''
    H,W=frame_bgr.shape[:2]
    step=int(tile*(1.0-overlap))
    step=max(1, step)

    tiles=[]
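    for y in range(0, H, step):
        for x in range(0, W, step):
            x2=min(W, x+tile)
            y2=min(H, y+tile)
            x1=max(0, x2-tile)
            y1=max(0, y2-tile)
            tiles.append((x1,y1,x2,y2))
            if x+tile>=W:
                break  # last column reached; larger x would repeat the right-edge tile
        if y+tile>=H:
            break  # last row reached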

    all_boxes=[]
    all_scores=[]
    for (x1,y1,x2,y2) in tiles:
        crop=frame_bgr[y1:y2, x1:x2]
        b, s = yolo_predict_xyxy(model, crop, conf=conf, imgsz=imgsz, max_det=max_det_per_tile)
        for bb,sc in zip(b,s):
            bx1,by1,bx2,by2=bb
            all_boxes.append((bx1+x1, by1+y1, bx2+x1, by2+y1))
            all_scores.append(sc)

    if len(all_boxes)==0:
        dbg=None
        if debug_draw:
            dbg=frame_bgr.copy()
            for (x1,y1,x2,y2) in tiles:
                cv2.rectangle(dbg, (x1,y1),(x2,y2),(255,255,0),1)
        return [], [], dbg, tiles

    keep=nms_xyxy(all_boxes, all_scores, iou_thresh=0.5)
    boxes=[all_boxes[i] for i in keep]
    scores=[all_scores[i] for i in keep]

    dbg=None
    if debug_draw:
        dbg=frame_bgr.copy()
        for (x1,y1,x2,y2) in tiles:
            cv2.rectangle(dbg, (x1,y1),(x2,y2),(255,255,0),1)
        for b,s in zip(boxes,scores):
            cv2.rectangle(dbg, (b[0],b[1]),(b[2],b[3]),(0,255,255),2)
            cv2.putText(dbg, f"{s:.2f}", (b[0], max(0,b[1]-5)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0,255,255), 1, cv2.LINE_AA)

    return boxes, scores, dbg, tiles


In [None]:
# Cell 9 — Continuity gates (selectable)

@dataclass
class Kalman2D:
    # state: [x, y, vx, vy]
    x: np.ndarray
    P: np.ndarray

def kalman_init(cx, cy, q, r):
    x=np.array([[cx],[cy],[0.0],[0.0]], dtype=np.float32)
    P=np.eye(4, dtype=np.float32)*1000.0
    return Kalman2D(x=x, P=P)

def kalman_predict(k: Kalman2D, dt: float, q: float):
    F=np.array([[1,0,dt,0],
                [0,1,0,dt],
                [0,0,1,0],
                [0,0,0,1]], dtype=np.float32)
    Q=np.eye(4, dtype=np.float32)*q
    k.x = F @ k.x
    k.P = F @ k.P @ F.T + Q
    return k

def kalman_update(k: Kalman2D, z: np.ndarray, r: float):
    H=np.array([[1,0,0,0],
                [0,1,0,0]], dtype=np.float32)
    R=np.eye(2, dtype=np.float32)*r
    y = z - (H @ k.x)
    S = H @ k.P @ H.T + R
    K = k.P @ H.T @ np.linalg.inv(S)
    k.x = k.x + K @ y
    I=np.eye(4, dtype=np.float32)
    k.P = (I - K @ H) @ k.P
    return k, y, S

def continuity_accept(prev_box_xyxy, curr_box_xyxy, mode: str, W: int, H: int,
                      kalman_state: Optional[Kalman2D], dt: float):
    '''
    Returns: (accept_bool, debug_shapes_dict, updated_kalman_state)
    debug_shapes_dict may contain boxes/circles to draw.
    '''
    debug={}
    if prev_box_xyxy is None or curr_box_xyxy is None:
        return True, debug, kalman_state

    if mode == "ExpandedIoU":
        exp=expand_box_xyxy(prev_box_xyxy, EXPAND_IOU_K, W, H)
        debug["expanded_prev"]=exp
        i=iou_xyxy(exp, curr_box_xyxy)
        debug["expanded_iou"]=i
        return (i >= EXPAND_IOU_THRESH), debug, kalman_state

    if mode == "CenterDistance":
        px1,py1,px2,py2=prev_box_xyxy
        cxp=(px1+px2)/2; cyp=(py1+py2)/2
        cx,cy=((curr_box_xyxy[0]+curr_box_xyxy[2])/2, (curr_box_xyxy[1]+curr_box_xyxy[3])/2)
        area=max(1.0, box_area_xyxy(prev_box_xyxy))
        thr = CENTERDIST_ALPHA*math.sqrt(area) + CENTERDIST_BETA
        d=math.hypot(cx-cxp, cy-cyp)
        debug["center_prev"]=(int(cxp),int(cyp))
        debug["center_thr"]=thr
        debug["center_curr"]=(int(cx),int(cy))
        return (d <= thr), debug, kalman_state

    if mode == "Kalman":
        # Use curr center as measurement, gate by Mahalanobis distance
        cx,cy=((curr_box_xyxy[0]+curr_box_xyxy[2])/2, (curr_box_xyxy[1]+curr_box_xyxy[3])/2)
        if kalman_state is None:
            kalman_state = kalman_init(cx, cy, KALMAN_Q, KALMAN_R)
            debug["kalman_pred_center"]=(int(cx),int(cy))
            return True, debug, kalman_state

        kalman_state = kalman_predict(kalman_state, dt, KALMAN_Q)
        z=np.array([[cx],[cy]], dtype=np.float32)
        # Gate before update: compute innovation y and S without applying update
        Hm=np.array([[1,0,0,0],[0,1,0,0]], dtype=np.float32)
        R=np.eye(2, dtype=np.float32)*KALMAN_R
        y = z - (Hm @ kalman_state.x)
        S = Hm @ kalman_state.P @ Hm.T + R
        # Mahalanobis distance
        Sinv=np.linalg.inv(S)
        md = float((y.T @ Sinv @ y).squeeze())
        pred=(float(kalman_state.x[0,0]), float(kalman_state.x[1,0]))
        debug["kalman_pred_center"]=(int(pred[0]), int(pred[1]))
        debug["kalman_md"]=md

        accept = (md <= KALMAN_GATE_MAHALANOBIS)
        if accept:
            kalman_state,_,_ = kalman_update(kalman_state, z, KALMAN_R)
        return accept, debug, kalman_state

    return True, debug, kalman_state


In [None]:
# Cell 10 — Main runner (writes annotated video + logs + summary CSV)

def crop_roi_center(frame_bgr, cx, cy, size):
    H,W=frame_bgr.shape[:2]
    half=size//2
    x1=int(clamp(cx-half, 0, W-size))
    y1=int(clamp(cy-half, 0, H-size))
    x2=x1+size; y2=y1+size
    crop=frame_bgr[y1:y2, x1:x2].copy()
    return crop, (x1,y1,x2,y2)

def best_detection(boxes, confs):
    if len(boxes)==0:
        return None, None
    i=int(np.argmax(confs))
    return boxes[i], confs[i]

def write_overlay(img, lines):
    y=20
    for line in lines:
        cv2.putText(img, line, (10,y), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (255,255,255), 2, cv2.LINE_AA)
        cv2.putText(img, line, (10,y), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (0,0,0), 1, cv2.LINE_AA)
        y += 18

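def compute_map50_inference_frames(pred_records, gt_dict, iou_thr=0.5):
    '''
    pred_records: one dict per inference frame (including frames with no prediction):
      {"frame_id": int, "pred_conf": float, "pred_box_xyxy": (x1,y1,x2,y2) or None}
    Single-class AP50 (11-point interpolated) across inference frames only.
    '''
    # Restrict GT to inference frames; counting skipped frames would cap recall near 1/infer_stride
    infer_fids={r["frame_id"] for r in pred_records}
    # Build list of predictions with frame association
    preds=[]
    gts_by_frame={}
    for fid,gtxywh in gt_dict.items():
        if gtxywh is None or fid not in infer_fids:
            continue
        gts_by_frame[fid]=xywh_to_xyxy(gtxywh)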

    for r in pred_records:
        fid=r["frame_id"]
        b=r.get("pred_box_xyxy")
        c=r.get("pred_conf")
        if b is None:
            continue
        preds.append((c,fid,b))

    if len(preds)==0:
        return 0.0

    preds.sort(key=lambda x: x[0], reverse=True)

    matched=set()
    tp=[]
    fp=[]
    for conf,fid,box in preds:
        gt=gts_by_frame.get(fid)
        if gt is None:
            fp.append(1); tp.append(0); continue
        i=iou_xyxy(box, gt)
        if i>=iou_thr and fid not in matched:
            matched.add(fid)
            tp.append(1); fp.append(0)
        else:
            fp.append(1); tp.append(0)

    tp=np.cumsum(tp); fp=np.cumsum(fp)
    rec = tp / max(1, len(gts_by_frame))
    prec = tp / np.maximum(1, (tp+fp))
    # 11-point interpolated AP (simple, stable)
    ap=0.0
    for t in np.linspace(0,1,11):
        p = prec[rec>=t].max() if np.any(rec>=t) else 0.0
        ap += p/11.0
    return float(ap)

def run_benchmark():
    # Load model + GT
    model = load_yolo(MODEL_WEIGHTS)
    gt = read_annotation_file(ANNOTATION_PATH) if os.path.exists(ANNOTATION_PATH) else {}

    cap=cv2.VideoCapture(VIDEO_PATH)
    if not cap.isOpened():
        raise RuntimeError(f"Cannot open video: {VIDEO_PATH}")

    fps=cap.get(cv2.CAP_PROP_FPS)
    if fps <= 0:
        fps = 30.0
    infer_stride=max(1, int(round(fps / float(INFER_FPS))))

    vid_base=os.path.splitext(os.path.basename(VIDEO_PATH))[0]
    run_name=f"{vid_base}__infer{INFER_FPS}fps__pre-{PREPROCESS_MODE}__motion-{MOTION_METHOD}__cont-{CONTINUITY_GATE_MODE}__verify-{VERIFY_MODE}"
    out_video=os.path.join(OUTPUT_ROOT, run_name + ".mp4")
    out_log=os.path.join(OUTPUT_ROOT, run_name + "__per_frame_log.csv")

    # debug folders
    motion_dir=os.path.join(OUTPUT_ROOT, "motion_debug")
    tiled_dir=os.path.join(OUTPUT_ROOT, "tiled_debug")
    os.makedirs(motion_dir, exist_ok=True)
    os.makedirs(tiled_dir, exist_ok=True)

    W=int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    H=int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fourcc=cv2.VideoWriter_fourcc(*"mp4v")
    vw=cv2.VideoWriter(out_video, fourcc, fps, (W,H))

    # Master summary CSV
    master_csv=os.path.join(OUTPUT_ROOT, "benchmark_runs_summary.csv")
    master_exists=os.path.exists(master_csv)

    # Baseline override logic (your rule): PREPROCESS_MODE="None" and VERIFY_MODE="None" force the baseline.
    # PREPROCESS_MODE="None" on its own already means full-frame YOLO only (no guided or motion ROIs).
    baseline_mode = (PREPROCESS_MODE=="None" and VERIFY_MODE=="None")
    preprocess_off = (PREPROCESS_MODE=="None")
    eff_use_guided = False if preprocess_off else USE_GUIDED_ROI
    eff_use_motion = False if preprocess_off else USE_MOTION_ROIS
    eff_verify_mode = "None" if baseline_mode else VERIFY_MODE
    eff_force_verify = False if baseline_mode else FORCE_VERIFY_SMALL_OR_LOWCONF
    eff_verify_all = False if baseline_mode else ENABLE_CONFIRMATION_GUIDED_ROI_ALL
    eff_tiny_rescue = False if baseline_mode else ENABLE_TINY_RESCUE

    # State
    infer_idx=0
    no_drone_streak=0
    last_det_box=None
    last_det_conf=None
    last_det_source="None"

    # Confirmation windows
    confirm_hist=[]   # list of booleans (hits), length <= CONFIRM_WINDOW
    warn_hist=[]
    confirm_prev_box=None
    warn_prev_box=None
    kalman_confirm=None
    kalman_warn=None

    confirmed_events=0
    warning_events=0

    # Metrics accumulators
    per_frame_rows=[]
    infer_rows=[]
    t_pre=[]; t_inf=[]; t_post=[]

    # For stable overlay on every frame
    last_overlay = {
        "last_infer_decision":"NO DETECTION",
        "last_infer_source":"None",
        "last_infer_conf":0.0,
        "last_infer_box":None,
        "last_infer_rois":[],
        "last_infer_motion_rois":[],
        "last_infer_motion_candidates":[],
        "last_infer_guided_roi":None,
        "last_infer_continuity_dbg":{},
        "last_infer_verified":False,
        "last_infer_verify_mode":"None",
        "last_infer_motion_global":False,
        "last_infer_yolo_calls":0,
        "confirm_hits":0,
        "warn_hits":0
    }

    debug_export_count=0

    frame_id=0
    while True:
        ok, frame = cap.read()
        if not ok:
            break

        is_infer = (frame_id % infer_stride == 0)

        # Untouched copy for all model inputs (YOLO crops, motion buffer, verification crops).
        # Overlays (GT box, ROIs, text) are drawn on `frame` and must never reach YOLO or the motion detector.
        frame_raw = frame.copy()

        # Draw GT (optional)
        gt_box_xywh = gt.get(frame_id) if gt else None
        gt_box_xyxy = xywh_to_xyxy(gt_box_xywh) if (gt_box_xywh is not None) else None
        if DRAW_GT_BOX and gt_box_xyxy is not None:
            x1,y1,x2,y2=gt_box_xyxy
            cv2.rectangle(frame, (x1,y1),(x2,y2),(0,255,0),2)
            cv2.putText(frame, "GT drone", (x1, max(0,y1-6)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,255,0), 2, cv2.LINE_AA)

        # Defaults for this frame's logs
        pred_box=None
        pred_conf=0.0
        pred_source="None"
        verified=False
        verify_mode_used="None"
        yolo_calls=0
        motion_rois=[]
        motion_infer_rois=[]
        guided_roi_box=None
        continuity_debug_shapes={}
        motion_global=False

        # Motion buffer stores clean inference frames only; created on frame 0 and kept across loop iterations
        if frame_id==0:
            infer_buffer=[]

        if is_infer:
            t0=time.perf_counter()

            # Update inference buffer (clean frames, no overlays)
            infer_buffer.append(frame_raw)
            if len(infer_buffer) > MOTION_BUFFER_INFER_FRAMES:
                infer_buffer = infer_buffer[-MOTION_BUFFER_INFER_FRAMES:]

            # Decide main input strategy
            use_motion_now = eff_use_motion and (no_drone_streak >= NO_DRONE_STREAK_FOR_MOTION) and (len(infer_buffer) >= MOTION_BUFFER_INFER_FRAMES)
            use_guided_now = eff_use_guided and (last_det_box is not None) and (not use_motion_now)

            inputs=[]   # list of (tag, img, offset_xy) where offset is (ox,oy) to map back to full frame
            rois_for_draw=[]

            if use_motion_now:
                motion_rois, motion_mask, motion_global = extract_motion_rois(
                    infer_buffer,
                    MOTION_METHOD,
                    MIN_MOVER_W, MIN_MOVER_H, MAX_MOVER_ROIS,
                    MOTION_MIN_HITS,
                    MOTION_GLOBAL_COVERAGE_THRESH
                )
                if motion_global or len(motion_rois)==0:
                    # fallback full frame
                    inputs=[("FullFrame", frame_raw, (0,0))]
                else:
                    # 1..3 ROI crops, centered on ROI center
                    for r in motion_rois:
                        x1,y1,x2,y2=r
                        cx=int((x1+x2)/2); cy=int((y1+y2)/2)
                        crop, roi_xyxy = crop_roi_center(frame_raw, cx, cy, ROI_SIZE)
                        rois_for_draw.append(("MotionROI", roi_xyxy))
                        motion_infer_rois.append(roi_xyxy)
                        inputs.append(("MotionROI", crop, (roi_xyxy[0], roi_xyxy[1])))

                    # Export motion debug image (drawn on the overlay frame, so GT stays visible for inspection)
                    if EXPORT_MOTION_DEBUG_IMAGES and debug_export_count < MAX_EXPORT_DEBUG_IMAGES:
                        dbg=frame.copy()
                        if motion_mask is not None:
                            small=cv2.resize(motion_mask, (int(W*0.25), int(H*0.25)))
                            small=cv2.cvtColor(small, cv2.COLOR_GRAY2BGR)
                            dbg[0:small.shape[0], W-small.shape[1]:W] = small
                        for _,roi in rois_for_draw:
                            cv2.rectangle(dbg, (roi[0],roi[1]),(roi[2],roi[3]),(0,255,255),2)
                        cv2.imwrite(os.path.join(motion_dir, f"{run_name}__infer{infer_idx:06d}.jpg"), dbg)
                        debug_export_count += 1

            elif use_guided_now:
                # guided ROI around last detection center
                x1,y1,x2,y2=last_det_box
                cx=int((x1+x2)/2); cy=int((y1+y2)/2)
                crop, roi_xyxy = crop_roi_center(frame_raw, cx, cy, ROI_SIZE)
                guided_roi_box=roi_xyxy
                inputs=[("GuidedROI", crop, (roi_xyxy[0], roi_xyxy[1]))]
                rois_for_draw.append(("GuidedROI", roi_xyxy))

            else:
                inputs=[("FullFrame", frame_raw, (0,0))]

            t1=time.perf_counter()

            # Run YOLO on inputs (full or 1..3 rois). Still counts as ONE inference step.
            all_candidates=[]
            per_input_detect_counts=[]
            for tag, img_in, (ox,oy) in inputs:
                conf_th = BASE_CONF_FULL if tag!="GuidedROI" else GUIDED_ROI_CONF
                boxes, confs = yolo_predict_xyxy(model, img_in, conf=conf_th, imgsz=ROI_SIZE, max_det=(MAX_ROI_DETECTIONS if tag!="FullFrame" else MAX_FULLFRAME_DETECTIONS))
                yolo_calls += 1
                per_input_detect_counts.append((tag, len(boxes)))
                for b,c in zip(boxes, confs):
                    bb=(b[0]+ox, b[1]+oy, b[2]+ox, b[3]+oy)
                    all_candidates.append((c, bb, tag))

            # Pick best candidate across inputs
            if len(all_candidates)>0:
                all_candidates.sort(key=lambda x: x[0], reverse=True)
                pred_conf, pred_box, pred_source = all_candidates[0]
            else:
                pred_box=None
                pred_conf=0.0
                pred_source="None"

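            # --- Verification pass (same YOLO weights) on the SAME inference step ---
            # This is NOT the 4/5 confirmation. Required when we have a detection AND either verify-all is enabled,
            # or force-verify is enabled and the detection is low-confidence or smaller than MIN_MOVER_W x MIN_MOVER_H.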
            verify_required=False
            if pred_box is not None:
                pw,ph=box_wh_xyxy(pred_box)
                if eff_force_verify and (pred_conf < LOW_CONF_DETECTION or pw < MIN_MOVER_W or ph < MIN_MOVER_H):
                    verify_required=True
                if eff_verify_all:
                    verify_required=True

            if pred_box is not None and verify_required:
                # Determine mode used (if user chose None but verification is required, we force HighConf)
                mode = eff_verify_mode if eff_verify_mode!="None" else "DoublePassHighConf"
                verify_mode_used=mode

                x1,y1,x2,y2=pred_box
                cx=int((x1+x2)/2); cy=int((y1+y2)/2)

                vt0=time.perf_counter()
                if mode=="DoublePassHighConf":
                    crop, roi_xyxy = crop_roi_center(frame, cx, cy, ROI_SIZE)
                    boxes2, confs2 = yolo_predict_xyxy(model, crop, conf=VERIFY_HIGH_CONF, imgsz=ROI_SIZE, max_det=1)
                    yolo_calls += 1
                    b2,c2=best_detection(boxes2, confs2)
                    if b2 is not None:
                        b2=(b2[0]+roi_xyxy[0], b2[1]+roi_xyxy[1], b2[2]+roi_xyxy[0], b2[3]+roi_xyxy[1])
                        pred_box=b2
                        pred_conf=c2
                        verified=True
                    else:
                        # Verification failed → dismiss detection
                        pred_box=None
                        pred_conf=0.0
                        pred_source="None"
                        verified=False

                elif mode=="DoublePassHighRes":
                    crop, roi_xyxy = crop_roi_center(frame, cx, cy, ROI_SIZE)
                    boxes2, confs2 = yolo_predict_xyxy(model, crop, conf=BASE_CONF_FULL, imgsz=VERIFY_HIGHRES_IMGSZ, max_det=1)
                    yolo_calls += 1
                    b2,c2=best_detection(boxes2, confs2)
                    if b2 is not None:
                        b2=(b2[0]+roi_xyxy[0], b2[1]+roi_xyxy[1], b2[2]+roi_xyxy[0], b2[3]+roi_xyxy[1])
                        pred_box=b2
                        pred_conf=c2
                        verified=True
                    else:
                        pred_box=None
                        pred_conf=0.0
                        pred_source="None"
                        verified=False

                elif mode=="DoublePassTiled":
                    # Tiled verification on full frame (best for tiny objects), save debug image
                    boxes2, confs2, dbg, tiles = tiled_inference(model, frame, conf=BASE_CONF_FULL, imgsz=ROI_SIZE,
                                                                tile=TINY_RESCUE_TILED_TILE, overlap=TINY_RESCUE_TILED_OVERLAP,
                                                                max_det_per_tile=1, debug_draw=EXPORT_TILED_DEBUG_IMAGES)
                    yolo_calls += len(tiles)
                    b2,c2=best_detection(boxes2, confs2)
                    if EXPORT_TILED_DEBUG_IMAGES and dbg is not None and debug_export_count < MAX_EXPORT_DEBUG_IMAGES:
                        cv2.imwrite(os.path.join(tiled_dir, f"{run_name}__infer{infer_idx:06d}.jpg"), dbg)
                        debug_export_count += 1
                    if b2 is not None:
                        pred_box=b2
                        pred_conf=c2
                        verified=True
                    else:
                        pred_box=None
                        pred_conf=0.0
                        pred_source="None"
                        verified=False

                vt1=time.perf_counter()
                t_post.append((vt1-vt0)*1000.0)
            else:
                t_post.append(0.0)

            # Tiny-object rescue mode (only if enabled and we still have a detection)
            if eff_tiny_rescue and pred_box is not None and eff_verify_mode != "DoublePassTiled":
                pw,ph=box_wh_xyxy(pred_box)
                if pw < TINY_RESCUE_MIN_WH or ph < TINY_RESCUE_MIN_WH:
                    rt0=time.perf_counter()
                    boxes2, confs2, dbg, tiles = tiled_inference(model, frame, conf=BASE_CONF_FULL, imgsz=ROI_SIZE,
                                                                tile=TINY_RESCUE_TILED_TILE, overlap=TINY_RESCUE_TILED_OVERLAP,
                                                                max_det_per_tile=1, debug_draw=EXPORT_TILED_DEBUG_IMAGES)
                    yolo_calls += len(tiles)
                    b2,c2=best_detection(boxes2, confs2)
                    if EXPORT_TILED_DEBUG_IMAGES and dbg is not None and debug_export_count < MAX_EXPORT_DEBUG_IMAGES:
                        cv2.imwrite(os.path.join(tiled_dir, f"{run_name}__rescue_infer{infer_idx:06d}.jpg"), dbg)
                        debug_export_count += 1
                    if b2 is not None and c2 > pred_conf:
                        pred_box=b2
                        pred_conf=c2
                        verified=True
                    rt1=time.perf_counter()
                    t_post[-1] += (rt1-rt0)*1000.0

            t2=time.perf_counter()

            # Update streak + last detection (for guided ROI next time)
            hit = (pred_box is not None)
            if hit:
                no_drone_streak=0
                last_det_box=pred_box
                last_det_conf=pred_conf
                last_det_source=pred_source
            else:
                no_drone_streak += 1

            # Confirmation windows logic
            # A hit enters the warning window if both sides are >= WARNING_MIN_WH (15x15),
            # and the big-confirm window if both sides are >= CONFIRM_MIN_WH_BIG (25x25) OR it passed verification.
            big_ok=False
            warn_ok=False
            if hit:
                pw,ph=box_wh_xyxy(pred_box)
                warn_ok = (pw>=WARNING_MIN_WH and ph>=WARNING_MIN_WH)
                big_ok = (pw>=CONFIRM_MIN_WH_BIG and ph>=CONFIRM_MIN_WH_BIG) or verified

            # Continuity gating per window
            if warn_ok:
                accept, dbg_shapes, kalman_warn = continuity_accept(warn_prev_box, pred_box, CONTINUITY_GATE_MODE, W,H, kalman_warn, dt=1.0/INFER_FPS)
                if accept:
                    warn_hist.append(True)
                    warn_prev_box=pred_box
                    continuity_debug_shapes["warn"]=dbg_shapes
                else:
                    warn_hist.append(False)
            else:
                warn_hist.append(False)

            if big_ok:
                accept, dbg_shapes, kalman_confirm = continuity_accept(confirm_prev_box, pred_box, CONTINUITY_GATE_MODE, W,H, kalman_confirm, dt=1.0/INFER_FPS)
                if accept:
                    confirm_hist.append(True)
                    confirm_prev_box=pred_box
                    continuity_debug_shapes["confirm"]=dbg_shapes
                else:
                    confirm_hist.append(False)
            else:
                confirm_hist.append(False)

            confirm_hist=confirm_hist[-CONFIRM_WINDOW:]
            warn_hist=warn_hist[-CONFIRM_WINDOW:]

            if sum(confirm_hist) >= CONFIRM_REQUIRED_HITS:
                confirmed_events += 1
                confirm_hist=[]  # reset event window
                confirm_prev_box=None
                kalman_confirm=None

            if sum(warn_hist) >= CONFIRM_REQUIRED_HITS:
                warning_events += 1
                warn_hist=[]
                warn_prev_box=None
                kalman_warn=None

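            # timings: t2-t1 also spans verification and tiny rescue, which t_post[-1] already holds,
            # so subtract it to keep pre / infer / post additive instead of double-counting post time
            t_pre.append((t1-t0)*1000.0)
            t_inf.append((t2-t1)*1000.0 - t_post[-1])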

            # record inference frame for AP/mAP
            infer_rows.append({
                "frame_id": frame_id,
                "pred_box_xyxy": pred_box,
                "pred_conf": float(pred_conf) if pred_box is not None else None
            })

            # Store overlay state (used on all subsequent non-infer frames too)
            last_overlay.update({
                "last_infer_decision":"DETECTION" if hit else "NO DETECTION",
                "last_infer_source":pred_source,
                "last_infer_conf":float(pred_conf),
                "last_infer_box":pred_box,
                "last_infer_motion_rois": motion_infer_rois,
                "last_infer_motion_candidates": motion_rois,
                "last_infer_guided_roi": guided_roi_box,
                "last_infer_continuity_dbg": continuity_debug_shapes.get("confirm", {}),
                "last_infer_verified": verified,
                "last_infer_verify_mode": verify_mode_used,
                "last_infer_motion_global": motion_global,
                "last_infer_yolo_calls": yolo_calls,
                "confirm_hits": sum(confirm_hist),
                "warn_hits": sum(warn_hist),
            })

            infer_idx += 1

        # Draw predicted box from LAST inference (stable overlay on all frames)
        if last_overlay["last_infer_box"] is not None:
            b=last_overlay["last_infer_box"]
            cv2.rectangle(frame, (b[0],b[1]),(b[2],b[3]),(0,0,255),2)
            cv2.putText(frame, f"Pred drone {last_overlay['last_infer_conf']:.2f}",
                        (b[0], max(0,b[1]-6)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,0,255), 2, cv2.LINE_AA)

        # Draw motion ROIs (last inference)
        if DRAW_MOTION_ROI_BOXES:
            for r in last_overlay["last_infer_motion_rois"][:MAX_MOVER_ROIS]:
                x1,y1,x2,y2=r
                cv2.rectangle(frame, (x1,y1),(x2,y2),(0,255,255),2)
                cv2.putText(frame, "Motion ROI", (x1, min(H-5,y2+18)), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (0,255,255), 2, cv2.LINE_AA)

        # Draw guided ROI (last inference)
        if DRAW_GUIDED_ROI_BOX and last_overlay["last_infer_guided_roi"] is not None:
            x1,y1,x2,y2=last_overlay["last_infer_guided_roi"]
            cv2.rectangle(frame, (x1,y1),(x2,y2),(255,0,0),2)
            cv2.putText(frame, "Guided ROI", (x1, min(H-5,y2+18)), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (255,0,0), 2, cv2.LINE_AA)

        # Draw continuity debug (from last inference only, to avoid flicker)
        if DRAW_CONTINUITY_DEBUG:
            # confirm window shapes
            dbg=last_overlay.get("last_infer_continuity_dbg")
            if dbg and "expanded_prev" in dbg:
                x1,y1,x2,y2=dbg["expanded_prev"]
                cv2.rectangle(frame, (x1,y1),(x2,y2),(255,255,0),2)
                cv2.putText(frame, "Continuity gate (expanded)", (x1, max(0,y1-6)), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (255,255,0), 2, cv2.LINE_AA)
            if dbg and "center_prev" in dbg:
                cx,cy=dbg["center_prev"]; thr=int(dbg["center_thr"])
                cv2.circle(frame, (cx,cy), thr, (255,255,0), 2)
                cv2.putText(frame, "Continuity gate (center)", (cx, max(0,cy-thr-6)), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (255,255,0), 2, cv2.LINE_AA)
            if dbg and "kalman_pred_center" in dbg:
                cx,cy=(int(round(v)) for v in dbg["kalman_pred_center"])  # cv2.circle needs integer pixel coordinates
                cv2.circle(frame, (cx,cy), 20, (255,255,0), 2)
                cv2.putText(frame, "Continuity gate (kalman)", (cx, max(0,cy-26)), cv2.FONT_HERSHEY_SIMPLEX, 0.55, (255,255,0), 2, cv2.LINE_AA)

        # Overlay text (stable structure, no flicker)
        tsec = frame_id / fps
        overlay_lines = [
            f"Frame: {frame_id}  Time: {tsec:.2f}s  Inference frame: {'YES' if is_infer else 'NO'}  (stride={infer_stride})",
            f"Baseline mode: {baseline_mode}   Preprocess: {PREPROCESS_MODE}   Use guided ROI: {eff_use_guided}   Use motion ROIs: {eff_use_motion}",
            f"Motion method: {MOTION_METHOD}   Verify mode: {eff_verify_mode}   Verify all: {eff_verify_all}   Force verify small/lowconf: {eff_force_verify}",
            f"Last inference: {last_overlay['last_infer_decision']}  Source: {last_overlay['last_infer_source']}  Conf: {last_overlay['last_infer_conf']:.2f}  Verified: {last_overlay['last_infer_verified']} ({last_overlay['last_infer_verify_mode']})",
            f"No-drone streak: {no_drone_streak}  Motion triggered: {('YES' if (eff_use_motion and no_drone_streak>=NO_DRONE_STREAK_FOR_MOTION) else 'NO')}  Motion global discarded: {last_overlay['last_infer_motion_global']}",
            f"Confirm window hits (big >= {CONFIRM_MIN_WH_BIG}px): {last_overlay['confirm_hits']}/{CONFIRM_WINDOW}   Confirmed events: {confirmed_events}",
            f"Warning window hits (>= {WARNING_MIN_WH}px): {last_overlay['warn_hits']}/{CONFIRM_WINDOW}   Warning events: {warning_events}",
            f"YOLO calls in last inference step: {last_overlay['last_infer_yolo_calls']}"
        ]
        write_overlay(frame, overlay_lines)

        # Per-frame log row
        row={
            "frame_id": frame_id,
            "time_s": tsec,
            "is_infer_frame": bool(is_infer),
            "gt_has_drone": bool(gt_box_xyxy is not None) if gt else None,
            "gt_box_xyxy": gt_box_xyxy,
            "last_pred_has_drone": bool(last_overlay["last_infer_box"] is not None),
            "last_pred_conf": last_overlay["last_infer_conf"],
            "last_pred_source": last_overlay["last_infer_source"],
            "last_pred_verified": last_overlay["last_infer_verified"],
            "confirmed_events": confirmed_events,
            "warning_events": warning_events
        }
        per_frame_rows.append(row)

        vw.write(frame)
        frame_id += 1

    cap.release()
    vw.release()

    # Compute summary metrics
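    # Inference-frame PR metrics (frame-level presence only): a TP means GT and prediction both contain a drone.
    # Boxes are NOT IoU-matched here; localization is scored by mAP50 below.
    # The prediction for each inference frame is that step's final decision (already stored in infer_rows).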
    y_true=[]; y_pred=[]
    confs=[]
    for r in infer_rows:
        fid=r["frame_id"]
        gtbox=gt.get(fid) if gt else None
        y_true.append(1 if gtbox is not None else 0)
        y_pred.append(1 if r["pred_box_xyxy"] is not None else 0)
        if r["pred_conf"] is not None:
            confs.append(float(r["pred_conf"]))

    tp=sum(1 for yt,yp in zip(y_true,y_pred) if yt==1 and yp==1)
    fp=sum(1 for yt,yp in zip(y_true,y_pred) if yt==0 and yp==1)
    fn=sum(1 for yt,yp in zip(y_true,y_pred) if yt==1 and yp==0)
    precision = tp / max(1, (tp+fp))
    recall    = tp / max(1, (tp+fn))
    f1 = (2*precision*recall)/max(1e-9, (precision+recall))
    map50 = compute_map50_inference_frames(infer_rows, gt, iou_thr=0.5) if gt else 0.0
    avg_conf = float(np.mean(confs)) if len(confs)>0 else 0.0

    summary={
        "run_name": run_name,
        "video": VIDEO_PATH,
        "weights": MODEL_WEIGHTS,
        "annotation": ANNOTATION_PATH if os.path.exists(ANNOTATION_PATH) else None,
        "fps": fps,
        "infer_fps": INFER_FPS,
        "infer_stride": infer_stride,
        "baseline_mode": baseline_mode,
        "preprocess_mode": PREPROCESS_MODE,
        "use_guided_roi": eff_use_guided,
        "use_motion_rois": eff_use_motion,
        "motion_method": MOTION_METHOD,
        "continuity_gate": CONTINUITY_GATE_MODE,
        "verify_mode": eff_verify_mode,
        "verify_all": eff_verify_all,
        "force_verify_small_lowconf": eff_force_verify,
        "tiny_rescue": eff_tiny_rescue,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mAP50_infer_frames": map50,
        "avg_pred_conf": avg_conf,
        "confirmed_events": confirmed_events,
        "warning_events": warning_events,
        "avg_pre_ms": float(np.mean(t_pre)) if len(t_pre)>0 else 0.0,
        "avg_infer_ms": float(np.mean(t_inf)) if len(t_inf)>0 else 0.0,
        "avg_post_ms": float(np.mean(t_post)) if len(t_post)>0 else 0.0,
    }

    # Save per-frame log
    pd.DataFrame(per_frame_rows).to_csv(out_log, index=False)

    # Append to master summary CSV
    df_row=pd.DataFrame([summary])
    if master_exists:
        df_row.to_csv(master_csv, mode="a", header=False, index=False)
    else:
        df_row.to_csv(master_csv, mode="w", header=True, index=False)

    print("Saved video:", out_video)
    print("Saved per-frame log:", out_log)
    print("Updated master summary:", master_csv)
    print("\nSummary:")
    for k,v in summary.items():
        print(f"  {k}: {v}")

    return summary

# Run it:
summary = run_benchmark()

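The 4-out-of-5 rule in `run_benchmark` works as follows:

- It keeps one boolean per inference step in a rolling window (`confirm_hist`, `warn_hist`).
- It fires an event once `CONFIRM_REQUIRED_HITS` of the last `CONFIRM_WINDOW` entries are accepted hits.
- It then clears the window.

The sketch below isolates that rule so it can be checked on a hand-written sequence. It is an illustration, not the notebook's code. The class `ConfirmWindow` is hypothetical, and continuity gating and the size thresholds are assumed to be already folded into the `accepted` flag.

```python
from collections import deque


class ConfirmWindow:
    """Hypothetical helper: N-out-of-M confirmation that resets after each event."""

    def __init__(self, window: int = 5, required: int = 4):
        self.required = required
        # deque(maxlen=...) drops the oldest entry on append,
        # like confirm_hist = confirm_hist[-CONFIRM_WINDOW:] in the loop.
        self.hist = deque(maxlen=window)
        self.events = 0

    def update(self, accepted: bool) -> bool:
        """Record one inference step; return True if an event fires on this step."""
        self.hist.append(bool(accepted))
        if sum(self.hist) >= self.required:
            self.events += 1
            self.hist.clear()  # reset the event window, as the loop does
            return True
        return False


# One inference step = one boolean, no matter how many ROIs were inferred in it.
steps = [True, True, False, True, True, True, True, True, True]
win = ConfirmWindow(window=5, required=4)
fired = [win.update(s) for s in steps]
print(fired)       # [False, False, False, False, True, False, False, False, True]
print(win.events)  # 2
```

Each step contributes exactly one boolean. So running 1 or 3 ROIs in a step does not change how fast an event can fire: starting from an empty window, the earliest event is on the 4th accepted step.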

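`precision`, `recall` and `f1` in the summary are frame-level presence scores. An inference frame is a true positive whenever both the annotation and the prediction contain a drone, whether or not the boxes overlap. `mAP50_infer_frames` additionally requires IoU ≥ 0.5.

`compute_map50_inference_frames` is defined outside this section, so the sketch below is an assumption about what it computes, not a copy. It implements single-class AP at IoU 0.5 with all-point interpolation, assuming at most one prediction and one ground-truth box per inference frame. The helper names are hypothetical.

```python
import numpy as np


def iou_xyxy(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_level_prf(y_true, y_pred):
    """Presence-only precision/recall/F1 (no IoU), mirroring the summary block."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / max(1, tp + fp)
    recall = tp / max(1, tp + fn)
    f1 = 2 * precision * recall / max(1e-9, precision + recall)
    return precision, recall, f1


def ap50_single_class(rows, gt, iou_thr=0.5):
    """All-point-interpolated AP for one class, <= 1 prediction and <= 1 GT box per frame.

    rows: dicts with "frame_id", "pred_box_xyxy", "pred_conf" (same shape as infer_rows).
    gt:   dict frame_id -> (x1, y1, x2, y2); frames without a drone are missing or None.
    """
    n_gt = sum(1 for fid in {r["frame_id"] for r in rows} if gt.get(fid) is not None)
    if n_gt == 0:
        return 0.0
    preds = sorted((r for r in rows if r["pred_box_xyxy"] is not None),
                   key=lambda r: r["pred_conf"], reverse=True)
    tp = np.zeros(len(preds))
    for i, r in enumerate(preds):
        g = gt.get(r["frame_id"])
        # One GT and one prediction per frame, so a GT box can be matched at most once.
        if g is not None and iou_xyxy(r["pred_box_xyxy"], g) >= iou_thr:
            tp[i] = 1.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / n_gt
    precision = cum_tp / np.arange(1, len(preds) + 1)
    # Precision envelope: make precision non-increasing in recall, then sum over recall steps.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

Consider a frame whose predicted box misses the drone entirely (IoU below 0.5). `frame_level_prf` still counts it as a true positive, while `ap50_single_class` counts it as a false positive plus a missed GT.

Passing `infer_rows` and the notebook's `gt` should give a number comparable to `mAP50_infer_frames`, since for a single class mAP50 and AP50 coincide. Small differences are possible if the notebook's helper interpolates differently.
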
## Notes for your current problem (far-away drones vs birds)

- If birds are confidently detected as drones, repetition confirmation alone won't fix it: a bird that YOLO keeps labeling `drone` will also collect 4 hits out of 5.
  You need either:
  - verification passes (same YOLO weights) + stricter continuity gating, or
  - better training (bird as an explicit class, or a separate verifier).

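- Tiny-object rescue via tiling is the safest way to improve far-away detections without hallucinating pixels. Each tile is downscaled much less (or not at all) before inference, so the drone keeps more of its real pixels.
  Super-resolution can sometimes help, but it often introduces artifacts that detectors misread.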
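
`tiled_inference` is not shown in this section. The sketch below covers only the geometry such a helper needs:

- where tiles start along each axis, for a given tile size and fractional overlap;
- how a box found inside a tile is shifted back to frame coordinates, the same way the main loop adds the ROI offset `(ox, oy)`.

The names `tile_origins`, `tile_grid` and `tile_to_frame` are assumptions, as is the choice to clamp the last tile flush with the frame border. The notebook's helper may lay out tiles differently.

```python
def tile_origins(length: int, tile: int, overlap: float) -> list:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if tile >= length:
        return [0]  # one (possibly smaller) tile covers the whole axis
    stride = max(1, int(round(tile * (1.0 - overlap))))
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # clamp the last tile flush with the border
    return starts


def tile_grid(width: int, height: int, tile: int, overlap: float) -> list:
    """All tile rectangles (x1, y1, x2, y2) for a width x height frame, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]


def tile_to_frame(box, tile_xyxy):
    """Shift a box predicted inside a tile back to full-frame coordinates."""
    ox, oy = tile_xyxy[0], tile_xyxy[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


# Cropping a frame (H x W x 3 array) tile by tile would look like:
#   for (x1, y1, x2, y2) in tile_grid(W, H, 640, 0.2):
#       crop = frame[y1:y2, x1:x2]
tiles = tile_grid(1920, 1080, 640, 0.2)
print(len(tiles))  # 8
print(tiles[-1])   # (1280, 440, 1920, 1080)
```

Take 640-pixel tiles with 20% overlap as an example. These are illustrative values, not necessarily `TINY_RESCUE_TILED_TILE` / `TINY_RESCUE_TILED_OVERLAP`. A 1920×1080 frame then needs 8 tiles, so one rescue or tiled-verification step costs 8 YOLO calls, which is what `yolo_calls += len(tiles)` records.

The sketch does not cover merging duplicate detections from neighbouring overlapping tiles (for example with NMS).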

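"Stricter continuity gating" means tightening the rule that decides whether a new hit plausibly continues the previous one in the same window. This only helps against false positives that jump around between inference steps; a bird that moves smoothly will still pass.

`continuity_accept` is not part of this section. Its debug output (`expanded_prev`, `center_prev` / `center_thr`, `kalman_pred_center`) suggests three gates: an expanded-box gate, a center-distance gate and a Kalman gate. Below is a minimal sketch of the first two. The function names, the default thresholds (`k`, `min_thr`, `scale`) and the rule that the first hit in a window is always accepted are assumptions for illustration.

```python
import math


def box_center(b):
    """Center (cx, cy) of an (x1, y1, x2, y2) box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def center_gate(prev_box, new_box, k: float = 2.0, min_thr: float = 20.0) -> bool:
    """Accept if the new center is within max(min_thr, k * max(prev w, prev h)) px of the previous center."""
    if prev_box is None:
        return True  # assumption: the first hit in a window is always accepted
    pw, ph = prev_box[2] - prev_box[0], prev_box[3] - prev_box[1]
    thr = max(min_thr, k * max(pw, ph))
    (px, py), (nx, ny) = box_center(prev_box), box_center(new_box)
    return math.hypot(nx - px, ny - py) <= thr


def expanded_box_gate(prev_box, new_box, scale: float = 3.0) -> bool:
    """Accept if the new center falls inside prev_box scaled by `scale` about its own center."""
    if prev_box is None:
        return True
    cx, cy = box_center(prev_box)
    hw = (prev_box[2] - prev_box[0]) * scale / 2.0
    hh = (prev_box[3] - prev_box[1]) * scale / 2.0
    nx, ny = box_center(new_box)
    return (cx - hw <= nx <= cx + hw) and (cy - hh <= ny <= cy + hh)


prev = (100, 100, 120, 120)                            # 20x20 px previous hit
print(center_gate(prev, (130, 100, 150, 120)))         # True: 30 px shift <= 40 px gate
print(center_gate(prev, (200, 200, 220, 220)))         # False: about 141 px shift
print(expanded_box_gate(prev, (131, 100, 151, 120)))   # False: center x=141 outside [80, 140]
```

To tighten these gates, lower `k`, `min_thr` or `scale`. At 5 FPS inference, the allowed shift per step must still cover the drone's real motion between two inference frames; otherwise genuine tracks are rejected along with the birds.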