<a href="https://colab.research.google.com/github/mehri-satari/Data-Mining-Course-Project/blob/main/YOLO_newdataset_robustness_check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 0: Environment Setup (Colab + Dependencies)

This cell prepares the Colab runtime for the full experiment. It installs the required packages (Ultralytics YOLOv8 for training/validation, OpenCV and PIL for image processing, tqdm for progress tracking, and the AV2 library for reading Argoverse 2 files). Then it mounts Google Drive so the AV2 dataset can be accessed from /content/drive/. Finally, it imports all core libraries used later for loading AV2 annotations/calibration, building the YOLO-format dataset, and training/evaluating the detector.

In [26]:
# --- Colab installs (run once) ---
!pip -q install ultralytics opencv-python pillow tqdm

# AV2: if not installed already in your env
!pip -q install av2

from google.colab import drive
drive.mount('/content/drive')

import os
import math
import json
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Tuple, List, Optional

import numpy as np
import pandas as pd
from PIL import Image
from tqdm import tqdm

import cv2
from ultralytics import YOLO

from av2.utils import io as io_utils


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Step 1: Define Scene and Load Core Inputs (Annotations + Calibration)

Step 1 is just the setup: I load the 3D annotations and the camera calibration. The annotations give me 3D cuboids per timestamp in the ego frame, and intrinsics/extrinsics let me transform ego-to-camera and project those cuboids into each camera image to get 2D boxes. I finish by sanity-checking schemas and confirming the scene stats: 9 cameras, 157 timestamps, 89 tracked objects.

In [2]:
SCENE_PATH = Path("/content/drive/MyDrive/Argoverse2/0526e68e-2ff1-3e53-b0f8-45df02e45a93")

def load_initial_data(scene_path: Path):
    ann_df = io_utils.read_feather(scene_path / "annotations.feather")
    intr_df = io_utils.read_feather(scene_path / "calibration" / "intrinsics.feather")
    extr_df = io_utils.read_feather(scene_path / "calibration" / "egovehicle_SE3_sensor.feather")
    return ann_df, intr_df, extr_df

ann_df, intr_df, extr_df = load_initial_data(SCENE_PATH)

print("ann_df columns:", ann_df.columns.tolist())
print("intr_df columns:", intr_df.columns.tolist())
print("extr_df columns:", extr_df.columns.tolist())

print("cameras:", intr_df["sensor_name"].unique().tolist())
print("timestamps in annotations:", ann_df["timestamp_ns"].nunique())
print("unique tracks:", ann_df["track_uuid"].nunique())


ann_df columns: ['timestamp_ns', 'track_uuid', 'category', 'length_m', 'width_m', 'height_m', 'qw', 'qx', 'qy', 'qz', 'tx_m', 'ty_m', 'tz_m', 'num_interior_pts']
intr_df columns: ['sensor_name', 'fx_px', 'fy_px', 'cx_px', 'cy_px', 'k1', 'k2', 'k3', 'height_px', 'width_px']
extr_df columns: ['sensor_name', 'qw', 'qx', 'qy', 'qz', 'tx_m', 'ty_m', 'tz_m']
cameras: ['ring_front_center', 'ring_front_left', 'ring_front_right', 'ring_rear_left', 'ring_rear_right', 'ring_side_left', 'ring_side_right', 'stereo_front_left', 'stereo_front_right']
timestamps in annotations: 157
unique tracks: 89


### Step 2: Define the 3D Geometry / Rigid-Transform Utilities (Quaternion → Rotation, SE(3) Pose)


In Step 2, I implement the core geometry tools that let us **move objects and points between coordinate frames** in Argoverse 2. This is necessary because AV2 provides object poses and sensor poses in **3D**, and we must correctly transform them into the **camera coordinate system** before any 2D projection is possible.

---

#### What I Define in This Step

**1) `quat_to_rotmat()`: Quaternion to 3×3 Rotation Matrix**

* AV2 stores rotations as quaternions `(qw, qx, qy, qz)` because they are numerically stable for 3D orientation.
* For projection and frame transforms, we typically need a standard **3×3 rotation matrix**.
* `quat_to_rotmat()` normalizes the quaternion and performs this conversion, giving us a usable rotation matrix **R**.

**2) `SE3` dataclass: Full Rigid-Body Transform in 3D**

* I define an `SE3` object to represent a complete **rigid transformation** in 3D:

  * Rotation **R** (3×3)
  * Translation **t** (3×1)
* This is the standard mathematical representation used to map coordinates across frames (e.g., **ego → camera**, or **camera → ego**).

---

#### Key Methods Provided by `SE3`

* **`as_matrix()`**
  Builds the 4×4 homogeneous transform matrix (useful for debugging and consistent math operations).

* **`inverse()`**
  Computes the inverse pose. This is essential because AV2 sometimes provides a transform in the opposite direction of what we need (e.g., we may need **ego → camera** but are given **camera → ego**, or vice versa).

* **`transform_points()`**
  Applies the transform to a set of 3D points efficiently (e.g., cuboid corners).
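
As a quick sanity check of these utilities, here is a minimal standalone sketch (not the notebook's own cell). It assumes a made-up pose, a 90° yaw about +z plus a 1 m shift along x, and the names `quat_to_rotmat_demo`, `R_demo`, and `t_demo` are hypothetical. It shows that the quaternion formula yields the expected rotation matrix and that applying the inverse transform recovers the original points.

```python
import numpy as np

def quat_to_rotmat_demo(qw, qx, qy, qz):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    q = np.array([qw, qx, qy, qz], dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

# Hypothetical pose: 90-degree yaw about +z, shifted 1 m along x.
half_angle = np.pi / 4
R_demo = quat_to_rotmat_demo(np.cos(half_angle), 0.0, 0.0, np.sin(half_angle))
t_demo = np.array([1.0, 0.0, 0.0])

pts = np.array([[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.5]])

moved = pts @ R_demo.T + t_demo      # forward transform: R p + t for each row
back = (moved - t_demo) @ R_demo     # inverse transform: R^T (p - t) for each row

print(np.round(R_demo, 6))
print(np.round(moved, 6))
```

The round trip `back` should match `pts` up to floating-point error, which is the property `SE3.inverse()` relies on.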





In [3]:
def quat_to_rotmat(qw, qx, qy, qz) -> np.ndarray:
    # Unit quaternion -> rotation matrix
    q = np.array([qw, qx, qy, qz], dtype=np.float64)
    q = q / np.linalg.norm(q)
    w, x, y, z = q
    R = np.array([
        [1-2*(y*y+z*z), 2*(x*y - z*w), 2*(x*z + y*w)],
        [2*(x*y + z*w), 1-2*(x*x+z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w), 2*(y*z + x*w), 1-2*(x*x+y*y)],
    ], dtype=np.float64)
    return R

@dataclass
class SE3:
    R: np.ndarray  # 3x3
    t: np.ndarray  # 3,

    def as_matrix(self) -> np.ndarray:
        T = np.eye(4, dtype=np.float64)
        T[:3,:3] = self.R
        T[:3, 3] = self.t
        return T

    def inverse(self) -> "SE3":
        R_inv = self.R.T
        t_inv = -R_inv @ self.t
        return SE3(R_inv, t_inv)

    def transform_points(self, pts: np.ndarray) -> np.ndarray:
        # pts: (N,3)
        return (pts @ self.R.T) + self.t.reshape(1,3)


### Step 3: Build a Camera Intrinsics Lookup Table

In this step, I convert the intrinsics table (`intr_df`) into a clean dictionary that we can query quickly during projection.

* I define a small data structure **`CameraIntrinsics`** that stores the four core pinhole parameters: **fx, fy, cx, cy**.
* In `build_intrinsics_dict()`, I first **validate** that the required columns exist in `intr_df`.
* Then I iterate over each row and build a dictionary:

$$
\text{INTR}[\text{camera\_name}] \rightarrow (f_x, f_y, c_x, c_y)
$$

So later, when I project 3D points into a specific camera image, I can directly do `INTR["ring_front_left"]` (for example) instead of repeatedly searching the dataframe.

**Output:** `INTR`, a dictionary mapping each camera name to its intrinsic parameters.


In [4]:
@dataclass
class CameraIntrinsics:
    fx: float
    fy: float
    cx: float
    cy: float

def build_intrinsics_dict(intr_df: pd.DataFrame) -> Dict[str, CameraIntrinsics]:
    req = ["sensor_name", "fx_px", "fy_px", "cx_px", "cy_px"]
    for c in req:
        if c not in intr_df.columns:
            raise ValueError(f"Missing intrinsics column: {c}")

    intr = {}
    for _, r in intr_df.iterrows():
        intr[r["sensor_name"]] = CameraIntrinsics(
            fx=float(r["fx_px"]), fy=float(r["fy_px"]),
            cx=float(r["cx_px"]), cy=float(r["cy_px"])
        )
    return intr

INTR = build_intrinsics_dict(intr_df)


### Step 4: Build Extrinsics Dictionary (Ego ↔ Camera Poses)

This step converts the AV2 extrinsics table (`extr_df`) into usable **SE(3) transforms** for each camera, which is required to transform 3D annotations from the ego frame into each camera frame.

* `build_extrinsics_dict()` loops over `extr_df` and builds a dictionary keyed by `sensor_name`.
* It supports the common AV2 format where extrinsics are stored as:

  * a **quaternion** (`qw,qx,qy,qz`) for rotation, and
  * a **translation** (`tx_m,ty_m,tz_m`) in meters.

  These are converted into an `SE3(R,t)` object per camera.
* (As a fallback) it also supports a less common case where a full **4×4 transform matrix** is stored in a single column.

**Outputs produced:**

* `T_EGO_SENSOR`: a dictionary mapping each camera to an SE(3) transform as read from the file (we initially assume it represents **sensor → ego**, which is what the file name `egovehicle_SE3_sensor` suggests).
* `T_SENSOR_EGO`: the inverse transforms, computed with `inverse()`, which represent **ego → sensor**.

These two directions matter because projection requires a consistent transform direction; later we use the correct one to move 3D cuboid points into the camera coordinate system before applying intrinsics.
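
One way to check the direction convention is to push the camera's own mounting position through the inverted pose. The sketch below is standalone and uses a made-up pose (`R_ego_cam`, `t_ego_cam` are hypothetical values, not read from the AV2 file). If the stored transform really is sensor → ego, the camera position expressed in ego coordinates must map to the camera-frame origin under ego → sensor.

```python
import numpy as np

# Hypothetical camera pose in the ego frame (sensor -> ego):
# rotated 90 degrees about +z and mounted at (1.5, 0.2, 1.4) m.
R_ego_cam = np.array([[0.0, -1.0, 0.0],
                      [1.0,  0.0, 0.0],
                      [0.0,  0.0, 1.0]])
t_ego_cam = np.array([1.5, 0.2, 1.4])

# Inverse pose (ego -> sensor): rotation R^T, translation -R^T t.
R_cam_ego = R_ego_cam.T
t_cam_ego = -R_cam_ego @ t_ego_cam

# The camera position in ego coordinates should land on the camera origin.
cam_origin_in_cam = R_cam_ego @ t_ego_cam + t_cam_ego

def to_matrix(R, t):
    """Pack R (3x3) and t (3,) into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Composing a pose with its inverse should give the 4x4 identity.
round_trip = to_matrix(R_cam_ego, t_cam_ego) @ to_matrix(R_ego_cam, t_ego_cam)

print(np.round(cam_origin_in_cam, 6))
print(np.round(round_trip, 6))
```

If the assumed direction were wrong, projected boxes would land in implausible places, so a visual overlay on a few frames is still a worthwhile second check.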


In [5]:
def build_extrinsics_dict(extr_df: pd.DataFrame) -> Dict[str, SE3]:
    # Try common naming conventions
    if "sensor_name" not in extr_df.columns:
        raise ValueError("extrinsics missing 'sensor_name'")

    # Case A: explicit quaternion + translation
    quat_cols = [c for c in ["qw", "qx", "qy", "qz"] if c in extr_df.columns]
    trans_cols = [c for c in ["tx_m", "ty_m", "tz_m"] if c in extr_df.columns]

    extr = {}
    for _, r in extr_df.iterrows():
        name = r["sensor_name"]

        if len(quat_cols) == 4 and len(trans_cols) == 3:
            R = quat_to_rotmat(float(r["qw"]), float(r["qx"]), float(r["qy"]), float(r["qz"]))
            t = np.array([float(r["tx_m"]), float(r["ty_m"]), float(r["tz_m"])], dtype=np.float64)
            extr[name] = SE3(R=R, t=t)
        else:
            # Case B: flattened 4x4 matrix columns like "T_egovehicle_sensor" or similar
            # Search for 16-number vector-ish columns
            mat_col = None
            for cand in ["T_egovehicle_sensor", "egovehicle_SE3_sensor", "transform_matrix"]:
                if cand in extr_df.columns:
                    mat_col = cand
                    break
            if mat_col is None:
                raise ValueError(
                    "Extrinsics format not recognized. "
                    "Expected qw/qx/qy/qz + tx_m/ty_m/tz_m OR a 4x4 matrix column."
                )
            T = np.array(r[mat_col], dtype=np.float64).reshape(4,4)
            R, t = T[:3,:3], T[:3,3]
            extr[name] = SE3(R=R, t=t)

    return extr

T_EGO_SENSOR = build_extrinsics_dict(extr_df)         # sensor -> ego (assumed)
T_SENSOR_EGO = {k: v.inverse() for k, v in T_EGO_SENSOR.items()}  # ego -> sensor


### Step 5: Locate Camera Images and Build a Timestamp → File Index

This step prepares the **raw image inputs** needed for training and redundancy analysis by indexing all camera frames in the selected AV2 scene.

* First, `find_camera_root()` searches for the camera-image folder using common AV2 directory layouts (e.g., `.../sensors/cameras/...`). This makes the notebook robust to slight differences in how the dataset is stored.
* Next, `parse_timestamp_from_filename()` extracts the timestamp from each image filename. AV2 camera frames are typically saved as files named by their `timestamp_ns` (e.g., `1234567890123456789.jpg`), which allows alignment with the annotation timestamps.
* Then `index_images()` loops over all cameras and builds a dictionary:

$$
\text{IMG\_INDEX}[\text{camera}][\text{timestamp\_ns}] \rightarrow \text{path to image file}
$$

* The printed output confirms how many frames were found per camera (319 or 320 frames each), which indicates the scene has a consistent multi-camera stream.

**Output:** `IMG_INDEX`, a fast lookup structure that lets us retrieve the exact image file for a given camera and timestamp when generating 2D labels and computing redundancy.
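
To illustrate the filename-to-timestamp rule in isolation, here is a small sketch that builds a throwaway folder with Python's `tempfile` module. The folder name `cam_a`, the file names, and the helper `stem_to_timestamp` are all made up for the example. Only files whose stem is purely numeric end up in the index.

```python
import tempfile
from pathlib import Path

def stem_to_timestamp(p: Path):
    """Return the integer timestamp encoded in the filename, or None."""
    return int(p.stem) if p.stem.isdigit() else None

with tempfile.TemporaryDirectory() as tmp:
    cam_dir = Path(tmp) / "cam_a"   # hypothetical camera folder
    cam_dir.mkdir()
    demo_files = [
        "315966000000000000.jpg",
        "315966000050000000.jpg",
        "thumb_small.jpg",          # non-numeric stem: skipped
        "notes.txt",                # not an image extension: never globbed
    ]
    for name in demo_files:
        (cam_dir / name).touch()

    demo_index = {}
    for p in sorted(cam_dir.glob("*.jpg")):
        ts = stem_to_timestamp(p)
        if ts is not None:
            demo_index[ts] = p.name  # keep the name; the temp dir is deleted on exit

print(demo_index)
```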


In [6]:
def find_camera_root(scene_path: Path) -> Path:
    # Common patterns across AV2 dumps
    candidates = [
        scene_path / "sensors" / "cameras",
        scene_path / "sensor" / "cameras",
        scene_path / "cameras",
    ]
    for p in candidates:
        if p.exists():
            return p
    raise FileNotFoundError(f"Cannot find camera root under {scene_path} (tried: {candidates})")

CAM_ROOT = find_camera_root(SCENE_PATH)

def parse_timestamp_from_filename(p: Path) -> Optional[int]:
    # expects something like ".../1234567890123456789.jpg"
    stem = p.stem
    if stem.isdigit():
        return int(stem)
    return None

def index_images(cam_root: Path, cameras: List[str]) -> Dict[str, Dict[int, Path]]:
    idx = {}
    for cam in cameras:
        cam_dir = cam_root / cam
        if not cam_dir.exists():
            print(f"[WARN] camera directory missing: {cam_dir}")
            continue
        ts_map = {}
        for ext in ["*.jpg", "*.jpeg", "*.png"]:
            for p in cam_dir.glob(ext):
                ts = parse_timestamp_from_filename(p)
                if ts is not None:
                    ts_map[ts] = p
        idx[cam] = ts_map
        print(f"{cam}: {len(ts_map)} images indexed")
    return idx

CAMERAS = intr_df["sensor_name"].unique().tolist()
IMG_INDEX = index_images(CAM_ROOT, CAMERAS)


ring_front_center: 319 images indexed
ring_front_left: 319 images indexed
ring_front_right: 319 images indexed
ring_rear_left: 319 images indexed
ring_rear_right: 319 images indexed
ring_side_left: 320 images indexed
ring_side_right: 320 images indexed
stereo_front_left: 319 images indexed
stereo_front_right: 319 images indexed


### Step 6: Timestamp Alignment (Nearest-Neighbor Matching)

This step defines how we align **annotation timestamps** with **camera image timestamps** when they do not match perfectly.

* In AV2, object annotations are indexed by `timestamp_ns`, and each camera frame is also timestamped, but in practice they may be slightly offset due to sensor timing and logging.
* `nearest_timestamp()` takes:

  * `target`: the annotation timestamp we want to match,
  * `available`: the list of image timestamps available for that camera,
  * `max_diff_ns`: the maximum allowed mismatch (default **50 ms**).

It then:

1. finds the closest image timestamp using a binary search (`np.searchsorted`),
2. checks the neighbor timestamps around that insertion point,
3. returns the nearest one **only if** it is within the tolerance; otherwise it returns `None`.

**Purpose:** ensure we only pair an annotation with an image when the time difference is small enough to be valid, which prevents incorrect 2D projections caused by using the wrong frame.
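
The matching rule can be tried on toy data. This is a minimal sketch, assuming the image timestamps are already sorted in ascending order (which `np.searchsorted` requires). The helper name `nearest_within` and the timestamps are hypothetical.

```python
import numpy as np

def nearest_within(target, sorted_ts, max_diff_ns=50_000_000):
    """Nearest value in a sorted int64 array, or None if outside the tolerance."""
    if len(sorted_ts) == 0:
        return None
    i = int(np.searchsorted(sorted_ts, target))
    # Candidates: the value just before and the value at the insertion point.
    neighbors = sorted_ts[max(i - 1, 0): i + 1]
    best = int(neighbors[np.argmin(np.abs(neighbors - target))])
    return best if abs(best - target) <= max_diff_ns else None

# Hypothetical image timestamps, 50 ms apart (in nanoseconds).
image_ts = np.array([0, 50_000_000, 100_000_000], dtype=np.int64)

match_close = nearest_within(60_000_000, image_ts)   # 10 ms away from the 50 ms frame
match_far = nearest_within(400_000_000, image_ts)    # 300 ms past the last frame

print(match_close, match_far)
```

The first query is accepted because it is within the 50 ms tolerance, and the second is rejected and returns `None`.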


In [7]:
def nearest_timestamp(target: int, available: List[int], max_diff_ns: int = 50_000_000) -> Optional[int]:
    # max_diff_ns default = 50ms
    if not available:
        return None
    # binary search; np.searchsorted requires ascending order, so sort defensively
    arr = np.sort(np.asarray(available, dtype=np.int64))
    i = int(np.searchsorted(arr, target))
    cand = []
    if i < len(arr): cand.append(arr[i])
    if i > 0: cand.append(arr[i-1])
    best = min(cand, key=lambda x: abs(int(x) - target))
    if abs(int(best) - target) <= max_diff_ns:
        return int(best)
    return None


### Step 7: Convert Each 3D Cuboid Annotation into 8 Corner Points (Ego Frame)

This step takes one AV2 3D annotation row and converts it into the **8 corner points of the 3D cuboid in the ego-vehicle coordinate frame**. This is the key geometric object we later project into camera images to create 2D boxes and to measure redundancy at the instance level.

**What the code is doing:**

* **`get_col()`**: AV2 field names can vary slightly across versions or exports, so this helper searches for the first matching column name from a list of candidates (e.g., `tx_m` vs `center_x`). This makes the notebook more robust.

* **`cuboid_corners_ego(row)`**:

  1. Reads the cuboid **center position** from the annotation (in ego frame):

     * `tx_m, ty_m, tz_m` → the 3D center of the object.
  2. Reads the cuboid **size**:

     * `length_m, width_m, height_m`
  3. Reads the cuboid **orientation** as a quaternion `(qw,qx,qy,qz)` and converts it to a rotation matrix `R`.
  4. Builds the **8 corners in a local cuboid coordinate system** centered at (0,0,0) using half-dimensions `(L/2, W/2, H/2)`.
  5. Rotates these local corners by the object orientation and then shifts them by the object center, producing:

$$
\text{corners\_ego} \in \mathbb{R}^{8 \times 3}
$$

**Output:** an `8×3` array of cuboid corners in the ego frame.
This is essential because AV2 provides **3D cuboids**, not 2D bounding boxes, so we must generate the 3D geometry first before projecting into each camera.
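
A small worked example makes the corner construction concrete. The sketch below uses a made-up 4 m × 2 m × 1.5 m box yawed by 90° (all values hypothetical) and generates the corners from sign combinations, so the corner order differs from the notebook's explicit table. After the rotation, the box's length should lie along the ego y-axis.

```python
import numpy as np

# Hypothetical box: 4 m long, 2 m wide, 1.5 m tall, centered at (10, 0, 0.75).
length, width, height = 4.0, 2.0, 1.5
center = np.array([10.0, 0.0, 0.75])

# 90-degree yaw about +z.
R_yaw90 = np.array([[0.0, -1.0, 0.0],
                    [1.0,  0.0, 0.0],
                    [0.0,  0.0, 1.0]])

# All sign combinations of the half-extents give the 8 local corners.
signs = np.array(
    [[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)],
    dtype=np.float64,
)
corners_local = signs * np.array([length, width, height]) / 2.0

# Rotate, then translate: same row-vector convention as the notebook.
corners_demo = corners_local @ R_yaw90.T + center

# Axis-aligned extent of the rotated box in the ego frame.
extent = corners_demo.max(axis=0) - corners_demo.min(axis=0)
print(corners_demo.shape, extent)
```

The extent along x and y swaps relative to the unrotated box, which is a quick way to confirm that length is taken along the local x-axis.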


In [8]:
def get_col(available_cols, candidates):
    for c in candidates:
        if c in available_cols:
            return c
    raise KeyError(f"None of {candidates} found in columns.")

def cuboid_corners_ego(row: pd.Series) -> np.ndarray:
    cols = set(row.index)

    # Common AV2-ish names (adjust if your ann_df differs)
    cx = float(row[get_col(cols, ["center_x", "tx_m", "x", "translation_x"])])
    cy = float(row[get_col(cols, ["center_y", "ty_m", "y", "translation_y"])])
    cz = float(row[get_col(cols, ["center_z", "tz_m", "z", "translation_z"])])

    length = float(row[get_col(cols, ["length_m", "length"])])
    width  = float(row[get_col(cols, ["width_m", "width"])])
    height = float(row[get_col(cols, ["height_m", "height"])])

    # orientation quaternion (ego)
    qw = float(row[get_col(cols, ["qw", "rotation_qw"])])
    qx = float(row[get_col(cols, ["qx", "rotation_qx"])])
    qy = float(row[get_col(cols, ["qy", "rotation_qy"])])
    qz = float(row[get_col(cols, ["qz", "rotation_qz"])])

    R = quat_to_rotmat(qw, qx, qy, qz)
    center = np.array([cx, cy, cz], dtype=np.float64)

    # local cuboid corners around (0,0,0) with length along x, width along y, height along z (convention)
    l2, w2, h2 = length/2, width/2, height/2
    corners_local = np.array([
        [ l2,  w2,  h2],
        [ l2, -w2,  h2],
        [-l2, -w2,  h2],
        [-l2,  w2,  h2],
        [ l2,  w2, -h2],
        [ l2, -w2, -h2],
        [-l2, -w2, -h2],
        [-l2,  w2, -h2],
    ], dtype=np.float64)

    corners_ego = (corners_local @ R.T) + center.reshape(1,3)
    return corners_ego


### Step 8: Project 3D Cuboids to 2D Boxes and Compute BCS (Instance Visibility Score)

This step converts each **3D cuboid (ego-frame corners)** into a **2D bounding box in a specific camera image**, and it also computes the **BCS score**, which we use later for pruning.

#### 8.1 Define output container: `Box2D`

* `Box2D` stores the final 2D box coordinates `(xmin, ymin, xmax, ymax)` plus:
* `bcs`: a visibility/validity score that measures how much of the box is actually inside the image.

#### 8.2 `project_points_to_image()`: camera projection (pinhole model)

* Input: 3D points in the **camera frame** `(x,y,z)`
* Output: pixel coordinates `(u,v)` using intrinsics:

$$
u = f_x \cdot \frac{x}{z} + c_x,\quad v = f_y \cdot \frac{y}{z} + c_y
$$

We keep `z` as well because we need depth filtering.

#### 8.3 `bbox_and_bcs_from_cuboid()`: full pipeline for one object in one camera

This function does four key things:

1. **Ego → Camera transform**
   We use the extrinsics transform (`T_sensor_ego`) to move cuboid corners from ego frame into the camera frame:

   * `corners_cam = T_sensor_ego.transform_points(corners_ego)`

2. **Depth filtering (valid projection only)**
   If all corners are behind or too close to the camera (`z <= 0.1`), we drop the object.
   Then we keep only the corners with positive depth (`z > 0.1`) to avoid unstable projections, and drop the object if fewer than two such corners remain.

3. **Compute the 2D bounding box**
   We project corners → image pixels and take min/max over projected points:

   * `xmin_full, xmax_full, ymin_full, ymax_full`

   This produces the **full (unclipped) projected box**, which can extend beyond the image.

4. **Clip to image + compute BCS**

   * We clip the full box to image boundaries.
   * Compute:

     * `area_full`: area of the projected box before clipping
     * `area_clip`: area after clipping into image bounds

Then the **BCS** is:

$$
\text{BCS} = \frac{\text{area\_clip}}{\text{area\_full}}
$$

Interpretation:

* **BCS ≈ 1.0** → object box is fully inside the image (good visibility)
* **BCS small** → most of the box lies outside the image (partial visibility / edge cut)

Finally, if the clipped box is basically empty (`area_clip <= 1.0`), we drop it.

**Output:** either `None` (object not valid in that camera view), or a `Box2D` containing:

* clipped 2D box coordinates
* BCS score for later pruning decisions
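
To see the BCS arithmetic on concrete numbers, here is a compact standalone sketch of the same project-clip-ratio idea. The camera parameters, image size, and points are made up, and `project_and_score` is a hypothetical helper rather than the notebook's function. The points are chosen so that exactly half of the projected box falls off the left edge of the image.

```python
import numpy as np

def project_and_score(pts_cam, fx, fy, cx, cy, img_w, img_h, min_depth=0.1):
    """Project camera-frame points, box them, clip to the image, return (box, bcs)."""
    pts = pts_cam[pts_cam[:, 2] > min_depth]      # keep points in front of the camera
    if len(pts) < 2:
        return None
    u = fx * pts[:, 0] / pts[:, 2] + cx
    v = fy * pts[:, 1] / pts[:, 2] + cy
    full = np.array([u.min(), v.min(), u.max(), v.max()])
    clipped = np.array([
        np.clip(full[0], 0, img_w - 1), np.clip(full[1], 0, img_h - 1),
        np.clip(full[2], 0, img_w - 1), np.clip(full[3], 0, img_h - 1),
    ])
    area_full = (full[2] - full[0]) * (full[3] - full[1])
    area_clip = (clipped[2] - clipped[0]) * (clipped[3] - clipped[1])
    if area_full <= 0 or area_clip <= 1.0:
        return None
    return clipped, float(area_clip / area_full)

# Hypothetical camera: fx = fy = 1000 px, principal point (900, 500), 1800x1000 image.
# Four points at 10 m depth project to u in [-300, 300] and v in [400, 600].
pts_demo = np.array([[-12.0, -1.0, 10.0],
                     [ -6.0, -1.0, 10.0],
                     [ -6.0,  1.0, 10.0],
                     [-12.0,  1.0, 10.0]])

box_demo, bcs_demo = project_and_score(pts_demo, 1000.0, 1000.0, 900.0, 500.0, 1800, 1000)
print(box_demo, bcs_demo)
```

The full box is 600 × 200 px and the clipped box is 300 × 200 px, so the ratio is 0.5. An object like this would be a borderline case for any BCS-based pruning threshold.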


In [9]:
@dataclass
class Box2D:
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    bcs: float

def project_points_to_image(pts_cam: np.ndarray, intr: CameraIntrinsics) -> np.ndarray:
    # pts_cam: (N,3) camera frame; assume z-forward
    x, y, z = pts_cam[:,0], pts_cam[:,1], pts_cam[:,2]
    eps = 1e-9
    u = intr.fx * (x / (z + eps)) + intr.cx
    v = intr.fy * (y / (z + eps)) + intr.cy
    return np.stack([u, v, z], axis=1)

def bbox_and_bcs_from_cuboid(corners_ego: np.ndarray,
                            cam: str,
                            intr: CameraIntrinsics,
                            T_sensor_ego: SE3,
                            img_w: int,
                            img_h: int) -> Optional[Box2D]:
    # Transform corners ego -> camera(sensor)
    corners_cam = T_sensor_ego.transform_points(corners_ego)

    # Require some positive depth
    if np.all(corners_cam[:,2] <= 0.1):
        return None

    uvz = project_points_to_image(corners_cam, intr)
    u = uvz[:,0]
    v = uvz[:,1]
    z = uvz[:,2]

    # Keep only corners with positive depth to avoid wild projections
    valid = z > 0.1
    if valid.sum() < 2:
        return None

    u_full = u[valid]
    v_full = v[valid]

    xmin_full, xmax_full = float(u_full.min()), float(u_full.max())
    ymin_full, ymax_full = float(v_full.min()), float(v_full.max())

    # full area can be off-image
    full_w = max(0.0, xmax_full - xmin_full)
    full_h = max(0.0, ymax_full - ymin_full)
    area_full = full_w * full_h
    if area_full <= 1e-6:
        return None

    # clip to image boundaries
    xmin_clip = max(0.0, min(float(img_w - 1), xmin_full))
    xmax_clip = max(0.0, min(float(img_w - 1), xmax_full))
    ymin_clip = max(0.0, min(float(img_h - 1), ymin_full))
    ymax_clip = max(0.0, min(float(img_h - 1), ymax_full))

    clip_w = max(0.0, xmax_clip - xmin_clip)
    clip_h = max(0.0, ymax_clip - ymin_clip)
    area_clip = clip_w * clip_h

    bcs = float(area_clip / area_full)  # Eq.(1) in the paper

    # If it's entirely outside after clipping, drop it
    if area_clip <= 1.0:
        return None

    return Box2D(xmin=xmin_clip, ymin=ymin_clip, xmax=xmax_clip, ymax=ymax_clip, bcs=bcs)



# Step 9: Geometric Overlap Map Between Cameras

This step builds a geometric overlap map between cameras. For each camera, I estimate its viewing direction (yaw of the optical axis in the ego frame) and its horizontal FOV from the intrinsics, then I compute pairwise overlap on the circular angle domain with proper wrap-around handling. The resulting top pairs, such as the stereo front-left/right (~62°) and front-center with stereo (~47°), are the camera pairs we use later to decide where redundancy pruning is applied.
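
The two ingredients, HFOV from the focal length and overlap across the ±180° seam, can be checked on toy numbers. The sketch below is a closed-form shortcut based on the shortest angular distance between the two optical axes, not the segment-splitting approach. It assumes each HFOV is well under 180°, and the helper names and camera values are hypothetical.

```python
import math

def hfov_rad(fx_px: float, img_w_px: int) -> float:
    """Horizontal FOV of a pinhole camera: 2 * atan(W / (2 * fx))."""
    return 2.0 * math.atan(img_w_px / (2.0 * fx_px))

def wrapped_diff(a: float, b: float) -> float:
    """Shortest absolute angular distance between two angles, in radians."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)

def overlap_rad(yaw1: float, hfov1: float, yaw2: float, hfov2: float) -> float:
    """Angular overlap of two FOV wedges, capped by the narrower wedge."""
    raw = hfov1 / 2.0 + hfov2 / 2.0 - wrapped_diff(yaw1, yaw2)
    return min(max(0.0, raw), hfov1, hfov2)

# Hypothetical camera: fx equal to half the image width gives a 90-degree HFOV.
hfov_90 = hfov_rad(fx_px=960.0, img_w_px=1920)

# Two hypothetical rear cameras at +170 and -170 degrees, each 60 degrees wide.
# Their centers are only 20 degrees apart across the +/-180 seam.
ov_seam = overlap_rad(math.radians(170), math.radians(60),
                      math.radians(-170), math.radians(60))

print(round(math.degrees(hfov_90), 2), round(math.degrees(ov_seam), 2))
```

Without the wrap-around step the two rear cameras would appear 340° apart and the overlap would come out as zero. With it, the 60° wedges overlap by 40°.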


In [10]:
import numpy as np
import math
from typing import List, Tuple, Dict
from PIL import Image

def wrap_pi(a: float) -> float:
    return float((a + np.pi) % (2*np.pi) - np.pi)

def fov_segments(center: float, hfov: float):
    a1 = wrap_pi(center - hfov/2)
    a2 = wrap_pi(center + hfov/2)
    if a1 <= a2:
        return [(a1, a2)]
    # wraps around -pi/pi
    return [(a1, np.pi), (-np.pi, a2)]

def seg_overlap(s1, s2) -> float:
    left = max(s1[0], s2[0])
    right = min(s1[1], s2[1])
    return max(0.0, right - left)

def circular_overlap(center1: float, hfov1: float, center2: float, hfov2: float) -> float:
    segs1 = fov_segments(center1, hfov1)
    segs2 = fov_segments(center2, hfov2)
    ov = 0.0
    for a in segs1:
        for b in segs2:
            ov += seg_overlap(a, b)
    # cannot exceed the smaller hfov
    return float(min(ov, min(hfov1, hfov2)))

def camera_yaw_center_in_ego(cam: str, T_ego_sensor: SE3) -> float:
    # assuming camera forward is +Z in camera frame
    forward_cam = np.array([0.0, 0.0, 1.0], dtype=np.float64)
    forward_ego = T_ego_sensor.R @ forward_cam
    return float(np.arctan2(forward_ego[1], forward_ego[0]))

def hfov_from_intrinsics(intr: CameraIntrinsics, img_w: int) -> float:
    return float(2.0 * np.arctan(img_w / (2.0 * intr.fx)))

def compute_overlap_pairs(cameras: List[str],
                          INTR: Dict[str, CameraIntrinsics],
                          T_EGO_SENSOR: Dict[str, SE3],
                          IMG_INDEX: Dict[str, Dict[int, Path]],
                          min_overlap_deg: float = 5.0) -> List[Tuple[str,str,float]]:

    min_overlap = math.radians(min_overlap_deg)

    # one image per camera to get W,H
    cam_sizes = {}
    for cam in cameras:
        ts_map = IMG_INDEX.get(cam, {})
        if not ts_map:
            continue
        any_path = next(iter(ts_map.values()))
        with Image.open(any_path) as im:
            cam_sizes[cam] = im.size  # (W,H)

    cam_info = {}
    for cam in cameras:
        if cam not in cam_sizes or cam not in INTR or cam not in T_EGO_SENSOR:
            continue
        W, H = cam_sizes[cam]
        yaw = camera_yaw_center_in_ego(cam, T_EGO_SENSOR[cam])
        hfov = hfov_from_intrinsics(INTR[cam], W)
        cam_info[cam] = (yaw, hfov)

    cams = list(cam_info.keys())
    pairs = []
    for i in range(len(cams)):
        for j in range(i+1, len(cams)):
            c1, c2 = cams[i], cams[j]
            yaw1, hfov1 = cam_info[c1]
            yaw2, hfov2 = cam_info[c2]

            ov = circular_overlap(yaw1, hfov1, yaw2, hfov2)

            # HARD sanity: overlap must be <= pi
            if ov > np.pi + 1e-6:
                raise RuntimeError(f"Impossible overlap > 180deg for {c1},{c2}: {ov} rad")

            if ov >= min_overlap:
                pairs.append((c1, c2, ov))

    pairs.sort(key=lambda x: x[2], reverse=True)
    return pairs

# IMPORTANT: force recompute (don't reuse cached OVERLAP_PAIRS)
OVERLAP_PAIRS = compute_overlap_pairs(CAMERAS, INTR, T_EGO_SENSOR, IMG_INDEX, min_overlap_deg=5.0)

print("Top overlap pairs (deg):")
for c1, c2, ov in OVERLAP_PAIRS[:15]:
    print(c1, c2, ov * 180/np.pi)


Top overlap pairs (deg):
stereo_front_left stereo_front_right 62.11493808031442
ring_front_center stereo_front_left 47.049396999969844
ring_front_center stereo_front_right 47.049396999969844
ring_front_left stereo_front_right 17.787091136414052
ring_front_right stereo_front_left 17.66136077378026
ring_front_left stereo_front_left 17.344141408992847
ring_front_right stereo_front_right 17.243924899568892
ring_front_center ring_front_right 9.831879144830882
ring_front_center ring_front_left 9.6906460833863
ring_rear_left ring_rear_right 8.741142569844872
ring_rear_left ring_side_left 8.63792319760331
ring_rear_right ring_side_right 8.448340829841815
ring_front_right ring_side_right 8.441358960305164
ring_front_left ring_side_left 8.415167332173757


The overlap results make sense and are consistent with the yaw and HFOV values:

* **Stereo L–R ~62°**: both cameras face ~0° with ~62° HFOV → almost full overlap.
* **Ring front center–stereo ~47°**: ring_front_center HFOV is **47°** and lies entirely inside the stereo FOV, so overlap is capped at **47°**.
* **Ring front left/right with either stereo camera ~17–18°**: yaws are ~45° apart, so you only get a small edge overlap.
* **Other pairs ~8–10°**: neighboring ring cameras are either farther apart in yaw (~54° for the side/rear pairs) or involve the narrower ring_front_center, so overlap is smaller.




In [27]:
def deg(x):
    return float(x * 180 / np.pi)

for cam in CAMERAS:
    if cam in INTR and cam in T_EGO_SENSOR:
        any_path = next(iter(IMG_INDEX[cam].values()))  # any frame gives the image width
        with Image.open(any_path) as im:
            width = im.size[0]
        yaw = camera_yaw_center_in_ego(cam, T_EGO_SENSOR[cam])
        hfov = hfov_from_intrinsics(INTR[cam], width)
        print(f"{cam:20s} yaw={deg(yaw):7.2f}  hfov={deg(hfov):7.2f}")


ring_front_center    yaw=  -0.11  hfov=  47.05
ring_front_left      yaw=  44.99  hfov=  62.52
ring_front_right     yaw= -45.06  hfov=  62.52
ring_rear_left       yaw= 153.00  hfov=  62.54
ring_rear_right      yaw=-153.19  hfov=  62.57
ring_side_left       yaw=  99.10  hfov=  62.54
ring_side_right      yaw= -99.11  hfov=  62.47
stereo_front_left    yaw=  -0.19  hfov=  62.53
stereo_front_right   yaw=   0.24  hfov=  62.56


In [31]:
import numpy as np
import math
from typing import List, Tuple, Dict, Any
from pathlib import Path
from PIL import Image

# ----------------------------
# 0) Helpers to be robust to different AV2 object APIs
# ----------------------------
def _get_R(T: Any) -> np.ndarray:
    """Return 3x3 rotation matrix from an SE3-like object."""
    if hasattr(T, "R"):
        return np.asarray(T.R)
    if hasattr(T, "rotation"):
        return np.asarray(T.rotation)
    if hasattr(T, "rot"):
        return np.asarray(T.rot)
    raise AttributeError("Cannot find rotation matrix on T (expected .R or .rotation or .rot)")

def _get_fx(intr: Any) -> float:
    """Return fx from a CameraIntrinsics-like object."""
    if hasattr(intr, "fx"):
        return float(intr.fx)
    if hasattr(intr, "K"):
        K = np.asarray(intr.K)
        return float(K[0, 0])
    if hasattr(intr, "intrinsic_matrix"):
        K = np.asarray(intr.intrinsic_matrix)
        return float(K[0, 0])
    raise AttributeError("Cannot find fx on intr (expected .fx or .K or .intrinsic_matrix)")

# ----------------------------
# 1) Angle utilities + overlap
# ----------------------------
def wrap_pi(a: float) -> float:
    return float((a + np.pi) % (2 * np.pi) - np.pi)

def fov_segments(center: float, hfov: float):
    a1 = wrap_pi(center - hfov / 2)
    a2 = wrap_pi(center + hfov / 2)
    if a1 <= a2:
        return [(a1, a2)]
    # wraps around -pi/pi
    return [(a1, np.pi), (-np.pi, a2)]

def seg_overlap(s1, s2) -> float:
    left = max(s1[0], s2[0])
    right = min(s1[1], s2[1])
    return max(0.0, right - left)

def circular_overlap(center1: float, hfov1: float, center2: float, hfov2: float) -> float:
    segs1 = fov_segments(center1, hfov1)
    segs2 = fov_segments(center2, hfov2)
    ov = 0.0
    for a in segs1:
        for b in segs2:
            ov += seg_overlap(a, b)
    # cap by the smaller HFOV
    return float(min(ov, min(hfov1, hfov2)))

def simple_overlap(center1: float, hfov1: float, center2: float, hfov2: float) -> float:
    """
    Closed-form symmetric overlap (no segment splitting).
    With d the shortest angular distance between the two centers:
        overlap = max(0, hfov1/2 + hfov2/2 - d)
    capped by min(hfov1, hfov2).
    """
    # circular shortest distance between angles
    d = abs(wrap_pi(center1 - center2))
    ov = max(0.0, (hfov1 / 2) + (hfov2 / 2) - d)
    return float(min(ov, min(hfov1, hfov2)))

# ----------------------------
# 2) Camera yaw center + HFOV
# ----------------------------
# IMPORTANT: set this if AV2 camera forward axis differs.
# Default assumes camera forward is +Z in camera frame.
FORWARD_AXIS_CAM = np.array([0.0, 0.0, 1.0], dtype=np.float64)

def camera_yaw_center_in_ego(T_ego_sensor: Any) -> float:
    R = _get_R(T_ego_sensor)
    forward_ego = R @ FORWARD_AXIS_CAM
    return float(np.arctan2(forward_ego[1], forward_ego[0]))

def hfov_from_intrinsics(intr: Any, img_w: int) -> float:
    fx = _get_fx(intr)
    return float(2.0 * np.arctan(img_w / (2.0 * fx)))

# ----------------------------
# 3) Build cam_info and compute overlap pairs
# ----------------------------
def build_cam_info(
    cameras: List[str],
    INTR: Dict[str, Any],
    T_EGO_SENSOR: Dict[str, Any],
    IMG_INDEX: Dict[str, Dict[int, Path]],
):
    # one image per camera to get W,H
    cam_sizes = {}
    for cam in cameras:
        ts_map = IMG_INDEX.get(cam, {})
        if not ts_map:
            continue
        any_path = next(iter(ts_map.values()))
        with Image.open(any_path) as im:
            cam_sizes[cam] = im.size  # (W,H)

    cam_info = {}
    for cam in cameras:
        if cam not in cam_sizes or cam not in INTR or cam not in T_EGO_SENSOR:
            continue
        W, H = cam_sizes[cam]
        yaw = camera_yaw_center_in_ego(T_EGO_SENSOR[cam])
        hfov = hfov_from_intrinsics(INTR[cam], W)
        cam_info[cam] = (yaw, hfov, (W, H))

    return cam_info

def compute_overlap_pairs(
    cam_info: Dict[str, Tuple[float, float, Tuple[int, int]]],
    min_overlap_deg: float = 5.0
) -> List[Tuple[str, str, float]]:
    min_overlap = math.radians(min_overlap_deg)

    cams = list(cam_info.keys())
    pairs = []
    for i in range(len(cams)):
        for j in range(i + 1, len(cams)):
            c1, c2 = cams[i], cams[j]
            yaw1, hfov1, _ = cam_info[c1]
            yaw2, hfov2, _ = cam_info[c2]

            ov = circular_overlap(yaw1, hfov1, yaw2, hfov2)

            # hard sanity: overlap must be <= 180deg
            if ov > np.pi + 1e-6:
                raise RuntimeError(f"Impossible overlap >180deg for {c1},{c2}: {ov} rad")

            if ov >= min_overlap:
                pairs.append((c1, c2, ov))

    pairs.sort(key=lambda x: x[2], reverse=True)
    return pairs

# ----------------------------
# 4) RUN
# ----------------------------
# Assumes these exist in your notebook already:
# CAMERAS, INTR, T_EGO_SENSOR, IMG_INDEX
cam_info = build_cam_info(CAMERAS, INTR, T_EGO_SENSOR, IMG_INDEX)
print(f"Built cam_info for {len(cam_info)} cameras: {list(cam_info.keys())}\n")

print("=== Camera yaw/hfov (deg) ===")
for cam, (yaw, hfov, (W, H)) in cam_info.items():
    print(f"{cam:17s} yaw={yaw*180/np.pi:8.2f}  hfov={hfov*180/np.pi:8.2f}  (W,H)=({W},{H})")

OVERLAP_PAIRS = compute_overlap_pairs(cam_info, min_overlap_deg=5.0)

# Sanity check on top few pairs
print("\n=== Pair sanity check (circular vs simple overlap) ===")
for (c1, c2, ov) in OVERLAP_PAIRS[:6]:
    yaw1, hfov1, _ = cam_info[c1]
    yaw2, hfov2, _ = cam_info[c2]
    dyaw = abs(wrap_pi(yaw1 - yaw2)) * 180 / np.pi
    ov_c = circular_overlap(yaw1, hfov1, yaw2, hfov2) * 180 / np.pi
    ov_s = simple_overlap(yaw1, hfov1, yaw2, hfov2) * 180 / np.pi
    print(f"\nPair: {c1} vs {c2}")
    print(f"  |Δyaw|=    {dyaw:6.2f}°")
    print(f"  overlap(circular)= {ov_c:8.3f}°")
    print(f"  overlap(simple)  = {ov_s:8.3f}°")
    print(f"  diff             = {ov_c-ov_s:8.5f}°")

print("\nTop overlap pairs (deg) recomputed:")
for c1, c2, ov in OVERLAP_PAIRS[:15]:
    print(f"{c1:17s} {c2:19s} {ov*180/np.pi:8.3f}°")


Built cam_info for 9 cameras: ['ring_front_center', 'ring_front_left', 'ring_front_right', 'ring_rear_left', 'ring_rear_right', 'ring_side_left', 'ring_side_right', 'stereo_front_left', 'stereo_front_right']

=== Camera yaw/hfov (deg) ===
ring_front_center yaw=   -0.11  hfov=   47.05  (W,H)=(1550,2048)
ring_front_left   yaw=   44.99  hfov=   62.52  (W,H)=(2048,1550)
ring_front_right  yaw=  -45.06  hfov=   62.52  (W,H)=(2048,1550)
ring_rear_left    yaw=  153.00  hfov=   62.54  (W,H)=(2048,1550)
ring_rear_right   yaw= -153.19  hfov=   62.57  (W,H)=(2048,1550)
ring_side_left    yaw=   99.10  hfov=   62.54  (W,H)=(2048,1550)
ring_side_right   yaw=  -99.11  hfov=   62.47  (W,H)=(2048,1550)
stereo_front_left yaw=   -0.19  hfov=   62.53  (W,H)=(2048,1550)
stereo_front_right yaw=    0.24  hfov=   62.56  (W,H)=(2048,1550)

=== Pair sanity check (circular vs simple overlap) ===

Pair: stereo_front_left vs stereo_front_right
  |Δyaw|=      0.43°
  overlap(circular)=   62.115°
  overlap(simple)
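
The simple overlap formula can be checked by hand against the yaw/HFOV table printed above. The sketch below is a minimal, degree-based re-statement of `simple_overlap`; the `wrap_pi` helper here is a stand-in written for this example (the notebook defines its own elsewhere), and the inputs are the rounded values from the printed table, so results agree with the notebook only up to that rounding.

```python
import math

def wrap_pi(angle: float) -> float:
    """Stand-in helper (assumption): wrap an angle in radians to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def simple_overlap_deg(center1_deg: float, hfov1_deg: float,
                       center2_deg: float, hfov2_deg: float) -> float:
    """Overlap of two horizontal FOVs in degrees, capped by the smaller HFOV."""
    # Shortest angular distance between the two viewing directions
    d = abs(wrap_pi(math.radians(center1_deg - center2_deg)))
    half_sum = math.radians(hfov1_deg) / 2.0 + math.radians(hfov2_deg) / 2.0
    ov = max(0.0, half_sum - d)
    ov = min(ov, math.radians(min(hfov1_deg, hfov2_deg)))
    return math.degrees(ov)

# Values copied from the printed yaw/hfov table (rounded to two decimals)
stereo_pair = simple_overlap_deg(-0.19, 62.53, 0.24, 62.56)      # about 62.115
front_left_center = simple_overlap_deg(44.99, 62.52, -0.11, 47.05)
side_left_right = simple_overlap_deg(99.10, 62.54, -99.11, 62.47)  # opposite sides

print(f"stereo pair:             {stereo_pair:.3f} deg")
print(f"front_left vs center:    {front_left_center:.3f} deg")
print(f"side_left vs side_right: {side_left_right:.3f} deg")
```

The stereo pair reproduces the 62.115° reported by the circular computation, while the two side cameras, which face away from each other, have no overlap.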

# **Step 10 (Label schema for YOLO):**
In the annotations, object categories are stored as text labels (e.g., PEDESTRIAN, BUS). Since YOLO requires numeric class IDs in the label files, I first identified the category column (`category`), then extracted the unique categories present in this scene, sorted them, and created a deterministic mapping from each category name to an integer ID. In this run there are 8 classes, mapped as: BOLLARD→0, BOX_TRUCK→1, BUS→2, CONSTRUCTION_CONE→3, LARGE_VEHICLE→4, PEDESTRIAN→5, REGULAR_VEHICLE→6, TRUCK→7. These IDs appear as the first number in each YOLO label line.

In [11]:
# Inspect category column name first
print("Potential category columns:", [c for c in ann_df.columns if "category" in c.lower() or "label" in c.lower()])

CATEGORY_COL = None
for cand in ["category", "category_name", "label", "class_name"]:
    if cand in ann_df.columns:
        CATEGORY_COL = cand
        break
if CATEGORY_COL is None:
    raise ValueError("Could not find a category column in ann_df. Update CATEGORY_COL manually.")

# Build mapping from observed categories
cats = sorted(ann_df[CATEGORY_COL].dropna().unique().tolist())
CLASS_MAP = {c:i for i,c in enumerate(cats)}
NAMES = cats

print("Num classes:", len(NAMES))
print("Example class map (first 10):", list(CLASS_MAP.items())[:10])


Potential category columns: ['category']
Num classes: 8
Example class map (first 10): [('BOLLARD', 0), ('BOX_TRUCK', 1), ('BUS', 2), ('CONSTRUCTION_CONE', 3), ('LARGE_VEHICLE', 4), ('PEDESTRIAN', 5), ('REGULAR_VEHICLE', 6), ('TRUCK', 7)]


# Step 11
This cell converts each projected 2D bounding box into the YOLO label format. Given a box in pixel coordinates `(xmin, ymin, xmax, ymax)`, it computes the box **center** `(cx, cy)` and **size** `(w, h)`, then **normalizes** all four by the image width/height so they lie in [0, 1]. Finally, it clamps the values to [0, 1] for safety and returns one label line as:
`<class_id> <cx> <cy> <w> <h>`
with six-decimal precision. Each line is written into the `.txt` label file that shares its image's file stem under `labels/`.
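
As a worked example of this conversion, the sketch below applies the same arithmetic to one made-up box on a 2048×1550 image. It takes plain numbers instead of the notebook's `Box2D` object, and the box coordinates and the helper name `yolo_line` are illustrative assumptions, not values from the scene.

```python
def yolo_line(xmin: float, ymin: float, xmax: float, ymax: float,
              cls_id: int, img_w: int, img_h: int) -> str:
    """Format one pixel-space box as 'class cx cy w h' with normalized values."""
    def clamp01(v: float) -> float:
        return min(max(v, 0.0), 1.0)

    cx = ((xmin + xmax) / 2.0) / img_w   # normalized center x
    cy = ((ymin + ymax) / 2.0) / img_h   # normalized center y
    w = (xmax - xmin) / img_w            # normalized width
    h = (ymax - ymin) / img_h            # normalized height
    return (f"{cls_id} {clamp01(cx):.6f} {clamp01(cy):.6f} "
            f"{clamp01(w):.6f} {clamp01(h):.6f}")

# Hypothetical PEDESTRIAN box (class 5) on a 2048x1550 ring-camera image
example_line = yolo_line(512.0, 310.0, 1024.0, 775.0, cls_id=5, img_w=2048, img_h=1550)
print(example_line)  # 5 0.375000 0.350000 0.250000 0.300000
```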

In [12]:
def yolo_line_from_box(box: Box2D, cls_id: int, img_w: int, img_h: int) -> str:
    # YOLO format: class cx cy w h (normalized)
    cx = ((box.xmin + box.xmax) / 2.0) / img_w
    cy = ((box.ymin + box.ymax) / 2.0) / img_h
    w  = (box.xmax - box.xmin) / img_w
    h  = (box.ymax - box.ymin) / img_h
    cx, cy = float(cx), float(cy)
    w, h = float(w), float(h)
    # clamp lightly
    cx = min(max(cx, 0.0), 1.0)
    cy = min(max(cy, 0.0), 1.0)
    w  = min(max(w, 0.0), 1.0)
    h  = min(max(h, 0.0), 1.0)
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Step 12
This cell exports the AV2 scene as a YOLO dataset. It projects the 3D cuboids to 2D boxes for each camera image, splits train/val by timestamp, and then prunes redundant duplicate views across overlapping cameras using the BCS threshold `tau_bcs`.

Pruning works on overlapping camera pairs: for any object that appears in both views, only the higher-quality projection (higher BCS) is kept when the BCS gap exceeds `tau_bcs`; otherwise both views are kept.

It saves `images/`, `labels/`, and a `data.yaml` ready for YOLOv8 training.
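
The pruning rule can be looked at in isolation for a single object. The following is a minimal sketch under simplifying assumptions: each camera's view of the object is reduced to one BCS number, the BCS values are invented for illustration, and `views_to_drop` is a hypothetical helper rather than part of the export function.

```python
from typing import Dict, Iterable, Set, Tuple

def views_to_drop(bcs_by_cam: Dict[str, float],
                  overlap_pairs: Iterable[Tuple[str, str]],
                  tau_bcs: float) -> Set[str]:
    """Return the cameras whose view of one object should be dropped."""
    drop: Set[str] = set()
    for cam_a, cam_b in overlap_pairs:
        # Only compare cameras that both see the object
        if cam_a not in bcs_by_cam or cam_b not in bcs_by_cam:
            continue
        bcs_a, bcs_b = bcs_by_cam[cam_a], bcs_by_cam[cam_b]
        # Drop the lower-BCS view only when the gap strictly exceeds tau_bcs
        if abs(bcs_a - bcs_b) > tau_bcs:
            drop.add(cam_b if bcs_a >= bcs_b else cam_a)
    return drop

# One object seen by two overlapping cameras (BCS values are made up)
bcs = {"ring_front_center": 0.9, "ring_front_left": 0.6}
pairs = [("ring_front_center", "ring_front_left")]

dropped_strict = views_to_drop(bcs, pairs, tau_bcs=0.2)  # gap 0.3 > 0.2
dropped_loose = views_to_drop(bcs, pairs, tau_bcs=0.5)   # gap 0.3 <= 0.5
print(dropped_strict, dropped_loose)
```

With the smaller threshold the lower-quality `ring_front_left` view is dropped; with the larger one both views are kept.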







In [28]:
def build_yolo_from_av2_scene(
    scene_path: Path,
    ann_df: pd.DataFrame,
    INTR: Dict[str, CameraIntrinsics],
    T_EGO_SENSOR: Dict[str, SE3],
    T_SENSOR_EGO: Dict[str, SE3],
    IMG_INDEX: Dict[str, Dict[int, Path]],
    overlap_pairs: List[Tuple[str,str,float]],
    out_root: Path,
    tau_bcs: float,
    train_ratio: float = 0.8,
    seed: int = 7,
    max_frames: Optional[int] = None,
    drop_empty_images: bool = False,
    max_ts_diff_ns: int = 50_000_000
):
    """
    Creates a YOLO dataset with projected 2D boxes.
    If tau_bcs is small -> more pruning (less redundancy).
    If tau_bcs is large -> less pruning (more redundancy).
    """
    rng = np.random.RandomState(seed)

    cameras = [c for c in INTR.keys() if c in IMG_INDEX and len(IMG_INDEX[c]) > 0]
    if not cameras:
        raise ValueError("No cameras with images found.")

    # Precompute one image size per camera
    cam_size = {}
    for cam in cameras:
        any_path = next(iter(IMG_INDEX[cam].values()))
        with Image.open(any_path) as im:
            cam_size[cam] = im.size  # (W,H)

    # Group annotations by timestamp
    ts_groups = list(ann_df.groupby("timestamp_ns"))
    timestamps = [int(ts) for ts,_ in ts_groups]
    if max_frames is not None:
        timestamps = timestamps[:max_frames]

    # train/val split by timestamp
    timestamps = np.array(timestamps, dtype=np.int64)
    rng.shuffle(timestamps)
    n_train = int(len(timestamps) * train_ratio)
    train_ts = set(map(int, timestamps[:n_train]))
    val_ts   = set(map(int, timestamps[n_train:]))

    # Prepare dirs
    def reset_dir(p: Path):
        if p.exists():
            shutil.rmtree(p)
        p.mkdir(parents=True, exist_ok=True)

    dataset_dir = out_root
    reset_dir(dataset_dir)

    for split in ["train", "val"]:
        (dataset_dir / "images" / split).mkdir(parents=True, exist_ok=True)
        (dataset_dir / "labels" / split).mkdir(parents=True, exist_ok=True)

    # Fast lookup: timestamps available per camera
    cam_ts_sorted = {cam: sorted(IMG_INDEX[cam].keys()) for cam in cameras}

    # Helper: assign split
    def which_split(ts: int) -> Optional[str]:
        if ts in train_ts: return "train"
        if ts in val_ts: return "val"
        return None

    # Iterate timestamps
    kept_images = 0
    dropped_images = 0

    for ts in tqdm(sorted(list(train_ts | val_ts)), desc=f"Exporting (tau_bcs={tau_bcs})"):
        split = which_split(ts)
        if split is None:
            continue

        # Get annotations at ts
        ann_rows = ann_df[ann_df["timestamp_ns"] == ts]
        if ann_rows.empty:
            continue

        # 1) Build per-camera label candidates (before pruning)
        per_cam_boxes = {cam: {} for cam in cameras}
        # structure: per_cam_boxes[cam][track_uuid] = (Box2D, class_id)

        for _, row in ann_rows.iterrows():
            track = row["track_uuid"]
            cat = row[CATEGORY_COL]
            if pd.isna(cat):
                continue
            cls_id = CLASS_MAP[str(cat)]

            corners = cuboid_corners_ego(row)

            for cam in cameras:
                # match nearest image timestamp for that camera
                ts_img = nearest_timestamp(ts, cam_ts_sorted[cam], max_diff_ns=max_ts_diff_ns)
                if ts_img is None:
                    continue
                img_path = IMG_INDEX[cam][ts_img]
                W,H = cam_size[cam]

                box = bbox_and_bcs_from_cuboid(
                    corners_ego=corners,
                    cam=cam,
                    intr=INTR[cam],
                    T_sensor_ego=T_SENSOR_EGO[cam],  # ego -> camera
                    img_w=W,
                    img_h=H
                )
                if box is None:
                    continue
                per_cam_boxes[cam][track] = (box, cls_id, ts_img, img_path, W, H)

        # 2) Apply redundancy pruning across overlap pairs using BCS rule (Eq.2)
        # For each overlap pair, if same track appears in both cams, compare BCS.
        to_drop = set()  # entries identified by (cam, track, ts_img)

        for camA, camB, _ in overlap_pairs:
            if camA not in per_cam_boxes or camB not in per_cam_boxes:
                continue
            common_tracks = set(per_cam_boxes[camA].keys()) & set(per_cam_boxes[camB].keys())
            for track in common_tracks:
                boxA, clsA, tsA, _, _, _ = per_cam_boxes[camA][track]
                boxB, clsB, tsB, _, _, _ = per_cam_boxes[camB][track]

                # (sanity) if categories differ, skip pruning for this object
                if clsA != clsB:
                    continue

                bcs_vals = [boxA.bcs, boxB.bcs]
                if (max(bcs_vals) - min(bcs_vals)) > tau_bcs:
                    # keep higher BCS only, drop lower
                    if boxA.bcs >= boxB.bcs:
                        to_drop.add((camB, track, tsB))
                    else:
                        to_drop.add((camA, track, tsA))

        # 3) Write images + labels (post-pruning)
        for cam in cameras:
            # Group by ts_img because nearest_timestamp may create slightly different image timestamps across cams
            # We'll write one file per (cam, ts_img).
            # Collect all tracks for this cam at this base annotation timestamp.
            entries = list(per_cam_boxes[cam].items())
            if not entries:
                continue

            # We might have multiple tracks but same ts_img (likely)
            # Use the first entry's ts_img/img_path
            _, (_, _, ts_img, img_path, W, H) = entries[0]
            out_img_name = f"{cam}_{ts_img}.jpg"
            out_lbl_name = f"{cam}_{ts_img}.txt"

            # Build label lines
            lines = []
            for track, (box, cls_id, ts_img2, img_path2, W2, H2) in per_cam_boxes[cam].items():
                key = (cam, track, ts_img2)
                if key in to_drop:
                    continue
                lines.append(yolo_line_from_box(box, cls_id, W2, H2))

            # Optionally drop empty images
            if drop_empty_images and len(lines) == 0:
                dropped_images += 1
                continue

            # Copy image
            dst_img = dataset_dir / "images" / split / out_img_name
            shutil.copy(img_path, dst_img)

            # Write label file (empty allowed)
            dst_lbl = dataset_dir / "labels" / split / out_lbl_name
            with open(dst_lbl, "w") as f:
                f.write("\n".join(lines))

            kept_images += 1

    # 4) Write data.yaml
    data_yaml = dataset_dir / "data.yaml"
    yaml_text = (
        f"path: {dataset_dir}\n"
        f"train: images/train\n"
        f"val: images/val\n"
        f"nc: {len(NAMES)}\n"
        f"names: {json.dumps(NAMES)}\n"
    )
    with open(data_yaml, "w") as f:
        f.write(yaml_text)

    print(f"Done. kept_images={kept_images}, dropped_images={dropped_images}")
    print("data.yaml:", data_yaml)
    return dataset_dir


### Step 13

In this cell, I generate **two YOLO datasets** from the same AV2 scene so we can compare **no-pruning vs. redundancy-pruning** under identical conditions. Both use the same camera calibrations, timestamp-based train/val split (80/20), and the same overlap camera-pairs list. The only change is the pruning threshold `tau_bcs`:

* **Baseline (`tau_bcs = 1.0`)**: a view is dropped only when the BCS gap strictly exceeds 1.0, so this setting behaves close to 'keep everything' and is reported as the no-pruning baseline.
* **Pruned (`tau_bcs = 0.2`)**: pruning is stricter; when the same object appears in two overlapping cameras and their BCS values differ by more than 0.2, we drop the lower-quality duplicate view.

The output confirms that both datasets were exported successfully (157 timestamps processed) and prints where the `data.yaml` files are saved for YOLO training.
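
For reference, the split sizes implied by an 80/20 timestamp split can be worked out directly. This small sketch mirrors the `int(len(timestamps) * train_ratio)` rule used in the export function; the per-split image counts additionally assume that every camera contributes one image per timestamp, which is consistent with the 1413 exported images (157 × 9) but is still an assumption.

```python
n_timestamps = 157   # timestamps processed in this scene
n_cameras = 9        # cameras with images
train_ratio = 0.8

n_train = int(n_timestamps * train_ratio)  # truncates toward zero
n_val = n_timestamps - n_train

print(f"train timestamps: {n_train}, val timestamps: {n_val}")
print(f"train images (if all cameras export): {n_train * n_cameras}")
print(f"val images   (if all cameras export): {n_val * n_cameras}")
```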

### What the output tells us

* `kept_images=1413` for both runs means **the number of exported image files is the same** in baseline and pruned (157 timestamps × 9 cameras = 1413).
* `dropped_images=0` means **no entire images were removed** (because `drop_empty_images=False`, images whose labels were all pruned are still kept with an empty label file).

So pruning may still have happened at the **box level** (fewer labels per image), but not at the **image-file level**; that is why these counters did not change.
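
Since the image counters cannot show box-level pruning, one way to check it is to count label lines per split in each exported dataset. The sketch below is an assumption-laden helper (`count_label_lines` is not part of the notebook) that relies only on the `labels/<split>/*.txt` layout written by the export function; it is demonstrated on a tiny temporary dataset rather than on the real export.

```python
import tempfile
from pathlib import Path
from typing import Dict

def count_label_lines(dataset_dir: Path) -> Dict[str, int]:
    """Count non-empty YOLO label lines (boxes) under labels/train and labels/val."""
    counts: Dict[str, int] = {}
    for split in ("train", "val"):
        label_dir = Path(dataset_dir) / "labels" / split
        total = 0
        for txt in sorted(label_dir.glob("*.txt")):
            total += sum(1 for line in txt.read_text().splitlines() if line.strip())
        counts[split] = total
    return counts

# Demo on a throwaway dataset with the same folder layout
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "labels" / "train").mkdir(parents=True)
    (root / "labels" / "val").mkdir(parents=True)
    (root / "labels" / "train" / "cam_1.txt").write_text("6 0.5 0.5 0.1 0.1\n5 0.2 0.3 0.05 0.1")
    (root / "labels" / "train" / "cam_2.txt").write_text("")  # image with no boxes
    (root / "labels" / "val" / "cam_3.txt").write_text("6 0.4 0.4 0.2 0.2")
    demo_counts = count_label_lines(root)

print(demo_counts)  # {'train': 2, 'val': 1}
```

Calling `count_label_lines(baseline_dir)` and `count_label_lines(pruned_dir)` after the export would show how many boxes each threshold actually removed.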


In [14]:
OUT_ROOT = Path("/content/drive/MyDrive/av2_redundancy_yolo")

baseline_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "baseline_tau1.0",
    tau_bcs=1.0,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)

pruned_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "pruned_tau0.2",
    tau_bcs=0.2,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)


Exporting (tau_bcs=1.0): 100%|██████████| 157/157 [00:29<00:00,  5.34it/s]


Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml


Exporting (tau_bcs=0.2): 100%|██████████| 157/157 [00:30<00:00,  5.21it/s]

Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/pruned_tau0.2/data.yaml





## Step 14: Train + Evaluate YOLO

### Goal of Step 14

In this step, we **train YOLOv8** on each exported dataset (baseline vs pruned) and then **evaluate performance** to see whether redundancy pruning improves detection quality.

### What the Step 14 function does (`train_and_eval_yolo`)

For each dataset directory (`baseline_dir` or `pruned_dir`), the function runs the same pipeline:

1. **Initialize the same model**

* `model = YOLO("yolov8n.pt")`
  So both experiments start from the same pretrained YOLOv8n weights.

2. **Train the model**

* `model.train(...)` is called with:

  * `data = data.yaml` (YOLO format dataset created in Step 13)
  * `epochs=30`, `imgsz=640`, `batch=16`
  * `device=0` (GPU)
  * `cache=True`, `workers=4`
  * Outputs saved under: `data_dir / "runs" / run_name`

This ensures the only real difference between the two trainings is the **dataset content** (baseline vs pruned labels).

3. **Validate the model**
   After training finishes, the function runs:

* `metrics = model.val(data=data.yaml, device=0)`

From YOLO validation it extracts:

* **Precision** = `metrics.box.mp`
* **Recall** = `metrics.box.mr`
* **mAP50** = `metrics.box.map50`
* **mAP50-95** = `metrics.box.map`

Then it computes:

* **F1-score** = `2PR/(P+R)`

4. **Pull final training box loss**
   The function reads:

* `results.csv` from the training run directory
  and extracts the last available value of `train/box_loss` (if present) as a sanity check on training stability.

5. **Return a compact metrics dictionary**
   So we can store results for baseline and pruned in a clean table.
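
The F1 step can be verified by hand from the reported precision and recall. This short check uses the same formula as the function, including its small epsilon in the denominator, with the six-decimal values from the results table; the helper name `f1_score` is chosen for this example.

```python
def f1_score(precision: float, recall: float, eps: float = 1e-12) -> float:
    """Harmonic mean of precision and recall; eps guards against division by zero."""
    return 2.0 * precision * recall / (precision + recall + eps)

baseline_f1 = f1_score(0.941997, 0.752534)
pruned_f1 = f1_score(0.947917, 0.8291)

print(f"baseline F1 = {baseline_f1:.3f}")  # 0.837
print(f"pruned   F1 = {pruned_f1:.3f}")    # 0.885
```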

---

## What Step 14 produced

### Baseline training + eval (tau=1.0)

* Precision: **0.942**
* Recall: **0.753**
* F1: **0.837**
* mAP50: **0.862**
* mAP50-95: **0.722**
* Box loss: **0.578**

### Pruned training + eval (tau=0.2)

* Precision: **0.948**
* Recall: **0.829**
* F1: **0.885**
* mAP50: **0.888**
* mAP50-95: **0.743**
* Box loss: **0.579**

### Interpretation

* Step 14 shows that **training on pruned labels improved every reported detection metric** in this run.
* The **largest gain is in Recall** (0.753 → 0.829), meaning the pruned-label model finds more of the labeled objects. Each model is validated against its own dataset's `data.yaml`, so the two validation label sets differ by the pruned boxes.
* **mAP50 and mAP50-95 improved** (0.862 → 0.888 and 0.722 → 0.743), indicating better overall detection quality across IoU thresholds.
* **Box loss stayed almost the same** (0.578 vs. 0.579), so the training behavior remains stable; performance gains are not coming from an unstable training run.
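
The size of each gain can be computed directly from the two result rows. The sketch below hard-codes the values from the results table above and reports the absolute change per metric; it is only a convenience for reading the table, not an additional experiment.

```python
baseline = {"Precision": 0.941997, "Recall": 0.752534, "F1-Score": 0.836674,
            "mAP50": 0.862451, "mAP50-95": 0.721629}
pruned = {"Precision": 0.947917, "Recall": 0.8291, "F1-Score": 0.884536,
          "mAP50": 0.888146, "mAP50-95": 0.742988}

# Absolute improvement of the pruned run over the baseline, per metric
deltas = {name: pruned[name] - baseline[name] for name in baseline}
largest_gain = max(deltas, key=deltas.get)

for name, delta in deltas.items():
    print(f"{name:9s} {delta:+.4f}")
print("largest gain:", largest_gain)
```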



## **Threshold = 0.2**

In [15]:
def train_and_eval_yolo(data_dir: Path, run_name: str, epochs: int = 30, imgsz: int = 640, batch: int = 16):
    model = YOLO("yolov8n.pt")

    train_res = model.train(
        data=str(data_dir / "data.yaml"),
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        device=0,          # force GPU
        cache=True,        # recommended
        workers=4,         # recommended
        name=run_name,
        project=str(data_dir / "runs"),
        verbose=True
    )

    metrics = model.val(
        data=str(data_dir / "data.yaml"),
        device=0           # force GPU for eval too
    )

    precision = float(metrics.box.mp)
    recall    = float(metrics.box.mr)
    map50     = float(metrics.box.map50)
    map5095   = float(metrics.box.map)
    f1        = (2 * precision * recall / (precision + recall + 1e-12))

    box_loss = None
    results_csv = Path(train_res.save_dir) / "results.csv"
    if results_csv.exists():
        df = pd.read_csv(results_csv)
        box_cols = [c for c in df.columns if "train/box_loss" in c.lower()]
        if box_cols:
            box_loss = float(df[box_cols[0]].iloc[-1])

    return {
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
        "mAP50": map50,
        "mAP50-95": map5095,
        "Box Loss": box_loss
    }


In [16]:
rows = []

base_metrics = train_and_eval_yolo(baseline_dir, run_name="baseline_tau1.0", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Baseline (no pruning)",
    "Dataset (Images)": "AV2",
    **base_metrics
})

pruned_metrics = train_and_eval_yolo(pruned_dir, run_name="pruned_tau0.2", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Pruned (tau_BCS=0.2)",
    "Dataset (Images)": "AV2",
    **pruned_metrics
})

report_df = pd.DataFrame(rows, columns=[
    "Model Strategy", "Dataset (Images)", "Precision", "Recall", "F1-Score", "mAP50", "mAP50-95", "Box Loss"
])

report_df


Ultralytics 8.3.241 🚀 Python-3.12.12 torch-2.9.0+cu126 CUDA:0 (NVIDIA A100-SXM4-80GB, 81222MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=baseline_tau1.0, nbs=64, nms=False, opset=None, optimize=False, optimizer

| | Model Strategy | Dataset (Images) | Precision | Recall | F1-Score | mAP50 | mAP50-95 | Box Loss |
|---|---|---|---|---|---|---|---|---|
| 0 | Baseline (no pruning) | AV2 | 0.941997 | 0.752534 | 0.836674 | 0.862451 | 0.721629 | 0.57838 |
| 1 | Pruned (tau_BCS=0.2) | AV2 | 0.947917 | 0.8291 | 0.884536 | 0.888146 | 0.742988 | 0.57911 |


**Trying other pruning thresholds**

# **Threshold = 0.1**




In [17]:
OUT_ROOT = Path("/content/drive/MyDrive/av2_redundancy_yolo")

baseline_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "baseline_tau1.0",
    tau_bcs=1.0,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)

pruned_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "pruned_tau0.1",
    tau_bcs=0.1,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)


Exporting (tau_bcs=1.0): 100%|██████████| 157/157 [00:28<00:00,  5.52it/s]


Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml


Exporting (tau_bcs=0.1): 100%|██████████| 157/157 [00:29<00:00,  5.33it/s]

Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/pruned_tau0.1/data.yaml





In [18]:
def train_and_eval_yolo(data_dir: Path, run_name: str, epochs: int = 30, imgsz: int = 640, batch: int = 16):
    model = YOLO("yolov8n.pt")

    train_res = model.train(
        data=str(data_dir / "data.yaml"),
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        device=0,          # force GPU
        cache=True,        # recommended
        workers=4,         # recommended
        name=run_name,
        project=str(data_dir / "runs"),
        verbose=True
    )

    metrics = model.val(
        data=str(data_dir / "data.yaml"),
        device=0           # force GPU for eval too
    )

    precision = float(metrics.box.mp)
    recall    = float(metrics.box.mr)
    map50     = float(metrics.box.map50)
    map5095   = float(metrics.box.map)
    f1        = (2 * precision * recall / (precision + recall + 1e-12))

    box_loss = None
    results_csv = Path(train_res.save_dir) / "results.csv"
    if results_csv.exists():
        df = pd.read_csv(results_csv)
        box_cols = [c for c in df.columns if "train/box_loss" in c.lower()]
        if box_cols:
            box_loss = float(df[box_cols[0]].iloc[-1])

    return {
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
        "mAP50": map50,
        "mAP50-95": map5095,
        "Box Loss": box_loss
    }


In [19]:
rows = []

base_metrics = train_and_eval_yolo(baseline_dir, run_name="baseline_tau1.0", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Baseline (no pruning)",
    "Dataset (Images)": "AV2",
    **base_metrics
})

pruned_metrics = train_and_eval_yolo(pruned_dir, run_name="pruned_tau0.1", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Pruned (tau_BCS=0.1)",
    "Dataset (Images)": "AV2",
    **pruned_metrics
})

report_df = pd.DataFrame(rows, columns=[
    "Model Strategy", "Dataset (Images)", "Precision", "Recall", "F1-Score", "mAP50", "mAP50-95", "Box Loss"
])

report_df


Ultralytics 8.3.241 🚀 Python-3.12.12 torch-2.9.0+cu126 CUDA:0 (NVIDIA A100-SXM4-80GB, 81222MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=baseline_tau1.0, nbs=64, nms=False, opset=None, optimize=False, optimizer

| | Model Strategy | Dataset (Images) | Precision | Recall | F1-Score | mAP50 | mAP50-95 | Box Loss |
|---|---|---|---|---|---|---|---|---|
| 0 | Baseline (no pruning) | AV2 | 0.941997 | 0.752534 | 0.836674 | 0.862451 | 0.721629 | 0.57838 |
| 1 | Pruned (tau_BCS=0.1) | AV2 | 0.951829 | 0.810456 | 0.875472 | 0.88329 | 0.755696 | 0.58597 |


# **Threshold = 0.3**

In [20]:
OUT_ROOT = Path("/content/drive/MyDrive/av2_redundancy_yolo")

baseline_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "baseline_tau1.0",
    tau_bcs=1.0,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)

pruned_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "pruned_tau0.3",
    tau_bcs=0.3,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)


Exporting (tau_bcs=1.0): 100%|██████████| 157/157 [00:28<00:00,  5.49it/s]


Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml


Exporting (tau_bcs=0.3): 100%|██████████| 157/157 [00:29<00:00,  5.32it/s]

Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/pruned_tau0.3/data.yaml





In [21]:
def train_and_eval_yolo(data_dir: Path, run_name: str, epochs: int = 30, imgsz: int = 640, batch: int = 16):
    model = YOLO("yolov8n.pt")

    train_res = model.train(
        data=str(data_dir / "data.yaml"),
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        device=0,          # force GPU
        cache=True,        # recommended
        workers=4,         # recommended
        name=run_name,
        project=str(data_dir / "runs"),
        verbose=True
    )

    metrics = model.val(
        data=str(data_dir / "data.yaml"),
        device=0           # force GPU for eval too
    )

    precision = float(metrics.box.mp)
    recall    = float(metrics.box.mr)
    map50     = float(metrics.box.map50)
    map5095   = float(metrics.box.map)
    f1        = (2 * precision * recall / (precision + recall + 1e-12))

    box_loss = None
    results_csv = Path(train_res.save_dir) / "results.csv"
    if results_csv.exists():
        df = pd.read_csv(results_csv)
        box_cols = [c for c in df.columns if "train/box_loss" in c.lower()]
        if box_cols:
            box_loss = float(df[box_cols[0]].iloc[-1])

    return {
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
        "mAP50": map50,
        "mAP50-95": map5095,
        "Box Loss": box_loss
    }


In [22]:
rows = []

base_metrics = train_and_eval_yolo(baseline_dir, run_name="baseline_tau1.0", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Baseline (no pruning)",
    "Dataset (Images)": "AV2",
    **base_metrics
})

pruned_metrics = train_and_eval_yolo(pruned_dir, run_name="pruned_tau0.3", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Pruned (tau_BCS=0.3)",
    "Dataset (Images)": "AV2",
    **pruned_metrics
})

report_df = pd.DataFrame(rows, columns=[
    "Model Strategy", "Dataset (Images)", "Precision", "Recall", "F1-Score", "mAP50", "mAP50-95", "Box Loss"
])

report_df


Ultralytics 8.3.241 🚀 Python-3.12.12 torch-2.9.0+cu126 CUDA:0 (NVIDIA A100-SXM4-80GB, 81222MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=baseline_tau1.0, nbs=64, nms=False, opset=None, optimize=False, optimizer

| | Model Strategy | Dataset (Images) | Precision | Recall | F1-Score | mAP50 | mAP50-95 | Box Loss |
|---|---|---|---|---|---|---|---|---|
| 0 | Baseline (no pruning) | AV2 | 0.941997 | 0.752534 | 0.836674 | 0.862451 | 0.721629 | 0.57838 |
| 1 | Pruned (tau_BCS=0.3) | AV2 | 0.94409 | 0.817721 | 0.876373 | 0.881457 | 0.742123 | 0.58487 |
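
To compare the pruned runs reported so far (thresholds 0.1, 0.2 and 0.3), the sketch below collects their rows from the result tables above and picks the best threshold per metric. The values are copied from those tables; the dictionary layout and the `best_tau` name are choices made for this example only.

```python
# Pruned-run metrics copied from the result tables, keyed by tau_bcs
pruned_runs = {
    0.1: {"Precision": 0.951829, "Recall": 0.810456, "F1-Score": 0.875472,
          "mAP50": 0.88329, "mAP50-95": 0.755696},
    0.2: {"Precision": 0.947917, "Recall": 0.8291, "F1-Score": 0.884536,
          "mAP50": 0.888146, "mAP50-95": 0.742988},
    0.3: {"Precision": 0.94409, "Recall": 0.817721, "F1-Score": 0.876373,
          "mAP50": 0.881457, "mAP50-95": 0.742123},
}

metric_names = ["Precision", "Recall", "F1-Score", "mAP50", "mAP50-95"]

# For each metric, the threshold whose run scored highest
best_tau = {
    metric: max(pruned_runs, key=lambda tau: pruned_runs[tau][metric])
    for metric in metric_names
}

for metric, tau in best_tau.items():
    print(f"{metric:9s} best at tau_bcs={tau} ({pruned_runs[tau][metric]:.6f})")
```

Among these three runs, 0.2 gives the highest Recall, F1 and mAP50, while 0.1 gives the highest Precision and mAP50-95. The differences are small and come from single training runs.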


# **Threshold = 0.5**

In [23]:
OUT_ROOT = Path("/content/drive/MyDrive/av2_redundancy_yolo")

baseline_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "baseline_tau1.0",
    tau_bcs=1.0,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)

pruned_dir = build_yolo_from_av2_scene(
    scene_path=SCENE_PATH,
    ann_df=ann_df,
    INTR=INTR,
    T_EGO_SENSOR=T_EGO_SENSOR,
    T_SENSOR_EGO=T_SENSOR_EGO,
    IMG_INDEX=IMG_INDEX,
    overlap_pairs=OVERLAP_PAIRS,
    out_root=OUT_ROOT / "pruned_tau0.5",
    tau_bcs=0.5,
    train_ratio=0.8,
    max_frames=None,
    drop_empty_images=False
)


Exporting (tau_bcs=1.0): 100%|██████████| 157/157 [00:28<00:00,  5.48it/s]


Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml


Exporting (tau_bcs=0.5): 100%|██████████| 157/157 [00:29<00:00,  5.31it/s]

Done. kept_images=1413, dropped_images=0
data.yaml: /content/drive/MyDrive/av2_redundancy_yolo/pruned_tau0.5/data.yaml





In [24]:
def train_and_eval_yolo(data_dir: Path, run_name: str, epochs: int = 30, imgsz: int = 640, batch: int = 16):
    """Train YOLOv8n on the dataset in `data_dir`, validate it, and return summary metrics."""
    model = YOLO("yolov8n.pt")  # pretrained nano checkpoint

    train_res = model.train(
        data=str(data_dir / "data.yaml"),
        epochs=epochs,
        imgsz=imgsz,
        batch=batch,
        device=0,          # force GPU
        cache=True,        # cache images in RAM to speed up epochs
        workers=4,         # dataloader worker processes
        name=run_name,
        project=str(data_dir / "runs"),
        verbose=True
    )

    # Validate on the val split listed in data.yaml
    metrics = model.val(
        data=str(data_dir / "data.yaml"),
        device=0           # force GPU for eval too
    )

    # Mean precision/recall over classes; F1 is their harmonic mean
    # (the small epsilon avoids division by zero).
    precision = float(metrics.box.mp)
    recall    = float(metrics.box.mr)
    map50     = float(metrics.box.map50)
    map5095   = float(metrics.box.map)
    f1        = (2 * precision * recall / (precision + recall + 1e-12))

    # Final-epoch training box loss, read from results.csv if it exists.
    # Substring matching tolerates padded column names.
    box_loss = None
    results_csv = Path(train_res.save_dir) / "results.csv"
    if results_csv.exists():
        df = pd.read_csv(results_csv)
        box_cols = [c for c in df.columns if "train/box_loss" in c.lower()]
        if box_cols:
            box_loss = float(df[box_cols[0]].iloc[-1])

    return {
        "Precision": precision,
        "Recall": recall,
        "F1-Score": f1,
        "mAP50": map50,
        "mAP50-95": map5095,
        "Box Loss": box_loss
    }
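
The F1-Score returned by `train_and_eval_yolo` is the harmonic mean of the mean precision and mean recall, not a per-class F1. As a quick sanity check, the sketch below recomputes it from the precision/recall values printed in this notebook's report tables. The helper name `f1_from_pr` is hypothetical and only used for illustration.

```python
def f1_from_pr(precision: float, recall: float, eps: float = 1e-12) -> float:
    """Harmonic mean of precision and recall; eps guards against 0/0."""
    return 2 * precision * recall / (precision + recall + eps)


# (precision, recall, reported F1) copied from the report tables in this notebook.
reported = {
    "baseline": (0.941997, 0.752534, 0.836674),
    "tau_0.3": (0.94409, 0.817721, 0.876373),
    "tau_0.5": (0.873335, 0.85232, 0.8627),
}

# Recompute F1 from precision and recall only.
recomputed = {name: f1_from_pr(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f1_reported) in reported.items():
    print(f"{name}: reported={f1_reported:.6f} recomputed={recomputed[name]:.6f}")
```

If the recomputed and reported values agree to about four decimals, the F1 column adds no information beyond the Precision and Recall columns; it is just a convenient single-number summary of the two.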


In [25]:
rows = []

base_metrics = train_and_eval_yolo(baseline_dir, run_name="baseline_tau1.0", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Baseline (no pruning)",
    "Dataset (Images)": "AV2",
    **base_metrics
})

pruned_metrics = train_and_eval_yolo(pruned_dir, run_name="pruned_tau0.5", epochs=30, imgsz=640, batch=16)
rows.append({
    "Model Strategy": "Pruned (tau_BCS=0.5)",
    "Dataset (Images)": "AV2",
    **pruned_metrics
})

report_df = pd.DataFrame(rows, columns=[
    "Model Strategy", "Dataset (Images)", "Precision", "Recall", "F1-Score", "mAP50", "mAP50-95", "Box Loss"
])

report_df


Ultralytics 8.3.241 🚀 Python-3.12.12 torch-2.9.0+cu126 CUDA:0 (NVIDIA A100-SXM4-80GB, 81222MiB)
engine/trainer: agnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=True, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=/content/drive/MyDrive/av2_redundancy_yolo/baseline_tau1.0/data.yaml, degrees=0.0, deterministic=True, device=0, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=30, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=baseline_tau1.0, nbs=64, nms=False, opset=None, optimize=False, optimizer

| | Model Strategy | Dataset (Images) | Precision | Recall | F1-Score | mAP50 | mAP50-95 | Box Loss |
|---|---|---|---|---|---|---|---|---|
| 0 | Baseline (no pruning) | AV2 | 0.941997 | 0.752534 | 0.836674 | 0.862451 | 0.721629 | 0.57838 |
| 1 | Pruned (tau_BCS=0.5) | AV2 | 0.873335 | 0.85232 | 0.8627 | 0.870235 | 0.730433 | 0.58421 |
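
To compare the two thresholds at a glance, the reported metrics can be placed side by side and expressed as changes relative to the baseline. The sketch below hard-codes the values from the tau_BCS = 0.3 and tau_BCS = 0.5 report tables; the names `runs` and `delta` are illustrative and not part of the pipeline.

```python
import pandas as pd

# Values copied from the report tables above (tau_BCS = 0.3 and 0.5 runs).
runs = pd.DataFrame(
    {
        "Precision": [0.941997, 0.94409, 0.873335],
        "Recall": [0.752534, 0.817721, 0.85232],
        "mAP50": [0.862451, 0.881457, 0.870235],
        "mAP50-95": [0.721629, 0.742123, 0.730433],
    },
    index=["baseline", "tau_0.3", "tau_0.5"],
)

# Absolute change of each pruned run relative to the baseline row.
delta = runs.drop(index="baseline") - runs.loc["baseline"]
print(delta.round(4))
```

Read this way, both pruned runs report higher recall and mAP than the baseline, while tau_BCS = 0.5 gives up some precision in exchange for its recall gain. These numbers come from a single scene and a single training run per setting, so the differences are indicative rather than conclusive.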
