# nuScenes-mini exploration

Quick notebook to inspect nuScenes-mini annotations and think through a reasonable baseline + GNN extension. Paths are relative to the notebook's working directory: `dataroot` defaults to `../data/nuscenes`, so adjust it if you run from the repo root.

## Setup
1. Download nuScenes **mini** from https://www.nuscenes.org/download.
2. Unzip to `data/nuscenes` so you have folders like `samples/`, `sweeps/`, `v1.0-mini/`.
3. Install devkit dependencies (one time):
   ```bash
   pip install nuscenes-devkit matplotlib seaborn tqdm
   ```

In [None]:
from pathlib import Path
from collections import Counter, defaultdict

import matplotlib.pyplot as plt
import seaborn as sns
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.data_classes import Box

# Point to your extracted nuScenes data
dataroot = Path("../data/nuscenes")
version = "v1.0-mini"  # use mini for fast iteration

nusc = NuScenes(version=version, dataroot=str(dataroot), verbose=True)
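
Before constructing `NuScenes`, it can help to confirm that the extracted layout matches step 2 of Setup. The following is a minimal sketch, assuming only the three folders named there are required; `missing_entries` is a hypothetical helper, not part of the devkit.

```python
from pathlib import Path


def missing_entries(dataroot, version="v1.0-mini"):
    """Return the expected sub-folders that are absent under dataroot.

    Hypothetical helper (not part of the devkit). The expected names are the
    folders listed in the Setup section.
    """
    root = Path(dataroot)
    expected = ["samples", "sweeps", version]
    return [name for name in expected if not (root / name).is_dir()]
```

An empty list from `missing_entries(dataroot)` suggests the archive was unpacked where the notebook expects it.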

## Scene / sample overview

In [None]:
print(f"Scenes: {len(nusc.scene)}")
print(f"Samples: {len(nusc.sample)}")
print(f"Annotations: {len(nusc.sample_annotation)}")

# Peek at the first scene and its samples
scene = nusc.scene[0]
print("\nFirst scene:")
print({k: scene[k] for k in ["token", "name", "description", "nbr_samples"]})

# Walk the sample chain for this scene
sample_tokens = []
current_token = scene["first_sample_token"]
while current_token:
    sample_tokens.append(current_token)
    sample = nusc.get("sample", current_token)
    current_token = sample.get("next")

print("Sample tokens in first scene:")
print(sample_tokens)
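
The `next` field makes each scene a singly linked list of samples that ends with an empty token. Below is a minimal sketch of the same traversal as a reusable function, run against a stand-in dictionary rather than the real tables; `walk_chain` and the toy records are hypothetical.

```python
def walk_chain(get_record, first_token):
    """Follow 'next' links from first_token until an empty token is reached.

    get_record is any callable mapping a token to a record dict; with the
    devkit this could be `lambda t: nusc.get("sample", t)`.
    """
    tokens = []
    token = first_token
    while token:
        tokens.append(token)
        token = get_record(token)["next"]
    return tokens


# Toy stand-in for the sample table: three samples linked a -> b -> c.
toy_samples = {
    "a": {"next": "b"},
    "b": {"next": "c"},
    "c": {"next": ""},
}
chain = walk_chain(toy_samples.__getitem__, "a")
print(chain)
```

With the real data, the length of the returned list should equal `scene["nbr_samples"]`, which gives a cheap consistency check.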

## Class distribution

nuScenes defines 23 object categories. Mini contains only 10 scenes, so the counts below show which categories are covered rather than the full-dataset distribution.

In [None]:
cat_counts = Counter()
attr_counts = Counter()
for ann in nusc.sample_annotation:
    cat = nusc.get("category", ann["category_token"])['name']
    cat_counts[cat] += 1
    for attr_token in ann.get("attribute_tokens", []):
        if attr_token:
            attr_counts[nusc.get("attribute", attr_token)['name']] += 1

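print("Top categories:")
for name, count in cat_counts.most_common(15):
    print(f"{name:30s} {count}")

# attr_counts was collected above; print it so the tally is actually used
print("\nAttributes:")
for name, count in attr_counts.most_common():
    print(f"{name:30s} {count}")

# Plot categories sorted by count so the chart matches the printed ranking
names, counts = zip(*cat_counts.most_common())
sns.barplot(x=list(counts), y=list(names))
plt.title("nuScenes-mini category counts")
plt.tight_layout()
plt.show()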
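
One way to make "check coverage" concrete is to compute each category's share of all annotations and list the expected categories that never occur. This is a minimal sketch on made-up counts; `coverage_report` is a hypothetical helper. With the real data, the expected names could come from `[c["name"] for c in nusc.category]`.

```python
from collections import Counter


def coverage_report(counts, expected_names):
    """Return (shares, missing) for a Counter of per-category annotation counts.

    shares maps each observed category to its fraction of all annotations,
    ordered from most to least frequent; missing lists expected categories
    with no annotations at all.
    """
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.most_common()} if total else {}
    missing = sorted(set(expected_names) - set(counts))
    return shares, missing


# Toy counts; the numbers are invented for illustration only.
toy_counts = Counter(
    {"vehicle.car": 6, "human.pedestrian.adult": 3, "movable_object.barrier": 1}
)
expected = [
    "vehicle.car",
    "human.pedestrian.adult",
    "movable_object.barrier",
    "vehicle.bicycle",
]
shares, missing = coverage_report(toy_counts, expected)
print(shares)
print(missing)
```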

## Sample-level look

Grabs a random sample, lists its keyframe sensor data (cameras, lidar, radar), and prints the 3D boxes as text for quick inspection without visualization.

In [None]:
import random

from pyquaternion import Quaternion
from nuscenes.eval.common.utils import quaternion_yaw

sample = random.choice(nusc.sample)
print({k: sample[k] for k in ["token", "scene_token", "timestamp"]})

# Keyframe sample_data entries (cameras, lidar, radar)
for sd_token in sample["data"].values():
    sd = nusc.get("sample_data", sd_token)
    print(f"{sd['channel']:15s} -> filename={sd['filename']}")

# Show annotation boxes in the global frame (center xyz, size wlh, yaw in radians)
print("\nBoxes (center, size, yaw):")
for ann_token in sample["anns"][:10]:  # limit output
    ann = nusc.get("sample_annotation", ann_token)
    cat_name = nusc.get("category", ann["category_token"])['name']
    # Box has no quaternion_yaw method and requires a Quaternion, not a raw list;
    # use the devkit's quaternion_yaw helper on the annotation rotation instead.
    yaw = quaternion_yaw(Quaternion(ann["rotation"]))
    print(f"{cat_name:25s} center={ann['translation']} size={ann['size']} yaw={yaw:.2f}")
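
For reference, the yaw printed above can also be derived by hand from the `[w, x, y, z]` rotation: rotate the x axis by the quaternion and take its direction in the xy plane. This is a minimal sketch, assuming a unit quaternion whose rotation is mostly about the z axis; `yaw_from_quaternion` is a hypothetical name, not a devkit function.

```python
import math


def yaw_from_quaternion(q):
    """Heading angle in radians about the z axis for a unit quaternion [w, x, y, z].

    Rotates the x axis by q and measures its direction in the xy plane.
    """
    w, x, y, z = q
    # The first column of the rotation matrix is the rotated x axis.
    vx = 1.0 - 2.0 * (y * y + z * z)
    vy = 2.0 * (x * y + w * z)
    return math.atan2(vy, vx)


# A pure rotation of theta about z is q = [cos(theta/2), 0, 0, sin(theta/2)].
theta = math.pi / 3
q = [math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2)]
print(yaw_from_quaternion(q), theta)
```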

## Next steps toward a simple baseline + GNN hook

- Start with a lidar-only detector (PointPillars or CenterPoint). Use mini for sanity checks and train on the full dataset once the pipeline runs end to end.
- Graph idea A (spatial): build a graph over the predicted boxes within a single sweep, connect boxes that lie close together, and apply message passing to refine class scores.
- Graph idea B (temporal): build a graph across consecutive sweeps after ego-motion compensation, link boxes that align in space, and use message passing to fuse temporal context.
- Labels and metrics: use the official nuScenes detection metrics (mAP and NDS, the nuScenes Detection Score). Mini is too small to give meaningful benchmark numbers, but it exercises the whole pipeline quickly.
- Data loader: combine the devkit with the dataloader of an existing 3D detection codebase rather than reimplementing ground-truth parsing.
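
The two graph ideas can be sketched with plain NumPy before committing to a GNN library. Everything below is an assumption for illustration: edges are defined by bird's-eye-view center distance under a hand-picked radius, a parameter-free neighbour average stands in for a learned message-passing layer, and ego poses are given as 4x4 homogeneous matrices (in practice these would be assembled from the `ego_pose` translation and rotation). All function names are hypothetical.

```python
import numpy as np


def radius_edges(centers, radius):
    """Directed edges (i, j), i != j, between boxes whose BEV centers lie within radius."""
    xy = np.asarray(centers, dtype=float).reshape(-1, 3)[:, :2]
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    src, dst = np.nonzero((dist <= radius) & ~np.eye(len(xy), dtype=bool))
    return np.stack([src, dst], axis=0)


def mean_aggregate(features, edges):
    """One parameter-free message-passing step: average each node with its neighbours."""
    feats = np.asarray(features, dtype=float)
    summed = feats.copy()  # each node keeps its own contribution
    degree = np.ones(len(feats))
    np.add.at(summed, edges[1], feats[edges[0]])
    np.add.at(degree, edges[1], 1.0)
    return summed / degree[:, None]


def to_current_frame(centers_prev, prev_to_global, curr_to_global):
    """Ego-motion compensation: map centers from the previous ego frame into the current one."""
    pts = np.asarray(centers_prev, dtype=float).reshape(-1, 3)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    prev_to_curr = np.linalg.inv(curr_to_global) @ prev_to_global
    return (prev_to_curr @ homo.T).T[:, :3]


# Idea A: two nearby boxes exchange scores, the distant one is left alone.
centers = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
scores = [[1.0], [0.0], [0.5]]
edges = radius_edges(centers, radius=2.0)
refined = mean_aggregate(scores, edges)

# Idea B: the ego vehicle moved 1 m along x, so a static object appears 1 m closer.
prev_pose = np.eye(4)
curr_pose = np.eye(4)
curr_pose[0, 3] = 1.0
moved = to_current_frame([[5.0, 0.0, 0.0]], prev_pose, curr_pose)
print(edges, refined, moved, sep="\n")
```

Temporal edges could then reuse `radius_edges` on the stacked compensated and current centers, which keeps the spatial and temporal variants on the same code path.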