# Step 1 · nuScenes Data Characterization

This notebook performs initial data setup and validation for the Trajectron++ federated learning research project. It loads the nuScenes dataset via TrajData and verifies that trajectory data and map resources are accessible.

**Prerequisites:**
- Python 3.10+ environment with dependencies from `requirements.txt` installed
- nuScenes v1.0-mini dataset extracted to `data/raw/`
- nuScenes map expansion pack (v1.3) extracted to `data/raw/maps/expansion/`
- Kernel set to `.venv (Python 3.10.16)` or equivalent project environment

**Expected Outcomes:**
- Successful TrajData cache initialization
- Dataset summary showing agent trajectory counts
- Validation that map features are accessible for scene context

## 1. Verify Data Directory

Confirm that the nuScenes dataset is available at the expected location. This cell raises a `FileNotFoundError` if `../data/raw` (resolved relative to the notebook's working directory) does not exist. It does not check the folder's internal structure.

In [1]:
from pathlib import Path
import os

NUSCENES_ROOT = Path('../data/raw').resolve()
if not NUSCENES_ROOT.exists():
    raise FileNotFoundError(f"nuScenes root not found at {NUSCENES_ROOT}. Update the path before continuing.")

os.environ['NUSCENES_ROOT'] = str(NUSCENES_ROOT)
print(f"nuScenes root set to: {NUSCENES_ROOT}")

nuScenes root set to: /Users/simondrauz/Lokale Dokumente/Repositories/ds_practical/data/raw
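
The existence check above does not look inside the directory. A minimal sketch of a structure check follows, assuming the standard nuScenes v1.0-mini layout (`v1.0-mini/`, `samples/`, `sweeps/`, `maps/`) plus the `maps/expansion/` path from the prerequisites. The helper name `find_missing_entries` and the `EXPECTED_ENTRIES` tuple are illustrative, not part of TrajData or the nuScenes devkit.

```python
from pathlib import Path

# Assumed layout: the standard nuScenes v1.0-mini folders plus the map
# expansion path from the prerequisites. Adjust to match your extraction.
EXPECTED_ENTRIES = ("v1.0-mini", "samples", "sweeps", "maps", "maps/expansion")


def find_missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected sub-paths that do not exist under root."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]
```

In this notebook it could be called as `find_missing_entries(NUSCENES_ROOT)`. An empty list means every assumed entry is present.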


## 2. Initialize TrajData Dataset

Load the nuScenes mini split through TrajData's `UnifiedDataset` interface. On first execution, this will:
- Build dataset caches (trajectory metadata, agent states)
- Rasterize map data at 2 px/m resolution
- Create vector map feature indices

**Note:** Initial caching takes 3–5 minutes; subsequent runs reuse the cache and load in seconds.

In [6]:
from trajdata import UnifiedDataset

try:
    # Initialize TrajData with nuScenes mini split including map data
    dataset = UnifiedDataset(
        desired_data=["nusc_mini"],  # Use 'nusc_trainval' for the full dataset
        centric="agent",
        desired_dt=0.1,
        history_sec=(3.2, 3.2),
        future_sec=(4.8, 4.8),
        incl_raster_map=True,  # Enable raster map rendering
        raster_map_params={
            "px_per_m": 2,  # 2 pixels per meter resolution
            "map_size_px": 224,  # 224x224 pixel map patches
        },
        incl_vector_map=True,  # Enable vector map features
        verbose=True,
        data_dirs={
            "nusc_mini": str(NUSCENES_ROOT)
        },
        num_workers=0,
    )
    print(f"Successfully loaded dataset with {len(dataset)} agent trajectories")
    print(f"Dataset info: {dataset}")
except FileNotFoundError:
    print("nuScenes data not found. Confirm the dataset is extracted under data/raw with the expected folder structure.")
    raise
except Exception as exc:
    print(f"Encountered an issue while initializing TrajData: {exc}")
    raise  # Bare raise re-raises the active exception

Loading data for matched scene tags: ['mini_train-nusc_mini-boston', 'mini_train-nusc_mini-singapore', 'nusc_mini-mini_val-boston', 'nusc_mini-mini_val-singapore']


Getting Scenes from nusc_mini: 100%|██████████| 4/4 [00:00<00:00, 1974.49it/s]

Calculating Agent Data (Serially): 100%|██████████| 10/10 [00:00<00:00, 67979.00it/s]


10 scenes in the scene index.


Creating Agent Data Index (Serially): 100%|██████████| 10/10 [00:00<00:00, 1853.26it/s]
Structuring Agent Data Index: 100%|██████████| 10/10 [00:00<00:00, 21687.20it/s]

Successfully loaded dataset with 26377 agent trajectories
Dataset info: <trajdata.dataset.UnifiedDataset object at 0x333b80190>
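
The `UnifiedDataset` arguments above fix a few derived quantities that are worth checking by hand. The sketch below works them out from the configured values only: 3.2 s of history and 4.8 s of future at `desired_dt=0.1`, and a 224 px raster at 2 px/m. The helper names are illustrative. Whether TrajData also counts the current timestep in the history array is a library detail not verified here.

```python
def window_steps(seconds, dt):
    """Number of dt-sized steps in a time window.

    round() is used instead of int() because floating-point division
    such as 4.8 / 0.1 can land just below the exact integer.
    """
    return round(seconds / dt)


def raster_extent_m(map_size_px, px_per_m):
    """Side length in meters covered by a square raster patch."""
    return map_size_px / px_per_m


history_steps = window_steps(3.2, 0.1)
future_steps = window_steps(4.8, 0.1)
patch_side_m = raster_extent_m(224, 2)
print(history_steps, future_steps, patch_side_m)
```

With these settings the history window spans 32 steps, the future window spans 48 steps, and each map patch covers 112 m per side.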





## 3. Dataset Summary

Inspect the loaded dataset to confirm accessibility and understand the data scope.

In [None]:
import pandas as pd

# Display dataset statistics
print(f"Total agent trajectories: {len(dataset)}")
print(f"Dataset description: {dataset}")

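# Fetch the first element to verify data loading. With centric="agent",
# indexing is assumed to return a single batch element object rather than
# a dict, so list its public attributes with dir() instead of .keys().
sample_element = dataset[0]
public_attrs = [name for name in dir(sample_element) if not name.startswith("_")]
print(f"\nSample element type: {type(sample_element).__name__}")
print(f"Sample element attributes: {public_attrs}")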
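
To go beyond the total count, a per-type tally over a small sample gives a first impression of the data scope. This is a minimal sketch, assuming each dataset element exposes an `agent_type` attribute (as TrajData's agent-centric elements are expected to). The helper `count_agent_types` and its `limit` parameter are illustrative, and elements without the attribute are counted as `"unknown"`.

```python
from collections import Counter


def count_agent_types(elements, limit=200):
    """Tally agent types over at most `limit` elements of an indexable dataset."""
    counts = Counter()
    for idx in range(min(limit, len(elements))):
        # `agent_type` is assumed to exist on each element; fall back to
        # "unknown" so the tally still works if it does not.
        agent_type = getattr(elements[idx], "agent_type", "unknown")
        # Enum members expose `.name`; plain values are stringified.
        counts[getattr(agent_type, "name", str(agent_type))] += 1
    return counts
```

In this notebook it could be called as `count_agent_types(dataset, limit=200)`. Keeping `limit` small avoids loading every element, which can be slow when raster maps are enabled.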

## Setup Complete ✓

If the cells above executed without errors, your environment is ready for data characterization and clustering work. 

**Next Steps:**
1. Commit the notebook and updated `requirements.txt` to the repository
2. Share the repo URL with your colleague for collaborative setup
3. Begin exploratory data analysis in subsequent notebook cells or a new notebook