# alpamayo-tools: Basic Usage

This notebook demonstrates the basic usage of alpamayo-tools for working with NVIDIA's Alpamayo/PhysicalAI-AV ecosystem.

> **Note:** If you're running this from the repo with `uv sync --extra all --extra dev`, all dependencies are already installed.

## Installation (for standalone use)

Skip this if you cloned the repo and ran `uv sync`.

```bash
# Core package (dataloader only)
pip install alpamayo-tools

# With embeddings support (quote the extras so shells like zsh don't expand the brackets)
pip install "alpamayo-tools[embeddings]"

# With inference support (also requires alpamayo_r1, installed from GitHub)
pip install "alpamayo-tools[inference]"
pip install git+https://github.com/NVlabs/alpamayo.git

# Everything
pip install "alpamayo-tools[all]"
```

## 1. Using the PyTorch DataLoader

The `PhysicalAIDataset` class exposes PhysicalAI-AV clips as a PyTorch dataset: each item is a dict holding the camera frames and ego-motion tensors for one clip.

In [3]:
# First, let's fetch some available clip IDs from the dataset
import physical_ai_av

avdi = physical_ai_av.PhysicalAIAVDatasetInterface()

# Get all available clip IDs
all_clip_ids = avdi.clip_index.index.tolist()
print(f"Total clips available: {len(all_clip_ids)}")

# Use first 2 for demo
sample_clip_ids = all_clip_ids[:2]
print(f"Using clips: {sample_clip_ids}")

Total clips available: 227985
Using clips: ['25cd4769-5dcf-4b53-a351-bf2c5deb6124', '2edf278f-d5e3-4b83-b5df-923a04335725']


In [4]:
from alpamayo_tools import PhysicalAIDataset, DatasetConfig

# Configure the dataset
config = DatasetConfig(
    clip_ids=sample_clip_ids,
    cameras=("camera_front_wide_120fov", "camera_front_tele_30fov"),
    num_frames=4,
    num_history_steps=16,  # 1.6s history @ 10Hz
    num_future_steps=64,   # 6.4s future @ 10Hz
    stream=True,  # Stream from HuggingFace
)

# Create dataset
dataset = PhysicalAIDataset(config)
print(f"Dataset size: {len(dataset)} clips")

Dataset size: 2 clips
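
The step counts in the config map to time spans through the sampling rate. As a quick sanity check, the sketch below redoes the arithmetic from the config comments. It assumes evenly spaced samples at 10 Hz, as those comments state; `steps_to_seconds` is a helper made up for this illustration and is not part of alpamayo-tools.

```python
# Assumption: trajectory samples are evenly spaced at 10 Hz (taken from the config comments).
SAMPLE_RATE_HZ = 10


def steps_to_seconds(num_steps: int, rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Duration in seconds covered by `num_steps` evenly spaced samples."""
    return num_steps / rate_hz


history_s = steps_to_seconds(16)  # num_history_steps
future_s = steps_to_seconds(64)   # num_future_steps
print(f"history: {history_s:.1f} s, future: {future_s:.1f} s")
```

If the dataset were sampled at a different rate, the same step counts would cover a different time span, so it is worth checking the rate before changing `num_history_steps` or `num_future_steps`.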


In [5]:
# Get a single sample
sample = dataset[0]

print("Sample keys:", list(sample.keys()))
print(f"Frames shape: {sample['frames'].shape}")  # (N_cameras, num_frames, 3, H, W)
print(f"History positions shape: {sample['ego_history_xyz'].shape}")  # (16, 3)
print(f"Future positions shape: {sample['ego_future_xyz'].shape}")  # (64, 3)

Sample keys: ['clip_id', 't0_us', 'frames', 'camera_indices', 'ego_history_xyz', 'ego_history_rot', 'ego_future_xyz', 'ego_future_rot', 'frame_timestamps']
Frames shape: torch.Size([2, 4, 3, 1080, 1920])
History positions shape: torch.Size([16, 3])
Future positions shape: torch.Size([64, 3])


In [6]:
# Use with DataLoader
from torch.utils.data import DataLoader
from alpamayo_tools import collate_fn

loader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True,
    num_workers=0,  # Set > 0 for parallel loading
    collate_fn=collate_fn,
)

for batch in loader:
    print(f"Batch frames shape: {batch['frames'].shape}")
    print(f"Batch history shape: {batch['ego_history_xyz'].shape}")
    break

Batch frames shape: torch.Size([2, 2, 4, 3, 1080, 1920])
Batch history shape: torch.Size([2, 16, 3])
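
The batch shapes above are consistent with each per-sample tensor being stacked along a new leading batch dimension. The sketch below reproduces that behaviour with NumPy and deliberately tiny images. `stack_samples` is a hypothetical stand-in written for this illustration; it is not the library's `collate_fn`, whose handling of non-tensor fields such as `clip_id` is an assumption here.

```python
import numpy as np


def stack_samples(samples):
    """Hypothetical collate: stack arrays along a new batch axis, keep other values as lists."""
    batch = {}
    for key in samples[0]:
        values = [sample[key] for sample in samples]
        if isinstance(values[0], np.ndarray):
            batch[key] = np.stack(values, axis=0)  # new leading batch dimension
        else:
            batch[key] = values  # e.g. clip_id strings stay a plain list
    return batch


# Two fake samples: 2 cameras, 4 frames, 3 channels, 8x16 pixels (downsized for the demo)
samples = [
    {
        "clip_id": f"clip-{i}",
        "frames": np.zeros((2, 4, 3, 8, 16), dtype=np.uint8),
        "ego_history_xyz": np.zeros((16, 3), dtype=np.float32),
    }
    for i in range(2)
]

batch = stack_samples(samples)
print(batch["frames"].shape)           # (2, 2, 4, 3, 8, 16)
print(batch["ego_history_xyz"].shape)  # (2, 16, 3)
```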


### Loading from a file

Instead of passing `clip_ids` directly, you can point `clip_ids_file` at a Parquet or text file:

In [7]:
# From parquet file
# config = DatasetConfig(
#     clip_ids_file="path/to/clip_ids.parquet",
#     cameras=("camera_front_wide_120fov",),
# )

# From text file (one clip ID per line)
# config = DatasetConfig(
#     clip_ids_file="path/to/clip_ids.txt",
# )
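
To try the text-file route, you first need a file with one clip ID per line. The sketch below writes such a file to a temporary directory and reads it back using only the standard library. How alpamayo-tools itself parses the file (blank lines, comments, encoding) is not covered here, so the read-back step is only an illustration of the format.

```python
import tempfile
from pathlib import Path

# The two clip IDs used earlier in this notebook
clip_ids = [
    "25cd4769-5dcf-4b53-a351-bf2c5deb6124",
    "2edf278f-d5e3-4b83-b5df-923a04335725",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "clip_ids.txt"

    # One clip ID per line, with a trailing newline
    path.write_text("\n".join(clip_ids) + "\n", encoding="utf-8")

    # Read back, skipping any blank lines
    lines = path.read_text(encoding="utf-8").splitlines()
    loaded = [line.strip() for line in lines if line.strip()]

print(loaded == clip_ids)
```

In real use, write the file to a persistent location and pass that path as `clip_ids_file`.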

## 2. CoC Embeddings

The `CoCEmbedder` class embeds Chain of Causation (CoC) reasoning text into fixed-size vectors.

In [8]:
from alpamayo_tools import CoCEmbedder

# Initialize embedder (uses all-MiniLM-L6-v2 by default)
embedder = CoCEmbedder()

print(f"Embedding dimension: {embedder.embedding_dim}")
print(f"Device: {embedder.device}")

Embedding dimension: 384
Device: mps:0


In [9]:
# Embed reasoning texts
texts = [
    "The vehicle ahead is braking. Reduce speed to maintain safe following distance.",
    "Clear road ahead with no obstacles. Continue at current speed.",
    "Pedestrian crossing detected on the right. Slow down and prepare to stop.",
]

embeddings = embedder.embed(texts)
print(f"Embeddings shape: {embeddings.shape}")  # (3, 384)

Embeddings shape: (3, 384)


In [10]:
# Compute similarity between texts
# Cosine similarity: embeddings are L2-normalized by default, so a dot product suffices
similarity_matrix = embeddings @ embeddings.T
print("Similarity matrix:")
print(similarity_matrix)

Similarity matrix:
[[1.        0.6757734 0.5420312]
 [0.6757734 1.0000001 0.5415089]
 [0.5420312 0.5415089 1.0000001]]
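
The plain matrix product works as cosine similarity only because the rows have unit length. The sketch below checks that equivalence on synthetic vectors (random data standing in for real embeddings): normalizing first and taking dot products gives the same matrix as the full cosine formula.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 384))  # synthetic stand-in for three 384-dim embeddings

# Route 1: L2-normalize each row, then take dot products
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
sim_dot = unit @ unit.T

# Route 2: full cosine formula on the unnormalized vectors
norms = np.linalg.norm(raw, axis=1)
sim_cos = (raw @ raw.T) / np.outer(norms, norms)

print(np.allclose(sim_dot, sim_cos))       # both routes agree
print(np.allclose(np.diag(sim_dot), 1.0))  # each vector matches itself
```

Diagonal values such as `1.0000001` in the output above are floating-point rounding, not a sign that the embeddings are unnormalized.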


## 3. Alpamayo Inference (Requires GPU)

The `AlpamayoPredictor` class wraps Alpamayo-R1 inference: it loads the model and returns a predicted trajectory together with the model's reasoning text.

**Requirements:**
- GPU with at least 24 GB of VRAM
- `alpamayo_r1` package installed from GitHub

In [11]:
# Note: This requires alpamayo_r1 and a GPU
# pip install git+https://github.com/NVlabs/alpamayo.git

from alpamayo_tools.inference import AlpamayoPredictor
import torch

# Load the model (skip this cell if you don't have a GPU)
if torch.cuda.is_available():
    predictor = AlpamayoPredictor.from_pretrained(
        model_id="nvidia/Alpamayo-R1-10B",
        dtype=torch.bfloat16,
    )
    print("Model loaded!")
else:
    print("Skipping: CUDA not available")

Skipping: CUDA not available


In [12]:
# Run inference on a clip (skip if no GPU)
if torch.cuda.is_available():
    result = predictor.predict_from_clip(
        clip_id=sample_clip_ids[0],
        t0_us=5_100_000,  # 5.1 seconds into clip
        num_samples=1,
    )

    print(f"Trajectory shape: {result.trajectory_xyz.shape}")  # (64, 3)
    print(f"\nReasoning:\n{result.reasoning_text}")

In [13]:
# Or run inference on a sample from the dataset created above
if torch.cuda.is_available():
    sample = dataset[0]
    result = predictor.predict_from_dataset_sample(sample)

    print(f"Predicted trajectory: {result.trajectory_xyz.shape}")
    print(f"Ground truth trajectory: {sample['ego_future_xyz'].shape}")
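
Once the predicted and ground-truth trajectories have the same shape, a common way to compare them is the average and final displacement error (ADE and FDE). The sketch below computes both with NumPy on synthetic trajectories. `displacement_errors` is a helper written for this illustration, not an alpamayo-tools function, and it assumes both trajectories are `(T, 3)` arrays expressed in the same coordinate frame.

```python
import numpy as np


def displacement_errors(pred_xyz, gt_xyz):
    """Return (ADE, FDE): mean and final per-step Euclidean distance between two (T, 3) arrays."""
    dists = np.linalg.norm(pred_xyz - gt_xyz, axis=-1)  # shape (T,)
    return float(dists.mean()), float(dists[-1])


# Synthetic data: ground truth advances 1 m per step along x,
# and the prediction is offset by a constant 0.5 m in y.
steps = np.arange(64, dtype=np.float64)
gt = np.stack([steps, np.zeros(64), np.zeros(64)], axis=1)
pred = gt + np.array([0.0, 0.5, 0.0])

ade, fde = displacement_errors(pred, gt)
print(f"ADE: {ade:.2f} m, FDE: {fde:.2f} m")
```

With real data, the tensors would need converting to NumPy arrays first (for example `sample['ego_future_xyz'].numpy()` for a CPU tensor), and it is worth confirming that prediction and ground truth share a reference frame before reading anything into the numbers.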

## 4. CLI: Generate Teacher Labels

The package includes a CLI tool for generating teacher labels at scale:

```bash
# Basic usage
alpamayo-generate-labels \
    --clip-ids-file train_clips.parquet \
    --output-dir ./labels

# With sharding for multi-GPU
# GPU 0:
CUDA_VISIBLE_DEVICES=0 alpamayo-generate-labels \
    --clip-ids-file train_clips.parquet \
    --output-dir ./labels \
    --shard 0/2

# GPU 1:
CUDA_VISIBLE_DEVICES=1 alpamayo-generate-labels \
    --clip-ids-file train_clips.parquet \
    --output-dir ./labels \
    --shard 1/2

# Resume from checkpoint
alpamayo-generate-labels \
    --clip-ids-file train_clips.parquet \
    --output-dir ./labels \
    --resume
```
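
The `--shard i/n` flags split the clip list across processes so that each GPU handles a disjoint subset. How the CLI assigns clips to shards is not shown here; the sketch below assumes a simple round-robin split, and `parse_shard` and `select_shard` are hypothetical helpers written only to illustrate the idea.

```python
def parse_shard(spec: str) -> tuple[int, int]:
    """Parse a spec like '0/2' into (shard_index, shard_count)."""
    index_str, count_str = spec.split("/")
    index, count = int(index_str), int(count_str)
    if not 0 <= index < count:
        raise ValueError(f"shard index must satisfy 0 <= index < count, got {spec!r}")
    return index, count


def select_shard(items, spec):
    """Assumed round-robin policy: shard i takes items i, i + n, i + 2n, ..."""
    index, count = parse_shard(spec)
    return items[index::count]


clip_ids = [f"clip-{i}" for i in range(5)]
shard0 = select_shard(clip_ids, "0/2")
shard1 = select_shard(clip_ids, "1/2")
print(shard0)  # ['clip-0', 'clip-2', 'clip-4']
print(shard1)  # ['clip-1', 'clip-3']
```

Whatever policy the CLI uses, the property to rely on is the same: the shards are disjoint and together cover every clip.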

## 5. Working with Output Labels

The generated labels are saved as compressed NPZ files:

In [14]:
# Example: Load generated labels (requires running the CLI first)
# import numpy as np
#
# data = np.load("./labels/teacher_labels.npz", allow_pickle=True)
#
# print("Available arrays:", list(data.keys()))
# print(f"Number of samples: {len(data['clip_ids'])}")
# print(f"Trajectory shape: {data['trajectory_xyz'].shape}")  # (N, 64, 3)
# print(f"Rotation shape: {data['trajectory_rot'].shape}")  # (N, 64, 3, 3)
# print(f"CoC embedding shape: {data['coc_embeddings'].shape}")  # (N, 384)
#
# # Access individual samples
# idx = 0
# print(f"Clip ID: {data['clip_ids'][idx]}")
# print(f"Reasoning: {data['coc_texts'][idx]}")
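
If you have not run the CLI yet, you can still try the loading code on a stand-in file. The sketch below writes a compressed NPZ with the same array names and shapes as the commented example, filled with synthetic values, and reads it back. The real files may store strings differently (the example above passes `allow_pickle=True`, which fixed-width string arrays like these do not need), so this only illustrates the layout.

```python
import tempfile
from pathlib import Path

import numpy as np

n = 2  # number of synthetic samples

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "teacher_labels.npz"

    # Synthetic stand-in data using the array names from the example above
    np.savez_compressed(
        path,
        clip_ids=np.array(["clip-0", "clip-1"]),
        trajectory_xyz=np.zeros((n, 64, 3), dtype=np.float32),
        trajectory_rot=np.tile(np.eye(3, dtype=np.float32), (n, 64, 1, 1)),
        coc_embeddings=np.zeros((n, 384), dtype=np.float32),
        coc_texts=np.array(["slow down", "keep lane"]),
    )

    # Read everything needed while the file is still open
    with np.load(path) as data:
        names = sorted(data.files)
        shapes = {name: data[name].shape for name in names}

for name in names:
    print(f"{name}: {shapes[name]}")
```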

## Cleanup

In [15]:
# Always close the dataset to release video reader resources
dataset.close()