# Waymo Open Motion Dataset - Trajectory Prediction with PyTorch Geometric

This notebook demonstrates:
- Loading Waymo Open Motion Dataset scenarios from TFRecord files
- Converting scenarios to PyTorch Geometric graphs
- Training a GCN model for trajectory prediction
- Using Weights & Biases for experiment tracking

**Note:** This notebook processes one graph at a time (no mini-batching) to stay compatible with temporal GNN architectures such as EvolveGCN-H.

In [1]:
!pip install torch torch-geometric torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-2.4.0+cu118.html -q
!pip install wandb tensorflow protobuf==3.20.3 -q
print("✓ Packages installed")

✓ Packages installed


In [2]:
import os
import sys
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data
import wandb
import tensorflow as tf
import numpy as np

# Add Waymo module path
src_path = os.path.abspath(os.path.join(os.getcwd(), 'src'))
if src_path not in sys.path:
    sys.path.insert(0, src_path)

from waymo_open_dataset.protos import scenario_pb2

print(f"PyTorch version: {torch.__version__}")
print(f"TensorFlow version: {tf.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")




PyTorch version: 2.4.0+cu118
TensorFlow version: 2.15.0
CUDA available: False


In [3]:
# This will prompt you for your W&B API key.
# You can also set the WANDB_API_KEY environment variable.
import wandb
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mradovicevic-erik1[0m ([33mradovicevic-erik1-[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## Download Waymo Open Motion Dataset

You can download scenario files from: https://console.cloud.google.com/storage/browser/waymo_open_dataset_motion_v_1_3_0/uncompressed/scenario

**Option 1: Using gsutil (recommended)**
```bash
# Install gcloud SDK first, then:
gsutil -m cp gs://waymo_open_dataset_motion_v_1_3_0/uncompressed/scenario/training/uncompressed_scenario_training_training.tfrecord-00000-of-01000 ./data/scenario/training/
```

**Option 2: Manual download**
- Navigate to the GCS browser link above
- Download files to `./data/scenario/training/` directory

This example assumes the scenario files are already present under the `data` directory.
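
Shard files in the bucket carry a `-NNNNN-of-NNNNN` suffix rather than ending in `.tfrecord`, so a plain suffix check can miss them. Below is a minimal sketch of a name filter, assuming the five-digit shard suffix seen in the file names in this notebook. `is_tfrecord_shard` is a hypothetical helper, not part of the Waymo tooling.

```python
import re

# Matches "name.tfrecord" and sharded names such as "name.tfrecord-00000-of-01000".
# The five-digit suffix is an assumption based on the file names shown in this notebook.
_SHARD_RE = re.compile(r"\.tfrecord(-\d{5}-of-\d{5})?$")

def is_tfrecord_shard(filename: str) -> bool:
    """Return True if the file name looks like a TFRecord file or shard."""
    return _SHARD_RE.search(filename) is not None

names = [
    "uncompressed_scenario_training_training.tfrecord-00000-of-01000",
    "uncompressed_scenario_testing_testing.tfrecord-00000-of-00150",
    "notes.txt",
]
shards = [n for n in names if is_tfrecord_shard(n)]
print(shards)
```

Only the two shard names pass the filter; `notes.txt` is dropped.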

In [4]:
# Helper functions to convert Waymo scenarios to PyTorch Geometric graphs

def parse_scenario_file(file_path):
    """Parse a Waymo TFRecord file and return list of scenarios."""
    dataset = tf.data.TFRecordDataset(file_path, compression_type='')
    scenarios = []
    for raw_record in dataset:
        try:
            scenario = scenario_pb2.Scenario.FromString(raw_record.numpy())
            scenarios.append(scenario)
        except Exception as e:
            print(f"Error parsing scenario: {e}")
            break
    return scenarios

def initial_feature_vector(agent, state_index):
    """Create feature vector for an agent at a specific timestep."""
    state = agent.states[state_index]
    
    # Basic features: position, velocity, valid flag
    properties = [
        state.center_x, 
        state.center_y, 
        state.velocity_x, 
        state.velocity_y, 
        float(state.valid)
    ]
    
    # One-hot encoding for object type (1=Vehicle, 2=Pedestrian, 3=Cyclist, 4=Other)
    type_onehot = [1 if agent.object_type == t else 0 for t in (1, 2, 3, 4)]
    
    return torch.tensor(properties + type_onehot, dtype=torch.float32)

def build_edge_index_radius(positions, radius=30.0, valid_mask=None):
    """Build graph edges based on spatial proximity."""
    pairwise_distances = torch.cdist(positions, positions)
    
    if valid_mask is not None:
        vm = torch.as_tensor(valid_mask, dtype=torch.bool)
        valid_pair = vm[:, None] & vm[None, :]
        pairwise_distances = pairwise_distances.clone()
        pairwise_distances[~valid_pair] = float('inf')
    
    # Remove self-loops
    pairwise_distances.fill_diagonal_(float('inf'))
    
    # Create edges for agents within radius
    edges_mask = pairwise_distances <= radius
    src, dst = torch.where(edges_mask)
    edge_index = torch.stack([src, dst], dim=0)
    
    return edge_index

def scenario_to_graph(scenario, timestep, radius=30.0, future_steps=1):
    """Convert a Waymo scenario at a specific timestep to PyG Data."""
    node_features = []
    positions = []
    agent_ids = []
    valid_mask = []
    
    # Extract features for all valid agents at this timestep
    for agent in scenario.tracks:
        if timestep >= len(agent.states):
            continue
            
        state = agent.states[timestep]
        if not state.valid:
            continue
        
        node_features.append(initial_feature_vector(agent, timestep))
        positions.append([state.center_x, state.center_y])
        agent_ids.append(agent.id)
        valid_mask.append(1)
    
    if len(node_features) == 0:
        return None
    
    # Stack features and positions
    x = torch.stack(node_features)
    pos = torch.tensor(positions, dtype=torch.float32)
    
    # Build edges based on proximity
    edge_index = build_edge_index_radius(pos, radius, valid_mask)
    
    # Create labels: future positions (offsets from current position)
    labels = []
    id_to_agent = {t.id: t for t in scenario.tracks}
    for i, agent_id in enumerate(agent_ids):
        agent = id_to_agent[agent_id]
        future_pos = []
        # Start from the current (valid) position so padding never reads an invalid state
        last_valid = positions[i]
        for t in range(1, future_steps + 1):
            future_t = timestep + t
            if future_t < len(agent.states) and agent.states[future_t].valid:
                last_valid = [
                    agent.states[future_t].center_x,
                    agent.states[future_t].center_y
                ]
            # Missing or invalid steps are padded with the last valid position
            future_pos.append(list(last_valid))
        
        # Convert to offsets from current position
        current_pos = torch.tensor(positions[i], dtype=torch.float32)
        future_tensor = torch.tensor(future_pos, dtype=torch.float32)
        offsets = future_tensor - current_pos
        labels.append(offsets.flatten())
    
    y = torch.stack(labels)
    
    # Create PyG Data object
    data = Data(x=x, edge_index=edge_index, pos=pos, y=y)
    data.agent_ids = agent_ids
    data.scenario_id = scenario.scenario_id
    
    return data

print("✓ Helper functions defined")

✓ Helper functions defined
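
To make the edge rule concrete, the sketch below rebuilds a radius graph for three hand-picked points in plain Python. It mirrors the logic of `build_edge_index_radius` (distance at most `radius`, no self-loops), but it is an illustration with made-up coordinates, not the function used in this notebook. `radius_edges` is a hypothetical name.

```python
import math

def radius_edges(positions, radius=30.0):
    """Return directed (src, dst) pairs for points within `radius`, excluding self-loops."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) <= radius:
                edges.append((i, j))
    return edges

# Made-up positions in meters: agents 0 and 1 are 10 m apart, agent 2 is far away.
points = [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]
edges = radius_edges(points)
print(edges)  # [(0, 1), (1, 0)]
```

Each close pair appears once per direction, so the resulting edge list is symmetric, and the distant agent ends up with no edges at all.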


In [5]:
# Define hyperparameters for Waymo dataset
config = {
    "learning_rate": 0.001,
    "epochs": 50,
    "hidden_channels": 64,
    "dropout": 0.3,
    "dataset": "Waymo Open Motion Dataset",
    "architecture": "GCN",
    "radius": 30.0,  # meters - spatial proximity for edges
    "future_steps": 8,  # predict 8 timesteps (0.8 s at 10 Hz) into the future
    "timestep": 10,  # index of the timestep used as the current observation
    "batch_size": 32,  # not used below: graphs are processed one at a time
    "num_scenarios": 10  # number of scenarios to load for this demo
}

print(f"Configuration: {config}")

Configuration: {'learning_rate': 0.001, 'epochs': 50, 'hidden_channels': 64, 'dropout': 0.3, 'dataset': 'Waymo Open Motion Dataset', 'architecture': 'GCN', 'radius': 30.0, 'future_steps': 8, 'timestep': 10, 'batch_size': 32, 'num_scenarios': 10}


In [6]:
# Load Waymo Open Motion Dataset scenarios
data_dir = './data/scenario/training'

# Get list of TFRecord files
tfrecord_files = sorted(
    os.path.join(data_dir, f)
    for f in os.listdir(data_dir)
    # Shards are named like "...tfrecord-00000-of-01000", so match on the substring
    if '.tfrecord' in f
)

if not tfrecord_files:
    print("⚠ No TFRecord files found in ./data/scenario/training/")
    print("Please download files from:")
    print("https://console.cloud.google.com/storage/browser/waymo_open_dataset_motion_v_1_3_0/uncompressed/scenario/training")
else:
    print(f"Found {len(tfrecord_files)} TFRecord file(s)")
    
    # Load scenarios from first file
    print(f"\nLoading scenarios from: {os.path.basename(tfrecord_files[0])}")
    all_scenarios = parse_scenario_file(tfrecord_files[0])
    
    # Limit to configured number for this demo
    scenarios = all_scenarios[:config['num_scenarios']]
    
    print(f"\n{'='*60}")
    print(f"Dataset: {config['dataset']}")
    print(f"{'='*60}")
    print(f"Number of scenarios loaded: {len(scenarios)}")
    print(f"Total scenarios in file: {len(all_scenarios)}")
    
    # Analyze first scenario
    if scenarios:
        scenario = scenarios[0]
        print(f"\nFirst Scenario Analysis:")
        print(f"  Scenario ID: {scenario.scenario_id}")
        print(f"  SDC track index: {scenario.sdc_track_index}")
        print(f"  Number of agents/tracks: {len(scenario.tracks)}")
        print(f"  Number of timesteps: {len(scenario.timestamps_seconds)}")
        print(f"  Duration: {scenario.timestamps_seconds[-1] - scenario.timestamps_seconds[0]:.1f}s")
        print(f"  Number of map features: {len(scenario.map_features)}")
        
        # Count agents by type
        agent_types = {}
        type_names = {1: 'Vehicle', 2: 'Pedestrian', 3: 'Cyclist', 4: 'Other'}
        for track in scenario.tracks:
            type_name = type_names.get(track.object_type, 'Unknown')
            agent_types[type_name] = agent_types.get(type_name, 0) + 1
        
        print(f"\n  Agent types:")
        for agent_type, count in agent_types.items():
            print(f"    {agent_type}: {count}")
        
        # Convert to graph
        print(f"\nConverting to PyG graph at timestep {config['timestep']}...")
        graph_data = scenario_to_graph(
            scenario, 
            timestep=config['timestep'],
            radius=config['radius'],
            future_steps=config['future_steps']
        )
        
        if graph_data is not None:
            print(f"\nGraph structure:")
            print(f"  Number of nodes: {graph_data.num_nodes}")
            print(f"  Number of edges: {graph_data.num_edges}")
            print(f"  Node feature dim: {graph_data.x.shape[1]}")
            print(f"  Label dim (future trajectory): {graph_data.y.shape[1]}")
            print(f"  Average degree: {graph_data.num_edges / graph_data.num_nodes:.2f}")
            print(f"  Has isolated nodes: {graph_data.has_isolated_nodes()}")
            print(f"  Has self-loops: {graph_data.has_self_loops()}")
            print(f"  Is undirected: {graph_data.is_undirected()}")
        else:
            print("  ⚠ No valid graph at this timestep")
    
    print(f"{'='*60}")

Found 1 TFRecord file(s)

Loading scenarios from: uncompressed_scenario_testing_testing.tfrecord-00000-of-00150

Dataset: Waymo Open Motion Dataset
Number of scenarios loaded: 10
Total scenarios in file: 289

First Scenario Analysis:
  Scenario ID: 53efd22f9e0bd276
  SDC track index: 48
  Number of agents/tracks: 49
  Number of timesteps: 11
  Duration: 1.0s
  Number of map features: 175

  Agent types:
    Vehicle: 34
    Pedestrian: 15

Converting to PyG graph at timestep 10...

Graph structure:
  Number of nodes: 29
  Number of edges: 240
  Node feature dim: 9
  Label dim (future trajectory): 16
  Average degree: 8.28
  Has isolated nodes: False
  Has self-loops: False
  Is undirected: True
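
The scenario above reports 11 timesteps, so index 10 is its last state. A quick check of how many future steps actually exist for a given observation index can flag cases where the labels will consist entirely of padding. This is a sketch, and `available_future_steps` is a hypothetical helper.

```python
def available_future_steps(num_timesteps: int, timestep: int, future_steps: int) -> int:
    """Number of requested future steps that fall inside the recorded track."""
    last_index = num_timesteps - 1
    return max(0, min(future_steps, last_index - timestep))

# Values taken from the config and the first-scenario summary above.
print(available_future_steps(num_timesteps=11, timestep=10, future_steps=8))  # 0
# A hypothetical 91-step scenario observed at the same index has room for all 8 steps.
print(available_future_steps(num_timesteps=91, timestep=10, future_steps=8))  # 8
```

With zero available steps, every label row falls back to padding, so the target offsets come out as zero.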


In [7]:
# Define GCN model for trajectory prediction
class TrajectoryGCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_channels, output_dim, dropout=0.3):
        super().__init__()
        self.conv1 = GCNConv(input_dim, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.conv3 = GCNConv(hidden_channels, output_dim)
        self.dropout = dropout
        
    def forward(self, x, edge_index):
        # First GCN layer
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Second GCN layer
        x = self.conv2(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        
        # Output layer (trajectory prediction)
        x = self.conv3(x, edge_index)
        
        return x

# Initialize model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

if graph_data is not None:
    input_dim = graph_data.x.shape[1]
    output_dim = graph_data.y.shape[1]  # future_steps * 2 (x,y offsets)
    
    model = TrajectoryGCN(
        input_dim=input_dim,
        hidden_channels=config['hidden_channels'],
        output_dim=output_dim,
        dropout=config['dropout']
    ).to(device)
    
    print(f"\nModel architecture:")
    print(f"  Input dim: {input_dim}")
    print(f"  Hidden dim: {config['hidden_channels']}")
    print(f"  Output dim: {output_dim}")
    print(f"  Total parameters: {sum(p.numel() for p in model.parameters()):,}")
    print(model)

Using device: cpu

Model architecture:
  Input dim: 9
  Hidden dim: 64
  Output dim: 16
  Total parameters: 5,840
TrajectoryGCN(
  (conv1): GCNConv(9, 64)
  (conv2): GCNConv(64, 64)
  (conv3): GCNConv(64, 16)
)
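
The parameter total can be checked by hand. The sketch below assumes each `GCNConv(in, out)` layer holds an `in * out` weight matrix plus a bias vector of length `out`, which is the default configuration. `gcn_conv_params` is a hypothetical helper.

```python
def gcn_conv_params(in_channels: int, out_channels: int) -> int:
    """Weights plus bias for one GCNConv layer (assumes the default bias=True)."""
    return in_channels * out_channels + out_channels

layers = [(9, 64), (64, 64), (64, 16)]  # layer sizes from the printed model
total = sum(gcn_conv_params(i, o) for i, o in layers)
print(total)  # 5840
```

The three layers contribute 640, 4,160, and 1,040 parameters respectively.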


In [8]:
# Initialize W&B run
wandb.init(
    project="waymo-trajectory-prediction",
    config=config,
    name=f"GCN_r{config['radius']}_h{config['hidden_channels']}"
)

# Log model architecture
wandb.watch(model, log='all', log_freq=10)

print("✓ W&B initialized")

✓ W&B initialized


In [9]:
# Prepare training data - convert all scenarios to graphs
print("Preparing training graphs...")
train_graphs = []

for i, scenario in enumerate(scenarios):
    graph = scenario_to_graph(
        scenario,
        timestep=config['timestep'],
        radius=config['radius'],
        future_steps=config['future_steps']
    )
    if graph is not None:
        train_graphs.append(graph)
    
    if (i + 1) % 5 == 0:
        print(f"  Processed {i + 1}/{len(scenarios)} scenarios")

print(f"\n✓ Created {len(train_graphs)} training graphs")
print(f"  Average nodes per graph: {sum(g.num_nodes for g in train_graphs) / len(train_graphs):.1f}")
print(f"  Average edges per graph: {sum(g.num_edges for g in train_graphs) / len(train_graphs):.1f}")

Preparing training graphs...
  Processed 5/10 scenarios
  Processed 10/10 scenarios

✓ Created 10 training graphs
  Average nodes per graph: 32.2
  Average edges per graph: 301.4
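
All ten graphs are used for training here. The sketch below shows how a few could be held out for validation using only the standard library. The 80/20 ratio and the seed are arbitrary choices for illustration, and `split_graphs` is a hypothetical helper.

```python
import random

def split_graphs(graphs, val_fraction=0.2, seed=0):
    """Shuffle a copy of `graphs` and split it into (train, val) lists."""
    shuffled = list(graphs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Stand-in items; in the notebook this would be the list of PyG Data objects.
train_part, val_part = split_graphs(list(range(10)))
print(len(train_part), len(val_part))  # 8 2
```

Using a seeded `random.Random` instance keeps the split reproducible without touching the global random state.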


In [10]:
# Training loop
optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'])
criterion = torch.nn.MSELoss()

print(f"\nStarting training for {config['epochs']} epochs...")
print(f"{'='*60}")

for epoch in range(config['epochs']):
    model.train()
    total_loss = 0
    
    # Train on each graph
    for graph in train_graphs:
        graph = graph.to(device)
        
        optimizer.zero_grad()
        
        # Forward pass
        out = model(graph.x, graph.edge_index)
        
        # Compute loss (MSE on trajectory predictions)
        loss = criterion(out, graph.y)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    # Calculate average loss
    avg_loss = total_loss / len(train_graphs)
    
    # Log to W&B
    wandb.log({
        "epoch": epoch,
        "train_loss": avg_loss,
        "avg_loss_per_graph": avg_loss
    })
    
    # Print progress
    if (epoch + 1) % 5 == 0 or epoch == 0:
        print(f"Epoch {epoch+1:3d}/{config['epochs']} | Loss: {avg_loss:.6f}")

print(f"{'='*60}")
print("✓ Training complete!")


Starting training for 50 epochs...
Epoch   1/50 | Loss: 507660.445410
Epoch   5/50 | Loss: 61600.854626
Epoch  10/50 | Loss: 41588.555597
Epoch  15/50 | Loss: 32595.117230
Epoch  20/50 | Loss: 28589.013644
Epoch  25/50 | Loss: 22326.270651
Epoch  30/50 | Loss: 15802.430869
Epoch  35/50 | Loss: 11541.796442
Epoch  40/50 | Loss: 11237.283972
Epoch  45/50 | Loss: 7226.145450
Epoch  50/50 | Loss: 7445.612439
✓ Training complete!
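
Because the loss is a mean squared error over position offsets, its square root gives a rough per-coordinate error in the same units as the coordinates, which is easier to read. The sketch below uses the final value from the log above and assumes the coordinates are in meters.

```python
import math

final_mse = 7445.612439  # last epoch's training loss from the log above
rmse = math.sqrt(final_mse)
print(f"RMSE: {rmse:.2f} m")
```

That works out to roughly 86 m per coordinate, which is far too large for a 0.8 s horizon.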


In [11]:
# Evaluate the model on the first training graph (this demo has no held-out set)
model.eval()

with torch.no_grad():
    # Use the first training graph as the sample
    sample_graph = train_graphs[0].to(device)
    
    # Predict trajectories
    predictions = model(sample_graph.x, sample_graph.edge_index)
    
    # Compute metrics
    mse = criterion(predictions, sample_graph.y)
    mae = torch.nn.L1Loss()(predictions, sample_graph.y)
    
    print(f"\nEvaluation on sample graph:")
    print(f"  Scenario ID: {sample_graph.scenario_id}")
    print(f"  Number of agents: {sample_graph.num_nodes}")
    print(f"  MSE: {mse.item():.6f}")
    print(f"  MAE: {mae.item():.6f}")
    
    # Log final metrics to W&B
    wandb.log({
        "final_mse": mse.item(),
        "final_mae": mae.item(),
        "num_graphs": len(train_graphs)
    })
    
    # Show sample predictions
    print(f"\nSample predictions (first 3 agents):")
    for i in range(min(3, sample_graph.num_nodes)):
        pred = predictions[i].cpu().numpy()
        true = sample_graph.y[i].cpu().numpy()
        
        # Reshape to (future_steps, 2)
        pred_traj = pred.reshape(-1, 2)
        true_traj = true.reshape(-1, 2)
        
        print(f"\n  Agent {i} (ID: {sample_graph.agent_ids[i]}):")
        print(f"    Predicted final position offset: ({pred_traj[-1, 0]:.2f}, {pred_traj[-1, 1]:.2f})")
        print(f"    True final position offset:      ({true_traj[-1, 0]:.2f}, {true_traj[-1, 1]:.2f})")
        print(f"    Error: {np.linalg.norm(pred_traj[-1] - true_traj[-1]):.2f} meters")

print("\n✓ Evaluation complete")


Evaluation on sample graph:
  Scenario ID: 53efd22f9e0bd276
  Number of agents: 29
  MSE: 2190.499512
  MAE: 37.408176

Sample predictions (first 3 agents):

  Agent 0 (ID: 259):
    Predicted final position offset: (76.16, 34.50)
    True final position offset:      (0.00, 0.00)
    Error: 83.61 meters

  Agent 1 (ID: 260):
    Predicted final position offset: (88.52, 40.13)
    True final position offset:      (0.00, 0.00)
    Error: 97.19 meters

  Agent 2 (ID: 261):
    Predicted final position offset: (72.68, 32.69)
    True final position offset:      (0.00, 0.00)
    Error: 79.69 meters

✓ Evaluation complete
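
The per-agent error printed above is the distance at the final predicted step. Trajectory prediction work commonly reports this as the final displacement error (FDE), alongside the average displacement error (ADE) over all steps. Below is a plain-Python sketch of both for a single agent, using made-up trajectories. `ade_fde` is a hypothetical helper.

```python
import math

def ade_fde(pred, true):
    """Average and final displacement error for two equal-length lists of (x, y) points."""
    dists = [math.dist(p, t) for p, t in zip(pred, true)]
    return sum(dists) / len(dists), dists[-1]

# Made-up two-step trajectories (offsets in meters).
pred = [(3.0, 4.0), (6.0, 8.0)]
true = [(0.0, 0.0), (0.0, 0.0)]
ade, fde = ade_fde(pred, true)
print(ade, fde)  # 7.5 10.0
```

In the notebook, the same computation would apply to each row of `predictions` and `sample_graph.y` after reshaping to `(future_steps, 2)`.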


In [12]:
# Finish W&B run
wandb.finish()

print("✓ W&B run finished")
print("\nView your results at: https://wandb.ai")

[34m[1mwandb[0m: [32m[41mERROR[0m The nbformat package was not found. It is required to save notebook history.


Run history:
avg_loss_per_graph: █▄▃▂▂▂▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
epoch: ▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
final_mae: ▁
final_mse: ▁
num_graphs: ▁
train_loss: █▄▃▂▂▂▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

Run summary:
avg_loss_per_graph: 7445.61244
epoch: 49.0
final_mae: 37.40818
final_mse: 2190.49951
num_graphs: 10.0
train_loss: 7445.61244


✓ W&B run finished

View your results at: https://wandb.ai
