# VELOCITY-ASR v2 Collaboration Notebook

This notebook is a shared scratchpad for understanding the repo layout, downloading LibriSpeech, and running training. It mirrors the behavior in `scripts/download_librispeech.py`, `scripts/train.py`, and `velocity_asr/model.py`.

## Repo map (fast context)

- `scripts/download_librispeech.py`: download splits and optionally generate JSONL manifests
- `scripts/train.py`: training entrypoint that reads `configs/train.yaml` and `configs/model.yaml`
- `velocity_asr/model.py`: VELOCITY-ASR model definition
- `velocity_asr/data.py`: dataset + dataloaders (manifest or LibriSpeech direct)
- `configs/`: training and model hyperparameters

## Setup

Install dependencies and the package in editable mode:

```bash
pip install -r requirements.txt
pip install -e .
```

## Download LibriSpeech (option A: torchaudio download + manifests)

This uses `scripts/download_librispeech.py`. With `--create-manifests`, it also writes JSONL manifests that the training script can read.

In [None]:
!python scripts/download_librispeech.py --train 100 --create-manifests --data-dir ./data --manifest-dir ./manifests
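
Once the download finishes, it can help to look at a few manifest records before training. The snippet below is a minimal sketch: `peek_manifest` is a hypothetical helper (not part of the repo), and it assumes only that each manifest is a JSONL file with one JSON object per line. It makes no assumption about manifest file names or field names and simply lists whatever `*.jsonl` files exist under `./manifests`.

```python
import json
from pathlib import Path


def peek_manifest(path, n=3):
    """Return the first n records of a JSONL manifest (one JSON object per line)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records


# List whatever manifests the download step produced; file names are not assumed.
for manifest_path in sorted(Path("./manifests").glob("*.jsonl")):
    print(manifest_path.name, peek_manifest(manifest_path, n=1))
```

If the loop prints nothing, no manifests were found in `./manifests`, which usually means the download cell has not been run with `--create-manifests` yet.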

## Inspect training and model configs

These YAML files drive the training script and model instantiation.

In [None]:
import yaml

with open("configs/train.yaml", "r") as f:
    train_cfg = yaml.safe_load(f)

with open("configs/model.yaml", "r") as f:
    model_cfg = yaml.safe_load(f)

train_cfg, model_cfg

## Instantiate the model (from `velocity_asr/model.py`)

This matches the logic in `scripts/train.py`, where `VelocityASRConfig` is assembled from `configs/model.yaml`. Each `.get(...)` default below applies only when the key is missing from the YAML.

In [None]:
from velocity_asr import VELOCITYASR, VelocityASRConfig

cfg = VelocityASRConfig(
    mel_bins=model_cfg.get("input", {}).get("mel_bins", 80),
    d_model=model_cfg.get("model", {}).get("d_model", 192),
    ssm_layers=model_cfg.get("ssm", {}).get("num_layers", 8),
    ssm_state_dim=model_cfg.get("ssm", {}).get("state_dim", 64),
    ssm_expand_ratio=model_cfg.get("ssm", {}).get("expand_ratio", 2),
    ssm_kernel_size=model_cfg.get("ssm", {}).get("kernel_size", 4),
    global_ssm_layers=model_cfg.get("global_context", {}).get("ssm_layers", 2),
    global_ssm_state_dim=model_cfg.get("global_context", {}).get("ssm_state_dim", 32),
    attention_heads=model_cfg.get("global_context", {}).get("attention_heads", 4),
    attention_dim=model_cfg.get("global_context", {}).get("attention_dim", 48),
    vocab_size=model_cfg.get("model", {}).get("vocab_size", 1000),
    dropout=model_cfg.get("model", {}).get("dropout", 0.1),
    gradient_checkpointing=model_cfg.get("memory", {}).get("gradient_checkpointing", False),
    scan_mode=model_cfg.get("performance", {}).get("scan_mode", "parallel"),
    use_compile=model_cfg.get("performance", {}).get("use_compile", False),
)

model = VELOCITYASR(cfg)
print(f"Params: {model.count_parameters():,}")
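
The chained `.get(section, {}).get(key, default)` pattern above raises `AttributeError` if a section is present but empty in the YAML, because `yaml.safe_load` loads an empty section as `None`. The helper below is a sketch of a more tolerant lookup. `cfg_get` is a hypothetical name, not a function from the repo, and unlike the original pattern it also falls back to the default when a key is explicitly set to `null`.

```python
def cfg_get(cfg, section, key, default):
    """Look up cfg[section][key], using default if either level is missing or None."""
    section_cfg = (cfg or {}).get(section) or {}
    value = section_cfg.get(key)
    # Compare against None so that legitimate falsy values (False, 0) are kept.
    return default if value is None else value


# Illustrative config only; the real values come from configs/model.yaml.
example_cfg = {"ssm": {"num_layers": 12}, "memory": None}
print(cfg_get(example_cfg, "ssm", "num_layers", 8))                      # 12
print(cfg_get(example_cfg, "ssm", "state_dim", 64))                      # 64
print(cfg_get(example_cfg, "memory", "gradient_checkpointing", False))  # False
```

With such a helper, a field like `ssm_layers` could be written as `cfg_get(model_cfg, "ssm", "num_layers", 8)`.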

## Training (runs `scripts/train.py`)

The training script can load either JSONL manifests or LibriSpeech directly. Update `configs/train.yaml` to set `data.train_manifest`/`data.val_manifest`, or set `data.librispeech_root` for direct loading.

In [None]:
!python scripts/train.py --config configs/train.yaml --model-config configs/model.yaml
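
Before launching a long run, it may be worth checking which data source the config actually selects. The sketch below reads only the three keys named above (`data.train_manifest`, `data.val_manifest`, `data.librispeech_root`). `describe_data_source` is a hypothetical helper, and the precedence it applies (manifests first, then direct loading) is an assumption; `scripts/train.py` may resolve the choice differently.

```python
def describe_data_source(train_cfg):
    """Report which data source a loaded train config points at."""
    data_cfg = (train_cfg or {}).get("data") or {}
    train_manifest = data_cfg.get("train_manifest")
    val_manifest = data_cfg.get("val_manifest")
    librispeech_root = data_cfg.get("librispeech_root")

    # Assumed precedence: a complete manifest pair wins over direct loading.
    if train_manifest and val_manifest:
        return f"manifests: train={train_manifest}, val={val_manifest}"
    if librispeech_root:
        return f"LibriSpeech direct: root={librispeech_root}"
    raise ValueError(
        "Set data.train_manifest and data.val_manifest, or data.librispeech_root"
    )


# Illustrative config only; in the notebook, pass the loaded `train_cfg` instead.
example_train_cfg = {"data": {"librispeech_root": "./data"}}
print(describe_data_source(example_train_cfg))
```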

## Quick forward pass + greedy decode (dummy audio)

This only validates the model wiring. With untrained weights, the decoded output is meaningless.

In [None]:
import torch
from velocity_asr import create_default_vocabulary, CTCDecoder

# Fake mel spectrogram: (batch, frames, mel_bins)
mel = torch.randn(1, 300, cfg.mel_bins)

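# Switch to eval mode so dropout is disabled during the check
# (assumes VELOCITYASR is a torch.nn.Module).
model.eval()

with torch.no_grad():
    logits = model(mel)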

vocab = create_default_vocabulary(cfg.vocab_size)
decoder = CTCDecoder(vocab)
decoder.decode_greedy(logits)
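
For readers unfamiliar with greedy CTC decoding, the sketch below shows the standard post-processing step on per-frame token ids: merge consecutive repeats, then drop blanks. It illustrates the general technique and is not the repo's `CTCDecoder` implementation. `ctc_greedy_collapse` is a hypothetical name, and the blank index of 0 is an assumption.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then remove blanks (greedy CTC post-processing)."""
    # groupby yields one group per run of identical consecutive ids.
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


# A blank between two identical ids keeps them as separate tokens.
frames = [0, 3, 3, 0, 3, 5, 5, 0, 0]
print(ctc_greedy_collapse(frames))  # [3, 3, 5]
```

Assuming `logits` has shape `(batch, frames, vocab)`, the per-frame ids for the first utterance could be obtained with `logits.argmax(dim=-1)[0].tolist()`.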