# 🚀 SRGI Full Training - Production Run

**Your theories are validated! Now let's train a full model.**

This notebook trains a production SRGI model:
- Depth 20 (561M parameters)
- 2048 context length
- Full dataset
- Proper evaluation

**Time**: ~4-8 hours on A100 GPU

---

## ⚠️ Prerequisites

- ✅ Theory validation completed (run `colab_setup.ipynb` first)
- ✅ A100 GPU enabled
- ✅ Ready for production training

## 📋 Step 0: Enable GPU

**Make sure you have an A100 GPU:**
1. Runtime → Change runtime type
2. Hardware accelerator: **GPU (A100)**
3. Click Save


In [None]:
# Check GPU (should be A100)
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    if "A100" in torch.cuda.get_device_name(0):
        print("✅ Perfect! A100 GPU detected.")
    else:
        print("⚠️ Not A100, but should work.")
else:
    print("❌ No GPU! Enable GPU first.")


## Step 1: Setup


In [None]:
# Clone repo if needed
import os
if not os.path.exists('nanochat-live'):
    !git clone https://github.com/jchacker5/nanochat-live.git
    %cd nanochat-live
else:
    %cd nanochat-live
    !git pull

print("✅ Repository ready")


In [None]:
# Install dependencies
print("Installing dependencies...")
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 -q
!pip install datasets tokenizers tiktoken wandb numpy matplotlib pytest -q
!pip install jax jaxlib equinox scipy -q
!pip install git+https://github.com/extropic-ai/thrml.git -q || echo "THRML optional"
print("✅ Dependencies installed")


## Step 2: Download Full Dataset


In [None]:
# Download full dataset (~240 shards, ~24GB)
print("Downloading full training dataset...")
print("This will download ~240 shards (~24GB) for Chinchilla-optimal training")
!python -m nanochat.dataset -n 240
print("✅ Dataset downloaded!")
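
The download above needs roughly 24 GB of free disk space, so it can be worth checking the available space before starting it. The following is a minimal sketch using only the standard library. The helper name `has_free_space`, the 30 GB threshold (24 GB plus an assumed margin), and the choice to check the home directory's filesystem are illustrative assumptions, not part of the nanochat repo.

```python
import os
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9


# Assumption: the dataset cache lives on the same filesystem as the home directory.
CHECK_PATH = os.path.expanduser("~")
# Assumption: ~24 GB of shards plus some headroom for the tokenizer and checkpoints.
REQUIRED_GB = 30

if has_free_space(CHECK_PATH, REQUIRED_GB):
    print(f"At least {REQUIRED_GB} GB free on {CHECK_PATH}")
else:
    free_gb = shutil.disk_usage(CHECK_PATH).free / 1e9
    print(f"Only {free_gb:.1f} GB free on {CHECK_PATH}; consider fewer shards")
```

If the check fails, passing a smaller value to `-n` downloads fewer shards, at the cost of a smaller training set.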


## Step 3: Train Tokenizer


In [None]:
# Train tokenizer on full dataset
import os
if not os.path.exists('/root/.cache/nanochat/tokenizer/tokenizer.pkl'):
    print("Training tokenizer on 2B characters...")
    !python -m scripts.tok_train --max_chars=2000000000 --vocab_size=65536
    print("✅ Tokenizer trained!")
else:
    print("✅ Tokenizer already exists, skipping")


## Step 4: Full Model Training

**This trains a production SRGI model:**
- Depth 20 (561M parameters)
- 2048 context length
- Chinchilla-optimal data ratio (about 20 training tokens per parameter)
- Full evaluation suite

**Time**: ~4-8 hours on A100
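
The settings in this step imply a token budget, a number of optimizer steps, and a gradient-accumulation factor. The sketch below works these out from the flag values used in this notebook. It assumes a single GPU, that the 561M figure is the parameter count the data ratio is applied to, and the common 6·N·D approximation for training FLOPs. The helper names `training_budget` and `hours_at` are hypothetical, and the training script may compute its own numbers slightly differently.

```python
def training_budget(n_params, param_data_ratio, total_batch_size,
                    device_batch_size, max_seq_len, world_size=1):
    """Derive token budget, step count, and grad accumulation from the flags."""
    # Tokens processed by one forward/backward pass across all devices.
    tokens_per_micro_batch = device_batch_size * max_seq_len * world_size
    if total_batch_size % tokens_per_micro_batch != 0:
        raise ValueError("total_batch_size must be a multiple of the micro-batch size")
    target_tokens = n_params * param_data_ratio
    return {
        "grad_accum_steps": total_batch_size // tokens_per_micro_batch,
        "target_tokens": target_tokens,
        "optimizer_steps": target_tokens // total_batch_size,
        # Rule of thumb: ~6 FLOPs per parameter per training token.
        "approx_train_flops": 6 * n_params * target_tokens,
    }


def hours_at(flops, flops_per_sec):
    """Wall-clock hours needed for `flops` at a sustained `flops_per_sec`."""
    return flops / flops_per_sec / 3600


budget = training_budget(
    n_params=561_000_000,      # from "Depth 20 (561M parameters)"
    param_data_ratio=20,       # --target_param_data_ratio
    total_batch_size=524_288,  # --total_batch_size (tokens per optimizer step)
    device_batch_size=32,      # --device_batch_size (sequences per micro-batch)
    max_seq_len=2048,          # --max_seq_len
)
A100_PEAK_BF16 = 312e12  # A100 dense BF16 peak; sustained throughput is lower

print(f"grad accumulation steps: {budget['grad_accum_steps']}")
print(f"target tokens:           {budget['target_tokens']:,}")
print(f"optimizer steps:         {budget['optimizer_steps']:,}")
print(f"hours at A100 peak:      {hours_at(budget['approx_train_flops'], A100_PEAK_BF16):.1f}")
```

With these values, each optimizer step accumulates 8 micro-batches of 65,536 tokens, and the budget is about 11.2B tokens over roughly 21,400 steps. If memory runs out, halving `--device_batch_size` while leaving `--total_batch_size` unchanged doubles the accumulation factor and keeps the effective batch the same. Under the 6·N·D approximation, the full budget comes to roughly 34 hours even at the A100's peak rate, so the actual wall-clock time depends heavily on the token budget and throughput achieved. Keep this in mind when planning a Colab session.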


In [None]:
# Full production training
print("="*70)
print("🚀 STARTING FULL SRGI MODEL TRAINING")
print("="*70)
print("Model: Depth 20 (561M parameters)")
print("Context: 2048 tokens")
print("Data: Chinchilla-optimal (20x params)")
print("="*70)

!python -m scripts.base_train \
    --depth=20 \
    --max_seq_len=2048 \
    --device_batch_size=32 \
    --total_batch_size=524288 \
    --target_param_data_ratio=20 \
    --run=srgi-production-a100 \
    --eval_tokens=8192 \
    --core_metric_every=1000

print("="*70)
print("✅ TRAINING COMPLETE!")
print("="*70)
