# nanoGPT Colab companion (single-VM demo)

This notebook mirrors the nanoGPT training used in the Kubernetes project but runs inside a single Colab VM, which normally exposes at most one GPU. Use it to validate configs quickly before you run the Kubernetes jobs on your GPU server.

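What you will do:
- Install nanoGPT and a minimal set of dependencies
- Prepare the tiny Shakespeare dataset
- Run a quick CPU-only check
- If a GPU is available, launch training through `torchrun --standalone` to exercise the multi-process (DDP) path on a single VM. nanoGPT pins each rank to `cuda:<local_rank>`, so `--nproc_per_node=2` needs two visible GPUs; on a one-GPU Colab VM use `--nproc_per_node=1`.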

Note: This does not use Kubernetes, PV/PVC, or multi-pod networking. See the repo README for Kubernetes steps.



In [None]:
# Setup
!nvidia-smi -L || echo "No GPU detected"

import sys, subprocess

def pip_install(pkgs):
    print("Installing:", pkgs)
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-U"] + pkgs)

pip_install(["torch==2.3.1", "torchaudio==2.3.1", "torchvision==0.18.1", "--index-url", "https://download.pytorch.org/whl/cu118"])
pip_install(["tiktoken", "numpy", "tensorboard"])

!git clone https://github.com/karpathy/nanoGPT.git /content/nanogpt || true
%cd /content/nanogpt



In [None]:
# Prepare tiny Shakespeare dataset
%cd /content/nanogpt/data/shakespeare_char
!python prepare.py

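# Copy the prepared data to a known path that mirrors the Kubernetes dataset layout.
# Upstream train.py reads from data/<dataset> inside the repo, so this copy is for reference only.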
!mkdir -p /content/data/datasets/shakespeare_char
!cp -v train.bin val.bin meta.pkl /content/data/datasets/shakespeare_char/
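
To confirm the prepared files look sane before training, the sketch below reads the token files and the vocabulary. It assumes the layout that upstream `prepare.py` for `shakespeare_char` appears to produce: `train.bin` and `val.bin` are flat arrays of 16-bit unsigned token ids in native byte order, and `meta.pkl` is a dict with `vocab_size` and `itos`. The helper names `read_tokens` and `summarize_dataset` are hypothetical, and the standard-library `array` module stands in for the usual NumPy read.

```python
import os
import pickle
from array import array

def read_tokens(path, count=None):
    """Read uint16 token ids from a .bin file (all of them, or the first `count`)."""
    tokens = array("H")  # unsigned short: 2 bytes on common platforms
    with open(path, "rb") as f:
        data = f.read() if count is None else f.read(tokens.itemsize * count)
    tokens.frombytes(data)
    return tokens

def summarize_dataset(data_dir, preview_tokens=80):
    """Return basic facts about a char-level nanoGPT dataset directory."""
    with open(os.path.join(data_dir, "meta.pkl"), "rb") as f:
        meta = pickle.load(f)
    summary = {"vocab_size": meta["vocab_size"]}
    for split in ("train", "val"):
        tokens = read_tokens(os.path.join(data_dir, f"{split}.bin"))
        summary[f"{split}_tokens"] = len(tokens)
        # Every token id must index into the vocabulary.
        summary[f"{split}_max_id"] = max(tokens) if len(tokens) else None
    head = read_tokens(os.path.join(data_dir, "train.bin"), count=preview_tokens)
    # Decode the first few ids back to characters for a visual check.
    summary["preview"] = "".join(meta["itos"][i] for i in head)
    return summary
```

In the notebook, `summarize_dataset('/content/nanogpt/data/shakespeare_char')` should report a `train_max_id` below `vocab_size` and a readable preview of the text.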



In [None]:
# CPU-only smoke test (fast)
%cd /content/nanogpt
# Build the command as a list so each override is easy to read and edit.
cmd = [
    'python','train.py','config/train_shakespeare_char.py',
    '--out_dir=/content/runs/cpu',
    '--eval_interval=50','--log_interval=1',
    '--block_size=128','--batch_size=16',
    '--n_layer=2','--n_head=2','--n_embd=64',
    '--max_iters=50','--lr_decay_iters=50','--dropout=0.0',
    '--device=cpu','--compile=False',
    '--dataset=shakespeare_char'
]
cmd_str = ' '.join(cmd)
print('Running:', cmd_str)
# `!` runs a single shell line and streams the training log into the cell output.
!{cmd_str}
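
A quick way to judge the smoke test is to check that the training loss falls over the run. The sketch below parses log lines of the form `iter 10: loss 3.1234, time 45.67ms, ...`, which is the format upstream `train.py` appears to print; treat the pattern as an assumption and adjust it if your log differs. `parse_losses` and `loss_dropped` are hypothetical helpers, not part of nanoGPT.

```python
import re

# Assumed log format: "iter <n>: loss <float>, time ..."
_ITER_LINE = re.compile(r"^iter (\d+): loss ([0-9]*\.?[0-9]+)")

def parse_losses(log_text):
    """Extract (iteration, loss) pairs from captured training output."""
    pairs = []
    for line in log_text.splitlines():
        match = _ITER_LINE.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs

def loss_dropped(pairs, window=5):
    """True if the mean of the last `window` losses is below the mean of the first `window`."""
    if len(pairs) < 2 * window:
        return False  # not enough data to compare two separate windows
    first = sum(loss for _, loss in pairs[:window]) / window
    last = sum(loss for _, loss in pairs[-window:]) / window
    return last < first
```

In a notebook, `lines = !python train.py ...` captures the output as a list of lines; joining them with `"\n"` gives a string suitable for `parse_losses`.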


In [None]:
# Optional: torchrun on one VM (if GPU available)
%cd /content/nanogpt
import subprocess

def gpu_count():
    """Count GPUs reported by `nvidia-smi -L` (one 'GPU N: ...' line each)."""
    try:
        out = subprocess.check_output(['nvidia-smi','-L'], stderr=subprocess.STDOUT)
    except Exception:
        return 0
    return sum(line.startswith(b'GPU ') for line in out.splitlines())

# nanoGPT pins each DDP rank to cuda:<local_rank>, so never launch more
# processes than visible GPUs (a standard Colab VM has one).
nproc = min(2, gpu_count())

if nproc == 0:
    print('No GPU available, skipping torchrun demo')
else:
    cmd = [
        'torchrun','--standalone',f'--nproc_per_node={nproc}','train.py','config/train_shakespeare_char.py',
        '--out_dir=/content/runs/torchrun',
        '--eval_interval=50','--log_interval=1',
        '--block_size=128','--batch_size=16',
        '--n_layer=2','--n_head=2','--n_embd=64',
        '--max_iters=100','--lr_decay_iters=100','--dropout=0.0',
        '--device=cuda','--compile=False',
        # Upstream train.py asserts gradient_accumulation_steps % world_size == 0.
        f'--gradient_accumulation_steps={nproc}',
        '--dataset=shakespeare_char'
    ]
    cmd_str = ' '.join(cmd)
    print('Running:', cmd_str)
    !{cmd_str}
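
The interaction between process count and gradient accumulation is easy to get wrong, so here is a small sketch of the arithmetic. It follows one reading of upstream `train.py`: under DDP the script requires `gradient_accumulation_steps` to be divisible by the world size, divides it by the world size, and then processes `gradient_accumulation_steps * world_size * batch_size * block_size` tokens per iteration. `ddp_plan` is a hypothetical helper, not part of nanoGPT.

```python
def ddp_plan(world_size, gradient_accumulation_steps, batch_size, block_size):
    """Mirror (as an assumption) how nanoGPT splits work across DDP ranks."""
    if world_size < 1:
        raise ValueError("world_size must be at least 1")
    if gradient_accumulation_steps % world_size != 0:
        raise ValueError(
            f"gradient_accumulation_steps={gradient_accumulation_steps} "
            f"is not divisible by world_size={world_size}"
        )
    # Each rank runs its share of the accumulation steps.
    per_rank_steps = gradient_accumulation_steps // world_size
    tokens_per_iter = per_rank_steps * world_size * batch_size * block_size
    return {"per_rank_accum_steps": per_rank_steps, "tokens_per_iter": tokens_per_iter}

# Single process with the notebook's tiny settings.
print(ddp_plan(1, 1, 16, 128))
# Two processes need an accumulation count that is a multiple of 2.
print(ddp_plan(2, 2, 16, 128))
```

If the config leaves `gradient_accumulation_steps` at 1, a two-process launch would trip the divisibility check unless `--gradient_accumulation_steps` is raised to a multiple of 2.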


## Notes
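- Checkpoints are written under `/content/runs`. Upstream nanoGPT logs to stdout (and optionally Weights & Biases) and does not write TensorBoard event files, so `%load_ext tensorboard` followed by `%tensorboard --logdir /content/runs` shows data only if your training script adds TensorBoard logging.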
- For Kubernetes runs, the datasets and runs live under the PVC mount (`/data`). This notebook mirrors the configs but not the storage/networking model.
- Map to Kubernetes:
  - `torchrun --standalone --nproc_per_node=2` ≈ single-Pod multi-GPU with `nproc_per_node=2`
  - Multi-pod DDP in Kubernetes uses `--nnodes`, `--node_rank`, and a headless Service for rendezvous; this is not demonstrated here.
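
For orientation, a multi-pod launch might assemble its `torchrun` arguments roughly as follows. This is a sketch, not the project's actual manifest logic: it assumes StatefulSet-style pod names ending in an ordinal (for example `trainer-1`), and the Service name `trainer-headless`, port `29500`, and helper `multinode_torchrun_args` are all hypothetical. The flags themselves (`--nnodes`, `--node_rank`, `--nproc_per_node`, `--master_addr`, `--master_port`) are standard `torchrun` options.

```python
def multinode_torchrun_args(pod_name, nnodes, nproc_per_node,
                            service="trainer-headless", port=29500):
    """Build torchrun arguments for one pod of a static multi-node job."""
    prefix, _, ordinal = pod_name.rpartition("-")
    if not prefix or not ordinal.isdigit():
        raise ValueError(f"cannot derive node rank from pod name {pod_name!r}")
    node_rank = int(ordinal)
    if node_rank >= nnodes:
        raise ValueError(f"node rank {node_rank} out of range for nnodes={nnodes}")
    # Pod 0's stable DNS name under the headless Service is the rendezvous point.
    master_addr = f"{prefix}-0.{service}"
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={master_addr}",
        f"--master_port={port}",
        "train.py", "config/train_shakespeare_char.py",
    ]
```

Compared with `--standalone`, every pod runs the same command and differs only in `--node_rank`, which is why a stable per-pod identity matters.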

