# <a href="https://colab.research.google.com/github/lastweek/nano-train/blob/main/notebooks/train_in_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Nano-Train: Train LLMs in Google Colab

Train a 125M-parameter GPT-style model on a GPU in Google Colab.

## Setup

1. **Enable GPU**: Runtime → Change runtime type → T4 GPU
2. **Run all cells**: Runtime → Run all

## What This Does

- Clones the latest nano-train code
- Installs dependencies
- Trains a 125M-parameter model on the Tiny Shakespeare dataset
- Saves checkpoints to Google Drive (optional)

## 1. Check GPU

In [None]:
!nvidia-smi
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 2. Install nano-train

In [None]:
# Clone repository
!git clone https://github.com/lastweek/nano-train.git
%cd nano-train

# Install dependencies
!pip install -q torch tqdm

print("\n✅ nano-train installed successfully!")

## 3. Train Model

In [None]:
!python3 examples/train_mvp.py

## 4. What's Next?

### Training Complete! 🎉

Checkpoints are saved in `outputs/`:
- `checkpoint-step-500/` - Mid-training checkpoint
- `checkpoint-step-1000/` - Final checkpoint
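
The overview lists saving checkpoints to Google Drive as optional, but no cell in this notebook does it. A minimal sketch of one way to do it follows. It assumes Drive is mounted at `/content/drive`, and the destination folder name `nano-train-checkpoints` is a hypothetical choice. `backup_checkpoints` is an illustrative helper, not part of nano-train.

```python
import shutil
from pathlib import Path


def backup_checkpoints(src="outputs",
                       dst="/content/drive/MyDrive/nano-train-checkpoints"):
    """Copy every checkpoint-step-* directory from src into dst.

    Illustrative helper; the default dst folder name is an assumption.
    Returns the names of the directories that were copied.
    """
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for ckpt in sorted(src_dir.glob("checkpoint-step-*")):
        if ckpt.is_dir():
            # dirs_exist_ok lets the cell be re-run without failing
            shutil.copytree(ckpt, dst_dir / ckpt.name, dirs_exist_ok=True)
            copied.append(ckpt.name)
    return copied


try:
    from google.colab import drive  # only importable inside Colab
except ImportError:
    drive = None

if drive is not None:
    drive.mount("/content/drive")  # prompts for authorization on first use
    print(backup_checkpoints())
```

Run this from the `nano-train` directory after training finishes, so that the relative `outputs` path resolves. Outside Colab the mount step is skipped and nothing is copied.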

### Next Steps:
1. **Generate text** from the trained model
2. **Train longer** - Increase `max_steps` in config
3. **Try larger models** - Modify `ModelConfig` parameters
4. **Use your own data** - Replace `tiny_shakespeare.txt`
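
Before changing `ModelConfig`, it can help to estimate how many parameters a given shape produces. The sketch below uses the standard count for a GPT-2-style decoder: learned positional embeddings, a 4x MLP, biases, and an output head tied to the token embedding. The argument names are hypothetical and may not match the actual `ModelConfig` fields, and the 125M model in this repo may use a different vocabulary or layout.

```python
def gpt_param_count(n_layer, d_model, vocab_size, context_len):
    """Estimate parameters of a GPT-2-style decoder (tied output head).

    Argument names are illustrative, not nano-train's ModelConfig fields.
    """
    # Token embedding plus learned positional embedding
    embeddings = vocab_size * d_model + context_len * d_model
    # Per block: attention (4*d^2 + 4*d), MLP (8*d^2 + 5*d), two LayerNorms (4*d)
    per_block = 12 * d_model**2 + 13 * d_model
    # Final LayerNorm: scale and bias
    final_norm = 2 * d_model
    return embeddings + n_layer * per_block + final_norm


# GPT-2 small shape: 12 layers, width 768, 50257-token vocab, 1024 context
print(f"{gpt_param_count(12, 768, 50257, 1024):,}")  # 124,439,808
```

The count grows linearly with depth and quadratically with width, so doubling `d_model` roughly quadruples the size of every block.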

### Roadmap:
- Phase 0 (current): MVP Training Cycle ✅
- Phase 1: Production-ready foundation (OmegaConf + Hydra)
- Phase 2: Flash Attention, gradient checkpointing, BF16
- Phase 3-11: Distributed training to 671B MoE models

See [README.md](https://github.com/lastweek/nano-train) for the full roadmap.