# hcho-200M Training on Google Colab

Train a 163M-parameter language model on Google Colab with GPU acceleration.

## 🚀 Quick Start

1. **Enable GPU**: Runtime → Change runtime type → GPU → T4 (free) or V100/A100 (Pro)
2. **Run all cells**: Runtime → Run all
3. **Monitor training**: Watch the progress bars and loss curves

## 📊 Dataset
- **Total tokens**: ~1.1 billion
- **Datasets**: WikiText, SQuAD, IMDB, AG News, Yelp, Amazon, GLUE
- **Sequence length**: 2048 tokens
- **Tokens per parameter**: ~6.9 (below the roughly 20 tokens per parameter often cited as compute-optimal)
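
The tokens-per-parameter figure is the token count divided by the parameter count. As a rough check, the sketch below redoes that arithmetic with the rounded numbers quoted in this notebook. The function names are hypothetical, and the reference ratio of 20 is a commonly cited compute-optimal heuristic, not a value taken from this project.

```python
def tokens_per_parameter(total_tokens: float, num_parameters: float) -> float:
    """Return the ratio of training tokens to model parameters."""
    return total_tokens / num_parameters


def tokens_for_ratio(num_parameters: float, target_ratio: float) -> float:
    """Return how many tokens are needed to reach a target tokens-per-parameter ratio."""
    return num_parameters * target_ratio


# Rounded figures quoted in this notebook.
TOTAL_TOKENS = 1.1e9
NUM_PARAMETERS = 163e6

# Assumption: 20 tokens per parameter is used only as a commonly cited reference point.
REFERENCE_RATIO = 20

ratio = tokens_per_parameter(TOTAL_TOKENS, NUM_PARAMETERS)
needed = tokens_for_ratio(NUM_PARAMETERS, REFERENCE_RATIO)
print(f"Tokens per parameter: {ratio:.2f}")
print(f"Tokens for a ratio of {REFERENCE_RATIO}: {needed / 1e9:.2f}B")
```

With the rounded inputs the ratio comes out slightly below the quoted 6.9, presumably because the quoted figure was computed from unrounded counts.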

## ⏱️ Training Time
- **T4 GPU (free)**: ~3-4 hours
- **V100 GPU (Pro)**: ~1-2 hours
- **A100 GPU (Pro)**: ~45-60 minutes
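
These estimates imply a sustained throughput that can be useful for sanity-checking a run. The sketch below derives it under the assumption that training makes a single pass over the ~1.1 billion tokens; the notebook does not state the number of epochs, so treat the results as illustrative. The function name is hypothetical.

```python
def implied_tokens_per_second(total_tokens: float, hours: float) -> float:
    """Tokens per second needed to process total_tokens in the given wall-clock hours."""
    return total_tokens / (hours * 3600)


# Assumption: one pass over ~1.1B tokens (the epoch count is not given in the source).
TOTAL_TOKENS = 1.1e9

# (fastest, slowest) wall-clock estimates in hours, copied from the list above.
ESTIMATED_HOURS = {
    "T4": (3.0, 4.0),
    "V100": (1.0, 2.0),
    "A100": (0.75, 1.0),
}

for gpu, (fastest, slowest) in ESTIMATED_HOURS.items():
    high = implied_tokens_per_second(TOTAL_TOKENS, fastest)
    low = implied_tokens_per_second(TOTAL_TOKENS, slowest)
    print(f"{gpu}: {low:,.0f} to {high:,.0f} tokens/s")
```

If the throughput observed during training is far below the implied range for your GPU, expect the run to take longer than the estimate.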


## 🔧 Setup and Installation


In [None]:
# Install PyTorch built against CUDA 11.8 (Colab usually ships with a CUDA-enabled PyTorch, so this may be redundant)
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install other dependencies
%pip install transformers datasets accelerate tokenizers tqdm pyyaml

print("✅ Dependencies installed successfully!")


In [None]:
import os

# Clone the repository (replace "your-username" with the actual repository owner)
!git clone https://github.com/your-username/hcho-200M.git
%cd hcho-200M

print("✅ Repository cloned successfully!")
print("📁 Current directory:", os.getcwd())


In [None]:
# Check GPU availability
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️  No GPU detected! Please enable GPU in Runtime settings.")
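
The detected GPU also hints at which mixed-precision mode is worth trying. As a hedged sketch: GPUs with CUDA compute capability 8.0 or higher (such as the A100) support bfloat16 natively, while the T4 (7.5) and V100 (7.0) are better served by float16. The helper name `pick_precision` is hypothetical, and whether `train_llm.py` exposes a precision option is not stated in this notebook.

```python
from typing import Tuple


def pick_precision(capability: Tuple[int, int]) -> str:
    """Suggest a mixed-precision dtype name from a CUDA compute capability.

    Hypothetical helper: returns "bf16" for compute capability 8.0 and above
    (Ampere or newer), otherwise "fp16".
    """
    major, _minor = capability
    return "bf16" if major >= 8 else "fp16"


# Compute capabilities of the GPUs mentioned in this notebook.
for name, capability in [("T4", (7, 5)), ("V100", (7, 0)), ("A100", (8, 0))]:
    print(f"{name}: {pick_precision(capability)}")
```

In a GPU runtime, the capability tuple can be obtained with `torch.cuda.get_device_capability(0)`.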


## 🚀 Start Training


In [None]:
# Start training with progress monitoring
!python train_llm.py --config config.yaml
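
Before launching, it can help to confirm that the script and config are present in the working directory, since a failed clone or `%cd` would otherwise only surface as a file-not-found error. A minimal sketch, assuming both files sit at the repository root as the command above suggests; `missing_files` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(root: Path, names: Iterable[str]) -> List[str]:
    """Return the names that do not exist as regular files under root."""
    return [name for name in names if not (root / name).is_file()]


# File names taken from the training command above.
required = ["train_llm.py", "config.yaml"]
missing = missing_files(Path.cwd(), required)

if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found; ready to train.")
```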
