# TinyGPT Training on Google Colab

**Free GPU Training!** This notebook trains your TinyGPT model using Google Colab's free T4 GPU.

## Setup Instructions:
1. **Enable GPU**: Runtime → Change runtime type → GPU (T4)
2. **Run all cells** in order
3. Training takes roughly 2-3 hours on the T4 (vs. 6+ hours on an Apple M4)
4. Download trained model and tokenizer at the end

**Advantages over M4:**
- 2-3x faster training (T4 GPU vs M4 GPU)
- Free GPU hours (no cost)
- Dedicated VRAM (16 GB on the T4, whereas the M4 shares its unified memory with the CPU)
- Can use larger batch sizes

In [None]:
# Step 1: Clone your repository
!git clone https://github.com/mcintoshjames-sketch/tiny_gpt.git
%cd tiny_gpt
!ls -lh

In [None]:
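# Step 2: Install dependencies
# Note: Colab runtimes ship with a CUDA build of PyTorch preinstalled, so the first
# command may be redundant; it installs wheels from the cu118 index explicitly.
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q tokenizers datasets tqdm numpy requests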

print("\n✓ Dependencies installed!")
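
Before reinstalling anything, it can help to see which of the Step 2 packages the runtime already provides. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of the repository.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names taken from the pip commands in Step 2.
for name in ["torch", "tokenizers", "datasets", "tqdm", "numpy", "requests"]:
    version = installed_version(name)
    print(f"{name:<12} {version or 'not installed'}")
```

Anything reported as "not installed" still needs the corresponding `pip install` line.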

In [None]:
# Step 3: Verify GPU is available
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"✅ GPU Available: {gpu_name}")
    print(f"   Memory: {gpu_memory:.1f} GB")
    print(f"   CUDA Version: {torch.version.cuda}")
else:
    print("❌ No GPU found!")
    print("   Go to: Runtime → Change runtime type → GPU (T4)")
    raise RuntimeError("GPU required for efficient training")
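
As a cross-check that does not depend on PyTorch, the driver's own `nvidia-smi` tool can be queried for the GPU name and total memory. This is a minimal sketch, assuming `nvidia-smi` is on the PATH whenever a GPU runtime is attached; the helper names `parse_gpu_line` and `query_gpus` are illustrative.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Split one 'name, memory' CSV row from nvidia-smi into a (name, memory) tuple."""
    name, memory = [field.strip() for field in line.rsplit(",", 1)]
    return name, memory


def query_gpus():
    """Return a list of (name, total_memory) tuples, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


print(query_gpus() or "No NVIDIA GPU visible to nvidia-smi")
```

An empty result here while `torch.cuda.is_available()` is also false usually means the runtime type was never switched to GPU.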

In [None]:
# Step 4: Download and verify WikiText-103 dataset
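import os

# Re-download if either split is missing, not just the training file
if not (os.path.exists('data/wt103_train.txt') and os.path.exists('data/wt103_valid.txt')):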
    print("Downloading WikiText-103 dataset...")
    from datasets import load_dataset
    
    dataset = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1")
    
    os.makedirs('data', exist_ok=True)
    
    # Drop empty and whitespace-only lines
    train_text = "\n".join(t for t in dataset["train"]["text"] if t.strip())
    with open('data/wt103_train.txt', 'w', encoding='utf-8') as f:
        f.write(train_text)
    
    valid_text = "\n".join(t for t in dataset["validation"]["text"] if t.strip())
    with open('data/wt103_valid.txt', 'w', encoding='utf-8') as f:
        f.write(valid_text)
    
    print(f"✓ Downloaded {len(train_text):,} training characters")
    print(f"✓ Downloaded {len(valid_text):,} validation characters")
else:
    print("✓ Dataset already downloaded")
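
Step 4 writes the two text files but does not inspect them afterwards. A small sanity check could look like the sketch below, which reports the size and line count of each file; `summarize_text_file` is a hypothetical helper, and the paths are the ones used in the cell above.

```python
import os


def summarize_text_file(path):
    """Return (size_in_bytes, line_count) for a UTF-8 text file, streaming line by line."""
    line_count = 0
    with open(path, "r", encoding="utf-8") as f:
        for _ in f:
            line_count += 1
    return os.path.getsize(path), line_count


for path in ["data/wt103_train.txt", "data/wt103_valid.txt"]:
    if os.path.exists(path):
        size, lines = summarize_text_file(path)
        status = "OK" if size > 0 else "EMPTY"
        print(f"{status}: {path} ({size / 1e6:.1f} MB, {lines:,} lines)")
    else:
        print(f"MISSING: {path}")
```

Streaming the file avoids loading the whole training set into memory a second time.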

In [None]:
# Step 5: Train the model!
# This will take ~2-3 hours on T4 GPU

!python3 train.py

# A `!` command does not raise when the script fails, so confirm success in the log above
print("\n✅ Training run finished!")
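
Because a `!` shell command does not raise when the script exits with an error, the final message in Step 5 prints either way. One way to make a failure stop the notebook is to launch the script from Python and check its exit code. This is a sketch under that assumption; `run_script` is an illustrative helper, not something the repository provides.

```python
import subprocess
import sys


def run_script(args):
    """Run the current Python interpreter with `args`, echo its output, raise on failure."""
    process = subprocess.Popen(
        [sys.executable, "-u", *args],  # -u: unbuffered, so log lines appear promptly
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in process.stdout:
        print(line, end="")
    returncode = process.wait()
    if returncode != 0:
        raise RuntimeError(f"{' '.join(args)} failed with exit code {returncode}")
    return returncode


# Usage in the notebook (commented out here because it starts the full training run):
# run_script(["train.py"])
# print("\n✅ Training complete!")
```

With this approach "Run all" halts at the training cell if `train.py` crashes, instead of continuing to the inference and download steps.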

In [None]:
# Step 6: Test the trained model
!python3 inference.py --prompt "The history of artificial intelligence" --max_tokens 100

print("\n---\n")

!python3 inference.py --prompt "Machine learning is" --max_tokens 100

In [None]:
# Step 7: Download trained model to your computer
from google.colab import files
import os

print("Preparing files for download...\n")

# Download the model checkpoint and tokenizer, reporting anything that is missing
downloaded = 0
for name in ['tiny_gpt_best.pt', 'tokenizer_bpe_best.json']:
    if os.path.exists(name):
        print(f"Downloading {name}...")
        files.download(name)
        downloaded += 1
    else:
        print(f"⚠️ {name} not found; did training finish?")

print(f"\n✅ Started {downloaded} of 2 downloads")
print("   Place these files in your ~/tiny_gpt directory on Mac")
print("   Then run: python3 inference.py")
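
Browsers sometimes block the second of two back-to-back downloads. A possible workaround is to bundle both files into one archive first. The sketch below uses only the standard library; `bundle_artifacts` and the archive name `tiny_gpt_artifacts.zip` are hypothetical.

```python
import os
import zipfile


def bundle_artifacts(paths, zip_path):
    """Write the existing files in `paths` to one zip archive; return the names added."""
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            if os.path.exists(path):
                archive.write(path, arcname=os.path.basename(path))
                added.append(os.path.basename(path))
            else:
                print(f"Skipping missing file: {path}")
    return added


added = bundle_artifacts(
    ["tiny_gpt_best.pt", "tokenizer_bpe_best.json"],
    "tiny_gpt_artifacts.zip",  # hypothetical archive name
)
print(f"Bundled {len(added)} file(s): {added}")
# In Colab, the archive could then be fetched with:
# from google.colab import files; files.download("tiny_gpt_artifacts.zip")
```

After unzipping on the Mac, the two files go in the same `~/tiny_gpt` directory as before.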

## Alternative: Mount Google Drive

To keep a copy of the trained files in Google Drive, so they survive a Colab disconnect or runtime reset, run the cell below after training finishes:

In [None]:
# Optional: Mount Google Drive to save checkpoints
from google.colab import drive
drive.mount('/content/drive')

# Copy trained files to Google Drive
!mkdir -p '/content/drive/MyDrive/tiny_gpt_models'
!cp tiny_gpt_best.pt '/content/drive/MyDrive/tiny_gpt_models/'
!cp tokenizer_bpe_best.json '/content/drive/MyDrive/tiny_gpt_models/'

print("✓ Models saved to Google Drive: MyDrive/tiny_gpt_models/")
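
The `!cp` commands above print an error but carry on if a file is missing or Drive is not mounted. A pure-Python variant that reports what was actually copied might look like the following sketch; `backup_files` is an illustrative helper, and the destination folder is the one used in the cell above.

```python
import os
import shutil


def backup_files(names, dest_dir):
    """Copy each existing file in `names` into `dest_dir`; return the copied paths."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in names:
        if os.path.exists(name):
            copied.append(shutil.copy2(name, dest_dir))
        else:
            print(f"Not found, skipping: {name}")
    return copied


DRIVE_DIR = "/content/drive/MyDrive/tiny_gpt_models"  # same folder as the cell above
if os.path.isdir("/content/drive/MyDrive"):
    print(backup_files(["tiny_gpt_best.pt", "tokenizer_bpe_best.json"], DRIVE_DIR))
else:
    print("Google Drive is not mounted; run drive.mount('/content/drive') first.")
```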