# Mason Transformer - GPU Training on Google Colab

This notebook trains the 125M-parameter Mason transformer on a Colab GPU.

## Setup Instructions

1. Go to [colab.research.google.com](https://colab.research.google.com)
2. Sign in with your Google account
3. Upload this notebook (File > Upload notebook)
4. Select a GPU runtime: Runtime > Change runtime type > T4 GPU (free tier)
5. Run all cells in order

Training takes approximately 3-5 hours on a free-tier T4 GPU for the default 40,000 steps (see section 5 for per-GPU throughput).

## 1. Clone Repository and Install Dependencies

In [None]:
import os

# Clone the repo using anonymous HTTPS (public repo, no auth needed)
if not os.path.exists('masonearl.com'):
    !GIT_TERMINAL_PROMPT=0 git clone https://github.com/masonearl/masonearl.com.git
else:
    print('Repo already cloned, pulling latest...')
    !GIT_TERMINAL_PROMPT=0 git -C masonearl.com pull

# Install dependencies
!pip install -q torch openpyxl

print('Done.')
!ls masonearl.com/pages/contech/estimator/model/transformer/

## 2. GPU Detection and Setup

In [None]:
import os
import sys

import torch

# Check GPU
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f'GPU: {gpu_name} ({gpu_mem:.1f} GB)')
    print(f'CUDA version: {torch.version.cuda}')
    print(f'PyTorch version: {torch.__version__}')
    print(f'bfloat16 supported: {torch.cuda.is_bf16_supported()}')
else:
    print('WARNING: No GPU detected!')
    print('Go to Runtime > Change runtime type > T4 GPU')
    print('Then restart this notebook.')

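# Set up paths. Put the absolute directory on sys.path so imports keep working
# after chdir, and skip chdir when this cell is re-run from inside the directory.
TRANSFORMER_DIR = 'masonearl.com/pages/contech/estimator/model/transformer'
if os.path.isdir(TRANSFORMER_DIR):
    os.chdir(TRANSFORMER_DIR)
if os.getcwd() not in sys.path:
    sys.path.insert(0, os.getcwd())
print(f'\nWorking directory: {os.getcwd()}')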

## 3. Build Training Corpus

In [None]:
!python build_corpus.py

# Note: real_projects.txt (from extract_real_projects.py) is pre-generated locally
# and committed to the repo. It requires OneDrive access to regenerate.
# Check if it exists:
import os
if os.path.exists('corpus/real_projects.txt'):
    size = os.path.getsize('corpus/real_projects.txt') / 1024
    print(f'real_projects.txt found ({size:.0f} KB) - real Tempest project data loaded')
else:
    print('WARNING: corpus/real_projects.txt not found - real project data will not be included')

## 4. Train Tokenizer

In [None]:
!python tokenizer.py --train

## 5. Train the Model

Speed improvements applied:
- `torch.compile()` (PyTorch 2.0+): ~1.5-2x faster on CUDA
- TF32 enabled on Ampere or newer GPUs (e.g. A100): ~1.5x speedup at no extra cost (not available on T4)
- `num_workers=2` + `pin_memory=True` + `prefetch_factor=4`: faster data loading
- Mixed precision (bfloat16 on A100, float16 on T4)
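
The precision choices in the list above can be sketched as a small helper. This is an illustration, not the logic in `train.py`: the names `choose_precision` and `apply_precision` are hypothetical, and using CUDA compute capability 8.0 (Ampere) as the cutoff for both bfloat16 and TF32 is an assumption.

```python
def choose_precision(cuda_available, capability_major):
    """Return (autocast dtype name, use_tf32) following the list above.

    ASSUMPTION: compute capability >= 8 (Ampere, e.g. A100) is used as the
    cutoff for bfloat16 and TF32; train.py may decide differently.
    """
    if not cuda_available:
        return ("float32", False)
    if capability_major >= 8:
        return ("bfloat16", True)
    return ("float16", False)  # e.g. T4 (Turing, capability 7.5)


def apply_precision():
    """Apply the choice to the running PyTorch process (call this in Colab)."""
    import torch  # imported lazily so choose_precision stays torch-free

    available = torch.cuda.is_available()
    major = torch.cuda.get_device_capability(0)[0] if available else 0
    dtype_name, use_tf32 = choose_precision(available, major)

    # These two flags control TF32 for matmuls and cuDNN convolutions.
    torch.backends.cuda.matmul.allow_tf32 = use_tf32
    torch.backends.cudnn.allow_tf32 = use_tf32
    return getattr(torch, dtype_name), use_tf32
```

On a T4, `apply_precision()` would return `torch.float16` with TF32 off. The returned dtype could then be passed to `torch.autocast(device_type='cuda', dtype=...)` around the forward pass.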

Expected throughput by GPU:
- T4 (free): ~3,000-5,000 tok/s → 40k steps ≈ 3-5 hours
- A100 (Colab Pro+): ~15,000-25,000 tok/s → 40k steps ≈ 45-90 min
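
As a rough check on the T4 estimate, the arithmetic can be written out. The tokens-per-step value below is an assumption back-solved from the T4 figures (5,000 tok/s × 3 h ÷ 40,000 steps ≈ 1,350), not a value read from `train.py`, and `estimate_hours` is a hypothetical helper name.

```python
def estimate_hours(steps, tokens_per_step, tokens_per_second):
    """Wall-clock hours needed to process steps * tokens_per_step tokens."""
    total_tokens = steps * tokens_per_step
    return total_tokens / tokens_per_second / 3600


# ASSUMPTION: back-solved from the T4 row above, not taken from train.py.
TOKENS_PER_STEP = 1350

for tok_per_s in (3000, 5000):
    hours = estimate_hours(40_000, TOKENS_PER_STEP, tok_per_s)
    print(f"{tok_per_s} tok/s -> {hours:.1f} h")
```

The same value should not be reused for the A100 row, because the training cell raises the batch size on that GPU and so changes the tokens processed per step.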

Loss target: below 3.0 within 2k steps, ideally below 2.5 by 40k steps.
If you get out-of-memory (OOM) errors, reduce `BATCH_SIZE` to 2 in the cell below and re-run it.

In [None]:
import torch

# Training configuration
STEPS = 40000        # Total training steps (was 20k - more steps = better quality)
BATCH_SIZE = 4       # Reduce to 2 if you get OOM; increase to 8 on A100
RESUME = False       # Set to True to resume from a checkpoint

# Auto-increase batch size on A100-class GPUs (40 GB VRAM or more)
if torch.cuda.is_available():
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb >= 35:  # threshold sits below 40 GB so an A100 qualifies
        BATCH_SIZE = 8
        print(f'Large GPU detected ({vram_gb:.0f} GB) - using batch_size={BATCH_SIZE}')
    else:
        print(f'GPU: {vram_gb:.0f} GB - using batch_size={BATCH_SIZE}')
else:
    print('WARNING: No GPU detected - training on CPU will be very slow')

cmd = f'python train.py --steps {STEPS} --batch-size {BATCH_SIZE}'
if RESUME:
    cmd += ' --resume'

print(f'Running: {cmd}')
print('=' * 60)
!PYTHONUNBUFFERED=1 {cmd}

## 6. Test the Model

In [None]:
from generate import MasonEngine

engine = MasonEngine()

# Test prompts including real Tempest project knowledge
test_prompts = [
    'Can you help me bid a job?',
    'What is the T&M rate for a foreman?',
    'How much does 8-inch PVC C900 pipe cost per foot?',
    'What was the Herriman Old Town Waterline project?',
    'What does a Cat 320 excavator cost per hour?',
    'What is a typical price per foot for a 6-inch IHP gas line relocation?',
    'How much does 3/4 inch APWA road base cost from SPC?',
    'Estimate 1000 LF of 8-inch sewer in clay at 6 feet deep.',
    'What production rate should I use for a Cat 315 excavator?',
    'Hello, what can you do?',
]

for prompt in test_prompts:
    print(f'\nUser: {prompt}')
    response = engine.chat([
        {'role': 'system', 'content': 'You are Mason, a personal AI assistant built by Mason Earl.'},
        {'role': 'user', 'content': prompt}
    ])
    print(f'Mason: {response}')
    print('-' * 60)

## 7. Download Trained Model

Download the best checkpoint to your local machine. Then copy it to:
`pages/contech/estimator/model/transformer/checkpoints/`
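
Once the file is on your machine, the copy step could look like the sketch below, run locally rather than in Colab. The helper name `install_checkpoint` and both example paths in the usage comment are hypothetical; only the destination directory comes from the note above.

```python
import shutil
from pathlib import Path

# Destination inside the repo, as given in the note above.
CHECKPOINT_SUBDIR = "pages/contech/estimator/model/transformer/checkpoints"


def install_checkpoint(downloaded_file, repo_root):
    """Copy a downloaded checkpoint into the repo's checkpoints directory."""
    dest_dir = Path(repo_root) / CHECKPOINT_SUBDIR
    dest_dir.mkdir(parents=True, exist_ok=True)  # create it if missing
    dest = dest_dir / Path(downloaded_file).name
    shutil.copy2(downloaded_file, dest)  # copy2 also preserves timestamps
    return dest


# Example call (both paths are placeholders for your own locations):
# install_checkpoint(Path.home() / "Downloads" / "best.pt", "masonearl.com")
```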

In [None]:
from google.colab import files
import os

best_path = 'checkpoints/best.pt'
latest_path = 'checkpoints/latest.pt'

if os.path.exists(best_path):
    size_mb = os.path.getsize(best_path) / 1e6
    print(f'Downloading best.pt ({size_mb:.0f} MB)...')
    files.download(best_path)
elif os.path.exists(latest_path):
    size_mb = os.path.getsize(latest_path) / 1e6
    print(f'Downloading latest.pt ({size_mb:.0f} MB)...')
    files.download(latest_path)
else:
    print('No checkpoint found. Run the training cell first.')

# Also download the tokenizer
if os.path.exists('tokenizer.json'):
    files.download('tokenizer.json')
    print('Downloaded tokenizer.json')