# LLMSQL2 Colab GPU Training

This notebook fine-tunes GPT-2 and TinyLlama (with LoRA) on the geography dataset using a Colab GPU, saves the checkpoints to Google Drive, and evaluates both fine-tuned models.

**Important:** Before running any cell, select a GPU runtime: Runtime → Change runtime type → **GPU**.

In [None]:
# Verify GPU
!nvidia-smi
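
The `nvidia-smi` output above has to be read by eye. As a minimal sketch, the same check can be done programmatically so that later cells can stop early when no GPU is attached. The helper name `gpu_available` is hypothetical and not part of the repo; it only relies on the standard library and on `nvidia-smi -L`, which lists one line per GPU.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and reports at least one GPU."""
    # Hypothetical helper for illustration; not part of LLMSQL2.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if not gpu_available():
    print("No GPU detected: switch the runtime type to GPU before training.")
```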

In [None]:
# Mount Google Drive (for saving checkpoints)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Clone the repo (replace <YOUR_ORG_OR_USERNAME> with the repository owner first)
%cd /content
!git clone https://github.com/<YOUR_ORG_OR_USERNAME>/LLMSQL2.git
%cd /content/LLMSQL2

In [None]:
# Install dependencies
!pip -q install -r requirements.txt
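
Before starting a long training run, it can help to confirm that the dataset file is present and parses. The sketch below assumes the file is a JSON array at the top level, which is not stated in this notebook; `count_examples` is a hypothetical helper, not a function from the repo.

```python
import json
from pathlib import Path


def count_examples(path: str) -> int:
    """Load a JSON dataset file and return the number of top-level entries."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the dataset is stored as a top-level JSON array.
    if not isinstance(data, list):
        raise ValueError(f"Expected a JSON array in {path}, got {type(data).__name__}")
    return len(data)


DATA_PATH = "/content/LLMSQL2/data/text2sql-data/data/geography.json"
if Path(DATA_PATH).exists():
    print(f"{DATA_PATH}: {count_examples(DATA_PATH)} entries")
else:
    print(f"Missing dataset file: {DATA_PATH}")
```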

In [None]:
# Set output paths in Drive
GDRIVE_OUT = '/content/drive/MyDrive/LLMSQL2/results'
!mkdir -p "{GDRIVE_OUT}"
print('Results will be saved to:', GDRIVE_OUT)
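
If Drive is not mounted, the training cells may only fail when they try to write the first checkpoint. A small sketch of a write test for the results directory follows; `is_writable` is a hypothetical helper and `OUT_DIR` simply repeats the value of `GDRIVE_OUT` so the snippet is self-contained.

```python
import tempfile
from pathlib import Path


def is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created inside `directory`."""
    path = Path(directory)
    if not path.is_dir():
        return False
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


# Same value as GDRIVE_OUT in the cell above.
OUT_DIR = "/content/drive/MyDrive/LLMSQL2/results"
print("Writable:", is_writable(OUT_DIR))
```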

In [None]:
# Train GPT-2 on geography
!python -m src.train_gpt2 \
  --data /content/LLMSQL2/data/text2sql-data/data/geography.json \
  --output /content/drive/MyDrive/LLMSQL2/results/gpt2-geography \
  --epochs 5 \
  --batch-size 2

In [None]:
# Train TinyLlama (LoRA) on geography
!python -m src.train_tinyllama \
  --data /content/LLMSQL2/data/text2sql-data/data/geography.json \
  --output /content/drive/MyDrive/LLMSQL2/results/tinyllama-geography \
  --epochs 3
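
The evaluation cells expect a `final` subdirectory inside each output directory. As a hedged sketch, the check below lists any of those directories that are absent after training, so a failed run is noticed before evaluation starts. `missing_checkpoints` is a hypothetical helper; the directory names are taken from the `--output` and `--checkpoint` arguments in this notebook.

```python
from pathlib import Path


def missing_checkpoints(results_root: str, run_names: list[str]) -> list[str]:
    """Return the `<run>/final` directories that do not exist under `results_root`."""
    root = Path(results_root)
    return [
        str(root / name / "final")
        for name in run_names
        if not (root / name / "final").is_dir()
    ]


missing = missing_checkpoints(
    "/content/drive/MyDrive/LLMSQL2/results",
    ["gpt2-geography", "tinyllama-geography"],
)
print("Missing checkpoints:", missing or "none")
```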

In [None]:
# Evaluate GPT-2 fine-tuned model
!python -m src.evaluation \
  --model gpt2 \
  --checkpoint /content/drive/MyDrive/LLMSQL2/results/gpt2-geography/final \
  --database geography

In [None]:
# Evaluate TinyLlama fine-tuned model
!python -m src.evaluation \
  --model tinyllama \
  --checkpoint /content/drive/MyDrive/LLMSQL2/results/tinyllama-geography/final \
  --database geography

## Next Steps
Repeat the training and evaluation for advising, atis, and restaurants. For training, change the `--data` and `--output` paths; for evaluation, change `--checkpoint` and `--database` to match.

Example data paths:
- `/content/LLMSQL2/data/text2sql-data/data/advising.json`
- `/content/LLMSQL2/data/text2sql-data/data/atis.json`
- `/content/LLMSQL2/data/text2sql-data/data/restaurants.json`
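
Rather than editing each cell by hand, the per-dataset commands can be generated in a loop. The sketch below only builds and prints them. It assumes that the epoch and batch-size settings used for geography carry over to the other datasets and that `--database` accepts each dataset name, neither of which is confirmed here. `build_commands` is a hypothetical helper, not part of the repo.

```python
import shlex

REPO_DATA = "/content/LLMSQL2/data/text2sql-data/data"
RESULTS = "/content/drive/MyDrive/LLMSQL2/results"


def build_commands(dataset: str) -> list[list[str]]:
    """Return the train and evaluate commands for one dataset, for both models."""
    data = f"{REPO_DATA}/{dataset}.json"
    commands = []
    # Assumption: the geography hyperparameters are reused unchanged.
    for model, module, extra in [
        ("gpt2", "src.train_gpt2", ["--epochs", "5", "--batch-size", "2"]),
        ("tinyllama", "src.train_tinyllama", ["--epochs", "3"]),
    ]:
        output = f"{RESULTS}/{model}-{dataset}"
        commands.append(
            ["python", "-m", module, "--data", data, "--output", output, *extra]
        )
        commands.append(
            ["python", "-m", "src.evaluation", "--model", model,
             "--checkpoint", f"{output}/final", "--database", dataset]
        )
    return commands


for dataset in ["advising", "atis", "restaurants"]:
    for cmd in build_commands(dataset):
        print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(cmd, check=True, cwd="/content/LLMSQL2")`, so that a failed training run stops the loop before its evaluation step.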