# Energy-Based Model Training on Google Colab (FREE GPU)

This notebook runs your training on Google Colab's free GPU (usually an NVIDIA T4), so you don't need to request any cloud GPU quota.

## Setup Instructions:
1. Upload this notebook to Google Colab (File → Upload notebook)
2. Go to Runtime → Change runtime type, select T4 GPU as the hardware accelerator, and save
3. Run the cells below in order

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Mount Google Drive for persistent storage
from google.colab import drive
drive.mount('/content/drive')

In [None]:
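# Get the project code into /content/energy-based-model
# Option 1: Clone from GitHub (uncomment, set YOUR_USERNAME, and skip Option 2)
# !git clone https://github.com/YOUR_USERNAME/energy-based-model.git /content/energy-based-model

# Option 2: Upload a zip from your local machine (run this cell and use the file picker).
# The zip should contain a top-level energy-based-model/ folder.
from google.colab import files
import zipfile
import os

print("Upload your energy-based-model.zip file:")
uploaded = files.upload()

# Extract any uploaded zip files into /content/
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('/content/')
        print(f"Extracted {filename}")

# Change to the project directory (fail early if the zip had a different layout)
assert os.path.isdir('/content/energy-based-model'), "Expected /content/energy-based-model; check the zip's top-level folder"
%cd /content/energy-based-model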
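If your zip has a different top-level folder (or none), the `%cd` above will fail. One way to locate the project is to search for `train.py`. This is a minimal sketch that assumes the project root is the folder holding `train.py`; `find_project_dir` is a hypothetical helper, not part of the project.

```python
import os


def find_project_dir(root, marker="train.py"):
    """Return the shallowest directory under `root` that contains `marker`, or None.

    Assumption: the project root is the folder that holds train.py.
    """
    best = None  # (depth, path) of the shallowest match so far
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune folders named "drive" (e.g. the /content/drive mount) and hidden folders
        dirnames[:] = [d for d in dirnames if d != "drive" and not d.startswith(".")]
        if marker in filenames:
            depth = dirpath.rstrip(os.sep).count(os.sep)
            # os.walk is depth-first, so compare depths instead of taking the first hit
            if best is None or depth < best[0]:
                best = (depth, dirpath)
    return best[1] if best else None
```

Usage: `project_dir = find_project_dir('/content')`, then `os.chdir(project_dir)` if the result is not `None`.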

In [None]:
# Install dependencies (PyTorch comes preinstalled on Colab)
!pip install -q accelerate==1.10.1 einops==0.8.1 ema_pytorch==0.7.7 tabulate==0.9.0 tqdm==4.67.1
!pip install -q wandb  # Optional: experiment logging
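To confirm that the pinned versions actually ended up installed, you can compare them against installed package metadata. This is a sketch using the standard library's `importlib.metadata`. The `PINS` dict is copied from the install cell, and `check_pins` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell (wandb is unpinned there, so it is omitted)
PINS = {
    "accelerate": "1.10.1",
    "einops": "0.8.1",
    "ema_pytorch": "0.7.7",
    "tabulate": "0.9.0",
    "tqdm": "4.67.1",
}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is missing entirely
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Usage: `print(check_pins(PINS) or "All pins satisfied")`. If a package was already imported before pip changed its version, restart the runtime so the new version is loaded.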

In [None]:
# Verify PyTorch and CUDA
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

In [None]:
# Download data if needed
!mkdir -p data
# Add your data download commands here
# Example: !wget -O data/dataset.tar.gz https://example.com/dataset.tar.gz
# !tar -xzf data/dataset.tar.gz -C data/
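As a Python alternative to the `wget`/`tar` commands, the sketch below downloads an archive once and extracts it while refusing paths that would escape the target folder. It uses only the standard library (`urllib.request`, `tarfile`). The URL is a placeholder, and `download` and `safe_extract` are hypothetical helper names.

```python
import os
import tarfile
import urllib.request


def download(url, dest):
    """Download `url` to `dest` unless the file already exists (cheap on reruns)."""
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest


def safe_extract(archive, dest):
    """Extract a .tar/.tar.gz into `dest`, refusing member paths that escape `dest`."""
    dest_root = os.path.realpath(dest)
    with tarfile.open(archive) as tar:  # mode "r" auto-detects compression
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python builds with extraction filters: also reject unsafe links and special files
            tar.extractall(dest_root, filter="data")
        else:
            tar.extractall(dest_root)
    return dest_root
```

Usage (placeholder URL): `safe_extract(download("https://example.com/dataset.tar.gz", "data/dataset.tar.gz"), "data")`.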

In [None]:
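# Set up checkpoint and log directories in Google Drive so they survive runtime resets
import os
os.makedirs('/content/drive/MyDrive/ebm_checkpoints', exist_ok=True)
os.makedirs('/content/drive/MyDrive/ebm_logs', exist_ok=True)

# Symlink them into the project directory. -n replaces an existing link on rerun
# instead of creating a nested link inside the Drive folder.
# If ./checkpoints or ./logs already exists as a real directory, move it aside first.
!ln -sfn /content/drive/MyDrive/ebm_checkpoints ./checkpoints
!ln -sfn /content/drive/MyDrive/ebm_logs ./logs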
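The same linking step can be done from Python so that reruns are safe and a pre-existing real folder is never silently shadowed. This is a sketch; `link_to_drive` is a hypothetical helper, and it assumes a Linux runtime (as on Colab) where `os.symlink` is available.

```python
import os


def link_to_drive(drive_dir, link_path):
    """Point `link_path` at `drive_dir`; safe to run repeatedly.

    An existing symlink is replaced, like `ln -sfn`. If a real file or directory
    already sits at `link_path`, this raises instead of nesting the link inside it.
    """
    os.makedirs(drive_dir, exist_ok=True)
    if os.path.islink(link_path):
        if os.readlink(link_path) == drive_dir:
            return link_path  # already points at the right place
        os.remove(link_path)  # stale link: remove only the link, not its target
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink; move it first")
    os.symlink(drive_dir, link_path)
    return link_path
```

Usage: `link_to_drive('/content/drive/MyDrive/ebm_checkpoints', './checkpoints')`.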

In [None]:
# Training configuration
BATCH_SIZE = 32
LEARNING_RATE = 1e-4
NUM_EPOCHS = 100
CHECKPOINT_DIR = "./checkpoints"
LOG_DIR = "./logs"
DATA_DIR = "./data"

In [None]:
# Run training
!python train.py \
    --data_dir {DATA_DIR} \
    --checkpoint_dir {CHECKPOINT_DIR} \
    --log_dir {LOG_DIR} \
    --batch_size {BATCH_SIZE} \
    --learning_rate {LEARNING_RATE} \
    --num_epochs {NUM_EPOCHS} \
    --device cuda \
    --num_workers 2
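Because Colab runs one cell at a time, `!python train.py` blocks the notebook until training ends. The sketch below launches the same command as a background process with `subprocess.Popen` and writes its output to a log file. It assumes `train.py` accepts the flags used above. `build_train_cmd`, `launch_in_background`, and the log filename are hypothetical.

```python
import subprocess
import sys


def build_train_cmd(**flags):
    """Turn keyword flags into a train.py argument list.

    True becomes a bare switch (e.g. --mixed_precision); False and None are dropped.
    """
    cmd = [sys.executable, "train.py"]
    for name, value in flags.items():
        if value is True:
            cmd.append(f"--{name}")
        elif value is not None and value is not False:
            cmd += [f"--{name}", str(value)]
    return cmd


def launch_in_background(cmd, log_path="train_stdout.log"):
    """Start `cmd` without blocking the notebook; stdout and stderr go to `log_path`."""
    with open(log_path, "w") as log:
        # The child process keeps its own copy of the file handle after this returns
        return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
```

Usage: `proc = launch_in_background(build_train_cmd(data_dir=DATA_DIR, checkpoint_dir=CHECKPOINT_DIR, log_dir=LOG_DIR, batch_size=BATCH_SIZE, learning_rate=LEARNING_RATE, num_epochs=NUM_EPOCHS, device="cuda", num_workers=2))`. Then check progress with `!tail -n 20 train_stdout.log`. `proc.poll()` returns `None` while training is still running.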

In [None]:
# Alternative: Run with automatic mixed precision for faster training
# !python train.py \
#     --data_dir {DATA_DIR} \
#     --checkpoint_dir {CHECKPOINT_DIR} \
#     --log_dir {LOG_DIR} \
#     --batch_size {BATCH_SIZE} \
#     --learning_rate {LEARNING_RATE} \
#     --num_epochs {NUM_EPOCHS} \
#     --device cuda \
#     --num_workers 2 \
#     --mixed_precision

In [None]:
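# Check GPU usage. Colab runs one cell at a time, so this executes only after the
# training cell above finishes; to watch usage live, start training in the background.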
!nvidia-smi
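For a compact reading you can poll repeatedly while training runs in the background (for example via `subprocess.Popen` or `nohup ... &`). `nvidia-smi` can print selected fields as CSV. This sketch parses that output; `parse_gpu_stats` and `gpu_stats` are hypothetical helpers.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output.

    Returns one dict per GPU: utilization in percent, memory in MiB.
    """
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(x) for x in line.split(","))
        stats.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return stats


def gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)
```

Usage: call `gpu_stats()` in a cell while the background training process is running. Low `util_pct` with plenty of free memory often means the data loader, not the GPU, is the bottleneck.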

In [None]:
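# Download results to local machine
from google.colab import files
import os
import zipfile

# Zip both checkpoints/ and logs/ (os.walk follows the top-level Drive symlinks)
with zipfile.ZipFile('training_results.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for folder in ('checkpoints', 'logs'):
        for dirpath, _, filenames in os.walk(folder):
            for name in filenames:
                # Stored under its relative path, e.g. checkpoints/<file>
                zf.write(os.path.join(dirpath, name))
files.download('training_results.zip')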

## Tips for Colab:

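1. **Session Time Limits**: Free Colab sessions run for at most 12 hours, and often less depending on availability. Save checkpoints frequently!
2. **GPU Limits**: Free-tier GPU usage limits are dynamic and not published; heavy use can leave you temporarily without GPU access
3. **Persistent Storage**: Always save important files to Google Drive; the runtime's local disk is wiped when the session ends
4. **Idle Timeout**: Colab disconnects idle sessions (commonly after about 90 minutes of inactivity)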
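5. **Keep Alive**: Paste this JavaScript into your browser's developer console to reduce idle disconnections. It clicks Colab's connect button every 60 seconds, so it depends on Colab's current page markup: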
```javascript
function ClickConnect(){
    console.log("Keeping alive...");
    document.querySelector("colab-connect-button").click()
}
setInterval(ClickConnect, 60000)
```
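Because sessions can end mid-run, it helps to resume from the newest checkpoint. This sketch finds the most recently written checkpoint file. It assumes checkpoints are saved as `*.pt` files in the checkpoint directory and that `train.py` has some way to resume from a file; `latest_checkpoint` is a hypothetical helper.

```python
import glob
import os


def latest_checkpoint(checkpoint_dir, pattern="*.pt"):
    """Return the most recently modified checkpoint file, or None if there is none.

    Uses modification time rather than the filename, because names like
    epoch_10.pt sort before epoch_2.pt alphabetically.
    """
    paths = glob.glob(os.path.join(checkpoint_dir, pattern))
    return max(paths, key=os.path.getmtime) if paths else None
```

Usage: `ckpt = latest_checkpoint("./checkpoints")`. Pass it to whatever resume option your `train.py` provides (for example, a hypothetical `--resume {ckpt}` flag).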

## Alternative: Colab Pro
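- About $10/month (usage is metered in compute units) for faster GPUs such as V100 or A100 when available, longer runtimes, and more RAM
- No cloud quota request needed, but GPU access still depends on availability and your remaining compute units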