# Training Phi-2 with QLoRA and GRPO on Google Colab

This notebook walks through fine-tuning the Phi-2 model with QLoRA (LoRA adapters on a 4-bit quantized base model) and GRPO (Group Relative Policy Optimization) on a Google Colab GPU.

## 1. Setup Environment

First, mount Google Drive so checkpoints survive a runtime reset, then install the pinned dependencies.

In [None]:
from google.colab import drive
drive.mount('/content/drive')
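
If the mount fails or is skipped, later writes to Drive fail. The sketch below picks the Drive folder when it is present and otherwise falls back to a local directory. The helper name `resolve_output_dir` and the temp-directory fallback are assumptions for illustration, not part of the repository.

```python
import os
import tempfile


def resolve_output_dir(drive_root, subdir, fallback_root):
    """Return a directory under drive_root if it exists, else under fallback_root."""
    root = drive_root if os.path.isdir(drive_root) else fallback_root
    path = os.path.join(root, subdir)
    os.makedirs(path, exist_ok=True)
    return path


# drive.mount('/content/drive') exposes Drive files under /content/drive/MyDrive.
# The fallback (the system temp directory) is a hypothetical choice.
output_dir = resolve_output_dir(
    "/content/drive/MyDrive", "phi2-qlora-grpo", tempfile.gettempdir()
)
print(f"Checkpoints will be written to {output_dir}")
```

Files written to the fallback location are lost when the Colab runtime is recycled, so treat it as a way to keep a dry run going rather than as real storage.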

In [None]:
!pip install -q transformers==4.36.2 peft==0.7.1 bitsandbytes==0.41.3 trl==0.7.4 accelerate==0.25.0 wandb
!pip install -q torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
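
Because Colab ships with its own preinstalled packages, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
from importlib import metadata

# Versions pinned by the install commands above.
PINNED = {
    "transformers": "4.36.2",
    "peft": "0.7.1",
    "bitsandbytes": "0.41.3",
    "trl": "0.7.4",
    "accelerate": "0.25.0",
    "torch": "2.1.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Local build tags such as "+cu118" are ignored in the comparison.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


for package, (expected, found) in find_mismatches(PINNED).items():
    print(f"{package}: expected {expected}, found {found}")
```

If a package imported earlier in the session was replaced by the install, restarting the runtime may be needed before the new version is picked up.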

## 2. Clone Repository and Setup Project

Clone the repository and set up the project structure.

In [None]:
!git clone https://github.com/hsinghweb/era-v3-s22-grpo.git
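# Use the %cd magic: `!cd` runs in a throwaway subshell and leaves the notebook's working directory unchanged
%cd era-v3-s22-grpo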
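
Each `!` command runs in its own subshell, so a directory change made there does not carry over to later cells. A plain-Python way to switch directories and confirm the clone worked is sketched below. It assumes `train.py` sits at the repository root, which the source does not confirm, and the helper name `enter_repo` is hypothetical.

```python
import os


def enter_repo(repo_dir, required_file="train.py"):
    """Change into repo_dir and confirm required_file is present there."""
    if not os.path.isdir(repo_dir):
        raise FileNotFoundError(f"{repo_dir} not found; did the clone succeed?")
    os.chdir(repo_dir)
    if not os.path.isfile(required_file):
        raise FileNotFoundError(f"{required_file} not found in {os.getcwd()}")
    return os.getcwd()


# Directory name taken from the clone URL above.
REPO_DIR = "era-v3-s22-grpo"
if os.path.isdir(REPO_DIR):
    print("Working directory:", enter_repo(REPO_DIR))
else:
    print(f"{REPO_DIR} is not in {os.getcwd()}; run the clone cell first")
```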

## 3. Modify Training Script

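Adapt the training script for Colab. The cell below shows only the changes; the rest of the code comes from the repository's `train.py`, so the cell is not runnable on its own.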

In [None]:
import os

from transformers import TrainingArguments

# Set output directory in Google Drive
OUTPUT_DIR = "/content/drive/MyDrive/phi2-qlora-grpo"
os.makedirs(OUTPUT_DIR, exist_ok=True)

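# The rest of the training script comes from train.py, with the following modifications:
# 1. Smaller per-device batch size to fit Colab's GPU memory
BATCH_SIZE = 2

# 2. Enable gradient checkpointing for memory efficiency
#    (`model` must already be loaded by the train.py code before this line runs)
model.gradient_checkpointing_enable()

# 3. Modified training arguments
#    (NUM_EPOCHS and LEARNING_RATE are expected to come from the train.py code)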
training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=8,  # Offsets the small batch: 2 * 8 = 16 examples per optimizer step
    learning_rate=LEARNING_RATE,
    weight_decay=0.01,
    warmup_ratio=0.03,
    logging_steps=5,
    save_strategy="epoch",
    evaluation_strategy="no",
    lr_scheduler_type="cosine",
    report_to="wandb",
    gradient_checkpointing=True,
    fp16=True  # Enable mixed precision training
)

# Rest of the training code remains the same
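
The batch size and gradient accumulation settings above combine into an effective batch size, which in turn sets the number of optimizer steps and warmup steps. The arithmetic can be sketched as follows. The dataset size and epoch count are hypothetical, and the Trainer's own rounding may differ slightly from this estimate.

```python
import math


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


def approx_optimizer_steps(num_examples, num_epochs, batch_size):
    """Rough count of optimizer steps over the whole run."""
    return math.ceil(num_examples / batch_size) * num_epochs


batch = effective_batch_size(per_device_batch_size=2, gradient_accumulation_steps=8)
print(batch)  # 16

# Hypothetical dataset size and epoch count, for illustration only.
steps = approx_optimizer_steps(num_examples=1000, num_epochs=3, batch_size=batch)
print(steps)  # 189

# Approximate warmup length at warmup_ratio=0.03.
print(math.ceil(steps * 0.03))  # 6
```

Halving the per-device batch size while doubling the accumulation steps keeps the effective batch size the same and lowers peak memory, at the cost of more forward and backward passes per optimizer step.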

## 4. Start Training

Initialize Weights & Biases for tracking and start the training process.

In [None]:
import wandb
wandb.login()  # Log in to your W&B account (prompts for an API key if none is stored)

# Start training
trainer.train()
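
With `report_to="wandb"`, training expects a working W&B login. One way to make tracking optional is to choose the `report_to` value from the environment before building the training arguments. This is a minimal sketch: the helper name `choose_report_to` is hypothetical, and it only checks the `WANDB_API_KEY` environment variable, so credentials stored by an earlier interactive login are not detected.

```python
import os


def choose_report_to(env=None):
    """Return "wandb" when an API key is configured, otherwise "none"."""
    env = os.environ if env is None else env
    return "wandb" if env.get("WANDB_API_KEY") else "none"


report_to = choose_report_to()
print(f"report_to={report_to!r}")
```

The result would be passed as `report_to=report_to` when constructing `TrainingArguments`; `"none"` disables all logging integrations.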

## 5. Save the Model

Save the trained model to Google Drive.

In [None]:
# Save the final model to OUTPUT_DIR (the trainer's output_dir)
trainer.save_model()
print(f"Model saved to {OUTPUT_DIR}")
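
After saving, it is worth confirming that the files actually landed in Drive. The sketch below lists everything under the output directory with file sizes. The helper name `list_saved_files` is hypothetical, and no particular file names are assumed.

```python
import os


def list_saved_files(directory):
    """Return (relative_path, size_in_bytes) for every file under directory, sorted by path."""
    entries = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            entries.append((os.path.relpath(full_path, directory), os.path.getsize(full_path)))
    return sorted(entries)


OUTPUT_DIR = "/content/drive/MyDrive/phi2-qlora-grpo"
saved = list_saved_files(OUTPUT_DIR)
if not saved:
    print(f"No files found under {OUTPUT_DIR}")
for path, size in saved:
    print(f"{size / 1e6:10.2f} MB  {path}")
```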