# 🚀 RAFT Fine-tuning on MSRS Story-QA

A complete pipeline for fine-tuning Qwen3-4B-Instruct with the RAFT (Retrieval-Augmented Fine-Tuning) methodology on a Google Colab T4 GPU.

## 📋 What This Notebook Does

1. **Setup Environment** - Install dependencies and clone repository
2. **Load Data** - Download MSRS Story-QA dataset
3. **Build Index** - Create vector search index
4. **Generate RAFT Dataset** - Create training data with CoT and citations
5. **Train Model** - Fine-tune with Unsloth QLoRA
6. **Evaluate** - Test model performance

## ⚙️ Requirements

- **GPU**: T4 (15GB VRAM)
- **RAM**: High-RAM runtime recommended
- **Time**: ~3-4 hours for 50-100 training examples
- **API Key**: OpenAI API key for CoT generation

## 🎯 Quick Start

1. Enable GPU: Runtime → Change runtime type → T4 GPU
2. Run all cells in order
3. Enter your OpenAI API key when prompted
4. Wait for training to complete

---

## 🔧 Step 1: Setup Environment

Install all required packages and clone the repository.

In [1]:
%%capture
# Install PyTorch with CUDA support
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In [2]:
%%capture
# Clone the repository
!git clone https://github.com/limcheekin/MSRS-RAFT.git
%cd MSRS-RAFT

In [3]:
%%capture
# Install all dependencies
!pip install -r requirements.txt

# Upgrade Unsloth to latest
!pip install --upgrade --force-reinstall --no-cache-dir unsloth unsloth_zoo

# Download NLTK data
import nltk
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)

In [4]:
# Verify installation
print("🔍 Verifying installation...\n")
!python test_installation.py

🔍 Verifying installation...

INFO: RAFT Installation Test
INFO: 
Running tests...

INFO: ✓ Python Version: Python 3.12.12
INFO: ✓ PyTorch: torch
INFO: ✓ CUDA Support: CUDA 12.8 - Tesla T4
INFO: ✓ Transformers: Transformers 4.56.2
INFO: ✓ Accelerate: accelerate
INFO: NumExpr defaulting to 2 threads.

## 🔑 Step 2: Configure API Keys and Settings

Set up your OpenAI API key and configure training parameters.

In [None]:
import os
from getpass import getpass

# Get OpenAI API key
print("📝 Enter your OpenAI API key (required for RAFT dataset generation):")
openai_key = getpass("OpenAI API Key: ")
os.environ['OPENAI_API_KEY'] = openai_key

print("\n✅ API key configured!")

📝 Enter your OpenAI API key (required for RAFT dataset generation):
OpenAI API Key: ··········

✅ API key configured!


In [None]:
# Configure training parameters for T4 GPU
from raft_config import RAFTConfig, ModelConfig, TrainingConfig, RAFTDataConfig

# Create custom config optimized for T4
config = RAFTConfig()

# Model settings (optimized for T4 15GB)
config.model.max_seq_length = 2048  # Reduced for T4
config.model.lora_r = 16  # Smaller LoRA rank
config.model.lora_alpha = 32
config.model.load_in_4bit = True

# Training settings (optimized for T4)
config.training.num_train_epochs = 2  # Fewer epochs for demo
config.training.per_device_train_batch_size = 1  # Small batch for T4
config.training.gradient_accumulation_steps = 8  # Effective batch size = 8
config.training.learning_rate = 2e-4
config.training.max_new_tokens = 512  # Reduced for T4
config.training.logging_steps = 10
config.training.eval_steps = 50
config.training.save_steps = 100

# RAFT settings
config.raft_data.oracle_percentage = 0.8
config.raft_data.num_distractors = 3  # Fewer distractors
config.raft_data.chunk_size = 1000  # Smaller chunks

# System settings
config.system.project_name = "raft-colab"
config.system.use_wandb = False  # Disable W&B for simplicity

# Save config
config.to_yaml("colab_config.yaml")

print("⚙️ Configuration for T4 GPU:")
print(f"  Max Sequence Length: {config.model.max_seq_length}")
print(f"  LoRA Rank: {config.model.lora_r}")
print(f"  Batch Size: {config.training.per_device_train_batch_size}")
print(f"  Gradient Accumulation: {config.training.gradient_accumulation_steps}")
print(f"  Effective Batch Size: {config.training.per_device_train_batch_size * config.training.gradient_accumulation_steps}")
print(f"  Epochs: {config.training.num_train_epochs}")
print(f"\n✅ Configuration saved to colab_config.yaml")

⚙️ Configuration for T4 GPU:
  Max Sequence Length: 2048
  LoRA Rank: 16
  Batch Size: 1
  Gradient Accumulation: 8
  Effective Batch Size: 8
  Epochs: 2

✅ Configuration saved to colab_config.yaml
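
As a rough check on the settings above, the number of optimizer steps can be estimated from the dataset size. The sketch below assumes 100 training examples (the value passed as `--train-max-examples` in Step 4) and that a partial final accumulation group still counts as one step; the exact rounding depends on the trainer version, so treat the numbers as approximate. `optimizer_steps` is a hypothetical helper, not part of the repository.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Estimate optimizer updates for a single-GPU run (hypothetical helper)."""
    # Micro-batches the dataloader yields per epoch.
    micro_batches = math.ceil(num_examples / per_device_batch)
    # One optimizer update per `grad_accum` micro-batches (assumed rounded up).
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    return steps_per_epoch * epochs


# Assumed dataset size: 100 examples; other values match the config above.
total = optimizer_steps(num_examples=100, per_device_batch=1, grad_accum=8, epochs=2)
print(f"Estimated optimizer steps: {total}")

# How often each interval would trigger during such a run.
for name, interval in [("logging_steps", 10), ("eval_steps", 50), ("save_steps", 100)]:
    print(f"  {name}={interval}: triggers {total // interval} time(s)")
```

Under these assumptions the run has about 26 optimizer steps, so `logging_steps = 10` produces two log entries while `eval_steps = 50` and `save_steps = 100` are never reached. For small demo runs it may be worth lowering those two intervals.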


## 📊 Step 3: Load Data and Build Retrieval Index

1. Load the MSRS Story-QA dataset and explore its structure.
2. Create a vector search index for document retrieval.
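
To make the idea of a vector search index concrete, here is a minimal, self-contained sketch. It is not the repository's implementation: it swaps the neural embedding model for a toy term-frequency vector so that it runs with the standard library only, and `embed`, `build_index`, and `retrieve` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term-frequency vector (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Precompute one vector per document; `docs` maps doc_id -> text."""
    return {doc_id: embed(text) for doc_id, text in docs.items()}


def retrieve(index, query, top_k=3):
    """Return the top_k (doc_id, score) pairs, best match first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up documents for illustration only.
docs = {
    "story_1": "The dragon guarded the mountain pass for a hundred years.",
    "story_2": "A young sailor crossed the sea to find her brother.",
    "story_3": "The mountain village feared the dragon and its fire.",
}
index = build_index(docs)
results = retrieve(index, "Who guarded the mountain?", top_k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

A real index would use dense embeddings and an approximate nearest-neighbour structure, but the flow is the same: embed every document once, embed the query, and rank by similarity.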

In [None]:
!python raft_pipeline.py --step index --config raft_config.yaml

## 🏗️ Step 4: Generate RAFT Training Dataset

Create RAFT training examples with Chain-of-Thought reasoning and citations.

**Note**: This step uses the OpenAI API and will incur costs (~$0.01-0.03 per example, so roughly $1-3 for the 100 training examples requested by `--train-max-examples 100`).
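
For intuition about what a RAFT training example contains, the sketch below shows how the `oracle_percentage` and `num_distractors` settings could shape the document set for one question, following the general recipe in the RAFT paper: most examples mix the oracle document with distractors, and the rest contain distractors only. It is an illustrative assumption, not the code in `raft_pipeline.py`; `build_raft_example` and its fields are hypothetical, and the Chain-of-Thought answer that the OpenAI API generates is left out.

```python
import random


def build_raft_example(question, oracle_doc, distractor_pool,
                       oracle_percentage=0.8, num_distractors=3, rng=None):
    """Assemble the context documents for one RAFT example (hypothetical helper)."""
    rng = rng or random.Random()
    # Sample distractors: documents that do not answer the question.
    distractors = rng.sample(distractor_pool, num_distractors)
    # Keep the oracle document in roughly `oracle_percentage` of examples.
    include_oracle = rng.random() < oracle_percentage
    documents = distractors + [oracle_doc] if include_oracle else distractors
    # Shuffle so the oracle position carries no signal.
    rng.shuffle(documents)
    return {"question": question, "documents": documents, "has_oracle": include_oracle}


# Made-up pool for illustration only.
pool = [f"distractor passage {i}" for i in range(10)]
rng = random.Random(0)
examples = [
    build_raft_example("Who guarded the pass?", "oracle passage", pool, rng=rng)
    for _ in range(1000)
]
share = sum(ex["has_oracle"] for ex in examples) / len(examples)
print(f"Share of examples containing the oracle document: {share:.2f}")
```

Leaving the oracle out of some examples is meant to teach the model to cope when retrieval misses the relevant document, rather than always trusting the context.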

In [None]:
!python raft_pipeline.py \
  --step dataset \
  --config raft_config.yaml \
  --train-max-examples 100

In [None]:
!python raft_pipeline.py \
  --step dataset \
  --split dev \
  --eval-max-examples 20

In [None]:
!python raft_pipeline.py \
  --step dataset \
  --split test \
  --eval-max-examples 30

## 🎯 Step 5: Train Model with Unsloth

Fine-tune Qwen3-4B-Instruct using QLoRA with Unsloth.
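
To see why LoRA keeps training affordable on a T4, it helps to count parameters. In the standard LoRA formulation, a frozen weight matrix of shape `d_out x d_in` gets a trainable update `B @ A` with `A` of shape `r x d_in` and `B` of shape `d_out x r`, scaled by `lora_alpha / r`. The sketch below applies that arithmetic to the `lora_r = 16` and `lora_alpha = 32` values from Step 2; the layer size of 2560 is a hypothetical example, not a figure taken from this notebook.

```python
def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 16, 32      # lora_r and lora_alpha from the config in Step 2
d_in = d_out = 2560    # hypothetical square projection layer

full = d_in * d_out                            # parameters in the frozen weight
lora = lora_trainable_params(d_in, d_out, r)   # parameters actually trained
scaling = alpha / r                            # factor applied to the LoRA update

print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}  scaling: {scaling}")
# prints: full: 6,553,600  lora: 81,920  ratio: 1.25%  scaling: 2.0
```

For this example layer, LoRA trains about 1.25% of the parameters that full fine-tuning would update. QLoRA additionally stores the frozen base weights in 4-bit precision (`load_in_4bit = True`).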

In [16]:
!git pull origin main
!python raft_pipeline.py --step train --config raft_config.yaml

From https://github.com/limcheekin/MSRS-RAFT
 * branch            main       -> FETCH_HEAD
Already up to date.

## 💾 Step 6: Save Trained Model

Save the fine-tuned model for later use or download.

In [None]:
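import os

print("💾 Saving trained model...\n")

# NOTE: `trainer` must already exist in this notebook session. The CLI training
# step above runs in a separate process and does not define it here.
# Save model (merged 16-bit for best quality)
output_dir = "./models/raft_qwen3_colab"

trainer.save_model(
    output_dir=output_dir,
    save_method="merged_16bit"  # or "lora" for smaller size
)

print(f"\n✅ Model saved to {output_dir}")

# Check model size by summing every file under output_dir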
total_size = sum(os.path.getsize(os.path.join(dirpath, filename))
                 for dirpath, _, filenames in os.walk(output_dir)
                 for filename in filenames)
print(f"📦 Model size: {total_size / 1e9:.2f} GB")

# Optionally zip for download
print("\n📦 Creating zip file for download...")
!zip -r raft_model_colab.zip {output_dir}
print("✅ Model zipped as raft_model_colab.zip")
print("\nYou can download this file from the Files panel on the left.")

## 📊 Step 8: Evaluate Model

Test the fine-tuned model on evaluation examples.
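
The metrics that `raft_pipeline.py` reports are not shown in this notebook. As an illustration of how a generated answer can be scored against a reference, the sketch below computes token-level F1, a common QA metric. Whether the pipeline uses this metric is an assumption, and `normalize` and `token_f1` are hypothetical helpers.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall between two answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up answers for illustration only.
score = token_f1("The red dragon", "the dragon")
print(f"Token F1: {score:.2f}")
```

Token F1 rewards partial overlap, which suits free-form answers better than exact match, but it does not check whether the cited passages support the answer.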

In [15]:
!python raft_pipeline.py --step eval --config raft_config.yaml

## 🎨 Step 9: Interactive Demo

Try the model with your own questions!

In [None]:
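# NOTE: `retrieval_system` and `evaluator` must already be defined in this
# session; the CLI steps above run in separate processes and do not create them.
def answer_question(question, top_k=3):
    """Retrieve the top_k contexts for a question and generate an answer from them."""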
    print(f"\n❓ Question: {question}")
    print("\n🔍 Retrieving relevant contexts...")

    # Retrieve contexts
    results = retrieval_system.retrieve(question, top_k=top_k)
    contexts = [r.text for r in results]

    print(f"   Found {len(results)} relevant documents")
    for i, r in enumerate(results, 1):
        print(f"   {i}. {r.doc_id} (score: {r.score:.4f})")

    print("\n🤖 Generating answer...\n")

    # Generate answer
    answer = evaluator.generate_answer(question, contexts)

    print("="*70)
    print("ANSWER:")
    print("="*70)
    print(answer)
    print("="*70)

    return answer

# Try some example questions
print("🎨 Interactive Demo - Try the model!\n")

# Example 1
answer_question("What is the main theme of the story?")

# Example 2
answer_question("Who are the main characters?")

# Try your own question
print("\n" + "="*70)
print("Try your own question!")
print("="*70)
custom_question = input("Enter your question: ")
if custom_question.strip():
    answer_question(custom_question)

## 📥 Step 10: Download Results

Package and download training artifacts.

In [None]:
import os
import shutil

from google.colab import files

print("📥 Preparing files for download...\n")

# Create results package
package_dir = "raft_training_results"
!mkdir -p {package_dir}

# Copy important files
print("📦 Packaging results...")

files_to_package = [
    ("colab_config.yaml", "Configuration"),
    ("./logs/training_colab.jsonl", "Training logs"),
    ("./results/eval_colab.jsonl", "Evaluation results"),
    ("./data/raft_train.jsonl", "Training data sample"),
]

for file_path, description in files_to_package:
    if os.path.exists(file_path):
        shutil.copy(file_path, package_dir)
        print(f"  ✓ {description}")

# Create summary report
summary = f"""RAFT Training Summary
=====================

Training Configuration:
- Model: Qwen3-4B-Instruct
- LoRA Rank: {config.model.lora_r}
- Sequence Length: {config.model.max_seq_length}
- Batch Size: {config.training.per_device_train_batch_size}
- Gradient Accumulation: {config.training.gradient_accumulation_steps}
- Epochs: {config.training.num_train_epochs}
- Learning Rate: {config.training.learning_rate}

Dataset:
- Training Examples: {len(train_dataset)}
- Evaluation Examples: {len(eval_dataset) if eval_dataset else 0}
- Oracle Percentage: {config.raft_data.oracle_percentage}
- Distractors: {config.raft_data.num_distractors}

Results:
- Final Training Loss: {train_result.training_loss:.4f}
- Evaluation Metrics: See eval_colab.jsonl

Model Location: {output_dir}
Model Size: {total_size / 1e9:.2f} GB
"""

with open(f"{package_dir}/SUMMARY.txt", 'w') as f:
    f.write(summary)

print("  ✓ Summary report")

# Zip everything
print("\n📦 Creating zip file...")
!zip -r raft_results.zip {package_dir}

print("\n✅ Results packaged!")
print("\nDownload options:")
print("1. raft_results.zip - Training logs and results")
print("2. raft_model_colab.zip - Trained model (large file)")
print("\nUse the Files panel on the left to download.")

# Optionally trigger download
print("\n💾 Downloading results package...")
files.download('raft_results.zip')

## 📊 Step 11: Visualize Training Progress

Plot training metrics.

In [None]:
import json
import os

import matplotlib.pyplot as plt

print("📊 Visualizing training progress...\n")

# Load training logs
log_file = "./logs/training_colab.jsonl"
if os.path.exists(log_file):
    logs = []
    with open(log_file, 'r') as f:
        for line in f:
            logs.append(json.loads(line))

    # Extract metrics
    steps = [log['step'] for log in logs if 'loss' in log]
    losses = [log['loss'] for log in logs if 'loss' in log]
    learning_rates = [log.get('learning_rate', 0) for log in logs if 'loss' in log]

    # Create plots
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

    # Loss plot
    ax1.plot(steps, losses, 'b-', linewidth=2)
    ax1.set_xlabel('Training Steps', fontsize=12)
    ax1.set_ylabel('Loss', fontsize=12)
    ax1.set_title('Training Loss', fontsize=14, fontweight='bold')
    ax1.grid(True, alpha=0.3)

    # Learning rate plot
    ax2.plot(steps, learning_rates, 'r-', linewidth=2)
    ax2.set_xlabel('Training Steps', fontsize=12)
    ax2.set_ylabel('Learning Rate', fontsize=12)
    ax2.set_title('Learning Rate Schedule', fontsize=14, fontweight='bold')
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.savefig('training_progress.png', dpi=150, bbox_inches='tight')
    plt.show()

    print("✅ Training visualization complete!")
    print(f"   Total steps: {len(steps)}")
    if losses:
        print(f"   Initial loss: {losses[0]:.4f}")
        print(f"   Final loss: {losses[-1]:.4f}")
        print(f"   Improvement: {((losses[0] - losses[-1]) / losses[0] * 100):.2f}%")
else:
    print("⚠️ Training log file not found")

## 🛠️ Step 12: Troubleshooting

Run diagnostics if you encounter issues.

In [None]:
# Run this cell if you encounter issues

print("🔍 System Diagnostics\n")
print("="*70)

# Check GPU
import os
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"GPU Memory Allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"GPU Memory Reserved: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")

# Check disk space
import shutil
total, used, free = shutil.disk_usage("/")
print(f"\nDisk Space:")
print(f"  Total: {total / 1e9:.2f} GB")
print(f"  Used: {used / 1e9:.2f} GB")
print(f"  Free: {free / 1e9:.2f} GB")

# Check Python packages
print(f"\nKey Package Versions:")
import transformers
print(f"  transformers: {transformers.__version__}")
print(f"  torch: {torch.__version__}")

try:
    from unsloth import __version__ as unsloth_version
    print(f"  unsloth: {unsloth_version}")
except Exception:
    # The import can fail if unsloth is missing or exposes no __version__
    print("  unsloth: version unavailable (import failed)")

# Check environment variables
print(f"\nEnvironment:")
print(f"  OPENAI_API_KEY: {'Set' if os.environ.get('OPENAI_API_KEY') else 'Not set'}")

print("\n" + "="*70)
print("\nCommon Solutions:")
print("  1. Out of Memory: Reduce batch_size or max_seq_length")
print("  2. API Errors: Check OpenAI API key and credits")
print("  3. Slow Training: Ensure GPU runtime is enabled")
print("  4. Import Errors: Restart runtime and reinstall packages")
print("="*70)

## 💾 Step 13: Save to Google Drive (Optional)

Save your work to Google Drive to prevent data loss.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create backup directory
backup_dir = '/content/drive/MyDrive/RAFT_Backup'
!mkdir -p {backup_dir}

print(f"📁 Backing up to: {backup_dir}\n")

# Copy important files
print("💾 Copying files...")
!cp -r ./models/raft_qwen3_colab {backup_dir}/ 2>/dev/null || echo "  ⚠️ Model not found"
!cp -r ./data {backup_dir}/ 2>/dev/null || echo "  ⚠️ Data not found"
!cp -r ./results {backup_dir}/ 2>/dev/null || echo "  ⚠️ Results not found"
!cp -r ./logs {backup_dir}/ 2>/dev/null || echo "  ⚠️ Logs not found"
!cp colab_config.yaml {backup_dir}/ 2>/dev/null || echo "  ⚠️ Config not found"

print("\n✅ Backup complete!")
print(f"Files saved to: {backup_dir}")

## 🎉 Congratulations!

You've successfully completed the RAFT fine-tuning pipeline!

### What you've accomplished:

✅ Installed all dependencies  
✅ Loaded MSRS Story-QA dataset  
✅ Built vector search index  
✅ Generated RAFT training data with CoT  
✅ Fine-tuned Qwen3-4B with QLoRA  
✅ Evaluated model performance  
✅ Created interactive demo  

### Next Steps:

1. **Increase dataset size** - Train on 100+ examples for better performance
2. **Tune hyperparameters** - Adjust learning rate, batch size, epochs
3. **Try different models** - Experiment with other base models
4. **Compare baselines** - Test against zero-shot and standard SFT
5. **Deploy the model** - Use for production QA tasks

### Resources:

- 📚 [RAFT Paper](https://arxiv.org/abs/2403.10131)
- 🔧 [GitHub Repository](https://github.com/limcheekin/MSRS-RAFT)
- 📖 [Unsloth Documentation](https://docs.unsloth.ai/)
- 💬 [MSRS Dataset](https://huggingface.co/datasets/yale-nlp/MSRS)

### Need Help?

- Check the [GitHub Issues](https://github.com/limcheekin/MSRS-RAFT/issues)
- Review the [COLAB_GUIDE.md](https://github.com/limcheekin/MSRS-RAFT/blob/main/COLAB_GUIDE.md)
- Run the troubleshooting cell above

---

**Happy Training! 🚀**