# Training Self-RAG Models

Train the critic and generator models for a Self-RAG system using QLoRA.

## Prerequisites

Before training:
1. ✅ Documents indexed (from notebook 02)
2. ✅ Training data prepared (from notebook 01)
3. ⚠️ Training requires significant compute (GPU recommended)
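
Before launching any training, it can help to confirm that the files the later cells expect are actually present. The sketch below is a minimal pre-flight check, not part of the project code: `find_missing_paths` is a hypothetical helper, and the path list is simply collected from the cells in this notebook (it assumes `../data/embeddings` is where notebook 02 wrote the index).

```python
from pathlib import Path

# Paths referenced by the cells in this notebook, relative to the notebook directory.
# Assumption: ../data/embeddings is the index directory produced by notebook 02.
REQUIRED_PATHS = [
    "../data/samples/sample_qa_data.json",
    "../data/embeddings",
    "../configs/critic_config.yaml",
    "../configs/generator_config.yaml",
    "../configs/retrieval_config.yaml",
]


def find_missing_paths(paths, base_dir="."):
    """Return the entries of `paths` that do not exist under `base_dir`."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).exists()]


missing = find_missing_paths(REQUIRED_PATHS)
if missing:
    print("Missing prerequisites:")
    for p in missing:
        print(f"  - {p}")
else:
    print("All prerequisite paths found.")
```

If anything is reported missing, rerun the corresponding earlier notebook before continuing.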

## Step 1: Generate Training Labels

Generate reflection token labels for Q&A data.
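
To give a feel for what a rule-based labeler might do, here is a small illustrative sketch. It is not the logic in `src.training.generate_labels`, which is not shown here: the function `label_example`, the overlap thresholds, and the token strings (borrowed from the Self-RAG paper's naming) are all assumptions, and the sample passage is made up for illustration.

```python
import re


def _tokens(text):
    """Lowercase word set used for a crude lexical-overlap heuristic."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def label_example(question, passage, answer):
    """Assign hypothetical relevance and support tokens from word overlap."""
    q, p, a = _tokens(question), _tokens(passage), _tokens(answer)

    # Relevance: fraction of question words that also appear in the passage.
    rel_overlap = len(q & p) / len(q) if q else 0.0

    # Support: fraction of answer words that also appear in the passage.
    sup_overlap = len(a & p) / len(a) if a else 0.0
    if sup_overlap >= 0.8:
        support = "[Fully supported]"
    elif sup_overlap >= 0.4:
        support = "[Partially supported]"
    else:
        support = "[No support]"

    return {
        "relevance": "[Relevant]" if rel_overlap >= 0.5 else "[Irrelevant]",
        "support": support,
    }


labels = label_example(
    question="What are the elements of negligence?",
    passage="The elements of negligence are duty, breach, causation, and damages.",
    answer="Duty, breach, causation, and damages.",
)
print(labels)
```

Real labelers usually combine several signals; lexical overlap alone is only a rough proxy for relevance and support.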

In [None]:
%%bash
# Generate labels using rule-based approach
uv run python -m src.training.generate_labels \
    --input ../data/samples/sample_qa_data.json \
    --output-dir ../data/training \
    --num-samples 10

echo "✅ Labels generated!"

## Step 2: Train Critic Model

Train the critic model to predict reflection tokens.

In [None]:
%%bash
# Train critic (for a quick test, lower num_train_epochs in the config first)
uv run python -m src.training.train_critic_qlora \
    --config ../configs/critic_config.yaml

echo "✅ Critic model trained!"

## Step 3: Train Generator Model

Train the generator model on data augmented with reflection tokens from the trained critic.

In [None]:
%%bash
# Train generator with critic weights
uv run python -m src.training.train_generator_qlora \
    --config ../configs/generator_config.yaml \
    --critic-weights ../models/critic_lora/final

echo "✅ Generator model trained!"

## Step 4: Test Trained Models

Quick test of the trained Self-RAG system.

In [None]:
import sys
sys.path.append('..')

from src.self_rag.inference import load_pipeline_from_config

# Load complete pipeline
pipeline = load_pipeline_from_config(
    retrieval_config_path='../configs/retrieval_config.yaml',
    generator_config_path='../configs/generator_config.yaml',
    retriever_index_dir='../data/embeddings',
    generator_weights_path='../models/generator_lora/final',
)

print("✅ Pipeline loaded!")

In [None]:
# Test question
question = "What are the elements of negligence?"

result = pipeline.answer_question(question)

print(f"Question: {question}\n")
print(f"Answer: {result['answer']}\n")
print(f"Reflection: {result['reflection']}\n")
print(f"Score: {result['score']:.2f}")

## Training Tips

### For CPU Training:
- Reduce `per_device_train_batch_size` to 1-2
- Increase `gradient_accumulation_steps` to keep the effective batch size up despite the smaller per-device batch
- Reduce `num_train_epochs` to 1 for testing
- Use smaller models if available

### For GPU Training:
- Use larger batch sizes (4-8)
- Enable `fp16` or `bf16` in config
- Monitor GPU memory usage
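
The batch-size and accumulation tips above trade off against each other: the number of examples contributing to each optimizer step is the product of the per-device batch size, the accumulation steps, and the device count. A quick sketch of that arithmetic follows; the helper name and the example values are illustrative, not taken from the project configs.

```python
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_devices=1):
    """Examples contributing to one optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Illustrative settings: a small per-device batch with heavy accumulation
# reaches the same effective batch as a larger per-device batch.
cpu_like = effective_batch_size(per_device_train_batch_size=1,
                                gradient_accumulation_steps=16)
gpu_like = effective_batch_size(per_device_train_batch_size=8,
                                gradient_accumulation_steps=2)
print(cpu_like, gpu_like)  # 16 16
```

Keeping the effective batch size roughly constant when moving between CPU and GPU settings makes the resulting loss curves easier to compare.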

### Monitoring:
- Check `models/*/logs/` for TensorBoard logs
- Confirm that the training loss decreases over time
- Save checkpoints frequently
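
As a rough complement to eyeballing the TensorBoard curves, the sketch below compares the average loss at the start and end of a run. How the loss values are extracted depends on the logger, so the function only takes a plain list; `loss_is_decreasing` is a hypothetical helper and the numbers are made up for illustration.

```python
from statistics import mean


def loss_is_decreasing(losses, window=5):
    """Return True if the mean of the last `window` losses is below the first `window`."""
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    return mean(losses[-window:]) < mean(losses[:window])


# Made-up loss values for illustration only.
example_losses = [2.0, 1.9, 1.8, 1.7, 1.6, 1.5, 1.4, 1.3, 1.2, 1.1]
print(loss_is_decreasing(example_losses))  # True
```

A windowed comparison like this smooths over step-to-step noise, but it will not catch a run that improves early and then diverges, so it does not replace looking at the full curve.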

## Summary

Training complete!
- ✅ Generated training labels
- ✅ Trained critic model
- ✅ Trained generator model
- ✅ Tested Self-RAG pipeline

**Next:** Proceed to `04_evaluation.ipynb` to evaluate performance.