# Day 96: Recursive Reward Modeling

As AI tasks become more complex, human evaluation becomes a bottleneck. **Recursive Reward Modeling (RRM)** uses AI models to evaluate and give feedback on other AI models, so that the alignment process can scale beyond what human reviewers can check directly.

In this lab, we implement a **Recursive Reward Manager** that covers three steps:
1. **Automated Critique**: Simulate an AI 'Evaluator' that scores a 'Worker' model's output against alignment goals.
2. **Iterative Refinement**: Automatically revise the output based on the evaluator's critique.
3. **Recursive Scaling**: Demonstrate how multi-turn feedback loops can improve safety and quality without constant human intervention.
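Before using the lab's manager, it can help to see the critique-and-revise loop in miniature. The sketch below is a toy stand-in, not the implementation in `src.alignment.recursive_reward`: the `Feedback` dataclass, `toy_evaluate`, and `refine` are hypothetical names, and the evaluator is a pair of hand-written rules rather than a model. Only the field names (`score`, `critique`, `suggested_revision`) are borrowed from the feedback objects printed later in this lab.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    # Hypothetical container; field names mirror those printed in this lab.
    score: float
    critique: str
    suggested_revision: str


def toy_evaluate(text: str) -> Feedback:
    """Rule-based stand-in for an AI evaluator (assumption, not the real critic)."""
    issues = []
    revision = text
    # Rule 1: flag and strip a placeholder "unsafe" term.
    if "harmful" in revision.lower():
        issues.append("contains a flagged term")
        revision = revision.replace("harmful", "[removed]").replace("Harmful", "[removed]")
    # Rule 2: flag very short answers and pad them with more detail.
    if len(revision.split()) < 8:
        issues.append("too brief")
        revision += " Here is additional detail explaining the steps involved."
    score = 1.0 - 0.5 * len(issues)
    critique = "; ".join(issues) if issues else "no issues found"
    return Feedback(score, critique, revision)


def refine(draft: str, iterations: int) -> List[Feedback]:
    """Feed each suggested revision back to the evaluator for another pass."""
    history = []
    current = draft
    for _ in range(iterations):
        feedback = toy_evaluate(current)
        history.append(feedback)
        current = feedback.suggested_revision
    return history


toy_history = refine("Some harmful content here.", iterations=2)
for step, fb in enumerate(toy_history, start=1):
    print(step, fb.score, fb.critique)
```

The first pass flags both issues and proposes a revision; the second pass scores that revision and finds nothing left to fix. A real evaluator would be a model, so its scores would be noisier and the loop might need more passes or a stopping threshold.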

In [None]:
import sys
import os

# Add root directory to sys.path
sys.path.append(os.path.abspath('../../'))

from src.alignment.recursive_reward import RecursiveRewardManager

## 1. Setup the Evaluator

We define a target alignment goal that our evaluator will enforce.

In [None]:
manager = RecursiveRewardManager("Provide high-quality, safe, and detailed technical assistance.")
print("Recursive Reward Manager initialized.")
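How the goal string reaches the evaluator depends on the manager's internals, which this notebook does not show. One common pattern, sketched here purely as an assumption, is to embed the goal in the prompt sent to the evaluator model. The function `build_evaluator_prompt` and the prompt wording are hypothetical and are not part of `RecursiveRewardManager`.

```python
def build_evaluator_prompt(goal: str, worker_output: str) -> str:
    """Hypothetical helper: combine the alignment goal and a worker output
    into a single instruction for an evaluator model."""
    return (
        "You are an evaluator. Judge the response against this goal:\n"
        f"GOAL: {goal}\n\n"
        f"RESPONSE: {worker_output}\n\n"
        "Reply with a score from 0 to 1, a short critique, and a suggested revision."
    )


example_prompt = build_evaluator_prompt(
    "Provide high-quality, safe, and detailed technical assistance.",
    "Some harmful content here.",
)
print(example_prompt)
```

Keeping the goal as a plain string makes it easy to swap in a different objective without changing the evaluation loop.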

## 2. Iterative Improvement Loop

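We start with a draft that has two issues: it contains unsafe content and it is too brief. The manager runs the evaluate-and-revise cycle for a fixed number of iterations and returns the feedback from each one.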

In [None]:
initial_draft = "Some harmful content here."

print(f"Initial Draft: {initial_draft}")

iterations = 2
history = manager.recursive_improvement(initial_draft, iterations=iterations)

for i, feedback in enumerate(history):
    print(f"\n--- Iteration {i+1} ---")
    print(f"Score: {feedback.score}")
    print(f"Critique: {feedback.critique}")
    print(f"Revised Output: {feedback.suggested_revision}")

## 3. Scalable Oversight

In a real-world Reinforcement Learning from AI Feedback (RLAIF) setup, the 'Evaluator' model supplies the reward signal for the 'Worker' model during training, rather than only critiquing finished outputs. This creates a self-improving loop in which task difficulty can scale beyond what a human could manually audit in real time.
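To make the idea of an evaluator-supplied reward concrete, here is a deliberately tiny sketch. It assumes a 'worker' that only chooses among three canned responses through a softmax policy, and a rule-based `toy_evaluator` in place of a real model; both are hypothetical and unrelated to the lab's manager. The update uses the exact gradient of the expected reward instead of sampled rollouts, which keeps the example deterministic but is simpler than what a real RLAIF pipeline would do.

```python
import math

# Hypothetical canned worker outputs; a real worker would generate text.
CANDIDATES = [
    "Some harmful content here.",
    "Use a parameterized query.",
    "Use a parameterized query so user input is never interpolated "
    "into the SQL string, and validate it first.",
]


def toy_evaluator(text: str) -> float:
    """Stand-in evaluator: zero for a flagged term, otherwise reward detail (0 to 1)."""
    if "harmful" in text.lower():
        return 0.0
    return min(1.0, len(text.split()) / 15)


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_worker(steps: int = 200, lr: float = 0.5):
    """Gradient ascent on expected evaluator reward for a softmax policy."""
    logits = [0.0] * len(CANDIDATES)
    rewards = [toy_evaluator(c) for c in CANDIDATES]
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # d(expected reward)/d(logit_i) = p_i * (r_i - expected reward)
        logits = [
            logit + lr * p * (r - expected)
            for logit, p, r in zip(logits, probs, rewards)
        ]
    return softmax(logits), rewards


final_probs, rewards = train_worker()
for text, p, r in zip(CANDIDATES, final_probs, rewards):
    print(f"reward={r:.2f}  prob={p:.2f}  {text[:40]}")
```

Over training, probability mass shifts toward the response the evaluator scores highest. The same property is the main risk in practice: the worker optimizes whatever the evaluator rewards, so flaws in the evaluator carry over to the worker.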