# End-to-End Pipeline: From Data to Reward

This notebook demonstrates a complete workflow following these steps:
1. **Data Loading** - Load the dataset from its source
2. **Data Splitting** - Split it into training and test sets
3. **Principle Generation** - Auto-generate evaluation principles from the training set
4. **Reward Definition** - Define a reward function based on the generated principles
5. **Reward Testing** - Evaluate the reward function on the test set

In [None]:
import sys
import os
sys.path.append("..")  # Add parent directory to path

from rm_gallery.core.reward.principle.generator import PrincipleGenerator
from rm_gallery.core.model.openai_llm import OpenaiLLM

os.environ["OPENAI_API_KEY"] = ""  # Fill in your API key before running
os.environ["BASE_URL"] = ""        # Fill in the base URL of your OpenAI-compatible endpoint
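Both environment variables above are deliberately left empty and must be filled in before any LLM call is made. Here is a minimal sketch of an early check, in case you want the notebook to fail fast when either value is missing. The helper `require_env` is hypothetical and not part of RM-Gallery:

```python
import os


def require_env(*names: str) -> dict:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Example: check the variables this notebook relies on before creating the LLM client.
# settings = require_env("OPENAI_API_KEY", "BASE_URL")
```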

## 1. Data Loading

We start by loading the dataset with RM-Gallery's data loading module.
See [Data Loading](./tutorial/data/load.ipynb) for more details.

In [None]:
# Load data through the DataLoad base class
from rm_gallery.core.data.load.base import DataLoad
import rm_gallery.core.data     # Core strategy registration
import rm_gallery.gallery.data  # Extended strategy registration


# Configure local file loading parameters
config = {
    "path": "./data/reward-bench-2/data/test-00000-of-00001.parquet",
    "limit": 1000,  # Limit the number of data items to load
}

# Create data loader
loader = DataLoad(
    name="rewardbench2",           # Dataset name
    load_strategy_type="local",    # Use local file loading strategy
    data_source="rewardbench2",    # Specify data source format converter
    config=config                  # Pass configuration parameters
)

# Execute data loading
dataset = loader.run()

# Output dataset size
print(f"Successfully loaded {len(dataset)} data items")


Successfully loaded 1000 data items
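The two imports marked "strategy registration" are there for their side effects. Importing those modules registers named strategies (such as `"local"`) and, presumably, data source converters (such as `"rewardbench2"`) so that `DataLoad` can find them by name. Below is a minimal sketch of that decorator-based registry pattern built on a plain dictionary. The names `STRATEGIES`, `register`, `LocalStrategy` and `make_strategy` are hypothetical, and RM-Gallery's actual registry may be implemented differently:

```python
from typing import Callable, Dict, Type

# Hypothetical registry: maps a strategy name to the class that implements it.
STRATEGIES: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator that records `cls` under `name` when its module is imported."""
    def decorator(cls: Type) -> Type:
        STRATEGIES[name] = cls
        return cls
    return decorator


@register("local")
class LocalStrategy:
    """Toy strategy that 'loads' items from an in-memory list, honoring a limit."""

    def __init__(self, config: dict):
        self.config = config

    def load(self, items: list) -> list:
        limit = self.config.get("limit")
        return items if limit is None else items[:limit]


def make_strategy(name: str, config: dict):
    """Look up a registered strategy by name, failing clearly if it was never imported."""
    if name not in STRATEGIES:
        raise KeyError(f"Unknown strategy {name!r}; was its module imported?")
    return STRATEGIES[name](config)


toy_loader = make_strategy("local", {"limit": 3})
print(toy_loader.load(list(range(10))))  # [0, 1, 2]
```

This design explains why the imports appear unused: if the registering module is never imported, looking up the name fails.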


## 2. Split into Training and Test Sets

Next, we split the dataset. The training set is used to generate principles, and the test set is used to evaluate the resulting reward function.

In [None]:
# split data
from rm_gallery.core.utils.file import split_samples

train_samples, test_samples = split_samples(dataset.datas)

print(f"Training set size: {len(train_samples)}")
print(f"Test set size: {len(test_samples)}")

Training set size: 100
Test set size: 900
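In this run, `split_samples` assigned 100 of the 1,000 items to the training set and 900 to the test set. Here is a minimal sketch of a seeded random split with the same 10% training ratio. The function `random_split` and its `train_ratio`/`seed` parameters are hypothetical and do not describe how `split_samples` is actually implemented:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], train_ratio: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `items` with a fixed seed and cut it into (train, test)."""
    if not 0.0 <= train_ratio <= 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


demo_train, demo_test = random_split(range(1000))
print(len(demo_train), len(demo_test))  # 100 900
```

Fixing the seed makes the split reproducible across runs. This matters here because the principles are derived from the training portion.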


## 3. Auto-Generate Principles from the Training Set

Now we'll use the principle generator to extract evaluation principles from our training set.

In [None]:
# Initialize an OpenAI-compatible LLM client (can be replaced with other LLM implementations)
llm = OpenaiLLM(
    model="qwen3-235b-a22b",
    enable_thinking=True
)

# Initialize principle generator
principle_generator = PrincipleGenerator(
    llm=llm,
    scenario="chat assistant evaluation",
    generate_number=5,  # Generate up to 5 principles per sample
    cluster_number=3    # Cluster to 3 final principles
)
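`generate_number` and `cluster_number` describe a two-stage process: the generator first collects candidate principles from each training sample, then condenses the pooled candidates into a few final ones. In RM-Gallery both stages are presumably driven by the LLM. The sketch below replaces LLM-based clustering with simple frequency counting over normalized principle names, purely to illustrate the data flow. `condense_principles` and the toy inputs are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def condense_principles(
    per_sample: List[Dict[str, str]], generate_number: int = 5, cluster_number: int = 3
) -> Dict[str, str]:
    """Keep at most `generate_number` principles per sample, then return the
    `cluster_number` most frequent ones (frequency counting stands in for clustering)."""
    counts: Counter = Counter()
    descriptions: Dict[str, str] = {}
    for principles in per_sample:
        for name, description in list(principles.items())[:generate_number]:
            key = name.strip().lower()  # crude normalization so "Accuracy" == "accuracy"
            counts[key] += 1
            descriptions.setdefault(key, description)  # keep the first description seen
    return {key: descriptions[key] for key, _ in counts.most_common(cluster_number)}


toy_outputs = [
    {"Accuracy": "State facts correctly.", "Safety": "Refuse harmful requests."},
    {"accuracy": "Avoid factual errors.", "Clarity": "Explain in plain terms."},
    {"Safety": "Do not help cause harm.", "Accuracy": "Check claims."},
]
print(condense_principles(toy_outputs, cluster_number=2))
# {'accuracy': 'State facts correctly.', 'safety': 'Refuse harmful requests.'}
```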

In [None]:
import concurrent.futures

# Create thread pool executor
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
    # Generate principles from the first 10 training samples
    principles = principle_generator.run_batch(train_samples[:10], executor)
    
print("Generated Principles:")
for i, (key, value) in enumerate(principles.items(), 1):
    print(f"{i}. {key}: {value}")

Generated Principles:
1. Adherence to Scientific Accuracy and Ethical Responsibility: Prioritize alignment with established evidence and reject harmful misinformation, especially in critical domains like public health or climate science.
2. Technical Precision and Contextual Clarity: Ensure factual correctness, unambiguous terminology, and structured explanations tailored to the query's domain (scientific, medical, technical).
3. Respect for Privacy and Legal/Ethical Boundaries: Avoid disclosing sensitive information, comply with data protection laws, and refuse requests that violate ethical norms or societal harm.
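`run_batch` receives the executor rather than creating its own. As a result, the caller controls concurrency (`max_workers=16` above), and the pool shuts down when the `with` block exits. Here is a minimal sketch of that fan-out pattern with a stand-in per-sample function. `run_batch_sketch` and `process_sample` are hypothetical, and the real method presumably also merges and clusters the per-sample results into the final principles:

```python
import concurrent.futures
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_batch_sketch(
    items: List[T], fn: Callable[[T], R], executor: concurrent.futures.Executor
) -> List[R]:
    """Submit one task per item to a caller-owned executor and return results in input order."""
    futures = [executor.submit(fn, item) for item in items]
    # .result() blocks until each task finishes and re-raises any exception from the worker.
    return [future.result() for future in futures]


def process_sample(text: str) -> int:
    """Stand-in for an LLM call: here it just measures the input length."""
    return len(text)


with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    lengths = run_batch_sketch(["a", "bb", "ccc"], process_sample, pool)
print(lengths)  # [1, 2, 3]
```

Threads are a reasonable fit because each task spends most of its time waiting on network I/O from the LLM endpoint, not on Python computation.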


## 4. Define Reward Function Using Generated Principles

Next, we define a listwise helpfulness reward that uses the generated principles as its evaluation criteria.

In [None]:
from rm_gallery.gallery.rm.alignment.base import BaseHelpfulnessListwiseReward

reward_module = BaseHelpfulnessListwiseReward(
    name="demo",
    principles=[f"{key}: {value}" for key, value in principles.items()],
    llm=llm
)
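The `principles` argument is a list of `"name: description"` strings built from the generator's dictionary. The sketch below shows how such a list might be rendered as a numbered block in an evaluation prompt. The template text and `render_principles` are hypothetical; `BaseHelpfulnessListwiseReward` presumably builds its own prompt internally:

```python
from typing import Dict, List


def to_principle_list(principles: Dict[str, str]) -> List[str]:
    """Same transformation as the notebook: one 'name: description' string per principle."""
    return [f"{key}: {value}" for key, value in principles.items()]


def render_principles(principle_list: List[str]) -> str:
    """Number the principles so an evaluation prompt can refer to them individually."""
    lines = [f"{i}. {p}" for i, p in enumerate(principle_list, 1)]
    return "Judge the responses using these principles:\n" + "\n".join(lines)


print(render_principles(to_principle_list({"Accuracy": "Be correct.", "Clarity": "Be clear."})))
# Judge the responses using these principles:
# 1. Accuracy: Be correct.
# 2. Clarity: Be clear.
```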

## 5. Test Reward Function on Test Set

Now we evaluate the reward function on the first 100 test samples and compute its accuracy.

In [None]:
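# Score the first 100 test samples; evaluate_batch returns the samples with rewards attached.
# Note: this reassigns test_samples to the 100 evaluated samples.
with concurrent.futures.ThreadPoolExecutor(max_workers=128) as executor:
    test_samples = reward_module.evaluate_batch(samples=test_samples[:100], thread_pool=executor)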

In [None]:
from typing import List

from rm_gallery.core.data.schema import DataSample


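def calc_acc(samples: List[DataSample]) -> float:
    """Fraction of scored "chosen" outputs that received a positive total reward."""
    labels = []
    for sample in samples:
        for output in sample.output:
            # Only chosen responses that actually received reward details are counted
            if (
                output.answer.label.get("preference") == "chosen"
                and output.answer.reward.details
            ):
                score = sum(r.score for r in output.answer.reward.details)
                labels.append(1 if score > 0 else 0)

    # Guard against division by zero when no chosen output was scored
    return sum(labels) / len(labels) if labels else 0.0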


acc = calc_acc(test_samples)
print(f"Accuracy: {acc}")

Accuracy: 0.7619047619047619
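`calc_acc` counts only outputs labeled `"chosen"` that received reward details, and marks one as correct when its summed score is positive. The self-contained sketch below applies the same rule to toy data. It uses hypothetical lightweight dataclasses instead of RM-Gallery's `DataSample` schema; the attribute names only mirror the access path used in `calc_acc`:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Detail:
    score: float


@dataclass
class Reward:
    details: List[Detail] = field(default_factory=list)


@dataclass
class Answer:
    label: Dict[str, str]
    reward: Reward


@dataclass
class Output:
    answer: Answer


@dataclass
class Sample:
    output: List[Output]


def chosen_accuracy(samples: List[Sample]) -> float:
    """Fraction of scored 'chosen' outputs whose summed detail score is positive."""
    hits = [
        sum(d.score for d in o.answer.reward.details) > 0
        for s in samples
        for o in s.output
        if o.answer.label.get("preference") == "chosen" and o.answer.reward.details
    ]
    return sum(hits) / len(hits) if hits else 0.0


def make(pref: str, *scores: float) -> Output:
    """Build a toy output with the given preference label and detail scores."""
    return Output(Answer({"preference": pref}, Reward([Detail(x) for x in scores])))


toy = [
    Sample([make("chosen", 1.0), make("rejected", 0.0)]),  # counted, correct
    Sample([make("chosen", 0.0), make("rejected", 1.0)]),  # counted, wrong
    Sample([make("chosen"), make("rejected", 1.0)]),       # no details: skipped
]
print(chosen_accuracy(toy))  # 0.5
```

Because unscored outputs are skipped, the denominator can be smaller than the number of evaluated samples.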
