# RM-Gallery Quick Start

This notebook walks through basic RM-Gallery usage: loading a reward model, scoring individual responses, batch evaluation, and best-of-N selection.

## Installation

First, make sure you have RM-Gallery installed:

In [None]:
# Install RM-Gallery
# !pip install rm-gallery
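Before running the cells below, you can confirm the package is importable. This is a minimal sketch that uses only the standard library. It assumes the import name is `rm_gallery`, which is the name the next cell imports from; `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("rm_gallery"):
    print("rm_gallery is importable.")
else:
    print("rm_gallery not found; run `pip install rm-gallery` first.")
```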

## Import Libraries

Import the necessary modules from RM-Gallery:

In [None]:
from rm_gallery.gallery.rm import RewardModel
import pandas as pd

## Load a Pre-built Reward Model

RM-Gallery supports 35+ pre-built reward models. Here's how to load one:

In [None]:
# Initialize a reward model (device="cuda" requires an NVIDIA GPU with enough memory for an 8B model)
rm = RewardModel(
    model_name="Skywork/Skywork-Reward-Llama-3.1-8B",
    device="cuda"
)

print("Reward model loaded successfully!")
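The cell above hardcodes `device="cuda"`. The following is a minimal sketch for choosing the device at runtime. It assumes PyTorch is the backend and that `RewardModel` also accepts `device="cpu"`, which this notebook does not confirm. `pick_device` is a hypothetical helper. Expect an 8B model to be slow on CPU.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no CUDA device can be used.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
# Assumed usage (device="cpu" support is not confirmed):
# rm = RewardModel(model_name="Skywork/Skywork-Reward-Llama-3.1-8B", device=device)
```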

## Evaluate Text Quality

Score each response against the prompt. A higher reward means the model prefers that response:

In [None]:
# Prepare sample data
prompt = "What is the capital of France?"
response_a = "The capital of France is Paris."
response_b = "I don't know."

# Score the responses
score_a = rm.score(prompt, response_a)
score_b = rm.score(prompt, response_b)

print(f"Score for Response A: {score_a:.4f}")
print(f"Score for Response B: {score_b:.4f}")
if score_a == score_b:
    print("\nThe responses are tied.")
else:
    print("\nResponse A is better!" if score_a > score_b else "\nResponse B is better!")
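Raw reward scores are mainly meaningful relative to one another. Reward models of this kind are commonly trained with a Bradley-Terry objective. Under that model, the probability that A is preferred over B is the logistic sigmoid of the score difference. The sketch below applies that formula to two scores. It is a general modeling convention, not an RM-Gallery function. `preference_probability` is a hypothetical helper, and the example scores are made up.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over response B.

    P(A > B) = sigmoid(score_a - score_b) = 1 / (1 + exp(-(score_a - score_b)))
    """
    diff = score_a - score_b
    # Branch on the sign so math.exp never receives a large positive argument.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


# Illustrative scores, not real model outputs.
print(f"P(A preferred over B) = {preference_probability(2.5, -1.0):.3f}")
```

This probability depends only on the difference between the two scores. Treat it as calibrated only if the model was actually trained with this objective.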

## Batch Evaluation

Score multiple prompt/response pairs in a single call:

In [None]:
# Prepare batch data
data = [
    {
        "prompt": "Explain quantum computing",
        "response_a": "Quantum computing uses quantum bits...",
        "response_b": "It's about computers."
    },
    {
        "prompt": "What is machine learning?",
        "response_a": "Machine learning is a subset of AI...",
        "response_b": "ML is when computers learn."
    }
]

# Batch scoring
results = rm.batch_score(data)

# Display results
df = pd.DataFrame(results)
print(df)
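This notebook does not document the columns that `batch_score` returns. If you need an explicit pairwise comparison, you can build the table yourself from the single-item `rm.score(prompt, response)` call used earlier. In the sketch below, `compare_pairs` and `toy_score` are hypothetical. `toy_score` is a placeholder; pass `rm.score` as `score_fn` to use the real model.

```python
from typing import Callable, Dict, List


def compare_pairs(
    score_fn: Callable[[str, str], float],
    data: List[Dict[str, str]],
) -> List[Dict[str, object]]:
    """Score both responses of each pair and record which one is preferred."""
    results = []
    for item in data:
        score_a = score_fn(item["prompt"], item["response_a"])
        score_b = score_fn(item["prompt"], item["response_b"])
        if score_a > score_b:
            preferred = "A"
        elif score_b > score_a:
            preferred = "B"
        else:
            preferred = "tie"
        results.append(
            {"prompt": item["prompt"], "score_a": score_a,
             "score_b": score_b, "preferred": preferred}
        )
    return results


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical placeholder scorer: longer responses score higher."""
    return float(len(response))


pairs = [
    {"prompt": "What is machine learning?",
     "response_a": "Machine learning is a subset of AI...",
     "response_b": "ML is when computers learn."},
]
for row in compare_pairs(toy_score, pairs):
    print(row)
```

The returned list of dicts can be passed straight to `pd.DataFrame`, as in the cell above.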

## Best-of-N Selection

Select the best response from multiple candidates:

In [None]:
# Candidate responses (hardcoded here; in practice, sampled from a generator model)
prompt = "Write a poem about AI"
candidates = [
    "AI is smart, AI is bright...",
    "In circuits deep and neural nets...",
    "Artificial minds that think and learn...",
    "Code and data intertwine..."
]

# Score all candidates
scores = [rm.score(prompt, candidate) for candidate in candidates]

# Select the best one
best_idx = scores.index(max(scores))
print(f"Best candidate (score: {scores[best_idx]:.4f}):\n{candidates[best_idx]}")
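Best-of-N extends naturally to keeping the top k candidates, for example to re-rank a shortlist or build preference pairs. This is a minimal sketch that assumes only a `score_fn(prompt, candidate)` callable such as `rm.score`. `top_k` and the length-based `toy_score` are hypothetical helpers for illustration.

```python
from typing import Callable, List, Tuple


def top_k(
    score_fn: Callable[[str, str], float],
    prompt: str,
    candidates: List[str],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (candidate, score) pairs, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scored = [(c, score_fn(prompt, c)) for c in candidates]
    # list.sort is stable even with reverse=True, so tied scores keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical placeholder scorer: longer responses score higher."""
    return float(len(response))


candidates = ["short", "a bit longer", "the longest candidate here"]
for text, score in top_k(toy_score, "Write a poem about AI", candidates, k=2):
    print(f"{score:.1f}  {text}")
```

With `k=1`, ties go to the earliest candidate, matching the `scores.index(max(scores))` selection above.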

## Next Steps

- 📖 Check out the [full documentation](https://modelscope.github.io/RM-Gallery/)
- 🎯 Explore custom reward models
- 📊 Learn about evaluation pipelines
- 🚀 Try RLHF training