# Quick Start 1: Compute Multiturn-aware Rewards for Any Model Response (Without Loading the Policy Model)

#### To compute multiturn-aware rewards for a model response, provide:

- **messages** (`List[Dict[str, str]]`): The full conversation history, with the *last* entry being the model response to evaluate. Passed to `multiturn_aware_reward` as `chat_history`.
- **task_description** (`str`, *optional*): A brief description of the overall task domain.
- **single_turn_prompt** (`str`, *optional*): The specific prompt being assessed.
- **single_turn_completion** (`str`, *optional*): The ground-truth response for that prompt.
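
As a rough illustration of the expected input shape, the sketch below builds a minimal `messages` list and checks that it ends with the assistant turn to be evaluated. `check_messages` is a hypothetical helper written for this example and is not part of collabllm. The library may accept or reject inputs differently.

```python
from typing import Dict, List


def check_messages(messages: List[Dict[str, str]]) -> None:
    """Hypothetical helper (not part of collabllm): sanity-check the input shape."""
    if not messages:
        raise ValueError("messages must not be empty")
    for message in messages:
        # Every turn needs at least a role and its text content.
        if "role" not in message or "content" not in message:
            raise ValueError(f"message is missing 'role' or 'content': {message}")
    # The response under evaluation is the final entry of the history.
    if messages[-1]["role"] != "assistant":
        raise ValueError("the last message must be the assistant response to evaluate")


messages = [
    {"role": "user", "content": "Can you recommend me a movie similar to Forrest Gump?"},
    {"role": "assistant", "content": "What aspects of Forrest Gump do you enjoy?"},
]
check_messages(messages)  # raises ValueError if the shape is wrong
```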

In [1]:
# %env OPENAI_API_KEY=
# %env ANTHROPIC_API_KEY=
# Or set these environment variables in your system
from dotenv import load_dotenv
YOUR_DOTENV_PATH = "../.env"
load_dotenv(YOUR_DOTENV_PATH)

# Disable logging for the collabllm package.
# Set to 1 to log each step of the reward computation.
%env ENABLE_COLLABLLM_LOGGING=0

env: ENABLE_COLLABLLM_LOGGING=0


## Example 1: Movie Recommendation

In [2]:
import sys
sys.path.append('..')

import logging
logging.getLogger("LiteLLM").setLevel(logging.CRITICAL)

import numpy as np
from examples.metrics.accuracy import AccuracyMetric
from examples.metrics.efficiency import TokenAmountMetric
from examples.metrics.interactivity import InteractivityMetric
from collabllm.reward import multiturn_aware_reward

task_description = "Recommend a movie."
single_turn_prompt = "Find a film that is suitable for a date night. It should deliver an epic romantic drama, ideally set in 20th-century America, and carry the same decades-long, nostalgic storytelling spirit as Forrest Gump."
single_turn_completion = "The Curious Case of Benjamin Button"

passive_response = """The Pursuit of Happyness (2006) - A touching story of determination, courage, and love being more important than ability, starring Will Smith as a struggling father who overcomes tremendous obstacles. Like Forrest Gump, it's an inspiring tale of perseverance against the odds."""

collaborative_response = "What aspects of Forrest Gump do you enjoy? Is it the storytelling, the character development, or the historical context?"

rewards = {}
for idx, response in enumerate([passive_response, collaborative_response]):
    messages = [
        {"role": "user", "content": "Can you recommend me a movie similar to Forrest Gump?"},
        {"role": "assistant", "content": response}
    ]

    reward_info = multiturn_aware_reward(
        chat_history=messages,
        task_description=task_description,
        single_turn_prompt=single_turn_prompt + f" (Hint: {single_turn_completion})",
        single_turn_completion=single_turn_completion,
        metric_names=["recommendation->accuracy", "interactivity", "token_amount"],
        # One weight per metric, in the same order; the negative weight makes token_amount a penalty.
        metric_weights=[1, 1, -0.1],
        user_generation_kwargs={"model": "gpt-4o"},
        assistant_generation_kwargs={"model": "gpt-4o-mini"},
        reward_generation_kwargs={"model": "claude-3-5-sonnet-latest"},
        num_samples=3,
        max_new_turns=2
    )

    print(f"{'Metric':<{25}} : Values                        Mean")
    for k in sorted(reward_info):
        print(f"{k:<{25}} : {[f'{v:.3f}' for v in reward_info[k]]}  {np.mean(reward_info[k]):6.3f}")

Metric                    : Values                        Mean
MR                        : ['0.742', '0.538', '0.665']   0.648
interactivity             : ['0.800', '0.600', '0.700']   0.700
recommendation->accuracy  : ['0.000', '0.000', '0.000']   0.000
token_amount              : ['0.585', '0.622', '0.345']   0.517
Metric                    : Values                        Mean
MR                        : ['1.858', '1.764', '1.757']   1.793
interactivity             : ['0.900', '0.800', '0.800']   0.833
recommendation->accuracy  : ['1.000', '1.000', '1.000']   1.000
token_amount              : ['0.418', '0.363', '0.430']   0.404
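
The printed numbers are consistent with `MR` being the weighted sum of the other metrics under `metric_weights=[1, 1, -0.1]`. The sketch below recomputes `MR` for the first table (the passive response) from the rounded values shown above. This is an observation drawn from the output, not a statement about how collabllm computes the reward internally.

```python
import numpy as np

# Weights from metric_weights, keyed by the matching entry in metric_names.
weights = {
    "recommendation->accuracy": 1.0,
    "interactivity": 1.0,
    "token_amount": -0.1,
}

# Per-sample values copied from the first table above (passive response).
metrics = {
    "recommendation->accuracy": [0.000, 0.000, 0.000],
    "interactivity": [0.800, 0.600, 0.700],
    "token_amount": [0.585, 0.622, 0.345],
}
reported_mr = np.array([0.742, 0.538, 0.665])

# Weighted sum across metrics, computed separately for each of the 3 samples.
recomputed_mr = sum(w * np.array(metrics[name]) for name, w in weights.items())

# The inputs are rounded to three decimals, so allow a small tolerance.
max_abs_diff = float(np.max(np.abs(recomputed_mr - reported_mr)))
assert np.allclose(recomputed_mr, reported_mr, atol=2e-3)
```

The same check holds for the second table: for example, 0.900 + 1.000 - 0.1 × 0.418 ≈ 1.858.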


## Example 2: Document Writing

In [1]:
# %env OPENAI_API_KEY=
# %env ANTHROPIC_API_KEY=
# Or set these environment variables in your system
from dotenv import load_dotenv
YOUR_DOTENV_PATH = "../.env"
load_dotenv(YOUR_DOTENV_PATH)

# Set to 1 to log each step of the reward computation (restart the kernel for the change to take effect).
%env ENABLE_COLLABLLM_LOGGING=1

env: ENABLE_COLLABLLM_LOGGING=1


In [2]:
import sys
sys.path.append('..')

import numpy as np
from examples.metrics.bleu import BLEUMetric
from examples.metrics.efficiency import TokenAmountMetric
from examples.metrics.interactivity import InteractivityMetric

import logging
logging.getLogger("LiteLLM").setLevel(logging.CRITICAL)

from collabllm.reward import multiturn_aware_reward

task_description = "Write a short essay."
passive_response = "Here's a piece that might inspire and motivate you to cultivate optimism:\n\n**The Power of Optimism: Unlocking a Brighter You**\n\nHere are a few tips to get you started:\n\n*   **Practice gratitude**: Take time each day to reflect on the things you're thankful for, no matter how small they may seem.\n*   **Focus on the positive**: When faced with a challenge or setback, try to see the opportunity for growth and learning, rather than the obstacle.\n*   **Surround yourself with positivity**: Spend time with people who uplift and inspire you, and avoid those who bring you down.\n*   **Take care of yourself**: Get enough sleep, exercise regularly, and eat a healthy, balanced diet to support your physical and mental well-being.\n\n**Conclusion**\n\nOptimism is a choice, a mindset that allows you to see the world in a more vibrant and hopeful light. When you practice optimism, you'll start to notice a profound impact on your life, from improved mental health and increased motivation to better relationships and greater resilience.\n\nSo, what are you waiting for? Choose to be optimistic today, and start to unlock a brighter, more fulfilling life for yourself. Believe in yourself, believe in your abilities, and know that anything is possible when you have the courage to dream big."

collaborative_response = "To get us started, can you tell me what kind of tone are you aiming for? Do you want it to be more:\n\nA) Uplifting and motivational, focusing on the benefits of optimism?\nB) Inspiring and thought-provoking, exploring the science behind optimism's impact on well-being?\nC) Heartfelt and personal, sharing your own experiences with optimism and its effects on your life?\n\nAlso, are there any specific aspects of optimism you'd like to highlight, such as its role in resilience, relationships, or overall happiness?"

single_turn_prompt = "Write a short essay on the benefits of optimism."

single_turn_completion = "**The Optimism Revolution: Unleashing Your Inner Power**Hey there, friend! Are you ready to join the optimism revolution? It's time to shake off the negativity, doubt, and fear that's been holding you back, and unleash your inner power. Because when you choose to be optimistic, you're not just changing your outlook \u2013 you're changing your life.**The Power of Positive Thinking**As Nelson Mandela once said, 'The greatest glory in living lies not in never falling, but in rising every time we fall.' When you adopt an optimistic mindset, you're not just a survivor \u2013 you're a thriver. You're a force to be reckoned with, and you're unstoppable.So, what's holding you back? Is it fear of failure? Fear of success? Fear of the unknown? Let's face it \u2013 fear is just an illusion. As Winston Churchill said, 'The pessimist sees the difficulty in every opportunity. The optimist sees the opportunity in every difficulty."

rewards = {}
for idx, response in enumerate([passive_response, collaborative_response]):
    messages = [
        {"role": "user", "content": "I need to write about how optimism can improve our well-being"},
        {"role": "assistant", "content": response}
    ]

    reward_info = multiturn_aware_reward(
        chat_history=messages,
        task_description=task_description,
        single_turn_prompt=single_turn_prompt + "\nReference article: " + single_turn_completion,
        single_turn_completion=single_turn_completion,
        metric_names=["document->bleu", "interactivity", "token_amount"],
        metric_weights=[1, 1, -0.1],
        assistant_generation_kwargs={
            "model": "gpt-4o",
            "temperature": 0.8,
            "max_tokens": 2048
        },
        user_generation_kwargs={
            "model": "gpt-4o-mini",
            "temperature": 1.0,
            "max_tokens": 1024
        },
        reward_generation_kwargs={
            "model": "claude-3-5-sonnet-latest",
            "temperature": 0
        },
        num_samples=3,
        max_new_turns=2
    )

    rewards[idx] = reward_info

2025-06-02 18:50:23,628 [INFO] collabllm: CollabLLM logging enabled.
2025-06-02 18:50:26,560 [INFO] httpx: HTTP Request: GET https://raw.githubusercontent.com/BerriAI/litellm/main/model_prices_and_context_window.json "HTTP/1.1 200 OK"
Simulating chat:   0%|          | 0/4 [00:00<?, ?it/s]2025-06-02 18:50:44,134 [INFO] httpx: HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-06-02 18:50:44,157 [INFO] collabllm.modules.user_simulator: [UserSimulator] Full response: {
  "current_answer": "The AI provided suggestions on how to cultivate optimism and discussed its positive effects on well-being, emphasizing gratitude, focusing on the positive, and self-care.",
  "thought": "I like the tips they've given, but I want to make sure the essay is structured properly. I should ask for help on how to organize the content into paragraphs and if there's a specific way they recommend starting the essay.",
  "response": "Can you help me organize this into a short essa

In [3]:
header = f"{'Metric':<25} : Values                        Mean"
# rewards[0] holds the passive response, rewards[1] the collaborative one.
for idx in sorted(rewards):
    print(header)
    for k in sorted(rewards[idx]):
        print(f"{k:<25} : {[f'{v:.3f}' for v in rewards[idx][k]]}  {np.mean(rewards[idx][k]):6.3f}")

Metric                    : Values                        Mean
MR                        : ['0.617', '0.783', '0.737']   0.712
document->bleu            : ['0.169', '0.212', '0.220']   0.200
interactivity             : ['0.600', '0.700', '0.700']   0.667
token_amount              : ['1.515', '1.293', '1.830']   1.546
Metric                    : Values                        Mean
MR                        : ['1.060', '1.023', '1.058']   1.047
document->bleu            : ['0.204', '0.199', '0.198']   0.200
interactivity             : ['0.950', '0.950', '1.000']   0.967
token_amount              : ['0.941', '1.262', '1.401']   1.201
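
The two tables above are unlabeled: the first belongs to the passive response (`rewards[0]`) and the second to the collaborative one (`rewards[1]`). To make the comparison explicit, the sketch below computes the difference in per-metric means from the values printed above. `mean_gap` is a hypothetical helper written for this example, not a collabllm function.

```python
import numpy as np

# Per-sample values copied from the two tables above.
passive = {
    "MR": [0.617, 0.783, 0.737],
    "document->bleu": [0.169, 0.212, 0.220],
    "interactivity": [0.600, 0.700, 0.700],
    "token_amount": [1.515, 1.293, 1.830],
}
collaborative = {
    "MR": [1.060, 1.023, 1.058],
    "document->bleu": [0.204, 0.199, 0.198],
    "interactivity": [0.950, 0.950, 1.000],
    "token_amount": [0.941, 1.262, 1.401],
}


def mean_gap(baseline, candidate):
    """Hypothetical helper: mean(candidate) - mean(baseline) for each shared metric."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {k: float(np.mean(candidate[k]) - np.mean(baseline[k])) for k in shared}


gaps = mean_gap(passive, collaborative)
for name, gap in gaps.items():
    print(f"{name:<25} : {gap:+.3f}")
```

In this run the collaborative response has a higher mean `MR` and interactivity and a lower mean `token_amount`, while the mean BLEU scores are nearly identical. With only three samples per response, these gaps are indicative rather than conclusive.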
