# Getting Started: Reward Tasks with Promptolution

Welcome to the world of **reward-based prompt optimization**! If you've explored our classification tutorial (`getting_started.ipynb`) or our LLM-as-a-Judge notebook (`llm_judge_getting_started.ipynb`), you've seen how to optimize prompts for predicting labels or generating content that gets rated by AI judges.

But what if you want to optimize for something completely different? What if you want to optimize for:
* **Objective, measurable outcomes** rather than subjective quality?
* **System compatibility** - does the output actually work with your software?
* **Concrete business metrics** that you can define and measure automatically?

This is where **Reward Tasks** shine. Instead of relying on pre-labeled data or AI judges, you define your own reward function - a simple Python function that takes the model's output and returns a score. The optimizer then evolves prompts that maximize this reward.

**The beauty of reward tasks**: You can optimize for literally anything you can measure! Valid JSON parsing, code execution success, mathematical correctness, format compliance, API compatibility - if you can write a function to evaluate it, you can optimize for it.

> **New to Promptolution?** If you haven't seen our other tutorials yet, check out `getting_started.ipynb` (classification) and `llm_judge_getting_started.ipynb` (LLM evaluation) first! This notebook builds on those concepts but tackles objective, measurable outcomes.

## Installation
Install Promptolution with a single command:

In [None]:
! pip install promptolution[api]

## Imports

In [23]:
import pandas as pd
from promptolution.utils import ExperimentConfig
from promptolution.helpers import run_experiment
import nest_asyncio

nest_asyncio.apply()  # Required for notebook environments

## Setting Up Your Experiment

### Prepare the data

For this tutorial, we're tackling a practical business problem: **converting unstructured text into valid JSON**. This is something every company deals with - extracting structured data from emails, documents, conversations, and web content.

We're using a specialized dataset designed for JSON extraction tasks, containing diverse text examples across different domains and formats.

In [5]:
df = pd.read_parquet("hf://datasets/paraloq/json_data_extraction/data.parquet").sample(300)

**Key difference from other tasks**: Notice we're not using labeled "correct" outputs or asking an AI to judge quality. Instead, we'll define our own objective success criteria - does the output parse as valid JSON?

Let's explore what we're working with:

In [None]:
print("Dataset columns:", df.columns.tolist())
print(f"\nDataset size: {len(df)} examples")
print("\nSample text:")
print(df["text"].iloc[0][:200] + "...")

### Creating Initial Prompts

Here is a single starter prompt for JSON extraction, which is all OPRO needs to get going. Feel free to experiment with your own approaches!

In [15]:
init_prompts = ["Translate the provided information into a valid json schema!"]

### Configure Your LLM

Promptolution offers three flexible ways to access language models:

1. Local LLMs (using the Transformers library)
1. vLLM backend (for efficient serving of large language models)
1. API-based LLMs (compatible with any provider following the OpenAI standard)

For this demonstration, we'll use the DeepInfra API, but you can easily switch to other providers like Anthropic or OpenAI by changing `api_url` and `model_id` in the configuration.

In [None]:
api_key = "YOUR_API_KEY"  # Replace with the API key of your LLM provider (DeepInfra in this example)

Here's an explanation of the main configuration parameters in the `ExperimentConfig`:
- `optimizer`: The algorithm used for prompt optimization. Currently we support "capo", "evopromptga", "evopromptde", and "opro". For this example, we use "opro", which requires only a single initial prompt.
- `task_description`: A string describing the task you're optimizing prompts for. This gives the meta-LLM context about your task.
- `prompts`: A list of initial prompt strings that will be used as the starting point for optimization.
- `x_column`: The name of the DataFrame column that holds the input text (here `"text"`).
- `n_steps`: The number of optimization steps to run. Higher values allow more exploration and refinement but require more API calls and computational resources.
- `api_url`: The API endpoint URL used to access the language model. This example uses DeepInfra's API, which follows the OpenAI standard.
- `model_id`: The LLM to use for the experiment, as both downstream and meta LLM.
- `api_key`: Your API authentication token required to access the language model service.

### Define Your Reward Function

This is where the magic happens! Unlike classification (which needs labeled data) or judging (which uses AI evaluation), reward tasks let you define exactly what "success" means for your business case.

Our reward function is beautifully simple and objective:

In [17]:
import json


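def reward_function(prediction: str) -> float:
    """Return 1.0 if the prediction parses as JSON, otherwise 0.0."""
    try:
        json.loads(prediction)
        return 1.0  # Valid JSON
    except (json.JSONDecodeError, TypeError):  # TypeError covers non-string input
        return 0.0  # Invalid JSON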

This reward function captures a real business requirement - "generate output that our systems can actually process." No subjective judgment needed, no human labeling required!
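
Before spending API calls, it can help to sanity-check a reward function on a few hand-written outputs. The sketch below repeats the reward function defined above so that it runs on its own. The `extract_final_answer` helper is hypothetical and not part of Promptolution: the task description asks the model to wrap its JSON in `<final_answer>` tags, and this tutorial does not show whether those tags are stripped before the reward function is called, so the helper only matters if your function receives the raw output.

```python
import json
import re


def reward_function(prediction: str) -> float:
    """Return 1.0 if the prediction parses as JSON, otherwise 0.0."""
    try:
        json.loads(prediction)
        return 1.0
    except json.JSONDecodeError:
        return 0.0


def extract_final_answer(raw_output: str) -> str:
    """Hypothetical helper: return the text between <final_answer> tags.

    Falls back to the full (stripped) output when no tags are found.
    """
    match = re.search(r"<final_answer>(.*?)</final_answer>", raw_output, re.DOTALL)
    return match.group(1).strip() if match else raw_output.strip()


# Hand-written examples, not taken from the dataset.
samples = {
    "valid object": '{"name": "Ada", "age": 36}',
    "trailing comma": '{"name": "Ada",}',
    "plain text": "The name is Ada.",
    "tagged output": '<final_answer>{"name": "Ada"}</final_answer>',
}

for label, text in samples.items():
    raw_score = reward_function(text)
    extracted_score = reward_function(extract_final_answer(text))
    print(f"{label:15s} raw={raw_score} extracted={extracted_score}")
```

Only the tagged sample scores differently with and without extraction. Keep in mind that `json.loads` also accepts bare scalars such as `42`, so "valid JSON" is a weaker requirement than "a JSON object".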

In [28]:
config = ExperimentConfig(
    optimizer="opro",
    task_description="The task is to convert information into a valid JSON schema. The LLM should generate a JSON object and return the json inside of <final_answer> tags.",
    prompts=init_prompts,
    x_column="text",
    n_steps=8,
    num_instructions_per_step=5,
    api_url="https://api.deepinfra.com/v1/openai",
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    api_key=api_key,
    n_subsamples=15,
    task_type="reward",
    reward_function=reward_function,
)

**Differences compared to Classification and LLM-as-a-Judge**:
- `task_type="reward"` - Uses your custom reward function instead of accuracy or AI judgment
- `reward_function=reward_function` - Your objective success criteria
- `optimizer="opro"` - We already used EvoPrompt and CAPO in the other tutorials - here we will use OPRO. Its main benefit: it requires only a single initial prompt.
- No need for labeled "correct" outputs - the reward function defines success
- Completely customizable - change the reward function to optimize for anything!
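
As one illustration of that flexibility, a reward function does not have to be all-or-nothing. The sketch below gives partial credit: nothing for output that fails to parse or is not a JSON object, half credit for any JSON object, and the rest in proportion to how many required keys are present. The key names in `REQUIRED_KEYS` are hypothetical and not taken from the dataset.

```python
import json

# Hypothetical keys for illustration; replace with what your system needs.
REQUIRED_KEYS = ("title", "author", "date")


def graded_reward(prediction: str) -> float:
    """Score 0.0 for unusable output, 0.5 for any JSON object,
    plus up to 0.5 for the share of required keys present."""
    try:
        parsed = json.loads(prediction)
    except (json.JSONDecodeError, TypeError):
        return 0.0  # Not parseable as JSON
    if not isinstance(parsed, dict):
        return 0.0  # Valid JSON, but not an object (e.g. a list or a number)
    present = sum(key in parsed for key in REQUIRED_KEYS)
    return 0.5 + 0.5 * present / len(REQUIRED_KEYS)


print(graded_reward("not json"))
print(graded_reward("[1, 2, 3]"))
print(graded_reward('{"title": "Report"}'))
print(graded_reward('{"title": "T", "author": "A", "date": "2024-01-01"}'))
```

A function like this would be passed through the same `reward_function` argument of `ExperimentConfig`. A graded signal can give the optimizer more to work with than a binary one, at the cost of encoding more assumptions about what a good output looks like.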

## Run Your Experiment

With everything configured, you're ready to optimize your prompts! The `run_experiment` function will run the optimization and evaluate on a holdout set. You can expect this cell to take a few minutes to run.

In [29]:
prompts = run_experiment(df, config)

🔥 Starting optimization...
📊 Starting evaluation...


In [33]:
prompts.iloc[:4]

Unnamed: 0,prompt,score
0,"Convert the input text into a valid JSON schema by extracting relevant information and filling in the corresponding fields. The JSON schema format is as follows:\n{\n""meta"": {\n""type"": ""object"",\n""properties"": {\n""author"": {\n""type"": ""string""\n},\n""title"": {\n""type"": ""string""\n},\n""date"": {\n""type"": ""string""\n}\n}\n},\n""content"": {\n""type"": ""array"",\n""items"": {\n""type"": ""object"",\n""properties"": {\n""header"": {\n""type"": ""string""\n},\n""paragraph"": {\n""type"": ""string""\n}\n}\n}\n}\n}\n\nUse the class label extracted from the text between the markers ""<final_answer>"" and ""</final_answer>"" as the ""title"" field in the ""meta"" section of the JSON schema.\n\nNote: The class label should be extracted using natural language processing techniques such as named entity recognition and sentence classification. If the class label is not present in the text, use an empty string as the default value.",0.316667
1,"To convert text into a valid JSON schema, follow these steps:\n\n1. Identify the main entities and relationships present in the text.\n2. Create a JSON schema template with the following structure:\n```json\n{\n ""meta"": {\n ""type"": ""object"",\n ""properties"": {\n ""title"": {\n ""type"": ""string""\n },\n ""author"": {\n ""type"": ""string""\n },\n ""date"": {\n ""type"": ""string""\n }\n }\n },\n ""content"": {\n ""type"": ""array"",\n ""items"": {\n ""type"": ""object"",\n ""properties"": {\n ""header"": {\n ""type"": ""string""\n },\n ""paragraphs"": {\n ""type"": ""array"",\n ""items"": {\n ""type"": ""string""\n }\n }\n }\n }\n }\n}\n```\n3. Extract relevant information from the text and fill in the corresponding fields in the JSON schema template.\n4. Use the class label extracted from the text between the markers ""<final_answer>"" and ""</final_answer>"" as the ""title"" field in the ""meta"" section of the JSON schema.\n\nNote: If the class label is not present in the text, use an empty string as the default value.\n\nScore: 95",0.133333
2,"Convert the provided text into a valid JSON schema by extracting relevant information and filling in the corresponding fields. The JSON schema format is as follows:\n\n{\n""meta"": {\n""type"": ""object"",\n""properties"": {\n""title"": {\n""type"": ""string""\n},\n""author"": {\n""type"": ""string""\n},\n""tags"": {\n""type"": ""array"",\n""items"": {\n""type"": ""string""\n}\n},\n""date"": {\n""type"": ""string""\n}\n},\n""content"": {\n""type"": ""array"",\n""items"": {\n""type"": ""object"",\n""properties"": {\n""heading"": {\n""type"": ""string""\n},\n""paragraphs"": {\n""type"": ""array"",\n""items"": {\n""type"": ""string""\n}\n}\n}\n}\n}\n}\n\nExtract relevant information from the text and fill in the corresponding fields in the JSON schema. The class label is the text within the markers ""<final_answer>"" and ""</final_answer>"".",0.066667
3,"Convert the provided text into a JSON schema in the following structure:\n\n{\n""advisory"": {\n""type"": ""object"",\n""properties"": {\n""title"": {\n""type"": ""string""\n},\n""issued"": {\n""type"": ""string""\n},\n""level"": {\n""type"": ""integer""\n},\n""reason"": {\n""type"": ""string""\n},\n""recommendations"": {\n""type"": ""array"",\n""items"": {\n""type"": ""string""\n}\n}\n},\n""character"": {\n""type"": ""object"",\n""properties"": {\n""name"": {\n""type"": ""string""\n},\n""level"": {\n""type"":"": ""integer""\n},\n""health"": {\n""type"": ""integer""\n},\n""mana"": {\n""type"": ""integer""\n},\n""strength"": {\n""type"": ""integer""\n},\n""agility"": {\n""type"": ""integer""\n},\n""intelligence"": {\n""type"": ""integer""\n},\n""equipment"": {\n""type"": ""array"",\n""items"": {\n""type"": ""object"",\n""properties"": {\n""name"": {\n""type"": ""string""\n},\n""damage"": {\n""type"": ""integer""\n},\n""defense"": {\n""type"": ""integer""\n},\n""durability"": {\n""type"": ""integer""\n}\n}\n}\n},\n""skills"": {\n""type"": ""array"",\n""items"": {\n""type"": ""object"",\n""properties"": {\n""name"": {\n""type"": ""string""\n},\n""level"": {\n""type"": ""integer""\n}\n}\n}\n},\n""quests"": {\n""type"": ""object"",\n""properties"": {\n""active"": {\n""type"": ""array"",\n""items"": {\n""type"": ""string""\n}\n},\n""completed"": {\n""type"": ""array"",\n""items"": {\n""type"": ""string""\n}\n}\n}\n}\n}\n}\n\nPlease extract relevant information from the text and fill in the corresponding fields in the JSON schema. The corresponding class label is the text within the markers ""<final_answer>"" and ""</final_answer>"".",0.033333


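We might think 'just ask for JSON' would work fine, but the prompts that came out on top after optimization are far more detailed and spell out explicit schema templates - another reminder that optimization beats guessing! Note that the best score here is still only about 0.32, so there is room for further tuning.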
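
To reuse the result, you can pull the highest-scoring prompt out of the returned table. The sketch below assumes that `run_experiment` returns a pandas DataFrame with `prompt` and `score` columns, as the output above suggests. The stand-in DataFrame uses shortened prompt texts together with the scores from the table above, so the snippet runs without any API calls.

```python
import pandas as pd

# Stand-in for the DataFrame returned by run_experiment (prompt texts shortened).
prompts = pd.DataFrame(
    {
        "prompt": [
            "Convert the input text into a valid JSON schema by extracting ...",
            "To convert text into a valid JSON schema, follow these steps: ...",
            "Convert the provided text into a valid JSON schema by ...",
        ],
        "score": [0.316667, 0.133333, 0.066667],
    }
)

# Sort explicitly rather than relying on the row order of the output.
ranked = prompts.sort_values("score", ascending=False).reset_index(drop=True)
best_prompt = ranked.loc[0, "prompt"]
best_score = ranked.loc[0, "score"]

print(f"Best score: {best_score:.3f}")
print(best_prompt)
```

With the real `prompts` object from the experiment, only the last block is needed.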


Happy prompt optimizing! 🚀✨ We can't wait to see what you build with Promptolution! 🤖💡