# Finetuning Agents with GRPO

In this cookbook, we'll explore how to improve agentic systems by fine-tuning them with Group Relative Policy Optimization (GRPO). Specifically, we'll focus on query rewriting, a critical component in retrieval systems that transforms vague user questions into more effective search queries.

What makes this approach particularly interesting is that we'll use a small model, Qwen3-1.7B, rather than relying on massive models like GPT-4. The goal is to show how GRPO can get more out of efficient, cost-effective models that run locally or on modest hardware.

GRPO, as implemented in DSPy, generalizes the GRPO online reinforcement learning algorithm to multi-module LM programs, so the model learns directly from rewards earned by the program's own outputs. By applying GRPO to query rewriting with a small model, we aim to improve retrieval performance without the computational and financial costs of larger models.

In this notebook, we'll walk through:
1. Setting up a DSPy environment with the Qwen3-1.7B model
2. Creating a simple query rewriting agent for retrieval
3. Defining a reward function based on retrieval success
4. Fine-tuning the query rewriter with GRPO
5. Evaluating the performance improvements

By the end, you'll understand how to apply GRPO to optimize query rewriting using smaller models, achieving better performance without relying on massive models or extensive manual prompt engineering.

## Requirements

Before we begin, make sure the necessary packages are installed. The install command below pulls in `dspy`, `bm25s` and `PyStemmer` for BM25 retrieval, `arbor` for the RL server, and a development build of DSPy from pull request 8171.

In [9]:
%pip install dspy bm25s PyStemmer git+https://github.com/Ziems/arbor.git git+https://github.com/stanfordnlp/dspy.git@refs/pull/8171/head


Collecting git+https://github.com/Ziems/arbor.git
  Cloning https://github.com/Ziems/arbor.git to /private/var/folders/zy/wms_7q0j39x_rvs746kfr1tr0000gn/T/pip-req-build-g1gwfrt3
  Running command git clone --filter=blob:none --quiet https://github.com/Ziems/arbor.git /private/var/folders/zy/wms_7q0j39x_rvs746kfr1tr0000gn/T/pip-req-build-g1gwfrt3
  Resolved https://github.com/Ziems/arbor.git to commit 2f11573e60fe599c79bc768c37b5160a46303932
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting git+https://github.com/stanfordnlp/dspy.git@refs/pull/8171/head
  Cloning https://github.com/stanfordnlp/dspy.git (to revision refs/pull/8171/head) to /private/var/folders/zy/wms_7q0j39x_rvs746kfr1tr0000gn/T/pip-req-build-g7wtci0j
  Running command git clone --filter=blob:none --quiet https://github.com/stanfordnlp/dspy.git /private/var/folders/zy/wms_7q0j39x_rvs746kfr1t

## Set up

First, let's configure our environment. This involves connecting to an AI model provider. In this example, we'll set up a connection to a local Arbor server, which will act as our Reinforcement Learning (RL) server. This server handles inference and RL requests over HTTP. We'll also specify and load the Qwen3-1.7B model. 

In [205]:
import dspy
from dspy.clients.lm_local_arbor import ArborProvider

# Connect to local Arbor server
port = 7453
local_lm_name = "Qwen/Qwen3-1.7B"

local_lm = dspy.LM(
    model=f"openai/arbor:{local_lm_name}",
    provider=ArborProvider(),
    temperature=0.7,
    api_base=f"http://localhost:{port}/v1/",
    api_key="arbor",
)

dspy.configure(lm=local_lm)

## Load Dataset

With our environment configured, the next step is to load a dataset. For this example, we'll use a dataset of questions about the GPT research papers (GPT-1, GPT-2, GPT-3, GPT-4). Each record contains a query, its expected answer, and the document chunk it was drawn from.

DSPy works with examples in a specific format, so we'll convert our raw data into `dspy.Example` objects. Each example will have a question as input and the expected answer for evaluation. We'll split our dataset into training, validation, and test sets to properly evaluate our approach.

The training set will be used to optimize our agent, the validation set to tune parameters and monitor progress, and the test set for final evaluation.
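As a minimal sketch of the split described above, the helper below shuffles with a fixed seed and raises an error if the dataset is too small for the requested sizes, something plain slicing would silently hide. The name `split_examples` and its arguments are illustrative and not part of DSPy; the toy data is a list of integers standing in for real examples.

```python
import random


def split_examples(items, n_train, n_dev, n_test, seed=42):
    """Shuffle a copy of `items` with a fixed seed and cut it into three disjoint splits."""
    needed = n_train + n_dev + n_test
    if len(items) < needed:
        raise ValueError(f"need {needed} examples, got {len(items)}")
    shuffled = list(items)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:needed]
    return train, dev, test


# Toy stand-in for the real examples: 200 integer IDs.
train, dev, test = split_examples(list(range(200)), 100, 50, 50)
print(len(train), len(dev), len(test))
```

Using a private `random.Random(seed)` instance keeps the split reproducible without changing the global random state that other cells may rely on.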

In [206]:
import json
import random

# Load the dataset from a JSON file
with open("../data/evalset/evalset.json") as f:
    ds = json.load(f)
# Deduplicate the chunks; sort so the corpus order is identical on every run
document_chunks = sorted({doc["document"] for doc in ds})

# Convert to DSPy Examples
examples = [
    dspy.Example(question=ex["query"], answers=[ex["answer"]]).with_inputs("question")
    for ex in ds
    if ex["answer"].strip()
]

# Shuffle with a fixed seed so the split is reproducible
random.seed(42)
random.shuffle(examples)

# Split into train, validation, and test sets
trainset = examples[:100]
devset = examples[100:150]
testset = examples[150:200]

print(f"Train size: {len(trainset)}, Dev size: {len(devset)}, Test size: {len(testset)}")

Train size: 100, Dev size: 50, Test size: 50


## Implement Search Functionality

Before building our agent, we need to implement the search functionality that will retrieve relevant documents based on a query. In a real-world application, this might connect to a vector database or search engine.

For this example, we'll create a simple BM25 search function over our corpus of GPT research paper chunks. The function will:
1. Take a query string and number of results (k) as input
2. Tokenize and stem the query
3. Retrieve the k highest-scoring documents according to BM25
4. Return the list of retrieved documents

This search function will be used by our agent to find information relevant to user questions.
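To make the ranking step concrete, here is a small pure-Python sketch of BM25 scoring. It assumes a Lucene-style IDF and plain whitespace tokenization, so it will not reproduce the exact scores of the `bm25s` library used in this notebook, which also stems tokens and removes stopwords. The function name `bm25_scores` and the toy documents are illustrative; `k1=0.9` and `b=0.4` mirror the values passed to the retriever.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=0.9, b=0.4):
    """Score every tokenized document against the query with a Lucene-style BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if tf[term] == 0:
                continue  # terms missing from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "gpt-3 is a 175 billion parameter language model".split(),
    "gpt-2 was trained on the webtext corpus".split(),
    "bm25 ranks documents by term overlap".split(),
]
scores = bm25_scores("how many parameter does gpt-3 have".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Only the first document shares terms with the query, so it is the only one with a non-zero score. This is why query rewriting matters for BM25: a rewrite that introduces the right keywords directly changes which chunks can be retrieved.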

In [207]:
import bm25s
import Stemmer

#corpus = [f"{ex.inputs()['question']} | {ans}" for ex in trainset for ans in ex.answers]
corpus = document_chunks
stemmer = Stemmer.Stemmer("english")
corpus_tokens = bm25s.tokenize(corpus, stopwords="en", stemmer=stemmer)
retriever = bm25s.BM25(k1=0.9, b=0.4)
retriever.index(corpus_tokens)

# BM25 Search Wrapper
def search(query: str, k: int = 3):
    """Return the k corpus chunks that BM25 ranks highest for the query."""
    tokens = bm25s.tokenize(query, stopwords="en", stemmer=stemmer, show_progress=False)
    # results holds corpus indices, best match first
    results, _scores = retriever.retrieve(tokens, k=k, n_threads=1, show_progress=False)
    return [corpus[doc] for doc in results[0]]

DEBUG:bm25s:Building index from IDs objects           
                                                             

## Building the Agent

Now we'll create our agent using DSPy's module system. Our agent will be a simple query rewriter that takes a user question, rewrites it to be more specific and search-friendly, and then retrieves relevant documents.

The agent consists of two main components:
1. A query rewriting module that uses Chain-of-Thought reasoning to improve the original question
2. A document retrieval step that uses our search function to find relevant information

This simple agent will serve as our baseline before optimization with GRPO.

In [208]:
# DSPy Module for Query Rewriting
class QueryRewriter(dspy.Module):
    def __init__(self):
        super().__init__()

        self.rewrite = dspy.ChainOfThought(
            dspy.Signature(
                "question -> rewritten_query",
                "Rewrite the vague user question into a more specific search query."
            )
        )
        self.rewrite.set_lm(dspy.settings.lm)

    def forward(self, question):
        rewritten_query = self.rewrite(question=question).rewritten_query
        retrieved_docs = search(rewritten_query, k=3)
        return dspy.Prediction(rewritten_query=rewritten_query, retrieved_docs=retrieved_docs)


## Defining the Reward Function

For GRPO to work effectively, we need to define a reward function that evaluates the performance of our agent. This function will determine how well the agent is doing and guide the optimization process.

In our case, the reward function checks whether any of the retrieved documents contains the expected answer. This binary reward (0 or 1) indicates whether the agent found the information needed to answer the user's question.

Exact substring matching would be brittle to small formatting differences, so a document counts as a hit when it contains more than 75% of the answer's unique tokens.

In [209]:
import re
# Reward Function
def contains_answer(example, pred, trace=None):
    docs = [doc.lower() for doc in pred.retrieved_docs]
    answers = [ans.lower() for ans in example.answers]

    def normalize(text):
        return re.sub(r"[^a-z0-9]", " ", text.lower()).split()

    for answer in answers:
        answer_tokens = set(normalize(answer))
        if not answer_tokens:  # skip answers with no alphanumeric tokens
            continue
        for doc in docs:
            doc_tokens = set(normalize(doc))
            if len(answer_tokens & doc_tokens) / len(answer_tokens) > 0.75:  # more than 75% token overlap
                return 1.0
    return 0.0

# Recall Score
def recall_score(example, pred, trace=None):
    print("QUESTION:", example.inputs())
    print("ANSWERS:", example.answers)
    print("RETRIEVED:", pred.retrieved_docs)
    predictions = [doc.lower() for doc in pred.retrieved_docs]
    labels = [answer.lower() for answer in example.answers]
    if not labels:
        return 0.0
    hits = sum(any(label in doc for doc in predictions) for label in labels)
    return hits / len(labels)
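The `contains_answer` function above awards a reward of 1.0 when more than 75% of an answer's unique tokens appear in a retrieved document. The sketch below isolates that overlap calculation so the threshold is easier to reason about. The helper name `token_overlap` and the example strings are illustrative and not part of the notebook's data.

```python
import re


def token_overlap(answer: str, doc: str) -> float:
    """Fraction of the answer's unique tokens that also appear in the document."""
    def normalize(text):
        return set(re.sub(r"[^a-z0-9]", " ", text.lower()).split())

    answer_tokens = normalize(answer)
    if not answer_tokens:  # nothing to match, so no credit
        return 0.0
    return len(answer_tokens & normalize(doc)) / len(answer_tokens)


answer = "GPT-3 has 175 billion parameters."
doc_hit = "We train GPT-3, an autoregressive model that has 175 billion parameters."
doc_miss = "GPT-2 has 1.5 billion parameters."

print(token_overlap(answer, doc_hit))   # every answer token is present
print(token_overlap(answer, doc_miss))  # only 4 of the 6 answer tokens are present
```

The second document shares four of the six answer tokens (`gpt`, `has`, `billion`, `parameters`), which is about 67% and falls below the 75% threshold, so it would earn no reward. Long answers are harder to match this way, which is worth keeping in mind when reading the scores.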

## Evaluating the Baseline Agent

Before optimizing our agent, we need to establish a baseline performance. This will help us measure the improvement achieved through GRPO.

We'll use DSPy's evaluation framework to test our agent on the validation set. The evaluation will:
1. Run the agent on each example in the validation set
2. Apply our reward function to measure performance
3. Calculate the average reward across all examples

This baseline score will serve as our reference point for improvement.
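The arithmetic behind the reported score is simple, and the sketch below shows only that part. It assumes the score is the mean reward expressed as a percentage, which matches the log format `14.0 / 50 (28.0%)`. The real `dspy.Evaluate` also handles threading, error handling, and progress display. All names here are hypothetical stand-ins.

```python
def average_metric(program, metric, examples):
    """Run `program` on each example and return the mean metric as a percentage."""
    total = 0.0
    for example in examples:
        prediction = program(example)
        total += metric(example, prediction)
    return 100.0 * total / len(examples)


def toy_program(question):
    # Stand-in for the agent: simply upper-cases the question.
    return question.upper()


def toy_metric(question, prediction):
    # Stand-in for the reward: 1.0 if the prediction mentions GPT, else 0.0.
    return 1.0 if "GPT" in prediction else 0.0


toy_examples = ["what is gpt-3", "what is bert", "explain gpt-2", "what is t5"]
print(average_metric(toy_program, toy_metric, toy_examples))
```

With a binary reward, the score is just the percentage of examples for which retrieval succeeded, so on a 50-example validation set each example moves the score by 2 points.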

In [210]:
# Baseline Eval
program = QueryRewriter()
evaluate = dspy.Evaluate(devset=devset, metric=contains_answer, num_threads=4, display_progress=True)
baseline_result = evaluate(program)

print(f"\nBaseline Performance: {baseline_result:.4f}")

Average Metric: 14.00 / 50 (28.0%): 100%|██████████| 50/50 [00:00<00:00, 759.95it/s]

2025/05/07 18:28:03 INFO dspy.evaluate.evaluate: Average Metric: 14.0 / 50 (28.0%)




Baseline Performance: 28.0000


## Optimizing with GRPO

Now that we have our baseline agent and evaluation metric, we can apply GRPO to optimize the agent's performance. GRPO works by:

1. Sampling a group of outputs from the agent for each input
2. Scoring each output with our reward function
3. Comparing each output's reward with the rest of its group, then updating the model's parameters through reinforcement learning to favor the above-average outputs

The key parameters for GRPO include:
- `update_interval`: How many training steps pass between model weight updates (every 3 steps in this run)
- `num_samples_per_input`: How many different outputs to sample for each input
- `num_train_steps`: Total number of training steps
- `beta`: Strength of the penalty that keeps the fine-tuned model close to the original model, trading off against reward optimization

We'll configure these parameters and run the optimization process.
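The "group relative" part of GRPO can be sketched in a few lines: each sample's reward is compared with the mean of its own group and, optionally, divided by the group's standard deviation. This is a simplified illustration, not the code that the training backend runs. The choice of population standard deviation, the `eps` value, and the function name are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one group's rewards to zero mean and roughly unit variance."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight sampled rewrites for one question; two of them retrieved the answer.
mixed = group_relative_advantages([1, 0, 0, 0, 1, 0, 0, 0])
# A group where every sample failed carries no learning signal.
flat = group_relative_advantages([0, 0, 0, 0, 0, 0, 0, 0])
print([round(a, 2) for a in mixed])
print(flat)
```

With a binary reward, a group in which every sample succeeds or every sample fails produces all-zero advantages and therefore no gradient signal. Training steps where all rollouts score 0 contribute little for this reason, and it is one motivation for sampling several outputs per input.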

In [211]:
from dspy.teleprompt.grpo import GRPO

# Configure GRPO parameters
train_kwargs = {
    "update_interval": 3,
    "per_device_train_batch_size": 2,  # small per-device batch to fit in memory
    "gradient_accumulation_steps": 8,  # accumulate gradients to keep a larger effective batch
    "temperature": 0.7,
    "beta": 0.04,
    "learning_rate": 1e-5,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
    "bf16": True,
    "lr_scheduler_type": "constant_with_warmup",
    "max_prompt_length": 512,  # cap the prompt length in tokens
    "max_completion_length": 128,
    "scale_rewards": True,
    "max_grad_norm": 0.5,
    "lora": True,
}

# Initialize the GRPO compiler
compiler = GRPO(
    metric=contains_answer,
    multitask=True,
    num_dspy_examples_per_grpo_step=4,
    num_samples_per_input=8,
    exclude_demos=True,
    num_train_steps=100,
    num_threads=24,
    use_train_as_val=False,
    num_steps_for_val=10,
    train_kwargs=train_kwargs,
    report_train_scores=False,
)

print("Starting GRPO optimization. This may take some time...")
optimized_program = compiler.compile(student=program, trainset=trainset, valset=devset)
print("Optimization complete!")

2025/05/07 18:28:05 INFO dspy.teleprompt.grpo: Validating the inputs...
2025/05/07 18:28:05 INFO dspy.teleprompt.grpo: Preparing the student program...
2025/05/07 18:28:05 INFO dspy.teleprompt.grpo: Preparing the GRPO training job(s)...


Starting GRPO optimization. This may take some time...


2025/05/07 18:29:08 INFO dspy.teleprompt.grpo: Using user provided validation set and not reporting train scores.
2025/05/07 18:29:08 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set before training loop...


Average Metric: 0.00 / 9 (0.0%):  16%|█▌        | 8/50 [00:26<10:42, 15.30s/it]



Average Metric: 5.00 / 24 (20.8%):  46%|████▌     | 23/50 [00:26<00:22,  1.21it/s]

2025/05/07 18:29:41 ERROR dspy.utils.parallelizer: Error for Example({'question': 'summarize the limitations of current self-supervised learning approaches as mentioned in the paper', 'answers': ['The limitations of current self-supervised learning approaches include: \n\n1. Equal weighting for all tokens in the pretraining objective, neglecting the importance of certain tokens.\n2. The reliance on forcing tasks into a prediction problem, which may not align with the goal-directed nature of useful language systems.\n3. A lack of grounding in other domains, such as video or physical interaction, which limits contextual understanding of the world.\n\nThese factors suggest that purely self-supervised prediction approaches may reach a limit, indicating the need for augmentation with alternative methods.']}) (input_keys={'question'}): ('Failed to parse response as per signature from original completion with input and num present and expected', '{\n\n\n}', StringSignature(question -> reasoni

Average Metric: 12.00 / 49 (24.5%): 100%|██████████| 50/50 [01:02<00:00,  1.26s/it]

2025/05/07 18:30:11 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 18:30:11 INFO dspy.teleprompt.grpo: Student program validation set score before training loop: 24.0
2025/05/07 18:30:11 INFO dspy.teleprompt.grpo: Starting the GRPO training loop...
2025/05/07 18:30:11 INFO dspy.teleprompt.grpo: GRPO training step 1/100...
2025/05/07 18:30:11 INFO dspy.teleprompt.grpo: Updating shuffled trainset for epoch 0...
2025/05/07 18:30:11 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:28<00:00,  1.11it/s]

2025/05/07 18:30:40 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:30:40 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:30:40 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:30:41 INFO dspy.teleprompt.grpo: GRPO training step 1/100 completed.
2025/05/07 18:30:41 INFO dspy.teleprompt.grpo: GRPO training step 2/100...
2025/05/07 18:30:41 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:38<00:00,  1.21s/it]

2025/05/07 18:31:19 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:31:19 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:31:19 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:31:20 INFO dspy.teleprompt.grpo: GRPO training step 2/100 completed.
2025/05/07 18:31:20 INFO dspy.teleprompt.grpo: GRPO training step 3/100...
2025/05/07 18:31:20 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 25.00 / 32 (78.1%): 100%|██████████| 32/32 [00:31<00:00,  1.01it/s]

2025/05/07 18:31:52 INFO dspy.evaluate.evaluate: Average Metric: 25.0 / 32 (78.1%)
2025/05/07 18:31:52 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:31:52 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:31:52 INFO dspy.teleprompt.grpo: Current train step is 3. Updating the model...
2025/05/07 18:33:27 INFO dspy.teleprompt.grpo: GRPO training step 3/100 completed.
2025/05/07 18:33:27 INFO dspy.teleprompt.grpo: GRPO training step 4/100...
2025/05/07 18:33:27 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 3.00 / 10 (30.0%):  28%|██▊       | 9/32 [00:33<04:37, 12.08s/it]



Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:52<00:00,  1.63s/it]

2025/05/07 18:34:20 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 18:34:20 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:34:20 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:34:20 INFO dspy.teleprompt.grpo: GRPO training step 4/100 completed.
2025/05/07 18:34:20 INFO dspy.teleprompt.grpo: GRPO training step 5/100...
2025/05/07 18:34:20 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:28<00:00,  1.12it/s]

2025/05/07 18:34:49 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:34:49 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:34:49 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:34:49 INFO dspy.teleprompt.grpo: GRPO training step 5/100 completed.
2025/05/07 18:34:49 INFO dspy.teleprompt.grpo: GRPO training step 6/100...
2025/05/07 18:34:49 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 23.00 / 32 (71.9%): 100%|██████████| 32/32 [00:19<00:00,  1.61it/s]

2025/05/07 18:35:09 INFO dspy.evaluate.evaluate: Average Metric: 23.0 / 32 (71.9%)
2025/05/07 18:35:09 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:35:09 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:35:09 INFO dspy.teleprompt.grpo: Current train step is 6. Updating the model...
2025/05/07 18:36:45 INFO dspy.teleprompt.grpo: GRPO training step 6/100 completed.
2025/05/07 18:36:45 INFO dspy.teleprompt.grpo: GRPO training step 7/100...
2025/05/07 18:36:45 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:34<00:00,  1.08s/it]

2025/05/07 18:37:19 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:37:19 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:37:19 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:37:20 INFO dspy.teleprompt.grpo: GRPO training step 7/100 completed.
2025/05/07 18:37:20 INFO dspy.teleprompt.grpo: GRPO training step 8/100...
2025/05/07 18:37:20 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 3.00 / 19 (15.8%):  59%|█████▉    | 19/32 [00:11<00:05,  2.23it/s]



Average Metric: 3.00 / 32 (9.4%): 100%|██████████| 32/32 [00:25<00:00,  1.25it/s] 

2025/05/07 18:37:45 INFO dspy.evaluate.evaluate: Average Metric: 3.0 / 32 (9.4%)
2025/05/07 18:37:45 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:37:45 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:37:46 INFO dspy.teleprompt.grpo: GRPO training step 8/100 completed.
2025/05/07 18:37:46 INFO dspy.teleprompt.grpo: GRPO training step 9/100...
2025/05/07 18:37:46 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 9.00 / 20 (45.0%):  62%|██████▎   | 20/32 [00:29<00:13,  1.15s/it]



Average Metric: 12.00 / 32 (37.5%): 100%|██████████| 32/32 [00:53<00:00,  1.67s/it]

2025/05/07 18:38:39 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 32 (37.5%)
2025/05/07 18:38:39 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:38:39 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:38:40 INFO dspy.teleprompt.grpo: Current train step is 9. Updating the model...
2025/05/07 18:40:10 INFO dspy.teleprompt.grpo: GRPO training step 9/100 completed.
2025/05/07 18:40:10 INFO dspy.teleprompt.grpo: GRPO training step 10/100...
2025/05/07 18:40:10 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:32<00:00,  1.00s/it]

2025/05/07 18:40:42 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:40:42 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:40:42 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:40:42 INFO dspy.teleprompt.grpo: GRPO training step 10/100 completed.
2025/05/07 18:40:42 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 10/100


Average Metric: 11.00 / 50 (22.0%): 100%|██████████| 50/50 [00:57<00:00,  1.15s/it]

2025/05/07 18:41:40 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 50 (22.0%)
2025/05/07 18:41:40 INFO dspy.teleprompt.grpo: Student program validation set score after training step 10/100: 22.0
2025/05/07 18:41:40 INFO dspy.teleprompt.grpo: GRPO training step 11/100...
2025/05/07 18:41:40 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:16<00:00,  1.94it/s]

2025/05/07 18:41:57 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:41:57 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:41:57 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:41:57 INFO dspy.teleprompt.grpo: GRPO training step 11/100 completed.
2025/05/07 18:41:57 INFO dspy.teleprompt.grpo: GRPO training step 12/100...
2025/05/07 18:41:57 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 10.00 / 32 (31.2%): 100%|██████████| 32/32 [00:20<00:00,  1.59it/s]

2025/05/07 18:42:17 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 32 (31.2%)
2025/05/07 18:42:17 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:42:17 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:42:18 INFO dspy.teleprompt.grpo: Current train step is 12. Updating the model...
2025/05/07 18:43:48 INFO dspy.teleprompt.grpo: GRPO training step 12/100 completed.
2025/05/07 18:43:48 INFO dspy.teleprompt.grpo: GRPO training step 13/100...
2025/05/07 18:43:48 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:49<00:00,  1.55s/it]

2025/05/07 18:44:38 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:44:38 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:44:38 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:44:38 INFO dspy.teleprompt.grpo: GRPO training step 13/100 completed.
2025/05/07 18:44:38 INFO dspy.teleprompt.grpo: GRPO training step 14/100...
2025/05/07 18:44:38 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:39<00:00,  1.24s/it]

2025/05/07 18:45:18 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:45:18 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:45:18 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:45:19 INFO dspy.teleprompt.grpo: GRPO training step 14/100 completed.
2025/05/07 18:45:19 INFO dspy.teleprompt.grpo: GRPO training step 15/100...
2025/05/07 18:45:19 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:25<00:00,  1.27it/s]

2025/05/07 18:45:44 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 18:45:44 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:45:44 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:45:44 INFO dspy.teleprompt.grpo: Current train step is 15. Updating the model...
2025/05/07 18:47:19 INFO dspy.teleprompt.grpo: GRPO training step 15/100 completed.
2025/05/07 18:47:19 INFO dspy.teleprompt.grpo: GRPO training step 16/100...
2025/05/07 18:47:19 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 3.00 / 32 (9.4%): 100%|██████████| 32/32 [00:54<00:00,  1.69s/it] 

2025/05/07 18:48:14 INFO dspy.evaluate.evaluate: Average Metric: 3.0 / 32 (9.4%)
2025/05/07 18:48:14 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:48:14 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:48:14 INFO dspy.teleprompt.grpo: GRPO training step 16/100 completed.
2025/05/07 18:48:14 INFO dspy.teleprompt.grpo: GRPO training step 17/100...
2025/05/07 18:48:14 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 13.00 / 32 (40.6%): 100%|██████████| 32/32 [00:27<00:00,  1.14it/s]

2025/05/07 18:48:42 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 32 (40.6%)
2025/05/07 18:48:42 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:48:42 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:48:43 INFO dspy.teleprompt.grpo: GRPO training step 17/100 completed.
2025/05/07 18:48:43 INFO dspy.teleprompt.grpo: GRPO training step 18/100...
2025/05/07 18:48:43 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 16.00 / 32 (50.0%): 100%|██████████| 32/32 [00:36<00:00,  1.14s/it]

2025/05/07 18:49:19 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 32 (50.0%)
2025/05/07 18:49:19 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:49:19 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:49:19 INFO dspy.teleprompt.grpo: Current train step is 18. Updating the model...
2025/05/07 18:50:50 INFO dspy.teleprompt.grpo: GRPO training step 18/100 completed.
2025/05/07 18:50:50 INFO dspy.teleprompt.grpo: GRPO training step 19/100...
2025/05/07 18:50:50 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 15.00 / 32 (46.9%): 100%|██████████| 32/32 [00:45<00:00,  1.43s/it]

2025/05/07 18:51:36 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 32 (46.9%)
2025/05/07 18:51:36 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:51:36 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:51:36 INFO dspy.teleprompt.grpo: GRPO training step 19/100 completed.
2025/05/07 18:51:36 INFO dspy.teleprompt.grpo: GRPO training step 20/100...
2025/05/07 18:51:36 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:29<00:00,  1.09it/s]

2025/05/07 18:52:05 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 18:52:05 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:52:05 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:52:06 INFO dspy.teleprompt.grpo: GRPO training step 20/100 completed.
2025/05/07 18:52:06 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 20/100


Average Metric: 12.00 / 50 (24.0%): 100%|██████████| 50/50 [00:58<00:00,  1.17s/it]

2025/05/07 18:53:04 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 18:53:04 INFO dspy.teleprompt.grpo: Student program validation set score after training step 20/100: 24.0
2025/05/07 18:53:04 INFO dspy.teleprompt.grpo: GRPO training step 21/100...
2025/05/07 18:53:04 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 15.00 / 32 (46.9%): 100%|██████████| 32/32 [00:39<00:00,  1.25s/it]

2025/05/07 18:53:44 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 32 (46.9%)
2025/05/07 18:53:44 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:53:44 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:53:45 INFO dspy.teleprompt.grpo: Current train step is 21. Updating the model...
2025/05/07 18:55:20 INFO dspy.teleprompt.grpo: GRPO training step 21/100 completed.
2025/05/07 18:55:20 INFO dspy.teleprompt.grpo: GRPO training step 22/100...
2025/05/07 18:55:20 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 4.00 / 32 (12.5%): 100%|██████████| 32/32 [00:21<00:00,  1.51it/s]

2025/05/07 18:55:41 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 32 (12.5%)
2025/05/07 18:55:41 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:55:41 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:55:42 INFO dspy.teleprompt.grpo: GRPO training step 22/100 completed.
2025/05/07 18:55:42 INFO dspy.teleprompt.grpo: GRPO training step 23/100...
2025/05/07 18:55:42 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:25<00:00,  1.28it/s]

2025/05/07 18:56:07 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:56:07 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:56:07 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:56:07 INFO dspy.teleprompt.grpo: GRPO training step 23/100 completed.
2025/05/07 18:56:07 INFO dspy.teleprompt.grpo: GRPO training step 24/100...
2025/05/07 18:56:07 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 1.00 / 32 (3.1%): 100%|██████████| 32/32 [00:19<00:00,  1.68it/s]

2025/05/07 18:56:26 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 32 (3.1%)
2025/05/07 18:56:26 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:56:26 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:56:27 INFO dspy.teleprompt.grpo: Current train step is 24. Updating the model...
2025/05/07 18:58:02 INFO dspy.teleprompt.grpo: GRPO training step 24/100 completed.
2025/05/07 18:58:02 INFO dspy.teleprompt.grpo: GRPO training step 25/100...
2025/05/07 18:58:02 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:20<00:00,  1.55it/s]

2025/05/07 18:58:23 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:58:23 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:58:23 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:58:23 INFO dspy.teleprompt.grpo: GRPO training step 25/100 completed.
2025/05/07 18:58:23 INFO dspy.teleprompt.grpo: GRPO training step 26/100...
2025/05/07 18:58:23 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:18<00:00,  1.70it/s]

2025/05/07 18:58:42 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: GRPO training step 26/100 completed.
2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: GRPO training step 27/100...
2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: Updating shuffled trainset for epoch 1...
2025/05/07 18:58:42 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 5.00 / 32 (15.6%): 100%|██████████| 32/32 [00:36<00:00,  1.13s/it]

2025/05/07 18:59:19 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 32 (15.6%)
2025/05/07 18:59:19 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 18:59:19 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 18:59:19 INFO dspy.teleprompt.grpo: Current train step is 27. Updating the model...
2025/05/07 19:01:04 INFO dspy.teleprompt.grpo: GRPO training step 27/100 completed.
2025/05/07 19:01:04 INFO dspy.teleprompt.grpo: GRPO training step 28/100...
2025/05/07 19:01:04 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:42<00:00,  1.33s/it]

2025/05/07 19:01:47 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:01:47 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:01:47 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:01:47 INFO dspy.teleprompt.grpo: GRPO training step 28/100 completed.
2025/05/07 19:01:47 INFO dspy.teleprompt.grpo: GRPO training step 29/100...
2025/05/07 19:01:47 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:38<00:00,  1.19s/it]

2025/05/07 19:02:26 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:02:26 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:02:26 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:02:26 INFO dspy.teleprompt.grpo: GRPO training step 29/100 completed.
2025/05/07 19:02:26 INFO dspy.teleprompt.grpo: GRPO training step 30/100...
2025/05/07 19:02:26 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 15.00 / 32 (46.9%): 100%|██████████| 32/32 [00:34<00:00,  1.08s/it]

2025/05/07 19:03:01 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 32 (46.9%)
2025/05/07 19:03:01 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:03:01 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:03:01 INFO dspy.teleprompt.grpo: Current train step is 30. Updating the model...
2025/05/07 19:04:32 INFO dspy.teleprompt.grpo: GRPO training step 30/100 completed.
2025/05/07 19:04:32 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 30/100


Average Metric: 12.00 / 50 (24.0%): 100%|██████████| 50/50 [00:56<00:00,  1.14s/it]

2025/05/07 19:05:29 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 19:05:29 INFO dspy.teleprompt.grpo: Student program validation set score after training step 30/100: 24.0
2025/05/07 19:05:29 INFO dspy.teleprompt.grpo: GRPO training step 31/100...
2025/05/07 19:05:29 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:41<00:00,  1.29s/it]

2025/05/07 19:06:10 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:06:10 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:06:10 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:06:10 INFO dspy.teleprompt.grpo: GRPO training step 31/100 completed.
2025/05/07 19:06:10 INFO dspy.teleprompt.grpo: GRPO training step 32/100...
2025/05/07 19:06:10 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:24<00:00,  1.32it/s]

2025/05/07 19:06:35 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:06:35 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:06:35 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:06:35 INFO dspy.teleprompt.grpo: GRPO training step 32/100 completed.
2025/05/07 19:06:35 INFO dspy.teleprompt.grpo: GRPO training step 33/100...
2025/05/07 19:06:35 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:28<00:00,  1.14it/s]

2025/05/07 19:07:03 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:07:03 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:07:03 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:07:04 INFO dspy.teleprompt.grpo: Current train step is 33. Updating the model...
2025/05/07 19:08:39 INFO dspy.teleprompt.grpo: GRPO training step 33/100 completed.
2025/05/07 19:08:39 INFO dspy.teleprompt.grpo: GRPO training step 34/100...
2025/05/07 19:08:39 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 10.00 / 32 (31.2%): 100%|██████████| 32/32 [00:26<00:00,  1.20it/s]

2025/05/07 19:09:06 INFO dspy.evaluate.evaluate: Average Metric: 10.0 / 32 (31.2%)
2025/05/07 19:09:06 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:09:06 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:09:06 INFO dspy.teleprompt.grpo: GRPO training step 34/100 completed.
2025/05/07 19:09:06 INFO dspy.teleprompt.grpo: GRPO training step 35/100...
2025/05/07 19:09:06 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 11.00 / 32 (34.4%): 100%|██████████| 32/32 [00:39<00:00,  1.24s/it]

2025/05/07 19:09:46 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 32 (34.4%)
2025/05/07 19:09:46 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:09:46 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:09:46 INFO dspy.teleprompt.grpo: GRPO training step 35/100 completed.
2025/05/07 19:09:46 INFO dspy.teleprompt.grpo: GRPO training step 36/100...
2025/05/07 19:09:46 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:27<00:00,  1.18it/s]

2025/05/07 19:10:13 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:10:13 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:10:13 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:10:14 INFO dspy.teleprompt.grpo: Current train step is 36. Updating the model...
2025/05/07 19:11:49 INFO dspy.teleprompt.grpo: GRPO training step 36/100 completed.
2025/05/07 19:11:49 INFO dspy.teleprompt.grpo: GRPO training step 37/100...
2025/05/07 19:11:49 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 6.00 / 32 (18.8%): 100%|██████████| 32/32 [00:42<00:00,  1.32s/it]

2025/05/07 19:12:31 INFO dspy.evaluate.evaluate: Average Metric: 6.0 / 32 (18.8%)
2025/05/07 19:12:31 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:12:31 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:12:32 INFO dspy.teleprompt.grpo: GRPO training step 37/100 completed.
2025/05/07 19:12:32 INFO dspy.teleprompt.grpo: GRPO training step 38/100...
2025/05/07 19:12:32 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:36<00:00,  1.13s/it]

2025/05/07 19:13:08 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:13:08 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:13:08 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:13:09 INFO dspy.teleprompt.grpo: GRPO training step 38/100 completed.
2025/05/07 19:13:09 INFO dspy.teleprompt.grpo: GRPO training step 39/100...
2025/05/07 19:13:09 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 4.00 / 32 (12.5%): 100%|██████████| 32/32 [00:15<00:00,  2.11it/s]

2025/05/07 19:13:24 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 32 (12.5%)
2025/05/07 19:13:24 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:13:24 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:13:24 INFO dspy.teleprompt.grpo: Current train step is 39. Updating the model...
2025/05/07 19:14:59 INFO dspy.teleprompt.grpo: GRPO training step 39/100 completed.
2025/05/07 19:14:59 INFO dspy.teleprompt.grpo: GRPO training step 40/100...
2025/05/07 19:14:59 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:38<00:00,  1.21s/it]

2025/05/07 19:15:38 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:15:38 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:15:38 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:15:39 INFO dspy.teleprompt.grpo: GRPO training step 40/100 completed.
2025/05/07 19:15:39 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 40/100


Average Metric: 12.00 / 50 (24.0%): 100%|██████████| 50/50 [00:53<00:00,  1.07s/it]

2025/05/07 19:16:32 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 19:16:32 INFO dspy.teleprompt.grpo: Student program validation set score after training step 40/100: 24.0
2025/05/07 19:16:32 INFO dspy.teleprompt.grpo: GRPO training step 41/100...
2025/05/07 19:16:32 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 24.00 / 32 (75.0%): 100%|██████████| 32/32 [00:34<00:00,  1.07s/it] 

2025/05/07 19:17:06 INFO dspy.evaluate.evaluate: Average Metric: 24.0 / 32 (75.0%)
2025/05/07 19:17:06 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:17:06 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:17:07 INFO dspy.teleprompt.grpo: GRPO training step 41/100 completed.
2025/05/07 19:17:07 INFO dspy.teleprompt.grpo: GRPO training step 42/100...
2025/05/07 19:17:07 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:46<00:00,  1.45s/it]

2025/05/07 19:17:53 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 19:17:53 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:17:53 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:17:53 INFO dspy.teleprompt.grpo: Current train step is 42. Updating the model...
2025/05/07 19:19:29 INFO dspy.teleprompt.grpo: GRPO training step 42/100 completed.
2025/05/07 19:19:29 INFO dspy.teleprompt.grpo: GRPO training step 43/100...
2025/05/07 19:19:29 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 11.00 / 32 (34.4%): 100%|██████████| 32/32 [00:30<00:00,  1.06it/s]

2025/05/07 19:19:59 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 32 (34.4%)
2025/05/07 19:19:59 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:19:59 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:19:59 INFO dspy.teleprompt.grpo: GRPO training step 43/100 completed.
2025/05/07 19:19:59 INFO dspy.teleprompt.grpo: GRPO training step 44/100...
2025/05/07 19:19:59 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 15.00 / 32 (46.9%): 100%|██████████| 32/32 [00:30<00:00,  1.06it/s]

2025/05/07 19:20:30 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 32 (46.9%)
2025/05/07 19:20:30 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:20:30 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:20:30 INFO dspy.teleprompt.grpo: GRPO training step 44/100 completed.
2025/05/07 19:20:30 INFO dspy.teleprompt.grpo: GRPO training step 45/100...
2025/05/07 19:20:30 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:42<00:00,  1.32s/it]

2025/05/07 19:21:12 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:21:12 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:21:12 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:21:13 INFO dspy.teleprompt.grpo: Current train step is 45. Updating the model...
2025/05/07 19:22:58 INFO dspy.teleprompt.grpo: GRPO training step 45/100 completed.
2025/05/07 19:22:58 INFO dspy.teleprompt.grpo: GRPO training step 46/100...
2025/05/07 19:22:58 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:35<00:00,  1.11s/it]

2025/05/07 19:23:34 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:23:34 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:23:34 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:23:34 INFO dspy.teleprompt.grpo: GRPO training step 46/100 completed.
2025/05/07 19:23:34 INFO dspy.teleprompt.grpo: GRPO training step 47/100...
2025/05/07 19:23:34 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:31<00:00,  1.03it/s]

2025/05/07 19:24:05 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:24:05 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:24:05 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:24:06 INFO dspy.teleprompt.grpo: GRPO training step 47/100 completed.
2025/05/07 19:24:06 INFO dspy.teleprompt.grpo: GRPO training step 48/100...
2025/05/07 19:24:06 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:37<00:00,  1.17s/it]

2025/05/07 19:24:43 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:24:43 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:24:43 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:24:44 INFO dspy.teleprompt.grpo: Current train step is 48. Updating the model...
2025/05/07 19:26:19 INFO dspy.teleprompt.grpo: GRPO training step 48/100 completed.
2025/05/07 19:26:19 INFO dspy.teleprompt.grpo: GRPO training step 49/100...
2025/05/07 19:26:19 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:36<00:00,  1.14s/it]

2025/05/07 19:26:55 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:26:55 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:26:55 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:26:56 INFO dspy.teleprompt.grpo: GRPO training step 49/100 completed.
2025/05/07 19:26:56 INFO dspy.teleprompt.grpo: GRPO training step 50/100...
2025/05/07 19:26:56 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 24.00 / 32 (75.0%): 100%|██████████| 32/32 [00:18<00:00,  1.69it/s]

2025/05/07 19:27:15 INFO dspy.evaluate.evaluate: Average Metric: 24.0 / 32 (75.0%)
2025/05/07 19:27:15 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:27:15 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:27:15 INFO dspy.teleprompt.grpo: GRPO training step 50/100 completed.
2025/05/07 19:27:15 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 50/100


Average Metric: 12.00 / 50 (24.0%): 100%|██████████| 50/50 [01:14<00:00,  1.49s/it]

2025/05/07 19:28:30 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 19:28:30 INFO dspy.teleprompt.grpo: Student program validation set score after training step 50/100: 24.0
2025/05/07 19:28:30 INFO dspy.teleprompt.grpo: GRPO training step 51/100...
2025/05/07 19:28:30 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 14.00 / 32 (43.8%): 100%|██████████| 32/32 [00:32<00:00,  1.03s/it]

2025/05/07 19:29:03 INFO dspy.evaluate.evaluate: Average Metric: 14.0 / 32 (43.8%)
2025/05/07 19:29:03 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:29:03 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:29:04 INFO dspy.teleprompt.grpo: Current train step is 51. Updating the model...
2025/05/07 19:30:39 INFO dspy.teleprompt.grpo: GRPO training step 51/100 completed.
2025/05/07 19:30:39 INFO dspy.teleprompt.grpo: GRPO training step 52/100...
2025/05/07 19:30:39 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 1.00 / 32 (3.1%): 100%|██████████| 32/32 [00:27<00:00,  1.16it/s]

2025/05/07 19:31:06 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 32 (3.1%)
2025/05/07 19:31:06 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:31:06 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:31:07 INFO dspy.teleprompt.grpo: GRPO training step 52/100 completed.
2025/05/07 19:31:07 INFO dspy.teleprompt.grpo: GRPO training step 53/100...
2025/05/07 19:31:07 INFO dspy.teleprompt.grpo: Updating shuffled trainset for epoch 2...
2025/05/07 19:31:07 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:44<00:00,  1.38s/it]

2025/05/07 19:31:51 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:31:51 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:31:51 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:31:52 INFO dspy.teleprompt.grpo: GRPO training step 53/100 completed.
2025/05/07 19:31:52 INFO dspy.teleprompt.grpo: GRPO training step 54/100...
2025/05/07 19:31:52 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 23.00 / 32 (71.9%): 100%|██████████| 32/32 [00:22<00:00,  1.42it/s]

2025/05/07 19:32:14 INFO dspy.evaluate.evaluate: Average Metric: 23.0 / 32 (71.9%)
2025/05/07 19:32:14 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:32:14 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:32:15 INFO dspy.teleprompt.grpo: Current train step is 54. Updating the model...
2025/05/07 19:33:45 INFO dspy.teleprompt.grpo: GRPO training step 54/100 completed.
2025/05/07 19:33:45 INFO dspy.teleprompt.grpo: GRPO training step 55/100...
2025/05/07 19:33:45 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 13.00 / 32 (40.6%): 100%|██████████| 32/32 [01:07<00:00,  2.12s/it]

2025/05/07 19:34:53 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 32 (40.6%)
2025/05/07 19:34:53 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:34:53 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:34:53 INFO dspy.teleprompt.grpo: GRPO training step 55/100 completed.
2025/05/07 19:34:53 INFO dspy.teleprompt.grpo: GRPO training step 56/100...
2025/05/07 19:34:53 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:44<00:00,  1.38s/it]

2025/05/07 19:35:38 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:35:38 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:35:38 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:35:38 INFO dspy.teleprompt.grpo: GRPO training step 56/100 completed.
2025/05/07 19:35:38 INFO dspy.teleprompt.grpo: GRPO training step 57/100...
2025/05/07 19:35:38 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 19.00 / 32 (59.4%): 100%|██████████| 32/32 [00:33<00:00,  1.05s/it]

2025/05/07 19:36:12 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 32 (59.4%)
2025/05/07 19:36:12 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:36:12 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:36:12 INFO dspy.teleprompt.grpo: Current train step is 57. Updating the model...
2025/05/07 19:37:48 INFO dspy.teleprompt.grpo: GRPO training step 57/100 completed.
2025/05/07 19:37:48 INFO dspy.teleprompt.grpo: GRPO training step 58/100...
2025/05/07 19:37:48 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 19.00 / 32 (59.4%): 100%|██████████| 32/32 [00:23<00:00,  1.35it/s]

2025/05/07 19:38:11 INFO dspy.evaluate.evaluate: Average Metric: 19.0 / 32 (59.4%)
2025/05/07 19:38:11 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:38:11 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:38:12 INFO dspy.teleprompt.grpo: GRPO training step 58/100 completed.
2025/05/07 19:38:12 INFO dspy.teleprompt.grpo: GRPO training step 59/100...
2025/05/07 19:38:12 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:26<00:00,  1.22it/s]

2025/05/07 19:38:38 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:38:38 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:38:38 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:38:38 INFO dspy.teleprompt.grpo: GRPO training step 59/100 completed.
2025/05/07 19:38:38 INFO dspy.teleprompt.grpo: GRPO training step 60/100...
2025/05/07 19:38:38 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 1.00 / 32 (3.1%): 100%|██████████| 32/32 [00:18<00:00,  1.76it/s]

2025/05/07 19:38:57 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 32 (3.1%)
2025/05/07 19:38:57 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:38:57 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:38:57 INFO dspy.teleprompt.grpo: Current train step is 60. Updating the model...
2025/05/07 19:40:32 INFO dspy.teleprompt.grpo: GRPO training step 60/100 completed.
2025/05/07 19:40:32 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 60/100


2025/05/07 19:41:33 ERROR dspy.utils.parallelizer: Error for Example({'question': 'analyze the implications of biases related to gender, race, and religion in GPT-3 as discussed in this paper', 'answers': ['The document discusses the implications of biases in GPT-3 related to gender, race, and religion, noting that such biases can lead to the generation of stereotyped or prejudiced content. This raises concerns as it may harm individuals within these groups by entrenching existing stereotypes and producing demeaning portrayals. The analysis conducted aims to understand some limitations of GPT-3 concerning fairness, bias, and representation, but acknowledges that it does not exhaustively characterize all biases present, suggesting that further research is needed to study these issues comprehensively.']}) (input_keys={'question'}): ('Failed to parse response as per signature from original completion with input and num present and expected', '{\n\n\n}', StringSignature(question -> reasoni

Average Metric: 11.00 / 49 (22.4%): 100%|██████████| 50/50 [01:03<00:00,  1.26s/it]

2025/05/07 19:41:35 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 50 (22.0%)
2025/05/07 19:41:35 INFO dspy.teleprompt.grpo: Student program validation set score after training step 60/100: 22.0
2025/05/07 19:41:35 INFO dspy.teleprompt.grpo: GRPO training step 61/100...
2025/05/07 19:41:35 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:34<00:00,  1.07s/it]

2025/05/07 19:42:10 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:42:10 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:42:10 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:42:10 INFO dspy.teleprompt.grpo: GRPO training step 61/100 completed.
2025/05/07 19:42:10 INFO dspy.teleprompt.grpo: GRPO training step 62/100...
2025/05/07 19:42:10 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:28<00:00,  1.11it/s]

2025/05/07 19:42:39 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:42:39 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:42:39 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:42:39 INFO dspy.teleprompt.grpo: GRPO training step 62/100 completed.
2025/05/07 19:42:39 INFO dspy.teleprompt.grpo: GRPO training step 63/100...
2025/05/07 19:42:39 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:35<00:00,  1.09s/it]

2025/05/07 19:43:14 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 19:43:14 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:43:15 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:43:15 INFO dspy.teleprompt.grpo: Current train step is 63. Updating the model...
2025/05/07 19:45:00 INFO dspy.teleprompt.grpo: GRPO training step 63/100 completed.
2025/05/07 19:45:00 INFO dspy.teleprompt.grpo: GRPO training step 64/100...
2025/05/07 19:45:00 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:40<00:00,  1.28s/it]

2025/05/07 19:45:41 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:45:41 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:45:41 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:45:42 INFO dspy.teleprompt.grpo: GRPO training step 64/100 completed.
2025/05/07 19:45:42 INFO dspy.teleprompt.grpo: GRPO training step 65/100...
2025/05/07 19:45:42 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 24.00 / 32 (75.0%): 100%|██████████| 32/32 [00:20<00:00,  1.59it/s]

2025/05/07 19:46:02 INFO dspy.evaluate.evaluate: Average Metric: 24.0 / 32 (75.0%)
2025/05/07 19:46:02 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:46:02 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:46:02 INFO dspy.teleprompt.grpo: GRPO training step 65/100 completed.
2025/05/07 19:46:02 INFO dspy.teleprompt.grpo: GRPO training step 66/100...
2025/05/07 19:46:02 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:21<00:00,  1.49it/s]

2025/05/07 19:46:24 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:46:24 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:46:24 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:46:24 INFO dspy.teleprompt.grpo: Current train step is 66. Updating the model...
2025/05/07 19:48:10 INFO dspy.teleprompt.grpo: GRPO training step 66/100 completed.
2025/05/07 19:48:10 INFO dspy.teleprompt.grpo: GRPO training step 67/100...
2025/05/07 19:48:10 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 12.00 / 32 (37.5%): 100%|██████████| 32/32 [00:42<00:00,  1.33s/it]

2025/05/07 19:48:52 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 32 (37.5%)
2025/05/07 19:48:52 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:48:52 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:48:53 INFO dspy.teleprompt.grpo: GRPO training step 67/100 completed.
2025/05/07 19:48:53 INFO dspy.teleprompt.grpo: GRPO training step 68/100...
2025/05/07 19:48:53 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 5.00 / 32 (15.6%): 100%|██████████| 32/32 [00:37<00:00,  1.17s/it]

2025/05/07 19:49:30 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 32 (15.6%)
2025/05/07 19:49:30 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:49:30 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:49:31 INFO dspy.teleprompt.grpo: GRPO training step 68/100 completed.
2025/05/07 19:49:31 INFO dspy.teleprompt.grpo: GRPO training step 69/100...
2025/05/07 19:49:31 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:39<00:00,  1.24s/it]

2025/05/07 19:50:10 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 19:50:10 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:50:10 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:50:11 INFO dspy.teleprompt.grpo: Current train step is 69. Updating the model...
2025/05/07 19:51:41 INFO dspy.teleprompt.grpo: GRPO training step 69/100 completed.
2025/05/07 19:51:41 INFO dspy.teleprompt.grpo: GRPO training step 70/100...
2025/05/07 19:51:41 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 16.00 / 32 (50.0%): 100%|██████████| 32/32 [00:33<00:00,  1.04s/it]

2025/05/07 19:52:14 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 32 (50.0%)
2025/05/07 19:52:14 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:52:14 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:52:15 INFO dspy.teleprompt.grpo: GRPO training step 70/100 completed.
2025/05/07 19:52:15 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 70/100


2025/05/07 19:53:16 ERROR dspy.utils.parallelizer: Error for Example({'question': 'summarize the key findings of the paper "Red teaming language models with language models" by Ethan Perez et al.', 'answers': ['I cannot answer this question based on the provided document.']}) (input_keys={'question'}): ('Failed to parse response as per signature from original completion with input and num present and expected', '{\n\n\n"Oh, I need to summarize the key findings of the paper \'Red teaming language models with language models\' by Ethan Perez et al. Let me think... The user is asking for a summary of the paper\'s key findings, so I should focus on the main results and conclusions. The paper likely discusses how Red Teaming improves the performance of language models. I need to make sure the query is specific enough to retrieve the paper\'s content. Maybe I should mention the authors, the title, and the main contribution. Wait, the user might be looking for a concise summary, so the query 

Average Metric: 13.00 / 49 (26.5%): 100%|██████████| 50/50 [01:10<00:00,  1.40s/it]

2025/05/07 19:53:25 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 50 (26.0%)
2025/05/07 19:53:25 INFO dspy.teleprompt.grpo: Student program validation set score after training step 70/100: 26.0
2025/05/07 19:53:25 INFO dspy.teleprompt.grpo: GRPO training step 71/100...
2025/05/07 19:53:25 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 5.00 / 32 (15.6%): 100%|██████████| 32/32 [00:42<00:00,  1.34s/it]

2025/05/07 19:54:08 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 32 (15.6%)
2025/05/07 19:54:08 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:54:08 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:54:08 INFO dspy.teleprompt.grpo: GRPO training step 71/100 completed.
2025/05/07 19:54:08 INFO dspy.teleprompt.grpo: GRPO training step 72/100...
2025/05/07 19:54:08 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:43<00:00,  1.35s/it]

2025/05/07 19:54:52 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 19:54:52 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:54:52 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:54:52 INFO dspy.teleprompt.grpo: Current train step is 72. Updating the model...
2025/05/07 19:56:22 INFO dspy.teleprompt.grpo: GRPO training step 72/100 completed.
2025/05/07 19:56:22 INFO dspy.teleprompt.grpo: GRPO training step 73/100...
2025/05/07 19:56:22 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 5.00 / 32 (15.6%): 100%|██████████| 32/32 [00:31<00:00,  1.01it/s]

2025/05/07 19:56:54 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 32 (15.6%)
2025/05/07 19:56:54 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:56:54 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:56:54 INFO dspy.teleprompt.grpo: GRPO training step 73/100 completed.
2025/05/07 19:56:54 INFO dspy.teleprompt.grpo: GRPO training step 74/100...
2025/05/07 19:56:54 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 11.00 / 32 (34.4%): 100%|██████████| 32/32 [00:52<00:00,  1.66s/it]

2025/05/07 19:57:47 INFO dspy.evaluate.evaluate: Average Metric: 11.0 / 32 (34.4%)
2025/05/07 19:57:47 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:57:47 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:57:48 INFO dspy.teleprompt.grpo: GRPO training step 74/100 completed.
2025/05/07 19:57:48 INFO dspy.teleprompt.grpo: GRPO training step 75/100...
2025/05/07 19:57:48 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 6.00 / 32 (18.8%): 100%|██████████| 32/32 [00:37<00:00,  1.16s/it]

2025/05/07 19:58:25 INFO dspy.evaluate.evaluate: Average Metric: 6.0 / 32 (18.8%)
2025/05/07 19:58:25 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 19:58:25 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 19:58:25 INFO dspy.teleprompt.grpo: Current train step is 75. Updating the model...
2025/05/07 19:59:56 INFO dspy.teleprompt.grpo: GRPO training step 75/100 completed.
2025/05/07 19:59:56 INFO dspy.teleprompt.grpo: GRPO training step 76/100...
2025/05/07 19:59:56 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:55<00:00,  1.72s/it]

2025/05/07 20:00:51 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:00:51 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:00:51 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:00:51 INFO dspy.teleprompt.grpo: GRPO training step 76/100 completed.
2025/05/07 20:00:51 INFO dspy.teleprompt.grpo: GRPO training step 77/100...
2025/05/07 20:00:51 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:33<00:00,  1.05s/it]

2025/05/07 20:01:25 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 20:01:25 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:01:25 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:01:25 INFO dspy.teleprompt.grpo: GRPO training step 77/100 completed.
2025/05/07 20:01:25 INFO dspy.teleprompt.grpo: GRPO training step 78/100...
2025/05/07 20:01:25 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:29<00:00,  1.07it/s]

2025/05/07 20:01:55 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 20:01:55 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:01:55 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:01:56 INFO dspy.teleprompt.grpo: Current train step is 78. Updating the model...
2025/05/07 20:03:31 INFO dspy.teleprompt.grpo: GRPO training step 78/100 completed.
2025/05/07 20:03:31 INFO dspy.teleprompt.grpo: GRPO training step 79/100...
2025/05/07 20:03:31 INFO dspy.teleprompt.grpo: Updating shuffled trainset for epoch 3...
2025/05/07 20:03:31 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [01:01<00:00,  1.91s/it]

2025/05/07 20:04:32 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:04:32 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:04:32 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:04:33 INFO dspy.teleprompt.grpo: GRPO training step 79/100 completed.
2025/05/07 20:04:33 INFO dspy.teleprompt.grpo: GRPO training step 80/100...
2025/05/07 20:04:33 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 2.00 / 32 (6.2%): 100%|██████████| 32/32 [00:46<00:00,  1.45s/it] 

2025/05/07 20:05:19 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 32 (6.2%)
2025/05/07 20:05:19 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:05:19 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:05:19 INFO dspy.teleprompt.grpo: GRPO training step 80/100 completed.
2025/05/07 20:05:19 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 80/100


Average Metric: 12.00 / 50 (24.0%): 100%|██████████| 50/50 [01:24<00:00,  1.70s/it]

2025/05/07 20:06:44 INFO dspy.evaluate.evaluate: Average Metric: 12.0 / 50 (24.0%)
2025/05/07 20:06:44 INFO dspy.teleprompt.grpo: Student program validation set score after training step 80/100: 24.0
2025/05/07 20:06:44 INFO dspy.teleprompt.grpo: GRPO training step 81/100...
2025/05/07 20:06:44 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 13.00 / 32 (40.6%): 100%|██████████| 32/32 [01:03<00:00,  1.99s/it]

2025/05/07 20:07:48 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 32 (40.6%)
2025/05/07 20:07:48 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:07:48 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:07:49 INFO dspy.teleprompt.grpo: Current train step is 81. Updating the model...
2025/05/07 20:09:24 INFO dspy.teleprompt.grpo: GRPO training step 81/100 completed.
2025/05/07 20:09:24 INFO dspy.teleprompt.grpo: GRPO training step 82/100...
2025/05/07 20:09:24 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 3.00 / 32 (9.4%): 100%|██████████| 32/32 [00:50<00:00,  1.57s/it] 

2025/05/07 20:10:14 INFO dspy.evaluate.evaluate: Average Metric: 3.0 / 32 (9.4%)
2025/05/07 20:10:14 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:10:14 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:10:15 INFO dspy.teleprompt.grpo: GRPO training step 82/100 completed.
2025/05/07 20:10:15 INFO dspy.teleprompt.grpo: GRPO training step 83/100...
2025/05/07 20:10:15 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:46<00:00,  1.46s/it]

2025/05/07 20:11:01 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:11:01 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:11:01 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:11:02 INFO dspy.teleprompt.grpo: GRPO training step 83/100 completed.
2025/05/07 20:11:02 INFO dspy.teleprompt.grpo: GRPO training step 84/100...
2025/05/07 20:11:02 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 16.00 / 32 (50.0%): 100%|██████████| 32/32 [00:34<00:00,  1.08s/it]

2025/05/07 20:11:36 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 32 (50.0%)
2025/05/07 20:11:36 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:11:36 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:11:37 INFO dspy.teleprompt.grpo: Current train step is 84. Updating the model...
2025/05/07 20:13:07 INFO dspy.teleprompt.grpo: GRPO training step 84/100 completed.
2025/05/07 20:13:07 INFO dspy.teleprompt.grpo: GRPO training step 85/100...
2025/05/07 20:13:07 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:49<00:00,  1.55s/it]

2025/05/07 20:13:57 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 20:13:57 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:13:57 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:13:58 INFO dspy.teleprompt.grpo: GRPO training step 85/100 completed.
2025/05/07 20:13:58 INFO dspy.teleprompt.grpo: GRPO training step 86/100...
2025/05/07 20:13:58 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 0.00 / 32 (0.0%): 100%|██████████| 32/32 [00:39<00:00,  1.23s/it]

2025/05/07 20:14:37 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 32 (0.0%)
2025/05/07 20:14:37 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:14:37 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:14:37 INFO dspy.teleprompt.grpo: GRPO training step 86/100 completed.
2025/05/07 20:14:37 INFO dspy.teleprompt.grpo: GRPO training step 87/100...
2025/05/07 20:14:37 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 16.00 / 32 (50.0%): 100%|██████████| 32/32 [00:31<00:00,  1.01it/s]

2025/05/07 20:15:09 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 32 (50.0%)
2025/05/07 20:15:09 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:15:09 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:15:10 INFO dspy.teleprompt.grpo: Current train step is 87. Updating the model...
2025/05/07 20:16:40 INFO dspy.teleprompt.grpo: GRPO training step 87/100 completed.
2025/05/07 20:16:40 INFO dspy.teleprompt.grpo: GRPO training step 88/100...
2025/05/07 20:16:40 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 16.00 / 32 (50.0%): 100%|██████████| 32/32 [00:25<00:00,  1.23it/s]

2025/05/07 20:17:06 INFO dspy.evaluate.evaluate: Average Metric: 16.0 / 32 (50.0%)
2025/05/07 20:17:06 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:17:06 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:17:06 INFO dspy.teleprompt.grpo: GRPO training step 88/100 completed.
2025/05/07 20:17:06 INFO dspy.teleprompt.grpo: GRPO training step 89/100...
2025/05/07 20:17:06 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:36<00:00,  1.13s/it]

2025/05/07 20:17:43 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:17:43 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:17:43 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:17:43 INFO dspy.teleprompt.grpo: GRPO training step 89/100 completed.
2025/05/07 20:17:43 INFO dspy.teleprompt.grpo: GRPO training step 90/100...
2025/05/07 20:17:43 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 4.00 / 32 (12.5%): 100%|██████████| 32/32 [01:05<00:00,  2.04s/it]

2025/05/07 20:18:48 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 32 (12.5%)
2025/05/07 20:18:48 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:18:48 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:18:49 INFO dspy.teleprompt.grpo: Current train step is 90. Updating the model...
2025/05/07 20:20:24 INFO dspy.teleprompt.grpo: GRPO training step 90/100 completed.
2025/05/07 20:20:24 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 90/100


Average Metric: 14.00 / 50 (28.0%): 100%|██████████| 50/50 [01:08<00:00,  1.37s/it]

2025/05/07 20:21:33 INFO dspy.evaluate.evaluate: Average Metric: 14.0 / 50 (28.0%)
2025/05/07 20:21:33 INFO dspy.teleprompt.grpo: Student program validation set score after training step 90/100: 28.0
2025/05/07 20:21:33 INFO dspy.teleprompt.grpo: GRPO training step 91/100...
2025/05/07 20:21:33 INFO dspy.teleprompt.grpo: Bootstrapping data...



Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:35<00:00,  1.12s/it]

2025/05/07 20:22:09 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 20:22:09 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:22:09 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:22:09 INFO dspy.teleprompt.grpo: GRPO training step 91/100 completed.
2025/05/07 20:22:09 INFO dspy.teleprompt.grpo: GRPO training step 92/100...
2025/05/07 20:22:09 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 15.00 / 32 (46.9%): 100%|██████████| 32/32 [00:23<00:00,  1.35it/s]

2025/05/07 20:22:33 INFO dspy.evaluate.evaluate: Average Metric: 15.0 / 32 (46.9%)
2025/05/07 20:22:33 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:22:33 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:22:33 INFO dspy.teleprompt.grpo: GRPO training step 92/100 completed.
2025/05/07 20:22:33 INFO dspy.teleprompt.grpo: GRPO training step 93/100...
2025/05/07 20:22:33 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 20.00 / 32 (62.5%): 100%|██████████| 32/32 [00:28<00:00,  1.12it/s]

2025/05/07 20:23:02 INFO dspy.evaluate.evaluate: Average Metric: 20.0 / 32 (62.5%)
2025/05/07 20:23:02 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:23:02 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:23:02 INFO dspy.teleprompt.grpo: Current train step is 93. Updating the model...
2025/05/07 20:24:38 INFO dspy.teleprompt.grpo: GRPO training step 93/100 completed.
2025/05/07 20:24:38 INFO dspy.teleprompt.grpo: GRPO training step 94/100...
2025/05/07 20:24:38 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 5.00 / 32 (15.6%): 100%|██████████| 32/32 [00:55<00:00,  1.74s/it]

2025/05/07 20:25:33 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 32 (15.6%)
2025/05/07 20:25:33 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:25:33 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:25:34 INFO dspy.teleprompt.grpo: GRPO training step 94/100 completed.
2025/05/07 20:25:34 INFO dspy.teleprompt.grpo: GRPO training step 95/100...
2025/05/07 20:25:34 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 7.00 / 32 (21.9%): 100%|██████████| 32/32 [00:38<00:00,  1.21s/it]

2025/05/07 20:26:13 INFO dspy.evaluate.evaluate: Average Metric: 7.0 / 32 (21.9%)
2025/05/07 20:26:13 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:26:13 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:26:13 INFO dspy.teleprompt.grpo: GRPO training step 95/100 completed.
2025/05/07 20:26:13 INFO dspy.teleprompt.grpo: GRPO training step 96/100...
2025/05/07 20:26:13 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 1.00 / 32 (3.1%): 100%|██████████| 32/32 [00:44<00:00,  1.38s/it]

2025/05/07 20:26:57 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 32 (3.1%)
2025/05/07 20:26:57 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:26:57 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:26:58 INFO dspy.teleprompt.grpo: Current train step is 96. Updating the model...
2025/05/07 20:28:43 INFO dspy.teleprompt.grpo: GRPO training step 96/100 completed.
2025/05/07 20:28:43 INFO dspy.teleprompt.grpo: GRPO training step 97/100...
2025/05/07 20:28:43 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 13.00 / 32 (40.6%): 100%|██████████| 32/32 [00:20<00:00,  1.58it/s]

2025/05/07 20:29:03 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 32 (40.6%)
2025/05/07 20:29:03 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:29:03 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:29:04 INFO dspy.teleprompt.grpo: GRPO training step 97/100 completed.
2025/05/07 20:29:04 INFO dspy.teleprompt.grpo: GRPO training step 98/100...
2025/05/07 20:29:04 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:34<00:00,  1.08s/it]

2025/05/07 20:29:38 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:29:38 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:29:38 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:29:39 INFO dspy.teleprompt.grpo: GRPO training step 98/100 completed.
2025/05/07 20:29:39 INFO dspy.teleprompt.grpo: GRPO training step 99/100...
2025/05/07 20:29:39 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 8.00 / 32 (25.0%): 100%|██████████| 32/32 [00:30<00:00,  1.04it/s]

2025/05/07 20:30:10 INFO dspy.evaluate.evaluate: Average Metric: 8.0 / 32 (25.0%)
2025/05/07 20:30:10 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:30:10 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:30:10 INFO dspy.teleprompt.grpo: Current train step is 99. Updating the model...
2025/05/07 20:31:40 INFO dspy.teleprompt.grpo: GRPO training step 99/100 completed.
2025/05/07 20:31:40 INFO dspy.teleprompt.grpo: GRPO training step 100/100...
2025/05/07 20:31:40 INFO dspy.teleprompt.grpo: Bootstrapping data...


Average Metric: 9.00 / 32 (28.1%): 100%|██████████| 32/32 [00:29<00:00,  1.10it/s]

2025/05/07 20:32:10 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 32 (28.1%)
2025/05/07 20:32:10 INFO dspy.teleprompt.grpo: Preparing the training data batch from bootstrapped examples for GRPO...
2025/05/07 20:32:10 INFO dspy.teleprompt.grpo: Invoking GRPO training step...





2025/05/07 20:32:10 INFO dspy.teleprompt.grpo: GRPO training step 100/100 completed.
2025/05/07 20:32:10 INFO dspy.teleprompt.grpo: Evaluating the student program on the validation set after training step 100/100


Average Metric: 14.00 / 50 (28.0%): 100%|██████████| 50/50 [01:19<00:00,  1.59s/it]

2025/05/07 20:33:30 INFO dspy.evaluate.evaluate: Average Metric: 14.0 / 50 (28.0%)
2025/05/07 20:33:30 INFO dspy.teleprompt.grpo: Student program validation set score after training step 100/100: 28.0
2025/05/07 20:33:30 INFO dspy.teleprompt.grpo: Done with the iterations! Retrieving the final model(s)...





2025/05/07 20:34:00 INFO dspy.teleprompt.grpo: GRPO compiler has finished compiling the student program


Optimization complete!
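
The periodic validation scores are easy to lose among the per-step messages in the log above. As a minimal sketch, the snippet below pulls them out with a regular expression so the trend can be read at a glance. The log lines are copied from the output above; the helper name `validation_scores` and the pattern are illustrative assumptions and not part of DSPy.

```python
import re

# Four lines copied verbatim from the training log above.
LOG_EXCERPT = """\
2025/05/07 19:53:25 INFO dspy.teleprompt.grpo: Student program validation set score after training step 70/100: 26.0
2025/05/07 20:06:44 INFO dspy.teleprompt.grpo: Student program validation set score after training step 80/100: 24.0
2025/05/07 20:21:33 INFO dspy.teleprompt.grpo: Student program validation set score after training step 90/100: 28.0
2025/05/07 20:33:30 INFO dspy.teleprompt.grpo: Student program validation set score after training step 100/100: 28.0
"""

# Captures the step number and the score from each validation message.
PATTERN = re.compile(
    r"validation set score after training step (\d+)/\d+: ([\d.]+)"
)


def validation_scores(log_text: str) -> list[tuple[int, float]]:
    """Return (step, score) pairs in the order they appear in the log."""
    return [(int(step), float(score)) for step, score in PATTERN.findall(log_text)]


scores = validation_scores(LOG_EXCERPT)
for step, score in scores:
    print(f"step {step:>3}: {score:.1f}")

# max() keeps the first entry on ties, so the earliest best step is reported.
best_step, best_score = max(scores, key=lambda pair: pair[1])
print(f"best: step {best_step} ({best_score:.1f})")
```

For the steps shown here, the validation score moves between 24.0 and 28.0 without a clear upward trend, which is worth keeping in mind when reading the final evaluation.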


## Evaluating the Optimized Agent

After optimizing our agent with GRPO, we need to evaluate its performance to see whether it has improved. We'll use the same evaluation framework as before, but now with our optimized agent.

We'll also compare the baseline and optimized agents on a specific example to see the differences in their behavior. This will help us understand how GRPO has changed the agent's query rewriting strategy.

In [214]:
# Evaluate the optimized program
optimized_result = evaluate(optimized_program)

print(f"\nBaseline Performance: {baseline_result:.2f}")
print(f"Optimized Performance: {optimized_result:.2f}")

Average Metric: 13.00 / 50 (26.0%): 100%|██████████| 50/50 [00:00<00:00, 943.89it/s]

2025/05/07 20:40:32 INFO dspy.evaluate.evaluate: Average Metric: 13.0 / 50 (26.0%)




Baseline Performance: 28.00
Optimized Performance: 26.00
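
With only 50 evaluation examples, a two-point gap corresponds to a single example (14 correct versus 13). The sketch below estimates how much sampling noise to expect at this sample size. It assumes the metric is a 0/1 score per example, which is consistent with the whole-number totals in the logs but is an assumption. It also treats the two runs as independent, a simplification since both were scored on the same examples. The helper `proportion_stats` is hypothetical.

```python
import math


def proportion_stats(successes: int, total: int) -> tuple[float, float]:
    """Return (accuracy, standard error), treating each example as a 0/1 outcome."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se


# Counts implied by the output above: 28% and 26% of 50 examples.
baseline_p, baseline_se = proportion_stats(14, 50)
optimized_p, optimized_se = proportion_stats(13, 50)

# Standard error of the difference under the independence simplification.
diff = optimized_p - baseline_p
diff_se = math.sqrt(baseline_se**2 + optimized_se**2)

print(f"baseline:   {baseline_p:.1%} +/- {baseline_se:.1%}")
print(f"optimized:  {optimized_p:.1%} +/- {optimized_se:.1%}")
print(f"difference: {diff:+.1%} (standard error {diff_se:.1%})")
```

Each score carries a standard error of roughly 6 percentage points, and the difference roughly 9, so a two-point gap in either direction is well within noise. A larger evaluation set would be needed to tell a small regression from a small gain.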


## Conclusion

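In this cookbook, we've walked through how to use GRPO to optimize an LLM agent. In this run:

1. The GRPO-optimized program scored 26.00 against a baseline of 28.00 on the 50 evaluation examples, so this configuration did not improve on the baseline. Validation scores logged over the last 30 training steps ranged from 24.0 to 28.0.
2. When GRPO does help, the gains come from query rewriting strategies learned through reinforcement rather than from manual prompt engineering.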

This approach can be applied to various types of agents beyond simple query rewriters, including:
- Multi-step reasoning agents
- Tool-using agents
- Conversational agents
- Code generation agents

By defining appropriate reward functions and training data, you can apply GRPO to many kinds of LLM-based agents for your specific use case.

Remember that the quality of your training data and reward function is crucial for effective optimization. Invest time in creating high-quality examples and designing rewards that truly capture what you want your agent to optimize for.
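
One concrete reason reward design matters: GRPO, as originally formulated, scores each rollout relative to the other rollouts sampled for the same prompt. The sketch below shows that normalization in plain Python. Whether the training backend used in this notebook normalizes in exactly this way is an assumption, and the function name `group_relative_advantages` is illustrative only.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward against its own group's mean and spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every reward in the group is identical.
    return [(r - mean) / (std + eps) for r in rewards]


# A group with mixed outcomes: successes get positive advantages, failures negative.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every rollout fails: all advantages are zero.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print("mixed:   ", [round(a, 3) for a in mixed])
print("all fail:", all_fail)
```

When every rollout in a group receives the same reward, the advantages are all zero and that group contributes no learning signal. Several bootstrapping steps in the log above scored 0.0 / 32, which under this formulation would be steps with nothing to learn from. A reward with partial credit (for example, one based on the rank of the correct document) is one way to reduce such uniform groups, at the cost of a more complex metric.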