# Information Extraction with MIPROv2

This project is adapted from the [DSPy documentation](https://dspy.ai/tutorials/gepa_facilitysupportanalyzer/).

This notebook demonstrates how to use the MIPROv2 (Model-Informed Prompt Optimization v2) optimizer in DSPy to build and optimize an information extraction system. The goal is to analyze facility support messages and extract three key pieces of information:

1. **Urgency**: Classify messages as low, medium, or high urgency
2. **Sentiment**: Determine if the message is positive, neutral, or negative
3. **Categories**: Identify which facility management categories apply to each message

The notebook shows how to:
- Set up a DSPy module with multiple prediction tasks
- Evaluate the baseline performance
- Use MIPROv2 to automatically optimize prompts through Bayesian optimization and few-shot example selection
- Visualize the optimization process and results

In the run recorded here, optimization raises the score on MIPROv2's held-out validation examples from ~76% to ~83% by automatically refining the prompts and selecting few-shot examples.


In [1]:
%run supportvectors-common.ipynb


<div style="color:#aaa;font-size:8pt">
<hr/>
&copy; SupportVectors. All rights reserved. <blockquote>This notebook is the intellectual property of SupportVectors and part of its training material.
Currently, only participants in SupportVectors workshops are allowed to study the notebooks for educational purposes; copying or using them for any other purpose without written permission is prohibited.

<b> These notebooks are chapters and sections from the textbook on Data Science that Asif Qamar is writing, so we request that you not circulate the material to others.</b>
 </blockquote>
 <hr/>
</div>



## Configure Language Model

Initialize and configure the DSPy language model. Here we use GPT-4.1-nano with temperature=1 for more diverse outputs during optimization.


In [2]:
import dspy
lm = dspy.LM("openai/gpt-4.1-nano", temperature=1)
dspy.configure(lm=lm)

## Load and Prepare Dataset

Download the facility support analyzer dataset from the llama-prompt-ops repository and convert it into DSPy Examples. The dataset is split into training (33%), validation (33%), and test (34%) sets with a fixed random seed for reproducibility.


In [3]:
import requests
import dspy
import json
import random

def init_dataset():
    # Download the raw JSON dataset
    url = "https://raw.githubusercontent.com/meta-llama/llama-prompt-ops/refs/heads/main/use-cases/facility-support-analyzer/dataset.json"
    response = requests.get(url)
    response.raise_for_status()  # fail early if the download did not succeed
    dataset = json.loads(response.text)

    # Wrap each record as a DSPy Example with "message" as the only input field
    dspy_dataset = [
        dspy.Example({
            "message": d['fields']['input'],
            "answer": d['answer'],
        }).with_inputs("message")
        for d in dataset
    ]

    # Shuffle with a fixed seed so the splits are reproducible
    random.Random(0).shuffle(dspy_dataset)

    # Cut at 33% and 66% to get the train, validation, and test sets
    n = len(dspy_dataset)
    train_end, val_end = int(n * 0.33), int(n * 0.66)
    train_set = dspy_dataset[:train_end]
    val_set = dspy_dataset[train_end:val_end]
    test_set = dspy_dataset[val_end:]

    return train_set, val_set, test_set

## Initialize Dataset Splits

Load the dataset and display the size of each split (training, validation, and test sets).


In [4]:
train_set, val_set, test_set = init_dataset()

len(train_set), len(val_set), len(test_set)

(66, 66, 68)
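
The split sizes above follow directly from the slicing arithmetic. As a minimal sketch, assuming the dataset holds 200 records (the sum of the three sizes printed above) and using integers as stand-ins for the `dspy.Example` objects, the same boundaries can be reproduced without downloading anything. `split_dataset` is a hypothetical helper, not part of the notebook:

```python
import random

def split_dataset(items, seed=0):
    """Shuffle a copy of `items` and cut it at 33% and 66%, as init_dataset does."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end = int(n * 0.33)
    val_end = int(n * 0.66)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

# Integer placeholders stand in for the 200 dataset records (assumed count).
train, val, test = split_dataset(range(200))
sizes = (len(train), len(val), len(test))
print(sizes)  # (66, 66, 68)
```

Because the cut points are truncated with `int`, the leftover records fall into the test split, which is why it is slightly larger than the other two.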

## Examine Example Data

Display a sample input message and its corresponding gold standard answer to understand the data format and expected output structure.


In [5]:
print("Input Message:")
print(train_set[0]['message'])

print("\n\nGold Answer:")
for k, v in json.loads(train_set[0]['answer']).items():
    print(f"{k}: {v}")

Input Message:
Subject: Adjusting Bi-Weekly Cleaning Schedule for My Office

Dear ProCare Facility Solutions Support Team,

I hope this message finds you well. My name is Dr. Alex Turner, and I have been utilizing your services for my office space for the past year. I must say, your team's dedication to maintaining a pristine environment has been commendable and greatly appreciated.

I am reaching out to discuss the scheduling of our regular cleaning services. While I find the logistical challenges of coordinating these services intellectually stimulating, I believe we could optimize the current schedule to better suit the needs of my team and our workflow. Specifically, I would like to explore the possibility of adjusting our cleaning schedule to a bi-weekly arrangement, ideally on Tuesdays and Fridays, to ensure our workspace remains consistently clean without disrupting our research activities.

Previously, I have attempted to adjust the schedule through the online portal, but I enc

## Define DSPy Module

Create the main DSPy module that performs information extraction. This includes:
- Three signature classes for urgency, sentiment, and category classification
- A multi-module (MM) class that chains together three ChainOfThought predictors
- Each predictor analyzes the message and extracts its respective information


In [6]:
from typing import List, Literal


class FacilitySupportAnalyzerUrgency(dspy.Signature):
    """
    Read the provided message and determine the urgency.
    """
    message: str = dspy.InputField()
    urgency: Literal['low', 'medium', 'high'] = dspy.OutputField()

class FacilitySupportAnalyzerSentiment(dspy.Signature):
    """
    Read the provided message and determine the sentiment.
    """
    message: str = dspy.InputField()
    sentiment: Literal['positive', 'neutral', 'negative'] = dspy.OutputField()

class FacilitySupportAnalyzerCategories(dspy.Signature):
    """
    Read the provided message and determine the set of categories applicable to the message.
    """
    message: str = dspy.InputField()
    categories: List[Literal["emergency_repair_services", "routine_maintenance_requests", "quality_and_safety_concerns", "specialized_cleaning_services", "general_inquiries", "sustainability_and_environmental_practices", "training_and_support_requests", "cleaning_services_scheduling", "customer_feedback_and_complaints", "facility_management_issues"]] = dspy.OutputField()

class FacilitySupportAnalyzerMM(dspy.Module):
    def __init__(self):
        super().__init__()
        # One chain-of-thought predictor per extraction task
        self.urgency_module = dspy.ChainOfThought(FacilitySupportAnalyzerUrgency)
        self.sentiment_module = dspy.ChainOfThought(FacilitySupportAnalyzerSentiment)
        self.categories_module = dspy.ChainOfThought(FacilitySupportAnalyzerCategories)
    
    def forward(self, message: str):
        urgency = self.urgency_module(message=message)
        sentiment = self.sentiment_module(message=message)
        categories = self.categories_module(message=message)

        return dspy.Prediction(
            urgency=urgency.urgency,
            sentiment=sentiment.sentiment,
            categories=categories.categories
        )

program = FacilitySupportAnalyzerMM()
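
Each signature restricts its output with a `Literal` type, so only the listed labels are valid values. The sketch below uses only the standard library to show how such a label set can be inspected and checked. `is_valid_label` is a hypothetical helper for illustration, and it does not reproduce how DSPy itself parses or validates model output:

```python
from typing import Literal, get_args

# The same label sets used in the urgency and sentiment signatures.
Urgency = Literal["low", "medium", "high"]
Sentiment = Literal["positive", "neutral", "negative"]

def is_valid_label(value, literal_type):
    """Return True if `value` is one of the options listed in `literal_type`."""
    return value in get_args(literal_type)

print(get_args(Urgency))                  # ('low', 'medium', 'high')
print(is_valid_label("high", Urgency))    # True
print(is_valid_label("urgent", Urgency))  # False
```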

## Define Evaluation Metrics

Implement scoring functions for each task (urgency, sentiment, categories) and a combined metric function. The metric computes accuracy for each component and returns the average as the overall score.


In [7]:
def score_urgency(gold_urgency, pred_urgency):
    """
    Compute score for the urgency module.
    """
    score = 1.0 if gold_urgency == pred_urgency else 0.0
    return score

def score_sentiment(gold_sentiment, pred_sentiment):
    """
    Compute score for the sentiment module.
    """
    score = 1.0 if gold_sentiment == pred_sentiment else 0.0
    return score

def score_categories(gold_categories, pred_categories):
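    """
    Compute score for the categories module.
    A category counts as correct when it is true in the gold labels and predicted,
    or false in the gold labels and not predicted. The score is the fraction of
    categories that are correct.
    """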
    correct = 0
    for k, v in gold_categories.items():
        if v and k in pred_categories:
            correct += 1
        elif not v and k not in pred_categories:
            correct += 1
    score = correct / len(gold_categories)
    return score

def metric(example, pred, trace=None, pred_name=None, pred_trace=None):
    """
    Computes a score based on agreement between prediction and gold standard for categories, sentiment, and urgency.
    Returns the score (float).
    """
    # Parse gold standard from example
    gold = json.loads(example['answer'])

    # Compute scores for all modules
    score_urgency_val = score_urgency(gold['urgency'], pred.urgency)
    score_sentiment_val = score_sentiment(gold['sentiment'], pred.sentiment)
    score_categories_val = score_categories(gold['categories'], pred.categories)

    # Overall score: average of the three accuracies
    total = (score_urgency_val + score_sentiment_val + score_categories_val) / 3

    return total
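
To make the scoring concrete, here is a small worked example. It restates the metric in compact form so that it runs without DSPy; the gold record and the prediction are made up for illustration (they are not taken from the dataset), and `SimpleNamespace` stands in for a `dspy.Prediction`:

```python
import json
from types import SimpleNamespace

CATEGORY_NAMES = [
    "emergency_repair_services", "routine_maintenance_requests",
    "quality_and_safety_concerns", "specialized_cleaning_services",
    "general_inquiries", "sustainability_and_environmental_practices",
    "training_and_support_requests", "cleaning_services_scheduling",
    "customer_feedback_and_complaints", "facility_management_issues",
]

def score_categories(gold_categories, pred_categories):
    """Fraction of categories whose gold flag agrees with the prediction."""
    correct = sum(
        1 for name, flag in gold_categories.items()
        if bool(flag) == (name in pred_categories)
    )
    return correct / len(gold_categories)

def metric(example, pred):
    """Average of urgency, sentiment, and category accuracy."""
    gold = json.loads(example["answer"])
    urgency = 1.0 if gold["urgency"] == pred.urgency else 0.0
    sentiment = 1.0 if gold["sentiment"] == pred.sentiment else 0.0
    categories = score_categories(gold["categories"], pred.categories)
    return (urgency + sentiment + categories) / 3

# Hypothetical gold record: only the sustainability category applies.
gold_flags = {
    name: name == "sustainability_and_environmental_practices"
    for name in CATEGORY_NAMES
}
example = {"answer": json.dumps(
    {"urgency": "low", "sentiment": "neutral", "categories": gold_flags}
)}

# Hypothetical prediction: urgency right, sentiment wrong, one extra category.
pred = SimpleNamespace(
    urgency="low",
    sentiment="positive",
    categories=["sustainability_and_environmental_practices", "general_inquiries"],
)

score = metric(example, pred)
print(round(score, 3))  # 0.633
```

The urgency matches (1.0), the sentiment does not (0.0), and 9 of the 10 categories agree (0.9), so the score is (1.0 + 0.0 + 0.9) / 3. Since most categories are false for any given message, the category component stays high even when a prediction is partly wrong.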

## Evaluate Baseline Performance

Run the initial evaluation on the test set to establish baseline performance before optimization. This gives us a starting point to measure improvement.


In [8]:
import dspy
evaluate = dspy.Evaluate(
    devset=test_set,
    metric=metric,
    num_threads=32,
    display_table=True,
    display_progress=True
)

evaluate(program)

Average Metric: 51.53 / 68 (75.8%): 100%|██████████| 68/68 [00:03<00:00, 17.74it/s]

2025/11/12 19:27:26 INFO dspy.evaluate.evaluate: Average Metric: 51.53333333333333 / 68 (75.8%)





| | message | answer | urgency | sentiment | categories | metric |
|---|---|---|---|---|---|---|
| 0 | Hey ProCare Support Team, Hope you all are doing great! My name is... | {"categories": {"routine_maintenance_requests": false, "customer_f... | medium | positive | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.633] |
| 1 | Hey ProCare Team, Hope you’re all doing well! My name’s Jake, and ... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | medium | positive | [routine_maintenance_requests] | ✔️ [1.000] |
| 2 | Subject: Assistance Needed for HVAC Maintenance Hi [Receiver], I h... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | medium | neutral | [routine_maintenance_requests] | ✔️ [1.000] |
| 3 | Subject: A Green Inquiry from a Bill Maher Enthusiast Hey ProCare ... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | positive | [sustainability_and_environmental_practices] | ✔️ [1.000] |
| 4 | Subject: Inquiry on Sustainability Practices Dear ProCare Facility... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | neutral | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.967] |
| ... | ... | ... | ... | ... | ... | ... |
| 63 | Subject: Inquiry About Your Eco-Friendly Practices Dear ProCare Fa... | {"categories": {"routine_maintenance_requests": false, "customer_f... | medium | positive | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.300] |
| 64 | Subject: Assistance Needed for Facility Management Issue Dear ProC... | {"categories": {"routine_maintenance_requests": false, "customer_f... | high | positive | [facility_management_issues] | ✔️ [0.667] |
| 65 | Subject: Request for Training and Support Hi ProCare Support Team,... | {"categories": {"routine_maintenance_requests": false, "customer_f... | medium | positive | [training_and_support_requests] | ✔️ [0.667] |
| 66 | Subject: Concerns About Studio Maintenance and Rent Increase Dear ... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | high | negative | [routine_maintenance_requests, facility_management_issues, quality... | ✔️ [0.300] |


EvaluationResult(score=75.78, results=<list of 68 results>)
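
The reported percentage is the mean per-example score. As a quick check of the arithmetic, the sketch below recomputes it from the total and the example count printed in the log above. `average_metric_percent` is a hypothetical helper, and rounding to two decimals is an assumption chosen to match the printed `score`:

```python
def average_metric_percent(total_score, num_examples):
    """Mean per-example score expressed as a percentage, rounded to 2 decimals."""
    return round(100 * total_score / num_examples, 2)

# Total score and example count taken from the evaluation log above.
baseline = average_metric_percent(51.53333333333333, 68)
print(baseline)  # 75.78
```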

## Initialize MIPROv2 Optimizer

Set up the MIPROv2 optimizer with the metric defined above. MIPROv2 uses Bayesian optimization to search for effective combinations of instructions and few-shot examples. The optimizer is configured with `auto="heavy"` for a thorough search and with 32 threads for parallel evaluation; during compilation it bootstraps few-shot examples and proposes instruction candidates to improve performance.


In [None]:
from dspy import MIPROv2

optimizer = MIPROv2(
    metric=metric,
    auto="heavy", 
    num_threads=32,
)
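
To see why a guided search is used rather than exhaustive evaluation, consider the size of the configuration space. The sketch below assumes that the candidate counts reported in this notebook's run log (9 instructions and 18 few-shot sets) are chosen independently for each of the three predictors, which is how the logged trial parameters read. The names are illustrative, and the uniform random sampling is only a stand-in: it is not how MIPROv2 proposes trials.

```python
import random

NUM_PREDICTORS = 3               # urgency, sentiment, categories
NUM_INSTRUCTION_CANDIDATES = 9   # from the run log
NUM_FEWSHOT_SETS = 18            # from the run log

# Each predictor picks one instruction and one few-shot set.
choices_per_predictor = NUM_INSTRUCTION_CANDIDATES * NUM_FEWSHOT_SETS
total_configurations = choices_per_predictor ** NUM_PREDICTORS
print(total_configurations)  # 4251528

def sample_configuration(rng):
    """Pick one (instruction, few-shot set) pair per predictor uniformly at random."""
    return [
        (rng.randrange(NUM_INSTRUCTION_CANDIDATES), rng.randrange(NUM_FEWSHOT_SETS))
        for _ in range(NUM_PREDICTORS)
    ]

config = sample_configuration(random.Random(0))
for predictor, (instruction, fewshot) in enumerate(config):
    print(f"Predictor {predictor}: Instruction {instruction}, Few-Shot Set {fewshot}")
```

With over four million combinations and only a few dozen trials, the optimizer can evaluate a tiny fraction of the space, so it matters which configurations it chooses to try.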

## Run MIPROv2 Optimization

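Compile and optimize the program using MIPROv2. Only `train_set` is passed, so the optimizer holds out part of it as its own validation set (52 of the 66 examples in the run logged below). This process will: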
- Bootstrap few-shot example candidates from the training set
- Propose instruction candidates based on the few-shot examples and dataset summary
- Use Bayesian optimization to find optimal combinations of instructions and few-shot examples
- Evaluate candidates on minibatches and perform full evaluations on promising configurations
- Track the best performing versions

This may take several minutes as it explores many prompt and few-shot example combinations.
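
The run log for this notebook numbers trials out of 61 even though the heavy preset reports `num_trials: 50`. One reading that fits the logged pattern (full evaluations at trials 1, 7, 13, and 19) is that the 50 trials are minibatch trials, with an initial full evaluation of the unoptimized program and one further full evaluation after every five minibatch trials. The sketch below reconstructs that schedule under this assumption; `build_schedule` and its parameters are hypothetical names, not DSPy API:

```python
def build_schedule(num_minibatch_trials=50, full_eval_every=5):
    """Return the trial kinds in order, assuming periodic full evaluations."""
    schedule = ["full"]  # trial 1: full evaluation of the default program
    for i in range(1, num_minibatch_trials + 1):
        schedule.append("minibatch")
        if i % full_eval_every == 0:
            schedule.append("full")  # full evaluation of a promising candidate
    return schedule

schedule = build_schedule()
full_eval_trials = [t for t, kind in enumerate(schedule, start=1) if kind == "full"]

print(len(schedule))         # 61
print(full_eval_trials[:4])  # [1, 7, 13, 19]
```

Under this reading, minibatch scores are cheap but noisy estimates, and only full evaluations update the best score reported in the log.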


In [10]:
optimized_program = optimizer.compile(
    program,
    trainset=train_set,
)

2025/11/12 19:27:26 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING HEAVY AUTO RUN SETTINGS:
num_trials: 50
minibatch: True
num_fewshot_candidates: 18
num_instruct_candidates: 9
valset size: 52

2025/11/12 19:27:26 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/11/12 19:27:26 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/11/12 19:27:26 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=18 sets of demonstrations...


Bootstrapping set 1/18
Bootstrapping set 2/18
Bootstrapping set 3/18


 29%|██▊       | 4/14 [00:12<00:30,  3.05s/it]


Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 4/18


  7%|▋         | 1/14 [00:00<00:01, 12.41it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/18


 14%|█▍        | 2/14 [00:25<02:31, 12.66s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 6/18


 14%|█▍        | 2/14 [00:04<00:29,  2.48s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 7/18


 29%|██▊       | 4/14 [00:03<00:08,  1.19it/s]


Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 8/18


 21%|██▏       | 3/14 [00:10<00:40,  3.65s/it]


Bootstrapped 3 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 9/18


 14%|█▍        | 2/14 [00:00<00:00, 12.24it/s]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 10/18


 14%|█▍        | 2/14 [00:08<00:50,  4.20s/it]


Bootstrapped 2 full traces after 2 examples for up to 1 rounds, amounting to 2 attempts.
Bootstrapping set 11/18


 29%|██▊       | 4/14 [00:00<00:00, 12.28it/s]


Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 12/18


 21%|██▏       | 3/14 [00:03<00:11,  1.07s/it]


Bootstrapped 3 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 13/18


  7%|▋         | 1/14 [00:05<01:17,  5.93s/it]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 14/18


 29%|██▊       | 4/14 [00:00<00:00, 11.01it/s]


Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 15/18


  7%|▋         | 1/14 [00:00<00:01, 12.32it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 16/18


  7%|▋         | 1/14 [00:00<00:01, 12.20it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 17/18


  7%|▋         | 1/14 [00:00<00:01, 12.22it/s]


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 18/18


  7%|▋         | 1/14 [00:00<00:01, 12.12it/s]
2025/11/12 19:28:42 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/11/12 19:28:42 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.


2025/11/12 19:30:07 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=9 instructions...

2025/11/12 19:35:17 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/11/12 19:35:17 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Read the provided message and determine the urgency.

2025/11/12 19:35:17 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Analyze the given support message and accurately determine its urgency level by carefully considering the content, tone, and context of the message. Provide a detailed reasoning explanation that describes your thought process step-by-step, including how you assess whether the message indicates an urgent issue, a routine inquiry, or a non-urgent request. Use this reasoning to assign one of the following urgency categories: 'low', 'medium', or 'high'. Your instruction should emphasize the importance of thorough analysis and clarity in differentiating levels of urgency based on message cues. Ensure your response clearly

Average Metric: 39.57 / 52 (76.1%): 100%|██████████| 52/52 [00:15<00:00,  3.38it/s]

2025/11/12 19:35:33 INFO dspy.evaluate.evaluate: Average Metric: 39.56666666666666 / 52 (76.1%)
2025/11/12 19:35:33 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 76.09

2025/11/12 19:35:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 61 - Minibatch ==



Average Metric: 29.73 / 35 (85.0%): 100%|██████████| 35/35 [00:42<00:00,  1.22s/it]

2025/11/12 19:36:16 INFO dspy.evaluate.evaluate: Average Metric: 29.733333333333334 / 35 (85.0%)
2025/11/12 19:36:16 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 84.95 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:36:16 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95]
2025/11/12 19:36:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09]
2025/11/12 19:36:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 76.09


2025/11/12 19:36:16 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 61 - Minibatch ==



Average Metric: 28.70 / 35 (82.0%): 100%|██████████| 35/35 [00:19<00:00,  1.79it/s]

2025/11/12 19:36:36 INFO dspy.evaluate.evaluate: Average Metric: 28.7 / 35 (82.0%)
2025/11/12 19:36:36 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 12', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 0', 'Predictor 2: Few-Shot Set 16'].
2025/11/12 19:36:36 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0]
2025/11/12 19:36:36 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09]
2025/11/12 19:36:36 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 76.09


2025/11/12 19:36:36 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 61 - Minibatch ==



Average Metric: 29.00 / 35 (82.9%): 100%|██████████| 35/35 [00:44<00:00,  1.26s/it]

2025/11/12 19:37:21 INFO dspy.evaluate.evaluate: Average Metric: 29.0 / 35 (82.9%)
2025/11/12 19:37:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 13', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 10'].
2025/11/12 19:37:21 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86]
2025/11/12 19:37:21 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09]
2025/11/12 19:37:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 76.09


2025/11/12 19:37:21 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 61 - Minibatch ==



Average Metric: 26.53 / 35 (75.8%): 100%|██████████| 35/35 [00:22<00:00,  1.55it/s]

2025/11/12 19:37:45 INFO dspy.evaluate.evaluate: Average Metric: 26.53333333333333 / 35 (75.8%)
2025/11/12 19:37:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.81 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 7', 'Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 0', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 3'].
2025/11/12 19:37:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81]
2025/11/12 19:37:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09]
2025/11/12 19:37:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 76.09


2025/11/12 19:37:45 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 61 - Minibatch ==



Average Metric: 26.03 / 35 (74.4%): 100%|██████████| 35/35 [00:18<00:00,  1.87it/s]

2025/11/12 19:38:04 INFO dspy.evaluate.evaluate: Average Metric: 26.03333333333333 / 35 (74.4%)
2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 74.38 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 14', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 0', 'Predictor 2: Few-Shot Set 14'].
2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38]
2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09]
2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 76.09


2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 61 - Full Evaluation =====
2025/11/12 19:38:04 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 84.95) from minibatch trials...



Average Metric: 43.03 / 52 (82.8%): 100%|██████████| 52/52 [00:10<00:00,  4.81it/s] 

2025/11/12 19:38:16 INFO dspy.evaluate.evaluate: Average Metric: 43.03333333333333 / 52 (82.8%)
2025/11/12 19:38:16 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 82.76
2025/11/12 19:38:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:38:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:38:16 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:38:16 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 61 - Minibatch ==



Average Metric: 27.07 / 35 (77.3%): 100%|██████████| 35/35 [00:20<00:00,  1.67it/s]

2025/11/12 19:38:38 INFO dspy.evaluate.evaluate: Average Metric: 27.066666666666666 / 35 (77.3%)
2025/11/12 19:38:38 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.33 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 16', 'Predictor 2: Instruction 4', 'Predictor 2: Few-Shot Set 8'].
2025/11/12 19:38:38 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33]
2025/11/12 19:38:38 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:38:38 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:38:38 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 61 - Minibatch ==



Average Metric: 22.93 / 35 (65.5%): 100%|██████████| 35/35 [01:23<00:00,  2.38s/it]

2025/11/12 19:40:02 INFO dspy.evaluate.evaluate: Average Metric: 22.933333333333334 / 35 (65.5%)
2025/11/12 19:40:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 65.52 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 8', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 0', 'Predictor 2: Instruction 2', 'Predictor 2: Few-Shot Set 15'].
2025/11/12 19:40:02 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52]
2025/11/12 19:40:02 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:40:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:40:02 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 61 - Minibatch ==



Average Metric: 28.97 / 35 (82.8%): 100%|██████████| 35/35 [00:19<00:00,  1.77it/s]

2025/11/12 19:40:22 INFO dspy.evaluate.evaluate: Average Metric: 28.966666666666665 / 35 (82.8%)
2025/11/12 19:40:22 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.76 on minibatch of size 35 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 9', 'Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 17'].
2025/11/12 19:40:22 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76]
2025/11/12 19:40:22 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:40:22 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:40:22 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 61 - Minibatch ==



Average Metric: 29.07 / 35 (83.0%): 100%|██████████| 35/35 [00:27<00:00,  1.26it/s]

2025/11/12 19:40:50 INFO dspy.evaluate.evaluate: Average Metric: 29.066666666666666 / 35 (83.0%)
2025/11/12 19:40:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.05 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 4', 'Predictor 2: Instruction 5', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:40:50 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05]
2025/11/12 19:40:50 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:40:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:40:50 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 61 - Minibatch ==



Average Metric: 28.40 / 35 (81.1%): 100%|██████████| 35/35 [00:23<00:00,  1.52it/s]

2025/11/12 19:41:14 INFO dspy.evaluate.evaluate: Average Metric: 28.4 / 35 (81.1%)
2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 0', 'Predictor 2: Instruction 5', 'Predictor 2: Few-Shot Set 11'].
2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14]
2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76]
2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 61 - Full Evaluation =====
2025/11/12 19:41:14 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 83.05) from minibatch trials...



Average Metric: 42.00 / 52 (80.8%): 100%|██████████| 52/52 [00:22<00:00,  2.31it/s]

2025/11/12 19:41:37 INFO dspy.evaluate.evaluate: Average Metric: 42.0 / 52 (80.8%)
2025/11/12 19:41:37 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:41:37 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:41:37 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:41:37 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 14 / 61 - Minibatch ==



Average Metric: 27.47 / 35 (78.5%): 100%|██████████| 35/35 [00:31<00:00,  1.10it/s]

2025/11/12 19:42:10 INFO dspy.evaluate.evaluate: Average Metric: 27.466666666666665 / 35 (78.5%)
2025/11/12 19:42:10 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 78.48 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 15', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:42:10 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48]
2025/11/12 19:42:10 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:42:10 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:42:10 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 15 / 61 - Minibatch ==



Average Metric: 26.47 / 35 (75.6%): 100%|██████████| 35/35 [00:33<00:00,  1.04it/s]

2025/11/12 19:42:44 INFO dspy.evaluate.evaluate: Average Metric: 26.466666666666665 / 35 (75.6%)
2025/11/12 19:42:44 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.62 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 14', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 5', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:42:44 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62]
2025/11/12 19:42:44 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:42:44 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:42:44 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 16 / 61 - Minibatch ==



Average Metric: 27.03 / 35 (77.2%): 100%|██████████| 35/35 [02:07<00:00,  3.64s/it]

2025/11/12 19:44:52 INFO dspy.evaluate.evaluate: Average Metric: 27.03333333333333 / 35 (77.2%)
2025/11/12 19:44:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.24 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 7', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 4'].
2025/11/12 19:44:52 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24]
2025/11/12 19:44:52 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:44:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:44:52 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 17 / 61 - Minibatch ==



Average Metric: 28.73 / 35 (82.1%): 100%|██████████| 35/35 [00:21<00:00,  1.63it/s]

2025/11/12 19:45:14 INFO dspy.evaluate.evaluate: Average Metric: 28.733333333333334 / 35 (82.1%)
2025/11/12 19:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.1 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 8', 'Predictor 1: Few-Shot Set 1', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 6'].
2025/11/12 19:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1]
2025/11/12 19:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:45:14 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 18 / 61 - Minibatch ==



Average Metric: 28.33 / 35 (81.0%): 100%|██████████| 35/35 [00:27<00:00,  1.29it/s]

2025/11/12 19:45:41 INFO dspy.evaluate.evaluate: Average Metric: 28.333333333333332 / 35 (81.0%)
2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.95 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3', 'Predictor 1: Instruction 4', 'Predictor 1: Few-Shot Set 14', 'Predictor 2: Instruction 3', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95]
2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77]
2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 61 - Full Evaluation =====
2025/11/12 19:45:41 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program 


Average Metric: 42.70 / 52 (82.1%): 100%|██████████| 52/52 [00:18<00:00,  2.87it/s]

2025/11/12 19:46:01 INFO dspy.evaluate.evaluate: Average Metric: 42.699999999999996 / 52 (82.1%)
2025/11/12 19:46:01 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:46:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:46:01 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:46:01 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 20 / 61 - Minibatch ==



Average Metric: 28.67 / 35 (81.9%): 100%|██████████| 35/35 [00:30<00:00,  1.16it/s]

2025/11/12 19:46:32 INFO dspy.evaluate.evaluate: Average Metric: 28.666666666666664 / 35 (81.9%)
2025/11/12 19:46:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.9 on minibatch of size 35 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 4', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:46:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9]
2025/11/12 19:46:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:46:32 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:46:32 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 21 / 61 - Minibatch ==



Average Metric: 28.50 / 35 (81.4%): 100%|██████████| 35/35 [00:20<00:00,  1.74it/s]

2025/11/12 19:46:53 INFO dspy.evaluate.evaluate: Average Metric: 28.5 / 35 (81.4%)
2025/11/12 19:46:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 10', 'Predictor 2: Instruction 2', 'Predictor 2: Few-Shot Set 7'].
2025/11/12 19:46:53 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43]
2025/11/12 19:46:53 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:46:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:46:53 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 22 / 61 - Minibatch ==



Average Metric: 25.77 / 35 (73.6%): 100%|██████████| 35/35 [00:08<00:00,  3.89it/s]

2025/11/12 19:47:02 INFO dspy.evaluate.evaluate: Average Metric: 25.766666666666666 / 35 (73.6%)
2025/11/12 19:47:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 73.62 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 10', 'Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 4', 'Predictor 2: Instruction 5', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:47:02 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62]
2025/11/12 19:47:02 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:47:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:47:02 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 23 / 61 - Minibatch ==



Average Metric: 26.47 / 35 (75.6%): 100%|██████████| 35/35 [01:01<00:00,  1.75s/it]

2025/11/12 19:48:04 INFO dspy.evaluate.evaluate: Average Metric: 26.466666666666665 / 35 (75.6%)
2025/11/12 19:48:04 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 75.62 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 14', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 9', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 10'].
2025/11/12 19:48:04 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62]
2025/11/12 19:48:04 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:48:04 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:48:04 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 24 / 61 - Minibatch ==



Average Metric: 26.20 / 35 (74.9%): 100%|██████████| 35/35 [00:14<00:00,  2.45it/s]

2025/11/12 19:48:19 INFO dspy.evaluate.evaluate: Average Metric: 26.2 / 35 (74.9%)
2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 74.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 7', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 2', 'Predictor 2: Few-Shot Set 2'].
2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86]
2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12]
2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 25 / 61 - Full Evaluation =====
2025/11/12 19:48:19 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on 


Average Metric: 42.17 / 52 (81.1%): 100%|██████████| 52/52 [00:09<00:00,  5.25it/s]

2025/11/12 19:48:30 INFO dspy.evaluate.evaluate: Average Metric: 42.166666666666664 / 52 (81.1%)
2025/11/12 19:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:48:30 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 26 / 61 - Minibatch ==



Average Metric: 29.13 / 35 (83.2%): 100%|██████████| 35/35 [00:06<00:00,  5.10it/s]

2025/11/12 19:48:37 INFO dspy.evaluate.evaluate: Average Metric: 29.133333333333333 / 35 (83.2%)
2025/11/12 19:48:37 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.24 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:48:37 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24]
2025/11/12 19:48:37 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:48:37 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:48:37 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 27 / 61 - Minibatch ==



Average Metric: 28.53 / 35 (81.5%): 100%|██████████| 35/35 [00:09<00:00,  3.53it/s]

2025/11/12 19:48:48 INFO dspy.evaluate.evaluate: Average Metric: 28.53333333333333 / 35 (81.5%)
2025/11/12 19:48:48 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.52 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 3', 'Predictor 1: Few-Shot Set 11', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:48:48 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52]
2025/11/12 19:48:48 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:48:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:48:48 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 28 / 61 - Minibatch ==



Average Metric: 27.80 / 35 (79.4%): 100%|██████████| 35/35 [00:37<00:00,  1.06s/it]

2025/11/12 19:49:26 INFO dspy.evaluate.evaluate: Average Metric: 27.8 / 35 (79.4%)
2025/11/12 19:49:26 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.43 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 16', 'Predictor 2: Instruction 2', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:49:26 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43]
2025/11/12 19:49:26 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:49:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:49:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 29 / 61 - Minibatch ==



Average Metric: 29.13 / 35 (83.2%): 100%|██████████| 35/35 [00:25<00:00,  1.38it/s]

2025/11/12 19:49:52 INFO dspy.evaluate.evaluate: Average Metric: 29.133333333333333 / 35 (83.2%)
2025/11/12 19:49:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.24 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 4', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:49:52 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24]
2025/11/12 19:49:52 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:49:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:49:52 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 30 / 61 - Minibatch ==



Average Metric: 25.90 / 35 (74.0%): 100%|██████████| 35/35 [00:35<00:00,  1.02s/it]

2025/11/12 19:50:28 INFO dspy.evaluate.evaluate: Average Metric: 25.9 / 35 (74.0%)
2025/11/12 19:50:28 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 74.0 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 13', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 7'].
2025/11/12 19:50:28 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0]
2025/11/12 19:50:28 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09]
2025/11/12 19:50:28 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:50:28 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 31 / 61 - Full Evaluation =====
2025/11/12 19:50:28 INFO dspy.telepromp


Average Metric: 43.03 / 52 (82.8%): 100%|██████████| 52/52 [00:03<00:00, 17.20it/s] 

2025/11/12 19:50:33 INFO dspy.evaluate.evaluate: Average Metric: 43.03333333333333 / 52 (82.8%)
2025/11/12 19:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:50:33 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 32 / 61 - Minibatch ==



Average Metric: 29.17 / 35 (83.3%): 100%|██████████| 35/35 [00:03<00:00,  8.89it/s]

2025/11/12 19:50:38 INFO dspy.evaluate.evaluate: Average Metric: 29.166666666666664 / 35 (83.3%)
2025/11/12 19:50:38 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.33 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 4', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:50:38 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33]
2025/11/12 19:50:38 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:50:38 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:50:38 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 33 / 61 - Minibatch ==



Average Metric: 29.27 / 35 (83.6%): 100%|██████████| 35/35 [00:13<00:00,  2.52it/s]

2025/11/12 19:50:53 INFO dspy.evaluate.evaluate: Average Metric: 29.266666666666666 / 35 (83.6%)
2025/11/12 19:50:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.62 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 15', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:50:53 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62]
2025/11/12 19:50:53 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:50:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:50:53 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 34 / 61 - Minibatch ==



Average Metric: 28.37 / 35 (81.0%): 100%|██████████| 35/35 [00:10<00:00,  3.43it/s]

2025/11/12 19:51:03 INFO dspy.evaluate.evaluate: Average Metric: 28.366666666666667 / 35 (81.0%)
2025/11/12 19:51:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.05 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 13', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 7', 'Predictor 2: Few-Shot Set 15'].
2025/11/12 19:51:03 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05]
2025/11/12 19:51:03 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:51:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:51:03 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 35 / 61 - Minibatch ==



Average Metric: 27.70 / 35 (79.1%): 100%|██████████| 35/35 [00:12<00:00,  2.91it/s]

2025/11/12 19:51:16 INFO dspy.evaluate.evaluate: Average Metric: 27.7 / 35 (79.1%)
2025/11/12 19:51:16 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 15', 'Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 17', 'Predictor 2: Instruction 3', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:51:16 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14]
2025/11/12 19:51:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:51:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:51:16 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 36 / 61 - Minibatch ==



Average Metric: 26.13 / 35 (74.7%): 100%|██████████| 35/35 [00:11<00:00,  3.17it/s]

2025/11/12 19:51:28 INFO dspy.evaluate.evaluate: Average Metric: 26.133333333333333 / 35 (74.7%)
2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 74.67 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 15', 'Predictor 1: Instruction 3', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 11'].
2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67]
2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76]
2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 37 / 61 - Full Evaluation =====
2025/11/12 19:51:28 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 83.62) from minibatch trials...


Average Metric: 41.27 / 52 (79.4%): 100%|██████████| 52/52 [00:09<00:00,  5.51it/s]

2025/11/12 19:51:39 INFO dspy.evaluate.evaluate: Average Metric: 41.266666666666666 / 52 (79.4%)
2025/11/12 19:51:39 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:51:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76
2025/11/12 19:51:39 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:51:39 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 38 / 61 - Minibatch ==



Average Metric: 27.70 / 35 (79.1%): 100%|██████████| 35/35 [00:08<00:00,  4.19it/s]

2025/11/12 19:51:48 INFO dspy.evaluate.evaluate: Average Metric: 27.7 / 35 (79.1%)
2025/11/12 19:51:48 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 15', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:51:48 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14]
2025/11/12 19:51:48 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:51:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:51:48 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 39 / 61 - Minibatch ==


Average Metric: 29.37 / 35 (83.9%): 100%|██████████| 35/35 [00:22<00:00,  1.53it/s] 

2025/11/12 19:52:12 INFO dspy.evaluate.evaluate: Average Metric: 29.366666666666667 / 35 (83.9%)
2025/11/12 19:52:12 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.9 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:52:12 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9]
2025/11/12 19:52:12 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:52:12 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:52:12 INFO dspy.teleprompt.mipro_optimizer_v2: == Tria


Average Metric: 26.17 / 35 (74.8%): 100%|██████████| 35/35 [00:18<00:00,  1.90it/s]

2025/11/12 19:52:31 INFO dspy.evaluate.evaluate: Average Metric: 26.166666666666664 / 35 (74.8%)
2025/11/12 19:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 74.76 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 8', 'Predictor 1: Instruction 8', 'Predictor 1: Few-Shot Set 0', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76]
2025/11/12 19:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:52:31 INFO dspy.teleprompt.mipro_optimizer_v2: =


Average Metric: 27.73 / 35 (79.2%): 100%|██████████| 35/35 [00:32<00:00,  1.06it/s]

2025/11/12 19:53:05 INFO dspy.evaluate.evaluate: Average Metric: 27.733333333333334 / 35 (79.2%)
2025/11/12 19:53:05 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 79.24 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 9', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 5'].
2025/11/12 19:53:05 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24]
2025/11/12 19:53:05 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:53:05 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:53:05 INFO dspy.teleprompt.mipro_optimize


Average Metric: 29.07 / 35 (83.0%): 100%|██████████| 35/35 [00:19<00:00,  1.84it/s]

2025/11/12 19:53:24 INFO dspy.evaluate.evaluate: Average Metric: 29.066666666666666 / 35 (83.0%)
2025/11/12 19:53:24 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 83.05 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 7', 'Predictor 2: Few-Shot Set 0'].
2025/11/12 19:53:24 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05]
2025/11/12 19:53:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36]
2025/11/12 19:53:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.76


2025/11/12 19:53:24 INFO dspy.teleprompt.mipro


Average Metric: 43.10 / 52 (82.9%): 100%|██████████| 52/52 [00:00<00:00, 600.45it/s]

2025/11/12 19:53:26 INFO dspy.evaluate.evaluate: Average Metric: 43.1 / 52 (82.9%)
2025/11/12 19:53:26 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 82.88
2025/11/12 19:53:26 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:53:26 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88
2025/11/12 19:53:26 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:53:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 44 / 61 - Minibatch ==



Average Metric: 28.07 / 35 (80.2%): 100%|██████████| 35/35 [02:08<00:00,  3.67s/it]

2025/11/12 19:55:35 INFO dspy.evaluate.evaluate: Average Metric: 28.066666666666666 / 35 (80.2%)
2025/11/12 19:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.19 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 14', 'Predictor 2: Instruction 5', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19]
2025/11/12 19:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:55:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88


2025/11/12 19:55:35 INFO dspy.te


Average Metric: 29.00 / 35 (82.9%): 100%|██████████| 35/35 [00:10<00:00,  3.44it/s]

2025/11/12 19:55:47 INFO dspy.evaluate.evaluate: Average Metric: 29.0 / 35 (82.9%)
2025/11/12 19:55:47 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 6', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 17'].
2025/11/12 19:55:47 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86]
2025/11/12 19:55:47 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:55:47 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88


2025/11/12 19:55:47 INFO dspy.teleprom


Average Metric: 29.00 / 35 (82.9%): 100%|██████████| 35/35 [00:13<00:00,  2.51it/s]

2025/11/12 19:56:01 INFO dspy.evaluate.evaluate: Average Metric: 29.0 / 35 (82.9%)
2025/11/12 19:56:01 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 82.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 5', 'Predictor 1: Instruction 4', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 3', 'Predictor 2: Few-Shot Set 11'].
2025/11/12 19:56:01 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86]
2025/11/12 19:56:01 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:56:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88


2025/11/12 19:56:01 INFO dspy.te


Average Metric: 29.53 / 35 (84.4%): 100%|██████████| 35/35 [00:05<00:00,  6.54it/s] 

2025/11/12 19:56:07 INFO dspy.evaluate.evaluate: Average Metric: 29.53333333333333 / 35 (84.4%)
2025/11/12 19:56:07 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 84.38 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:56:07 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38]
2025/11/12 19:56:07 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:56:07 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88


2025/11/12 1


Average Metric: 27.50 / 35 (78.6%): 100%|██████████| 35/35 [00:25<00:00,  1.35it/s]

2025/11/12 19:56:34 INFO dspy.evaluate.evaluate: Average Metric: 27.5 / 35 (78.6%)
2025/11/12 19:56:34 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 78.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 3', 'Predictor 2: Few-Shot Set 5'].
2025/11/12 19:56:34 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57]
2025/11/12 19:56:34 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88]
2025/11/12 19:56:34 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88


2025/11/12 19:56:34


Average Metric: 43.03 / 52 (82.8%): 100%|██████████| 52/52 [00:03<00:00, 14.38it/s]

2025/11/12 19:56:39 INFO dspy.evaluate.evaluate: Average Metric: 43.03333333333333 / 52 (82.8%)
2025/11/12 19:56:39 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:56:39 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 82.88
2025/11/12 19:56:39 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:56:39 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 50 / 61 - Minibatch ==



Average Metric: 28.07 / 35 (80.2%): 100%|██████████| 35/35 [02:05<00:00,  3.58s/it]

2025/11/12 19:58:45 INFO dspy.evaluate.evaluate: Average Metric: 28.066666666666666 / 35 (80.2%)
2025/11/12 19:58:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.19 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 4', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 0', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:58:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19]
2025/11/12 19:58:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:58:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far


Average Metric: 28.10 / 35 (80.3%): 100%|██████████| 35/35 [00:07<00:00,  4.71it/s]

2025/11/12 19:58:53 INFO dspy.evaluate.evaluate: Average Metric: 28.099999999999998 / 35 (80.3%)
2025/11/12 19:58:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.29 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 3', 'Predictor 1: Few-Shot Set 13', 'Predictor 2: Instruction 4', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 19:58:53 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29]
2025/11/12 19:58:53 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:58:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best full scor


Average Metric: 28.23 / 35 (80.7%): 100%|██████████| 35/35 [00:08<00:00,  4.12it/s]

2025/11/12 19:59:02 INFO dspy.evaluate.evaluate: Average Metric: 28.23333333333333 / 35 (80.7%)
2025/11/12 19:59:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.67 on minibatch of size 35 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:59:02 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67]
2025/11/12 19:59:02 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:59:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best full 


Average Metric: 28.20 / 35 (80.6%): 100%|██████████| 35/35 [00:06<00:00,  5.35it/s]

2025/11/12 19:59:09 INFO dspy.evaluate.evaluate: Average Metric: 28.2 / 35 (80.6%)
2025/11/12 19:59:09 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 15', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 6', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 19:59:09 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57]
2025/11/12 19:59:09 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:59:09 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score


Average Metric: 26.73 / 35 (76.4%): 100%|██████████| 35/35 [00:10<00:00,  3.48it/s]

2025/11/12 19:59:20 INFO dspy.evaluate.evaluate: Average Metric: 26.733333333333334 / 35 (76.4%)
2025/11/12 19:59:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 76.38 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 4', 'Predictor 1: Few-Shot Set 16', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 4'].
2025/11/12 19:59:20 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38]
2025/11/12 19:59:20 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76]
2025/11/12 19:59:20 INFO dspy.teleprompt.mipro_optimize


Average Metric: 43.20 / 52 (83.1%): 100%|██████████| 52/52 [00:02<00:00, 20.24it/s]

2025/11/12 19:59:24 INFO dspy.evaluate.evaluate: Average Metric: 43.199999999999996 / 52 (83.1%)
2025/11/12 19:59:24 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 83.08
2025/11/12 19:59:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 19:59:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 83.08
2025/11/12 19:59:24 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 19:59:24 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 56 / 61 - Minibatch ==



Average Metric: 28.30 / 35 (80.9%): 100%|██████████| 35/35 [00:47<00:00,  1.36s/it]

2025/11/12 20:00:13 INFO dspy.evaluate.evaluate: Average Metric: 28.3 / 35 (80.9%)
2025/11/12 20:00:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 80.86 on minibatch of size 35 with parameters ['Predictor 0: Instruction 3', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 8', 'Predictor 2: Few-Shot Set 14'].
2025/11/12 20:00:13 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38, 80.86]
2025/11/12 20:00:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 20:00:13 INFO dspy.teleprompt.mipro_optimize


Average Metric: 27.00 / 35 (77.1%): 100%|██████████| 35/35 [00:18<00:00,  1.88it/s]

2025/11/12 20:00:32 INFO dspy.evaluate.evaluate: Average Metric: 27.0 / 35 (77.1%)
2025/11/12 20:00:32 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 77.14 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 9', 'Predictor 1: Instruction 4', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 2', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 20:00:32 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38, 80.86, 77.14]
2025/11/12 20:00:32 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 20:00:32 INFO dspy.teleprompt.mipro_op


Average Metric: 27.50 / 35 (78.6%): 100%|██████████| 35/35 [01:52<00:00,  3.22s/it]

2025/11/12 20:02:25 INFO dspy.evaluate.evaluate: Average Metric: 27.5 / 35 (78.6%)
2025/11/12 20:02:25 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 78.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 11', 'Predictor 1: Instruction 7', 'Predictor 1: Few-Shot Set 2', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 12'].
2025/11/12 20:02:25 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38, 80.86, 77.14, 78.57]
2025/11/12 20:02:25 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 20:02:25 INFO dspy.teleprompt.


Average Metric: 29.57 / 35 (84.5%): 100%|██████████| 35/35 [00:01<00:00, 29.24it/s] 

2025/11/12 20:02:28 INFO dspy.evaluate.evaluate: Average Metric: 29.566666666666666 / 35 (84.5%)
2025/11/12 20:02:28 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 84.48 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 17', 'Predictor 1: Instruction 5', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 1'].
2025/11/12 20:02:28 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38, 80.86, 77.14, 78.57, 84.48]
2025/11/12 20:02:28 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 20:02:28 


Average Metric: 24.70 / 35 (70.6%): 100%|██████████| 35/35 [01:38<00:00,  2.82s/it]

2025/11/12 20:04:07 INFO dspy.evaluate.evaluate: Average Metric: 24.7 / 35 (70.6%)
2025/11/12 20:04:07 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 70.57 on minibatch of size 35 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 10', 'Predictor 1: Instruction 8', 'Predictor 1: Few-Shot Set 12', 'Predictor 2: Instruction 1', 'Predictor 2: Few-Shot Set 14'].
2025/11/12 20:04:07 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14, 78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86, 83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67, 79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57, 80.19, 80.29, 80.67, 80.57, 76.38, 80.86, 77.14, 78.57, 84.48, 70.57]
2025/11/12 20:04:07 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08]
2025/11/12 20:04:07 INFO d


Average Metric: 43.07 / 52 (82.8%): 100%|██████████| 52/52 [00:04<00:00, 11.43it/s]

2025/11/12 20:04:13 INFO dspy.evaluate.evaluate: Average Metric: 43.06666666666666 / 52 (82.8%)
2025/11/12 20:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [76.09, 82.76, 80.77, 82.12, 81.09, 82.76, 79.36, 82.88, 82.76, 83.08, 82.82]
2025/11/12 20:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 83.08
2025/11/12 20:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/11/12 20:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 83.08!
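
The log above interleaves noisy minibatch scores (35 examples each) with less frequent full evaluations (52 examples each). The sketch below is one way to summarize that trajectory. The two lists are copied from the final log lines. The variable names are illustrative. It also assumes that the first full-evaluation entry (76.09) is the unoptimized program, which is how MIPROv2 normally starts a run.

```python
# Scores copied from the last "Full eval scores so far" and
# "Minibatch scores so far" log lines of this run.
full_eval_scores = [76.09, 82.76, 80.77, 82.12, 81.09, 82.76,
                    79.36, 82.88, 82.76, 83.08, 82.82]
minibatch_scores = [
    84.95, 82.0, 82.86, 75.81, 74.38, 77.33, 65.52, 82.76, 83.05, 81.14,
    78.48, 75.62, 77.24, 82.1, 80.95, 81.9, 81.43, 73.62, 75.62, 74.86,
    83.24, 81.52, 79.43, 83.24, 74.0, 83.33, 83.62, 81.05, 79.14, 74.67,
    79.14, 83.9, 74.76, 79.24, 83.05, 80.19, 82.86, 82.86, 84.38, 78.57,
    80.19, 80.29, 80.67, 80.57, 76.38, 80.86, 77.14, 78.57, 84.48, 70.57,
]

# Running best over the full evaluations: this is the value the
# optimizer reports as "Best full score so far".
running_best = []
best = float("-inf")
for score in full_eval_scores:
    best = max(best, score)
    running_best.append(best)

best_index = full_eval_scores.index(max(full_eval_scores))
# Assumption: entry 0 is the unoptimized program on the validation set.
gain_over_first = round(running_best[-1] - full_eval_scores[0], 2)

print("running best:", running_best)
print("best full eval:", running_best[-1], "at full-eval index", best_index)
print("gain over first full eval:", gain_over_first)
print("minibatch range:", min(minibatch_scores), "to", max(minibatch_scores))
print("full-eval range:", min(full_eval_scores), "to", max(full_eval_scores))
```

The minibatch scores span a much wider range than the full evaluations, which is why the final program is chosen from the full-evaluation scores rather than from a single strong minibatch.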





## Inspect Optimized Prompts

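Print the instruction that MIPROv2 chose for each predictor in the optimized program. These instructions were picked from the proposed candidates because, together with the chosen few-shot sets, they gave the best full-evaluation score on the validation set (83.08), so they should be more effective than the original signature instructions.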


In [11]:
for name, pred in optimized_program.named_predictors():
    print("================================")
    print(f"Predictor: {name}")
    print("================================")
    print("Prompt:")
    print(pred.signature.instructions)
    print("*********************************")

Predictor: urgency_module.predict
Prompt:
Analyze the following facility management message to determine its urgency level, considering that misclassification could lead to delayed response to a critical issue. Focus on identifying language cues, context, and tone indicating whether the message requires immediate action or can be addressed later. Provide a clear, actionable instruction for the language model to accurately classify the message’s urgency as low, medium, or high, emphasizing the importance of prioritization in urgent scenarios such as safety hazards, service disruptions, or emergencies. Remember, improper assessment could result in severe operational or safety consequences, so accuracy is paramount.
*********************************
Predictor: sentiment_module.predict
Prompt:
Analyze the given message and generate a clear, engaging instruction that guides a language model to accurately identify the tone or emotional stance expressed in the message, considering nuances suc

## Evaluate Optimized Program

Run the final evaluation on the test set (68 examples) with the optimized program to confirm that the improvement seen during optimization carries over.


In [12]:
evaluate(optimized_program)

Average Metric: 56.13 / 68 (82.5%): 100%|██████████| 68/68 [00:49<00:00,  1.37it/s]

2025/11/12 20:05:04 INFO dspy.evaluate.evaluate: Average Metric: 56.13333333333333 / 68 (82.5%)





| | message | answer | urgency | sentiment | categories | metric |
|---|---|---|---|---|---|---|
| 0 | Hey ProCare Support Team, Hope you all are doing great! My name is... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | positive | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.967] |
| 1 | Hey ProCare Team, Hope you’re all doing well! My name’s Jake, and ... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | medium | positive | [routine_maintenance_requests] | ✔️ [1.000] |
| 2 | Subject: Assistance Needed for HVAC Maintenance Hi [Receiver], I h... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | medium | neutral | [routine_maintenance_requests] | ✔️ [1.000] |
| 3 | Subject: A Green Inquiry from a Bill Maher Enthusiast Hey ProCare ... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | positive | [sustainability_and_environmental_practices, training_and_support_... | ✔️ [0.967] |
| 4 | Subject: Inquiry on Sustainability Practices Dear ProCare Facility... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | positive | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.633] |
| ... | ... | ... | ... | ... | ... | ... |
| 63 | Subject: Inquiry About Your Eco-Friendly Practices Dear ProCare Fa... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | neutral | [sustainability_and_environmental_practices, general_inquiries] | ✔️ [0.967] |
| 64 | Subject: Assistance Needed for Facility Management Issue Dear ProC... | {"categories": {"routine_maintenance_requests": false, "customer_f... | high | neutral | [facility_management_issues] | ✔️ [0.333] |
| 65 | Subject: Request for Training and Support Hi ProCare Support Team,... | {"categories": {"routine_maintenance_requests": false, "customer_f... | low | positive | [training_and_support_requests] | ✔️ [1.000] |
| 66 | Subject: Concerns About Studio Maintenance and Rent Increase Dear ... | {"categories": {"routine_maintenance_requests": true, "customer_fe... | high | negative | [routine_maintenance_requests, facility_management_issues] | ✔️ [0.267] |


EvaluationResult(score=82.55, results=<list of 68 results>)
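
The per-example values in the `metric` column (0.967, 0.633, 0.267, and so on) are consistent with a score that averages three equally weighted parts: urgency match, sentiment match, and the fraction of category flags predicted correctly. The sketch below is an assumption about how such a metric could be written, not the notebook's actual metric function. The names `category_score` and `combined_score` are hypothetical, as are the placeholder categories `other_1` to `other_5`, which only pad the label set to ten entries. The last lines show how the reported 82.55 follows from the logged total.

```python
# Hypothetical re-creation of a three-part metric; the notebook's real
# metric is defined earlier and may differ in detail.
ALL_CATEGORIES = [
    "routine_maintenance_requests",
    "facility_management_issues",
    "training_and_support_requests",
    "sustainability_and_environmental_practices",
    "general_inquiries",
    # Placeholder names (assumption): pad the label set to ten entries.
    "other_1", "other_2", "other_3", "other_4", "other_5",
]

def category_score(gold_flags, predicted):
    """Fraction of categories whose predicted membership matches the gold flag."""
    correct = sum((name in predicted) == flag for name, flag in gold_flags.items())
    return correct / len(gold_flags)

def combined_score(gold, pred):
    """Average of urgency match, sentiment match, and category agreement."""
    urgency = 1.0 if gold["urgency"] == pred["urgency"] else 0.0
    sentiment = 1.0 if gold["sentiment"] == pred["sentiment"] else 0.0
    categories = category_score(gold["categories"], pred["categories"])
    return (urgency + sentiment + categories) / 3

# Illustrative example: urgency and sentiment are right, one extra category.
gold = {
    "urgency": "low",
    "sentiment": "positive",
    "categories": {
        name: name == "sustainability_and_environmental_practices"
        for name in ALL_CATEGORIES
    },
}
pred = {
    "urgency": "low",
    "sentiment": "positive",
    "categories": ["sustainability_and_environmental_practices", "general_inquiries"],
}
score = combined_score(gold, pred)
print(round(score, 3))  # 0.967, i.e. (1 + 1 + 0.9) / 3

# The aggregate is the mean per-example score, expressed as a percentage.
total, n_examples = 56.13333333333333, 68  # values from the evaluation log
average_pct = round(100 * total / n_examples, 2)
print(average_pct)  # 82.55
```

Under this assumption a row can show a low value such as 0.267 when both urgency and sentiment are wrong and only the category flags contribute.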

## Performance Improvement Summary

Display the change in the average metric score from the baseline program (75.78%) to the optimized program (82.55%), a gain of 6.77 percentage points.

75.78 -> 82.55
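
The cell that printed this line is not shown, so the following is only a minimal sketch of how such a summary could be produced. It hard-codes the two scores reported above, and `baseline_score` and `optimized_score` are illustrative names rather than variables from the notebook.

```python
# Scores reported in this notebook (average metric, in percent).
baseline_score = 75.78
optimized_score = 82.55

# Absolute gain in percentage points and relative gain over the baseline.
absolute_gain = optimized_score - baseline_score
relative_gain = absolute_gain / baseline_score * 100

print(f"{baseline_score:.2f} -> {optimized_score:.2f}")
print(f"absolute gain: {absolute_gain:.2f} points")
print(f"relative gain: {relative_gain:.1f}%")
```

Reporting both figures avoids ambiguity: 6.77 points is the difference between the two percentages, while the relative gain expresses that difference as a share of the baseline score.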