# LangWatch DSPy Visualizer

This notebook walks through a simple DSPy optimization run integrated with LangWatch, which visualizes the training session and helps with debugging it.

[<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/langwatch/langwatch/blob/main/python-sdk/examples/dspy_visualization.ipynb)

In [1]:
# Install langwatch along with dspy for the visualization
!pip install dspy langwatch

## Preparing the LLM

In [2]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OPENAI_API_KEY: ")

import dspy

llm = dspy.LM("openai/gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"])

print("LLM test response:", llm("hello there"))

colbertv2_wiki17_abstracts = dspy.ColBERTv2(
    url="http://20.102.90.50:2017/wiki17_abstracts"
)
dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

LLM test response: ['Hello! How can I assist you today?']


## Preparing the Dataset

In [3]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=32, eval_seed=2025, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

(32, 50)

## Defining the model

In [4]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages # type: ignore
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)


dev_example = devset[18]
print(f"[Devset] Question: {dev_example.question}")
print(f"[Devset] Answer: {dev_example.answer}")
print(f"[Devset] Relevant Wikipedia Titles: {dev_example.gold_titles}")

generate_answer = RAG()

pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"[Prediction] Question: {dev_example.question}")
print(f"[Prediction] Predicted Answer: {pred.answer}")

[Devset] Question: Which magazine was released first, Fortune or Motor Trend?
[Devset] Answer: Motor Trend
[Devset] Relevant Wikipedia Titles: {'Fortune (magazine)', 'Motor Trend'}
[Prediction] Question: Which magazine was released first, Fortune or Motor Trend?
[Prediction] Predicted Answer: Fortune


## Log in to LangWatch

In [1]:
import langwatch

langwatch.login()


Please go to https://app.langwatch.ai/authorize to get your API key
LangWatch API key set


## Start Training Session!

In [6]:
from dspy.teleprompt import MIPROv2
import dspy.evaluate

# Define our validation metric: the answer must match exactly and appear in the retrieved context
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a MIPROv2 optimizer, which will compile our RAG program.
optimizer = MIPROv2(metric=validate_context_and_answer, prompt_model=llm, task_model=llm, num_candidates=2, init_temperature=0.7)

# Initialize langwatch for this run, to track the optimizer compilation
langwatch.dspy.init(experiment="my-awesome-experiment", optimizer=optimizer)

# Compile
compiled_rag = optimizer.compile(
    RAG(),
    trainset=trainset,
    num_trials=10,
    max_bootstrapped_demos=3,
    max_labeled_demos=5,
)

2025/05/09 14:34:42 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING MEDIUM AUTO RUN SETTINGS:
num_trials: 25
minibatch: False
num_candidates: 19
valset size: 25




[LangWatch] Experiment initialized, run_id: gay-jasmine-pigeon
[LangWatch] Open https://app.langwatch.ai/public-documentation-examples-YxS3Bf/experiments/my-awesome-experiment?runIds=gay-jasmine-pigeon to track your DSPy training session live



2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=19 sets of demonstrations...


Bootstrapping set 1/19
Bootstrapping set 2/19
Bootstrapping set 3/19


100%|██████████| 7/7 [00:00<00:00, 216.70it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 4/19


100%|██████████| 7/7 [00:00<00:00, 238.30it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 5/19


 57%|█████▋    | 4/7 [00:00<00:00, 383.80it/s]


Bootstrapped 1 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
Bootstrapping set 6/19


 43%|████▎     | 3/7 [00:00<00:00, 177.71it/s]


Bootstrapped 1 full traces after 3 examples for up to 1 rounds, amounting to 3 attempts.
Bootstrapping set 7/19


100%|██████████| 7/7 [00:00<00:00, 426.47it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 8/19


100%|██████████| 7/7 [00:00<00:00, 347.24it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 9/19


 71%|███████▏  | 5/7 [00:00<00:00, 559.60it/s]


Bootstrapped 1 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 10/19


100%|██████████| 7/7 [00:00<00:00, 314.52it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 11/19


 86%|████████▌ | 6/7 [00:00<00:00, 507.27it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 12/19


 71%|███████▏  | 5/7 [00:00<00:00, 410.37it/s]


Bootstrapped 1 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 13/19


100%|██████████| 7/7 [00:00<00:00, 56.82it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 14/19


100%|██████████| 7/7 [00:00<00:00, 602.46it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 15/19


100%|██████████| 7/7 [00:00<00:00, 488.53it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 16/19


100%|██████████| 7/7 [00:00<00:00, 503.09it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 17/19


 86%|████████▌ | 6/7 [00:00<00:00, 504.97it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 18/19


100%|██████████| 7/7 [00:00<00:00, 473.65it/s]


Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Bootstrapping set 19/19


100%|██████████| 7/7 [00:00<00:00, 418.26it/s]
2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...

2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer questions with short factoid answers.

2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are a trivia expert. Given the provided context, answer the following question with a short factoid response (1 to 5 words).

2025/05/09 14:34:44 INFO dspy.teleprompt.mipro_optimizer_v2: 2: Imagine you are a contestant on a high-stake

Bootstrapped 1 full traces after 6 examples for up to 1 rounds, amounting to 7 attempts.
Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 357.72it/s]

2025/05/09 14:34:44 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:46 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 40.0

2025/05/09 14:34:46 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 25 =====


Average Metric: 8.00 / 25 (32.0%): 100%|██████████| 25/25 [00:00<00:00, 40.66it/s]

2025/05/09 14:34:47 INFO dspy.evaluate.evaluate: Average Metric: 8 / 25 (32.0%)





2025/05/09 14:34:47 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 32.0 with parameters ['Predictor 0: Instruction 12', 'Predictor 0: Few-Shot Set 7'].
2025/05/09 14:34:47 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0]
2025/05/09 14:34:47 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 40.0


2025/05/09 14:34:47 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 25 =====


Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:00<00:00, 332.77it/s]

2025/05/09 14:34:47 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)





2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 with parameters ['Predictor 0: Instruction 10', 'Predictor 0: Few-Shot Set 7'].
2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0]
2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 40.0


2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 397.39it/s]

2025/05/09 14:34:48 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 44.0
2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 18'].
2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0]
2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:48 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 25 =====


Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:00<00:00, 372.74it/s]

2025/05/09 14:34:48 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)





2025/05/09 14:34:49 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 with parameters ['Predictor 0: Instruction 15', 'Predictor 0: Few-Shot Set 2'].
2025/05/09 14:34:49 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0]
2025/05/09 14:34:49 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:49 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 286.23it/s]

2025/05/09 14:34:49 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 8', 'Predictor 0: Few-Shot Set 18'].
2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0]
2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 25 =====


Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:00<00:00, 323.79it/s]

2025/05/09 14:34:50 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)





2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 1'].
2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0]
2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:50 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 336.26it/s]

2025/05/09 14:34:50 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 12'].
2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0]
2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 367.12it/s]

2025/05/09 14:34:51 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 11', 'Predictor 0: Few-Shot Set 13'].
2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0]
2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:51 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 404.03it/s]

2025/05/09 14:34:51 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0]
2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 359.01it/s]

2025/05/09 14:34:52 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 7', 'Predictor 0: Few-Shot Set 18'].
2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0]
2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:52 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 12 / 25 =====


Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:00<00:00, 385.01it/s]

2025/05/09 14:34:53 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)





2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 8'].
2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0]
2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 399.78it/s]

2025/05/09 14:34:53 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 5', 'Predictor 0: Few-Shot Set 4'].
2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0]
2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:53 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 14 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 372.34it/s]

2025/05/09 14:34:54 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:54 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 9', 'Predictor 0: Few-Shot Set 10'].
2025/05/09 14:34:54 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0]
2025/05/09 14:34:54 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:54 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 15 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 318.49it/s]

2025/05/09 14:34:54 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:55 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 4'].
2025/05/09 14:34:55 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0]
2025/05/09 14:34:55 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:55 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 16 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 319.98it/s]

2025/05/09 14:34:55 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 13', 'Predictor 0: Few-Shot Set 3'].
2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0]
2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 17 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 344.25it/s]

2025/05/09 14:34:56 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 16', 'Predictor 0: Few-Shot Set 6'].
2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0]
2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 44.0


2025/05/09 14:34:56 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 18 / 25 =====


Average Metric: 12.00 / 25 (48.0%): 100%|██████████| 25/25 [00:00<00:00, 367.23it/s]

2025/05/09 14:34:56 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)





2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far! Score: 48.0
2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0]
2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 19 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 370.02it/s]

2025/05/09 14:34:57 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0]
2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 20 / 25 =====


Average Metric: 12.00 / 25 (48.0%): 100%|██████████| 25/25 [00:00<00:00, 399.46it/s]

2025/05/09 14:34:57 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)





2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0]
2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 21 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 363.16it/s]

2025/05/09 14:34:58 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 9', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0]
2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:58 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 22 / 25 =====


Average Metric: 11.00 / 25 (44.0%): 100%|██████████| 25/25 [00:00<00:00, 409.33it/s]

2025/05/09 14:34:58 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)





2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 with parameters ['Predictor 0: Instruction 17', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0, 44.0]
2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 23 / 25 =====


Average Metric: 12.00 / 25 (48.0%): 100%|██████████| 25/25 [00:00<00:00, 379.15it/s]

2025/05/09 14:34:59 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)





2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0, 44.0, 48.0]
2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:34:59 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 24 / 25 =====


Average Metric: 12.00 / 25 (48.0%): 100%|██████████| 25/25 [00:00<00:00, 390.89it/s]

2025/05/09 14:35:00 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)





2025/05/09 14:35:00 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 14'].
2025/05/09 14:35:00 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0, 44.0, 48.0, 48.0]
2025/05/09 14:35:00 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:35:00 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 25 / 25 =====


Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:00<00:00, 412.88it/s]

2025/05/09 14:35:00 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)





2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 15'].
2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0, 44.0, 48.0, 48.0, 40.0]
2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 26 / 25 =====


Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:00<00:00, 377.49it/s]

2025/05/09 14:35:01 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)





2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 with parameters ['Predictor 0: Instruction 6', 'Predictor 0: Few-Shot Set 17'].
2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0, 44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0, 40.0, 44.0, 48.0, 48.0, 40.0, 36.0]
2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 48.0


2025/05/09 14:35:01 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 48.0!
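
The log above is long, so it can help to summarize it offline. The sketch below copies the final "Scores so far" list from the log and derives the baseline, the best score, and the trials where the best score improved. The variable names are illustrative, and the trial numbering assumes the default program evaluation counts as trial 1, which is how the log appears to number it (the first labelled trial is "Trial 2 / 25").

```python
from itertools import accumulate

# "Scores so far" copied from the last trial in the log above.
scores = [
    40.0, 32.0, 36.0, 44.0, 36.0, 40.0, 36.0, 40.0, 40.0, 44.0,
    44.0, 36.0, 44.0, 44.0, 44.0, 40.0, 40.0, 48.0, 40.0, 48.0,
    40.0, 44.0, 48.0, 48.0, 40.0, 36.0,
]

baseline = scores[0]  # default program score, logged before "Trial 2"
best_score = max(scores)
first_best_trial = scores.index(best_score) + 1  # 1-based, matching the log

# Running maximum after each trial.
running_best = list(accumulate(scores, max))

# Trials that strictly raised the best score over the previous running best.
improvement_trials = [
    i + 1
    for i in range(1, len(scores))
    if running_best[i] > running_best[i - 1]
]

print(f"baseline={baseline}, best={best_score}, first reached in trial {first_best_trial}")
print(f"trials that improved the best score: {improvement_trials}")
```

The two improvement trials line up with the two "Best full score so far!" messages in the log. On a validation set of 25 examples, each example is worth 4 percentage points, so the gain from 40.0 to 48.0 corresponds to two additional correct answers.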



In [11]:
compiled_rag

generate_answer.predict = Predict(StringSignature(context, question -> reasoning, answer
    instructions='Given the context and the question, generate a reasoned answer that is concise and factually correct, ensuring it consists of no more than 5 words.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
))

In [None]:
compiled_rag.save("optimized_model.json")
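
After saving, it can be useful to check what was written to disk. The sketch below uses only the standard library and assumes the saved file is plain JSON, as the `.json` extension suggests. It makes no assumption about the keys inside the file, and `top_level_keys` is a hypothetical helper, not part of DSPy or LangWatch.

```python
import json
from pathlib import Path


def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if it is not an object."""
    with Path(path).open(encoding="utf-8") as f:
        state = json.load(f)
    return sorted(state) if isinstance(state, dict) else []


saved_path = Path("optimized_model.json")
if saved_path.exists():
    # Only runs when the save cell above has produced the file.
    print(top_level_keys(saved_path))
```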