<a href="https://colab.research.google.com/github/hanhanwu/Hanhan_COLAB_Experiemnts/blob/master/GenAI_Practice/Langwatch/try_dspy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Try DsPy for RAG Prompt Optimization


In [1]:
%%capture --no-stderr
!pip install --upgrade nbformat
%pip install -U --quiet dspy

## Prepare LLM

* `http://20.102.90.50:2017/wiki17_abstracts` provides the sources for retrieval here

In [2]:
import os
import pandas as pd
from getpass import getpass
import dspy
from google.colab import userdata


# OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
# llm = dspy.LM("openai/gpt-4.1-nano", api_key=OPENAI_API_KEY)

GOOGLE_AI_API_KEY = userdata.get('GOOGLE_AI_API_KEY')
llm = dspy.LM("gemini/gemini-2.0-flash", api_key=GOOGLE_AI_API_KEY)
print("LLM test response:", llm("Where's Silicon Valley?"))

# the retrieval model
colbertv2_wiki17_abstracts = dspy.ColBERTv2(
    url="http://20.102.90.50:2017/wiki17_abstracts"
)
dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

LLM test response: ['Silicon Valley is located in the southern part of the San Francisco Bay Area in **Northern California, United States**.\n']


## Preparing Dataset

In [3]:
from dspy.datasets import HotPotQA


dataset = HotPotQA(train_seed=1, train_size=32, eval_seed=2025, dev_size=50, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

print()
print(len(trainset), len(devset))
print(trainset[0])
print(devset[0])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/9.19k [00:00<?, ?B/s]

hotpot_qa.py:   0%|          | 0.00/6.42k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/566M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/47.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/46.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/90447 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/7405 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7405 [00:00<?, ? examples/s]


32 50
Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
Example({'question': 'Pehchaan: The Face of Truth stars Vinod Khanna, Rati Agnihotri and which Indian actress, producer, and former model who also produced the film?', 'answer': 'Raveena Tandon', 'gold_titles': {'Raveena Tandon', 'Pehchaan: The Face of Truth'}}) (input_keys={'question'})


## Defining DsPy RAG

In [4]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context,
                               answer=prediction.answer,
                               reasoning=prediction.reasoning)


dev_example = devset[12]
print(f"[Devset] Question: {dev_example.question}")
print(f"[Devset] Answer: {dev_example.answer}")
print(f"[Devset] Relevant Wikipedia Titles: {dev_example.gold_titles}")
print()

generate_answer = RAG()
pred = generate_answer(question=dev_example.question)
print(f"[Prediction] Question: {dev_example.question}")
print(f"[Prediction] Predicted Answer: {pred.answer}")
print(f"[Prediction] Reasoning: {pred.reasoning}")

[Devset] Question: Twelve Inches is a compilation album by which 1980s British band?
[Devset] Answer: Frankie Goes to Hollywood
[Devset] Relevant Wikipedia Titles: {'Twelve Inches', 'Frankie Goes to Hollywood'}

[Prediction] Question: Twelve Inches is a compilation album by which 1980s British band?
[Prediction] Predicted Answer: Bananarama
[Prediction] Reasoning: The question asks which 1980s British band released a compilation album called "Twelve Inches". I need to find a band that matches both criteria.
The context provides three albums with "Twelve Inch" in the title:
- The Twelve Inch Singles by Soft Cell
- The Twelve Inches of Bananarama by Bananarama
- The Twelve Inch Mixes by Spandau Ballet

All three bands are British and were active in the 1980s. However, the question asks for the album title "Twelve Inches", so the answer must be Bananarama.


In [5]:
from dspy.teleprompt import MIPROv2


trial_logs = []

def validate_context_and_answer(example, prediction):
    gold = example.answer.strip().lower()
    pred = prediction.answer.strip().lower()
    score = int(gold == pred)

    # Format similar to Langwatch's internal logging
    log_entry = {
        "input": {
            "question": example.question,
            "context": getattr(example, "context", "")
        },
        "output": {
            "answer": pred
        },
        # Include prediction trace (as dict if possible)
        "trace": prediction.__dict__ if hasattr(prediction, "__dict__") else str(prediction),
        "score": score,
        "optimizer_name": optimizer.__class__.__name__,
    }
    trial_logs.append(log_entry)

    print(f"[Trial] Q: {example.question} | Pred: {pred} | GT: {gold} | Score: {score}")
    return score


optimizer = MIPROv2(
    metric=validate_context_and_answer,
    prompt_model=llm,
    task_model=llm,
    num_candidates=2,
    init_temperature=0.7,
    auto=None,
)


compiled_rag = optimizer.compile(
    RAG(),
    trainset=trainset,
    num_trials=5,
    max_bootstrapped_demos=2,
    max_labeled_demos=3,
    minibatch_size=4,
    requires_permission_to_run=False
)

2025/05/31 16:30:46 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/31 16:30:46 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/31 16:30:46 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=2 sets of demonstrations...
2025/05/31 16:30:46 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/31 16:30:46 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapping set 1/2
Bootstrapping set 2/2


2025/05/31 16:30:48 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=2 instructions...

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer questions with short factoid answers.

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Given relevant context, answer the question with a short, fact-based answer. Explain your reasoning step-by-step before providing the final answer.

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

2025/05/31 16:31:10 INFO dspy.teleprompt.mipro_optimizer_v2: == Tri

  0%|          | 0/25 [00:00<?, ?it/s][Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
Average Metric: 1.00 / 1 (100.0%):   4%|▍         | 1/25 [00:01<00:44,  1.86s/it][Trial] Q: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?  | Pred: outfield of dreams | GT: "outfield of dreams" | Score: 0
Average Metric: 1.00 / 2 (50.0%):   8%|▊         | 2/25 [00:02<00:20,  1.15it/s][Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT: space | Score: 1
Average Metric: 2.00 / 3 (66.7%):   8%|▊         | 2/25 [00:02<00:20,  1.15it/s][Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: unknown | GT: bill paxton | Score: 0
Average Metric: 2.00 / 4 (50.0%):  16%|█▌        | 4/2

2025/05/31 16:31:24 ERROR dspy.utils.parallelizer: Error for Example({'question': '"Everything Has Changed" is a song from an album released under which record label ?', 'answer': 'Big Machine Records'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
      

Average Metric: 4.00 / 10 (40.0%):  44%|████▍     | 11/25 [00:13<00:34,  2.43s/it]

2025/05/31 16:31:26 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Are Baltasar Kormákur and John G. Avildsen both film producers?', 'answer': 'no'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
           

Average Metric: 4.00 / 10 (40.0%):  48%|████▊     | 12/25 [00:16<00:31,  2.42s/it]

2025/05/31 16:31:26 ERROR dspy.utils.parallelizer: Error for Example({'question': 'This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what?', 'answer': 'The Waltz King'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerP

Average Metric: 4.00 / 10 (40.0%):  52%|█████▏    | 13/25 [00:16<00:21,  1.82s/it]

2025/05/31 16:31:26 ERROR dspy.utils.parallelizer: Error for Example({'question': "Do Stu Block and Johnny Bonnel's bands play the same type of music?", 'answer': 'no'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
       

Average Metric: 4.00 / 10 (40.0%):  56%|█████▌    | 14/25 [00:16<00:15,  1.38s/it]

2025/05/31 16:31:27 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka?', 'answer': 'Gustavo Kuerten'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            

Average Metric: 4.00 / 10 (40.0%):  60%|██████    | 15/25 [00:16<00:11,  1.13s/it]

2025/05/31 16:31:27 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What head of state position was held by Harry S Truman when he gave Harold E Wilson the Medal of Honor?', 'answer': 'President of the United States'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 4.00 / 10 (40.0%):  60%|██████    | 15/25 [00:16<00:11,  1.13s/it]

2025/05/31 16:31:27 ERROR dspy.utils.parallelizer: Error for Example({'question': 'On the coast of what ocean is the birthplace of Diogal Sakho?', 'answer': 'Atlantic'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
       

Average Metric: 4.00 / 10 (40.0%):  68%|██████▊   | 17/25 [00:17<00:05,  1.48it/s]

2025/05/31 16:31:27 ERROR dspy.utils.parallelizer: Error for Example({'question': 'How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?', 'answer': '50-million-year-old fossil record'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 4.00 / 10 (40.0%):  68%|██████▊   | 17/25 [00:17<00:05,  1.48it/s]

2025/05/31 16:31:34 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What evening cable television station programming block has a show with Ashley Holliday as a cast member?', 'answer': 'Nick at Nite'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-Fre

Average Metric: 4.00 / 10 (40.0%):  76%|███████▌  | 19/25 [00:24<00:10,  1.78s/it]

2025/05/31 16:31:38 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 4.00 / 10 (40.0%):  80%|████████  | 20/25 [00:27<00:06,  1.39s/it]

2025/05/31 16:31:38 ERROR dspy.teleprompt.utils: An exception occurred during evaluation
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/dspy/teleprompt/utils.py", line 52, in eval_candidate_program
    return evaluate(candidate_program, devset=trainset, return_all_scores=return_all_scores, callback_metadata={"metric_key": "eval_full"})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/utils/callback.py", line 326, in sync_wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/evaluate/evaluate.py", line 171, in __call__
    results = executor.execute(process_item, devset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/utils/parallelizer.py", line 48, in execute
    


  0%|          | 0/4 [00:00<?, ?it/s]

2025/05/31 16:31:39 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What person does Wormholes in fiction and Nathan Rosen have in common?', 'answer': 'Einstein'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": 

Average Metric: 0.00 / 0 (0%):  25%|██▌       | 1/4 [00:11<00:33, 11.21s/it]

2025/05/31 16:31:49 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What head of state position was held by Harry S Truman when he gave Harold E Wilson the Medal of Honor?', 'answer': 'President of the United States'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 0.00 / 0 (0%):  25%|██▌       | 1/4 [00:11<00:33, 11.21s/it]

2025/05/31 16:31:49 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?', 'answer': 'Aleksandr Danilovich Aleksandrov'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
           

Average Metric: 0.00 / 0 (0%):  75%|███████▌  | 3/4 [00:11<00:02,  2.98s/it]

2025/05/31 16:31:50 ERROR dspy.utils.parallelizer: Error for Example({'question': "Remember Me Ballin' is a CD single by Indo G that features an American rapper born in what year?", 'answer': '1979'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
         

Average Metric: 0.00 / 0 (0%): 100%|██████████| 4/4 [00:11<00:00,  2.90s/it]

2025/05/31 16:31:50 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 4 (0.0%)
2025/05/31 16:31:50 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 0.0 on minibatch of size 4 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/31 16:31:50 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [0.0]
2025/05/31 16:31:50 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [0.0]
2025/05/31 16:31:50 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 0.0


2025/05/31 16:31:50 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 7 - Minibatch ==



  0%|          | 0/4 [00:00<?, ?it/s][Trial] Q: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko? | Pred: cannot determine | GT: aleksandr danilovich aleksandrov | Score: 0
Average Metric: 0.00 / 1 (0.0%):  25%|██▌       | 1/4 [00:01<00:04,  1.65s/it][Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
Average Metric: 1.00 / 2 (50.0%):  25%|██▌       | 1/4 [00:01<00:04,  1.65s/it][Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: nathan rosen | GT: einstein | Score: 0
Average Metric: 1.00 / 3 (33.3%):  75%|███████▌  | 3/4 [00:01<00:00,  2.07it/s][Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
Average Metric: 2.00 / 4 (50.0%): 1

2025/05/31 16:31:52 INFO dspy.evaluate.evaluate: Average Metric: 2 / 4 (50.0%)
2025/05/31 16:31:52 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 4 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/31 16:31:52 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [0.0, 50.0]
2025/05/31 16:31:52 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [0.0]
2025/05/31 16:31:52 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 0.0


2025/05/31 16:31:52 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 7 - Minibatch ==



  0%|          | 0/4 [00:00<?, ?it/s][Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: nathan rosen | GT: einstein | Score: 0
Average Metric: 0.00 / 1 (0.0%):  25%|██▌       | 1/4 [00:01<00:04,  1.37s/it][Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
Average Metric: 1.00 / 2 (50.0%):  25%|██▌       | 1/4 [00:01<00:04,  1.37s/it][Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: not in provided context. | GT: bill paxton | Score: 0
Average Metric: 1.00 / 3 (33.3%):  75%|███████▌  | 3/4 [00:01<00:00,  1.97it/s][Trial] Q: How old is the fossil record of the order that contains the only strictly marine herbivorous mammal? | Pred: 50 million years old | GT: 50-million-year-old fossil record | Score: 0
Average Metric: 1.00 / 4 (25.0%): 100%|██████████| 4/4 [00:01<00:00,  2.10it/s]

2025/05/31 16:31:54 INFO dspy.evaluate.evaluate: Average Metric: 1 / 4 (25.0%)
2025/05/31 16:31:54 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 25.0 on minibatch of size 4 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/05/31 16:31:54 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [0.0, 50.0, 25.0]
2025/05/31 16:31:54 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [0.0]
2025/05/31 16:31:54 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 0.0


2025/05/31 16:31:54 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 7 - Minibatch ==



  0%|          | 0/4 [00:00<?, ?it/s][Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT: space | Score: 1
Average Metric: 1.00 / 1 (100.0%):   0%|          | 0/4 [00:00<?, ?it/s][Trial] Q: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what? | Pred: johann strauss ii | GT: the waltz king | Score: 0
Average Metric: 1.00 / 2 (50.0%):  50%|█████     | 2/4 [00:01<00:01,  1.60it/s][Trial] Q: Who was coach of the No. 9-ranked team that was upset in the NCAA Tournament by the 2014-15 UAB Blazers men's basketball team?   | Pred: cannot answer. | GT: fred hoiberg | Score: 0
Average Metric: 1.00 / 3 (33.3%):  75%|███████▌  | 3/4 [00:01<00:00,  1.96it/s][Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
Average Metric: 2.00 / 4 (50.0%): 100%|██████████| 4/4 [00:01<00:00

2025/05/31 16:31:56 INFO dspy.evaluate.evaluate: Average Metric: 2 / 4 (50.0%)
2025/05/31 16:31:56 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 4 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/05/31 16:31:56 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [0.0, 50.0, 25.0, 50.0]
2025/05/31 16:31:56 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [0.0]
2025/05/31 16:31:56 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 0.0


2025/05/31 16:31:56 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 7 - Minibatch ==



  0%|          | 0/4 [00:00<?, ?it/s][Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
Average Metric: 1.00 / 1 (100.0%):  25%|██▌       | 1/4 [00:01<00:03,  1.07s/it][Trial] Q: Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka? | Pred: cannot answer | GT: gustavo kuerten | Score: 0
Average Metric: 1.00 / 2 (50.0%):  50%|█████     | 2/4 [00:01<00:01,  1.91it/s][Trial] Q: Who was coach of the No. 9-ranked team that was upset in the NCAA Tournament by the 2014-15 UAB Blazers men's basketball team?   | Pred: not mentioned in context | GT: fred hoiberg | Score: 0
Average Metric: 1.00 / 3 (33.3%):  75%|███████▌  | 3/4 [00:01<00:00,  2.99it/s][Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT: space | Score: 1
Average Metric: 2.00 / 4 (50.0%): 100%|██████████| 4/4 [00:01<00:00,  3.00it/s]

2025/05/31 16:31:57 INFO dspy.evaluate.evaluate: Average Metric: 2 / 4 (50.0%)
2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 4 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1'].
2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [0.0, 50.0, 25.0, 50.0, 50.0]
2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [0.0]
2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 0.0


2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 7 - Full Evaluation =====
2025/05/31 16:31:57 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 50.0) from minibatch trials...



[Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: unknown | GT: bill paxton | Score: 0
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? | Pred: kerry condon | GT: kerry condon | Score: 1
[Trial] Q: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs? | Pred: buena vista distribution company | GT: bue

2025/05/31 16:32:10 ERROR dspy.utils.parallelizer: Error for Example({'question': "Do Stu Block and Johnny Bonnel's bands play the same type of music?", 'answer': 'no'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
       

Average Metric: 5.00 / 12 (41.7%):  52%|█████▏    | 13/25 [00:12<00:15,  1.30s/it]

2025/05/31 16:32:10 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka?', 'answer': 'Gustavo Kuerten'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            

Average Metric: 5.00 / 12 (41.7%):  56%|█████▌    | 14/25 [00:13<00:13,  1.20s/it]

2025/05/31 16:32:10 ERROR dspy.utils.parallelizer: Error for Example({'question': 'On the coast of what ocean is the birthplace of Diogal Sakho?', 'answer': 'Atlantic'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
       

Average Metric: 5.00 / 12 (41.7%):  56%|█████▌    | 14/25 [00:13<00:13,  1.20s/it]
Average Metric: 6.00 / 13 (46.2%):  60%|██████    | 15/25 [00:13<00:11,  1.20s/it][Trial] Q: Who was coach of the No. 9-ranked team that was upset in the NCAA Tournament by the 2014-15 UAB Blazers men's basketball team?   | Pred: cannot answer. | GT: fred hoiberg | Score: 0
Average Metric: 6.00 / 14 (42.9%):  64%|██████▍   | 16/25 [00:13<00:10,  1.20s/it]

2025/05/31 16:32:11 ERROR dspy.utils.parallelizer: Error for Example({'question': '"Everything Has Changed" is a song from an album released under which record label ?', 'answer': 'Big Machine Records'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
      

Average Metric: 6.00 / 14 (42.9%):  72%|███████▏  | 18/25 [00:13<00:05,  1.37it/s]

2025/05/31 16:32:11 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What person does Wormholes in fiction and Nathan Rosen have in common?', 'answer': 'Einstein'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": 

Average Metric: 6.00 / 14 (42.9%):  76%|███████▌  | 19/25 [00:14<00:04,  1.41it/s]

2025/05/31 16:32:11 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Are Baltasar Kormákur and John G. Avildsen both film producers?', 'answer': 'no'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
           

Average Metric: 6.00 / 14 (42.9%):  76%|███████▌  | 19/25 [00:14<00:04,  1.41it/s]

2025/05/31 16:32:12 ERROR dspy.utils.parallelizer: Error for Example({'question': 'How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?', 'answer': '50-million-year-old fossil record'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 6.00 / 14 (42.9%):  84%|████████▍ | 21/25 [00:14<00:02,  1.83it/s]

2025/05/31 16:32:12 ERROR dspy.utils.parallelizer: Error for Example({'question': 'What evening cable television station programming block has a show with Ashley Holliday as a cast member?', 'answer': 'Nick at Nite'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-Fre

Average Metric: 6.00 / 14 (42.9%):  84%|████████▍ | 21/25 [00:14<00:02,  1.83it/s]

2025/05/31 16:32:18 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 6.00 / 14 (42.9%):  92%|█████████▏| 23/25 [00:21<00:02,  1.34s/it]

2025/05/31 16:32:20 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which movie was released first, Son of Flubber or Davy Crockett, King of the Wild Frontier?', 'answer': 'Davy Crockett, King of the Wild Frontier'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProje

Average Metric: 6.00 / 14 (42.9%):  96%|█████████▌| 24/25 [00:22<00:00,  1.06it/s]

2025/05/31 16:32:20 ERROR dspy.teleprompt.utils: An exception occurred during evaluation
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/dspy/teleprompt/utils.py", line 52, in eval_candidate_program
    return evaluate(candidate_program, devset=trainset, return_all_scores=return_all_scores, callback_metadata={"metric_key": "eval_full"})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/utils/callback.py", line 326, in sync_wrapper
    return fn(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/evaluate/evaluate.py", line 171, in __call__
    results = executor.execute(process_item, devset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/dspy/utils/parallelizer.py", line 48, in execute
    




In [10]:
# optimized results
compiled_rag

generate_answer.predict = Predict(StringSignature(context, question -> reasoning, answer
    instructions='Answer questions with short factoid answers.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
))

In [7]:
# example output with optimized results
dev_example = devset[0]
pred = compiled_rag(question=dev_example.question)
print("\n--- Test on dev example ---")
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Ground Truth: {dev_example.answer}")


--- Test on dev example ---
Question: Pehchaan: The Face of Truth stars Vinod Khanna, Rati Agnihotri and which Indian actress, producer, and former model who also produced the film?
Predicted Answer: Raveena Tandon
Ground Truth: Raveena Tandon


In [11]:
from pprint import pprint

print(len(trial_logs))
pprint(trial_logs[:2])

40
[{'input': {'context': '',
            'question': 'Which American actress who made their film debut in '
                        'the 1995 teen drama "Kids" was the co-founder of Voto '
                        'Latino?'},
  'optimizer_name': 'MIPROv2',
  'output': {'answer': 'rosario dawson'},
  'score': 1,
  'trace': {'_completions': None,
            '_lm_usage': None,
            '_store': {'answer': 'Rosario Dawson',
                       'context': ['Rosario Dawson | Rosario Isabel Dawson '
                                   '(born May 9, 1979) is an American actress, '
                                   'producer, singer, comic book writer, and '
                                   'political activist. She made her film '
                                   'debut in the 1995 teen drama "Kids". Her '
                                   'subsequent film roles include "He Got '
                                   'Game", "Men in Black II", "25th Hour", '
                          

In [12]:
print(f"Total prompts sent: {len(llm.history)}")
llm.inspect_history(n=1)

Total prompts sent: 51




[34m[2025-05-31T16:32:52.919519][0m

[31mSystem message:[0m

Your input fields are:
1. `context` (str): may contain relevant facts
2. `question` (str)
Your output fields are:
1. `reasoning` (str)
2. `answer` (str): often between 1 and 5 words
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## context ## ]]
{context}

[[ ## question ## ]]
{question}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Answer questions with short factoid answers.


[31mUser message:[0m

[[ ## context ## ]]
[1] «Pehchaan: The Face of Truth | Pehchaan: The Face of Truth is a Bollywood film released in 2005. The film directed by Shrabani Deodhar stars Vinod Khanna, Rati Agnihotri and Raveena Tandon who also produced the film.»
[2] «Mashaal | Mashaal is a 1984 Bollywood film. Produced and directed by Yash Chopra, it starred Dilip Ku

In [15]:
history_records = []

ct = 0
for record in llm.history:
  ct += 1

  pprint(record)
  print()
  if ct == 2:
    break

{'cost': 9.8e-06,
 'kwargs': {},
 'messages': None,
 'model': 'gemini/gemini-2.0-flash',
 'model_type': 'chat',
 'outputs': ['Silicon Valley is located in the southern part of the San '
             'Francisco Bay Area in **Northern California, United States**.\n'],
 'prompt': "Where's Silicon Valley?",
 'response': ModelResponse(id='chatcmpl-513e90af-3918-4f66-a274-49dbd63db162', created=1748708947, model='gemini-2.0-flash', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='stop', index=0, message=Message(content='Silicon Valley is located in the southern part of the San Francisco Bay Area in **Northern California, United States**.\n', role='assistant', tool_calls=None, function_call=None, provider_specific_fields=None))], usage=Usage(completion_tokens=23, prompt_tokens=6, total_tokens=29, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=6, image_tokens=None)), vertex_ai