<a href="https://colab.research.google.com/github/hanhanwu/Hanhan_COLAB_Experiemnts/blob/master/GenAI_Practice/Langwatch/try_dspy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Try DsPy for RAG Prompt Optimization

* https://github.com/hanhanwu/Hanhan_COLAB_Experiemnts/blob/master/GenAI_Practice/Langwatch/dspy_prompt_optimization_online_dashboard.ipynb
* https://dspy.ai/tutorials/rag/

In [1]:
%%capture --no-stderr
!pip install --upgrade nbformat
%pip install -U --quiet dspy

## Prepare LLM

* `http://20.102.90.50:2017/wiki17_abstracts` provides the sources for retrieval here

In [3]:
import os
import pandas as pd
from getpass import getpass
import dspy
from google.colab import userdata


# OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
# llm = dspy.LM("openai/gpt-4.1-nano", api_key=OPENAI_API_KEY)

GOOGLE_AI_API_KEY = userdata.get('GOOGLE_AI_API_KEY')
llm = dspy.LM("gemini/gemini-2.0-flash", api_key=GOOGLE_AI_API_KEY)
print("LLM test response:", llm("Where's Silicon Valley?"))

# the retrieval model
colbertv2_wiki17_abstracts = dspy.ColBERTv2(
    url="http://20.102.90.50:2017/wiki17_abstracts"
)
dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

LLM test response: ['Silicon Valley is located in the southern part of the San Francisco Bay Area in **Northern California, United States**.\n']


## Preparing Dataset

In [4]:
from dspy.datasets import HotPotQA


dataset = HotPotQA(train_seed=1, train_size=32, eval_seed=2025, dev_size=50, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

print()
print(len(trainset), len(devset))
print(trainset[0])
print(devset[0])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/9.19k [00:00<?, ?B/s]

hotpot_qa.py:   0%|          | 0.00/6.42k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/566M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/47.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/46.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/90447 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/7405 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7405 [00:00<?, ? examples/s]


32 50
Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
Example({'question': 'Pehchaan: The Face of Truth stars Vinod Khanna, Rati Agnihotri and which Indian actress, producer, and former model who also produced the film?', 'answer': 'Raveena Tandon', 'gold_titles': {'Pehchaan: The Face of Truth', 'Raveena Tandon'}}) (input_keys={'question'})


## Defining DsPy RAG

In [5]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context,
                               answer=prediction.answer,
                               reasoning=prediction.reasoning)


dev_example = devset[12]
print(f"[Devset] Question: {dev_example.question}")
print(f"[Devset] Answer: {dev_example.answer}")
print(f"[Devset] Relevant Wikipedia Titles: {dev_example.gold_titles}")
print()

generate_answer = RAG()
pred = generate_answer(question=dev_example.question)
print(f"[Prediction] Question: {dev_example.question}")
print(f"[Prediction] Predicted Answer: {pred.answer}")
print(f"[Prediction] Reasoning: {pred.reasoning}")

[Devset] Question: Twelve Inches is a compilation album by which 1980s British band?
[Devset] Answer: Frankie Goes to Hollywood
[Devset] Relevant Wikipedia Titles: {'Twelve Inches', 'Frankie Goes to Hollywood'}

[Prediction] Question: Twelve Inches is a compilation album by which 1980s British band?
[Prediction] Predicted Answer: Bananarama
[Prediction] Reasoning: The question asks which 1980s British band released a compilation album called "Twelve Inches". I need to find a band that matches both criteria.
The context provides three albums with "Twelve Inch" in the title:
- The Twelve Inch Singles by Soft Cell
- The Twelve Inches of Bananarama by Bananarama
- The Twelve Inch Mixes by Spandau Ballet

All three bands are British and were active in the 1980s. However, the question asks for the album title "Twelve Inches", so the answer must be Bananarama.


## Optimizing Prompts & Logging

In [10]:
from dspy.teleprompt import MIPROv2


trial_logs = []

def validate_context_and_answer(example, prediction):
    gold = example.answer.strip().lower()
    pred = prediction.answer.strip().lower()
    score = int(gold == pred)

    # Format similar to Langwatch's internal logging
    log_entry = {
        "input": {
            "question": example.question,
            "context": getattr(example, "context", "")
        },
        "output": {
            "answer": pred
        },
        # Include prediction trace (as dict if possible)
        "trace": prediction.__dict__ if hasattr(prediction, "__dict__") else str(prediction),
        "score": score,
        "optimizer_name": optimizer.__class__.__name__,
    }
    trial_logs.append(log_entry)

    print(f"[Trial] Q: {example.question} | Pred: {pred} | GT: {gold} | Score: {score}")
    return score


# set up optimizer
optimizer = MIPROv2(metric=validate_context_and_answer, prompt_model=llm,
                    task_model=llm, num_candidates=2, init_temperature=0.7,
                    auto=None)


# compile
compiled_rag = optimizer.compile(
    RAG(),
    trainset=trainset,
    num_trials=10,
    max_bootstrapped_demos=3,
    max_labeled_demos=5,
    minibatch_size=10,
    requires_permission_to_run=False
)

2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=2 sets of demonstrations...
2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=2 instructions...

2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/26 12:03:20 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer questions with 

Bootstrapping set 1/2
Bootstrapping set 2/2
[Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: unknown | GT: bill paxton | Score: 0
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT: space | Score: 1
[Trial] Q: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs? | Pred: buena vista distribution company | GT: buena vista 

2025/05/26 12:03:26 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)
2025/05/26 12:03:26 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 44.0

2025/05/26 12:03:26 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 13 - Minibatch ==



[Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
[Trial] Q: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko? | Pred: cannot determine | GT: aleksandr danilovich aleksandrov | Score: 0
[Trial] Q: What head of state position was held by Harry S Truman when he gave Harold E Wilson the Medal of Honor? | Pred: president of the united states | GT: president of the united states | Score: 1
[Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
[Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: einstein-rosen bridge | GT: einstein | Score: 0
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102

2025/05/26 12:03:35 ERROR dspy.utils.parallelizer: Error for Example({'question': "Remember Me Ballin' is a CD single by Indo G that features an American rapper born in what year?", 'answer': '1979'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
         

Average Metric: 5.00 / 9 (55.6%): 100%|██████████| 10/10 [00:09<00:00,  1.10it/s]

2025/05/26 12:03:35 INFO dspy.evaluate.evaluate: Average Metric: 5.0 / 10 (50.0%)
2025/05/26 12:03:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:03:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0]
2025/05/26 12:03:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0]
2025/05/26 12:03:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:03:35 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 13 - Minibatch ==



[Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
[Trial] Q: Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka? | Pred: stan wawrinka | GT: gustavo kuerten | Score: 0
[Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
[Trial] Q: Which movie was released first, Son of Flubber or Davy Crockett, King of the Wild Frontier? | Pred: davy crockett, king of the wild frontier | GT: davy crockett, king of the wild frontier | Score: 1
[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: cannot answer. | GT: bill paxton | Score: 0
  0%|          | 0/10 [00:00<?, ?it/s][Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? | Pred: kerry condon | GT: kerry condon | Score: 1
Average Metric: 2.00 / 3 (66.7%):  20%

2025/05/26 12:03:44 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where? ', 'answer': 'space'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTi

Average Metric: 4.00 / 7 (57.1%):  80%|████████  | 8/10 [00:09<00:02,  1.17s/it]

2025/05/26 12:03:45 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 4.00 / 7 (57.1%):  90%|█████████ | 9/10 [00:09<00:01,  1.03s/it]

2025/05/26 12:03:45 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?', 'answer': 'Aleem Sarwar Dar'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Generat

Average Metric: 4.00 / 7 (57.1%): 100%|██████████| 10/10 [00:09<00:00,  1.02it/s]

2025/05/26 12:03:45 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 10 (40.0%)
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 13 - Minibatch ==



[Trial] Q: What head of state position was held by Harry S Truman when he gave Harold E Wilson the Medal of Honor? | Pred: president | GT: president of the united states | Score: 0
[Trial] Q: What evening cable television station programming block has a show with Ashley Holliday as a cast member? | Pred: nick at nite | GT: nick at nite | Score: 1
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what? | Pred: johann strauss ii | GT: the waltz king | Score: 0
[Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? |

2025/05/26 12:03:45 INFO dspy.evaluate.evaluate: Average Metric: 5 / 10 (50.0%)
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 13 - Minibatch ==



[Trial] Q: Which movie was released first, Son of Flubber or Davy Crockett, King of the Wild Frontier? | Pred: davy crockett, king of the wild frontier | GT: davy crockett, king of the wild frontier | Score: 1
[Trial] Q: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year? | Pred: 1992 | GT: 2010 | Score: 0
[Trial] Q: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?  | Pred: outfield of dreams | GT: "outfield of dreams" | Score: 0
[Trial] Q: Who was coach of the No. 9-ranked team that was upset in the NCAA Tournament by the 2014-15 UAB Blazers men's basketball team?   | Pred: coach's name not provided. | GT: fred hoiberg | Score: 0
[Trial] Q: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko? | Pred: cannot determine. | GT: aleksandr danilovich aleksandrov | Score: 0
[Trial] Q: Tombstone stared an ac

2025/05/26 12:03:45 INFO dspy.evaluate.evaluate: Average Metric: 4 / 10 (40.0%)
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0]
2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:03:45 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 13 - Minibatch ==



  0%|          | 0/10 [00:00<?, ?it/s][Trial] Q: Are Baltasar Kormákur and John G. Avildsen both film producers? | Pred: i don't know. | GT: no | Score: 0
Average Metric: 0.00 / 1 (0.0%):  10%|█         | 1/10 [00:02<00:22,  2.53s/it]

2025/05/26 12:03:56 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Tombstone stared an actor born May 17, 1955 known as who?', 'answer': 'Bill Paxton'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            "quotaDimensions": {
        

Average Metric: 0.00 / 1 (0.0%):  20%|██        | 2/10 [00:10<00:46,  5.85s/it]

2025/05/26 12:03:57 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?', 'answer': 'Kerry Condon'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Gener

Average Metric: 0.00 / 1 (0.0%):  30%|███       | 3/10 [00:12<00:26,  3.86s/it]

2025/05/26 12:03:59 ERROR dspy.utils.parallelizer: Error for Example({'question': 'This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what?', 'answer': 'The Waltz King'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerP

Average Metric: 0.00 / 1 (0.0%):  40%|████      | 4/10 [00:14<00:18,  3.09s/it]

2025/05/26 12:04:00 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?', 'answer': 'Aleem Sarwar Dar'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Generat

Average Metric: 0.00 / 1 (0.0%):  50%|█████     | 5/10 [00:14<00:10,  2.07s/it]

2025/05/26 12:04:00 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?', 'answer': 'Rosario Dawson'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPe

Average Metric: 0.00 / 1 (0.0%):  50%|█████     | 5/10 [00:14<00:10,  2.07s/it]

2025/05/26 12:04:00 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka?', 'answer': 'Gustavo Kuerten'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
            

Average Metric: 0.00 / 1 (0.0%):  60%|██████    | 6/10 [00:14<00:05,  1.42s/it]

2025/05/26 12:04:00 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 0.00 / 1 (0.0%):  70%|███████   | 7/10 [00:14<00:04,  1.42s/it]

2025/05/26 12:04:00 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?', 'answer': 'Buena Vista Distribution'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Genera

Average Metric: 0.00 / 1 (0.0%):  90%|█████████ | 9/10 [00:14<00:00,  1.64it/s]

2025/05/26 12:04:05 ERROR dspy.utils.parallelizer: Error for Example({'question': 'How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?', 'answer': '50-million-year-old fossil record'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 0.00 / 1 (0.0%): 100%|██████████| 10/10 [00:20<00:00,  2.00s/it]

2025/05/26 12:04:05 INFO dspy.evaluate.evaluate: Average Metric: 0.0 / 10 (0.0%)
2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 0.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1'].
2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0]
2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0]
2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 13 - Full Evaluation =====
2025/05/26 12:04:05 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 50.0) from minibatch trials...



[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? | Pred: kerry condon | GT: kerry condon | Score: 1
[Trial] Q: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?  | Pred: outfield of dreams | GT: "outfield of dreams" | Score: 0
[Trial] Q: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20? | Pred: aleem dar | GT: aleem sarwar dar | Score: 0
[Trial] Q: "Everything Has Changed" is a song from an album released u



[Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
Average Metric: 6.00 / 13 (46.2%):  48%|████▊     | 12/25 [00:06<00:05,  2.26it/s][Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: einstein-rosen bridge | GT: einstein | Score: 0
Average Metric: 6.00 / 14 (42.9%):  52%|█████▏    | 13/25 [00:06<00:05,  2.24it/s][Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
Average Metric: 7.00 / 15 (46.7%):  60%|██████    | 15/25 [00:06<00:03,  2.91it/s][Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
Average Metric: 8.00 / 16 (50.0%):  60%|██████    | 15/25 [00:06<00:03,  2.91it/s][Trial] Q: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko? | Pred: cannot determine | GT: aleksandr da

2025/05/26 12:04:13 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)
2025/05/26 12:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0
2025/05/26 12:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/05/26 12:04:13 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 8 / 13 - Minibatch ==



[Trial] Q: Which movie was released first, Son of Flubber or Davy Crockett, King of the Wild Frontier? | Pred: davy crockett, king of the wild frontier | GT: davy crockett, king of the wild frontier | Score: 1
[Trial] Q: What evening cable television station programming block has a show with Ashley Holliday as a cast member? | Pred: nick at nite | GT: nick at nite | Score: 1
[Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
[Trial] Q: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko? | Pred: cannot determine | GT: aleksandr danilovich aleksandrov | Score: 0
[Trial] Q: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino? | Pred: rosario dawson | GT: rosario dawson | Score: 1
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) b

2025/05/26 12:04:14 INFO dspy.evaluate.evaluate: Average Metric: 5 / 10 (50.0%)
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 50.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0, 50.0]
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 9 / 13 - Minibatch ==



[Trial] Q: What evening cable television station programming block has a show with Ashley Holliday as a cast member? | Pred: nick at nite | GT: nick at nite | Score: 1
[Trial] Q: Are Baltasar Kormákur and John G. Avildsen both film producers? | Pred: cannot determine | GT: no | Score: 0
[Trial] Q: How old is the fossil record of the order that contains the only strictly marine herbivorous mammal? | Pred: 50 million years old | GT: 50-million-year-old fossil record | Score: 0
[Trial] Q: Who was the Tennis Masters Cup champion in 2000, Gustavo Kuerten or Stan Wawrinka? | Pred: gustavo kuerten | GT: gustavo kuerten | Score: 1
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: This American guitarist best known for her work with the I

2025/05/26 12:04:14 INFO dspy.evaluate.evaluate: Average Metric: 3 / 10 (30.0%)
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 30.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0, 50.0, 30.0]
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 10 / 13 - Minibatch ==



[Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT: space | Score: 1
[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: no information provided. | GT: bill paxton | Score: 0
[Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: einstein-rosen bridge | GT: einstein | Score: 0
[Trial] Q: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year? | Pred: unknown | GT: 2010 | Score: 0
[Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
[Trial] Q: Remember Me Ballin' is a CD single by Indo G that features an American rapper born in what year? | Pred: cannot answer. | GT: 1979 | Score: 0
[Trial] Q: How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?

2025/05/26 12:04:14 INFO dspy.evaluate.evaluate: Average Metric: 4 / 10 (40.0%)
2025/05/26 12:04:15 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/05/26 12:04:15 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0, 50.0, 30.0, 40.0]
2025/05/26 12:04:15 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:15 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:15 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 11 / 13 - Minibatch ==



[Trial] Q: Who composed "Sunflower Slow Drag" with the King of Ragtime? | Pred: scott hayden | GT: scott hayden | Score: 1
[Trial] Q: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what? | Pred: austrian composer | GT: the waltz king | Score: 0
[Trial] Q: What person does Wormholes in fiction and Nathan Rosen have in common? | Pred: einstein-rosen bridge | GT: einstein | Score: 0
[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: cannot answer. | GT: bill paxton | Score: 0
[Trial] Q: What head of state position was held by Harry S Truman when he gave Harold E Wilson the Medal of Honor? | Pred: president of the united states | GT: president of the united states | Score: 1
Average Metric: 2.00 / 5 (40.0%):  40%|████      | 4/10 [00:00<00:00, 241.54it/s]
Average Metric: 3.00 / 6 (50.0%):  50%|█████     | 5/10 [00:00<00:00, 137.01it/s][Trial] Q: On the coast of what ocean is the birthplace of Di

2025/05/26 12:04:24 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 4.00 / 8 (50.0%):  90%|█████████ | 9/10 [00:09<00:01,  1.74s/it]

2025/05/26 12:04:25 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?', 'answer': 'Aleem Sarwar Dar'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Generat

Average Metric: 4.00 / 8 (50.0%): 100%|██████████| 10/10 [00:10<00:00,  1.00s/it]

2025/05/26 12:04:25 INFO dspy.evaluate.evaluate: Average Metric: 4.0 / 10 (40.0%)
2025/05/26 12:04:25 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 0'].
2025/05/26 12:04:25 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0, 50.0, 30.0, 40.0, 40.0]
2025/05/26 12:04:25 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:25 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:25 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 12 / 13 - Minibatch ==



[Trial] Q: On the coast of what ocean is the birthplace of Diogal Sakho? | Pred: atlantic ocean | GT: atlantic | Score: 0
[Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? | Pred: kerry condon | GT: kerry condon | Score: 1
[Trial] Q: What evening cable television station programming block has a show with Ashley Holliday as a cast member? | Pred: nick at nite | GT: nick at nite | Score: 1
[Trial] Q: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?  | Pred: outfield of dreams | GT: "outfield of dreams" | Score: 0
[Trial] Q: This American guitarist best known for her work with the Iron Maidens is an ancestor of a composer who was known as what? | Pred: johann strauss ii | GT: the waltz king | Score: 0
[Trial] Q: Which movie was released first, Son of Flubber or Davy Crockett, King of the Wild 



[Trial] Q: Do Stu Block and Johnny Bonnel's bands play the same type of music? | Pred: no | GT: no | Score: 1
Average Metric: 6.00 / 10 (60.0%): 100%|██████████| 10/10 [00:04<00:00,  2.28it/s]

2025/05/26 12:04:29 INFO dspy.evaluate.evaluate: Average Metric: 6 / 10 (60.0%)
2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 60.0 on minibatch of size 10 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [50.0, 40.0, 50.0, 40.0, 0.0, 50.0, 30.0, 40.0, 40.0, 60.0]
2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0]
2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0


2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 13 / 13 - Full Evaluation =====
2025/05/26 12:04:29 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 45.0) from minibatch trials...



[Trial] Q: Tombstone stared an actor born May 17, 1955 known as who? | Pred: cannot answer. | GT: bill paxton | Score: 0
[Trial] Q: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division? | Pred: operation citadel | GT: operation citadel | Score: 1
[Trial] Q: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ? | Pred: kerry condon | GT: kerry condon | Score: 1
[Trial] Q: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs? | Pred: buena vista distribution company | GT: buena vista distribution | Score: 0
[Trial] Q: Samantha Cristoforetti and Mark Shuttleworth are both best known for being first in their field to go where?  | Pred: space | GT:

2025/05/26 12:04:40 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which band had a longer hiatus, Juliette and the Licks or The Last Shadow Puppets?', 'answer': 'The Last Shadow Puppets'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
    

Average Metric: 9.00 / 20 (45.0%):  84%|████████▍ | 21/25 [00:10<00:02,  1.48it/s]

2025/05/26 12:04:40 ERROR dspy.utils.parallelizer: Error for Example({'question': '"Everything Has Changed" is a song from an album released under which record label ?', 'answer': 'Big Machine Records'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier",
      

Average Metric: 9.00 / 20 (45.0%):  88%|████████▊ | 22/25 [00:11<00:01,  1.53it/s]

2025/05/26 12:04:41 ERROR dspy.utils.parallelizer: Error for Example({'question': 'Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?', 'answer': 'Aleem Sarwar Dar'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "Generat

Average Metric: 9.00 / 20 (45.0%):  92%|█████████▏| 23/25 [00:11<00:01,  1.65it/s]

2025/05/26 12:04:41 ERROR dspy.utils.parallelizer: Error for Example({'question': 'How old is the fossil record of the order that contains the only strictly marine herbivorous mammal?', 'answer': '50-million-year-old fossil record'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinutePerPro

Average Metric: 9.00 / 20 (45.0%):  96%|█████████▌| 24/25 [00:11<00:00,  1.82it/s]

2025/05/26 12:04:41 ERROR dspy.utils.parallelizer: Error for Example({'question': 'The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?', 'answer': '2010'}) (input_keys={'question'}): Both structured output format and JSON mode failed. Please choose a model that supports `response_format` argument. Original error: litellm.RateLimitError: litellm.RateLimitError: VertexAIException - {
  "error": {
    "code": 429,
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.QuotaFailure",
        "violations": [
          {
            "quotaMetric": "generativelanguage.googleapis.com/generate_content_free_tier_requests",
            "quotaId": "GenerateRequestsPerMinute

Average Metric: 9.00 / 20 (45.0%): 100%|██████████| 25/25 [00:11<00:00,  2.15it/s]

2025/05/26 12:04:41 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 25 (36.0%)
2025/05/26 12:04:41 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [44.0, 40.0, 36.0]
2025/05/26 12:04:41 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 44.0
2025/05/26 12:04:41 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/05/26 12:04:41 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 44.0!





In [11]:
from pprint import pprint
pprint(trial_logs[:3])

[{'input': {'context': '',
            'question': 'Which American actress who made their film debut in '
                        'the 1995 teen drama "Kids" was the co-founder of Voto '
                        'Latino?'},
  'optimizer_name': 'MIPROv2',
  'output': {'answer': 'rosario dawson'},
  'score': 1,
  'trace': {'_completions': None,
            '_lm_usage': None,
            '_store': {'answer': 'Rosario Dawson',
                       'context': ['Rosario Dawson | Rosario Isabel Dawson '
                                   '(born May 9, 1979) is an American actress, '
                                   'producer, singer, comic book writer, and '
                                   'political activist. She made her film '
                                   'debut in the 1995 teen drama "Kids". Her '
                                   'subsequent film roles include "He Got '
                                   'Game", "Men in Black II", "25th Hour", '
                             

In [12]:
import json

with open("langwatch_logs.json", "w") as f:
    json.dump(trial_logs, f, indent=2)