# Poznański Horyzont Danych Meetup - 10.12.2024
# DSPy Introduction

Tutorial inspired by [dspy/examples/agents/multi_agent.ipynb](https://github.com/stanfordnlp/dspy/blob/main/examples/agents/multi_agent.ipynb)

If you've ever felt intimidated by DSPy, don't worry: it might look complex at first glance, but it's actually quite approachable. This tutorial walks you through building a project step by step, so you can understand and implement the core DSPy concepts.

## Useful links and concepts before starting to rock DSPy... 🚀

- I strongly recommend reading the [DSPy Cheatsheet](https://dspy.ai/cheatsheet/), which will help you get started quickly;

- Do you want to know more about DSPy optimizers? Read [this guide](https://dspy.ai/learn/optimization/optimizers/?h=optimizers)!

- You will find tons of tutorials in the [dspy/examples](https://github.com/stanfordnlp/dspy/blob/main/examples) folder;

- Remember the DSPy "scientific method":
  - **Define your task**: what is the input? What is the expected output?
  - **Define your pipeline**: do we need ChainOfThought? Do we need any external tool, like a retriever?
  - **Explore a few examples**: explore both input and output. It might be worth annotating a few examples for debugging purposes.
  - **Define your data**: define training and validation sets;
  - **Define your metric**: define an objective metric you will use to evaluate your programs. You don't have it? You can ask an LLM to evaluate for you!
  - **Collect preliminary "zero-shot" evaluations**: try out your program without optimization to get baseline results;
  - **Compile with a DSPy optimizer**: start with FewShot optimizers and follow with more complex ones, like COPRO or MIPROv2;
  - **Iterate**: repeat the steps above until you are satisfied with the results!

In [61]:
import os
import re

import dspy
import pandas as pd
from dotenv import load_dotenv
from dspy.evaluate import Evaluate
from dspy.datasets.hotpotqa import HotPotQA
from dspy.teleprompt import BootstrapFewShot, LabeledFewShot, MIPROv2

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

# load your environment variables from .env file
load_dotenv()

# azure-openai model deployment
AZURE_OPENAI_KEY = os.getenv("AZURE_OPENAI_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_DEPLOYMENT = os.getenv("AZURE_OPENAI_DEPLOYMENT")
AZURE_OPENAI_VERSION = os.getenv("AZURE_OPENAI_VERSION")

## Loading the Dataset

For this tutorial, we are going to use the [HotPotQA dataset](https://hotpotqa.github.io/). The task for the LLM is multi-hop question answering.

In [2]:
dataset = HotPotQA(train_seed=1, train_size=150, eval_seed=2023, dev_size=50, test_size=0)

In [3]:
dataset.train[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys=None)

### Specifying Inputs

It's crucial to inform DSPy which attributes serve as inputs for our model. We accomplish this using the `.with_inputs()` method.

In [4]:
train_set = [x.with_inputs('question') for x in dataset.train]
dev_set = [x.with_inputs('question') for x in dataset.dev]

In [5]:
train_set[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
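
To make the idea concrete, here is a minimal plain-Python sketch of what marking input keys achieves: the program only sees the input fields, while the remaining fields are held back as labels for the metric. `split_example` is a hypothetical helper written for illustration and is not part of DSPy.

```python
def split_example(record: dict, input_keys: set) -> tuple:
    """Split a record into (inputs, labels) according to the declared input keys."""
    missing = input_keys - set(record)
    if missing:
        raise KeyError(f"input keys not found in record: {sorted(missing)}")
    inputs = {k: v for k, v in record.items() if k in input_keys}
    labels = {k: v for k, v in record.items() if k not in input_keys}
    return inputs, labels


record = {
    "question": "At My Window was released by which American singer-songwriter?",
    "answer": "John Townes Van Zandt",
}
inputs, labels = split_example(record, {"question"})
print(inputs)  # only the question is passed to the program
print(labels)  # the answer is kept aside for evaluation
```

Without this split, a program or optimizer would have no way to tell which fields it may read and which ones it is supposed to predict.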

## DSPy setup

DSPy is designed to be compatible with a variety of language models and their respective clients ([click here](https://dspy.ai/learn/programming/language_models) for a complete guide to all supported LLMs). For this tutorial, we will primarily use GPT-4 through the Azure client.


In [6]:
llm = dspy.AzureOpenAI(
    api_base=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_VERSION,
    deployment_id=AZURE_OPENAI_DEPLOYMENT,
    api_key=AZURE_OPENAI_KEY,
    # Model parameters
    temperature=0.7
)

retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=llm, rm=retriever)

## Setting up Signatures and Modules


### Core DSPy Components

[dspy.Signature](https://dspy.ai/learn/programming/signatures/) and [dspy.Module](https://dspy.ai/learn/programming/modules/) are fundamental building blocks for DSPy programs:

- **Signature**: A declarative specification of the input/output behavior of a DSPy module.
- **Module**: A building block for programs that leverage Language Models (LMs).

### Types of DSPy Modules

DSPy offers various module types, each serving different purposes:

1. [dspy.Predict](https://dspy-docs.vercel.app/api/modules/Predict)
   - Basic predictor
   - Maintains the original signature
   - Handles key forms of learning (storing instructions, demonstrations, and LM updates)
   - Most similar to direct LM usage

2. [dspy.ChainOfThought](https://dspy-docs.vercel.app/api/modules/ChainOfThought)
   - Enhances the LM to think step-by-step before producing the final response
   - Modifies the signature to incorporate intermediate reasoning steps
     

3. [Additional Advanced Modules](https://dspy-docs.vercel.app/api/category/modules)
   - DSPy library offers a range of more specialized modules for complex tasks;
   - In this talk, we will explore the [ReAct](https://dspy.ai/deep-dive/modules/react/) module.
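
As a rough illustration of the ideas above, the sketch below parses a string signature into input and output fields, adds a `rationale` output in the way a chain-of-thought module extends the signature, and renders a simple prompt. This is a plain-Python approximation under my own assumptions; `parse_signature`, `with_reasoning`, and `render_prompt` are hypothetical helpers, not DSPy's actual implementation.

```python
def parse_signature(spec: str) -> tuple:
    """Parse 'a, b -> c' into (input_fields, output_fields)."""
    left, sep, right = spec.partition("->")
    if not sep:
        raise ValueError("signature must contain '->'")
    inputs = [f.strip() for f in left.split(",") if f.strip()]
    outputs = [f.strip() for f in right.split(",") if f.strip()]
    return inputs, outputs


def with_reasoning(inputs: list, outputs: list) -> tuple:
    """Chain-of-thought style: ask for a rationale before the original outputs."""
    return inputs, ["rationale"] + outputs


def render_prompt(instructions: str, inputs: list, outputs: list, values: dict) -> str:
    """Render a simple 'Field: value' prompt, leaving the first output open."""
    lines = [instructions, ""]
    lines += [f"{name.capitalize()}: {values[name]}" for name in inputs]
    lines.append(f"{outputs[0].capitalize()}:")
    return "\n".join(lines)


ins, outs = parse_signature("context, question -> answer")
cot_ins, cot_outs = with_reasoning(ins, outs)
print(cot_outs)  # ['rationale', 'answer']
prompt = render_prompt(
    "Answer questions with short factoid answers.",
    cot_ins,
    cot_outs,
    {"context": "At My Window is an album released by Townes Van Zandt in 1987.",
     "question": "At My Window was released by which American singer-songwriter?"},
)
print(prompt)
```

The point of the declarative form is that the same signature can be handed to different modules, each of which decides how to turn it into a prompt.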

### Recommendation for starting

For those new to DSPy, it's advisable to start with `dspy.Predict`. Its simplicity makes it ideal for understanding the basics of DSPy operation. Once you've successfully implemented your program using `dspy.Predict`, you can explore more advanced modules like `dspy.ChainOfThought` to potentially enhance your model's performance.

For an overview of other prompting techniques beyond zero-shot learning, refer to the [Prompting Guide](https://www.promptingguide.ai/techniques). This resource covers various methods that can enhance your DSPy applications as you progress.

In [35]:
class GenerateAnswer(dspy.Signature):
    # [OPTIONAL]: Clarify something about the nature of the task expressed below as a docstring.
    """Answer questions with short factoid answers."""

    # Supply hints on the nature of an input field, expressed as a desc keyword argument for dspy.InputField.
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In [48]:
class BaselineRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer, rationale=prediction.rationale)
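
The retrieve-then-generate flow of `BaselineRAG` can be sketched without any model or server. In the toy version below, a word-overlap ranker stands in for ColBERTv2 and `build_prompt` only assembles the text a language model would receive; both functions and the three-passage corpus (shortened from Wikipedia abstracts) are illustrative assumptions.

```python
import re

# Toy corpus: first sentences of three Wikipedia abstracts.
CORPUS = [
    "At My Window (album) | At My Window is an album released by Folk/country "
    "singer-songwriter Townes Van Zandt in 1987.",
    "Little Window | Little Window is the debut album of American "
    "singer-songwriter Baby Dee.",
    "Windows and Walls | Windows and Walls is the eighth album by American "
    "singer-songwriter Dan Fogelberg, released in 1984.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 3) -> list:
    """Rank passages by word overlap with the question (toy retriever)."""
    q = tokenize(question)
    ranked = sorted(CORPUS, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(context: list, question: str) -> str:
    """Assemble the text a language model would receive in the generate step."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"


question = "At My Window was released by which American singer-songwriter?"
passages = retrieve(question, k=2)
rag_prompt = build_prompt(passages, question)
print(rag_prompt)
```

The quality of the final answer is bounded by what the retriever returns, which is why the metric used later also checks the retrieved context.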

In [49]:
baseline = BaselineRAG()

baseline(train_set[0].question)

Prediction(
    context=['At My Window (album) | At My Window is an album released by Folk/country singer-songwriter Townes Van Zandt in 1987. This was Van Zandt\'s first studio album in the nine years that followed 1978\'s "Flyin\' Shoes", and his only studio album recorded in the 1980s. Although the songwriter had become less prolific, this release showed that the quality of his material remained high.', 'Little Window | Little Window is the debut album of American singer-songwriter Baby Dee. The album was released in 2002 on the Durtro label. It was produced, composed, and performed entirely by Dee.', 'Windows and Walls | Windows and Walls is the eighth album by American singer-songwriter Dan Fogelberg, released in 1984 (see 1984 in music). The first single, "The Language of Love", reached 13 on the U.S. "Billboard" Hot 100 chart. Although the follow-up, "Believe in Me", missed the Top 40 of the pop chart, peaking at No. 48, it became the singer\'s fourth No. 1 song on the "Billboar

## Metrics & Evaluation

In [50]:
# We have a positive match when:
# (generated answer and true answer match exactly) AND (the retrieved context does actually contain that answer)
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM
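
To see why near-identical answers can still fail an exact-match check, here is a minimal sketch of the two conditions combined in the metric. It assumes SQuAD-style normalization (lowercasing, stripping punctuation and English articles); DSPy's built-in metrics may normalize differently, and all function names here are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True when gold and predicted answers are equal after normalization."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold: str, passages: list) -> bool:
    """True when the normalized gold answer appears in at least one passage."""
    return any(normalize(gold) in normalize(p) for p in passages)


def context_and_answer(gold: str, predicted: str, passages: list) -> bool:
    """Both conditions must hold, as in validate_context_and_answer."""
    return exact_match(gold, predicted) and passage_match(gold, passages)


print(exact_match("the River Tyne", "River Tyne"))               # True
print(exact_match("King Alfred the Great", "Alfred the Great"))  # False
print(exact_match("no", "No, only Cangzhou."))                   # False
```

Under this normalization a dropped article is forgiven, but an extra or missing word is not, so correct but verbose answers count as failures.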

In [51]:
evaluate_hotpot = Evaluate(
    devset=dev_set,
    metric=validate_context_and_answer,
    num_threads=8,
    display_progress=True,
    display_table=10,
)
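
Conceptually, an evaluator of this kind runs the program on every dev example, applies the metric, and reports the average as a percentage. The sketch below shows that loop with a thread pool; `evaluate_sketch`, the lookup-table program, and the toy metric are stand-ins of my own and do not reproduce how `Evaluate` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_sketch(program, devset, metric, num_threads=8):
    """Run `program` on every example in parallel; return the mean metric in percent."""
    def score(example):
        prediction = program(example["question"])
        return float(metric(example, prediction))

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        scores = list(pool.map(score, devset))
    return round(100.0 * sum(scores) / len(scores), 1)


# Toy stand-ins: a lookup-table "program" and a case-insensitive string metric.
KNOWN = {"What river is near the Crichton Collegiate Church?": "the River Tyne"}


def toy_program(question):
    return KNOWN.get(question, "unknown")


def toy_metric(example, prediction):
    return example["answer"].lower() == prediction.lower()


toy_devset = [
    {"question": "What river is near the Crichton Collegiate Church?",
     "answer": "the River Tyne"},
    {"question": "In which Maine county is Fort Pownall located?",
     "answer": "Waldo County, Maine"},
]
toy_score = evaluate_sketch(toy_program, toy_devset, toy_metric)
print(toy_score)  # 50.0
```

Threads help here because each program call mostly waits on network requests to the language model and the retriever.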

In [52]:
evaluate_hotpot(baseline)

2024/12/09 13:04:34 INFO dspy.evaluate.evaluate: Average Metric: 18 / 50 (36.0%)





| | question | example_answer | gold_titles | context | pred_answer | rationale | validate_context_and_answer |
|---|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {Cangzhou, Qionghai} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Heb... | No, only Cangzhou. | determine the location of both cities. We know that Cangzhou is in... | |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to t... | National Hockey League | {2017 NHL Expansion Draft, 2017–18 Pittsburgh Penguins season} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Peng... | NHL | identify the entity responsible for the draft in question. We know... | |
| 2 | The Wings entered a new era, following the retirement of which Can... | Steve Yzerman | {Steve Yzerman, 2006–07 Detroit Red Wings season} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, ... | Steve Yzerman | identify the player mentioned in the context. We see that Steve Yz... | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {Crichton Collegiate Church, Crichton Castle} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situa... | River Tyne | identify the river associated with Crichton Collegiate Church. The... | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by ... | King Alfred the Great | {Æthelweard (son of Alfred), Ealhswith} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-cent... | Alfred the Great | identify the king who was the father of Æthelweard. Ealhswith is k... | |
| 5 | The Newark Airport Exchange is at the northern edge of an airport ... | Port Authority of New York and New Jersey | {Newark Airport Interchange, Newark Liberty International Airport} | ['Newark Airport Interchange \| The Newark Airport Interchange is a... | Port Authority of New York and New Jersey | identify the operator of Newark Liberty International Airport. We ... | ✔️ [True] |
| 6 | Where did an event take place resulting in a win during a domestic... | Bundesliga | {2005–06 FC Bayern Munich season, Claudio Pizarro} | ['List of Peru international footballers \| Peru took part in the i... | Peru | produce the answer. We know that the context mentions important Pe... | |
| 7 | Are both Chico Municipal Airport and William R. Fairchild Internat... | no | {Chico Municipal Airport, William R. Fairchild International Airport} | ['William R. Fairchild International Airport \| William R. Fairchil... | No | determine the locations of the airports. William R. Fairchild Inte... | ✔️ [True] |
| 8 | In which Maine county is Fort Pownall located? | Waldo County, Maine | {Fort Pownall, Stockton Springs, Maine} | ["Fort Pownall \| Fort Pownall was a British fortification built du... | Waldo County | identify the location of Fort Pownall. The context specifies that ... | |
| 9 | Which 90s rock band has more recently reformed, Gene or The Afghan... | The Afghan Whigs | {The Afghan Whigs, Gene (band)} | ['The Afghan Whigs \| The Afghan Whigs are an American rock band fr... | The Afghan Whigs | compare the two bands' timelines. The Afghan Whigs were originally... | ✔️ [True] |


36.0

## And now... some DSPy magic ✨

### Using a ReAct module

The ReAct framework combines reasoning (thinking) and acting (taking actions) in LLMs.
It alternates between reasoning about the problem and performing actions, like retrieving information or interacting with an environment, to dynamically adapt as new information becomes available.

<center><img src="assets/react-schema.png" width="1000" align="center"></center>
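
The alternation between reasoning and acting can be sketched as a small loop. In the toy version below, a scripted policy stands in for the language model and a canned `search` function stands in for the retriever; the key names follow the `thought_i` / `tool_name_i` / `tool_args_i` / `observation_i` layout of a ReAct trajectory. Everything here is a hypothetical illustration, not DSPy's ReAct implementation.

```python
def search(query: str) -> str:
    """Toy tool: returns a canned passage (stand-in for a real retriever)."""
    return ("At My Window (album) | At My Window is an album released by "
            "Folk/country singer-songwriter Townes Van Zandt in 1987.")


TOOLS = {"Search": search}


def scripted_policy(question: str, trajectory: dict, step: int) -> dict:
    """Stand-in for the LM: search on the first step, then finish."""
    if step == 0:
        return {"thought": "I should look up who released the album.",
                "tool_name": "Search",
                "tool_args": {"query": question}}
    return {"thought": "The observation names the artist, so I can stop.",
            "tool_name": "finish",
            "tool_args": {}}


def react(question: str, policy, max_iters: int = 5) -> dict:
    """Alternate between a reasoning step and a tool call until 'finish'."""
    trajectory = {}
    for step in range(max_iters):
        decision = policy(question, trajectory, step)
        trajectory[f"thought_{step}"] = decision["thought"]
        trajectory[f"tool_name_{step}"] = decision["tool_name"]
        trajectory[f"tool_args_{step}"] = decision["tool_args"]
        if decision["tool_name"] == "finish":
            trajectory[f"observation_{step}"] = "Completed."
            break
        tool = TOOLS[decision["tool_name"]]
        trajectory[f"observation_{step}"] = tool(**decision["tool_args"])
    return trajectory


trajectory = react(
    "At My Window was released by which American singer-songwriter?",
    scripted_policy,
)
for key, value in trajectory.items():
    print(key, "->", value)
```

In a real agent the policy is a language model call that reads the whole trajectory so far, which is what lets it correct an earlier wrong guess after seeing an observation.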

In [53]:
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

In [54]:
question = train_set[0].question
result = agent(question=question)
result.trajectory

{'thought_0': 'At My Window was released by the American singer-songwriter Joni Mitchell.\n\nNext Thought: I need to confirm the artist for the song "At My Window" to ensure accuracy.',
 'tool_name_0': 'Search',
 'tool_args_0': '{"query": "At My Window American singer-songwriter"}',
 'observation_0': Prediction(
     passages=['At My Window (album) | At My Window is an album released by Folk/country singer-songwriter Townes Van Zandt in 1987. This was Van Zandt\'s first studio album in the nine years that followed 1978\'s "Flyin\' Shoes", and his only studio album recorded in the 1980s. Although the songwriter had become less prolific, this release showed that the quality of his material remained high.']
 ),
 'thought_1': 'It seems that the song "At My Window" is actually attributed to Townes Van Zandt, not Joni Mitchell. I need to update my answer accordingly.',
 'tool_name_1': 'finish',
 'tool_args_1': '{}',
 'observation_1': 'Completed.'}

In [55]:
config = dict(num_threads=8, display_progress=True, display_table=5)
evaluate = Evaluate(devset=dev_set, metric=dspy.evaluate.answer_exact_match, **config)

evaluate(agent)

2024/12/09 13:08:33 INFO dspy.evaluate.evaluate: Average Metric: 22 / 50 (44.0%)





| | question | example_answer | gold_titles | trajectory | rationale | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {Cangzhou, Qionghai} | {'thought_0': 'Cangzhou is located in Hebei province, while Qiongh... | determine whether both cities are in the Hebei province. We start ... | No, they are not both in the Hebei province of China. | |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to t... | National Hockey League | {2017 NHL Expansion Draft, 2017–18 Pittsburgh Penguins season} | {'thought_0': "The question asks about the draft process for Marc-... | determine who conducted the draft in which Marc-Andre Fleury was d... | The NHL conducted the draft in which Marc-Andre Fleury was drafted... | |
| 2 | The Wings entered a new era, following the retirement of which Can... | Steve Yzerman | {Steve Yzerman, 2006–07 Detroit Red Wings season} | {'thought_0': "The Wings' new era likely began after a significant... | produce the answer. We started by identifying the context of the q... | Steve Yzerman | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {Crichton Collegiate Church, Crichton Castle} | {'thought_0': 'Question: What river is near the Crichton Collegiat... | determine the river near the Crichton Collegiate Church. I started... | River Tyne | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by ... | King Alfred the Great | {Æthelweard (son of Alfred), Ealhswith} | {'thought_0': "Question: In the 10th Century A.D. Ealhswith had a ... | determine which English king Ealhswith was married to and subseque... | King Alfred the Great | ✔️ [True] |


44.0

### Using Optimizers - Automatic Few-Shot Learning

These optimizers extend the signature by automatically generating and including optimized examples within the prompt sent to the model, implementing few-shot learning.

#### LabeledFewShot

Simply constructs few-shot examples (demos) from provided labeled input and output data points.
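
As a minimal sketch of this idea, the code below samples `k` labeled records and places them as demonstrations in front of a new question. It is written in plain Python under my own assumptions; `sample_demos`, `format_prompt`, and the toy records are hypothetical and do not mirror how `LabeledFewShot` stores demos internally.

```python
import random


def sample_demos(trainset: list, k: int, seed: int = 0) -> list:
    """Pick up to k labeled examples at random to use as demonstrations."""
    rng = random.Random(seed)
    return rng.sample(trainset, min(k, len(trainset)))


def format_prompt(instructions: str, demos: list, question: str) -> str:
    """Place the demos before the new question, leaving its answer open."""
    blocks = [instructions]
    for demo in demos:
        blocks.append(f"Question: {demo['question']}\nAnswer: {demo['answer']}")
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Toy labeled records (question/answer pairs in the HotPotQA style).
toy_trainset = [
    {"question": "What river is near the Crichton Collegiate Church?",
     "answer": "the River Tyne"},
    {"question": "Are both Cangzhou and Qionghai in the Hebei province of China?",
     "answer": "no"},
    {"question": "At My Window was released by which American singer-songwriter?",
     "answer": "John Townes Van Zandt"},
    {"question": "Are both Chico Municipal Airport and William R. Fairchild "
                 "International Airport in California?",
     "answer": "no"},
]
demos = sample_demos(toy_trainset, k=3)
few_shot_prompt = format_prompt(
    "Answer questions with short factoid answers.",
    demos,
    "In which Maine county is Fort Pownall located?",
)
print(few_shot_prompt)
```

Because the demos are taken as-is from the labeled data, this approach needs no extra model calls at compile time, which makes it a cheap first optimizer to try.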

In [66]:
optimizer = LabeledFewShot(k=3)
optimized_program = optimizer.compile(baseline, trainset=train_set)
optimized_program.save("compiled_programs/LabeledFewShot_program.json")

evaluate(optimized_program)

2024/12/09 13:27:02 INFO dspy.evaluate.evaluate: Average Metric: 28 / 50 (56.0%)





| | question | example_answer | gold_titles | context | pred_answer | rationale | answer_exact_match |
|---|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {Cangzhou, Qionghai} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Heb... | No | determine the locations of Cangzhou and Qionghai. Cangzhou is conf... | ✔️ [True] |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to t... | National Hockey League | {2017 NHL Expansion Draft, 2017–18 Pittsburgh Penguins season} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Peng... | National Hockey League | identify the entity responsible for conducting the draft. The cont... | ✔️ [True] |
| 2 | The Wings entered a new era, following the retirement of which Can... | Steve Yzerman | {Steve Yzerman, 2006–07 Detroit Red Wings season} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, ... | Steve Yzerman | identify the retired professional ice hockey player mentioned. The... | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {Crichton Collegiate Church, Crichton Castle} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situa... | River Tyne | identify the river associated with the Crichton Collegiate Church.... | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by ... | King Alfred the Great | {Æthelweard (son of Alfred), Ealhswith} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-cent... | Alfred the Great | identify the king who was the father of Æthelweard. From the conte... | |


56.0

### Using Optimizers - Automatic Instruction Optimization

These optimizers propose candidate instructions for the prompt and select the ones that score best on your metric.

#### MIPROv2

Generates instructions and few-shot examples in each step. The instruction generation is data-aware and demonstration-aware. Uses Bayesian Optimization to effectively search over the space of generation instructions/demonstrations across your modules.

<center><img src="assets/mipro-schema.png" width="400"></center>
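
The search loop can be illustrated with a much simpler stand-in. The sketch below scores randomly chosen instruction candidates on minibatches and then runs one full evaluation of the candidate with the best mean minibatch score. Note the substitution: uniform random choice replaces Bayesian Optimization, and the candidates, `QUALITY` table, and scoring function are toy assumptions, so this shows the shape of the procedure rather than MIPROv2 itself.

```python
import random


def search_instructions(candidates, score_fn, valset,
                        num_trials=7, minibatch_size=25, seed=0):
    """Random search over instruction candidates with minibatch scoring.

    Returns (best_candidate, full_score_percent, number_of_program_calls).
    """
    rng = random.Random(seed)
    trials = []  # (candidate index, minibatch score) for each trial
    calls = 0
    for _ in range(num_trials):
        idx = rng.randrange(len(candidates))  # MIPROv2 would pick this via Bayesian Optimization
        batch = rng.sample(valset, min(minibatch_size, len(valset)))
        calls += len(batch)
        score = 100.0 * sum(score_fn(candidates[idx], ex) for ex in batch) / len(batch)
        trials.append((idx, score))

    def mean_score(i):
        scores = [s for j, s in trials if j == i]
        return sum(scores) / len(scores)

    # Promote the candidate with the best mean minibatch score to a full evaluation.
    best_idx = max(sorted({i for i, _ in trials}), key=mean_score)
    calls += len(valset)
    full = 100.0 * sum(score_fn(candidates[best_idx], ex) for ex in valset) / len(valset)
    return candidates[best_idx], full, calls


# Toy setup: each instruction "solves" examples whose last digit is below its quality.
QUALITY = {"instruction A": 4, "instruction B": 6, "instruction C": 5}


def toy_score(instruction, example_id):
    return example_id % 10 < QUALITY[instruction]


best, full_score, program_calls = search_instructions(
    list(QUALITY), toy_score, valset=list(range(100))
)
print(best, full_score, program_calls)
```

With 7 trials on minibatches of 25 and one full pass over 100 validation examples, the sketch makes 25 * 7 + 100 = 275 program calls, which shows why minibatching keeps the search affordable compared with a full evaluation on every trial.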

In [73]:
optimizer = MIPROv2(
    metric=dspy.evaluate.answer_exact_match,
    auto="light",                      # Can choose between light, medium, and heavy optimization runs
    verbose=True
)
optimized_program = optimizer.compile(
    baseline,
    trainset=train_set,
    max_bootstrapped_demos=0,           # set to 0 to optimize only the instructions
    max_labeled_demos=0,                # set to 0 to optimize only the instructions
    requires_permission_to_run=True
)

optimized_program.save("compiled_programs/MIPROv2_program.json")

evaluate(optimized_program)

2024/12/09 13:50:19 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: True
num_candidates: 7
valset size: 100



Projected Language Model (LM) Calls

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Prompt Generation: 10 data summarizer calls + 7 * 1 lm calls in program + (2) lm calls in program-aware proposer = 19 prompt model calls
- Program Evaluation: 25 examples in minibatch * 7 batches + 100 examples in val set * 1 full evals = 275 LM Program calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token) 
            + (Number of program calls * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model 

2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used for informing instruction proposal.

2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=7 sets of demonstrations...


Bootstrapping set 1/7
Bootstrapping set 2/7
Bootstrapped 1 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 3/7
Bootstrapped 3 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 4/7
Bootstrapped 1 full traces after 1 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/7
Bootstrapped 3 full traces after 5 examples for up to 1 rounds, amounting to 5 attempts.
Bootstrapping set 6/7
Bootstrapped 2 full traces after 6 examples for up to 1 rounds, amounting to 6 attempts.
Bootstrapping set 7/7



2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2024/12/09 13:50:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...



Bootstrapped 3 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
SOURCE CODE: StringSignature(context, question -> rationale, answer
    instructions='Answer questions with short factoid answers.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    rationale = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${produce the answer}. We ...', '__dspy_field_type': 'output'})
    answer = Field(annotation=str required=True json_schema_extra={'desc': 'often between 1 and 5 words', '__dspy_field_type': 'output', 'prefix': 'Answer:'})
)

class BaselineRAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        

2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Answer questions with short factoid answers.

2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: 1: Propose a detailed instruction that guides the Language Model to effectively analyze the provided context and answer comparative questions about notable figures or events, ensuring clarity in reasoning and conciseness in the final answer. Include a reminder to check the birthdates or relevant details to establish accurate comparisons.

**PROPOSED INSTRUCTION:** 

"Given the provided context containing facts about notable individuals or events, analyze the information carefully to answer the specific comparative question. Ensure that your reasoning is clear and logical by outlining the relevant details step by step. Focus on key dates or significant achievements to establish comparisons accurately. Finally, provide a

PROGRAM DESCRIPTION: The program appears to be designed to solve fact-based question-answering tasks by leveraging a retrieval-augmented generation (RAG) approach. It utilizes a context of relevant information to provide concise answers to specific questions while also generating a rationale for the answers based on logical reasoning.

The program works as follows:

1. **Context Retrieval**: When a question is posed, the program first retrieves a set of relevant passages (or context) that may contain information necessary to answer the question. This is done using a retrieval module that fetches a specified number of passages (in this case, three) based on the question.

2. **Answer Generation**: After retrieving the context, the program employs a chain-of-thought reasoning mechanism to process the context
task_demos Context: ['The Victorians | The Victorians - Their Story In Pictures is a 2009 British documentary series which focuses on Victorian art and culture. The four-part series 

2024/12/09 13:50:23 INFO dspy.evaluate.evaluate: Average Metric: 39 / 100 (39.0%)





2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 39.0

2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: ==> STEP 3: FINDING OPTIMAL PROMPT PARAMETERS <==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: We will evaluate the program over a series of trials with different combinations of instructions and few-shot examples to find the optimal combination using Bayesian Optimization.

2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 1 / 7 ==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: Propose a detailed instruction that guides the Language Model to effectively analyze the provided context and answer comparative questions about notable figures or events, ensuring clarity in reasoning and conciseness in the final answer. Include a reminder to check the birthdates or relevant details to establish accurate comparisons.

**PROPOSED INSTRUCTION:** 

"Given the provided context containing facts about notable individuals or events, analyze the information carefully to answer the specific comparative question. Ensure that your reasoning is clear and logical by outlining the relevant details step by step. Focus on key dates or significant achievements to establish comparisons accurately. Finally, provide a concise factoid answer that directly addresses the question posed.
p: Answer:


2024/12/09 13:50:23 INFO dspy.evaluate.evaluate: Average Metric: 12 / 25 (48.0%)
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 48.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1'].
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0







2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 2 / 7 ==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: You are a knowledgeable trivia expert who specializes in providing concise and accurate answers to fact-based questions. Use the provided context to identify the relevant information and respond with a brief answer, ensuring clarity and precision.
p: Answer:


2024/12/09 13:50:23 INFO dspy.evaluate.evaluate: Average Metric: 16 / 25 (64.0%)
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 64.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5'].
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0]





2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 3 / 7 ==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: Provide concise, fact-based answers to the following questions, using the given context to support your reasoning. Please ensure that your responses are clear and directly address the question asked.
p: Answer:



2024/12/09 13:50:23 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2'].
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0, 36.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 4 / 7 ==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: You are a knowledgeable trivia expert who specializes in providing concise and accurate answers to fact-based questions. Use the provided context to identify the relevant information and respond with a brief answer, ensuring clarity and precision.
p: Answer:



2024/12/09 13:50:23 INFO dspy.evaluate.evaluate: Average Metric: 11 / 25 (44.0%)
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 44.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 5'].
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0, 36.0, 44.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 5 / 7 ==
2024/12/09 13:50:23 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: You are a knowledgeable historian and trivia expert. Answer the following questions with concise factoid answers based on the provided context, ensuring to highlight key details and connections relevant to the inquiry.
p: Answer:


Average Metric: 10.00 / 25 (40.0%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 217.71it/s]

2024/12/09 13:50:24 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 4'].
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0, 36.0, 44.0, 40.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 6 / 7 ==
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...




Predictor 0
i: Propose a detailed instruction that guides the Language Model to effectively analyze the provided context and answer comparative questions about notable figures or events, ensuring clarity in reasoning and conciseness in the final answer. Include a reminder to check the birthdates or relevant details to establish accurate comparisons.

**PROPOSED INSTRUCTION:** 

"Given the provided context containing facts about notable individuals or events, analyze the information carefully to answer the specific comparative question. Ensure that your reasoning is clear and logical by outlining the relevant details step by step. Focus on key dates or significant achievements to establish comparisons accurately. Finally, provide a concise factoid answer that directly addresses the question posed.
p: Answer:


Average Metric: 9.00 / 23 (39.1%):  88%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌  

2024/12/09 13:50:24 ERROR dspy.utils.parallelizer: Error processing item Example({'question': "For which college team did this American professional basketball player for the Miami Heat of the National Basketball Association (NBA) who earned praise for being the NBA's leading rebounder during the 2016–17 Miami Heat season play?", 'answer': 'the Marshall Thundering Herd'}) (input_keys={'question'}): 'NoneType' object has no attribute 'strip'. Set `provide_traceback=True` to see the stack trace.


Average Metric: 9.00 / 24 (37.5%): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 302.69it/s]

2024/12/09 13:50:24 INFO dspy.evaluate.evaluate: Average Metric: 9.0 / 25 (36.0%)





2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1'].
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0, 36.0, 44.0, 40.0, 36.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: == Minibatch Trial 7 / 7 ==
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Evaluating the following candidate program...



Predictor 0
i: Provide concise and accurate answers to fact-based questions by first analyzing the relevant context, then applying logical reasoning to derive the answer. Ensure that the response includes a brief rationale explaining the thought process behind the answer.
p: Answer:


Average Metric: 13.00 / 25 (52.0%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 383.68it/s]

2024/12/09 13:50:24 INFO dspy.evaluate.evaluate: Average Metric: 13 / 25 (52.0%)
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 52.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 6'].
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [48.0, 64.0, 36.0, 44.0, 40.0, 36.0, 52.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 39.0


2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Full Eval 1 =====
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 54.0) from minibatch trials...




Average Metric: 41.00 / 100 (41.0%): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 300.54it/s]

2024/12/09 13:50:24 INFO dspy.evaluate.evaluate: Average Metric: 41 / 100 (41.0%)
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 41.0
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [39.0, 41.0]
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 41.0
2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: 

2024/12/09 13:50:24 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 41.0!
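
The `Avg Score: 54.0` in the "Full Eval 1" step can be reproduced by hand from the minibatch scores. The sketch below is a simplified illustration, not MIPROv2's actual code: it groups minibatch scores by candidate and picks the highest mean. The candidate labels for trials 1 and 2 are not visible in this part of the log, so trial 1 gets a placeholder label and trial 2 is assumed to be `Instruction 5`, which is consistent with the reported average of 54.0 and with the instruction that appears in the final prompt.

```python
from collections import defaultdict
from statistics import mean

# (candidate, minibatch score) per trial. Trials 3-7 are copied from the log.
trials = [
    ("trial 1 (instruction not shown)", 48.0),  # placeholder label
    ("Instruction 5", 64.0),                    # ASSUMPTION: label not shown in the log
    ("Instruction 2", 36.0),
    ("Instruction 5", 44.0),
    ("Instruction 4", 40.0),
    ("Instruction 1", 36.0),
    ("Instruction 6", 52.0),
]


def top_averaging_candidate(trials):
    """Return the candidate with the highest mean minibatch score."""
    scores = defaultdict(list)
    for candidate, score in trials:
        scores[candidate].append(score)
    averages = {c: mean(s) for c, s in scores.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


best, avg = top_averaging_candidate(trials)
print(best, avg)
```

Note the gap between the minibatch average (54.0 on 25 examples) and the full evaluation (41.0 on 100 examples): small minibatches are noisy, which is why the optimizer re-checks promising candidates on the full validation set.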


Average Metric: 21.00 / 50 (42.0%): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 279.67it/s]

2024/12/09 13:50:24 INFO dspy.evaluate.evaluate: Average Metric: 21 / 50 (42.0%)





| | question | example_answer | gold_titles | context | pred_answer | rationale | answer_exact_match |
|---|---|---|---|---|---|---|---|
| 0 | Are both Cangzhou and Qionghai in the Hebei province of China? | no | {Cangzhou, Qionghai} | ['Cangzhou \| Cangzhou () is a prefecture-level city in eastern Heb... | No, only Cangzhou is. | determine the location of both cities. Cangzhou is explicitly ment... | |
| 1 | Who conducts the draft in which Marc-Andre Fleury was drafted to t... | National Hockey League | {2017 NHL Expansion Draft, 2017–18 Pittsburgh Penguins season} | ['2017–18 Pittsburgh Penguins season \| The 2017–18 Pittsburgh Peng... | National Hockey League | identify the organization responsible for the expansion draft. The... | ✔️ [True] |
| 2 | The Wings entered a new era, following the retirement of which Can... | Steve Yzerman | {Steve Yzerman, 2006–07 Detroit Red Wings season} | ['Steve Yzerman \| Stephen Gregory "Steve" Yzerman ( ; born May 9, ... | Steve Yzerman | identify the player being referred to. The context states that the... | ✔️ [True] |
| 3 | What river is near the Crichton Collegiate Church? | the River Tyne | {Crichton Collegiate Church, Crichton Castle} | ["Crichton Collegiate Church \| Crichton Collegiate Church is situa... | River Tyne | identify the relevant geographical features. The context mentions ... | ✔️ [True] |
| 4 | In the 10th Century A.D. Ealhswith had a son called Æthelweard by ... | King Alfred the Great | {Æthelweard (son of Alfred), Ealhswith} | ["Æthelweard of East Anglia \| Æthelweard (died 854) was a 9th-cent... | Alfred the Great | identify the correct English king associated with Ealhswith and he... | |


42.0
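
Two things in the output above relate to the metric: the `'NoneType' object has no attribute 'strip'` error, and the rows where a reasonable answer (`Alfred the Great` for `King Alfred the Great`) does not count as a match. The sketch below is a hypothetical `safe_exact_match` metric, assuming the error came from a prediction whose `answer` was `None`; the notebook's real metric and the exact failure site are not shown here. The `normalize` helper imitates SQuAD-style normalization (lowercase, strip punctuation and articles) and is an approximation, not DSPy's implementation.

```python
import re
import string
from types import SimpleNamespace


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    if text is None:  # guard against missing answers instead of crashing
        return ""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def safe_exact_match(example, pred, trace=None):
    """Hypothetical metric: exact match after normalization, None-safe."""
    gold = normalize(getattr(example, "answer", None))
    guess = normalize(getattr(pred, "answer", None))
    return bool(gold) and gold == guess


# Stand-ins for DSPy examples and predictions, using answers from the table above.
def row(answer):
    return SimpleNamespace(answer=answer)


print(safe_exact_match(row("the River Tyne"), row("River Tyne")))
print(safe_exact_match(row("King Alfred the Great"), row("Alfred the Great")))
print(safe_exact_match(row("no"), row("No, only Cangzhou is.")))
print(safe_exact_match(row("the Marshall Thundering Herd"), row(None)))
```

The function follows the `(example, pred, trace=None)` convention used for DSPy metrics, so a variant of it could be passed as `metric=`. Exact match stays strict by design: if answers like row 0 or row 4 should count as correct, a softer metric (for example an LLM judge, as suggested at the top of this tutorial) is a better fit.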

In [72]:
llm.inspect_history(n=1)




You are a knowledgeable trivia expert who specializes in providing concise and accurate answers to fact-based questions. Use the provided context to identify the relevant information and respond with a brief answer, ensuring clarity and precision.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «Sasha Alexander | Suzana S. Drobnjaković Ponti (born May 17, 1973), known by her stage name Sasha Alexander, is a Serbian-American actress. She played Gretchen Witter on "Dawson's Creek" and has acted in films including "Yes Man" (2008) and "He's Just Not That Into You" (2009). Alexander played Caitlin Todd for the first two seasons of "NCIS". From July 2010 through September 2016, Alexander starred as Maura Isles in the TNT series "Rizzoli & Isles".»
[2] «The Hour of the Star | The Hour of the Star ("A hora da e
