<center>
<img src="https://supportvectors.ai/logo-poster-transparent.png" width="400px" style="opacity:0.7">
</center>

In [1]:
%run supportvectors-common.ipynb


<div style="color:#aaa;font-size:8pt">
<hr/>
&copy; SupportVectors. All rights reserved. <blockquote>This notebook is the intellectual property of SupportVectors and part of its training material. 
Only participants in SupportVectors workshops are currently allowed to study the notebooks, for educational purposes; copying them or using them for any other purpose without written permission is prohibited.

<b> These notebooks are chapters and sections from the textbook on Data Science that Asif Qamar is writing, so we request that you not circulate the material to others.</b>
 </blockquote>
 <hr/>
</div>



In [2]:
import os
import requests
SERPER_API_KEY = os.getenv("SERPER_API_KEY")

In [3]:
def google_search(query: str, num: int = 3) -> list[dict]:
    """Search Google using the Serper API and return the top organic results."""
    url = "https://google.serper.dev/search"
    headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
    payload = {"q": query, "num": num}

    # A timeout keeps a stalled request from hanging the agent.
    response = requests.post(url, json=payload, headers=headers, timeout=10)
    # Raise on HTTP errors instead of returning an error string, so the
    # return type stays a list and callers can tell failures from results.
    response.raise_for_status()
    return response.json().get("organic", [])
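The raw `organic` entries are dictionaries, and passing them verbatim to an agent can crowd its context. Below is a minimal sketch of condensing them into short text observations. It assumes each entry carries `title`, `snippet`, and `link` keys, which is an assumption about the response shape. `format_results` is a hypothetical helper and not part of the notebook.

```python
def format_results(results: list[dict], max_chars: int = 200) -> list[str]:
    """Condense search hits into one short line each (hypothetical helper)."""
    lines = []
    for hit in results:
        # .get() keeps the helper tolerant of hits that lack a key.
        title = hit.get("title", "(no title)")
        snippet = hit.get("snippet", "")[:max_chars]
        link = hit.get("link", "")
        lines.append(f"{title}: {snippet} ({link})")
    return lines


# Hand-written sample in the assumed shape; this is not real API output.
sample = [
    {
        "title": "Annus mirabilis papers",
        "snippet": "Einstein published four papers in 1905.",
        "link": "https://example.org/annus",
    }
]
print(format_results(sample))
```

Returning plain strings mirrors what `search_wikipedia` does, so either tool could feed the agent observations of the same kind.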

In [4]:
import dspy
from dspy.datasets import HotPotQA

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

def search_wikipedia(query: str) -> list[str]:
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]
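# Note: the recorded optimization logs and later traces in this notebook list
# `search_wikipedia` as the agent's tool, not `google_search`.
react = dspy.ReAct("question -> answer", tools=[google_search])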


In [5]:
from rich import print as rprint

In [6]:
pred = react(question="What is 120 years from the year the special theory of relativity was published?")
rprint(pred.answer)

In [7]:
dspy.inspect_history()

[2025-09-12T12:25:16.679097]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is 120 years from the year the special theory of relativity was published?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
The special theory of relativity was published in 1905. To find out what year it will be 120 years from then, I can simply add 120 to 1905.

[[ ## tool_name_0 ## ]]
finish

[[ ## tool_args_0 ## ]]
{"kwargs": 2025}

[[ ## observation_0 ## ]]
Completed.

Res
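The prompt above delimits every field with a `[[ ## name ## ]]` marker on its own line. As an illustration of that layout, here is a minimal sketch of splitting such text into a dictionary. `parse_fields` is a hypothetical helper written for this note, not DSPy's own parser, and it treats the text as a flat sequence of fields.

```python
import re

# A marker occupies a whole line, e.g. "[[ ## thought_0 ## ]]".
FIELD_MARKER = re.compile(r"^\[\[ ## (\w+) ## \]\]$", re.MULTILINE)


def parse_fields(text: str) -> dict[str, str]:
    """Split marker-delimited text into a {field_name: value} mapping."""
    parts = FIELD_MARKER.split(text)
    # With one capture group, re.split returns
    # [preamble, name1, body1, name2, body2, ...].
    names, bodies = parts[1::2], parts[2::2]
    return {name: body.strip() for name, body in zip(names, bodies)}


trace = """[[ ## thought_0 ## ]]
The special theory of relativity was published in 1905.

[[ ## tool_name_0 ## ]]
finish

[[ ## observation_0 ## ]]
Completed.
"""

fields = parse_fields(trace)
print(fields["tool_name_0"])  # finish
```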

In [6]:

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

2025/09/11 21:57:29 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: True
num_candidates: 3
valset size: 100

2025/09/11 21:58:25 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/09/11 21:58:25 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/09/11 21:58:25 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=3 sets of demonstrations...


Bootstrapping set 1/3
Bootstrapping set 2/3
Bootstrapping set 3/3


 31%|███       | 31/100 [10:44<23:54, 20.79s/it]
2025/09/11 22:09:09 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/09/11 22:09:09 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 4 full traces after 31 examples for up to 1 rounds, amounting to 31 attempts.


2025/09/11 22:09:45 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...

2025/09/11 22:10:55 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/09/11 22:10:55 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

You will be given `question` and your goal is to finish with `answer`.

To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.

Thought can reason about the current situation, and Tool Name can be the following types:

(1) search_wikipedia. It takes arguments {'query': {'type': 'string'}} in JSON format.
(2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.

2025/09/11 22:10:55 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are a knowledgeable assistant tasked with answering questions accu

Average Metric: 9.00 / 100 (9.0%): 100%|██████████| 100/100 [02:18<00:00,  1.39s/it]

2025/09/11 22:13:14 INFO dspy.evaluate.evaluate: Average Metric: 9 / 100 (9.0%)
2025/09/11 22:13:14 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 9.0

2025/09/11 22:13:14 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 8 - Minibatch ==



Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:24<00:00,  1.03it/s]

2025/09/11 22:13:38 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)
2025/09/11 22:13:38 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 2'].
2025/09/11 22:13:38 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0]
2025/09/11 22:13:38 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:13:38 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:13:38 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 8 - Minibatch ==



Average Metric: 7.00 / 25 (28.0%): 100%|██████████| 25/25 [00:36<00:00,  1.46s/it]

2025/09/11 22:14:15 INFO dspy.evaluate.evaluate: Average Metric: 7 / 25 (28.0%)
2025/09/11 22:14:15 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 28.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 1'].
2025/09/11 22:14:15 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0, 28.0]
2025/09/11 22:14:15 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:14:15 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:14:15 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 8 - Minibatch ==



Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [00:58<00:00,  2.34s/it]

2025/09/11 22:15:13 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)
2025/09/11 22:15:13 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/11 22:15:13 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0, 28.0, 40.0]
2025/09/11 22:15:13 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:15:13 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:15:13 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 8 - Minibatch ==



Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:38<00:00,  1.53s/it]

2025/09/11 22:15:51 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)
2025/09/11 22:15:51 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/11 22:15:51 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0, 28.0, 40.0, 36.0]
2025/09/11 22:15:51 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:15:51 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:15:51 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 8 - Minibatch ==



Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [00:51<00:00,  2.07s/it]

2025/09/11 22:16:43 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)
2025/09/11 22:16:43 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/11 22:16:43 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0, 28.0, 40.0, 36.0, 36.0]
2025/09/11 22:16:43 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:16:43 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:16:43 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 7 / 8 - Minibatch ==



Average Metric: 7.00 / 25 (28.0%): 100%|██████████| 25/25 [00:37<00:00,  1.51s/it]

2025/09/11 22:17:21 INFO dspy.evaluate.evaluate: Average Metric: 7 / 25 (28.0%)
2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 28.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 1'].
2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [36.0, 28.0, 40.0, 36.0, 36.0, 28.0]
2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0]
2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 9.0


2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 8 - Full Evaluation =====
2025/09/11 22:17:21 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 40.0) from minibatch trials...



Average Metric: 35.00 / 100 (35.0%): 100%|██████████| 100/100 [01:18<00:00,  1.27it/s]

2025/09/11 22:18:40 INFO dspy.evaluate.evaluate: Average Metric: 35 / 100 (35.0%)
2025/09/11 22:18:40 INFO dspy.teleprompt.mipro_optimizer_v2: New best full eval score! Score: 35.0
2025/09/11 22:18:40 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [9.0, 35.0]
2025/09/11 22:18:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 35.0
2025/09/11 22:18:40 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/09/11 22:18:40 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 35.0!
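The log shows cheap minibatch trials followed by a full evaluation of the "top averaging program". The selection step can be sketched as follows, using the minibatch scores transcribed from trials 2 through 7. This is a simplified illustration of that one step, not the optimizer's implementation, and `best_by_average` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean

# Each key is (predictor-0 instruction, predictor-0 few-shot set,
#              predictor-1 instruction, predictor-1 few-shot set),
# paired with the minibatch score reported for that trial.
minibatch_trials = [
    ((1, 2, 0, 2), 36.0),  # trial 2
    ((0, 1, 1, 1), 28.0),  # trial 3
    ((2, 2, 2, 2), 40.0),  # trial 4
    ((0, 1, 2, 2), 36.0),  # trial 5
    ((0, 0, 2, 2), 36.0),  # trial 6
    ((2, 1, 0, 1), 28.0),  # trial 7
]


def best_by_average(trials):
    """Average the scores per configuration and return the best one."""
    scores = defaultdict(list)
    for config, score in trials:
        scores[config].append(score)
    averages = {config: mean(values) for config, values in scores.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


best_config, best_avg = best_by_average(minibatch_trials)
print(best_config, best_avg)  # (2, 2, 2, 2) 40.0
```

The configuration that scored 40.0 on its minibatch is the one that reached 35.0 on the full 100-example validation set. Minibatch scores are noisy estimates, which is why a full evaluation follows.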





In [7]:
optimized_react.save("./dspy_program/react_prog.pkl", save_program=False)

In [6]:
loaded_dspy_program = dspy.ReAct("question -> answer", tools=[search_wikipedia])
loaded_dspy_program.load("./dspy_program/react_prog.pkl")

In [7]:
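# Note: devset=trainset scores the programs on the same examples given to the optimizer,
# so the optimized program's score is likely optimistic; a held-out split gives a fairer estimate.
evaluate = dspy.Evaluate(devset=trainset, metric=dspy.evaluate.answer_exact_match, num_threads=24,
                         display_progress=True, display_table=2)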

In [10]:
rprint(evaluate(loaded_dspy_program))

Average Metric: 143.00 / 500 (28.6%): 100%|██████████| 500/500 [05:35<00:00,  1.49it/s]

2025/09/11 22:25:15 INFO dspy.evaluate.evaluate: Average Metric: 143 / 500 (28.6%)





| | question | example_answer | trajectory | reasoning | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|
| 0 | Are Smyrnium and Nymania both types of plant? | yes | {'thought_0': 'I need to verify whether Smyrnium and Nymania are b... | Both Smyrnium and Nymania are indeed types of plants. Smyrnium is ... | Yes, both Smyrnium and Nymania are types of plants. | |
| 1 | That Darn Cat! and Never a Dull Moment were both produced by what ... | Walt Disney Productions | {'thought_0': 'I need to recall the production studio for both "Th... | Both "That Darn Cat!" and "Never a Dull Moment" are films that I r... | Walt Disney Productions | ✔️ [True] |
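Row 0 is marked as a miss even though the prediction is semantically right, because the metric compares whole strings. A minimal sketch of an exact-match check follows, assuming SQuAD-style normalization (lowercasing, dropping punctuation and articles, collapsing whitespace). It illustrates the idea and is not the library's code; `normalize_text` and `exact_match` are hypothetical names.

```python
import re
import string


def normalize_text(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True only when the normalized strings are identical."""
    return normalize_text(prediction) == normalize_text(reference)


# A full-sentence answer does not match the one-word reference.
print(exact_match("Yes, both Smyrnium and Nymania are types of plants.", "yes"))  # False
# Case and trailing punctuation are ignored.
print(exact_match("Walt Disney Productions", "walt disney productions."))  # True
```

Under a metric like this, a program is rewarded for answering tersely. That is one plausible reason the optimized instructions and few-shot examples score so much higher.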


In [11]:
rprint(evaluate(react))

Average Metric: 27.00 / 500 (5.4%): 100%|██████████| 500/500 [08:57<00:00,  1.07s/it]

2025/09/11 22:34:46 INFO dspy.evaluate.evaluate: Average Metric: 27 / 500 (5.4%)





| | question | example_answer | trajectory | reasoning | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|
| 0 | Are Smyrnium and Nymania both types of plant? | yes | {'thought_0': 'I need to verify if Smyrnium and Nymania are indeed... | I was able to confirm that Smyrnium is indeed a genus of flowering... | Yes, Smyrnium is a type of plant, but I could not confirm if Nyman... | |
| 1 | That Darn Cat! and Never a Dull Moment were both produced by what ... | Walt Disney Productions | {'thought_0': 'I need to find out which studio produced both "That... | Both "That Darn Cat!" and "Never a Dull Moment" were produced by t... | Both 'That Darn Cat!' and 'Never a Dull Moment' were produced by W... | |
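The two runs report 143 and 27 correct answers out of the same 500 examples. The arithmetic behind the comparison is sketched below; `accuracy` is a hypothetical helper. Both runs use the examples that were also passed to the optimizer, so the gap may overstate the gain on unseen questions.

```python
def accuracy(correct: int, total: int) -> float:
    """Percentage of examples answered correctly."""
    return 100.0 * correct / total


optimized = accuracy(143, 500)  # loaded_dspy_program
baseline = accuracy(27, 500)    # unoptimized react

print(f"optimized: {optimized:.1f}%")                        # optimized: 28.6%
print(f"baseline: {baseline:.1f}%")                          # baseline: 5.4%
print(f"absolute gain: {optimized - baseline:.1f} points")   # absolute gain: 23.2 points
print(f"relative gain: {optimized / baseline:.1f}x")         # relative gain: 5.3x
```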


In [8]:
pred = react(question="Are Smyrnium and Nymania both types of plant?")
rprint(pred.answer)

In [9]:
dspy.inspect_history(n=1)

[2025-09-12T09:29:31.504067]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
Are Smyrnium and Nymania both types of plant?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
I need to verify if Smyrnium and Nymania are indeed types of plants. This requires checking reliable sources for botanical classifications.

[[ ## tool_name_0 ## ]]
search_wikipedia

[[ ## tool_args_0 ## ]]
{"query": "Smyrnium plant Nymania plant"}

[[ ## observation_0 ## ]]
Failed to exec
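The observation above is cut off, so the cause of the failed tool call is not known. If the retrieval endpoint is only intermittently unreachable, one option is to retry the tool before giving up. A minimal sketch, in which `with_retries` and `flaky_search` are hypothetical names introduced for illustration:

```python
import time
from functools import wraps


def with_retries(max_attempts: int = 3, delay_seconds: float = 0.0):
    """Retry a tool on any exception; return a readable message if all attempts fail."""

    def decorator(tool):
        @wraps(tool)  # keep the tool's name and docstring intact
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return tool(*args, **kwargs)
                except Exception as exc:
                    last_error = exc
                    if attempt < max_attempts:
                        time.sleep(delay_seconds)
            return [f"Search unavailable after {max_attempts} attempts: {last_error}"]

        return wrapper

    return decorator


calls = {"count": 0}


@with_retries(max_attempts=3)
def flaky_search(query: str) -> list[str]:
    """Stand-in for a search tool that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("retrieval server unreachable")
    return [f"result for {query}"]


print(flaky_search("Smyrnium"))  # ['result for Smyrnium']
```

Returning a short message instead of raising gives the agent an observation it can reason about, for example by falling back on its own knowledge.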

In [10]:
pred = loaded_dspy_program(question="Are Smyrnium and Nymania both types of plant?")
rprint(pred.answer)

In [11]:
dspy.inspect_history(n=1)

[2025-09-12T09:31:21.973354]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Imagine you are an expert researcher tasked with answering critical questions for a high-stakes trivia competition. You must accurately answer the provided question using your reasoning skills and available resources. Begin by analyzing the question in detail, then utilize the search tool to gather relevant information. If the search fails, rely on your knowledge and reasoning to provide the best possible answer. Your goal is to produce a clear and precise answer based on the trajec