<center>
<img src="https://supportvectors.ai/logo-poster-transparent.png" width="400px" style="opacity:0.7">
</center>

In [None]:
%run supportvectors-common.ipynb

## MIPROv2 
MIPROv2 (Multiprompt Instruction PRoposal Optimizer Version 2) is a prompt optimizer that tunes instructions and few-shot examples jointly. It bootstraps few-shot example candidates, proposes instructions grounded in different dynamics of the task, and then searches for the best combination of the two using Bayesian Optimization.
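
To make the final search step concrete, here is a toy sketch of picking a combination of instruction and few-shot set by scoring candidates on minibatches. It uses plain random sampling rather than Bayesian Optimization, and the candidate names and `toy_score` function are invented for illustration; none of this is DSPy's internal code.

```python
import itertools
import random

# Hypothetical candidates; in MIPROv2 these would come from the
# bootstrapping and instruction-proposal steps.
instructions = ["instr_0", "instr_1", "instr_2"]
fewshot_sets = ["demos_0", "demos_1", "demos_2"]


def toy_score(instruction: str, demos: str, minibatch: list) -> float:
    """Stand-in for running the program on a minibatch and averaging a metric.

    The minibatch is ignored here; a real scorer would evaluate on it.
    """
    base = {"instr_0": 0.2, "instr_1": 0.4, "instr_2": 0.3}[instruction]
    bonus = {"demos_0": 0.0, "demos_1": 0.05, "demos_2": 0.1}[demos]
    return base + bonus


def search(num_trials: int, seed: int = 0):
    """Randomly sample combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    combos = list(itertools.product(instructions, fewshot_sets))
    best_combo, best_score = None, float("-inf")
    for _ in range(num_trials):
        combo = rng.choice(combos)              # one (instruction, demos) pair
        minibatch = rng.sample(range(100), 25)  # 25 of 100 validation indices
        score = toy_score(combo[0], combo[1], minibatch)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


best_combo, best_score = search(num_trials=7)
print(best_combo, round(best_score, 2))
```

A Bayesian optimizer would replace `rng.choice` with a proposal informed by earlier trial scores, which matters once the number of combinations grows.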

### 1) Imports & API key
- Brings in `os` for environment variables, `requests` for HTTP, and `rich`'s `print` (as `rprint`) for readable output.
- Reads your Serper API key from the `SERPER_API_KEY` environment variable so you don’t hard-code secrets.

In [None]:
import os
import requests
from rich import print as rprint
SERPER_API_KEY = os.getenv("SERPER_API_KEY")
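
`os.getenv` returns `None` when the variable is missing, so a misconfigured environment only shows up later as a failed search. A small guard can surface the problem early. The helper name `require_env` is hypothetical, not part of this notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Possible usage in place of a bare os.getenv call:
# SERPER_API_KEY = require_env("SERPER_API_KEY")
```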

### 2) Web search tool (for ReAct)
- Defines a callable **tool** the LLM can invoke inside a ReAct program. 
- Sends a POST request to Serper’s `/search` endpoint with your query.
- On success, returns the list in `organic` (titles/links/snippets). On failure, returns an error string.
- This tool is later **registered** with your ReAct module so the model can “decide” to call it when reasoning.

In [None]:
def google_search(query: str, num: int = 3):
    """Search Google using Serper API and return the top results."""
    url = "https://google.serper.dev/search"
    headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
    payload = {"q": query, "num": num}

    # A timeout keeps a stalled request from hanging the ReAct loop.
    response = requests.post(url, json=payload, headers=headers, timeout=10)
    
    if response.status_code == 200:
        results = response.json()
        return results.get("organic", [])  # Extract search results
    else:
        return f"Error: {response.status_code}, {response.text}"
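
The raw `organic` entries can be verbose, which adds to the trajectory the model has to read. One option is to flatten them into short text lines before returning them. The sketch below assumes each entry is a dict that may carry `title`, `link`, and `snippet` keys; `format_results` is a hypothetical helper and the sample data is made up.

```python
def format_results(results, max_snippet_chars: int = 200) -> str:
    """Turn a google_search() return value into a compact text block."""
    if isinstance(results, str):  # google_search returns a string on failure
        return results
    lines = []
    for i, item in enumerate(results, start=1):
        title = item.get("title", "")
        link = item.get("link", "")
        snippet = item.get("snippet", "")[:max_snippet_chars]
        lines.append(f"{i}. {title} ({link}): {snippet}")
    return "\n".join(lines)


# Made-up sample in the assumed shape of one organic result.
sample = [
    {
        "title": "Special relativity",
        "link": "https://example.org/sr",
        "snippet": "Published in 1905.",
    }
]
print(format_results(sample))
```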

### 3) Configure the LLM backend (Ollama)
- Imports DSPy and a built-in dataset.
- Creates a DSPy `LM` that points at your **local Ollama server** running a Qwen model.
- `dspy.configure(lm=lm)` sets this as the global default model for subsequent DSPy modules and evaluations.    

In [None]:
import dspy
from dspy.datasets import HotPotQA


# Using the Ollama provider
lm = dspy.LM(
    "ollama_chat/qwen3:8b",          # provider/model
    api_base="http://localhost:11434",
    api_key=""                       # empty string is fine for local Ollama
)
dspy.configure(lm=lm)

### 4) Load a dataset and define inputs
- Pulls **HotPotQA**, a multi-hop QA dataset.
- Uses 500 training examples with a fixed seed for reproducibility.
- Converts each example to only expose the **`question`** field as input (DSPy examples can carry multiple fields; you’re specifying which ones feed into your module).

In [None]:
trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]

### 5) Create a ReAct program with tools
- Instantiates a **ReAct** (Reason + Act) module that maps `question` → `answer`.  
- Registers `google_search` as an available **tool**. During inference, the policy can interleave thoughts (“reason”) with calls to this tool (“act”) before producing a final answer.

In [None]:
react = dspy.ReAct("question -> answer", tools=[google_search])
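
The reason/act interleaving can be illustrated without an LLM. The sketch below replaces the model with a scripted policy and the search tool with a stub, and records steps under the same `thought_i` / `tool_name_i` / `tool_args_i` / `observation_i` keys that appear in the inspected history. It is an illustration of the loop's shape under those assumptions, not DSPy's implementation.

```python
import json


def fake_search(query: str, num: int = 3) -> str:
    """Stand-in tool; a real tool would call a search API."""
    return "The special theory of relativity was published in 1905."


# Each scripted step is (thought, tool_name, tool_args).
SCRIPT = [
    ("I need the publication year.", "fake_search",
     {"query": "special relativity publication year", "num": 1}),
    ("1905 + 120 = 2025, so I can finish.", "finish", {}),
]


def scripted_policy(question: str, trajectory: dict):
    """Pretend to be the LLM: return the next scripted step."""
    step = sum(1 for key in trajectory if key.startswith("thought_"))
    return SCRIPT[step]


def run_react(question: str, tools: dict, policy, max_iters: int = 5) -> dict:
    """Alternate between policy decisions and tool calls until 'finish'."""
    trajectory = {}
    for i in range(max_iters):
        thought, tool_name, tool_args = policy(question, trajectory)
        trajectory[f"thought_{i}"] = thought
        trajectory[f"tool_name_{i}"] = tool_name
        trajectory[f"tool_args_{i}"] = json.dumps(tool_args)
        if tool_name == "finish":
            trajectory[f"observation_{i}"] = "Completed."
            break
        try:
            trajectory[f"observation_{i}"] = tools[tool_name](**tool_args)
        except Exception as err:  # tool failures become observations
            trajectory[f"observation_{i}"] = f"Error: {err}"
    return trajectory


trajectory = run_react(
    "What is 120 years from the year the special theory of relativity was published?",
    {"fake_search": fake_search},
    scripted_policy,
)
for key, value in trajectory.items():
    print(f"{key}: {value}")
```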

### 6) Quick manual inference

In [None]:
pred = react(question="What is 120 years from the year the special theory of relativity was published?")
rprint(pred.answer)


In [7]:
dspy.inspect_history()





[2025-09-15T02:28:55.600705]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is 120 years from the year the special theory of relativity was published?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
I need to determine the publication year of the special theory of relativity to calculate the correct date 120 years later.

[[ ## tool_name_0 ## ]]
google_search

[[ ## tool_args_0 ## ]]
{"query": "publication year of special theory of relativity", "num":

### 7) Compile/optimize the program with MIPROv2

- **MIPROv2** (Multiprompt Instruction PRoposal Optimizer, version 2) automatically **tunes** the prompts of your program, specifically each predictor's instructions and few-shot demonstrations, to improve a target **metric**.
- `metric=dspy.evaluate.answer_exact_match` uses **exact match** between predicted and gold answers.
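- `auto="light"` picks a lighter optimization recipe (fewer/cheaper search steps). In the run logged below it selected 3 candidates and a 100-example validation set, with minibatch evaluation enabled.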
- `compile(...)` returns a new, **optimized** version of your ReAct program.
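
Exact match is a strict metric, which is worth keeping in mind when reading the scores. The sketch below shows a common normalized variant (lowercase, strip punctuation and articles, collapse whitespace). It is an assumption about the general technique; DSPy's `answer_exact_match` may normalize differently in its details.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True only when the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


print(exact_match("Yes", "yes"))
print(exact_match("Yes, both Smyrnium and Nymania are plant genera.", "yes"))
```

Under a metric like this, a correct but verbose answer scores zero, so an optimizer has an incentive to favor prompts that produce short answers.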

In [9]:

tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

2025/09/15 02:49:22 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: True
num_candidates: 3
valset size: 100

2025/09/15 02:49:25 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/09/15 02:49:25 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/09/15 02:49:25 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=3 sets of demonstrations...


Bootstrapping set 1/3
Bootstrapping set 2/3
Bootstrapping set 3/3


 30%|███       | 30/100 [05:02<11:46, 10.10s/it]
2025/09/15 02:54:28 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/09/15 02:54:28 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/09/15 02:54:28 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...



Bootstrapped 4 full traces after 30 examples for up to 1 rounds, amounting to 30 attempts.


2025/09/15 02:55:36 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/09/15 02:55:36 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Given the fields `question`, produce the fields `answer`.

You will be given `question` and your goal is to finish with `answer`.

To do this, you will interleave Thought, Tool Name, and Tool Args, and receive a resulting Observation.

Thought can reason about the current situation, and Tool Name can be the following types:

(1) google_search, whose description is <desc>Search Google using Serper API and return the top results.</desc>. It takes arguments {'query': {'type': 'string'}, 'num': {'type': 'integer'}} in JSON format.
(2) finish, whose description is <desc>Signals that the final outputs, i.e. `answer`, are now available and marks the task as complete.</desc>. It takes arguments {'kwargs': 'Any'} in JSON format.

2025/09/15 02:55:36 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are a meticulous researcher tasked w

Average Metric: 20.00 / 100 (20.0%): 100%|██████████| 100/100 [08:43<00:00,  5.24s/it]

2025/09/15 03:04:20 INFO dspy.evaluate.evaluate: Average Metric: 20 / 100 (20.0%)
2025/09/15 03:04:20 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 20.0

2025/09/15 03:04:20 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 2 / 8 - Minibatch ==



Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [07:40<00:00, 18.41s/it] 

2025/09/15 03:12:01 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)
2025/09/15 03:12:01 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 2'].
2025/09/15 03:12:01 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0]
2025/09/15 03:12:01 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:12:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:12:01 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 3 / 8 - Minibatch ==



Average Metric: 3.00 / 25 (12.0%): 100%|██████████| 25/25 [03:08<00:00,  7.55s/it]

2025/09/15 03:15:10 INFO dspy.evaluate.evaluate: Average Metric: 3 / 25 (12.0%)
2025/09/15 03:15:10 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 12.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 1', 'Predictor 1: Few-Shot Set 1'].
2025/09/15 03:15:10 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0, 12.0]
2025/09/15 03:15:10 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:15:10 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:15:10 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 4 / 8 - Minibatch ==



Average Metric: 10.00 / 25 (40.0%): 100%|██████████| 25/25 [07:06<00:00, 17.05s/it]

2025/09/15 03:22:16 INFO dspy.evaluate.evaluate: Average Metric: 10 / 25 (40.0%)
2025/09/15 03:22:16 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 40.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/15 03:22:16 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0, 12.0, 40.0]
2025/09/15 03:22:16 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:22:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:22:16 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 5 / 8 - Minibatch ==



Average Metric: 8.00 / 25 (32.0%): 100%|██████████| 25/25 [03:44<00:00,  9.00s/it]

2025/09/15 03:26:01 INFO dspy.evaluate.evaluate: Average Metric: 8 / 25 (32.0%)
2025/09/15 03:26:01 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 32.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/15 03:26:01 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0, 12.0, 40.0, 32.0]
2025/09/15 03:26:01 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:26:01 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:26:01 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 6 / 8 - Minibatch ==



Average Metric: 6.00 / 25 (24.0%): 100%|██████████| 25/25 [03:16<00:00,  7.85s/it]

2025/09/15 03:29:17 INFO dspy.evaluate.evaluate: Average Metric: 6 / 25 (24.0%)
2025/09/15 03:29:17 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 24.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0', 'Predictor 1: Instruction 2', 'Predictor 1: Few-Shot Set 2'].
2025/09/15 03:29:17 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0, 12.0, 40.0, 32.0, 24.0]
2025/09/15 03:29:17 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:29:17 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:29:17 INFO dspy.teleprompt.mipro_optimizer_v2: == Trial 7 / 8 - Minibatch ==



Average Metric: 9.00 / 25 (36.0%): 100%|██████████| 25/25 [07:17<00:00, 17.49s/it]

2025/09/15 03:36:35 INFO dspy.evaluate.evaluate: Average Metric: 9 / 25 (36.0%)
2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 36.0 on minibatch of size 25 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1', 'Predictor 1: Instruction 0', 'Predictor 1: Few-Shot Set 1'].
2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Minibatch scores so far: [40.0, 12.0, 40.0, 32.0, 24.0, 36.0]
2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0]
2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 20.0


2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 8 - Full Evaluation =====
2025/09/15 03:36:35 INFO dspy.teleprompt.mipro_optimizer_v2: Doing full eval on next top averaging program (Avg Score: 40.0) from minibatch trials...



Average Metric: 36.00 / 100 (36.0%): 100%|██████████| 100/100 [27:35<00:00, 16.55s/it]

2025/09/15 04:04:10 INFO dspy.evaluate.evaluate: Average Metric: 36 / 100 (36.0%)
2025/09/15 04:04:10 INFO dspy.teleprompt.mipro_optimizer_v2: [92mNew best full eval score![0m Score: 36.0
2025/09/15 04:04:10 INFO dspy.teleprompt.mipro_optimizer_v2: Full eval scores so far: [20.0, 36.0]
2025/09/15 04:04:10 INFO dspy.teleprompt.mipro_optimizer_v2: Best full score so far: 36.0
2025/09/15 04:04:10 INFO dspy.teleprompt.mipro_optimizer_v2: 

2025/09/15 04:04:10 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 36.0!





### 8) Save, reload, and reuse the optimized program

In [11]:
optimized_react.save("./dspy_local_program/react_prog.pkl", save_program=False)

In [12]:
loaded_dspy_program = dspy.ReAct("question -> answer", tools=[google_search])
loaded_dspy_program.load("./dspy_local_program/react_prog.pkl")

### 9) Evaluate programs

- Creates an **evaluator** over `trainset`, reporting exact match. These are the same 500 examples passed to the optimizer, so the scores below are not held-out estimates.
- Runs evaluation for:
    - `loaded_dspy_program` (the optimized, reloaded program),
    - `react` (the original, unoptimized baseline).
- `display_table=2` shows the first two rows of the results table; `display_progress=True` shows a progress bar.
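
As a rough picture of what an evaluator computes, the sketch below averages a metric over a list of examples and tracks items whose prediction raised an error separately. The function name, the dict-based examples, and the choice to exclude failed items from the denominator are all assumptions made for illustration; `dspy.Evaluate` may handle failures differently.

```python
def average_metric(examples, predict, metric):
    """Score predict() over examples; items that raise are counted, not scored."""
    total, scored, failed = 0.0, 0, 0
    for ex in examples:
        try:
            pred = predict(ex["question"])
        except Exception:
            failed += 1
            continue
        total += float(metric(ex["answer"], pred))
        scored += 1
    pct = 100.0 * total / scored if scored else 0.0
    return total, scored, failed, pct


# Toy data and a toy predictor that fails on one item.
examples = [
    {"question": "q1", "answer": "yes"},
    {"question": "q2", "answer": "no"},
    {"question": "q3", "answer": "yes"},
]


def toy_predict(question: str) -> str:
    if question == "q3":
        raise ValueError("missing answer field")
    return "yes"


def toy_metric(gold: str, pred: str) -> bool:
    return gold.strip().lower() == pred.strip().lower()


total, scored, failed, pct = average_metric(examples, toy_predict, toy_metric)
print(f"Average Metric: {total:.2f} / {scored} ({pct:.1f}%), failed: {failed}")
```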

In [13]:
evaluate = dspy.Evaluate(devset=trainset, metric=dspy.evaluate.answer_exact_match, num_threads=24,
                         display_progress=True, display_table=2)

In [14]:
rprint(evaluate(loaded_dspy_program))

Average Metric: 191.00 / 500 (38.2%): 100%|██████████| 500/500 [2:18:19<00:00, 16.60s/it]  

2025/09/15 07:18:39 INFO dspy.evaluate.evaluate: Average Metric: 191 / 500 (38.2%)





| | question | example_answer | trajectory | reasoning | pred_answer | answer_exact_match |
|---|---|---|---|---|---|---|
| 0 | Are Smyrnium and Nymania both types of plant? | yes | {'thought_0': 'To determine if both Smyrnium and Nymania are plant... | The search results confirm that both Smyrnium and Nymania are plan... | Yes | ✔️ [True] |
| 1 | That Darn Cat! and Never a Dull Moment were both produced by what ... | Walt Disney Productions | {'thought_0': 'To determine the shared studio behind both films, I... | Both "That Darn Cat!" and "Never a Dull Moment" were produced by W... | Walt Disney Productions | ✔️ [True] |


In [15]:
rprint(evaluate(react))

Average Metric: 71.00 / 393 (18.1%):  79%|███████▊  | 393/500 [1:33:11<30:30, 17.11s/it]  

2025/09/15 08:52:09 ERROR dspy.utils.parallelizer: Error processing item Example({'question': 'Which artist beat both Bob Dylan and the musician who was part of the Mothers of Invention band to the release of the first studio double album?', 'answer': 'Léo Ferré'}) (input_keys={'question'}): Expected dict_keys(['reasoning', 'answer']) but got dict_keys(['reasoning']). Set `provide_traceback=True` to see the stack trace.


Average Metric: 79.00 / 499 (15.8%): 100%|██████████| 500/500 [1:59:54<00:00, 14.39s/it]

2025/09/15 09:18:34 INFO dspy.evaluate.evaluate: Average Metric: 79.0 / 500 (15.8%)





| | question | example_answer | trajectory | reasoning | pred_answer | answer_exact_match | answer |
|---|---|---|---|---|---|---|---|
| 0 | Are Smyrnium and Nymania both types of plant? | yes | {'thought_0': "I need to verify if both Smyrnium and Nymania are p... | The Google search for Smyrnium confirmed it is a plant genus in th... | Yes, both Smyrnium and Nymania are plant genera. Smyrnium belongs ... | | |
| 1 | That Darn Cat! and Never a Dull Moment were both produced by what ... | Walt Disney Productions | {'thought_0': 'I need to determine the shared studio behind both f... | The search results indicate that "That Darn Cat!" was produced by ... | Richard Williams Studios | | |


In [16]:
pred = react(question="Are Smyrnium and Nymania both types of plant?")
rprint(pred.answer)

In [17]:
dspy.inspect_history(n=1)





[2025-09-15T09:18:39.398579]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
Are Smyrnium and Nymania both types of plant?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
I need to verify if both Smyrnium and Nymania are plant genera. I'll start by checking Smyrnium's classification.

[[ ## tool_name_0 ## ]]
google_search

[[ ## tool_args_0 ## ]]
{"query": "Smyrnium plant genus", "num": 1}

[[ ## observation_0 ## ]]
Error: 400, {"message":"Not enough credit

In [18]:
pred = loaded_dspy_program(question="Are Smyrnium and Nymania both types of plant?")
rprint(pred.answer)

In [19]:
dspy.inspect_history(n=1)





[2025-09-15T09:18:50.288718]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

This is an example of the task, though some input or output fields are not supplied.

[[ ## question ## ]]
Ryan Blair plays at midfield for which Welsh football club?

[[ ## trajectory ## ]]
Not supplied for this particular example.

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## comp