<img src="../../docs/docs/static/img/dspy_logo.png" alt="DSPy Logo" height="120"/>

### Multi-Agent DSPy Programs: Bootstrapping & Aggregating Multiple `ReAct` Agents

This is a quick (somewhat advanced) example of DSPy. Suppose you're given a hard QA task and an agent architecture (`dspy.ReAct`). How do you get high scores without tinkering with prompts?

There are many ways, but this notebook shows one complex strategy that DSPy makes near-trivial to achieve: we'll automatically bootstrap five different, highly effective prompts for ReAct, then optimize an aggregator that combines their outputs.

As is usually the case with DSPy, the code to do this is probably shorter than describing it in English, so let's jump right into that.

### 0) TLDR.

We'll build a ReAct agent in DSPy that scores 30% accuracy on a retrieval-based question answering task.

Then, we'll optimize it with `BootstrapFewShotWithRandomSearch` to get 46% accuracy.

Then, we'll build a multi-agent aggregator over five different optimized versions of the agent.

Our unoptimized aggregator will score 26%. It doesn't understand the task. Hence, we'll optimize the aggregator too.

We'll end up with an optimized multi-agent system that scores a whopping 60% accuracy on the same task.

The core portion of the code to do this can be fit into 10 lines of DSPy, but we'll sprinkle some short explanations below.

### 1) Setting Up.

We'll configure the language model (GPT-3.5) and the retrieval model (ColBERTv2 over Wikipedia).

In [1]:
import dspy
from dspy.evaluate import Evaluate
from dspy.datasets.hotpotqa import HotPotQA
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

gpt3 = dspy.OpenAI('gpt-3.5-turbo-0125', max_tokens=1000)
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.configure(lm=gpt3, rm=colbert)

### 2) Loading some data.

We'll load 150 examples for training (`trainset`), 50 examples for validation & optimization (`valset`), and 300 examples for evaluation (`devset`).

In [2]:
dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=300, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train[0:150]]
valset = [x.with_inputs('question') for x in dataset.train[150:200]]
devset = [x.with_inputs('question') for x in dataset.dev]

# show an example datapoint; it's just a question-answer pair
trainset[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})
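
To make the split and the role of `with_inputs` concrete, here is a minimal pure-Python sketch. `ToyExample` is a hypothetical stand-in written for this illustration, not DSPy's own class; it only assumes that marking `question` as an input means the program sees the question while the answer is held back for the metric.

```python
from dataclasses import dataclass


@dataclass
class ToyExample:
    """Hypothetical stand-in for a dataset example: fields plus the set of input keys."""
    data: dict
    input_keys: frozenset = frozenset()

    def with_inputs(self, *keys):
        # Return a copy that remembers which fields are inputs.
        return ToyExample(dict(self.data), frozenset(keys))

    def inputs(self):
        # Fields the program is allowed to see.
        return {k: v for k, v in self.data.items() if k in self.input_keys}

    def labels(self):
        # Fields held back for scoring.
        return {k: v for k, v in self.data.items() if k not in self.input_keys}


# 200 made-up question-answer pairs, split 150/50 like trainset/valset above.
raw = [ToyExample({"question": f"q{i}", "answer": f"a{i}"}) for i in range(200)]
toy_trainset = [x.with_inputs("question") for x in raw[0:150]]
toy_valset = [x.with_inputs("question") for x in raw[150:200]]

print(len(toy_trainset), len(toy_valset))
print(toy_trainset[0].inputs(), toy_trainset[0].labels())
```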

### 3) ReAct Agent.

Our agent will just be a DSPy ReAct agent that takes a `question` and outputs the `answer` by using a ColBERTv2 retrieval tool.

In [3]:
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

Let's evaluate this **unoptimized** ReAct agent on the `devset`.

In [4]:
# Set up an evaluator over all 300 examples of the devset.
config = dict(num_threads=8, display_progress=True, display_table=5)
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match, **config)

evaluate(agent)

Average Metric: 91 / 300  (30.3): 100%|██████████| 300/300 [00:01<00:00, 161.84it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","[['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...","No, Cangzhou is in the Hebei province, while Qionghai is in the Hainan province of China.",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","[[""2017 NHL Expansion Draft | The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to...",National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...",Tweed River,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}",[['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.']],King Alfred the Great,✔️ [True]


30.33
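
The table shows why exact match is a strict metric: a correct but verbose answer ("No, Cangzhou is in the Hebei province, while...") still counts as a miss against the gold answer "no". The sketch below is a simplified illustration that assumes SQuAD-style normalization (lowercasing, stripping punctuation and articles). The function names are ours, and the actual `dspy.evaluate.answer_exact_match` may differ in its details.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True only if the two strings are identical after normalization."""
    return normalize(prediction) == normalize(gold)


# Pairs of (prediction, gold answer) taken from the table above.
pairs = [
    ("National Hockey League", "National Hockey League"),
    ("Tweed River", "the River Tyne"),
    ("No, Cangzhou is in the Hebei province, while Qionghai is in the Hainan province of China.", "no"),
]
for prediction, gold in pairs:
    print(exact_match(prediction, gold), "|", gold)
```

Under this sketch, the first pair matches and the other two do not, which agrees with the `answer_exact_match` column above.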

### 4) Optimized ReAct.

Let's use DSPy's simple `BootstrapFewShotWithRandomSearch` optimizer to create successful examples of the ReAct program and attempt to optimize the prompts using those constructed examples. In the future, we could try more sophisticated DSPy optimizers too, like `MIPRO`.

We'll bootstrap 20 programs that way. Examples will be bootstrapped starting from the `trainset` and optimized over our tiny `valset`. We'll evaluate later on the `devset`.

In [5]:
config = dict(max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=20, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **config)
optimized_react = tp.compile(agent, trainset=trainset, valset=valset)

Average Metric: 14 / 50  (28.0): 100%|██████████| 50/50 [00:00<00:00, 151.32it/s]
Average Metric: 14 / 50  (28.0): 100%|██████████| 50/50 [00:00<00:00, 1191.35it/s]
  4%|▍         | 6/150 [00:00<00:00, 216.36it/s]
Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [00:00<00:00, 158.43it/s]
  3%|▎         | 4/150 [00:00<00:00, 258.73it/s]
Average Metric: 21 / 50  (42.0): 100%|██████████| 50/50 [00:00<00:00, 184.63it/s]
  3%|▎         | 4/150 [00:00<00:01, 125.61it/s]
Average Metric: 24 / 50  (48.0): 100%|██████████| 50/50 [00:00<00:00, 130.39it/s]
  1%|▏         | 2/150 [00:00<00:00, 213.13it/s]
Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [00:00<00:00, 158.40it/s]
  3%|▎         | 4/150 [00:00<00:00, 387.38it/s]
Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [00:00<00:00, 168.50it/s]
  4%|▍         | 6/150 [00:00<00:00, 201.99it/s]
Average Metric: 12 / 50  (24.0): 100%|██████████| 50/50 [00:00<00:00, 152.09it/s]
  6%|▌         | 9/150 [00:00<00:00, 203.21it/s]
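
Each "Average Metric" line in the log is one candidate program being scored on the `valset`. As a rough mental model, and not the optimizer's actual implementation, random search can be sketched as follows: sample a small set of bootstrapped demos, score the resulting program on validation data, repeat, and keep the candidates sorted by score. Everything here (`random_search`, `toy_val_score`, the made-up usefulness values) is hypothetical.

```python
import random


def random_search(demo_pool, score_fn, num_candidates=20, max_demos=2, seed=0):
    """Toy random search: sample small demo sets, score each, return them best-first."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_candidates):
        k = rng.randint(1, max_demos)            # how many demos this candidate gets
        demos = tuple(rng.sample(demo_pool, k))  # which demos it gets
        candidates.append((score_fn(demos), demos))
    # Highest validation score first, so the top-k candidates are a simple slice.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return candidates


# Hypothetical stand-ins: ten bootstrapped demos, each with a made-up usefulness value.
demo_pool = list(range(10))
usefulness = {d: (d * 7) % 10 for d in demo_pool}


def toy_val_score(demos):
    # Stand-in for "run the program with these demos on valset and average the metric".
    return sum(usefulness[d] for d in demos)


candidates = random_search(demo_pool, toy_val_score)
top_five = candidates[:5]
print(top_five)
```

Keeping the full sorted list, rather than only the winner, is what makes it cheap to reuse the runners-up later.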


In [13]:
evaluate(optimized_react)

Average Metric: 138 / 300  (46.0): 100%|██████████| 300/300 [00:00<00:00, 512.74it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","[['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...",no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}",[['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was...,National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...","Crichton Collegiate Church is located in Midlothian, Scotland, near the hamlet of Crichton, about 7.5 miles south of Edinburgh.",False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","[['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.'], ['Æthelstan of Kent |...",Alfred the Great,False


46.0

### 5) Zero-Shot Aggregator.

Let's now extract the best five bootstrapped ReAct programs. We'll build a simple DSPy aggregator that runs all of them, then produces a final answer.

In [7]:
from dsp.utils import flatten, deduplicate

# the best-performing five ReAct programs from the optimization process
AGENTS = [x[-1] for x in optimized_react.candidate_programs[:5]]

class Aggregator(dspy.Module):
	def __init__(self, temperature=0.0):
		self.aggregate = dspy.ChainOfThought('context, question -> answer')
		self.temperature = temperature

	def forward(self, question):
		# Run all five agents at the configured temperature (0.0 by default), then extract and deduplicate their observed contexts
		with dspy.context(lm=gpt3.copy(temperature=self.temperature)):
			preds = [agent(question=question) for agent in AGENTS]
			context = deduplicate(flatten([flatten(p.observations) for p in preds]))

		# Run the aggregation step to produce a final answer
		return self.aggregate(context=context, question=question)
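
The context-merging step in `forward` is easy to misread, so here is a standalone sketch of it. It assumes each agent's `observations` is a list with one list of passages per retrieval step. The two helpers are local re-implementations written for illustration (we assume the imported `flatten` and `deduplicate` behave similarly), and the passages are made up.

```python
def flatten(list_of_lists):
    """Concatenate a list of lists into one flat list."""
    return [item for sub in list_of_lists for item in sub]


def deduplicate(items):
    """Drop repeated items while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


# Hypothetical observations from three agents: one list of passages per retrieval step.
agent_observations = [
    [["Cangzhou | passage A"], ["Qionghai | passage B"]],
    [["Cangzhou | passage A"]],
    [["Hebei | passage C"], ["Qionghai | passage B"]],
]

# Inner flatten: steps -> passages for one agent. Outer flatten: all agents -> one list.
context = deduplicate(flatten([flatten(obs) for obs in agent_observations]))
print(context)
```

The inner `flatten` matters: without it, the merged list would contain lists rather than strings, and a set-based deduplication like the one above would fail on unhashable items.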

Let's quickly evaluate the aggregator prior to optimization.

In [8]:
aggregator = Aggregator()
evaluate(aggregator)

Average Metric: 78 / 300  (26.0): 100%|██████████| 300/300 [00:06<00:00, 45.38it/s]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}",determine if both Cangzhou and Qionghai are in the Hebei province of China. We need to carefully analyze the information provided in the context to...,"No, only Cangzhou is in the Hebei province of China. Qionghai is located in Hainan province.",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","produce the answer. We know that Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season. Looking at the context provided, we...","The 2017 NHL Expansion Draft conducted by the National Hockey League filled the roster of the Vegas Golden Knights, including selecting Marc-Andre Fleury for the...",False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}",identify the retired Canadian professional ice hockey player and current general manager of the Tampa Bay Lightning of the National Hockey League (NHL) whose retirement...,Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","identify the river near the Crichton Collegiate Church. We know that the church is situated in Midlothian, Scotland, and the River Esk flows through Midlothian...",The River Esk,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","produce the answer. We know from the context that Ealhswith had a son named Æthelweard in the 10th century A.D. Now, looking at the information...",King Alfred the Great,✔️ [True]


26.0

### 6) Optimized Aggregator.

In [9]:
kwargs = dict(max_bootstrapped_demos=2, max_labeled_demos=6, num_candidate_programs=10, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **kwargs)
optimized_aggregator = tp.compile(aggregator, trainset=trainset, valset=valset)

Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:00<00:00, 153.98it/s]
Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [00:00<00:00, 82.75it/s]
  3%|▎         | 4/150 [00:00<00:03, 45.32it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 156.28it/s]
  1%|▏         | 2/150 [00:00<00:03, 39.99it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 162.26it/s]
  1%|          | 1/150 [00:00<00:02, 51.23it/s]
Average Metric: 26 / 50  (52.0): 100%|██████████| 50/50 [00:00<00:00, 158.64it/s]
  1%|          | 1/150 [00:00<00:00, 155.47it/s]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:00<00:00, 159.96it/s]
  1%|          | 1/150 [00:00<00:04, 31.56it/s]
Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [00:00<00:00, 143.11it/s]
  1%|          | 1/150 [00:00<00:03, 43.19it/s]
Average Metric: 29 / 50  (58.0): 100%|██████████| 50/50 [00:00<00:00, 163.95it/s]
  1%|▏         | 2/150 [00:00<00:04, 31.94it/s]
Average 

In [10]:
optimized_aggregator2 = optimized_aggregator.deepcopy()
optimized_aggregator2.temperature = 0.7

evaluate(optimized_aggregator2)

Average Metric: 180 / 300  (60.0): 100%|██████████| 300/300 [00:07<00:00, 42.10it/s]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","produce the answer. From the context, we know that Cangzhou is a prefecture-level city in eastern Hebei province, while Qionghai is one of the seven...",no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017 NHL Expansion Draft', '2017–18 Pittsburgh Penguins season'}","produce the answer. From the context, we know that Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season. The draft that...",National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}",produce the answer. We know from the context that Steve Yzerman is a Canadian retired professional ice hockey player and the current general manager of...,Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","produce the answer. We know that Crichton Collegiate Church is located in Midlothian, Scotland, near the hamlet of Crichton. Since it is close to Edinburgh,...",River Esk,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Ealhswith', 'Æthelweard (son of Alfred)'}","produce the answer. From the context, we know that Ealhswith was the wife of King Alfred the Great. Therefore, in the 10th Century A.D., Ealhswith...",King Alfred the Great,✔️ [True]


60.0
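
To put the four `devset` runs side by side, the snippet below recomputes each accuracy from the "Average Metric" counts in the logs above and reports the change relative to the unoptimized agent. The dictionary labels are ours; the counts are copied from the outputs.

```python
# (correct, total) pairs copied from the "Average Metric" lines in this notebook.
results = {
    "unoptimized ReAct": (91, 300),
    "optimized ReAct": (138, 300),
    "zero-shot aggregator": (78, 300),
    "optimized aggregator (temperature 0.7)": (180, 300),
}

accuracy = {name: round(100 * correct / total, 1) for name, (correct, total) in results.items()}
baseline = accuracy["unoptimized ReAct"]
gain = {name: round(acc - baseline, 1) for name, acc in accuracy.items()}

for name in results:
    print(f"{name}: {accuracy[name]}% ({gain[name]:+} points vs. baseline)")
```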

### 7) Conclusion.

Normally, we like to release notebooks with pre-computed caches and to inspect the prompts with `gpt3.inspect_history` to explore the behavior of optimization. See the intro notebook (or any of the Colab notebooks on the README) for such annotated examples!

To keep the current release super quick, Omar will extend this notebook into an annotated version if there's significant interest.

### 8) Post-Conclusion Note.

With a little bit of syntactic sugar, the main code in this notebook could be as short as 10 lines excluding whitespace:

```python
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])

optimizer = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match)
optimized_react = optimizer.compile(agent, trainset=trainset, valset=valset)

class Aggregator(dspy.Module):
	def __init__(self):
		self.aggregate = dspy.ChainOfThought('context, question -> answer')

	def forward(self, question):
		# take the top five candidates and merge their observations, as in the full version above
		preds = [x[-1](question=question) for x in optimized_react.candidate_programs[:5]]
		return self.aggregate(context=deduplicate(flatten([flatten(p.observations) for p in preds])), question=question)

optimized_aggregator = optimizer.compile(Aggregator(), trainset=trainset, valset=valset)

# Use it!
optimized_aggregator(question="How many storeys are in the castle that David Gregory inherited?")
```