<img src="../../docs/images/DSPy8.png" alt="DSPy8 Image" height="120"/>

### Multi-Agent DSPy Programs: Bootstrapping & Aggregating Multiple `ReAct` Agents

This is a quick (somewhat advanced) example of DSPy. Suppose you're given a hard QA task and an agent architecture (`dspy.ReAct`). How do you get high scores without tinkering with prompts?

There are many ways, but this notebook shows one complex strategy that DSPy makes near-trivial to achieve: we'll automatically bootstrap five different highly-effective prompts for ReAct, then optimize an aggregator that combines their powers.

As is usually the case with DSPy, the code to do this is probably shorter than describing it in English, so let's jump right into that.

### 0) TLDR.

This is a copy of the `multi_agent.ipynb` example in the same directory, but now with **Llama3-8b** instead of **GPT-3.5**.

We'll build a ReAct agent in DSPy that scores 24% accuracy on a retrieval-based question answering task.

Then, we'll optimize it with `BootstrapFewShotWithRandomSearch` to get 35% accuracy.

Then, we'll build a multi-agent aggregator over five different optimized versions of the agent.

Our unoptimized aggregator will score 21%. It doesn't understand the task. Hence, we'll optimize the aggregator too.

We'll end up with an optimized multi-agent system that scores a whopping 59% accuracy on the same task.

The core of the code fits in 15 lines of DSPy, but we'll sprinkle some short explanations below.

### 1) Setting Up.

We'll configure the language model (Llama3-8b-Instruct, served with vLLM) and the retrieval model (ColBERTv2 over Wikipedia).

In [9]:
import dspy
from dspy.evaluate import Evaluate
from dspy.datasets.hotpotqa import HotPotQA
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Set up llama3 with a VLLM client, served on four GPUs. Please note that these URLs will not work for you; you'd need to refer to the documentation to set up your own VLLM/SGLANG server(s).
llama3 = dspy.HFClientVLLM(model="meta-llama/Meta-Llama-3-8B-Instruct", port=None, url=["http://future-hgx-3:7411", "http://future-hgx-3:7412", "http://future-hgx-3:7413", "http://future-hgx-1:7414"], max_tokens=500, stop=('\n',))
colbert = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.configure(lm=llama3, rm=colbert)

### 2) Loading some data.

We'll load 150 examples for training (`trainset`), 50 examples for validation & optimization (`valset`), and 300 examples for evaluation (`devset`).

In [10]:
dataset = HotPotQA(train_seed=1, train_size=200, eval_seed=2023, dev_size=300, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train[0:150]]
valset = [x.with_inputs('question') for x in dataset.train[150:200]]
devset = [x.with_inputs('question') for x in dataset.dev]

# show an example datapoint; it's just a question-answer pair
trainset[0]

Example({'question': 'At My Window was released by which American singer-songwriter?', 'answer': 'John Townes Van Zandt'}) (input_keys={'question'})

### 3) ReAct Agent.

Our agent will just be a DSPy ReAct agent that takes a `question` and outputs the `answer` by using a ColBERTv2 retrieval tool.

In [11]:
agent = dspy.ReAct("question -> answer", tools=[dspy.Retrieve(k=1)])
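
To make the agent's behavior concrete, here is a toy sketch of a ReAct-style loop in plain Python. It is not DSPy's implementation: `toy_search`, `scripted_policy`, and the one-passage `CORPUS` are hypothetical stand-ins for the retrieval tool, the language model, and the Wikipedia index. The point is only to show how a list of per-step observations builds up next to the final answer.

```python
# Toy illustration of a ReAct-style loop; NOT DSPy's implementation.
# `toy_search`, `scripted_policy`, and CORPUS are hypothetical stand-ins.

CORPUS = {
    "At My Window": "At My Window | At My Window is an album by Townes Van Zandt.",
}

def toy_search(query):
    """Return at most one passage whose title appears in the query (like k=1)."""
    hits = [text for title, text in CORPUS.items() if title.lower() in query.lower()]
    return hits[:1]

def scripted_policy(question, observations):
    """Stand-in for the LM: search once, then finish with an answer."""
    if not observations:
        return ("Search", question)
    return ("Finish", "Townes Van Zandt")

def react(question, policy, max_iters=5):
    """Alternate between choosing an action and recording what it returned."""
    observations = []
    for _ in range(max_iters):
        action, argument = policy(question, observations)
        if action == "Finish":
            return {"answer": argument, "observations": observations}
        # Each step contributes one list of passages, so observations is a list of lists.
        observations.append(toy_search(argument))
    return {"answer": None, "observations": observations}

result = react("At My Window was released by which American singer-songwriter?", scripted_policy)
print(result["answer"])
print(result["observations"])
```

In the real agent, the language model decides at each step whether to search again or to finish, so the number of observations varies per question.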

Let's evaluate this **unoptimized** ReAct agent on the `devset`.

In [19]:
# Set up an evaluator on the 300-example devset.
config = dict(num_threads=8, display_progress=True, display_table=5)
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match, **config)

evaluate(agent)

  0%|          | 0/300 [00:00<?, ?it/s]

Average Metric: 72 / 300  (24.0): 100%|██████████| 300/300 [01:37<00:00,  3.07it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Qionghai', 'Cangzhou'}","[['Hebei | Hebei (; postal: Hopeh) is a province of China in the North China region. Its one-character abbreviation is ""冀 "" (Jì), named after...","No, Qionghai is not in the Hebei province of China",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}","[['2017 NHL Entry Draft | The 2017 NHL Entry Draft was the 55th NHL Entry Draft. The draft was held from June 23–24, 2017, at...","answer=""Vegas Golden Knights""",False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Jay Feaster | Jay Harry Feaster (born July 30, 1962 in Harrisburg, Pennsylvania) is a National Hockey League (NHL) executive currently serving as the Executive...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...",River Tyne,✔️ [True]
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","[['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.'], ['Ealhswith | Ealhswith or...",King Alfred the Great,✔️ [True]


24.0
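
In the table above, `River Tyne` counts as a match for `the River Tyne`, while a full-sentence answer does not match `no`. That is consistent with SQuAD-style normalization before comparison. The sketch below shows that idea in plain Python. It assumes lowercasing, punctuation stripping, and article removal; the function names are illustrative, and DSPy's own `answer_exact_match` may differ in details.

```python
import re
import string
import unicodedata

def normalize_text(s):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = unicodedata.normalize("NFD", s).lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return normalize_text(prediction) in {normalize_text(g) for g in gold_answers}

print(exact_match("River Tyne", ["the River Tyne"]))
print(exact_match("No, Qionghai is not in the Hebei province of China", ["no"]))
```

Under this kind of metric, a correct but verbose answer scores zero, which is one reason the unoptimized agent's accuracy is low.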

### 4) Optimized ReAct.

Let's use DSPy's simple `BootstrapFewShotWithRandomSearch` optimizer to create successful examples of the ReAct program and attempt to optimize the prompts using those constructed examples. In the future, we could try more sophisticated DSPy optimizers too, like `MIPRO`.

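We'll run the search in two rounds: first over 5 candidate programs, then over 20 candidates with the first round's best program as the teacher. Examples will be bootstrapped starting from the `trainset` and optimized over our tiny `valset`. We'll evaluate later on the `devset`.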

In [22]:
config = dict(max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=5, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **config)
optimized_react = tp.compile(agent, trainset=trainset, valset=valset)

Average Metric: 2 / 3  (66.7):   4%|▍         | 2/50 [00:00<00:00, 77.81it/s]

Average Metric: 15 / 50  (30.0): 100%|██████████| 50/50 [00:09<00:00,  5.29it/s]
Average Metric: 15 / 50  (30.0): 100%|██████████| 50/50 [00:06<00:00,  7.63it/s]
  5%|▌         | 8/150 [00:26<07:55,  3.35s/it]
Average Metric: 19 / 50  (38.0): 100%|██████████| 50/50 [00:16<00:00,  3.10it/s]
  5%|▌         | 8/150 [00:25<07:33,  3.19s/it]
Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:20<00:00,  2.47it/s]
  2%|▏         | 3/150 [00:08<07:20,  2.99s/it]
Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:17<00:00,  2.92it/s]
  1%|▏         | 2/150 [00:02<02:36,  1.06s/it]
Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:09<00:00,  5.36it/s]
  5%|▍         | 7/150 [00:22<07:32,  3.16s/it]
Average Metric: 16 / 50  (32.0): 100%|██████████| 50/50 [00:07<00:00,  6.33it/s]
  8%|▊         | 12/150 [00:37<07:12,  3.13s/it]
Average Metric: 18 / 50  (36.0): 100%|██████████| 50/50 [00:14<00:00,  3.39it/s]
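
The optimizer run above can be pictured with a toy sketch: run the program on shuffled training examples, keep the runs that pass the metric as demonstrations, build several candidates from different random seeds, and rank them by validation score. Everything here (`toy_program`, `GOLD`, `ZERO_SHOT_OK`, the function names) is hypothetical and greatly simplified; the real `BootstrapFewShotWithRandomSearch` works on traces of LM calls and has additional candidate types.

```python
import random

# Hypothetical answer key and the questions the fake "LM" can answer with no demos.
GOLD = {
    "geo: capital of France?": "Paris",
    "geo: capital of Japan?": "Tokyo",
    "math: 2+2?": "4",
    "math: 3*3?": "9",
    "geo: capital of Italy?": "Rome",
    "geo: capital of Spain?": "Madrid",
    "math: 10-7?": "3",
}
ZERO_SHOT_OK = {"geo: capital of France?", "math: 2+2?"}

def toy_program(question, demos):
    """Fake LM: answers correctly zero-shot on a few questions, or when a demo shares the topic."""
    topic = question.split(":")[0]
    has_topic_demo = any(d["question"].startswith(topic + ":") for d in demos)
    return GOLD[question] if (question in ZERO_SHOT_OK or has_topic_demo) else "unknown"

def metric(example, prediction):
    return prediction == example["answer"]

def bootstrap_demos(program, trainset, max_demos, rng):
    """Keep up to max_demos training runs that pass the metric."""
    examples = list(trainset)
    rng.shuffle(examples)
    demos = []
    for example in examples:
        if len(demos) >= max_demos:
            break
        prediction = program(example["question"], demos=[])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos

def random_search(program, trainset, valset, num_candidates, max_demos):
    """Build one candidate per seed and sort candidates by validation score."""
    candidates = []
    for seed in range(num_candidates):
        demos = bootstrap_demos(program, trainset, max_demos, random.Random(seed))
        num_correct = sum(metric(ex, program(ex["question"], demos=demos)) for ex in valset)
        candidates.append((num_correct, seed, demos))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates

questions = list(GOLD)
trainset = [{"question": q, "answer": GOLD[q]} for q in questions[:4]]
valset = [{"question": q, "answer": GOLD[q]} for q in questions[4:]]
candidates = random_search(toy_program, trainset, valset, num_candidates=5, max_demos=1)
print(candidates[0])
```

Each `Average Metric: ... / 50` line in the log corresponds to scoring one candidate on the `valset`, and the short progress bars over 150 items are bootstrapping passes that stop early once enough demos are collected.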


In [35]:
evaluate(optimized_react)

  0%|          | 0/300 [00:00<?, ?it/s]

Average Metric: 93 / 300  (31.0): 100%|██████████| 300/300 [02:08<00:00,  2.33it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Qionghai', 'Cangzhou'}",[],no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}","[[""2017 NHL Expansion Draft | The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to...",George McPhee,False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Gretzky (disambiguation) | Wayne Gretzky is a retired National Hockey League player.'], [""2006–07 Detroit Red Wings season | The 2006–07 Detroit Red Wings season was...","""Steve Yzerman""",✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[['Dumfries | Dumfries ( ; possibly from Scottish Gaelic: ""Dùn Phris"" ) is a market town and former royal burgh within the Dumfries and Galloway...",River Nith,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","[['Ealhswith | Ealhswith or Ealswitha (died 5 December 902) was the wife of King Alfred the Great. Her father was a Mercian nobleman, Æthelred Mucel,...","""Edward the Elder""",False


31.0

In [32]:
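# Second round: reuse the first optimized program as the teacher.
# Copy it and drop its stored candidate programs before passing it in.
optimized_reactX = optimized_react.deepcopy()
del optimized_reactX.candidate_programs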

config = dict(max_bootstrapped_demos=2, max_labeled_demos=0, num_candidate_programs=20, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **config)
optimized_react2 = tp.compile(agent, trainset=trainset, valset=valset, teacher=optimized_reactX)

In [38]:
evaluate(optimized_react2)

Average Metric: 105 / 300  (35.0): 100%|██████████| 300/300 [01:10<00:00,  4.27it/s]


Unnamed: 0,question,example_answer,gold_titles,observations,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Qionghai', 'Cangzhou'}","[['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...",no,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}","[[""2017 NHL Expansion Draft | The 2017 NHL Expansion Draft was an expansion draft conducted by the National Hockey League on June 18–20, 2017 to...",George McPhee,False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","[['Tony Resch | Tony Resch is a retired lacrosse player, and current field and box lacrosse head coach. He is the former head coach of...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","[[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...",River Tyne,✔️ [True]
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","[['Steve Ellsworth | Steven Clark Ellsworth (born July 30, 1960 in Chicago) is the son of Dick Ellsworth and is a former Major League Baseball...",answer,False


35.0

### 5) Zero-Shot Aggregator.

Let's now extract the best five bootstrapped ReAct programs. We'll build a simple DSPy aggregator that runs all of them, then produces a final answer from the passages they retrieved.

In [49]:
from dsp.utils import flatten, deduplicate

# the best-performing five ReAct programs from the optimization process
AGENTS = [x[-1] for x in optimized_react2.candidate_programs[:5]]

class Aggregator(dspy.Module):
	def __init__(self, temperature=0.0):
		super().__init__()
		self.aggregate = dspy.ChainOfThought('context, question -> answer')
		self.temperature = temperature

	def forward(self, question):
		# Run all five agents at the configured temperature, then extract and deduplicate their observed contexts
		with dspy.context(lm=llama3.copy(temperature=self.temperature)):
			preds = [agent(question=question) for agent in AGENTS]
			context = deduplicate(flatten([flatten(p.observations) for p in preds]))

		# Run the aggregation step to produce a final answer
		return self.aggregate(context=context, question=question)
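
The context-building line in `forward` is easier to follow with plain lists. The sketch below uses its own minimal `flatten` and `deduplicate` helpers, written here as assumptions about what the imported utilities do (one level of flattening, order-preserving deduplication). The passages are placeholders.

```python
def flatten(list_of_lists):
    """Flatten one level of nesting."""
    return [item for sublist in list_of_lists for item in sublist]

def deduplicate(items):
    """Drop repeated items while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique

# One entry per agent; each agent has a list of steps; each step is a list of passages.
agent_observations = [
    [["Passage A"], ["Passage B"]],
    [["Passage B"], ["Passage C"]],
    [["Passage A"]],
]

context = deduplicate(flatten([flatten(obs) for obs in agent_observations]))
print(context)  # ['Passage A', 'Passage B', 'Passage C']
```

The inner `flatten` merges one agent's steps into a single passage list, the outer one merges across agents, and deduplication keeps the aggregator's prompt from repeating passages that several agents retrieved.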

Let's quickly evaluate the aggregator prior to optimization.

In [53]:
aggregator = Aggregator()
evaluate(aggregator)

Average Metric: 64 / 300  (21.3): 100%|██████████| 300/300 [12:53<00:00,  2.58s/it]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Qionghai', 'Cangzhou'}","determine whether both Cangzhou and Qionghai are in the Hebei province of China. We know that Cangzhou is a prefecture-level city in eastern Hebei province,...","No, only Cangzhou is in Hebei province, while Qionghai is not.assistant",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}",answer this question. We know that Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season. We also know that the Vegas...,The National Hockey League (NHL).assistant`,False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}",answer this question. We know that the Wings entered a new era following the retirement of a Canadian retired professional ice hockey player and current...,Steve Yzerman.,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","find the answer. We know that the Crichton Collegiate Church is situated near the hamlet of Crichton in Midlothian, Scotland. We also know that the...",The River Esk.,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}",find the answer. We know that Ealhswith was the wife of King Alfred the Great. We also know that Æthelweard was the younger son of...,King Alfred the Great.,✔️ [True]


21.33

### 6) Optimized Aggregator.

In [51]:
kwargs = dict(max_bootstrapped_demos=2, max_labeled_demos=6, num_candidate_programs=10, num_threads=8)
tp = BootstrapFewShotWithRandomSearch(metric=dspy.evaluate.answer_exact_match, **kwargs)
optimized_aggregator = tp.compile(aggregator, trainset=trainset, valset=valset)

Average Metric: 0 / 1  (0.0):   0%|          | 0/50 [00:00<?, ?it/s]

Average Metric: 9 / 50  (18.0): 100%|██████████| 50/50 [01:24<00:00,  1.69s/it]
Average Metric: 26 / 50  (52.0): 100%|██████████| 50/50 [00:15<00:00,  3.17it/s]
  3%|▎         | 4/150 [01:18<48:03, 19.75s/it]
Average Metric: 25 / 50  (50.0): 100%|██████████| 50/50 [00:09<00:00,  5.27it/s]
  5%|▌         | 8/150 [02:15<40:06, 16.95s/it]
Average Metric: 25 / 50  (50.0): 100%|██████████| 50/50 [00:16<00:00,  3.09it/s]
  1%|          | 1/150 [00:18<47:10, 19.00s/it]
Average Metric: 28 / 50  (56.0): 100%|██████████| 50/50 [00:12<00:00,  3.88it/s]
  1%|          | 1/150 [00:01<03:32,  1.42s/it]
Average Metric: 26 / 50  (52.0): 100%|██████████| 50/50 [00:12<00:00,  4.14it/s]
  1%|          | 1/150 [00:16<40:39, 16.37s/it]
Average Metric: 26 / 50  (52.0): 100%|██████████| 50/50 [00:15<00:00,  3.24it/s]
  3%|▎         | 4/150 [01:52<1:08:19, 28.08s/it]
Average Metric: 27 / 50  (54.0): 100%|██████████| 50/50 [00:17<00:00,  2.87it/s]
  1%|▏         | 2/150 [00:29<35:48, 14.52s/it]
Average Metric:

In [52]:
optimized_aggregator2 = optimized_aggregator.deepcopy()
optimized_aggregator2.temperature = 0.7

evaluate(optimized_aggregator2)

  0%|          | 0/300 [00:00<?, ?it/s]

Average Metric: 176 / 300  (58.7): 100%|██████████| 300/300 [11:59<00:00,  2.40s/it]


Unnamed: 0,question,example_answer,gold_titles,rationale,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Qionghai', 'Cangzhou'}","find the answer. We know that Cangzhou is a prefecture-level city in eastern Hebei province, while Qionghai is one of the seven county-level cities of...",No,✔️ [True]
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}","find the answer. We know that Marc-André Fleury was drafted to the Vegas Golden Knights in the 2017 NHL Expansion Draft, and we also know...",The National Hockey League,✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","find the answer. We know that the Wings entered a new era, following the retirement of Steve Yzerman, and we also know that Yzerman is...",Steve Yzerman,✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Collegiate Church', 'Crichton Castle'}","find the answer. We know that the Crichton Collegiate Church is situated in Midlothian, Scotland, and we also know that the River Esk flows through...",the River Esk,False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","find the answer. We know that Ealhswith was the wife of King Alfred the Great, and we also know that Æthelweard was the younger son...",King Alfred the Great,✔️ [True]


58.67

### 7) Conclusion.

Normally, we like to release notebooks with pre-computed caches and to inspect the prompts with `llama3.inspect_history` to explore the behavior of optimization. See the intro notebook (or any of the Colab notebooks on the README) for such annotated examples!

To keep the current release super quick, Omar will extend this notebook into an annotated version if there's significant interest.