# Multi-Agent Question-Answer

In this example, we build a workflow for [HotpotQA](https://arxiv.org/abs/1809.09600), which requires the agent to retrieve from wiki-2017 documents twice to answer a factoid question.

The implementation is adapted from [dspy](https://github.com/stanfordnlp/dspy?tab=readme-ov-file#5a-dspy-vs-thin-wrappers-for-prompts-openai-api-minichain-basic-templating) and includes three agents in total:
- **Query agent 0**: generates a search query from the user question.
- **Query agent 1**: refines the search by retrieving additional information based on initial results.
- **Answer agent**: synthesizes the retrieved documents to provide a final answer.

![hotpotqa](../imgs/hotpotqa.png)
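
The control flow among the three agents can be sketched roughly as follows. This is a minimal illustration, not the code in `workflow.py`: the function `two_hop_qa`, its parameters, and the toy stand-ins for the agents and the retriever are all hypothetical names introduced here.

```python
from typing import Callable, List


def two_hop_qa(
    question: str,
    gen_query_0: Callable[[str], str],
    gen_query_1: Callable[[str, List[str]], str],
    retrieve: Callable[[str], List[str]],
    gen_answer: Callable[[str, List[str]], str],
) -> str:
    """Hypothetical two-hop retrieve-then-answer loop."""
    # Hop 1: query agent 0 turns the question into a search query.
    context = retrieve(gen_query_0(question))
    # Hop 2: query agent 1 refines the search using the first results.
    second_hop = retrieve(gen_query_1(question, context))
    context = context + [doc for doc in second_hop if doc not in context]
    # The answer agent synthesizes all retrieved passages.
    return gen_answer(question, context)


# Toy stand-ins (assumptions) so the sketch runs without an LLM or ColBERT.
docs = [
    "Gerard Piel | Gerard Piel was born in Woodmere, New York.",
    "Woodmere, New York | The population was 17,121 at the 2010 census.",
]


def toy_retrieve(query: str) -> List[str]:
    # Return passages whose title appears in the query.
    return [d for d in docs if d.split(" | ")[0].lower() in query.lower()]


def toy_answer(question: str, context: List[str]) -> str:
    return "17,121" if any("17,121" in d for d in context) else "unknown"


result = two_hop_qa(
    "What was the 2010 population of the birthplace of Gerard Piel?",
    gen_query_0=lambda q: "Gerard Piel birthplace",
    gen_query_1=lambda q, ctx: "Woodmere, New York population 2010",
    retrieve=toy_retrieve,
    gen_answer=toy_answer,
)
print(result)
```

In the real workflow, each callable would be backed by an LLM call or the ColBERT service; the second hop matters because the first query alone cannot reach the passage that contains the answer.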

## 1) Setup

First, let's set up the environment for workflow execution. The following keys are required:

```
OPENAI_API_KEY="your-openai-key"
COLBERT_URL="colbert-serving-url"
```

> **Note:** 
>
> If you are using DSPy's ColBERT service, try the URL `http://20.102.90.50:2017/wiki17_abstracts`.
>
> For hosting on your local machine, check [ColBERT official repo](https://github.com/stanford-futuredata/ColBERT) for installation and setup.
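
Before running the workflow, it can help to confirm that both keys are visible to the Python process. The helper below is a small sketch; `missing_env_keys` and `REQUIRED_KEYS` are names introduced here for illustration and are not part of Cognify or DSPy.

```python
import os
from typing import List, Mapping, Optional

# The two keys listed above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "COLBERT_URL")


def missing_env_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_env_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Environment looks complete.")
```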

## 2) Check the HotpotQA Workflow

The complete code for this workflow is based on `dspy` and is available in `workflow.py`. Try it out with:

```python
%run workflow.py
```

```
{'answer': 'The 2010 population of Woodmere, New York, the birthplace of Gerard Piel, was 17,121.'}
```


## 3) Optimize The Workflow

The workflow entry point is already registered with the `cognify.register_workflow` decorator.

Here we configure the optimization pipeline:
1. Define the evaluation method
2. Define the data loader
3. Configure the optimizer

### 3.1 Tell Cognify how good the answer is

We use the built-in F1 score to evaluate the similarity between the predicted answer and the given ground truth.

```python
import cognify
from cognify.hub.evaluators import f1_score_str

@cognify.register_evaluator
def answer_f1(answer: str, ground_truth: str):
    return f1_score_str(answer, ground_truth)
```
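
For intuition about what a string F1 score measures, here is a sketch of a SQuAD-style token-level F1. It is an assumption that `f1_score_str` behaves similarly; its exact normalization may differ, and `normalize` and `token_f1` are names introduced here for illustration only.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall."""
    pred_tokens, truth_tokens = normalize(prediction), normalize(truth)
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == truth_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("AARP The Magazine", "AARP The Magazine"))
print(token_f1("Symphony No. 2", "Symphony No. 2 (the Resurrection)"))
```

A partial answer such as "Symphony No. 2" has perfect precision but misses one ground-truth token, so it scores below 1 rather than being counted as simply wrong.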

### 3.2 Tell Cognify what data to use

We directly use the HotpotQA dataset from DSPy with some minor formatting changes.

The loaded data should be a sequence of (input, ground_truth) pairs.

Both `input` and `ground_truth` should be dictionaries.

Cognify will dispatch the data by matching the dictionary keys to the parameter names in each function signature. In short:

```python
# register workflow
# register evaluator

data: [(input, ground_truth), ...] = data_loader()
for input, ground_truth in data:
    result = workflow(**input)
    eval_inputs = as_per_func_signature(evaluator, input, result, ground_truth)
    score = evaluator(**eval_inputs)
```
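
The `as_per_func_signature` step in the pseudocode above is not spelled out. One way it could work, using only the standard `inspect` module, is sketched below; this is an illustrative guess at the matching rule, not Cognify's actual implementation, and the toy `evaluator` is hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def as_per_func_signature(func: Callable, *sources: Dict[str, Any]) -> Dict[str, Any]:
    """Pick, from the given dicts, the entries whose keys match func's parameters."""
    merged: Dict[str, Any] = {}
    for source in sources:
        merged.update(source)  # later sources override earlier ones
    params = inspect.signature(func).parameters
    return {name: merged[name] for name in params if name in merged}


# Toy evaluator with the same parameter names as `answer_f1`.
def evaluator(answer: str, ground_truth: str) -> float:
    return float(answer == ground_truth)


workflow_input = {"question": "Which magazine focuses on aging issues?"}
result = {"answer": "AARP The Magazine"}
ground_truth = {"ground_truth": "AARP The Magazine"}

eval_inputs = as_per_func_signature(evaluator, workflow_input, result, ground_truth)
score = evaluator(**eval_inputs)
print(eval_inputs, score)
```

Because `evaluator` does not declare a `question` parameter, that key is dropped. This is why the dictionary keys returned by the data loader need to match the parameter names of the workflow and the evaluator.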

According to the above rule, we register the data loader as follows:

```python
def formatting(item):
    return (
        {'question': item.question},
        {'ground_truth': item.answer}
    )

@cognify.register_data_loader
def load_hotpotqa_data():
    from dspy.datasets.hotpotqa import HotPotQA
    dataset = HotPotQA(train_seed=1, train_size=150, eval_seed=2023, dev_size=200, test_size=0)

    trainset = [formatting(x) for x in dataset.train[0:100]]
    valset = [formatting(x) for x in dataset.train[100:150]]
    devset = [formatting(x) for x in dataset.dev]
    return trainset, valset, devset
```

### 3.3 Configure the optimizer

Let's use the default configuration to optimize this workflow. The search space includes:
- 2 few-shot examples to add to each agent
- whether to apply chain-of-thought reasoning to each agent

```python
from cognify.hub.search import default

search_settings = default.create_search()
```

## 4) Start the Optimization

You can save the above configuration in a `config.py` file and use Cognify's CLI to start the optimization:

```console
$ cognify optimize workflow.py
```

Alternatively, you can run the following:

```python
train, val, dev = load_hotpotqa_data()

opt_cost, pareto_frontier, opt_logs = cognify.optimize(
    script_path="workflow.py",
    control_param=search_settings,
    train_set=train,
    val_set=val,
    eval_fn=answer_f1,
    force=True,  # This will overwrite the existing results
)
```
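
The name `pareto_frontier` suggests that the optimizer keeps the candidates that are not dominated on the trade-off between quality and cost. The structure of the object Cognify actually returns is not shown here, so the sketch below only illustrates the general idea on made-up `(cost, score)` pairs; `pareto_front` is a hypothetical helper, not a Cognify function.

```python
from typing import List, Tuple


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep (cost, score) points that no other point beats on both axes.

    Lower cost is better and higher score is better.
    """
    front: List[Tuple[float, float]] = []
    best_score = float("-inf")
    # Visit candidates from cheapest to most expensive; at equal cost, the
    # higher score comes first.
    for cost, score in sorted(points, key=lambda p: (p[0], -p[1])):
        # A costlier candidate is kept only if it improves the score.
        if score > best_score:
            front.append((cost, score))
            best_score = score
    return front


# Made-up numbers for illustration only.
candidates = [(1.0, 0.5), (2.0, 0.7), (2.5, 0.6), (3.0, 0.7), (0.5, 0.4)]
print(pareto_front(candidates))
```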

## 5) Optimization Results

Cognify writes each optimized workflow to a `.cog` file. For this workflow, the optimizer chooses the following optimizations:
- ensemble the first query generation module
- add few-shot examples to the ensembled query generation modules
- add few-shot examples to the answer generation module

The final optimized workflow is depicted below, with optimization steps highlighted in green.

![hotpotqa-opt](../imgs/hotpotqa_optimized.png)

The few-shot examples inserted into the prompt for the query generation modules were as follows:


> **Demonstration 1**:  
> Input  
> {  
>     "question": "Gustav Mahler composed a beautiful piece performed by the Bach-Elgar Choir. What is the name of that piece??"  
> }  
>    
> Response  
> {"search_query":"Gustav Mahler piece performed by Bach-Elgar Choir"}  
> 
> **Demonstration 2**:  
> Input  
> {  
>     "question": "Merle Reagle did crosswords for what magazine that has a focus on aging issues?"  
> }  
>    
> Response  
> {"search_query":"Merle Reagle crosswords magazine aging issues"}

The few-shot examples inserted into the prompt for the answer generation modules were as follows:

> **Demonstration 1:**  
> 	Input  
> 	{  
> 		"context": [  
> 			"Bach-Elgar Choir | The Bach-Elgar Choir is a community chorus of long standing in Hamilton, Ontario. The Choir is composed of accomplished amateur singers from Hamilton and neighbouring cities of Burlington, Oakville, Mississauga and Simcoe. Notable performances by the ensemble include the North American premi\u00e8re of Verdi's \"Requiem\" and the Canadian premi\u00e8res of G\u00f3recki's \"Miserere\" and Mahler's \"Symphony No. 2\" (the Resurrection). The choir has had several distinguished directors throughout its history and has performed in several notable venues including Roy Thomson Hall in Toronto, the Brantford's Sanderson Centre, with the Buffalo Philharmonic Orchestra, and at the Boris Brott Summer Festival. The choir enjoys frequent guest appearances with the Hamilton Philharmonic Orchestra.",  
> 			"Symphony No. 8 (Mahler) | The Symphony No. 8 in E-flat major by Gustav Mahler is one of the largest-scale choral works in the classical concert repertoire. Because it requires huge instrumental and vocal forces it is frequently called the \"Symphony of a Thousand\", although the work is normally presented with far fewer than a thousand performers and the composer did not sanction that name. The work was composed in a single inspired burst, at Maiernigg in southern Austria in the summer of 1906. The last of Mahler's works that was premiered in his lifetime, the symphony was a critical and popular success when he conducted the Munich Philharmonic in its first performance, in Munich, on 12 September 1910."  
> 		],  
> 		"question": "Gustav Mahler composed a beautiful piece performed by the Bach-Elgar Choir. What is the name of that piece??"  
> 	}  
> 	
> 	Response  
> 	{"answer":"Symphony No. 2 (the Resurrection)"}  
>  
> **Demonstration 2:**  
> 	Input  
> 	{  
> 		"context": [  
> 			"Merl Reagle | Merl Harry Reagle (January 5, 1950 \u2013 August 22, 2015) was an American crossword constructor. For 30 years, he constructed a puzzle every Sunday for the \"San Francisco Chronicle\" (originally the \"San Francisco Examiner\"), which he syndicated to more than 50 Sunday newspapers, including the \"Washington Post\", the \"Los Angeles Times\", the \"Philadelphia Inquirer\", the \"Seattle Times\", \"The Plain Dealer\" (Cleveland, Ohio), the \"Hartford Courant\", the \"New York Observer\", and the \"Arizona Daily Star\". Reagle also produced a bimonthly crossword puzzle for \"AARP The Magazine\" magazine, a monthly crossword puzzle for the Society of Former Special Agents of the FBI, and puzzles for the American Crossword Puzzle Tournament.",  
> 			"Aging and Disease | Aging and Disease is a bimonthly peer-reviewed open access medical journal published by JKL International on behalf of the International Society on Aging and Disease. It covers all issues pertaining to the biology of aging, pathophysiology of age-related diseases, and novel treatments for diseases afflicting the elderly. The journal was established in 2010 and the editors-in-chief are Kunlin Jin (University of North Texas), Ashok Shetty (University of Texas at Austin), and David Greenberg (Buck Institute for Research on Aging).",  
> 			"AARP The Magazine | AARP The Magazine is an American bi-monthly magazine, published by the American Association of Retired People, AARP, which focuses on aging issues."  
> 		],  
> 		"question": "Merle Reagle did crosswords for what magazine that has a focus on aging issues?"  
> 	}  
> 	
> 	Response  
> 	{"answer":"AARP The Magazine"}  


Check out more details on [how to interpret optimization results](https://cognify-ai.readthedocs.io/en/latest/user_guide/tutorials/interpret.html#detailed-transformation-trace).