# Parallel Agent Workflow
Author: [Zain Hasan](https://x.com/ZainHasan6)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/togethercomputer/together-cookbook/blob/main/Agents/Parallel_Agent_Workflow.ipynb)

## Introduction

In this notebook, we'll demonstrate how to build an agent workflow that runs multiple LLMs in parallel on the same task. Each model proposes a solution, and a final LLM aggregates the proposals into a single response for the user.

This strategy, proposed in the [Mixture of Agents paper](https://arxiv.org/abs/2406.04692), is reported there to produce responses that outperform those of the individual contributing models.

This notebook covers:

1. An architecture that enables multiple LLMs to solve a task in parallel and propose solutions
2. An aggregator model that synthesizes individual proposals into one final answer

## Parallel Agent Workflow - Mixture of Agents

<img src="https://github.com/togethercomputer/together-cookbook/blob/main/images/parallel_same.png?raw=1" width="700">

In this **parallel agent workflow**, we demonstrate how you can orchestrate multiple LLMs to work simultaneously on the same task, with each model proposing its own solution. This approach leverages the collective intelligence of different models, with their responses being aggregated by a final LLM that synthesizes a comprehensive solution. Each model in the parallel workflow can have different capabilities or specialized knowledge, contributing unique perspectives to the task at hand.

The aggregator model then analyzes these various proposals to create an enhanced final response that builds upon the strengths of each individual contribution.

For our specific use case, the workflow breaks down as follows:

1. Given a user prompt, multiple LLMs process the task in parallel, each generating its own proposed solution
2. All individual responses are collected and passed to an aggregator LLM
3. The aggregator LLM analyzes the collective proposals and synthesizes them into a final, improved response for the user
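
The three steps above can be sketched without any API calls. The snippet below is only an illustration of the control flow: `fake_proposer`, `fake_aggregator`, and the `model-a`/`model-b`/`model-c` names are hypothetical stand-ins for real LLM calls and are not part of the Together SDK.

```python
import asyncio
from typing import List

async def fake_proposer(name: str, prompt: str) -> str:
    """Hypothetical stand-in for one proposer LLM call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"[{name}] proposal for: {prompt}"

def fake_aggregator(prompt: str, proposals: List[str]) -> str:
    """Hypothetical stand-in for the aggregator LLM call."""
    return f"Synthesized {len(proposals)} proposals for: {prompt}"

async def sketch_workflow(prompt: str, proposers: List[str]) -> str:
    # Step 1: all proposers work on the same prompt concurrently
    proposals = await asyncio.gather(*[fake_proposer(p, prompt) for p in proposers])
    # Steps 2 and 3: collect the proposals and hand them to the aggregator
    return fake_aggregator(prompt, list(proposals))

# In a notebook cell, use `await sketch_workflow(...)` instead of asyncio.run
sketch_result = asyncio.run(sketch_workflow("What is 2 + 2?", ["model-a", "model-b", "model-c"]))
print(sketch_result)
```

The real implementation further down follows the same shape, with the stubs replaced by calls to hosted models.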

Now let's see the coded implementation of this workflow.


## Setup and Utils

In [None]:
# Install libraries
!pip install -qU pydantic together

In [None]:
# Import libraries
import json
import asyncio
import together
from together import AsyncTogether, Together

from typing import Any, Optional, Dict, List, Literal
from pydantic import Field, BaseModel, ValidationError

TOGETHER_API_KEY = "--Your API Key--"

client = Together(api_key= TOGETHER_API_KEY)
async_client = AsyncTogether(api_key= TOGETHER_API_KEY)

In [None]:
# Simple LLM call helper function
def run_llm(user_prompt : str, model : str, system_prompt : Optional[str] = None):
    """ Run the language model with the given user prompt and system prompt. """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": user_prompt})

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0.7,
        max_tokens=4000,
    )

    return response.choices[0].message.content

## Parallel Agent Implementation

In [None]:
# The function below calls one reference LLM asynchronously; we run several of these calls in parallel
async def run_llm_parallel(user_prompt : str, model : str, system_prompt : Optional[str] = None):
    """Run one async LLM call against a reference model, retrying on rate limits."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": user_prompt})

    last_error = None
    # Retry with increasing waits (1s, 2s, 4s) if we hit the rate limit
    for sleep_time in [1, 2, 4]:
        try:
            response = await async_client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0.7,
                max_tokens=2000,
            )
            return response.choices[0].message.content
        except together.error.RateLimitError as e:
            print(e)
            last_error = e
            await asyncio.sleep(sleep_time)

    # Every attempt was rate limited: re-raise instead of reading an unassigned `response`
    raise last_error
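
The retry loop above waits 1, 2, then 4 seconds between attempts. As a rough sketch of that back-off idea in isolation, the helper below retries any coroutine factory and re-raises the last error if every attempt fails. `FakeRateLimitError`, `call_with_backoff`, and `flaky_call` are hypothetical names invented for this illustration, and the delays are shortened so the example runs quickly.

```python
import asyncio

class FakeRateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""

async def call_with_backoff(make_call, delays=(1, 2, 4)):
    """Await make_call(); after a rate-limit error, sleep for the next delay and retry."""
    last_error = None
    for delay in delays:
        try:
            return await make_call()
        except FakeRateLimitError as e:
            last_error = e
            await asyncio.sleep(delay)
    # All attempts failed: surface the last error to the caller
    raise last_error

state = {"attempts": 0}

async def flaky_call():
    # Fails twice with a rate-limit error, then succeeds on the third attempt
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise FakeRateLimitError("rate limited")
    return "ok"

# In a notebook cell, use `await call_with_backoff(...)` instead of asyncio.run
backoff_result = asyncio.run(call_with_backoff(flaky_call, delays=(0.01, 0.02, 0.04)))
print(backoff_result, state["attempts"])
```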

In [None]:
# These will be the intermediate proposer models

reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "google/gemma-2-27b-it",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

# Raw string so the LaTeX backslashes are kept as-is without invalid-escape warnings
user_prompt = r"""Tim wants to invest some money in a bank which compounds quarterly
with an annual interest rate of $7\%$. To the nearest dollar, how much money should he
invest if he wants a total of $\$60,\!000$ at the end of $5$ years?"""

# Generate intermediate reference responses
results = await asyncio.gather(*[run_llm_parallel(user_prompt=user_prompt, model=model) for model in reference_models])

In [None]:
# Check we get all responses
assert len(results) == len(reference_models)
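
`asyncio.gather` returns results in the same order as the awaitables passed in, which is why the length check above is enough to line responses up with `reference_models`. If one proposer can fail without sinking the whole run, one option is `return_exceptions=True`. The sketch below assumes a hypothetical `fake_model_call` in which `model-b` always fails; it is not how the notebook itself handles errors.

```python
import asyncio

async def fake_model_call(name: str) -> str:
    """Hypothetical stand-in for a proposer call; 'model-b' always fails."""
    await asyncio.sleep(0.01)
    if name == "model-b":
        raise RuntimeError(f"{name} is unavailable")
    return f"answer from {name}"

async def gather_tolerant(names):
    # return_exceptions=True puts exceptions in the result list instead of raising
    outcomes = await asyncio.gather(
        *[fake_model_call(n) for n in names], return_exceptions=True
    )
    # Results keep the input order, so zip pairs each name with its outcome
    return {n: o for n, o in zip(names, outcomes) if not isinstance(o, Exception)}

# In a notebook cell, use `await gather_tolerant(...)` instead of asyncio.run
usable = asyncio.run(gather_tolerant(["model-a", "model-b", "model-c"]))
print(usable)
```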

Next we'll aggregate the intermediate responses using another LLM:

In [None]:
# Gather all intermediate responses along with the system prompt
aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:"""

print(aggregator_system_prompt + "\n" + "\n".join(f"{i+1}. {str(element)}" for i, element in enumerate(results)))

You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:
1.  Let's think step by step.To solve this problem, we will use the formula for compound interest, which is given by:

\[ A = P \left(1 + \frac{r}{n}\right)^{nt} \]

where:
- \( A \) is the amount of money accumulated after \( n \) years, including interest.
- \( P \) is the principal amount (the initial amount of money).
- \( r \) is the annual interest rate (in decimal form).
- \( n \) is
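
The prompt printed above is the system prompt followed by a numbered list of proposals. The same string-building step can be factored into a small helper, sketched below. `build_aggregator_prompt` is a hypothetical name introduced here, and the two proposals are placeholder strings rather than real model output.

```python
from typing import List

def build_aggregator_prompt(instructions: str, proposals: List[str]) -> str:
    """Append the proposals as a numbered list under the aggregator instructions."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(proposals, start=1))
    return f"{instructions}\n{numbered}"

demo_prompt = build_aggregator_prompt(
    "Responses from models:",
    ["first proposal", "second proposal"],  # placeholder proposals
)
print(demo_prompt)
```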

In [None]:
# We will use a strong open-source model to aggregate the responses
aggregator_model = "deepseek-ai/DeepSeek-V3"

# Append the numbered proposals to the aggregator system prompt
aggregator_context = aggregator_system_prompt + "\n" + "\n".join(
    f"{i+1}. {str(element)}" for i, element in enumerate(results)
)

final_output = run_llm(user_prompt=user_prompt, # task to be completed
                       model=aggregator_model,
                       system_prompt=aggregator_context)

print(final_output)

To determine how much Tim should invest to have \$60,000 at the end of 5 years with an annual interest rate of 7% compounded quarterly, we can use the **compound interest formula**:

\[
A = P \left(1 + \frac{r}{n}\right)^{nt}
\]

Where:
- \( A \) is the future value (\$60,000),
- \( P \) is the principal amount (the initial investment we need to find),
- \( r \) is the annual interest rate (7% or 0.07),
- \( n \) is the number of times interest is compounded per year (quarterly means \( n = 4 \)),
- \( t \) is the number of years (5).

We rearrange the formula to solve for \( P \):

\[
P = \frac{A}{\left(1 + \frac{r}{n}\right)^{nt}}
\]

Now, substitute the given values:

\[
P = \frac{60,\!000}{\left(1 + \frac{0.07}{4}\right)^{4 \cdot 5}}
\]

Simplify the terms inside the parentheses:

\[
1 + \frac{0.07}{4} = 1.0175
\]

Calculate the exponent:

\[
4 \cdot 5 = 20
\]

Now, compute \( (1.0175)^{20} \):

\[
(1.0175)^{20} \approx 1.414778
\]

Finally, solve for \( P \):

\[
P = \frac{60,\!00
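
The printed response above is cut off before the final value. As a quick sanity check on the arithmetic, the formula it derives, P = A / (1 + r/n)^(nt), can be evaluated directly with the numbers from the prompt. This is a plain calculation, not model output.

```python
# Values from the prompt: $60,000 target, 7% annual rate, quarterly compounding, 5 years
A, r, n, t = 60_000, 0.07, 4, 5

growth = (1 + r / n) ** (n * t)  # (1.0175)^20
P = A / growth                   # principal needed today

print(f"growth factor: {growth:.6f}")
print(f"principal: ${P:,.2f}, or ${round(P):,} to the nearest dollar")
```

The growth factor agrees with the 1.414778 shown in the response, and the principal rounds to $42,409.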

## Generic Implementation

Now we will create a succinct function that you can use to try out different combinations of proposers.

In [None]:
async def parallel_workflow(prompt : str, proposer_models : List[str], aggregator_model : str, aggregator_prompt: str):
    """Run a parallel chain of LLM calls to address `prompt`
    using the list of models specified in `proposer_models`.

    Returns a tuple: (output from the final aggregator model, list of proposer responses).
    """

    # Gather intermediate responses from proposer models
    proposed_responses = await asyncio.gather(*[run_llm_parallel(prompt, model) for model in proposer_models])

    # Aggregate responses using an aggregator model
    final_output = run_llm(user_prompt=prompt,
                           model=aggregator_model,
                           system_prompt=aggregator_prompt + "\n" + "\n".join(
                               f"{i+1}. {str(element)}" for i, element in enumerate(proposed_responses)))

    return final_output, proposed_responses
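
`parallel_workflow` calls the synchronous `run_llm` from inside an `async` function, so the event loop is blocked while the aggregator responds. That is usually fine for a single notebook cell. If it matters in your setting, one option is to push the blocking call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch below uses a hypothetical `blocking_aggregate` in place of a real aggregator call.

```python
import asyncio
import time
from typing import List

def blocking_aggregate(proposals: List[str]) -> str:
    """Hypothetical stand-in for a synchronous aggregator call."""
    time.sleep(0.05)  # simulate a slow blocking request
    return " | ".join(proposals)

async def aggregate_in_thread(proposals: List[str]) -> str:
    # Run the blocking function in a worker thread so the event loop stays free
    return await asyncio.to_thread(blocking_aggregate, proposals)

# In a notebook cell, use `await aggregate_in_thread(...)` instead of asyncio.run
aggregated = asyncio.run(aggregate_in_thread(["proposal a", "proposal b"]))
print(aggregated)
```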


In [None]:
# Example Usage

reference_models = [
    "microsoft/WizardLM-2-8x22B",
    "Qwen/Qwen2.5-72B-Instruct-Turbo",
    "google/gemma-2-27b-it",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
]

user_prompt = """Jenna and her mother picked some apples from their apple farm.
Jenna picked half as many apples as her mom. If her mom got 20 apples, how many apples did they both pick?"""

aggregator_model = "deepseek-ai/DeepSeek-V3"

aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query.
Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information
provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the
given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured,
coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:"""

answer, intermediate_responses = await parallel_workflow(prompt = user_prompt,
                                                         proposer_models = reference_models,
                                                         aggregator_model = aggregator_model,
                                                         aggregator_prompt = aggregator_system_prompt)

In [None]:
print(f"Final Answer: {answer}\n")

Final Answer: To determine the total number of apples Jenna and her mother picked together, follow these steps:

1. **Jenna's Apples:**  
   Jenna picked half as many apples as her mother. Since her mother picked 20 apples, Jenna picked:  
   \[
   \frac{1}{2} \times 20 = 10 \text{ apples}
   \]

2. **Total Apples:**  
   To find the total number of apples they both picked, add the number of apples Jenna's mother picked to the number of apples Jenna picked:  
   \[
   20 \text{ (mother's apples)} + 10 \text{ (Jenna's apples)} = 30 \text{ apples}
   \]

**Final Answer:**  
Jenna and her mother together picked **30 apples**.



---