# Lesson 3: Building an Agent Reasoning Loop

So far, all our queries have been performed in a single *forward* pass. What if the user asks a complex question consisting of multiple steps, or a vague question that needs clarification? In this lesson we will develop an Agent Reasoning Loop in which an agent is able to reason over tools in multiple steps.

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## Load the data

The paper can be downloaded with the command below:

#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

## Set Up the Query Tools

`vector_tool` and `summary_tool` are the tools we developed in the previous lessons. We have packaged their construction into a single function call.

In [3]:
from utils import get_doc_tools

vector_tool, summary_tool = get_doc_tools("metagpt.pdf", "metagpt")

## Set Up the Function Calling Agent

In LlamaIndex, an agent consists of two main components: an AgentWorker and an AgentRunner. The AgentWorker is responsible for executing the next step of an agent, while the AgentRunner is the overall task dispatcher, responsible for creating tasks, dispatching them to workers, and returning the final response.

![agent](img/agent.jpg)
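To make the division of labor concrete, here is a minimal pure-Python sketch of the worker/runner split. It is not LlamaIndex code: `ToyTask`, `ToyWorker`, and `ToyRunner` are hypothetical names invented for illustration, and the "steps" are plain functions rather than LLM calls. It only mirrors the idea that a worker executes one step at a time while a runner creates tasks, dispatches them, and returns the final result.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyTask:
    """A unit of work: the user's input plus the outputs collected so far."""
    input: str
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    outputs: List[str] = field(default_factory=list)
    done: bool = False


class ToyWorker:
    """Hypothetical worker: executes exactly one step of a task per call."""

    def __init__(self, steps: List[Callable[[str], str]]):
        self.steps = steps  # fixed, non-empty list of step functions

    def run_step(self, task: ToyTask) -> str:
        # The next step to run is determined by how many outputs already exist.
        step_fn = self.steps[len(task.outputs)]
        output = step_fn(task.input)
        task.outputs.append(output)
        task.done = len(task.outputs) == len(self.steps)
        return output


class ToyRunner:
    """Hypothetical runner: creates tasks, dispatches steps, returns the result."""

    def __init__(self, worker: ToyWorker):
        self.worker = worker
        self.tasks: Dict[str, ToyTask] = {}

    def create_task(self, user_input: str) -> ToyTask:
        task = ToyTask(input=user_input)
        self.tasks[task.task_id] = task
        return task

    def run_step(self, task_id: str) -> str:
        return self.worker.run_step(self.tasks[task_id])

    def query(self, user_input: str) -> str:
        # End-to-end convenience: create a task and step until it is done.
        task = self.create_task(user_input)
        while not task.done:
            self.run_step(task.task_id)
        return task.outputs[-1]


worker = ToyWorker(steps=[
    lambda q: f"looked up: {q}",
    lambda q: f"final answer for: {q}",
])
runner = ToyRunner(worker)
final = runner.query("agent roles")
print(final)
```

Because the runner owns the task registry and the worker owns the step logic, the same runner can either drive a task to completion (`query`) or advance it one step at a time (`run_step`).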

In [4]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

For the agent worker, we pass the set of tools: the vector tool and the summary tool. Given the current conversation history and user input, the `FunctionCallingAgentWorker` uses function calling to decide which tool to call next, calls that tool, and decides whether or not to return the result.

In [5]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)
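As a rough illustration of what "function calling" means at the dispatch level, the sketch below takes a tool name and a JSON argument string (the shape that appears in the verbose logs of this lesson) and routes the call to a matching Python function. This is a simplified assumption, not the library's implementation: the registry `TOOLS`, the helper `dispatch_tool_call`, and the two stand-in tool functions are hypothetical, and the model's choice of tool is hard-coded instead of coming from an LLM.

```python
import json
from typing import Callable, Dict


def fake_summary_tool(input: str) -> str:
    """Stand-in for a summary tool; it simply echoes its argument."""
    return f"[summary] {input}"


def fake_vector_tool(query: str) -> str:
    """Stand-in for a vector search tool; it simply echoes its argument."""
    return f"[vector] {query}"


# Hypothetical registry keyed by the tool names seen in the logs.
TOOLS: Dict[str, Callable[..., str]] = {
    "summary_tool_metagpt": fake_summary_tool,
    "vector_tool_metagpt": fake_vector_tool,
}


def dispatch_tool_call(name: str, raw_args: str) -> str:
    """Look up the requested tool and call it with the JSON-decoded arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(raw_args)
    return TOOLS[name](**kwargs)


# In a real agent, the model would emit this name/arguments pair.
result = dispatch_tool_call(
    "summary_tool_metagpt", '{"input": "agent roles in MetaGPT"}'
)
print(result)
```

In the actual agent, the tool's output is appended to the conversation history and the model is asked again, which is what lets it chain several calls before answering.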

Below, we can see that the agent has broken the question into a series of steps. The first step calls the summary tool. One could argue that the vector tool would be more appropriate here. The summary tool is still a reasonable choice, though, and more powerful models may be better at picking the right tool. The output of this step is then used to perform chain-of-thought reasoning.

In [6]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT are Product Manager, Architect, Project Manager, Engineer, QA Engineer, Developer, and User. Each role has specific responsibilities and expertise tailored to different aspects of the collaborative software development process, such as analyzing user requirements, translating requirements into system design components, distributing tasks, executing code, formulating test cases, generating product requirement documents, designing system architecture, implementing design, providing input commands, and ensuring high-quality software through unit testing. These roles work together in a structured manner following Standard Operating Procedures (SOPs) to efficiently complete complex software development tasks.
=== C

As in the previous lesson, we can inspect the source nodes of the response to trace the various steps.

In [None]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-07-13
last_modified_date: 2024-06-24

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 

Calling `agent.query()` lets you query the agent in a one-off manner that does not preserve state. Let's now try to retain history over time. The agent is able to maintain chats in a conversational memory buffer. The memory module can be customized, but by default it is a flat list of items that acts as a rolling buffer. The previous conversation history is therefore also used when taking the next step.
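The idea of a rolling buffer can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: `RollingChatMemory` is a hypothetical class, and it bounds the history by message count, whereas the library's own memory module may use a different limit.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingChatMemory:
    """Hypothetical flat chat history that keeps only the most recent messages."""

    def __init__(self, max_messages: int = 4):
        # A deque with maxlen silently drops the oldest item when it is full.
        self._messages: Deque[Tuple[str, str]] = deque(maxlen=max_messages)

    def put(self, role: str, content: str) -> None:
        self._messages.append((role, content))

    def get(self) -> List[Tuple[str, str]]:
        """Return the retained history, oldest first."""
        return list(self._messages)


memory = RollingChatMemory(max_messages=4)
memory.put("user", "Tell me about the evaluation datasets used.")
memory.put("assistant", "HumanEval, MBPP, and the SoftwareDev dataset.")
memory.put("user", "Tell me the results over one of the above datasets.")
memory.put("assistant", "Results over the HumanEval dataset ...")
memory.put("user", "And what about MBPP?")  # pushes out the oldest message

history = memory.get()
print(len(history))
print(history[0])
```

Since the whole retained history is passed along on every turn, a follow-up such as "one of the above datasets" can be resolved against the earlier answer, as long as that answer has not yet rolled out of the buffer.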

Using `agent.chat()` we can start this conversation history, as shown below.

In [8]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and the SoftwareDev dataset.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and the SoftwareDev dataset.


In [9]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset"}
=== Function Output ===
MetaGPT achieved 85.9% and 87.7% Pass rates over the HumanEval dataset.
=== LLM Response ===
MetaGPT achieved 85.9% and 87.7% Pass rates over the HumanEval dataset.


## Lower-Level: Debuggability and Control

In this section we will show how to exert much finer control over the agent. The key benefits are:

1. Decoupling of task creation and execution: users gain the flexibility to schedule task execution according to their needs.
2. Enhanced debuggability: offers deeper insights into each step of the execution process, improving troubleshooting capabilities.
3. Steerability: allows users to directly modify intermediate steps and incorporate human feedback for refined control.

We start by defining an agent worker and an agent runner, as we did earlier in this lesson.

In [10]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

Now let's execute a single step of a task. We create a task for the agent using the same question we used before. This will return a task object that contains the input as well as additional state.

In [11]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

This calls the summary tool, and then stops.

In [12]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and expertise tailored to different aspects of the software development process, such as analyzing user requirements, translating requirements into system design components, handling task distribution, executing code, and formulating test cases to ensure code quality. These roles work together in a sequential manner following Standard Operating Procedures (SOPs) to efficiently complete complex software development tasks within the MetaGPT framework.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents communicate with each other in MetaGPT"}
=== Fun

If we look at the completed steps, we can see that 1 step has been completed, and we can inspect the output.

In [13]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task a4ee1fb1-6db7-4e79-91a1-f5977d514241: 1
The agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and expertise tailored to different aspects of the software development process, such as analyzing user requirements, translating requirements into system design components, handling task distribution, executing code, and formulating test cases to ensure code quality. These roles work together in a sequential manner following Standard Operating Procedures (SOPs) to efficiently complete complex software development tasks within the MetaGPT framework.


We can also inspect the upcoming steps. The next step shows `input=None` because the agent auto-generates the inputs for its next steps.

In [14]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task a4ee1fb1-6db7-4e79-91a1-f5977d514241: 1


TaskStep(task_id='a4ee1fb1-6db7-4e79-91a1-f5977d514241', step_id='444ac121-f99b-4ea8-b2d7-58e50596a206', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

Let's run the next two steps and try injecting user input. This was not part of the original task query, but by injecting it, we can modify agent execution.

In [15]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Additionally, agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also utilize a subscription mechanism to filter out irrelevant information and ensure they receive only pertinent details based on their role profiles.


We need to run one last step to synthesize the answer. To verify that this is the last step, we can check the value of `step_output.is_last`.

In [16]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
Agents in MetaGPT share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Additionally, agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also utilize a subscription mechanism to filter out irrelevant information and ensure they receive only pertinent details based on their role profiles.
True


To translate this response into an agent response, call `.finalize_response(...)`, and we get back the final answer.

In [17]:
response = agent.finalize_response(task.task_id)

In [18]:
print(str(response))

assistant: Agents in MetaGPT share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Additionally, agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also utilize a subscription mechanism to filter out irrelevant information and ensure they receive only pertinent details based on their role profiles.
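The manual sequence above (run a step, optionally inject input, check `is_last`, then finalize) can be wrapped in a small helper. The sketch below is one possible way to do it, not part of LlamaIndex: `run_task_stepwise`, its `inputs` mapping, and the `max_steps` guard are hypothetical. It relies only on the calls used in this lesson (`run_step`, `is_last`, `finalize_response`) and accepts any object that exposes them.

```python
from typing import Any, Dict, Optional


def run_task_stepwise(
    agent: Any,
    task_id: str,
    inputs: Optional[Dict[int, str]] = None,
    max_steps: int = 10,
) -> Any:
    """Run a task one step at a time until the agent reports the last step.

    `inputs` maps a zero-based step index to user input injected at that step.
    `max_steps` is a safety guard against a loop that never terminates.
    """
    inputs = inputs or {}
    for step_index in range(max_steps):
        if step_index in inputs:
            # Steer the agent by injecting extra user input at this step.
            step_output = agent.run_step(task_id, input=inputs[step_index])
        else:
            step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"task {task_id} did not finish within {max_steps} steps")
```

With a freshly created task, the walkthrough above would then look roughly like `response = run_task_stepwise(agent, task.task_id, inputs={1: "What about how agents share information?"})`. Keeping the loop in your own code is what gives you a place to log, pause, or modify each step.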
