# Lesson 3: Building an Agent Reasoning Loop

So far, each query has been handled in a single forward pass: given the query, call the right tool with the right parameters and return the response. This is still quite limiting. What if the user asks a complex question that requires multiple steps, or a vague question that needs clarification? In this lesson, we define a complete agent reasoning loop. Instead of calling a tool in a single-shot setting, an agent can reason over tools across multiple steps.

We will use the function calling agent implementation, which is an agent that natively integrates with the function calling capabilities of LLMs.

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## Load the data

The following command downloads the paper:

#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

**Note**: The PDF file is included with this lesson. To access it, go to the `File` menu and select `Open...`.

## Set Up the Query Tools

We will use the same MetaGPT paper, and we will also set up the auto-retrieval vector search tool and the summarization tool from Lesson 2. To keep this concise, both tools are packaged into a `get_doc_tools()` function that we can import from the `utils` module.

In [3]:
from utils import get_doc_tools

vector_tool, summary_tool = get_doc_tools("metagpt.pdf", "metagpt")

## Set Up the Function Calling Agent

We now set up our function calling agent. We use GPT-3.5 Turbo as our LLM. We will then define our agent.

In [4]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

We import `FunctionCallingAgentWorker` and `AgentRunner` from LlamaIndex. To `FunctionCallingAgentWorker`, we pass in two tools: `vector_tool` and `summary_tool`. We also pass in the LLM and set `verbose=True` to look at the intermediate outputs.

Think of the primary responsibility of `FunctionCallingAgentWorker` as follows: given the existing conversation history, memory, and any passed state, along with the current user input, it uses function calling to decide the next tool to call, calls that tool, and decides whether or not to return a final response. The overall agent interface sits behind the `AgentRunner`, and that is what we use to query the agent.
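As a rough illustration of the loop an agent worker runs (decide on a tool, call it, repeat until a final response), here is a minimal sketch in plain Python. It is not the LlamaIndex implementation: `ToyAgent`, `scripted_decide`, and `fake_summary_tool` are hypothetical names, and a scripted function stands in for the LLM's function-calling decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; these are not LlamaIndex classes.
ToolFn = Callable[[str], str]
# A decision is ("call", tool_name, tool_input) or ("respond", final_text, "").
Decision = Tuple[str, str, str]


@dataclass
class ToyAgent:
    tools: Dict[str, ToolFn]
    decide: Callable[[List[str]], Decision]  # stands in for the LLM
    history: List[str] = field(default_factory=list)

    def run(self, user_input: str, max_steps: int = 5) -> str:
        self.history.append(f"user: {user_input}")
        for _ in range(max_steps):
            kind, name_or_text, tool_input = self.decide(self.history)
            if kind == "respond":
                # The "LLM" decided it has enough context for a final answer.
                self.history.append(f"assistant: {name_or_text}")
                return name_or_text
            # Otherwise call the chosen tool and record its output.
            output = self.tools[name_or_text](tool_input)
            self.history.append(f"tool[{name_or_text}]: {output}")
        raise RuntimeError("no final response within max_steps")


def scripted_decide(history: List[str]) -> Decision:
    # Scripted policy: two tool calls, then combine the tool outputs.
    tool_lines = [h for h in history if h.startswith("tool[")]
    if len(tool_lines) == 0:
        return ("call", "summary_tool", "agent roles")
    if len(tool_lines) == 1:
        return ("call", "summary_tool", "communication between roles")
    return ("respond", " | ".join(tool_lines), "")


def fake_summary_tool(query: str) -> str:
    return f"summary of '{query}'"


toy_agent = ToyAgent(tools={"summary_tool": fake_summary_tool}, decide=scripted_decide)
answer = toy_agent.run("Tell me about the agent roles, and then how they communicate.")
print(answer)
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving decision function from looping forever.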

We will first ask this question: `Tell me about the agent roles in MetaGPT, and then how they communicate with each other.` 

Let's trace through the outputs of this agent. The agent is able to break down the overall question into steps. The first part of the question asks about agent roles in MetaGPT, and the agent calls the summary tool to answer it. A quick note: the summary tool isn't necessarily the most precise choice. We could argue that the vector tool would give us back a more concise set of context that better represents the relevant pieces of text we are looking for. However, the summary tool is still a reasonable tool for the job. And of course, more powerful models like GPT-4 Turbo or Claude 3 Opus and Sonnet might be able to pick the more precise vector tool to answer this question. In any case, we get back the output: `Agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer.`

The agent then uses this output to perform chain-of-thought reasoning and trigger the next question, which is `communication between agent roles in MetaGPT`. We get back an answer about that too: `Communication between agent roles in MetaGPT is structured and efficient...`, and the agent combines the entire conversation history to generate a final response: `In MetaGPT, the agent roles include.... They work together in a sequential workflow following Standard Operating Procedures...`

In [5]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [6]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
Agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for analyzing user requirements and formulating a detailed Product Requirement Document (PRD). The Architect translates these requirements into system design components. The Project Manager distributes tasks based on the system design, while Engineers execute designated classes and functions. The QA Engineer formulates test cases to ensure code quality. These roles work together in a sequential workflow following Standard Operating Procedures (SOPs) to efficiently develop software solutions in MetaGPT.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents com

When we run a multi-step query like this, we want to make sure that we can trace the sources. We use `response.source_nodes` to look at the content of these nodes. Here we inspect the content of the first retrieved source node, which is the paper's first page.

In [7]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-10-23
last_modified_date: 2024-06-24

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 
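To get a quick overview of every source rather than reading one node at a time, a small helper along these lines could list the pages that were used. This is a sketch under assumptions: `summarize_sources` and `StubNode` are hypothetical names, and it assumes each source node exposes a `metadata` dict with the `file_name` and `page_label` keys seen in the output above. The stub lets the example run without an LLM.

```python
from typing import Any, Dict, Iterable, List, Tuple


def summarize_sources(source_nodes: Iterable[Any]) -> List[Tuple[str, str]]:
    """Return unique (file_name, page_label) pairs in first-seen order."""
    seen: List[Tuple[str, str]] = []
    for node in source_nodes:
        meta = node.metadata  # assumed to be a dict-like mapping
        key = (meta.get("file_name", "?"), meta.get("page_label", "?"))
        if key not in seen:
            seen.append(key)
    return seen


class StubNode:
    """Hypothetical stand-in for a retrieved source node."""

    def __init__(self, metadata: Dict[str, str]) -> None:
        self.metadata = metadata


stub_nodes = [
    StubNode({"file_name": "metagpt.pdf", "page_label": "1"}),
    StubNode({"file_name": "metagpt.pdf", "page_label": "2"}),
    StubNode({"file_name": "metagpt.pdf", "page_label": "1"}),
]
pages = summarize_sources(stub_nodes)
print(pages)
```

With a real response, the call would be `summarize_sources(response.source_nodes)`, provided the assumption about `metadata` holds.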

Calling `agent.query()` lets us query the agent in a one-off manner, but it does not preserve state. So now let's try maintaining conversation history over time. The agent maintains chats in a conversational memory buffer. The memory module can be customized, but by default it is a flat list of items that acts as a rolling buffer whose size depends on the context window of the LLM. Therefore, when the agent decides to use a tool, it uses not only the current chat message but also the previous conversation history to take the next step or perform the next action.
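To make the rolling-buffer idea concrete, here is a minimal sketch that keeps only the most recent messages that fit a token budget. It is an illustration, not the library's memory module: `rolling_window` and `count_tokens` are hypothetical names, and counting one token per whitespace-separated word is a deliberately crude assumption.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def rolling_window(messages: List[str], token_limit: int) -> List[str]:
    """Keep the newest messages whose combined token count fits the limit."""
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that overflows.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > token_limit:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


chat_history = [
    "user: Tell me about the evaluation datasets used.",
    "assistant: HumanEval, MBPP, and SoftwareDev.",
    "user: Tell me the results over one of the above datasets.",
]
window = rolling_window(chat_history, token_limit=16)
print(window)
```

With a budget of 16 word-tokens, the oldest message is dropped and the two most recent ones are kept. That is enough for the follow-up question to resolve "the above datasets".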

So instead of `agent.query()`, we will use `agent.chat()`. We will ask `Tell me about the evaluation datasets used.` Here we see that it calls the summary tool with the input `evaluation datasets used in MetaGPT`. And we get back `The evaluation datasets used in MetaGPT include HumanEval, MBPP, and the SoftwareDev dataset.`

In [8]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and the SoftwareDev dataset.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and the SoftwareDev dataset. These datasets are utilized to evaluate the performance and capabilities of MetaGPT in various tasks and domains.


To see this ability to maintain conversation history in action, we ask a follow-up question: `Tell me the results over one of the above datasets.` To know what the "above datasets" are, the agent needs to have them stored in the conversation history. When we run this, the agent translates the query plus the conversation history into a query on the vector tool. It asks `results over HumanEval dataset`, which is one of the evaluation datasets used, and it gives us back a final answer. So `AgentRunner` provides a nice high-level interface for interacting with an agent.

In [9]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset", "page_numbers": ["7"]}
=== Function Output ===
MetaGPT achieved 85.9% and 87.7% Pass rates over the HumanEval dataset.
=== LLM Response ===
MetaGPT achieved pass rates of 85.9% and 87.7% over the HumanEval dataset. These results demonstrate the effectiveness of MetaGPT in performing tasks and generating responses in the evaluated dataset.


## Lower-Level: Debuggability and Control

The next section will show us capabilities that let us step through and control the agent in a much more granular fashion. This allows us to not only create a higher level research assistant over our RAG pipelines, but also debug and control it. Some of the benefits include greater debuggability into the execution of each step, as well as steerability by allowing us to inject user feedback. 

Having this low-level agent interface is powerful for 2 main reasons. 

1. The first is debuggability. If we are a developer building an agent, we want to have greater transparency and visibility into what's actually going on under the hood. If our agent isn't working the first time around, then we can go in and trace through the agent execution, see where it is failing, and try different inputs to see what actually modifies the agent execution into a correct response. 


2. Another reason is that it enables richer UXs, where we build a product experience around this core agentic capability. For instance, we may want to listen to human feedback in the middle of agent execution, as opposed to only after the agent execution is complete for a given task. We can then create an async queue that listens for human input throughout agent execution. If human input does come in, we can interrupt and modify the agent execution while it is working through a larger task, instead of waiting until the task is complete.
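The async-queue idea from the second point can be sketched with the standard library's `asyncio.Queue`. This is a toy illustration, not LlamaIndex code: `drain`, `run_step`, and `demo` are hypothetical names, and a log entry stands in for a real agent step. Before each step, the loop checks the queue for human input without blocking and lets that input override the planned action.

```python
import asyncio
from typing import List


def drain(queue: asyncio.Queue) -> List[str]:
    """Return all human messages currently waiting, without blocking."""
    messages: List[str] = []
    while True:
        try:
            messages.append(queue.get_nowait())
        except asyncio.QueueEmpty:
            return messages


async def run_step(planned_action: str, feedback: asyncio.Queue, log: List[str]) -> None:
    pending = drain(feedback)
    if pending:
        # Human input overrides the planned action for this step.
        log.append(f"interrupted by human: {pending[-1]}")
    else:
        log.append(f"agent: {planned_action}")
    await asyncio.sleep(0)  # placeholder for real async work (LLM or tool call)


async def demo() -> List[str]:
    feedback: asyncio.Queue = asyncio.Queue()
    log: List[str] = []
    await run_step("summarize agent roles", feedback, log)
    # In a real app, a separate UI handler coroutine could enqueue at any time.
    await feedback.put("What about how agents share information?")
    await run_step("summarize communication", feedback, log)
    return log


feedback_log = asyncio.run(demo())
print(feedback_log)
```

Inside a notebook, `asyncio.run()` needs a patched event loop, which is what the `nest_asyncio.apply()` call in the setup cell provides.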

We start by defining our agent again through the `FunctionCallingAgentWorker` and `AgentRunner` setup. Then we start using the low-level API: we first create a task object from the user query, and then we run through the steps one at a time, even interjecting our own input.

Let's create a task for this agent. We will use the same question as in the first part of this lesson: `Tell me about the agent roles in MetaGPT, and then how they communicate with each other.` This returns a task object that contains the input as well as additional state.

Now let's try executing a single step of this task. We call `agent.run_step(task.task_id)`, and the agent executes one step of the task identified by that task ID and gives us back a step output. We see that it calls the summary tool with the input `agent roles in MetaGPT`, which is the very first part of the question, and then it stops there.

In [10]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [11]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [12]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and expertise within the collaborative framework to efficiently complete complex software development tasks. The Product Manager conducts business-oriented analysis, the Architect translates requirements into system design components, the Project Manager distributes tasks, the Engineer executes code, and the QA Engineer formulates test cases to ensure code quality. Additionally, there are other team members involved in experiments, comparisons, figure creation, and overall project advising within the MetaGPT framework.


When we inspect the logs and the output of the agent, we see that the first part was actually executed. We call `agent.get_completed_steps()` on the task ID and print `Num completed for task`. We see that 1 step has been completed, and we print the current output so far.

In [13]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task b5bf95f0-7747-49c2-9bd3-8fca8d15228b: 1
The agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and expertise within the collaborative framework to efficiently complete complex software development tasks. The Product Manager conducts business-oriented analysis, the Architect translates requirements into system design components, the Project Manager distributes tasks, the Engineer executes code, and the QA Engineer formulates test cases to ensure code quality. Additionally, there are other team members involved in experiments, comparisons, figure creation, and overall project advising within the MetaGPT framework.


We can also take a look at any upcoming steps for the agent through `agent.get_upcoming_steps()`. We pass the task ID into the agent, and we can print out the number of upcoming steps for the task. We see that it is also 1, and we can look at the task step object, which has a task ID and an existing input. This input is currently `None`, because the agent auto-generates the next action from the conversation history and doesn't need an additional external input.

The nice thing about this debugging interface is that we can pause execution now if we want to. We can take the intermediate results without completing the agent flow.

In [14]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task b5bf95f0-7747-49c2-9bd3-8fca8d15228b: 1


TaskStep(task_id='b5bf95f0-7747-49c2-9bd3-8fca8d15228b', step_id='33980ddd-c5e3-4b41-9a8b-5e1605293aba', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

Let's run the next 2 steps and actually try injecting user input. So let's ask `What about how agents share information?` as user input. This was not part of the original task query, but by injecting this, we can modify agent execution to give us back the result that we want. 

We see that we added the user message to memory, and that the next call here is `how agents share information in MetaGPT`. And we see from the function output that it is able to give back the response.

In [15]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Additionally, agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also utilize a subscription mechanism based on their role-specific interests to extract relevant information, ensuring they receive only task-related information and avoid distractions from irrelevant details.


The overall task is roughly complete, and we just need to run one final step to synthesize the answer. To double check that this output is the last step, we just need to do `step_output.is_last`. 

So we get back the answer about how agents in MetaGPT share information. And this is indeed the last step (i.e. `is_last` returns `True` in the output).

In [16]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
In MetaGPT, agents share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also use a subscription mechanism based on their role-specific interests to extract relevant information, ensuring they receive only task-related information and avoid distractions from irrelevant details.
True


To translate this into an agent response, we call `response = agent.finalize_response(task.task_id)`, and we get back the final answer.

In [17]:
response = agent.finalize_response(task.task_id)

In [18]:
print(str(response))

assistant: In MetaGPT, agents share information through a structured communication protocol that includes a shared message pool. This pool allows agents to publish structured messages and subscribe to relevant messages based on their profiles. Agents can obtain directional information from other roles and public information from the environment. This structured communication interface enhances role communication efficiency within the framework. Agents also use a subscription mechanism based on their role-specific interests to extract relevant information, ensuring they receive only task-related information and avoid distractions from irrelevant details.
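The manual stepping in this section can be wrapped in a small loop. The sketch below relies only on the calls used in this lesson (`run_step`, `is_last`, `finalize_response`). `run_to_completion`, `StubAgent`, and `StubStepOutput` are hypothetical names, and the stub exists only so the example runs without an LLM or API key.

```python
from dataclasses import dataclass
from typing import Any


def run_to_completion(agent: Any, task_id: str, max_steps: int = 10) -> Any:
    """Step a task until `is_last` is True, then finalize the response."""
    for _ in range(max_steps):
        step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"task {task_id} did not finish in {max_steps} steps")


# Hypothetical stub that mimics only the three calls used above.
@dataclass
class StubStepOutput:
    is_last: bool


class StubAgent:
    def __init__(self, total_steps: int) -> None:
        self.total_steps = total_steps
        self.steps_run = 0

    def run_step(self, task_id: str) -> StubStepOutput:
        self.steps_run += 1
        return StubStepOutput(is_last=self.steps_run >= self.total_steps)

    def finalize_response(self, task_id: str) -> str:
        return f"final answer after {self.steps_run} steps"


stub = StubAgent(total_steps=3)
result = run_to_completion(stub, task_id="demo-task")
print(result)  # final answer after 3 steps
```

With the real agent from this lesson, the equivalent call would be `run_to_completion(agent, task.task_id)`. The `max_steps` cap is a safety choice of this sketch, not something the lesson prescribes.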


So that's it for Lesson 3. We have learned both about the high level interface for an agent as well as a low level debugging interface.