# Building Agentic RAG with LlamaIndex

[deeplearning.ai course link here](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction)

## Introduction
***

* Jerry Liu: co-founder and CEO of LlamaIndex and instructor of the course.

* The course covers agentic RAG, a framework for building research agents capable of reasoning and decision making over your data.

* A standard RAG pipeline is mostly good for simpler questions over a small set of documents. It works by retrieving some context, stuffing it into an LLM prompt, and calling the LLM a single time to get a response.

* We will build an autonomous research agent, learning **a progression of reasoning ingredients** needed for a full agent:

    * **Routing**: Add decision making to route requests to multiple tools.
    * **Tool-use**: Create an interface for agents to select a tool and generate the right arguments for that tool.
    * **Multi-step reasoning**: Use an LLM to reason over a range of tools in multiple steps while retaining memory throughout that process.


* The user can optionally inject guidance at intermediate steps (e.g. guiding the agent while it searches a document), much like an experienced manager nudging a junior employee to consider a new piece of information and thereby achieve much better performance.

* **First lesson**: Build a router over a single document that can handle question-answering and summarization.

## Router Query Engine
***

The simplest form of agentic RAG. Given a query, the router picks one of several query engines to execute it. We will build a simple router over a single document that handles both question-answering and summarization.




![Router Engine](imgs/router_engine.png)

Load OpenAI API Key:

In [6]:
import os

# Accessing the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check if the variable is loaded properly
if openai_api_key is not None:
    print("OPENAI_API_KEY loaded successfully!")
else:
    print("Failed to load OPENAI_API_KEY. Please check if it is set correctly.")


OPENAI_API_KEY loaded successfully!


Jupyter runs an event loop behind the scenes, and many LlamaIndex modules use async. To make async play nicely with Jupyter notebooks, we apply the `nest_asyncio` package.



In [7]:
import nest_asyncio
nest_asyncio.apply()
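To see why a notebook needs this, the sketch below reproduces the underlying problem with the standard library only: starting a second event loop from inside a running one raises a `RuntimeError`. The coroutine names `inner` and `outer` are illustrative and not part of the course code.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are already inside a running event loop here, as in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second, nested event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that such nested calls are permitted, which is the situation library code runs into inside a Jupyter kernel.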

Next, we load a sample PDF document. We use the `SimpleDirectoryReader` module from *LlamaIndex* to read the PDF into a **parsed document representation**.

In [8]:
import os
from llama_index.core import SimpleDirectoryReader

# Define your base directory
base_dir = 'data/pdfs'

# Define your file name
file_name = 'P1-mher-arabian.pdf'

file_path = os.path.join(base_dir, file_name)

print("The full path to the file is:", file_path)

# load documents
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()

The full path to the file is: data/pdfs/P1-mher-arabian.pdf


Next we'll split the documents into evenly sized chunks using the `SentenceSplitter`, which splits along sentence boundaries. We set the chunk size to 1024 tokens.

We call `splitter.get_nodes_from_documents()` to split these documents into **nodes**.

In [9]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [10]:
print(f"Type of nodes object is: {type(nodes[0])}")

Type of nodes object is: <class 'llama_index.core.schema.TextNode'>


In [11]:
print(f"The length of the nodes list is: {len(nodes)}")

The length of the nodes list is: 3
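As a rough illustration of sentence-aware chunking, the sketch below packs whole sentences into chunks greedily, up to a size limit. It is a simplification under stated assumptions: it counts words rather than tokens, has no chunk overlap, and `split_into_chunks` is a hypothetical helper, not the `SentenceSplitter` implementation.

```python
import re


def split_into_chunks(text, max_words=8):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


text = (
    "RAG retrieves context. The LLM reads it. "
    "Then it answers the question. Routers pick an engine."
)
chunks = split_into_chunks(text, max_words=8)
for chunk in chunks:
    print(chunk)
```

Keeping sentences intact means no chunk starts or ends mid-sentence, at the cost of chunks that are not exactly equal in size.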


This next step defines an LLM and an embedding model. We do this through the global `Settings` object, which injects the chosen LLM and embedding model wherever they are needed.

This course uses `gpt-3.5-turbo` and `text-embedding-ada-002`. The same mechanism gives you the groundwork to inject your own LLMs and embeddings.

In [12]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

Now we are ready to start building some indexes. Here we define two indexes, over these nodes:

* Summary Index
* Vector Index

You can think of an index as a set of metadata over your data. You can query an index, and different indexes have different retrieval behaviors.



![Vector index vs. summary index](imgs/vector_index_vs_summary_index.png)

### Vector Index

A **vector index** indexes nodes via text embeddings. It is a core abstraction in LlamaIndex and in any sort of RAG system.

Querying a vector index will return the most similar nodes by embedding similarity.


![Vector embeddings](imgs/vector_embeddings.png)

### Summary Index

* A very simple index: querying it returns all the nodes currently in the index, regardless of the user query.

![Summary index](imgs/summary_index.png)
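The difference in retrieval behavior can be sketched in a few lines. This is a toy model, assuming bag-of-words counts in place of real embeddings; `embed`, `vector_retrieve`, and `summary_retrieve` are hypothetical names and not LlamaIndex APIs.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_retrieve(query, nodes, top_k=1):
    """Vector index behavior: return the top_k nodes most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(q, embed(n)), reverse=True)
    return ranked[:top_k]


def summary_retrieve(query, nodes):
    """Summary index behavior: ignore the query and return every node."""
    return list(nodes)


nodes = [
    "cloud computing automates tasks for users",
    "smart watches add a hardware component",
    "crawlers struggle with dynamic urls in the deep web",
]
print(vector_retrieve("deep web crawlers", nodes, top_k=1))
print(len(summary_retrieve("deep web crawlers", nodes)))
```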

### Setting Them Both Up

In [13]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

Now let's turn these indexes into **query engines** and then **query tools**:

* Each query engine represents an overall query interface over the data stored in that index. It **combines retrieval with LLM synthesis**.

* Each query engine is good for certain types of questions, which makes this a great use case for a **router** that can route dynamically between the different query engines.

* A **query tool** is just a query engine with metadata, specifically a description of what types of questions the tool can answer.

In [14]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

* The query engines are derived from each of the indexes.

* We pass `use_async=True` to the summary query engine so that its LLM calls can run asynchronously, which speeds up response generation.
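To give an intuition for why async helps with `tree_summarize`, here is a minimal sketch of bottom-up summarization in which all groups on one level are summarized concurrently. It assumes a fan-out of 2 and uses `fake_summarize` as a hypothetical stand-in for an LLM call; it is not the LlamaIndex implementation.

```python
import asyncio


async def fake_summarize(texts):
    """Hypothetical stand-in for one LLM summarization call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return "(" + "+".join(texts) + ")"


async def tree_summarize(chunks, fanout=2):
    """Summarize groups of chunks level by level until one summary remains."""
    level = list(chunks)
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # Groups on the same level are independent, so run them concurrently
        level = await asyncio.gather(*(fake_summarize(g) for g in groups))
    return level[0]


summary = asyncio.run(tree_summarize(["a", "b", "c", "d"]))
print(summary)  # ((a+b)+(c+d))
```

With synchronous calls, each level would take as long as the sum of its calls; with `asyncio.gather`, it takes roughly as long as the slowest one.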

We now define a **query tool** for both the *summary* and *vector* query engines, each with a description of the type of question(s) it can answer.

In [15]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to `Assignment P1 CS 6675: Advanced Internet Systems and Applications report.`"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from `Assignment P1 CS 6675: Advanced Internet Systems and Applications report.`"
    ),
)


### Selectors

We are now ready to define our **router**. LlamaIndex provides several types of **selectors** for building a router, each with distinct attributes:

* The **LLM selectors** prompt the LLM to output JSON, which is parsed to determine which indexes to query.

* The **Pydantic selectors** use the OpenAI Function Calling API to produce Pydantic selection objects, rather than parsing raw JSON.

Each kind of selector comes in a variant that routes to a single index and a variant that can route to multiple indexes.

Let's try an LLM-powered single selector, `LLMSingleSelector`:

In [16]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

We import two modules:
* **RouterQueryEngine**
* **LLMSingleSelector**

`RouterQueryEngine` takes in a **selector** as well as a set of **query engine tools**.
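The routing idea itself is small enough to sketch without an LLM. In the toy version below, a keyword-overlap function stands in for the LLM selector: it compares the query with each tool description and returns the index of the best match. `Tool`, `keyword_selector`, and `ToyRouter` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in for an LLM selector: pick the description sharing most words."""
    query_words = set(query.lower().split())
    overlaps = [
        len(query_words & set(tool.description.lower().split())) for tool in tools
    ]
    return overlaps.index(max(overlaps))


class ToyRouter:
    def __init__(self, selector, tools):
        self.selector = selector
        self.tools = tools

    def query(self, query: str):
        index = self.selector(query, self.tools)
        return index, self.tools[index].run(query)


tools = [
    Tool("summarization questions about the whole document",
         lambda q: "summary engine answered"),
    Tool("specific facts like author names or dates",
         lambda q: "vector engine answered"),
]
router = ToyRouter(keyword_selector, tools)
print(router.query("summarization of the whole document"))
print(router.query("which author names appear"))
```

This also shows why the tool descriptions matter: they are the only information the selector has when choosing between engines.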

Let's test out some queries.

In [17]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: The summary of the document is related to summarization questions, making choice 1 the most relevant..
The document discusses various Internet-fueled innovations such as cloud computing, smart-watches, and community-based chat platforms. It highlights how these innovations automate tasks for users and involve cloud computing in some way. Additionally, it explains the differences between smart-watches and other innovations, emphasizing the unique hardware aspect of smart-watches. The document also delves into the concepts of surface web and deep web activities, explaining the challenges of crawling dynamic URLs in the deep web. Lastly, it touches upon limitations of a specific crawler project and proposes strategies for designing a crawler for a university website.


Verbose output lets us view the intermediate steps that were taken. Here the output includes:

* `Selecting query engine 0`: the first option, the summarization tool, was picked to answer this question, and its response is returned.

The response comes with **sources**. Let's take a look using `response.source_nodes`:

In [18]:
print(len(response.source_nodes))

3


This is exactly the number of chunks in the entire document. The summary query engine must have been called, because the **summary query engine returns all the chunks in its index**.

Let's take a look at another example:

In [19]:
response = query_engine.query(
    "Who write this paper?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the report, which includes information about the author..
Mher Arabian.


* The router used the vector search tool, as opposed to the summary tool.

* Here the focus is on retrieving specific context from the PDF, which suits questions whose answer is located within a single paragraph of the document.

**Putting everything together**: 
* A single helper function takes in a file path and builds a router query engine with both vector search and summarization over it.
* See `get_router_query_engine` in the *utils.py* file.
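The contents of *utils.py* are not shown in these notes. A plausible reconstruction, assembled only from the cells above, might look like the sketch below; the tool descriptions are assumptions, and the actual helper may differ.

```python
def get_router_query_engine(file_path: str):
    """Hypothetical reconstruction: build a router query engine over one file."""
    # Imports are kept local so the sketch can be loaded before llama-index is installed
    from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # Load the file and split it into nodes, as in the earlier cells
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

    summary_tool = QueryEngineTool.from_defaults(
        query_engine=SummaryIndex(nodes).as_query_engine(
            response_mode="tree_summarize",
            use_async=True,
        ),
        # Assumed description; the real helper may word this differently
        description="Useful for summarization questions related to the document.",
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=VectorStoreIndex(nodes).as_query_engine(),
        # Assumed description; the real helper may word this differently
        description="Useful for retrieving specific context from the document.",
    )

    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[summary_tool, vector_tool],
        verbose=True,
    )
```

The sketch relies on the LLM and embedding model configured through `Settings` earlier in the notebook.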

In [20]:
from utils import get_router_query_engine

# Load paper "METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK"
file_path = os.path.join('data/pdfs', 'metagpt.pdf')
query_engine = get_router_query_engine(file_path)

In [21]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: The ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results demonstrate the effectiveness of MetaGPT in addressing challenges related to context utilization, reducing hallucinations in software generation, and managing information overload. The study highlights how MetaGPT's unique designs successfully unfold natural language descriptions accurately, maintain information validity in lengthy contexts, and reduce code hallucination problems. Additionally, the study showcases the use of a global message pool and subscription mechanism to streamline communication and filter out irrelevant information, ultimately enhancing the relevance and utility of the generated information.


In [22]:
response = query_engine.query("Summarize the paper for me.")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The paper introduces MetaGPT, a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). MetaGPT focuses on improving code generation quality through role specialization, workflow management, and efficient communication mechanisms. It outperforms existing approaches in various benchmarks, demonstrating its effectiveness in software development tasks. The framework uses natural language programming to generate software code by involving different agents like the Architect, Engineer, and QA Engineer. MetaGPT aims to simplify the process of transforming abstract requirements into detailed designs by efficiently dividing tasks. The paper also addresses challenges such as reducing code hallucinations and handling information overload, while discussing ethical concerns like skill obsolescence, transp

## Tool Calling
***




### How it works

* In a basic RAG pipeline, LLMs are only used for synthesis. The previous lesson showed how to use an LLM in a slightly more sophisticated manner: making a decision by picking the best query pipeline for the user query.

* That is a simplified form of tool calling. In this lesson we will learn how to use an LLM to not only pick a function to execute, but also infer the arguments to pass to that function.

One of the promises of LLMs:

* The ability to take actions and interact with an external environment. For this, we need a good interface for the LLMs to use.

* We call this **Tool Calling**.


* This allows the LLM to figure out how to use a vector DB, instead of just consuming its outputs.

* Tool calling lets LLMs interact with external environments through a dynamic interface: the LLM both chooses the appropriate tool and infers the arguments needed to execute it.


* Tool calling adds a layer of query understanding on top of a RAG pipeline, enabling users to ask complex queries and get back more precise results.


* Final result: through tool calling, users can ask more kinds of questions and get back more precise results than with standard RAG techniques.

Let's see how to define a tool interface from a Python function. Using LlamaIndex abstractions, the LLM infers the parameters from the signature of the Python function.

Let's see how tool calling works using two toy calculator functions:

In [23]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

The core abstraction is **FunctionTool**, which wraps any Python function you feed it. Here it wraps both the `add` function and the `mystery` function.

Both functions have type annotations for the `x` and `y` parameters, as well as docstrings. These are not just stylistic: they are actually used as a prompt for the LLM.

LlamaIndex FunctionTools integrate natively with the function calling capabilities of many different LLMs.
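To make the role of annotations and docstrings concrete, the sketch below derives a simple tool description from a function using only the standard `inspect` module. `describe_tool` is a hypothetical helper, and the dictionary layout is an assumption for illustration; it is not the schema format that `FunctionTool` produces.

```python
import inspect


def describe_tool(fn):
    """Build a small tool description from a function's signature and docstring."""
    signature = inspect.signature(fn)
    parameters = {
        name: getattr(param.annotation, "__name__", str(param.annotation))
        for name, param in signature.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": parameters,
    }


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


schema = describe_tool(add)
print(schema)
```

A description like this is what lets a model know that `add` exists, what it is for, and that it expects two integers.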

To pass the tools to an LLM, import the LLM module and call `predict_and_call`.


Here we import the OpenAI module explicitly, set the model to `gpt-3.5-turbo`, and call the `predict_and_call` function on the LLM.

What `predict_and_call` does:
* Takes in a set of tools, plus an input prompt string or a series of chat messages. It then decides which tool to call, calls the tool itself, and returns the final response.
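The two phases, predicting a tool call and then executing it, can be sketched without any LLM. Below, `fake_llm_tool_call` is a hypothetical stand-in that extracts the tool name and integers from the prompt with string matching; a real model would produce the same kind of structured output through function calling.

```python
import json
import re


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


TOOLS = {"add": add, "mystery": mystery}


def fake_llm_tool_call(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: emits a tool call as a JSON string."""
    name = next(n for n in TOOLS if n in prompt.lower())
    x, y = (int(m) for m in re.findall(r"\d+", prompt)[:2])
    return json.dumps({"name": name, "arguments": {"x": x, "y": y}})


def predict_and_call_sketch(prompt: str) -> int:
    # 1. "Predict": decide which tool to use and with which arguments
    call = json.loads(fake_llm_tool_call(prompt))
    # 2. "Call": look up the chosen function and execute it
    return TOOLS[call["name"]](**call["arguments"])


result = predict_and_call_sketch(
    "Tell me the output of the mystery function on 2 and 9"
)
print(result)  # 121
```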

In [24]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of the mystery function on 2 and 9", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121


In [25]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of add function with 10 and 1", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: add with args: {"x": 10, "y": 1}
=== Function Output ===
11
11


The verbose output shows the intermediate steps. In the first call, the `mystery` function is called with the arguments x=2 and y=9, and the output is 121. In both calls the LLM picked the right tool and inferred the right parameters.

**Note**: This simple example is effectively an expanded version of the router. Not only does the LLM pick the tool, it also decides what parameters to give it.

Let's define a slightly more sophisticated agentic layer on top of vector search. Not only can the LLM choose vector search, but **we can also get it to infer metadata filters**, a structured list of tags that helps return a more precise set of search results.

Let's pay attention to the nodes themselves (the chunks), because we'll take a look at the metadata attached to them.



Again we will use the `SimpleDirectoryReader` module to load the parsed representation of this PDF file.

In [26]:
from llama_index.core import SimpleDirectoryReader
# load documents
file_path = os.path.join('data/pdfs', "metagpt.pdf")
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()

Next we split these documents into a set of even chunks, with a chunk size of 1024.

In [27]:
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [28]:
len(nodes)

34

Each node represents a chunk. Let's look at the content of an example chunk:

In [29]:
# "all" is a special setting: it prints not just the content of the node
# but also the metadata attached to the doc, which is propagated to every node
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: data/pdfs/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-06-07
last_modified_date: 2024-06-07

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks,

In [30]:
# Without metadata_mode="all", get_content() returns only the text of the node,
# not the metadata attached to it
print(nodes[4].get_content())

Preprint
Figure 2: An example of the communication protocol (left) and iterative programming with exe-
cutable feedback (right). Left: Agents use a shared message pool to publish structured messages.
They can also subscribe to relevant messages based on their profiles. Right : After generating the
initial code, the Engineer agent runs and checks for errors. If errors occur, the agent checks past
messages stored in memory and compares them with the PRD, system design, and code files.
3 M ETAGPT: A M ETA-PROGRAMMING FRAMEWORK
MetaGPT is a meta-programming framework for LLM-based multi-agent systems. Sec. 3.1 pro-
vides an explanation of role specialization, workflow and structured communication in this frame-
work, and illustrates how to organize a multi-agent system within the context of SOPs. Sec. 3.2
presents a communication protocol that enhances role communication efficiency. We also imple-
ment structured communication interfaces and an effective publish-subscribe mechanism. These


Notice in the first output how a `page_label` annotation was added to each chunk.

Next let's define a **vector store** index over these nodes:

In [31]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

### Querying RAG

This builds a RAG indexing pipeline over these nodes: it adds an embedding for each node and gives us back a query engine.

Unlike last time, we can try querying this RAG pipeline with metadata filters.

In [32]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some of the high-level results of MetaGPT?"
)

In [33]:
print(str(response))

MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality. Additionally, MetaGPT demonstrates a 100% task completion rate in experimental evaluations, showcasing its robustness and efficiency in terms of time and token costs.


In [34]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}
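Conceptually, a metadata filter narrows the set of candidate nodes that similarity search is allowed to return. The sketch below models only that filtering step on plain dictionaries, with an `and`/`or` condition; `passes_filters` and `candidates` are hypothetical helpers, not the `MetadataFilters` implementation.

```python
def passes_filters(metadata, filters, condition="and"):
    """Return True if the metadata satisfies the key/value filters."""
    if not filters:
        return True  # no filters means every node is a candidate
    checks = [metadata.get(f["key"]) == f["value"] for f in filters]
    return any(checks) if condition == "or" else all(checks)


toy_nodes = [
    {"text": "title and abstract", "metadata": {"page_label": "1"}},
    {"text": "high-level results", "metadata": {"page_label": "2"}},
    {"text": "comparison with ChatDev", "metadata": {"page_label": "8"}},
]


def candidates(filters, condition="and"):
    """Texts of the nodes that remain eligible for similarity ranking."""
    return [
        node["text"]
        for node in toy_nodes
        if passes_filters(node["metadata"], filters, condition)
    ]


page_filters = [{"key": "page_label", "value": p} for p in ["2", "8"]]
print(candidates(page_filters, condition="or"))
print(candidates(page_filters, condition="and"))
```

A node has exactly one `page_label`, so combining several page filters with `and` matches nothing. That is why filtering on a set of pages calls for an `or` condition.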


### Enhanced Data Retrieval

* We integrate metadata filters into a retrieval tool function.

* This function enables more precise retrieval by accepting a query string and optional metadata filters, such as page numbers.

* The LLM can intelligently infer relevant metadata filters (e.g., page numbers) based on the user's query.

* You can define other types of metadata filters, such as section IDs, headers, or footers.

The function below takes in a query as well as page numbers. This allows you to perform a vector search over an index, along with specifying page numbers as a metadata filter.

At the very end we define a vector query tool by passing the vector query function into `FunctionTool`, which allows us to then use it with a language model.

In [35]:
from typing import List, Optional
from llama_index.core.vector_stores import FilterCondition


def vector_query(
    query: str, 
    page_numbers: Optional[List[str]] = None
) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (Optional[List[str]]): Filter by set of pages. Leave as None if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.
    
    """

    # Treat a missing page list as "no page filter"
    page_numbers = page_numbers or []
    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    # Combine the page filters with OR: a node matches if it is on any listed page
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    # The response object itself is returned so that source_nodes stay accessible
    return response
    

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)

Let's call this tool with an LLM. It should be able to infer both the query string and the metadata filters.

In [36]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

response = llm.predict_and_call(
    [vector_query_tool],
    "What are the high-level results of MetaGPT as described in page 2",
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}
=== Function Output ===
MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality, demonstrating a 100% task completion rate in experimental evaluations.


The LLM formulates the right query and specifies the page number. We get back the correct answer.

Let's verify the source nodes:

In [37]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}


Finally, we can bring in the summary tool from the router example in the first lesson and combine it with the vector tool to create an **overall tool-picking system**.

This code sets up a summary index over the same set of nodes and wraps it in a summary tool, similar to lesson 1.

In [38]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool


summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [39]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What are the MetaGPT comparisons with ChatDev described on page 8?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}
=== Function Output ===
MetaGPT outperforms ChatDev on the challenging SoftwareDev dataset in nearly all metrics. For example, MetaGPT achieves a higher score in executability, takes less time for execution, requires more tokens for code generation but needs fewer tokens to generate one line of code compared to ChatDev. Additionally, MetaGPT demonstrates superior performance in code statistics and the cost of human revision when compared to ChatDev.


Now the LLM has a slightly harder task: it must pick the right tool in addition to inferring the function parameters.

Let's verify this by printing out the sources:

In [40]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}


Let's now ask a question, to show that the LLM can still pick the summary tool when necessary:

In [41]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What is a summary of the paper?", 
    verbose=True
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "summary"}
=== Function Output ===
MetaGPT is a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). It introduces role specialization, structured communication interfaces, and executable feedback mechanisms to improve code generation quality. In experiments, MetaGPT surpassed previous approaches in various benchmarks, showcasing its effectiveness in software development tasks. The framework also introduces innovative concepts like self-improvement mechanisms and multi-agent economies for future research and development. The Architect agent devises technical specifications based on the Product Requirement Document (PRD), while the Project Manager breaks down tasks and assigns them to Engineers. The Engineer agent requires fundamental development skills, and the QA Engineer generates unit test code to ensure high-

Next lesson: How to build a full agent over a document.

## Building an Agent Reasoning Loop
***

In this lesson, we will learn how to define a complete agent reasoning loop. Instead of tool calling in a single-shot setting, an agent is able to reason over tools in multiple steps.

We will use the function calling agent implementation, an agent that natively integrates with the function calling capabilities of LLMs.

Let's have some fun...

In [64]:
import os

# Accessing the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check if the variable is loaded properly
if openai_api_key is not None:
    print("OPENAI_API_KEY loaded successfully!")
else:
    print("Failed to load OPENAI_API_KEY. Please check if it is set correctly.")


OPENAI_API_KEY loaded successfully!


In [65]:
# necessary setup to run in notebook environment
import nest_asyncio
nest_asyncio.apply()

Let's also set up the auto-retrieval vector search tool and summarization tool from the last lesson.

The `get_doc_tools` function in the utils file makes this easy. Check it out.

In [68]:
from utils import get_doc_tools

# make sure you have a valid pdf file in this directory
file_path = os.path.join('data/pdfs', 'metagpt.pdf')

vector_tool, summary_tool = get_doc_tools(file_path, "metagpt")

### High-level Interface

Let's set up our function calling agent.


In LlamaIndex, an agent consists of two main components:
1. **Agent Worker**: Responsible for executing the next step of a given agent.
2. **Agent Runner**: Overall task dispatcher, responsible for creating a task, orchestrating runs of agent workers on top of a given task, and returning the final response to the user.


![Agent Intro](imgs/agent_intro.png)

In [45]:
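from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
from llama_index.llms.openai import OpenAI

# Same LLM configuration as in the tool calling lesson
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)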

The primary responsibility of `FunctionCallingAgentWorker` is this: given the existing conversation history, memory, and any past state, along with the current user input, use function calling to decide the next tool to call, call that tool, and decide whether or not to return a final response.

The overall agent interface sits behind the agent runner, and that is what we will use to query the agent.
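The division of labor between worker and runner can be sketched as a small loop. In this toy version the worker follows a fixed plan instead of asking an LLM what to do next; `Task`, `ScriptedWorker`, and `ToyRunner` are hypothetical names and do not mirror the LlamaIndex classes.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    query: str
    history: list = field(default_factory=list)  # (tool, argument, output) triples


class ScriptedWorker:
    """Executes one step at a time; a fixed plan stands in for LLM decisions."""

    def __init__(self, tools, plan):
        self.tools = tools
        self.plan = list(plan)

    def run_step(self, task):
        if self.plan:
            name, argument = self.plan.pop(0)
            output = self.tools[name](argument)
            task.history.append((name, argument, output))
            return False, None  # not done yet
        # No tool calls left: synthesize a final answer from the tool outputs
        return True, " ".join(output for _, _, output in task.history)


class ToyRunner:
    """Creates a task and keeps running worker steps until the worker is done."""

    def __init__(self, worker, max_steps=5):
        self.worker = worker
        self.max_steps = max_steps

    def query(self, text):
        task = Task(text)
        for _ in range(self.max_steps):
            done, answer = self.worker.run_step(task)
            if done:
                return answer
        raise RuntimeError("agent did not finish within max_steps")


tools = {"summary_tool": lambda q: f"[summary of {q}]"}
plan = [("summary_tool", "agent roles"), ("summary_tool", "agent communication")]
answer = ToyRunner(ScriptedWorker(tools, plan)).query(
    "Tell me about the agent roles, and then how they communicate."
)
print(answer)
```

The step limit is a design choice of this sketch: it guarantees the loop terminates even if the worker never declares itself done.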

In [46]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and tasks assigned to them within the collaborative framework to streamline the software development workflow and ensure successful project completion. The Product Manager focuses on requirements and user stories, the Architect designs the system architecture, the Project Manager handles task allocation, the Engineer implements the code, and the QA Engineer ensures software quality through testing and bug fixing.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents communicate with each other in MetaGPT"}
=== Function Output ===
Agents in MetaGP

When you run a multi-step query like this, you want to make sure that you can actually trace the sources. 

As in previous lessons, we can look at `response.source_nodes`. Take a look at the content of these nodes:

In [50]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 7
file_name: metagpt.pdf
file_path: data/pdfs/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-06-07
last_modified_date: 2024-06-07

Preprint
wareDev: (1) HumanEval includes 164 handwritten programming tasks. These tasks encompass
function specifications, descriptions, reference codes, and tests. (2) MBPP consists of 427 Python
tasks. These tasks cover core concepts and standard library features and include descriptions, ref-
erence codes, and automated tests. (3) Our SoftwareDev dataset is a collection of 70 representa-
tive examples of software development tasks, each with its own task prompt (see Table 8). These
tasks have diverse scopes (See Figure 5), such as mini-games, image processing algorithms, data
visualization. They offer a robust testbed for authentic development tasks. Contrary to previous
datasets (Chen et al., 2021a; Austin et al., 2021), SoftwareDev focuses on the engineering aspects.
In the comparisons, we randomly select sev
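
For a quicker overview than printing each node in full, a small helper can list which file and page each source came from. This is a sketch that assumes each source node exposes a `metadata` dictionary with the `file_name` and `page_label` keys seen in the output above. `summarize_sources` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without an index.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def summarize_sources(source_nodes: Iterable[Any]) -> List[Tuple[str, str]]:
    """Return unique (file_name, page_label) pairs in first-seen order."""
    seen: List[Tuple[str, str]] = []
    for node in source_nodes:
        # Assumption: nodes carry a `metadata` dict; fall back to empty if not.
        meta = getattr(node, "metadata", None) or {}
        pair = (meta.get("file_name", "unknown"), meta.get("page_label", "unknown"))
        if pair not in seen:
            seen.append(pair)
    return seen


# Stand-in nodes that mimic the metadata printed above.
fake_nodes = [
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "7"}),
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "7"}),
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "2"}),
]
sources = summarize_sources(fake_nodes)
print(sources)
```

With a real response, the call would be `summarize_sources(response.source_nodes)`, provided the assumption about `metadata` holds.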

![Full Agent Reasoning Loop](imgs/full_agent_reasoning_loop.png)



Calling `agent.query()` allows you to query the agent in a one-off manner, but does not preserve state. 

So now let's try maintaining conversation history over time.

The agent is able to maintain chats in a conversational memory buffer.

The memory module can be customized. By default it is a flat list of items kept as a rolling buffer, whose size depends on the context window of the LLM.

Therefore, when the agent decides to use a tool, it uses not only the current chat message but also the previous conversation history to decide the next step or action.
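
The rolling-buffer idea can be sketched in a few lines. This is a toy illustration, not LlamaIndex's memory module: `RollingChatMemory` is a hypothetical class, and whitespace-separated words stand in for real tokens.

```python
from collections import deque
from typing import Deque, List


class RollingChatMemory:
    """Toy rolling buffer: keep the most recent messages within a token budget."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self._messages: Deque[str] = deque()

    @staticmethod
    def _count_tokens(text: str) -> int:
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def _total_tokens(self) -> int:
        return sum(self._count_tokens(m) for m in self._messages)

    def put(self, message: str) -> None:
        self._messages.append(message)
        # Evict the oldest messages until the buffer fits the budget,
        # but always keep the newest message.
        while len(self._messages) > 1 and self._total_tokens() > self.token_limit:
            self._messages.popleft()

    def get(self) -> List[str]:
        return list(self._messages)


memory = RollingChatMemory(token_limit=16)
memory.put("Tell me about the evaluation datasets used.")
memory.put("HumanEval, MBPP, and SoftwareDev.")
memory.put("Tell me the results over one of the above datasets.")
print(memory.get())
```

In this run the first question is evicted once the budget is exceeded, while the answer naming the datasets survives, so a follow-up about "the above datasets" still has something to refer to.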



In [48]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and a self-generated SoftwareDev dataset. The HumanEval dataset consists of 164 handwritten programming tasks, while MBPP consists of 427 Python tasks. The SoftwareDev dataset comprises 70 representative software development tasks covering various scopes like mini-games, image processing algorithms, and data visualization.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and a self-generated SoftwareDev dataset. HumanEval consists of 164 handwritten programming tasks, MBPP includes 427 Python tasks, and the SoftwareDev dataset comprises 70 representative software development tasks covering various scopes.


Now let's ask a follow-up question.

In [49]:
response = agent.chat(
    "Tell me the results over one of the above datasets."
)

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset", "page_numbers": ["7"]}
=== Function Output ===
MetaGPT achieved a Pass rate of 85.9% and 87.7% over the HumanEval dataset.
=== LLM Response ===
MetaGPT achieved a Pass rate of 85.9% and 87.7% over the HumanEval dataset.


### Low-level Interface

We just used a nice, high-level interface for interacting with an agent.

The next section shows capabilities that let you step through and control the agent in a much more granular fashion. This not only lets you create a higher-level research assistant over your RAG pipelines, but also lets you debug and control it.

![Agent Control](imgs/agent_control.png)



Having this low-level agent interface is powerful for two main reasons:
1. **It gives developers of agents greater transparency and visibility into what's actually going on under the hood.** If the agent isn't working the first time around, you can go in, trace through the execution of the agent, see where it is failing, and try out different inputs to see if that actually changes the agent's execution into a correct response.

2. **It enables useful UXs when building a product experience around this core agentic capability.** For instance, say you want to listen to human feedback in the middle of agent execution, as opposed to only after the agent has completed a given task. You can imagine creating some sort of async queue that listens for human input throughout agent execution. If human input comes in, you can interrupt and modify the execution of the agent while it works through a larger task, rather than waiting until the task is complete.
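
The async queue idea from the second point can be sketched with the standard library's `asyncio`. This is only an illustration of the pattern: `run_with_feedback` and `human` are hypothetical names, and the "steps" are plain strings rather than real agent steps.

```python
import asyncio
from typing import List


async def run_with_feedback(planned_steps: List[str], feedback: asyncio.Queue) -> List[str]:
    """Run steps one at a time, letting queued human input override the next step."""
    log: List[str] = []
    pending = list(planned_steps)
    while pending:
        step = pending.pop(0)
        try:
            # Non-blocking check: did a human say something since the last step?
            step = feedback.get_nowait()
            log.append("interrupted by human input")
        except asyncio.QueueEmpty:
            pass
        log.append(f"ran: {step}")
        # Yield to the event loop so other coroutines can enqueue feedback.
        await asyncio.sleep(0)
    return log


async def main() -> List[str]:
    feedback: asyncio.Queue = asyncio.Queue()

    async def human() -> None:
        # Hypothetical user who speaks up while the first step is running.
        feedback.put_nowait("What about how agents share information?")

    human_task = asyncio.create_task(human())
    log = await run_with_feedback(
        ["summarize agent roles", "summarize agent communication"], feedback
    )
    await human_task
    return log


log = asyncio.run(main())
print(log)
```

Inside a notebook, where an event loop is already running, `await main()` would replace `asyncio.run(main())`.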

In [52]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

Let's start using the low-level API. First, we'll create a task object from the user query, and then we'll start running through steps or even interjecting our own.

Let's create the task:

In [54]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

We created a task for this agent. This returns a **task object**, which contains the input as well as **additional state**.

Now let's try executing a single step of this task:

In [55]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT are Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities within the software development process. The Product Manager focuses on requirements analysis and documentation, the Architect designs the system architecture and interfaces, the Project Manager allocates tasks, the Engineer implements the code based on specifications, and the QA Engineer ensures code quality through testing and bug fixing. These roles work together in a structured workflow to successfully complete complex software development projects within the MetaGPT framework.


The agent executes a step of this task, identified by the task ID, and gives you back a step output.

It calls the summary tool with the input: "agent roles in MetaGPT", which is the very first part of this question. And then it stops there.

When we inspect the logs and the output of the agent, we see that the first part was actually executed. So we call `agent.get_completed_steps()` with the *task_id*, and we can look at the number of completed steps for the task:


In [56]:
completed_steps = agent.get_completed_steps(task.task_id)

print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task e2612548-b67d-4c38-9448-8fa2c58c31f8: 1
The agent roles in MetaGPT are Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities within the software development process. The Product Manager focuses on requirements analysis and documentation, the Architect designs the system architecture and interfaces, the Project Manager allocates tasks, the Engineer implements the code based on specifications, and the QA Engineer ensures code quality through testing and bug fixing. These roles work together in a structured workflow to successfully complete complex software development projects within the MetaGPT framework.


We see that one step has been completed and this is the current output so far.

We can also take a look at any upcoming steps for the agent by calling `agent.get_upcoming_steps()` and passing the *task_id*.

In [57]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task e2612548-b67d-4c38-9448-8fa2c58c31f8: 1


TaskStep(task_id='e2612548-b67d-4c38-9448-8fa2c58c31f8', step_id='d1c8481c-24a2-4f89-8d3d-d8c3e03841a3', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

We see it's also 1, and we're able to look at a `TaskStep` object with a *task id*, as well as an `input` field. This input is currently `None` because the agent actually just **autogenerates the action from the conversation history**, and doesn't need an additional external input.

The nice thing about this debugging interface is that if you wanted to pause execution now, you can. You can take the intermediate results without completing the agent flow. 

Let's run the next two steps and actually try injecting user input:

In [59]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol that includes structured communication interfaces and a publish-subscribe mechanism. This mechanism allows agents to exchange structured messages in a shared message pool, enabling transparent access to information from other agents. Agents can subscribe to specific information based on their role profiles, ensuring efficient communication and task-related information dissemination. The global message pool centralizes information exchange, and the subscription mechanism filters out irrelevant data, ensuring that agents receive only the most relevant and useful information. This structured approach helps streamline communication and enhance the effectiveness of information sharing among ag

This input was not part of the original task query, but by injecting it, you can actually modify agent execution to give you back the result that you want.

The logs show that the *user message* was added to **memory**, and the next tool call uses the input "how agents share information in MetaGPT".

We see here that it's able to give back a response. The overall task is nearly complete: we just need to run one final step to synthesize the answer, and then double-check that this output is the last step:

In [60]:
step_output = agent.run_step(task.task_id)

=== LLM Response ===
Agents in MetaGPT share information through a structured communication protocol that includes structured communication interfaces and a publish-subscribe mechanism. This mechanism allows agents to exchange structured messages in a shared message pool, enabling transparent access to information from other agents. Agents can subscribe to specific information based on their role profiles, ensuring efficient communication and task-related information dissemination. The global message pool centralizes information exchange, and the subscription mechanism filters out irrelevant data, ensuring that agents receive only the most relevant and useful information. This structured approach helps streamline communication and enhance the effectiveness of information sharing among agents in MetaGPT.


In [61]:
print(step_output.is_last)

True


We get back the answer, and this is the last step. To translate this into an **agent response**, similar to what we've seen in some of the previous notebook cells, all you have to do is call:

In [62]:
response = agent.finalize_response(task.task_id)

In [63]:
print(str(response))

Agents in MetaGPT share information through a structured communication protocol that includes structured communication interfaces and a publish-subscribe mechanism. This mechanism allows agents to exchange structured messages in a shared message pool, enabling transparent access to information from other agents. Agents can subscribe to specific information based on their role profiles, ensuring efficient communication and task-related information dissemination. The global message pool centralizes information exchange, and the subscription mechanism filters out irrelevant data, ensuring that agents receive only the most relevant and useful information. This structured approach helps streamline communication and enhance the effectiveness of information sharing among agents in MetaGPT.
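
The manual sequence above (run a step, check `is_last`, then finalize) can be wrapped in a small driver loop. This is a sketch: `run_to_completion` is a hypothetical helper that relies only on the `run_step`, `is_last`, and `finalize_response` calls used in the cells above. `FakeAgent` and `FakeStepOutput` are stand-ins so the loop can be exercised without an LLM.

```python
from dataclasses import dataclass
from typing import Any, List


def run_to_completion(agent: Any, task_id: str, max_steps: int = 10) -> Any:
    """Call run_step until a step reports is_last, then finalize the response."""
    for _ in range(max_steps):
        step_output = agent.run_step(task_id)
        if step_output.is_last:
            return agent.finalize_response(task_id)
    raise RuntimeError(f"Task {task_id} did not finish within {max_steps} steps")


@dataclass
class FakeStepOutput:
    output: str
    is_last: bool


class FakeAgent:
    """Hypothetical stand-in that replays scripted step outputs."""

    def __init__(self, outputs: List[str]) -> None:
        self._outputs = list(outputs)
        self.completed: List[FakeStepOutput] = []

    def run_step(self, task_id: str) -> FakeStepOutput:
        text = self._outputs.pop(0)
        # The step is the last one when no scripted outputs remain.
        step = FakeStepOutput(output=text, is_last=not self._outputs)
        self.completed.append(step)
        return step

    def finalize_response(self, task_id: str) -> str:
        return self.completed[-1].output


fake = FakeAgent(["tool call: roles", "tool call: communication", "final answer"])
result = run_to_completion(fake, task_id="demo-task")
print(result)
```

The `max_steps` cap is a design choice for safety: if the agent never reports a last step, the loop raises instead of running indefinitely.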


That's it! You've learned about the high-level interface for an agent, as well as a low-level debugging interface.

In the next lesson, we'll learn how to build an agent over multiple documents.