# Building Agentic RAG with LlamaIndex

[deeplearning.ai course link here](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/1/introduction)

## Introduction
***

* Jerry Liu, co-founder and CEO of LlamaIndex, is the instructor of the course.

* The course covers agentic RAG, a framework for building research agents capable of reasoning and decision making over your data.

* A standard RAG pipeline is mostly good for simple questions over a small set of documents. It works by retrieving some context, inserting it into an LLM prompt, and calling the LLM a single time to get a response.

* We will build an autonomous research agent, learning **a progression of reasoning ingredients** along the way:

    * **Routing**: Add decision making to route requests to one of multiple tools.
    * **Tool use**: Create an interface for agents to select a tool and generate the right arguments for that tool.
    * **Multi-step reasoning**: Use an LLM to reason over a range of tools while retaining memory throughout the process.


* Let the user optionally inject guidance at intermediate steps (e.g. while the agent is searching a document), much like an experienced manager nudging a junior employee to consider a new piece of information and so achieve much better performance.

* **First lesson**: Build a router over a single document that can handle question-answering and summarization.

## Router Query Engine
***

The simplest form of agentic RAG. Given a query, the router picks one of several query engines to execute it. We will build a simple router over a single document that handles both question-answering and summarization.




![Router Engine](imgs/router_engine.png)

Load OpenAI API Key:

In [56]:
import os

# Accessing the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check if the variable is loaded properly
if openai_api_key is not None:
    print("OPENAI_API_KEY loaded successfully!")
else:
    print("Failed to load OPENAI_API_KEY. Please check if it is set correctly.")


OPENAI_API_KEY loaded successfully!


Jupyter runs an event loop behind the scenes, and many LlamaIndex modules use async. To make async code play nicely with Jupyter notebooks, we apply the `nest_asyncio` package.



In [57]:
import nest_asyncio
nest_asyncio.apply()
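
To see why the patch is needed, the sketch below reproduces the underlying restriction using only the standard library: starting a second event loop from inside a running one raises `RuntimeError`. The names `inner` and `outer` are illustrative and not part of the course code. A notebook cell is in a similar position to `outer`, since Jupyter is already running a loop.

```python
import asyncio


async def inner() -> int:
    """A trivial coroutine standing in for an async library call."""
    return 42


async def outer() -> str:
    """Try to start a second event loop while one is already running."""
    coro = inner()
    try:
        asyncio.run(coro)  # not allowed: a loop is already running here
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches asyncio so that this kind of nested call is permitted.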

Next, load a sample PDF document. We use the `SimpleDirectoryReader` module from *LlamaIndex* to read the PDF into a **parsed document representation**.

In [58]:
import os
from llama_index.core import SimpleDirectoryReader

# Define your base directory
base_dir = 'data/pdfs'

# Define your file name
file_name = 'P1-mher-arabian.pdf'

file_path = os.path.join(base_dir, file_name)

print("The full path to the file is:", file_path)

# load documents
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()

The full path to the file is: data/pdfs/P1-mher-arabian.pdf


Next we split the documents into evenly sized chunks using the `SentenceSplitter`, which splits along sentence boundaries. We set the chunk size to 1024.

We call `splitter.get_nodes_from_documents()` to split these documents into **nodes**.

In [59]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [60]:
print(f"Type of nodes object is: {type(nodes[0])}")

Type of nodes object is: <class 'llama_index.core.schema.TextNode'>


In [61]:
print(f"The length of the nodes list is: {len(nodes)}")

The length of the nodes list is: 3
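
The idea of sentence-aware chunking can be sketched in a few lines of plain Python. This is a simplified illustration, not the `SentenceSplitter` implementation: the hypothetical `split_into_chunks` helper measures size in characters and uses a naive regular expression to find sentence boundaries.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk: start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "RAG retrieves context. The LLM reads it. Then it answers."
chunks = split_into_chunks(sample, max_chars=45)
for chunk in chunks:
    print(repr(chunk))
```

Because sentences are never cut in half, each chunk stays readable on its own, which is the property the splitter is chosen for here.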


This next step defines an LLM and an embedding model. We do this through the global `Settings` object, where you specify the LLM and embedding model to inject as part of the global config.

This course uses `gpt-3.5-turbo` and `text-embedding-ada-002`. Setting them explicitly lays the groundwork for injecting your own LLMs and embeddings.

In [62]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

Now we are ready to start building some indexes. Here we define two indexes, over these nodes:

* Summary Index
* Vector Index

You can think of an index as a set of metadata over your data. You can query an index, and different indexes have different retrieval behaviors.



![Router Engine](imgs/vector_index_vs_summary_index.png)

### Vector Index

A **vector index** indexes nodes by text embeddings. It is a core abstraction in LlamaIndex and in any sort of RAG system.

Querying a vector index will return the most similar nodes by embedding similarity.


![Router Engine](imgs/vector_embeddings.png)
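
Retrieval by embedding similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `top_k` is a hypothetical helper, not a LlamaIndex function. Cosine similarity is used here as one common similarity measure.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], node_vecs: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k node ids whose vectors are most similar to the query."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hand-written "embeddings" (assumed values, for illustration only).
node_vecs = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], node_vecs, k=2)
print([node_id for node_id, _ in results])
```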

### Summary Index

* A very simple index: querying it returns all the nodes currently in the index, regardless of the user query.





![Router Engine](imgs/summary_index.png)

### Setting Them Both Up

In [13]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

Now let's turn these indexes into **query engines** and then **query tools**:

* Each query engine represents an overall query interface over the data that's stored in that index. It **combines retrieval with LLM synthesis**.

* Each query engine is good for certain types of questions, which makes this a great use case for a **router** that can route dynamically between the different query engines.

* A **query tool** is now just a query engine with metadata, specifically a description of what types of questions the tool can answer.
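
The relationship between the three concepts above can be sketched with plain dataclasses. The names `ToyQueryEngine` and `ToyQueryTool` are hypothetical, and the synthesis step simply joins the retrieved chunks where a real query engine would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyQueryEngine:
    """Retrieval plus synthesis behind a single query() call."""

    retrieve: Callable[[str], List[str]]
    synthesize: Callable[[str, List[str]], str]

    def query(self, question: str) -> str:
        context = self.retrieve(question)
        return self.synthesize(question, context)


@dataclass
class ToyQueryTool:
    """A query engine plus a description of what it is good for."""

    query_engine: ToyQueryEngine
    description: str


# Placeholder chunks, made up for this sketch.
chunks = ["first chunk", "second chunk"]

summary_engine = ToyQueryEngine(
    retrieve=lambda question: list(chunks),  # summary-style: return everything
    synthesize=lambda question, context: " | ".join(context),  # stand-in for an LLM
)
summary_tool_sketch = ToyQueryTool(
    query_engine=summary_engine,
    description="Useful for summarization questions.",
)
print(summary_tool_sketch.query_engine.query("Summarize the document"))
```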

In [63]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()


* The query engines are derived from each of the indexes.

* We set `use_async=True` for the summary query engine so that its LLM calls can run concurrently, which speeds up response generation.

As noted above, a **query tool** is just a query engine with some metadata, specifically a description of what types of questions that tool can answer.

We'll define a **query tool** for both the *summary* and *vector* query engines.

In [15]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to `Assignment P1 CS 6675: Advanced Internet Systems and Applications report.`"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from `Assignment P1 CS 6675: Advanced Internet Systems and Applications report.`"
    ),
)


### Selectors

We are now ready to define our **router**. LlamaIndex provides several types of **selectors** for building a router, each with distinct attributes.

There are several selectors available:

* The **LLM selectors** use the LLM to output a JSON that is parsed, and the corresponding indexes are queried.

* The **Pydantic selectors** use the OpenAI Function Calling API to produce Pydantic selection objects, rather than parsing raw JSON.



Each kind of selector can also select either a single index to route to or multiple indexes.
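
The parsing step of an LLM selector can be sketched as below. The JSON shape (a list of objects with a 1-based `choice` and a `reason`) is an assumption made for this illustration, and `parse_selection` is a hypothetical helper rather than a LlamaIndex function.

```python
import json
from typing import Tuple


def parse_selection(raw_output: str, num_choices: int) -> Tuple[int, str]:
    """Parse a JSON selection and return (zero-based index, reason)."""
    selections = json.loads(raw_output)
    first = selections[0]  # a single selector keeps only one choice
    index = int(first["choice"]) - 1  # assumed: the model numbers choices from 1
    if not 0 <= index < num_choices:
        raise ValueError(f"choice {first['choice']} is out of range")
    return index, first["reason"]


# Hard-coded text standing in for what an LLM might return.
raw = '[{"choice": 2, "reason": "The question asks for a specific fact."}]'
index, reason = parse_selection(raw, num_choices=2)
print(index, reason)
```

Validating the index before using it matters because the raw text comes from a model and may not respect the requested format.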

Let's try an LLM powered single selector - called `LLMSingleSelector`

In [16]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

We import two modules:
* **RouterQueryEngine**
* **LLMSingleSelector**

`RouterQueryEngine` takes in **selector type** as well as a set of **query engine tools**.
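
The routing step itself can be sketched in plain Python. This is a minimal illustration, not how `RouterQueryEngine` is implemented: `select_tool` is a hypothetical stand-in that scores tool descriptions by word overlap, where the real selector asks an LLM, and the two engines are placeholder functions.

```python
import re
from typing import Callable, List, NamedTuple, Set, Tuple


class ToyTool(NamedTuple):
    description: str
    engine: Callable[[str], str]


def words(text: str) -> Set[str]:
    """Lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tool(question: str, tools: List[ToyTool]) -> int:
    """Pick the tool whose description shares the most words with the question.

    Ties go to the earliest tool. A real selector would prompt an LLM instead.
    """
    overlaps = [len(words(question) & words(tool.description)) for tool in tools]
    return overlaps.index(max(overlaps))


def route(question: str, tools: List[ToyTool]) -> Tuple[int, str]:
    """Select one tool and run the question through its engine."""
    index = select_tool(question, tools)
    return index, tools[index].engine(question)


tools = [
    ToyTool("summary or overview of the whole document", lambda q: "summary engine answer"),
    ToyTool("specific facts such as the author or a date", lambda q: "vector engine answer"),
]
print(route("Give me a summary of the document", tools))
print(route("Who is the author?", tools))
```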

Let's test out some queries.

In [17]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: The summary of the document is related to summarization questions, making choice 1 the most relevant..
The document discusses various Internet-fueled innovations such as cloud computing, smart-watches, and community-based chat platforms. It highlights how these innovations automate tasks for users and involve cloud computing in some way. Additionally, it explains the differences between smart-watches and other innovations, emphasizing the unique hardware aspect of smart-watches. The document also delves into the concepts of surface web and deep web activities, explaining the challenges of crawling dynamic URLs in the deep web. Lastly, it touches upon limitations of a specific crawler project and proposes strategies for designing a crawler for a university website.


Verbose output allows us to view the intermediate steps that have been taken. We see that the output includes:

* `Selecting query engine 0`: the first option, the summary tool, was picked to answer this question, and its response is returned.

The response comes with **sources**. Let's take a look using `response.source_nodes`:

In [18]:
print(len(response.source_nodes))

3


This is exactly equal to the number of chunks in the entire document. The summary query engine must have been called, because the **summary query engine returns all the chunks in its index**.

Let's take a look at another example:

In [19]:
response = query_engine.query(
    "Who write this paper?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the report, which includes information about the author..
Mher Arabian.


* The router used the vector search tool, as opposed to the summary tool.

* Here the focus is on retrieving specific context from the PDF, which suits questions whose answer is located within a single paragraph of the document.

**Putting everything together**: 
* A single helper function takes in a file path and builds a router query engine with both vector search and summarization over it.
* See `get_router_query_engine` in the *utils.py* file.

In [20]:
from utils import get_router_query_engine

# Load paper "METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK"
file_path = os.path.join('data/pdfs', 'metagpt.pdf')
query_engine = get_router_query_engine(file_path)

In [21]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: The ablation study results are specific context from the MetaGPT paper, making choice 2 the most relevant..
The ablation study results demonstrate the effectiveness of MetaGPT in addressing challenges related to context utilization, reducing hallucinations in software generation, and managing information overload. The study highlights how MetaGPT's unique designs successfully unfold natural language descriptions accurately, maintain information validity in lengthy contexts, and reduce code hallucination problems. Additionally, the study showcases the use of a global message pool and subscription mechanism to streamline communication and filter out irrelevant information, ultimately enhancing the relevance and utility of the generated information.


In [22]:
response = query_engine.query("Summarize the paper for me.")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The paper introduces MetaGPT, a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). MetaGPT focuses on improving code generation quality through role specialization, workflow management, and efficient communication mechanisms. It outperforms existing approaches in various benchmarks, demonstrating its effectiveness in software development tasks. The framework uses natural language programming to generate software code by involving different agents like the Architect, Engineer, and QA Engineer. MetaGPT aims to simplify the process of transforming abstract requirements into detailed designs by efficiently dividing tasks. The paper also addresses challenges such as reducing code hallucinations and handling information overload, while discussing ethical concerns like skill obsolescence, transp

## Tool Calling
***




### How it works

* In a basic RAG pipeline, LLMs are only used for synthesis. The previous lesson showed how to use an LLM to make a decision by picking between different pipelines.

* That is a simplified form of tool calling. In this lesson we learn how to use an LLM not only to pick a function to execute, but also to infer the arguments to pass to that function.

One of the promises of LLMs:

* The ability to take actions and interact with an external environment. For this, we need a good interface for the LLM to use.

* We call this **Tool Calling**.


* Tool calling lets the LLM figure out how to use a vector DB, instead of just consuming its outputs.

* It enables LLMs to interact with external environments through a dynamic interface: the LLM not only chooses the appropriate tool but also infers the arguments needed to execute it.

* Tool calling adds a layer of query understanding on top of a RAG pipeline, enabling users to ask complex queries and get back more precise results.

* Final result: users can ask a wider range of questions and get back more precise results than with standard RAG techniques.

Let's see how to define a tool interface from a Python function. Using LlamaIndex abstractions, the LLM infers the parameters from the function's signature.

Let's see how Tool Calling works using two toy calculator functions:

In [4]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

The core abstraction is **FunctionTool**, which wraps any Python function you give it. Here it wraps both the `add` function and the `mystery` function.

Both functions have type annotations for the `x` and `y` parameters, as well as docstrings. These are not just stylistic: they are used as a prompt for the LLM.

LlamaIndex FunctionTools integrate natively with the function calling capabilities of many different LLMs.
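
To make the role of annotations and docstrings concrete, the sketch below builds a small tool description from a function using the standard `inspect` module. The `describe_tool` helper and the dictionary layout are assumptions for illustration; the schema that `FunctionTool` actually produces may look different.

```python
import inspect
from typing import Any, Callable, Dict


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Collect the name, docstring, and parameter types of a function."""
    signature = inspect.signature(fn)
    parameters = {}
    for name, param in signature.parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            parameters[name] = "any"  # no type hint was given
        else:
            parameters[name] = getattr(annotation, "__name__", str(annotation))
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


schema = describe_tool(add)
print(schema)
```

Everything in this description comes from the function definition itself, which is why a vague docstring or a missing type hint gives the LLM less to work with.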

To pass the tools to an LLM, import the LLM module and call `predict_and_call`.

Here we import the OpenAI module explicitly, with `gpt-3.5-turbo` as the model, and call `predict_and_call` on the LLM.

What `predict_and_call` does:
* Takes in a set of tools as well as an input prompt string or a series of chat messages. It then decides which tool to call, calls that tool, and returns the final response.
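
The "call" half of this process can be sketched without an LLM. In the illustration below the tool call is hard-coded in place of the model's decision, and `call_tool` and the `TOOLS` registry are hypothetical names, not part of LlamaIndex.

```python
from typing import Any, Callable, Dict


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


# Registry mapping tool names to the Python functions behind them.
TOOLS: Dict[str, Callable[..., Any]] = {fn.__name__: fn for fn in (add, mystery)}


def call_tool(tool_call: Dict[str, Any]) -> Any:
    """Look up the named tool and call it with the inferred arguments."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["args"])


# Hard-coded stand-in for the tool name and arguments an LLM would predict.
tool_call = {"name": "mystery", "args": {"x": 2, "y": 9}}
result = call_tool(tool_call)
print(result)  # (2 + 9) * (2 + 9) = 121
```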

In [5]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of the mystery function on 2 and 9", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121


In [6]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of add function with 10 and 1", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: add with args: {"x": 10, "y": 1}
=== Function Output ===
11
11


The verbose output shows the intermediate steps. In the first call, the LLM calls `mystery` with the arguments `x=2` and `y=9`, and the output is 121, i.e. (2 + 9) * (2 + 9). In the second, it calls `add` with `x=10` and `y=1`, and the output is 11. In both cases the LLM picked the right tool and inferred the right parameters.

**Note**: This simple example is effectively an expanded version of the router. Not only does the LLM pick the tool, it also decides what parameters to give it.

Let's define a slightly more sophisticated agentic layer on top of vector search. Not only can the LLM choose vector search, but **we can also get it to infer metadata filters**: a structured list of tags that helps return a more precise set of search results.

Let's pay attention to the nodes themselves, i.e. the chunks, because we'll look at the metadata attached to them.

Again we use the `SimpleDirectoryReader` module to load the parsed representation of this PDF file.

In [26]:
import os

from llama_index.core import SimpleDirectoryReader

# load documents
file_path = os.path.join('data/pdfs', "metagpt.pdf")
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()

Next we split these documents into a set of evenly sized chunks, with a chunk size of 1024.

In [27]:
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [28]:
len(nodes)

34

Each node represents a chunk. Let's look at the content of an example chunk:

In [29]:
# metadata_mode="all" is a special setting: it prints not just the node content
# but also the metadata attached to the document, which is propagated to every node
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: data/pdfs/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-06-07
last_modified_date: 2024-06-07

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks,

In [30]:
# without metadata_mode, only the text content of the node is printed
print(nodes[4].get_content())

Preprint
Figure 2: An example of the communication protocol (left) and iterative programming with exe-
cutable feedback (right). Left: Agents use a shared message pool to publish structured messages.
They can also subscribe to relevant messages based on their profiles. Right : After generating the
initial code, the Engineer agent runs and checks for errors. If errors occur, the agent checks past
messages stored in memory and compares them with the PRD, system design, and code files.
3 M ETAGPT: A M ETA-PROGRAMMING FRAMEWORK
MetaGPT is a meta-programming framework for LLM-based multi-agent systems. Sec. 3.1 pro-
vides an explanation of role specialization, workflow and structured communication in this frame-
work, and illustrates how to organize a multi-agent system within the context of SOPs. Sec. 3.2
presents a communication protocol that enhances role communication efficiency. We also imple-
ment structured communication interfaces and an effective publish-subscribe mechanism. These


Notice how a `page_label` annotation was added to each chunk's metadata (visible in the first output above, printed with `metadata_mode="all"`).

Next let's define a **vector store** index over these nodes:

In [31]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

### Querying RAG

This builds a RAG indexing pipeline over these nodes: it computes an embedding for each node, and we get back a query engine.

Unlike last time, we can now query this RAG pipeline with metadata filters.

In [32]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some of the high-level results of MetaGPT?"
)

In [33]:
print(str(response))

MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality. Additionally, MetaGPT demonstrates a 100% task completion rate in experimental evaluations, showcasing its robustness and efficiency in terms of time and token costs.


In [34]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}


### Enhanced Data retrieval

* We integrate metadata filters into a retrieval tool function.

* This function enables more precise retrieval by accepting a query string and optional metadata filters, such as page numbers.

* The LLM can intelligently infer relevant metadata filters (e.g., page numbers) based on the user's query.

* You can define other types of metadata filters as well, such as section IDs, headers, or footers.

The function below takes in a query as well as page numbers. This lets you perform a vector search over an index while specifying page numbers as a metadata filter.

At the very end we define a vector query tool by passing the `vector_query` function to `FunctionTool.from_defaults`, which lets us use it with a language model.

In [35]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_query(
    query: str, 
    page_numbers: List[str]
) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.
    
    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response
    

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)
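
The effect of combining several page filters with an OR condition can be sketched on plain dictionaries. This is an illustration only: `filter_by_pages` is a hypothetical helper, the node texts are placeholders, and treating an empty list as "no filter" is an assumption of this sketch that mirrors the docstring above.

```python
from typing import Any, Dict, List


def filter_by_pages(
    nodes: List[Dict[str, Any]], page_numbers: List[str]
) -> List[Dict[str, Any]]:
    """Keep nodes whose page_label matches ANY requested page (OR semantics).

    An empty page_numbers list is treated as "no filter" in this sketch.
    """
    if not page_numbers:
        return list(nodes)
    wanted = set(page_numbers)
    return [node for node in nodes if node["metadata"].get("page_label") in wanted]


# Placeholder nodes; page labels are strings, as in the metadata shown earlier.
toy_nodes = [
    {"text": "chunk from page 1", "metadata": {"page_label": "1"}},
    {"text": "chunk from page 2", "metadata": {"page_label": "2"}},
    {"text": "chunk from page 8", "metadata": {"page_label": "8"}},
]
selected = filter_by_pages(toy_nodes, ["2", "8"])
print([node["text"] for node in selected])
```

In the real pipeline the filter narrows the candidate set, and the similarity search then ranks only the nodes that passed it.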

Let's call this tool with an LLM. It should be able to infer both the query string and the metadata filters.

In [36]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

response = llm.predict_and_call(
    [vector_query_tool],
    "What are the high-level results of MetaGPT as described in page 2",
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}
=== Function Output ===
MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality, demonstrating a 100% task completion rate in experimental evaluations.


The LLM formulates the right query and specifies the page numbers. We get back the correct answer.

Let's verify the source nodes:

In [37]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}


Finally, we can bring in the summary tool from the router example in the first lesson and combine it with the vector tool to create an **overall tool-picking system**.

The code below sets up a summary index over the same set of nodes and wraps it in a summary tool, as in lesson 1.

In [38]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool


summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [39]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What are the MetaGPT comparisons with ChatDev described on page 8?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}
=== Function Output ===
MetaGPT outperforms ChatDev on the challenging SoftwareDev dataset in nearly all metrics. For example, MetaGPT achieves a higher score in executability, takes less time for execution, requires more tokens for code generation but needs fewer tokens to generate one line of code compared to ChatDev. Additionally, MetaGPT demonstrates superior performance in code statistics and the cost of human revision when compared to ChatDev.


Now the LLM has a slightly harder task: it must pick the right tool in addition to inferring the function parameters.

Let's verify this by printing out the sources:

In [40]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'data/pdfs/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-06-07', 'last_modified_date': '2024-06-07'}


Let's now ask a question, to show that the LLM can still pick the summary tool when necessary:

In [41]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What is a summary of the paper?", 
    verbose=True
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "summary"}
=== Function Output ===
MetaGPT is a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). It introduces role specialization, structured communication interfaces, and executable feedback mechanisms to improve code generation quality. In experiments, MetaGPT surpassed previous approaches in various benchmarks, showcasing its effectiveness in software development tasks. The framework also introduces innovative concepts like self-improvement mechanisms and multi-agent economies for future research and development. The Architect agent devises technical specifications based on the Product Requirement Document (PRD), while the Project Manager breaks down tasks and assigns them to Engineers. The Engineer agent requires fundamental development skills, and the QA Engineer generates unit test code to ensure high-

Next lesson: How to build a full agent over a document.

## Building an Agent Reasoning Loop
***

In this lesson, we learn how to define a complete agent reasoning loop. Instead of tool calling in a single-shot setting, an agent is able to reason over tools in multiple steps.

We use the function calling agent implementation, an agent that natively integrates with the function calling capabilities of LLMs.

Let's have some fun...

In [64]:
import os

# Accessing the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check if the variable is loaded properly
if openai_api_key is not None:
    print("OPENAI_API_KEY loaded successfully!")
else:
    print("Failed to load OPENAI_API_KEY. Please check if it is set correctly.")


OPENAI_API_KEY loaded successfully!


In [65]:
# necessary setup to run in notebook environment
import nest_asyncio
nest_asyncio.apply()

Let's also set up the auto-retrieval vector search tool and the summarization tool from the last lesson.

A helper function in the utils file makes this easy. Check it out.

In [53]:
from utils import get_doc_tools

# make sure you have a valid pdf file in this directory
file_path = os.path.join('data/pdfs', 'metagpt.pdf')

vector_tool, summary_tool = get_doc_tools(file_path, "metagpt")

### High-level Interface

Let's set up our function calling agent.


In LlamaIndex, an agent consists of two main components:
1. **Agent Worker**: Responsible for executing the next step of a given agent.
2. **Agent Runner**: Overall task dispatcher, responsible for creating a task, orchestrating runs of agent workers on top of that task, and returning the final response to the user.


![Agent Intro](imgs/agent_intro.png)

In [54]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

To the `FunctionCallingAgentWorker`, we pass in two tools:

* Vector tool
* Summary tool

We also pass in the LLM and set verbose equal to true to look at the intermediate outputs.

Think of the `FunctionCallingAgentWorker`'s primary responsibility as follows. Given the existing conversation history, memory, and any past state, along with the current user input:
1. **Use function calling to decide the next tool to call**
2. **Call that tool**
3. **Decide whether or not to return a final response**

The overall agent interface is behind the **agent runner**, and **that's what we're gonna use to query the agent**.
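
The division of labor between worker and runner can be sketched as a small loop. This is a toy illustration under stated assumptions, not the LlamaIndex implementation: `ToyAgentWorker` replays a scripted list of tool calls where a real worker would ask the LLM, and `ToyAgentRunner` returns the collected tool outputs where a real agent would synthesize a final answer.

```python
from typing import Callable, Dict, List, Tuple


class ToyAgentWorker:
    """Executes one step at a time: pick a tool, call it, or decide to stop."""

    def __init__(
        self,
        tools: Dict[str, Callable[[str], str]],
        scripted_calls: List[Tuple[str, str]],
    ) -> None:
        self.tools = tools
        self._decisions = iter(scripted_calls)  # stand-in for LLM function calling

    def run_step(self, memory: List[str]) -> bool:
        """Run one step; return True when no further tool call is needed."""
        decision = next(self._decisions, None)
        if decision is None:
            return True
        tool_name, tool_input = decision
        output = self.tools[tool_name](tool_input)
        memory.append(f"{tool_name}({tool_input!r}) -> {output}")
        return False


class ToyAgentRunner:
    """Creates the task, loops over worker steps, and returns a response."""

    def __init__(self, worker: ToyAgentWorker, max_steps: int = 5) -> None:
        self.worker = worker
        self.max_steps = max_steps

    def query(self, question: str) -> str:
        memory = [f"user: {question}"]
        for _ in range(self.max_steps):
            if self.worker.run_step(memory):
                break
        # Placeholder for the final response: the tool results gathered so far.
        return "\n".join(memory[1:])


toy_tools = {"summary_tool": lambda text: f"summary of {text}"}
worker = ToyAgentWorker(
    toy_tools,
    scripted_calls=[("summary_tool", "agent roles"), ("summary_tool", "communication")],
)
answer = ToyAgentRunner(worker).query("Tell me about roles, then communication.")
print(answer)
```

The `max_steps` cap is a design choice of this sketch that keeps the loop from running forever if the worker never signals that it is done.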

In [71]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT framework are Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and expertise that contribute to the efficient breakdown and execution of complex tasks within the multi-agent system. The Product Manager generates quadrant charts for applications, the Architect devises technical specifications and designs system architecture, the Project Manager assigns tasks to Engineers, the Engineer completes development tasks, and the QA Engineer ensures high-quality software through unit test code generation.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "communication between agent roles in MetaGPT"}
=== Function Output ==

When you run a multi-step query like this, you want to make sure that you can trace the sources.

Luckily, as in previous lessons, we can look at `response.source_nodes`. Take a look at the content of the first node:

In [72]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: data/pdfs/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-06-07
last_modified_date: 2024-06-07

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks,
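
To get an overview of every source instead of printing one node at a time, a small helper can collect the file and page of each node. This is a sketch that assumes each source node exposes a `metadata` dict with the `file_name` and `page_label` keys seen in the output above; `list_sources` and the stand-in nodes are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def list_sources(source_nodes: Iterable) -> List[Tuple[str, str]]:
    """Return unique (file_name, page_label) pairs in first-seen order."""
    seen: List[Tuple[str, str]] = []
    for node in source_nodes:
        meta = node.metadata  # assumed: a dict like the one printed above
        key = (meta.get("file_name", "?"), meta.get("page_label", "?"))
        if key not in seen:
            seen.append(key)
    return seen


# Stand-in nodes for illustration; with a real response you would pass
# response.source_nodes instead.
fake_nodes = [
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "1"}),
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "2"}),
    SimpleNamespace(metadata={"file_name": "metagpt.pdf", "page_label": "1"}),
]
print(list_sources(fake_nodes))
```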

![Full Agent Reasoning Loop](imgs/full_agent_reasoning_loop.png)



Calling `agent.query()` allows you to query the agent in a one-off manner, but does not preserve state. 

So now let's **try maintaining conversation history over time**.

The agent is able to maintain chats in a conversational memory buffer.

The memory module can be customized. By default it is a flat list of messages that acts as a rolling buffer, sized according to the context window of the LLM.

Therefore, when the agent decides to use a tool, it uses not only the current chat message but also the previous conversation history to decide on the next step or action.
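
A rolling buffer of this kind can be sketched in a few lines. This is not LlamaIndex's memory class: `RollingChatBuffer` is a hypothetical name, and token counts are approximated by counting whitespace-separated words, which is an assumption made purely for illustration.

```python
from typing import List, Tuple


class RollingChatBuffer:
    """Hypothetical rolling buffer: keeps the newest messages within a budget.

    Token counts are approximated by whitespace-separated words, which is an
    assumption made for this sketch, not how a real tokenizer counts.
    """

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: List[Tuple[str, str]] = []  # (role, content)

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def get(self) -> List[Tuple[str, str]]:
        """Return the most recent messages that fit in the token budget."""
        kept: List[Tuple[str, str]] = []
        used = 0
        for role, content in reversed(self.messages):
            cost = len(content.split())
            if used + cost > self.token_limit:
                break  # everything older than this is dropped from the window
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


memory = RollingChatBuffer(token_limit=15)
memory.put("user", "Tell me about the evaluation datasets used.")
memory.put("assistant", "HumanEval and MBPP.")
memory.put("user", "Tell me the results over one of the above datasets.")
window = memory.get()
print(window)
```

With this budget the oldest message falls out of the window while the assistant's answer stays, which is what lets a follow-up such as "one of the above datasets" still be resolved.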



In [73]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets in MetaGPT"}
=== Function Output ===
The evaluation datasets in MetaGPT include HumanEval and MBPP benchmarks, which assess functional accuracy and performance in code generation tasks.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval and MBPP benchmarks. These datasets are utilized to assess functional accuracy and performance in code generation tasks within the MetaGPT framework.


Now let's ask a follow-up question.

In [74]:
response = agent.chat(
    "Tell me the results over one of the above datasets."
)

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset", "page_numbers": ["7"]}
=== Function Output ===
MetaGPT achieved a Pass rate of 85.9% and 87.7% over the HumanEval dataset.
=== LLM Response ===
MetaGPT achieved a Pass rate of 85.9% and 87.7% over the HumanEval dataset.


### Low-level Interface

We just used a nice, high-level interface for interacting with an agent.

The next section shows capabilities that let you step through and control the agent in a much more granular fashion. This lets you not only build a higher-level research assistant over your RAG pipelines, but also debug and control it.

![Agent Control](imgs/agent_control.png)



Having this low-level agent interface is powerful for two main reasons:
1. **It gives agent developers greater transparency and visibility into what's actually going on under the hood**. Especially if the agent isn't working the first time around, you can go in, trace through the agent's execution, see where it is failing, and try out different inputs to see whether they steer the agent toward a correct response.

2. **It enables richer UXs when building a product experience around this core agentic capability**. For instance, say you want to listen to human feedback in the middle of agent execution, as opposed to only after the agent has completed a given task. You can imagine creating some sort of async queue that listens for human input throughout agent execution; if input comes in, you can interrupt and modify the agent's execution while it works through a larger task, instead of waiting until the task is complete.
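
The async-queue idea in the second point might look roughly like the sketch below. It uses only the standard library's `asyncio`; `run_with_feedback`, `demo`, and the step strings are hypothetical, and the "agent" is just a list of step descriptions rather than a real LlamaIndex agent.

```python
import asyncio
from typing import List


async def run_with_feedback(steps: List[str], inbox: asyncio.Queue) -> List[str]:
    """Run steps one by one, checking for human input before each step."""
    log: List[str] = []
    pending = list(steps)
    while pending:
        try:
            # Non-blocking check: did a human send guidance in the meantime?
            note = inbox.get_nowait()
            pending.insert(0, f"follow guidance: {note}")
        except asyncio.QueueEmpty:
            pass
        step = pending.pop(0)
        log.append(step)
        await asyncio.sleep(0)  # yield control, as a real tool call would
    return log


async def demo() -> List[str]:
    inbox: asyncio.Queue = asyncio.Queue()

    async def human() -> None:
        # Hypothetical human feedback arriving mid-execution.
        await inbox.put("also cover how agents share information")

    agent_task = asyncio.create_task(
        run_with_feedback(["summarize agent roles", "synthesize answer"], inbox)
    )
    human_task = asyncio.create_task(human())
    await human_task
    return await agent_task


log = asyncio.run(demo())
print(log)
```

Checking the queue between steps, rather than inside a step, keeps each step atomic. In a notebook where an event loop is already running, `asyncio.run` will likely need `nest_asyncio.apply()` first.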

In [90]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)
agent = AgentRunner(agent_worker)

Let's **start using the low-level API**. 

1. We'll first ***create a task object*** from the user query

2. And then we'll start running thru steps or ***even interjecting our own***. 

Let's start by creating the task:

In [91]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [92]:
task.task_id

'a7707354-71f2-462a-a423-e992f9d7c0af'

We created a task for this agent; `create_task` returns a **task object** which ***contains the input*** as well as ***additional state***.

Now let's try **executing a single step of this task**:

In [93]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and tasks assigned to them, contributing to the overall software development process. The Product Manager focuses on requirements analysis and documentation, the Architect designs the technical specifications and system architecture, the Project Manager breaks down tasks and allocates them, the Engineer implements the code based on specifications, and the QA Engineer ensures code quality through testing and bug fixing. These roles work together in a structured workflow to efficiently develop software solutions.


The agent executes a step of this task, identified by the task ID, and gives you back a step output.

It calls the summary tool with the input "agent roles in MetaGPT", which is the very first part of the question. And then it stops there.

When we inspect the logs and the output of the agent, we see that the first part was actually executed. So we call `agent.get_completed_steps()` with the *task_id*, and we're able to look at the number of completed steps for the task:


In [94]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task a7707354-71f2-462a-a423-e992f9d7c0af: 1
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and tasks assigned to them, contributing to the overall software development process. The Product Manager focuses on requirements analysis and documentation, the Architect designs the technical specifications and system architecture, the Project Manager breaks down tasks and allocates them, the Engineer implements the code based on specifications, and the QA Engineer ensures code quality through testing and bug fixing. These roles work together in a structured workflow to efficiently develop software solutions.


We see that one step has been completed and this is the current output so far.

We can also take a look at any upcoming steps for the agent thru `agent.get_upcoming_steps()` and passing the *task_id*.

In [95]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task a7707354-71f2-462a-a423-e992f9d7c0af: 1


TaskStep(task_id='a7707354-71f2-462a-a423-e992f9d7c0af', step_id='4c8c291a-1009-43ed-a9b0-de92e01de8f4', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

We see it's also 1, and we're able to look at a `TaskStep` object with a *task id* as well as an input field. This input is currently `None` because the agent **autogenerates the next action from the conversation history** and doesn't need any additional external input.

The nice thing about this debugging interface is that if you wanted to pause execution now, you can. You can take the intermediate results without completing the agent flow. 

Let's keep going and run the next two steps and actually try injecting user input:

In [96]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol that includes a shared message pool and a publish-subscribe mechanism. This system allows agents to exchange structured messages directly and subscribe to relevant information based on their roles. The shared message pool facilitates the exchange of task-related information efficiently, while the publish-subscribe mechanism ensures that agents receive only the necessary information, thus enhancing communication effectiveness within the multi-agent framework.


This question was not part of the original task query, but by injecting it you can modify the agent's execution to get back the result you want.

We see that the *user message* was added to **memory**, and that the next tool call is "how agents share information in MetaGPT".

The agent is able to give back a response here. The overall task is nearly complete: we just need to run one final step to synthesize the answer, and then double-check that this output is the last step:

In [97]:
# one final step to synthesize the answer
step_output = agent.run_step(task.task_id)

=== LLM Response ===
In MetaGPT, agents share information through a structured communication protocol that includes a shared message pool and a publish-subscribe mechanism. This system allows agents to exchange structured messages directly and subscribe to relevant information based on their roles. The shared message pool facilitates the exchange of task-related information efficiently, while the publish-subscribe mechanism ensures that agents receive only the necessary information, thus enhancing communication effectiveness within the multi-agent framework.


In [98]:
print(step_output.is_last)

True


We get back the answer, and this is the last step. To translate this into an **agent response**, similar to what we've seen in previous notebook cells, all you have to do is call:

In [99]:
response = agent.finalize_response(task.task_id)

In [100]:
print(str(response))

In MetaGPT, agents share information through a structured communication protocol that includes a shared message pool and a publish-subscribe mechanism. This system allows agents to exchange structured messages directly and subscribe to relevant information based on their roles. The shared message pool facilitates the exchange of task-related information efficiently, while the publish-subscribe mechanism ensures that agents receive only the necessary information, thus enhancing communication effectiveness within the multi-agent framework.
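
The cells above can be folded into one helper that steps a task until `is_last` is true and then finalizes it. This is a sketch that relies only on the `create_task`, `run_step`, and `finalize_response` calls used in this section; `run_task_stepwise` and `FakeAgent` are hypothetical names, and the fake agent exists only so the snippet runs without an LLM.

```python
from types import SimpleNamespace


def run_task_stepwise(agent, query: str, max_steps: int = 10):
    """Drive an agent one step at a time until it reports the last step."""
    task = agent.create_task(query)
    for step_number in range(1, max_steps + 1):
        step_output = agent.run_step(task.task_id)
        print(f"step {step_number}: is_last={step_output.is_last}")
        if step_output.is_last:
            return agent.finalize_response(task.task_id)
    raise RuntimeError(f"Task {task.task_id} not finished after {max_steps} steps")


class FakeAgent:
    """Stand-in with the same three methods, used only to exercise the helper."""

    def __init__(self, steps_needed: int) -> None:
        self.steps_needed = steps_needed
        self.steps_run = 0

    def create_task(self, query: str):
        return SimpleNamespace(task_id="demo-task", input=query)

    def run_step(self, task_id: str):
        self.steps_run += 1
        return SimpleNamespace(is_last=self.steps_run >= self.steps_needed)

    def finalize_response(self, task_id: str) -> str:
        return f"final answer after {self.steps_run} steps"


result = run_task_stepwise(FakeAgent(steps_needed=3), "demo query")
print(result)
```

With the real agent from this section you would call `run_task_stepwise(agent, "...")`; the loop body is also the natural place to inspect intermediate output or pass an `input=` to `run_step`.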


That's it! You've learned about: The high-level interface for an agent, as well as a low-level debugging interface.

In the next lesson, we'll learn how to build an agent over multiple documents.

## Building a Multi-Document Agent
***

### Three Document Agent

In the previous lesson, we:
* Built an agent that can reason over a single document.

* Answered complex questions over it while maintaining memory.

In this lesson **we will learn how to extend that agent to handle multiple documents in increasing degrees of complexity**.

We will start with a 3-document use case, then expand to an 11-document use case:



![Multi Doc Agent](imgs/multi_doc_agent.png)



Let's set up our OpenAI key and import `nest_asyncio`:

In [1]:
import os

# Accessing the environment variable
openai_api_key = os.getenv('OPENAI_API_KEY')

# Check if the variable is loaded properly
if openai_api_key is not None:
    print("OPENAI_API_KEY loaded successfully!")
else:
    print("Failed to load OPENAI_API_KEY. Please check if it is set correctly.")


OPENAI_API_KEY loaded successfully!


In [2]:
import nest_asyncio
nest_asyncio.apply()

The first task is to **set up our function calling agent over three papers.**

We do this by **combining the vector and summary tools for each document into a list, and passing it to the agent so that the agent has 6 tools in total**.

Let's download these 3 papers:

1. METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK
2. LONGLORA: EFFICIENT FINE-TUNING OF LONG-CONTEXT LARGE LANGUAGE MODELS
3. SELF-RAG: LEARNING TO RETRIEVE, GENERATE, AND CRITIQUE THROUGH SELF-REFLECTION

In [3]:
# Define your base directory
base_dir = 'data/pdfs'


urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf"
]

# Generate the full paths for each paper
papers = [os.path.join(base_dir, paper) for paper in papers]
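
The cell above only defines the URLs and target paths; it does not fetch anything. A minimal download sketch using the standard library is shown below. `download_papers` and its `fetch` parameter are hypothetical names, and whether the server accepts a plain `urlretrieve` request is an assumption.

```python
import os
from typing import Callable, List, Sequence
from urllib.request import urlretrieve


def download_papers(
    urls: Sequence[str],
    paths: Sequence[str],
    fetch: Callable[[str, str], object] = urlretrieve,
) -> List[str]:
    """Download each URL to its matching path, skipping files already on disk."""
    if len(urls) != len(paths):
        raise ValueError("urls and paths must have the same length")
    downloaded = []
    for url, path in zip(urls, paths):
        # Create the target directory (e.g. data/pdfs) if it is missing.
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        if not os.path.exists(path):
            fetch(url, path)
            downloaded.append(path)
    return downloaded


# In the notebook, after the cell above: download_papers(urls, papers)
```

Skipping files that already exist makes the cell safe to re-run, and the injectable `fetch` lets you swap in another downloader if the default is rejected.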

Next we will convert each paper into a tool.

In [24]:
# automatically builds a vector index tool + summary index tool 
# over a given paper

# Vector tool - performs vector search
# Summarization tool - performs summarization across entire document

from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: data/pdfs/metagpt.pdf
Getting tools for paper: data/pdfs/longlora.pdf
Getting tools for paper: data/pdfs/selfrag.pdf


In [25]:
# help(paper_to_tools_dict[papers[0]][0])

So for each paper we get back:
1. Vector tool
2. Summary tool

We put it into this overall dictionary, mapping each paper name to the vector tool and summary tool.

Next we'll simply get these tools in a flat list:

In [26]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [27]:
from llama_index.llms.openai import OpenAI

# define LLM of choice
llm = OpenAI(model="gpt-3.5-turbo")

Let's look at the tools being passed to the agent:

![Three Doc Agent](imgs/three_document_agent.png)

6 tools: We have **three papers**, **two tools** for each paper (vector and summary tool).



In [28]:
len(initial_tools)

6

Next step is to construct our overall **agent worker**. This agent worker includes the six tools as well as the LLM that we pass to it:

In [29]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True
)

agent = AgentRunner(agent_worker)

We are now able to **ask questions across these three documents, or over a single document**:

![Three Doc Agent Query](imgs/three_document_agent_query.png)

In [30]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the experiments mentioned in the context is the PG19 test split.
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation results"}
=== Function Output ===
The evaluation results include the perplexity values for different models and baselines on the proof-pile (Azerbayev et al., 2022) and PG19 datasets. The models achieve better perplexity with longer context sizes, indicating the effectiveness of the fine-tuning method. The perplexity decreases as the context size increases, with improvements observed when increasing the context window size. Additionally, the maximum context lengths that can be fine-tuned on a single 8 × A100 machine are extended for differ

We get back the answer: the PG19 test split was one of the eval datasets used, and then the agent is able to work out the eval results for the LongLoRA models.

The next question we can ask is:

In [31]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework designed to improve the quality and factuality of large language models by incorporating retrieval on demand and self-reflection mechanisms. It enables a single arbitrary LM to retrieve, generate, and critique text passages and its own outputs using reflection tokens. This approach allows the model to adjust its behavior during inference to better meet the requirements of different tasks. Experimental results have shown that Self-RAG surpasses other language models and retrieval-augmented models in various tasks, showcasing its effectiveness in enhancing performance, factuality, and citation accuracy.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is an efficient fine-tuning approach 

Try out some queries on your own by trying any combination of these papers and **ask for both summaries as well as specific information within the papers**, to see whether or not the agent is able to reason about the summary and vector tools within each document.

### Inspecting a Single Tool

Getting LlamaIndex Tool metadata

In [44]:
tool = paper_to_tools_dict[papers[0]][0]

tool.metadata

ToolMetadata(description='vector_tool_metagpt(query: str, page_numbers: Optional[List[str]] = None) -> str\nUse to answer questions over a given paper.\n\n        Useful if you have specific questions over the paper.\n        Always leave page_numbers as None UNLESS there is a specific page you want to search for.\n\n        Args:\n            query (str): the string query to be embedded.\n            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE\n                if we want to perform a vector search\n                over all pages. Otherwise, filter by the set of specified pages.\n\n        ', name='vector_tool_metagpt', fn_schema=<class 'pydantic.v1.main.vector_tool_metagpt'>, return_direct=False)

Using the Python `inspect` module to look at the function behind the tool:

In [32]:
import inspect

# Assuming paper_to_tools_dict and papers are defined

# Access the specific tool object
tool = paper_to_tools_dict[papers[0]][0]

# Access the function wrapped by the tool
function = tool.fn
function_name = function.__name__  # Get the name of the function
function_signature = inspect.signature(function)  # Get the signature of the function

# Access the metadata
metadata = tool.metadata
tool_name = getattr(metadata, 'name', 'No name provided')  # Access the name attribute from metadata
tool_description = getattr(metadata, 'description', 'No description provided')  # Access the description attribute from metadata

# Print the extracted information
print("Function Name:", function_name)
print("Function Signature:", function_signature)
print("Tool Name:", tool_name)
print("Tool Description:", tool_description)


Function Name: vector_query
Function Signature: (query: str, page_numbers: Optional[List[str]] = None) -> str
Tool Name: vector_tool_metagpt
Tool Description: vector_tool_metagpt(query: str, page_numbers: Optional[List[str]] = None) -> str
Use to answer questions over a given paper.

        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.

        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.

        


*Note*: The `inspect` module is part of the Python standard library and provides several useful functions for getting information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects.

In [33]:
# how `inspect` works in python
import inspect

def add(x: int, y: int) -> int:
    """Adds two integers and returns the result."""
    return x + y

# Retrieve the signature of the 'add' function
signature = inspect.signature(add)
print("Signature:", signature)

# Print detailed parameter information
for name, param in signature.parameters.items():
    print(f"Name: {name}, Type: {param.annotation}, Default: {param.default}")

# You can also access the return annotation
print("Return type:", signature.return_annotation)


Signature: (x: int, y: int) -> int
Name: x, Type: <class 'int'>, Default: <class 'inspect._empty'>
Name: y, Type: <class 'int'>, Default: <class 'inspect._empty'>
Return type: <class 'int'>


This output shows you how to call the function and what arguments it expects. The `inspect` module is very useful for developers who need to understand or debug code, especially when working with complex and dynamically created objects.

### Eleven Document Agent

Let's expand to a more advanced use case with 11 papers:

In [34]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]

# Generate the full paths for each paper
papers = [os.path.join(base_dir, paper) for paper in papers]

Similar to the previous section, we will now build a dictionary mapping each paper to its vector and summary tools (this can take a little while, since we need to process, index, and embed 11 documents).

In [35]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: data/pdfs/metagpt.pdf
Getting tools for paper: data/pdfs/longlora.pdf
Getting tools for paper: data/pdfs/loftq.pdf
Getting tools for paper: data/pdfs/swebench.pdf
Getting tools for paper: data/pdfs/selfrag.pdf
Getting tools for paper: data/pdfs/zipformer.pdf
Getting tools for paper: data/pdfs/values.pdf
Getting tools for paper: data/pdfs/finetune_fair_diffusion.pdf
Getting tools for paper: data/pdfs/knowledge_card.pdf
Getting tools for paper: data/pdfs/metra.pdf
Getting tools for paper: data/pdfs/vr_mcl.pdf


In [36]:
# collapse these tools into a flat list
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

Here we will **need a slightly more advanced agent and tool architecture**.

The issue: say we try to index all 11 papers, which now means 22 tools. Or we try to index 100 papers, 1,000 papers, and so on.

Even though LLM context windows are getting longer, stuffing too many tool choices into the LLM prompt leads to the following issues:

1. **Tools may not all fit in the prompt**, especially if your number of documents is large and you're modeling each document as a separate tool or set of tools.
2. **Cost and latency will spike**, because you're increasing the number of tokens in your prompt.
3. **The LLM can get confused** and may fail to pick the right tool when the number of choices is too large.

A *solution*: **when the user asks a query, we perform retrieval augmentation, not on the level of text, but on the level of tools.**

We first retrieve a small set of relevant tools, and then feed only those tools to the agent reasoning prompt, instead of all the tools. This retrieval process is similar to the retrieval process used in RAG. At its simplest it can just be a *top-k* vector search, but you can add any of the more advanced retrieval techniques you want to filter down to the relevant set of results.

The LlamaIndex agents let you plug in a **tool retriever**, that allows you to accomplish exactly this.
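
To make the idea concrete, here is a toy top-k retrieval over tool descriptions. It is only a sketch: a bag-of-words count stands in for a real embedding model, and `bag_of_words`, `cosine`, `retrieve_tools`, and the `descriptions` dictionary are hypothetical, not LlamaIndex APIs.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().replace("-", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, tool_descriptions: Dict[str, str], top_k: int) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(query_vec, bag_of_words(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical descriptions modeled on the summary tools in this lesson.
descriptions = {
    "summary_tool_metagpt": "Useful for summarization questions related to metagpt",
    "summary_tool_longlora": "Useful for summarization questions related to longlora",
    "summary_tool_swebench": "Useful for summarization questions related to swebench",
    "summary_tool_metra": "Useful for summarization questions related to metra",
}
selected = retrieve_tools("Compare MetaGPT and SWE-Bench", descriptions, top_k=2)
print(selected)
```

Only the `selected` tools would then be placed in the agent's prompt, so prompt size depends on `top_k` rather than on the total number of tools.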


![Tool RAG 1](imgs/tool_rag_1.png)
![Tool RAG 2](imgs/tool_rag_2.png)
![Tool RAG 3](imgs/tool_rag_3.png)

Let's show you how to get this done. First we will want to **index the tools**.

LlamaIndex has extensive indexing capabilities over general text objects, but since these tools are Python objects, we need a way to convert and serialize these objects into a string representation and back.

This is all done through the `ObjectIndex` abstraction:

In [45]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex # standard interface for indexing text 
from llama_index.core.objects import ObjectIndex

# directly plug in these python tools as input into the index
obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

We wrap the standard `VectorStoreIndex` with `ObjectIndex`. To construct an object index, we directly plug in these Python tools as input to the index.

You can retrieve from an object index using an **object retriever**. This will call the underlying retriever from the index, and return the output directly as objects. In this case, the objects are tools.

In [46]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

Now let's walk thru a very simple example:

In [47]:
tools = obj_retriever.retrieve(
    "Tell me about eval dataset used in MetaGPT and SWE-Bench"
)

In [40]:
# first tool in this list

tools[0].metadata

ToolMetadata(description='Useful for summarization questions related to metagpt', name='summary_tool_metagpt', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

We directly received a set of tools, and the first one is the **summary tool for metagpt**.

In [41]:
# second tool in this list

tools[1].metadata

ToolMetadata(description='Useful for summarization questions related to metra', name='summary_tool_metra', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [42]:
# third tool in this list

tools[2].metadata

ToolMetadata(description='Useful for summarization questions related to swebench', name='summary_tool_swebench', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

**Of course, the quality of the retriever depends on your embedding model.**
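
One lightweight way to check retrieval quality is to compare the retrieved tool names against the tools you expected for a query. The sketch below assumes only that each tool exposes `metadata.name`, as in the outputs above; `missing_tools` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List


def missing_tools(expected_names: Iterable[str], retrieved_tools: Iterable) -> List[str]:
    """Return the expected tool names that the retriever did not return."""
    retrieved = {tool.metadata.name for tool in retrieved_tools}
    return [name for name in expected_names if name not in retrieved]


# Stand-ins mirroring the three tools retrieved above; with the real
# retriever you would pass the `tools` list instead.
fake_tools = [
    SimpleNamespace(metadata=SimpleNamespace(name=n))
    for n in ("summary_tool_metagpt", "summary_tool_metra", "summary_tool_swebench")
]
print(missing_tools(["summary_tool_metagpt", "summary_tool_swebench"], fake_tools))
print(missing_tools(["vector_tool_metagpt"], fake_tools))
```

An empty list means every expected tool was retrieved. In the example above, both relevant summary tools are present even though an unrelated tool (`summary_tool_metra`) also made the top 3.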

Now we are **ready to set up our function calling agent**.

The setup is pretty similar to the last lesson; however, as an additional feature, we show that you can add a system prompt to the agent if you want.

***(Optional)*** **Helps provide additional guidance if you want to prompt the agent to output things in a certain way.**

**Useful if you want it \[agent\] to take into account certain factors when it reasons over these tools.**

In [48]:
# Example of system prompt
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [49]:
# comparison queries
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is the SoftwareDev dataset, which consists of 70 diverse software development tasks. The dataset includes tasks such as creating games like Snake, Brick Breaker, 2048, Flappy Bird, and Tank Battle, as well as tasks involving Excel data processing, CRUD management, music transcription, custom press releases, Gomoku game implementation, weather dashboard creation, and more. The dataset prompts users to develop various software applications and systems, providing a range of challenges to assess MetaGPT's performance in generating executable code for different types of tasks.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-B

### Compare and Contrast Two LoRA Papers

This is the final example. Let's compare and contrast the two LoRA papers, `LongLoRA` and `LoftQ`, asking the agent to analyze the approach in each paper first.

In [52]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is an efficient fine-tuning method that extends the context length of large language models while minimizing GPU memory cost and training time compared to standard full fine-tuning. It introduces S2-Attn to approximate the standard self-attention pattern during training, allowing for improved efficiency and performance in extending the context length of language models.
=== Calling Function ===
Calling function: summary_tool_loftq with args: {"input": "LoftQ"}
=== Function Output ===
LoftQ is a quantization framework designed for Large Language Models (LLMs) that simultaneously applies quantization and low-rank approximation to the original high-precision pre-trained weights. This approach aims to provide an initialization for sub

The first step the agent takes is to use this input task to retrieve the set of tools that help it fulfill the task. Through the `object retriever`, the **hope is that it retrieves the LongLoRA and LoftQ query tools to help it build its response**.

If we take a look at the intermediate outputs of the agent, we see **it does have access to the relevant tools** from LongLoRA and LoftQ.

It first calls `summary_tool_longlora with args: {"input": "LongLoRA"}` and gets back a summary.

Then it calls `summary_tool_loftq with args: {"input": "LoftQ"}` and gets back a summary.

The final LLM response compares the two approaches by combining the responses from these two tools to synthesize an answer that satisfies the user query.

We can now build agents over multiple documents, enabling us to build more general, complex, context-augmented research assistants that can answer complex questions.

## Conclusion
***

Complete!

We learned about:

* Agentic RAG: starting from building a router agent, to tool calling, to building your own agent that can reason not just over a single document, but over multiple documents.

* If you want to build custom agents, refer to these resources. 

* **Custom agents**: https://docs.llamaindex.ai/en/stable/examples/agent/custom_agent/


* **Community-built Agents (LlamaHub)**: https://llamahub.ai/?tab=agent


* **Advanced document parsing with LlamaParse**: https://cloud.llamaindex.ai

Build cool stuff with **agentic RAG**!