# Agents with LlamaIndex I - Data Agents

Sources [1](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/), [2](https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/agents/), [3](https://docs.llamaindex.ai/en/stable/examples/agent/custom_agent/), [4](https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent/), [5](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/agent_runner/), [6](https://medium.com/llamaindex-blog/data-agents-eed797d7972f), [7](https://akash-mathur.medium.com/advanced-rag-query-augmentation-for-next-level-search-using-llamaindex-d362fed7ecc3)    

Data Agents, powered by LLMs, are knowledge workers within LlamaIndex, designed to interact with various types of data. They can handle both unstructured and structured data, extending what traditional query/chat engines can do.

![](https://miro.medium.com/v2/resize:fit:1000/format:webp/1*cWwW01Ez_JIS2hcMJwcV8Q.png)  

Here is a high-level overview:

+ Functionality: Data Agents can autonomously conduct searches and retrieve information across unstructured, semi-structured, and structured data. They are not limited to just reading data; they can also write, modify, and store information by integrating with external service APIs.

+ Dynamic Interaction: Unlike static query/chat engines, Data Agents can dynamically interact with data sources. They can ingest new data and adapt based on the information they process, offering a more flexible and responsive approach to data management.

Building a data agent requires the following core components:

1. Reasoning loop
2. Tool abstractions

![](https://miro.medium.com/v2/resize:fit:1000/format:webp/1*WPOS7tiljXCrd3IkJy84CQ.png)  

A data agent is initialized with a set of APIs, or Tools, to interact with; the agent can call these APIs to return information or modify state. Given an input task, the data agent uses a reasoning loop to decide which tools to use, in which sequence, and with which parameters.
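
The interplay between the reasoning loop and the tools can be sketched in plain Python. This is only an illustration, not LlamaIndex's implementation: `TOOLS`, `scripted_planner`, and `run_agent` are hypothetical names, and the planner is a hard-coded stand-in for the LLM that would normally decide which tool to call next.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: plain Python functions keyed by name.
TOOLS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda x, y: x + y,
    "multiply": lambda x, y: x * y,
}

Step = Tuple[str, Tuple[int, int]]


def scripted_planner(task: str, history: List[Tuple[str, int]]) -> Optional[Step]:
    """Stand-in for the LLM: return the next (tool, args), or None when done."""
    if len(history) == 0:
        return ("add", (2, 3))
    if len(history) == 1:
        previous_result = history[-1][1]  # reuse the output of the first tool call
        return ("multiply", (previous_result, 4))
    return None


def run_agent(task: str, max_steps: int = 5) -> List[Tuple[str, int]]:
    """Loop: ask the planner for a tool call, execute it, record the result."""
    history: List[Tuple[str, int]] = []
    for _ in range(max_steps):
        decision = scripted_planner(task, history)
        if decision is None:  # the planner decided the task is finished
            break
        tool_name, args = decision
        result = TOOLS[tool_name](*args)
        history.append((tool_name, result))
    return history


steps = run_agent("Add 2 and 3, then multiply the result by 4")
print(steps)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since an LLM-driven planner is not guaranteed to terminate on its own.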

---

1. Reasoning Loop - The reasoning loop depends on the type of agent, as follows:
    + ReAct agent (works across any chat/text completion endpoint). - [Video](https://www.youtube.com/watch?v=pRUc6JPw6CY), [Code](https://colab.research.google.com/drive/1XYNaGvEdyKVbs4g_Maffyq08DUArcW8H?usp=sharing)  
    + Function Calling Agents (integrates with any function calling LLM) - [Video](https://www.youtube.com/watch?v=6INvyrC4WrA), [Code](https://colab.research.google.com/drive/1GyPRMiwxS7rKxKpRt4r-ckYfmAw2GxdQ?usp=sharing)  
    + "Advanced Agents": 
        + Retrieval Augmented Function Calling Agent - [Video](https://www.youtube.com/watch?v=K7h17Jjtbzg), [Code](https://colab.research.google.com/drive/1R41zIhVybCNqg67eVEPuyLeMp_HYwTlA?usp=sharing)  
        + Controlling Agent Reasoning Loop - [Video](https://www.youtube.com/watch?v=gFRbkRtLGZQ), [Code](https://colab.research.google.com/drive/1c5ORIlqs3YMWosDSMgs6_ZHb5eiANS1c?usp=sharing)  
        + StepWise Controllable Agent - [Video](https://www.youtube.com/watch?v=JGkSxdPFgyQ), [Code](https://colab.research.google.com/drive/1x-CR_KA7LzhPhLdITycojAJVamEaKsaO?usp=sharing)  
        + [LLMCompiler](https://llamahub.ai/l/llama-packs/llama-index-packs-agents-llm-compiler?from=)  
        + [Chain-of-Abstraction](https://llamahub.ai/l/llama-packs/llama-index-packs-agents-coa?from=)  
        + [Language Agent Tree Search](https://llamahub.ai/l/llama-packs/llama-index-packs-agents-lats?from=)  
        + and more...

---

2. Tool Abstractions - At their core, tool abstractions allow for a structured way to define how Data Agents can interact with data or services. Unlike typical APIs designed for human users, these tools are optimized for automated interactions, enabling agents to execute tasks with precision and efficiency.

Types of Tools:

+ FunctionTool - A function tool allows users to easily convert any user-defined function into a Tool. It can also auto-infer the function schema.
+ QueryEngineTool - A tool that wraps an existing query engine. 
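
To make "auto-infer the function schema" more concrete, here is a minimal sketch of how a schema could be derived from a function's signature and docstring with the standard `inspect` module. The helper `infer_schema` and its output layout are assumptions made for illustration; this is not how `FunctionTool` is implemented internally.

```python
import inspect
from typing import Any, Callable, Dict


def infer_schema(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a small tool description from a function's signature and docstring."""
    signature = inspect.signature(fn)
    parameters = {}
    for name, param in signature.parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            parameters[name] = "Any"  # no type hint available
        else:
            # Fall back to str() for annotations without a __name__ (e.g. List[str])
            parameters[name] = getattr(annotation, "__name__", str(annotation))
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": parameters,
    }


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


schema = infer_schema(add)
print(schema)
```

The function name, docstring, and type hints are what an LLM sees when it decides whether and how to call a tool, which is why clear docstrings matter.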

[LlamaHub Tools](https://llamahub.ai/?tab=tools)  
    
--- 

## Installing Packages

In [None]:
!pip install -q openai
!pip install -q llama-index
!pip install -q llama-index-experimental
!pip install -qU llama-index-llms-openai
!pip install -q pypdf
!pip install -q docx2txt

## Importing Packages

In [None]:
import os
import openai

#os.environ["OPENAI_API_KEY"] = "<the key>"
openai.api_key = os.environ["OPENAI_API_KEY"]

import sys
import shutil
import glob
import logging
from pathlib import Path
from IPython.display import Image

import warnings
warnings.filterwarnings('ignore')

import pandas as pd

## Llamaindex LLMs
from llama_index.llms.openai import OpenAI

## Llamaindex readers
from llama_index.core import SimpleDirectoryReader

## LlamaIndex Index Types
from llama_index.core import ListIndex
from llama_index.core import VectorStoreIndex
from llama_index.core import TreeIndex
from llama_index.core import KeywordTableIndex
from llama_index.core import SimpleKeywordTableIndex
from llama_index.core import DocumentSummaryIndex
from llama_index.core import SummaryIndex
from llama_index.core import KnowledgeGraphIndex
from llama_index.experimental.query_engine import PandasQueryEngine

## LlamaIndex Context Managers
from llama_index.core import StorageContext
from llama_index.core import load_index_from_storage
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import Node
from llama_index.core import Settings

## LlamaIndex Templates
from llama_index.core.prompts import PromptTemplate
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.base.llms.types import ChatMessage, MessageRole

## LlamaIndex Agents
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent

## LlamaIndex Callbacks
from llama_index.core.callbacks import CallbackManager
from llama_index.core.callbacks import LlamaDebugHandler

import nest_asyncio
nest_asyncio.apply()

In [None]:
import logging

#logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# 1st. Example - [Router Engine](https://docs.llamaindex.ai/en/stable/module_guides/querying/router/)  

![](https://miro.medium.com/v2/resize:fit:1000/1*pwwxzWABv4cJDWonJ7LImg.png)

A Router Query Engine serves as a powerful decision-making module that plays a crucial role in selecting the most appropriate choices based on user queries and metadata-defined options. These routers are versatile modules that can operate independently as “selector modules” or can be utilized as query engines or retrievers on top of other query engines or retrievers.

Routers excel in various use cases, including selecting the appropriate data source from a diverse range of options and deciding whether to perform summarization or semantic search based on the user query. They can also handle more complex tasks like trying out multiple choices simultaneously and combining the results using multi-routing capabilities.

The choice itself is made by a “selector”. Users can employ routers as query engines or retrievers, with the router taking responsibility for selecting the underlying query engine or retriever for each user query.

Steps:  
+ The source document is split into sentence-based chunks with a fixed chunk size
+ Create summary and vector indexes over the chunks
+ Obtain a query engine from each index
+ Wrap each query engine as a tool and bind the tools to the router
+ The router chooses a tool based on the query and the tool descriptions
+ Query the router.
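
The routing idea in the steps above can be sketched without any LLM. In this toy version, which is an assumption-laden illustration and not `RouterQueryEngine` itself, a keyword-overlap selector stands in for the LLM selector; `ToyTool`, `keyword_selector`, and `route` are hypothetical names, and the "engines" just return a label.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ToyTool:
    name: str
    description: str
    run: Callable[[str], str]


def words(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_selector(query: str, tools: List[ToyTool]) -> ToyTool:
    """Pick the tool whose description shares the most words with the query."""
    query_words = words(query)
    return max(tools, key=lambda t: len(query_words & words(t.description)))


def route(query: str, tools: List[ToyTool]) -> str:
    """Select one tool for the query and run it."""
    return keyword_selector(query, tools).run(query)


toy_tools = [
    ToyTool("summary", "Summary of the whole document", lambda q: "summary engine"),
    ToyTool(
        "vector",
        "Specific details such as how agents share information",
        lambda q: "vector engine",
    ),
]

print(route("What is the summary of the document?", toy_tools))
print(route("How do agents share information with other agents?", toy_tools))
```

A real selector replaces the word-overlap score with an LLM call over the tool descriptions, so the quality of those descriptions directly affects routing.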

#### Defining Models

In [None]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

#model="gpt-4o"
model="gpt-4o-mini"

Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.llm = OpenAI(temperature=0, 
                      model=model, 
                      #max_tokens=512,
                      # Extra OpenAI request parameters are passed through additional_kwargs
                      additional_kwargs={"presence_penalty": -2, "top_p": 1},
                     )

#### Defining Folders

In [None]:
DOCS_DIR = "../../Data/"
PERSIST_DIR = "../../Index/"

print(f"Current dir: {os.getcwd()}")

if not os.path.exists(DOCS_DIR):
    os.makedirs(DOCS_DIR)  # also creates missing parent folders
docs = sorted(os.listdir(DOCS_DIR))
print(f"Files in {DOCS_DIR}")
for doc in docs:
    print(doc)

## Load Data  
A single file

In [None]:
#### To download the paper, run:
#!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O ../../Data/metagpt.pdf

In [None]:
documents = SimpleDirectoryReader(input_files=[f"{DOCS_DIR}metagpt.pdf"]).load_data()

In [None]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [None]:
print(len(nodes))
nodes[0]

## Define Summary Index and Vector Index over the Same Data

In [None]:
#from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

## Define Query Engines Tools and Set Metadata for each Tool

In [None]:
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize", 
                                                     use_async=True,)

vector_query_engine = vector_index.as_query_engine(similarity_top_k=3,
                                                   retriever_mode="embedding",
                                                   response_mode="compact",
                                                   verbose=True)

In [None]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=("Useful for summarization questions related to MetaGPT"),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=("Useful for retrieving specific context from the MetaGPT paper."),
)

## Define Router Query Engine to use Tools

In [None]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool,],
    verbose=True
)

## Testing

In [None]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

In [None]:
print(len(response.source_nodes))

In [None]:
response = query_engine.query("How do agents share information with other agents?")
print(str(response))

In [None]:
print(len(response.source_nodes))

# 2nd. Example - [Sub Question Query Engine](https://docs.llamaindex.ai/en/stable/examples/query_engine/sub_question_query_engine/)  

[Source data](https://docs.llamaindex.ai/en/latest/examples/usecases/10k_sub_question/)

Normal query engines are designed to locate relevant information within vast datasets. They act as intermediaries between users’ questions and stored data. When a user poses a query, the engine carefully analyzes it, pinpoints relevant data, and presents a comprehensive response.  

While traditional query engines excel at straightforward questions, they often face challenges when confronted with multi-faceted questions spanning multiple documents. Simply merging documents and extracting top k elements frequently fails to capture the nuances required for truly informative responses.  

Decomposition Strategy: To address this complexity, Sub-Question Query Engines adopt a divide-and-conquer approach. They decompose a complex query into a series of sub-questions, each targeting a specific aspect of the original inquiry.  

The implementation defines a query engine for each data source and wraps each one as a tool. Instead of treating all documents equally, a top-level Sub-Question Query Engine sends each sub-question to the data source it concerns and then synthesizes the final response from the individual answers.  

Given the initial complex question, an LLM generates the sub-questions, each sub-question is executed against its selected data source, and the engine gathers all sub-responses to synthesize the final response.  

Because each sub-question is answered by the query engine for its own data source, the individual answers are more precise, and aggregating them yields a more complete response than a single top-k retrieval over merged documents.  
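
The decompose, execute, and gather flow described above can be sketched in plain Python. Everything here is a stand-in chosen for illustration: `ENGINES` returns placeholder strings instead of querying real indexes, `decompose` is a trivial rule instead of an LLM question generator, and joining the sub-answers replaces the LLM synthesis step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical per-source engines; each returns a placeholder string.
ENGINES: Dict[str, Callable[[str], str]] = {
    "lyft": lambda q: f"[lyft] answer to: {q}",
    "uber": lambda q: f"[uber] answer to: {q}",
}


def decompose(query: str, sources: List[str]) -> List[Tuple[str, str]]:
    """Stand-in for the LLM: emit one sub-question per source named in the query."""
    return [
        (source, f"{query} (restricted to {source})")
        for source in sources
        if source in query.lower()
    ]


def answer(query: str) -> str:
    """Run each sub-question on its own source, then combine the sub-answers."""
    sub_questions = decompose(query, list(ENGINES))
    sub_answers = [ENGINES[source](question) for source, question in sub_questions]
    return "\n".join(sub_answers)  # a real engine would synthesize with an LLM


sub_questions = decompose("Compare revenue growth of Uber and Lyft", list(ENGINES))
final = answer("Compare revenue growth of Uber and Lyft")
print(final)
```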

In [None]:
#from llama_index.core.tools import QueryEngineTool
from llama_index.core.tools import ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

#### Load data

In [None]:
lyft_docs = SimpleDirectoryReader(input_files=[f"{DOCS_DIR}lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=[f"{DOCS_DIR}uber_2021.pdf"]).load_data()

#### Build indices

In [None]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)

#### Build query engines

In [None]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3,
                                         retriever_mode="embedding",
                                         response_mode="compact",
                                         verbose=True)

uber_engine = uber_index.as_query_engine(similarity_top_k=3,
                                         retriever_mode="embedding",
                                         response_mode="compact",
                                         verbose=True)

In [None]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)

#### Run queries

In [None]:
response = s_engine.query("Compare and contrast the customer segments and geographies that grew the fastest")

In [None]:
print(response)

In [None]:
response = s_engine.query("Compare revenue growth of Uber and Lyft from 2020 to 2021")

In [None]:
print(response)

# 3rd. Example - [Tool Calling](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/)  


![](https://miro.medium.com/v2/resize:fit:720/1*WPOS7tiljXCrd3IkJy84CQ.png)

## Define Simple Tools

In [None]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def subtract(x: int, y: int) -> int: 
    """Subtract the second number from the first number."""
    return (x - y)

def multiply(x: int, y: int) -> int: 
    """Multiply one number by the other number."""
    return (x * y)

def uppercase(x: str) -> str: 
    """Return the input string in uppercase."""
    return (x.upper())

add_tool = FunctionTool.from_defaults(fn=add)
subtract_tool = FunctionTool.from_defaults(fn=subtract)
multiply_tool = FunctionTool.from_defaults(fn=multiply)
uppercase_tool = FunctionTool.from_defaults(fn=uppercase)

In [None]:
#llm = Settings.llm
#llm = OpenAI(model="gpt-3.5-turbo")
llm = OpenAI(model="gpt-4o")   #Sometimes GPT4o-mini does not work well with ill defined tools

response = llm.predict_and_call(
    [add_tool, subtract_tool, multiply_tool, uppercase_tool], 
    "Tell me the output of 3 - 12 ", 
    verbose=True
)
print(str(response))

In [None]:
response = llm.predict_and_call(
    [add_tool, subtract_tool, multiply_tool, uppercase_tool], 
    "Write ```This phrase``` in uppercase", 
    verbose=True
)
print(str(response))

#### Now that you understand the mechanism behind simple tools:  

## Define an Auto-Retrieval Tool using metadata as filter

In [None]:
## Uncomment if not run before

#import nest_asyncio
#nest_asyncio.apply()
#from llama_index.core import SimpleDirectoryReader
#documents = SimpleDirectoryReader(input_files=[f"{DOCS_DIR}metagpt.pdf"]).load_data()
#from llama_index.core.node_parser import SentenceSplitter
#splitter = SentenceSplitter(chunk_size=1024)
#nodes = splitter.get_nodes_from_documents(documents)

### Examining the metadata that was automatically added to the document nodes:

In [None]:
print(nodes[0].get_content(metadata_mode="all"))

In [None]:
## Uncomment if not run before

#from llama_index.core import VectorStoreIndex
#vector_index = VectorStoreIndex(nodes)
#query_engine = vector_index.as_query_engine(similarity_top_k=2)

### Defining Metadata Filters
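
Before applying filters to the index, it helps to see what combining several key/value filters means. The sketch below is a pure-Python illustration of AND versus OR matching over node metadata; `matches` and `nodes_meta` are hypothetical, and treating an empty filter list as "match everything" is an assumption of this sketch rather than a statement about any particular vector store.

```python
from typing import Dict, List

# Hypothetical metadata, shaped like what a PDF reader attaches to each node.
nodes_meta: List[Dict[str, str]] = [
    {"page_label": "1", "file_path": "a.pdf"},
    {"page_label": "2", "file_path": "a.pdf"},
    {"page_label": "2", "file_path": "b.pdf"},
]


def matches(meta: Dict[str, str], filters: List[Dict[str, str]], condition: str = "and") -> bool:
    """Return True if the metadata satisfies the filters under AND or OR."""
    if not filters:  # assumption: no filters means no restriction
        return True
    checks = [meta.get(f["key"]) == f["value"] for f in filters]
    return all(checks) if condition == "and" else any(checks)


filters = [
    {"key": "page_label", "value": "2"},
    {"key": "file_path", "value": "a.pdf"},
]

and_hits = [m for m in nodes_meta if matches(m, filters, "and")]
or_hits = [m for m in nodes_meta if matches(m, filters, "or")]
print(len(and_hits), len(or_hits))
```

With AND, only page 2 of `a.pdf` survives; with OR, any node on page 2 or any node from `a.pdf` does. Filtering several pages of one document therefore needs OR across the page filters.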

In [None]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"},
            {"key": "file_path", "value": "../../Data/metagpt.pdf"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?", 
)

print(str(response))

In [None]:
for n in response.source_nodes:
    print(n.metadata)

## Define an Auto-Retrieval Tool

In [None]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_query(query: str, page_numbers: List[str]) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
    over all pages. Otherwise, filter by the set of specified pages.
    
    """

    metadata_dicts = [{"key": "page_label", "value": p} for p in page_numbers]
    
    query_engine = vector_index.as_query_engine(similarity_top_k=2,
                                                filters=MetadataFilters.from_dicts(
                                                    metadata_dicts,
                                                    condition=FilterCondition.OR
                                                )
                                               )
    response = query_engine.query(query)
    return response

vector_query_tool = FunctionTool.from_defaults(name="vector_tool", fn=vector_query)

In [None]:
response = llm.predict_and_call(
    [vector_query_tool], 
    "What are the high-level results of MetaGPT as described on page 2?", 
    verbose=True
)

In [None]:
for n in response.source_nodes:
    print(n.metadata)

## Adding some other Data tools!

In [None]:
#from llama_index.core import SummaryIndex
#from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of the paper"
    ),
)

#### Creating a query to trigger the vector query tool:

In [None]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What are the MetaGPT comparisons with ChatDev described on page 8?", 
    verbose=True
)

In [None]:
for n in response.source_nodes:
    print(n.metadata)

#### Creating a query to trigger the summary tool:

In [None]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What is a summary of the paper?", 
    verbose=True
)

In [None]:
for n in response.source_nodes:
    print(n.metadata)

# 4th. Example - [Building an Agent Reasoning Loop](https://docs.llamaindex.ai/en/latest/examples/agent/return_direct_agent/) with memory  
### The agent reasoning loop keeps the session alive and works interactively  

In [None]:
Image(filename="../../Imgs/20240516_201241.png")

In [None]:
Image(filename="../../Imgs/20240516_201558.png")

In [None]:
## Uncomment if not run before
## vector_tool, summary_tool as defined before
#import nest_asyncio
#nest_asyncio.apply()

In [None]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

#### Let's make a two-step query

In [None]:
response = agent.query(
    "What are the MetaGPT comparisons with ChatDev described on page 8, "
    "and how do agents communicate with other agents?"
)

In [None]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

In [None]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

In [None]:
response = agent.chat("Tell me the results over one of the above datasets.")

## Lower-Level: Debuggability and Control

In [None]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [None]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [None]:
step_output = agent.run_step(task.task_id)

In [None]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

In [None]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

### Inserting a new step in the task

In [None]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

In [None]:
print(step_output.is_last)
#step_output = agent.run_step(task.task_id)   ##Error

In [None]:
response = agent.finalize_response(task.task_id)

In [None]:
print(str(response))

# 5th. Example - [Building a Multi-Document Agent](https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents/)

In [None]:
Image(filename="../../Imgs/20240516_230659.png") 

## Set up a Multi-Document Agent over 3 papers

In [None]:
## Uncomment if not run before
#import nest_asyncio
#nest_asyncio.apply()
#from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
#from llama_index.core.node_parser import SentenceSplitter
#from llama_index.core.tools import FunctionTool, QueryEngineTool
#from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from typing import List, Optional
from pathlib import Path

In [None]:
def get_doc_tools(file_path: str, name: str) -> tuple:
    """Get vector query and summary query tools from a document."""

    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)
    
    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        """Use to answer questions over a given paper.
    
        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.
    
        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE 
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.
        
        """
    
        page_numbers = page_numbers or []
        metadata_dicts = [{"key": "page_label", "value": p} for p in page_numbers]
        query_engine = vector_index.as_query_engine(similarity_top_k=2,
                                                    filters=MetadataFilters.from_dicts(metadata_dicts,
                                                                                       condition=FilterCondition.OR)
                                                   )
        response = query_engine.query(query)
        return response
    
    vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}", fn=vector_query)
    
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
                                                         use_async=True,)
    summary_tool = QueryEngineTool.from_defaults(name=f"summary_tool_{name}",
                                                 query_engine=summary_query_engine,
                                                 description=(f"Useful for summarization questions related to {name}"),)

    return vector_query_tool, summary_tool

In [None]:
docs = sorted(os.listdir(DOCS_DIR))
print(f"Files in {DOCS_DIR}")
for doc in docs:
    print(doc)

In [None]:
papers = [
    "../../Data/metagpt.pdf",
    "../../Data/longlora.pdf",
    "../../Data/selfrag.pdf",
]

In [None]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

In [None]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
print(len(initial_tools))

In [None]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [None]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

In [None]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

## Set up a Multi-Document Agent over 10 papers

In [None]:
papers = [
    "../../Data/metagpt.pdf",
    "../../Data/longlora.pdf",
    "../../Data/loftq.pdf",
    "../../Data/swebench.pdf",
    "../../Data/selfrag.pdf",
    "../../Data/zipformer.pdf",
    "../../Data/values.pdf",
    "../../Data/knowledge_card.pdf",
    "../../Data/metra.pdf",
    "../../Data/vr_mcl.pdf"
]

In [None]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

## Extend the Agent with Tool Retrieval

In [None]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

### Define an "object" index and retriever over these tools

In [None]:
#from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [None]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

### The tools will be chosen by similarity
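
"Chosen by similarity" means the query is compared against each tool's description and only the closest tools are handed to the agent. The toy sketch below uses bag-of-words cosine similarity in place of real embeddings; `tool_descriptions`, `embed`, `cosine`, and `retrieve_tools` are hypothetical names, and a production setup would use an embedding model and a vector index instead.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical tool descriptions, in the style produced by get_doc_tools.
tool_descriptions: Dict[str, str] = {
    "summary_tool_metagpt": "Useful for summarization questions related to metagpt",
    "summary_tool_swebench": "Useful for summarization questions related to swebench",
    "summary_tool_longlora": "Useful for summarization questions related to longlora",
}


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts of the lowercased text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, top_k: int = 2) -> List[str]:
    """Rank tools by similarity between the query and each description."""
    query_vec = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(query_vec, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


selected = retrieve_tools("Tell me about the eval dataset used in metagpt and swebench")
print(selected)
```

Because only the top `top_k` tools reach the LLM, tool retrieval keeps the prompt small even when the agent has access to dozens of tools.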

In [None]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [None]:
for t in tools:
    print(t.metadata)

In [None]:
#from llama_index.core.agent import FunctionCallingAgentWorker
#from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [None]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

In [None]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)