# LlamaIndex: Building a Multi-Document Agent

Below are experiments in agent reasoning over multiple documents.

For more details, see this [DeepLearning.AI](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/nfa5y/building-a-multi-document-agent) course.

### Setup Environment

In [1]:
from dotenv import load_dotenv
import nest_asyncio
import llama_index
import os
import httpx
from pathlib import Path
import textwrap 

from llama_index.llms.openai import OpenAI
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

from tools_util import get_doc_tools


load_dotenv()
nest_asyncio.apply()

llm = OpenAI(model="o4-mini", temperature=0)

print(llama_index.core.__version__)

0.12.36


### Load Data



In [3]:
files = ['https://arxiv.org/pdf/2505.10543', 'https://arxiv.org/pdf/2505.11423', 'https://arxiv.org/pdf/2505.13259' ]

os.makedirs('./data', exist_ok=True)

papers = []

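for url in files:
    file_name = Path(url).name  # e.g. "2505.10543"
    file_path = f"./data/{file_name}.pdf"
    if not os.path.exists(file_path):
        # Download once; follow redirects and fail loudly on HTTP errors.
        r = httpx.get(url, timeout=20, follow_redirects=True)
        r.raise_for_status()
        # Use a separate name for the file handle so the loop variable is not shadowed.
        with open(file_path, 'wb') as fh:
            fh.write(r.content)
    # The first element is later used in tool names, so replace the dot with an underscore.
    papers.append((file_name.replace(".", "_"), file_path))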
print(papers)

[('2505_10543', './data/2505.10543.pdf'), ('2505_11423', './data/2505.11423.pdf'), ('2505_13259', './data/2505.13259.pdf')]
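
The loop above replaces the dot in each arXiv identifier with an underscore before the name is used to build tool names. A minimal sketch of why that matters follows. It assumes that function-calling tool names may contain only letters, digits, underscores, and hyphens, up to 64 characters. The helper `paper_name_and_path` and the pattern `TOOL_NAME_PATTERN` are hypothetical names introduced only for illustration.

```python
import re
from pathlib import PurePosixPath

# Assumption: tool names used for function calling are limited to letters,
# digits, underscores, and hyphens, with at most 64 characters.
TOOL_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def paper_name_and_path(url: str, data_dir: str = "./data") -> tuple[str, str]:
    """Return (tool-safe name, local PDF path) for an arXiv PDF URL."""
    arxiv_id = PurePosixPath(url).name       # last URL segment, e.g. "2505.10543"
    safe_name = arxiv_id.replace(".", "_")   # e.g. "2505_10543"
    return safe_name, f"{data_dir}/{arxiv_id}.pdf"


name, path = paper_name_and_path("https://arxiv.org/pdf/2505.10543")
print(name, path)

# The underscore form passes the assumed name check; the dotted form does not.
print(bool(TOOL_NAME_PATTERN.match(f"summary_query_engine_{name}")))
print(bool(TOOL_NAME_PATTERN.match("summary_query_engine_2505.10543")))
```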


### Create the Query Tools

In [4]:
paper_to_tools_dict = {}
for name, path in papers:
    print(f"Getting tools for paper: {name}")
    vector_tool, summary_tool = get_doc_tools(path, name)
    paper_to_tools_dict[name] = [vector_tool, summary_tool]

Getting tools for paper: 2505_10543
Getting tools for paper: 2505_11423
Getting tools for paper: 2505_13259


In [4]:
paper_to_tools_dict.keys()

dict_keys(['2505_10543', '2505_11423', '2505_13259'])

When creating the LlamaIndex query engines in the `tools_util.get_doc_tools` function, I use the summary index query engine to generate a one-sentence description of the document the tools are built for. This gives the Agent Runner more information when it selects the tool to use to answer a question. Below is the description that was synthesized for one of the documents.

In [5]:
wrapped_text = textwrap.fill(paper_to_tools_dict['2505_13259'][1].metadata.description, 
                             width=140, replace_whitespace=False)
print(wrapped_text)

Useful for summarization questions related to this document which is about: The document delves into the increasing roles of Large Language
Models (LLMs) in scientific research, categorizing their functions as Tools, Analysts, and Scientists, while discussing challenges and
future prospects in AI-driven scientific exploration.
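
The source of `tools_util.get_doc_tools` is not shown in this notebook. The following is a rough sketch of how such a function could be written, not the original code. The names `get_doc_tools_sketch`, `build_summary_description`, and `SUMMARY_PREFIX`, the chunk size, the `similarity_top_k` value, and both prompt strings are assumptions. The sketch also assumes the default LlamaIndex LLM and embedding settings are configured (for example, through an OpenAI API key).

```python
SUMMARY_PREFIX = (
    "Useful for summarization questions related to this document "
    "which is about: "
)


def build_summary_description(one_sentence_summary: str) -> str:
    """Prefix the synthesized summary and collapse stray whitespace."""
    return SUMMARY_PREFIX + " ".join(one_sentence_summary.split())


def get_doc_tools_sketch(file_path: str, name: str):
    """Hypothetical stand-in for tools_util.get_doc_tools (not the original)."""
    # Imported lazily so the helper above works without LlamaIndex installed.
    from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.tools import QueryEngineTool

    # Load the PDF and split it into chunks (nodes).
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

    # One engine for targeted lookups, one for whole-document summaries.
    vector_engine = VectorStoreIndex(nodes).as_query_engine(similarity_top_k=2)
    summary_engine = SummaryIndex(nodes).as_query_engine(
        response_mode="tree_summarize"
    )

    # Ask the summary engine for a one-sentence description of the document.
    summary = str(summary_engine.query("Describe this document in one sentence."))

    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_engine,
        name=f"vector_query_engine_{name}",
        description=f"Useful for specific questions about this document: {summary}",
    )
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_engine,
        name=f"summary_query_engine_{name}",
        description=build_summary_description(summary),
    )
    return vector_tool, summary_tool
```

Generating the description once, when the tool is built, costs one extra LLM call per document but lets the agent choose between tools without first querying each of them.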


### Create the Agent Runner and Worker

In [6]:
initial_tools = [t for name, _ in papers for t in paper_to_tools_dict[name]]
assert len(initial_tools) == 6

In [None]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

### Submit a Query

Below, a query is submitted that only the document `2505.13259.pdf`, entitled "From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery", would be able to answer. Let's see if the agent selects the right tool for the query.

In [None]:
response = agent.query(
    "What are the levels of autonomy when Large Language models are used for scientific research."
)

Added user message to memory: What are the levels of autonomy when Large Language models are used for scientific research.
=== Calling Function ===
Calling function: summary_query_engine_2505_13259 with args: {"input": "levels of autonomy"}
=== Function Output ===
LLMs in scientific discovery progress through three levels of autonomy:  
1. LLM as Tool: Foundational application where LLMs function as tools under human supervision to automate specific tasks within a single stage of the scientific method.  
2. LLM as Analyst: LLMs exhibit greater autonomy in processing complex information, conducting analyses, and offering insights with reduced human intervention for intermediate steps.  
3. LLM as Scientist: LLM-based systems operate as active agents capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence, driving substantial portions of the research cycle with minimal human intervention.
=== LLM Response ===
The paper de

The results look correct. Now let's try getting information from two of the documents. The expected tools are those for:

1. Document `2505.11423.pdf`: "When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs".
2. Document `2505.13259.pdf`: "From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery".

In [19]:
response = agent.query(
    "For Large Language Models, summarize the pitfalls of reasoning and levels of autonomy for scientific research."
)

Added user message to memory: For Large Language Models, summarize the pitfalls of reasoning and levels of autonomy for scientific research.
=== Calling Function ===
Calling function: summary_query_engine_2505_11423 with args: {"input": "pitfalls of reasoning in LLMs"}
=== Function Output ===
The pitfalls of reasoning in large language models (LLMs) include overemphasizing high-level content at the expense of simple mechanical constraints, introducing unnecessary content that violates constraints, and potentially leading to failures in meeting specific requirements. Additionally, reasoning may inadvertently break constraints by introducing elements that deviate from the desired output, such as violating punctuation rules or language restrictions. While reasoning can aid in organizing content more effectively, it can also distract from strict constraint control, resulting in failures to meet all specified criteria.
=== Calling Function ===
Calling function: summary_query_engine_2505_132

The results are as expected.
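
Reading the verbose log by eye works for a few queries. As a minimal sketch, the same check can be automated by extracting the tool names from the `Calling function: ...` lines. The helpers `called_tools` and `paper_ids` are hypothetical, and the sample log is copied from the first query above.

```python
import re

# Matches lines such as:
# Calling function: summary_query_engine_2505_13259 with args: {...}
CALL_PATTERN = re.compile(r"^Calling function: (\S+) with args:", re.MULTILINE)


def called_tools(verbose_log: str) -> list[str]:
    """Return the tool names in the order the agent called them."""
    return CALL_PATTERN.findall(verbose_log)


def paper_ids(tool_names: list[str]) -> set[str]:
    """Strip the engine prefix to get paper identifiers such as '2505_13259'."""
    prefixes = ("summary_query_engine_", "vector_query_engine_")
    ids = set()
    for tool_name in tool_names:
        for prefix in prefixes:
            if tool_name.startswith(prefix):
                ids.add(tool_name[len(prefix):])
    return ids


log = """Added user message to memory: What are the levels of autonomy when Large Language models are used for scientific research.
=== Calling Function ===
Calling function: summary_query_engine_2505_13259 with args: {"input": "levels of autonomy"}
=== Function Output ===
"""

tools_used = called_tools(log)
print(tools_used)
print(paper_ids(tools_used) == {"2505_13259"})
```

Assuming the verbose output is written to standard output, the log text could be captured with `contextlib.redirect_stdout` around the `agent.query(...)` call.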

### Limiting the Number of Tools Used to Answer a Query

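A tool retriever limits the tools passed to the agent: for each query, only the tools most similar to the query are retrieved, instead of every tool being supplied.

First, the tools are serialized and indexed, as shown below.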

In [5]:
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

all_tools = [t for name, _ in papers for t in paper_to_tools_dict[name]]

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

Now an object retriever is created.

In [6]:
obj_retriever = obj_index.as_retriever(similarity_top_k=2)

Let's experiment by providing a very generic question.

In [7]:
tools = obj_retriever.retrieve(
    "Tell me something about LLMs."
)

In [8]:
print(len(tools))
for tool in tools:
    print(tool.metadata.name)

2
summary_query_engine_2505_13259
vector_query_engine_2505_13259


This looks correct: the retriever returned exactly two tools, as set by `similarity_top_k=2`, even though any of the papers could have been used to answer the question.
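
To illustrate how a top-k cutoff behaves, here is a toy sketch that ranks tool descriptions against a query and keeps the best `k`. It is not how `ObjectIndex` is implemented: bag-of-words cosine similarity stands in for the embedding model, and the tool names and descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, descriptions: dict[str, str], k: int) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(query_vec, bag_of_words(descriptions[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical tool descriptions, for illustration only.
tool_descriptions = {
    "summary_tool_reasoning": "summarization questions about pitfalls of reasoning in LLMs",
    "vector_tool_reasoning": "specific questions about pitfalls of reasoning in LLMs",
    "summary_tool_science": "summarization questions about LLMs in scientific research",
    "vector_tool_science": "specific questions about LLMs in scientific research",
}

print(retrieve_top_k("Tell me something about LLMs.", tool_descriptions, k=2))
```

Every description in this toy example matches the generic query to some degree, yet only `k` tools survive the cutoff. A small `similarity_top_k` can therefore hide relevant tools from the agent when a query spans several documents.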

Now create a function-calling agent that uses the object retriever.

In [10]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given documents.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [11]:
agent.memory.reset()
response = agent.query("Describe the reasoning-enhanced large language models and large language models in scientific discovery.")

Added user message to memory: Describe the reasoning-enhanced large language models and large language models in scientific discovery.
=== Calling Function ===
Calling function: summary_query_engine_2505_13259 with args: {"input": "Describe large language models in scientific discovery"}
=== Function Output ===
Large language models play a crucial role in scientific discovery by assisting researchers in tasks such as literature search, hypothesis formulation, experiment planning, data analysis, and conclusion validation across various domains. These models are evolving to become increasingly autonomous agents, integrating advanced capabilities like planning, complex reasoning, and instruction following. They are utilized at different levels of autonomy, transitioning from tools for task automation to analysts for data modeling and analysis, and eventually to scientists capable of autonomously conducting major research stages. The future of large language models in scientific discovery 