<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/agent/multi_document_agents-v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Document Agents (V1)

In this guide, you will learn how to set up a multi-document agent over the LlamaIndex documentation.

This is an extension of V0 multi-document agents with the additional features:
- Reranking during document (tool) retrieval
- Query planning tool that the agent can use to plan


We do this with the following architecture:

- Set up a "document agent" over each Document: each document agent can do QA/summarization within its own document.
- Set up a top-level agent over this set of document agents. The top-level agent retrieves the relevant tools and then applies chain-of-thought (CoT) reasoning over them to answer a question.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [1]:
%pip install llama-index-core
%pip install llama-index-agent-openai
%pip install llama-index-readers-file
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install unstructured[html]

In [2]:
%load_ext autoreload
%autoreload 2

## Setup and Download Data

In this section, we'll load in the LlamaIndex documentation.

In [3]:
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}

Both --no-clobber and --convert-links were specified, only --convert-links will be used.
--2025-01-09 14:36:43--  https://docs.llamaindex.ai/en/latest/
Resolving docs.llamaindex.ai (docs.llamaindex.ai)... 2606:4700::6812:a3, 2606:4700::6812:1a3, 104.18.1.163, ...
Connecting to docs.llamaindex.ai (docs.llamaindex.ai)|2606:4700::6812:a3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 621832 (607K) [text/html]
Saving to: ‘docs.llamaindex.ai/en/latest/index.html’


2025-01-09 14:36:44 (4.35 MB/s) - ‘docs.llamaindex.ai/en/latest/index.html’ saved [621832/621832]

--2025-01-09 14:36:44--  https://docs.llamaindex.ai/en/latest/getting_started/concepts/
Reusing existing connection to [docs.llamaindex.ai]:443.
HTTP request sent, awaiting response... 200 OK
Length: 621415 (607K) [text/html]
Saving to: ‘docs.llamaindex.ai/en/latest/getting_started/concepts/index.html’


2025-01-09 14:36:44 (10.7 MB/s) - ‘docs.llamaindex.ai/en/latest/getting_started/concepts/index.html’ sa

In [5]:
%pip install -U unstructured

In [1]:
from llama_index.readers.file import UnstructuredReader

reader = UnstructuredReader()

In [41]:
pwd

'/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/test'

In [38]:
all_files_gen

<generator object Path.rglob at 0x349601d20>

In [39]:
all_files = [f.resolve() for f in all_files_gen]

In [26]:
from pathlib import Path

all_files_gen = Path("/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files").rglob("*")
all_files = [f.resolve() for f in all_files_gen]

In [27]:
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]

In [28]:
len(all_files)

14

In [4]:
# # ... existing code ...

# # Add these lines before the NLTK download
# import ssl
# try:
#     _create_unverified_https_context = ssl._create_unverified_context
# except AttributeError:
#     pass
# else:
#     ssl._create_default_https_context = _create_unverified_https_context

# # Now try downloading
# import nltk
# nltk.download('punkt_tab')
# # ... existing code ...

In [5]:
# # ... existing code ...

# import ssl
# try:
#     _create_unverified_https_context = ssl._create_unverified_context
# except AttributeError:
#     pass
# else:
#     ssl._create_default_https_context = _create_unverified_https_context
# # 
# import nltk
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')  # This is the correct package name
# nltk.download('universal_tagset')

# # ... existing code ...

In [29]:
from llama_index.core import Document

# TODO: set to higher value if you want more docs
doc_limit = 100

docs = []
for idx, f in enumerate(all_files):
    if idx >= doc_limit:  # stop once doc_limit documents have been loaded
        break
    print(f"Idx {idx}/{len(all_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)

Idx 0/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/reglamento-supervision.pdf
Idx 1/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LMV.pdf
Idx 2/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LTOSF.pdf
Idx 3/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file1.pdf
Idx 4/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LRITF.pdf
Idx 5/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file2.pdf
Idx 6/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file3.pdf
Idx 7/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file6.pdf
Idx 8/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file4.pdf
Idx 9/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/file5.pdf
Idx 10/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LPDUSF.pdf
Idx 11/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LGOAAC-2.pdf
Idx 12/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LIC.pdf
Idx 13/14
/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/app/files/LGOAAC.pdf


### Define Global LLM + Embeddings

In [30]:
import os

# Placeholder only: supply your own key and never commit a real key to a notebook.
os.environ["OPENAI_API_KEY"] = "sk-..."

import nest_asyncio

nest_asyncio.apply()
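As an alternative to writing a key into the cell, the key can be exported in the shell and checked at startup. The helper below is a minimal sketch of that idea; `require_env` is a hypothetical name introduced here, not part of LlamaIndex or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before starting the notebook"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing any client fails fast with a readable message instead of a later authentication error.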

In [31]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

llm = OpenAI(model="gpt-4o-mini")
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small", embed_batch_size=256
)

## Building Multi-Document Agents

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

### Build Document Agent for each Document

In this section we define "document agents" for each document.

We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.
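To make the difference between the two per-document tools concrete, here is a toy sketch that does not use LlamaIndex at all. It assumes word overlap as a stand-in for embedding similarity; the function names are illustrative only. The vector-style lookup returns just the best-matching chunks, while the summary-style lookup hands back every chunk so that a summarizer can see the whole document.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def vector_style_lookup(chunks, query, top_k=2):
    """Return the top_k chunks ranked by word overlap (stand-in for embedding similarity)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]


def summary_style_lookup(chunks):
    """Return every chunk, since summarization needs the whole document."""
    return list(chunks)


chunks = [
    "Agents choose which tool to call.",
    "A vector index retrieves the most similar chunks.",
    "A summary index reads every chunk in order.",
]
top = vector_style_lookup(chunks, "How does the vector index retrieve chunks?", top_k=1)
everything = summary_style_lookup(chunks)
```

In the notebook itself, the choice between the two tools is made by the function-calling LLM based on the tool descriptions, not by a fixed rule.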

We create a separate document agent for each document.

In [32]:
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
        )

    # build summary index
    summary_index = SummaryIndex(nodes)

    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize", llm=llm
    )

    # extract a summary
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        # cache the summary so reruns skip the LLM call
        with open(summary_out_path, "wb") as f:
            pickle.dump(summary, f)
    else:
        with open(summary_out_path, "rb") as f:
            summary = pickle.load(f)

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description="Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description="Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):
    node_parser = SentenceSplitter()

    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # ID will be base + parent
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
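
The builder above skips work when its outputs already exist on disk: the vector index is reloaded from its persist directory and the summary is unpickled. The same load-or-compute pattern can be sketched on its own with only the standard library; `load_or_compute` and `fake_summary` are hypothetical names used for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute):
    """Return the pickled value at cache_path, computing and saving it on a cache miss."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    value = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(value, f)
    return value


calls = []


def fake_summary():
    """Stand-in for the expensive LLM summary call; records each invocation."""
    calls.append(1)
    return "one-line summary"


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cache" / "doc_summary.pkl"
    first = load_or_compute(path, fake_summary)   # computes and writes the cache
    second = load_or_compute(path, fake_summary)  # served from the cache
```

Because the check is only "does the file exist", a cache built from an older version of a document is reused silently; deleting the cache directory forces a rebuild.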

In [33]:
agents_dict, extra_info_dict = await build_agents(docs)

  0%|          | 0/14 [00:00<?, ?it/s]

files_reglamento-supervision
files_LMV
files_LTOSF
files_file1
files_LRITF
files_file2
files_file3
files_file6
files_file4
files_file5
files_LPDUSF
files_LGOAAC-2
files_LIC
files_LGOAAC
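
The identifiers printed above combine the parent directory name with the file stem. The sketch below reproduces that derivation and adds an optional sanitizing step. The sanitizer is an assumption on our part: to our understanding, function-calling tool names are restricted to letters, digits, underscores, and hyphens with a length cap, so file names containing spaces or dots may need cleaning. `make_file_base` and `safe_tool_name` are hypothetical helpers.

```python
import re
from pathlib import Path


def make_file_base(path):
    """Combine the parent directory name and the file stem, as the build loop does."""
    p = Path(path)
    return f"{p.parent.stem}_{p.stem}"


def safe_tool_name(file_base, prefix="tool_", max_len=64):
    """Replace characters outside [A-Za-z0-9_-] with underscores and cap the length."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]", "_", prefix + file_base)
    return cleaned[:max_len]


name = make_file_base("app/files/reglamento-supervision.pdf")
tool_name = safe_tool_name(make_file_base("app/files/Ley 2024.v2.pdf"))
```

Note that truncating to a fixed length can make two long names collide, so a real pipeline would want to check the resulting names for uniqueness.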


### Build Retriever-Enabled OpenAI Agent

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This agent (an `OpenAIAgent` constructed with a `tool_retriever`) performs tool retrieval before tool use, unlike a default agent that tries to put all tools in the prompt.

**Improvements from V0**: We make the following improvements compared to the "base" version in V0.

- Adding in reranking: we use Cohere reranker to better filter the candidate set of documents.
- Adding in a query planning tool: we add an explicit query planning tool that's dynamically created based on the set of retrieved tools.
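
The retrieval flow described above (broad first-stage retrieval, reranking down to a smaller set, then appending one planning tool built from that set) can be sketched without any external services. Everything here is a toy: word overlap stands in for vector similarity, a Jaccard score stands in for the Cohere reranker, and the `Tool` dataclass is a hypothetical placeholder for real tool objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


def word_overlap(query, tool):
    """Cheap first-stage score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(tool.description.lower().split()))


def jaccard(query, tool):
    """Stand-in reranker score: shared words divided by total distinct words."""
    q = set(query.lower().split())
    d = set(tool.description.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def retrieve_tools(query, tools, top_k=10, top_n=5):
    """Retrieve top_k candidates, rerank to top_n, then append a comparison tool."""
    candidates = sorted(tools, key=lambda t: word_overlap(query, t), reverse=True)[:top_k]
    reranked = sorted(candidates, key=lambda t: jaccard(query, t), reverse=True)[:top_n]
    compare_tool = Tool(
        name="compare_tool",
        description="Compare across: " + ", ".join(t.name for t in reranked),
    )
    return reranked + [compare_tool]


tools = [Tool(f"tool_doc{i}", f"document {i} covers topic{i % 3}") for i in range(14)]
selected = retrieve_tools("which document covers topic1", tools)
print(len(selected))  # 6: five reranked tools plus the comparison tool
```

The comparison tool is rebuilt on every call because its scope depends on which document tools survived reranking for that particular query.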


In [34]:
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary, 
        ),
    )
    all_tools.append(doc_tool)

In [35]:
print(all_tools[0].metadata)

ToolMetadata(description='This document outlines the regulations for the supervision, inspection, and verification of financial institutions by the Comisión Nacional para la Protección y Defensa de los Usuarios de Servicios Financieros, detailing the procedures, responsibilities, and requirements for compliance.', name='tool_files_reglamento-supervision', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)


In [36]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (
    ObjectIndex,
    ObjectRetriever,
)
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.schema import QueryBundle
from llama_index.llms.openai import OpenAI


# the constructor argument is `model`, matching the earlier OpenAI(model=...) call
llm = OpenAI(model="gpt-4o")

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(
    similarity_top_k=10,
)


# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(
        self,
        retriever,
        object_node_mapping,
        node_postprocessors=None,
        llm=None,
    ):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")
        self._node_postprocessors = node_postprocessors or []

    def retrieve(self, query_bundle):
        if isinstance(query_bundle, str):
            query_bundle = QueryBundle(query_str=query_bundle)

        nodes = self._retriever.retrieve(query_bundle)
        for processor in self._node_postprocessors:
            nodes = processor.postprocess_nodes(
                nodes, query_bundle=query_bundle
            )
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, llm=self._llm
        )
        sub_question_description = f"""\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]

In [37]:
# Placeholder only: supply your own Cohere key and never commit a real key to a notebook.
os.environ['COHERE_API_KEY'] = "..."

In [38]:
# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
    vector_node_retriever,
    obj_index.object_node_mapping,
    node_postprocessors=[CohereRerank(top_n=5)],
    llm=llm,
)

In [20]:
%pip install llama-index-question-gen-openai

Collecting llama-index-question-gen-openai
  Downloading llama_index_question_gen_openai-0.3.0-py3-none-any.whl (2.9 kB)
Collecting llama-index-program-openai<0.4.0,>=0.3.0
  Downloading llama_index_program_openai-0.3.1-py3-none-any.whl (5.3 kB)
Installing collected packages: llama-index-program-openai, llama-index-question-gen-openai
Successfully installed llama-index-program-openai-0.3.1 llama-index-question-gen-openai-0.3.0
You should consider upgrading via the '/Users/inaki/Documents/Personal/Daat/Instrag/front-end/api/.env/bin/python -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


In [13]:
tmps = custom_obj_retriever.retrieve("hello")

# should be 5 + 1 -- 5 from reranker, 1 from subquestion
print(len(tmps))

6


In [17]:
import joblib

In [18]:
output_dir = Path("./saved_models")
output_dir.mkdir(exist_ok=True)

joblib.dump(custom_obj_retriever._retriever, output_dir / "retriever.joblib")
joblib.dump(custom_obj_retriever._object_node_mapping, output_dir / "object_node_mapping.joblib")



PicklingError: Can't pickle <function OpenAIAgentWorker.__init__.<locals>.<lambda> at 0x34b980790>: it's not found as llama_index.agent.openai.step.OpenAIAgentWorker.__init__.<locals>.<lambda>

In [39]:
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.agent import ReActAgent

top_agent = OpenAIAgent.from_tools(
    tool_retriever=custom_obj_retriever,
    system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    llm=llm,
    verbose=True,
)

# top_agent = ReActAgent.from_tools(
#     tool_retriever=custom_obj_retriever,
#     system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

# """,
#     llm=llm,
#     verbose=True,
# )

### Define Baseline Vector Store Index

As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.

We set `similarity_top_k=4`.
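
As a rough picture of what this baseline does, the sketch below flattens per-document chunks into one pool and returns the four nearest chunks by cosine similarity. The two-dimensional vectors and the `per_doc` dictionary are made-up illustration data, not real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_nodes(query_vec, nodes, k=4):
    """Rank (text, vector) pairs by similarity to the query and keep the best k."""
    return sorted(nodes, key=lambda n: cosine(query_vec, n[1]), reverse=True)[:k]


per_doc = {
    "doc_a": [("a1", [1.0, 0.0]), ("a2", [0.9, 0.1])],
    "doc_b": [("b1", [0.0, 1.0]), ("b2", [0.1, 0.9]), ("b3", [0.7, 0.7])],
}
# Flatten: document boundaries disappear, which is what makes the baseline "naive".
all_nodes = [n for nodes in per_doc.values() for n in nodes]
hits = top_k_nodes([1.0, 0.0], all_nodes, k=4)
```

With a fixed budget of four chunks across the whole corpus, a question that spans several documents may get all of its context from just one of them.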

In [23]:
all_nodes = [
    n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]

In [24]:
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

## Running Example Queries

Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.

In [40]:
response = top_agent.query(
    "What types of agents are available in LlamaIndex?",
)

Added user message to memory: What types of agents are available in LlamaIndex?
=== Calling Function ===
Calling function: tool_files_LMV with args: {"input":"types of agents"}
Added user message to memory: types of agents
=== Calling Function ===
Calling function: vector_tool_files_LMV with args: {
  "input": "types of agents"
}
Got output: The types of agents mentioned include:

1. **Apoderado** - A representative designated by the brokerage firm to execute operations on behalf of the client.
2. **Representantes legales** - Legal representatives who are duly accredited to instruct the execution of operations.
3. **Personas autorizadas** - Individuals authorized in writing by the client for specific actions as outlined in the contract.

These agents can act on behalf of clients in various capacities related to the execution of financial operations.

Got output: The `files_LMV.html` part of the LlamaIndex docs mentions three types of agents:

1. **Apoderado**: A representative designat

In [26]:
print(response)

The types of agents available in LlamaIndex are LLMs (Language Model), Vector Stores, and Agent Tools.


In [27]:
# baseline
response = base_query_engine.query(
    "What types of agents are available in LlamaIndex?",
)
print(str(response))

LLamaIndex offers various types of agents such as LLMs, vector stores, agent tools, and more.


In [None]:
response = top_agent.query(
    "Compare the content in the agents page vs. tools page."
)

Added user message to memory: Compare the content in the agents page vs. tools page.
=== Calling Function ===
Calling function: compare_tool with args: {"input":"agents vs tools"}
Generated 2 sub questions.
[tool_understanding_index] Q: What are the functionalities of agents in the Llama Index platform?
Added user message to memory: What are the functionalities of agents in the Llama Index platform?
[tool_understanding_index] Q: How do agents differ from tools in the Llama Index platform?
Added user message to memory: How do agents differ from tools in the Llama Index platform?
=== Calling Function ===
Calling function: vector_tool_understanding_index with args: {
  "input": "difference between agents and tools"
}
=== Calling Function ===
Calling function: vector_tool_understanding_index with args: {
  "input": "functionalities of agents"
}
Got output: Agents are typically individuals or entities that act on behalf of others, making d

In [None]:
print(response)

The comparison between the content in the agents page and the tools page highlights the difference in their roles and functionalities. Agents on the Llama Index platform are responsible for decision-making and interacting with users, while tools are instruments used to perform specific functions or tasks, controlled by agents to assist in providing responses.


In [28]:
response = top_agent.query(
    "Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level?"
)

Added user message to memory: Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level?
=== Calling Function ===
Calling function: compare_tool with args: {"input":"compact response synthesizer response mode, tree_summarize response synthesizer response mode"}
Generated 5 sub questions.
[tool_evaluating_index] Q: What is the guidance provided by the tool_evaluating_index on evaluating and understanding a specific topic?
Added user message to memory: What is the guidance provided by the tool_evaluating_index on evaluating and understanding a specific topic?
[tool_understanding_index] Q: What insights does the tool_understanding_index offer on understanding concepts related to front-end API testing?
Added user message to memory: What insights does the tool_understanding_index offer on understanding concepts related to front-end API testing?
[tool_usage_pattern_index] Q

In [29]:
print(str(response))

At a very high level, the compact response synthesizer provides concise summaries of the information, while the tree_summarize response synthesizer organizes the responses in a structured tree format for better understanding.
