# Multi-Document Agents

https://docs.llamaindex.ai/en/stable/examples/agent/multi_document_agents

In this guide, you learn how to set up an agent that can effectively answer different types of questions over a larger set of documents.

These questions include the following:

- QA over a specific doc
- QA comparing different docs
- Summaries over a specific doc
- Comparing summaries between different docs

We do this with the following architecture:

- set up a "document agent" over each Document: each doc agent can do QA/summarization within its doc
- set up a top-level agent over this set of document agents. It does tool retrieval and then chain-of-thought (CoT) reasoning over the retrieved tools to answer a question.
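
To make the two levels concrete, here is a minimal sketch in plain Python. It is not the LlamaIndex implementation: `ToyDocAgent`, `retrieve_agents`, and `answer` are hypothetical names, and word overlap stands in for both embeddings and LLM reasoning. It only shows the control flow: select a few document agents first, then let each selected agent answer from its own document.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


def tokens(text: str) -> set:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ToyDocAgent:
    """Hypothetical document agent that owns the chunks of one document."""
    name: str
    chunks: List[str]

    def qa(self, query: str) -> str:
        # QA path: return the single chunk sharing the most words with the query.
        q = tokens(query)
        return max(self.chunks, key=lambda c: len(q & tokens(c)))

    def summarize(self) -> str:
        # Summary path: use every chunk (here simply joined together).
        return " ".join(self.chunks)


def retrieve_agents(query: str, agents: List[ToyDocAgent], top_k: int = 2) -> List[ToyDocAgent]:
    """Top-level tool retrieval: keep the top_k agents whose name matches the query."""
    q = tokens(query)
    scored = [(len(q & tokens(a.name)), a) for a in agents]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [agent for _, agent in hits[:top_k]]


def answer(query: str, agents: List[ToyDocAgent], top_k: int = 2) -> Dict[str, str]:
    """Route the query to the retrieved agents and collect one answer per agent."""
    return {a.name: a.qa(query) for a in retrieve_agents(query, agents, top_k)}
```

In the real notebook, the per-document `qa`/`summarize` choice and the final synthesis are made by an LLM; the sketch only mirrors the routing.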

## Setup and Download Data

In this section, we'll define imports and then load the Nationally Determined Contribution (NDC) documents of different Southeast Asian countries. Each document is stored as a separate PDF.

We load in 8 countries. This is not quite at the level of "hundreds" of documents, but it's still large enough to warrant some top-level document retrieval!

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-agent-openai
%pip install llama-index-embeddings-openai
%pip install llama-index-llms-openai

In [None]:
!pip install llama-index

In [3]:
!ls

ndc_multi_document_agents.ipynb plot_earthkit.ipynb
plot_data_graphs.ipynb          plot_graphs_excel.ipynb


In [4]:
import os

os.chdir("/Users/josingh/hobby/climate-dashboard")

In [5]:
from llama_index.core import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
)
from llama_index.core import SummaryIndex
from llama_index.core.schema import IndexNode
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.llms.openai import OpenAI
from llama_index.core.callbacks import CallbackManager

In [2]:
ndc_file_name_path_mapping = {
    'Cambodia': 'data/ndc/20201231_NDC_Update_Cambodia.pdf', 
    'Myanmar': 'data/ndc/Myanmar Updated  NDC July 2021.pdf', 
    'Laos': 'data/ndc/NDC 2020 of Lao PDR (English), 09 April 2021 (1).pdf', 
    'Singapore': 'data/ndc/Singapore Second Update of First NDC.pdf', 
    'Brunei': "data/ndc/Brunei Darussalam's NDC 2020.pdf", 
    'Vietnam': 'data/ndc/Viet Nam NDC 2022 Update.pdf', 
    'Malaysia': 'data/ndc/Malaysia NDC Updated Submission to UNFCCC July 2021 final.pdf', 
    'Indonesia': 'data/ndc/ENDC Indonesia.pdf'
}

sea_countries = list(ndc_file_name_path_mapping.keys())

In [3]:
sea_countries

['Cambodia',
 'Myanmar',
 'Laos',
 'Singapore',
 'Brunei',
 'Vietnam',
 'Malaysia',
 'Indonesia']

In [None]:
# Load all country documents
country_docs = {}
for country in sea_countries:
    country_docs[country] = SimpleDirectoryReader(
        input_files=[ndc_file_name_path_mapping[country]]
    ).load_data()
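
Since the loader is given explicit file paths relative to the working directory, it can help to check that every path resolves before loading. The helper below is a small sketch using only the standard library; `find_missing_files` and its `base_dir` parameter are hypothetical names, not part of LlamaIndex.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_files(path_mapping: Dict[str, str], base_dir: str = ".") -> List[str]:
    """Return the keys of path_mapping whose file does not exist under base_dir."""
    base = Path(base_dir)
    return [
        name
        for name, relative_path in path_mapping.items()
        if not (base / relative_path).is_file()
    ]
```

Calling `find_missing_files(ndc_file_name_path_mapping)` before the loading loop should return an empty list if the working directory is set correctly.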

### Define Global LLM and Embeddings

In [8]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.llm = OpenAI(temperature=0, model="gpt-4o-mini")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

## Building Multi-Document Agents

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

### Build Document Agent for each Document

In this section we define "document agents" for each document.

We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each country.
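
The difference between the two indexes is mostly about which nodes reach the LLM. The sketch below illustrates this with a toy bag-of-words cosine similarity in place of real embeddings. The function names are hypothetical and this is not how `VectorStoreIndex` or `SummaryIndex` are implemented; it only contrasts "top-k most similar nodes" with "all nodes".

```python
import math
import re
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Bag-of-words vector; a toy replacement for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def vector_style_context(query: str, nodes: List[str], top_k: int = 2) -> List[str]:
    """Semantic-search style: only the top_k nodes most similar to the query."""
    q = bow(query)
    ranked = sorted(nodes, key=lambda n: cosine(q, bow(n)), reverse=True)
    return ranked[:top_k]


def summary_style_context(nodes: List[str]) -> List[str]:
    """Summarization style: every node is passed along, regardless of the query."""
    return list(nodes)
```

This is why the summary tool suits holistic questions but costs more per query, while the vector tool suits targeted questions.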

In [9]:
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import load_index_from_storage, StorageContext
from llama_index.core.node_parser import SentenceSplitter
import os

node_parser = SentenceSplitter()

# Build agents dictionary
agents = {}
query_engines = {}

# this is for the baseline
all_nodes = []

for idx, country in enumerate(sea_countries):
    nodes = node_parser.get_nodes_from_documents(country_docs[country])
    all_nodes.extend(nodes)

    if not os.path.exists(f"./data/vector_store/ndc/{country}"):
        # build vector index
        vector_index = VectorStoreIndex(nodes)
        # persist to the same directory that is checked and loaded from
        vector_index.storage_context.persist(
            persist_dir=f"./data/vector_store/ndc/{country}"
        )
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=f"./data/vector_store/ndc/{country}"),
        )

    # build summary index
    summary_index = SummaryIndex(nodes)
    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=Settings.llm)
    summary_query_engine = summary_index.as_query_engine(llm=Settings.llm)

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name="vector_tool",
                description=(
                    "Useful for questions related to specific aspects of"
                    f" {country}."
                ),
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name="summary_tool",
                description=(
                    "Useful for any requests that require a holistic summary"
                    f" of EVERYTHING about {country}. For questions about"
                    " more specific sections, please use the vector_tool."
                ),
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4o-mini")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about {country}.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    agents[country] = agent
    query_engines[country] = vector_index.as_query_engine(
        similarity_top_k=2
    )

### Build Retriever-Enabled OpenAI Agent

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This agent takes in all document agents as tools. Because we pass a `tool_retriever` to `OpenAIAgent.from_tools`, it performs tool retrieval before tool use (unlike a default agent that tries to put all tools in the prompt).

Here we use a top-k retriever, but we encourage you to customize the tool retriever method!


In [10]:
# define tool for each document agent
all_tools = []
for country in sea_countries:
    country_summary = (
        f"This content contains Nationally Determined Contributions for {country}. Use"
        f" this tool if you want to answer any questions about {country}.\n"
    )
    doc_tool = QueryEngineTool(
        query_engine=agents[country],
        metadata=ToolMetadata(
            name=f"tool_{country}",
            description=country_summary,
        ),
    )
    all_tools.append(doc_tool)
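
The tool names above are built directly from the dictionary keys. To our knowledge, OpenAI function names are limited to letters, digits, underscores, and dashes, with a maximum length of 64 characters. The eight keys used here are single words, so they pass, but a key such as "Viet Nam" would not. A small sketch of a sanitizer follows; `safe_tool_name` is a hypothetical helper, not a LlamaIndex function.

```python
import re


def safe_tool_name(label: str, prefix: str = "tool_") -> str:
    """Build a tool name using only letters, digits, '_' and '-', capped at 64 characters."""
    # Collapse every run of disallowed characters into a single underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_-]+", "_", label.strip()).strip("_")
    return (prefix + cleaned)[:64]
```

With this helper, the tool definition could use `name=safe_tool_name(country)` in place of the f-string.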

In [11]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [12]:
from llama_index.agent.openai import OpenAIAgent

top_agent = OpenAIAgent.from_tools(
    tool_retriever=obj_index.as_retriever(similarity_top_k=3),
    system_prompt="""\
You are an agent designed to answer queries about a set of given Southeast Asian countries.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
    verbose=True,
)

## Running Example Queries

Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.
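
If you want to run several queries in one go, a thin wrapper like the following may be convenient. It is a sketch that assumes only that the agent exposes a `query(str)` method, as `top_agent` does in this notebook. `run_queries` and `EchoAgent` are hypothetical names, and `EchoAgent` exists solely so the example runs without an API key.

```python
from typing import Any, Dict, Iterable


def run_queries(agent: Any, queries: Iterable[str]) -> Dict[str, str]:
    """Run each query through agent.query and collect the responses as strings."""
    results: Dict[str, str] = {}
    for question in queries:
        try:
            results[question] = str(agent.query(question))
        except Exception as exc:  # keep going if a single query fails
            results[question] = f"ERROR: {exc}"
    return results


class EchoAgent:
    """Hypothetical stand-in for a real agent; it just echoes the query."""

    def query(self, text: str) -> str:
        return f"echo: {text}"


if __name__ == "__main__":
    for question, reply in run_queries(EchoAgent(), ["first", "second"]).items():
        print(question, "->", reply)
```

Passing `top_agent` in place of `EchoAgent()` would issue real LLM calls, one per query.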

In [14]:
# should use Singapore agent -> vector tool
response = top_agent.query("Tell me about the NDCs of Singapore")

Added user message to memory: Tell me about the NDCs of Singapore
=== Calling Function ===
Calling function: tool_Singapore with args: {"input":"NDCs"}
Added user message to memory: NDCs
=== Calling Function ===
Calling function: vector_tool with args: {"input":"NDCs in Singapore"}
Got output: Singapore's Nationally Determined Contributions (NDCs) focus on an economy-wide absolute greenhouse gas (GHG) emissions limitation target. The NDC does not include non-greenhouse gas components or climate forcers not covered by IPCC guidelines. Additionally, the NDC does not reference any baseline for its target, which aims to peak and subsequently reduce emissions to an absolute level.

Got output: Singapore's Nationally Determined Contributions (NDCs) emphasize an economy-wide absolute greenhouse gas (GHG) emissions limitation target. The NDC does not include non-greenhouse gas components or climate forcers outside the IPCC guidelines, and it does not specify a baseline for its target. The aim 

In [15]:
print(response)

Singapore's Nationally Determined Contributions (NDCs) focus on an economy-wide absolute greenhouse gas (GHG) emissions limitation target. The NDC does not include non-greenhouse gas components or climate forcers outside the IPCC guidelines, and it does not specify a baseline for its target. The primary goal is to peak and subsequently reduce emissions to an absolute level.


In [17]:
# should use Singapore agent -> vector tool
response = top_agent.query(
    "Give me a summary of all positive aspects of Singapore's NDC"
)

Added user message to memory: Give me a summary of all positive aspects of Singapore's NDC
=== Calling Function ===
Calling function: tool_Singapore with args: {"input":"positive aspects of NDC"}
Added user message to memory: positive aspects of NDC
=== Calling Function ===
Calling function: vector_tool with args: {"input":"positive aspects of NDC Singapore"}
Got output: Singapore's Nationally Determined Contributions (NDC) highlight several positive aspects, including:

1. **Strong Performance in Carbon Intensity**: Singapore is recognized as one of the top performers globally in terms of carbon intensity, reflecting its commitment to environmentally responsible growth despite its small share of global GDP.

2. **Energy Efficiency Initiatives**: The country emphasizes energy efficiency as a key strategy for emissions reduction, encouraging industries to adopt energy-efficient technologies and cleaner fuel sources through strong pollution control laws and government support.

3. **Inve

In [19]:
print(response)

The positive aspects of Singapore's Nationally Determined Contributions (NDC) include:

1. **Strong Performance in Carbon Intensity**: Singapore is recognized as a top performer globally in carbon intensity, reflecting its commitment to environmentally responsible growth.

2. **Energy Efficiency Initiatives**: The country emphasizes energy efficiency as a key strategy for emissions reduction, encouraging industries to adopt cleaner technologies and fuels.

3. **Investment in Research and Development**: Significant investments in low-carbon technologies position Singapore for sustainable growth.

4. **International Cooperation**: Singapore actively fosters international cooperation on climate action, supporting developing countries and participating in multilateral frameworks.

5. **Capacity-Building Programs**: The Singapore Cooperation Programme enhances global efforts in sustainable development through capacity-building initiatives.

6. **Dedicated Climate Action Package**: The launc

In [22]:
response = top_agent.query(
    "Tell me the NDCs of Singapore, and then compare that with the"
    " NDCs of Malaysia"
)

Added user message to memory: Tell the NDCs of Singapore, and then compare that with the NDCs of Malaysia
=== Calling Function ===
Calling function: tool_Singapore with args: {"input": "Nationally Determined Contributions"}
Added user message to memory: Nationally Determined Contributions
=== Calling Function ===
Calling function: vector_tool with args: {"input":"Nationally Determined Contributions Singapore"}
Got output: Singapore's Nationally Determined Contributions (NDC) reflect its commitment to the Paris Agreement and the multilateral framework for addressing climate change. The country aims to achieve its mitigation objectives primarily through domestic emissions reduction while also exploring opportunities for international cooperation under Article 6 of the Paris Agreement. This includes the potential use of internationally transferred mitigation outcomes (ITMOs), ensuring environmental integrity throughout the process. The NDC incorporates specific assumptions and methodologi

In [23]:
print(response)

### Nationally Determined Contributions (NDCs)

**Singapore:**
- Singapore's NDC demonstrates its commitment to the Paris Agreement and the global effort to combat climate change.
- The country focuses on achieving its emissions reduction goals primarily through domestic measures, while also considering international cooperation under Article 6 of the Paris Agreement.
- This may involve the use of internationally transferred mitigation outcomes (ITMOs), with a strong emphasis on maintaining environmental integrity.
- The NDC outlines specific assumptions and methodologies for tracking progress, which are reported in its Biennial Update Report or Biennial Transparency Report.

**Malaysia:**
- Malaysia's NDC aims to reduce its economy-wide carbon intensity by 45% by 2030 compared to 2005 levels.
- This target is unconditional and represents a 10% increase from the previous submission.
- The NDC addresses seven greenhouse gases: carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), hy

In [26]:
response = top_agent.query(
    "Tell me the differences between Malaysia's and Indonesia's NDCs in terms of"
    " meeting their climate targets."
)

Added user message to memory: Tell me the differences between Malaysia's and Indonesia's NDCs in terms of meeting their climate targets.
=== Calling Function ===
Calling function: tool_Malaysia with args: {"input": "differences in climate targets and NDCs"}
Added user message to memory: differences in climate targets and NDCs
=== Calling Function ===
Calling function: vector_tool with args: {"input":"differences in climate targets and NDCs in Malaysia"}
Got output: Malaysia's updated Nationally Determined Contribution (NDC) reflects a commitment to reduce its economy-wide carbon intensity by 45% by 2030 compared to 2005 levels. This target represents an increase of 10% from the previous submission. The updated NDC also expands the greenhouse gas coverage to include seven gases: carbon dioxide, methane, nitrous oxide, hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride. 

In terms of fairness and ambition, Malaysia considers its NDC to be aligned with its

In [27]:
print(str(response))

The differences between Malaysia's and Indonesia's Nationally Determined Contributions (NDCs) in terms of meeting their climate targets are as follows:

### Malaysia's NDCs:
1. **Carbon Intensity Reduction**: Malaysia's updated NDC commits to reducing economy-wide carbon intensity by 45% by 2030 compared to 2005 levels, which is a 10% increase from the previous submission.
2. **Greenhouse Gas Coverage**: The updated NDC expands the coverage to include seven greenhouse gases: carbon dioxide, methane, nitrous oxide, hydrofluorocarbons, perfluorocarbons, sulfur hexafluoride, and nitrogen trifluoride.
3. **Economic Context**: The NDC is aligned with Malaysia's national circumstances, particularly considering the significance of the oil and gas industry to its economy.
4. **Unconditional Commitment**: Malaysia's NDC is unconditional, meaning it will pursue these targets through domestic measures without relying on international cooperation.

### Indonesia's NDCs:
1. **Initial and Updated Ta