# Lesson 4: Building a Multi-Document Agent

In this lesson, we extend Lesson 3 to handle multiple documents with increasing degrees of complexity.

## Setup

In [1]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.

In [3]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

For each paper, create the vector tool and the summary tool.

In [4]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: selfrag.pdf
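
The tool names that appear in the agent logs in this lesson (for example `summary_tool_metagpt`) suggest that the second argument, `Path(paper).stem`, ends up as a suffix of each tool's name. The internals of `get_doc_tools` are not shown here, so the following is only a minimal sketch of that naming convention: the helper `tool_names_for` is hypothetical, and the `vector_tool_` prefix is an assumption (only the `summary_tool_` prefix is visible in the logs).

```python
from pathlib import Path

papers = ["metagpt.pdf", "longlora.pdf", "selfrag.pdf"]


def tool_names_for(paper):
    """Return (vector_tool_name, summary_tool_name) for one PDF file.

    The "summary_tool_" prefix matches the agent logs in this lesson;
    the "vector_tool_" prefix is an assumption made for illustration.
    """
    stem = Path(paper).stem  # drops the extension: "metagpt.pdf" -> "metagpt"
    return f"vector_tool_{stem}", f"summary_tool_{stem}"


# Two tools per paper, flattened into one list of names.
expected_names = [name for paper in papers for name in tool_names_for(paper)]
print(expected_names)
```

With two tools per paper, three papers yield six tools in total.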


<div><img src="images/lesson4-1.png" style="width: 50%"/></div>

In [5]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [6]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [7]:
len(initial_tools)

6

In [8]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

<div><img src="images/lesson4-2.png" style="width: 50%"/></div>

In [9]:
response = agent.query("Tell me about the dataset used in the Metagpt paper")

Added user message to memory: Tell me about the dataset used in the Metagpt paper
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "dataset"}
=== Function Output ===
The dataset mentioned in the context includes HumanEval, MBPP, and SoftwareDev. HumanEval comprises 164 handwritten programming tasks, MBPP consists of 427 Python tasks, and SoftwareDev includes 70 diverse software development tasks covering various scopes like mini-games, image processing algorithms, data visualization, and more. Each task in the SoftwareDev dataset is associated with specific code statistics, documentation statistics, cost statistics, cost of revision, and code executability metrics.
=== LLM Response ===
The dataset used in the MetaGPT paper includes HumanEval, MBPP, and SoftwareDev. HumanEval consists of 164 handwritten programming tasks, MBPP contains 427 Python tasks, and SoftwareDev comprises 70 diverse software development tasks covering various scopes such as min

Here we ask for a summary from 2 papers.

In [10]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of a large language model through a combination of retrieval and self-reflection. It aims to improve the generation quality of the language model by training it to retrieve relevant information on-demand and to reflect on its own output using special tokens called reflection tokens. This framework allows the language model to adaptively retrieve passages, generate responses, and evaluate the relevance and quality of its own output, ultimately leading to improved performance on various tasks compared to existing models like ChatGPT and retrieval-augmented Llama2-chat.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is an efficient method for exte

## 2. Set up an agent over 11 papers

Now let's extend the example to 11 papers.

### Download 11 ICLR papers

In [11]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]

To download these papers yourself, uncomment and run the following code:


    # for url, paper in zip(urls, papers):
    #     !wget "{url}" -O "{paper}"
    
    
**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.
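
If you are working outside the lesson environment and `wget` is not available, the same download loop could be written with Python's standard library. This is a sketch, not part of the original notebook: `download_papers` is a hypothetical helper, it assumes the server accepts plain requests from `urllib`, and the `fetch` parameter exists only so the loop can be tried without network access.

```python
import urllib.request
from pathlib import Path


def download_papers(urls, papers, fetch=urllib.request.urlretrieve):
    """Download each URL to its paired filename, skipping files already on disk.

    `fetch` is any callable taking (url, filename). It defaults to the
    standard-library urlretrieve.
    """
    if len(urls) != len(papers):
        raise ValueError("urls and papers must have the same length")
    downloaded = []
    for url, paper in zip(urls, papers):
        if Path(paper).exists():
            continue  # e.g. the PDFs that already ship with the lesson
        fetch(url, paper)
        downloaded.append(paper)
    return downloaded


# Usage (commented out, like the original cell, to avoid network calls):
# download_papers(urls, papers)
```

The function returns the list of files it actually fetched, so an empty list means every PDF was already present.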

In [12]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: loftq.pdf
Getting tools for paper: swebench.pdf
Getting tools for paper: selfrag.pdf
Getting tools for paper: zipformer.pdf
Getting tools for paper: values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf
Getting tools for paper: metra.pdf
Getting tools for paper: vr_mcl.pdf


### Extend the Agent with Tool Retrieval

As the list of documents grows (10, 100, or more; here, 11), passing every tool to the agent runs into several issues:

- The descriptions of all the tools may not fit into the LLM prompt.
- Cost and latency increase.
- The LLM can get confused and fail to pick the right tool when the number of tools is too large.
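
To get a feel for how the prompt grows with the number of tools, the sketch below estimates the tokens spent on tool descriptions alone. It rests on stated assumptions: roughly four characters per token (a crude heuristic, not a real tokenizer), a hypothetical description modelled on the tool metadata printed in this lesson, and two tools per paper. Real prompts also carry function schemas and chat history, so treat the numbers as a lower bound.

```python
def estimate_prompt_tokens(tool_descriptions, chars_per_token=4):
    """Very rough token estimate: total characters divided by chars_per_token."""
    total_chars = sum(len(d) for d in tool_descriptions)
    return total_chars // chars_per_token


# Hypothetical description, modelled on the tool metadata shown in this lesson.
description = (
    "Use ONLY IF you want to get a holistic summary of a paper. "
    "Do NOT use if you have specific questions over the paper."
)

for num_papers in (3, 11, 100):
    num_tools = 2 * num_papers  # one vector tool and one summary tool per paper
    tokens = estimate_prompt_tokens([description] * num_tools)
    print(f"{num_papers:>3} papers -> {num_tools:>3} tools -> ~{tokens} tokens")
```

The estimate grows linearly with the number of tools, and that overhead is paid on every LLM call the agent makes.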

Solution:

Instead of performing RAG on text, we perform RAG on the list of tools: retrieve a small subset of relevant tools and feed only those, rather than all the tools, into the reasoning prompt.
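
As a rough illustration of retrieving tools rather than text, the sketch below ranks tools by how well their descriptions match the query and keeps only the top `k`. It uses a bag-of-words cosine similarity from the standard library as a stand-in for an embedding-based index. The tool descriptions and the `retrieve_tools` helper are hypothetical and are not part of LlamaIndex.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_tools(query, tool_descriptions, top_k=3):
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = bag_of_words(query)
    scored = [
        (cosine(q, bag_of_words(desc)), name)
        for name, desc in tool_descriptions.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical descriptions, for illustration only.
tool_descriptions = {
    "summary_tool_metagpt": "Holistic summary of the MetaGPT paper on multi-agent collaboration.",
    "summary_tool_swebench": "Holistic summary of the SWE-Bench paper on GitHub issue benchmarks.",
    "summary_tool_longlora": "Holistic summary of the LongLoRA paper on long-context fine-tuning.",
    "summary_tool_loftq": "Holistic summary of the LoftQ paper on quantization for LoRA fine-tuning.",
}

selected = retrieve_tools(
    "evaluation dataset in MetaGPT and SWE-Bench", tool_descriptions, top_k=2
)
print(selected)
```

Only the selected tools would then be described in the reasoning prompt, so the prompt size depends on `top_k` rather than on the total number of tools. The quality of the selection depends on how well each description identifies its own paper.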

<div><img src="images/lesson4-3.png" style="width: 50%"/></div>


In [13]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

First, define an object index over the tools, then retrieve the relevant tools with an object retriever.

In [14]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [15]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [16]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [21]:
tools[0].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_swebench', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [22]:
tools[1].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_metra', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [17]:
tools[2].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_values', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [18]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [19]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metra with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-Bench"}


Retrying llama_index.llms.openai.base.OpenAI._achat in 0.15970313835327787 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 56708, Requested 3581. Please try again in 289ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.38518828733557353 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 56671, Requested 3634. Please try again in 305ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.2682222109943

=== Function Output ===
The evaluation dataset used in SWE-Bench consists of task instances collected from real GitHub issues and pull requests across popular Python repositories. It includes task instructions, issue text, retrieved files, an example patch file, and a prompt for generating the patch file. The dataset is constructed by scraping pull requests from the top Python libraries, validated through execution-based verification, and continuously updated with new task instances. The finalized task instances are validated through execution-based steps and saved in a .json file for download, along with corresponding ground truth test results. The dataset involves tasks related to software engineering problems from various repositories, including specific issues like handling QDP files and resolving cyclic dependencies.
=== LLM Response ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information. 

On the other hand, the evaluation datas

In [20]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA paper"}


Retrying llama_index.llms.openai.base.OpenAI._achat in 0.357269422514362 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 59587, Requested 3585. Please try again in 3.172s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.36399870525478184 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 59346, Requested 3820. Please try again in 3.166s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 1.0291031191695

=== Function Output ===
The LongLoRA paper introduces an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. It utilizes shifted sparse attention (S2-Attn) during training to approximate long context effectively and efficiently. By combining this with a parameter-efficient fine-tuning regime, LongLoRA demonstrates strong empirical results on various tasks across different Llama2 models. The approach allows for extending the context of LLMs while retaining their original architectures and is compatible with existing techniques like Flash-Attention2. Additionally, LongLoRA improves training speed and memory efficiency compared to conventional methods, making it a valuable contribution to the field of large language models.
=== Calling Function ===
Calling function: summary_tool_loftq with args: {"input": "LoftQ paper"}


Retrying llama_index.llms.openai.base.OpenAI._achat in 0.8639854401143886 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 57024, Requested 3087. Please try again in 111ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.7122240588150683 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-vqFYxgzzHIM2dUljWi76qehk on tokens per min (TPM): Limit 60000, Used 56885, Requested 3226. Please try again in 111ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 1.797960894935826

=== Function Output ===
The LoftQ paper introduces a novel quantization framework designed for Large Language Models (LLMs) that integrates quantization and Low-Rank Adaptation (LoRA) fine-tuning. It aims to bridge the performance gap observed between full fine-tuning and quantization plus LoRA fine-tuning approaches when applied together on a pre-trained model. LoftQ simultaneously quantizes an LLM and finds a suitable low-rank initialization for LoRA fine-tuning, enhancing generalization in downstream tasks. The framework has been evaluated across various natural language tasks, demonstrating superior effectiveness, especially in challenging low-bit scenarios. Additionally, the paper discusses model compression ratios, memory footprints, quantization time, and GLUE dataset statistics, showcasing the comprehensive evaluation and performance of LoftQ.
=== LLM Response ===
The LongLoRA paper introduces an efficient fine-tuning approach that extends the context sizes of pre-trained large

<div><img src="images/lesson4-4.png" style="width: 50%"/></div>