# Lesson 4: Building a Multi-Document Agent

## Setup

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [2]:
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.

In [3]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [5]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: selfrag.pdf


In [6]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [7]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [8]:
len(initial_tools)

6
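The count of 6 follows from each of the 3 papers contributing two tools. Below is a minimal sketch of the same bookkeeping, with plain strings standing in for the tool objects. The `fake_doc_tools` helper is hypothetical; the `summary_tool_` prefix matches the names in the agent logs, while the `vector_tool_` prefix is an assumption.

```python
from pathlib import Path

papers = ["metagpt.pdf", "longlora.pdf", "selfrag.pdf"]

def fake_doc_tools(file_path: str, name: str) -> tuple[str, str]:
    """Hypothetical stand-in for get_doc_tools: returns tool names only."""
    return f"vector_tool_{name}", f"summary_tool_{name}"

# Path(...).stem drops the ".pdf" suffix, so "longlora.pdf" becomes "longlora".
paper_to_tools = {p: list(fake_doc_tools(p, Path(p).stem)) for p in papers}

# Flatten the per-paper lists into one list, preserving paper order.
flat_tools = [t for p in papers for t in paper_to_tools[p]]

print(len(flat_tools))
print(flat_tools[:2])
```

Deriving the tool name from the file stem gives every tool a distinct name, which is what lets the model address a specific paper's tool.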

In [9]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [10]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset used in LongLoRA"}
=== Function Output ===
The evaluation dataset used in LongLoRA includes the RedPajama dataset, the PG19 dataset, the PG19 validation set, the PG19 test split, the LongAlpaca-12k dataset, and the SAFECONV dataset.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results of LongLoRA"}
=== Function Output ===
The evaluation results of LongLoRA indicate that it achieves comparable or even superior performance compared to other Llama2-based long-context models like Vicuna and LongChat. It demonstrates efficiency in terms of training hours and GPU memory cost, requiring significantly lower computational overhead and fewer training hours. Additionally, LongLoRA is effective in providing struc
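
The log above alternates `=== Calling Function ===` and `=== Function Output ===` entries: the model names a tool and its arguments, the tool is run, and the result is returned. Below is a minimal sketch of that dispatch step, using a stub in place of a real query engine. The `run_tool_calls` helper and the stub's reply are hypothetical and are not LlamaIndex internals.

```python
from typing import Callable

def run_tool_calls(
    tools: dict[str, Callable[..., str]],
    calls: list[tuple[str, dict]],
) -> list[tuple[str, str]]:
    """Execute each (tool_name, kwargs) request in order and collect the outputs."""
    results = []
    for name, kwargs in calls:
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        results.append((name, tools[name](**kwargs)))
    return results

# Stub tool: a real summary tool would query an index built over the paper.
def summary_tool_longlora(input: str) -> str:
    return f"[stub summary for: {input}]"

# The two requests mirror the two calls shown in the log.
planned = [
    ("summary_tool_longlora", {"input": "evaluation dataset used in LongLoRA"}),
    ("summary_tool_longlora", {"input": "evaluation results of LongLoRA"}),
]

registry = {"summary_tool_longlora": summary_tool_longlora}
for name, output in run_tool_calls(registry, planned):
    print(name, "->", output)
```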

In [11]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It trains a single arbitrary language model to adaptively retrieve passages on-demand, generate text informed by these passages, and reflect on both the retrieved passages and its own generations using special tokens called reflection tokens. This framework aims to improve the generation quality and factuality of the language model without compromising its original creativity and versatility.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}



=== Function Output ===
LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs. It combines shifted sparse attention (S2-Attn) with LoRA to achieve this extension, allowing for significant computation savings compared to full fine-tuning. LongLoRA retains the original model architectures and is compatible with existing techniques like Flash-Attention2. Additionally, an improved version called LongLoRA+ enables better performance by allowing the normalization and embedding layers to be trainable during the adaptation process. This method has shown strong empirical results on various tasks and models of different sizes, demonstrating its effectiveness in adapting LLMs to longer context lengths.
=== LLM Response ===
Here are summaries of Self-RAG and LongLoRA:

1. Self-RAG: Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It trains

## 2. Set up an agent over 11 papers

### Download 11 ICLR papers

In [12]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]

To download these papers, uncomment and run the following code in a notebook cell:


    # for url, paper in zip(urls, papers):
    #     !wget "{url}" -O "{paper}"
    
    
**Note**: The PDF files are included with this lesson. To access these papers, go to the `File` menu and select `Open...`.
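
The `!wget` line relies on IPython shell syntax and on `wget` being installed. Outside a notebook, a rough standard-library equivalent might look like the sketch below. The `download_papers` helper, its `dest_dir` and `fetch` parameters, and the skip-if-present behavior are assumptions added for illustration, not part of the lesson code.

```python
import urllib.request
from pathlib import Path

def download_papers(urls, papers, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Save each URL under its paired filename, skipping files already present."""
    if len(urls) != len(papers):
        raise ValueError("urls and papers must have the same length")
    saved = []
    for url, paper in zip(urls, papers):
        target = Path(dest_dir) / paper
        if not target.exists():
            # urlretrieve(url, filename) writes the response body to filename.
            fetch(url, str(target))
        saved.append(target)
    return saved

# Usage (performs network requests, so it is left commented out here):
# download_papers(urls, papers)
```

The length check guards against the two lists drifting out of sync, since `zip` would otherwise silently drop the unmatched entries.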

In [15]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: loftq.pdf
Getting tools for paper: swebench.pdf
Getting tools for paper: selfrag.pdf
Getting tools for paper: zipformer.pdf
Getting tools for paper: values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf
Getting tools for paper: metra.pdf
Getting tools for paper: vr_mcl.pdf


### Extend the Agent with Tool Retrieval

In [16]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [17]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [18]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [19]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [20]:
tools[2].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_values', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
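
To make the retrieval step concrete, here is a toy sketch of top-k tool selection. It ranks tools by word overlap (cosine similarity over word counts) between the query and each description. This is only a stand-in: the `ObjectIndex` above is built on `VectorStoreIndex`, which compares embeddings rather than raw word counts. The tool descriptions and the `retrieve` helper are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, tool_descriptions: dict[str, str], top_k: int = 3) -> list[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    q = tokens(query)
    ranked = sorted(
        tool_descriptions.items(),
        key=lambda item: cosine(q, tokens(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Illustrative descriptions; real tools carry whatever get_doc_tools assigns.
tool_descriptions = {
    "summary_tool_metagpt": "holistic summary of MetaGPT",
    "summary_tool_swebench": "holistic summary of SWE-Bench",
    "summary_tool_longlora": "holistic summary of LongLoRA",
    "summary_tool_zipformer": "holistic summary of Zipformer",
}

selected = retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench",
    tool_descriptions,
    top_k=2,
)
print(selected)
```

Retrieval quality depends on how well each description distinguishes its tool: if several tools share near-identical descriptions, the ranking has little to separate them.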

In [21]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [22]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metra with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "comparison of evaluation datasets in MetaGPT and SWE-Bench"}
=== Function Output ===
The evaluation datasets in MetaGPT and SWE-Bench differ significantly in terms of the tasks they focus on and the complexity of the challenges presented. MetaGPT's dataset involves self-contained problems that can be solved with a few lines of code, while SWE-Bench's dataset consists of real-world software engineering tasks that are more intricate and involve navigating large code repositories, understanding interactions between functions in different files, 

In [23]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA paper"}



=== Function Output ===
The LongLoRA paper introduces an efficient method for extending the context sizes of pre-trained large language models (LLMs) with limited computational cost. It utilizes shifted sparse attention (S2-Attn) during training to approximate long contexts effectively and efficiently. By combining this with parameter-efficient fine-tuning techniques, LongLoRA extends the context window of LLMs while retaining their original architectures. The approach demonstrates strong empirical results on various tasks and models, showcasing significant context extensions while maintaining performance and reducing memory costs compared to full fine-tuning. Additionally, LongLoRA is compatible with existing techniques like Flash-Attention2 and enables supervised fine-tuning with the LongAlpaca dataset.
=== Calling Function ===
Calling function: summary_tool_loftq with args: {"input": "LoftQ paper"}



=== Function Output ===
The LoftQ paper introduces a quantization framework for Large Language Models (LLMs) that involves applying quantization and low-rank approximation alternately to the original high-precision pre-trained weights. The method aims to provide an initialization for subsequent LoRA fine-tuning. LoftQ has been evaluated on various natural language processing tasks, demonstrating superior performance compared to existing methods like QLoRA, especially in challenging low-bit quantization scenarios. The paper discusses model compression ratios, memory footprints, quantization time, and GLUE dataset statistics, showcasing the effectiveness and robustness of LoftQ in tasks such as natural language understanding, question answering, summarization, and natural language generation.
=== LLM Response ===
The LongLoRA paper introduces an efficient method for extending the context sizes of pre-trained large language models (LLMs) with limited computational cost. It utilizes shifte