# Building a Multi-Document Agent

## Setup

In [15]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

# Allow nested event loops so async calls can run inside the notebook
import nest_asyncio
nest_asyncio.apply()

## 1. Set up an agent over 3 papers

In [4]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [16]:
from utils import get_doc_tools
from pathlib import Path

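# Build a vector tool and a summary tool for each paper, keyed by filename
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    # Path(paper).stem drops the ".pdf" suffix, e.g. "longlora.pdf" -> "longlora"
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]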

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: loftq.pdf
Getting tools for paper: swebench.pdf
Getting tools for paper: selfrag.pdf
Getting tools for paper: zipformer.pdf
Getting tools for paper: values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf
Getting tools for paper: metra.pdf
Getting tools for paper: vr_mcl.pdf


In [17]:
# Flatten the per-paper [vector_tool, summary_tool] pairs into a single list
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
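
To make the flattening order concrete, here is a small sketch that uses placeholder strings in place of the real tool objects. The `demo_*` names are hypothetical, and the `vector_tool_*` naming is an assumption for illustration; only the `summary_tool_*` names appear in the agent logs in this notebook.

```python
from pathlib import Path

# Placeholder strings stand in for the real tool objects (illustration only).
demo_papers = ["metagpt.pdf", "longlora.pdf", "selfrag.pdf"]
demo_tools_dict = {
    paper: [f"vector_tool_{Path(paper).stem}", f"summary_tool_{Path(paper).stem}"]
    for paper in demo_papers
}

# Outer loop over papers, inner loop over each paper's tools:
# the result keeps paper order, with two entries per paper.
flat = [t for paper in demo_papers for t in demo_tools_dict[paper]]
print(flat)
print(len(flat))  # 6: two tools for each of the three papers
```

With this layout the list length is always twice the number of papers, so a count of 22 corresponds to 11 papers.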

In [18]:
print(initial_tools)

[<llama_index.core.tools.function_tool.FunctionTool object at 0x000002084407DCD0>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x00000208440AAB40>, <llama_index.core.tools.function_tool.FunctionTool object at 0x00000208440AFAD0>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x00000208440AC410>, <llama_index.core.tools.function_tool.FunctionTool object at 0x000002084128D3D0>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x0000020844082420>, <llama_index.core.tools.function_tool.FunctionTool object at 0x0000020842BBEFC0>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x000002084087E150>, <llama_index.core.tools.function_tool.FunctionTool object at 0x00000208440AF920>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x0000020844062A50>, <llama_index.core.tools.function_tool.FunctionTool object at 0x0000020843F24B60>, <llama_index.core.tools.query_engine.QueryEngineTool object at 0x0000020843F068D0>, <ll

In [19]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [20]:
len(initial_tools)

22

In [21]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [22]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset mentioned in the context is called SAFECONV. It is designed for research on conversational safety, annotating unsafe spans in utterances and providing safe alternative responses. The dataset contains unsafespans, unsafe responses, and safe alternative responses for over 100,000 dialogues from social media platforms. It aims to explain why an utterance is unsafe and offers guidance for generating safer responses. Comparisons with other datasets indicate that SAFECONV is more comprehensive and effective in identifying and mitigating unsafe behavior in chatbots.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation r

In [23]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of large language models by training them to retrieve, generate, and critique text passages and their own generation using reflection tokens. It allows for customization of model behaviors at test time and has shown significant improvements in model performance and factuality compared to other models. The system focuses on browser-assisted question-answering and aims to provide accurate and informative responses based on given instructions and evidence, evaluating relevance, supportiveness, and overall utility of the response.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is a framework that consists of the Action Units Relation Transformer (A

In [24]:
response = agent.query(
    "Tell me about MetaGPT, compare it with SWE-Bench and LongLoRA"
)
print(str(response))


Added user message to memory: Tell me about MetaGPT, compare it with SWE-Bench and LongLoRA
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "MetaGPT"}
=== Function Output ===
MetaGPT is a meta-programming framework that enhances problem-solving capabilities in multi-agent systems based on Large Language Models (LLMs). It incorporates Standardized Operating Procedures (SOPs) to streamline workflows and improve collaboration among agents. MetaGPT utilizes role specialization, structured communication interfaces, and executable feedback mechanisms to enhance code generation quality during runtime. In experiments, MetaGPT has shown state-of-the-art performance in various benchmarks, outperforming other frameworks in tasks like code generation and software development. It simplifies the process of transforming abstract requirements into detailed class and function designs through a specialized division of labor and standard operating procedures workflow.

## 2. Set up an agent over 11 papers

### Download 11 ICLR papers

In [25]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]
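
The two lists above only name the files; the notebook does not show the download itself. A minimal sketch of one way to fetch them, assuming `urls` and `papers` are aligned by position and that the PDFs should sit next to the notebook. `download_papers` and its `fetch` parameter are hypothetical names introduced here, not part of the course helpers.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_papers(urls, papers, dest_dir=".", fetch=urlretrieve):
    """Save each URL under its paired filename, skipping files that already exist."""
    if len(urls) != len(papers):
        raise ValueError("urls and papers must have the same length")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url, name in zip(urls, papers):
        target = dest / name
        if not target.exists():
            fetch(url, str(target))  # network call when fetch is urlretrieve
        saved.append(target)
    return saved
```

In the notebook this would be called as `download_papers(urls, papers)`. The `fetch` argument exists so the function can be exercised without network access.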

In [26]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: loftq.pdf
Getting tools for paper: swebench.pdf
Getting tools for paper: selfrag.pdf
Getting tools for paper: zipformer.pdf
Getting tools for paper: values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf
Getting tools for paper: metra.pdf
Getting tools for paper: vr_mcl.pdf


### Extend the Agent with Tool Retrieval

In [27]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [28]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [29]:
# Return only the 3 most similar tools for each query
obj_retriever = obj_index.as_retriever(similarity_top_k=3)
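
With `similarity_top_k=3`, only the three tools ranked closest to the query are handed to the agent. The sketch below illustrates the top-k idea with a toy word-overlap (Jaccard) score over made-up tool descriptions. It is a stand-in for illustration, not how `ObjectIndex` scores tools (a vector index typically compares embeddings), and the `toy_*` names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_top_k(query, descriptions, k=3):
    """Return the k tool names whose description overlaps most with the query."""
    q = tokenize(query)

    def jaccard(name):
        d = tokenize(descriptions[name])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    ranked = sorted(descriptions, key=jaccard, reverse=True)
    return ranked[:k]


# Made-up descriptions, for illustration only.
toy_descriptions = {
    "summary_tool_metagpt": "Summarization questions about the MetaGPT paper",
    "summary_tool_swebench": "Summarization questions about the SWE-Bench paper",
    "summary_tool_longlora": "Summarization questions about the LongLoRA paper",
    "summary_tool_zipformer": "Summarization questions about the Zipformer paper",
}

top = toy_top_k(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench",
    toy_descriptions,
    k=2,
)
print(top)
```

The point of the sketch is that ranking depends entirely on the text attached to each tool, so distinct, paper-specific descriptions matter once the tool set grows.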

In [30]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [32]:
# A notebook cell displays only its last expression,
# so the output below is for tools[0]
tools[2].metadata
tools[1].metadata
tools[0].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_swebench', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
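
The metadata shown above pairs a description that mentions MetaGPT with the name `summary_tool_swebench`. If retrieval relies on this text, as seems likely, a description that does not match its paper could steer queries to the wrong tool. The sketch below is one heuristic way to flag such mismatches. `ToyMetadata` is a hypothetical stand-in for the real metadata object, and the `vector_tool_` prefix is an assumption; only `summary_tool_` names appear in the logs.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyMetadata:
    """Hypothetical stand-in exposing only the two fields used here."""
    name: str
    description: str


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def mismatched_tools(metadatas, prefixes=("summary_tool_", "vector_tool_")):
    """Return tool names whose paper stem does not appear in the description."""
    flagged = []
    for meta in metadatas:
        stem = meta.name
        for prefix in prefixes:
            if stem.startswith(prefix):
                stem = stem[len(prefix):]
                break
        if normalize(stem) not in normalize(meta.description):
            flagged.append(meta.name)
    return flagged


shared = "Use ONLY IF you want to get a holistic summary of MetaGPT."
metas = [
    ToyMetadata("summary_tool_swebench", shared),
    ToyMetadata("summary_tool_metagpt", shared),
]
print(mismatched_tools(metas))
```

Against the real tools, the same check could be run over `[t.metadata for t in all_tools]`, since each tool exposes `metadata.name` and `metadata.description` as in the output above.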

In [33]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,  # tools are retrieved per query instead of passed up front
    llm=llm,
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers.\n"
        "Please always use the tools provided to answer a question. "
        "Do not rely on prior knowledge."
    ),
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [34]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metra with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-Bench"}
=== Function Output ===
The evaluation dataset used in SWE-Bench consists of task instances constructed from pull requests that meet specific criteria, including being merged, resolving issues, and introducing new tests. Each task instance includes the codebase, problem statement, a test patch, and a gold patch. The dataset is validated through execution-based verification to ensure the usability and correctness of the solutions. It is designed to evaluate the performance of language models in generating
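
In the run above, the agent called `summary_tool_metra` for a question about MetaGPT. One way to investigate is to look at which tools the retriever returns for the same query before the agent chooses among them. The helper below is a small sketch: it assumes only a retriever with a `retrieve(query)` method whose results expose `.metadata.name`, as `obj_retriever` does earlier in this notebook. `retrieved_tool_names` is a hypothetical helper name.

```python
def retrieved_tool_names(retriever, query):
    """Return the names of the tools a retriever selects for a query."""
    return [tool.metadata.name for tool in retriever.retrieve(query)]


# In the notebook (requires the index built above):
# retrieved_tool_names(
#     obj_retriever,
#     "Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench",
# )
```

If the expected paper's tools are missing from the returned names, the issue is likely in retrieval (tool descriptions or `similarity_top_k`) rather than in the agent's tool choice.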

In [35]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA paper"}
=== Function Output ===
The LongLoRA paper introduces an efficient fine-tuning approach to extend the context length of large language models, utilizing Shifted Sparse Attention (S2-Attn) during training to approximate standard self-attention patterns. It also presents an Action Units Relation Learning framework that includes the Action Units Relation Transformer (ART) and the Tampered AU Prediction (TAP) processes. The ART encoder focuses on modeling relations between facial action units at AU-agnostic patches for forgery detection, while the TAP process tampers AU-related regions to enhance generalization to unseen manipulation methods. The paper achieves state-of-the-art performance on cross-dataset and cross-manipulation evaluations, demonstrating the e