# Lesson 3: Multi-Document Agent

## Setup

In [9]:
import nest_asyncio

nest_asyncio.apply()

## Set Up an Agent Over Three Documents

In [10]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "/teamspace/uploads/dlai/metagpt.pdf",
    "/teamspace/uploads/dlai/longlora.pdf",
    "/teamspace/uploads/dlai/selfrag.pdf",
]

In [11]:
from utils import get_doc_tools
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: /teamspace/uploads/dlai/metagpt.pdf
Getting tools for paper: /teamspace/uploads/dlai/longlora.pdf
Getting tools for paper: /teamspace/uploads/dlai/selfrag.pdf
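
The second argument passed to `get_doc_tools` is `Path(paper).stem`, the file name with its directory and extension removed. The function-call logs later in this lesson show tool names such as `summary_tool_longlora`, which suggests the stem is used as a name suffix. The sketch below only checks the stems with the standard library; the `summary_tool_` pattern is inferred from those logs, not from the `utils` source, which is not shown here.

```python
from pathlib import Path

papers = [
    "/teamspace/uploads/dlai/metagpt.pdf",
    "/teamspace/uploads/dlai/longlora.pdf",
    "/teamspace/uploads/dlai/selfrag.pdf",
]

# Path.stem drops the directory and the ".pdf" suffix.
stems = [Path(paper).stem for paper in papers]
print(stems)

# Assumed naming pattern, inferred from the agent logs in this lesson.
summary_tool_names = [f"summary_tool_{stem}" for stem in stems]
print(summary_tool_names)
```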


In [12]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [13]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [14]:
len(initial_tools)

6
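
The count of 6 follows from the nested list comprehension: three papers, each with a vector tool and a summary tool, flattened in paper order. A minimal sketch of the same pattern, with plain strings standing in for the real tool objects (the names and the `paper_to_tools` dictionary are hypothetical):

```python
# Stand-in strings replace the real tool objects; the names are hypothetical.
papers = ["metagpt", "longlora", "selfrag"]
paper_to_tools = {
    paper: [f"vector_tool_{paper}", f"summary_tool_{paper}"] for paper in papers
}

# Outer loop over papers, inner loop over each paper's pair of tools.
flat_tools = [tool for paper in papers for tool in paper_to_tools[paper]]

print(len(flat_tools))
print(flat_tools[:2])
```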

In [15]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [16]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results


=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the research presented in the provided context includes the RedPajama dataset, PG19 dataset, LongBench dataset, and LEval dataset.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation results presented in the provided context cover a range of aspects related to model performance, efficiency, and effectiveness across different tasks and context lengths. These results demonstrate the effectiveness of various methods such as S2-Attn and LoRA+ in achieving comparable performance to full fine-tuning while being more efficient. Additionally, the evaluation results highlight the impact of attention patterns on model performance during fine-tuning, with insights into the effectiveness of dilated attention and sparse attention in different sc

In [17]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA


=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of a large language model through retrieval and self-reflection. It incorporates reflection tokens to evaluate its own output during both training and inference, consisting of a generator model, a retriever, and a critic model. By dynamically deciding when to retrieve text passages based on predictions and generating special tokens to evaluate its own predictions, Self-RAG aims to improve an LM's generation quality, including its factual accuracy, without compromising its versatility. This framework has shown significant performance advantages over supervised fine-tuned LLMs and existing retrieval-augmented models in various tasks, demonstrating substantial gains in improving factuality and citation accuracy for long-form generations.
=== Calling Function ===
Calling function: summary_tool_longlora with

## Extend the Agent with Tool Retrieval

In [18]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [19]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [20]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [21]:
# Retrieve the tools most relevant to the query (up to 3, per similarity_top_k)
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)
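
To make the idea of tool retrieval concrete, here is a toy sketch in plain Python. It ranks tools by word-overlap cosine similarity between the query and each tool's description, then keeps the top `k`. This is a deliberately simplified stand-in: the `ObjectIndex` above is built on a `VectorStoreIndex`, which compares embeddings rather than word counts, and the tool names and descriptions below are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query, tool_descriptions, top_k=3):
    """Return the names of the top_k tools whose descriptions best match the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(query_vec, bag_of_words(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical tool names and descriptions, for illustration only.
tool_descriptions = {
    "vector_tool_metagpt": "Answer specific questions about the MetaGPT paper",
    "summary_tool_metagpt": "Summarize the MetaGPT paper",
    "vector_tool_longlora": "Answer specific questions about the LongLoRA paper",
    "summary_tool_longlora": "Summarize the LongLoRA paper",
    "vector_tool_selfrag": "Answer specific questions about the Self-RAG paper",
    "summary_tool_selfrag": "Summarize the Self-RAG paper",
}

selected = retrieve_tools(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench",
    tool_descriptions,
    top_k=3,
)
print(selected)
```

With these made-up descriptions, the two MetaGPT tools rank first because only they share the word "MetaGPT" with the query. Nothing matches "SWE-Bench", so the third slot goes to whichever remaining tool overlaps most on generic words.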

In [22]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers. "
        "Please always use the tools provided to answer a question. "
        "Do not rely on prior knowledge."
    ),
    verbose=True,
)
agent = AgentRunner(agent_worker)

In [23]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "SWE-Bench dataset"}
=== Function Output ===
The SWE-Bench dataset is not referenced in the provided context information.
=== LLM Response ===
The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev. However, the SWE-Bench dataset is not referenced in the provided context information.
assistant: The evaluation dataset used in MetaGPT includes HumanEval, MBPP, and SoftwareDev. However, the SWE-Bench dataset is not referenced in the provided context information.


In [24]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LoftQ"}
=== Function Output ===
LoftQ is a method that focuses on efficient fine-tuning and retaining the original architecture during inference. It aims to save substantial fine-tuning costs while maintaining the quality of the original attention. LoftQ allows full access to the entire input via unmodified attention during inference.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing additional trainable parameters and computational costs. It combines shifted sparse attention (S2-Attn) and LoRA to effectively fine-tune models with longer context windows, maintaining the original model archit