[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oviya-raja/ist-402/blob/main/learning-path/W09/W9_Building_Agentic_RAG_LlamaIndex_3_4.ipynb)

---



# Building Agentic RAG with LlamaIndex - Complete Notebook Content

This notebook contains all lessons from the course on building agentic RAG systems using LlamaIndex.

# Setup and Installation
First, let's install the required packages.

In [1]:
%pip install --upgrade pip
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install nest-asyncio
%pip install openai
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


## Set up OpenAI API Key

In [2]:
# Set up OpenAI API Key
import os
from dotenv import load_dotenv

# Try to get API key from Google Colab userdata first (if running in Colab)
OPENAI_API_KEY = None
try:
    import google.colab
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    if OPENAI_API_KEY:
        print("✅ OpenAI API Key loaded from Colab userdata!")
except Exception:
    # Not running in Colab, or the secret is missing or not shared with this
    # notebook (userdata.get raises in those cases); fall back to environment variables
    pass

# If not found in Colab userdata, try environment variables
if not OPENAI_API_KEY:
    # Load environment variables from .env file
    load_dotenv()
    
    # Get OpenAI API Key from environment variable
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    if OPENAI_API_KEY:
        print("✅ OpenAI API Key loaded from environment variables!")
    else:
        raise ValueError(
            "OPENAI_API_KEY not found. Please set it in one of the following ways:\n"
            "  - In Google Colab: add a secret named OPENAI_API_KEY in the Secrets panel (key icon)\n"
            "  - Locally: Create a .env file with OPENAI_API_KEY=your_key\n"
            "  - Or set environment variable: export OPENAI_API_KEY=your_key"
        )

# Ensure the API key is set in the environment for OpenAI libraries
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
print("OpenAI API Key configured successfully!")

✅ OpenAI API Key loaded from environment variables!
OpenAI API Key configured successfully!


In [3]:
import nest_asyncio
nest_asyncio.apply()

# Lesson 1: Router Engine

### Load Data
Download the MetaGPT paper:

In [4]:
# Create data directory if it doesn't exist
import os
os.makedirs("data", exist_ok=True)

# Download the MetaGPT paper
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O data/metagpt.pdf

--2025-12-14 18:39:16--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:39:17 (28.7 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]



In [5]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["data/metagpt.pdf"]).load_data()

## Split Documents into Nodes

In [6]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

## Define LLM and Embedding Model

In [7]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index

In [8]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

2025-12-14 18:39:19,220 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## Define Query Engines and Set Metadata

In [9]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [10]:
from llama_index.core.tools import QueryEngineTool

print("Creating summary tool...")
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)
print(f"✓ Summary tool created successfully")
print(f"  Description: {summary_tool.metadata.description}")

print("\nCreating vector tool...")
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)
print(f"✓ Vector tool created successfully")
print(f"  Description: {vector_tool.metadata.description}")
print("\n✓ Both tools are ready to use!")

Creating summary tool...
✓ Summary tool created successfully
  Description: Useful for summarization questions related to MetaGPT

Creating vector tool...
✓ Vector tool created successfully
  Description: Useful for retrieving specific context from the MetaGPT paper.

✓ Both tools are ready to use!


## Define Router Query Engine

In [11]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

In [12]:
response = query_engine.query("What is the summary of the document?")
print(str(response))
print(len(response.source_nodes))

2025-12-14 18:39:20,934 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:20,943 - INFO - Selecting query engine 0: Useful for summarization questions related to MetaGPT.


Selecting query engine 0: Useful for summarization questions related to MetaGPT.

2025-12-14 18:39:22,350 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:22,920 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:24,385 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The document introduces MetaGPT, a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). It incorporates role specialization, workflow management, and efficient communication mechanisms to improve code generation quality. Through experiments, MetaGPT demonstrates high performance in collaborative software engineering tasks. The document also details the development of a "Drawing App" using MetaGPT, discussing requirements, UI design, implementation, and testing strategies for the color meter feature. It highlights the use of Python libraries for GUI creation and color selection. Additionally, it covers task distribution among team members, performance comparisons with other models, and ethical considerations related to AI frameworks.
34


In [13]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

2025-12-14 18:39:25,629 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:25,632 - INFO - Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the MetaGPT paper, which may contain information on how agents share information with other agents..
2025-12-14 18:39:25,788 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the MetaGPT paper, which may contain information on how agents share information with other agents..

2025-12-14 18:39:26,586 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Agents share information with other agents by utilizing a shared message pool where they publish structured messages. Additionally, agents can subscribe to relevant messages based on their profiles. This approach allows all agents to exchange messages directly and access messages from other entities transparently, enhancing communication efficiency.
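
The two runs above show the router sending a summarization question to engine 0 and a specific question to engine 1, based only on the tool descriptions. The routing idea can be sketched in plain Python. This is a simplified illustration, not LlamaIndex's implementation: the `Tool` class, `keyword_selector`, and `route` are hypothetical names, and a word-overlap score stands in for the LLM call that `LLMSingleSelector` makes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    """A query engine paired with the description a selector reads."""
    description: str
    run: Callable[[str], str]


def keyword_selector(query: str, tools: List[Tool]) -> int:
    """Stand-in for the LLM selector: pick the tool whose description
    shares the most words with the query (ties go to the first tool)."""
    query_words = set(query.lower().replace("?", "").split())
    scores = [
        len(query_words & set(tool.description.lower().split()))
        for tool in tools
    ]
    return scores.index(max(scores))


def route(query: str, tools: List[Tool]) -> str:
    """Select exactly one tool, then forward the query to it."""
    choice = keyword_selector(query, tools)
    return tools[choice].run(query)


# Two toy engines that only report which one was chosen.
toy_tools = [
    Tool("summary of the whole document", lambda q: "summary engine"),
    Tool("specific facts and details from the paper", lambda q: "vector engine"),
]

print(route("What is the summary of the document?", toy_tools))
print(route("Which specific details does the paper give?", toy_tools))
```

Because the selector sees only the descriptions, wording them distinctly is what makes routing reliable.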


# Lesson 2: Tool Calling

### 1. Define a Simple Tool

In [14]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

In [15]:
# ============================================================================
# EXPLANATION: Using LLM with Function/Tool Calling
# ============================================================================
# This demonstrates how an LLM can intelligently choose and call functions/tools
# based on a natural language query.

from llama_index.llms.openai import OpenAI

# Step 1: Initialize the OpenAI LLM
# This creates a client for OpenAI's gpt-3.5-turbo model; no request is sent until the LLM is called
print("Step 1: Initializing OpenAI LLM (gpt-3.5-turbo)...")
llm = OpenAI(model="gpt-3.5-turbo")
print("✓ LLM initialized\n")

# Step 2: Use predict_and_call to let the LLM decide which tool to use
# The LLM will:
#   - Analyze the query: "Tell me the output of the mystery function on 2 and 9"
#   - Understand it needs to call a function with arguments 2 and 9
#   - Choose the appropriate tool (mystery_tool in this case)
#   - Call the function with the correct arguments
#   - Return the result

print("Step 2: LLM analyzing query and selecting appropriate tool...")
print("Query: 'Tell me the output of the mystery function on 2 and 9'")
print("Available tools: add_tool, mystery_tool")
print("\nLLM reasoning process (verbose=True shows this):")
print("-" * 60)

response = llm.predict_and_call(
    [add_tool, mystery_tool],  # List of available tools the LLM can choose from
    "Tell me the output of the mystery function on 2 and 9",  # User's query
    verbose=True  # Shows the LLM's decision-making process
)

print("-" * 60)
print("\nStep 3: Final response from LLM:")
print("=" * 60)
print(str(response))
print("=" * 60)

# Explanation of what happened:
print("\n" + "=" * 60)
print("WHAT HAPPENED:")
print("=" * 60)
print("1. The LLM received your query asking about 'mystery function'")
print("2. It analyzed the available tools and chose 'mystery_tool'")
print("3. It extracted the arguments: x=2, y=9")
print("4. It called mystery_tool(2, 9) which calculates: (2+9) * (2+9) = 121")
print("5. It returned the result: 121")
print("=" * 60)

Step 1: Initializing OpenAI LLM (gpt-3.5-turbo)...
✓ LLM initialized

Step 2: LLM analyzing query and selecting appropriate tool...
Query: 'Tell me the output of the mystery function on 2 and 9'
Available tools: add_tool, mystery_tool

LLM reasoning process (verbose=True shows this):
------------------------------------------------------------


2025-12-14 18:39:27,778 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
------------------------------------------------------------

Step 3: Final response from LLM:
121

WHAT HAPPENED:
1. The LLM received your query asking about 'mystery function'
2. It analyzed the available tools and chose 'mystery_tool'
3. It extracted the arguments: x=2, y=9
4. It called mystery_tool(2, 9) which calculates: (2+9) * (2+9) = 121
5. It returned the result: 121
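
The logged call `mystery with args: {"x": 2, "y": 9}` hints at the mechanics: the model is shown each tool's name, signature, and docstring, replies with a structured call, and ordinary Python then executes it. A minimal sketch of that dispatch step follows. The `describe` and `dispatch` helpers and the JSON shape are assumptions made for illustration; they are not the format LlamaIndex or OpenAI use internally.

```python
import inspect
import json
from typing import Callable, Dict


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


def describe(fn: Callable) -> Dict[str, str]:
    """Build the name/signature/docstring summary a model would be shown."""
    return {
        "name": fn.__name__,
        "signature": str(inspect.signature(fn)),
        "description": inspect.getdoc(fn) or "",
    }


def dispatch(tool_call_json: str, registry: Dict[str, Callable]) -> int:
    """Parse a structured tool call and run the matching Python function."""
    call = json.loads(tool_call_json)
    fn = registry[call["name"]]
    return fn(**call["args"])


registry = {fn.__name__: fn for fn in (add, mystery)}

# Hypothetical model output, mirroring the call logged above.
model_output = '{"name": "mystery", "args": {"x": 2, "y": 9}}'
print(describe(mystery))
print(dispatch(model_output, registry))  # (2 + 9) * (2 + 9) = 121
```

Type hints and docstrings matter here because they are the only information the model has about what each tool does.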


### 2. Define an Auto-Retrieval Tool

In [16]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["data/metagpt.pdf"]).load_data()

In [17]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: data/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2025-12-14
last_modified_date: 2025-12-14

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4, Jinlin Wang1, Zili Wang, Steven Ka Shing Yau5, Zijuan Lin4,
Liyang Zhou6, Chenyu Ran1, Lingfeng Xiao1,7, Chenglin Wu1†, J¨urgen Schmidhuber2,8
1DeepWisdom, 2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University, 4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University, 6University of Pennsylvania,
7University of California, Berkeley, 8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex 

In [18]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

2025-12-14 18:39:28,726 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [19]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?",
)
print(str(response))
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:39:29,053 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:39:30,320 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality. In experimental evaluations, MetaGPT achieves a 100% task completion rate, demonstrating robustness and efficiency in design.
{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}


### Define the Auto-Retrieval Tool

In [20]:
from typing import List, Optional
from llama_index.core.vector_stores import FilterCondition

def vector_query(
    query: str,
    page_numbers: Optional[List[str]] = None
) -> str:
    """Perform a vector search over an index.

    query (str): the string query to be embedded.
    page_numbers (Optional[List[str]]): Filter by set of pages. Leave as None to perform
        a vector search over all pages. Otherwise, filter by the set of specified pages.

    """

    # With no page numbers given, the filter list is empty and all pages are searched
    page_numbers = page_numbers or []
    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]

    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response


vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)

In [21]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool],
    "What are the high-level results of MetaGPT as described on page 2?",
    verbose=True
)
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:39:30,812 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:30,921 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}


2025-12-14 18:39:31,565 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
MetaGPT achieves a new state-of-the-art in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality, demonstrating robustness and efficiency in task completion.
{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}
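
The tool above builds one filter per requested page and combines them with `FilterCondition.OR`, so a node is kept if it matches any of the pages. The semantics can be sketched without a vector store. This is a toy illustration under stated assumptions: `toy_nodes` and `filter_nodes_or` are hypothetical, and treating an empty filter list as "no filtering" follows the behavior the `vector_query` docstring describes rather than a guarantee about every vector store.

```python
from typing import Dict, List

# Toy nodes: only the metadata matters for this illustration.
toy_nodes = [
    {"text": "intro", "metadata": {"page_label": "1"}},
    {"text": "results", "metadata": {"page_label": "2"}},
    {"text": "comparison", "metadata": {"page_label": "8"}},
]


def filter_nodes_or(nodes: List[Dict], filters: List[Dict[str, str]]) -> List[Dict]:
    """Keep nodes matching ANY filter; an empty filter list keeps everything."""
    if not filters:
        return list(nodes)
    return [
        node for node in nodes
        if any(node["metadata"].get(f["key"]) == f["value"] for f in filters)
    ]


# Page labels are strings, matching the {"page_numbers": ["2"]} call logged above.
page_filters = [{"key": "page_label", "value": p} for p in ["2", "8"]]
print([n["text"] for n in filter_nodes_or(toy_nodes, page_filters)])
print([n["text"] for n in filter_nodes_or(toy_nodes, [])])
```

Note that the comparison is an exact match on strings, so a filter value of `2` (an integer) would not match a `page_label` of `"2"` in this sketch.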


### Add More Tools

In [22]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [23]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What are the MetaGPT comparisons with ChatDev described on page 8?",
    verbose=True
)
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:39:32,180 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}


2025-12-14 18:39:32,637 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:39:33,397 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
MetaGPT outperforms ChatDev on the SoftwareDev dataset in various aspects. For example, MetaGPT achieves a higher score in executability, takes less time for execution, requires more tokens but fewer tokens per line of code compared to ChatDev. Additionally, MetaGPT demonstrates superior performance in code statistics and human revision cost when compared to ChatDev.
{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}


In [24]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What is a summary of the paper?",
    verbose=True
)

2025-12-14 18:39:33,899 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:33,992 - INFO - Retrying request to /chat/completions in 0.435596 seconds


=== Calling Function ===
Calling function: summary_tool with args: {"input": "The paper discusses the impact of climate change on biodiversity and ecosystems."}


2025-12-14 18:39:34,463 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:35,563 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:36,288 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
The paper does not discuss the impact of climate change on biodiversity and ecosystems.


# Lesson 3: Building an Agent Reasoning Loop

## Setup Function Calling Agent

In [25]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [26]:
from llama_index.core.agent.workflow import FunctionAgent

agent = FunctionAgent(
    tools=[vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

In [27]:
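# FunctionAgent.run() is asynchronous. asyncio.run() works inside the notebook here
# because nest_asyncio.apply() was called earlier; `await agent.run(...)` also works in a notebook cell.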
import asyncio

response = asyncio.run(agent.run(
    "Tell me about the agent roles in MetaGPT, and how they communicate."
))
print(str(response))

2025-12-14 18:39:37,388 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:37,593 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:39:37,602 - INFO - Retrying request to /chat/completions in 0.453312 seconds
2025-12-14 18:39:38,062 - INFO - Retrying request to /chat/completions in 0.885810 seconds
2025-12-14 18:39:40,476 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:41,292 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In MetaGPT, the agent roles include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. These roles are specialized for specific tasks in the software development process. Communication among these agents is facilitated through a shared message pool where structured messages are published and accessed. Agents can subscribe to relevant messages based on their profiles, enabling efficient exchange of information. This approach ensures that agents can retrieve necessary information from the shared pool without the need for direct inquiries, enhancing communication efficiency within the multi-agent system.


In [28]:
response = asyncio.run(agent.run(
    "Tell me about the evaluation datasets used."
))
print(str(response))

2025-12-14 18:39:42,419 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:42,744 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:39:43,935 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:44,400 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The evaluation datasets used are HumanEval, MBPP, and SoftwareDev.


In [29]:
response = asyncio.run(agent.run("Tell me the results over one of the above datasets."))

print(str(response))

2025-12-14 18:39:44,915 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:45,704 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:39:46,854 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:47,610 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The results over the SoftwareDev dataset show that MetaGPT achieved an average score of 3.9, outperforming ChatDev's score of 2.1. Other general intelligent algorithms like AutoGPT scored 1.0, indicating a lower performance in generating executable code effectively. MetaGPT's systematic deconstruction of requirements and structured messaging and feedback mechanisms contribute to its strong performance in communication and code execution.
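
The three runs above follow the same pattern: the model requests a tool, the tool result is fed back, and the model then writes a final answer. A follow-up such as "one of the above datasets" can only be resolved reliably if earlier turns are still in the history the model sees. The loop below is a minimal sketch of that pattern with a scripted stand-in for the LLM. All names (`scripted_policy`, `run_agent`, `lookup_datasets`) are hypothetical and this is not how `FunctionAgent` is implemented. For the real agent, the LlamaIndex documentation describes passing a `Context` object to `run` to carry state across calls.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


def lookup_datasets(_: str) -> str:
    """Toy tool that returns a fixed answer."""
    return "HumanEval, MBPP, SoftwareDev"


TOOLS: Dict[str, Callable[[str], str]] = {"lookup_datasets": lookup_datasets}


def scripted_policy(history: List[Message]) -> Tuple[str, str]:
    """Stand-in for the LLM: request a tool first, then answer from its output."""
    last_role, last_content = history[-1]
    if last_role == "tool":
        return ("final", f"The datasets are: {last_content}")
    return ("tool", "lookup_datasets")


def run_agent(question: str, history: List[Message], max_steps: int = 5) -> str:
    """Loop: ask the policy, run the chosen tool, feed the result back."""
    history.append(("user", question))
    for _ in range(max_steps):
        action, payload = scripted_policy(history)
        if action == "final":
            history.append(("assistant", payload))
            return payload
        history.append(("tool", TOOLS[payload](question)))
    raise RuntimeError("agent did not finish within max_steps")


history: List[Message] = []
print(run_agent("Which evaluation datasets are used?", history))
print(len(history))  # user, tool, assistant
```

Reusing the same `history` list for the next question is what would let a later turn refer back to "the above datasets".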


# Lesson 4: Building a Multi-Document Agent

## 1. Setup an Agent Over 3 Papers

In [30]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [31]:
# Download papers
for url, paper in zip(urls, papers):
    !wget "{url}" -O "data/{paper}"

--2025-12-14 18:39:48--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:39:49 (25.4 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]

--2025-12-14 18:39:49--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘data/longlora.pdf’


2025-12-14 18:39:50 (5.54 MB/s) - ‘data/longlora.pdf’ saved [1168720/1168720]

--2025-12-14 18:39:50--  https://openreview.net/pdf?id=hSyW5go0v8
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.


In [32]:
# Helper function to create tools for each paper
# Works in both Google Colab and local environments
from pathlib import Path
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool

def get_doc_tools(file_path: str, name: str):
    """
    Get vector and summary query engine tools from a document.
    
    This function works in both Google Colab and local environments.
    Make sure Settings.llm and Settings.embed_model are configured before calling this function.
    
    Args:
        file_path: Path to the document file (relative or absolute path)
        name: Name identifier for the document (used in tool descriptions)
    
    Returns:
        tuple: (vector_tool, summary_tool) - QueryEngineTool instances
    """

    # Load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)

    # Create indices
    vector_index = VectorStoreIndex(nodes)
    summary_index = SummaryIndex(nodes)

    # Create query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )

    # Create tools with improved descriptions that include paper name variations
    # Map common filename patterns to proper paper names
    paper_name_map = {
        "selfrag": "Self-RAG",
        "longlora": "LongLoRA",
        "metagpt": "MetaGPT",
        "loftq": "LoftQ",
    }
    
    # Get proper paper name or use the provided name
    paper_name = paper_name_map.get(name.lower(), name.replace("_", "-").title())
    
    # Create tools with explicit names and improved descriptions
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        name=f"{paper_name}_vector_tool",
        description=(
            f"Use this tool to retrieve specific context, details, facts, or information from the {paper_name} paper. "
            f"Use when asked about {paper_name}, {name}, or {name.replace('_', '-')}. "
            f"This tool searches the {paper_name} paper for specific information."
        ),
    )

    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_query_engine,
        name=f"{paper_name}_summary_tool",
        description=(
            f"Use this tool to get summaries, overviews, or high-level information about the {paper_name} paper. "
            f"Use when asked for a summary of {paper_name}, {name}, or {name.replace('_', '-')}. "
            f"This tool provides comprehensive summaries of the {paper_name} paper."
        ),
    )

    return vector_tool, summary_tool

In [33]:
import os
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    
    # Check if file exists in data directory before processing
    file_path = f"data/{paper}"
    if not os.path.exists(file_path):
        print(f"  ⚠️  Warning: File '{file_path}' does not exist. Skipping...")
        print(f"  💡 Tip: Make sure you've downloaded all papers first.")
        continue
    
    try:
        vector_tool, summary_tool = get_doc_tools(file_path, Path(paper).stem)
        paper_to_tools_dict[paper] = [vector_tool, summary_tool]
        print(f"  ✓ Successfully created tools for {paper}\n")
    except Exception as e:
        print(f"  ❌ Error processing {paper}: {e}")
        print(f"  Skipping this paper...\n")
        continue

print(f"\n✓ Successfully processed {len(paper_to_tools_dict)} papers")
print(f"Papers with tools: {list(paper_to_tools_dict.keys())}")

Getting tools for paper: metagpt.pdf


2025-12-14 18:39:52,249 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metagpt.pdf

Getting tools for paper: longlora.pdf


2025-12-14 18:39:53,555 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for longlora.pdf

Getting tools for paper: selfrag.pdf


2025-12-14 18:39:55,041 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for selfrag.pdf


✓ Successfully processed 3 papers
Papers with tools: ['metagpt.pdf', 'longlora.pdf', 'selfrag.pdf']


In [34]:
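# Only include papers that were processed successfully; skipped papers have no entry
initial_tools = [
    t
    for paper in papers
    if paper in paper_to_tools_dict
    for t in paper_to_tools_dict[paper]
]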

In [35]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
print(f"Number of tools: {len(initial_tools)}")

# Debug: Print tool information
if len(initial_tools) > 0:
    print(f"\n✅ Tools available:")
    for i, tool in enumerate(initial_tools[:6]):  # Show first 6 tools
        # ToolMetadata is an object, not a dict - access attributes directly
        if hasattr(tool, 'metadata') and tool.metadata:
            tool_name = getattr(tool.metadata, 'name', 'Unknown')
            tool_desc = getattr(tool.metadata, 'description', 'No description')
        else:
            tool_name = 'Unknown'
            tool_desc = 'No description'
        print(f"  Tool {i+1}: {tool_name}")
        print(f"    Description: {tool_desc[:80]}...")
else:
    print("\n❌ WARNING: No tools available! Make sure papers were processed successfully.")
    print(f"   paper_to_tools_dict has {len(paper_to_tools_dict)} papers")
    print(f"   papers list: {papers}")

Number of tools: 6

✅ Tools available:
  Tool 1: MetaGPT_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 2: MetaGPT_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the M...
  Tool 3: LongLoRA_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 4: LongLoRA_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the L...
  Tool 5: Self-RAG_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 6: Self-RAG_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the S...
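
The tool names listed above are built from the paper name, and for unknown papers `get_doc_tools` falls back to `name.replace("_", "-").title()`. A quick check of the generated names can catch problems before the agent is created. The sketch below assumes the constraint stated in OpenAI's function-calling documentation (letters, digits, underscores, and dashes, at most 64 characters); `tool_name` and `is_valid_tool_name` are hypothetical helpers that mirror the naming logic rather than reuse it.

```python
import re

# Assumed constraint on function names, per OpenAI's function-calling documentation.
TOOL_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]{1,64}")

PAPER_NAME_MAP = {
    "selfrag": "Self-RAG",
    "longlora": "LongLoRA",
    "metagpt": "MetaGPT",
    "loftq": "LoftQ",
}


def tool_name(stem: str, kind: str) -> str:
    """Mirror the naming used in get_doc_tools: '<paper name>_<kind>_tool'."""
    paper_name = PAPER_NAME_MAP.get(stem.lower(), stem.replace("_", "-").title())
    return f"{paper_name}_{kind}_tool"


def is_valid_tool_name(name: str) -> bool:
    """True if the whole name fits the assumed pattern."""
    return TOOL_NAME_PATTERN.fullmatch(name) is not None


for stem in ["metagpt", "selfrag", "my paper v2"]:
    name = tool_name(stem, "vector")
    print(name, is_valid_tool_name(name))
```

A file stem containing spaces or other punctuation would produce a name that fails this check, so such files are worth renaming or mapping explicitly.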


In [36]:
# LlamaIndex >= 0.14.6
from llama_index.core.agent.workflow import FunctionAgent

if len(initial_tools) == 0:
    raise ValueError(
        "ERROR: No tools available! Make sure papers were processed successfully.\n"
        f"paper_to_tools_dict has {len(paper_to_tools_dict)} entries.\n"
        "Run the paper processing cell above and check for errors."
    )

# Verify tools are properly configured and callable
print(f"✅ Creating agent with {len(initial_tools)} tools")
tool_names = []
tool_descriptions = []

for tool in initial_tools:
    # ToolMetadata is an object, access attributes directly
    if hasattr(tool, 'metadata') and tool.metadata:
        tool_name = getattr(tool.metadata, 'name', 'Unknown')
        tool_desc = getattr(tool.metadata, 'description', 'No description')
    else:
        tool_name = 'Unknown'
        tool_desc = 'No description'
    
    tool_names.append(tool_name)
    tool_descriptions.append(f"{tool_name}: {tool_desc[:80]}...")
    
    # Verify tool has required methods for FunctionAgent
    has_call = hasattr(tool, 'call') or hasattr(tool, 'acall') or callable(tool)
    if not has_call:
        print(f"⚠️  Warning: Tool {tool_name} may not be callable - missing call methods")

print(f"\n📋 Available tools:")
for desc in tool_descriptions:
    print(f"   {desc}")

# Build the tool list for the system prompt: one line per tool with a 100-character description
tool_list_with_desc = "\n".join(
    f"  - {name}: {getattr(getattr(tool, 'metadata', None), 'description', '')[:100]}..."
    for name, tool in zip(tool_names, initial_tools)
)

# Create agent with explicit instructions and tool list
# CRITICAL: FunctionAgent needs tools to be properly formatted QueryEngineTool instances
agent = FunctionAgent(
    tools=initial_tools,  # These should be QueryEngineTool instances from get_doc_tools
    llm=llm,
    verbose=True,  # Shows detailed tool selection and calling - IMPORTANT for debugging
    system_prompt=(
        "You are a research assistant with access to tools for querying academic papers. "
        "CRITICAL: You MUST use the available tools to answer ALL questions. "
        "Never say you cannot retrieve information - always try the tools first.\n\n"
        f"AVAILABLE TOOLS (you MUST use these):\n{tool_list_with_desc}\n\n"
        "TOOL SELECTION RULES:\n"
        "1. When asked about 'Self-RAG' or 'selfrag', you MUST call a tool with 'Self-RAG' in the name.\n"
        "2. When asked about 'LongLoRA' or 'longlora', you MUST call a tool with 'LongLoRA' in the name.\n"
        "3. When asked about 'MetaGPT' or 'metagpt', you MUST call a tool with 'MetaGPT' in the name.\n"
        "4. For summary requests (e.g., 'give me a summary', 'summarize'), use tools ending with '_summary_tool'.\n"
        "5. For specific details, facts, or questions, use tools ending with '_vector_tool'.\n\n"
        "EXAMPLES OF CORRECT TOOL USAGE:\n"
        "- Query: 'Give me a summary of Self-RAG' → MUST call 'Self-RAG_summary_tool'\n"
        "- Query: 'Tell me about LongLoRA' → MUST call 'LongLoRA_summary_tool' or 'LongLoRA_vector_tool'\n"
        "- Query: 'What datasets does MetaGPT use?' → MUST call 'MetaGPT_vector_tool'\n"
        "- Query: 'Give me a summary of both Self-RAG and LongLoRA' → MUST call BOTH 'Self-RAG_summary_tool' AND 'LongLoRA_summary_tool'\n\n"
        "MANDATORY BEHAVIOR:\n"
        "- ALWAYS call at least one tool before responding\n"
        "- If a tool call fails, try another tool for the same paper\n"
        "- NEVER respond without calling a tool first\n"
        "- NEVER say 'I cannot retrieve information' or 'I don't have access' - you have tools, use them!"
    ),
)

print("\n✅ Agent created successfully")
print("💡 Tip: With verbose=True, you'll see detailed logs of tool selection and calls")


✅ Creating agent with 6 tools

📋 Available tools:
   MetaGPT_vector_tool: Use this tool to retrieve specific context, details, facts, or information from ...
   MetaGPT_summary_tool: Use this tool to get summaries, overviews, or high-level information about the M...
   LongLoRA_vector_tool: Use this tool to retrieve specific context, details, facts, or information from ...
   LongLoRA_summary_tool: Use this tool to get summaries, overviews, or high-level information about the L...
   Self-RAG_vector_tool: Use this tool to retrieve specific context, details, facts, or information from ...
   Self-RAG_summary_tool: Use this tool to get summaries, overviews, or high-level information about the S...

✅ Agent created successfully
💡 Tip: With verbose=True, you'll see detailed logs of tool selection and calls


In [37]:
# Debug: Print all available tools with their names and descriptions
print("="*60)
print("AVAILABLE TOOLS FOR AGENT:")
print("="*60)
if len(initial_tools) == 0:
    print("❌ ERROR: No tools available!")
    print(f"   paper_to_tools_dict has {len(paper_to_tools_dict)} entries")
    print(f"   papers list: {papers}")
    print("\n💡 Make sure you've:")
    print("   1. Downloaded all papers (run the download cell)")
    print("   2. Successfully processed papers (check for errors above)")
else:
    for i, tool in enumerate(initial_tools, 1):
        # ToolMetadata is an object, access attributes directly
        if hasattr(tool, 'metadata') and tool.metadata:
            tool_name = getattr(tool.metadata, 'name', 'Unknown')
            tool_desc = getattr(tool.metadata, 'description', 'No description')
        else:
            tool_name = 'Unknown'
            tool_desc = 'No description'
        print(f"\nTool {i}: {tool_name}")
        print(f"  Description: {tool_desc[:100]}...")
    print(f"\n✅ Total: {len(initial_tools)} tools available")
    print("="*60)

AVAILABLE TOOLS FOR AGENT:

Tool 1: MetaGPT_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the MetaGPT paper. U...

Tool 2: MetaGPT_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the MetaGPT paper. Use wh...

Tool 3: LongLoRA_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the LongLoRA paper. ...

Tool 4: LongLoRA_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the LongLoRA paper. Use w...

Tool 5: Self-RAG_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the Self-RAG paper. ...

Tool 6: Self-RAG_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the Self-RAG paper. Use w...

✅ Total: 6 tools available
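The cells above repeat the same `hasattr`/`getattr` fallback each time they print a tool. A minimal sketch of how that lookup could be factored into one helper, assuming a hypothetical function name `tool_name_and_desc` (not part of the notebook or of LlamaIndex) and using `SimpleNamespace` stand-ins instead of real `QueryEngineTool` objects:

```python
from types import SimpleNamespace


def tool_name_and_desc(tool, max_len=80):
    """Return (name, shortened description) with the same fallbacks as the cells above.

    Hypothetical helper for illustration; it only relies on `tool.metadata.name`
    and `tool.metadata.description`, which the notebook already reads.
    """
    metadata = getattr(tool, "metadata", None)
    name = getattr(metadata, "name", None) or "Unknown"
    desc = getattr(metadata, "description", None) or "No description"
    if len(desc) > max_len:
        desc = desc[:max_len] + "..."
    return name, desc


# Stand-in objects (assumptions, not real tools): one with metadata, one without.
demo_tools = [
    SimpleNamespace(
        metadata=SimpleNamespace(
            name="MetaGPT_vector_tool",
            description="Retrieve specific details from the MetaGPT paper.",
        )
    ),
    SimpleNamespace(metadata=None),
]

for i, tool in enumerate(demo_tools, 1):
    name, desc = tool_name_and_desc(tool)
    print(f"Tool {i}: {name}")
    print(f"  Description: {desc}")
```

With the real tools, the loop would become `for i, tool in enumerate(initial_tools, 1)`, and the truncation length can be adjusted through `max_len`.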


In [38]:
# Test if tools are actually callable before using agent
print("🧪 Testing tool callability:\n")
if len(initial_tools) > 0:
    # Test calling a summary tool directly
    test_tool = None
    for tool in initial_tools:
        if hasattr(tool, 'metadata') and 'summary' in str(tool.metadata.name).lower():
            test_tool = tool
            break
    
    if test_tool:
        print(f"Testing tool: {getattr(test_tool.metadata, 'name', 'Unknown')}")
        try:
            # Try to call the tool directly
            test_result = test_tool.call("What is this paper about?")
            print(f"✅ Tool is callable! Result preview: {str(test_result)[:100]}...")
        except Exception as e:
            print(f"❌ Tool call failed: {e}")
            print("   This might be why the agent can't use the tools")
    else:
        print("⚠️  Could not find a summary tool to test")
else:
    print("❌ No tools to test")

🧪 Testing tool callability:

Testing tool: MetaGPT_summary_tool


2025-12-14 18:39:56,690 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:57,675 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:39:59,829 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


✅ Tool is callable! Result preview: This paper introduces a meta-programming framework called MetaGPT that utilizes Standardized Operati...


In [39]:
# CRITICAL DEBUG: Verify tools are actually accessible to the agent
print("🔍 DEBUGGING: Verifying tool accessibility for FunctionAgent\n")
print("=" * 70)

# Check if tools have the right interface for FunctionAgent
for i, tool in enumerate(initial_tools[:3], 1):  # Check first 3 tools
    tool_name = getattr(tool.metadata, 'name', 'Unknown') if hasattr(tool, 'metadata') and tool.metadata else 'Unknown'
    print(f"\nTool {i}: {tool_name}")
    
    # Check for required attributes/methods
    checks = {
        'has_metadata': hasattr(tool, 'metadata'),
        'has_call': hasattr(tool, 'call'),
        'has_acall': hasattr(tool, 'acall'),
        'is_callable': callable(tool),
        'has_fn': hasattr(tool, 'fn'),
        'has_metadata_name': hasattr(tool, 'metadata') and hasattr(tool.metadata, 'name'),
    }
    
    for check_name, check_result in checks.items():
        status = "✅" if check_result else "❌"
        print(f"   {status} {check_name}: {check_result}")
    
    # Try to get the function signature that FunctionAgent would see
    if hasattr(tool, 'metadata') and tool.metadata:
        print(f"   Tool description: {getattr(tool.metadata, 'description', 'N/A')[:80]}...")

print("\n" + "=" * 70)
print("💡 If tools don't have 'call' or 'acall', FunctionAgent might not be able to use them")
print("=" * 70)

🔍 DEBUGGING: Verifying tool accessibility for FunctionAgent


Tool 1: MetaGPT_vector_tool
   ✅ has_metadata: True
   ✅ has_call: True
   ✅ has_acall: True
   ✅ is_callable: True
   ❌ has_fn: False
   ✅ has_metadata_name: True
   Tool description: Use this tool to retrieve specific context, details, facts, or information from ...

Tool 2: MetaGPT_summary_tool
   ✅ has_metadata: True
   ✅ has_call: True
   ✅ has_acall: True
   ✅ is_callable: True
   ❌ has_fn: False
   ✅ has_metadata_name: True
   Tool description: Use this tool to get summaries, overviews, or high-level information about the M...

Tool 3: LongLoRA_vector_tool
   ✅ has_metadata: True
   ✅ has_call: True
   ✅ has_acall: True
   ✅ is_callable: True
   ❌ has_fn: False
   ✅ has_metadata_name: True
   Tool description: Use this tool to retrieve specific context, details, facts, or information from ...

💡 If tools don't have 'call' or 'acall', FunctionAgent might not be able to use them


In [40]:
# Test: Directly call a tool to verify it works
print("🧪 Testing direct tool call (bypassing agent):\n")

if len(initial_tools) > 0:
    # Find Self-RAG summary tool
    test_tool = None
    for tool in initial_tools:
        if hasattr(tool, 'metadata') and tool.metadata:
            tool_name = getattr(tool.metadata, 'name', '')
            if 'Self-RAG' in tool_name and 'summary' in tool_name:
                test_tool = tool
                break
    
    if test_tool:
        tool_name = getattr(test_tool.metadata, 'name', 'Unknown')
        print(f"Testing tool: {tool_name}")
        print(f"Query: 'What is Self-RAG about?'\n")
        
        try:
            import asyncio
            async def test_direct_call():
                # Try different ways to call the tool
                if hasattr(test_tool, 'acall'):
                    result = await test_tool.acall("What is Self-RAG about?")
                elif hasattr(test_tool, 'call'):
                    result = test_tool.call("What is Self-RAG about?")
                elif hasattr(test_tool, 'fn'):
                    result = test_tool.fn("What is Self-RAG about?")
                else:
                    raise AttributeError("Tool has no callable method")
                return result
            
            test_result = asyncio.run(test_direct_call())
            print(f"✅ Direct tool call SUCCESSFUL!")
            print(f"Result preview: {str(test_result)[:200]}...")
            print("\n💡 If this works but agent doesn't, the issue is with FunctionAgent tool selection")
        except Exception as e:
            print(f"❌ Direct tool call FAILED: {e}")
            import traceback
            traceback.print_exc()
            print("\n💡 This indicates a problem with the tool itself, not the agent")
    else:
        print("⚠️  Could not find Self-RAG summary tool to test")
else:
    print("❌ No tools available")

2025-12-14 18:39:59,967 - INFO - Retrying request to /chat/completions in 0.498864 seconds
2025-12-14 18:39:59,968 - INFO - Retrying request to /chat/completions in 0.499515 seconds


🧪 Testing direct tool call (bypassing agent):

Testing tool: Self-RAG_summary_tool
Query: 'What is Self-RAG about?'



2025-12-14 18:40:01,955 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:02,390 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:03,664 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


✅ Direct tool call SUCCESSFUL!
Result preview: Self-RAG is a framework that enhances the quality and factuality of large language models by training a single arbitrary LM to adaptively retrieve passages, generate text informed by retrieved passage...

💡 If this works but agent doesn't, the issue is with FunctionAgent tool selection


In [41]:
import asyncio

response = asyncio.run(agent.run(
    user_msg=(
        "Tell me about the evaluation dataset used in LongLoRA, "
        "and then tell me about the evaluation results"
    )
))
print(str(response))


2025-12-14 18:40:04,137 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:04,830 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:05,464 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:06,176 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:06,361 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:06,997 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:07,884 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The evaluation dataset used in LongLoRA is PG19. The evaluation results show that as the evaluation context length increases, the models achieve better perplexity.


## Test the Agent

The agent holds all six tools: a vector tool and a summary tool for each of the three papers. For each query, the LLM chooses which tools to call from the tool names, their descriptions, and the rules in the system prompt.
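The actual tool choice is made by the LLM, so it cannot be reproduced exactly in code. As a rough checklist of what to expect in the logs, the selection rules written into the system prompt can be approximated deterministically. This is only a sketch: `PAPER_ALIASES`, `SUMMARY_HINTS`, and `expected_tools` are hypothetical names introduced here, and the keyword matching is an assumption about how the rules would be read.

```python
# Paper names as they appear in the tool names, with the lowercase aliases
# mentioned in the system prompt rules.
PAPER_ALIASES = {
    "Self-RAG": ("self-rag", "selfrag"),
    "LongLoRA": ("longlora",),
    "MetaGPT": ("metagpt",),
}

# Words treated as a request for a summary (assumed keyword list).
SUMMARY_HINTS = ("summary", "summarize", "overview")


def expected_tools(query):
    """Return the tool names the prompt rules point to for this query."""
    q = query.lower()
    wants_summary = any(hint in q for hint in SUMMARY_HINTS)
    suffix = "_summary_tool" if wants_summary else "_vector_tool"
    return [
        f"{paper}{suffix}"
        for paper, aliases in PAPER_ALIASES.items()
        if any(alias in q for alias in aliases)
    ]


print(expected_tools("Give me a summary of both Self-RAG and LongLoRA"))
print(expected_tools("What datasets does MetaGPT use?"))
```

If the agent's answer covers a paper that this checklist does not list for the query, that is a hint to inspect which tools were actually called.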

In [42]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg="Give me a summary of both Self-RAG and LongLoRA"
    )
)
print(str(response))


2025-12-14 18:40:09,854 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:11,915 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:12,457 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:12,602 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:12,609 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:13,653 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:13,962 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:14,543 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The SELF-RAG framework enhances large language models by incorporating retrieval on demand and self-reflection through special tokens called reflection tokens. It outperforms existing models in correctness, factuality, and fluency, with the ability to customize behaviors during inference. Ablation studies highlight the framework's key components, and its efficiency and accuracy trade-off can be adjusted by controlling retrieval frequency. The LongLoRA method extends the context length of large language models efficiently with minimal accuracy compromise by introducing Shifted Sparse Attention during training. It bridges the gap between low-rank adaptation and full fine-tuning, enabling the extension of LLMs like Llama2. Additionally, the framework for forgery detection focuses on modeling relations between facial action units using the Action Units Relation Transformer and Tampered AU Prediction components, achieving top performance in evaluations and enhancing model generalization.


## 2. Set Up an Agent Over 11 Papers

In [43]:
# Test the agent with a simple query to verify it calls tools
print("🧪 Testing agent with a simple query:\n")
import asyncio

try:
    # Test with a simple, direct query
    test_response = asyncio.run(agent.run(
        user_msg="Give me a summary of Self-RAG"
    ))
    print(f"\n✅ Agent response preview: {str(test_response)[:300]}...")
    print("\n💡 If you see tool calls in the verbose output above, the agent is working!")
except Exception as e:
    print(f"\n❌ Agent test failed: {type(e).__name__}: {e}")
    print("   Check the verbose output above to see what went wrong")
    import traceback
    traceback.print_exc()

🧪 Testing agent with a simple query:



2025-12-14 18:40:16,125 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:17,916 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:17,925 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:19,284 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:40:19,728 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



✅ Agent response preview: The SELF-RAG framework enhances large language models by integrating retrieval on demand and self-reflection. It improves factuality and citation accuracy by training models to retrieve, generate, and critique text passages. SELF-RAG outperforms existing models on various tasks and allows customizat...

💡 If you see tool calls in the verbose output above, the agent is working!


In [44]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]
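The download loop pairs the two lists with `zip`, which stops silently at the shorter list, and `wget -O` fails when the target directory is missing. A minimal sketch of a pre-download check, assuming a hypothetical helper `pair_downloads` (not part of the notebook); the demo writes to a temporary directory rather than `data/`:

```python
import tempfile
from pathlib import Path


def pair_downloads(urls, papers, data_dir="data"):
    """Pair each URL with its target path; hypothetical helper for illustration."""
    if len(urls) != len(papers):
        raise ValueError(
            f"{len(urls)} URLs but {len(papers)} filenames; the lists must line up"
        )
    target = Path(data_dir)
    # Create the directory up front so the download step has somewhere to write.
    target.mkdir(parents=True, exist_ok=True)
    return [(url, target / name) for url, name in zip(urls, papers)]


# Demo in a temporary directory so nothing is written next to the notebook.
with tempfile.TemporaryDirectory() as tmp:
    pairs = pair_downloads(
        ["https://openreview.net/pdf?id=VtmBAGCN7o"],
        ["metagpt.pdf"],
        data_dir=tmp,
    )
    for url, path in pairs:
        print(f"{url} -> {path.name}")
```

In the notebook itself, calling `pair_downloads(urls, papers)` before the `wget` cell would surface a mismatched list as an error instead of a missing paper later on.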

In [45]:
# Download all papers
for url, paper in zip(urls, papers):
    !wget "{url}" -O "data/{paper}"

--2025-12-14 18:40:20--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:40:23 (11.2 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]

--2025-12-14 18:40:23--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘data/longlora.pdf’


2025-12-14 18:40:23 (5.56 MB/s) - ‘data/longlora.pdf’ saved [1168720/1168720]

--2025-12-14 18:40:24--  https://openreview.net/pdf?id=LzPWWPAdY4
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.


In [46]:
import os
from pathlib import Path

paper_to_tools_dict = {}

for paper in papers:
    print(f"Getting tools for paper: {paper}")
    
    # Check if file exists in data directory before processing
    file_path = f"data/{paper}"
    if not os.path.exists(file_path):
        print(f"  ⚠️  Warning: File '{file_path}' does not exist. Skipping...")
        print(f"  💡 Tip: Make sure you've downloaded all papers first.")
        continue
    
    try:
        vector_tool, summary_tool = get_doc_tools(file_path, Path(paper).stem)
        paper_to_tools_dict[paper] = [vector_tool, summary_tool]
        print(f"  ✓ Successfully created tools for {paper}")

    except UnicodeEncodeError as e:
        print(f" Unicode error while processing {paper}: {e}")
        try:
            # Attempt to re-read text safely and re-generate tools
            text_bytes = Path(file_path).read_bytes()
            safe_text = text_bytes.decode("utf-8", errors="replace")

            # Optionally, save the cleaned text for inspection
            clean_path = Path(file_path).with_name(Path(file_path).stem + "_clean.txt")
            clean_path.write_text(safe_text, encoding="utf-8")
            print(f"Saved cleaned version: {clean_path}")

            # Retry tool creation if your get_doc_tools can accept a string path
            vector_tool, summary_tool = get_doc_tools(str(clean_path), Path(paper).stem)
            paper_to_tools_dict[paper] = [vector_tool, summary_tool]
            print(f" Retried successfully for {paper}")

        except Exception as inner_e:
            print(f" Still failed on {paper}: {inner_e}")

    except Exception as e:
        print(f" Unexpected error for {paper}: {e}")

print(f"\n✓ Successfully processed {len(paper_to_tools_dict)} papers")
print(f"Papers with tools: {list(paper_to_tools_dict.keys())}")


Getting tools for paper: metagpt.pdf


2025-12-14 18:40:32,500 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metagpt.pdf
Getting tools for paper: longlora.pdf


2025-12-14 18:40:33,459 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for longlora.pdf
Getting tools for paper: loftq.pdf


2025-12-14 18:40:34,033 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for loftq.pdf
Getting tools for paper: swebench.pdf


2025-12-14 18:40:36,270 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for swebench.pdf
Getting tools for paper: selfrag.pdf


2025-12-14 18:40:38,025 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for selfrag.pdf
Getting tools for paper: zipformer.pdf


2025-12-14 18:40:38,540 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for zipformer.pdf
Getting tools for paper: values.pdf


2025-12-14 18:40:39,801 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf


2025-12-14 18:40:43,808 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf


2025-12-14 18:40:45,049 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for knowledge_card.pdf
Getting tools for paper: metra.pdf


2025-12-14 18:40:46,084 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metra.pdf
Getting tools for paper: vr_mcl.pdf
 Unicode error while processing vr_mcl.pdf: 'utf-8' codec can't encode character '\ud835' in position 94329: surrogates not allowed
Saved cleaned version: data/vr_mcl_clean.txt


2025-12-14 18:40:49,388 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:50,499 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:51,404 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:52,232 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:53,902 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:54,817 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:56,037 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:57,057 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:40:58,092 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

 Retried successfully for vr_mcl.pdf

✓ Successfully processed 11 papers
Papers with tools: ['metagpt.pdf', 'longlora.pdf', 'loftq.pdf', 'swebench.pdf', 'selfrag.pdf', 'zipformer.pdf', 'values.pdf', 'finetune_fair_diffusion.pdf', 'knowledge_card.pdf', 'metra.pdf', 'vr_mcl.pdf']
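The error reported for `vr_mcl.pdf` (`'\ud835' ... surrogates not allowed`) is what Python raises when a string contains a surrogate code point and is encoded as UTF-8. One way to make such a string encodable is a round trip through UTF-16, which recombines valid surrogate pairs and replaces lone surrogates. This is a sketch under the assumption that the problem sits in the extracted text; `repair_surrogates` is a hypothetical helper, not the fallback used in the cell above.

```python
def repair_surrogates(text):
    """Recombine surrogate pairs and replace lone surrogates with U+FFFD.

    'surrogatepass' lets the UTF-16 encoder emit the surrogate code units
    as-is; decoding then merges valid high/low pairs into one character
    and substitutes the replacement character for anything unpaired.
    """
    return text.encode("utf-16", errors="surrogatepass").decode(
        "utf-16", errors="replace"
    )


paired = "\ud835\udc00"  # high + low surrogate, together U+1D400
lone = "a\ud835b"        # high surrogate without its partner

for sample in (paired, lone):
    cleaned = repair_surrogates(sample)
    cleaned.encode("utf-8")  # no longer raises UnicodeEncodeError
    print(ascii(sample), "->", ascii(cleaned))
```

Text cleaned this way keeps mathematical symbols that were split into surrogate pairs, which a plain `errors="replace"` pass would discard.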


## Extend the Agent with Tool Retrieval

In [47]:
# Flatten the per-paper tool lists; only successfully processed papers are in the dict
all_tools = [tool for tools in paper_to_tools_dict.values() for tool in tools]
print(f"Total tools created: {len(all_tools)}")

Total tools created: 22


In [48]:
# Define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

2025-12-14 18:41:11,690 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [49]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [50]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)
tools[2].metadata

2025-12-14 18:41:11,964 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


ToolMetadata(description='Use this tool to get summaries, overviews, or high-level information about the Swebench paper. Use when asked for a summary of Swebench, swebench, or swebench. This tool provides comprehensive summaries of the Swebench paper.', name='Swebench_summary_tool', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
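The retriever returns the `similarity_top_k` tools whose indexed text is closest to the query. The notebook relies on embeddings for that comparison; the sketch below swaps in a bag-of-words cosine similarity so it runs without any API call. It illustrates the top-k idea only: the shortened descriptions and the helper names (`bag_of_words`, `cosine`, `top_k_tools`) are assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a toy stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_tools(query, descriptions, k=3):
    """Return the names of the k tools whose description best matches the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        descriptions.items(),
        key=lambda item: cosine(query_vec, bag_of_words(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Shortened, illustrative descriptions (not the notebook's full tool descriptions).
descriptions = {
    "MetaGPT_summary_tool": "Summaries and overviews of the MetaGPT paper",
    "Swebench_summary_tool": "Summaries and overviews of the Swebench paper",
    "LongLoRA_vector_tool": "Specific facts and details from the LongLoRA paper",
}

print(top_k_tools("summary of the MetaGPT paper", descriptions, k=2))
```

Because only the top-k tools reach the LLM, a query that spans more papers than `similarity_top_k` allows may miss a relevant tool; raising `similarity_top_k` trades prompt size for coverage.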

In [51]:
from llama_index.core.agent.workflow import FunctionAgent

# Use the tool_retriever parameter instead of wrapping the retriever in a RetrieverTool
# The tool_retriever dynamically retrieves relevant tools based on the query
# This allows scaling to many documents without overwhelming the LLM context

try:
    # Option 1: Use tool_retriever (preferred when the installed FunctionAgent accepts it)
    agent = FunctionAgent(
        tools=[],  # Empty - tools are retrieved dynamically via tool_retriever
        tool_retriever=obj_retriever,  # Retrieves top-k relevant tools per query
        llm=llm,
        system_prompt=(
            "You are an agent designed to answer queries over a set of given papers. "
            "Always use the provided tools to answer a question. Do not rely on prior knowledge."
        ),
        verbose=True,
    )
    print(f"✅ Agent created with tool_retriever (can retrieve from {len(all_tools)} tools)")
except TypeError as e:
    if 'tool_retriever' in str(e):
        # Option 2: Fallback - pass all tools directly (works in all versions)
        print("⚠️  tool_retriever not supported, falling back to passing all tools directly")
        agent = FunctionAgent(
            tools=all_tools,  # Pass all tools directly
            llm=llm,
            system_prompt=(
                "You are an agent designed to answer queries over a set of given papers. "
                "Always use the provided tools to answer a question. Do not rely on prior knowledge."
            ),
            verbose=True,
        )
        print(f"✅ Agent created with {len(all_tools)} tools directly")
    else:
        raise


✅ Agent created with tool_retriever (can retrieve from 22 tools)


In [52]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg="Give me a summary of both Self-RAG and LongLoRA"
    )
)
print(str(response))


2025-12-14 18:41:12,165 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:12,922 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:13,072 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:14,455 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:14,602 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:15,694 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:16,324 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:16,632 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:16,898 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

The SELF-RAG framework enhances the quality and factuality of large language models by incorporating retrieval on demand and self-reflection. It utilizes reflection tokens to retrieve, generate, and critique text passages, allowing for customized behaviors at test time through parameter adjustments. Experimental results show that SELF-RAG outperforms other models on various tasks, improving model performance, factuality, and citation accuracy. The model evaluates the relevance, supportiveness, and usefulness of generated text based on given instructions and evidence, ensuring factual support and meeting information needs.

The LongLoRA method extends the context length of large language models efficiently with minimal accuracy compromise. It introduces S2-Attn to approximate standard self-attention patterns during training. By fine-tuning LLMs with LongLoRA, models can achieve extended context lengths, such as up to 100k for 7B models and 32k for 70B models. LongLoRA is effective in su

In [53]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg=(
            "Tell me about the evaluation dataset used "
            "in MetaGPT and compare it against SWE-Bench."
        )
    )
)
print(str(response))


2025-12-14 18:41:21,142 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:21,954 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:22,170 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:22,348 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:22,484 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:23,760 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:23,902 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:24,342 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:24,737 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

The evaluation dataset used in MetaGPT is the SoftwareDev dataset, which consists of 70 diverse software development tasks covering scopes like mini-games, image processing algorithms, and data visualization. It includes task prompts for each task and is used for evaluation through human assessments or statistical analysis based on metrics like Executability, Cost, Code Statistics, Productivity, and Human Revision Cost.

In comparison, SWE-Bench's evaluation dataset comprises task instances extracted from various open-source repositories, each with a problem statement, codebase, and associated patches. These task instances are used to evaluate models in generating code patches to resolve the described issues. The dataset includes task instances from repositories like scikit-learn, xarray, requests, django, and sphinx-doc, covering issues related to code functionality, bug fixes, and enhancements. It serves as a benchmark to assess models' success in addressing software engineering task

In [54]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg=(
            "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
            "Analyze the approach in each paper first."
        )
    )
)
print(str(response))


2025-12-14 18:41:29,032 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:29,757 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:29,961 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:30,059 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:31,183 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:32,094 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:32,500 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:32,642 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:33,415 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2

The LongLoRA paper introduces a method that extends the context length of large language models with minimal accuracy compromise by using shifted sparse attention during training. It involves fine-tuning pre-trained models with LongLoRA to achieve extended context lengths with reduced GPU memory cost and training time. The approach includes trainable normalization and embedding layers to bridge the gap between low-rank adaptation and full fine-tuning.

On the other hand, the LoftQ paper presents a quantization framework for Large Language Models (LLMs) that combines quantization and low-rank approximation to approximate high-precision pre-trained weights. It offers a beneficial initialization for subsequent Low-Rank Adaptation (LoRA) fine-tuning, leading to improved generalization in downstream tasks. LoftQ outperforms existing quantization methods, especially in challenging low-bit scenarios, across various natural language processing tasks. It achieves close to full-finetuning perfor

---

# 🎯 Extensions & Modifications Section

This section demonstrates understanding of Agentic RAG concepts through:
1. Documentation of bug fixes applied
2. Custom query extensions
3. Performance analysis
4. Key learnings and reflections

## 🔧 Bug Fixes Applied

### Bug 1: KeyError in `all_tools` List Comprehension

**Problem:** The original code iterated over all papers in the `papers` list, regardless of whether they were successfully processed:

```python
# ❌ BROKEN: Crashes if any paper failed to process
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
```

**Root Cause:** A paper whose download or processing fails is never added to `paper_to_tools_dict`, so looking it up by name raises a `KeyError`.

**Fix Applied:**
```python
# ✅ FIXED: Only iterate over successfully processed papers
all_tools = [t for paper in paper_to_tools_dict.keys() for t in paper_to_tools_dict[paper]]
```
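The failure and the fix can be reproduced without any papers at all. The sketch below uses made-up file names and tool names (all hypothetical) to show the `KeyError`, the safe iteration, and one way to report which papers were skipped so the failure is not silent.

```python
# Hypothetical data: "paper_b.pdf" failed to process, so it has no dict entry.
papers = ["paper_a.pdf", "paper_b.pdf", "paper_c.pdf"]
paper_to_tools_dict = {
    "paper_a.pdf": ["paper_a_vector_tool", "paper_a_summary_tool"],
    "paper_c.pdf": ["paper_c_vector_tool", "paper_c_summary_tool"],
}

# The original comprehension raises KeyError at the first missing paper.
missing_key = None
try:
    broken = [t for paper in papers for t in paper_to_tools_dict[paper]]
except KeyError as exc:
    missing_key = exc.args[0]
    print(f"KeyError for: {missing_key}")

# Iterating over the dict's values only visits successfully processed papers.
all_tools = [t for tools in paper_to_tools_dict.values() for t in tools]

# Report skipped papers explicitly instead of dropping them silently.
skipped = [paper for paper in papers if paper not in paper_to_tools_dict]

print(all_tools)
print(f"Skipped: {skipped}")
```

Iterating over `.values()` is equivalent to the `.keys()` version above; it just avoids the second lookup.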

---

### Bug 2: Incorrect Agent Architecture with RetrieverTool

**Problem:** The original code wrapped `obj_retriever` in a `RetrieverTool`:

```python
# ❌ BROKEN: Agent can retrieve tools but cannot CALL them
retriever_tool = RetrieverTool.from_defaults(retriever=obj_retriever, ...)
agent = FunctionAgent(tools=[retriever_tool], ...)
```

**Root Cause:** `ObjectIndex` stores tool objects, and `obj_retriever.retrieve()` returns tools. Wrapping the retriever in a `RetrieverTool` gives the agent exactly one tool (the retriever itself). The agent can look up which document tools are relevant, but those tools are never registered as callable, so it cannot invoke them.

**Fix Applied:**
```python
# ✅ FIXED: Use tool_retriever parameter for dynamic tool selection
agent = FunctionAgent(
    tools=[],  # Empty - tools retrieved dynamically
    tool_retriever=obj_retriever,  # Agent retrieves AND calls relevant tools
    llm=llm,
    verbose=True,
)
```

**Why This Works:** The `tool_retriever` parameter tells `FunctionAgent` to:
1. Use the retriever to find relevant tools based on the query
2. Make those tools available to the LLM for calling
3. Execute the selected tools and return results
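The three steps above can be sketched in plain Python. This is a toy model, not LlamaIndex's implementation: `ToyTool`, `retrieve_tools`, and `run_with_tool_retriever` are hypothetical names, and simple word overlap stands in for embedding similarity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    """Hypothetical stand-in for a document tool (not a LlamaIndex class)."""
    name: str
    description: str
    fn: Callable[[str], str]


def retrieve_tools(query: str, tools: List[ToyTool], top_k: int = 2) -> List[ToyTool]:
    """Step 1: rank tools by word overlap between the query and each description."""
    query_words = set(query.lower().split())

    def overlap(tool: ToyTool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    return sorted(tools, key=overlap, reverse=True)[:top_k]


def run_with_tool_retriever(query: str, tools: List[ToyTool]) -> List[str]:
    """Steps 2 and 3: expose the retrieved tools, then execute them."""
    candidates = retrieve_tools(query, tools)
    # A real agent lets the LLM choose among the candidates;
    # this sketch simply calls every retrieved tool.
    return [tool.fn(query) for tool in candidates]


tools = [
    ToyTool("metagpt_tool", "questions about the metagpt paper", lambda q: "MetaGPT answer"),
    ToyTool("selfrag_tool", "questions about the self-rag paper", lambda q: "Self-RAG answer"),
    ToyTool("loftq_tool", "questions about the loftq paper", lambda q: "LoftQ answer"),
]

results = run_with_tool_retriever("compare metagpt and self-rag", tools)
print(results)
```

The `RetrieverTool` design from Bug 2 corresponds to stopping after `retrieve_tools`: the agent learns which tools are relevant but never reaches the line that calls `tool.fn`.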

## 🚀 Extension 1: Cross-Paper Comparison Query

Testing the agent's ability to synthesize information across multiple papers on different topics.

In [55]:
# Extension 1: Complex cross-paper analysis
import asyncio

print("="*70)
print("EXTENSION 1: Cross-Paper Comparison")
print("="*70)

response = asyncio.run(agent.run(
    user_msg=(
        "Compare how MetaGPT and Self-RAG each improve LLM outputs. "
        "What are their key innovations and how do they differ in approach?"
    )
))

print("\n📊 RESPONSE:")
print(str(response))

2025-12-14 18:41:36,292 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


EXTENSION 1: Cross-Paper Comparison


2025-12-14 18:41:37,237 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:37,461 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:37,551 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:38,545 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:39,455 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:39,769 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:40,388 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:41,214 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:41,686 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200


📊 RESPONSE:
MetaGPT enhances Language Model (LLM) outputs by incorporating efficient human workflows into multi-agent collaborations, introducing Standardized Operating Procedures (SOPs) to streamline workflows, assigning diverse roles to various agents, and implementing an executive feedback mechanism that debugs and executes code during runtime. These innovations significantly improve code generation quality and transform abstract requirements into detailed designs through granular tasks like requirement analysis and package selection. MetaGPT also utilizes a global message pool and subscription mechanism to streamline communication and filter out irrelevant information, addressing the challenge of information overload. These innovations collectively lead to state-of-the-art performance on various benchmarks and enhance the quality and efficiency of LLM outputs in software development tasks.

On the other hand, Self-RAG improves LLM outputs by incorporating retrieval on demand and s

## 🚀 Extension 2: Efficiency-Focused Query

Testing the agent's ability to identify and compare papers focused on efficiency improvements.

In [56]:
# Extension 2: Efficiency-focused analysis across papers
print("="*70)
print("EXTENSION 2: Efficiency Analysis Across Papers")
print("="*70)

response = asyncio.run(agent.run(
    user_msg=(
        "Which papers focus on making models more efficient? "
        "Compare the efficiency techniques used in LongLoRA and LoftQ."
    )
))

print("\n📊 RESPONSE:")
print(str(response))

EXTENSION 2: Efficiency Analysis Across Papers


2025-12-14 18:41:44,584 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:45,332 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:45,472 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:45,573 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:46,326 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:46,454 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:46,834 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:47,448 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:47,964 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2


📊 RESPONSE:
The efficiency techniques used in LongLoRA include Flash-Attention2, DeepSpeed, Gradient checkpoint, and S2-Attn. These techniques have been instrumental in optimizing training hours and memory usage for models like Llama2, enhancing overall efficiency in deep learning tasks.

On the other hand, LoftQ focuses on efficiency techniques such as quantization methods to achieve performance close to full finetuning with reduced memory requirements. Additionally, low-rank adapters applied to convolutional layers help optimize computational efficiency by approximating convolutional operations with low-rank matrices. These techniques aim to improve efficiency in training and storage while maintaining performance levels.


## 🚀 Extension 3: Specific Technical Detail Query

Testing the vector tool's ability to retrieve specific technical details.

In [57]:
# Extension 3: Specific technical detail retrieval
print("="*70)
print("EXTENSION 3: Technical Detail Retrieval")
print("="*70)

response = asyncio.run(agent.run(
    user_msg=(
        "What specific metrics and benchmarks does MetaGPT use to evaluate "
        "code generation quality? Provide specific numbers if available."
    )
))

print("\n📊 RESPONSE:")
print(str(response))

EXTENSION 3: Technical Detail Retrieval


2025-12-14 18:41:50,146 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:50,990 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:51,214 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:51,446 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:52,778 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:52,982 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:53,975 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"



📊 RESPONSE:
MetaGPT uses the following metrics and benchmarks to evaluate code generation quality:

1. Executability
2. Cost (including running time, token usage, and expenses)
3. Code Statistics (such as code files, lines of code per file, and total code lines)
4. Productivity (calculated as token usage divided by lines of code)
5. Human Revision Cost

The benchmarks used for evaluation are:
1. SoftwareDev
2. HumanEval
3. MBPP

These benchmarks focus on different aspects of code generation quality assessment.


## 🚀 Extension 4: Multi-Paper Summary with Categorization

Testing the agent's ability to categorize and summarize multiple papers.

In [58]:
# Extension 4: Categorize papers by their main contribution
print("="*70)
print("EXTENSION 4: Multi-Paper Categorization")
print("="*70)

response = asyncio.run(agent.run(
    user_msg=(
        "Categorize the following papers by their main focus area: "
        "MetaGPT, Self-RAG, LongLoRA, LoftQ, and SWE-Bench. "
        "Group them into categories like 'Code Generation', 'Retrieval', "
        "'Efficiency', 'Benchmarking', etc."
    )
))

print("\n📊 RESPONSE:")
print(str(response))

EXTENSION 4: Multi-Paper Categorization


2025-12-14 18:41:55,076 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:56,377 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:56,643 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:56,649 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:56,818 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:56,818 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:41:57,516 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:58,102 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:41:58,410 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18


📊 RESPONSE:
Based on the summaries provided:

- **MetaGPT**: Focuses on code generation and enhancing the problem-solving capabilities of multi-agent systems through natural language programming.
- **Self-RAG**: Focuses on enabling engineers to program using natural language and enhancing communication between multiple agents.
- **LongLoRA**: Focuses on an efficient fine-tuning approach that extends the context length of large language models with minimal accuracy compromise.
- **LoftQ**: No specific information available to categorize.
- **SWE-Bench**: Focuses on benchmarking and providing a comprehensive dataset for diverse software development tasks.


## 📈 Performance Analysis

Analyzing the tool retrieval and agent performance.

In [59]:
# Performance Analysis: Measure tool retrieval accuracy

print("="*70)
print("PERFORMANCE ANALYSIS: Tool Retrieval Accuracy")
print("="*70)

test_queries = [
    ("What is Self-RAG?", ["Self-RAG"]),
    ("Explain LongLoRA's approach", ["LongLoRA"]),
    ("Compare MetaGPT and SWE-Bench", ["MetaGPT", "Swebench"]),
]

print("\nTesting tool retrieval for specific queries:\n")

for query, expected_papers in test_queries:
    print(f"Query: '{query}'")
    print(f"Expected tools from: {expected_papers}")
    
    # Retrieve tools
    retrieved = obj_retriever.retrieve(query)
    retrieved_names = [t.metadata.name if hasattr(t, 'metadata') else str(t) for t in retrieved]
    
    print(f"Retrieved tools: {retrieved_names}")
    
    # Check if expected papers are in retrieved tools
    found = [p for p in expected_papers if any(p.lower() in name.lower() for name in retrieved_names)]
    print(f"✅ Found {len(found)}/{len(expected_papers)} expected papers")
    print("-" * 50)

PERFORMANCE ANALYSIS: Tool Retrieval Accuracy

Testing tool retrieval for specific queries:

Query: 'What is Self-RAG?'
Expected tools from: ['Self-RAG']


2025-12-14 18:42:04,872 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Retrieved tools: ['Self-RAG_vector_tool', 'Self-RAG_summary_tool', 'LongLoRA_vector_tool']
✅ Found 1/1 expected papers
--------------------------------------------------
Query: 'Explain LongLoRA's approach'
Expected tools from: ['LongLoRA']


2025-12-14 18:42:05,084 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Retrieved tools: ['LongLoRA_summary_tool', 'LongLoRA_vector_tool', 'LoftQ_summary_tool']
✅ Found 1/1 expected papers
--------------------------------------------------
Query: 'Compare MetaGPT and SWE-Bench'
Expected tools from: ['MetaGPT', 'Swebench']


2025-12-14 18:42:05,306 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Retrieved tools: ['MetaGPT_summary_tool', 'MetaGPT_vector_tool', 'Swebench_summary_tool']
✅ Found 2/2 expected papers
--------------------------------------------------
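The per-query check above can be turned into aggregate numbers. The sketch below copies the expected papers and retrieved tool names from the output above and computes two simple metrics; `paper_recall` and `tool_precision` are hypothetical helper names, and three queries are far too few to draw general conclusions.

```python
# Expected papers and retrieved tool names, copied from the run above.
observed = [
    (["Self-RAG"],
     ["Self-RAG_vector_tool", "Self-RAG_summary_tool", "LongLoRA_vector_tool"]),
    (["LongLoRA"],
     ["LongLoRA_summary_tool", "LongLoRA_vector_tool", "LoftQ_summary_tool"]),
    (["MetaGPT", "Swebench"],
     ["MetaGPT_summary_tool", "MetaGPT_vector_tool", "Swebench_summary_tool"]),
]


def paper_recall(expected, retrieved):
    """Fraction of expected papers that have at least one retrieved tool."""
    hits = [p for p in expected if any(p.lower() in n.lower() for n in retrieved)]
    return len(hits) / len(expected)


def tool_precision(expected, retrieved):
    """Fraction of retrieved tools that belong to an expected paper."""
    relevant = [n for n in retrieved if any(p.lower() in n.lower() for p in expected)]
    return len(relevant) / len(retrieved)


recalls = [paper_recall(e, r) for e, r in observed]
precisions = [tool_precision(e, r) for e, r in observed]

mean_recall = sum(recalls) / len(recalls)
mean_precision = sum(precisions) / len(precisions)

print(f"Mean paper recall:   {mean_recall:.2f}")
print(f"Mean tool precision: {mean_precision:.2f}")
```

Recall is perfect on these queries, while precision is lower because the first two queries each pull in one tool from an unrelated paper. The agent can ignore an extra tool, so this mainly costs context space.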


## 📚 Key Learnings & Reflections

### 1. Router Engine Pattern
The Router Engine intelligently routes queries to the appropriate tool based on query intent:
- **Summary queries** → `summary_tool` (uses tree summarization)
- **Specific detail queries** → `vector_tool` (uses semantic search)

This is more efficient than always using both tools, as it reduces API calls and latency.
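As a rough illustration of the single-decision idea, the sketch below routes each query to exactly one of two stand-in tools. The keyword check is an assumption made for the example; the actual router in this notebook relies on an LLM selector reading the tool descriptions, not on keywords. `SUMMARY_CUES` and `route` are hypothetical names.

```python
def summary_tool(query: str) -> str:
    """Stand-in for a summarization query engine."""
    return f"[summary_tool] {query}"


def vector_tool(query: str) -> str:
    """Stand-in for a semantic-search query engine."""
    return f"[vector_tool] {query}"


# Hypothetical cue words; a real router asks the LLM to choose.
SUMMARY_CUES = ("summarize", "summary", "overview")


def route(query: str) -> str:
    """Pick exactly one tool per query and return its result."""
    wants_summary = any(cue in query.lower() for cue in SUMMARY_CUES)
    tool = summary_tool if wants_summary else vector_tool
    return tool(query)


print(route("Give me a summary of the paper"))
print(route("How many tasks are in the SoftwareDev dataset?"))
```

Only one tool runs per query, which is where the savings in API calls and latency come from.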

### 2. Tool Calling with QueryEngineTool
`QueryEngineTool` wraps LlamaIndex query engines to make them callable by agents:
- Provides `metadata` (name, description) for LLM tool selection
- Implements `call()` and `acall()` methods for sync/async execution
- Tool descriptions are **critical** - they guide the LLM's tool selection
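To make the shape of such a wrapper concrete, here is a minimal sketch with the same three ingredients: metadata, a sync `call()`, and an async `acall()`. `ToyMetadata` and `ToyQueryTool` are hypothetical classes written for this example and do not reproduce how `QueryEngineTool` is implemented.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyMetadata:
    """Name and description are what the LLM sees when choosing a tool."""
    name: str
    description: str


class ToyQueryTool:
    """Hypothetical wrapper around any function that answers a query string."""

    def __init__(self, query_fn: Callable[[str], str], name: str, description: str):
        self._query_fn = query_fn
        self.metadata = ToyMetadata(name=name, description=description)

    def call(self, query: str) -> str:
        """Synchronous execution."""
        return self._query_fn(query)

    async def acall(self, query: str) -> str:
        """Async execution: run the blocking function in a worker thread."""
        return await asyncio.to_thread(self._query_fn, query)


tool = ToyQueryTool(
    query_fn=lambda q: f"answer to: {q}",
    name="metagpt_vector_tool",
    description="Use for specific questions about the MetaGPT paper.",
)

sync_result = tool.call("What is SoftwareDev?")
async_result = asyncio.run(tool.acall("What is SoftwareDev?"))
print(tool.metadata.name, "->", sync_result)
```

Inside a notebook cell, `await tool.acall(...)` can replace the `asyncio.run(...)` call, since Jupyter already runs an event loop.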

### 3. Agent Reasoning Loop
`FunctionAgent` runs a ReAct-style reasoning loop, driven by the LLM's native function-calling interface rather than by ReAct text prompting:
1. **Observe** - Receive user query
2. **Think** - Decide which tool(s) to use
3. **Act** - Call the selected tool(s)
4. **Repeat** - Continue until the query is answered

Setting `verbose=True` is essential for debugging tool selection issues.
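The loop itself is small once the LLM is factored out. In the sketch below a scripted policy plays the role of the LLM, which is an assumption made so the example runs offline; `run_agent` and `scripted_policy` are hypothetical names, not part of LlamaIndex.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tools = Dict[str, Callable[[str], str]]
# A policy returns (tool_name, tool_input) to act, or None when it is done.
Policy = Callable[[str, List[str]], Optional[Tuple[str, str]]]


def run_agent(query: str, tools: Tools, policy: Policy, max_steps: int = 5) -> List[str]:
    """Observe -> think -> act, repeated until the policy stops or steps run out."""
    observations: List[str] = []
    for _ in range(max_steps):
        decision = policy(query, observations)  # Think: choose the next tool
        if decision is None:                    # Policy has enough information
            break
        tool_name, tool_input = decision
        observations.append(tools[tool_name](tool_input))  # Act: call the tool
    return observations


def scripted_policy(query: str, observations: List[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the LLM: call each planned tool once, then stop."""
    plan = [("metagpt_tool", query), ("swebench_tool", query)]
    return plan[len(observations)] if len(observations) < len(plan) else None


tools: Tools = {
    "metagpt_tool": lambda q: f"metagpt_tool saw: {q}",
    "swebench_tool": lambda q: f"swebench_tool saw: {q}",
}

trace = run_agent("compare datasets", tools, scripted_policy)
print(trace)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since a model that never decides it is finished would otherwise call tools indefinitely.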

### 4. Scaling with ObjectIndex + tool_retriever
With many documents, passing every tool to the agent can overwhelm the LLM's context window. The solution:
- **ObjectIndex** - Stores tools as retrievable objects
- **tool_retriever** - Dynamically retrieves the top-k relevant tools per query
- This enables scaling to hundreds of documents while keeping the context manageable
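A quick back-of-the-envelope calculation shows why this matters. The sketch assumes two tools per paper (one vector tool and one summary tool, as the tool names in this notebook suggest) and a top-k of 3 (the number of tools returned in the retrieval test); `tools_exposed` is a hypothetical helper.

```python
from typing import Optional


def tools_exposed(num_papers: int, tools_per_paper: int, top_k: Optional[int] = None) -> int:
    """Number of tool schemas the LLM sees on a single request."""
    total = num_papers * tools_per_paper
    # Without a retriever every tool is sent; with one, at most top_k are.
    return total if top_k is None else min(top_k, total)


all_direct = tools_exposed(11, 2)               # 11 papers, all tools passed directly
with_retriever = tools_exposed(11, 2, top_k=3)  # same papers, retrieved per query
scaled = tools_exposed(500, 2, top_k=3)         # hypothetical larger corpus

print(all_direct, with_retriever, scaled)
```

The number of tools in the prompt stays constant as the corpus grows, so the cost moves from the LLM context to the retrieval step.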

### 5. Common Pitfalls Discovered
| Pitfall | Solution |
|---------|----------|
| `RetrieverTool` for tool retrieval | Use `tool_retriever` parameter instead |
| Iterating over unprocessed papers | Check `paper_to_tools_dict.keys()` |
| LLM hallucinating tool inputs | Improve tool descriptions with examples |
| Agent not calling tools | Add explicit instructions in system prompt |

## ❓ Embedded Questions Answered

### Q1: What is the difference between a Router and an Agent?

**Router:**
- Single decision point - routes query to ONE tool
- No iteration or multi-step reasoning
- Faster but less flexible
- Example: `RouterQueryEngine` selecting between summary vs. vector tool

**Agent:**
- Multi-step reasoning loop (ReAct pattern)
- Can call MULTIPLE tools in sequence
- Can reason about tool outputs and decide next steps
- More powerful but higher latency/cost
- Example: `FunctionAgent` calling multiple paper tools to answer a comparison query

---

### Q2: When should you use `tool_retriever` vs passing all tools directly?

| Scenario | Approach (rule of thumb) |
|----------|----------|
| < 10 tools | Pass all tools directly |
| 10-50 tools | Either approach works |
| > 50 tools | Use `tool_retriever` to avoid context overflow |
| Tools have similar names | Pass directly (retrieval might confuse similar tools) |
| Diverse tool set | Use `tool_retriever` for efficiency |

---

### Q3: Why does `FunctionAgent` need `acall` method on tools?

`FunctionAgent` uses async execution internally for:
- Parallel tool calls when multiple tools are needed
- Non-blocking I/O during API calls
- Better performance in production environments

`QueryEngineTool` provides both `call()` (sync) and `acall()` (async), making it compatible with `FunctionAgent`.
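The benefit of async tool methods is easiest to see with simulated latency. In the sketch below, `fake_tool_acall` is a hypothetical coroutine that sleeps instead of calling an API; whether a given agent actually dispatches its tool calls concurrently depends on its implementation and on the model's output.

```python
import asyncio
import time


async def fake_tool_acall(name: str, delay: float) -> str:
    """Hypothetical async tool: the sleep stands in for a network round trip."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def call_tools_concurrently() -> list:
    # gather() starts both coroutines before waiting, so their delays overlap.
    return await asyncio.gather(
        fake_tool_acall("LongLoRA_summary_tool", 0.2),
        fake_tool_acall("LoftQ_summary_tool", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(call_tools_concurrently())
elapsed = time.perf_counter() - start

print(results)
print(f"Elapsed: {elapsed:.2f}s (sequential calls would take about 0.40s)")
```

With synchronous `call()` only, the two waits would add up; with `acall()` the total is close to the slowest single call.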

## 🏁 Conclusion

This notebook successfully demonstrates:

1. ✅ **Router Engine** - Routing queries to appropriate summary/vector tools
2. ✅ **Tool Calling** - Creating and using `QueryEngineTool` for document access
3. ✅ **Agent Reasoning** - Building `FunctionAgent` with multi-step reasoning
4. ✅ **Multi-Document Scaling** - Using `ObjectIndex` + `tool_retriever` for 11 papers
5. ✅ **Bug Fixes** - Resolved KeyError and incorrect agent architecture issues
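6. ✅ **Extensions** - Custom queries exercising cross-paper analysis; note that the Extension 4 output misdescribed Self-RAG and returned no information for LoftQ, so multi-paper categorization was only partly successful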

### Architecture Summary

```
User Query
    │
    ▼
┌─────────────────┐
│  FunctionAgent  │
│  (with LLM)     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ tool_retriever  │ ──► Retrieves top-k relevant tools
│ (ObjectIndex)   │
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────────────┐
│         Retrieved Tools                 │
│  ┌──────────┐  ┌──────────┐            │
│  │MetaGPT   │  │Self-RAG  │  ...       │
│  │_vector   │  │_summary  │            │
│  └──────────┘  └──────────┘            │
└────────┬────────────────────────────────┘
         │
         ▼
    Agent calls selected tools
         │
         ▼
    Synthesized Response
```

**End of Notebook**

This complete notebook covers all 4 lessons for building agentic RAG systems with LlamaIndex:

*   Router Engine - Route queries to appropriate tools
*   Tool Calling - Create and use custom function tools
*   Agent Reasoning Loop - Build agents with multi-step reasoning
*   Multi-Document Agent - Scale to multiple documents with tool retrieval