[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/oviya-raja/ist-402/blob/main/learning-path/W09/W9_Building_Agentic_RAG_LlamaIndex_3_4.ipynb)

---



# Building Agentic RAG with LlamaIndex - Complete Notebook Content

This notebook contains all lessons from the course on building agentic RAG systems using LlamaIndex.

Setup and Installation
First, let's install the required packages.

# Setup and Installation
First, let's install the required packages.

In [1]:
%pip install --upgrade pip
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install nest-asyncio
%pip install openai
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


## Set up OpenAI API Key

In [2]:
# Set up OpenAI API Key
import os
from dotenv import load_dotenv

# Try to get API key from Google Colab userdata first (if running in Colab)
OPENAI_API_KEY = None
try:
    import google.colab
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    if OPENAI_API_KEY:
        print("✅ OpenAI API Key loaded from Colab userdata!")
except (ImportError, ValueError):
    # Not running in Colab or userdata not available, try environment variables
    pass

# If not found in Colab userdata, try environment variables
if not OPENAI_API_KEY:
    # Load environment variables from .env file
    load_dotenv()
    
    # Get OpenAI API Key from environment variable
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    if OPENAI_API_KEY:
        print("✅ OpenAI API Key loaded from environment variables!")
    else:
        raise ValueError(
            "OPENAI_API_KEY not found. Please set it in one of the following ways:\n"
            "  - In Google Colab: userdata.set('OPENAI_API_KEY', 'your_key')\n"
            "  - Locally: Create a .env file with OPENAI_API_KEY=your_key\n"
            "  - Or set environment variable: export OPENAI_API_KEY=your_key"
        )

# Ensure the API key is set in the environment for OpenAI libraries
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
print("OpenAI API Key configured successfully!")

✅ OpenAI API Key loaded from environment variables!
OpenAI API Key configured successfully!


In [3]:
import nest_asyncio
nest_asyncio.apply()

# Lesson 1: Router Engine

### Load Data
Download the MetaGPT paper:

In [4]:
# Create data directory if it doesn't exist
import os
os.makedirs("data", exist_ok=True)

# Download the MetaGPT paper
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O data/metagpt.pdf

--2025-12-14 18:00:14--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:00:16 (24.2 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]



In [5]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["data/metagpt.pdf"]).load_data()

## Define LLM and Embedding Model

In [6]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [7]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Define Summary Index and Vector Index

In [8]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

2025-12-14 18:00:18,147 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


## Define Query Engines and Set Metadata

In [9]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()

In [10]:
from llama_index.core.tools import QueryEngineTool

print("Creating summary tool...")
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)
print(f"✓ Summary tool created successfully")
print(f"  Description: {summary_tool.metadata.description}")

print("\nCreating vector tool...")
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)
print(f"✓ Vector tool created successfully")
print(f"  Description: {vector_tool.metadata.description}")
print("\n✓ Both tools are ready to use!")

Creating summary tool...
✓ Summary tool created successfully
  Description: Useful for summarization questions related to MetaGPT

Creating vector tool...
✓ Vector tool created successfully
  Description: Useful for retrieving specific context from the MetaGPT paper.

✓ Both tools are ready to use!


## Define Router Query Engine

In [11]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

In [12]:
response = query_engine.query("What is the summary of the document?")
print(str(response))
print(len(response.source_nodes))

2025-12-14 18:00:18,898 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:18,911 - INFO - Selecting query engine 0: This choice indicates that the document is useful for summarization questions related to MetaGPT..


[1;3;38;5;200mSelecting query engine 0: This choice indicates that the document is useful for summarization questions related to MetaGPT..
[0m

2025-12-14 18:00:20,817 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:21,165 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:22,527 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The document introduces MetaGPT, a meta-programming framework that utilizes Standardized Operating Procedures (SOPs) to enhance multi-agent systems based on Large Language Models (LLMs). It incorporates role specialization, workflow management, and efficient communication mechanisms to improve code generation quality. The framework involves agents like Product Managers, Architects, Engineers, and QA Engineers contributing to different stages of software development. Through iterative testing and feedback mechanisms, MetaGPT achieves state-of-the-art performance on various benchmarks. The document also discusses the development of a "Drawing App" using MetaGPT, detailing the process from creating a user-friendly GUI to implementing color selection functionality. It covers system architecture, implementation using Python libraries, testing procedures, breakdown of tasks, unit testing, performance evaluation, challenges, ethical concerns, and the impact of MetaGPT on programming accessibi

In [13]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

2025-12-14 18:00:23,629 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:23,632 - INFO - Selecting query engine 1: This choice is more relevant as it specifically mentions retrieving specific context, which would be necessary to understand how agents share information with other agents..


[1;3;38;5;200mSelecting query engine 1: This choice is more relevant as it specifically mentions retrieving specific context, which would be necessary to understand how agents share information with other agents..
[0m

2025-12-14 18:00:23,858 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:25,306 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Agents share information with other agents by utilizing a shared message pool where they can publish structured messages. Additionally, agents can subscribe to relevant messages based on their profiles. This approach allows for transparent exchange of information among agents, enabling them to access and retrieve necessary information directly from the shared pool without the need for individual inquiries or waiting for responses.


# Lesson 2: Tool Calling

### 1. Define a Simple Tool

In [14]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

In [15]:
# ============================================================================
# EXPLANATION: Using LLM with Function/Tool Calling
# ============================================================================
# This demonstrates how an LLM can intelligently choose and call functions/tools
# based on a natural language query.

from llama_index.llms.openai import OpenAI

# Step 1: Initialize the OpenAI LLM
# This creates a connection to OpenAI's GPT-3.5-turbo model
print("Step 1: Initializing OpenAI LLM (gpt-3.5-turbo)...")
llm = OpenAI(model="gpt-3.5-turbo")
print("✓ LLM initialized\n")

# Step 2: Use predict_and_call to let the LLM decide which tool to use
# The LLM will:
#   - Analyze the query: "Tell me the output of the mystery function on 2 and 9"
#   - Understand it needs to call a function with arguments 2 and 9
#   - Choose the appropriate tool (mystery_tool in this case)
#   - Call the function with the correct arguments
#   - Return the result

print("Step 2: LLM analyzing query and selecting appropriate tool...")
print("Query: 'Tell me the output of the mystery function on 2 and 9'")
print("Available tools: add_tool, mystery_tool")
print("\nLLM reasoning process (verbose=True shows this):")
print("-" * 60)

response = llm.predict_and_call(
    [add_tool, mystery_tool],  # List of available tools the LLM can choose from
    "Tell me the output of the mystery function on 2 and 9",  # User's query
    verbose=True  # Shows the LLM's decision-making process
)

print("-" * 60)
print("\nStep 3: Final response from LLM:")
print("=" * 60)
print(str(response))
print("=" * 60)

# Explanation of what happened:
print("\n" + "=" * 60)
print("WHAT HAPPENED:")
print("=" * 60)
print("1. The LLM received your query asking about 'mystery function'")
print("2. It analyzed the available tools and chose 'mystery_tool'")
print("3. It extracted the arguments: x=2, y=9")
print("4. It called mystery_tool(2, 9) which calculates: (2+9) * (2+9) = 121")
print("5. It returned the result: 121")
print("=" * 60)

Step 1: Initializing OpenAI LLM (gpt-3.5-turbo)...
✓ LLM initialized

Step 2: LLM analyzing query and selecting appropriate tool...
Query: 'Tell me the output of the mystery function on 2 and 9'
Available tools: add_tool, mystery_tool

LLM reasoning process (verbose=True shows this):
------------------------------------------------------------


2025-12-14 18:00:26,319 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
------------------------------------------------------------

Step 3: Final response from LLM:
121

WHAT HAPPENED:
1. The LLM received your query asking about 'mystery function'
2. It analyzed the available tools and chose 'mystery_tool'
3. It extracted the arguments: x=2, y=9
4. It called mystery_tool(2, 9) which calculates: (2+9) * (2+9) = 121
5. It returned the result: 121


### 2. Define an Auto-Retrieval Tool

In [16]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["data/metagpt.pdf"]).load_data()

In [17]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: data/metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2025-12-14
last_modified_date: 2025-12-14

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4, Jinlin Wang1, Zili Wang, Steven Ka Shing Yau5, Zijuan Lin4,
Liyang Zhou6, Chenyu Ran1, Lingfeng Xiao1,7, Chenglin Wu1†, J¨urgen Schmidhuber2,8
1DeepWisdom, 2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University, 4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University, 6University of Pennsylvania,
7University of California, Berkeley, 8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex 

In [18]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

2025-12-14 18:00:26,960 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [19]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?",
)
print(str(response))
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:00:27,271 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:28,100 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


Some high-level results of MetaGPT include achieving a new state-of-the-art in code generation benchmarks with 85.9% and 87.7% in Pass@1, outperforming other popular frameworks like AutoGPT, LangChain, AgentVerse, and ChatDev. Additionally, MetaGPT demonstrates robustness and efficiency by achieving a 100% task completion rate in experimental evaluations, highlighting its effectiveness in handling higher levels of software complexity and offering extensive functionality.
{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}


### Define the Auto-Retrieval Tool

In [20]:
from typing import List
from llama_index.core.vector_stores import FilterCondition

def vector_query(
    query: str,
    page_numbers: List[str]
) -> str:
    """Perform a vector search over an index.

    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.

    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]

    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response


vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)

In [21]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool],
    "What are the high-level results of MetaGPT as described on page 2?",
    verbose=True
)
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:00:28,989 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:29,186 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}


2025-12-14 18:00:29,929 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality, demonstrating a 100% task completion rate in experimental evaluations.
{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}


### Add More Tools

In [22]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [23]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What are the MetaGPT comparisons with ChatDev described on page 8?",
    verbose=True
)
for n in response.source_nodes:
    print(n.metadata)

2025-12-14 18:00:30,434 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:30,607 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}


2025-12-14 18:00:31,466 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
MetaGPT outperforms ChatDev on the SoftwareDev dataset in various metrics. For example, MetaGPT achieves a higher score in executability, takes less time for execution, uses more tokens but requires fewer tokens to generate one line of code compared to ChatDev. Additionally, MetaGPT shows better performance in code statistics and human revision cost when compared to ChatDev.
{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'data/metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2025-12-14', 'last_modified_date': '2025-12-14'}


In [24]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What is a summary of the paper?",
    verbose=True
)

2025-12-14 18:00:32,490 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:32,584 - INFO - Retrying request to /chat/completions in 0.454054 seconds


=== Calling Function ===
Calling function: summary_tool with args: {"input": "The paper discusses the impact of climate change on biodiversity and ecosystems."}


2025-12-14 18:00:33,191 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:34,878 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:35,340 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


=== Function Output ===
The paper does not discuss the impact of climate change on biodiversity and ecosystems.


# Lesson 3: Building an Agent Reasoning Loop

## Setup Function Calling Agent

In [25]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

In [26]:
from llama_index.core.agent.workflow import FunctionAgent

agent = FunctionAgent(
    tools=[vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

In [27]:
# For FunctionAgent - must use asyncio.run() for async execution
import asyncio

response = asyncio.run(agent.run(
    "Tell me about the agent roles in MetaGPT, and how they communicate."
))
print(str(response))

2025-12-14 18:00:35,870 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:36,484 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:36,502 - INFO - Retrying request to /chat/completions in 0.477865 seconds
2025-12-14 18:00:36,985 - INFO - Retrying request to /chat/completions in 0.778235 seconds
2025-12-14 18:00:38,943 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:39,335 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In MetaGPT, the agent roles are defined based on specific tasks and expertise, such as Product Manager, Architect, Project Manager, Engineer, and QA Engineer. These roles are tailored to handle different aspects of the software development process. Communication among these agents is facilitated through a shared message pool where structured messages are published and accessed. Agents can subscribe to relevant messages based on their profiles, allowing for efficient exchange of information. This approach ensures that agents can obtain necessary information from other roles and the environment, enhancing collaboration and productivity within the multi-agent system.


In [28]:
response = asyncio.run(agent.run(
    "Tell me about the evaluation datasets used."
))
print(str(response))

2025-12-14 18:00:40,433 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:40,669 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:41,301 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:42,227 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The evaluation datasets used are HumanEval, MBPP, and SoftwareDev.


In [29]:
response = asyncio.run(agent.run("Tell me the results over one of the above datasets."))

print(str(response))

2025-12-14 18:00:42,825 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:43,558 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:44,019 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:44,908 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The results over dataset A show that MetaGPT achieved an average score of 3.9, surpassing ChatDev's score of 2.1.


# Lesson 4: Building a Multi-Document Agent

## 1. Setup an Agent Over 3 Papers

In [30]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [31]:
# Download papers
for url, paper in zip(urls, papers):
    !wget "{url}" -O "data/{paper}"

--2025-12-14 18:00:45--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:00:46 (30.7 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]

--2025-12-14 18:00:46--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘data/longlora.pdf’


2025-12-14 18:00:46 (5.69 MB/s) - ‘data/longlora.pdf’ saved [1168720/1168720]

--2025-12-14 18:00:47--  https://openreview.net/pdf?id=hSyW5go0v8
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.


In [32]:
# Helper function to create tools for each paper
# Works in both Google Colab and local environments
from pathlib import Path
from llama_index.core import SimpleDirectoryReader, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool

def get_doc_tools(file_path: str, name: str):
    """
    Get vector and summary query engine tools from a document.
    
    This function works in both Google Colab and local environments.
    Make sure Settings.llm and Settings.embed_model are configured before calling this function.
    
    Args:
        file_path: Path to the document file (relative or absolute path)
        name: Name identifier for the document (used in tool descriptions)
    
    Returns:
        tuple: (vector_tool, summary_tool) - QueryEngineTool instances
    """

    # Load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)

    # Create indices
    vector_index = VectorStoreIndex(nodes)
    summary_index = SummaryIndex(nodes)

    # Create query engines
    vector_query_engine = vector_index.as_query_engine()
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )

    # Create tools with improved descriptions that include paper name variations
    # Map common filename patterns to proper paper names
    paper_name_map = {
        "selfrag": "Self-RAG",
        "longlora": "LongLoRA",
        "metagpt": "MetaGPT",
        "loftq": "LoftQ",
    }
    
    # Get proper paper name or use the provided name
    paper_name = paper_name_map.get(name.lower(), name.replace("_", "-").title())
    
    # Create tools with explicit names and improved descriptions
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=vector_query_engine,
        name=f"{paper_name}_vector_tool",
        description=(
            f"Use this tool to retrieve specific context, details, facts, or information from the {paper_name} paper. "
            f"Use when asked about {paper_name}, {name}, or {name.replace('_', '-')}. "
            f"This tool searches the {paper_name} paper for specific information."
        ),
    )

    summary_tool = QueryEngineTool.from_defaults(
        query_engine=summary_query_engine,
        name=f"{paper_name}_summary_tool",
        description=(
            f"Use this tool to get summaries, overviews, or high-level information about the {paper_name} paper. "
            f"Use when asked for a summary of {paper_name}, {name}, or {name.replace('_', '-')}. "
            f"This tool provides comprehensive summaries of the {paper_name} paper."
        ),
    )

    return vector_tool, summary_tool

In [33]:
import os
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    
    # Check if file exists in data directory before processing
    file_path = f"data/{paper}"
    if not os.path.exists(file_path):
        print(f"  ⚠️  Warning: File '{file_path}' does not exist. Skipping...")
        print(f"  💡 Tip: Make sure you've downloaded all papers first.")
        continue
    
    try:
        vector_tool, summary_tool = get_doc_tools(file_path, Path(paper).stem)
        paper_to_tools_dict[paper] = [vector_tool, summary_tool]
        print(f"  ✓ Successfully created tools for {paper}\n")
    except Exception as e:
        print(f"  ❌ Error processing {paper}: {e}")
        print(f"  Skipping this paper...\n")
        continue

print(f"\n✓ Successfully processed {len(paper_to_tools_dict)} papers")
print(f"Papers with tools: {list(paper_to_tools_dict.keys())}")

Getting tools for paper: metagpt.pdf


2025-12-14 18:00:48,472 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metagpt.pdf

Getting tools for paper: longlora.pdf


2025-12-14 18:00:49,582 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for longlora.pdf

Getting tools for paper: selfrag.pdf


2025-12-14 18:00:50,834 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for selfrag.pdf


✓ Successfully processed 3 papers
Papers with tools: ['metagpt.pdf', 'longlora.pdf', 'selfrag.pdf']


In [34]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [35]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
print(f"Number of tools: {len(initial_tools)}")

# Debug: Print tool information
if len(initial_tools) > 0:
    print(f"\n✅ Tools available:")
    for i, tool in enumerate(initial_tools[:6]):  # Show first 6 tools
        # ToolMetadata is an object, not a dict - access attributes directly
        if hasattr(tool, 'metadata') and tool.metadata:
            tool_name = getattr(tool.metadata, 'name', 'Unknown')
            tool_desc = getattr(tool.metadata, 'description', 'No description')
        else:
            tool_name = 'Unknown'
            tool_desc = 'No description'
        print(f"  Tool {i+1}: {tool_name}")
        print(f"    Description: {tool_desc[:80]}...")
else:
    print("\n❌ WARNING: No tools available! Make sure papers were processed successfully.")
    print(f"   paper_to_tools_dict has {len(paper_to_tools_dict)} papers")
    print(f"   papers list: {papers}")

Number of tools: 6

✅ Tools available:
  Tool 1: MetaGPT_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 2: MetaGPT_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the M...
  Tool 3: LongLoRA_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 4: LongLoRA_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the L...
  Tool 5: Self-RAG_vector_tool
    Description: Use this tool to retrieve specific context, details, facts, or information from ...
  Tool 6: Self-RAG_summary_tool
    Description: Use this tool to get summaries, overviews, or high-level information about the S...


In [36]:
# LlamaIndex >= 0.14.6
from llama_index.core.agent.workflow import FunctionAgent

if len(initial_tools) == 0:
    raise ValueError(
        "ERROR: No tools available! Make sure papers were processed successfully.\n"
        f"paper_to_tools_dict has {len(paper_to_tools_dict)} entries.\n"
        "Run the paper processing cell above and check for errors."
    )

# Verify tools are properly configured and callable
print(f"✅ Creating agent with {len(initial_tools)} tools")
tool_names = []
tool_descriptions = []

for tool in initial_tools:
    # ToolMetadata is an object, access attributes directly
    if hasattr(tool, 'metadata') and tool.metadata:
        tool_name = getattr(tool.metadata, 'name', 'Unknown')
        tool_desc = getattr(tool.metadata, 'description', 'No description')
    else:
        tool_name = 'Unknown'
        tool_desc = 'No description'
    
    tool_names.append(tool_name)
    tool_descriptions.append(f"{tool_name}: {tool_desc[:80]}...")
    
    # Verify tool has required methods for FunctionAgent
    has_call = hasattr(tool, 'call') or hasattr(tool, 'acall') or callable(tool)
    if not has_call:
        print(f"⚠️  Warning: Tool {tool_name} may not be callable - missing call methods")

print(f"\n📋 Available tools:")
for desc in tool_descriptions:
    print(f"   {desc}")

# Build a comprehensive system prompt that lists all available tools with descriptions
tool_list_with_desc = "\n".join([f"  - {name}: {desc[:100]}..." for name, desc in zip(tool_names, [getattr(t.metadata, 'description', '') if hasattr(t, 'metadata') and t.metadata else '' for t in initial_tools])])

# Create agent with explicit instructions and tool list
# CRITICAL: FunctionAgent needs tools to be properly formatted QueryEngineTool instances
agent = FunctionAgent(
    tools=initial_tools,  # These should be QueryEngineTool instances from get_doc_tools
    llm=llm,
    verbose=True,  # Shows detailed tool selection and calling - IMPORTANT for debugging
    system_prompt=(
        "You are a research assistant with access to tools for querying academic papers. "
        "CRITICAL: You MUST use the available tools to answer ALL questions. "
        "Never say you cannot retrieve information - always try the tools first.\n\n"
        f"AVAILABLE TOOLS (you MUST use these):\n{tool_list_with_desc}\n\n"
        "TOOL SELECTION RULES:\n"
        "1. When asked about 'Self-RAG' or 'selfrag', you MUST call a tool with 'Self-RAG' in the name.\n"
        "2. When asked about 'LongLoRA' or 'longlora', you MUST call a tool with 'LongLoRA' in the name.\n"
        "3. When asked about 'MetaGPT' or 'metagpt', you MUST call a tool with 'MetaGPT' in the name.\n"
        "4. For summary requests (e.g., 'give me a summary', 'summarize'), use tools ending with '_summary_tool'.\n"
        "5. For specific details, facts, or questions, use tools ending with '_vector_tool'.\n\n"
        "EXAMPLES OF CORRECT TOOL USAGE:\n"
        "- Query: 'Give me a summary of Self-RAG' → MUST call 'Self-RAG_summary_tool'\n"
        "- Query: 'Tell me about LongLoRA' → MUST call 'LongLoRA_summary_tool' or 'LongLoRA_vector_tool'\n"
        "- Query: 'What datasets does MetaGPT use?' → MUST call 'MetaGPT_vector_tool'\n"
        "- Query: 'Give me a summary of both Self-RAG and LongLoRA' → MUST call BOTH 'Self-RAG_summary_tool' AND 'LongLoRA_summary_tool'\n\n"
        "MANDATORY BEHAVIOR:\n"
        "- ALWAYS call at least one tool before responding\n"
        "- If a tool call fails, try another tool for the same paper\n"
        "- NEVER respond without calling a tool first\n"
        "- NEVER say 'I cannot retrieve information' or 'I don't have access' - you have tools, use them!"
    ),
)

print("\n✅ Agent created successfully")
print("💡 Tip: With verbose=True, you'll see detailed logs of tool selection and calls")


✅ Creating agent with 6 tools
   Available tools: MetaGPT_vector_tool, MetaGPT_summary_tool, LongLoRA_vector_tool, LongLoRA_summary_tool, Self-RAG_vector_tool, Self-RAG_summary_tool
✅ Agent created successfully


In [37]:
# Debug: Print all available tools with their names and descriptions
print("="*60)
print("AVAILABLE TOOLS FOR AGENT:")
print("="*60)
if len(initial_tools) == 0:
    print("❌ ERROR: No tools available!")
    print(f"   paper_to_tools_dict has {len(paper_to_tools_dict)} entries")
    print(f"   papers list: {papers}")
    print("\n💡 Make sure you've:")
    print("   1. Downloaded all papers (run the download cell)")
    print("   2. Successfully processed papers (check for errors above)")
else:
    for i, tool in enumerate(initial_tools, 1):
        # ToolMetadata is an object, access attributes directly
        if hasattr(tool, 'metadata') and tool.metadata:
            tool_name = getattr(tool.metadata, 'name', 'Unknown')
            tool_desc = getattr(tool.metadata, 'description', 'No description')
        else:
            tool_name = 'Unknown'
            tool_desc = 'No description'
        print(f"\nTool {i}: {tool_name}")
        print(f"  Description: {tool_desc[:100]}...")
    print(f"\n✅ Total: {len(initial_tools)} tools available")
    print("="*60)

AVAILABLE TOOLS FOR AGENT:

Tool 1: MetaGPT_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the MetaGPT paper. U...

Tool 2: MetaGPT_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the MetaGPT paper. Use wh...

Tool 3: LongLoRA_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the LongLoRA paper. ...

Tool 4: LongLoRA_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the LongLoRA paper. Use w...

Tool 5: Self-RAG_vector_tool
  Description: Use this tool to retrieve specific context, details, facts, or information from the Self-RAG paper. ...

Tool 6: Self-RAG_summary_tool
  Description: Use this tool to get summaries, overviews, or high-level information about the Self-RAG paper. Use w...

✅ Total: 6 tools available


In [38]:
# Test if tools are actually callable before using agent
print("🧪 Testing tool callability:\n")
if len(initial_tools) > 0:
    # Test calling a summary tool directly
    test_tool = None
    for tool in initial_tools:
        if 'summary' in str(tool.metadata.name).lower() if hasattr(tool, 'metadata') else '':
            test_tool = tool
            break
    
    if test_tool:
        print(f"Testing tool: {getattr(test_tool.metadata, 'name', 'Unknown')}")
        try:
            # Try to call the tool directly
            test_result = test_tool.call("What is this paper about?")
            print(f"✅ Tool is callable! Result preview: {str(test_result)[:100]}...")
        except Exception as e:
            print(f"❌ Tool call failed: {e}")
            print("   This might be why the agent can't use the tools")
    else:
        print("⚠️  Could not find a summary tool to test")
else:
    print("❌ No tools to test")

🧪 Testing tool callability:

Testing tool: MetaGPT_summary_tool


2025-12-14 18:00:52,253 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:52,699 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:54,168 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


✅ Tool is callable! Result preview: The paper discusses a meta-programming framework called MetaGPT that utilizes Standardized Operating...


In [39]:
import asyncio

response = asyncio.run(agent.run(
    user_msg=(
        "Tell me about the evaluation dataset used in LongLoRA, "
        "and then tell me about the evaluation results"
    )
))
print(str(response))


2025-12-14 18:00:55,039 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:55,297 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:55,313 - INFO - Retrying request to /chat/completions in 0.453204 seconds
2025-12-14 18:00:55,314 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:00:55,323 - INFO - Retrying request to /chat/completions in 0.389740 seconds
2025-12-14 18:00:56,175 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:56,839 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:57,348 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The evaluation dataset used in LongLoRA is the PG19 test split. The evaluation results on this dataset show that as the evaluation context length increases, the models achieve better perplexity.


## Test the Agent

The agent is configured with all available tools. It should automatically select and use the appropriate tool based on your query.

In [40]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg="Give me a summary of both Self-RAG and LongLoRA"
    )
)
print(str(response))


2025-12-14 18:00:58,148 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:59,802 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:00:59,810 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:00,070 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:00,269 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:00,846 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:01,986 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:02,409 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The Self-RAG framework enhances large language models by integrating retrieval on demand and self-reflection through reflection tokens. It allows for customizing model behaviors at test time and surpasses other models in various tasks, enhancing model performance, factuality, and citation accuracy. The system evaluates text quality based on evidence and instructions to improve precision and quality by integrating fine-grained evaluations and feedback mechanisms.

On the other hand, the LongLoRA method extends the context length of large language models efficiently while preserving accuracy. It introduces S2-Attn during training to approximate self-attention patterns and uses trainable normalization and embedding layers for fine-tuning. This enables significant extensions in context length for models like Llama2 7B and 70B models. LongLoRA demonstrates promising results in long-sequence language modeling and retrieval-based evaluations.


## 2. Setup an Agent Over 11 Papers

In [None]:
# Test the agent with a simple query to verify it calls tools
print("🧪 Testing agent with a simple query:\n")
import asyncio

try:
    # Test with a simple, direct query
    test_response = asyncio.run(agent.run(
        user_msg="Give me a summary of Self-RAG"
    ))
    print(f"\n✅ Agent response preview: {str(test_response)[:300]}...")
    print("\n💡 If you see tool calls in the verbose output above, the agent is working!")
except Exception as e:
    print(f"\n❌ Agent test failed: {type(e).__name__}: {e}")
    print("   Check the verbose output above to see what went wrong")
    import traceback
    traceback.print_exc()

In [41]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    "https://openreview.net/pdf?id=9WD9KwssyT",
    "https://openreview.net/pdf?id=yV6fD7LYkF",
    "https://openreview.net/pdf?id=hnrB5YHoYu",
    "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    "https://openreview.net/pdf?id=TpD2aG1h0D"
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    "zipformer.pdf",
    "values.pdf",
    "finetune_fair_diffusion.pdf",
    "knowledge_card.pdf",
    "metra.pdf",
    "vr_mcl.pdf"
]

In [42]:
# Download all papers
for url, paper in zip(urls, papers):
    !wget "{url}" -O "data/{paper}"

--2025-12-14 18:01:03--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘data/metagpt.pdf’


2025-12-14 18:01:04 (28.5 MB/s) - ‘data/metagpt.pdf’ saved [16911937/16911937]

--2025-12-14 18:01:04--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘data/longlora.pdf’


2025-12-14 18:01:04 (5.49 MB/s) - ‘data/longlora.pdf’ saved [1168720/1168720]

--2025-12-14 18:01:05--  https://openreview.net/pdf?id=LzPWWPAdY4
Resolving openreview.net (openreview.net)... 34.57.44.88
Connecting to openreview.net (openreview.net)|34.57.44.88|:443... connected.


In [43]:
import os
from pathlib import Path

paper_to_tools_dict = {}

for paper in papers:
    print(f"Getting tools for paper: {paper}")
    
    # Check if file exists in data directory before processing
    file_path = f"data/{paper}"
    if not os.path.exists(file_path):
        print(f"  ⚠️  Warning: File '{file_path}' does not exist. Skipping...")
        print(f"  💡 Tip: Make sure you've downloaded all papers first.")
        continue
    
    try:
        vector_tool, summary_tool = get_doc_tools(file_path, Path(paper).stem)
        paper_to_tools_dict[paper] = [vector_tool, summary_tool]
        print(f"  ✓ Successfully created tools for {paper}")

    except UnicodeEncodeError as e:
        print(f" Unicode error while processing {paper}: {e}")
        try:
            # Attempt to re-read text safely and re-generate tools
            text_bytes = Path(file_path).read_bytes()
            safe_text = text_bytes.decode("utf-8", errors="replace")

            # Optionally, save the cleaned text for inspection
            clean_path = Path(file_path).with_name(Path(file_path).stem + "_clean.txt")
            clean_path.write_text(safe_text, encoding="utf-8")
            print(f"Saved cleaned version: {clean_path}")

            # Retry tool creation if your get_doc_tools can accept a string path
            vector_tool, summary_tool = get_doc_tools(str(clean_path), Path(paper).stem)
            paper_to_tools_dict[paper] = [vector_tool, summary_tool]
            print(f" Retried successfully for {paper}")

        except Exception as inner_e:
            print(f" Still failed on {paper}: {inner_e}")

    except Exception as e:
        print(f" Unexpected error for {paper}: {e}")

print(f"\n✓ Successfully processed {len(paper_to_tools_dict)} papers")
print(f"Papers with tools: {list(paper_to_tools_dict.keys())}")


Getting tools for paper: metagpt.pdf


2025-12-14 18:01:13,447 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metagpt.pdf
Getting tools for paper: longlora.pdf


2025-12-14 18:01:14,698 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for longlora.pdf
Getting tools for paper: loftq.pdf


2025-12-14 18:01:15,523 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for loftq.pdf
Getting tools for paper: swebench.pdf


2025-12-14 18:01:17,704 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for swebench.pdf
Getting tools for paper: selfrag.pdf


2025-12-14 18:01:19,819 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for selfrag.pdf
Getting tools for paper: zipformer.pdf


2025-12-14 18:01:20,513 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for zipformer.pdf
Getting tools for paper: values.pdf


2025-12-14 18:01:21,890 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for values.pdf
Getting tools for paper: finetune_fair_diffusion.pdf


2025-12-14 18:01:25,971 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for finetune_fair_diffusion.pdf
Getting tools for paper: knowledge_card.pdf


2025-12-14 18:01:27,272 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for knowledge_card.pdf
Getting tools for paper: metra.pdf


2025-12-14 18:01:28,604 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


  ✓ Successfully created tools for metra.pdf
Getting tools for paper: vr_mcl.pdf
 Unicode error while processing vr_mcl.pdf: 'utf-8' codec can't encode character '\ud835' in position 94329: surrogates not allowed
Saved cleaned version: data/vr_mcl_clean.txt


2025-12-14 18:01:31,474 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:32,784 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:33,928 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:34,850 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:36,268 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:37,162 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:37,922 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:39,869 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:40,689 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:41,915 - INFO - HTTP

 Retried successfully for vr_mcl.pdf

✓ Successfully processed 11 papers
Papers with tools: ['metagpt.pdf', 'longlora.pdf', 'loftq.pdf', 'swebench.pdf', 'selfrag.pdf', 'zipformer.pdf', 'values.pdf', 'finetune_fair_diffusion.pdf', 'knowledge_card.pdf', 'metra.pdf', 'vr_mcl.pdf']


## Extend the Agent with Tool Retrieval

In [44]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [45]:
# Define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

2025-12-14 18:01:54,007 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [46]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [47]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)
tools[2].metadata

2025-12-14 18:01:54,211 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


ToolMetadata(description='Use this tool to get summaries, overviews, or high-level information about the Swebench paper. Use when asked for a summary of Swebench, swebench, or swebench. This tool provides comprehensive summaries of the Swebench paper.', name='Swebench_summary_tool', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [48]:
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import RetrieverTool  # ✅ wrap retrievers as tools

retriever_tool = RetrieverTool.from_defaults(
    retriever=obj_retriever,
    name="paper_retriever",
    description="Retrieve relevant chunks from the loaded papers."
)

agent = FunctionAgent(
    tools=[retriever_tool],
    llm=llm,
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers. "
        "Always use the provided tools to answer a question. Do not rely on prior knowledge."
    ),
    verbose=True,
)


In [49]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg="Give me a summary of both Self-RAG and LongLoRA"
    )
)
print(str(response))


2025-12-14 18:01:55,589 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:55,767 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:55,772 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:56,147 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:57,047 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:57,079 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:57,598 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


I apologize for the inconvenience, but it seems there is an issue with retrieving information on Self-RAG and LongLoRA. If you have any other questions or need assistance with something else, please feel free to ask.


In [50]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg=(
            "Tell me about the evaluation dataset used "
            "in MetaGPT and compare it against SWE-Bench."
        )
    )
)
print(str(response))


2025-12-14 18:01:58,664 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:58,823 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:01:59,222 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:01:59,893 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:00,962 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:01,271 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:01,751 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:02,709 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:02,742 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18

I encountered an issue while trying to retrieve information about the evaluation datasets used in MetaGPT and SWE-Bench. Let me try to provide a comparison based on what I know.

MetaGPT uses an evaluation dataset for its testing purposes, but unfortunately, I couldn't retrieve specific details about it. On the other hand, SWE-Bench also utilizes an evaluation dataset for its evaluations. 

If you have any other questions or need further information, feel free to ask.


In [51]:
import asyncio

response = asyncio.run(
    agent.run(
        user_msg=(
            "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
            "Analyze the approach in each paper first."
        )
    )
)
print(str(response))


2025-12-14 18:02:05,572 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:06,088 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:06,346 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:06,799 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:07,282 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:08,117 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:08,510 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-12-14 18:02:09,602 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18:02:09,975 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-12-14 18

I apologize for the inconvenience, but I am unable to retrieve information about the LongLoRA and LoftQ papers at the moment. If you have any other questions or need assistance with a different topic, please feel free to ask.


**End of Notebook**

This complete notebook covers all 4 lessons for building agentic RAG systems with LlamaIndex:


*   Router Engine - Route queries to appropriate tools

*   Tool Calling - Create and use custom function tools
*   Agent Reasoning Loop - Build agents with multi-step reasoning
*   Multi-Document Agent - Scale to multiple documents with tool retrieval