# building an autonomous research agent
- rag with llamaindex
- 1. routing: add decision making to route requests to multiple tools

- 2. tool use: create an interface for the agents to select a tool and generate the right argument for that tool

- 3. multi-step reasoning: let the LLM reason over a range of tools across several steps while maintaining memory throughout that process

- 4. let users optionally inject guidance during intermediate steps (during reasoning)

In [13]:
import sys
import os
import nest_asyncio  # make Jupyter notebooks play nicely with LlamaIndex's async calls
nest_asyncio.apply()  # patch the running event loop so nested async calls work
from helper import load_api_key
OPENAI_API_KEY = load_api_key('openai_pat.txt')
OPENAI_ORG_KEY = load_api_key('openai_org.txt')
import openai 
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# from openai import OpenAI

# # Set up the OpenAI client
# client = OpenAI(
#     api_key=OPENAI_API_KEY,
#     organization=OPENAI_ORG_KEY
# )
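
The `helper` module is not shown in this notebook. A minimal sketch of what `load_api_key` might look like, assuming each key sits alone in a plain-text file; the body below is a guess for illustration, not the actual helper:

```python
from pathlib import Path


def load_api_key(path: str) -> str:
    """Hypothetical stand-in for helper.load_api_key: read a key from a text file."""
    # Strip the trailing newline that editors usually append.
    key = Path(path).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"no API key found in {path}")
    return key
```

Keeping keys in files outside the notebook avoids pasting secrets into cells that may later be shared.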

#### test API connections

In [12]:
# test API connection
from llama_index.llms.openai import OpenAI

resp = OpenAI().complete("Paul Graham is ")
print(resp)

CompletionResponse(text='a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham has also written several influential essays on technology, startups, and entrepreneurship.', additional_kwargs={}, raw=ChatCompletion(id='chatcmpl-ACzfKF8IQI2pMnZ6qc3HthTMyYt5H', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='a computer scientist, entrepreneur, and venture capitalist. He is best known for co-founding the startup accelerator Y Combinator and for his work on programming languages and web development. Graham has also written several influential essays on technology, startups, and entrepreneurship.', refusal=None, role='assistant', function_call=None, tool_calls=None))], created=1727660342, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=Comple

In [None]:
from llama_index.llms.openai import OpenAI

llm = OpenAI()
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")

In [14]:
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = OpenAI().chat(messages)
print(resp)

assistant: Ahoy matey! The name's Captain Rainbowbeard, the most colorful pirate on the seven seas! What can I do for ye today?


## load & parse data, define indexes

In [None]:
# load data
from llama_index.core import SimpleDirectoryReader

# read the PDF into parsed documents (MetaGPT, the ICLR 2024 multi-agent paper)
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

In [None]:
# split the documents into evenly sized chunks
# split along sentence boundaries
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
# split the documents into nodes
nodes = splitter.get_nodes_from_documents(documents)
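
To make the chunking step concrete, here is a simplified sketch of sentence-aware splitting. It is not `SentenceSplitter`'s actual algorithm; it assumes a naive regex sentence boundary and measures size in characters rather than tokens. `chunk_by_sentence` and `max_chars` are illustrative names.

```python
import re
from typing import List


def chunk_by_sentence(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    # Naive boundary: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_by_sentence("Agents plan. Agents act. Agents review results.", 26)
print(chunks)  # ['Agents plan. Agents act.', 'Agents review results.']
```

Splitting on sentence boundaries keeps each chunk readable on its own, which tends to help both embedding quality and the answers synthesized from the chunk.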

In [None]:
# global config settings
# specify the LLM and the embedding model to use
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from utils import *

# Settings.llm = OpenAI(model="gpt-3.5-turbo") 
# Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.llm = OpenAI(model="gpt-4o") 
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")

## Routing
- use the LLM to decide which query engine to route a request to

In [20]:
# define two indexes over these nodes: a summary index and a vector index
# (an index is a set of metadata over the data; it can be queried, and different indexes have different retrieval behaviors)

from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
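
The difference in retrieval behavior can be illustrated with a toy sketch: a summary-style index hands back every node, while a vector-style index ranks nodes by similarity to the query and keeps the top k. The bag-of-words "embedding" is a stand-in assumption for a real embedding model, and all function names are illustrative rather than LlamaIndex APIs.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summary_retrieve(nodes: List[str], query: str) -> List[str]:
    """Summary-style retrieval: return every node, regardless of the query."""
    return list(nodes)


def vector_retrieve(nodes: List[str], query: str, top_k: int = 2) -> List[str]:
    """Vector-style retrieval: return the top_k nodes most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda n: cosine(embed(n), q), reverse=True)
    return ranked[:top_k]


toy_nodes = [
    "agents share messages through a message pool",
    "the benchmark results on HumanEval",
    "roles include architect and engineer",
]
print(len(summary_retrieve(toy_nodes, "how do agents share messages")))  # 3
print(vector_retrieve(toy_nodes, "how do agents share messages", top_k=1))
```

This is why a summary index suits whole-document questions and a vector index suits targeted lookups.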

### define query engines and tools

In [22]:
# turn the indexes into query engines
# each query engine is derived from one of the indexes

summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()



# wrap each query engine in a query tool and set its metadata (the description)
# one query tool per query engine

from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

### Define Router 
- llm powered
- pydantic

In [23]:
# LLM powered single selector
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)
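
Roughly speaking, a single selector shows the LLM a numbered list of tool descriptions and asks it to pick one. The sketch below mimics that flow with a word-overlap score in place of the LLM call; the scoring rule is an assumption made purely so the example runs offline, and `route` and `ToolChoice` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolChoice:
    description: str
    run: Callable[[str], str]


def route(query: str, choices: List[ToolChoice]) -> int:
    """Return the index of the choice whose description best matches the query.

    Stand-in for the LLM decision: counts words shared by query and description.
    """
    query_words = set(query.lower().replace("?", "").split())
    scores = [len(query_words & set(c.description.lower().split())) for c in choices]
    return scores.index(max(scores))


choices = [
    ToolChoice("summary of the whole document", lambda q: "summary engine"),
    ToolChoice("retrieve specific facts and details", lambda q: "vector engine"),
]
selected = route("What is the summary of the document?", choices)
print(selected, choices[selected].run("..."))  # 0 summary engine
```

The quality of routing therefore depends heavily on how clearly each tool's description separates it from the others.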

### test query 

In [24]:

# the verbose output shows the intermediate steps taken
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: The question asks for a summary of the document, which aligns with the purpose of summarization questions related to MetaGPT..
The document introduces MetaGPT, a meta-programming framework designed for multi-agent collaboration using Large Language Models (LLMs). MetaGPT incorporates Standardized Operating Procedures (SOPs) to streamline workflows and reduce errors in complex tasks. It assigns specific roles to agents, such as Product Manager, Architect, Engineer, and QA Engineer, to break down tasks into manageable subtasks. The framework uses structured communication interfaces and a publish-subscribe mechanism to enhance efficiency and minimize information overload. MetaGPT also includes an executable feedback mechanism to iteratively improve code quality during runtime. The framework has demonstrated state-of-the-art performance on benchmarks like HumanEval and MBPP, and it excels in handling complex software development tasks compared t

In [25]:
# the response comes with sources 
# to inspect the sources:
print(len(response.source_nodes))
# the number of source nodes equals the number of chunks in the entire document,
# so the summary engine must have been called: it returns every chunk in its index

34


In [26]:
response = query_engine.query(
    "How do agents share information with other agents?"
)

print(str(response))

Selecting query engine 1: The question asks for specific context about how agents share information, which aligns with retrieving specific context from the MetaGPT paper..
Agents share information with other agents by using a shared message pool to publish structured messages. They can also subscribe to relevant messages based on their profiles, allowing them to obtain directional information from other roles and public information from the environment.


### put all of the above into a module: get_router_query_engine

In [27]:
from utils import get_router_query_engine

query_engine = get_router_query_engine("metagpt.pdf")

In [28]:
response = query_engine.query("Tell me about the ablation study results?")
print(str(response))

Selecting query engine 1: The question 'Tell me about the ablation study results?' is asking for specific context from the MetaGPT paper, which aligns with choice 2..
The ablation study results demonstrate the effectiveness of MetaGPT in addressing challenges related to context utilization, reducing hallucinations in software generation, and managing information overload. The study highlights how MetaGPT's unique designs successfully tackle issues such as ambiguity in natural language descriptions, maintaining information validity in lengthy contexts, reducing code hallucinations, and handling information overload through a global message pool and subscription mechanism.


## Tool Calling
- make a decision on which tool to call + infer what to pass to the tool

### create a tool interface to a python function 

In [29]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int: 
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)

# wrap each function in a tool interface so it can be described to an LLM in a prompt and called by it
add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)
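
A tool interface needs the function's name, its docstring, and its typed parameters so the LLM can choose a tool and fill in the arguments. The following sketch extracts those with the standard `inspect` module; the dictionary layout is an illustrative assumption, not the schema `FunctionTool` actually produces.

```python
import inspect
from typing import Any, Callable, Dict


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Collect the name, docstring and parameter types of a function."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


print(describe_tool(add))
```

Because the description comes from the docstring and the argument types from the annotations, clear docstrings and type hints directly affect how well the LLM can use a tool.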

In [31]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of the mystery function on 2 and 9", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121
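
The verbose output shows the two halves of tool calling: the model emits a function name plus JSON arguments, and the runtime executes the matching Python function. Below is a minimal sketch of the second half, with the model's decision hard-coded as a JSON string (mirroring the log above) rather than produced by an LLM; `dispatch` and `TOOLS` are illustrative names.

```python
import json
from typing import Callable, Dict


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


# Registry mapping tool names to the Python callables behind them.
TOOLS: Dict[str, Callable[..., int]] = {"add": add, "mystery": mystery}


def dispatch(tool_call_json: str) -> int:
    """Parse a tool call and run the named function with the given arguments."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["args"])


result = dispatch('{"name": "mystery", "args": {"x": 2, "y": 9}}')
print(result)  # 121
```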


### define an auto-retrieval tool

In [32]:
# take a look at the very first chunk
# print out the content and the metadata attached 
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-09-29
last_modified_date: 2024-09-29

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 

### metadata filters
- a structured list of tags that helps to return a more precise set of search results 
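
In effect, a metadata filter narrows the candidate set before similarity ranking takes place. A toy sketch of an exact-match filter over node metadata, where plain dictionaries stand in for nodes and `filter_nodes` is an illustrative name:

```python
from typing import Any, Dict, List


def filter_nodes(nodes: List[Dict[str, Any]], key: str, value: str) -> List[Dict[str, Any]]:
    """Keep only nodes whose metadata[key] equals value (exact match)."""
    return [n for n in nodes if n["metadata"].get(key) == value]


toy_nodes = [
    {"text": "intro", "metadata": {"page_label": "1"}},
    {"text": "results", "metadata": {"page_label": "2"}},
    {"text": "more results", "metadata": {"page_label": "2"}},
]
page_two = filter_nodes(toy_nodes, "page_label", "2")
print([n["text"] for n in page_two])  # ['results', 'more results']
```

Note that `page_label` values are strings in this document's metadata, so the filter value is `"2"` rather than `2`.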

In [34]:
# build a RAG pipeline over these nodes: each node is embedded in the vector index, which gives back a query engine
# query with metadata filters to restrict retrieval to matching nodes

from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?", 
)

print(str(response))

MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with 85.9% and 87.7% in Pass@1. Additionally, it demonstrates a 100% task completion rate in experimental evaluations, highlighting its robustness and efficiency in handling complex software projects.


In [35]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-09-29', 'last_modified_date': '2024-09-29'}


### a function for the auto-retrieval tool
- integrate metadata filters into the retrieval tool function
- better results from a query plus an optional metadata filter (page numbers)
- the LLM then infers the metadata filter (page numbers) from the user's query

In [36]:
from typing import List, Optional
from llama_index.core.vector_stores import FilterCondition, MetadataFilters


def vector_query(
    query: str, 
    page_numbers: Optional[List[str]] = None
) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (Optional[List[str]]): Filter by set of pages. Leave as None to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.
    
    """

    # treat a missing page list as "no filter"
    page_numbers = page_numbers or []
    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response
    

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)
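
The filter logic inside `vector_query` can be pictured as follows: each requested page becomes one exact-match filter, the filters are combined with OR, and an empty page list is treated as no filtering (as the docstring intends). This is a plain-Python sketch of that semantics under those assumptions, not the vector store's implementation; `matches_any_page` is an illustrative name.

```python
from typing import Any, Dict, List, Optional


def matches_any_page(metadata: Dict[str, Any], page_numbers: Optional[List[str]]) -> bool:
    """True if no pages were requested, or the node's page is one of them (OR)."""
    if not page_numbers:
        return True
    return any(metadata.get("page_label") == p for p in page_numbers)


pages = [{"page_label": str(i)} for i in range(1, 5)]
print([m["page_label"] for m in pages if matches_any_page(m, ["2", "4"])])  # ['2', '4']
print(len([m for m in pages if matches_any_page(m, None)]))  # 4
```

OR is the natural condition here: a chunk carries a single page label, so requiring it to match several pages at once (AND) would return nothing.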

In [37]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool], 
    "What are the high-level results of MetaGPT as described on page 2?", 
    verbose=True
)


=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}
=== Function Output ===
MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with Pass@1 scores of 85.9% and 87.7% on HumanEval and MBPP evaluations, respectively. Additionally, it demonstrates a 100% task completion rate in experimental evaluations, highlighting its robustness and efficiency in handling complex software projects.
MetaGPT achieves a new state-of-the-art (SoTA) in code generation benchmarks with Pass@1 scores of 85.9% and 87.7% on HumanEval and MBPP evaluations, respectively. Additionally, it demonstrates a 100% task completion rate in experimental evaluations, highlighting its robustness and efficiency in handling complex software projects.


In [38]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-09-29', 'last_modified_date': '2024-09-29'}


### use the new vector_query_tool
- together with the previous summary tool

In [39]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What are the MetaGPT comparisons with ChatDev described on page 8?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}
=== Function Output ===
MetaGPT outperforms ChatDev in nearly all metrics on the SoftwareDev dataset. For executability, MetaGPT achieves a score of 3.75, which is close to flawless, compared to ChatDev's 2.25. MetaGPT also takes less time (503 seconds) than ChatDev (762 seconds). Although MetaGPT uses more tokens (24,613 or 31,255 compared to ChatDev's 19,292), it is more efficient in generating lines of code, needing only 126.5/124.3 tokens per line compared to ChatDev's 248.9 tokens. Additionally, MetaGPT produces more code files, lines of code per file, and total code lines, and it has a lower human revision cost (0.83) compared to ChatDev (2.5).


In [40]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-09-29', 'last_modified_date': '2024-09-29'}


In [41]:
# the LLM can still pick the summary tool when appropriate
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What is a summary of the paper?", 
    verbose=True
)

=== Calling Function ===
Calling function: query_engine_tool with args: {"input": "summary of the paper"}
=== Function Output ===
The paper introduces MetaGPT, a meta-programming framework designed to enhance multi-agent collaboration using Large Language Models (LLMs). MetaGPT integrates Standardized Operating Procedures (SOPs) into prompt sequences to streamline workflows, allowing agents with human-like domain expertise to verify intermediate results and reduce errors. The framework employs an assembly line paradigm to assign diverse roles to various agents, breaking down complex tasks into subtasks. This approach improves the coherence and accuracy of solutions, particularly in collaborative software engineering tasks.

MetaGPT's design includes role specialization, structured communication interfaces, and a publish-subscribe mechanism to enhance communication efficiency among agents. Additionally, it features an executable feedback mechanism that iteratively improves code quality 