In [1]:
!python3 -m pip install llama-index==0.10.27 llama-index-llms-openai==0.1.15 llama-index-embeddings-openai==0.1.7

Collecting llama-index==0.10.27
  Downloading llama_index-0.10.27-py3-none-any.whl.metadata (11 kB)
Collecting llama-index-llms-openai==0.1.15
  Downloading llama_index_llms_openai-0.1.15-py3-none-any.whl.metadata (559 bytes)
Collecting llama-index-embeddings-openai==0.1.7
  Downloading llama_index_embeddings_openai-0.1.7-py3-none-any.whl.metadata (603 bytes)
Collecting llama-index-agent-openai<0.3.0,>=0.1.4 (from llama-index==0.10.27)
  Downloading llama_index_agent_openai-0.2.7-py3-none-any.whl.metadata (678 bytes)
Collecting llama-index-cli<0.2.0,>=0.1.2 (from llama-index==0.10.27)
  Downloading llama_index_cli-0.1.12-py3-none-any.whl.metadata (1.5 kB)
Collecting llama-index-core<0.11.0,>=0.10.27 (from llama-index==0.10.27)
  Downloading llama_index_core-0.10.48-py3-none-any.whl.metadata (2.5 kB)
Collecting llama-index-indices-managed-llama-cloud<0.2.0,>=0.1.2 (from llama-index==0.10.27)
  Downloading llama_index_indices_managed_llama_cloud-0.1.6-py3-none-any.whl.metadata (3.8 kB)
C

In [2]:
import os
openai_api_key = os.getenv("OPENAI_API_KEY")

In [3]:
# Allow nested event loops
import nest_asyncio

nest_asyncio.apply()

## Load Data

In [4]:
!wget https://arxiv.org/pdf/2308.00352 -O metagpt.pdf

--2024-06-22 08:43:46--  https://arxiv.org/pdf/2308.00352
Resolving arxiv.org (arxiv.org)... 151.101.131.42, 151.101.195.42, 151.101.67.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.131.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16715764 (16M) [application/pdf]
Saving to: ‘metagpt.pdf’


2024-06-22 08:44:17 (518 KB/s) - ‘metagpt.pdf’ saved [16715764/16715764]



In [5]:
# Parser
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

In [6]:
# Chunking
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [8]:
# Set up the LLM and the embedding model
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## How Each Index Works
https://docs.llamaindex.ai/en/v0.10.17/module_guides/indexing/index_guide.html

### Create a Summary Index and a Vector Index
We will use two classes from llama_index.core for this: SummaryIndex and VectorStoreIndex.
#### SummaryIndex: 
The summary index is a simple data structure where nodes are stored in a sequence. During index construction, the document texts are chunked up, converted to nodes, and stored in a list.

During query time, the summary index iterates through the nodes with some optional filter parameters, and synthesizes an answer from all the nodes.
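
To make the "list of nodes" idea concrete, here is a minimal sketch in plain Python. `ToyNode` and `ToySummaryIndex` are hypothetical names made up for illustration, not LlamaIndex classes, and the `synthesize` callable stands in for the LLM call that would normally produce the answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToyNode:
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToySummaryIndex:
    """Hypothetical stand-in: keeps nodes in a plain list, in insertion order."""

    def __init__(self, nodes: List[ToyNode]) -> None:
        self.nodes = list(nodes)

    def query(
        self,
        synthesize: Callable[[List[str]], str],
        node_filter: Optional[Callable[[ToyNode], bool]] = None,
    ) -> str:
        # Walk every node in sequence, applying the optional filter.
        selected = [n for n in self.nodes if node_filter is None or node_filter(n)]
        # Hand all surviving texts to the synthesizer (an LLM in a real pipeline).
        return synthesize([n.text for n in selected])


toy_nodes = [
    ToyNode("MetaGPT assigns roles.", {"page_label": "1"}),
    ToyNode("Agents share a message pool.", {"page_label": "2"}),
]
toy_index = ToySummaryIndex(toy_nodes)
joined = toy_index.query(lambda texts: " ".join(texts))
page_two = toy_index.query(
    lambda texts: " ".join(texts),
    node_filter=lambda n: n.metadata["page_label"] == "2",
)
```

Because every node is visited, this style of index suits whole-document questions such as summaries, at the cost of sending all the text to the synthesizer.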

#### VectorStoreIndex:
VectorStoreIndex stores an embedding for each input text chunk in a vector store. Once constructed, the index can be used for querying. By default, VectorStoreIndex uses an in-memory SimpleVectorStore that's initialized as part of the default storage context.
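
The sketch below illustrates the general idea of an in-memory vector store: embed each chunk once, then rank chunks by similarity to the embedded query. It is an assumption-laden toy, not how SimpleVectorStore is implemented: `toy_embed` uses word counts where a real index would call an embedding model, and `ToyVectorStore` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real index calls an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.rows: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Embed once at construction time and keep the vector next to the text.
        self.rows.append((text, toy_embed(text)))

    def top_k(self, query: str, k: int = 2) -> List[str]:
        # Embed the query, then return the k most similar chunks.
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
for chunk in [
    "agents publish messages to a shared pool",
    "the benchmark is humaneval",
    "roles include architect and engineer",
]:
    store.add(chunk)
hits = store.top_k("how do agents share messages", k=1)
```

Unlike the summary index, only the top-k chunks reach the LLM, which is why this index suits questions about specific passages.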

In [10]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

Here, we will use the query engine pattern from LlamaIndex.
A query engine is a generic interface that allows you to ask questions over your data.

A query engine takes in a natural language query, and returns a rich response. It is most often (but not always) built on one or many indexes via retrievers. You can compose multiple query engines to achieve more advanced capability.
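
As a rough illustration of composing query engines, the sketch below routes a question to one of two engines based on their descriptions. All names here (`ToyEngineTool`, `keyword_selector`, `ToyRouter`) are hypothetical, and the word-overlap selector is only a stand-in for an LLM-based selector.

```python
from typing import Callable, List, NamedTuple


class ToyEngineTool(NamedTuple):
    description: str
    answer: Callable[[str], str]


def keyword_selector(query: str, descriptions: List[str]) -> int:
    """Hypothetical selector: pick the description sharing the most words with the query."""
    query_words = set(query.lower().split())
    overlaps = [len(query_words & set(d.lower().split())) for d in descriptions]
    return overlaps.index(max(overlaps))


class ToyRouter:
    def __init__(
        self,
        tools: List[ToyEngineTool],
        selector: Callable[[str, List[str]], int],
    ) -> None:
        self.tools = tools
        self.selector = selector

    def query(self, query: str) -> str:
        # Step 1: choose an engine from the descriptions alone.
        choice = self.selector(query, [t.description for t in self.tools])
        # Step 2: forward the untouched query to the chosen engine.
        return self.tools[choice].answer(query)


router = ToyRouter(
    tools=[
        ToyEngineTool("summarization questions", lambda q: "summary engine"),
        ToyEngineTool("retrieving specific context", lambda q: "vector engine"),
    ],
    selector=keyword_selector,
)
picked_summary = router.query("answer summarization questions about the paper")
picked_vector = router.query("find specific context on message pools")
```

The point of the design is that the router never looks inside the engines; the descriptions are the only signal, so they need to be written carefully.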

In [11]:
# Define Query Engine
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
    use_async=True,)
vector_query_engine = vector_index.as_query_engine()

In [12]:
from llama_index.core.tools import QueryEngineTool


summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to MetaGPT"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "Useful for retrieving specific context from the MetaGPT paper."
    ),
)

In [14]:
# Define Router Query Engine
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector


query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    verbose=True
)

In [15]:
response = query_engine.query("What is the summary of the document?")
print(str(response))

Selecting query engine 0: Useful for summarization questions related to MetaGPT.
The document introduces MetaGPT, a meta-programming framework that enhances multi-agent collaboration in software development by utilizing Large Language Models (LLMs) and Standardized Operating Procedures (SOPs). It emphasizes role specialization, structured communication interfaces, and efficient sharing mechanisms to streamline workflows and improve code generation quality. MetaGPT outperforms other approaches in benchmarks, showcasing its state-of-the-art performance in generating high-quality code efficiently. The framework models a group of agents as a simulated software company, with roles assigned based on human-like domain expertise, and incorporates executable feedback mechanisms to enhance problem-solving capabilities during runtime. The document also discusses the development process using MetaGPT, highlighting steps from user input commands to the creation of functional appl

In [16]:
response = query_engine.query(
    "How do agents share information with other agents?"
)
print(str(response))

Selecting query engine 1: This choice is more relevant as it focuses on retrieving specific context from the MetaGPT paper, which may provide insights on how agents share information with other agents..
Agents share information with other agents by utilizing a shared message pool. This pool allows agents to publish structured messages and access messages from other entities directly, enhancing communication efficiency. Additionally, agents can subscribe to relevant information based on their role profiles, enabling them to extract necessary information and avoid distractions from irrelevant details.


In a basic RAG pipeline, LLMs are used only for synthesis.

## Tool Calling
* Tool calling enables LLMs to interact with external environments through a dynamic interface: the LLM not only chooses the appropriate tool but also infers the arguments needed to execute it.
* In standard RAG, LLMs are mainly used only to synthesize information.
* Tool calling adds a layer of query understanding on top of a RAG pipeline, enabling users to ask complex queries and get back more precise results.
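
The two halves of tool calling described above can be sketched without any LLM: first a function is turned into a description the model can read, then the model's reply (a tool name plus arguments) is dispatched to the real function. This is a simplified illustration under stated assumptions: `describe_tool`, `dispatch`, and the JSON reply format are hypothetical, and `model_reply` is hard-coded where a model would normally generate it.

```python
import inspect
import json
from typing import Any, Callable, Dict


def describe_tool(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Build a minimal, hypothetical tool description from a Python function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        # Map each parameter name to the name of its annotated type.
        "parameters": {
            name: getattr(p.annotation, "__name__", str(p.annotation))
            for name, p in sig.parameters.items()
        },
    }


def dispatch(tool_call_json: str, registry: Dict[str, Callable[..., Any]]) -> Any:
    """Run the function named in the model's reply with the arguments it inferred."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["args"])


def multiply(x: int, y: int) -> int:
    """Multiplies two integers."""
    return x * y


schema = describe_tool(multiply)
# Hard-coded stand-in for what a model might return after reading `schema`.
model_reply = '{"name": "multiply", "args": {"x": 3, "y": 4}}'
result = dispatch(model_reply, {"multiply": multiply})
```

This is why type annotations and docstrings matter for tools: they are the only information the model has when it infers the arguments.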

In [17]:
# Define a simple tool
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int: 
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

In [18]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
    [add_tool, mystery_tool], 
    "Tell me the output of the mystery function on 2 and 9", 
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121


In the router example, the LLM only picked the tool; here the LLM also decides which arguments to pass to the tool. Let's use this key concept to define a slightly more sophisticated agentic layer on top of vector search. Not only can the LLM choose vector search, it can also infer metadata filters: a structured list of tags that helps return a more precise set of search results.

In [19]:
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16715764
creation_date: 2024-06-22
last_modified_date: 2023-11-07

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 

In [20]:
# Let's create a base RAG pipeline with the vector store
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)

In [21]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?", 
)

In [22]:
print(str(response))

MetaGPT achieves a new state-of-the-art in code generation benchmarks with 85.9% and 87.7% in Pass@1. It outperforms other popular frameworks like AutoGPT, LangChain, AgentVerse, and ChatDev in handling higher levels of software complexity and offering extensive functionality. Additionally, MetaGPT demonstrates a 100% task completion rate in experimental evaluations, showcasing its robustness and efficiency in terms of time and token costs.


In [23]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16715764, 'creation_date': '2024-06-22', 'last_modified_date': '2023-11-07'}


## Now let's build the auto-retrieval tool
### Enhancing data retrieval

* Integrate the metadata filter into a retrieval tool function.
* This function enables more precise retrieval by accepting a query string and optional metadata filters, such as page numbers.
* The LLM can intelligently infer relevant metadata filters (for example, the page number) from the user's query.
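
To show what the page filter is expected to do, here is a small plain-Python sketch of OR semantics over page labels. `filter_by_pages` is a hypothetical helper, not a LlamaIndex function, and treating an empty page list as "search everything" is an assumption of this sketch. Note that page labels are strings, matching the `page_label` metadata printed earlier.

```python
from typing import Dict, List, Optional


def filter_by_pages(
    nodes: List[Dict[str, str]],
    page_numbers: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Keep nodes whose page_label equals ANY requested page (OR semantics)."""
    if not page_numbers:
        # No pages requested: search over every node.
        return list(nodes)
    wanted = set(page_numbers)
    return [n for n in nodes if n["page_label"] in wanted]


toy_pages = [
    {"page_label": "1", "text": "abstract"},
    {"page_label": "2", "text": "results"},
    {"page_label": "8", "text": "comparison with ChatDev"},
]
only_2_or_8 = filter_by_pages(toy_pages, ["2", "8"])
everything = filter_by_pages(toy_pages)
```

The filter runs before similarity ranking, so a top-k of 2 is taken only from the pages that survive it.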



In [24]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_query(
    query: str, 
    page_numbers: List[str]
) -> str:
    """Perform a vector search over an index.
    
    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.
    
    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]
    
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response
    

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)

In [25]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool], 
    "What are the high-level results of MetaGPT as described on page 2?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "high-level results of MetaGPT", "page_numbers": ["2"]}
=== Function Output ===
MetaGPT achieves a new state-of-the-art in code generation benchmarks with 85.9% and 87.7% in Pass@1. It stands out in handling higher levels of software complexity and offering extensive functionality. In experimental evaluations, MetaGPT achieves a 100% task completion rate, demonstrating robustness and efficiency in design.


In [26]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16715764, 'creation_date': '2024-06-22', 'last_modified_date': '2023-11-07'}


In [27]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [28]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What are the MetaGPT comparisons with ChatDev described on page 8?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT comparisons with ChatDev", "page_numbers": ["8"]}
=== Function Output ===
MetaGPT outperforms ChatDev in several aspects based on the statistical analysis provided. MetaGPT shows higher scores in executability, running times, token usage, code statistics, productivity, and human revision cost compared to ChatDev. This indicates that MetaGPT offers better performance and efficiency in software development tasks when compared to ChatDev.


In [29]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16715764, 'creation_date': '2024-06-22', 'last_modified_date': '2023-11-07'}


In [31]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool], 
    "What is a summary of the paper?", 
    verbose=True
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "The paper discusses the impact of climate change on biodiversity and ecosystems."}
=== Function Output ===
The paper does not discuss the impact of climate change on biodiversity and ecosystems.


## Building an Agent Reasoning Loop
So far, our queries have been handled in a single forward pass: given the query, call the right tool with the right parameters and get back the response. But this is still quite limiting. What if the user asks a complex question consisting of multiple steps, or a vague question that needs clarification?
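
A reasoning loop replaces the single pass with repeated "decide, call a tool, observe" steps until the agent decides it is done. The sketch below shows only the control flow, under stated assumptions: `run_agent_loop` and `scripted_planner` are hypothetical, and the planner is a fixed script standing in for the LLM's decisions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A planner maps the history so far to the next (tool_name, tool_input),
# or returns None when no further tool call is needed.
Planner = Callable[[List[str]], Optional[Tuple[str, str]]]


def run_agent_loop(
    task: str,
    tools: Dict[str, Callable[[str], str]],
    planner: Planner,
    max_steps: int = 5,
) -> List[str]:
    history = [f"user: {task}"]
    for _ in range(max_steps):
        decision = planner(history)
        if decision is None:
            break  # The planner considers the task complete.
        tool_name, tool_input = decision
        observation = tools[tool_name](tool_input)
        # Each observation is appended so later decisions can build on it.
        history.append(f"{tool_name}({tool_input}) -> {observation}")
    return history


def scripted_planner(history: List[str]) -> Optional[Tuple[str, str]]:
    # Hypothetical stand-in for the LLM: one tool call per sub-question, then stop.
    plan = [("lookup", "agent roles"), ("lookup", "communication")]
    steps_done = len(history) - 1
    return plan[steps_done] if steps_done < len(plan) else None


facts = {"agent roles": "five roles", "communication": "shared message pool"}
trace = run_agent_loop(
    "Describe roles, then communication.",
    {"lookup": lambda q: facts[q]},
    scripted_planner,
)
```

The `max_steps` cap is a common safeguard so that a planner that never stops cannot loop forever.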

In [33]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from typing import List, Optional, Tuple

def get_doc_tools(
    file_path: str,
    name: str,
) -> Tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document."""

    # load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)
    
    def vector_query(
        query: str, 
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over the MetaGPT paper.
    
        Useful if you have specific questions over the MetaGPT paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.
    
        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE 
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.
        
        """
    
        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]
        
        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        return response
        
    
    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )
    
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            f"Use ONLY IF you want to get a holistic summary of {name}. "
            f"Do NOT use if you have specific questions over {name}."
        ),
    )

    return vector_query_tool, summary_tool

In [36]:
# Set up the query tools
vector_tool, summary_tool = get_doc_tools("metagpt.pdf", "metagpt")

In [35]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

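An agent consists of two main components:
1. AgentRunner: the orchestrator that creates tasks, stores state and conversational memory, and runs steps.
2. AgentWorker: executes the next step of a given task, such as choosing and calling a tool.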

https://docs.llamaindex.ai/en/latest/module_guides/deploying/agents/agent_runner/
https://cobusgreyling.medium.com/llamaindex-agent-step-wise-execution-framework-with-agent-runners-agent-workers-dc1d9ff434ac

In [37]:
# Set up a function-calling agent
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [38]:
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include Product Manager, Architect, Project Manager, Engineer, and QA Engineer. Each role has specific responsibilities and skills tailored to different aspects of the collaborative framework, such as conducting business-oriented analysis, translating requirements into system design components, handling task distribution, executing code, formulating test cases, generating Product Requirement Documents, devising technical specifications, breaking down tasks, ensuring high-quality software through unit testing, and reviewing feedback to enhance the overall multi-agent system.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "communication between agent roles in MetaGPT"}
=== F

In [39]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16715764
creation_date: 2024-06-22
last_modified_date: 2023-11-07

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 

The agent maintains the chat history in a conversational memory buffer. The memory module can be customized, but by default it is a flat list of items kept as a rolling buffer whose size depends on the context window of the LLM. Therefore, when the agent decides to use a tool, it uses not only the current message but also the previous conversation history to decide the next step or action.
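
A rolling buffer of this kind can be sketched as "keep the most recent messages that fit a token budget". The class below is a hypothetical illustration, not the LlamaIndex memory implementation, and it counts whitespace-separated words where a real buffer would use the model's tokenizer.

```python
from typing import List, Tuple


class ToyChatMemory:
    """Hypothetical rolling buffer bounded by an approximate token budget."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: List[Tuple[str, str]] = []

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def get(self) -> List[Tuple[str, str]]:
        """Return the newest messages that fit in the budget, oldest first."""
        kept: List[Tuple[str, str]] = []
        used = 0
        for role, content in reversed(self.messages):
            cost = len(content.split())  # Crude word count in place of real tokens.
            if used + cost > self.token_limit:
                break  # Older messages fall out of the window.
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


memory = ToyChatMemory(token_limit=15)
memory.put("user", "Tell me about the evaluation datasets used.")
memory.put("assistant", "HumanEval, MBPP, and SoftwareDev.")
memory.put("user", "Tell me the results over one of the above datasets.")
window = memory.get()
```

With this budget the first user message is dropped, but the assistant's list of datasets stays in the window, which is what lets a follow-up such as "one of the above datasets" be resolved.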

In [40]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and SoftwareDev.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and SoftwareDev.


In [41]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset"}
=== Function Output ===
MetaGPT achieves 85.9% and 87.7% in the HumanEval dataset, outperforming ChatDev in nearly all metrics. It achieves a score of 3.75 for executability, takes less time (503 seconds), and shows superior performance in code statistics and human revision cost compared to ChatDev.
=== LLM Response ===
MetaGPT achieves impressive results over the HumanEval dataset, with scores of 85.9% and 87.7%. It outperforms ChatDev in various metrics, including executability, time taken, code statistics, and human revision cost.


The question above is a follow-up: "Tell me the results over one of the above datasets." The tool call shows {"query": "results over HumanEval dataset"}, which means the agent resolved "the above datasets" from the stored conversation history.

## Agent Control
The key benefits:
* Decoupling of task creation and execution: Users gain the flexibility to schedule task execution according to their needs.
* Enhanced debuggability: Offers deeper insight into each step of the execution process, which improves troubleshooting.
* Steerability: Allows users to directly modify intermediate steps and incorporate human feedback for refined control.

This is useful when we want to listen to human feedback in the middle of agent execution, rather than only after the agent has completed a given task. You can imagine creating some sort of async queue that listens for human input throughout agent execution. If human input actually comes in, you can interrupt and modify the agent's execution while it is working through a larger task, instead of waiting until the task is complete.
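
The decoupling and steerability described above can be sketched with a task object that holds pending steps and a runner that executes one step per call. `ToyTask` and `ToyRunner` are hypothetical classes; the method names `create_task` and `run_step` only mirror the notebook's calls for readability, and placing human input at the front of the queue is an assumption of this sketch.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class ToyTask:
    def __init__(self, steps: List[str]) -> None:
        self.pending: Deque[str] = deque(steps)
        self.completed: List[str] = []


class ToyRunner:
    """Hypothetical runner that separates creating a task from executing it."""

    def __init__(self, execute: Callable[[str], str]) -> None:
        self.execute = execute

    def create_task(self, steps: List[str]) -> ToyTask:
        # Nothing runs yet; the caller decides when to execute.
        return ToyTask(steps)

    def run_step(self, task: ToyTask, human_input: Optional[str] = None) -> str:
        if human_input is not None:
            # Steering: human feedback jumps ahead of the planned steps.
            task.pending.appendleft(human_input)
        step = task.pending.popleft()
        output = self.execute(step)
        task.completed.append(output)  # Kept for inspection and debugging.
        return output


runner = ToyRunner(lambda step: f"done: {step}")
toy_task = runner.create_task(["describe roles", "describe communication"])
first = runner.run_step(toy_task)
second = runner.run_step(toy_task, human_input="explain information sharing")
remaining = list(toy_task.pending)
```

Because each call runs exactly one step, the caller can inspect `completed` and `pending` between steps and decide whether to continue, redirect, or stop.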

In [42]:
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [43]:
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [44]:
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT are Product Manager, Architect, Project Manager, Engineer, and QA Engineer.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents communicate with each other in MetaGPT"}
=== Function Output ===
Agents in MetaGPT communicate with each other through structured communication interfaces, utilizing mechanisms such as message pools, subscriptions, and constraint prompts. This structured approach involves sharing information through documents and diagrams in a shared message pool, allowing agents to access necessary information efficiently without individual inquiries. Additionally, a workflow is followed where each agent performs specific tasks based on information provide

In [45]:
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task dbd458f2-a010-4310-a177-d5fe40aedb51: 1
The agent roles in MetaGPT are Product Manager, Architect, Project Manager, Engineer, and QA Engineer.


In [46]:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task dbd458f2-a010-4310-a177-d5fe40aedb51: 1


TaskStep(task_id='dbd458f2-a010-4310-a177-d5fe40aedb51', step_id='ad6d96fc-74ec-4fc2-80ac-fa3ef566e892', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

In [47]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents share information in MetaGPT"}
=== Function Output ===
Agents in MetaGPT share information through a structured communication protocol that involves utilizing a shared message pool to publish and access structured messages. They also make use of a subscription mechanism to manage the dissemination of information effectively based on their role profiles. Additionally, agents review previous feedback before each project, adjust constraint prompts accordingly, and integrate received feedback into updated constraint prompts stored in long-term memory. Furthermore, information sharing in MetaGPT occurs through a structured workflow involving various stages of software development, where different agents handle specific tasks based on their roles. This structured division of labor ensures smooth information flow between agents

In [48]:
step_output = agent.run_step(task.task_id)
print(step_output.is_last)

=== LLM Response ===
Agents in MetaGPT share information through a structured communication protocol that involves utilizing a shared message pool to publish and access structured messages. They also make use of a subscription mechanism to manage the dissemination of information effectively based on their role profiles. Additionally, agents review previous feedback before each project, adjust constraint prompts accordingly, and integrate received feedback into updated constraint prompts stored in long-term memory. Furthermore, information sharing in MetaGPT occurs through a structured workflow involving various stages of software development, where different agents handle specific tasks based on their roles. This structured division of labor ensures smooth information flow between agents, facilitating the development process.
True


In [49]:
response = agent.finalize_response(task.task_id)

In [50]:
print(str(response))

Agents in MetaGPT share information through a structured communication protocol that involves utilizing a shared message pool to publish and access structured messages. They also make use of a subscription mechanism to manage the dissemination of information effectively based on their role profiles. Additionally, agents review previous feedback before each project, adjust constraint prompts accordingly, and integrate received feedback into updated constraint prompts stored in long-term memory. Furthermore, information sharing in MetaGPT occurs through a structured workflow involving various stages of software development, where different agents handle specific tasks based on their roles. This structured division of labor ensures smooth information flow between agents, facilitating the development process.


In [56]:
!wget https://openreview.net/pdf?id=6PmJoRfdaK -O longlora.pdf

--2024-06-22 13:11:06--  https://openreview.net/pdf?id=6PmJoRfdaK
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1168720 (1.1M) [application/pdf]
Saving to: ‘longlora.pdf’


2024-06-22 13:11:10 (478 KB/s) - ‘longlora.pdf’ saved [1168720/1168720]



In [57]:
!wget https://openreview.net/pdf?id=hSyW5go0v8 -O selfrag.pdf

--2024-06-22 13:11:30--  https://openreview.net/pdf?id=hSyW5go0v8
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1244749 (1.2M) [application/pdf]
Saving to: ‘selfrag.pdf’


2024-06-22 13:11:34 (529 KB/s) - ‘selfrag.pdf’ saved [1244749/1244749]



## Building a Multi-Document Agent

In [58]:
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

In [59]:
from pathlib import Path

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: metagpt.pdf
Getting tools for paper: longlora.pdf
Getting tools for paper: selfrag.pdf


In [60]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

In [61]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

In [62]:
len(initial_tools)

6

In [63]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [64]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation datasets mentioned in the provided context are PG19 test split, LongBench benchmark, LEval open-ended benchmark, and the PG19 validation set.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation results demonstrate the effectiveness of the proposed methods in achieving comparable or superior performance to full fine-tuning baselines, with improved efficiency. These results include perplexity scores on proof-pile and PG19 datasets, passkey retrieval accuracy for different context lengths, and comparisons with other models in topic retrieval tasks. The impact of attention patterns during fine-tuning, the influence of conte

In [65]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework that enhances the quality and factuality of a large language model through a combination of retrieval and self-reflection. It involves training the model to retrieve relevant information on-demand, reflect on the retrieved passages, and critique its own output using special tokens called reflection tokens. This approach allows the model to adaptively retrieve passages, evaluate their relevance, and generate responses that are supported by the retrieved information, ultimately improving the overall quality and accuracy of the generated text.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}


Retrying llama_index.llms.openai.base.OpenAI._achat in 0.13649480739457054 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-ChYE6mwylyr8PrXvhAPdx0EC on tokens per min (TPM): Limit 60000, Used 57429, Requested 3067. Please try again in 496ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.10889093558768548 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-ChYE6mwylyr8PrXvhAPdx0EC on tokens per min (TPM): Limit 60000, Used 57111, Requested 3393. Please try again in 503ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.5082416911243

=== Function Output ===
LongLoRA is an efficient method for extending the context length of Large Language Models (LLMs) while minimizing computational costs and training time. It combines shifted sparse attention (S2-Attn) with LoRA to enable fine-tuning of models to longer context lengths without altering their original architectures. LongLoRA has shown strong empirical results across various tasks and is compatible with existing techniques like Flash-Attention2. By introducing improvements such as training normalization and embedding layers, LongLoRA addresses performance gaps when adapting LLMs from short to long context lengths. Additionally, it can handle longer documents by extending position embeddings and has demonstrated reasonable accuracy in passkey retrieval up to certain context lengths.
=== LLM Response ===
Here are summaries of Self-RAG and LongLoRA:

1. Self-RAG: Self-RAG is a framework that enhances the quality and factuality of a large language model through a combin

The issue here is that for 3 documents we end up with 6 tools. If we increase the number of documents to 100, problems appear.
1. Stuffing too many tool choices into the LLM prompt leads to the following issues:
a. The tools may not all fit in the prompt.
b. Cost and latency will spike because you are increasing the number of tokens in the prompt, and the LLM can also get confused: it may fail to pick the right tool when the number of choices is too large.

The solution is that when the user asks a query, we perform retrieval augmentation, not at the level of text but at the level of tools. We first retrieve a small set of relevant tools, then feed only those tools into the agent reasoning prompt instead of all of them. This retrieval process is similar to the retrieval process used in RAG.
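
As a rough sketch of tool-level retrieval, the code below scores each tool description against the query and keeps only a shortlist for the agent. Everything here is an assumption made for illustration: `retrieve_tools` is a hypothetical helper, the descriptions are invented, and word overlap stands in for the embedding similarity a real retriever would use.

```python
from typing import Dict, List


def overlap_score(query: str, description: str) -> int:
    """Count distinct words shared by the query and a tool description."""
    return len(set(query.lower().split()) & set(description.lower().split()))


def retrieve_tools(
    query: str,
    tool_descriptions: Dict[str, str],
    top_k: int = 2,
) -> List[str]:
    """Return the names of the top_k tools most relevant to the query."""
    ranked = sorted(
        tool_descriptions,
        key=lambda name: overlap_score(query, tool_descriptions[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Invented descriptions for the six tools built above.
descriptions = {
    "summary_tool_metagpt": "summary of the metagpt paper",
    "vector_tool_metagpt": "specific questions about the metagpt paper",
    "summary_tool_longlora": "summary of the longlora paper",
    "vector_tool_longlora": "specific questions about the longlora paper",
    "summary_tool_selfrag": "summary of the selfrag paper",
    "vector_tool_selfrag": "specific questions about the selfrag paper",
}
shortlist = retrieve_tools("give me a summary of longlora", descriptions, top_k=2)
```

Only the shortlisted tools would then be passed to the agent, so the prompt size stays roughly constant as the number of documents grows.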