<a href="https://colab.research.google.com/github/meta-llama/llama-recipes/blob/main/recipes/3p_integrations/llamaindex/dlai_agentic_rag/Building_Agentic_RAG_with_Llamaindex_L2_Tool_Calling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook ports the DeepLearning.AI short course [Building Agentic RAG with Llamaindex Lesson 2 Tool Calling](https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/3/tool-calling) to use Llama 3.

For a deeper understanding, take the course before or after working through this notebook.

In [1]:
!pip install llama-index
!pip install llama-index-embeddings-huggingface
!pip install llama-index-llms-groq



In [2]:
import nest_asyncio

nest_asyncio.apply()

In [3]:
from llama_index.core.tools import FunctionTool

def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y

def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

In [4]:
import os 
os.environ['GROQ_API_KEY'] = 'xxx' # get a free key at https://console.groq.com/keys

In [5]:
from llama_index.llms.groq import Groq

llm = Groq(model="llama3-70b-8192", temperature=0)
response = llm.predict_and_call(
    [add_tool, mystery_tool],
    "Tell me the output of the mystery function on 2 and 9",
    verbose=True
)
print(str(response))

=== Calling Function ===
Calling function: mystery with args: {"x": 2, "y": 9}
=== Function Output ===
121
121
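
As a sanity check on the tool output above, the value can be reproduced without the LLM. The sketch below repeats the body of `mystery` from this notebook in plain Python and evaluates it on 2 and 9; it only confirms that the model routed the request to the right function with the right arguments.

```python
def mystery(x: int, y: int) -> int:
    """Same body as the notebook's mystery tool: the square of the sum."""
    return (x + y) * (x + y)


# (2 + 9) * (2 + 9) = 11 * 11
result = mystery(2, 9)
print(result)  # 121
```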


In [6]:
!wget "https://openreview.net/pdf?id=VtmBAGCN7o" -O metagpt.pdf

--2024-07-03 16:17:18--  https://openreview.net/pdf?id=VtmBAGCN7o
Resolving openreview.net (openreview.net)... 35.184.86.251
Connecting to openreview.net (openreview.net)|35.184.86.251|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16911937 (16M) [application/pdf]
Saving to: ‘metagpt.pdf’


2024-07-03 16:17:21 (5.75 MB/s) - ‘metagpt.pdf’ saved [16911937/16911937]



In [7]:
from llama_index.core import SimpleDirectoryReader

# The MetaGPT paper is also available at https://arxiv.org/pdf/2308.00352
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

In [8]:
from llama_index.core.node_parser import SentenceSplitter
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
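
To give a feel for what sentence-aware chunking does, here is a deliberately simplified plain-Python sketch. It is not LlamaIndex's algorithm: `pack_sentences` is a hypothetical helper that counts words instead of tokens and ignores chunk overlap, whereas `SentenceSplitter` measures `chunk_size` in tokens. The shared idea is that whole sentences are packed into a chunk until the size budget would be exceeded.

```python
import re
from typing import List


def pack_sentences(text: str, max_words: int) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Hypothetical illustration only: word counts stand in for token counts,
    and a single sentence longer than max_words becomes its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "Agents use tools. Tools have schemas. "
    "The LLM picks one. Then it fills in arguments."
)
chunks = pack_sentences(sample, max_words=7)
for chunk in chunks:
    print(chunk)
```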

In [9]:
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-07-03
last_modified_date: 2024-07-03

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, however, 

In [10]:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use the Groq LLM created earlier as the default LLM.
Settings.llm = llm

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)



In [12]:
# Settings.embed_model is used by VectorStoreIndex() to embed the nodes and by the query engine to embed the query.
# Settings.llm is used by the query engine to synthesize the answer.

from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=2)
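
The `similarity_top_k=2` argument keeps only the two chunks whose embeddings score highest against the query embedding. The sketch below illustrates that selection step in plain Python, assuming cosine similarity as the scoring function. The 3-dimensional vectors and the names `toy_chunks`, `cosine`, and `top_k` are made up for illustration and stand in for the real embeddings produced by the embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(
    query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs with the highest similarity."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not from a real model).
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.7, 0.7, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
best = top_k([1.0, 0.1, 0.0], toy_chunks, k=2)
print([name for name, _ in best])
```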

In [13]:
from llama_index.core.vector_stores import MetadataFilters

query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts(
        [
            {"key": "page_label", "value": "2"}
        ]
    )
)

response = query_engine.query(
    "What are some high-level results of MetaGPT?",
)

In [14]:
print(str(response))

MetaGPT achieves a new state-of-the-art (SoTA) with 85.9% and 87.7% in Pass@1, and a 100% task completion rate, demonstrating the robustness and efficiency of its design.


In [15]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '2', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-07-03', 'last_modified_date': '2024-07-03'}


In [16]:
from typing import List
from llama_index.core.vector_stores import FilterCondition


def vector_query(
    query: str,
    page_numbers: List[str]
) -> str:
    """Perform a vector search over an index.

    query (str): the string query to be embedded.
    page_numbers (List[str]): Filter by set of pages. Leave BLANK if we want to perform a vector search
        over all pages. Otherwise, filter by the set of specified pages.

    """

    metadata_dicts = [
        {"key": "page_label", "value": p} for p in page_numbers
    ]

    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(
            metadata_dicts,
            condition=FilterCondition.OR
        )
    )
    response = query_engine.query(query)
    return response


vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)
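
The filter logic that `vector_query` asks for can be pictured in plain Python. The sketch below is an illustration of the intended semantics, not LlamaIndex's implementation: `filter_by_pages` is a hypothetical helper and the nodes are toy dictionaries. A node passes if its `page_label` matches any requested page (the `OR` condition), and an empty page list means no filtering, as the docstring above describes.

```python
from typing import Dict, List


def filter_by_pages(
    nodes: List[Dict[str, str]], page_numbers: List[str]
) -> List[Dict[str, str]]:
    """Keep nodes whose page_label matches ANY requested page (OR semantics).

    An empty page_numbers list keeps every node, i.e. search all pages.
    """
    if not page_numbers:
        return list(nodes)
    wanted = set(page_numbers)
    return [n for n in nodes if n["page_label"] in wanted]


# Toy metadata records (hypothetical), shaped like the page_label metadata above.
toy_nodes = [{"page_label": "1"}, {"page_label": "2"}, {"page_label": "8"}]

print(filter_by_pages(toy_nodes, ["2", "8"]))  # pages 2 and 8 only
print(filter_by_pages(toy_nodes, []))          # no filter: all pages
```

Note that page labels are strings in the metadata, so `"2"` matches while the integer `2` would not.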

In [17]:
response = llm.predict_and_call(
    [vector_query_tool],
    "What are the high-level results of MetaGPT as described on page 2?",
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "MetaGPT", "page_numbers": []}
=== Function Output ===
MetaGPT is a system that alleviates or solves deep-seated challenges in developing complex systems, including using context efficiently, reducing hallucinations, and addressing information overload. It employs a unique design that includes a global message pool and a subscription mechanism to streamline communication and filter out irrelevant contexts.


In [18]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '26', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-07-03', 'last_modified_date': '2024-07-03'}
{'page_label': '23', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-07-03', 'last_modified_date': '2024-07-03'}


In [19]:
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
    name="summary_tool",
    query_engine=summary_query_engine,
    description=(
        "Useful if you want to get a summary of MetaGPT"
    ),
)

In [21]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What are the MetaGPT comparisons with ChatDev described on page 8?",
    verbose=True
)

=== Calling Function ===
Calling function: summary_tool with args: {"input": "MetaGPT comparisons with ChatDev"}


Retrying llama_index.llms.openai.base.OpenAI._chat in 0.7729974179267081 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01hw1v17zqf6csfjsw04c5mxnm` on tokens per minute (TPM): Limit 6000, Used 24572, Requested 872. Please try again in 3m14.442999999s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._chat in 0.8118681920875381 seconds as it raised RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01hw1v17zqf6csfjsw04c5mxnm` on tokens per minute (TPM): Limit 6000, Used 24486, Requested 872. Please try again in 3m13.585s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}.
Retrying llama_index.llms.openai.base.OpenAI._achat in 0.8

=== Function Output ===
Encountered error: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01hw1v17zqf6csfjsw04c5mxnm` on tokens per minute (TPM): Limit 6000, Used 24285, Requested 3997. Please try again in 3m42.827s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
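
The 429 errors above include a hint such as "Please try again in 3m42.827s". One way to respect it is to parse that delay and wait before retrying. The sketch below is a minimal plain-Python illustration; `retry_delay_seconds` is a hypothetical helper, and the message format is inferred only from the error text shown in this notebook, not from a documented API contract.

```python
import re
from typing import Optional

# Matches "try again in 37.162s" and "try again in 3m42.827s".
_RETRY_RE = re.compile(r"try again in (?:(\d+)m)?(\d+(?:\.\d+)?)s")


def retry_delay_seconds(message: str) -> Optional[float]:
    """Extract the suggested wait time in seconds, or None if absent."""
    match = _RETRY_RE.search(message)
    if match is None:
        return None
    minutes = int(match.group(1)) if match.group(1) else 0
    return minutes * 60 + float(match.group(2))


msg = "Limit 6000, Used 24285, Requested 3997. Please try again in 3m42.827s."
delay = retry_delay_seconds(msg)
print(delay)
```

A caller could pass the returned value to `time.sleep` before repeating the request, and fall back to a fixed pause when the helper returns `None`.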


In [22]:
llm

Groq(callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x178f50a90>, system_prompt=None, messages_to_prompt=<function messages_to_prompt at 0x17fa13400>, completion_to_prompt=<function default_completion_to_prompt at 0x17fa9f400>, output_parser=None, pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'>, query_wrapper_prompt=None, model='llama3-70b-8192', temperature=0.0, max_tokens=None, logprobs=None, top_logprobs=0, additional_kwargs={}, max_retries=3, timeout=60.0, default_headers=None, reuse_client=True, api_key='xxx', api_base='https://api.groq.com/openai/v1', api_version='', context_window=3900, is_chat_model=True, is_function_calling_model=True, tokenizer=None)

In [None]:
for n in response.source_nodes:
    print(n.metadata)

{'page_label': '8', 'file_name': 'metagpt.pdf', 'file_path': 'metagpt.pdf', 'file_type': 'application/pdf', 'file_size': 16911937, 'creation_date': '2024-05-11', 'last_modified_date': '2024-05-11'}


In [None]:
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What is a summary of the paper?",
    verbose=True
)

# got the error "Rate limit reached for model `llama3-70b-8192`"
# Observation: Error: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01hw1v17zqf6csfjsw04c5mxnm` on tokens per minute (TPM): Limit 3000, Used 1406, Requested ~3453. Please try again in 37.162s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
# https://console.groq.com/settings/limits
# ID              | REQUESTS PER MINUTE | REQUESTS PER DAY | TOKENS PER MINUTE
# llama3-70b-8192 | 30                  | 14,400           | 6,000
# llama3-8b-8192  | 30                  | 14,400           | 30,000

Thought: The current language of the user is: English. I need to use a tool to help me answer the question.
Action: summary_tool
Action Input: {'input': 'Please provide the paper text or a brief description of the paper'}

Observation: Error: Error code: 429 - {'error': {'message': 'Rate limit reached for model `llama3-70b-8192` in organization `org_01hw1v17zqf6csfjsw04c5mxnm` on tokens per minute (TPM): Limit 3000, Used 1406, Requested ~3453. Please try again in 37.162s. Visit https://console.groq.com/docs/rate-limits for more information.', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}