## RAG + ReAct Agents + LlamaIndex Tools

### Install Libs

In [1]:
!pip install -Uq llama_index llama_hub wget pypdf

### Import Libs

In [2]:
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

from llama_index.tools.query_engine import QueryEngineTool

from llama_index.agent import ReActAgent
from llama_index.llms import OpenAI

import sys
sys.path.append("../..")

### Download Arxiv Papers
The download cell below needs `wget` installed on your system, and the `../files/papers/` directory must already exist.
On macOS:
```sh
brew install wget
```

In [3]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "../files/papers/llm_compiler_2312.04511.pdf"
# !wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "../files/papers/dense_x_retrieval_2312.06648.pdf"

--2024-01-22 20:41:21--  https://arxiv.org/pdf/2312.04511.pdf
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.195.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 755837 (738K) [application/pdf]
Saving to: ‘../files/papers/llm_compiler_2312.04511.pdf’


2024-01-22 20:41:24 (3.55 MB/s) - ‘../files/papers/llm_compiler_2312.04511.pdf’ saved [755837/755837]
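
If `wget` is not available, the same download can be sketched in plain Python. This is a minimal sketch using only the standard library, not part of the original notebook: the helper names `arxiv_pdf_url`, `target_path`, and `download_paper` are hypothetical, and the file naming simply mirrors the `wget` cell above. It also creates the output directory first.

```python
from pathlib import Path
from urllib.request import Request, urlopen


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build the PDF URL in the same form the wget cell uses."""
    return f"https://arxiv.org/pdf/{arxiv_id}.pdf"


def target_path(out_dir: str, slug: str, arxiv_id: str) -> Path:
    """Mirror the notebook's file naming: <slug>_<arxiv_id>.pdf."""
    return Path(out_dir) / f"{slug}_{arxiv_id}.pdf"


def download_paper(arxiv_id: str, slug: str, out_dir: str = "../files/papers") -> Path:
    """Download one paper, creating the output directory if it is missing."""
    dest = target_path(out_dir, slug, arxiv_id)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Same user agent string as the wget call in the notebook.
    request = Request(arxiv_pdf_url(arxiv_id), headers={"User-Agent": "Mozilla"})
    with urlopen(request, timeout=60) as response:
        dest.write_bytes(response.read())
    return dest


# Example (requires network access, so it is left commented out):
# download_paper("2312.04511", "llm_compiler")
```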



### Initiate OpenAI LLM

In [4]:
llm = OpenAI(model="gpt-3.5-turbo",temperature=0)
service_context = ServiceContext.from_defaults(
    llm=llm
)

### Load, Parse, Index and Create Retrieval Engines

In [5]:
docs = SimpleDirectoryReader('../files/papers/').load_data()

nodes = service_context.node_parser.get_nodes_from_documents(docs, show_progress=True)

print(f'len docs: {len(docs)}')
print(f'len nodes: {len(nodes)}')

Parsing nodes: 100%|██████████| 21/21 [00:00<00:00, 437.70it/s]

len docs: 21
len nodes: 32
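
The counts above (21 documents from a single PDF) suggest that the reader emits one document per page; the source-node metadata printed later in this notebook carries both `file_name` and `page_label`. One way to check this is to count documents per file. The sketch below is hypothetical: `pages_per_file` is not a LlamaIndex function, and it uses stand-in objects so it runs without the library. In the notebook you would pass `docs` instead.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_file(documents) -> Counter:
    """Count loaded documents per source file using their metadata."""
    return Counter(doc.metadata.get("file_name", "<unknown>") for doc in documents)


# Stand-in objects with the same metadata keys the notebook output shows.
fake_docs = [
    SimpleNamespace(
        metadata={"file_name": "llm_compiler_2312.04511.pdf", "page_label": str(page)}
    )
    for page in range(1, 22)
]

counts = pages_per_file(fake_docs)
print(counts)
```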





In [6]:
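# Build an in-memory vector index over the parsed nodes.
index_nodes = VectorStoreIndex(nodes, service_context=service_context)

# Retriever: returns the 3 most similar nodes without calling the LLM.
retriever_nodes = index_nodes.as_retriever(similarity_top_k=3)

# Query engine: retrieves the top-3 nodes and has the LLM synthesize an answer.
query_engine_nodes = index_nodes.as_query_engine(similarity_top_k=3)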

In [7]:
resp = query_engine_nodes.query("Explain LLMCompiler and its usecases.")

print(resp.response)
print(resp.source_nodes)
print(resp.metadata)

LLMCompiler is a novel framework that optimizes the parallel function calling performance of Language Model Models (LLMs). It enables the efficient orchestration of multiple function calls and their dependencies, resulting in improved latency, cost, and accuracy. LLMCompiler consists of three key components: an LLM Planner, a Task Fetching Unit, and an Executor. 

The LLM Planner identifies the execution flow by defining different function calls and their dependencies based on user inputs. The Task Fetching Unit dispatches the function calls that can be executed in parallel after substituting variables with the actual outputs of preceding tasks. The Executor executes the dispatched function calling tasks using the associated tools.

LLMCompiler has several use cases. It can be used with open-source LLMs to empower them with the capability to efficiently handle multiple function calling. It can also be beneficial for GPT models. LLMCompiler has been evaluated on various tasks with diffe

In [8]:
for idx, node_with_score in enumerate(resp.source_nodes):
    print(f'Node-{idx} has the score of {node_with_score.score}')


Node-0 has the score of 0.8592233790276087
Node-1 has the score of 0.8502768170710224
Node-2 has the score of 0.8497174583313575
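
The three scores correspond to `similarity_top_k=3`. Assuming the default in-memory vector store ranks nodes by cosine similarity between the query embedding and each node embedding, the ranking step can be sketched in plain Python. The vectors and the `top_k` helper below are toy stand-ins, not real embeddings or LlamaIndex internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k=3):
    """Return the k (node_id, score) pairs with the highest similarity."""
    scored = [
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in node_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" for four nodes.
toy_nodes = {
    "node-a": [1.0, 0.0, 0.0],
    "node-b": [0.9, 0.1, 0.0],
    "node-c": [0.0, 1.0, 0.0],
    "node-d": [0.7, 0.7, 0.0],
}

results = top_k([1.0, 0.0, 0.0], toy_nodes, k=3)
for node_id, score in results:
    print(f"{node_id} has the score of {score:.4f}")
```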


### Create Tool and ReAct Agent

In [9]:
query_engine_tool = QueryEngineTool.from_defaults(
    name='llmcompiler',
    query_engine=query_engine_nodes,
    description=(
        "Provides information about LLMCompiler and Parallel Function Calling."
        ),
)

In [10]:
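react_agent = ReActAgent.from_tools(
    [query_engine_tool],
    # ReActAgent caps its reasoning/tool loop with max_iterations;
    # max_function_calls is an OpenAIAgent parameter.
    max_iterations=10,
    llm=llm,
    verbose=True,
)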

### Ask the Agent

In [11]:
react_agent_resp = react_agent.chat("What is Parallel Function Calling and How LLMCompilers can help?")

Thought: I need to use a tool to help me answer the question.
Action: llmcompiler
Action Input: {'input': 'Parallel Function Calling'}
[0m[1;3;34mObservation: Parallel function calling refers to the ability of Large Language Models (LLMs) to execute multiple function calls simultaneously. This allows LLMs to efficiently handle complex tasks by invoking different functions and coordinating their execution. The goal of parallel function calling is to reduce latency, cost, and improve accuracy by executing function calls in parallel rather than sequentially. LLMCompiler is a framework that optimizes the orchestration of parallel function calling in LLMs by introducing an LLM Planner, a Task Fetching Unit, and an Executor. This framework streamlines the execution of multiple function calls and handles their dependencies, resulting in improved performance and efficiency.
[0m[1;3;38;5;200mThought: I can answer without using any more tools.
Answer: Parallel function calling
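
The trace above follows the Thought / Action / Action Input / Observation pattern. A minimal sketch of such a loop is shown below, with a scripted stand-in for the LLM and a stub tool so it runs offline. It is not LlamaIndex's implementation: `run_react`, `scripted_llm`, and `fake_llmcompiler_tool` are hypothetical names, and the sketch expects the action input as JSON.

```python
import json
import re

ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>\S+)\s*\nAction Input:\s*(?P<args>\{.*\})", re.DOTALL
)
ANSWER_RE = re.compile(r"Answer:\s*(?P<answer>.*)", re.DOTALL)


def run_react(llm_step, tools, question, max_iterations=10):
    """Alternate between LLM output and tool calls until an Answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_iterations):
        output = llm_step("\n".join(transcript))
        transcript.append(output)
        action = ACTION_RE.search(output)
        if action:
            # Run the requested tool and feed its result back as an Observation.
            args = json.loads(action.group("args"))
            observation = tools[action.group("tool")](**args)
            transcript.append(f"Observation: {observation}")
            continue
        answer = ANSWER_RE.search(output)
        if answer:
            return answer.group("answer").strip(), transcript
    raise RuntimeError("No answer within max_iterations")


def scripted_llm():
    """Stand-in LLM that replays two fixed steps, shaped like the trace above."""
    replies = iter([
        "Thought: I need to use a tool to help me answer the question.\n"
        "Action: llmcompiler\n"
        'Action Input: {"input": "Parallel Function Calling"}',
        "Thought: I can answer without using any more tools.\n"
        "Answer: (summary of the observation)",
    ])
    return lambda prompt: next(replies)


def fake_llmcompiler_tool(input):
    """Stub tool; the real tool would call the query engine."""
    return f"stub passage about {input}"


answer, transcript = run_react(
    scripted_llm(),
    {"llmcompiler": fake_llmcompiler_tool},
    "What is Parallel Function Calling?",
)
print(answer)
```

The `max_iterations` argument plays the same role as the iteration cap on the agent: it stops the loop if the model never produces an answer.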

In [13]:
print(f'RESPONSE: {react_agent_resp.response}')
print(f'OBJECT_KEYS: {react_agent_resp.__dict__.keys()}')
print(f'SOURCE_NODES: {react_agent_resp.source_nodes}')

RESPONSE: Parallel function calling refers to the ability of Large Language Models (LLMs) to execute multiple function calls simultaneously. This allows LLMs to efficiently handle complex tasks by invoking different functions and coordinating their execution. LLMCompiler is a framework that optimizes the orchestration of parallel function calling in LLMs, improving performance and efficiency by introducing an LLM Planner, a Task Fetching Unit, and an Executor.
OBJECT_KEYS: dict_keys(['response', 'sources', 'source_nodes'])
SOURCE_NODES: [NodeWithScore(node=TextNode(id_='6b1611f0-a599-4f88-8673-03f5e818a163', embedding=None, metadata={'page_label': '1', 'file_name': 'llm_compiler_2312.04511.pdf', 'file_path': '../files/papers/llm_compiler_2312.04511.pdf', 'file_type': 'application/pdf', 'file_size': 755837, 'creation_date': '2024-01-22', 'last_modified_date': '2023-12-08', 'last_accessed_date': '2024-01-22'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation

In [14]:
print(f'len SOURCE_NODES: {len(react_agent_resp.source_nodes)}')

len SOURCE_NODES: 3


In [15]:
react_agent.memory

ChatMemoryBuffer(token_limit=3072, tokenizer_fn=functools.partial(<bound method Encoding.encode of <Encoding 'cl100k_base'>>, allowed_special='all'), chat_store=SimpleChatStore(store={'chat_history': [ChatMessage(role=<MessageRole.USER: 'user'>, content='What is Parallel Function Calling and How LLMCompilers can help?', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Parallel function calling refers to the ability of Large Language Models (LLMs) to execute multiple function calls simultaneously. This allows LLMs to efficiently handle complex tasks by invoking different functions and coordinating their execution. LLMCompiler is a framework that optimizes the orchestration of parallel function calling in LLMs, improving performance and efficiency by introducing an LLM Planner, a Task Fetching Unit, and an Executor.', additional_kwargs={})]}), chat_store_key='chat_history')

In [17]:
react_agent.memory.chat_store.store['chat_history']

[ChatMessage(role=<MessageRole.USER: 'user'>, content='What is Parallel Function Calling and How LLMCompilers can help?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Parallel function calling refers to the ability of Large Language Models (LLMs) to execute multiple function calls simultaneously. This allows LLMs to efficiently handle complex tasks by invoking different functions and coordinating their execution. LLMCompiler is a framework that optimizes the orchestration of parallel function calling in LLMs, improving performance and efficiency by introducing an LLM Planner, a Task Fetching Unit, and an Executor.', additional_kwargs={})]
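
The memory object above reports `token_limit=3072`. The general idea of a token-limited chat buffer can be sketched as follows: keep the most recent messages whose combined token count fits the budget. This is an illustrative sketch only. `trim_history` is a hypothetical helper, the whitespace token counter stands in for the `cl100k_base` tokenizer, and the real `ChatMemoryBuffer` may apply further rules.

```python
def trim_history(messages, token_limit, count_tokens=lambda text: len(text.split())):
    """Keep the most recent (role, content) pairs that fit within token_limit."""
    kept = []
    total = 0
    # Walk from newest to oldest and stop at the first message that would overflow.
    for role, content in reversed(messages):
        cost = count_tokens(content)
        if total + cost > token_limit:
            break
        kept.append((role, content))
        total += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("user", "a b c"),
    ("assistant", "d e"),
    ("user", "f g h i"),
]

print(trim_history(history, token_limit=6))
```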