## RAG + ReAct Agents + LlamaIndex Tools

### Install Libs

In [1]:
!pip install -Uq llama_index llama-index-core llama-index-llms-openai llama_hub wget pypdf llama-index-agent-openai

# pip install llama-index-llms-replicate
# pip install llama-index-embeddings-huggingface

### Import Libs

In [2]:
from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

import sys
sys.path.append("../..")

### Download arXiv Papers
Install `wget` on your system.
On macOS:
```sh
brew install wget
```

In [3]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.04511.pdf" -O "../files/papers/llm_compiler_2312.04511.pdf"
# !wget --user-agent "Mozilla" "https://arxiv.org/pdf/2312.06648.pdf" -O "../files/papers/dense_x_retrieval_2312.06648.pdf"

--2024-02-18 16:09:03--  https://arxiv.org/pdf/2312.04511.pdf
Resolving arxiv.org (arxiv.org)... 151.101.195.42, 151.101.3.42, 151.101.131.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.195.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 885090 (864K) [application/pdf]
Saving to: '../files/papers/llm_compiler_2312.04511.pdf'


2024-02-18 16:09:04 (1.38 MB/s) - '../files/papers/llm_compiler_2312.04511.pdf' saved [885090/885090]
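If installing `wget` is inconvenient, you can also do the download with Python's standard library. The sketch below is an alternative, not what the notebook runs. `build_request` and `download_pdf` are hypothetical helper names, and the `opener` parameter exists only so the function can be tested without network access. Unlike `wget -O`, it also creates `../files/papers/` if that folder does not exist yet.

```python
import urllib.request
from pathlib import Path


def build_request(url: str, user_agent: str = "Mozilla") -> urllib.request.Request:
    # Same idea as wget's --user-agent flag: send a browser-like User-Agent header
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_pdf(url: str, dest: str, opener=urllib.request.urlopen) -> int:
    """Download `url` to `dest`, creating parent folders first; return the byte count."""
    path = Path(dest)
    path.parent.mkdir(parents=True, exist_ok=True)
    with opener(build_request(url)) as resp:
        data = resp.read()
    path.write_bytes(data)
    return len(data)


# Usage (needs network access):
# download_pdf("https://arxiv.org/pdf/2312.04511.pdf",
#              "../files/papers/llm_compiler_2312.04511.pdf")
```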



### Initialize the OpenAI LLM

In [4]:
llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
Settings.llm = llm
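`OpenAI(...)` needs an API key. When none is passed in, it reads the key from the `OPENAI_API_KEY` environment variable. A small guard like the one below (a hypothetical helper, not part of LlamaIndex) fails immediately with a readable message when the key is missing. Setting `temperature=0` makes the model favor its most likely tokens, which keeps answers close to reproducible between runs.

```python
import os
from typing import Mapping


def require_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from the environment, or fail with a clear message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before creating the OpenAI LLM.")
    return key


# Usage: call it before constructing the LLM
# require_openai_key()
# llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
```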

### Load, Parse, Index, and Create Retrieval Engines

In [5]:
docs = SimpleDirectoryReader('../files/papers/').load_data()

nodes = Settings.node_parser.get_nodes_from_documents(docs, show_progress=True)

print(f'len docs: {len(docs)}')
print(f'len nodes: {len(nodes)}')

Parsing nodes: 100%|██████████| 24/24 [00:00<00:00, 593.65it/s]

len docs: 24
len nodes: 33
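In this run, `SimpleDirectoryReader` produced one `Document` per PDF page (note the `page_label` metadata in the query output further down). The node parser then split those 24 pages into 33 chunks, so some pages were long enough to become more than one node. The sketch below is a deliberately simplified word-window chunker with overlap that only illustrates the idea. LlamaIndex's default splitter is sentence-aware and counts tokens rather than words; `chunk_words` and its default sizes are assumptions.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Applied page by page, `sum(len(chunk_words(page)) for page in pages)` plays the role of the `len nodes` count above.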





In [6]:
index_nodes = VectorStoreIndex(nodes)

retriever_engine_nodes = index_nodes.as_retriever(similarity_top_k=3)

query_engine_nodes = index_nodes.as_query_engine(similarity_top_k=3)
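Building the index embeds every node. `as_retriever(similarity_top_k=3)` then returns the three nodes whose embeddings are closest to the query embedding. `as_query_engine(similarity_top_k=3)` performs the same retrieval and asks the LLM to synthesize an answer from those nodes. The toy sketch below shows only the ranking step, using cosine similarity over hand-made 2-D vectors. The vectors are an assumption: real embeddings have many more dimensions and come from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], node_vecs: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (node_id, score) pairs most similar to the query, best first."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```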

In [7]:
resp = query_engine_nodes.query("Explain LLMCompiler and its usecases.")

print(resp.response)
print(resp.source_nodes)
print(resp.metadata)

LLMCompiler is a tool that focuses on optimizing Latent Language Models (LLMs) by enabling dynamic replanning and efficient exploration of decision-making environments. It allows for parallel function calling, which can reduce latency and cost, and supports tasks with interdependencies. LLMCompiler is particularly useful for scenarios involving black-box LLM models and services where modifications are restricted. Its capabilities have been demonstrated through experiments, showing significant speedups and improved success rates compared to baselines in tasks like the Game of 24 and WebShop. Additionally, LLMCompiler uses a planner to identify parallelizable patterns within queries, aiming to reduce latency while maintaining accuracy.
[NodeWithScore(node=TextNode(id_='5e7a4fe5-faa2-4e15-8e59-0624f4dd0728', embedding=None, metadata={'page_label': '3', 'file_name': 'llm_compiler_2312.04511.pdf', 'file_path': '../files/papers/llm_compiler_2312.04511.pdf', 'file_type': 'application/pdf', 'f

In [8]:
for idx, node_with_score in enumerate(resp.source_nodes):
    print(f'Node-{idx} has the score of {node_with_score.score}')


Node-0 has the score of 0.8525170579800175
Node-1 has the score of 0.8427049560467622
Node-2 has the score of 0.8405651957303741
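The three scores differ only in the second decimal place, so all three chunks look roughly equally relevant. To drop weak matches, LlamaIndex offers a `SimilarityPostprocessor` with a `similarity_cutoff`. The plain-Python sketch below only illustrates the filtering rule on the scores printed above; `apply_cutoff` is a hypothetical name.

```python
from typing import List, Tuple


def apply_cutoff(scored: List[Tuple[str, float]], cutoff: float) -> List[Tuple[str, float]]:
    """Keep only the (node_id, score) pairs whose score is at least `cutoff`."""
    return [(node_id, score) for node_id, score in scored if score >= cutoff]


# Scores printed by the cell above
scores = [
    ("Node-0", 0.8525170579800175),
    ("Node-1", 0.8427049560467622),
    ("Node-2", 0.8405651957303741),
]
```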


### Create Tool and ReAct Agent

In [9]:
query_engine_tool = QueryEngineTool.from_defaults(
    name='llmcompiler',
    query_engine=query_engine_nodes,
    description="Provides information about LLMCompiler and Parallel Function Calling.",
)
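A `QueryEngineTool` wraps the query engine together with a name and a description. When deciding which tool to call, the agent's LLM sees those strings, not the documents. The description should therefore state plainly what the indexed papers cover. The sketch below models the idea with a hypothetical `SimpleTool` dataclass. It is not how LlamaIndex implements tools, and the prompt text it renders is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a query-engine tool: a name, a description, and a callable."""
    name: str
    description: str
    fn: Callable[[str], str]


def describe_tools(tools: List[SimpleTool]) -> str:
    """Render the tool list as an agent prompt might present it to the LLM (illustrative format)."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def call_tool(tools: List[SimpleTool], name: str, tool_input: str) -> str:
    """Dispatch to the tool the LLM named, or report an unknown tool."""
    by_name: Dict[str, SimpleTool] = {t.name: t for t in tools}
    if name not in by_name:
        return f"Error: no tool named {name!r}"
    return by_name[name].fn(tool_input)
```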

In [10]:
react_agent = ReActAgent.from_tools(
    [query_engine_tool],
    max_iterations=10,  # cap on reasoning/tool-use steps (max_function_calls belongs to OpenAIAgent)
    llm=llm,
    verbose=True,
)
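A ReAct agent alternates reasoning steps with tool calls. The LLM proposes an action, the tool's output comes back as an observation, and the loop repeats until the LLM produces a final answer. `ReActAgent.from_tools` bounds this loop with its `max_iterations` argument. The sketch below shows the control flow with a scripted stand-in for the LLM. `run_react` and the `(kind, tool, payload)` step format are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# ("action", tool_name, tool_input) or ("answer", "", final_text)
Step = Tuple[str, str, str]


def run_react(llm_step: Callable[[List[str]], Step], tools: Dict[str, Callable[[str], str]],
              question: str, max_iterations: int = 10) -> str:
    """Minimal ReAct-style loop: alternate model steps and tool observations until an answer or the cap."""
    history: List[str] = [f"Question: {question}"]
    for _ in range(max_iterations):
        kind, tool_name, payload = llm_step(history)
        if kind == "answer":
            return payload
        tool = tools.get(tool_name)
        observation = tool(payload) if tool else f"Unknown tool {tool_name!r}"
        history.append(f"Action: {tool_name}({payload!r}) -> Observation: {observation}")
    raise RuntimeError(f"No final answer within {max_iterations} iterations")
```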

### Let's Ask Our Agent

In [11]:
react_agent_resp = react_agent.chat("What is Parallel Function Calling and How LLMCompilers can help?")

Thought: I need to use a tool to help me answer the question.
Action: llmcompiler
Action Input: {'input': 'Parallel Function Calling'}
Observation: LLMCompiler introduces a method for executing functions in parallel to efficiently manage multiple function calls. It consists of three main components: an LLM Planner for formulating execution plans, a Task Fetching Unit for dispatching function calling tasks, and an Executor for executing these tasks concurrently. By leveraging parallel execution, LLMCompiler optimizes the orchestration of function calls, resulting in significant improvements in latency speedup, cost savings, and accuracy compared to sequential methods like ReAct.
Thought: I need to use a tool to help me answer the question.
Action: llmcompiler
Action Input: {'input': 'LLMCompilers'}
Observation: LLMCompiler is introduced as a tool that enables parallel function calling to efficiently orchestrate multiple functio

In [12]:
print(f'RESPONSE: {react_agent_resp.response}')
print(f'OBJECT_KEYS: {react_agent_resp.__dict__.keys()}')
print(f'SOURCE_NODES: {react_agent_resp.source_nodes}')

RESPONSE: Parallel Function Calling is a method of executing multiple functions concurrently to improve efficiency and performance. LLMCompilers help by automating the process of orchestrating these parallel function calls, leading to faster execution, cost savings, and improved accuracy compared to traditional sequential methods.
OBJECT_KEYS: dict_keys(['response', 'sources', 'source_nodes'])
SOURCE_NODES: [NodeWithScore(node=TextNode(id_='1ec932d3-31ba-4cba-ac98-1cc5b15a663e', embedding=None, metadata={'page_label': '1', 'file_name': 'llm_compiler_2312.04511.pdf', 'file_path': '../files/papers/llm_compiler_2312.04511.pdf', 'file_type': 'application/pdf', 'file_size': 885090, 'creation_date': '2024-02-18', 'last_modified_date': '2024-02-07', 'last_accessed_date': '2024-02-18'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_da
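The agent trace in cell [11] runs its two lookups one after the other, even though neither depends on the other. According to the retrieved observation, LLMCompiler's idea is to plan such independent calls and execute them concurrently. The sketch below covers only the concurrent-execution part, using a thread pool. It has no planner or dependency handling, and `run_parallel` and `fake_lookup` are hypothetical stand-ins, not LLMCompiler code.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel(calls: Dict[str, Callable[[], str]], max_workers: int = 4) -> Dict[str, str]:
    """Submit every independent call at once and gather results keyed by call name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in calls.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_lookup(query: str, delay: float = 0.4) -> str:
    """Hypothetical stand-in for a slow tool call such as a query-engine lookup."""
    time.sleep(delay)
    return f"result for {query!r}"


start = time.perf_counter()
results = run_parallel({
    "q1": lambda: fake_lookup("Parallel Function Calling"),
    "q2": lambda: fake_lookup("LLMCompilers"),
})
# Elapsed time should be close to one delay, since both calls overlap
print(results, f"{time.perf_counter() - start:.2f}s")
```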

In [13]:
print(f'len SOURCE_NODES: {len(react_agent_resp.source_nodes)}')

len SOURCE_NODES: 6
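Six source nodes fits two tool calls that each retrieved the top 3 nodes. Because both calls search the same index, the same chunk can come back twice. The minimal sketch below collapses duplicates while keeping the best score per node. It is a hypothetical helper, and the usage comment assumes the `NodeWithScore` objects expose `.node.node_id` and `.score`.

```python
from typing import Dict, Iterable, List, Tuple


def dedupe_sources(sources: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep the highest-scoring entry per node id, ordered by descending score."""
    best: Dict[str, float] = {}
    for node_id, score in sources:
        if node_id not in best or score > best[node_id]:
            best[node_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Usage with the agent response:
# pairs = [(n.node.node_id, n.score or 0.0) for n in react_agent_resp.source_nodes]
# unique = dedupe_sources(pairs)
```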


In [14]:
react_agent.memory

ChatMemoryBuffer(token_limit=3072, tokenizer_fn=functools.partial(<bound method Encoding.encode of <Encoding 'cl100k_base'>>, allowed_special='all'), chat_store=SimpleChatStore(store={'chat_history': [ChatMessage(role=<MessageRole.USER: 'user'>, content='What is Parallel Function Calling and How LLMCompilers can help?', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Parallel Function Calling is a method of executing multiple functions concurrently to improve efficiency and performance. LLMCompilers help by automating the process of orchestrating these parallel function calls, leading to faster execution, cost savings, and improved accuracy compared to traditional sequential methods.', additional_kwargs={})]}), chat_store_key='chat_history')
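The agent keeps the conversation in a `ChatMemoryBuffer` with `token_limit=3072` and counts tokens with the `cl100k_base` encoding. Once the history grows past that limit, the oldest messages are left out of what gets sent to the LLM. The sketch below illustrates only the trimming rule. It counts whitespace-separated words instead of real tokens (an assumption), and LlamaIndex's buffer applies extra rules of its own. `fit_to_token_limit` is a hypothetical name.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def fit_to_token_limit(history: List[Message], token_limit: int,
                       count_tokens: Callable[[str], int] = lambda s: len(s.split())) -> List[Message]:
    """Return the most recent messages whose combined token count fits within `token_limit`."""
    kept: List[Message] = []
    total = 0
    for role, content in reversed(history):  # walk from newest to oldest
        cost = count_tokens(content)
        if total + cost > token_limit:
            break  # everything older than this message is dropped
        kept.append((role, content))
        total += cost
    return list(reversed(kept))  # restore chronological order
```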

In [15]:
react_agent.memory.chat_store.store['chat_history']

[ChatMessage(role=<MessageRole.USER: 'user'>, content='What is Parallel Function Calling and How LLMCompilers can help?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='Parallel Function Calling is a method of executing multiple functions concurrently to improve efficiency and performance. LLMCompilers help by automating the process of orchestrating these parallel function calls, leading to faster execution, cost savings, and improved accuracy compared to traditional sequential methods.', additional_kwargs={})]