# ReActAgent

Data agents are LLM-powered knowledge workers in LlamaIndex that can perform tasks over your data in both a “read” and a “write” capacity. They can:

- Perform automated search and retrieval over different types of data: unstructured, semi-structured, and structured.

- Call any external service API in a structured fashion, then process the response and store it for later use.

In that sense, agents go a step beyond query engines: they can not only “read” from a static source of data but also dynamically ingest and modify data through a variety of tools.

ReAct, short for Reasoning and Acting, was first introduced in the paper [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/pdf/2210.03629.pdf).  

The ReAct agent in LlamaIndex is an agent-based chat mode built on top of a query engine over your data, and it is one of LlamaIndex’s main chat engines. For each chat interaction, the agent enters a reasoning and acting loop:

- First, decide whether to use a query engine tool, which tool to use, and what input to give it.
- Query with the selected tool and observe its output.
- Based on the output, decide whether to repeat the process or give a final response.
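The loop above can be sketched in plain Python. This is a toy illustration, not LlamaIndex's implementation: `TOOLS`, `scripted_policy`, and `react_loop` are hypothetical names, the tools return canned strings instead of querying an index, and a scripted function stands in for the LLM's decision at each step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for query engine tools: each maps a query to a canned answer.
TOOLS: Dict[str, Callable[[str], str]] = {
    "executive_summary_2020": lambda query: "Net operating cost: $3.8 trillion",
    "executive_summary_2021": lambda query: "Net operating cost: $3.1 trillion",
}


def scripted_policy(observations: List[str]) -> Tuple[str, str]:
    """Toy replacement for the LLM: returns (action, action_input).

    The action "final" means the loop should stop and return action_input.
    """
    tool_names = list(TOOLS)
    if len(observations) < len(tool_names):
        # Reason: a tool has not been consulted yet, so call the next one.
        return tool_names[len(observations)], "net operating cost"
    # Reason: every tool has answered, so give a final response.
    return "final", " | ".join(observations)


def react_loop(policy, max_iterations: int = 10) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        action, action_input = policy(observations)        # decide
        if action == "final":
            return action_input                            # final response
        observations.append(TOOLS[action](action_input))   # act, then observe
    # Assumption: mirror the error message shown later in this notebook.
    raise ValueError("Reached max iterations.")


answer = react_loop(scripted_policy)
print(answer)
```

The iteration cap matters: if the policy never reaches a final answer within `max_iterations` steps, the loop gives up with an error instead of running forever.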

We will use a ReAct agent to analyze the executive summaries of the U.S. government’s financial reports for fiscal years 2020 through 2023.

LlamaIndex notebook: https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_react.html

## Step 1: Set Up the Query Tools

In [1]:
import os
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # load variables from a local .env file into the environment
openai_api_key = os.getenv('OPENAI_API_KEY')

Be sure to save the files under names that use underscores rather than dashes; otherwise, the ReAct agent's chat completion won't work.
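If renaming files by hand is inconvenient, the tool name can be derived from the file name instead. The helper below is a minimal sketch: `to_tool_name` is a hypothetical function, not part of LlamaIndex, and it assumes that a name made only of letters, digits, and underscores is safe to use.

```python
import re
from pathlib import Path


def to_tool_name(filename: str) -> str:
    """Drop the directory and extension, then replace anything that is not
    a letter, digit, or underscore with an underscore."""
    stem = Path(filename).stem
    return re.sub(r"[^0-9A-Za-z_]", "_", stem)


print(to_tool_name("executive-summary-2020.pdf"))  # executive_summary_2020
```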

In [6]:
# !mkdir reports
# !wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2020/executive-summary-2020.pdf -O ./reports/executive_summary_2020.pdf
# !wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2021/executive-summary-2021.pdf -O ./reports/executive_summary_2021.pdf
# !wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2022/executive-summary-2022.pdf -O ./reports/executive_summary_2022.pdf
# !wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2023/executive-summary-2023.pdf -O ./reports/executive_summary_2023.pdf

# !wget https://www.fiscal.treasury.gov/files/reports-statements/financial-report/2024/executive-summary-2024.pdf -O ./reports/executive_summary_2024.pdf

In [69]:
from llama_index.core import SimpleDirectoryReader

# load documents
documents = SimpleDirectoryReader(input_files=["reports/executive_summary_2023.pdf"]).load_data()

In [70]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

In [71]:
print(nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: executive_summary_2023.pdf
file_path: reports/executive_summary_2023.pdf
file_type: application/pdf
file_size: 904959
creation_date: 2024-07-20
last_modified_date: 2024-02-15

1 EXECUTIVE SUMMARY TO THE 2023 FINANCIAL REPORT OF THE U.S. GOVERNMENT


In [72]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes)

# query_engine = vector_index.as_query_engine(similarity_top_k=10)

In [51]:
# What is the U.S. government's net operating cost?
# What are the total revenues reported by the U.S. government?
# What are the total assets reported by the U.S. government?
# What is the total national debt reported by the U.S. government?
# response = query_engine.query(
#     "What is the U.S. government's net operating cost?" 
# )
# print(response)

In [73]:
def vector_query(query: str) -> str:
    query_engine = vector_index.as_query_engine(similarity_top_k=10)
    response = query_engine.query(query)
    return response.response

In [74]:
vector_query("What is the U.S. government's net operating cost?")

"The U.S. government's net operating cost is $3.4 trillion."

In [75]:
from llama_index.llms.openai import OpenAI
from llama_index.core.tools import FunctionTool

vector_query_tool = FunctionTool.from_defaults(
    name="vector_tool",
    fn=vector_query
)

llm = OpenAI(model="gpt-4o-mini", temperature=0)
response = llm.predict_and_call(
    [vector_query_tool], 
    "What is the U.S. government's net operating cost?", 
    verbose=True
)

=== Calling Function ===
Calling function: vector_tool with args: {"query": "U.S. government's net operating cost 2023"}
=== Function Output ===
The U.S. government's net operating cost for 2023 decreased by $753.8 billion (18.1 percent) to $3.4 trillion. This decrease was primarily due to significant decreases in non-cash costs, including reductions in losses resulting from changes in assumptions affecting cost and liability estimates for federal employee and veteran benefits programs, as well as reestimates of long-term student loan costs.


## Multi-Document Agent

In [77]:
import os

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = []
splitter = SentenceSplitter(chunk_size=1024)

for filename in os.listdir("reports"):
    if not filename.endswith(".pdf"):
        continue

    file_path = os.path.join("reports", filename)
    tool_name = os.path.splitext(filename)[0]  # file name without extension

    # SimpleDirectoryReader reads the PDF itself, so no explicit open() is needed
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    print(f"Loaded {len(documents)} documents from {filename}")
    print(tool_name)

    # Build a single vector index per report from the split nodes
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)
    query_engine = vector_index.as_query_engine(similarity_top_k=10)

    # Wrap the query engine as a tool named after the report
    query_engine_tools.append(
        QueryEngineTool(
            query_engine=query_engine,
            metadata=ToolMetadata(
                name=tool_name,
                description=(
                    f"Provides information about the U.S. government financial report {tool_name}"
                ),
            ),
        )
    )

Loaded 10 documents from executive_summary_2022.pdf
executive_summary_2022
Loaded 10 documents from executive_summary_2023.pdf
executive_summary_2023
Loaded 11 documents from executive_summary_2021.pdf
executive_summary_2021
Loaded 11 documents from executive_summary_2020.pdf
executive_summary_2020


In [84]:
query_engine_tools[1].metadata

ToolMetadata(description='Provides information about the U.S. government financial report executive_summary_2023', name='executive_summary_2023', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [78]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    query_engine_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [80]:
response = agent.query("What is the U.S. government's net operating cost from 2020 to 2023?")

Added user message to memory: What is the U.S. government's net operating cost from 2020 to 2023?
=== Calling Function ===
Calling function: executive_summary_2020 with args: {"input": "net operating cost"}
=== Function Output ===
The net operating cost is calculated by starting with total gross costs and then subtracting earned program revenues while adjusting for gains or losses from changes in actuarial assumptions used to estimate future federal employee and veteran benefits payments. This calculation results in the government's net cost before taxes and other revenues, which increased by $2.4 trillion during FY 2020 to $3.8 trillion.
=== Calling Function ===
Calling function: executive_summary_2021 with args: {"input": "net operating cost"}
=== Function Output ===
The net operating cost decreased by $746.5 billion (19.4 percent) during FY 2021 to $3.1 trillion. It is calculated by starting with total gross costs of $7.3 trillion, subtracting earned program revenues, and adjusting 

### ReAct Agent

In [88]:
from llama_index.core.agent import ReActAgent

llm = OpenAI(model="gpt-4")

react_agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)

In [93]:
response = react_agent.chat("Can you compare and contrast the government's net operating cost between 2020, 2021 and 2023? and tell me which year has the highest cost?")
print(response)

> Running step bec845fe-ad32-4df4-9c9d-bd6b5732c070. Step input: Can you compare and contrast the government's net operating cost between 2020, 2021 and 2023? and tell me which year has the highest cost?
Thought: To answer this question, I need to use the executive_summary_2020, executive_summary_2021, and executive_summary_2023 tools to get the government's net operating cost for each of these years. I'll start with the year 2020.
Action: executive_summary_2020
Action Input: {'input': 'net operating cost'}
Observation: The net operating cost is calculated by starting with total gross costs and then subtracting earned program revenues while adjusting for gains or losses from changes in actuarial assumptions used to estimate future federal employee and veteran benefits payments. This calculation results in the net cost before taxes and other revenues, which is a key component in determining the government's financial position.
> Running step 7a043fad-4e6b

In [89]:
from IPython.display import Markdown

response = react_agent.chat("Can you compare and contrast the government's bottom line net operating cost and tell me which year has the highest cost?")
display(Markdown(f"<b>{response}</b>"))

> Running step 1f909ec8-7d92-4304-8501-75f7399a6ee6. Step input: Can you compare and contrast the government's bottom line net operating cost and tell me which year has the highest cost?
Thought: To answer this question, I need to use the executive summary tools for each year to get the government's net operating cost. I'll start with the year 2020.
Action: executive_summary_2020
Action Input: {'input': 'net operating cost'}
Observation: The net operating cost is calculated by starting with total gross costs and then subtracting earned program revenues while adjusting for gains or losses from changes in actuarial assumptions used to estimate future federal employee and veteran benefits payments. This calculation results in the government's net cost before taxes and other revenues, which increased by $2.4 trillion during FY 2020 to $3.8 trillion.
> Running step be0b83fa-5be2-48b5-b500-d4740ef4822a. Step input: None
Thought: I have the net o

ValueError: Reached max iterations.