# Context-Augmented OpenAI Agent
In this tutorial, we show you how to use our `ContextRetrieverOpenAIAgent` implementation to build an agent on top of OpenAI's function-calling API that is augmented with retrieved context. Before each turn, the agent retrieves the most relevant entries from a context index and prepends them to the user message. This lets it resolve domain-specific terms (such as abbreviations) or draw on document content before deciding which tool to call.
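The context-augmentation step can be sketched without any LLM calls. This is a minimal illustration, not LlamaIndex's implementation. It assumes a toy word-overlap retriever in place of the embedding-based retriever the agent really uses. The template string is copied from the verbose log printed later in this notebook and may differ across LlamaIndex versions. `retrieve` and `augment` are hypothetical helpers, not library APIs.

```python
from typing import List

# Template copied from the agent's verbose output later in this notebook.
CONTEXT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "either pick the corresponding tool or answer the function: {query_str}\n"
)


def retrieve(query: str, corpus: List[str], top_k: int = 1) -> List[str]:
    """Toy retriever: rank texts by the number of words shared with the query."""
    query_words = set(query.lower().replace("?", "").split())

    def score(text: str) -> int:
        return len(query_words & set(text.lower().split()))

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(corpus, key=score, reverse=True)[:top_k]


def augment(query: str, corpus: List[str], top_k: int = 1) -> str:
    """Build the context-augmented user message the LLM would receive."""
    context_str = "\n".join(retrieve(query, corpus, top_k))
    return CONTEXT_TEMPLATE.format(context_str=context_str, query_str=query)


texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
message = augment("What is the YZ of March 2022?", texts)
print(message)
```

The LLM then sees the retrieved definition alongside the question and can choose a tool with that term already resolved.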

In [1]:
import os

from dotenv import load_dotenv

# Load environment variables (e.g. OPENAI_API_KEY) from a local .env file
load_dotenv()

In [None]:
%pip install llama-index-agent-openai-legacy

In [None]:
%pip install llama-index

In [2]:
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)
from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [3]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/march"
    )
    march_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/june"
    )
    june_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/sept"
    )
    sept_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # e.g. the persist directories have not been created yet
    index_loaded = False
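The cell above follows a load-or-build pattern: it tries to read each persisted index from disk, records the outcome in `index_loaded`, and the indexes are built and persisted further down only if loading failed. A stdlib-only sketch that folds both steps into one helper is shown here. A JSON file stands in for LlamaIndex's storage directory, and `load_or_build` and the `index.json` file name are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict


def load_or_build(persist_dir: str, build: Callable[[], Dict]) -> Dict:
    """Load cached data from persist_dir, or build and persist it on first use."""
    path = os.path.join(persist_dir, "index.json")  # hypothetical file name
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        data = build()
        os.makedirs(persist_dir, exist_ok=True)
        with open(path, "w") as f:
            json.dump(data, f)
        return data


# Demo: the expensive build step runs only once per persist directory.
build_calls = []


def build_march() -> Dict:
    build_calls.append("march")
    return {"source": "uber_10q_march_2022.pdf"}


with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(os.path.join(tmp, "march"), build_march)
    second = load_or_build(os.path.join(tmp, "march"), build_march)
```

Catching only `FileNotFoundError` here means other failures (such as a corrupted file) surface instead of silently triggering a rebuild.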

In [8]:
import os
import requests

# Create directory
os.makedirs('data/10q/', exist_ok=True)

# List of URLs to download
urls = [
    'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf',
    'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_june_2022.pdf',
    'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_sept_2022.pdf'
]

# Download each file, failing loudly on HTTP errors
for url in urls:
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    file_name = os.path.join('data/10q/', url.split('/')[-1])
    with open(file_name, 'wb') as f:
        f.write(response.content)
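The download cell fetches every file on each run. A sketch of a variant that skips files already on disk is shown here. The fetch function is injected so the example runs offline. `local_path`, `download_if_missing`, and `fake_fetch` are hypothetical helpers.

```python
import os
import tempfile
from typing import Callable
from urllib.parse import urlparse


def local_path(url: str, dest_dir: str) -> str:
    """Map a URL to a file in dest_dir named after the URL's last path segment."""
    return os.path.join(dest_dir, os.path.basename(urlparse(url).path))


def download_if_missing(url: str, dest_dir: str, fetch: Callable[[str], bytes]) -> bool:
    """Download url into dest_dir unless the file exists; return True if fetched."""
    path = local_path(url, dest_dir)
    if os.path.exists(path):
        return False
    os.makedirs(dest_dir, exist_ok=True)
    data = fetch(url)
    with open(path, "wb") as f:
        f.write(data)
    return True


# Demo with a fake fetcher that records calls instead of using the network.
fetched = []


def fake_fetch(url: str) -> bytes:
    fetched.append(url)
    return b"%PDF-1.4 placeholder"


url = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/10q/uber_10q_march_2022.pdf"
)
with tempfile.TemporaryDirectory() as tmp:
    first_run = download_if_missing(url, tmp, fake_fetch)
    second_run = download_if_missing(url, tmp, fake_fetch)
    saved_name = os.path.basename(local_path(url, tmp))
```

In the notebook, `fetch` could be a small wrapper that calls `requests.get(url, timeout=60)`, checks `raise_for_status()`, and returns `response.content`.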

In [4]:
# build indexes across the three data sources

if not index_loaded:
    # load data
    march_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_march_2022.pdf"]
    ).load_data()
    june_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_june_2022.pdf"]
    ).load_data()
    sept_docs = SimpleDirectoryReader(
        input_files=["./data/10q/uber_10q_sept_2022.pdf"]
    ).load_data()

    # build index
    march_index = VectorStoreIndex.from_documents(march_docs)
    june_index = VectorStoreIndex.from_documents(june_docs)
    sept_index = VectorStoreIndex.from_documents(sept_docs)

    # persist index
    march_index.storage_context.persist(persist_dir="./storage/march")
    june_index.storage_context.persist(persist_dir="./storage/june")
    sept_index.storage_context.persist(persist_dir="./storage/sept")

In [5]:
march_engine = march_index.as_query_engine(similarity_top_k=3)
june_engine = june_index.as_query_engine(similarity_top_k=3)
sept_engine = sept_index.as_query_engine(similarity_top_k=3)
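With `similarity_top_k=3`, each query engine passes the three highest-scoring chunks of its filing to the LLM. A toy illustration of top-k ranking by cosine similarity follows. Short hand-written vectors stand in for the embeddings a real index computes, and `cosine` and `top_k` are hypothetical helpers, not the vector store's code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs for the k most similar chunks."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


query_vec = [1.0, 0.0]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
best = top_k(query_vec, chunk_vecs, k=3)
print(best)
```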

In [6]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=march_engine,
        metadata=ToolMetadata(
            name="uber_march_10q",
            description=(
                "Provides information about Uber 10Q filings for March 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=june_engine,
        metadata=ToolMetadata(
            name="uber_june_10q",
            description=(
                "Provides information about Uber financials for June 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=sept_engine,
        metadata=ToolMetadata(
            name="uber_sept_10q",
            description=(
                "Provides information about Uber financials for Sept 2022. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
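The model chooses among these tools using only each tool's name and description, which is why the month and year in each description should match the filing. A sketch of the kind of OpenAI-style function schema a query engine tool might be exposed as is shown here. The single `input` string argument is inferred from the call log later in this notebook (`{"input":"Risk Factors"}`). The exact schema LlamaIndex generates may differ, and `tool_to_function_schema` is a hypothetical helper.

```python
import json
from typing import Dict, List


def tool_to_function_schema(name: str, description: str) -> Dict:
    """Describe a tool that takes a single free-text `input` argument."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],
            },
        },
    }


tool_specs = [
    ("uber_march_10q", "Provides information about Uber 10Q filings for March 2022."),
    ("uber_june_10q", "Provides information about Uber financials for June 2022."),
    ("uber_sept_10q", "Provides information about Uber financials for Sept 2022."),
]
schemas: List[Dict] = [tool_to_function_schema(n, d) for n, d in tool_specs]
print(json.dumps(schemas[0], indent=2))
```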

# Try Context-Augmented Agent
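Here we augment our agent with context in two different settings:

* toy context: we define some abbreviations that map to financial terms (e.g. X = Revenue) and supply them as context to the agent.
* document context: we use the September 2022 Uber 10-Q index as the context source and give the agent a simple calculator-style function tool.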

In [7]:
from llama_index.core import Document
from llama_index.agent.openai_legacy import ContextRetrieverOpenAIAgent

In [8]:
# toy index - stores a list of abbreviations
texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
docs = [Document(text=t) for t in texts]
context_index = VectorStoreIndex.from_documents(docs)
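The agent does not expand abbreviations itself; the LLM reads the retrieved definition and makes the substitution, as the call log below shows (`{"input":"Risk Factors"}` for "YZ"). To make the information carried by this toy context explicit, here is a deterministic sketch of the same mapping. `parse_abbreviations` and `expand` are hypothetical helpers and are not part of the agent.

```python
import re
from typing import Dict, List

ABBREV_PATTERN = re.compile(r"Abbreviation:\s*(\w+)\s*=\s*(.+)")


def parse_abbreviations(texts: List[str]) -> Dict[str, str]:
    """Turn 'Abbreviation: X = Revenue' lines into {'X': 'Revenue'}."""
    mapping = {}
    for text in texts:
        match = ABBREV_PATTERN.match(text)
        if match:
            mapping[match.group(1)] = match.group(2).strip()
    return mapping


def expand(query: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word abbreviations in the query with their full terms."""
    return re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), query)


texts = [
    "Abbreviation: X = Revenue",
    "Abbreviation: YZ = Risk Factors",
    "Abbreviation: Z = Costs",
]
abbreviations = parse_abbreviations(texts)
print(expand("What is the YZ of March 2022?", abbreviations))
```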

In [9]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    query_engine_tools,
    context_index.as_retriever(similarity_top_k=1),
    verbose=True,
)

In [10]:
response = context_agent.chat("What is the YZ of March 2022?")

Context information is below.
---------------------
Abbreviation: YZ = Risk Factors
---------------------
Given the context information and not prior knowledge, either pick the corresponding tool or answer the function: What is the YZ of March 2022?

STARTING TURN 1
---------------

=== Calling Function ===
Calling function: uber_march_10q with args: {"input":"Risk Factors"}
Got output: The company faces various risks including legal actions, investigations, and proceedings, potential adverse effects from the classification of Drivers, intense competition in the industry, significant losses, challenges in attracting and retaining personnel, security and data privacy breaches, climate change risks, regulatory risks, intellectual property protection issues, and volatility in the market price of its common stock. Additionally, economic conditions such as discretionary consumer spending, inflation, and increased costs could adversely impact the company's operating results.

ST

In [11]:
print(str(response))

The Risk Factors (YZ) for Uber in March 2022 include legal actions, investigations, intense competition, significant losses, challenges in attracting and retaining personnel, security and data privacy breaches, climate change risks, regulatory risks, intellectual property protection issues, and market price volatility. Additionally, economic conditions such as discretionary consumer spending, inflation, and increased costs could impact the company's operating results.
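The log above shows the model returning a function name plus JSON-encoded arguments, which the agent executes before composing its answer. A minimal sketch of that dispatch step follows. The stub stands in for the real `uber_march_10q` query engine, and `dispatch` and `stub_march_10q` are hypothetical. The real agent loop also feeds the tool output back to the model and may call further tools.

```python
import json
from typing import Any, Callable, Dict


def dispatch(registry: Dict[str, Callable[..., Any]], name: str, arguments: str) -> Any:
    """Call the tool the model named, passing its JSON-encoded arguments as kwargs."""
    if name not in registry:
        raise KeyError(f"Unknown tool: {name}")
    kwargs = json.loads(arguments)
    return registry[name](**kwargs)


# Hypothetical stub; the parameter is named `input` to match the logged arguments.
def stub_march_10q(input: str) -> str:
    return f"Stub answer for: {input}"


registry = {"uber_march_10q": stub_march_10q}
# Function name and arguments copied from the verbose log above.
output = dispatch(registry, "uber_march_10q", '{"input":"Risk Factors"}')
print(output)
```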


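Note: this cell was executed out of order (execution count 21, after the cells in the section below). By then `context_agent` had been rebuilt with the `magic_formula` tool and the September 10-Q retriever. That is why the retrieved context is an Uber income-statement table and the sources are `magic_formula` calls rather than abbreviation lookups. The arguments the model passed (for example `revenue=23370`) also do not match the filing's reported figures (nine-month 2022 revenue of $23,270 million), so treat this output as illustrative only.

In [21]: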
context_agent.chat("What is the X and Z in September 2022?")

Context information is below.
---------------------
Three Months Ended September 30, Nine Months Ended September 30,
2021 2022 2021 2022
Revenue 100 % 100 % 100 % 100 %
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 50 % 62 % 53 % 62 %
Operations and support 10 % 7 % 11 % 8 %
Sales and marketing 24 % 14 % 30 % 16 %
Research and development 10 % 9 % 13 % 9 %
General and administrative 13 % 11 % 15 % 10 %
Depreciation and amortization 4 % 3 % 6 % 3 %
Total costs and expenses 112 % 106 % 128 % 107 %
Loss from operations (12)% (6)% (28)% (7)%
Interest expense (3)% (2)% (3)% (2)%
Other income (expense), net (38)% (6)% 16 % (34)%
Loss before income taxes and income (loss) from equity method
investments (52)% (14)% (16)% (43)%
Provision for (benefit from) income taxes (2)% 1 % (3)% — %
Income (loss) from equity method investments — % — % — % — %
Net loss including non-controlling interests (50)% (14)% (12)% (42)%
Less: net income

AgentChatResponse(response='The X and Z values for September 2022 are as follows:\n- X = -1590\n- Z = -2400', sources=[ToolOutput(content='-1590', tool_name='magic_formula', raw_input={'args': (), 'kwargs': {'revenue': 23370, 'cost': 24960}}, raw_output=-1590, is_error=False), ToolOutput(content='-2400', tool_name='magic_formula', raw_input={'args': (), 'kwargs': {'revenue': 8300, 'cost': 10700}}, raw_output=-2400, is_error=False)], source_nodes=[], is_dummy_stream=False, metadata=None)

# Use Uber 10-Q as context, use Calculator as Tool


In [14]:
from llama_index.core.tools import FunctionTool


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


magic_tool = FunctionTool.from_defaults(fn=magic_formula, name="magic_formula")
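`FunctionTool.from_defaults` derives the tool's parameters from the function signature and its description from the docstring. The following is a simplified stdlib approximation using `inspect`. LlamaIndex builds its schema with pydantic, so the exact output (including the description text) may differ. `function_schema` and `PY_TO_JSON` are hypothetical.

```python
import inspect
from typing import Callable, Dict

# Minimal mapping from Python annotations to JSON Schema types.
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_schema(fn: Callable) -> Dict:
    """Build a function-calling schema from fn's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


schema = function_schema(magic_formula)
print(schema)
```

Because both parameters are annotated as `int` and have no defaults, the model is told to supply two required integers.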

In [15]:
context_agent = ContextRetrieverOpenAIAgent.from_tools_and_retriever(
    [magic_tool], sept_index.as_retriever(similarity_top_k=3), verbose=True
)

In [18]:
response = context_agent.chat(
    "Can you run MAGIC_FORMULA on Uber's revenue and cost?"
)

Context information is below.
---------------------
UBER TECHNOLOGIES, INC.
CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS
(In millions, except share amounts which are reflected in thousands, and per share amounts)
(Unaudited)
Three Months Ended September  30, Nine Months Ended September  30,
2021 2022 2021 2022
Revenue $ 4,845 $ 8,343 $ 11,677 $ 23,270 
Costs and expenses
Cost of revenue, exclusive of depreciation and amortization shown separately
below 2,438 5,173 6,247 14,352 
Operations and support 475 617 1,330 1,808 
Sales and marketing 1,168 1,153 3,527 3,634 
Research and development 493 760 1,496 2,051 
General and administrative 625 908 1,705 2,391 
Depreciation and amortization 218 227 656 724 
Total costs and expenses 5,417 8,838 14,961 24,960 
Loss from operations (572) (495) (3,284) (1,690)
Interest expense (123) (146) (353) (414)
Other income (expense), net (1,832) (535) 1,821 (7,796)
Loss before income taxes and income (loss) from equity method investments (2,

In [20]:
print(response)

The results of running the MAGIC_FORMULA on Uber's revenue and cost for the nine months ended September 30, 2022 are as follows:
- For the revenue of $23,270 million and cost of $24,960 million, the result is -1690.
- For the revenue of $11,677 million and cost of $14,352 million, the result is -2675.
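The agent's answer can be checked against the retrieved statement. Applying `magic_formula` (revenue minus cost) to revenue and total costs and expenses for the same period reproduces the reported loss from operations in every column. The first line of the answer (-1,690) matches this. The second line pairs nine-month 2021 revenue ($11,677 million) with nine-month 2022 cost of revenue ($14,352 million), mixing both periods and line items. The figures below are copied from the retrieved table; the period labels are shorthand introduced here.

```python
from typing import Dict

# $ millions, from the condensed consolidated statement of operations retrieved above.
statement: Dict[str, Dict[str, int]] = {
    "Q3 2021": {"revenue": 4845, "total_costs": 5417, "loss_from_operations": -572},
    "Q3 2022": {"revenue": 8343, "total_costs": 8838, "loss_from_operations": -495},
    "9M 2021": {"revenue": 11677, "total_costs": 14961, "loss_from_operations": -3284},
    "9M 2022": {"revenue": 23270, "total_costs": 24960, "loss_from_operations": -1690},
}


def magic_formula(revenue: int, cost: int) -> int:
    """Runs MAGIC_FORMULA on revenue and cost."""
    return revenue - cost


results = {
    period: magic_formula(row["revenue"], row["total_costs"])
    for period, row in statement.items()
}
for period, value in results.items():
    # Revenue minus total costs and expenses equals loss from operations.
    assert value == statement[period]["loss_from_operations"], period

# The agent's second line: 9M 2021 revenue paired with 9M 2022 cost of revenue.
mixed_periods = magic_formula(11677, 14352)
print(results, mixed_periods)
```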
