<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/agent/openai_agent_query_plan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Agent Workarounds for Lengthy Tool Descriptions
In this demo, we illustrate a workaround for defining an OpenAI tool
whose description exceeds OpenAI's current limit of 1024 characters.
For simplicity, we will build upon the `QueryPlanTool` notebook
example.

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

In [None]:
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai

In [None]:
!pip install llama-index

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

In [None]:
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

In [None]:
llm = OpenAI(temperature=0, model="gpt-4")

## Download Data

In [None]:
!mkdir -p 'data/10q/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf' -O 'data/10q/uber_10q_march_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_june_2022.pdf' -O 'data/10q/uber_10q_june_2022.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_sept_2022.pdf' -O 'data/10q/uber_10q_sept_2022.pdf'

--2024-05-23 13:36:24--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1260185 (1.2M) [application/octet-stream]
Saving to: ‘data/10q/uber_10q_march_2022.pdf’


2024-05-23 13:36:24 (29.0 MB/s) - ‘data/10q/uber_10q_march_2022.pdf’ saved [1260185/1260185]

--2024-05-23 13:36:24--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_june_2022.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response

## Load data

In [None]:
march_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_march_2022.pdf"]
).load_data()
june_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_june_2022.pdf"]
).load_data()
sept_2022 = SimpleDirectoryReader(
    input_files=["./data/10q/uber_10q_sept_2022.pdf"]
).load_data()

## Build indices

We build a vector index / query engine over each of the documents (March, June, September).

In [None]:
march_index = VectorStoreIndex.from_documents(march_2022)
june_index = VectorStoreIndex.from_documents(june_2022)
sept_index = VectorStoreIndex.from_documents(sept_2022)

In [None]:
march_engine = march_index.as_query_engine(similarity_top_k=3, llm=llm)
june_engine = june_index.as_query_engine(similarity_top_k=3, llm=llm)
sept_engine = sept_index.as_query_engine(similarity_top_k=3, llm=llm)

## Defining an Excessively Lengthy Query Plan

Although a `QueryPlanTool` may be composed from many `QueryEngineTools`,
a single OpenAI tool is ultimately created from the `QueryPlanTool`
when the OpenAI API call is made. The description of this tool begins with
general instructions about the query plan approach, followed by the
descriptions of each constituent `QueryEngineTool`.

Currently, each OpenAI tool description has a maximum length of 1024 characters.
As you add more `QueryEngineTools` to your `QueryPlanTool`, you may exceed
this limit. If the limit is exceeded, LlamaIndex will raise an error when it
attempts to construct the OpenAI tool.
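
A pre-flight length check can surface the problem before the tool is constructed. The following is a minimal sketch, not LlamaIndex's own validation: the helper name `description_overflow` is hypothetical, and it assumes the limit is inclusive (a description of exactly 1024 characters passes).

```python
# Hypothetical helper, not part of LlamaIndex.
OPENAI_TOOL_DESCRIPTION_LIMIT = 1024  # limit stated in the text above


def description_overflow(
    description: str, limit: int = OPENAI_TOOL_DESCRIPTION_LIMIT
) -> int:
    """Return how many characters `description` is over `limit` (0 if it fits)."""
    return max(0, len(description) - limit)


print(description_overflow("x" * 1000))  # 0
print(description_overflow("x" * 1100))  # 76
```

Returning the overflow rather than a boolean makes it easy to report how much text would need to be trimmed or moved elsewhere.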

Let's demonstrate this scenario with an exaggerated example, where we will
give each query engine tool a very lengthy and redundant description.

In [None]:
description_10q_general = """\
A Form 10-Q is a quarterly report required by the SEC for publicly traded companies,
providing an overview of the company's financial performance for the quarter.
It includes unaudited financial statements (income statement, balance sheet,
and cash flow statement) and the Management's Discussion and Analysis (MD&A),
where management explains significant changes and future expectations.
The 10-Q also discloses significant legal proceedings, updates on risk factors,
and information on the company's internal controls. Its primary purpose is to keep
investors informed about the company's financial status and operations,
enabling informed investment decisions."""

description_10q_specific = (
    "This 10-Q provides Uber quarterly financials ending"
)

In [None]:
from llama_index.core.tools import QueryEngineTool
from llama_index.core.tools import QueryPlanTool
from llama_index.core import get_response_synthesizer

In [None]:
query_tool_sept = QueryEngineTool.from_defaults(
    query_engine=sept_engine,
    name="sept_2022",
    description=f"{description_10q_general} {description_10q_specific} September 2022",
)
query_tool_june = QueryEngineTool.from_defaults(
    query_engine=june_engine,
    name="june_2022",
    description=f"{description_10q_general} {description_10q_specific} June 2022",
)
query_tool_march = QueryEngineTool.from_defaults(
    query_engine=march_engine,
    name="march_2022",
    description=f"{description_10q_general} {description_10q_specific} March 2022",
)

print(len(query_tool_sept.metadata.description))
print(len(query_tool_june.metadata.description))
print(len(query_tool_march.metadata.description))

730
725
726


From the lengths printed above, each description alone is over 700 characters,
so composing these three tools into the `QueryPlanTool` will easily exceed the
1024-character limit.
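
As a rough check, summing the three printed lengths already gives a lower bound on the combined description; the general query plan instructions and any separators can only add to it. The variable names below are illustrative.

```python
# Lengths printed above for the September, June, and March tool descriptions.
tool_description_lengths = [730, 725, 726]
limit = 1024

# Lower bound: ignores the general query plan instructions and any separators.
combined_lower_bound = sum(tool_description_lengths)

print(combined_lower_bound)          # 2181
print(combined_lower_bound > limit)  # True
```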

In [None]:
query_engine_tools = [query_tool_sept, query_tool_june, query_tool_march]

response_synthesizer = get_response_synthesizer()
query_plan_tool = QueryPlanTool.from_defaults(
    query_engine_tools=query_engine_tools,
    response_synthesizer=response_synthesizer,
)

In [None]:
openai_tool = query_plan_tool.metadata.to_openai_tool()

ValueError: Tool description exceeds maximum length of 1024 characters. Please shorten your description or move it to the prompt.

## Moving Tool Descriptions to the Prompt

One obvious solution to this problem would be to shorten the tool
descriptions themselves. However, with sufficiently many tools,
we will still eventually exceed the character limit.

A more scalable solution is to move the tool descriptions to the prompt.
This solves the character limit issue: without the descriptions
of the query engine tools, the query plan description remains fixed
in size. Of course, the token limits imposed by the selected LLM still
bound the tool descriptions, but these limits are far larger than the
1024-character limit.

There are two steps involved in moving these tool descriptions to the
prompt. First, we must modify the metadata property of the `QueryPlanTool`
to omit the `QueryEngineTool` descriptions, and make a slight modification
to the default query planning instructions, telling the LLM to look for the
tool names and descriptions in the prompt.

In [None]:
from llama_index.core.tools.types import ToolMetadata

introductory_tool_description_prefix = """\
This is a query plan tool that takes in a list of tools and executes a \
query plan over these tools to answer a query. The query plan is a DAG of query nodes.

Given a list of tool names and the query plan schema, you \
can choose to generate a query plan to answer a question.

The tool names and descriptions will be given alongside the query.
"""

# Modify metadata to only include the general query plan instructions
new_metadata = ToolMetadata(
    description=introductory_tool_description_prefix,
    name=query_plan_tool.metadata.name,
    fn_schema=query_plan_tool.metadata.fn_schema,
)
query_plan_tool.metadata = new_metadata
query_plan_tool.metadata

ToolMetadata(description='This is a query plan tool that takes in a list of tools and executes a query plan over these tools to answer a query. The query plan is a DAG of query nodes.\n\nGiven a list of tool names and the query plan schema, you can choose to generate a query plan to answer a question.\n\nThe tool names and descriptions will be given alongside the query.\n', name='query_plan_tool', fn_schema=<class 'llama_index.core.tools.query_plan.QueryPlan'>, return_direct=False)

Second, we must prepend the tool names and descriptions to
the query being posed.
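
This step can be sketched as a small standalone helper. It is an illustration rather than a LlamaIndex API: `ToolInfo` is a hypothetical stand-in for the name and description found on a tool's `metadata`, and `build_planned_query` is a hypothetical function name, so the block runs without LlamaIndex installed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToolInfo:
    """Hypothetical stand-in for the name/description pair on a tool's metadata."""

    name: str
    description: str


def build_planned_query(tools: Sequence[ToolInfo], query: str) -> str:
    """Prefix the query with one name/description block per tool."""
    tools_description = "\n\n".join(
        f"Tool Name: {tool.name}\nTool Description: {tool.description}"
        for tool in tools
    )
    return f"{tools_description}\n\nQuery: {query}"


example = build_planned_query(
    [
        ToolInfo("sept_2022", "Uber 10-Q for the quarter ending September 2022"),
        ToolInfo("june_2022", "Uber 10-Q for the quarter ending June 2022"),
    ],
    "What were the risk factors in sept 2022?",
)
print(example)
```

Keeping the tool descriptions in the user message means the text grows with the number of tools, while the tool description sent to OpenAI stays the same size.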

In [None]:
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(
    tools=[query_plan_tool],
    llm=OpenAI(temperature=0, model="gpt-4o"),
)

query = "What were the risk factors in sept 2022?"

In [None]:
# Reconstruct concatenated query engine tool descriptions
tools_description = "\n\n".join(
    [
        f"Tool Name: {tool.metadata.name}\n"
        + f"Tool Description: {tool.metadata.description} "
        for tool in query_engine_tools
    ]
)

# Concatenate tool descriptions and query
query_planned_query = f"{tools_description}\n\nQuery: {query}"
query_planned_query

"Tool Name: sept_2022\nTool Description: A Form 10-Q is a quarterly report required by the SEC for publicly traded companies,\nproviding an overview of the company's financial performance for the quarter.\nIt includes unaudited financial statements (income statement, balance sheet,\nand cash flow statement) and the Management's Discussion and Analysis (MD&A),\nwhere management explains significant changes and future expectations.\nThe 10-Q also discloses significant legal proceedings, updates on risk factors,\nand information on the company's internal controls. Its primary purpose is to keep\ninvestors informed about the company's financial status and operations,\nenabling informed investment decisions. This 10-Q provides Uber quarterly financials ending September 2022 \n\nTool Name: june_2022\nTool Description: A Form 10-Q is a quarterly report required by the SEC for publicly traded companies,\nproviding an overview of the company's financial performance for the quarter.\nIt includes

In [None]:
response = await agent.run(query_planned_query)
response

Added user message to memory: Tool Name: sept_2022
Tool Description: A Form 10-Q is a quarterly report required by the SEC for publicly traded companies,
providing an overview of the company's financial performance for the quarter.
It includes unaudited financial statements (income statement, balance sheet,
and cash flow statement) and the Management's Discussion and Analysis (MD&A),
where management explains significant changes and future expectations.
The 10-Q also discloses significant legal proceedings, updates on risk factors,
and information on the company's internal controls. Its primary purpose is to keep
investors informed about the company's financial status and operations,
enabling informed investment decisions. This 10-Q provides Uber quarterly financials ending September 2022 

Tool Name: june_2022
Tool Description: A Form 10-Q is a quarterly report required by the SEC for publicly traded companies,
providing an overview of the company's financial performance for the quart

Response(response="The risk factors for Uber in September 2022 included:\n\n1. Failure to meet regulatory requirements related to climate change or to meet stated climate change commitments, which could impact costs, operations, brand, and reputation.\n2. The ongoing COVID-19 pandemic and responses to it were also a risk, as they had an adverse impact on business and operations, including reducing the demand for Mobility offerings globally and affecting travel behavior and demand.\n3. Catastrophic events such as disease, weather events, war, or terrorist attacks could also adversely impact the business, financial condition, and results of operation.\n4. Other risks included errors, bugs, or vulnerabilities in the platform's code or systems, inappropriate or controversial data practices, and the growing use of artificial intelligence.\n5. Climate change related physical and transition risks, such as market shifts toward electric vehicles and lower carbon business models, and risks relat