<a href="https://colab.research.google.com/github/sudarshan-koirala/youtube-stuffs/blob/main/llamaindex/llamaindex_openai_assistant_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Assistant Agent

This notebook shows how to use the LlamaIndex agent abstractions built on top of the [OpenAI Assistant API](https://platform.openai.com/docs/assistants/overview).


In [1]:
%%capture
!pip install llama-index watermark openai

In [6]:
%load_ext watermark
%watermark -a "Sudarshan Koirala" -vmp llama_index,openai

Author: Sudarshan Koirala

Python implementation: CPython
Python version       : 3.10.12
IPython version      : 7.34.0

llama_index: 0.8.64.post1
openai     : 1.1.1

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 5.15.120+
Machine     : x86_64
Processor   : x86_64
CPU cores   : 2
Architecture: 64bit



In [7]:
# get your openai api key from https://platform.openai.com/account/api-keys 🔑
import openai
import os
from getpass import getpass

OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

··········


## Simple Agent (no external tools)

Here we show a simple example with the built-in code interpreter.

In [8]:
from llama_index.agent import OpenAIAssistantAgent

In [42]:
OpenAIAssistantAgent??

In [10]:
agent = OpenAIAssistantAgent.from_new(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    openai_tools=[{"type": "code_interpreter"}],
    instructions_prefix="Please address the user as Sudarshan Koirala.",
)

In [11]:
agent.thread_id

'thread_ijDl6GDDjNhAcOWgDFRJpr64'

In [12]:
response = agent.chat(
    "I need to solve the equation `3x + 11 = 14`. Can you help me?"
)

In [13]:
print(str(response))

To solve the equation \(3x + 11 = 14\), we can isolate \(x\) as follows:

Subtract \(11\) from both sides of the equation to get:
\[3x = 14 - 11\]

Now, we have:
\[3x = 3\]

Next, divide both sides by \(3\) to solve for \(x\):
\[x = \frac{3}{3} = 1\]

Therefore, the solution to the equation \(3x + 11 = 14\) is \(x = 1\).
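
As a quick sanity check of the answer above, the same two steps can be reproduced in plain Python. This is only a sketch of the kind of code the code interpreter might write and run; `solve_linear` is a hypothetical helper, not part of LlamaIndex or the Assistant API.

```python
from fractions import Fraction


def solve_linear(a, b, c):
    """Solve a*x + b = c for x, assuming a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero for a unique solution")
    # Subtract b from both sides, then divide by a.
    return Fraction(c - b, a)


x = solve_linear(3, 11, 14)
print(x)  # 1
```

Substituting the result back in (`3 * x + 11`) gives 14, which matches the right-hand side of the equation.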


## Assistant with Query Engine Tools

Here we showcase the function-calling capabilities of `OpenAIAssistantAgent` by giving it query engine tools, each built over a different document.

### 1. Setup: Load Data

In [15]:
from llama_index.agent import OpenAIAssistantAgent
from llama_index import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    load_index_from_storage,
)

from llama_index.tools import QueryEngineTool, ToolMetadata

In [16]:
try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:  # no persisted index found; it is built in a later cell
    index_loaded = False
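
The cell above relies on an exception to detect a missing index. An alternative, sketched here under the assumption that a persisted index is simply a non-empty directory, is to check the paths up front. `persisted_dirs_exist` is a hypothetical helper; the directory names mirror the `persist_dir` values used above.

```python
from pathlib import Path


def persisted_dirs_exist(base_dir, names):
    """Return True only if every named sub-directory exists and is non-empty."""
    for name in names:
        path = Path(base_dir) / name
        if not path.is_dir() or not any(path.iterdir()):
            return False
    return True


index_available = persisted_dirs_exist("./storage", ["lyft", "uber"])
print(index_available)
```

A check like this only says that something was persisted, not that it can be loaded, so keeping the `try` block as a fallback is still reasonable.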

In [17]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

--2023-11-07 18:50:33--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1880483 (1.8M) [application/octet-stream]
Saving to: ‘data/10k/uber_2021.pdf’


2023-11-07 18:50:33 (20.9 MB/s) - ‘data/10k/uber_2021.pdf’ saved [1880483/1880483]

--2023-11-07 18:50:34--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/oc

In [19]:
%%capture
!pip install pypdf

In [20]:
if not index_loaded:
    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()

    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")

### 2. Create Engine

In [21]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
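
`similarity_top_k=3` asks the retriever for the three most similar chunks per query. The toy sketch below illustrates the idea with cosine similarity over hand-made vectors. It is not LlamaIndex's implementation, and the vectors and the `cosine` and `top_k` helpers are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-d "embeddings" (made up for this example).
chunks = {
    "revenue": [1.0, 0.1],
    "risk": [0.1, 1.0],
    "growth": [0.9, 0.3],
    "legal": [-1.0, 0.0],
}
print(top_k([1.0, 0.2], chunks, k=3))  # ['revenue', 'growth', 'risk']
```

A larger `k` gives the response synthesizer more context at the cost of more tokens per query.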

In [22]:
lyft_engine

<llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x7b63f7a3eec0>

In [23]:
uber_engine

<llama_index.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x7b63f7a3e8f0>

In [24]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

### 3. Now that the query engine tools are created, let's try them out.

In [34]:
agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze SEC filings.",
    tools=query_engine_tools,
    instructions_prefix="Please address the user as Sudarshan.",
    verbose=True,
    run_retrieve_sleep_time=1.0,
)
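
Assistant runs are asynchronous, so a client has to poll the run status until it leaves the queued or in-progress state. As its name suggests, `run_retrieve_sleep_time` appears to set the pause between those polls. The loop below is a self-contained sketch of that pattern with a fake status source. `wait_for_run` and `fake_statuses` are hypothetical names, and no OpenAI call is made.

```python
import time


def wait_for_run(get_status, sleep_time=1.0, max_polls=60):
    """Poll get_status() until the run is no longer queued or in progress."""
    for _ in range(max_polls):
        status = get_status()
        if status not in ("queued", "in_progress"):
            return status
        time.sleep(sleep_time)
    raise TimeoutError("run did not finish within the polling budget")


# Fake status source standing in for a real API call.
fake_statuses = iter(["queued", "in_progress", "in_progress", "completed"])
final = wait_for_run(lambda: next(fake_statuses), sleep_time=0.01)
print(final)  # completed
```

A shorter sleep returns answers sooner but issues more status requests; one second is a conservative middle ground.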

In [35]:
response = agent.chat("What was Lyft's revenue growth in 2021?")

=== Calling Function ===
Calling function: lyft_10k with args: {"input":"What was Lyft's revenue growth in 2021?"}
Got output: Lyft's revenue growth in 2021 was 36%.
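
The log above shows the assistant choosing a tool by name and passing JSON-encoded arguments. A minimal sketch of how such a call could be dispatched on the client side follows. The `dispatch` function and the stub tools are assumptions made for illustration, not the agent's actual internals.

```python
import json


def dispatch(tools, name, raw_args):
    """Look up a tool by name and call it with JSON-decoded keyword arguments."""
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    kwargs = json.loads(raw_args)
    return tools[name](**kwargs)


# Stub tools standing in for the real query engines.
tools = {
    "lyft_10k": lambda input: f"[lyft_10k] would answer: {input}",
    "uber_10k": lambda input: f"[uber_10k] would answer: {input}",
}

output = dispatch(
    tools,
    "lyft_10k",
    '{"input": "What was Lyft\'s revenue growth in 2021?"}',
)
print(output)
```

The tool name and the `input` argument key mirror what appears in the verbose log; everything else here is a stand-in.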


In [38]:
response.response

"Lyft's revenue growth in 2021 was 36%."

In [36]:
response.source_nodes

[NodeWithScore(node=TextNode(id_='811b322d-e86f-4986-a1b5-01615cf4190d', embedding=None, metadata={'page_label': '57', 'file_name': 'lyft_2021.pdf', 'file_path': 'data/10k/lyft_2021.pdf', 'creation_date': '2023-11-07', 'last_modified_date': '2023-11-07', 'last_accessed_date': '2023-11-07'}, excluded_embed_metadata_keys=['creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ead7931d-bf6d-4997-978a-80763f96e92a', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '57', 'file_name': 'lyft_2021.pdf', 'file_path': 'data/10k/lyft_2021.pdf', 'creation_date': '2023-11-07', 'last_modified_date': '2023-11-07', 'last_accessed_date': '2023-11-07'}, hash='34f0436addaa20dc5e1bc13b9bc29cbe3f7256873b155653b81ce65b2260c200'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='8e3f8140-7afa-455f-80f8-fdd829256cfa', node_t

## Assistant with Built-In Retrieval

Let's test the assistant by having it use the built-in OpenAI Retrieval tool over a user-uploaded file.

Here, we upload the file and pass it in when the assistant is created.

Alternatively, you can upload a file and attach its file ID to a message in a given thread with `upload_files` and `add_message`. See the [OpenAI documentation](https://platform.openai.com/docs/assistants/tools/uploading-files-for-retrieval) for details.

In [27]:
from llama_index.agent import OpenAIAssistantAgent

In [28]:
agent = OpenAIAssistantAgent.from_new(
    name="SEC Analyst",
    instructions="You are a QA assistant designed to analyze SEC filings.",
    openai_tools=[{"type": "retrieval"}],
    instructions_prefix="Please address the user as Sudarshan.",
    files=["data/10k/lyft_2021.pdf"],
    verbose=True,
)

In [29]:
response = agent.chat("What was Lyft's revenue growth in 2021?")

In [31]:
from pprint import pprint
pprint(response.response)

("Lyft's revenue increased $843.6 million or 36% in 2021 as compared to the "
 'previous year. This growth was driven primarily by a significant increase in '
 'the number of Active Riders as vaccines became more widely distributed and '
 'communities began to reopen【7†source】.')
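
The `【7†source】` marker at the end of the answer looks like a citation placeholder emitted by the Retrieval tool. If plain text is needed for display, a regular expression can strip such markers. This is a sketch that assumes the markers always follow the `【<number>†<label>】` shape seen above; `strip_citation_markers` is a hypothetical helper.

```python
import re

# Matches markers shaped like 【7†source】.
CITATION_MARKER = re.compile(r"【\d+†[^】]*】")


def strip_citation_markers(text):
    """Remove citation markers and any whitespace left before punctuation."""
    cleaned = CITATION_MARKER.sub("", text)
    return re.sub(r"\s+([.,;:])", r"\1", cleaned).strip()


answer = "communities began to reopen【7†source】."
print(strip_citation_markers(answer))  # communities began to reopen.
```

Stripping the marker discards the link back to the source passage, so it is best done only at the presentation layer.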
