# Part 15

# File Search Annotations

Universal code for the entire notebook

In [1]:
# Uncomment the line below to make sure you have all the packages needed
# %pip install -r requirements.txt

In [2]:
# Import necessary libraries
from openai import OpenAI  # Used for interacting with OpenAI's API
from typing_extensions import override  # Used for overriding methods in subclasses
from openai import AssistantEventHandler  # Used for handling events related to OpenAI assistants

In [3]:
# Create an instance of the OpenAI class to interact with the API.
# This assumes you have set the OPENAI_API_KEY environment variable.
client = OpenAI() 

### Special Event Handler (Doesn't Work Properly)

Ironically, at the time of this writing, the streaming example in the documentation doesn't actually stream. I posted about it in the dev forum:
[Streaming Example for Assistants in Documentation Doesn’t Stream](https://community.openai.com/t/streaming-example-for-assistants-in-documentation-doesnt-stream/834370) 

In [4]:
# Event handler class for assistant events.
# According to the documentation this version should stream, but it has no on_text_delta
# override and prints the text only in on_message_done, so the response arrives all at once.
# It is kept here for reference and for comparison with the streaming handler later on.
# https://platform.openai.com/docs/assistants/tools/file-search/step-5-create-a-run-and-check-the-output
class EventHandler(AssistantEventHandler):
    @override
    def on_text_created(self, text) -> None:
        print("\nassistant > ", end="", flush=True)

    @override
    def on_tool_call_created(self, tool_call):
        print(f"\nassistant > {tool_call.type}\n", flush=True)

    @override
    def on_message_done(self, message) -> None:
        # print a citation to the file searched
        message_content = message.content[0].text
        annotations = message_content.annotations
        citations = []
        for index, annotation in enumerate(annotations):
            message_content.value = message_content.value.replace(
                annotation.text, f"[{index}]"
            )
            if file_citation := getattr(annotation, "file_citation", None):
                cited_file = client.files.retrieve(file_citation.file_id)
                citations.append(f"[{index}] {cited_file.filename}")

        print(message_content.value)
        print("\n".join(citations))
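The citation numbering in `on_message_done` can be tried without calling the API. The sketch below is a minimal, offline imitation of that loop: `FakeAnnotation` and `number_citations` are hypothetical names introduced here, the marker strings are illustrative, and the file name is assumed to be already resolved (the real handler looks it up with `client.files.retrieve`).

```python
from dataclasses import dataclass


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for an API annotation object."""
    text: str      # the marker as it appears in the message text
    filename: str  # assumed to be already resolved from the file ID


def number_citations(value, annotations):
    """Replace each annotation marker with [index] and collect footnotes."""
    citations = []
    for index, annotation in enumerate(annotations):
        # str.replace swaps every occurrence of the marker, not just one
        value = value.replace(annotation.text, f"[{index}]")
        citations.append(f"[{index}] {annotation.filename}")
    return value, citations


raw = "Harker 【4:2†source】, Mina 【4:18†source】, Lucy 【4:18†source】, Seward 【4:4†source】"
notes = [
    FakeAnnotation("【4:2†source】", "Dracula.pdf"),
    FakeAnnotation("【4:18†source】", "Dracula.pdf"),
    FakeAnnotation("【4:18†source】", "Dracula.pdf"),
    FakeAnnotation("【4:4†source】", "Dracula.pdf"),
]
numbered_text, footnotes = number_citations(raw, notes)
print(numbered_text)
print("\n".join(footnotes))
```

Because `str.replace` swaps every occurrence of a marker, two annotations that share the same marker text both receive the first index and the second index is never used. That is one plausible explanation for output in which `[1]` appears twice and `[2]` never appears.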


### Creating an Assistant with a Vector Store Already Attached

Next, we upload a file and create an assistant with a vector store attached in a single call. We will use both for the rest of this notebook. It's a straightforward approach that we have seen before.

In [5]:
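# Upload the PDF; the with-block closes the file handle once the upload finishes.
with open("./artifacts/Dracula.pdf", "rb") as pdf:
    drac_file = client.files.create(file=pdf, purpose="assistants")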

# Create an assistant using the client library.
try:
    assistant = client.beta.assistants.create(
        model="gpt-4o",  # Specify the model to be used.
        instructions=(
            "You are a helpful assistant that answers questions about the stories in your files. "
            "The stories are from a variety of authors. "
            "You will answer questions from the user about the stories. All you will do is answer questions about the stories in the files and provide related information. "
            "If the user asks you a question that is not related to the stories in the files, you should let them know that you can only answer questions about the stories."
        ),
        name="Quick Assistant and Vector Store at Once",  # Give the assistant a name.
        tools=[{"type": "file_search"}],  # Add the file search capability to the assistant.
        # Create a vector store and attach it to the assistant in one step.
        tool_resources={
            "file_search": {
                "vector_stores": [
                    {
                        "name": "Vector Store Auto Attached to Assistant",
                        "file_ids": [
                            drac_file.id,
                        ],
                        "metadata": {
                            "Book1": "Dracula", 
                        }
                    }
                ]
            }
        },
        metadata={  # Add metadata about the assistant's capabilities.
            "can_be_used_for_file_search": "True",
            "has_vector_store": "True",
        },
        temperature=1,  # Set the temperature for response variability.
        top_p=1,  # Set the top_p for nucleus sampling.
    )
except Exception as e:
    print(f"An error occurred while creating the assistant: {e}")
else:
    # Print the details of the created assistant to check its properties.
    print(assistant)  # Print the full assistant object.
    print("\n\n")
    print("Assistant Name: " + assistant.name)  # Print the name of the assistant.
    print("\n")
    
    # get the vector store information
    unnamed_assistant_vector_store = client.beta.vector_stores.retrieve(assistant.tool_resources.file_search.vector_store_ids[0])
    print("Vector Store Name: " + str(unnamed_assistant_vector_store.name))
    print("Vector Store Id: " + unnamed_assistant_vector_store.id)
    print("Vector Store Metadata: " + str(unnamed_assistant_vector_store.metadata))

Assistant(id='asst_dJpS9UHkUw888923s6W70Uao', created_at=1719069260, description=None, instructions='You are a helpful assistant that answers questions about the stories in your files. The stories are from a variety of authors. You will answer questions from the user about the stories. All you will do is answer questions about the stories in the files and provide related information. If the user asks you a question that is not related to the stories in the files, you should let them know that you can only answer questions about the stories.', metadata={'can_be_used_for_file_search': 'True', 'has_vector_store': 'True'}, model='gpt-4o', name='Quick Assistant and Vector Store at Once', object='assistant', tools=[FileSearchTool(type='file_search', file_search=None)], response_format='auto', temperature=1.0, tool_resources=ToolResources(code_interpreter=None, file_search=ToolResourcesFileSearch(vector_store_ids=['vs_jRqr0FIGOLC4DtHpu7LfhvHb'])), top_p=1.0)



Assistant Name: Quick Assistant

In [6]:
# Always name your vector stores
updated_vector_store = client.beta.vector_stores.update(
    vector_store_id=unnamed_assistant_vector_store.id,
    name="Dracula Vector Store",
    metadata={"Book1": "Dracula"}
)

print("Vector Store Name: " + str(updated_vector_store.name))
print("Vector Store Id: " + updated_vector_store.id)
print("Vector Store Metadata: " + str(updated_vector_store.metadata))

Vector Store Name: Dracula Vector Store
Vector Store Id: vs_jRqr0FIGOLC4DtHpu7LfhvHb
Vector Store Metadata: {'Book1': 'Dracula'}


### Create a Thread and Run the Stream

Next, we create a thread that we will use for the rest of the notebook and run it with the old event handler, which prints the whole response at once rather than streaming it.

In [7]:
# Create a thread containing the user's question.
# No file is attached to the message; the assistant reaches the book through its vector store.
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "Who are all the main characters in Dracula? Cite the location they are first introduced in the book. Every character should have a separate citation.",
        }
    ]
)

In [8]:
# Using our first assistant
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=EventHandler(),
) as stream:
    stream.until_done()


assistant > file_search


assistant > Here are the main characters in *Dracula* along with citations of where they are first introduced in the book:

1. **Jonathan Harker** - First introduced in his own journal as he arrives at Count Dracula's castle.
   - Citation: "In the chapters describing his journey to Transylvania and arrival at the castle." [0]

2. **Mina Murray (later Mina Harker)** - First introduced in a letter to Lucy Westenra.
   - Citation: "Letter from Miss Mina Murray to Miss Lucy Westenra." [1]

3. **Lucy Westenra** - First introduced in Mina Murray’s letter.
   - Citation: "Letter from Miss Mina Murray to Miss Lucy Westenra." [1]

4. **Dr. John Seward** - First introduced in his own diary.
   - Citation: "Dr. Seward's Diary." [3]

5. **Arthur Holmwood (Lord Godalming)** - First introduced in Dr. Seward's records.
   - Citation: "Dr. Seward's Diary, mentioning his involvement and their meetings." [4]

6. **Quincey Morris** - First introduced in Dr. Seward's accounts i

### New Event Handler that Actually Streams

Having seen how the old event handler behaves, we now create a new event handler that actually streams the output, then run it against the same thread.

In [9]:
class EventHandler(AssistantEventHandler):
    """Custom event handler for processing assistant events."""

    def __init__(self):
        super().__init__()
        self.results = []  # Initialize the results list

    @override
    def on_text_created(self, text) -> None:
        """Handle the event when text is first created."""
        # Print the created text to the console
        print("\nassistant text > ", end="", flush=True)
        # Append the created text to the results list
        self.results.append(text)

    @override
    def on_text_delta(self, delta, snapshot):
        """Handle the event when there is a text delta (partial text)."""
        # Print the delta value (partial text) to the console
        print(delta.value, end="", flush=True)
        # Append the delta value to the results list
        self.results.append(delta.value)

    @override
    def on_tool_call_created(self, tool_call):
        """Handle the event when a tool call is created."""
        # Print the type of the tool call to the console
        print(f"\nassistant tool > {tool_call.type}\n", flush=True)

    @override
    def on_tool_call_delta(self, delta, snapshot):
        """Handle the event when there is a delta (update) in a tool call."""
        if delta.type == 'code_interpreter':
            # Check if there is an input in the code interpreter delta
            if delta.code_interpreter.input:
                # Print the input to the console
                print(delta.code_interpreter.input, end="", flush=True)
                # Append the input to the results list
                self.results.append(delta.code_interpreter.input)
            # Check if there are outputs in the code interpreter delta
            if delta.code_interpreter.outputs:
                # Print a label for outputs to the console
                print("\n\noutput >", flush=True)
                # Iterate over each output and handle logs specifically
                for output in delta.code_interpreter.outputs or []:
                    if output.type == "logs":
                        # Print the logs to the console
                        print(f"\n{output.logs}", flush=True)
                        # Append the logs to the results list
                        self.results.append(output.logs)
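The handler above collects everything it sees in `self.results`. Note that `on_text_created` appends whatever object it receives, which appears to be a text object rather than a plain string, so the list can mix types. The sketch below is one way to rebuild the streamed text afterwards; `join_text_results` and `FakeText` are hypothetical names used only for illustration.

```python
def join_text_results(results):
    """Concatenate only the string entries collected by the handler."""
    return "".join(item for item in results if isinstance(item, str))


class FakeText:
    """Hypothetical stand-in for the object passed to on_text_created."""


# A mixed list similar to what the handler might accumulate during a run
collected = [FakeText(), "Here are ", "the main ", None, "characters."]
full_text = join_text_results(collected)
print(full_text)
```

Filtering by type also skips any `None` entries, which keeps the join from raising a `TypeError`.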

In [10]:
# Run the same thread again, this time with the streaming event handler
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=EventHandler(),
) as stream:
    stream.until_done()


assistant tool > file_search


assistant text > Here are the main characters in *Dracula* along with citations of where they are first introduced in the book:

1. **Jonathan Harker** – First introduced in his own journal as he arrives at Count Dracula's castle.
   - Citation: "Jonathan Harker's Journal" 【4:2†source】

2. **Mina Murray (later Mina Harker)** – First introduced in a letter to Lucy Westenra.
   - Citation: "Letter from Miss Mina Murray to Miss Lucy Westenra" 【4:18†source】

3. **Lucy Westenra** – First introduced in Mina Murray’s letter.
   - Citation: "Letter from Miss Mina Murray to Miss Lucy Westenra" 【4:18†source】

4. **Dr. John Seward** – First introduced in his own diary.
   - Citation: "Dr. Seward's Diary" 【4:4†source】

5. **Arthur Holmwood (Lord Godalming)** – First introduced in Dr. Seward's records.
   - Citation: "Dr. Seward's Diary" 【7:0†source】

6. **Quincey Morris** – First introduced in interaction with other characters in Dr. Seward's diary.
   - Citation: "

### Final Notes
We get the list of characters along with a citation for where each one is first introduced, which is a good result. However, at the time of this writing, dealing with annotations like 【4:13†Dracula.pdf】 is still a challenge. That said, the API changes almost daily, so there may be a fix by the time you read this. The following dev forum topic may provide some insight:

[How can I access the specific text of the file that the annotation is referencing?](https://community.openai.com/t/how-can-i-access-the-specific-text-of-the-file-that-the-annotation-is-referencing/726723) 
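Until the annotations are easier to work with, the raw markers can at least be located or removed from the streamed text. This is a minimal sketch that assumes the marker shape seen in the output above, two numbers separated by a colon followed by `†` and a label. What the two numbers mean is not established here, and `find_markers` and `strip_markers` are hypothetical helper names.

```python
import re

# Assumed marker shape, inferred from the output: 【<number>:<number>†<label>】
MARKER = re.compile(r"【(\d+):(\d+)†([^】]+)】")


def find_markers(text):
    """Return (first_number, second_number, label) for each marker in text."""
    return [(int(a), int(b), label) for a, b, label in MARKER.findall(text)]


def strip_markers(text):
    """Remove markers together with the whitespace directly before them."""
    return re.sub(r"\s*" + MARKER.pattern, "", text)


sample = 'Citation: "Dr. Seward\'s Diary" 【4:4†source】'
print(find_markers(sample))
print(strip_markers(sample))
```

Stripping the markers loses the link to the source, so this is only suitable for display purposes, not for resolving what passage was cited.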


