# Assistants API - Knowledge Retrieval 

https://platform.openai.com/docs/assistants/tools/knowledge-retrieval

https://community.openai.com/t/new-assistants-api-a-potential-replacement-for-low-level-rag-style-content-generation/475677 

Watch:

https://youtu.be/5rcjGjgJNQc?t=600&si=d9OtX0nMi2Rv0fQV 

References:

https://community.openai.com/t/assistants-api-retrieval-pricing-how-much-does-this-cost/485188/8

https://medium.com/madhukarkumar/what-does-openais-announcement-mean-for-retrieval-augmented-generation-rag-and-vector-only-54bfc34cba2c

https://www.youtube.com/watch?v=ClfyQNkTeUc

https://www.pinecone.io/learn/assistants-api-canopy/




![Alt text](ret.jpg "Assistants")

![Alt text](objects.jpeg "Assistants_Objects")

https://cobusgreyling.medium.com/openai-assistant-with-retriever-tool-08e9158ca900 

In [31]:
from openai import OpenAI
import json
from dotenv import load_dotenv, find_dotenv

_ : bool = load_dotenv(find_dotenv()) # read local .env file

In [32]:
client : OpenAI = OpenAI()

### Knowledge Retrieval

Retrieval augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. Once a file is uploaded and passed to the Assistant, OpenAI automatically chunks the documents, indexes and stores the embeddings, and runs a vector search to retrieve content relevant to user queries. In the version of the API used by the code in this notebook, the tool is named `file_search` and the uploaded files are attached through a vector store.
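To make the chunk, embed, and search pipeline concrete, here is a minimal, self-contained sketch. It is not OpenAI's implementation: the bag-of-words "embedding", the word-based chunk sizes, and the function names `chunk_text`, `embed`, `cosine`, and `search` are all assumptions chosen for illustration. A real system would use a learned embedding model and a vector index.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


document = ("The library opens at nine. The cafeteria serves lunch at noon. "
            "Exams start in June.")
chunks = chunk_text(document, chunk_size=5, overlap=0)
print(search("When do exams start?", chunks))
```

The retrieved chunk, not the whole document, is what would be placed in the model's context before it answers.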

https://platform.openai.com/docs/assistants/tools/knowledge-retrieval 



### How it works

The model decides when to retrieve content based on the user Messages. The Assistants API automatically chooses between two retrieval techniques:

1. it either passes the file content in the prompt for short documents, or
2. performs a vector search for longer documents

Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. According to the OpenAI documentation, other retrieval strategies are planned so that developers can choose a different tradeoff between retrieval quality and model usage cost.
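The choice between the two techniques can be pictured as a simple size check. The sketch below is purely illustrative: the passage above does not state the actual rule, so the function names `estimate_tokens` and `choose_retrieval_strategy`, the word-count token estimate, and the 4,000-token budget are all hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def choose_retrieval_strategy(text: str, prompt_budget_tokens: int = 4000) -> str:
    """Return "prompt" when the whole document fits in the budget, else "vector_search"."""
    if prompt_budget_tokens <= 0:
        raise ValueError("prompt_budget_tokens must be positive")
    if estimate_tokens(text) <= prompt_budget_tokens:
        return "prompt"  # short document: pass the full content to the model
    return "vector_search"  # long document: retrieve only the relevant chunks


short_doc = "Office hours are on Monday."
long_doc = "word " * 10_000
print(choose_retrieval_strategy(short_doc))
print(choose_retrieval_strategy(long_doc))
```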

https://platform.openai.com/docs/assistants/tools/how-it-works


### Step 1: Upload the file and Create an Assistant

In [33]:
# Create a vector store called "Sir Zia Biography"
vector_store = client.beta.vector_stores.create(name="Sir Zia Biography")
 
# Ready the files for upload to OpenAI
file_paths = ["zia_profile.pdf"]
file_streams = [open(path, "rb") for path in file_paths]
 
# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)
 
# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)


completed
FileCounts(cancelled=0, completed=1, failed=0, in_progress=0, total=1)
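
Before creating the Assistant, it can be worth confirming that every file was actually indexed. The helper below is a small sketch, not part of the SDK: `batch_problems` is a hypothetical name, and it reads only the count fields visible in the output above (`cancelled`, `failed`, `in_progress`, `total`). A `SimpleNamespace` stands in for `file_batch.file_counts` so the example runs without an API call.

```python
from types import SimpleNamespace


def batch_problems(file_counts) -> list[str]:
    """List one human-readable problem per non-zero failure counter."""
    problems = []
    for field in ("failed", "cancelled", "in_progress"):
        count = getattr(file_counts, field)
        if count:
            label = field.replace("_", " ")
            problems.append(f"{count} of {file_counts.total} file(s) {label}")
    return problems


# Stand-in mirroring the FileCounts output printed above.
counts = SimpleNamespace(cancelled=0, completed=1, failed=0, in_progress=0, total=1)
print(batch_problems(counts) or "all files indexed")
```

With the real objects from the cell above, the call would be `batch_problems(file_batch.file_counts)`.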


In [34]:
from openai.types.beta.assistant import Assistant

assistant: Assistant = client.beta.assistants.create(
  name="Student Support Assistant",
  instructions="You are a student support chatbot. Use your knowledge base to best respond to student queries about Zia U. Khan.",
  model="gpt-3.5-turbo-1106",
  tools=[{"type": "file_search"}],
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)


### Step 2: Create a Thread

In [35]:
from openai.types.beta.thread import Thread

thread: Thread  = client.beta.threads.create()

print(thread)


Thread(id='thread_GRbd9ogrErAhIO8OrXzR6772', created_at=1718204571, metadata={}, object='thread', tool_resources=ToolResources(code_interpreter=None, file_search=None))


### Step 3: Add a Message to a Thread

In [36]:

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="When and which city Zia U. Khan was born?"
)


### Step 4: Run the Assistant

In [37]:
from openai.types.beta.threads.run import Run

run: Run = client.beta.threads.runs.create_and_poll(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions="Please address the user as Pakistani. The user is the student of PIAIC."
)


### Step 5: Check the Run status

In [38]:
run: Run = client.beta.threads.runs.retrieve(
  thread_id=thread.id,
  run_id=run.id
)

print(run)


Run(id='run_PK32oDm9Ly2CPlEKA9ivlBXH', assistant_id='asst_8qgXeWGG8UX6Hh6xleZwHmtW', cancelled_at=None, completed_at=1718204579, created_at=1718204576, expires_at=None, failed_at=None, incomplete_details=None, instructions='Please address the user as Pakistani. The user is the student of PIAIC.', last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-3.5-turbo-1106', object='thread.run', required_action=None, response_format='auto', started_at=1718204576, status='completed', thread_id='thread_GRbd9ogrErAhIO8OrXzR6772', tool_choice='auto', tools=[FileSearchTool(type='file_search')], truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=Usage(completion_tokens=59, prompt_tokens=3799, total_tokens=3858), temperature=1.0, top_p=1.0, tool_resources={}, parallel_tool_calls=True)
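
Printing the whole `Run` object is hard to scan. As a sketch, the hypothetical helper `summarize_run` below pulls out the fields that matter in the output above: `status`, the `usage` token counts, and `last_error`. It assumes that `last_error`, when set, exposes a `message` attribute. A `SimpleNamespace` stands in for the real `Run` so the example runs offline.

```python
from types import SimpleNamespace


def summarize_run(run) -> str:
    """Return a one-line summary of a run's status, token usage, or error."""
    if run.status == "completed" and run.usage is not None:
        u = run.usage
        return (f"completed: {u.prompt_tokens} prompt + "
                f"{u.completion_tokens} completion = {u.total_tokens} tokens")
    if run.last_error is not None:
        return f"{run.status}: {run.last_error.message}"
    return f"{run.status}: no further details"


# Stand-in built from the values in the Run output printed above.
sample_run = SimpleNamespace(
    status="completed",
    last_error=None,
    usage=SimpleNamespace(completion_tokens=59, prompt_tokens=3799, total_tokens=3858),
)
print(summarize_run(sample_run))
```

The prompt token count here is large for a one-line question, which is likely the retrieved file content being added to the model's context.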


### Step 6: Display the Assistant's Response

In [39]:
# messages.list returns the newest message first by default,
# so the page is reversed below to print the conversation in order.

messages = client.beta.threads.messages.list(
  thread_id=thread.id
)

for m in reversed(messages.data):
  print(m.role + ": " + m.content[0].text.value)


user: When and which city Zia U. Khan was born?
assistant: Zia U. Khan was born in Sialkot in 1961 in an army garrison. This information can be found in the "zia_profile.pdf" document .
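
The loop above assumes that every message has at least one content block and that the first block is text. A slightly more defensive sketch is shown below. `message_text` and `text_block` are hypothetical helpers, and the code assumes each content block carries a `type` attribute, with text blocks exposing `.text.value` as in the loop above. `SimpleNamespace` objects stand in for real messages.

```python
from types import SimpleNamespace


def message_text(message) -> str:
    """Join the text of all text blocks in a message, skipping other block types."""
    return "\n".join(
        block.text.value
        for block in message.content
        if getattr(block, "type", None) == "text"
    )


def text_block(value: str) -> SimpleNamespace:
    """Build a stand-in for a text content block (sample data only)."""
    return SimpleNamespace(type="text", text=SimpleNamespace(value=value))


sample = SimpleNamespace(
    role="assistant",
    content=[text_block("Hello"), SimpleNamespace(type="image_file")],
)
print(f"{sample.role}: {message_text(sample)}")
```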


### Streaming

In [40]:
from typing_extensions import override
from openai import AssistantEventHandler, OpenAI

class EventHandler(AssistantEventHandler):
    @override
    def on_text_created(self, text) -> None:
        # Called once when the assistant starts a new text message.
        print("\nassistant > ", end="", flush=True)

    @override
    def on_tool_call_created(self, tool_call):
        print(f"\nassistant > {tool_call.type}\n", flush=True)

    @override
    def on_message_done(self, message) -> None:
        # print a citation to the file searched
        message_content = message.content[0].text
        annotations = message_content.annotations
        citations = []
        for index, annotation in enumerate(annotations):
            message_content.value = message_content.value.replace(
                annotation.text, f"[{index}]"
            )
            if file_citation := getattr(annotation, "file_citation", None):
                cited_file = client.files.retrieve(file_citation.file_id)
                citations.append(f"[{index}] {cited_file.filename}")

        print(message_content.value)
        print("\n".join(citations))


# Then, we use the stream SDK helper
# with the EventHandler class to create the Run
# and stream the response.

with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    instructions="Please address the user as Pakistani. The user is the student of PIAIC.",
    event_handler=EventHandler(),
) as stream:
    stream.until_done()


assistant > Zia U. Khan was born in Sialkot in 1961 in an army garrison .

