# Assistants with a File
This example uses the new Assistants API to ask questions about a file. The file in this case is just a CSV created using, you guessed it, GPT-4.

Create your `.env` file with the keys needed:
```text
AOAI_ENDPOINT=https://<azure resource here>.openai.azure.com/
AOAI_KEY=99999999
```

Create the `movies.csv` or use mine:
1. Open up the `Chat` interface in https://oai.azure.com and use the following:
    - Prompt: Generate me a list of fictitious movies with a name column and a rating column. Output in CSV format.
1. Paste the output into the `movies.csv` file
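
Before uploading, it can help to confirm that the pasted output really parses as CSV. The sketch below is one way to do that with the standard library. The function name `parse_movies` and the header names `Name` and `Rating` are assumptions, since the exact headers depend on what the model returns for your prompt.

```python
import csv
import io

def parse_movies(csv_text, rating_column="Rating"):
    """Parse CSV text, converting the rating column to float."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    # 'Rating' is an assumed header name; adjust it to match your file
    if reader.fieldnames is None or rating_column not in reader.fieldnames:
        raise ValueError(f"Expected a '{rating_column}' column, found {reader.fieldnames}")
    rows = []
    for row in reader:
        row[rating_column] = float(row[rating_column])  # raises ValueError on bad data
        rows.append(row)
    return rows

# Two rows in the shape the prompt asks for
sample = 'Name,Rating\n"Shadows of Tomorrow",9.1\n"The Whispering Echoes",8.7\n'
movies = parse_movies(sample)
print(len(movies), "movies parsed")
```

If the chat output arrives wrapped in a Markdown code fence, remove the fence lines before saving the file.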

Here is how it works:
1. Add the file to Azure OpenAI
1. Create an assistant and reference the file
    > NOTE: I never tell it the format of the file
1. Create a thread
    - One per session
1. Ask questions and look at the runs
1. Delete the file

You could easily extend this to have the assistant generate images and other output files. You just need to capture the `file_ids` returned with each response.
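
As a rough sketch of what capturing those IDs might look like, the helper below walks a message-list response and collects file IDs from `image_file` content parts and `file_path` annotations. The helper name `collect_file_ids` and the sample IDs are hypothetical, and the response shape is an assumption based on how the message list looks for this API version, so verify it against your own output.

```python
def collect_file_ids(messages):
    """Return file IDs referenced by assistant messages, in order, without duplicates."""
    found = []
    for message in messages.get("data", []):
        if message.get("role") != "assistant":
            continue
        for part in message.get("content", []):
            if part.get("type") == "image_file":
                # Generated images are assumed to arrive as image_file parts
                found.append(part["image_file"]["file_id"])
            elif part.get("type") == "text":
                # Other generated files are assumed to appear as file_path annotations
                for annotation in part["text"].get("annotations", []):
                    if annotation.get("type") == "file_path":
                        found.append(annotation["file_path"]["file_id"])
    # dict.fromkeys keeps first-seen order while dropping repeats
    return list(dict.fromkeys(found))

# Hypothetical response fragment; the IDs are placeholders, not real files
sample_messages = {
    "data": [
        {"role": "assistant", "content": [
            {"type": "image_file", "image_file": {"file_id": "assistant-EXAMPLE1"}},
            {"type": "text", "text": {"value": "Here is the chart.", "annotations": [
                {"type": "file_path", "file_path": {"file_id": "assistant-EXAMPLE2"}}
            ]}},
        ]},
        {"role": "user", "content": [
            {"type": "text", "text": {"value": "Plot the ratings.", "annotations": []}}
        ]},
    ]
}
print(collect_file_ids(sample_messages))
```

Each collected ID could then be downloaded through the files endpoint, in the same way the CSV was uploaded.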

In [2]:
import os
import json
from dotenv import load_dotenv
import requests
import time

# Load the .env file into the environment
load_dotenv()

True

In [3]:
# Set up the variables
endpoint=os.getenv('AOAI_ENDPOINT')
key=os.getenv('AOAI_KEY')
# Enter your deployment name here
model='gpt-4-turbo'

headers = {}
headers['api-key']=key
headers['Content-Type']='application/json'
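
Every request in this notebook repeats the same endpoint prefix and `api-version` query string. One optional way to reduce that repetition is a small URL helper, sketched below. The names `build_url` and `API_VERSION` are hypothetical and not used by the cells that follow; the example endpoint is a placeholder.

```python
API_VERSION = "2024-02-15-preview"  # same version string used throughout this notebook

def build_url(endpoint, path, api_version=API_VERSION):
    """Join the resource endpoint and an API path, tolerating a missing trailing slash."""
    base = endpoint.rstrip("/")
    return f"{base}/openai/{path.strip('/')}?api-version={api_version}"

# Works whether or not AOAI_ENDPOINT ends with a slash
print(build_url("https://example.openai.azure.com/", "files"))
print(build_url("https://example.openai.azure.com", "threads/thread_123/messages"))
```

The cells below build their URLs with `f"{endpoint}openai/..."`, which relies on `AOAI_ENDPOINT` ending with a slash as shown in the `.env` example.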

In [4]:
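# Multipart upload: send only the API key so requests can set the multipart Content-Type itself
form_header={'api-key': key}

url=f"{endpoint}openai/files?api-version=2024-02-15-preview"
# Open the file in a context manager so the handle is closed after the upload
with open('movies.csv', 'rb') as f:
    files = {
        'file': ('movies.csv', f),
        'purpose': (None, 'assistants')
    }
    response=requests.post(url,files=files,headers=form_header)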
print(response.status_code)
if response.status_code == 200:
    upload_response = response.json()
    print(upload_response)
    file_id=upload_response['id']
    print("File ID: ",file_id)

200
{'object': 'file', 'id': 'assistant-3WqllhlWgSzB2mL9RYoaHZcL', 'purpose': 'assistants', 'filename': 'movies.csv', 'bytes': 660, 'created_at': 1707184604, 'status': 'processed', 'status_details': None}


In [6]:
# Create the assistant
data={
  "instructions": "You are a movie analyst. When asked a question, you will parse your CSV file to provide the requested analysis.",
  "name": "Movie Assistant",
  "tools": [{"type": "code_interpreter"}],
  "model": f"{model}",
  "file_ids": [f"{file_id}"]
}
print(data)
url=f"{endpoint}openai/assistants?api-version=2024-02-15-preview"
response=requests.post(url,json=data,headers=headers)
print(response.status_code)
if response.status_code == 200:
  print(response.json())
  assistant_id=response.json()['id']
  print("Assistant ID: ",assistant_id)


{'instructions': 'You are a movie analyst. When asked a question, you will parse your CSV file to provide the requested analysis.', 'name': 'Movie Assistant', 'tools': [{'type': 'code_interpreter'}], 'model': 'gpt-4-turbo', 'file_ids': ['assistant-3WqllhlWgSzB2mL9RYoaHZcL']}
200
{'id': 'asst_RJS82ktjDssVR7s5HlQeLPMa', 'object': 'assistant', 'created_at': 1707184626, 'name': 'Movie Assistant', 'description': None, 'model': 'gpt-4-turbo', 'instructions': 'You are a movie analyst. When asked a question, you will parse your CSV file to provide the requested analysis.', 'tools': [{'type': 'code_interpreter'}], 'file_ids': ['assistant-3WqllhlWgSzB2mL9RYoaHZcL'], 'metadata': {}}
Assistant ID:  asst_RJS82ktjDssVR7s5HlQeLPMa


In [7]:
# Create the thread
url=f"{endpoint}openai/threads?api-version=2024-02-15-preview"
response=requests.post(url,headers=headers)
print(response.status_code)
if response.status_code == 200:
  print(response.json())
  thread_id=response.json()['id']
  print("Thread ID: ",thread_id)


200
{'id': 'thread_GfHH0MUe3eeYqk3ekqX5QU71', 'object': 'thread', 'created_at': 1707184646, 'metadata': {}}
Thread ID:  thread_GfHH0MUe3eeYqk3ekqX5QU71


## Define the reusable functions

In [12]:
def output_status():
  # List every message in the thread (the API returns the newest first)
  url=f"{endpoint}openai/threads/{thread_id}/messages?api-version=2024-02-15-preview"
  response=requests.get(url,headers=headers)
  output = response.json()
  # print(json.dumps(output, indent=2))
  for message in reversed(output['data']):
    # Print only the text parts; a message can also carry other content such as images
    for part in message['content']:
      if part['type'] == 'text':
        print(message['role'], ":", part['text']['value'])

def ask_a_question(content:str):
  # Add the question to the thread as a user message
  data={
    "role": "user",
    "content": content
  }
  url=f"{endpoint}openai/threads/{thread_id}/messages?api-version=2024-02-15-preview"
  response=requests.post(url,json=data,headers=headers)
  if not response.ok:
    print("Failed to add the message:", response.status_code, response.text)
    return

  # Start a run so the assistant processes the thread
  data = {
    "assistant_id": assistant_id
  }
  url=f"{endpoint}openai/threads/{thread_id}/runs?api-version=2024-02-15-preview"
  response=requests.post(url,json=data,headers=headers)
  if not response.ok:
    # Without a run ID there is nothing to poll
    print("Failed to start the run:", response.status_code, response.text)
    return
  run_id = response.json()['id']
  # print("Run ID: ",run_id)

  while True:
    # Check the status and wait for completion
    url=f"{endpoint}openai/threads/{thread_id}/runs/{run_id}?api-version=2024-02-15-preview"
    response=requests.get(url,headers=headers)
    status=response.json().get('status')
    if status == 'completed':
      break
    if status in ('failed', 'cancelled', 'expired', 'requires_action'):
      # These states never reach 'completed' on their own, so stop polling
      print("Run ended with status:", status)
      break
    # print("Sleeping for 2 seconds")
    time.sleep(2)

  output_status()
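
The polling loop in `ask_a_question` waits for as long as the run takes. If you want an upper bound, a bounded wait could look like the sketch below. The helper name `wait_for_status`, the default timeout, and the injected `get_status`, `sleep`, and `clock` parameters are all assumptions made for illustration so that the logic can be tried without calling the service.

```python
import time

# Statuses after which a run is assumed not to change any further
TERMINAL = {"completed", "failed", "cancelled", "expired"}

def wait_for_status(get_status, timeout=120.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)

# Simulated sequence of statuses instead of real HTTP calls
statuses = iter(["queued", "in_progress", "completed"])
final = wait_for_status(lambda: next(statuses), sleep=lambda seconds: None)
print(final)
```

In the notebook, `get_status` would be a small function that performs the `GET` on the run URL and returns `response.json()['status']`.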


In [9]:
ask_a_question("What was the highest rated movie?")

Run ID:  run_zh3sgu3RWnRi6Sr9Q9MUJWbp
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
user : What was the highest rated movie?
assistant : The highest rated movie in the dataset is "Shadows of Tomorrow" with a rating of 9.1.


In [11]:
ask_a_question("What are the top 5 movies by rating?")

Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
Sleeping for 2 second
user : What was the highest rated movie?
assistant : The highest rated movie in the dataset is "Shadows of Tomorrow" with a rating of 9.1.
user : What are the top 5 movies by rating?
assistant : The top 5 movies by rating are:

1. "Shadows of Tomorrow" with a rating of 9.1
2. "Starlight Beyond Time" with a rating of 9.0
3. "Secrets of the Alabaster Tower" with a rating of 8.9
4. "The Whispering Echoes" with a rating of 8.7
5. "Voyage through the Hidden Isles" with a rating of 8.6.


In [13]:
ask_a_question("What is the average rating of the movies?")


user : What was the highest rated movie?
assistant : The highest rated movie in the dataset is "Shadows of Tomorrow" with a rating of 9.1.
user : What are the top 5 movies by rating?
assistant : The top 5 movies by rating are:

1. "Shadows of Tomorrow" with a rating of 9.1
2. "Starlight Beyond Time" with a rating of 9.0
3. "Secrets of the Alabaster Tower" with a rating of 8.9
4. "The Whispering Echoes" with a rating of 8.7
5. "Voyage through the Hidden Isles" with a rating of 8.6.
user : What is the average rating of the movies?
assistant : The average rating of the movies is approximately 7.99.
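
Because `output_status` prints the whole thread, the output grows with every question. If only the newest answer is of interest, a helper along these lines could pick it out of the same message-list response. The name `latest_assistant_text` is hypothetical, and the sketch assumes the list is ordered newest first, which is why `output_status` reverses it before printing.

```python
def latest_assistant_text(messages):
    """Return the text of the most recent assistant message, or None if there is none."""
    for message in messages.get("data", []):  # assumed to be ordered newest first
        if message.get("role") == "assistant":
            texts = [part["text"]["value"]
                     for part in message.get("content", [])
                     if part.get("type") == "text"]
            return "\n".join(texts)
    return None

# Fragment shaped like the message list, using answers from the runs above
sample_thread = {
    "data": [
        {"role": "assistant", "content": [{"type": "text", "text": {
            "value": "The average rating of the movies is approximately 7.99."}}]},
        {"role": "user", "content": [{"type": "text", "text": {
            "value": "What is the average rating of the movies?"}}]},
        {"role": "assistant", "content": [{"type": "text", "text": {
            "value": 'The highest rated movie in the dataset is "Shadows of Tomorrow" with a rating of 9.1.'}}]},
    ]
}
print(latest_assistant_text(sample_thread))
```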


In [None]:
ask_a_question("What is the lowest rated movie?")


In [None]:
ask_a_question("How many movies are there?")


In [None]:
# Delete the file
url=f"{endpoint}openai/files/{file_id}?api-version=2024-02-15-preview"
response=requests.delete(url,headers=headers)
# Print the status code to confirm the delete went through
print(response.status_code)
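
The cell above removes only the uploaded file; the assistant and the thread created earlier stay in the resource. A fuller cleanup might look like the sketch below. The helper names `cleanup_urls` and `cleanup` are hypothetical, and the sketch assumes the assistant and thread can be deleted with `DELETE` requests that mirror the file delete above. The `delete` function is passed in so the logic can be tried without touching the service.

```python
def cleanup_urls(endpoint, file_id, assistant_id, thread_id,
                 api_version="2024-02-15-preview"):
    """Build the DELETE URLs for the thread, the assistant, and the file."""
    base = endpoint.rstrip("/") + "/openai"
    query = f"?api-version={api_version}"
    return [
        f"{base}/threads/{thread_id}{query}",
        f"{base}/assistants/{assistant_id}{query}",
        f"{base}/files/{file_id}{query}",
    ]

def cleanup(urls, headers, delete):
    """Call delete(url, headers=headers) for each URL and return {url: status_code}."""
    results = {}
    for url in urls:
        response = delete(url, headers=headers)
        results[url] = response.status_code
    return results

# Placeholder IDs, shown only to illustrate the URL shapes
for url in cleanup_urls("https://example.openai.azure.com/", "file-1", "asst_1", "thread_1"):
    print(url)
```

In the notebook this could be called as `cleanup(cleanup_urls(endpoint, file_id, assistant_id, thread_id), headers, requests.delete)`.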