# GPT Assistant

At some point Cognitive will want the chatbot that sits at the centre of its applications to be able to run calculations on uploaded files. This script gives an assistant a task together with some data in a CSV file and then checks whether it can solve the problem.

### Imports

In [2]:
from openai import OpenAI
import sys
import requests
from dotenv import load_dotenv
import os

# Load environment variables from .env file
load_dotenv()

sys.path.append('C:/projects/python/gpt-databasewrapper')

GPT_API_KEY = os.getenv('OPENAI_API_KEY')
client = OpenAI(
  api_key=GPT_API_KEY,
)

### File Upload

We will use the famous Titanic dataset and ask the assistant questions about it. **Note**: Once you upload a file, it stays in your OpenAI file storage until you delete it, so be careful not to upload lots of files.

In [3]:
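# Upload the file; the context manager closes the local handle afterwards
with open("../data/raw/titanic_dataset.csv", "rb") as csv_file:
    file = client.files.create(
        file=csv_file,
        purpose="assistants",  # purpose used for files consumed by Assistants tools
    )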
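Because uploaded files persist until they are removed, it can help to clean up at the end of a session. The helper below is a minimal sketch, not part of the original notebook: `delete_files` is a hypothetical name, and it assumes `client` is the `OpenAI` client created above, whose `files.delete` call returns an object with a boolean `deleted` field.

```python
def delete_files(client, file_ids):
    """Delete the given uploaded files and return the IDs confirmed as deleted.

    `client` is assumed to be an `openai.OpenAI` instance; `delete_files`
    itself is a hypothetical helper, not something the SDK provides.
    """
    deleted = []
    for file_id in file_ids:
        result = client.files.delete(file_id)
        # The response reports whether the deletion actually happened
        if result.deleted:
            deleted.append(file_id)
    return deleted
```

For this notebook the call would be `delete_files(client, [file.id])`. To see what is currently stored before deleting anything, you can iterate over `client.files.list()` and print each entry's `id`, `filename` and `purpose`.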


### Create Assistant

We now create the assistant. Notice that we enable the "code_interpreter" tool to answer these questions. This allows the assistant to write and run Python code in the background to solve the prompt.

In [5]:
assistant = client.beta.assistants.create(
  name="Titanic Questions",
  description="You are a bot attempting to answer questions about some data regarding the sinking of the titanic.",
  model="gpt-4o",
  tools=[{"type": "code_interpreter"}],
  tool_resources={
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }
)


Once we have created the assistant, we need to create the "thread". A thread holds the conversation: here, the question we would like to ask about the data, with the uploaded file attached.

In [6]:
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "Based on this data, could you please tell me how many people survived/died on this section of the titanic?",
      "attachments": [
        {
          "file_id": file.id,
          "tools": [{"type": "code_interpreter"}]
        }
      ]
    }
  ]
)


We have submitted our question; we now need to run the assistant on the thread.

In [7]:
run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id
)

### Check Query Run

Once we create the run we need to monitor the results. It can take time to answer these types of questions.

In [8]:
import time

def retrieve_run_details(thread_id, run_id):
    return client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)

# Start by getting initial run details
run_details = retrieve_run_details(thread_id=thread.id, run_id=run.id)
print("Initial Run Details:", run_details)

# Poll until the run reaches a terminal status (including "expired" and "incomplete")
while run_details.status not in ["completed", "failed", "cancelled", "expired", "incomplete"]:
    time.sleep(10)  # Wait for 10 seconds before checking again
    run_details = retrieve_run_details(thread_id=thread.id, run_id=run.id)
    print("Current Run Status:", run_details.status)

# Check final status and retrieve messages if completed
if run_details.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    for message in messages.data:
        print("Message Content:", message.content)
else:
    print("Run did not complete successfully:", run_details)
    # Run objects report failure information on the `last_error` attribute
    if run_details.last_error:
        print("Error Details:", run_details.last_error)

Initial Run Details: Run(id='run_YvcEMGrtDU5b4ju8k9gTAPTv', assistant_id='asst_5lF7PRvkEQ4oYm9fDb4HTMN1', cancelled_at=None, completed_at=1717153166, created_at=1717153155, expires_at=None, failed_at=None, incomplete_details=None, instructions=None, last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-4o', object='thread.run', required_action=None, response_format='auto', started_at=1717153155, status='completed', thread_id='thread_6N8MsxcyhWl8Yb76RT81RBfs', tool_choice='auto', tools=[CodeInterpreterTool(type='code_interpreter')], truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=Usage(completion_tokens=231, prompt_tokens=3127, total_tokens=3358), temperature=1.0, top_p=1.0, tool_resources={})
Message Content: [TextContentBlock(text=Text(annotations=[], value='According to the data:\n\n- **500** people survived.\n- **809** people did not survive.\n\nIf you need any further analysis or details, feel free to ask!'), typ
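The polling loop above waits indefinitely if a run never reaches a terminal status. As a hedged alternative, the sketch below wraps the same idea in a reusable function with a timeout. `wait_for_run`, `TERMINAL_STATUSES` and the parameter names are hypothetical; the only assumption about the run object is that it has a `status` attribute, as in the output above.

```python
import time

# Statuses after which a run will not change any further
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(fetch_run, poll_seconds=2.0, timeout_seconds=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call `fetch_run()` repeatedly until the run is terminal or time runs out.

    `fetch_run` is any zero-argument callable returning an object with a
    `status` attribute. `sleep` and `clock` are injectable for testing.
    Note: a run that stops in "requires_action" (function tools) is not
    handled here and would end in a TimeoutError.
    """
    deadline = clock() + timeout_seconds
    while True:
        run = fetch_run()
        if run.status in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(
                f"run still '{run.status}' after {timeout_seconds} seconds"
            )
        sleep(poll_seconds)
```

In this notebook it could be called as `wait_for_run(lambda: retrieve_run_details(thread.id, run.id))`. Recent versions of the `openai` package also appear to include a `create_and_poll` helper on `client.beta.threads.runs`; check your installed version before relying on it.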

### Result

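If we inspect the result, we can see that the assistant recognised the dataset, parsed the columns and then counted the values in the `survived` column. Messages are listed newest first, so the final answer is printed before the assistant's intermediate explanation and our original question.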

In [18]:
for message in messages:
    for content_block in message.content:
        # Only text blocks carry `.text.value`; skip images and other block types
        if content_block.type == "text":
            print(content_block.text.value)

According to the data:

- **500** people survived.
- **809** people did not survive.

If you need any further analysis or details, feel free to ask!
It looks like the dataset includes information about passengers on the Titanic, with columns such as `pclass`, `survived`, `name`, `sex`, `age`, among others. The key column for determining the number of survivors and non-survivors is `survived`, where `1` indicates the passenger survived and `0` indicates the passenger did not survive.

Let's calculate the number of survivors and non-survivors:
Based on this data, could you please tell me how many people survived/died on this section of the titanic?
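
Since the assistant generates its own code, it is worth checking its answer independently. The following is a minimal sketch that tallies the same column locally with the standard library. It assumes the CSV has a `survived` column coded `1`/`0`, as the assistant's message describes; `count_survival` and `count_survival_in_file` are hypothetical helper names.

```python
import csv
from collections import Counter


def count_survival(rows, column="survived"):
    """Tally survivors and non-survivors from an iterable of dict rows.

    Assumes `column` holds 1 for survived and 0 for died (also accepts
    values such as "1.0"). Rows with an empty value are skipped.
    """
    counts = Counter()
    for row in rows:
        value = (row.get(column) or "").strip()
        if value == "":
            continue  # no recorded outcome for this row
        counts[int(float(value))] += 1
    return {"survived": counts[1], "died": counts[0]}


def count_survival_in_file(path, column="survived"):
    """Read a CSV file from disk and tally its survival column."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_survival(csv.DictReader(handle), column)
```

Calling `count_survival_in_file("../data/raw/titanic_dataset.csv")` should return counts that can be compared with the figures the assistant reported.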
