## QUERYING DATA


In [None]:
!pip install openai



In [None]:
import openai
from openai import OpenAI


In [None]:
openai.api_key = "API_KEY"
client = OpenAI(api_key = "API_KEY")
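
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the key is stored in an environment variable named `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the `openai` package:

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the variable is set in your shell or notebook environment):
# client = OpenAI(api_key=load_api_key())
```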

In [None]:
file = client.files.create(
  file=open("clinical_trials.csv", "rb"),
  purpose='assistants'
)

In [None]:
assistant = client.beta.assistants.create(
  name="Data visualizer2",
  instructions="You are great at creating beautiful data visualizations. You analyze data present in .csv files, understand trends, and come up with data visualizations relevant to those trends. You also share a brief text summary of the trends observed.",
  model="gpt-4-1106-preview",
  tools=[{"type": "code_interpreter"}],
  file_ids=[file.id]
)

In [None]:
thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "Create 3 data insights based on the trends in this file.",
      "file_ids": [file.id]
    }
  ]
)

In [None]:
run = client.beta.threads.runs.create(
    thread_id = thread.id,
    assistant_id = assistant.id,
    instructions = "Please address the user as Sir. The user has a premium account."
)

In [None]:
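import os
import time
import logging
from datetime import datetime

# Show INFO-level messages so the progress logs from the polling loop are visible.
logging.basicConfig(level=logging.INFO)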

In [None]:
def wait_for_run_completion(client, thread_id, run_id, sleep_interval=5):
    """
    Waits for a run to complete and logs the elapsed time.

    :param client: The OpenAI client object.
    :param thread_id: The ID of the thread.
    :param run_id: The ID of the run.
    :param sleep_interval: Time in seconds to wait between checks.
    """
    while True:
        try:
            run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
            if run.completed_at:
                elapsed_time = run.completed_at - run.created_at
                formatted_elapsed_time = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
                logging.info(f"Run completed in {formatted_elapsed_time}")
                break
            # A run that fails, is cancelled, or expires never sets completed_at,
            # so stop polling instead of looping forever.
            if run.status in ("failed", "cancelled", "expired"):
                logging.error(f"Run ended with status: {run.status}")
                break
        except Exception as e:
            logging.error(f"An error occurred while retrieving the run: {e}")
            break
        logging.info("Waiting for run to complete...")
        time.sleep(sleep_interval)

In [None]:
wait_for_run_completion(client, thread.id, run.id)


In [None]:
messages = client.beta.threads.messages.list(
    thread_id=thread.id
    )

## Let's "DECODE" these messages!

### YOU TRY

How can we decode the messages presented here to be more clear on what's happening?

Let's start by looping through `messages.data` and accessing each message to begin experimenting!



In [192]:
### YOUR SOLUTION



OK, it looks like each message has a few properties:

1) Thread ID

2) Assistant ID

3) Content

4) Content types (images or text)

One quirk of OpenAI Assistants (in the current configuration) is that `messages.data` is ordered newest-first: the LAST item in the list is the first message in the conversation, so we'll have to work backwards!
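
Reversing the list before printing restores the conversation order. The sketch below uses stand-in objects built with `SimpleNamespace` rather than a live API response, so the sample messages and the helper names `text_part` and `in_chronological_order` are illustrative assumptions; only the `data`, `role`, `content`, `type`, and `text.value` attribute names follow the objects used in this notebook.

```python
from types import SimpleNamespace


def text_part(value):
    """Build a stand-in for a text content part (illustrative only)."""
    return SimpleNamespace(type="text", text=SimpleNamespace(value=value))


def in_chronological_order(messages):
    """Return messages oldest-first, assuming the list arrives newest-first."""
    return list(reversed(messages.data))


# Stand-in for the object returned by client.beta.threads.messages.list(...)
fake_messages = SimpleNamespace(data=[
    SimpleNamespace(role="assistant", content=[text_part("Here are 3 insights.")]),
    SimpleNamespace(role="user", content=[text_part("Create 3 data insights.")]),
])

for msg in in_chronological_order(fake_messages):
    print(f"{msg.role}: {msg.content[0].text.value}")
```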

In [None]:
### YOUR CODE HERE

In [None]:
message = "Let's proceed with more of those visualizations, with an emphasis on market research."

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=message
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
wait_for_run_completion(client, thread.id, run.id)

In [None]:
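for msg in messages.data[::-1]:
    try:
        print(msg.content[0].text.value)
    except (AttributeError, IndexError):
        # Image parts have no .text attribute; empty messages have no content[0].
        print("Could not load text")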

Create 3 data insights based on the trends in this file.
In order to provide you with data insights, I first need to examine the contents of the file you uploaded. I will begin by checking the file type and previewing its content. Then, I will conduct an analysis to identify trends and generate insights based on the data.

Let's start by determining the file type and previewing the data.
It seems that the uploaded file doesn't have an extension. Without knowing the file type, I will attempt to open it as a text file to preview its content, which could reveal more about the nature of the data. If it appears to be structured as a spreadsheet or CSV, I will then load it accordingly for analysis. Let's take a look at the first few lines of the file.
It appears that the operation has timed out, which could mean that the file is quite large or not in a plain text format. To proceed, I will try to read the file using different methods that are suitable for binary or specific data formats such

## Handling IMAGES
At this point, we probably want to have some more robust code to be able to process images!
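
A message can carry more than one content part, for example a chart followed by a caption, so it helps to inspect every part rather than only the first. A minimal sketch, exercised here against stand-in objects instead of a live response; `split_content` is a hypothetical helper name, and the `type`, `text.value`, and `image_file.file_id` attributes are assumed to match the message objects used in this notebook:

```python
from types import SimpleNamespace


def split_content(message):
    """Return (texts, image_file_ids) collected from every part of a message."""
    texts, image_ids = [], []
    for part in message.content:
        if part.type == "text":
            texts.append(part.text.value)
        elif part.type == "image_file":
            image_ids.append(part.image_file.file_id)
    return texts, image_ids


# Stand-in message with one image part followed by one text part.
sample = SimpleNamespace(content=[
    SimpleNamespace(type="image_file", image_file=SimpleNamespace(file_id="file-example")),
    SimpleNamespace(type="text", text=SimpleNamespace(value="Trials by phase.")),
])

texts, image_ids = split_content(sample)
print(texts, image_ids)
```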

In [None]:
def get_img(image_file_id, name):
    # image_file_id looks like "file-sBzB6EDaFbestf2eShKZSme2"
    # Use the configured client rather than the module-level openai object.
    image_file = client.files.content(image_file_id)
    with open(f"{name}.png", "wb") as f:
        f.write(image_file.content)

In [None]:
def get_content(messages):
    for i, msg in enumerate(messages.data[::-1]):
        if msg.content[0].type == "image_file":
            img_id = msg.content[0].image_file.file_id
            img_name = f"img_0{i}"
            get_img(img_id,img_name)
            print(f"Saved an Image called: {img_name}")

        else:
            print(msg.content[0].text.value)



In [None]:
def get_new_message(prompt,thread,assistant):
    message = prompt

    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=message
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant.id,
    )
    wait_for_run_completion(client, thread.id, run.id)


In [None]:
next_prompt = """Great, now please suggest and create additional visualizations
that will help us understand the competitive pharmaceutical landscape
in clinical trials across various diseases and their sponsors.
Make insightful and creative visualizations, please!
"""
get_new_message(next_prompt,thread,assistant)

In [None]:
messages = client.beta.threads.messages.list(
    thread_id=thread.id
    )
get_content(messages)

Saved an Image called: img_00
Saved an Image called: img_01
Saved an Image called: img_02
Saved an Image called: img_03
To gain insights into the competitive pharmaceutical landscape in clinical trials, we should look at different aspects of the data. Here are a few additional visualizations that could provide valuable insights:

1. **Sponsor Activity by Phase**: Visualize the number of trials by phase for the top sponsors to understand where they focus their research efforts.
   
2. **Sponsor vs. Conditions Heatmap**: Construct a heatmap showing the number of trials each top sponsor has conducted for each top condition. This can illustrate the specialization of different sponsors.

3. **Time Series of Trial Start Dates**: A time series visualization showing the cumulative number of trials started over time, broken down by sponsor. This can show market activity trends and the rise and fall of different sponsors in the market.

4. **Sponsor Clinical Trial Outcomes**: Show the outcomes (

## Reflect

#### Where can you take this data analysis process from here?

#### How can we incorporate this in a more streamlined way to help support plug and play applications?

#### What are some of the exciting use cases with data you are thinking about?

## YOU TRY
Let's create wrapper functions for the following:
1) Creating files
2) Creating an Assistant
3) Creating and Retrieving Threads
4) Generating new Content
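
One possible shape for these wrappers is sketched below. It is not the reference solution: the default `name` is an assumption, each wrapper takes the `client` explicitly (unlike the commented stubs that follow), and the calls simply mirror the `client.files.create`, `client.beta.assistants.create`, `client.beta.threads.create`, and `client.beta.threads.runs.create` calls used earlier in this notebook.

```python
def create_file(client, filepath):
    """Upload a local file for use by an assistant."""
    with open(filepath, "rb") as f:
        return client.files.create(file=f, purpose="assistants")


def get_assistant(client, file, instructions, name="Report analyst",
                  model="gpt-4-1106-preview"):
    """Create an assistant with the code interpreter and one attached file."""
    return client.beta.assistants.create(
        name=name,  # assumed default name, change as needed
        instructions=instructions,
        model=model,
        tools=[{"type": "code_interpreter"}],
        file_ids=[file.id],
    )


def get_thread(client, file, user_prompt):
    """Start a thread whose first user message references the file."""
    return client.beta.threads.create(
        messages=[{"role": "user", "content": user_prompt, "file_ids": [file.id]}]
    )


def get_run(client, thread, assistant):
    """Start a run of the assistant on the thread."""
    return client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant.id
    )
```

Generating new content can then reuse `get_new_message` and `get_content` from earlier in the notebook.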

In [None]:
# def create_file(filepath):
# ..
#     return file
# annual_report_file = create_file("2022 Amgen Shareholder Letter and 10K.pdf")


In [None]:
INSTRUCTIONS = """
You are a business leader and account executive who is able to quickly
- Read reports and boring financial numbers in order to generate insights
- Digest a huge amount of information into small, bite-sized quips (like quotes)
- Recall direct facts from a file accurately
Help me understand how I can share some of the key highlights in our annual reports
in a short, executive-summary PowerPoint.
"""
# def get_assistant(file, instructions = INSTRUCTIONS):

#     return assistant
# assistant1 = get_assistant(annual_report_file, INSTRUCTIONS)



In [None]:
# def get_thread(file,user_prompt):

#     return thread

prompt1 = """
You will help me prepare a keynote speech and power point presentation sharing the results of our (attached) most recent annual report!
"""
#thread1 = get_thread(annual_report_file,prompt1)


In [None]:
print(assistant1.id)
print(thread1.id)

asst_uA0rIv78iNZLs9trUe5SAM0H
thread_xUvQ9yHnWHQHHK1eDUULpaHC


In [None]:
# def get_run(thread,assistant):

#     return run
#run1 = get_run(thread1,assistant1)

In [None]:
## GET CONTENT


You will help me prepare a keynote speech and power point presentation sharing the results of our (attached) most recent annual report!

Of course, Sir. I would be glad to help you prepare your keynote speech and a PowerPoint presentation. However, before we get started, I will need to examine the contents of the annual report you've provided to understand what information it includes.

Let's start by identifying the type of file you uploaded and then I'll take a look inside to summarize the main points that can be included in your speech and presentation. Please give me a moment to check the file.
The file you have uploaded is approximately 12.7 MB in size, which suggests it could be a document with substantial content. To proceed further, I'll attempt to detect the file format and peek into the contents to extract relevant information for your keynote speech and PowerPoint presentation. Let me do that for you, Sir.
It seems there was an issue with identifying the file type using the

In [None]:
next_prompt = """Great, now please do a more in-depth analysis and help me
draft a keynote speech that will cover the punchy headlines of the annual report!
"""
# GENERATE A NEW MESSAGE BELOW - make sure you use the right thread!

In [193]:
# Retrieve content from the messages

## You've Done it!

There you go, you have now built a comprehensive AI ASSISTANT that:
- Uses OpenAI API calls (with the latest updates) to query DATA

- Understands and reads natural language data, to provide competitive analysis results

- Helps us even create graphs and visualizations for our data!