## 5 OpenAI Classification 

In this notebook, I use the `openai` package with an [API key](https://platform.openai.com/api-keys) to classify the scraped negative reviews (1- and 2-star) into one or more categories.

I first grouped the seven selected apps into two collections based on their primary functionality, and then used the model to predict:

- **whether the reviews from Collection 1 fall into one or more of the following categories:** <br>l1_inaccurate_cycle_prediction, l2_delayed_customer_service, l3_poor_prescription_management, l4_problematic_billing_practices

### **1) Collection 1**
- **c1_negative_reviews.csv** - This file includes all the negative reviews (1- and 2-star) from the 5 birth-control-oriented apps.
- **🌟c1_bc_GPT_tagged.csv🌟** - This file contains the results of the OpenAI classification, where each review from Collection 1 has been tagged with one or more labels.


## Collection 1: Birth-Control-Oriented Apps (x5)

### Step 1: Set up the connection to OpenAI and load the data

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()  # load environment variables from a .env file

API_KEY = os.getenv("API_KEY")
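
`os.getenv` returns `None` when the variable is missing, so a typo in the `.env` file would only show up later as an authentication error. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` (it is not part of `dotenv` or `openai`):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper written for this notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the .env file."
        )
    return value


# Possible use in the cell above (after load_dotenv()):
# API_KEY = require_env("API_KEY")
```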

In [2]:
#!pip install openai
from openai import OpenAI
import math  # math.exp converts a log probability into a probability

client = OpenAI(api_key=API_KEY)

# The OpenAI API key is shown only once when it is created and cannot be viewed
# again in the OpenAI account, so it is loaded from the environment instead.

In [2]:
import pandas as pd 

bc_df = pd.read_csv("c1_negative_reviews.csv")
bc_df

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id
0,2,2022-06-28 22:13:51,"{'id': 30650880, 'body': 'We are sorry to hear...",This whole business is a scam. It’s an exhaust...,1,False,Emilie Bo-Bemely.,What a Nightmare,nurx-birth-control-delivered,1213141301
1,3,2023-11-30 22:46:13,,I was happy with my first order of 1 pack of b...,1,False,moldyspice,It’s a scam,nurx-birth-control-delivered,1213141301
2,6,2022-06-29 17:13:53,"{'id': 30729739, 'body': ""Hello, we're disappo...",When I first signed up it took 3 weeks to get ...,1,False,A_huertas,The WORST!,nurx-birth-control-delivered,1213141301
3,8,2022-06-10 15:48:24,"{'id': 30331149, 'body': ""Hello, we're disappo...","UPDATE: got a reply, just to say they don’t ha...",1,False,surlycurlycue,"Poor Communication, Nice feel & design to app.",nurx-birth-control-delivered,1213141301
4,9,2020-08-20 04:55:14,"{'id': 17427196, 'body': ""Hi Alex, we are sorr...",I am using this app for about four years now. ...,2,False,ally.higgs,Was better in the beginning...,nurx-birth-control-delivered,1213141301
...,...,...,...,...,...,...,...,...,...,...
1019,284,2022-04-25 04:02:51,,The doctor on the app prescribed the worse med...,1,False,trumpian17,Horrible experience and lazy prescription,planned-parenthood-direct,1214393415
1020,285,2021-09-17 16:31:14,"{'id': 25282792, 'body': ""Oh no! We're very so...",Charged me without properly sending my prescri...,1,False,jostilts,Terrible,planned-parenthood-direct,1214393415
1021,288,2021-04-11 22:43:44,"{'id': 22223566, 'body': 'At this time our app...",This is just not affordable at all. I’m a coll...,1,False,confused!?!?!?!?!?,$60 for 3 Packs?,planned-parenthood-direct,1214393415
1022,290,2020-08-08 16:08:31,"{'id': 17494334, 'body': ""I'm sorry to hear th...",I have to log in with my username and password...,1,False,<SpaceCat4857>,App Doesn’t Work,planned-parenthood-direct,1214393415


### Step 2: Customize the prompt and choose the GPT model (gpt-4o)

Create a function that classifies a single review (one row of the CSV file); it is applied to every row in Step 3.

In [3]:
def categorize(row):
    prompt = """
        Below is the text to a review. Classify it as one or more of the following categories:

        - l1_inaccurate_cycle_prediction: This category suggests that the app's cycle prediction algorithm is inaccurate, sometimes leading to unplanned pregnancies.
        - l2_delayed_customer_service: This category suggests that difficulty in contacting customer service and long wait times, which oftentimes result in late or inaccurate deliveries of prescriptions and medications.
        - l3_poor_prescription_management: This category suggests users experience issues such as missing or incorrect prescriptions, incorrect birth control medications, inaccurate refill frequencies, late deliveries, and canceled medications.
        - l4_problematic_billing_practices: This category suggests that users encounter unexpected charges including but not limited to auto-renewals without notification, and charges on old credit cards without refunds, or they fail to use the current insurance plan for insurance billing.

        The review may be assigned to multiple categories. Please list all applicable categories based on the review content.

        Review text:

        {text}
        """
    
    results = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are a review assistant. Be brief and consider multiple categories in your responses."},
        {"role": "user", "content": prompt.format(text=row['review'])}
      ],
      temperature=0,
      logprobs=True,  # also return per-token log probabilities
    )

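    # 'probability' is exp(logprob) of the first generated token only,
    # not of the full list of labels in the response.
    return pd.Series({
        'content': results.choices[0].message.content,
        'probability': math.exp(results.choices[0].logprobs.content[0].logprob)
    })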

In [4]:
print(bc_df.iloc[0])
print(categorize(bc_df.iloc[0]))

Unnamed: 0                                                           2
date                                               2022-06-28 22:13:51
developerResponse    {'id': 30650880, 'body': 'We are sorry to hear...
review               This whole business is a scam. It’s an exhaust...
rating                                                               1
isEdited                                                         False
userName                                             Emilie Bo-Bemely.
title                                                 What a Nightmare
app_name                                  nurx-birth-control-delivered
app_id                                                      1213141301
Name: 0, dtype: object
content        - l2_delayed_customer_service\n- l3_poor_presc...
probability                                             0.960243
dtype: object
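
The `probability` value comes from `math.exp` applied to the log probability of the first token in the reply. Because most replies start with a bullet (`- `), this number likely says more about how the answer begins than about the labels themselves. The sketch below shows the conversion on made-up log probabilities, plus one possible alternative that multiplies the probabilities of several tokens. The function names `token_probability` and `sequence_probability` are my own, not part of the `openai` package.

```python
import math


def token_probability(logprob: float) -> float:
    """Convert one token's natural-log probability into a probability."""
    return math.exp(logprob)


def sequence_probability(logprobs) -> float:
    """Joint probability of a token sequence: exp of the summed log probabilities."""
    return math.exp(sum(logprobs))


# Made-up values for illustration only (not taken from an API response)
example_logprobs = [math.log(0.9), math.log(0.8)]

first_token_p = token_probability(example_logprobs[0])   # probability of the first token
whole_reply_p = sequence_probability(example_logprobs)   # product of both token probabilities
print(round(first_token_p, 4), round(whole_reply_p, 4))
```

The joint probability is always less than or equal to the first-token probability, so the two are not interchangeable as confidence scores.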


### Step 3: Make the predictions

In [5]:
# Register tqdm with pandas to get a progress bar showing how long the run will take
from tqdm.auto import tqdm
tqdm.pandas()

bc_df[['category', 'probability']] = bc_df.progress_apply(categorize, axis=1)
bc_df

  0%|          | 0/1024 [00:00<?, ?it/s]

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id,category,probability
0,2,2022-06-28 22:13:51,"{'id': 30650880, 'body': 'We are sorry to hear...",This whole business is a scam. It’s an exhaust...,1,False,Emilie Bo-Bemely.,What a Nightmare,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.762608
1,3,2023-11-30 22:46:13,,I was happy with my first order of 1 pack of b...,1,False,moldyspice,It’s a scam,nurx-birth-control-delivered,1213141301,- l3_poor_prescription_management\n- l4_proble...,0.918996
2,6,2022-06-29 17:13:53,"{'id': 30729739, 'body': ""Hello, we're disappo...",When I first signed up it took 3 weeks to get ...,1,False,A_huertas,The WORST!,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.980574
3,8,2022-06-10 15:48:24,"{'id': 30331149, 'body': ""Hello, we're disappo...","UPDATE: got a reply, just to say they don’t ha...",1,False,surlycurlycue,"Poor Communication, Nice feel & design to app.",nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service: The review ment...,0.948530
4,9,2020-08-20 04:55:14,"{'id': 17427196, 'body': ""Hi Alex, we are sorr...",I am using this app for about four years now. ...,2,False,ally.higgs,Was better in the beginning...,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.828844
...,...,...,...,...,...,...,...,...,...,...,...,...
1019,284,2022-04-25 04:02:51,,The doctor on the app prescribed the worse med...,1,False,trumpian17,Horrible experience and lazy prescription,planned-parenthood-direct,1214393415,- l2_delayed_customer_service: The review ment...,0.966600
1020,285,2021-09-17 16:31:14,"{'id': 25282792, 'body': ""Oh no! We're very so...",Charged me without properly sending my prescri...,1,False,jostilts,Terrible,planned-parenthood-direct,1214393415,- l2_delayed_customer_service\n- l3_poor_presc...,0.998372
1021,288,2021-04-11 22:43:44,"{'id': 22223566, 'body': 'At this time our app...",This is just not affordable at all. I’m a coll...,1,False,confused!?!?!?!?!?,$60 for 3 Packs?,planned-parenthood-direct,1214393415,The review text can be classified under the fo...,0.661997
1022,290,2020-08-08 16:08:31,"{'id': 17494334, 'body': ""I'm sorry to hear th...",I have to log in with my username and password...,1,False,<SpaceCat4857>,App Doesn’t Work,planned-parenthood-direct,1214393415,The review does not fall into any of the provi...,0.616664


In [6]:
#bc_df.to_csv("c1_negative_reviews_categorized.csv", index=False)
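
The Step 3 run makes one API request per review, so a single transient error (a timeout or a rate limit, for example) would raise an exception and abort the whole `progress_apply` call. A minimal sketch of a retry wrapper with exponential backoff follows. `with_retries` is a hypothetical helper, and the attempt count and delays are assumptions, not values tuned for the OpenAI API.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on any exception, doubling the wait each time.

    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

One way to use it would be `bc_df.progress_apply(lambda row: with_retries(lambda: categorize(row)), axis=1)`, which keeps `categorize` itself unchanged.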

### Step 4: Create separate columns for categories to store the prediction results

In [7]:
bc_df = pd.read_csv("c1_negative_reviews_categorized.csv")

categories = [
    'l1_inaccurate_cycle_prediction',
    'l2_delayed_customer_service',
    'l3_poor_prescription_management',
    'l4_problematic_billing_practices'
]

# Create columns for each category initialized to 0
for category in categories:
    bc_df[category] = 0

# Set a category's column to 1 when its label appears in the model's response
def update_category_columns(row):
    if pd.notna(row['category']):  # skip rows where the response is missing (NaN)
        for category in categories:
            if category in row['category']:
                row[category] = 1
    return row

# Apply the function to each row
new_bc_df = bc_df.apply(update_category_columns, axis=1)
new_bc_df

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id,category,probability,l1_inaccurate_cycle_prediction,l2_delayed_customer_service,l3_poor_prescription_management,l4_problematic_billing_practices
0,2,2022-06-28 22:13:51,"{'id': 30650880, 'body': 'We are sorry to hear...",This whole business is a scam. It’s an exhaust...,1,False,Emilie Bo-Bemely.,What a Nightmare,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.762608,0,1,1,1
1,3,2023-11-30 22:46:13,,I was happy with my first order of 1 pack of b...,1,False,moldyspice,It’s a scam,nurx-birth-control-delivered,1213141301,- l3_poor_prescription_management\n- l4_proble...,0.918996,0,0,1,1
2,6,2022-06-29 17:13:53,"{'id': 30729739, 'body': ""Hello, we're disappo...",When I first signed up it took 3 weeks to get ...,1,False,A_huertas,The WORST!,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.980574,0,1,1,0
3,8,2022-06-10 15:48:24,"{'id': 30331149, 'body': ""Hello, we're disappo...","UPDATE: got a reply, just to say they don’t ha...",1,False,surlycurlycue,"Poor Communication, Nice feel & design to app.",nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service: The review ment...,0.948530,0,1,1,0
4,9,2020-08-20 04:55:14,"{'id': 17427196, 'body': ""Hi Alex, we are sorr...",I am using this app for about four years now. ...,2,False,ally.higgs,Was better in the beginning...,nurx-birth-control-delivered,1213141301,- l2_delayed_customer_service\n- l3_poor_presc...,0.828844,0,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1019,284,2022-04-25 04:02:51,,The doctor on the app prescribed the worse med...,1,False,trumpian17,Horrible experience and lazy prescription,planned-parenthood-direct,1214393415,- l2_delayed_customer_service: The review ment...,0.966600,0,1,1,1
1020,285,2021-09-17 16:31:14,"{'id': 25282792, 'body': ""Oh no! We're very so...",Charged me without properly sending my prescri...,1,False,jostilts,Terrible,planned-parenthood-direct,1214393415,- l2_delayed_customer_service\n- l3_poor_presc...,0.998372,0,1,1,1
1021,288,2021-04-11 22:43:44,"{'id': 22223566, 'body': 'At this time our app...",This is just not affordable at all. I’m a coll...,1,False,confused!?!?!?!?!?,$60 for 3 Packs?,planned-parenthood-direct,1214393415,The review text can be classified under the fo...,0.661997,0,0,0,1
1022,290,2020-08-08 16:08:31,"{'id': 17494334, 'body': ""I'm sorry to hear th...",I have to log in with my username and password...,1,False,<SpaceCat4857>,App Doesn’t Work,planned-parenthood-direct,1214393415,The review does not fall into any of the provi...,0.616664,0,0,0,0
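
The tagging in Step 4 is a plain substring check: a column is set to 1 whenever the label name appears anywhere in the model's reply. The same logic can be sketched without pandas, which makes it easy to try on individual replies. `tag_labels` is a hypothetical name, and the sample replies below are shortened illustrations rather than full model outputs.

```python
CATEGORIES = [
    'l1_inaccurate_cycle_prediction',
    'l2_delayed_customer_service',
    'l3_poor_prescription_management',
    'l4_problematic_billing_practices',
]


def tag_labels(response, categories=CATEGORIES):
    """Return a {label: 0 or 1} dict based on substring matches in the reply."""
    if not isinstance(response, str):
        # Missing replies (NaN is a float) get all zeros
        return {category: 0 for category in categories}
    return {category: int(category in response) for category in categories}


bulleted = "- l2_delayed_customer_service\n- l3_poor_prescription_management"
no_match = "The review does not fall into any of the provided categories."

print(tag_labels(bulleted))
print(tag_labels(no_match))
```

One caveat of substring matching: a reply that names a label only to rule it out (for example "this is not l1_inaccurate_cycle_prediction") would still be tagged 1, so it may be worth spot-checking the longer, explanatory replies.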


In [9]:
# Save the updated DataFrame
#new_bc_df.to_csv("c1_bc_GPT_tagged.csv", index=False)

# 1024 reviews -> 7.85 - 5.85 = 2??