## 5 OpenAI Classification - Collection 2

In this notebook, I use the `openai` package, authenticated with an [API key](https://platform.openai.com/api-keys), to classify the scraped negative reviews (1- and 2-star) into one or more categories.

I first grouped the seven selected apps into two collections based on their primary functionality. This notebook predicts:

- **whether each review from Collection 2 falls into one or more of the following categories:** <br>`l1_inaccurate_cycle_prediction`, `l2_unfair_functionality_charges`, `l3_user_data_privacy_concerns`, `l4_if_related_to_the_overturn`

### **2) Collection 2**
- **c2_negative_reviews.csv** - This file includes all the negative reviews (1- and 2-star) from the two period-and-fertility-tracking apps.
- **🌟c2_pt_GPT_tagged.csv🌟** - This file contains the results of the OpenAI classification, where each review from Collection 2 has been tagged with one or more labels.


## Collection 2: Period-and-Fertility-Tracking Apps (x2)

### Step 1: Set up the connection to OpenAI and Load the data

In [None]:
from dotenv import load_dotenv
import os

load_dotenv()  # load environment variables from a local .env file

API_KEY = os.getenv("API_KEY")
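
If the `.env` file is missing or `API_KEY` is unset, `os.getenv` returns `None`, and the problem only shows up later when the client is created or used. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the .env file."
        )
    return value


# Usage (after load_dotenv()): API_KEY = require_env("API_KEY")
```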

In [5]:
#!pip install openai
from openai import OpenAI
import math # use `math.exp` to get the probability 

client = OpenAI(api_key=API_KEY)

# The OpenAI API key is read from the environment. For security reasons, a key cannot be viewed again in the OpenAI account after it is created.

In [6]:
import pandas as pd 

pt_df = pd.read_csv("c2_negative_reviews.csv")
pt_df

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id
0,7,2022-02-20 02:16:25,"{'id': 28190339, 'body': 'Hi kennaliz122,\n\nT...",I downloaded Flo when I was a sophomore in hig...,2,False,kennaliz122,Too much for too little,flo-period-pregnancy-tracker,1038369065
1,8,2023-10-03 17:17:00,"{'id': 39499342, 'body': 'Hi Kcnyee!\n\nThank ...",I’ve used Flo for years. I’ve had no problem w...,1,False,Kcnyee,Taking away free features??,flo-period-pregnancy-tracker,1038369065
2,10,2023-12-22 16:28:09,"{'id': 41056705, 'body': 'Hi there! Thank you ...",I’ve been using this app for years and years a...,1,False,I'mYourLight,No longer user friendly,flo-period-pregnancy-tracker,1038369065
3,11,2024-02-20 04:36:03,"{'id': 42439888, 'body': 'Hi there! We underst...",It’s already a struggle to get women’s reprodu...,2,False,Asialopez,Woman’s health needs to be improved,flo-period-pregnancy-tracker,1038369065
4,12,2022-06-19 07:23:45,"{'id': 30517987, 'body': ""Hi LithiumBarbie! Th...",I’ve been using the Flo app since 2017 to trac...,2,False,LithiumBarbie,Good For Keeping Track of Period Dates Only,flo-period-pregnancy-tracker,1038369065
...,...,...,...,...,...,...,...,...,...,...
7796,4779,2020-11-22 05:29:57,"{'id': 19325772, 'body': ""Hi there! We deeply ...",I’ve been using this app for a good 4 years no...,2,False,Lola910421,The Crash Saga Continues Pt 2,clue-period-tracker-calendar,657189652
7797,4780,2020-11-13 18:32:06,"{'id': 19391392, 'body': 'Hello! Please reach ...",I am very upset. I signed up for a one month s...,1,False,laureno6o2,:/,clue-period-tracker-calendar,657189652
7798,4782,2020-09-17 23:02:04,,I’ve used this app for years- since I started ...,2,False,So addictive!!!!,Used to be much better,clue-period-tracker-calendar,657189652
7799,4786,2020-08-09 16:20:37,,"Skipped a few months using the app, and now it...",1,False,love my privacy,Account now required to use,clue-period-tracker-calendar,657189652


### Step 2: Customize prompt and Choose GPT model (gpt-4o)

Create a function that classifies a single review; it is later applied to every review in the CSV file.

In [7]:
def categorize(row):
    prompt = """
        Below is the text to a review. Classify it as one or more of the following categories:

        - l1_inaccurate_cycle_prediction: This category suggests that the app's cycle prediction algorithm is inaccurate, sometimes leading to unplanned pregnancies.
        - l2_unfair_functionality_charges: This category suggests that many users express frustration over unreasonable fees for basic functions, excessive ads or aggressive premium upgrade promotions, new updates that degrade app performance, removal of essential features for predicting ovulation, and late or missing reminder notifications, etc.
        - l3_user_data_privacy_concerns: This category suggests that users are worried their collected period data could be used against them in the future, especially following the overturn of Roe v. Wade.
        - l4_if_related_to_the_overturn: This category suggests that these reviews directly talk about the concerns of their experiences with the app due to the 2022 overturn of Roe v. Wade, with key words such as "Roe v. Wade" or "overturn" explicitly appearing in the reviews.

        The review may be assigned to multiple categories. Please list all applicable categories based on the review content.

        Review text:

        {text}
        """
    
    results = client.chat.completions.create(
      model="gpt-4o",
      messages=[
        {"role": "system", "content": "You are a review assistant. Be brief and consider multiple categories in your responses."},
        {"role": "user", "content": prompt.format(text=row['review'])}
      ],
      temperature=0,
      logprobs=True,  # include token log-probabilities in the response
    )

    return pd.Series({
        'content': results.choices[0].message.content,
        # Probability of the first generated token only, not of the full response
        'probability': math.exp(results.choices[0].logprobs.content[0].logprob)
    })
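
Note that `probability` above is derived from the log-probability of the first generated token only, so it reflects how confident the model was in that one token rather than in the whole answer. As a sketch of one alternative, the log-probabilities of all tokens can be summed to get the joint probability of the full response. The helper `sequence_probability` is a hypothetical name, and the sample values are made up for illustration.

```python
import math
from typing import Iterable


def sequence_probability(token_logprobs: Iterable[float]) -> float:
    """Joint probability of a token sequence: exp of the summed log-probabilities."""
    return math.exp(sum(token_logprobs))


# Made-up log-probabilities for a three-token reply (illustration only).
example_logprobs = [math.log(0.9), math.log(0.8), math.log(0.5)]
print(round(sequence_probability(example_logprobs), 4))  # 0.9 * 0.8 * 0.5 = 0.36

# Inside categorize(), the per-token values could be collected with:
#   [t.logprob for t in results.choices[0].logprobs.content]
```

Because every extra token multiplies in another factor of at most 1, longer replies get lower joint probabilities, so this value is best compared across replies of similar length.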

In [8]:
print(pt_df.iloc[0])
print(categorize(pt_df.iloc[0]))

Unnamed: 0                                                           7
date                                               2022-02-20 02:16:25
developerResponse    {'id': 28190339, 'body': 'Hi kennaliz122,\n\nT...
review               I downloaded Flo when I was a sophomore in hig...
rating                                                               2
isEdited                                                         False
userName                                                   kennaliz122
title                                          Too much for too little
app_name                                  flo-period-pregnancy-tracker
app_id                                                      1038369065
Name: 0, dtype: object
content        - l2_unfair_functionality_charges
probability                             0.977951
dtype: object


### Step 3:  Make the prediction

In [9]:
# Add tqdm to pandas to get a nice progress bar to understand how long this is going to take
from tqdm.auto import tqdm
tqdm.pandas()

pt_df[['category', 'probability']] = pt_df.progress_apply(categorize, axis=1)
pt_df

  0%|          | 0/7801 [00:00<?, ?it/s]

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id,category,probability
0,7,2022-02-20 02:16:25,"{'id': 28190339, 'body': 'Hi kennaliz122,\n\nT...",I downloaded Flo when I was a sophomore in hig...,2,False,kennaliz122,Too much for too little,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.977951
1,8,2023-10-03 17:17:00,"{'id': 39499342, 'body': 'Hi Kcnyee!\n\nThank ...",I’ve used Flo for years. I’ve had no problem w...,1,False,Kcnyee,Taking away free features??,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.984431
2,10,2023-12-22 16:28:09,"{'id': 41056705, 'body': 'Hi there! Thank you ...",I’ve been using this app for years and years a...,1,False,I'mYourLight,No longer user friendly,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.998340
3,11,2024-02-20 04:36:03,"{'id': 42439888, 'body': 'Hi there! We underst...",It’s already a struggle to get women’s reprodu...,2,False,Asialopez,Woman’s health needs to be improved,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.996349
4,12,2022-06-19 07:23:45,"{'id': 30517987, 'body': ""Hi LithiumBarbie! Th...",I’ve been using the Flo app since 2017 to trac...,2,False,LithiumBarbie,Good For Keeping Track of Period Dates Only,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.966814
...,...,...,...,...,...,...,...,...,...,...,...,...
7796,4779,2020-11-22 05:29:57,"{'id': 19325772, 'body': ""Hi there! We deeply ...",I’ve been using this app for a good 4 years no...,2,False,Lola910421,The Crash Saga Continues Pt 2,clue-period-tracker-calendar,657189652,The review text falls under the following cate...,0.552608
7797,4780,2020-11-13 18:32:06,"{'id': 19391392, 'body': 'Hello! Please reach ...",I am very upset. I signed up for a one month s...,1,False,laureno6o2,:/,clue-period-tracker-calendar,657189652,- l2_unfair_functionality_charges,0.996506
7798,4782,2020-09-17 23:02:04,,I’ve used this app for years- since I started ...,2,False,So addictive!!!!,Used to be much better,clue-period-tracker-calendar,657189652,- l2_unfair_functionality_charges,0.994832
7799,4786,2020-08-09 16:20:37,,"Skipped a few months using the app, and now it...",1,False,love my privacy,Account now required to use,clue-period-tracker-calendar,657189652,- l3_user_data_privacy_concerns,0.992691
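
With several thousand sequential API calls, a single transient failure (for example a timeout or a rate limit) would abort `progress_apply` before any result is assigned to the DataFrame. A minimal sketch of a generic retry wrapper with exponential backoff follows; `with_retries` and its parameters are hypothetical names, and it catches every `Exception` for simplicity rather than specific OpenAI error types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(); on failure wait base_delay * 2**i seconds and retry, up to `attempts` tries."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** i)
    raise ValueError("attempts must be at least 1")


# Usage with the notebook's function (not executed here):
#   pt_df.progress_apply(lambda row: with_retries(lambda: categorize(row)), axis=1)
```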


In [10]:
pt_df.to_csv("c2_negative_reviews_categorized.csv", index=False)

### Step 4:  Create separate columns for categories to store the prediction results

In [11]:
pt_df = pd.read_csv("c2_negative_reviews_categorized.csv")

categories = [
    'l1_inaccurate_cycle_prediction',
    'l2_unfair_functionality_charges', 
    'l3_user_data_privacy_concerns',
    'l4_if_related_to_the_overturn',
]

# Create columns for each category initialized to 0
for category in categories:
    pt_df[category] = 0

# Function to set each category column to 1 when its label appears in the 'category' text
def update_category_columns(row):
    if pd.notna(row['category']):  # Skip rows where the 'category' cell is NaN
        for category in categories:
            if category in row['category']:
                row[category] = 1
    return row

# Apply the function to each row
new_pt_df = pt_df.apply(update_category_columns, axis=1)
new_pt_df

Unnamed: 0.1,Unnamed: 0,date,developerResponse,review,rating,isEdited,userName,title,app_name,app_id,category,probability,l1_inaccurate_cycle_prediction,l2_unfair_functionality_charges,l3_user_data_privacy_concerns,l4_if_related_to_the_overturn
0,7,2022-02-20 02:16:25,"{'id': 28190339, 'body': 'Hi kennaliz122,\n\nT...",I downloaded Flo when I was a sophomore in hig...,2,False,kennaliz122,Too much for too little,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.977951,0,1,0,0
1,8,2023-10-03 17:17:00,"{'id': 39499342, 'body': 'Hi Kcnyee!\n\nThank ...",I’ve used Flo for years. I’ve had no problem w...,1,False,Kcnyee,Taking away free features??,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.984431,0,1,0,0
2,10,2023-12-22 16:28:09,"{'id': 41056705, 'body': 'Hi there! Thank you ...",I’ve been using this app for years and years a...,1,False,I'mYourLight,No longer user friendly,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.998340,0,1,0,0
3,11,2024-02-20 04:36:03,"{'id': 42439888, 'body': 'Hi there! We underst...",It’s already a struggle to get women’s reprodu...,2,False,Asialopez,Woman’s health needs to be improved,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.996349,0,1,0,0
4,12,2022-06-19 07:23:45,"{'id': 30517987, 'body': ""Hi LithiumBarbie! Th...",I’ve been using the Flo app since 2017 to trac...,2,False,LithiumBarbie,Good For Keeping Track of Period Dates Only,flo-period-pregnancy-tracker,1038369065,- l2_unfair_functionality_charges,0.966814,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7796,4779,2020-11-22 05:29:57,"{'id': 19325772, 'body': ""Hi there! We deeply ...",I’ve been using this app for a good 4 years no...,2,False,Lola910421,The Crash Saga Continues Pt 2,clue-period-tracker-calendar,657189652,The review text falls under the following cate...,0.552608,0,1,0,0
7797,4780,2020-11-13 18:32:06,"{'id': 19391392, 'body': 'Hello! Please reach ...",I am very upset. I signed up for a one month s...,1,False,laureno6o2,:/,clue-period-tracker-calendar,657189652,- l2_unfair_functionality_charges,0.996506,0,1,0,0
7798,4782,2020-09-17 23:02:04,,I’ve used this app for years- since I started ...,2,False,So addictive!!!!,Used to be much better,clue-period-tracker-calendar,657189652,- l2_unfair_functionality_charges,0.994832,0,1,0,0
7799,4786,2020-08-09 16:20:37,,"Skipped a few months using the app, and now it...",1,False,love my privacy,Account now required to use,clue-period-tracker-calendar,657189652,- l3_user_data_privacy_concerns,0.992691,0,0,1,0
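
The row-wise `apply` above works, but the same 0/1 columns can also be produced with vectorized substring matching. A sketch on a small made-up DataFrame, assuming the `category` column holds the raw model reply as free text (the `demo` frame and its contents are illustrative only):

```python
import pandas as pd

categories = [
    'l1_inaccurate_cycle_prediction',
    'l2_unfair_functionality_charges',
    'l3_user_data_privacy_concerns',
    'l4_if_related_to_the_overturn',
]

# Toy replies standing in for the real 'category' column.
demo = pd.DataFrame({"category": [
    "- l2_unfair_functionality_charges",
    "- l1_inaccurate_cycle_prediction\n- l3_user_data_privacy_concerns",
    None,  # a missing reply should yield all zeros
]})

# Literal (non-regex) substring match per label; missing replies count as no match.
for category in categories:
    demo[category] = (
        demo["category"].str.contains(category, regex=False, na=False).astype(int)
    )

print(demo[categories])
```

Like the original function, this relies on the label name appearing verbatim somewhere in the reply, so it also handles verbose replies that wrap the labels in a sentence.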


In [12]:
# Save the updated DataFrame
new_pt_df.to_csv("c2_pt_GPT_tagged.csv", index=False)

# 7801 reviews -> $25.84 at the start, $10.39 at the end
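
Reading the two dollar amounts in the closing comment as the API credit balance before and after the run (an assumption; the comment does not say what they measure), the approximate cost per review can be worked out as follows:

```python
n_reviews = 7801
balance_start = 25.84  # assumed: account balance before the run, in USD
balance_end = 10.39    # assumed: account balance after the run, in USD

total_cost = balance_start - balance_end
cost_per_review = total_cost / n_reviews

print(f"total: ${total_cost:.2f}, per review: ${cost_per_review:.4f}")
```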