# Classification for Advice Types

## Import data and estimate the cost of classification using OpenAI

In [70]:
import pandas as pd
from transformers import pipeline
import numpy as np
import csv
import os


In [45]:
df = pd.read_csv("data/20200325_counsel_chat.csv")

In [25]:
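# Deduplicate answers so each unique text is priced only once
answers = df["answerText"].dropna().unique().tolist()
super_string = ' '.join(answers)
# Rough heuristic: about 4 characters per token for English text
num_tokens = len(super_string) / 4
# o4-mini pricing in USD per 1M tokens: 1.10 input, 4.40 output
input_price = 1.100 * num_tokens / 1000000
# Assume roughly 20 output tokens per classified answer
output_price = 4.4 * len(answers) * 20 / 1000000
total_price = input_price + output_price
print(input_price, output_price)
print(f'Minimum cost of classification using openAI o4-mini ${total_price}')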

0.542162775 0.176704
Minimum cost of classification using openAI o4-mini $0.718866775
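
The same arithmetic can be wrapped in a small helper so the assumptions are explicit and easy to change. This is only a sketch: `estimate_cost` and its parameter names are hypothetical, the defaults mirror the constants used in the cell above (4 characters per token, 20 output tokens per answer, o4-mini prices), and the sizes in the usage line are made up for illustration.

```python
def estimate_cost(total_chars, num_answers,
                  input_price_per_m=1.10, output_price_per_m=4.40,
                  chars_per_token=4, output_tokens_per_answer=20):
    """Return (input_cost, output_cost, total_cost) in USD.

    Prices are per 1M tokens; token counts are rough estimates.
    """
    input_tokens = total_chars / chars_per_token
    output_tokens = num_answers * output_tokens_per_answer
    input_cost = input_price_per_m * input_tokens / 1_000_000
    output_cost = output_price_per_m * output_tokens / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical corpus size, for illustration only
input_cost, output_cost, total = estimate_cost(total_chars=2_000_000, num_answers=2_000)
print(f"input ${input_cost:.2f}, output ${output_cost:.2f}, total ${total:.2f}")
```

A character-based estimate is only a lower bound on the real bill: the system prompt is sent with every request, and it is not counted here.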


## Zero-shot Classification Approach

In [60]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = [" Practical Life Advice", "Empathetic response", "Resource suggestion"]

Device set to use mps:0


In [61]:
direct_advice = classifier("I recommend establishing a consistent sleep schedule and limiting screen time an hour before bed. Try using a sleep diary to track your patterns and notice what improves your rest.", candidate_labels=labels, multi_label=True)
print(direct_advice["labels"], direct_advice["scores"])

[' Practical Life Advice', 'Resource suggestion', 'Empathetic response'] [0.8517794013023376, 0.7835465669631958, 0.5683595538139343]


In [62]:
empathetic_response = classifier("It sounds like you’ve been carrying a lot on your own lately. I want you to know that it’s completely valid to feel overwhelmed, and I’m here to support you through this.", candidate_labels=labels, multi_label=True)
print(empathetic_response["labels"], empathetic_response["scores"])

['Empathetic response', 'Resource suggestion', ' Practical Life Advice'] [0.9629988074302673, 0.6677781939506531, 0.46740734577178955]


In [63]:
resource_suggestion = classifier("You might find it helpful to try the Headspace app for guided meditations, or explore 'The Feeling Good Handbook' by Dr. David Burns — it’s a great resource for managing negative thoughts.", candidate_labels=labels, multi_label=True)
print(resource_suggestion["labels"], resource_suggestion["scores"])

['Resource suggestion', 'Empathetic response', ' Practical Life Advice'] [0.962764322757721, 0.5002696514129639, 0.014397886581718922]


In [64]:
diagnosis = classifier("Given the patient’s reports of persistent low mood, lack of interest in activities, and trouble sleeping for more than two weeks, these symptoms are consistent with major depressive disorder. You may want to further assess using the PHQ-9", candidate_labels=labels, multi_label=True)
print(diagnosis["labels"], diagnosis["scores"])

['Resource suggestion', 'Empathetic response', ' Practical Life Advice'] [0.8775472640991211, 0.16735967993736267, 0.08701489120721817]


In [65]:
treatment = classifier("That is intense. Depression is a liar. Sometimes depression places these glasses over our eyes, these dark sunglasses that change how we see things.\xa0Depression tells us things like you\'re worthless no one likes you don\'t worry about doing anything. And it is so easy for us to listen and to be tricked into thinking that just because we feel something means it is true. Please know that even if you are feeling worthless right now, that doesn\'t mean you are worthless.The first step to working through this is recognizing what is going on. Recognizing when depression is telling you the same story (ie; being worthless) with different words (ie worthless here, worthless there) and making an effort to talk back.\xa0While I can not give you a diagnosis of depression, reading what you are going through, it sounds like you might need help to get back on track. Seeing a counselor can open an entirely new option up wherein someone who is not involved in your life can help you without judgement and with an objective perspective. This can do wonders in unwrapping these kinds of thoughts. Wishing you the absolute best!", candidate_labels=labels, multi_label=True)
print(treatment["labels"], treatment["scores"])

['Resource suggestion', 'Empathetic response', ' Practical Life Advice'] [0.8477880358695984, 0.8398752808570862, 0.5635181665420532]


The zero-shot classification approach above is free, but it may not be the most accurate. In the five examples above, it does well on `Practical Life Advice`, `Resource suggestion`, and `Empathetic response`: in each case the intended label receives the top score. It is inaccurate on the `Diagnosis` and `Treatment` examples, where neither category is among the candidate labels and `Resource suggestion` comes out on top both times. OpenAI will most likely be a lot more accurate, but it is not free. Given the time constraint and the low estimated cost (about $0.72 for o4-mini, per the estimate above), classifying with OpenAI seems like the most reasonable move. There is a chance that OpenAI will hallucinate or return undesirable results, but overall it is the least time-consuming option and the trade-off is worth it. Additionally, the resulting labels could later be used to train a supervised machine learning model, which could then classify answers without making API calls to OpenAI.
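
With `multi_label=True`, each candidate label is scored independently, so the scores do not sum to 1 and several labels can score high at once. One way to turn those scores into a label set is a fixed cutoff. The sketch below assumes a threshold of 0.7, which is an arbitrary choice and not something tuned in this notebook; `labels_above` is a hypothetical helper, and the input dictionary copies the scores printed for the first example.

```python
def labels_above(result, threshold=0.7):
    """Keep the candidate labels whose score meets the threshold."""
    return [
        label.strip()  # drop stray whitespace such as the leading space in " Practical Life Advice"
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# Scores copied from the sleep-schedule example above
direct_advice = {
    "labels": [" Practical Life Advice", "Resource suggestion", "Empathetic response"],
    "scores": [0.8517794013023376, 0.7835465669631958, 0.5683595538139343],
}
print(labels_above(direct_advice))
# → ['Practical Life Advice', 'Resource suggestion']
```

Even on this clear-cut example, two labels pass the cutoff, which illustrates why the zero-shot scores are hard to use directly.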

## Low-shot classification

In [66]:
import time
from openai import OpenAI

client = OpenAI(api_key="")

answers = df["answerText"].dropna().tolist()

classified = []

system_prompt = """You are a classifier that assigns therapist answers into one or more of the following categories:
    Practical Life Advice: Actionable suggestions to improve sleep, routine, self-care, gratitude, or navigating social environments.
    Emotional Support and Validation: Suggest ways the therapist can validate the patient’s feelings and build rapport.
    Resource Suggestion: Recommend tools, exercises, or materials the therapist can offer the patient.
    Psychoeducation: Diagnose whether the patient's symptoms align with a mental health condition, based on their description.

Return only the most relevant categories as a comma-separated list. Do not include anything else."""
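
Because the prompt asks for a comma-separated list, the model's reply still has to be split and checked against the known categories before it can be trusted. The following is a minimal sketch of that step. `CATEGORIES` and `parse_labels` are hypothetical names that are not used elsewhere in this notebook, and silently dropping unknown labels is an assumption about the desired behavior.

```python
CATEGORIES = [
    "Practical Life Advice",
    "Emotional Support and Validation",
    "Resource Suggestion",
    "Psychoeducation",
]


def parse_labels(raw):
    """Split a comma-separated reply and keep only known categories."""
    known = {c.lower(): c for c in CATEGORIES}
    parsed = []
    for part in raw.split(","):
        # Normalize whitespace, a trailing period, and letter case
        key = part.strip().strip(".").lower()
        if key in known and known[key] not in parsed:
            parsed.append(known[key])
    return parsed


print(parse_labels("Practical Life Advice, resource suggestion."))
# → ['Practical Life Advice', 'Resource Suggestion']
```

An empty result signals that the reply contained no recognizable category, which is a cheap way to flag hallucinated or malformed output for review.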



In [72]:
# File to save results
output_file = "data/classified_results.csv"

# Create output file and write header if it doesn't exist
if not os.path.exists(output_file):
    with open(output_file, mode="w", newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(["index", "answerText", "adviceType"])

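# Iterate over the dataframe, skipping rows with no answer text
# (calling .strip() on a NaN value would raise an AttributeError)
for i, row in df.dropna(subset=["answerText"]).iterrows():
    answer = row['answerText'].strip()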
    label = "error"  # Default to "error" in case of failure

    # Try 3 times before moving to the next row
    for attempt in range(3):
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": answer}
                ],
                temperature=0,
            )
            label = response.choices[0].message.content.strip()
            break  # Exit the retry loop if successful
        except Exception as e:
            print(f"Error on row {i}, attempt {attempt + 1}: {e}")
            time.sleep(5)

    # Write the result to CSV immediately after processing the row
    with open(output_file, mode="a", newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([i, answer, label])
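
The loop above waits a fixed 5 seconds between attempts. A common alternative for rate-limited APIs is exponential backoff, where the wait doubles after each failure. The sketch below is one possible variant, not the approach used in this notebook: `call_with_retries` is a hypothetical helper, and the `sleep` parameter exists only so the waiting can be swapped out in tests.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt, then retry.

    Returns fn()'s result, or None when every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            # No point sleeping after the final attempt
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

In the loop above, the API request would be wrapped in a zero-argument function (for example a `lambda`) and passed as `fn`, with a `None` result mapped to the `"error"` label.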