# Senior Project - GPT 3.5 Personalization

This notebook explores applications of OpenAI's GPT (generative-pre-trained) NLP model gpt-3.5-turbo in providing subjective, personalized, and context-aware responses to user input. The model is not fine-tuned and is simply the available model trained by OpenAI.

---
### Imports, Setup, Helper Functions

In [45]:
from openai import OpenAI
from dotenv import load_dotenv
import os
import json
import pickle
import sys

load_dotenv(".env")
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
) 

save_folder = "./data/"

In [46]:
## Save Data (Save Money)
def save_data(data_structure, file_name):
    if not file_name.endswith('.pkl'):
        file_name += '.pkl'
    
    with open(save_folder + file_name, 'wb') as file:
        pickle.dump(data_structure, file)

In [47]:
## Load Data (Save Money)
def load_data(file_name):
    if not file_name.endswith('.pkl'):
        file_name += '.pkl'
    
    with open(save_folder + file_name, 'rb') as file:
        return pickle.load(file)

#### Cost Helper Functions

In [48]:
def calculate_usage_cost(model, c_tokens, p_tokens):
    cost = 0
    if model == "gpt-3.5-turbo":
        cost_per1k_prompt_token = 0.001 # 1/10th of a cent
        cost_per1k_completion_token = 0.002 # 1/5th of a cent
    elif model == "gpt-4-turbo":
        cost_per1k_prompt_token = 0.01 # 1 cent
        cost_per1k_completion_token = 0.03 # 3 cents

    cost += cost_per1k_prompt_token * p_tokens / 1000
    cost += cost_per1k_completion_token * c_tokens / 1000
    return cost

def print_cost(cost):
    print(f"Total Cost: ${cost:.8f}")
    if cost > 0:
        print(f"Number of Runs to Reach $1: {1/cost:.0f}")
    print("*"*50)

#### GPT-3.5-Turbo Test Setup

In [49]:
# THIS IS A TEST - 
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)

print(chat_completion.choices[0].message.content)
print_cost(
    calculate_usage_cost(
        "gpt-3.5-turbo",
        chat_completion.usage.completion_tokens,
        chat_completion.usage.prompt_tokens
    )
)

This is a test.
Total Cost: $0.00002200
Number of Runs to Reach $1: 45455
**************************************************


#### Pretty Print Helper Function

For printing the dictionary of questions + labels + uses + explanations that are generated by gpt.

In [50]:
# HELPER FUNCTIONS

def pretty_print(data):
    print(json.dumps(data, indent=4, sort_keys=False))


#### Init Questions List

In [51]:
import json

question_data_path = './Questions_Text.json'

# Function to load the JSON data into a Python data structure
def load_questions(json_path):
    with open(json_path, 'r', encoding='utf-8') as file:
        questions = json.load(file)
    return questions

# # Function to iterate through each question and retrieve the necessary information
# def process_questions(questions):
#     for question in questions:
#         question_text = question["text"]
#         answer_type = question["answer_type"]
#         # Here you can add the logic to pass the question_text and answer_type into a GPT prompt
#         # For example, you might call your GPT function here
#         # gpt_response = gpt_prompt(question_text, answer_type)
#         # For now, we'll just print them
#         print(f"Question Text: {question_text}\nAnswer Type: {answer_type}\n")

# Load the questions from the JSON file
questions = load_questions(question_data_path)

# # Process each question
# process_questions(questions)

# Note: In your actual implementation, you'd replace the print statement
# with the logic to call your GPT prompt function.


#### Functions for Adding, Removing, and Clearing Labeled Questions

These represent API CRUD operations in a sense. Illustrating how this demo could easily be turned into a microservice that the main Trees app could call.

In [38]:
labeled_questions = []

def add_labeled_question(text, a_type, label, label_explain):
    # Only add if text is not already in labeled_questions
    for question in labeled_questions:
        if question["text"] == text:
            print(f"Question already labeled: {text}")
            return
    labeled_questions.append({
        "text": text,
        "answer_type": a_type,
        "label": label,
        "label_explain": label_explain,
        "primary_use": "",
        "secondary_use": "",
        "tertiary_use": "",
        "primary_explain": "",
        "secondary_explain": "",
        "tertiary_explain": "",
    })
    print(f"Added labeled question: {text}")
    print(f"Length of labeled questions: {len(labeled_questions)}")


def clear_labeled_questions():
    labeled_questions.clear()
    print("Cleared labeled questions")

def remove_labeled_question(text):
    for question in labeled_questions:
        if question["text"] == text:
            labeled_questions.remove(question)
            print(f"Removed labeled question: {text}")
            print(f"Length of labeled questions: {len(labeled_questions)}")
            return
    print(f"Could not find labeled question: {text}")

def add_uses_to_labeled_question(text, primary, secondary, tertiary):
    for question in labeled_questions:
        if question["text"] == text:
            question["primary_use"] = primary
            question["secondary_use"] = secondary
            question["tertiary_use"] = tertiary
            print(f"Added uses to labeled question: {text}")
            return
    print(f"Could not find labeled question: {text}")

def add_explain_to_labeled_question(text, primary, secondary, tertiary):
    for question in labeled_questions:
        if question["text"] == text:
            question["primary_explain"] = primary
            question["secondary_explain"] = secondary
            question["tertiary_explain"] = tertiary
            print(f"Added explain to labeled question: {text}")
            return
    print(f"Could not find labeled question: {text}")

#### "Major Statements" CRUD Functions

These aren't used in this demo, but illustrate how admin users of the Trees app could add, remove, and clear major statements that can be used as context to further specify the GPT response. This could also be a microservice, and this could also be a feature within the mobile app that GPT-savvy user's can pay for to tweak the context that GPT gets when it generates labels, or personalized content recommendations for them. This gives user's power, and is cost effective since the cost to run these models is miniscule and user's would be paying for the frontend that makes these responses clean and useful rather than the backend that generates them with the GPT api for very cheap! (5x markup on cost would still be ~ $0.05 per user request, which seems very reasonable for a personalized response...however frontend costs may necessitate a higher cost in-app)

In [37]:
# Question major statements

major_statements = []

def add_major_statement(text):
    # Only add if text is not already in major_statements
    for statement in major_statements:
        if statement["text"] == text:
            print(f"Major statement already added: {text}")
            return
    major_statements.append({
        "text": text,
    })
    print(f"Added major statement: {text}")
    print(f"Length of major statements: {len(major_statements)}")

def clear_major_statements():
    major_statements.clear()
    print("Cleared major statements")

def remove_major_statement(text):
    for statement in major_statements:
        if statement["text"] == text:
            major_statements.remove(statement)
            print(f"Removed major statement: {text}")
            print(f"Length of major statements: {len(major_statements)}")
            return
    print(f"Could not find major statement: {text}")

---
## Question Answering Console "App"

This is a simple console app that allows you to answer all the questions in the list of questions (108 total questions). This allows this notebook demo to use real user answers to predict personality traits, goals, habits, and (hopefully) plan/content recommendations. Additionally, it (maybe, bc this is really out-of-scope) allows for gpt to generate personalized versions of generalized content for a specific user.

In [16]:
# DONT RUN THIS CELL UNLESS YOU WANT TO CLEAR ALL USER ANSWERS

# data structure to store user's and their answers to questions
user_answers = {} # { "username": { "question" : "answer" } }

In [17]:
def answer_questions(questions):
    username = input("Enter Username: ")

    if username not in user_answers:
        user_answers[username] = {}
    else:
        print(f"Username already exists: {username}")

    for i, question in enumerate(questions):
        if question["text"] not in user_answers[username]:
            print(f"----- Question #{i+1} -----")
            print(f"{question['text']}")

            answers =[]
                

            if (question["answer_type"] == "A slider displaying text values"):
                if (question["text"] == "How many hours per week do you work?"): # Special slider values, not 11-pt scale
                    answers = ["Some number between 0 and 80"]
                elif (question["text"] == "How many hours per week do you spend on schoolwork?"):
                    pass
                elif (question["text"] == "On average, how many hours do you sleep in each 24-hour period?"):
                    answers = ["A number between 4 and 12 hours with half-hour increments (0.5s) being acceptable"]
                elif (question["text"] == "From which socio-economic group would you consider your family to be?"):
                    answers = ["Low income", "Middle income", "High income"]
                elif (question["text"] == "Are you currently in a relationship?"):
                    answers = ["Yes", "Sort of", "No"]
                elif (question["text"] == "Are you currently in a romantic relationship?"):
                    answers = ["Yes", "Sort of", "No"]
                elif (question["text"] == "Do you have children?"):
                    answers = ["Yes", "Yes but they don't live with me", "No"]
                elif (question["text"] == "Each day, how many hours do you typically spend (outside of class) studying or doing schoolwork?"):
                    answers = ["0hrs", "1hr", "2hrs", "3hrs", "4hrs", "5hrs", "6hrs", "7hrs", 
                               "8hrs", "9hrs", "10hrs", "11hrs", "12hrs", "13hrs", "14hrs or more"]
                elif (question["text"] == "How intense are your exercise sessions?"):
                    answers = ["Incredibly intense", "Pretty intense", "Kind of intense", "Comfortable", "Relaxed"]
                elif (question["text"] == "How often do you exercise?"):
                    answers = ["7-10 times per week", "5-6 times per week", "3-4 times per week", "Twice per week", "Once per week", "Never"]
                else: # Regular 11-pt scale slider questions
                    answers = ["5, (YesYesYesYesYes)", "4, (YesYesYesYes)", "3, (YesYesYes)", "2, (YesYes)", "1, (Yes)",
                            "0, (Indifferent)", 
                            "-1, (No)", "-2, (NoNo)", "-3, (NoNoNo)", "-4, (NoNoNoNo)", "-5, (NoNoNoNoNo)"]
            else:    
                if (question["text"] == "Which of these learning preferences work best for you?"):
                    answers = ["Reading/Writing", "Audio", "Video", "Hands-On"]
                elif (question["text"] == "Did either of your parents (including step parents/guardian) complete university?"):
                    answers = ["Both did", "One did", "I don't know", "No"]
                elif (question["text"] == "Are you currently employed?"):
                    answers = ["Full-time", "Part-time", "Unemployed and currently looking for work",
                            "Unemployed and not currently looking for work", "Other", "Prefer not to say"] # Don't have Self-Employed?
                elif (question["text"] == "What is your race?"):
                    answers = ["Asian and South East Asian", "Black or African Canadian", "First Nation, Metis, Inuit or (Indigenous)",
                            "Middle Eastern", "White", "Other", "Prefer not to say"]
                elif (question["text"] == "How do you get to campus?"):
                    answers= ["Living on-campus", "Bike", "Walk", "Drive", "Bus", "Other"]
                elif (question["text"] == "Would you be interested in accessing resources and learning more about any of the following?"):
                    answers = ["Depression", "Anxiety", "OCD", "ADHD", "Eating disorders", "Stress reduction", "Time management", 
                            "None of the above"] # 'None of the above' makes no sense as an answer to this question... 
                                                    # (it's the same as just not answering...)
                elif (question["text"] == "What year of university are you in?"):
                    answers = ["1st", "2nd", "3rd", "4th", "5th or higher", "Graduate program"]
                elif (question["text"] == "What are your living arrangements?"):
                    answers = ["Living at home with family", "Living alone in residence", "Living in residence with roommate(s)",
                            "Living alone off campus", "Living off campus with roommate(s)", "Living off campus with a significant other"]
                elif (question["text"] == "How old are you?"):
                    answers = ["Answer with a number"]
                elif (question["text"] == "What is your gender?"):
                    answers = ["Prefer not to say", "Female", "Male", "Non-binary/non-conforming", "Gender fluid", 
                            "Transgender", "Other"]        
            

            print(f"---> Answer Type: {question['answer_type']}")
            for i, answer in enumerate(answers):
                print(f"{i+1}. {answer}")
            answer = input("Answer (if numbered enter the number of your answer): ")
            # if the answer is a valid number and there is more than one string in the answers list
            if answer.isnumeric() and int(answer) <= len(answers) and len(answers) > 1:
                answer = answers[int(answer)-1]

            print(f"You answered: {answer}")
            validate = input("Is this correct? (y/n): ")
            if validate.lower() == "y":
                # add answer to user_answers
                user_answers[username][question["text"]] = answer
            else:
                continue # continue to the next question if the answer was wrong
            
            print("-"*50)
        else:
            print(f"Question already answered: {question['text']}")

        


In [22]:
answer_questions(questions)

Username already exists: kbrait
----- Question #1 -----
I care what people close to me think is best
---> Answer Type: A slider displaying text values
1. 5, (YesYesYesYesYes)
2. 4, (YesYesYesYes)
3. 3, (YesYesYes)
4. 2, (YesYes)
5. 1, (Yes)
6. 0, (Indifferent)
7. -1, (No)
8. -2, (NoNo)
9. -3, (NoNoNo)
10. -4, (NoNoNoNo)
11. -5, (NoNoNoNoNo)
You answered: 1, (Yes)
--------------------------------------------------
Question already answered: I sympathize with others’ feelings.
Question already answered: I can be pushy.
Question already answered: I follow a schedule.
Question already answered: I leave my belongings around.
Question already answered: I often act on the spur of the moment.
Question already answered: I am always prepared.
Question already answered: I often forget to put things back in their proper place.
----- Question #9 -----
I find it difficult to remember appointments or obligations.
---> Answer Type: A slider displaying text values
1. 5, (YesYesYesYesYes)
2. 4, (YesYesY

In [23]:
print(f"User Answers: {user_answers}")

try:
    save_data(user_answers, "user_answers")
    print("Saved user answers")
except:
    print("Could not save user answers")
    print(f"Error Message: {sys.exc_info()[0]}")

User Answers: {'kbrait': {'I sympathize with others’ feelings.': '3, (YesYesYes)', 'I can be pushy.': '2, (YesYes)', 'I follow a schedule.': '-2, (NoNo)', 'I leave my belongings around.': '4, (YesYesYesYes)', 'I often act on the spur of the moment.': '5, (YesYesYesYesYes)', 'I am always prepared.': '2, (YesYes)', 'I often forget to put things back in their proper place.': '0, (Indifferent)', 'I have trouble wrapping up the final details of a project once the challenging parts are done.': '5, (YesYesYesYesYes)', 'My social life is important to me': '3, (YesYesYes)', "I like to talk with others when I'm trying to understand new things.": '4, (YesYesYesYes)', 'I talk to a lot of different people at social gatherings.': '2, (YesYes)', 'I like being the center of attention.': '2, (YesYes)', 'I feel I can be my most authentic self.': '3, (YesYesYes)', 'I am relaxed most of the time.': '1, (Yes)', 'I get stressed out easily.': '-3, (NoNoNo)', 'I change my mood a lot.': '1, (Yes)', 'I see myse

In [53]:
user_answers

{'kbrait': {'I sympathize with others’ feelings.': '3, (YesYesYes)',
  'I can be pushy.': '2, (YesYes)',
  'I follow a schedule.': '-2, (NoNo)',
  'I leave my belongings around.': '4, (YesYesYesYes)',
  'I often act on the spur of the moment.': '5, (YesYesYesYesYes)',
  'I am always prepared.': '2, (YesYes)',
  'I often forget to put things back in their proper place.': '0, (Indifferent)',
  'I have trouble wrapping up the final details of a project once the challenging parts are done.': '5, (YesYesYesYesYes)',
  'My social life is important to me': '3, (YesYesYes)',
  "I like to talk with others when I'm trying to understand new things.": '4, (YesYesYesYes)',
  'I talk to a lot of different people at social gatherings.': '2, (YesYes)',
  'I like being the center of attention.': '2, (YesYes)',
  'I feel I can be my most authentic self.': '3, (YesYesYes)',
  'I am relaxed most of the time.': '1, (Yes)',
  'I get stressed out easily.': '-3, (NoNoNo)',
  'I change my mood a lot.': '1, (Ye

---
## First Pipeline: Generate Additional Question Context for Given Question Text & Answer Type

This pipeline is used to add more context to each question, this could be shown to user's to help them understand the purpose of different questions, however it needs to be specified that this is **AI-Generated** per OpenAI's terms of use. Additionally, it should be specified since at scale the generations will undoubtedly be wrong sometimes, and it's important to be transparent so that user's can laugh about it rather than be confused or upset.

---
#### Context statements that are being used in the following two prompts

These context statements allow GPT to better "understand" what the questions are to the user's in-app experience (question_context_stmt) and how the generations that it makes will be used (generation_context_stmt).

In [41]:
# Important contexts for first part of the pipeline - Question labeling & use/explain analysis

question_context_stmt = "The questions that you are analyzing are asked in a mobile application directly on a single individual user. \
    The questions are in the for of a questionnaire with a slider, text box, or multiple choice selection. The questions \
    provide insight into the way a user is currently living and experiencing life. This is the context of the questions."

eventual_use_stmt = "The eventual use of the analysis you are providing on the questions will be twofold: (1) provide the user with a better \
    understanding of their current state of being, and (2) provide the user with recommendations on content within the mobile app that they \
    can use to improve themselves. Improve in this context is subjective, hence the need for question analysis. Eventually, \
    the analysis you currently provide on each question will be combined with specific answers to each question and the entire text content of \
    every plan in the mobile app. This is the context of how the analysis you provide will be used."

---
### First Prompt: Generate Question Label & Explanation

In [42]:
# PROMPT #1: GENERAL QUESTION LABEL PROMPT

def gpt_question_label_prompt(text, answer_type, answers=None):
    gpt_system_message = "You are going to interpret the meaning behind various questions. You will be given the text of a question \
        and the type of answer that is expected. You will then label the question with a broad label that would help a user answering \
        the question to understand what aspect of their life the answer to this question will relate to. The following is an explanation \
        the context of the questions you are labeling:\n\
        \t{question_context}\n\
        The following is an explanation of the context of how this and further analysis will eventually be used:\n\
        \t{eventual_use}\n\
        Finally, you will be asked to provide a short explanation of why you chose this label for this question.\
        You will fill out the following information for each question: \
            Question Text: {text} \
            Question Label: {generated_label} \
            Label Explanation: {label_explanation}"
    
    real_answer_type = ''
    if answer_type == 'Text, open-ended question':
        real_answer_type = 'open-ended, question_text is representative of how the user will answer'
    elif answer_type == 'A slider displaying text values':
        real_answer_type = 'an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)'
    elif answer_type == 'Multiple choices':
        real_answer_type = 'multiple choice, question_text is representative of the choices, typically provides demographic information'
    else:
        real_answer_type = 'unknown'
    
    # implementation without explicit answers
    if answers is None:
        prompt = f"The question text is: {text} \
            The answer type is '{real_answer_type}'. Answer type is important because it illustrates the options a user has to answer the question. \
            Please fill out the template for this question and generate a label for this question."
        
        # Call the OpenAI API to generate a response
        completion = client.chat.completions.create(
            messages=[
                {
                    "role":"system",
                    "content":gpt_system_message
                },
                {
                    "role":"system",
                    "content":"Never include anything in the response except for the quetion text, question label, and label explanation as specified \
                        in the template. The parsing function relies on the template being followed."
                },
                {
                    "role":"user",
                    "content":prompt
                }
            ],
            model="gpt-3.5-turbo"
        )
        
        # Extract the generated text from the response
        generated_text = completion.choices[0].message.content

        # Print usage cost

        cost = calculate_usage_cost(
                "gpt-3.5-turbo",
                chat_completion.usage.completion_tokens,
                chat_completion.usage.prompt_tokens
            )
            

        return generated_text, real_answer_type, cost
    


#### Parsing Labels From the GPT Response

In [43]:
# parsing function needs to be figured out
def parse_label_output(output):
    output = output.split('\n')
    label = output[1].split(': ')[1]
    label_explain = output[2].split(': ')[1]
    return label, label_explain

#### Cell that Clears the Labeled Questions List

In [None]:
clear_labeled_questions()

Cleared labeled questions


#### Label All Questions
Cost == ~$0.00237600

In [48]:
total_cost = 0

for question in questions:
    if question["text"] not in [question["text"] for question in labeled_questions]:
        print("*"*50)
        print("Labeling question: " + question["text"])
        output, real_answer_type, cost = gpt_question_label_prompt(question["text"], question["answer_type"])
        total_cost += cost
        try:
            add_labeled_question(question['text'], real_answer_type, parse_label_output(output)[0], parse_label_output(output)[1])
        except IndexError:
            print("!"*50); print("!"*50)
            print("Error: Can't Parse Prompt Output")
            print("-"*50); print(output); print("-"*50); 
            print("!"*50); print("!"*50)

        print("*"*50)
    else:
        print(f"Question already labeled: {question['text']}")
    # output, real_answer_type = gpt_question_label_prompt(question["text"], question["answer_type"])
    # add_labeled_question(question['text'], real_answer_type, parse_label_output(output)[0], parse_label_output(output)[1])

print_cost(total_cost)

Question already labeled: I care what people close to me think is best
Question already labeled: I sympathize with others’ feelings.
Question already labeled: I can be pushy.
Question already labeled: I follow a schedule.
Question already labeled: I leave my belongings around.
Question already labeled: I often act on the spur of the moment.
Question already labeled: I am always prepared.
Question already labeled: I often forget to put things back in their proper place.
Question already labeled: I find it difficult to remember appointments or obligations.
Question already labeled: I make careless mistakes while working on a boring or difficult project.
Question already labeled: I have trouble wrapping up the final details of a project once the challenging parts are done.
Question already labeled: My social life is important to me
Question already labeled: I like to talk with others when I'm trying to understand new things.
Question already labeled: I talk to a lot of different people at

In [49]:
pretty_print(labeled_questions)
try:
    save_data(labeled_questions, "labeled_questions")
    print("Saved labeled questions")
except:
    print("Could not save labeled questions")
    print(f"Error Message: {sys.exc_info()[0]}")

[
    {
        "text": "I care what people close to me think is best",
        "answer_type": "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)",
        "label": "Relationships",
        "label_explain": "This question is asking about the importance placed on the opinions of people who are close to the person answering the question. It relates to the dynamics and dynamics and influence in their relationships.",
        "primary_use": "",
        "secondary_use": "",
        "tertiary_use": "",
        "primary_explain": "",
        "secondary_explain": "",
        "tertiary_explain": ""
    },
    {
        "text": "I sympathize with others\u2019 feelings.",
        "answer_type": "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)",
        "label": "Empathy",
        "label_explain": "This question is about the respondent's capaci

#### Label a Specific Question - Demo Purpose

In [None]:
# Example to see how it works - THIS CELL COSTS MONEY TO RUN (PROMPT #1)
# Takes ~20 seconds to run

ex_q = questions[107]
#questions[99] # question 99 text: "I feel connected to the university."
ex_output, real_answer_type = gpt_question_label_prompt(ex_q['text'], ex_q['answer_type'])

Total Cost: $0.00002200
Number of Runs to Reach $1: 45455


In [None]:
print(ex_output)

Question Text: What program are you currently enrolled in?
Question Label: Academic/Career
Label Explanation: This question is asking about the program the person is currently studying or participating in. It pertains to their academic or career-related activities.


In [54]:
#add_labeled_question(ex_q['text'], real_answer_type, parse_label_output(ex_output)[0], parse_label_output(ex_output)[1])
pretty_print(labeled_questions)

[
    {
        "text": "I care what people close to me think is best",
        "answer_type": "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)",
        "label": "Relationships",
        "label_explain": "This question is asking about the importance placed on the opinions of people who are close to the person answering the question. It relates to the dynamics and dynamics and influence in their relationships.",
        "primary_use": "Assessing the importance placed on the opinions of people close to the user",
        "secondary_use": "Understanding the user's decision-making process",
        "tertiary_use": "Evaluating the user's level of independence in decision-making",
        "primary_explain": "This question is asking about the extent to which the user values the opinions of people who are close to them. It provides insights into the importance the user gives to the perspectives and input of their l

---
### Second Prompt: Generate Three Question Uses and Explanations

In [51]:
# PROMPT #2 - GENERAL QUESTION USES PROMPT

def gpt_question_uses_prompt(text, answer_type, label, label_explanation, answers=None):
    gpt_system_message = """
    You are going to interpret the meaning behind various questions. 
    You will be given the text of a question and the type of answer that is expected. 
    Your goal is to think about how the way the user answers the question could generally provide some usable information for the user. 
    Try to be general enough with the uses so that whether the question is answered extremely one way or extremely the 
    other way, there will still be some use in knowing the uses of the question overall. The context of the questions you are 
    analyzing is as follows:
    \t{question_context}
    The context of how this and further analysis will eventually be used is as follows:
    \t{eventual_use}
    You will just fill out the following information for each question:
    Question Text: {text}
    Question Main Use: {generated_main_use}
    Question Secondary Use: {generated_secondary_use}
    Question Tertiary Use: {generated_tertiary_use}
    Main Explanation: {main_explanation}
    Secondary Explanation: {secondary_explanation}
    Tertiary Explanation: {tertiary_explanation}"""
    
    if answers is None:
        prompt = f"""
        The question text is: {text}
        The answer type is '{answer_type}'. Answer type is important because it illustrates the options a user has to answer the question.
        The label is '{label}'. This is a general label that provides some insight into how the question relates to the user's life.
        Since the label is generated by a NLP model, there is also an explanation to provide better context as to why the model chose this label.
        The label explanation is '{label_explanation}'.
        Please fill out the template for this question and generate uses and explanations for this question."""

        example_output = """
        Question Text: This is just an example question text.
        Question Main Use: This is just an example main use.
        Question Secondary Use: This is just an example secondary use.
        Question Tertiary Use: This is just an example tertiary use.
        Main Explanation: This is just an example main explanation.
        Secondary Explanation: This is just an example secondary explanation.
        Tertiary Explanation: This is just an example tertiary explanation."""
        
        completion = client.chat.completions.create(
            messages=[
                {
                    "role":"system",
                    "content":gpt_system_message
                },
                {
                    "role":"user",
                    "content":prompt
                },
                {
                    "role":"assistant",
                    "content":example_output
                }
            ],
            model="gpt-3.5-turbo"
        )
        
        # Extract the generated text from the response
        generated_text = completion.choices[0].message.content.strip('\n')

        # Print usage cost
        cost = calculate_usage_cost(
                "gpt-3.5-turbo",
                chat_completion.usage.completion_tokens,
                chat_completion.usage.prompt_tokens
            )

        return generated_text, cost

#### Parse Generated Uses and Explanations

In [52]:
def parse_uses_output(output):
    output = output.strip('\n')
    output = output.split('\n')
    primary_use = output[1].split(': ')[1]
    secondary_use = output[2].split(': ')[1]
    tertiary_use = output[3].split(': ')[1]
    primary_explain = output[4].split(': ')[1]
    secondary_explain = output[5].split(': ')[1]
    tertiary_explain = output[6].split(': ')[1]
    return primary_use, secondary_use, tertiary_use, primary_explain, secondary_explain, tertiary_explain

#### Better Parsing (ChatGPT-4 Online Wrote This Function)

In [54]:
def better_uses_parsing(output):
    # Define default values for all fields
    fields = {
        'Question Main Use': 'N/A',
        'Question Secondary Use': 'N/A',
        'Question Tertiary Use': 'N/A',
        'Main Explanation': 'N/A',
        'Secondary Explanation': 'N/A',
        'Tertiary Explanation': 'N/A'
    }

    # Normalize the output by replacing double newlines with single newlines
    normalized_output = output.replace('\n\n', '\n').strip()

    # Split the output into lines
    lines = normalized_output.split('\n')

    # Iterate over each line, checking if it starts with one of the fields
    for line in lines:
        # Split the line into key and value based on the first colon found
        parts = line.split(': ', 1)
        if len(parts) == 2:
            key, value = parts
            # If the key is one of the fields we're looking for, update the dictionary
            if key in fields:
                fields[key] = value.strip()

    # Return the values in the order expected
    return (fields['Question Main Use'], fields['Question Secondary Use'], fields['Question Tertiary Use'],
            fields['Main Explanation'], fields['Secondary Explanation'], fields['Tertiary Explanation'])


#### Bulk Question Uses Generation

In [55]:
total_cost = 0
for question in labeled_questions:
    if question['primary_use'] == "" and question['label'] != "":
        print("*"*50)
        print("Generating uses for question: " + question['text'])
        output, cost = gpt_question_uses_prompt(question['text'], question['answer_type'], question['label'], question['label_explain'])
        total_cost += cost
        try:
            p,s,t,pe,se,te = better_uses_parsing(output) # parse_uses_output(output)
            add_uses_to_labeled_question(question['text'], p,s,t)
            add_explain_to_labeled_question(question['text'], pe,se,te)
        except IndexError:
            print("!"*50); print("!"*50)
            print("Error: Can't Parse Prompt Output")
            print("-"*50); print(output); print("-"*50); 
            print("!"*50); print("!"*50)

        print("*"*50)
    else:
        print(f"Question already has uses: {question['text']}")

print_cost(total_cost)

Question already has uses: I care what people close to me think is best
Question already has uses: I sympathize with others’ feelings.
**************************************************
Generating uses for question: I can be pushy.
Added uses to labeled question: I can be pushy.
Added explain to labeled question: I can be pushy.
**************************************************
Question already has uses: I follow a schedule.
Question already has uses: I leave my belongings around.
Question already has uses: I often act on the spur of the moment.
Question already has uses: I am always prepared.
**************************************************
Generating uses for question: I often forget to put things back in their proper place.
Added uses to labeled question: I often forget to put things back in their proper place.
Added explain to labeled question: I often forget to put things back in their proper place.
**************************************************
Question already has uses: I

In [56]:
pretty_print(labeled_questions)
try:
    save_data(labeled_questions, "labeled_questions")
    print("Saved labeled questions")
except:
    print("Could not save labeled questions")
    print(f"Error Message: {sys.exc_info()[0]}")

[
    {
        "text": "I care what people close to me think is best",
        "answer_type": "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)",
        "label": "Relationships",
        "label_explain": "This question is asking about the importance placed on the opinions of people who are close to the person answering the question. It relates to the dynamics and dynamics and influence in their relationships.",
        "primary_use": "Assessing the importance placed on the opinions of people close to the user",
        "secondary_use": "Understanding the user's decision-making process",
        "tertiary_use": "Evaluating the user's level of independence in decision-making",
        "primary_explain": "This question is asking about the extent to which the user values the opinions of people who are close to them. It provides insights into the importance the user gives to the perspectives and input of their l

In [57]:
def labeled_question_len(array):
    count = 0
    for question in array:
        if question['primary_use'] != "":
            count += 1
    return count

print(f"{labeled_question_len(labeled_questions)} out of {len(questions)} questions labeled")

108 out of 108 questions labeled


#### Specific Question Uses Generation

In [None]:
# Example of generating uses for a question - THIS CELL COSTS MONEY TO RUN (PROMPT #2)
# Runtime is 30s-1min
ex_q = labeled_questions[0]
ex_output = gpt_question_uses_prompt(ex_q['text'], ex_q['answer_type'], ex_q['label'], ex_q['label_explain'])

Running Function - Usage: CompletionUsage(completion_tokens=204, prompt_tokens=555, total_tokens=759)


In [None]:
print(ex_output)

Question Text: How do you get to campus?
Question Main Use: Gathering information about transportation methods to campus.
Question Secondary Use: Understanding commuting patterns and transportation preferences.
Question Tertiary Use: Identifying potential environmental impact and sustainability efforts.
Main Explanation: The main use of this question is to gather information about the different transportation methods individuals use to commute to campus. This can provide insights into the overall transportation infrastructure and needs of the campus community.
Secondary Explanation: By understanding how people get to campus, institutions can analyze commuting patterns and preferences. This information can be used to optimize transportation services, improve accessibility, and plan for future infrastructure developments. It can also help identify potential challenges or bottlenecks in transportation systems that need to be addressed.
Tertiary Explanation: This question can provide valua

In [None]:
p,s,t,pe,se,te = parse_uses_output(ex_output)

add_uses_to_labeled_question(ex_q['text'], p, s, t)
add_explain_to_labeled_question(ex_q['text'], pe, se, te)

pretty_print(labeled_questions)

Added uses to labeled question: How do you get to campus?
Added explain to labeled question: How do you get to campus?
[
    {
        "text": "How do you get to campus?",
        "answer_type": "multiple choice, question_text is representative of the choices, typically provides demographic information",
        "label": "Transportation",
        "label_explain": "I chose the label \"Transportation\" for this question because it pertains to how someone travels to campus. The answer to this question would provide insight into the different transportation methods used by individuals to commute to the campus.",
        "primary_use": "Gathering information about transportation methods to campus.",
        "secondary_use": "Understanding commuting patterns and transportation preferences.",
        "tertiary_use": "Identifying potential environmental impact and sustainability efforts.",
        "primary_explain": "The main use of this question is to gather information about the different tr

---
## Second Pipeline: Generate Personalized Response to User's Question Answers

#### Example Set of User Answers

In [68]:
## All answers are ~ 2000 tokens long
## https://platform.openai.com/tokenizer

user_answers['kbrait']


{'I care what people close to me think is best': '1, (Yes)',
 'I sympathize with others’ feelings.': '3, (YesYesYes)',
 'I can be pushy.': '3, (YesYesYes)',
 'I follow a schedule.': '-4, (NoNoNoNo)',
 'I leave my belongings around.': '1, (Yes)',
 'I often act on the spur of the moment.': '5, (YesYesYesYesYes)',
 'I am always prepared.': '4, (YesYesYesYes)',
 'I often forget to put things back in their proper place.': '0, (Indifferent)',
 'I find it difficult to remember appointments or obligations.': '-2, (NoNo)',
 'I make careless mistakes while working on a boring or difficult project.': '1, (Yes)',
 'I have trouble wrapping up the final details of a project once the challenging parts are done.': '3, (YesYesYes)',
 "I like to talk with others when I'm trying to understand new things.": '4, (YesYesYesYes)',
 'I talk to a lot of different people at social gatherings.': '4, (YesYesYesYes)',
 'I like being the center of attention.': '3, (YesYesYes)',
 'I am relaxed most of the time.': '-

#### Prompt #3: Generate Goals for the User

In [72]:
def gpt_goal_generator(user_answers, subset_labeled_questions):

    gpt_system_message = """
    You are going to generate a goal for a user based on their answers to questions. 
    Each question will have a label, label explanation, and three uses with explanations. 
    These are to provide context about the question and how it can help you understand their answers. 
    You must use the following template in your response: 
    goal: <goal>
    goal explanation: <goal explanation>
    goal: <goal>
    goal explanation: <goal explanation>
    goal: <goal>
    goal explanation: <goal explanation>
    Please generate three goals and explanations for the user. Attempt to make the goals unique and 
    explicity relevant to the user as a whole rather than specific to a single question."""
    
    prompt = "The user's answers to the questions are as follows: \n"
    for question in subset_labeled_questions:
        prompt += f"Question: {question['text']}\n"
        prompt += f"Labeled: {question['label']}\n"
        prompt += f"Primary Use: {question['primary_use']}\n"
        prompt += f"Secondary Use: {question['secondary_use']}\n"
        prompt += f"Tertiary Use: {question['tertiary_use']}\n"
        prompt += f"Answer: {user_answers[question['text']]}\nEND QUESTION\n\n"

    completion = client.chat.completions.create(
            messages=[
                {
                    "role":"system",
                    "content":gpt_system_message
                },
                {
                    "role":"user",
                    "content":prompt
                }
            ],
            model="gpt-3.5-turbo-1106"
        )
    
    # Extract the generated text from the response
    generated_text = completion.choices[0].message.content.strip('\n')

    # Print usage cost
    cost = calculate_usage_cost(
            "gpt-3.5-turbo",
            chat_completion.usage.completion_tokens,
            chat_completion.usage.prompt_tokens
        )

    print(f"Input Tokens: {chat_completion.usage.prompt_tokens}")
    print(f"Output Tokens: {chat_completion.usage.completion_tokens}")

    return generated_text, cost

In [88]:
goal_response, cost = gpt_goal_generator(user_answers['kbrait'], labeled_questions)

Input Tokens: 12
Output Tokens: 5


In [74]:
print_cost(cost)

Total Cost: $0.00002200
Number of Runs to Reach $1: 45455
**************************************************


In [89]:
print(goal_response)

goal: Improve decision-making independence
goal explanation: Based on your responses indicating the importance you place on the opinions of people close to you and your level of assertiveness, it would be beneficial to work on developing more independence in your decision-making process. Balancing the value of others' perspectives with asserting your own opinions and preferences when making decisions can lead to a more well-rounded approach.

goal: Enhance emotional resilience and stability
goal explanation: Given your high empathy level, tendency to worry, and fluctuations in mood, focusing on enhancing emotional resilience and stability can be beneficial. Strengthening coping mechanisms, stress management, and self-regulation could help in navigating fluctuations in mood and emotional responses, leading to a more balanced and stable emotional state.

goal: Explore new learning experiences and social opportunities
goal explanation: With your strong preference for social interaction wh

In [82]:
def gpt_goal_plan_creation(user_answers, goal):

    gpt_system_message = """
                        Based on a goal, you are going to generate a plan of action for a specific user to follow.
                        This plan should have actionable steps that the user can take to achieve their goal.
                        The plan must be personalized to the user, to do this you will be provided the user's answers
                        to question's that provide insight into their personality and individuality. You will provide
                        the following output, in this exact format:

                        plan_title: <plan_title>
                        plan_description: <plan_description>
                        number_of_steps: <number_of_steps>
                        steps: <step_1>, <step_2>, <step_3>, ...

                        It is important that you follow this format exactly as the parsing function relies on it.
                        It is also very important that the steps are comma separated with no newlines in between steps.
                        There should only be newlines between the plan_title, plan_description, number_of_steps, and steps."""
    
    prompt = "The user's answers to the questions are as follows: \n"
    for question_text in user_answers.keys():
        prompt += f"Question: {question_text}\nAnswer: {user_answers[question_text]}\n"
    prompt += f"The user's goal: {goal}"
    
    completion = client.chat.completions.create(
            messages=[
                {
                    "role":"system",
                    "content":gpt_system_message
                },
                {
                    "role":"user",
                    "content":prompt
                }
            ],
            model="gpt-3.5-turbo"
        )
    
    # Extract the generated text from the response
    generated_text = completion.choices[0].message.content.strip('\n')

    # Print usage cost
    cost = calculate_usage_cost(
            "gpt-3.5-turbo",
            chat_completion.usage.completion_tokens,
            chat_completion.usage.prompt_tokens
        )

    return generated_text, cost

In [90]:
generated_plans = []
total_cost = 0

for i, goal in enumerate(goal_response.split('\n')):
    if goal.startswith('goal: '):
        single_goal = goal[6:]
        print(f"Generating plan for goal #{i+1}: {single_goal}")
        plan_response, cost = gpt_goal_plan_creation(user_answers['kbrait'], single_goal)
        generated_plans.append(plan_response)
        total_cost += cost

#plan_response, cost = gpt_goal_plan_creation(user_answers['kbrait'], goal_response)

Generating plan for goal #1: Improve decision-making independence
Generating plan for goal #4: Enhance emotional resilience and stability
Generating plan for goal #7: Explore new learning experiences and social opportunities


In [84]:
print_cost(cost)

Total Cost: $0.00002200
Number of Runs to Reach $1: 45455
**************************************************


In [91]:
for plan in generated_plans:
    print(plan)
    print("*"*50)

plan_title: Improving Decision-Making Independence
plan_description: This plan will help you become more confident and independent in making decisions.
number_of_steps: 4
steps: Step 1: Identify your decision-making style.
Step 2: Start with small decisions and gradually increase their complexity.
Step 3: Seek advice and gather information, but make the final decision on your own.
Step 4: Reflect on your decisions and learn from them for future growth.
**************************************************
plan_title: Enhancing Emotional Resilience and Stability
plan_description: This plan is designed to help you enhance your emotional resilience and stability. By following these actionable steps, you can develop strategies to cope with stress, manage your emotions, and build emotional well-being.
number_of_steps: 5
steps: Step 1: Practice mindfulness and self-reflection daily to become more aware of your emotions and thoughts. This can involve meditation, journaling, or simply taking a fe

#### Prompt #4: Generate Personality Traits for the User

In [14]:
## Getting subset of personality based questions
labeled_questions = load_data("labeled_questions.pkl")

def filter_questions(questions, answer_type):
    return [question for question in questions if question['answer_type'] == answer_type]

slider_type = "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)"
slider_questions = filter_questions(labeled_questions, slider_type)

print(f"Number of slider questions: {len(slider_questions)}")
print("Example:")
pretty_print(slider_questions[0])



Number of slider questions: 91
Example:
{
    "text": "I care what people close to me think is best",
    "answer_type": "an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)",
    "label": "Relationships",
    "label_explain": "This question is asking about the importance placed on the opinions of people who are close to the person answering the question. It relates to the dynamics and dynamics and influence in their relationships.",
    "primary_use": "Assessing the importance placed on the opinions of people close to the user",
    "secondary_use": "Understanding the user's decision-making process",
    "tertiary_use": "Evaluating the user's level of independence in decision-making",
    "primary_explain": "This question is asking about the extent to which the user values the opinions of people who are close to them. It provides insights into the importance the user gives to the perspectives and input of their

In [28]:
user_answers['kbrait']

{'I sympathize with others’ feelings.': '3, (YesYesYes)',
 'I can be pushy.': '2, (YesYes)',
 'I follow a schedule.': '-2, (NoNo)',
 'I leave my belongings around.': '4, (YesYesYesYes)',
 'I often act on the spur of the moment.': '5, (YesYesYesYesYes)',
 'I am always prepared.': '2, (YesYes)',
 'I often forget to put things back in their proper place.': '0, (Indifferent)',
 'I have trouble wrapping up the final details of a project once the challenging parts are done.': '5, (YesYesYesYesYes)',
 'My social life is important to me': '3, (YesYesYes)',
 "I like to talk with others when I'm trying to understand new things.": '4, (YesYesYesYes)',
 'I talk to a lot of different people at social gatherings.': '2, (YesYes)',
 'I like being the center of attention.': '2, (YesYes)',
 'I feel I can be my most authentic self.': '3, (YesYesYes)',
 'I am relaxed most of the time.': '1, (Yes)',
 'I get stressed out easily.': '-3, (NoNoNo)',
 'I change my mood a lot.': '1, (Yes)',
 'I see myself as som

#### Personality Trait Generation

In [37]:
def gpt_personality_trait_generator(user_answers, subset_labeled_questions):
    
    gpt_system_message = """
                        You are going to generate a set of personality traits for a user based on their answers to questions.
                        You must generate five "Big5" personality traits for the user. The five traits are: Openness, Conscientiousness,
                        Extraversion, Agreeableness, and Neuroticism. And you must score these on a scale from 0 to 100.

                        Then you must generate a list of ten additional personality traits that you think the user may have.

                        Finally, you must generate a list of ten personality traits that you think the user does not have.

                        You must use the following template in your response:
                        big5_traits: <big5_traits>
                        personality_traits_user_has: <personality_traits>
                        personality_traits_user_does_not_have: <personality_traits>

                        The context of this generation is that an individual user has answered a set of questions and you are
                        generating personality traits that they may have based on their answers. These traits will be able to be
                        selected or deselected by a user looking to build their in app profile. 

                        Additionally, some of the questions may not be applicable to identifying personality traits, please
                        ignore these questions when generating personality traits based on the question text and answer.
                        """
    
    prompt = "The user's answers to the questions are as follows: \n"
    for question in user_answers:
        if question in [question['text'] for question in subset_labeled_questions]:
            prompt += f"Question: {question}\nAnswer: {user_answers[question]}\n"

    completion = client.chat.completions.create(
            messages=[
                {
                    "role":"system",
                    "content":gpt_system_message
                },
                {
                    "role":"user",
                    "content":prompt
                }
            ],
            model="gpt-3.5-turbo"
        )

    # Extract the generated text from the response
    generated_text = completion.choices[0].message.content.strip('\n')

    # Print usage cost
    cost = calculate_usage_cost(
            "gpt-3.5-turbo",
            chat_completion.usage.completion_tokens,
            chat_completion.usage.prompt_tokens
        )

    return generated_text, cost

In [86]:
personality_traits_response, cost = gpt_personality_trait_generator(user_answers['kbrait'], slider_questions)

In [87]:
print_cost(cost); print(personality_traits_response)

Total Cost: $0.00002200
Number of Runs to Reach $1: 45455
**************************************************
big5_traits: 
- Openness: 85
- Conscientiousness: 60
- Extraversion: 70
- Agreeableness: 75
- Neuroticism: 40

personality_traits_user_has: 
- Empathetic
- Organized
- Social
- Adventurous
- Confident
- Imaginative
- Artistic
- Ambitious
- Assertive
- Resilient

personality_traits_user_does_not_have: 
- Reserved
- Introverted
- Timid
- Careless
- Insecure
- Unadventurous
- Unimaginative
- Uncaring
- Indifferent
- Impulsive


#### Prompt #5: Generate Habits (Good & Bad) for the User

In [None]:
def gpt_habits_generator(user_answers, subset_labeled_questions):
    pass

---
## Third Pipeline: Generate Personalized Content Recommendations for User

#### Prompt #6: Generate Recommendations for the User

# Leftovers from the original attempts at this project

In [None]:
import json

def get_question_samples():
    data = [
        {
            # Open-ended question
            "id": 1,
            "text": "What program are you currently enrolled in?",
            "answer_type": "Text, open-ended question",
            "state": "Activated"
        },
        {
            # Open-ended question
            "id": 2,
            "text": "What country were you born in?",
            "answer_type": "Text, open-ended question",
            "state": "Activated"
        },
        {
            # Open-ended question
            "id": 3,
            "text": "Is there anything else you'd like to tell us that may help us help you?",
            "answer_type": "Text, open-ended question",
            "state": "Activated"
        },
        {
            # Financial_Worry - Answers: YesYesYesYesYes ... Indifferent ... NoNoNoNoNo
            # 11-pt slider: -5 = No, 0 = Indifferent, +5 = Yes
            "id": 4,
            "text": "I worry about my finances.",
            "answer_type": "A slider displaying text values",
            "state": "Activated"
        },
        {
            # Physical_Health - Answers: YesYesYesYesYes ... Indifferent ... NoNoNoNoNo
            # 11-pt slider: -5 = No, 0 = Indifferent, +5 = Yes
            "id": 5,
            "text": "I worry about my physical health.",
            "answer_type": "A slider displaying text values",
            "state": "Activated"
        },
        {
            # University_Overwhelmed - Answers: YesYesYesYesYes ... Indifferent ... NoNoNoNoNo
            # 11-pt slider: -5 = No, 0 = Indifferent, +5 = Yes
            "id": 6,
            "text": "I feel overwhelmed by university life.",
            "answer_type": "A slider displaying text values",
            "state": "Activated"
        },
        {
            # University_Satisfaction - Answers: YesYesYesYesYes ... Indifferent ... NoNoNoNoNo
            # 11-pt slider: -5 = No, 0 = Indifferent, +5 = Yes
            "id": 7,
            "text": "I am satisfied with the support I have been offered by the university.",
            "answer_type": "A slider displaying text values",
            "state": "Activated"
        },
        {
            # University_Demographic - Answers: 1st, 2nd, 3rd, 4th, 5th or higher, Graduate program
            "id": 8,
            "text": "What year of university are you in?",
            "answer_type": "Multiple choices",
            "state": "Activated"
        },
        {
            # Demographic - Answers: Both did, One did, I don't know, No
            "id": 9,
            "text": "Did either of your parents (including step parents/guardian) complete university?",
            "answer_type": "Multiple choices",
            "state": "Activated"
        },
        {
            # Big5 personality - Answers: High Conscientiousness (+5) vs. Low Conscientiousness (-5)
            "id": 10,
            "text": "I follow a schedule.",
            "answer_type": "A slider displaying text values",
            "state": "Activated"
        }
    ]
    return json.dumps(data)

def get_plan_samples():
    data = [
        {
            # Co-Pilot generated these steps, and I really like this, it's simply, straightforward, less wordy than the
            # actual in-app version and it's more actionable.
            "id": 838,
            "title": "How to turn acquaintances into real friends",
            "description": "Turning acquaintances into real friends can be a great way to build a support system and make your college experience more enjoyable. Here are some tips to help you turn acquaintances into real friends:",
            "state": "Approved",
            "steps": [
                {
                    "id": 1,
                    "text": "Invite them to an event you are going to",
                    "state": "Approved"
                },
                {
                    "id": 2,
                    "text": "Ask them to study with you",
                    "state": "Approved"
                },
                {
                    "id": 3,
                    "text": "Ask them to grab a meal with you",
                    "state": "Approved"
                },
                {
                    "id": 4,
                    "text": "Ask them to join you in an activity you enjoy",
                    "state": "Approved"
                },
                {
                    "id": 5,
                    "text": "Ask them to join you in an activity they enjoy",
                    "state": "Approved"
                },
                {
                    "id": 6,
                    "text": "Ask them to join you in an activity you both enjoy",
                    "state": "Approved"
                }
            ]
        },
    ]
    return json.dumps(data)


In [None]:
import pandas as pd

question_data = get_questionnaire_questions()

# Convert JSON string to Python object
questions = json.loads(question_data)

# Extract id, text, and answer_type fields for each question
data = []
for question in questions:
    if question['state'] == 'Activated':
        data.append({'id': question['id'], 'text': question['text'], 'answer_type': question['answer_type']})

# Create pandas dataframe from extracted data
df = pd.DataFrame(data)

# Print dataframe
print("The ids are wrong, the text and answer_type is copied from the Django admin panel for now.")
display(df)


The ids are wrong, the text and answer_type is copied from the Django admin panel for now.


Unnamed: 0,id,text,answer_type
0,1,What program are you currently enrolled in?,"Text, open-ended question"
1,2,What country were you born in?,"Text, open-ended question"
2,3,Is there anything else you'd like to tell us t...,"Text, open-ended question"
3,4,I worry about my finances.,A slider displaying text values
4,5,I worry about my physical health.,A slider displaying text values
5,6,I feel overwhelmed by university life.,A slider displaying text values
6,7,I am satisfied with the support I have been of...,A slider displaying text values
7,8,What year of university are you in?,Multiple choices
8,9,Did either of your parents (including step par...,Multiple choices
9,10,I follow a schedule.,A slider displaying text values


In [None]:
# THIS CELL CALLS THE OPENAI API TO GENERATE QUESTION GROUPS FOR EACH QUESTION
# DONT RUN UNLESS YOU WANT TO SPEND MONEY ON OPENAI API CALLS
# PRICE PER RUN OF CELL: ~$0.02 

import pandas as pd
import openai
from dotenv import load_dotenv
import os

load_dotenv(".env")
openai.api_key = os.environ["OPENAI_API_KEY"]

# Get the first question from the dataframe
question_texts = df['text'].tolist()

# This function generates a list of question_groups for a single question
# This function uses the "text-davinci-002" engine
def generate_question_groups_for_single_question(question_text):
    # Define the prompt
    prompt = f"Given the answer to the question '{question_text}', what can we infer about the user? \
        Please provide a list of potential 'question_groups' that define what the answer to this question tells us about the user. \
        Please aim to keep each question_group to a maximum of 3 to 5 words. \
        Separate each question_group with a semicolon delimiter for parsing purposes."
    
    # Call the OpenAI API to generate a response
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=1024,
        n=1,
        stop=None,
        temperature=0.5,
    )
    
    # Extract the generated text from the response
    generated_text = response.choices[0].text.strip()
    
    # Split the generated text into a list of question groups
    question_groups = generated_text.split(';')
    
    # Strip whitespace from each question group
    question_groups = [group.strip() for group in question_groups]
    
    return question_groups

# Loop through each question text and generate a list of question groups for each one
question_group_dict = {}
for question_text in question_texts:
    question_groups = generate_question_groups_for_single_question(question_text)
    question_group_dict[question_text] = question_groups

# Create pandas dataframe from extracted data
df_question_groups = pd.DataFrame(question_group_dict.items(), columns=['Question Text', 'Question Groups'])

# Print dataframe
display(df_question_groups)


Unnamed: 0,Question Text,Question Groups
0,What program are you currently enrolled in?,[The user is currently enrolled in a program.\...
1,What country were you born in?,[The user was born in a specific country.\n\nT...
2,Is there anything else you'd like to tell us t...,"[The user may be experiencing anxiety, The use..."
3,I worry about my finances.,[The user is worried about their finances.\n\n...
4,I worry about my physical health.,[The user is concerned about their physical he...
5,I feel overwhelmed by university life.,[The user is experiencing negative emotions re...
6,I am satisfied with the support I have been of...,[The user is satisfied with the support they h...
7,What year of university are you in?,[The user is in their first year of university...
8,Did either of your parents (including step par...,[The user's parents' level of education]
9,I follow a schedule.,[The user is:\n\n-Disciplined\n-Punctual\n-Org...


In [None]:
# Display Question Groups

for index, row in df_question_groups.iterrows():
    print(f"Question Text: {row['Question Text']}")
    print(f"Question Groups: {row['Question Groups']}")
    print("\n")

Question Text: What program are you currently enrolled in?
Question Groups: ["The user is currently enrolled in a program.\n\nThis question tells us about the user's current enrollment status."]


Question Text: What country were you born in?
Question Groups: ['The user was born in a specific country.\n\nThis tells us that the user is from a specific country.']


Question Text: Is there anything else you'd like to tell us that may help us help you?
Question Groups: ['The user may be experiencing anxiety', 'The user may be experiencing depression', 'The user may be experiencing stress']


Question Text: I worry about my finances.
Question Groups: ["The user is worried about their finances.\n\nThis question tells us that the user is worried about their finances. question_groups about the user's finances", "about the user's worries"]


Question Text: I worry about my physical health.
Question Groups: ['The user is concerned about their physical health.', 'The user is worried about their p

In [None]:
## COST TO RUN: ~$0.03

# What if we give the prompt more context? i.e. answer_type

answer_types = df['answer_type'].tolist()

# This function generates a list of question_groups for a single question
def generate_question_groups_for_single_question_answer_type(question_text, answer_type):
    real_answer_type = ''
    if answer_type == 'Text, open-ended question':
        real_answer_type = 'open-ended, question_text is representative of how the user will answer'
    elif answer_type == 'A slider displaying text values':
        real_answer_type = 'an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)'
    elif answer_type == 'Multiple choices':
        real_answer_type = 'multiple choice, question_text is representative of the choices, typically provides demographic information'

    # Better prompt?
    prompt = f"Given the answer to the question '{question_text}', what can we infer about the user? \
        The answer type is '{real_answer_type}'. Answer type is important because it tells us how the user will answer the question. \
        Please provide at least three different question groups describing what this question relates to in terms of the user. \
        Please aim to keep each question_group to a maximum of 3 to 5 words. \
        Separate each question_group with a semicolon delimiter for parsing purposes."
    
    # Call the OpenAI API to generate a response
    response = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=1024,
        n=1,
        stop=None,
        temperature=0.5,
    )
    
    # Extract the generated text from the response
    generated_text = response.choices[0].text.strip()
    
    # Split the generated text into a list of question groups
    question_groups = generated_text.split(';')
    
    # Strip whitespace from each question group
    question_groups = [group.strip() for group in question_groups]
    
    return question_groups

# Loop through each question text and generate a list of question groups for each one
# Loop through each question text and answer type and generate a list of question groups for each one
question_group_dict = {}
for question_text, answer_type in zip(question_texts, answer_types):
    question_groups = generate_question_groups_for_single_question_answer_type(question_text, answer_type)
    question_group_dict[question_text] = question_groups

# Create pandas dataframe from extracted data
df_question_groups = pd.DataFrame(question_group_dict.items(), columns=['Question Text', 'Question Groups'])

# Print dataframe
display(df_question_groups)

Unnamed: 0,Question Text,Question Groups
0,What program are you currently enrolled in?,"[current program, enrolled program, current en..."
1,What country were you born in?,[The user was born in a country.\n\nThis quest...
2,Is there anything else you'd like to tell us t...,[The user is willing to share information that...
3,I worry about my finances.,[-How the user feels about their current finan...
4,I worry about my physical health.,"[-Health and fitness, -Diet and nutrition, -Ex..."
5,I feel overwhelmed by university life.,[- How is the user feeling?\n- How is the user...
6,I am satisfied with the support I have been of...,"[satisfaction, support, university]"
7,What year of university are you in?,"[Demographics, University, Years.]"
8,Did either of your parents (including step par...,"[Family background, Education level, Parental ..."
9,I follow a schedule.,"[The user is reliable, The user is punctual, T..."


In [None]:
# Display Question Groups

for index, row in df_question_groups.iterrows():
    print(f"Question Text: {row['Question Text']}")
    print(f"Question Groups: {row['Question Groups']}")
    print("\n")

Question Text: What program are you currently enrolled in?
Question Groups: ['current program', 'enrolled program', 'current enrollment']


Question Text: What country were you born in?
Question Groups: ["The user was born in a country.\n\nThis question relates to:\n-The user's country of origin\n-The user's place of birth\n-The user's nationality"]


Question Text: Is there anything else you'd like to tell us that may help us help you?
Question Groups: ["The user is willing to share information that may help the company help them.\n\nThis question relates to:\n\n-Customer service\n-The user's needs\n-The company's ability to help the user"]


Question Text: I worry about my finances.
Question Groups: ['-How the user feels about their current financial situation\n-How the user feels about their ability to save money\n-How the user feels about their ability to spend money wisely']


Question Text: I worry about my physical health.
Question Groups: ['-Health and fitness', '-Diet and nutr

In [None]:
## EXPENSIVE CELL
## ALL THE SAME BUT WE USE GPT-3.5-TURBO INSTEAD OF TEXT-DAVINCI-002

# This function generates a list of question_groups for a single question
def generate_question_groups_for_single_question_answer_type(question_text, answer_type):
    real_answer_type = ''
    if answer_type == 'Text, open-ended question':
        real_answer_type = 'open-ended, question_text is representative of how the user will answer'
    elif answer_type == 'A slider displaying text values':
        real_answer_type = 'an 11-pt slider where 0-4 is negative (disagreement with statement), 5 is neutral, 6-10 is positive (agreement with statement)'
    elif answer_type == 'Multiple choices':
        real_answer_type = 'multiple choice, question_text is representative of the choices, typically provides demographic information'

    # Better prompt?
    prompt = f"Given the answer to the question '{question_text}', what can we infer about the user? \
        The answer type is '{real_answer_type}'. Answer type is important because it tells us how the user will answer the question. \
        Please provide at least three different question groups describing what this question relates to in terms of the user. \
        Please aim to keep each question_group to a maximum of 3 to 5 words. \
        Separate each question_group with a semicolon delimiter for parsing purposes."
    
    # Call the OpenAI API to generate a response
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role":"system",
                "content":"you are a question_group generator, you don't return excess information, just the question groups"
            },
            {
                "role":"user",
                "content":prompt
            }
        ]
    )
    
    # Extract the generated text from the response
    generated_text = response.choices[0].message.content.strip()

    # Print token usage
    prompt_tokens = response.usage.prompt_tokens
    completion_tokens = response.usage.completion_tokens
    total_tokens = response.usage.total_tokens

    print(f"Prompt tokens: {prompt_tokens}\n")
    print(f"Completion tokens: {completion_tokens}\n")
    print(f"Total tokens: {total_tokens}\n")
    
    # Split the generated text into a list of question groups
    question_groups = generated_text.split(';')
    
    # Strip whitespace from each question group
    question_groups = [group.strip() for group in question_groups]
    
    return question_groups

# Loop through each question text and generate a list of question groups for each one
# Loop through each question text and answer type and generate a list of question groups for each one
question_group_dict = {}
for question_text, answer_type in zip(question_texts, answer_types):
    question_groups = generate_question_groups_for_single_question_answer_type(question_text, answer_type)
    question_group_dict[question_text] = question_groups

# Create pandas dataframe from extracted data
df_question_groups = pd.DataFrame(question_group_dict.items(), columns=['Question Text', 'Question Groups'])

# Print dataframe
display(df_question_groups)

Prompt tokens: 142

Completion tokens: 60

Total tokens: 202

Prompt tokens: 141

Completion tokens: 9

Total tokens: 150

Prompt tokens: 151

Completion tokens: 140

Total tokens: 291

Prompt tokens: 163

Completion tokens: 27

Total tokens: 190

Prompt tokens: 164

Completion tokens: 20

Total tokens: 184

Prompt tokens: 164

Completion tokens: 168

Total tokens: 332

Prompt tokens: 171

Completion tokens: 27

Total tokens: 198

Prompt tokens: 144

Completion tokens: 30

Total tokens: 174

Prompt tokens: 152

Completion tokens: 17

Total tokens: 169

Prompt tokens: 162

Completion tokens: 23

Total tokens: 185



Unnamed: 0,Question Text,Question Groups
0,What program are you currently enrolled in?,[User Background:\n- Field of Study in Program...
1,What country were you born in?,"[Nationality, place of birth, cultural backgro..."
2,Is there anything else you'd like to tell us t...,[1. User's additional needs: \n- What specific...
3,I worry about my finances.,"[Question Group 1: Financial Stability, Questi..."
4,I worry about my physical health.,"[Level of concern for physical health, Attitud..."
5,I feel overwhelmed by university life.,[Question Group 1: Emotional Well-being\n- How...
6,I am satisfied with the support I have been of...,[User satisfaction with university support reg...
7,What year of university are you in?,[Question Group 1: User's educational level\nQ...
8,Did either of your parents (including step par...,"[Parent's educational background, User's famil..."
9,I follow a schedule.,"[How important is sticking to a schedule?, Are..."


In [None]:
# Display Question Groups

for index, row in df_question_groups.iterrows():
    print(f"Question Text: {row['Question Text']}")
    print(f"Question Groups: {row['Question Groups']}")
    print("\n")

Question Text: What program are you currently enrolled in?
Question Groups: ['User Background:\n- Field of Study in Program\n- Level of Education\n- Institution Attending\n\nUser Interests:\n- Career Goals\n- Passions and Hobbies\n- Future Aspirations\n\nUser Experience:\n- Duration in Program\n- Satisfaction with Program\n- Challenges Faced in Program']


Question Text: What country were you born in?
Question Groups: ['Nationality', 'place of birth', 'cultural background']


Question Text: Is there anything else you'd like to tell us that may help us help you?
Question Groups: ["1. User's additional needs: \n- What specific assistance would you like to request?\n- Are there any specific resources or tools you require?\n- Could you provide more details on how we can help you?\n\n2. User's feedback or suggestions: \n- Is there anything you would like to suggest to enhance our service?\n- Do you have any feedback on our current offerings?\n- Are there any improvements you would like to s