# Generate Synthetic User Preferences

In [4]:
from openai import OpenAI
import os
import pandas as pd
from dotenv import load_dotenv
load_dotenv()

True

In [5]:
client = OpenAI(
    # Read the key from the API_KEY variable loaded from .env
    # (the client's own default is OPENAI_API_KEY, so this argument is required here)
    api_key=os.getenv('API_KEY')
)
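
If `API_KEY` is missing from the environment, `os.getenv` returns `None` and the failure only surfaces later. A minimal sketch of an early check follows; the helper name `require_api_key` is hypothetical and not part of the notebook or of the `openai` package.

```python
import os


def require_api_key(var_name: str = "API_KEY") -> str:
    """Return the value of `var_name`, or raise a clear error if it is unset or empty."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set; add it to your .env file."
        )
    return key
```

The result could then be passed as `api_key=require_api_key()` when constructing the client.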

In [None]:
N = 75

prompt = f'''
Imagine you're a student exploring master's programs in Germany. Write a variety of natural-sounding statements that reflect your interests, aspirations, and potential constraints. Each statement must include at least one of the following: a specific program name (e.g., 'Computer Science'), a general field of interest (e.g., 'engineering'), or a degree type (e.g., 'Master of Science'), and ideally more than one. However, allow for ambiguity in any of these details to reflect openness to various options.

User Preferences:
- Program Name: Include both specific programs (e.g., Computer Science) and general fields of interest (e.g., Engineering, Business). Allow for ambiguity by sometimes mentioning only the general area without specifying the exact program.
- Location: Highlight a mix of prominent cities like Berlin, Hamburg, and Munich, as well as mid-sized university hubs like Freiburg or Göttingen. Encourage expressions of openness to multiple locations.
- Degree Type: Include common options such as Master of Science, Master of Arts, and Master of Education, along with unique formats like modular programs, international courses, or accelerated degrees.
- Language: Include programs taught in English and German, with occasional mentions of dual-language offerings or those tailored for non-native speakers.
- Subject: Range from foundational areas (e.g., Engineering and Management) to interdisciplinary or emerging fields (e.g., Digital Transformation, Business Mathematics).
- Study Mode: Include a balance of full-time, part-time, distance learning, and international study formats.

Guidelines:
Balance Preferences:
- Ensure 75-80% of statements reflect popular preferences across program names, degrees, languages, and study modes.
- Include 20-25% with emerging or niche preferences for variety without overemphasis.
Introduce Ambiguity:
- Encourage statements to express flexibility or openness in any aspect, including program name.
- Allow some details to be unspecified or general to leave room for interpretation.
Clarity and Simplicity:
- Keep statements clear and concise, avoiding overly long sentences or excessive detail.
Incorporate Aspirations:
- Occasionally include career or academic goals, such as leadership opportunities, research prospects, or industry connections.
Address Practical Constraints:
- Reflect real-life considerations like affordability, work-life balance, or accessibility for international students.
Diversity in Study Modes:
- Ensure proportional inclusion of full-time, part-time, modular, and distance learning programs.
Unique Degree Formats:
- Highlight distinctive programs, such as accelerated degrees or industry-sponsored courses, where relevant.
Global Relevance:
- Mention international focus or globally recognized accreditations when appropriate.
Personalization:
- Make statements authentic and realistic by incorporating personal factors like financial considerations or lifestyle preferences.
Task:
Generate {N} statements that align with these guidelines. Ensure diversity, clarity, and relatability in the statements, focusing on popular and emerging options. Introduce ambiguity by expressing openness or flexibility in any preferences, including program names, by sometimes using general fields of interest or unspecified areas. Avoid overly niche or obscure topics while maintaining a realistic and personalized tone.
Output should be without opening and closing statements and without numbering.
'''
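
The prompt asks for a 75-80% popular and 20-25% niche split. As a rough sanity check, the sketch below converts those percentage ranges into whole-statement counts for a given `N`. The helper `share_bounds` is hypothetical and uses integer arithmetic to avoid floating-point rounding.

```python
def share_bounds(n: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Smallest and largest whole counts whose share of n lies within [low_pct, high_pct] percent."""
    lowest = -(-n * low_pct // 100)   # ceiling division
    highest = n * high_pct // 100     # floor division
    return lowest, highest


popular = share_bounds(75, 75, 80)  # (57, 60)
niche = share_bounds(75, 20, 25)    # (15, 18)
```

For `N = 75` this gives 57 to 60 popular statements and 15 to 18 niche ones. The model is not guaranteed to hit these ranges, so they are best treated as a target when reviewing the output.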


In [7]:
response = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant that generates synthetic user preferences for master's programs."
        },
        {
            "role": "user",
            "content": prompt
        }
    ],
    model="gpt-4o-mini",
)
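
A request for 75 statements can run into the model's output token limit, in which case the reply is cut off mid-list. The sketch below shows one way to check for that before parsing. It assumes the standard Chat Completions response shape (`choices[0].finish_reason` and `choices[0].message.content`); the helper name `extract_text` is hypothetical.

```python
def extract_text(response) -> str:
    """Return the reply text, raising if the reply was truncated or empty."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # The model stopped at the token limit, so the list is probably incomplete.
        raise ValueError("Response was cut off at the token limit.")
    if choice.message.content is None:
        raise ValueError("Response contains no text content.")
    return choice.message.content
```

Calling `extract_text(response)` in place of reading `response.choices[0].message.content` directly would make a truncated reply fail loudly instead of silently producing fewer rows.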

In [None]:
output_filepath = '../user_preferences.csv'

# The message content can be None (e.g. on a refusal), so fall back to an empty string
output = response.choices[0].message.content or ''

# Keep one statement per line, dropping blank and whitespace-only lines
synthetic_preferences = [line.strip() for line in output.splitlines() if line.strip()]

df = pd.DataFrame(synthetic_preferences, columns=['preference'])

# Append to the CSV, writing the header only when the file does not exist yet
write_header = not os.path.exists(output_filepath)
df.to_csv(output_filepath, mode='a', header=write_header, index=False)
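
Although the prompt asks for output without numbering, models sometimes add list markers or repeat a statement anyway. The following is a minimal sketch of a stricter cleanup step, assuming one statement per line. The helper `clean_statements` and its regular expression are illustrative and not part of the original notebook.

```python
import re

# Leading bullet ("- ", "* ", "• ") or numbering ("1. ", "2) ") at the start of a line
_PREFIX = re.compile(r"^\s*(?:[-*•]\s+|\d+[.)]\s+)")


def clean_statements(raw: str) -> list[str]:
    """Split raw model output into unique, marker-free statements, preserving order."""
    seen = set()
    cleaned = []
    for line in raw.splitlines():
        text = _PREFIX.sub("", line).strip().strip('"')
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

The returned list could be passed to `pd.DataFrame(..., columns=['preference'])` in the same way as `synthetic_preferences`. Note that this only removes duplicates within a single response, not across repeated runs that append to the same CSV.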