We ran experiments to predict mental health condition severity with XGBoost, Multi-Layer Perceptron, and CatBoost classifiers, and acknowledge that age, genre preferences, and genre listening habits are not sufficient to diagnose mental health disorders. Moreover, the models did not perform well enough to be used in a critical application such as mental disorder treatment.

Although we used mental disorders as the target variable, our research was meant to investigate how genre preferences can impact mental health condition severity. To apply our research to a real-world use case that is less critical than medical diagnosis, we have built a tool that suggests 3 genres for a user to listen to more frequently and 3 to avoid, in order to minimise the model's predicted probability of high condition severity.

The tool uses functions defined in the custom module 'FindingBestProfile2'. It takes in the user's listening profile and other features (excluding self-reported mental health condition severity), as well as the condition they wish to improve. It uses the optuna library to run a study that applies Bayesian optimisation, varying the genre listening frequency values across iterations while keeping the other user data, such as age and listening habits, fixed.

Here, we demonstrate the optimiser tool on a random user from the original dataset to minimise their Depression severity. As the best-performing model was the Depression XGBoost model, we use it in this demonstration.
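
Because the user is drawn at random, rerunning the notebook will generally select a different profile. If a repeatable demonstration is wanted, one option is to draw the row position from a seeded generator. The helper below is a sketch under that assumption; `pick_row_index`, the seed value, and the row count of 500 are illustrative and not part of the original notebook.

```python
import random


def pick_row_index(n_rows, seed=None):
    """Return a row position in [0, n_rows - 1]; the same seed gives the same pick."""
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    rng = random.Random(seed)  # local generator, global random state is untouched
    return rng.randrange(n_rows)


# Two draws with the same seed select the same row position.
first = pick_row_index(500, seed=42)
second = pick_row_index(500, seed=42)
print(first == second)  # True
```

With the dataset loaded as `df`, this could be used as `df.iloc[pick_row_index(len(df), seed=42)]`.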

In [16]:
# Imports
import FindingBestProfile2 as fbp
import pandas as pd
import joblib
import os
import random


In [17]:
# Load the dataset
current_dir = os.getcwd()
data_path = os.path.join(os.path.dirname(current_dir), 'Data', 'mxmh_survey_results_processed.csv')
df = pd.read_csv(data_path)

In [18]:
# Resolve model file paths relative to the project root (parent of current working dir)
project_root = os.path.dirname(current_dir)
model_paths = {
    'anxiety': os.path.join(project_root, 'models', 'xgboost_anxiety_model.pkl'),
    'depression': os.path.join(project_root, 'models', 'xgboost_depression_model.pkl'),
    'insomnia': os.path.join(project_root, 'models', 'xgboost_insomnia_model.pkl'),
    'ocd': os.path.join(project_root, 'models', 'xgboost_ocd_model.pkl'),
}

# Display the keys of the model paths dictionary
print("Paths for:", model_paths.keys())

Paths for: dict_keys(['anxiety', 'depression', 'insomnia', 'ocd'])


In [19]:
# Initialise an object with a random user profile from the dataset for testing
random_user_profile = df.iloc[random.randint(0, len(df) - 1)]

# Build a feature vector compatible with the trained models.
# The training code used df_fe1 with all columns except those ending with '_class'
exclude_cols = [c for c in df.columns if c.endswith('_class')]
feature_cols = df.drop(columns=exclude_cols).columns

# Extract those features from the selected profile and convert to a 1-row DataFrame
features = random_user_profile[feature_cols].to_frame().T

# Display the features to verify
print("Selected profile index:", random_user_profile.name)
print("Feature columns used:", list(feature_cols))
features

Selected profile index: 181
Feature columns used: ['Hours per day', 'While working', 'Instrumentalist', 'Composer', 'Foreign languages', 'Frequency [Classical]', 'Frequency [Country]', 'Frequency [EDM]', 'Frequency [Folk]', 'Frequency [Hip hop]', 'Frequency [Jazz]', 'Frequency [K pop]', 'Frequency [Lofi]', 'Frequency [Metal]', 'Frequency [Pop]', 'Frequency [R&B]', 'Frequency [Rap]', 'Frequency [Rock]', 'Frequency [Video game music]', 'AgeGroup_<18', 'AgeGroup_18-24', 'AgeGroup_25-34', 'AgeGroup_35-44', 'AgeGroup_45-54', 'AgeGroup_55-64', 'AgeGroup_65+']


Unnamed: 0,Hours per day,While working,Instrumentalist,Composer,Foreign languages,Frequency [Classical],Frequency [Country],Frequency [EDM],Frequency [Folk],Frequency [Hip hop],...,Frequency [Rap],Frequency [Rock],Frequency [Video game music],AgeGroup_<18,AgeGroup_18-24,AgeGroup_25-34,AgeGroup_35-44,AgeGroup_45-54,AgeGroup_55-64,AgeGroup_65+
181,2.5,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,2.0,...,2.0,2.0,2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
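
Since the feature vector has to match what the models were trained on, a quick comparison of column names can catch a mismatch before prediction. The following is only a sketch: `check_feature_alignment` is a hypothetical helper, the example column lists are made up, and where the expected list comes from (for instance, a list saved at training time) is an assumption.

```python
def check_feature_alignment(expected, actual):
    """Compare two ordered lists of column names and report the differences."""
    expected, actual = list(expected), list(actual)
    missing = [c for c in expected if c not in actual]      # needed but absent
    unexpected = [c for c in actual if c not in expected]   # present but unknown
    same_order = expected == actual                         # exact match, in order
    return {"missing": missing, "unexpected": unexpected, "same_order": same_order}


# Made-up example: one leftover target column and a different column order.
expected_cols = ["Hours per day", "While working", "Frequency [Rock]"]
actual_cols = ["While working", "Hours per day", "Frequency [Rock]", "Anxiety_class"]
report = check_feature_alignment(expected_cols, actual_cols)
print(report)
```

An empty `missing` list, an empty `unexpected` list, and `same_order` equal to `True` would indicate that the two lists agree exactly.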


In [20]:
# Load the XGBoost models and store in a dictionary
loaded_model_dict = {}

# Load each model from its path
for cond, path in model_paths.items():
    full_path = os.path.abspath(path)
    loaded_model_dict[cond] = joblib.load(full_path)
    print(f"Loaded model for {cond}: {full_path}")

Loaded model for anxiety: c:\Users\hanna\Documents\DS3000\DS3000-Group-4\models\xgboost_anxiety_model.pkl
Loaded model for depression: c:\Users\hanna\Documents\DS3000\DS3000-Group-4\models\xgboost_depression_model.pkl
Loaded model for insomnia: c:\Users\hanna\Documents\DS3000\DS3000-Group-4\models\xgboost_insomnia_model.pkl
Loaded model for ocd: c:\Users\hanna\Documents\DS3000\DS3000-Group-4\models\xgboost_ocd_model.pkl


In [24]:
# Optimise music genre profile for depression condition
best_profile, try_genres, avoid_genres = fbp.optimise_genre_profile(
    user_profile=features.iloc[0],
    condition='depression',
    model_dict=loaded_model_dict,
    n_trials=100
)

[I 2025-12-01 18:56:27,993] A new study created in memory with name: no-name-02fe69f3-c791-4d82-834e-e69019e4cbbc
[I 2025-12-01 18:56:28,003] Trial 0 finished with value: 0.4165581464767456 and parameters: {'max_genre': 'Frequency [Jazz]', 'Frequency [Classical]': 3.9604944139722003, 'Frequency [Country]': 3.6157416696288416, 'Frequency [EDM]': 1.0636277055314431, 'Frequency [Folk]': 1.609381127994108, 'Frequency [Hip hop]': 2.0561530383371878, 'Frequency [K pop]': 3.8444525118630213, 'Frequency [Lofi]': 2.5383622084580413, 'Frequency [Metal]': 2.826148313736721, 'Frequency [Pop]': 1.2965502105815803, 'Frequency [R&B]': 1.3069454598033177, 'Frequency [Rap]': 2.9916070595265625, 'Frequency [Rock]': 2.0291621903077295, 'Frequency [Video game music]': 2.0721739785714433}. Best is trial 0 with value: 0.4165581464767456.
[I 2025-12-01 18:56:28,009] Trial 1 finished with value: 0.24556657671928406 and parameters: {'max_genre': 'Frequency [Classical]', 'Frequency [Country]': 1.890563681031892

In [25]:
# Display the results
print("Try genres:", try_genres)
print("Avoid genres:", avoid_genres)

Try genres: ['Frequency [Hip hop]', 'Frequency [Classical]', 'Frequency [K pop]']
Avoid genres: ['Frequency [Rock]', 'Frequency [Folk]', 'Frequency [Lofi]']
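
The lists above use the raw column names. How 'FindingBestProfile2' derives them is not shown in this notebook; one plausible approach, sketched below purely as an assumption, is to compare the optimised frequencies with the user's current ones, take the largest increases as genres to try and the largest decreases as genres to avoid, and strip the `Frequency [...]` wrapper for display. The function names and the example numbers are hypothetical.

```python
PREFIX = "Frequency ["


def genre_label(column):
    """Turn 'Frequency [Hip hop]' into 'Hip hop'; return other names unchanged."""
    if column.startswith(PREFIX) and column.endswith("]"):
        return column[len(PREFIX):-1]
    return column


def suggest_genres(current, optimised, k=3):
    """Return up to k genres to try (largest increases) and to avoid (largest decreases)."""
    deltas = {g: optimised[g] - current[g] for g in optimised if g in current}
    increases = sorted((g for g in deltas if deltas[g] > 0),
                       key=lambda g: deltas[g], reverse=True)
    decreases = sorted((g for g in deltas if deltas[g] < 0),
                       key=lambda g: deltas[g])
    return ([genre_label(g) for g in increases[:k]],
            [genre_label(g) for g in decreases[:k]])


# Made-up frequencies for illustration only.
current = {"Frequency [Classical]": 1.0, "Frequency [Rock]": 2.0,
           "Frequency [Folk]": 1.0, "Frequency [Jazz]": 2.0}
optimised = {"Frequency [Classical]": 3.0, "Frequency [Rock]": 0.5,
             "Frequency [Folk]": 0.0, "Frequency [Jazz]": 2.0}

try_list, avoid_list = suggest_genres(current, optimised)
print("Try:", try_list)      # Try: ['Classical']
print("Avoid:", avoid_list)  # Avoid: ['Rock', 'Folk']
```

Genres whose frequency does not change (Jazz in this example) appear in neither list.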
