Interface idea between ML models (backend) and a GUI (frontend) with multiple models and fixed columns

We apply a simple depression model that uses three columns (features), gender, phq_score, and gad_score, to predict depressiveness.
The models are trained in the notebook multiple_simple_model_depression.ipynb and saved as models.pkl.

The model and the aim dataset, e.g., aim_test.csv, are loaded, and a prediction is made.

The models list contains one fitted model per target:
models[0] -> model for 'anxiousness'
models[1] -> model for 'depressiveness'
models[2] -> model for 'treatment_status'
models[3] -> model for 'suicidal'
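
The index-to-target mapping above can be kept in one place so that the frontend never depends on bare list positions. The following is a minimal sketch, assuming each entry of `models` exposes a scikit-learn-style `predict` method. `TARGETS`, `predict_all`, and the stand-in class `ConstantModel` are hypothetical names used only for illustration.

```python
# Order must match the order in which the models are stored (see list above).
TARGETS = ['anxiousness', 'depressiveness', 'treatment_status', 'suicidal']


class ConstantModel:
    """Hypothetical stand-in for a fitted model; always predicts one value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        # One prediction per input row, like a scikit-learn estimator
        return [self.value for _ in X]


def predict_all(models, X):
    """Return {target_name: predictions} for every model in the list."""
    if len(models) != len(TARGETS):
        raise ValueError("models and TARGETS must have the same length")
    return {target: list(model.predict(X))
            for target, model in zip(TARGETS, models)}


# Usage with stand-in models and two dummy input rows
models = [ConstantModel(1), ConstantModel(0), ConstantModel(0), ConstantModel(1)]
results = predict_all(models, X=[{'phq_score': 9}, {'phq_score': 19}])
print(results['depressiveness'])
```

With real fitted models, the same function would return one prediction list per target, keyed by name rather than by index.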


The results are returned in the vector y_pred_aim, and the output is presented with the following encoding:
'anxiousness': 0 -> non-anxious and 1 -> anxious
'depressiveness': 0 -> non-depressive and 1 -> depressive
'treatment_status': 0 -> not in treatment and 1 -> in treatment
'suicidal': 0 -> non-suicidal and 1 -> suicidal
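
The encoding above can be turned into a small lookup table, so that the GUI shows text instead of raw 0/1 values. This is a sketch only. `LABELS` and `decode` are hypothetical names, and the label strings are copied from the list above.

```python
# Human-readable labels per target, taken from the encoding listed above
LABELS = {
    'anxiousness': {0: 'non-anxious', 1: 'anxious'},
    'depressiveness': {0: 'non-depressive', 1: 'depressive'},
    'treatment_status': {0: 'not in treatment', 1: 'in treatment'},
    'suicidal': {0: 'non-suicidal', 1: 'suicidal'},
}


def decode(target, y_pred):
    """Translate a sequence of 0/1 predictions into display strings."""
    table = LABELS[target]
    # int() also accepts booleans and NumPy integer scalars
    return [table[int(value)] for value in y_pred]


decoded = decode('depressiveness', [0, 0, 0, 1])
for person, label in enumerate(decoded, start=1):
    print(f'Person {person} is {label}')
```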

Columns in aim_test_cleaned.csv:


| **Column** | **Description** |
| ------------ | :-----------------: |
| school_year | year in school: 1, ..., 4 |
| age | age of the student: 18, ..., 31 |
| gender | female: 0, male: 1 |
| bmi | body mass index: 0, ..., 54 |
| who_bmi | BMI category: 'Class I Obesity', 'Normal', 'Overweight', 'Not Availble', 'Class III Obesity', 'Underweight', 'Class II Obesity' |
| phq_score | measures the severity of symptoms related to depression, anxiety, and other related disorders, given as integer values: 0, ..., 24 (0 is low, 24 is very high) |
| depression_severity | degree or intensity of symptoms experienced by an individual with depression: 'Mild', 'Moderately severe', 'None-minimal', 'Moderate', 'Severe', 'none' |
| depressiveness | non-depressive: 0, depressive: 1 |
| suicidal | whether the candidate has suicidal thoughts: 0: no suicidal thoughts, 1: suicidal thoughts |
| depression_diagnosis | whether the candidate already has a depression diagnosis: 0: no depression diagnosis, 1: depression diagnosis |
| depression_treatment | whether the candidate already receives depression treatment: 0: no depression treatment, 1: depression treatment |
| gad_score | measures the severity of Generalized Anxiety Disorder, given as integer values: 0, ..., 21 (0 is low, 21 is very high) |
| anxiety_severity | intensity of symptoms experienced by an individual with anxiety: 'Moderate', 'Mild', 'Severe', 'None-minimal', '0' |
| anxiousness | non-anxious: 0, anxious: 1 |
| anxiety_diagnosis | whether the candidate already has an anxiety diagnosis: 0: no anxiety diagnosis, 1: anxiety diagnosis |
| anxiety_treatment | whether the candidate already receives anxiety treatment: 0: no anxiety treatment, 1: anxiety treatment |
| epworth_score | score to assess daytime sleepiness, given as integer values: 0, ..., 32 (0 is low, 32 is very high) |
| sleepiness | 0: no sleepiness, 1: sleepiness |
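
Because the columns are fixed, a frontend could check a record against the ranges in the table before sending it to the backend. The sketch below assumes the ranges listed above are the accepted ones (they describe the dataset, not necessarily clinical limits). `RANGES` and `validate_record` are hypothetical names.

```python
# Value ranges copied from the column table above (ranges found in the
# dataset description, not necessarily clinical limits).
RANGES = {
    'school_year': (1, 4),
    'age': (18, 31),
    'gender': (0, 1),
    'phq_score': (0, 24),
    'gad_score': (0, 21),
    'epworth_score': (0, 32),
}


def validate_record(record):
    """Return a list of error messages; an empty list means no problems."""
    errors = []
    for column, (low, high) in RANGES.items():
        if column not in record:
            errors.append(f"{column}: missing")
            continue
        value = record[column]
        if not low <= value <= high:
            errors.append(f"{column}: {value} outside [{low}, {high}]")
    return errors


ok = validate_record({'school_year': 1, 'age': 19, 'gender': 1,
                      'phq_score': 9, 'gad_score': 11, 'epworth_score': 7})
bad = validate_record({'school_year': 1, 'age': 19, 'gender': 1,
                       'phq_score': 30, 'gad_score': 11, 'epworth_score': 7})
print(ok)
print(bad)
```

Returning a list of messages instead of raising on the first problem lets a GUI highlight every invalid field at once.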


For example, aim_test_cleaned.csv contains the following values:


school_year,age,gender,bmi,who_bmi,phq_score,depression_severity,depressiveness,suicidal,depression_diagnosis,depression_treatment,gad_score,anxiety_severity,anxiousness,anxiety_diagnosis,anxiety_treatment,epworth_score,sleepiness
1,19,1,33.33333333,Class I Obesity,9,Mild,0,0,0,0,11,Moderate,1,0,0,7,0
1,18,1,19.84126984,Normal,8,Mild,0,0,0,0,5,Mild,0,0,0,14,1
2,19,0,25.10239133,Overweight,8,Mild,0,0,0,0,6,Mild,0,0,0,6,0
1,18,0,23.73866213,Normal,19,Moderately severe,1,1,0,0,15,Severe,1,0,0,11,1
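
Before a file in this layout is passed to a model, it can be useful to confirm that every column the model expects is present. The following sketch uses only the standard library. The `required` list is the feature list that the notebook prints for the depressiveness model further below, and `missing_columns` is a hypothetical helper.

```python
import csv
import io

# Header plus the first data row, copied from the sample above
sample = (
    "school_year,age,gender,bmi,who_bmi,phq_score,depression_severity,"
    "depressiveness,suicidal,depression_diagnosis,depression_treatment,"
    "gad_score,anxiety_severity,anxiousness,anxiety_diagnosis,"
    "anxiety_treatment,epworth_score,sleepiness\n"
    "1,19,1,33.33333333,Class I Obesity,9,Mild,0,0,0,0,11,Moderate,1,0,0,7,0\n"
)

# Feature list as printed by the notebook for the depressiveness model
required = ['anxiety_diagnosis', 'sleepiness', 'gender', 'depression_diagnosis',
            'epworth_score', 'anxiety_severity', 'bmi', 'gad_score',
            'phq_score', 'depression_severity']


def missing_columns(csv_text, required_columns):
    """Return the required columns that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return sorted(set(required_columns) - set(header))


missing = missing_columns(sample, required)
print(missing)
```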


In [1]:
# Data Cleaning
import pandas as pd


# Data cleaning and transformation function
def clean_data(X, y=None):
    """Cleans and transforms the input feature DataFrame and optionally the target.

    This function performs the following operations:
        - Drops rows with any missing values (NaNs) in either X or y, so that both stay aligned.
        - Maps the 'gender' column from 'male'/'female' to 1/0.
        - Converts the binary categorical features to integers (0 and 1).
        - Converts the target DataFrame (y) to integers, if provided.
        - Returns the cleaned features and, if y is provided, the cleaned target.

    Args: 
        X (pd.DataFrame): The feature data.
        y (pd.DataFrame, optional): The target data. Default is None.
        
    Returns:
        pd.DataFrame: A cleaned DataFrame with transformed features.
        pd.DataFrame: Optionally, the cleaned target DataFrame if y is provided.
    """
    # Concatenate X and y to handle missing values across both simultaneously
    if y is not None:
        # Combine X and y into a single DataFrame for joint NaN removal
        combined = pd.concat([X, y], axis=1)
        
        # Drop rows with any missing values in either X or y
        combined_clean = combined.dropna()
        
        # Separate X and y again
        # (copy so that the column assignments below do not act on a view)
        X_clean = combined_clean.iloc[:, :X.shape[1]].copy()
        y_clean = combined_clean.iloc[:, X.shape[1]:]
    else:
        # If y is not provided, only clean X
        X_clean = X.dropna().copy()

    # List of categorical columns with binary values that need to be converted to integers
    cat_cols_trans = ['gender', 'depression_diagnosis', 'depression_treatment',  
                      'anxiety_diagnosis', 'sleepiness']

    # Map 'gender' column to integers: 'male' -> 1, 'female' -> 0
    X_clean['gender'] = X_clean['gender'].map({'male': 1, 'female': 0})

    # Convert the specified binary categorical columns in X to integers (0 and 1)
    X_clean[cat_cols_trans] = X_clean[cat_cols_trans].astype(int)

    if y is not None:
        # Convert all target variables to integers
        y_clean = y_clean.astype(int)
        # Return both cleaned X and cleaned y
        return X_clean, y_clean
    else:
        # If y is not provided, return only the cleaned features
        return X_clean
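
The joint NaN removal in `clean_data` can be checked on a toy example: a row is dropped when a value is missing in the features or in the target, so both results keep the same index. The values below are made up for illustration, and the sketch assumes pandas is installed.

```python
import pandas as pd

nan = float('nan')

# Toy data: row 1 has a missing feature, row 2 has a missing target
X = pd.DataFrame({'phq_score': [9, nan, 19], 'gad_score': [11, 5, 15]})
y = pd.DataFrame({'depressiveness': [0, 0, nan]})

# Same steps as in clean_data: concatenate, drop incomplete rows, split again
combined = pd.concat([X, y], axis=1).dropna()
X_clean = combined.iloc[:, :X.shape[1]].copy()
y_clean = combined.iloc[:, X.shape[1]:].astype(int)

print(X_clean.index.tolist())
print(y_clean.index.tolist())
```

Dropping NaNs from `X` and `y` separately would remove different rows from each and leave features and targets misaligned, which is why the function concatenates first.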


In [2]:
import pandas as pd
import pickle

# Load the model of the simple algorithm
with open('model_depression.pkl', 'rb') as file:
    loaded_model = pickle.load(file)  # Load the models (as they were saved first)
    target_col = pickle.load(file)    # Load target_col 
    feature_col = pickle.load(file)   # Load the feature col


print("The Target is given as", target_col)

print("The features are given as", feature_col)

# Read the CSV file into a DataFrame
X_aim = pd.read_csv('aim_test.csv')

# Apply the same cleaning steps that were used for the training data
X_aim = clean_data(X_aim)

display(X_aim)

# Predict using the loaded model
y_pred_aim = loaded_model.predict(X_aim)

# Print the predictions
print(f"Predictions for {target_col}:")

# Format the predictions for better readability
for j, prediction in enumerate(y_pred_aim, start=1):
    # Assuming binary classification (e.g., 0 = "Not Depressed", 1 = "Depressed")
    status = "Depressed" if prediction == 1 else "Not Depressed"
    print(f'Person {j} is {status}')


The Target is given as depressiveness
The features are given as ['anxiety_diagnosis', 'sleepiness', 'gender', 'depression_diagnosis', 'epworth_score', 'anxiety_severity', 'bmi', 'gad_score', 'phq_score', 'depression_severity']


Unnamed: 0,school_year,age,gender,bmi,who_bmi,phq_score,depression_severity,depressiveness,suicidal,depression_diagnosis,depression_treatment,gad_score,anxiety_severity,anxiousness,anxiety_diagnosis,anxiety_treatment,epworth_score,sleepiness
0,1,19,1,33.333333,Class I Obesity,9,Mild,False,False,0,0,11,Moderate,True,0,False,7,0
1,1,18,1,19.84127,Normal,8,Mild,False,False,0,0,5,Mild,False,0,False,14,1
2,2,19,0,25.102391,Overweight,8,Mild,False,False,0,0,6,Mild,False,0,False,6,0
3,1,18,0,23.738662,Normal,19,Moderately severe,True,True,0,0,15,Severe,True,0,False,11,1


Predictions for depressiveness:
Person 1 is Not Depressed
Person 2 is Not Depressed
Person 3 is Not Depressed
Person 4 is Depressed
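
The loading cell above reads three objects from one file with three consecutive `pickle.load` calls. This only works if the objects were written with consecutive `pickle.dump` calls in the same order, which is presumably what the training notebook does. The sketch below illustrates the round trip with an in-memory buffer. The stand-in objects and the `restored_*` names are hypothetical.

```python
import io
import pickle

# Stand-ins for the real objects (hypothetical values for illustration)
model = {'kind': 'placeholder-model'}
target_col = 'depressiveness'
feature_col = ['gender', 'phq_score', 'gad_score']

buffer = io.BytesIO()  # in-memory file, so nothing is written to disk

# Write order: model, target column, feature columns
pickle.dump(model, buffer)
pickle.dump(target_col, buffer)
pickle.dump(feature_col, buffer)

# Read order must be identical to the write order
buffer.seek(0)
restored_model = pickle.load(buffer)
restored_target = pickle.load(buffer)
restored_features = pickle.load(buffer)

print(restored_target, restored_features)
```

Storing all three objects in a single dictionary and pickling that once would make the order irrelevant. In either case, pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.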


In [3]:
# Prediction for a single record

import pandas as pd

# Example input data
aim = [19, 33.33, 0, 0, 1, 1, 0, 7, 'Moderate', 11, 'Mild', 9]

# Define column names
columns = ['age', 'bmi', 'sleepiness', 'anxiety_diagnosis', 'gender', 'school_year', 
           'depression_diagnosis', 'epworth_score', 'anxiety_severity', 
           'gad_score', 'depression_severity', 'phq_score']

# Convert aim into a DataFrame
X_aim = pd.DataFrame([aim], columns=columns)



display(X_aim)
y_pred_aim = loaded_model.predict(X_aim)
print(X_aim)
print(y_pred_aim)

# 0 means not depressive
# 1 means depressive

Unnamed: 0,age,bmi,sleepiness,anxiety_diagnosis,gender,school_year,depression_diagnosis,epworth_score,anxiety_severity,gad_score,depression_severity,phq_score
0,19,33.33,0,0,1,1,0,7,Moderate,11,Mild,9


   age    bmi  sleepiness  anxiety_diagnosis  gender  school_year  \
0   19  33.33           0                  0       1            1   

   depression_diagnosis  epworth_score anxiety_severity  gad_score  \
0                     0              7         Moderate         11   

  depression_severity  phq_score  
0                Mild          9  
[0]
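
In the single-record cell, the values in `aim` are matched to the column names purely by position, so a swapped pair of values would go unnoticed. A GUI could instead send a dictionary keyed by column name and let the backend put the values in order. The following is a sketch under that assumption. `payload` and `to_row` are hypothetical names.

```python
# Column order used in the single-record cell above
columns = ['age', 'bmi', 'sleepiness', 'anxiety_diagnosis', 'gender', 'school_year',
           'depression_diagnosis', 'epworth_score', 'anxiety_severity',
           'gad_score', 'depression_severity', 'phq_score']

# Hypothetical GUI payload: keys may arrive in any order
payload = {
    'phq_score': 9, 'gad_score': 11, 'gender': 1, 'age': 19, 'bmi': 33.33,
    'school_year': 1, 'sleepiness': 0, 'epworth_score': 7,
    'anxiety_diagnosis': 0, 'depression_diagnosis': 0,
    'anxiety_severity': 'Moderate', 'depression_severity': 'Mild',
}


def to_row(payload, columns):
    """Order the payload values by column name; fail loudly on missing fields."""
    missing = [c for c in columns if c not in payload]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return [payload[c] for c in columns]


row = to_row(payload, columns)
print(row)
```

The resulting `row` can be passed to `pd.DataFrame([row], columns=columns)` exactly as `aim` is in the cell above.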
