# Summarize the different split approaches for all datasets

- [x] Read in all datasets
- [x] Bring into one table
- [x] Format the output

In [1]:
# imports
import pandas as pd
import helpers_summarize

## What is the average rank (and its std) of each approach with **time-sorted** users?

In [11]:
# load data
df = helpers_summarize.load_approach_tables(path='../../results/tables/approaches/sorted_users')

# summarize and print
results = helpers_summarize.prepare_results(df)
results.transpose().to_csv('../../results/tables/approaches/sorted_users/approaches_ranking.csv')
results.transpose()

| | time_cut | bl_user_based_last | bl_user_based_all | user_cut | average_user | user_wise | bl_assessment_based_last | bl_assessment_based_all |
|---|---|---|---|---|---|---|---|---|
| average_rank | 2.29 | 3.29 | 3.57 | 3.57 | 3.86 | 4.33 | 6.86 | 7.71 |
| average_rank_std | 1.5 | 1.7 | 2.37 | 1.72 | 0.69 | 2.07 | 0.69 | 0.49 |
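
How `helpers_summarize.prepare_results` derives these numbers is not shown in this notebook. The following is a minimal sketch of one plausible way to compute an average rank and its standard deviation, assuming that approaches are ranked per dataset by F1 score (rank 1 = best) and that the statistics are taken across datasets. The scores, approach names, and dataset names are hypothetical.

```python
import pandas as pd

# Hypothetical F1 scores (rows: approaches, columns: datasets); not taken from the results.
scores = pd.DataFrame(
    {
        "ds_a": [0.70, 0.60, 0.50],
        "ds_b": [0.65, 0.68, 0.40],
        "ds_c": [0.80, 0.75, 0.60],
    },
    index=["approach_x", "approach_y", "approach_z"],
)

# Rank the approaches within each dataset; rank 1 is the highest score.
ranks = scores.rank(axis=0, ascending=False)

# Average rank and sample standard deviation (pandas default, ddof=1) across datasets.
summary = pd.DataFrame(
    {
        "average_rank": ranks.mean(axis=1),
        "average_rank_std": ranks.std(axis=1),
    }
).sort_values("average_rank")

print(summary.round(2))
```

An approach that takes the same rank on every dataset gets a standard deviation of 0, so a low `average_rank_std` indicates a stable position rather than a good one.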


In [8]:
# get all indices with baseline models
idxs = [idx for idx in df.index if 'bl' in idx]
# get all dataset abbreviations
datasets = helpers_summarize.get_dataset_names()
# get result table with average f1 scores and standard deviation
res2 = pd.DataFrame()
for dataset in datasets:
    res2[dataset] = helpers_summarize.format_f1_and_std(df, dataset)
# save to csv
res2.loc[idxs, :].to_csv('../../results/tables/baseline_performances.csv')
# display
res2.loc[idxs, :]

| | cc | ch_stress | rki_children | rki_heart | rki_parent | tyt | uniti |
|---|---|---|---|---|---|---|---|
| bl_user_based_last | 0.604 (0.008) | 0.567 (0.008) | 0.626 (0.028) | 0.580 (0.014) | 0.671 (0.008) | 0.250 (0.003) | 0.515 (0.005) |
| bl_user_based_all | 0.555 (0.008) | 0.558 (0.016) | 0.687 (0.037) | 0.660 (0.020) | 0.698 (0.006) | 0.190 (0.003) | 0.504 (0.005) |
| bl_assessment_based_last | 0.445 (0.008) | 0.273 (0.004) | 0.288 (0.040) | 0.275 (0.012) | 0.313 (0.013) | 0.205 (0.003) | 0.254 (0.007) |
| bl_assessment_based_all | 0.302 (0.006) | 0.138 (0.018) | 0.233 (0.022) | 0.176 (0.018) | 0.317 (0.011) | 0.187 (0.003) | 0.173 (0.011) |


The approaches that do not take users into account overestimate the performance of the classifier on the test set. That is, the performance is overestimated if we allow the same users to be present in both the train and the test set.
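
The difference between the two kinds of split can be checked directly by counting the users that end up in both sets. The sketch below uses hypothetical data (5 users with 4 assessments each) and an assumed 75/25 split; it is not the split code used for the results.

```python
import random

# Hypothetical assessments as (user_id, assessment_id) pairs: 5 users, 4 assessments each.
assessments = [(user, a) for user in range(5) for a in range(4)]
rng = random.Random(0)

# Assessment-level split: shuffle the rows and cut, ignoring who filled them out.
rows = assessments[:]
rng.shuffle(rows)
cut = int(0.75 * len(rows))
train_rows, test_rows = rows[:cut], rows[cut:]
shared_row_split = {u for u, _ in train_rows} & {u for u, _ in test_rows}

# User-level split: draw the test users first, then assign all of their assessments.
users = sorted({u for u, _ in assessments})
test_users = set(rng.sample(users, k=1))
train_by_user = [r for r in assessments if r[0] not in test_users]
test_by_user = [r for r in assessments if r[0] in test_users]
shared_user_split = {u for u, _ in train_by_user} & {u for u, _ in test_by_user}

print("users in both sets (assessment-level split):", len(shared_row_split))
print("users in both sets (user-level split):", len(shared_user_split))
```

With 5 test rows and 4 assessments per user, at least one test user necessarily also has assessments in the train set under the assessment-level split, while the user-level split has no overlap by construction.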

## What is the average rank (and its std) of each approach with **randomly-drawn** users?

Method: The whole ML pipeline for 9 datasets with 8 approaches each was repeated 5 times. Each time, a different seed was used to randomly draw the train and test users.
The overall question is: Do the approach rankings change if users are moved from the test set to the train set and vice versa?
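
A seeded draw of this kind could look like the sketch below. The helper `draw_test_users`, the user IDs, and the test fraction of 0.2 are hypothetical and only illustrate the idea; the pipeline's actual split code is not part of this notebook.

```python
import random


def draw_test_users(user_ids, seed, test_fraction=0.2):
    """Hypothetical helper: reproducibly draw the users that go into the test set."""
    rng = random.Random(seed)
    ordered = sorted(user_ids)  # fixed order so that the draw depends only on the seed
    n_test = max(1, round(test_fraction * len(ordered)))
    return set(rng.sample(ordered, k=n_test))


user_ids = range(100)  # hypothetical user IDs
for seed in [1962, 1964, 1991, 1994, 2023]:
    test_users = draw_test_users(user_ids, seed)
    train_users = set(user_ids) - test_users
    print(f"seed {seed}: {len(train_users)} train users, {len(test_users)} test users")
```

Using a dedicated `random.Random(seed)` instance keeps each draw reproducible without touching the global random state.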

In [10]:
seeds = [1962, 1964, 1991, 1994, 2023]
results_random = pd.DataFrame()
for seed in seeds:
    # load data
    df = helpers_summarize.load_approach_tables(path=f'../../results/tables/approaches/random_users/seed_{seed}')

    # summarize and keep the average rank per approach
    res = helpers_summarize.prepare_results(df)
    results_random[f'seed_{seed}'] = res['average_rank']

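# compute both statistics over the seed columns only, so that the
# mean column is not counted as a sixth observation in the std
seed_cols = [f'seed_{seed}' for seed in seeds]
results_random['mean_ranking'] = results_random[seed_cols].mean(axis=1)
results_random['std_ranking'] = results_random[seed_cols].std(axis=1)

results_random

| | seed_1962 | seed_1964 | seed_1991 | seed_1994 | seed_2023 | mean_ranking | std_ranking |
|---|---|---|---|---|---|---|---|
| time_cut | 1.57 | 1.57 | 1.43 | 1.86 | 1.43 | 1.572 | 0.175556 |
| user_cut | 3.14 | 3.14 | 3.14 | 2.86 | 3.0 | 3.056 | 0.125220 |
| bl_user_based_last | 3.29 | 3.43 | 3.43 | 3.29 | 3.86 | 3.46 | 0.234307 |
| average_user | 3.43 | 3.71 | 3.86 | 3.71 | 2.86 | 3.514 | 0.397278 |
| bl_user_based_all | 4.29 | 4.29 | 4.57 | 4.29 | 4.71 | 4.43 | 0.197990 |
| user_wise | 5.67 | 5.0 | 4.5 | 5.17 | 5.17 | 5.102 | 0.419726 |
| bl_assessment_based_last | 6.86 | 6.57 | 6.86 | 6.86 | 6.86 | 6.802 | 0.129692 |
| bl_assessment_based_all | 7.43 | 7.86 | 7.71 | 7.57 | 7.71 | 7.656 | 0.162727 |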
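
To answer the overall question more directly, the per-seed average ranks can be turned into an ordering of the approaches and the orderings compared across seeds. The sketch below hard-codes the per-seed values from the table above so that it runs on its own; in the notebook, the same comparison could be made on the seed columns of `results_random`.

```python
approaches = [
    "time_cut", "user_cut", "bl_user_based_last", "average_user",
    "bl_user_based_all", "user_wise", "bl_assessment_based_last",
    "bl_assessment_based_all",
]

# Average rank per approach (same order as `approaches`), copied from the table above.
average_ranks = {
    "seed_1962": [1.57, 3.14, 3.29, 3.43, 4.29, 5.67, 6.86, 7.43],
    "seed_1964": [1.57, 3.14, 3.43, 3.71, 4.29, 5.00, 6.57, 7.86],
    "seed_1991": [1.43, 3.14, 3.43, 3.86, 4.57, 4.50, 6.86, 7.71],
    "seed_1994": [1.86, 2.86, 3.29, 3.71, 4.29, 5.17, 6.86, 7.57],
    "seed_2023": [1.43, 3.00, 3.86, 2.86, 4.71, 5.17, 6.86, 7.71],
}

# Order the approaches from best (lowest average rank) to worst for each seed.
orderings = {
    seed: tuple(name for _, name in sorted(zip(ranks, approaches)))
    for seed, ranks in average_ranks.items()
}

for seed, order in orderings.items():
    print(seed, " < ".join(order))
print("distinct orderings:", len(set(orderings.values())))
```

With these values, `time_cut` ranks first and the two assessment-based baselines rank last under every seed; only the middle positions change between seeds.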
