# Tutorial 3 - conducting a multifaceted benchmark
In this example, we illustrate how Synthesizers can be used to compare different models. Essentially, the previous example is repeated across a broad selection of datasets, and the model that consistently outperforms its peers is chosen for some downstream task. For example, suppose a model is to be selected to create a synthetic version of a sensitive dataset; before the infrastructure is deployed, the most effective model has to be identified. To keep the runtime of the notebook down, we use far fewer datasets than would be advisable for a sound judgement, but the example can easily be extended.

In [7]:
import numpy as np
import pandas as pd
from synthesizers import Load, pipeline

case1 = Load("mstz/heart_failure")
case2 = Load("mstz/breast")
case3 = Load("mstz/mammography")
case4 = Load("mstz/haberman")
case5 = Load("mstz/pima")
case6 = Load("mstz/titanic")

cases_list      = [case1, case2, case3, case4, case5, case6]
data_names_list = ['heart_failure', 'breast', 'mammography', 'haberman', 'diabetes', 'titanic']
target_label_list = ['is_dead', 'is_cancer', 'is_severe', 'has_survived_5_years', 'has_diabetes', 'has_survived']
gen_models = ["adsgan", "tvae", "nflow"]

result_dataframe = pd.DataFrame({'models': gen_models})

for case, data_name, target_lab in zip(cases_list, data_names_list, target_label_list):
    # Apply the synthesize pipeline to this dataset for all models in gen_models
    state = pipeline("synthesize", split_size=0.8,
                    train_plugin=gen_models,
                    eval_target_col=target_lab,
                    eval_config="full_eval",
                    save_name=f"{data_name}.csv",
                    save_key="synth")(case)

    pr_res, ut_res = [], []
    for i in range(len(state)):
        # Collect the mean normalised privacy ('p') and utility ('u') results per model
        pr_res.append(np.mean(state[i].eval[state[i].eval['dim'] == 'p']['n_val']))
        ut_res.append(np.mean(state[i].eval[state[i].eval['dim'] == 'u']['n_val']))

    result_dataframe[f"{data_name}_utility"] = ut_res
    result_dataframe[f"{data_name}_privacy"] = pr_res

print(result_dataframe)


Error: Principal component analysis did not run, too few nummerical attributes!
Error: Principal component analysis did not run, too few nummerical attributes!
Error: Principal component analysis did not run, too few nummerical attributes!



   models  heart_failure_utility  heart_failure_privacy  breast_utility  \
0  adsgan               0.823811               0.739449        0.806009   
1    tvae               0.759009               0.763382        0.839364   
2   nflow               0.790504               0.759102        0.808695   

   breast_privacy  mammography_utility  mammography_privacy  haberman_utility  \
0        0.636086             0.767258             0.552102          0.884506   
1        0.657855             0.797184             0.551897          0.799857   
2        0.719300             0.794045             0.544068          0.915520   

   haberman_privacy  diabetes_utility  diabetes_privacy  titanic_utility  \
0          0.606921          0.757864          0.760265         0.815999   
1          0.728317          0.764606          0.860033         0.764629   
2          0.634556          0.829910          0.768021         0.823150   

   titanic_privacy  
0         0.726570  
1         0.744925  
2         0.765205  

In [8]:
from sklearn.preprocessing import MinMaxScaler

result_dataframe_ranked = result_dataframe.copy()
value_cols = result_dataframe_ranked.columns.tolist()[1:]

# Scale every metric column to [0, 1] across the models (best model -> 1, worst -> 0)
MMS = MinMaxScaler()
result_dataframe_ranked[value_cols] = MMS.fit_transform(result_dataframe_ranked[value_cols])

# Sum the scaled scores per model, keeping utility and privacy separate
result_dataframe['utility'] = result_dataframe_ranked[[col for col in result_dataframe_ranked.columns if 'utility' in col]].sum(axis=1)
result_dataframe['privacy'] = result_dataframe_ranked[[col for col in result_dataframe_ranked.columns if 'privacy' in col]].sum(axis=1)

result_dataframe

| | models | heart_failure_utility | heart_failure_privacy | breast_utility | breast_privacy | mammography_utility | mammography_privacy | haberman_utility | haberman_privacy | diabetes_utility | diabetes_privacy | titanic_utility | titanic_privacy | utility | privacy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | adsgan | 0.823811 | 0.739449 | 0.806009 | 0.636086 | 0.767258 | 0.552102 | 0.884506 | 0.606921 | 0.757864 | 0.760265 | 0.815999 | 0.72657 | 2.609655 | 1.0 |
| 1 | tvae | 0.759009 | 0.763382 | 0.839364 | 0.657855 | 0.797184 | 0.551897 | 0.799857 | 0.728317 | 0.764606 | 0.860033 | 0.764629 | 0.744925 | 2.093576 | 4.711179 |
| 2 | nflow | 0.790504 | 0.759102 | 0.808695 | 0.7193 | 0.794045 | 0.544068 | 0.91552 | 0.634556 | 0.82991 | 0.768021 | 0.82315 | 0.765205 | 4.461665 | 3.126525 |


Above we do a simple ranking: for each dataset, the averaged normalised metric results are min-max normalised across the three models, so the best model on a dataset scores 1 and the worst scores 0. Summing each row (keeping utility and privacy separate) gives an aggregate between 0 and 6 that indicates which models perform better or worse in comparison to the others. In this case, the "nflow" model performs best on utility (4.46) and the "tvae" model does best on privacy (4.71). The "nflow" model also comes second on privacy (3.13), whereas the "tvae" model comes last on utility (2.09), so "nflow" would perhaps be a reasonable choice for an overall winner.
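To make the aggregation concrete, the sketch below redoes the utility ranking by hand with plain pandas instead of `MinMaxScaler`. The scores are copied from the results table above; the variable names are only illustrative and are not part of the original notebook.

```python
import pandas as pd

# Mean utility scores per model, copied from the results table above.
utility = pd.DataFrame(
    {
        "heart_failure": [0.823811, 0.759009, 0.790504],
        "breast": [0.806009, 0.839364, 0.808695],
        "mammography": [0.767258, 0.797184, 0.794045],
        "haberman": [0.884506, 0.799857, 0.915520],
        "diabetes": [0.757864, 0.764606, 0.829910],
        "titanic": [0.815999, 0.764629, 0.823150],
    },
    index=["adsgan", "tvae", "nflow"],
)

# Column-wise min-max scaling: (x - min) / (max - min).
# The best model on each dataset maps to 1 and the worst to 0.
scaled = (utility - utility.min()) / (utility.max() - utility.min())

# Row sum: every dataset contributes between 0 and 1 to a model's aggregate.
aggregate = scaled.sum(axis=1)

print(scaled.round(3))
print(aggregate.round(3))
```

With these inputs the row sums reproduce the `utility` column of the table (roughly 2.61, 2.09 and 4.46). Note that in this hand-written form a dataset on which all models tie would give a zero range and therefore `NaN` values.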

All models were run using their default hyperparameters; with some tuning, the "adsgan" model in particular could plausibly do better on privacy.

Finally, we once again emphasize that more datasets should realistically be added for this kind of benchmark to carry any real weight in a decision.
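One inexpensive way to gauge how much a conclusion depends on the particular datasets chosen is a leave-one-dataset-out check: recompute the aggregate with each dataset removed in turn and see whether the top model changes. The sketch below does this for the privacy scores from the table above. This check is not part of the original notebook, and the helper name `aggregate_winner` is hypothetical.

```python
import pandas as pd

# Mean privacy scores per model, copied from the results table above.
privacy = pd.DataFrame(
    {
        "heart_failure": [0.739449, 0.763382, 0.759102],
        "breast": [0.636086, 0.657855, 0.719300],
        "mammography": [0.552102, 0.551897, 0.544068],
        "haberman": [0.606921, 0.728317, 0.634556],
        "diabetes": [0.760265, 0.860033, 0.768021],
        "titanic": [0.726570, 0.744925, 0.765205],
    },
    index=["adsgan", "tvae", "nflow"],
)


def aggregate_winner(frame: pd.DataFrame) -> str:
    """Min-max scale each column, sum per model, and return the top model."""
    scaled = (frame - frame.min()) / (frame.max() - frame.min())
    return scaled.sum(axis=1).idxmax()


# Winner when all datasets are used.
full_winner = aggregate_winner(privacy)

# Winner when each dataset is left out in turn.
loo_winners = {
    dataset: aggregate_winner(privacy.drop(columns=dataset))
    for dataset in privacy.columns
}

print("all datasets:", full_winner)
for dataset, winner in loo_winners.items():
    print(f"without {dataset}: {winner}")
```

For the six datasets used here, "tvae" stays on top for privacy whichever dataset is dropped. That shows the result does not hinge on a single dataset, but it is no substitute for a broader benchmark.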