# Analysis of the autism data

In [30]:
run init.ipynb

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Done.


# Experiments

* This section addresses a few questions about the algorithms and preprocessing settings used, with the aim of optimizing our classification framework.

**Parameters or potential settings**

* The approach used: Logistic Regression, Decision Tree, Explainable Boosting Machine (EBM), Neural Additive Model (NAM), or XGBoost.
* Whether indicator variables are used as inputs.
* Whether the data are scaled.
* The imputation approach, for learning algorithms that do not handle missing values by design. It can be constant imputation (called encoding here) or conditional imputation (mean, knn, or mice).
* The sampling method in the case of imbalanced learning: no down-sampling, vanilla (random sampling of the minority class until the classes are balanced), or SMOTE (a more elaborate sampling scheme).
* The number of features, between 2 and 6. 
* The number of folds when cross-validating the results. 
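
The settings above multiply quickly, which is why a single default configuration with one factor varied at a time is easier to run than the full grid. The sketch below illustrates both options. The `Setting` class, the `DEFAULT` values, and the string identifiers are hypothetical names chosen for illustration, not part of the project's code.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Setting:
    """One experimental configuration (hypothetical container, for illustration)."""
    approach: str
    use_missing_indicator_variables: bool
    scale_data: bool
    imputation: str
    sampling_method: str


# Candidate values, following the list above; the identifiers are assumptions.
APPROACHES = ["LogisticRegression", "DecisionTree", "ebm", "nam", "xgboost"]
IMPUTATIONS = ["encoding", "mean", "knn", "mice"]
SAMPLING_METHODS = ["without", "vanilla", "smote"]

# Assumed default: encoding, scaling, no indicator variables, no down-sampling.
DEFAULT = Setting("xgboost", False, True, "encoding", "without")


def full_grid():
    """Every combination of the settings (the exhaustive option)."""
    combos = product(APPROACHES, [True, False], [True, False],
                     IMPUTATIONS, SAMPLING_METHODS)
    return [Setting(*combo) for combo in combos]


def one_at_a_time(default, field_name, values):
    """Vary a single field around the default, keeping everything else fixed."""
    return [replace(default, **{field_name: value}) for value in values]


print(len(full_grid()))
print(one_at_a_time(DEFAULT, "scale_data", [True, False]))
```

Varying one field around `DEFAULT` keeps the number of runs linear in the number of candidate values, at the cost of not exploring interactions between settings.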

**Notes on the classification pipeline:**

* Two datasets are age-matched: the scenario `asd_td_age_matched_n_balanced` leads to relatively balanced classes, whereas the scenario `asd_td_age_matched_n_unbalanced` has slightly younger kids, and so leverages the larger number of young neurotypical kids, and is more unbalanced.
* No hyper-parameter search is performed for any of the approaches. Cross-validation is stratified: each fold is left out in turn, the model is fitted on the remaining training set, and the predictions on the left-out test set are stored for later performance evaluation.
* Since many settings are tested, each hypothesis is tested around a default setting: encoding of the missing values, scaling of the data, no indicator variables, no down-sampling (???), and 16-fold cross-validation.
* Feature selection was performed for the two scenarios using the features with the highest importance according to the xgboost importance map.
* Classification here is between autistic and neurotypical participants.
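
As an illustration of the cross-validation scheme described in these notes, here is a minimal standard-library sketch of stratified k-fold cross-validation that stores the prediction made for each sample while its fold is left out. The function names and the toy nearest-mean model are hypothetical and only stand in for the real approaches. The project's `Experiments` class may organise this differently.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign every sample to one fold, spreading each class evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            fold_of[index] = position % n_folds
    return fold_of


def out_of_fold_predictions(X, y, fit, predict, n_folds=16):
    """Leave each fold out in turn, fit on the rest, store the held-out predictions."""
    fold_of = stratified_folds(y, n_folds)
    predictions = [None] * len(y)
    for fold in range(n_folds):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = predict(model, X[i])
    return predictions


def fit_centroids(X, y):
    """Toy one-feature model: remember the mean feature value of each class."""
    means = {}
    for label in set(y):
        values = [x for x, target in zip(X, y) if target == label]
        means[label] = sum(values) / len(values)
    return means


def predict_centroid(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


# Synthetic, well-separated data: 32 samples per class.
X = [0.1 * i for i in range(32)] + [5 + 0.1 * i for i in range(32)]
y = [0] * 32 + [1] * 32
scores = out_of_fold_predictions(X, y, fit_centroids, predict_centroid, n_folds=16)
```

Because every sample is predicted exactly once by a model that never saw it, the stored predictions can be pooled and scored in a single pass afterwards.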



**Among the questions at stake:**

* Experiment 1: Shall we use indicator variables? For each scenario (columns of axes) and each dimension of the problem (rows of axes), x is `use_of_indicator_variables`, y is a performance indicator (typically the F1 score), and the hue variable is the approach. The plots are produced without imputation.
* Experiment 2: Shall we scale the data or not?
* Experiment X: For the algorithms that handle missing values (xgboost, nam with encoding), shall we leave them missing or impute them?
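
To make the first question concrete, the sketch below shows what constant encoding with and without missing-indicator variables could look like on a tiny table. The function `encode_missing` and the constant `MISSING_CODE` are hypothetical: the source does not state which constant the `Dataset` class uses, nor how it lays out the indicator columns.

```python
MISSING_CODE = -1.0  # hypothetical constant; the value used by the project is not stated


def encode_missing(rows, use_missing_indicator_variables, missing_code=MISSING_CODE):
    """Replace missing values (None) by a constant.

    When indicators are requested, append one 0/1 column per feature that
    flags whether the original value was missing.
    """
    encoded = []
    for row in rows:
        values = [missing_code if value is None else float(value) for value in row]
        if use_missing_indicator_variables:
            values += [1.0 if value is None else 0.0 for value in row]
        encoded.append(values)
    return encoded


rows = [[0.5, None], [None, 2.0], [1.5, 3.0]]
with_indicators = encode_missing(rows, use_missing_indicator_variables=True)
without_indicators = encode_missing(rows, use_missing_indicator_variables=False)
print(with_indicators)
print(without_indicators)
```

With indicators, a model can tell a true measurement equal to the constant apart from a missing value, at the cost of doubling the number of input columns.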

In [31]:
df = pd.read_csv(AUTISM_DATA_PATH)
scenario = 'papers'

with open(os.path.join(DATA_DIR, 'selected_features_{}.pkl'.format(scenario)), 'rb') as f:
    features = pickle.load(f)

In [32]:
list(features['xgboost'].keys())

[10, 22, 32]

## Experiment 1.  `use_missing_indicator_variables`

In [34]:
n=0
# Train on the 'papers' scenario for each feature-set size, with and without
# missing-indicator variables, for every approach.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:
        for use_missing_indicator_variables in [True, False]:
            for approach in ['nam', 'LogisticRegression', 'DecisionTree', 'ebm', 'xgboost']:

                # Reload the raw data for each run.
                df = pd.read_csv(AUTISM_DATA_PATH)

                data = Dataset(df=df, 
                               missing_data_handling='encoding',
                               imputation_method='without',
                               sampling_method='without',
                               scenario = scenario, 
                               features_name = features[approach][n_features] if approach in features.keys() else features['ebm'][n_features],
                               scale_data=True, 
                               use_missing_indicator_variables=use_missing_indicator_variables,
                               verbosity=0, 
                               proportion_train=1 if approach != 'nam' else .8)

                exp = Experiments(data.dataset_name,
                                  dataset=data, 
                                  approach=approach, 
                                  previous_experiment=None,        
                                  debug=False, 
                                  verbosity=0, 
                                  experiment_folder_name='paper_experiment_1_fs_validation',
                                  save_experiment=True)
                
                
                print("approach: {} - use_missing_indicator_variables: {} - n_features: {}".format(approach, use_missing_indicator_variables, n_features))
                exp.fit()
                
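                # Switch to the 'papers_remote' scenario, keep no training split
                # (proportion_train = 0), and predict with the fitted model.
                exp.dataset.proportion_train = 0.
                exp.dataset._init_scenario(scenario='papers_remote')
                exp.predict()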
                
                display(exp.performances_df)
                

approach: nam - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.667,0.569,0.298,0.061,0.18,0.677,0.104,0.58,0.339,0.977,0.281,0.506,0.941,0.023,0.494,0.059,0.667


approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.554,0.292,0.1,0.769,0.65,0.13,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: ebm - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.696,0.666,0.385,0.192,0.959,0.661,0.245,0.58,0.283,0.907,0.333,0.506,0.826,0.093,0.494,0.174


approach: xgboost - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.657,0.417,0.182,0.953,0.695,0.245,0.63,0.377,0.93,0.404,0.541,0.885,0.07,0.459,0.115


approach: nam - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.675,0.584,0.308,0.064,0.181,0.69,0.111,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143,0.675


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.547,0.292,0.078,0.678,0.65,0.089,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.5,0.085,0.097,0.904,0.601,0.13,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: ebm - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.672,0.454,0.19,0.959,0.667,0.245,0.59,0.299,0.907,0.351,0.513,0.833,0.093,0.487,0.167


approach: xgboost - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.716,0.659,0.413,0.183,0.954,0.69,0.245,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143


approach: nam - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.656,0.588,0.244,0.05,0.563,0.678,0.14,0.6,0.331,0.93,0.351,0.519,0.87,0.07,0.481,0.13,0.656


approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.618,0.511,0.249,0.032,-0.212,0.647,0.078,0.63,0.289,0.744,0.544,0.552,0.738,0.256,0.448,0.262


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: ebm - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.687,0.631,0.352,0.104,0.869,0.672,0.213,0.59,0.315,0.93,0.333,0.513,0.864,0.07,0.487,0.136


approach: xgboost - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.694,0.638,0.387,0.138,0.924,0.661,0.17,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: nam - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.67,0.614,0.261,0.096,0.856,0.672,0.15,0.58,0.318,0.953,0.298,0.506,0.895,0.047,0.494,0.105,0.67


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.591,0.51,0.237,0.053,-0.283,0.619,0.084,0.46,0.107,0.977,0.07,0.442,0.8,0.023,0.558,0.2


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: ebm - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.646,0.417,0.109,0.874,0.672,0.234,0.6,0.315,0.907,0.368,0.52,0.84,0.093,0.48,0.16


approach: xgboost - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.633,0.427,0.116,0.897,0.661,0.145,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: nam - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.679,0.594,0.31,0.051,0.558,0.672,0.185,0.61,0.317,0.884,0.404,0.528,0.821,0.116,0.472,0.179,0.679


approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.618,0.497,0.255,0.03,-0.036,0.655,0.067,0.6,0.276,0.837,0.421,0.522,0.774,0.163,0.478,0.226


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.535,0.465,0.058,0.028,0.623,0.601,0.076,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: ebm - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.683,0.629,0.339,0.104,0.87,0.683,0.213,0.6,0.349,0.953,0.333,0.519,0.905,0.047,0.481,0.095


approach: xgboost - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.676,0.606,0.347,0.09,0.827,0.647,0.122,0.52,0.241,0.977,0.175,0.472,0.909,0.023,0.528,0.091


approach: nam - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.677,0.594,0.311,0.067,0.491,0.683,0.122,0.61,0.346,0.93,0.368,0.526,0.875,0.07,0.474,0.125,0.677


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.611,0.505,0.265,0.052,-0.473,0.637,0.072,0.58,0.233,0.814,0.404,0.507,0.742,0.186,0.493,0.258


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.533,0.453,0.058,0.026,0.452,0.601,0.058,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: ebm - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.679,0.608,0.323,0.062,0.577,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.611,0.36,0.111,0.876,0.656,0.13,0.56,0.266,0.93,0.281,0.494,0.842,0.07,0.506,0.158
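
For reference when reading these tables, the sketch below computes the threshold-dependent columns (TPR, TNR, PPV, NPV, FNR, FDR, FOR, accuracy, F1, MCC) from the four confusion-matrix counts using their standard definitions. The function name, the zero-denominator convention, and the example counts are hypothetical. How the `Experiments` class computes these columns and the "Corrected" variants is not shown in this notebook.

```python
from math import sqrt


def safe_div(numerator, denominator):
    """Return 0.0 when the denominator is zero (a convention assumed here)."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics derived from the confusion-matrix counts."""
    tpr = safe_div(tp, tp + fn)   # sensitivity / recall
    tnr = safe_div(tn, tn + fp)   # specificity
    ppv = safe_div(tp, tp + fp)   # precision
    npv = safe_div(tn, tn + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "FNR": 1.0 - tpr,
        "FDR": 1.0 - ppv,
        "FOR": 1.0 - npv,
        "Accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "F1": safe_div(2 * ppv * tpr, ppv + tpr),
        "MCC": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, not taken from the experiments above.
metrics = confusion_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```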


## Experiment 2  `scale_data`
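
As a reminder of what scaling typically involves, the sketch below standardises each feature with the mean and standard deviation estimated on the training rows only, then applies the same transformation to unseen rows. Z-scoring is an assumption here: the source does not say which scaler `Dataset` applies when `scale_data=True`. The function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(train_rows):
    """Estimate per-feature mean and standard deviation on the training rows."""
    columns = list(zip(*train_rows))
    means = [mean(column) for column in columns]
    # A constant column has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, scaler):
    """Apply the training statistics to any set of rows (train or test)."""
    means, stds = scaler
    return [[(value - m) / s for value, m, s in zip(row, means, stds)]
            for row in rows]


train_rows = [[1.0, 10.0], [3.0, 10.0]]
test_rows = [[5.0, 12.0]]
scaler = fit_scaler(train_rows)
print(transform(train_rows, scaler))
print(transform(test_rows, scaler))
```

Reusing the training statistics on the evaluation data keeps the two sets on the same scale without leaking information from the evaluation set into the fit.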

In [6]:

# Same protocol as Experiment 1, now toggling scale_data instead of the indicator variables.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:
        for scale_data in [True, False]:
            for approach in ['LogisticRegression', 'DecisionTree', 'nam', 'ebm', 'xgboost']:

                # Reload the raw data for each run.
                df = pd.read_csv(AUTISM_DATA_PATH)

                data = Dataset(df=df, 
                               missing_data_handling='encoding',
                               imputation_method='without',
                               sampling_method='without',
                               scenario = scenario, 
                               features_name = features[approach][n_features] if approach in features.keys() else features['ebm'][n_features],
                               scale_data=scale_data, 
                               use_missing_indicator_variables=False,
                               verbosity=1, 
                               proportion_train=1 if approach != 'nam' else .8)

                exp = Experiments(data.dataset_name,
                                  dataset=data, 
                                  approach=approach, 
                                  previous_experiment=None,        
                                  debug=False, 
                                  verbosity=0, 
                                  experiment_folder_name='paper_experiment_2_fs_validation',
                                  save_experiment=True)

                print("approach: {} - scale_data: {} - n_features: {}".format(approach, scale_data, n_features))
                exp.fit()

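                # Switch to the 'papers_remote' scenario, keep no training split
                # (proportion_train = 0), and predict with the fitted model.
                exp.dataset.proportion_train = 0.
                exp.dataset._init_scenario(scenario='papers_remote')
                exp.predict()

                display(exp.performances_df)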


approach: LogisticRegression - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.547,0.292,0.078,0.678,0.65,0.089,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208


approach: DecisionTree - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.5,0.085,0.097,0.904,0.601,0.13,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.674,0.613,0.321,0.151,0.918,0.673,0.208,0.63,0.325,0.837,0.474,0.545,0.794,0.163,0.455,0.206,0.674


approach: ebm - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.672,0.454,0.19,0.959,0.667,0.245,0.59,0.299,0.907,0.351,0.513,0.833,0.093,0.487,0.167


approach: xgboost - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.716,0.659,0.413,0.183,0.954,0.69,0.245,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143


approach: LogisticRegression - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.54,0.445,0.147,0.024,-1.573,0.627,0.057,0.49,0.155,0.953,0.14,0.456,0.8,0.047,0.544,0.2


approach: DecisionTree - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.585,0.505,0.154,0.034,0.635,0.601,0.102,0.62,0.197,0.302,0.86,0.619,0.62,0.698,0.381,0.38


approach: nam - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.631,0.521,0.25,0.033,-0.206,0.661,0.084,0.59,0.285,0.884,0.368,0.514,0.808,0.116,0.486,0.192,0.631


approach: ebm - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.722,0.65,0.463,0.118,0.9,0.685,0.13,0.67,0.356,0.744,0.614,0.593,0.761,0.256,0.407,0.239


approach: xgboost - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.715,0.608,0.462,0.069,-0.16,0.678,0.126,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: LogisticRegression - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.591,0.51,0.237,0.053,-0.283,0.619,0.084,0.46,0.107,0.977,0.07,0.442,0.8,0.023,0.558,0.2


approach: DecisionTree - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.673,0.578,0.333,0.045,0.531,0.684,0.154,0.62,0.346,0.907,0.404,0.534,0.852,0.093,0.466,0.148,0.673


approach: ebm - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.646,0.417,0.109,0.874,0.672,0.234,0.6,0.315,0.907,0.368,0.52,0.84,0.093,0.48,0.16


approach: xgboost - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.633,0.427,0.116,0.897,0.661,0.145,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: LogisticRegression - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.531,0.461,0.077,0.026,,0.604,0.066,0.44,0.014,0.953,0.053,0.432,0.6,0.047,0.568,0.4


approach: DecisionTree - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.585,0.505,0.154,0.034,0.635,0.601,0.102,0.62,0.197,0.302,0.86,0.619,0.62,0.698,0.381,0.38


approach: nam - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.657,0.582,0.314,0.045,0.548,0.645,0.122,0.55,0.232,0.907,0.281,0.488,0.8,0.093,0.512,0.2,0.657


approach: ebm - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.678,0.641,0.375,0.161,0.943,0.634,0.208,0.54,0.2,0.884,0.281,0.481,0.762,0.116,0.519,0.238


approach: xgboost - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.72,0.605,0.482,0.048,,0.679,0.12,0.64,0.352,0.86,0.474,0.552,0.818,0.14,0.448,0.182


approach: LogisticRegression - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.611,0.505,0.265,0.052,-0.473,0.637,0.072,0.58,0.233,0.814,0.404,0.507,0.742,0.186,0.493,0.258


approach: DecisionTree - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.533,0.453,0.058,0.026,0.452,0.601,0.058,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: nam - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.661,0.567,0.322,0.043,0.54,0.683,0.138,0.61,0.346,0.93,0.368,0.526,0.875,0.07,0.474,0.125,0.661


approach: ebm - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.679,0.608,0.323,0.062,0.577,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.611,0.36,0.111,0.876,0.656,0.13,0.56,0.266,0.93,0.281,0.494,0.842,0.07,0.506,0.158


approach: LogisticRegression - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.47,0.405,0.031,0.021,-3.227,0.601,0.05,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: DecisionTree - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.556,0.486,0.096,0.03,0.552,0.601,0.084,0.58,0.109,0.326,0.772,0.519,0.603,0.674,0.481,0.397


approach: nam - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.605,0.531,0.237,0.036,,0.628,0.108,0.48,0.159,0.977,0.105,0.452,0.857,0.023,0.548,0.143,0.605


approach: ebm - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.698,0.646,0.424,0.178,0.938,0.661,0.245,0.65,0.309,0.698,0.614,0.577,0.729,0.302,0.423,0.271


approach: xgboost - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.689,0.587,0.39,0.045,,0.667,0.126,0.61,0.304,0.86,0.421,0.529,0.8,0.14,0.471,0.2


## Experiment 3. `Imputation approach`
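
The experiment below compares three conditional imputation methods (`mice`, `knn`, `mean`). The internals of the `Dataset` class are not shown in this notebook, so the following is only a minimal, pure-Python sketch of what mean and k-nearest-neighbour imputation do on a toy table; it is not the implementation used here. The function names, the toy data, and the distance definition are assumptions made for illustration. MICE is not sketched: it iteratively regresses each incomplete column on the others, which needs more machinery than fits in a short example.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if r[c] is None else r[c] for c in range(n_cols)]
            for r in rows]


def _distance(a, b):
    """Root mean squared difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=2):
    """Replace each None with the column mean over the k nearest donor rows.

    A donor is any other row in which the column is observed.
    Assumes every column has at least one observed value in another row.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, other), other[c])
                      for j, other in enumerate(rows)
                      if j != i and other[c] is not None]
            donors.sort(key=lambda d: d[0])  # closest donors first
            nearest = [v for _, v in donors[:k]]
            imputed[i][c] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy table: the second row misses its second feature.
toy = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [9.0, 90.0],
]
print(mean_impute(toy)[1])
print(knn_impute(toy, k=2)[1])
```

On this toy table the mean imputation is pulled towards the outlying last row, whereas the k-NN imputation only uses the two rows closest to the incomplete one. This is the kind of difference the comparison below is meant to probe.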

In [27]:

# `pd`, `Dataset`, `Experiments`, `features`, `AUTISM_DATA_PATH` and `display`
# are expected to come from `run init.ipynb` at the top of the notebook.
approaches = ['LogisticRegression', 'DecisionTree', 'nam', 'ebm', 'xgboost']

for n_features in features['xgboost']:
    for scenario in ['papers']:
        for imputation_method in ['mice', 'knn', 'mean']:
            for approach in approaches:

                df = pd.read_csv(AUTISM_DATA_PATH)

                # Approaches without their own feature ranking fall back to the EBM one.
                feature_source = approach if approach in features else 'ebm'

                data = Dataset(df=df,
                               missing_data_handling='imputation',
                               imputation_method=imputation_method,
                               sampling_method='without',
                               scenario=scenario,
                               features_name=features[feature_source][n_features],
                               scale_data=True,
                               use_missing_indicator_variables=False,
                               verbosity=0,
                               # NAM keeps 20% of the training scenario aside.
                               proportion_train=1 if approach != 'nam' else .8)

                exp = Experiments(data.dataset_name,
                                  dataset=data,
                                  approach=approach,
                                  previous_experiment=None,
                                  debug=False,
                                  experiment_folder_name='paper_experiment_3_fs_validation',
                                  verbosity=0,
                                  save_experiment=True)

                print("approach: {} - imputation_method: {} - n_features: {}".format(approach, imputation_method, n_features))

                # Fit on the `papers` scenario.
                exp.fit()

                # Switch to the `papers_remote` scenario with no training share,
                # then predict on it.
                exp.dataset.proportion_train = 0
                exp.dataset._init_scenario(scenario='papers_remote')
                exp.predict()

                display(exp.performances_df)


approach: LogisticRegression - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.684,0.621,0.382,0.06,0.581,0.655,0.17,0.58,0.268,0.884,0.351,0.507,0.8,0.116,0.493,0.2
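
The threshold-dependent columns of these tables (accuracy, MCC, TPR, TNR, PPV, NPV, FNR, FDR, FOR) are all functions of the four confusion-matrix counts. The sketch below recomputes them from counts that are an assumption: they are not given in the notebook, but were inferred so that they reproduce the rounded rates of the row above (43 positives and 57 negatives out of 100 samples). The function name is hypothetical. The F1 column is left out because its value in that row does not follow from these counts, which suggests it is computed at a different threshold.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Threshold-dependent metrics from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "tpr": tpr,
        "tnr": tnr,
        "ppv": ppv,
        "npv": npv,
        "fnr": 1 - tpr,
        "fdr": 1 - ppv,
        "for": 1 - npv,
    }


# Inferred counts (assumption, not stated in the notebook).
metrics = confusion_metrics(tp=38, fn=5, tn=20, fp=37)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reading the row this way makes the trade-off explicit: under these counts the model catches most positives (38 of 43) at the cost of 37 false positives among 57 negatives.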


approach: DecisionTree - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.656,0.591,0.333,0.172,0.955,0.601,0.245,0.62,0.245,0.651,0.596,0.549,0.694,0.349,0.451,0.306


approach: nam - imputation_method: mice - n_features: 10
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.653,0.566,0.328,0.063,0.535,0.661,0.124,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15,0.653


approach: ebm - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.615,0.61,0.195,0.235,0.943,0.628,0.346,0.54,0.187,0.86,0.298,0.481,0.739,0.14,0.519,0.261


approach: xgboost - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.672,0.658,0.351,0.227,0.963,0.635,0.314,0.55,0.232,0.907,0.281,0.488,0.8,0.093,0.512,0.2


approach: LogisticRegression - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.674,0.625,0.341,0.098,0.787,0.643,0.17,0.59,0.249,0.814,0.421,0.515,0.75,0.186,0.485,0.25


approach: DecisionTree - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.593,0.535,0.16,0.142,0.937,0.601,0.208,0.63,0.224,0.279,0.895,0.667,0.622,0.721,0.333,0.378


approach: nam - imputation_method: knn - n_features: 10
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.703,0.614,0.401,0.071,0.567,0.707,0.124,0.65,0.407,0.93,0.439,0.556,0.893,0.07,0.444,0.107,0.703


approach: ebm - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.707,0.662,0.462,0.201,0.95,0.673,0.28,0.63,0.325,0.837,0.474,0.545,0.794,0.163,0.455,0.206


approach: xgboost - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.66,0.649,0.273,0.261,0.96,0.639,0.377,0.56,0.222,0.86,0.333,0.493,0.76,0.14,0.507,0.24


approach: LogisticRegression - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.696,0.64,0.394,0.084,0.59,0.655,0.185,0.6,0.276,0.837,0.421,0.522,0.774,0.163,0.478,0.226


approach: DecisionTree - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.629,0.547,0.261,0.103,0.919,0.602,0.13,0.58,0.17,0.628,0.544,0.509,0.66,0.372,0.491,0.34


approach: nam - imputation_method: mean - n_features: 10
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.69,0.614,0.357,0.092,0.849,0.695,0.124,0.63,0.377,0.93,0.404,0.541,0.885,0.07,0.459,0.115,0.69


approach: ebm - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.67,0.634,0.315,0.237,0.953,0.679,0.346,0.63,0.336,0.86,0.456,0.544,0.812,0.14,0.456,0.188


approach: xgboost - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.671,0.651,0.315,0.245,0.962,0.645,0.346,0.56,0.235,0.884,0.316,0.494,0.783,0.116,0.506,0.217


approach: LogisticRegression - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.717,0.632,0.491,0.057,0.572,0.655,0.17,0.61,0.282,0.814,0.456,0.53,0.765,0.186,0.47,0.235


approach: DecisionTree - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.581,0.497,0.197,0.031,0.54,0.601,0.078,0.41,-0.164,0.953,0.0,0.418,0.0,0.047,0.582,1.0


approach: nam - imputation_method: mice - n_features: 22
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.698,0.604,0.434,0.048,0.551,0.661,0.121,0.59,0.285,0.884,0.368,0.514,0.808,0.116,0.486,0.192,0.698


approach: ebm - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.711,0.635,0.437,0.096,0.85,0.672,0.138,0.59,0.315,0.93,0.333,0.513,0.864,0.07,0.487,0.136


approach: xgboost - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.674,0.422,0.266,0.965,0.641,0.377,0.53,0.214,0.93,0.228,0.476,0.812,0.07,0.524,0.188


approach: LogisticRegression - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.756,0.656,0.588,0.078,0.546,0.714,0.13,0.71,0.437,0.791,0.649,0.63,0.804,0.209,0.37,0.196


approach: DecisionTree - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.595,0.496,0.207,0.032,0.494,0.601,0.091,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - imputation_method: knn - n_features: 22
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.711,0.626,0.426,0.094,0.83,0.71,0.154,0.68,0.415,0.86,0.544,0.587,0.838,0.14,0.413,0.162,0.711


approach: ebm - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.69,0.64,0.384,0.107,0.874,0.661,0.213,0.61,0.293,0.837,0.439,0.529,0.781,0.163,0.471,0.219


approach: xgboost - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.698,0.637,0.44,0.155,0.924,0.673,0.208,0.62,0.32,0.86,0.439,0.536,0.806,0.14,0.464,0.194


approach: LogisticRegression - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.734,0.625,0.542,0.072,0.174,0.708,0.13,0.66,0.407,0.907,0.474,0.565,0.871,0.093,0.435,0.129


approach: DecisionTree - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.629,0.547,0.261,0.103,0.919,0.602,0.13,0.58,0.17,0.628,0.544,0.509,0.66,0.372,0.491,0.34


approach: nam - imputation_method: mean - n_features: 22
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.678,0.616,0.342,0.116,0.907,0.684,0.154,0.62,0.346,0.907,0.404,0.534,0.852,0.093,0.466,0.148,0.678


approach: ebm - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.692,0.65,0.383,0.148,0.944,0.672,0.213,0.58,0.318,0.953,0.298,0.506,0.895,0.047,0.494,0.105


approach: xgboost - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.668,0.643,0.32,0.204,0.955,0.643,0.28,0.59,0.249,0.814,0.421,0.515,0.75,0.186,0.485,0.25


approach: LogisticRegression - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.697,0.576,0.457,0.041,0.15,0.673,0.096,0.64,0.331,0.814,0.509,0.556,0.784,0.186,0.444,0.216


approach: DecisionTree - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.524,0.478,0.079,0.032,0.653,0.601,0.101,0.4,-0.158,0.884,0.035,0.409,0.286,0.116,0.591,0.714


approach: nam - imputation_method: mice - n_features: 32
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.716,0.641,0.434,0.119,0.914,0.686,0.147,0.66,0.363,0.814,0.544,0.574,0.795,0.186,0.426,0.205,0.716


approach: ebm - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.651,0.605,0.257,0.135,0.923,0.661,0.17,0.56,0.286,0.953,0.263,0.494,0.882,0.047,0.506,0.118


approach: xgboost - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.688,0.648,0.41,0.143,0.939,0.646,0.17,0.64,0.298,0.721,0.579,0.564,0.733,0.279,0.436,0.267


approach: LogisticRegression - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.734,0.613,0.515,0.048,0.469,0.69,0.106,0.64,0.364,0.884,0.456,0.551,0.839,0.116,0.449,0.161


approach: DecisionTree - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.643,0.565,0.302,0.146,0.936,0.602,0.208,0.6,0.189,0.558,0.632,0.533,0.655,0.442,0.467,0.345


approach: nam - imputation_method: knn - n_features: 32
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.7,0.64,0.416,0.138,0.923,0.68,0.17,0.66,0.354,0.791,0.561,0.576,0.78,0.209,0.424,0.22,0.7


approach: ebm - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.686,0.649,0.407,0.169,0.949,0.655,0.234,0.58,0.268,0.884,0.351,0.507,0.8,0.116,0.493,0.2


approach: xgboost - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.659,0.647,0.336,0.221,0.957,0.629,0.314,0.54,0.2,0.884,0.281,0.481,0.762,0.116,0.519,0.238


approach: LogisticRegression - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.77,0.642,0.63,0.075,0.124,0.735,0.144,0.73,0.477,0.814,0.667,0.648,0.826,0.186,0.352,0.174


approach: DecisionTree - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.629,0.547,0.261,0.103,0.919,0.602,0.13,0.58,0.17,0.628,0.544,0.509,0.66,0.372,0.491,0.34


approach: nam - imputation_method: mean - n_features: 32
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.656,0.601,0.265,0.133,0.918,0.683,0.17,0.59,0.354,0.977,0.298,0.512,0.944,0.023,0.488,0.056,0.656


approach: ebm - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.702,0.665,0.386,0.174,0.957,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.669,0.633,0.33,0.143,0.939,0.655,0.192,0.6,0.276,0.837,0.421,0.522,0.774,0.163,0.478,0.226


## Experiment 4. `Sampling approach`
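
The cell below compares three values of `sampling_method`: `'without'`, `'vanilla'` and `'smote'`. As a rough illustration of how the last two differ, here is a minimal NumPy sketch on toy data. It assumes that `'vanilla'` means random re-sampling of the minority class until the classes are balanced, and that `'smote'` interpolates between minority neighbours. It is not the `Dataset` class's implementation; the function names, the parameter `k` and the toy arrays are all hypothetical.

```python
import numpy as np


def _minority_info(y):
    """Return the minority label and how many samples it is missing."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = int(counts.max() - counts.min())
    return minority, n_extra


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority rows until both classes are balanced."""
    minority, n_extra = _minority_info(y)
    idx = rng.choice(np.flatnonzero(y == minority), size=n_extra, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])


def smote_like_oversample(X, y, rng, k=3):
    """Create synthetic minority rows between a sample and one of its k nearest minority neighbours."""
    minority, n_extra = _minority_info(y)
    X_min = X[y == minority]
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, len(X_min) - 1)
    # Pairwise distances among minority samples; a sample is never its own neighbour.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_extra)
    picked = neighbours[base, rng.integers(0, k, size=n_extra)]
    gap = rng.random((n_extra, 1))  # position along the segment base -> neighbour
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_extra, minority)])


# Toy data (hypothetical): six samples of class 0, three of class 1.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
              [1.0, 1.0], [1.2, 0.9], [0.9, 1.1]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])

X_rand, y_rand = random_oversample(X, y, rng)
X_smote, y_smote = smote_like_oversample(X, y, rng, k=2)
print(np.bincount(y_rand), np.bincount(y_smote))
```

Random over-sampling only repeats existing rows, whereas the SMOTE-like variant adds new points that lie on segments between minority samples. Either should be applied to the training split only.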

In [4]:
n=0
# Fit every approach on the 'papers' scenario for each sampling method and
# feature-set size, then evaluate it on the 'papers_remote' scenario.
for n_features in features['xgboost']:
    for scenario in ['papers']:
        for sampling_method in ['without', 'vanilla', 'smote']:
            for approach in ['LogisticRegression', 'DecisionTree', 'nam', 'ebm', 'xgboost']:

                df = pd.read_csv(AUTISM_DATA_PATH)

                # Use the approach's own feature list when one exists,
                # otherwise fall back to the EBM feature list.
                features_name = (features[approach][n_features]
                                 if approach in features
                                 else features['ebm'][n_features])

                # nam uses an 80% training proportion; the other approaches use 100%.
                data = Dataset(df=df,
                               missing_data_handling='encoding',
                               imputation_method='without',
                               sampling_method=sampling_method,
                               scenario=scenario,
                               features_name=features_name,
                               scale_data=True,
                               use_missing_indicator_variables=False,
                               verbosity=1,
                               proportion_train=1 if approach != 'nam' else .8)

                exp = Experiments(data.dataset_name,
                                  dataset=data,
                                  approach=approach,
                                  previous_experiment=None,
                                  debug=False,
                                  experiment_folder_name='paper_experiment_4_fs_validation',
                                  verbosity=0,
                                  save_experiment=True)
                print("approach: {} - sampling_method: {} - n_features: {}".format(approach, sampling_method, n_features))

                exp.fit()

                # Re-initialise the dataset on 'papers_remote' with proportion_train = 0
                # before predicting, so that no sample is assigned to training.
                exp.dataset.proportion_train = 0.
                exp.dataset._init_scenario(scenario='papers_remote')
                exp.predict()

                display(exp.performances_df)


approach: LogisticRegression - sampling_method: without - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.547,0.292,0.078,0.678,0.65,0.089,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208
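
The threshold-dependent columns of these tables (Accuracy, MCC, TPR, TNR, PPV, NPV, FNR, FDR, FOR) all follow from a single confusion matrix. The sketch below recomputes them from counts chosen so that they reproduce the row above. The counts are an assumption inferred from the printed rates; the actual test-set size is not stated in this section. The helper `threshold_metrics` is hypothetical and not part of the notebook's code.

```python
import math


def threshold_metrics(tp, fn, tn, fp):
    """Metrics derived from one confusion matrix (no guard against empty rows or columns)."""
    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Accuracy": (tp + tn) / (tp + fn + tn + fp),
        "MCC": (tp * tn - fp * fn) / mcc_den,
        "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
        "FNR": 1 - tpr, "FDR": 1 - ppv, "FOR": 1 - npv,
    }


# Assumed counts (43 positives, 57 negatives), picked to match the printed rates.
metrics = {name: round(value, 3)
           for name, value in threshold_metrics(tp=38, fn=5, tn=19, fp=38).items()}
print(metrics)
```

With these counts the rounded values agree with the Accuracy, MCC, TPR, TNR, PPV, NPV, FNR, FDR and FOR columns of the row above. The F1 column is left out on purpose: `2 PPVxTPR/(PPV+TPR)` at this operating point gives about 0.639 rather than the printed 0.65, so it may be computed at a different threshold.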


approach: DecisionTree - sampling_method: without - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.5,0.085,0.097,0.904,0.601,0.13,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: without - n_features: 10
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.717,0.611,0.468,0.049,0.536,0.689,0.118,0.62,0.362,0.93,0.386,0.533,0.88,0.07,0.467,0.12,0.717


approach: ebm - sampling_method: without - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.672,0.454,0.19,0.959,0.667,0.245,0.59,0.299,0.907,0.351,0.513,0.833,0.093,0.487,0.167


approach: xgboost - sampling_method: without - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.716,0.659,0.413,0.183,0.954,0.69,0.245,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143


approach: LogisticRegression - sampling_method: vanilla - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.547,0.292,0.078,0.678,0.65,0.089,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208


approach: DecisionTree - sampling_method: vanilla - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.5,0.085,0.097,0.904,0.601,0.13,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: vanilla - n_features: 10
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.666,0.583,0.294,0.064,0.516,0.677,0.111,0.59,0.333,0.953,0.316,0.512,0.9,0.047,0.488,0.1,0.666


approach: ebm - sampling_method: vanilla - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.672,0.454,0.19,0.959,0.667,0.245,0.59,0.299,0.907,0.351,0.513,0.833,0.093,0.487,0.167


approach: xgboost - sampling_method: vanilla - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.716,0.659,0.413,0.183,0.954,0.69,0.245,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143


approach: LogisticRegression - sampling_method: smote - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.639,0.547,0.292,0.078,0.678,0.65,0.089,0.57,0.252,0.884,0.333,0.5,0.792,0.116,0.5,0.208


approach: DecisionTree - sampling_method: smote - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.5,0.085,0.097,0.904,0.601,0.13,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: smote - n_features: 10
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.677,0.589,0.323,0.065,0.525,0.684,0.111,0.62,0.346,0.907,0.404,0.534,0.852,0.093,0.466,0.148,0.677


approach: ebm - sampling_method: smote - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.713,0.672,0.454,0.19,0.959,0.667,0.245,0.59,0.299,0.907,0.351,0.513,0.833,0.093,0.487,0.167


approach: xgboost - sampling_method: smote - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.716,0.659,0.413,0.183,0.954,0.69,0.245,0.63,0.362,0.907,0.421,0.542,0.857,0.093,0.458,0.143
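
For `n_features: 10`, four of the five approaches print the same row under all three sampling methods; only `nam` changes, and it is trained in replicates, so its variation may simply reflect training randomness. The notebook does not say why. A quick check like the one below, using the AUROC values copied from the output above, can flag approaches whose score does not depend on `sampling_method`, which may be worth confirming against how `Dataset` applies the sampling. The helper name `invariant_approaches` is hypothetical.

```python
# AUROC values copied from the n_features = 10 output above.
auroc = {
    "LogisticRegression": {"without": 0.639, "vanilla": 0.639, "smote": 0.639},
    "DecisionTree": {"without": 0.558, "vanilla": 0.558, "smote": 0.558},
    "nam": {"without": 0.717, "vanilla": 0.666, "smote": 0.677},
    "ebm": {"without": 0.713, "vanilla": 0.713, "smote": 0.713},
    "xgboost": {"without": 0.716, "vanilla": 0.716, "smote": 0.716},
}


def invariant_approaches(scores):
    """Return the approaches whose score is identical for every sampling method."""
    return [approach for approach, by_method in scores.items()
            if len(set(by_method.values())) == 1]


unchanged = invariant_approaches(auroc)
print(unchanged)
```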


approach: LogisticRegression - sampling_method: without - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.591,0.51,0.237,0.053,-0.283,0.619,0.084,0.46,0.107,0.977,0.07,0.442,0.8,0.023,0.558,0.2


approach: DecisionTree - sampling_method: without - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: without - n_features: 22
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.671,0.582,0.331,0.046,0.202,0.662,0.154,0.55,0.292,0.977,0.228,0.488,0.929,0.023,0.512,0.071,0.671


approach: ebm - sampling_method: without - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.646,0.417,0.109,0.874,0.672,0.234,0.6,0.315,0.907,0.368,0.52,0.84,0.093,0.48,0.16


approach: xgboost - sampling_method: without - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.633,0.427,0.116,0.897,0.661,0.145,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: LogisticRegression - sampling_method: vanilla - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.591,0.51,0.237,0.053,-0.283,0.619,0.084,0.46,0.107,0.977,0.07,0.442,0.8,0.023,0.558,0.2


approach: DecisionTree - sampling_method: vanilla - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: vanilla - n_features: 22
	===== Replicate no. 1 =====

	===== Replicate no. 2 =====

	===== Replicate no. 3 =====

	===== Replicate no. 4 =====

	===== Replicate no. 5 =====

	===== Replicate no. 6 =====

	===== Replicate no. 7 =====

	===== Replicate no. 8 =====

	===== Replicate no. 9 =====

	===== Replicate no. 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.671,0.62,0.268,0.136,0.918,0.683,0.17,0.61,0.346,0.93,0.368,0.526,0.875,0.07,0.474,0.125,0.671


approach: ebm - sampling_method: vanilla - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.646,0.417,0.109,0.874,0.672,0.234,0.6,0.315,0.907,0.368,0.52,0.84,0.093,0.48,0.16


approach: xgboost - sampling_method: vanilla - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.633,0.427,0.116,0.897,0.661,0.145,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: LogisticRegression - sampling_method: smote - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.591,0.51,0.237,0.053,-0.283,0.619,0.084,0.46,0.107,0.977,0.07,0.442,0.8,0.023,0.558,0.2


approach: DecisionTree - sampling_method: smote - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.558,0.506,0.085,0.118,0.916,0.601,0.17,0.42,-0.116,0.977,0.0,0.424,0.0,0.023,0.576,1.0


approach: nam - sampling_method: smote - n_features: 22
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.686,0.599,0.342,0.05,0.558,0.678,0.154,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154,0.686


approach: ebm - sampling_method: smote - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.646,0.417,0.109,0.874,0.672,0.234,0.6,0.315,0.907,0.368,0.52,0.84,0.093,0.48,0.16


approach: xgboost - sampling_method: smote - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.704,0.633,0.427,0.116,0.897,0.661,0.145,0.57,0.283,0.93,0.298,0.5,0.85,0.07,0.5,0.15


approach: LogisticRegression - sampling_method: without - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.611,0.505,0.265,0.052,-0.473,0.637,0.072,0.58,0.233,0.814,0.404,0.507,0.742,0.186,0.493,0.258


approach: DecisionTree - sampling_method: without - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.533,0.453,0.058,0.026,0.452,0.601,0.058,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: nam - sampling_method: without - n_features: 32
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.656,0.6,0.246,0.131,0.909,0.678,0.17,0.62,0.333,0.884,0.421,0.535,0.828,0.116,0.465,0.172,0.656


approach: ebm - sampling_method: without - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.679,0.608,0.323,0.062,0.577,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - sampling_method: without - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.611,0.36,0.111,0.876,0.656,0.13,0.56,0.266,0.93,0.281,0.494,0.842,0.07,0.506,0.158


approach: LogisticRegression - sampling_method: vanilla - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.611,0.505,0.265,0.052,-0.473,0.637,0.072,0.58,0.233,0.814,0.404,0.507,0.742,0.186,0.493,0.258


approach: DecisionTree - sampling_method: vanilla - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.533,0.453,0.058,0.026,0.452,0.601,0.058,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: nam - sampling_method: vanilla - n_features: 32
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.684,0.628,0.318,0.137,0.928,0.701,0.17,0.64,0.392,0.93,0.421,0.548,0.889,0.07,0.452,0.111,0.684


approach: ebm - sampling_method: vanilla - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.679,0.608,0.323,0.062,0.577,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - sampling_method: vanilla - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.611,0.36,0.111,0.876,0.656,0.13,0.56,0.266,0.93,0.281,0.494,0.842,0.07,0.506,0.158


approach: LogisticRegression - sampling_method: smote - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.611,0.505,0.265,0.052,-0.473,0.637,0.072,0.58,0.233,0.814,0.404,0.507,0.742,0.186,0.493,0.258


approach: DecisionTree - sampling_method: smote - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.533,0.453,0.058,0.026,0.452,0.601,0.058,0.58,0.091,0.209,0.86,0.529,0.59,0.791,0.471,0.41


approach: nam - sampling_method: smote - n_features: 32
	===== Replicates no. 1 to 10 =====



Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.671,0.616,0.273,0.134,0.915,0.678,0.17,0.62,0.333,0.884,0.421,0.535,0.828,0.116,0.465,0.172,0.671


approach: ebm - sampling_method: smote - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.679,0.608,0.323,0.062,0.577,0.678,0.234,0.61,0.331,0.907,0.386,0.527,0.846,0.093,0.473,0.154


approach: xgboost - sampling_method: smote - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.681,0.611,0.36,0.111,0.876,0.656,0.13,0.56,0.266,0.93,0.281,0.494,0.842,0.07,0.506,0.158
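
Comparing configurations is hard when each result is printed as its own one-row table. The sketch below shows one possible way to collect the printed blocks into a list of records, using only the standard library. It assumes the exact layout seen in this output (an `approach: ... - sampling_method: ... - n_features: ...` label, then a header line starting with `Unnamed: 0,`, then a single data row). `parse_results` and `LABEL` are hypothetical names, not functions from the notebook.

```python
import csv
import re

# Matches the label line printed before each table in this output.
LABEL = re.compile(
    r"^approach: (?P<approach>\S+) - sampling_method: (?P<sampling_method>\S+)"
    r" - n_features: (?P<n_features>\d+)$"
)

def parse_results(text):
    """Return one dict per labelled block, merging the label and its metrics."""
    records, label, header = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        match = LABEL.match(line)
        if match:
            label, header = match.groupdict(), None
            label["n_features"] = int(label["n_features"])
        elif line.startswith("Unnamed: 0,"):
            # csv.reader handles the quoted column names that contain commas.
            header = next(csv.reader([line]))
        elif header is not None and label is not None and line:
            values = next(csv.reader([line]))
            # Skip the first field, which is the exported pandas index.
            metrics = dict(zip(header[1:], map(float, values[1:])))
            records.append({**label, **metrics})
            header = None
    return records

sample = (
    "approach: xgboost - sampling_method: smote - n_features: 32\n"
    "\n"
    'Unnamed: 0,AUROC,"Sensitivity, recall, hit rate, or true positive rate (TPR)"\n'
    "0,0.681,0.93\n"
)
for record in parse_results(sample):
    print(record["approach"], record["sampling_method"], record["AUROC"])
```

Replicate progress lines are ignored because they appear before the header line of their block. The resulting records can be sorted or grouped by `approach` and `sampling_method` to spot configurations that produce identical metrics.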
