# Analysis of the autism data

In [122]:
run init.ipynb

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
Done.


# Experiments

* This section addresses a few questions about the algorithms used, with the goal of optimizing our classification framework.

**Parameters or potential settings**

* The approach used: Logistic Regression, Decision Tree, Explainable Boosting Machine (EBM), Neural Additive Models (NAM), or XGBoost.
* Whether indicator variables are used as inputs.
* Whether the data are scaled.
* The imputation approach when the learning algorithm does not handle missing values by design. It can be constant imputation (called encoding here) or conditional imputation (mean, knn, or mice).
* The sampling method for imbalanced learning: no down-sampling, vanilla (random sampling of the minority class until the classes are balanced), or SMOTE (a more elaborate sampling scheme).
* The number of features, between 2 and 6. 
* The number of folds when cross-validating the results. 

**Notes on the classification pipeline:**

* Two datasets are age-matched. One (scenario `asd_td_age_matched_n_balanced`) leads to relatively balanced classes, whereas the other (scenario `asd_td_age_matched_n_unbalanced`) has slightly younger kids, which leverages the larger number of young neurotypical kids, and is more unbalanced.
* No hyper-parameter search is performed for any of the approaches. Cross-validation is stratified: each fold is left out in turn, the model is fitted on the training set, and the predictions on the test set are stored for later evaluation of performance.
* Since many settings are tested, every hypothesis is tested against a default setting: encoding of the missing variables, scaling of the data, no use of indicator variables, no down-sampling (???), and a 16-fold cross-validation.
* Feature selection was done for the two scenarios using the features with the highest importance according to the xgboost importance map.
* Classification here is between autistic and neurotypical participants.
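
The cross-validation scheme described in the notes above can be sketched as follows. This is a minimal standard-library illustration, not the code behind `Experiments.fit_predict`: the helpers `stratified_folds` and `out_of_fold_predictions`, the toy prevalence "model", and the synthetic labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign each sample index to a fold so that every class is spread evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of


def out_of_fold_predictions(X, y, fit, predict, n_folds=16, seed=0):
    """Fit on the training folds, predict the held-out fold, and keep every prediction."""
    fold_of = stratified_folds(y, n_folds, seed)
    predictions = [None] * len(y)
    for fold in range(n_folds):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        test = [i for i in range(len(y)) if fold_of[i] == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = predict(model, X[i])
    return predictions


# Toy stand-in model (hypothetical): predicts the positive-class prevalence seen in training.
def fit_prevalence(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_prevalence(model, x):
    return model


# Synthetic, imbalanced labels: 32 positives and 128 negatives.
y = [1] * 32 + [0] * 128
X = [[float(i)] for i in range(len(y))]
oof = out_of_fold_predictions(X, y, fit_prevalence, predict_prevalence, n_folds=16)
```

Each sample receives exactly one prediction, made by a model that never saw it during fitting, so the stored predictions can be scored together afterwards.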



**Among the questions at stake:**

* Experiment 1: Shall we use indicator variables? For each scenario (columns of axes) and each dimension of the problem (rows of axes), x is `use_of_indicator_variables`, y is a performance indicator (typically the F1 score), and the hue variable is the approach. The plots are made without imputation.
* Experiment 2: Shall we scale the data or not?
* Experiment X: For the algorithms that handle missing variables (xgboost, nam with encoding), shall we leave them missing or impute them?
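
The performance tables in this notebook report rates whose definitions appear in their column names (for example `F1 score (2 PPVxTPR/(PPV+TPR))`, `FDR=1-PPV`, `FOR=1-NPV`). As a reading aid, here is a minimal sketch of those definitions computed from confusion-matrix counts. The function names and the counts are made up for illustration, and the pipeline may compute its reported values differently (for instance at another decision threshold).

```python
import math


def safe_div(numerator, denominator):
    """Return NaN instead of raising when a rate is undefined (zero denominator)."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fp, tn, fn):
    """Standard rates derived from true/false positive/negative counts."""
    tpr = safe_div(tp, tp + fn)  # sensitivity, recall
    tnr = safe_div(tn, tn + fp)  # specificity
    ppv = safe_div(tp, tp + fp)  # precision
    npv = safe_div(tn, tn + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
        "FNR": 1 - tpr, "FDR": 1 - ppv, "FOR": 1 - npv,
        "Accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "F1": safe_div(2 * ppv * tpr, ppv + tpr),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Made-up counts: a classifier that never predicts the positive class.
all_negative = confusion_metrics(tp=0, fp=0, tn=90, fn=10)
# Made-up counts: a classifier that finds most positives.
mixed = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```

With no positive predictions, PPV, FDR, and MCC are undefined (NaN) while TPR is 0 and TNR is 1. That pattern is consistent with result rows that show blank cells in those columns.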

In [4]:
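import os
import pickle
import pandas as pd

# AUTISM_DATA_PATH and DATA_DIR are presumably defined by init.ipynb.
df = pd.read_csv(AUTISM_DATA_PATH)

scenario = 'papers'

# Load the features selected for this scenario.
with open(os.path.join(DATA_DIR, 'selected_features_{}.pkl'.format(scenario)), 'rb') as f:
    features = pickle.load(f)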

## Experiment 1.  `use_missing_indicator_variables`

In [45]:
# Experiment 1: run every approach with and without missing-indicator variables.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:
        for use_missing_indicator_variables in [True, False]:
            for approach in ["LogisticRegression", "DecisionTree", "ebm", "xgboost", "nam"]:

                # Reload the raw data for each run.
                df = pd.read_csv(AUTISM_DATA_PATH)

                # Fall back to the EBM feature list when none exists for this approach.
                features_name = (features[approach][n_features] if approach in features.keys()
                                 else features['ebm'][n_features])

                data = Dataset(df=df,
                               missing_data_handling='encoding',
                               imputation_method='without',
                               sampling_method='without',
                               scenario=scenario,
                               features_name=features_name,
                               scale_data=True,
                               use_missing_indicator_variables=use_missing_indicator_variables,
                               verbosity=1,
                               proportion_train=1)

                exp = Experiments(data.dataset_name,
                                  dataset=data,
                                  approach=approach,
                                  previous_experiment=None,
                                  debug=True,
                                  verbosity=0,
                                  experiment_folder_name='paper_experiment_1_fs',
                                  save_experiment=True)
                print("approach: {} - use_missing_indicator_variables: {} - n_features: {}".format(approach, use_missing_indicator_variables, n_features))

                # 16-fold cross-validation, then show the resulting performance table.
                exp.fit_predict(num_cv=16)
                display(exp.performances_df)
                

approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.729,0.264,0.81,0.083,0.845,0.356,0.18,0.832,0.266,0.465,0.87,0.27,0.94,0.535,0.73,0.06


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.659,0.372,0.878,0.202,0.964,0.406,0.319,0.915,0.359,0.256,0.983,0.611,0.927,0.744,0.389,0.073


approach: ebm - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.864,0.44,0.888,0.199,0.967,0.5,0.272,0.889,0.428,0.558,0.923,0.429,0.953,0.442,0.571,0.047


approach: xgboost - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.804,0.449,0.921,0.246,0.982,0.451,0.302,0.876,0.373,0.512,0.913,0.379,0.948,0.488,0.621,0.052


approach: nam - use_missing_indicator_variables: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8408,0.4024,0.892,0.1625,0.8999,0.4746,0.2733,0.893,0.4053,0.479,0.9358,0.4567,0.9458,0.521,0.5433,0.0542,0.8408


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.73,0.264,0.796,0.082,0.851,0.37,0.181,0.834,0.283,0.488,0.87,0.28,0.943,0.512,0.72,0.057


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.661,0.374,0.879,0.203,0.964,0.406,0.319,0.915,0.359,0.256,0.983,0.611,0.927,0.744,0.389,0.073


approach: ebm - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.457,0.9,0.222,0.977,0.511,0.276,0.898,0.44,0.535,0.935,0.46,0.951,0.465,0.54,0.049


approach: xgboost - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.806,0.442,0.92,0.235,0.98,0.453,0.319,0.871,0.376,0.535,0.906,0.371,0.95,0.465,0.629,0.05


approach: nam - use_missing_indicator_variables: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8503,0.4069,0.8903,0.1647,0.8932,0.4835,0.2772,0.8989,0.4125,0.4721,0.943,0.4669,0.9453,0.5279,0.5331,0.0546,0.8503


approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.805,0.418,0.901,0.19,0.956,0.521,0.343,0.922,0.47,0.419,0.974,0.621,0.942,0.581,0.379,0.058


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.603,0.262,0.794,0.098,0.919,0.333,0.221,0.85,0.223,0.349,0.901,0.268,0.931,0.651,0.732,0.069


approach: ebm - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.857,0.402,0.888,0.18,0.965,0.43,0.225,0.882,0.349,0.442,0.928,0.388,0.941,0.558,0.612,0.059


approach: xgboost - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.827,0.461,0.927,0.243,0.981,0.442,0.352,0.893,0.365,0.419,0.942,0.429,0.94,0.581,0.571,0.06


approach: nam - use_missing_indicator_variables: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8272,0.3503,0.8611,0.1324,0.8591,0.411,0.2275,0.8447,0.3362,0.5372,0.8764,0.3298,0.9487,0.4628,0.6702,0.0513,0.8272


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.786,0.415,0.882,0.199,0.971,0.487,0.306,0.911,0.423,0.419,0.962,0.529,0.941,0.581,0.471,0.059


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.607,0.311,0.826,0.163,0.951,0.337,0.289,0.852,0.227,0.349,0.904,0.273,0.931,0.651,0.727,0.069


approach: ebm - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.866,0.429,0.895,0.218,0.976,0.446,0.24,0.83,0.389,0.698,0.844,0.316,0.964,0.302,0.684,0.036


approach: xgboost - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.832,0.459,0.921,0.24,0.98,0.459,0.351,0.869,0.384,0.558,0.901,0.369,0.952,0.442,0.631,0.048


approach: nam - use_missing_indicator_variables: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.85,0.3758,0.8821,0.1453,0.8748,0.4229,0.238,0.8591,0.3489,0.5232,0.8938,0.3489,0.9484,0.4768,0.6511,0.0516,0.85


approach: LogisticRegression - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.785,0.395,0.886,0.159,0.868,0.475,0.289,0.882,0.4,0.535,0.918,0.404,0.95,0.465,0.596,0.05


approach: DecisionTree - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.612,0.297,0.798,0.168,0.946,0.34,0.319,0.906,,0.0,1.0,,0.906,1.0,,0.094


approach: ebm - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.854,0.408,0.887,0.172,0.955,0.464,0.222,0.852,0.4,0.651,0.873,0.346,0.96,0.349,0.654,0.04


approach: xgboost - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.829,0.443,0.92,0.232,0.98,0.433,0.35,0.878,0.352,0.465,0.921,0.377,0.943,0.535,0.623,0.057


approach: nam - use_missing_indicator_variables: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8216,0.3578,0.8541,0.1463,0.9318,0.4056,0.2227,0.8583,0.3299,0.4813,0.897,0.3542,0.9441,0.5187,0.6458,0.0559,0.8216


approach: LogisticRegression - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.78,0.38,0.88,0.149,0.825,0.475,0.264,0.882,0.4,0.535,0.918,0.404,0.95,0.465,0.596,0.05


approach: DecisionTree - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.615,0.298,0.811,0.15,0.941,0.364,0.323,0.906,,0.0,1.0,,0.906,1.0,,0.094


approach: ebm - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.859,0.415,0.906,0.159,0.929,0.475,0.259,0.863,0.408,0.628,0.887,0.365,0.958,0.372,0.635,0.042


approach: xgboost - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.821,0.443,0.92,0.232,0.979,0.458,0.332,0.885,0.38,0.488,0.925,0.404,0.946,0.512,0.596,0.054


approach: nam - use_missing_indicator_variables: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8392,0.3837,0.8778,0.1606,0.9428,0.4248,0.2268,0.8636,0.3476,0.507,0.9004,0.3532,0.9468,0.493,0.6468,0.0532,0.8392


In [73]:
# Instead of a boolean, pass a dict that names specific postural-sway features under two keys.
use_missing_indicator_variables = {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'],
                                   'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']}

for n_features in features['xgboost'].keys():
    for scenario in ['papers']:
        for approach in ["LogisticRegression", "DecisionTree", "ebm", "xgboost", "nam"]:

            # Reload the raw data for each run.
            df = pd.read_csv(AUTISM_DATA_PATH)

            # Fall back to the EBM feature list when none exists for this approach.
            features_name = (features[approach][n_features] if approach in features.keys()
                             else features['ebm'][n_features])

            data = Dataset(df=df,
                           missing_data_handling='encoding',
                           imputation_method='without',
                           sampling_method='without',
                           scenario=scenario,
                           features_name=features_name,
                           scale_data=True,
                           use_missing_indicator_variables=use_missing_indicator_variables,
                           verbosity=1,
                           proportion_train=1)

            exp = Experiments(data.dataset_name,
                              dataset=data,
                              approach=approach,
                              previous_experiment=None,
                              debug=True,
                              verbosity=0,
                              experiment_folder_name='paper_experiment_1_fs_custom_missing',
                              save_experiment=True)
            print("approach: {} - use_missing_indicator_variables: {} - n_features: {}".format(approach, use_missing_indicator_variables, n_features))

            # 16-fold cross-validation, then show the resulting performance table.
            exp.fit_predict(num_cv=16)
            display(exp.performances_df)


approach: LogisticRegression - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.745,0.268,0.804,0.082,0.817,0.381,0.196,0.856,0.293,0.442,0.899,0.311,0.94,0.558,0.689,0.06


approach: DecisionTree - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.735,0.474,0.931,0.331,0.984,0.48,0.407,0.915,0.418,0.372,0.971,0.571,0.937,0.628,0.429,0.063


approach: ebm - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.872,0.465,0.899,0.204,0.946,0.543,0.323,0.917,0.482,0.488,0.962,0.568,0.948,0.512,0.432,0.052


approach: xgboost - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.837,0.477,0.93,0.249,0.982,0.523,0.319,0.906,0.454,0.512,0.947,0.5,0.949,0.488,0.5,0.051


approach: nam - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8514,0.4316,0.9078,0.1984,0.9568,0.469,0.2734,0.8812,0.3972,0.5256,0.9181,0.4106,0.9494,0.4744,0.5894,0.0505,0.8514


approach: LogisticRegression - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.828,0.401,0.894,0.167,0.928,0.462,0.264,0.906,0.394,0.395,0.959,0.5,0.939,0.605,0.5,0.061


approach: DecisionTree - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.589,0.213,0.733,0.066,0.86,0.275,0.194,0.841,0.177,0.302,0.897,0.232,0.926,0.698,0.768,0.074


approach: ebm - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.843,0.328,0.823,0.103,0.848,0.453,0.204,0.871,0.376,0.535,0.906,0.371,0.95,0.465,0.629,0.05


approach: xgboost - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.86,0.429,0.882,0.218,0.974,0.472,0.234,0.876,0.398,0.558,0.909,0.387,0.952,0.442,0.613,0.048


approach: nam - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8396,0.3709,0.872,0.1487,0.9129,0.409,0.2245,0.8323,0.3452,0.586,0.858,0.3254,0.954,0.414,0.6746,0.046,0.8396


approach: LogisticRegression - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.818,0.401,0.888,0.161,0.921,0.458,0.259,0.9,0.385,0.419,0.95,0.462,0.94,0.581,0.538,0.06


approach: DecisionTree - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.612,0.285,0.801,0.127,0.931,0.357,0.302,0.906,,0.0,1.0,,0.906,1.0,,0.094
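
The empty MCC, PPV and FDR cells in the row above, together with TPR = 0.0 and TNR = 1.0, are what one would expect if the decision tree predicted the negative class for every sample at the chosen threshold. The sketch below derives the threshold-dependent columns from confusion-matrix counts to show where the undefined values come from. The function name and the counts are illustrative assumptions: the counts were picked only to reproduce the reported ratios and are not read from the data. The F1 column is left out because the reported value (0.357) does not follow from the PPV and TPR of this row, so it is presumably computed at another operating point.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Derive rate metrics from confusion-matrix counts (hypothetical helper)."""

    def ratio(num, den):
        # Undefined ratios (zero denominator) are returned as NaN,
        # which is rendered as an empty cell in a CSV export.
        return num / den if den else float("nan")

    tpr = ratio(tp, tp + fn)   # sensitivity / recall
    tnr = ratio(tn, tn + fp)   # specificity
    ppv = ratio(tp, tp + fp)   # precision
    npv = ratio(tn, tn + fn)   # negative predictive value
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "FNR": 1 - tpr,
        "FDR": 1 - ppv,   # FDR = 1 - PPV, as in the column header
        "FOR": 1 - npv,   # FOR = 1 - NPV, as in the column header
    }


# Illustrative counts (assumed): no sample is predicted positive.
all_negative = confusion_metrics(tp=0, fp=0, tn=414, fn=43)
print({name: round(value, 3) for name, value in all_negative.items()})
```

With no positive predictions, PPV has a zero denominator, so PPV, FDR and MCC are undefined while accuracy equals NPV, which matches the pattern of the row above.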


approach: ebm - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.417,0.895,0.198,0.974,0.451,0.238,0.876,0.373,0.512,0.913,0.379,0.948,0.488,0.621,0.052


approach: xgboost - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.852,0.472,0.936,0.266,0.986,0.46,0.306,0.88,0.383,0.512,0.918,0.393,0.948,0.488,0.607,0.052


approach: nam - use_missing_indicator_variables: {'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'], 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative']} - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8364,0.3657,0.8712,0.1432,0.9122,0.4087,0.2351,0.8416,0.3344,0.5534,0.8715,0.3161,0.9501,0.4466,0.6839,0.0499,0.8364


In [118]:
import os
import pickle

import pandas as pd

scenario = 'papers'

# Feature lists selected beforehand, keyed by approach and then by number of features.
with open(os.path.join(DATA_DIR, 'selected_features_{}.pkl'.format(scenario)), 'rb') as f:
    features = pickle.load(f)

# Candidate missing-indicator groups: group name -> columns associated with that group.
use_missing_indicator_variables = {'Floating Bubbles': ['FB_postural_sway', 'FB_postural_sway_derivative'],
                                 'DogInGrassC': ['DIGC_postural_sway', 'DIGC_postural_sway_derivative'],
                                 'DogInGrassRRL': ['DIGRRL_postural_sway', 'DIGRRL_postural_sway_derivative'],
                                 'SpinningTop': ['ST_postural_sway', 'ST_postural_sway_derivative'],
                                 'Social gaze': ['S_gaze_percent_right', 'S_gaze_silhouette_score'],
                                 'PlayingWithBlocks': ['PWB_postural_sway', 'PWB_postural_sway_derivative'],
                                 'FunAtThePark': ['FP_postural_sway', 'FP_postural_sway_derivative'],
                                 'MechanicalPuppy': ['MP_postural_sway', 'MP_postural_sway_derivative'],
                                 'BlowingBubbles': ['BB_postural_sway', 'BB_postural_sway_derivative'],
                                 'RhymesAndToys': ['RT_postural_sway', 'RT_postural_sway_derivative'],
                                 'MakeMeLaugh': ['MML_postural_sway', 'MML_postural_sway_derivative'],
                                 'Game administration': ['number_of_touches', 'number_of_target', 'exploratory_percentage'],
                                 'No touches at all': ['average_length',
                                  'std_length',
                                  'pop_rate',
                                  'average_touch_duration',
                                  'std_touch_duration',
                                  'average_accuracy_variation',
                                  'accuracy_consistency'],
                                 'Game grouped touches': ['average_time_spent', 'std_time_spent']}


# Add one indicator group at a time, for every feature-set size.
for n_features in features['xgboost'].keys():
    for approach in ["ebm"]:
        for feature_missing, feats in use_missing_indicator_variables.items():

            # Reload the raw data for each run.
            df = pd.read_csv(AUTISM_DATA_PATH)

            data = Dataset(df=df,
                           missing_data_handling='encoding',
                           imputation_method='without',
                           sampling_method='without',
                           scenario=scenario,
                           # Use the EBM feature list when none is stored for this approach.
                           features_name=features[approach][n_features] if approach in features else features['ebm'][n_features],
                           scale_data=True,
                           use_missing_indicator_variables={feature_missing: feats},
                           verbosity=1,
                           proportion_train=1)

            exp = Experiments(data.dataset_name,
                              dataset=data,
                              approach=approach,
                              previous_experiment=None,
                              debug=True,
                              verbosity=0,
                              experiment_folder_name='paper_experiment_1_fs_custom_missing_1_per_1',
                              save_experiment=True)
            print("approach: {} - use_missing_indicator_variables: {} - n_features: {}".format(approach, feature_missing, n_features))
            # 16-fold cross-validation, then show the one-row performance table.
            exp.fit_predict(num_cv=16)
            display(exp.performances_df)
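
The loop above displays one single-row table per run, which makes the indicator groups hard to compare at a glance. A minimal sketch of how the runs could be gathered into one table is given here, assuming that `exp.performances_df` is a one-row `DataFrame` as the displayed outputs suggest. The helper `collect_performances` and the column names `indicator_group` and `n_features` are hypothetical; the two example rows reuse AUROC and AUC-PR values reported for the `ebm` runs with 10 features.

```python
import pandas as pd


def collect_performances(runs):
    """Stack one-row performance tables into a single frame, one row per run.

    `runs` is an iterable of (settings, performances_df) pairs, where
    `settings` is a dict describing the run (hypothetical convention).
    """
    rows = []
    for settings, performances_df in runs:
        row = performances_df.iloc[0].to_dict()  # metrics of this run
        row.update(settings)                     # add the run description
        rows.append(row)
    return pd.DataFrame(rows)


# Two runs copied from the displayed outputs (ebm, n_features: 10).
runs = [
    ({"indicator_group": "Floating Bubbles", "n_features": 10},
     pd.DataFrame([{"AUROC": 0.866, "AUC-PR": 0.469}])),
    ({"indicator_group": "PlayingWithBlocks", "n_features": 10},
     pd.DataFrame([{"AUROC": 0.877, "AUC-PR": 0.474}])),
]

summary = (collect_performances(runs)
           .set_index(["n_features", "indicator_group"])
           .sort_values("AUROC", ascending=False))
print(summary)
```

Inside the loop, the same effect could be obtained by appending `({"indicator_group": feature_missing, "n_features": n_features}, exp.performances_df)` to a list after each `fit_predict` call.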


approach: ebm - use_missing_indicator_variables: Floating Bubbles - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.866,0.469,0.911,0.23,0.98,0.516,0.285,0.9,0.446,0.535,0.938,0.469,0.951,0.465,0.531,0.049


approach: ebm - use_missing_indicator_variables: DogInGrassC - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.855,0.46,0.901,0.225,0.978,0.511,0.289,0.898,0.44,0.535,0.935,0.46,0.951,0.465,0.54,0.049


approach: ebm - use_missing_indicator_variables: DogInGrassRRL - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.463,0.909,0.237,0.98,0.532,0.276,0.902,0.464,0.558,0.938,0.48,0.954,0.442,0.52,0.046


approach: ebm - use_missing_indicator_variables: SpinningTop - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.864,0.458,0.905,0.215,0.976,0.511,0.289,0.902,0.44,0.512,0.942,0.478,0.949,0.488,0.522,0.051


approach: ebm - use_missing_indicator_variables: Social gaze - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.865,0.458,0.908,0.216,0.977,0.517,0.276,0.904,0.447,0.512,0.945,0.489,0.949,0.488,0.511,0.051


approach: ebm - use_missing_indicator_variables: PlayingWithBlocks - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.877,0.474,0.914,0.242,0.981,0.531,0.285,0.898,0.463,0.581,0.93,0.463,0.956,0.419,0.537,0.044


approach: ebm - use_missing_indicator_variables: FunAtThePark - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.871,0.472,0.92,0.245,0.982,0.515,0.278,0.891,0.446,0.581,0.923,0.439,0.955,0.419,0.561,0.045


approach: ebm - use_missing_indicator_variables: MechanicalPuppy - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.862,0.449,0.891,0.208,0.973,0.527,0.289,0.904,0.459,0.535,0.942,0.489,0.951,0.465,0.511,0.049


approach: ebm - use_missing_indicator_variables: BlowingBubbles - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.868,0.467,0.904,0.219,0.976,0.523,0.317,0.906,0.454,0.512,0.947,0.5,0.949,0.488,0.5,0.051


approach: ebm - use_missing_indicator_variables: RhymesAndToys - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.87,0.468,0.902,0.229,0.978,0.531,0.301,0.898,0.463,0.581,0.93,0.463,0.956,0.419,0.537,0.044


approach: ebm - use_missing_indicator_variables: MakeMeLaugh - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.456,0.903,0.213,0.975,0.505,0.285,0.895,0.434,0.535,0.933,0.451,0.951,0.465,0.549,0.049


approach: ebm - use_missing_indicator_variables: Game administration - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.461,0.9,0.225,0.977,0.523,0.282,0.906,0.454,0.512,0.947,0.5,0.949,0.488,0.5,0.051


approach: ebm - use_missing_indicator_variables: No touches at all - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.87,0.465,0.903,0.227,0.978,0.521,0.292,0.898,0.451,0.558,0.933,0.462,0.953,0.442,0.538,0.047


approach: ebm - use_missing_indicator_variables: Game grouped touches - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.867,0.464,0.906,0.227,0.979,0.511,0.289,0.898,0.44,0.535,0.935,0.46,0.951,0.465,0.54,0.049


approach: ebm - use_missing_indicator_variables: Floating Bubbles - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.862,0.427,0.898,0.23,0.977,0.436,0.245,0.834,0.372,0.651,0.853,0.315,0.959,0.349,0.685,0.041


approach: ebm - use_missing_indicator_variables: DogInGrassC - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.858,0.418,0.888,0.204,0.975,0.438,0.222,0.83,0.377,0.674,0.846,0.312,0.962,0.326,0.688,0.038


approach: ebm - use_missing_indicator_variables: DogInGrassRRL - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.865,0.426,0.892,0.218,0.976,0.439,0.238,0.837,0.375,0.651,0.856,0.318,0.96,0.349,0.682,0.04


approach: ebm - use_missing_indicator_variables: SpinningTop - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.412,0.876,0.2,0.972,0.44,0.215,0.826,0.383,0.698,0.839,0.309,0.964,0.302,0.691,0.036


approach: ebm - use_missing_indicator_variables: Social gaze - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.86,0.408,0.863,0.198,0.969,0.436,0.227,0.834,0.372,0.651,0.853,0.315,0.959,0.349,0.685,0.041


approach: ebm - use_missing_indicator_variables: PlayingWithBlocks - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.854,0.414,0.881,0.202,0.973,0.438,0.235,0.819,0.386,0.721,0.829,0.304,0.966,0.279,0.696,0.034


approach: ebm - use_missing_indicator_variables: FunAtThePark - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.414,0.885,0.192,0.972,0.449,0.231,0.832,0.393,0.698,0.846,0.319,0.964,0.302,0.681,0.036


approach: ebm - use_missing_indicator_variables: MechanicalPuppy - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.867,0.424,0.881,0.215,0.973,0.451,0.24,0.839,0.391,0.674,0.856,0.326,0.962,0.326,0.674,0.038


approach: ebm - use_missing_indicator_variables: BlowingBubbles - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.86,0.406,0.858,0.187,0.965,0.448,0.24,0.837,0.387,0.674,0.853,0.322,0.962,0.326,0.678,0.038


approach: ebm - use_missing_indicator_variables: RhymesAndToys - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.86,0.409,0.856,0.207,0.968,0.439,0.227,0.858,0.363,0.558,0.889,0.343,0.951,0.442,0.657,0.049


approach: ebm - use_missing_indicator_variables: MakeMeLaugh - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.417,0.876,0.212,0.972,0.439,0.233,0.898,0.365,0.395,0.95,0.447,0.938,0.605,0.553,0.062


approach: ebm - use_missing_indicator_variables: Game administration - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.865,0.424,0.888,0.205,0.975,0.448,0.24,0.837,0.387,0.674,0.853,0.322,0.962,0.326,0.678,0.038


approach: ebm - use_missing_indicator_variables: No touches at all - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.864,0.422,0.889,0.214,0.975,0.436,0.221,0.874,0.355,0.488,0.913,0.368,0.945,0.512,0.632,0.055


approach: ebm - use_missing_indicator_variables: Game grouped touches - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.416,0.878,0.211,0.973,0.444,0.24,0.9,0.372,0.395,0.952,0.459,0.938,0.605,0.541,0.062


approach: ebm - use_missing_indicator_variables: Floating Bubbles - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.856,0.407,0.895,0.151,0.926,0.456,0.259,0.904,0.387,0.395,0.957,0.486,0.939,0.605,0.514,0.061


approach: ebm - use_missing_indicator_variables: DogInGrassC - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.854,0.403,0.893,0.15,0.926,0.45,0.274,0.854,0.38,0.605,0.88,0.342,0.956,0.395,0.658,0.044


approach: ebm - use_missing_indicator_variables: DogInGrassRRL - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.406,0.899,0.153,0.927,0.464,0.259,0.852,0.4,0.651,0.873,0.346,0.96,0.349,0.654,0.04


approach: ebm - use_missing_indicator_variables: SpinningTop - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.859,0.404,0.891,0.149,0.925,0.472,0.238,0.852,0.412,0.674,0.87,0.349,0.963,0.326,0.651,0.037


approach: ebm - use_missing_indicator_variables: Social gaze - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.858,0.42,0.901,0.173,0.929,0.452,0.285,0.861,0.379,0.581,0.889,0.352,0.954,0.419,0.648,0.046


approach: ebm - use_missing_indicator_variables: PlayingWithBlocks - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.856,0.406,0.892,0.164,0.927,0.436,0.243,0.902,0.366,0.372,0.957,0.471,0.936,0.628,0.529,0.064


approach: ebm - use_missing_indicator_variables: FunAtThePark - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.855,0.412,0.898,0.162,0.88,0.472,0.259,0.852,0.412,0.674,0.87,0.349,0.963,0.326,0.651,0.037


approach: ebm - use_missing_indicator_variables: MechanicalPuppy - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.858,0.413,0.901,0.155,0.928,0.478,0.251,0.845,0.425,0.721,0.858,0.344,0.967,0.279,0.656,0.033


approach: ebm - use_missing_indicator_variables: BlowingBubbles - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.853,0.405,0.895,0.151,0.926,0.455,0.248,0.852,0.388,0.628,0.875,0.342,0.958,0.372,0.658,0.042


approach: ebm - use_missing_indicator_variables: RhymesAndToys - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.856,0.415,0.902,0.156,0.928,0.463,0.254,0.826,0.418,0.767,0.832,0.32,0.972,0.233,0.68,0.028


approach: ebm - use_missing_indicator_variables: MakeMeLaugh - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.851,0.392,0.893,0.139,0.877,0.464,0.244,0.852,0.4,0.651,0.873,0.346,0.96,0.349,0.654,0.04


approach: ebm - use_missing_indicator_variables: Game administration - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.856,0.405,0.896,0.151,0.927,0.454,0.241,0.856,0.384,0.605,0.882,0.347,0.956,0.395,0.653,0.044


approach: ebm - use_missing_indicator_variables: No touches at all - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.861,0.422,0.897,0.172,0.928,0.464,0.264,0.852,0.4,0.651,0.873,0.346,0.96,0.349,0.654,0.04


approach: ebm - use_missing_indicator_variables: Game grouped touches - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.86,0.413,0.897,0.154,0.927,0.484,0.264,0.854,0.428,0.698,0.87,0.357,0.965,0.302,0.643,0.035


## Experiment 2. `scale_data`

In [46]:

# Experiment 2: run every approach with and without feature scaling.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:

        for scale_data in [True, False]:

            for approach in ["LogisticRegression", "DecisionTree", "nam", "ebm", "xgboost"]:

                try:
                    df = pd.read_csv(AUTISM_DATA_PATH)

                    # Missing values are encoded with a constant; only `scale_data` varies.
                    data = Dataset(df=df,
                                   missing_data_handling='encoding',
                                   imputation_method='without',
                                   sampling_method='without',
                                   scenario=scenario,
                                   # Fall back to the EBM feature set when none is defined for this approach.
                                   features_name=features[approach][n_features] if approach in features.keys() else features['ebm'][n_features],
                                   scale_data=scale_data,
                                   use_missing_indicator_variables=False,
                                   verbosity=1,
                                   proportion_train=1)

                    exp = Experiments(data.dataset_name,
                                      dataset=data,
                                      approach=approach,
                                      previous_experiment=None,
                                      debug=True,
                                      verbosity=0,
                                      experiment_folder_name='paper_experiment_2_fs',
                                      save_experiment=True)

                    print("approach: {} - scale_data: {} - n_features: {}".format(approach, scale_data, n_features))
                    # Run the 16-fold cross-validation once, then show the metrics.
                    exp.fit_predict(num_cv=16)
                    display(exp.performances_df)

                except Exception:
                    # Report the failing configuration and move on to the next one.
                    print("Failed on ", scenario, n_features, approach)

approach: LogisticRegression - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.73,0.264,0.796,0.082,0.851,0.37,0.181,0.834,0.283,0.488,0.87,0.28,0.943,0.512,0.72,0.057


approach: DecisionTree - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.659,0.372,0.878,0.202,0.964,0.406,0.319,0.915,0.359,0.256,0.983,0.611,0.927,0.744,0.389,0.073


approach: nam - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8493,0.4087,0.8887,0.1639,0.906,0.499,0.2852,0.9064,0.4314,0.4628,0.9525,0.5048,0.9449,0.5372,0.4952,0.0551,0.8493


approach: ebm - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.457,0.9,0.222,0.977,0.511,0.276,0.898,0.44,0.535,0.935,0.46,0.951,0.465,0.54,0.049


approach: xgboost - scale_data: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.821,0.449,0.925,0.224,0.973,0.48,0.302,0.885,0.406,0.535,0.921,0.411,0.95,0.465,0.589,0.05


approach: LogisticRegression - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.554,0.133,0.475,0.034,0.312,0.231,0.107,0.867,0.137,0.186,0.938,0.235,0.918,0.814,0.765,0.082


approach: DecisionTree - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.659,0.372,0.878,0.202,0.964,0.406,0.319,0.915,0.359,0.256,0.983,0.611,0.927,0.744,0.389,0.073


approach: nam - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8081,0.326,0.8321,0.1149,0.8591,0.4042,0.2053,0.8393,0.3272,0.5488,0.8691,0.309,0.9494,0.4512,0.691,0.0506,0.8081


approach: ebm - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.863,0.457,0.9,0.222,0.977,0.511,0.276,0.898,0.44,0.535,0.935,0.46,0.951,0.465,0.54,0.049


approach: xgboost - scale_data: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.821,0.449,0.925,0.224,0.973,0.48,0.302,0.885,0.406,0.535,0.921,0.411,0.95,0.465,0.589,0.05


approach: LogisticRegression - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.786,0.415,0.882,0.199,0.971,0.487,0.306,0.911,0.423,0.419,0.962,0.529,0.941,0.581,0.471,0.059


approach: DecisionTree - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.605,0.306,0.825,0.145,0.943,0.339,0.296,0.915,0.334,0.209,0.988,0.643,0.924,0.791,0.357,0.076


approach: nam - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8481,0.3764,0.8818,0.1472,0.8833,0.4191,0.229,0.856,0.3479,0.521,0.8905,0.3556,0.9481,0.479,0.6444,0.0519,0.8481


approach: ebm - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.866,0.429,0.895,0.218,0.976,0.446,0.24,0.83,0.389,0.698,0.844,0.316,0.964,0.302,0.684,0.036


approach: xgboost - scale_data: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.855,0.49,0.938,0.27,0.986,0.495,0.323,0.887,0.423,0.558,0.921,0.421,0.953,0.442,0.579,0.047


approach: LogisticRegression - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.728,0.301,0.825,0.097,0.852,0.39,0.205,0.889,0.31,0.349,0.945,0.395,0.933,0.651,0.605,0.067


approach: DecisionTree - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.605,0.306,0.825,0.145,0.943,0.339,0.296,0.915,0.334,0.209,0.988,0.643,0.924,0.791,0.357,0.076


approach: nam - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8051,0.3097,0.8272,0.1065,0.8105,0.3814,0.2052,0.8132,0.3149,0.5884,0.8363,0.2904,0.9528,0.4116,0.7096,0.0472,0.8051


approach: ebm - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.866,0.429,0.895,0.218,0.976,0.446,0.24,0.83,0.389,0.698,0.844,0.316,0.964,0.302,0.684,0.036


approach: xgboost - scale_data: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.855,0.49,0.938,0.27,0.986,0.495,0.323,0.887,0.423,0.558,0.921,0.421,0.953,0.442,0.579,0.047


approach: LogisticRegression - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.78,0.38,0.88,0.149,0.825,0.475,0.264,0.882,0.4,0.535,0.918,0.404,0.95,0.465,0.596,0.05


approach: DecisionTree - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.615,0.311,0.818,0.177,0.95,0.357,0.319,0.919,0.369,0.209,0.993,0.75,0.924,0.791,0.25,0.076


approach: nam - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8317,0.3709,0.8741,0.1528,0.9347,0.4059,0.24,0.8605,0.3337,0.4721,0.9007,0.3682,0.9438,0.5279,0.6318,0.0562,0.8317


approach: ebm - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.859,0.415,0.906,0.159,0.929,0.475,0.259,0.863,0.408,0.628,0.887,0.365,0.958,0.372,0.635,0.042


approach: xgboost - scale_data: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.852,0.472,0.929,0.242,0.982,0.494,0.351,0.9,0.422,0.488,0.942,0.467,0.947,0.512,0.533,0.053


approach: LogisticRegression - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.717,0.312,0.805,0.117,0.859,0.4,0.205,0.854,0.316,0.488,0.892,0.318,0.944,0.512,0.682,0.056


approach: DecisionTree - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.615,0.311,0.818,0.177,0.95,0.357,0.319,0.919,0.369,0.209,0.993,0.75,0.924,0.791,0.25,0.076


approach: nam - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.7824,0.2815,0.7835,0.097,0.779,0.3622,0.1731,0.8118,0.29,0.5487,0.839,0.2777,0.9487,0.4513,0.7223,0.0513,0.7824


approach: ebm - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.859,0.415,0.906,0.159,0.929,0.475,0.259,0.863,0.408,0.628,0.887,0.365,0.958,0.372,0.635,0.042


approach: xgboost - scale_data: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.852,0.472,0.929,0.242,0.982,0.494,0.351,0.9,0.422,0.488,0.942,0.467,0.947,0.512,0.533,0.053
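

The scaled and unscaled runs above can be compared side by side. The following sketch is not part of the original notebook: it copies the AUROC column from the Experiment 2 outputs by hand and computes, for each approach and feature count, the difference between the scaled and unscaled runs. The names `auroc` and `scaling_deltas` are illustrative only.

```python
# AUROC values transcribed by hand from the Experiment 2 outputs above.
# Key: (n_features, approach) -> (AUROC with scale_data=True, AUROC with scale_data=False)
auroc = {
    (10, "LogisticRegression"): (0.730, 0.554),
    (10, "DecisionTree"): (0.659, 0.659),
    (10, "nam"): (0.8493, 0.8081),
    (10, "ebm"): (0.863, 0.863),
    (10, "xgboost"): (0.821, 0.821),
    (22, "LogisticRegression"): (0.786, 0.728),
    (22, "DecisionTree"): (0.605, 0.605),
    (22, "nam"): (0.8481, 0.8051),
    (22, "ebm"): (0.866, 0.866),
    (22, "xgboost"): (0.855, 0.855),
    (32, "LogisticRegression"): (0.780, 0.717),
    (32, "DecisionTree"): (0.615, 0.615),
    (32, "nam"): (0.8317, 0.7824),
    (32, "ebm"): (0.859, 0.859),
    (32, "xgboost"): (0.852, 0.852),
}


def scaling_deltas(table):
    """Return {(n_features, approach): AUROC(scaled) - AUROC(unscaled)}."""
    return {key: round(scaled - unscaled, 4) for key, (scaled, unscaled) in table.items()}


deltas = scaling_deltas(auroc)

for (n_features, approach), delta in sorted(deltas.items()):
    print(f"n_features={n_features:2d}  {approach:<18}  delta AUROC = {delta:+.4f}")

# Approaches whose AUROC is identical with and without scaling for every feature count.
affected = {approach for (_, approach), delta in deltas.items() if delta != 0}
unaffected = sorted({approach for _, approach in deltas} - affected)
print("Unaffected by scaling:", unaffected)
```

On these transcribed numbers, DecisionTree, ebm, and xgboost report the same AUROC under both settings, which is consistent with tree-based splits being insensitive to a monotonic rescaling of the features. LogisticRegression and nam both lose AUROC when the data are not scaled, with the largest drop (0.176) for LogisticRegression at 10 features.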


## Experiment 3. `imputation_method`
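
This section does not show how `Dataset` implements its imputation methods. As a point of reference, here is a minimal sketch of the general idea behind mean imputation, on toy data: each missing entry is replaced by the mean of the observed values in the same column. The function `mean_impute`, the toy matrix, and the fallback for fully missing columns are assumptions made for illustration, not the notebook's code.

```python
import math


def mean_impute(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption for this sketch: a column with no observed value falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if math.isnan(value) else value for j, value in enumerate(row)]
            for row in rows]


nan = float("nan")
toy = [
    [1.0, nan],
    [3.0, 4.0],
    [nan, 8.0],
]
imputed = mean_impute(toy)
print(imputed)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

When imputation is combined with cross-validation, the column means are normally estimated on the training fold only and then applied to the held-out fold, so that no information from the test samples leaks into the fit.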

In [40]:

# Experiment 3: compare imputation methods for every approach.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:

        # 'without' performs no imputation, so missing values reach the model as-is.
        for imputation_method in ['without', 'mice', 'knn', 'mean']:

            for approach in ["LogisticRegression", "DecisionTree", "nam", "ebm", "xgboost"]:

                try:
                    df = pd.read_csv(AUTISM_DATA_PATH)

                    data = Dataset(df=df,
                                   missing_data_handling='imputation',
                                   imputation_method=imputation_method,
                                   sampling_method='without',
                                   scenario=scenario,
                                   # Fall back to the EBM feature set when none is defined for this approach.
                                   features_name=features[approach][n_features] if approach in features.keys() else features['ebm'][n_features],
                                   scale_data=True,
                                   use_missing_indicator_variables=False,
                                   verbosity=1,
                                   proportion_train=1)

                    exp = Experiments(data.dataset_name,
                                      dataset=data,
                                      approach=approach,
                                      previous_experiment=None,
                                      debug=True,
                                      experiment_folder_name='paper_experiment_3_fs',
                                      verbosity=0,
                                      save_experiment=True)

                    print("approach: {} - imputation_method: {} - n_features: {}".format(approach, imputation_method, n_features))
                    # Run the 16-fold cross-validation, then show the metrics.
                    exp.fit_predict(num_cv=16)
                    display(exp.performances_df)

                except Exception:
                    # Report the failing configuration and move on to the next one.
                    print("Failed on ", scenario, n_features, approach, imputation_method)

approach: LogisticRegression - imputation_method: without - n_features: 10
Failed on  papers 10 LogisticRegression without
approach: DecisionTree - imputation_method: without - n_features: 10
Failed on  papers 10 DecisionTree without
approach: nam - imputation_method: without - n_features: 10
Failed on  papers 10 nam without
approach: ebm - imputation_method: without - n_features: 10
Failed on  papers 10 ebm without
approach: xgboost - imputation_method: without - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.826,0.474,0.932,0.256,0.982,0.52,0.348,0.893,0.451,0.581,0.925,0.446,0.955,0.419,0.554,0.045


approach: LogisticRegression - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.732,0.259,0.801,0.079,0.795,0.367,0.182,0.863,0.277,0.395,0.911,0.315,0.936,0.605,0.685,0.064


approach: DecisionTree - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.751,0.356,0.867,0.125,0.813,0.517,0.273,0.906,0.443,0.488,0.95,0.5,0.947,0.512,0.5,0.053


approach: nam - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8604,0.4249,0.9026,0.161,0.831,0.5504,0.322,0.915,0.4875,0.5185,0.9561,0.552,0.9505,0.4815,0.448,0.0495,0.8604


approach: ebm - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.854,0.447,0.926,0.181,0.883,0.538,0.332,0.904,0.47,0.558,0.94,0.49,0.954,0.442,0.51,0.046


approach: xgboost - imputation_method: mice - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.814,0.456,0.927,0.23,0.974,0.483,0.323,0.867,0.416,0.628,0.892,0.375,0.959,0.372,0.625,0.041


approach: LogisticRegression - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.744,0.252,0.778,0.075,0.799,0.357,0.15,0.841,0.266,0.442,0.882,0.279,0.939,0.558,0.721,0.061


approach: DecisionTree - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.809,0.503,0.943,0.264,0.978,0.526,0.35,0.9,0.457,0.558,0.935,0.471,0.953,0.442,0.529,0.047


approach: nam - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8727,0.4658,0.917,0.2031,0.9267,0.5329,0.3324,0.9126,0.4686,0.4953,0.9557,0.5402,0.9483,0.5047,0.4598,0.0517,0.8727


approach: ebm - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.87,0.493,0.939,0.228,0.938,0.505,0.315,0.9,0.434,0.512,0.94,0.468,0.949,0.488,0.532,0.051


approach: xgboost - imputation_method: knn - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.827,0.415,0.884,0.181,0.964,0.533,0.252,0.891,0.469,0.628,0.918,0.443,0.96,0.372,0.557,0.04


approach: LogisticRegression - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.739,0.254,0.769,0.075,0.79,0.369,0.167,0.856,0.279,0.419,0.901,0.305,0.938,0.581,0.695,0.062


approach: DecisionTree - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.755,0.396,0.898,0.173,0.957,0.5,0.302,0.889,0.428,0.558,0.923,0.429,0.953,0.442,0.571,0.047


approach: nam - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8665,0.4498,0.9052,0.1851,0.8711,0.548,0.3441,0.9167,0.4864,0.5023,0.9593,0.565,0.949,0.4977,0.435,0.051,0.8665


approach: ebm - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.868,0.485,0.935,0.258,0.985,0.505,0.266,0.9,0.434,0.512,0.94,0.468,0.949,0.488,0.532,0.051


approach: xgboost - imputation_method: mean - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.821,0.468,0.932,0.243,0.975,0.495,0.348,0.887,0.423,0.558,0.921,0.421,0.953,0.442,0.579,0.047


approach: LogisticRegression - imputation_method: without - n_features: 22
Failed on  papers 22 LogisticRegression without
approach: DecisionTree - imputation_method: without - n_features: 22
Failed on  papers 22 DecisionTree without
approach: nam - imputation_method: without - n_features: 22
Failed on  papers 22 nam without
approach: ebm - imputation_method: without - n_features: 22
Failed on  papers 22 ebm without
approach: xgboost - imputation_method: without - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.842,0.48,0.935,0.271,0.985,0.476,0.374,0.926,0.459,0.326,0.988,0.737,0.934,0.674,0.263,0.066


approach: LogisticRegression - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.839,0.504,0.946,0.292,0.988,0.518,0.327,0.908,0.45,0.488,0.952,0.512,0.947,0.512,0.488,0.053


approach: DecisionTree - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.73,0.218,0.617,0.06,0.64,0.375,0.146,0.808,0.297,0.581,0.832,0.263,0.951,0.419,0.737,0.049


approach: nam - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8694,0.5063,0.9431,0.2729,0.9637,0.5056,0.372,0.9194,0.4616,0.4046,0.9727,0.6344,0.9406,0.5954,0.3656,0.0594,0.8694


approach: ebm - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.879,0.433,0.898,0.171,0.916,0.473,0.276,0.891,0.397,0.488,0.933,0.429,0.946,0.512,0.571,0.054


approach: xgboost - imputation_method: mice - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.838,0.493,0.94,0.277,0.987,0.494,0.332,0.908,0.427,0.442,0.957,0.514,0.943,0.558,0.486,0.057


approach: LogisticRegression - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.843,0.509,0.951,0.334,0.989,0.494,0.377,0.904,0.424,0.465,0.95,0.488,0.945,0.535,0.512,0.055


approach: DecisionTree - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.696,0.262,0.727,0.082,0.88,0.412,0.187,0.874,0.328,0.442,0.918,0.358,0.941,0.558,0.642,0.059


approach: nam - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8553,0.4964,0.9416,0.2793,0.9804,0.4818,0.357,0.9176,0.4319,0.3743,0.9739,0.5988,0.9379,0.6257,0.4012,0.0621,0.8553


approach: ebm - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.826,0.36,0.844,0.124,0.907,0.453,0.268,0.908,0.39,0.372,0.964,0.516,0.937,0.628,0.484,0.063


approach: xgboost - imputation_method: knn - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.834,0.489,0.939,0.286,0.986,0.485,0.374,0.924,0.453,0.349,0.983,0.682,0.936,0.651,0.318,0.064


approach: LogisticRegression - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.833,0.479,0.94,0.285,0.986,0.468,0.302,0.889,0.392,0.488,0.93,0.42,0.946,0.512,0.58,0.054


approach: DecisionTree - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.768,0.267,0.742,0.082,0.882,0.345,0.149,0.771,0.278,0.628,0.786,0.233,0.953,0.372,0.767,0.047


approach: nam - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8695,0.5143,0.9485,0.3009,0.9862,0.4926,0.3758,0.9205,0.4504,0.3767,0.9767,0.64,0.9381,0.6233,0.36,0.0619,0.8695


approach: ebm - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.879,0.45,0.902,0.204,0.97,0.475,0.269,0.858,0.412,0.651,0.88,0.359,0.961,0.349,0.641,0.039


approach: xgboost - imputation_method: mean - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.843,0.459,0.92,0.242,0.982,0.486,0.318,0.874,0.418,0.605,0.901,0.388,0.957,0.395,0.612,0.043


approach: LogisticRegression - imputation_method: without - n_features: 32
Failed on  papers 32 LogisticRegression without
Failed on  papers 32 DecisionTree without
Failed on  papers 32 nam without
Failed on  papers 32 ebm without
approach: xgboost - imputation_method: without - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.836,0.463,0.929,0.241,0.981,0.472,0.323,0.876,0.398,0.558,0.909,0.387,0.952,0.442,0.613,0.048


approach: LogisticRegression - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.835,0.449,0.925,0.187,0.931,0.506,0.326,0.913,0.443,0.442,0.962,0.543,0.943,0.558,0.457,0.057


Failed on  papers 32 DecisionTree mice
Failed on  papers 32 nam mice
Failed on  papers 32 ebm mice
approach: xgboost - imputation_method: mice - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.84,0.482,0.938,0.269,0.986,0.475,0.33,0.882,0.4,0.535,0.918,0.404,0.95,0.465,0.596,0.05


approach: LogisticRegression - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.853,0.54,0.947,0.309,0.988,0.532,0.398,0.917,0.471,0.465,0.964,0.571,0.946,0.535,0.429,0.054


Failed on  papers 32 DecisionTree knn
Failed on  papers 32 nam knn
Failed on  papers 32 ebm knn
approach: xgboost - imputation_method: knn - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.843,0.461,0.929,0.256,0.984,0.476,0.283,0.878,0.403,0.558,0.911,0.393,0.952,0.442,0.607,0.048


approach: LogisticRegression - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.829,0.493,0.928,0.267,0.984,0.5,0.374,0.924,0.462,0.372,0.981,0.667,0.938,0.628,0.333,0.062


Failed on  papers 32 DecisionTree mean
Failed on  papers 32 nam mean
Failed on  papers 32 ebm mean
approach: xgboost - imputation_method: mean - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.839,0.471,0.924,0.251,0.983,0.455,0.335,0.867,0.38,0.558,0.899,0.364,0.952,0.442,0.636,0.048


## Experiment 4. `Sampling approach`
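
The cell below compares three values of `sampling_method`: `'without'`, `'vanilla'`, and `'smote'`. As a rough illustration of what the last two do, here is a minimal NumPy sketch. It assumes that `'vanilla'` means randomly resampling the minority class until the classes are balanced, as described in the introduction, and that `'smote'` interpolates between minority neighbours. The function names `random_oversample` and `smote_like_oversample` are hypothetical and are not part of the `Dataset` class, whose actual implementation may differ.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate random minority samples until both classes have equal size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra_idx])
    return X[keep], y[keep]


def smote_like_oversample(X, y, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    X_min = X[y == minority]
    # Pairwise Euclidean distances between minority samples only.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_extra)
    chosen = neighbours[base, rng.integers(0, k, size=n_extra)]
    gap = rng.random((n_extra, 1))  # interpolation weight in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_extra, minority)])
    return X_out, y_out


# Toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.array([0] * 8 + [1] * 2)

X_van, y_van = random_oversample(X_demo, y_demo)
X_smo, y_smo = smote_like_oversample(X_demo, y_demo)
print(np.bincount(y_van), np.bincount(y_smo))
```

Whichever method is used, resampling is normally applied to the training folds only, so that the held-out fold keeps the original class proportions.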

In [None]:
# Compare the sampling methods for every feature-set size and every approach.
for n_features in features['xgboost'].keys():
    for scenario in ['papers']:
        for sampling_method in ['without', 'vanilla', 'smote']:
            for approach in ["LogisticRegression", "DecisionTree", "nam", "ebm", "xgboost"]:

                # Reload the raw data for each run so that every experiment starts from the same table.
                df = pd.read_csv(AUTISM_DATA_PATH)

                # Use the features selected for this approach; fall back to the EBM selection otherwise.
                approach_features = features[approach] if approach in features else features['ebm']

                data = Dataset(df=df,
                               missing_data_handling='encoding',
                               imputation_method='without',
                               sampling_method=sampling_method,
                               scenario=scenario,
                               features_name=approach_features[n_features],
                               scale_data=True,
                               use_missing_indicator_variables=False,
                               verbosity=1,
                               proportion_train=1)

                exp = Experiments(data.dataset_name,
                                  dataset=data,
                                  approach=approach,
                                  previous_experiment=None,
                                  debug=True,
                                  experiment_folder_name='paper_experiment_4_fs',
                                  verbosity=0,
                                  save_experiment=True)
                print("approach: {} - sampling_method: {} - n_features: {}".format(approach, sampling_method, n_features))
                exp.fit_predict(num_cv=16)
                display(exp.performances_df)

approach: LogisticRegression - sampling_method: vanilla - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.768,0.377,0.875,0.153,0.923,0.441,0.238,0.854,0.367,0.581,0.882,0.338,0.953,0.419,0.662,0.047


approach: DecisionTree - sampling_method: vanilla - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.799,0.281,0.746,0.084,0.633,0.453,0.2,0.871,0.363,0.512,0.909,0.367,0.947,0.488,0.633,0.053


approach: nam - sampling_method: vanilla - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8185,0.4186,0.9055,0.1803,0.9213,0.4604,0.2955,0.8739,0.397,0.542,0.9082,0.4157,0.9509,0.458,0.5843,0.0491,0.8185


approach: ebm - sampling_method: vanilla - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.755,0.243,0.728,0.069,0.711,0.384,0.154,0.83,0.302,0.535,0.861,0.284,0.947,0.465,0.716,0.053


approach: xgboost - sampling_method: vanilla - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.832,0.396,0.867,0.155,0.919,0.495,0.25,0.895,0.422,0.512,0.935,0.449,0.949,0.488,0.551,0.051


approach: LogisticRegression - sampling_method: smote - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.766,0.39,0.879,0.158,0.922,0.471,0.258,0.861,0.404,0.628,0.885,0.36,0.958,0.372,0.64,0.042


approach: DecisionTree - sampling_method: smote - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.779,0.259,0.676,0.075,0.636,0.448,0.187,0.863,0.384,0.581,0.892,0.357,0.954,0.419,0.643,0.046


approach: nam - sampling_method: smote - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8169,0.4088,0.9002,0.1679,0.8937,0.459,0.2907,0.8583,0.3902,0.607,0.8842,0.3546,0.9562,0.393,0.6454,0.0438,0.8169


approach: ebm - sampling_method: smote - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.784,0.342,0.822,0.143,0.945,0.442,0.21,0.882,0.362,0.465,0.925,0.392,0.944,0.535,0.608,0.056


approach: xgboost - sampling_method: smote - n_features: 2


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.827,0.354,0.861,0.122,0.869,0.465,0.242,0.898,0.391,0.442,0.945,0.452,0.942,0.558,0.548,0.058


approach: LogisticRegression - sampling_method: without - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.798,0.392,0.885,0.187,0.972,0.412,0.264,0.911,0.361,0.302,0.974,0.542,0.931,0.698,0.458,0.069


approach: DecisionTree - sampling_method: without - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.712,0.312,0.772,0.142,0.938,0.424,0.191,0.874,0.342,0.465,0.916,0.364,0.943,0.535,0.636,0.057


approach: nam - sampling_method: without - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.853,0.4328,0.9082,0.2077,0.9668,0.4483,0.2787,0.846,0.3823,0.6348,0.8678,0.3328,0.9583,0.3652,0.6672,0.0417,0.853


approach: ebm - sampling_method: without - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.865,0.388,0.829,0.17,0.951,0.492,0.21,0.867,0.428,0.651,0.889,0.378,0.961,0.349,0.622,0.039


approach: xgboost - sampling_method: without - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.852,0.411,0.867,0.197,0.97,0.492,0.217,0.858,0.436,0.698,0.875,0.366,0.966,0.302,0.634,0.034


approach: LogisticRegression - sampling_method: vanilla - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.815,0.41,0.894,0.207,0.976,0.43,0.264,0.9,0.358,0.372,0.954,0.457,0.936,0.628,0.543,0.064


approach: DecisionTree - sampling_method: vanilla - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.827,0.274,0.722,0.08,0.568,0.46,0.2,0.867,0.392,0.581,0.897,0.368,0.954,0.419,0.632,0.046


approach: nam - sampling_method: vanilla - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV),Area Under the Curve (AUC)
0,0.8538,0.4323,0.9115,0.2024,0.9687,0.4468,0.2706,0.844,0.3814,0.6372,0.8654,0.332,0.9585,0.3628,0.668,0.0415,0.8538


approach: ebm - sampling_method: vanilla - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.778,0.294,0.838,0.092,0.763,0.392,0.197,0.808,0.322,0.628,0.827,0.273,0.956,0.372,0.727,0.044


approach: xgboost - sampling_method: vanilla - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.871,0.371,0.818,0.124,0.799,0.506,0.295,0.908,0.438,0.465,0.954,0.513,0.945,0.535,0.487,0.055


approach: LogisticRegression - sampling_method: smote - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.788,0.409,0.903,0.204,0.976,0.412,0.264,0.911,0.361,0.302,0.974,0.542,0.931,0.698,0.458,0.069


approach: DecisionTree - sampling_method: smote - n_features: 5


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.791,0.256,0.726,0.073,0.633,0.426,0.182,0.865,0.35,0.512,0.901,0.349,0.947,0.488,0.651,0.053


approach: nam - sampling_method: smote - n_features: 5


# Experiment 5 - Use of clinical variables

In [14]:
additional_clinical_variables = ['srs_total_tscore','srs_social_awareness_tscore','srs_social_motivation_tscore','cbcl_scaleIV_score','cbcl_asd_score','mchat_final']

In [15]:
# `pd`, `Dataset`, `Experiments` and `display` are assumed to be defined by init.ipynb.
for n_features in features['ebm']:
    for use_clinical in [True, False]:
        # Reload the raw data for every configuration.
        df = pd.read_csv(AUTISM_DATA_PATH)

        # Append the clinical variables to the selected features when requested.
        selected_features = features['ebm'][n_features]
        if use_clinical:
            selected_features = selected_features + additional_clinical_variables

        data = Dataset(df=df,
                       missing_data_handling='encoding',
                       imputation_method='without',
                       sampling_method='without',
                       scenario=scenario,
                       features_name=selected_features,
                       scale_data=False,
                       use_missing_indicator_variables=False,
                       verbosity=1,
                       proportion_train=1)

        exp = Experiments(data.dataset_name,
                          dataset=data,
                          approach='ebm',
                          previous_experiment=None,
                          debug=True,
                          experiment_folder_name='paper_experiment_5_fs_papers',
                          verbosity=0,
                          save_experiment=True)
        print("use_clinical: {} - n_features: {}".format(use_clinical, n_features))
        # Leave-one-out cross-validation.
        exp.fit_predict(num_cv='loocv')
        display(exp.performances_df)


use_clinical: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.94,0.855,0.992,0.732,0.998,0.786,0.69,0.959,0.749,0.744,0.981,0.8,0.974,0.256,0.2,0.026


use_clinical: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.823,0.453,0.919,0.224,0.977,0.517,0.289,0.904,0.447,0.512,0.945,0.489,0.949,0.488,0.511,0.051


use_clinical: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.959,0.866,0.995,0.756,0.999,0.8,0.716,0.963,0.768,0.721,0.988,0.861,0.972,0.279,0.139,0.028


use_clinical: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.828,0.405,0.89,0.188,0.97,0.439,0.243,0.898,0.365,0.395,0.95,0.447,0.938,0.605,0.553,0.062


use_clinical: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.945,0.855,0.993,0.724,0.998,0.818,0.664,0.963,0.784,0.814,0.978,0.795,0.981,0.186,0.205,0.019


use_clinical: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.815,0.405,0.905,0.169,0.92,0.455,0.276,0.852,0.388,0.628,0.875,0.342,0.958,0.372,0.658,0.042
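
The cell above evaluates each configuration with `num_cv='loocv'`. With leave-one-out, every test fold holds a single sample, so a ranking metric such as AUROC can only be computed after the held-out predictions are pooled. The sketch below illustrates that idea with the standard library only. It uses a nearest-centroid scorer and synthetic data as stand-ins (both are assumptions for illustration, not the EBM or the autism data), and the helper names `loocv_scores` and `auroc` are hypothetical.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loocv_scores(X, y):
    """Score each sample with a model fitted on all the other samples."""
    scores = []
    for i in range(len(X)):
        train = [(x, t) for j, (x, t) in enumerate(zip(X, y)) if j != i]
        pos = centroid([x for x, t in train if t == 1])
        neg = centroid([x for x, t in train if t == 0])
        # Higher score: closer to the positive centroid than to the negative one.
        scores.append(sq_dist(X[i], neg) - sq_dist(X[i], pos))
    return scores

def auroc(y, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, imbalanced stand-in data (100 negatives, 20 positives).
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
X += [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(20)]
y = [0] * 100 + [1] * 20

scores = loocv_scores(X, y)        # one held-out score per sample
pooled_auroc = auroc(y, scores)    # metric computed once, on the pooled scores
print(f"pooled LOOCV AUROC: {pooled_auroc:.3f}")
```

Pooling keeps the evaluation in a single pass over all held-out scores, which matches the pipeline note that predictions are stored for later evaluation.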


In [16]:
additional_clinical_variables = ['mchat_final']

In [17]:
# `pd`, `Dataset`, `Experiments` and `display` are assumed to be defined by init.ipynb.
for n_features in features['ebm']:
    for use_clinical in [True, False]:
        # Reload the raw data for every configuration.
        df = pd.read_csv(AUTISM_DATA_PATH)

        # Append the clinical variable (here only `mchat_final`) when requested.
        selected_features = features['ebm'][n_features]
        if use_clinical:
            selected_features = selected_features + additional_clinical_variables

        data = Dataset(df=df,
                       missing_data_handling='encoding',
                       imputation_method='without',
                       sampling_method='without',
                       scenario=scenario,
                       features_name=selected_features,
                       scale_data=False,
                       use_missing_indicator_variables=False,
                       verbosity=1,
                       proportion_train=1)

        exp = Experiments(data.dataset_name,
                          dataset=data,
                          approach='ebm',
                          previous_experiment=None,
                          debug=True,
                          experiment_folder_name='paper_experiment_5_fs_papers',
                          verbosity=0,
                          save_experiment=True)
        print("use_clinical: {} - n_features: {}".format(use_clinical, n_features))
        # Leave-one-out cross-validation.
        exp.fit_predict(num_cv='loocv')
        display(exp.performances_df)


use_clinical: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.913,0.752,0.986,0.564,0.997,0.725,0.55,0.95,0.685,0.651,0.981,0.778,0.965,0.349,0.222,0.035


use_clinical: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.823,0.453,0.919,0.224,0.977,0.517,0.289,0.904,0.447,0.512,0.945,0.489,0.949,0.488,0.511,0.051


use_clinical: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.918,0.716,0.974,0.479,0.984,0.71,0.614,0.939,0.664,0.744,0.959,0.653,0.973,0.256,0.347,0.027


use_clinical: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.828,0.405,0.89,0.188,0.97,0.439,0.243,0.898,0.365,0.395,0.95,0.447,0.938,0.605,0.553,0.062


use_clinical: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.917,0.706,0.971,0.442,0.984,0.727,0.57,0.946,0.683,0.721,0.969,0.705,0.971,0.279,0.295,0.029


use_clinical: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.815,0.405,0.905,0.169,0.92,0.455,0.276,0.852,0.388,0.628,0.875,0.342,0.958,0.372,0.658,0.042
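
The column headers of the performance tables name several rates that all derive from the same confusion matrix (for example `FNR`, `FDR=1-PPV`, `FOR=1-NPV`, and `F1 = 2 PPVxTPR/(PPV+TPR)`). As a reading aid, the sketch below computes them from raw counts. The function name `confusion_metrics` and the counts are hypothetical, and the "Corrected" and "Gain" variants are left out because their definitions are not given in this section.

```python
from math import sqrt

def confusion_metrics(tp, fp, tn, fn):
    """Threshold-dependent metrics from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)   # sensitivity / recall
    tnr = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)   # precision
    npv = tn / (tn + fn)   # negative predictive value
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "FNR": 1 - tpr,
        "FDR": 1 - ppv,
        "FOR": 1 - npv,
        "F1": 2 * ppv * tpr / (ppv + tpr),
        "Accuracy": (tp + tn) / (tp + fp + tn + fn),
        "MCC": (tp * tn - fp * fn) / mcc_den,
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = confusion_metrics(tp=30, fp=10, tn=150, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced classes, accuracy is dominated by the majority class, which is one reason to read MCC and the precision-recall metrics alongside it.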


In [11]:
additional_clinical_variables = ['mchat_final']

In [12]:
# `pd`, `Dataset`, `Experiments` and `display` are assumed to be defined by init.ipynb.
for n_features in features['ebm']:
    for use_clinical in [True, False]:
        # Reload the raw data for every configuration.
        df = pd.read_csv(AUTISM_DATA_PATH)

        # Append the clinical variable (here only `mchat_final`) when requested.
        selected_features = features['ebm'][n_features]
        if use_clinical:
            selected_features = selected_features + additional_clinical_variables

        data = Dataset(df=df,
                       missing_data_handling='encoding',
                       imputation_method='without',
                       sampling_method='without',
                       scenario=scenario,
                       features_name=selected_features,
                       scale_data=True,
                       use_missing_indicator_variables=False,
                       verbosity=1,
                       proportion_train=1)

        exp = Experiments(data.dataset_name,
                          dataset=data,
                          approach='ebm',
                          previous_experiment=None,
                          debug=True,
                          experiment_folder_name='paper_experiment_5_fs_papers',
                          verbosity=0,
                          save_experiment=True)
        print("use_clinical: {} - n_features: {}".format(use_clinical, n_features))
        # Fit on the full training scenario (proportion_train=1).
        exp.fit()
        # Switch the dataset to the 'papers_remote' scenario with no training
        # split, then predict on it with the already-fitted model.
        exp.dataset.proportion_train = 0.
        exp.dataset._init_scenario(scenario='papers_remote')
        exp.predict()

        display(exp.performances_df)

use_clinical: True - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.835,0.751,0.728,0.114,0.598,0.763,0.25,0.76,0.535,0.837,0.702,0.679,0.851,0.163,0.321,0.149


use_clinical: False - n_features: 10


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.722,0.65,0.463,0.118,0.9,0.685,0.13,0.67,0.356,0.744,0.614,0.593,0.761,0.256,0.407,0.239


use_clinical: True - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.829,0.78,0.668,0.263,0.982,0.76,0.314,0.75,0.526,0.86,0.667,0.661,0.864,0.14,0.339,0.136


use_clinical: False - n_features: 22


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.678,0.641,0.375,0.161,0.943,0.634,0.208,0.54,0.2,0.884,0.281,0.481,0.762,0.116,0.519,0.238


use_clinical: True - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.833,0.793,0.687,0.306,0.985,0.757,0.377,0.74,0.519,0.884,0.632,0.644,0.878,0.116,0.356,0.122


use_clinical: False - n_features: 32


Unnamed: 0,AUROC,AUC-PR,AUC-PR-Gain,AUC-PR-Corrected,AUC-PR-Gain-Corrected,F1 score (2 PPVxTPR/(PPV+TPR)),F1 score Corrected,Accuracy,Matthews correlation coefficient (MCC),"Sensitivity, recall, hit rate, or true positive rate (TPR)","Specificity, selectivity or true negative rate (TNR)",Precision or positive predictive value (PPV),Negative predictive value (NPV),Miss rate or false negative rate (FNR),False discovery rate (FDR=1-PPV),False omission rate (FOR=1-NPV)
0,0.698,0.646,0.424,0.178,0.938,0.661,0.245,0.65,0.309,0.698,0.614,0.577,0.729,0.302,0.423,0.271
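
The last cell fits on one scenario with `scale_data=True` and then predicts on the `papers_remote` scenario. When a model is evaluated on a separate cohort, the scaling parameters are normally estimated on the training cohort and reused unchanged on the evaluation cohort. Whether `Dataset` does this when the scenario is switched is not visible in this section, so the sketch below only illustrates the principle with hypothetical helpers (`fit_scaler`, `apply_scaler`) and synthetic data.

```python
import random
from statistics import mean, pstdev

def fit_scaler(rows):
    """Per-column (mean, std) estimated on the training cohort only."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]

def apply_scaler(rows, params):
    """Standardise rows with previously fitted (mean, std) pairs."""
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, params)] for row in rows]

# Synthetic cohorts; the external one is deliberately shifted.
rng = random.Random(0)
train_cohort = [[rng.gauss(10.0, 2.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
external_cohort = [[rng.gauss(12.0, 2.0), rng.gauss(0.5, 1.0)] for _ in range(50)]

params = fit_scaler(train_cohort)                       # fitted once, on training data
train_scaled = apply_scaler(train_cohort, params)
external_scaled = apply_scaler(external_cohort, params)  # reused, not refitted

train_means = [mean(col) for col in zip(*train_scaled)]
external_means = [mean(col) for col in zip(*external_scaled)]
print("train means:", [round(v, 3) for v in train_means])
print("external means:", [round(v, 3) for v in external_means])
```

The external means stay away from zero because the cohort shift is preserved. Refitting the scaler on the external cohort would erase that shift and feed the model inputs on a different scale from the one it was trained on.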
