# AutoPrognosis API Tutorial

A demonstration of AutoPrognosis (AP) functionality and operation.

This tutorial shows how to use [AutoPrognosis](https://arxiv.org/abs/1802.07207). We are using the UCI Spam dataset.

See [installation instructions](../../doc/install.md) to install the dependencies.

Load dataset and show the first five samples:

In [2]:
import pandas as pd
import numpy as np
import initpath_ap
initpath_ap.init_sys_path()
import utilmlab

from sklearn.datasets import load_breast_cancer

#df = load_breast_cancer()
#X_ = pd.DataFrame(df.data)
#Y_ = pd.DataFrame(df.target)

## Import the AutoPrognosis library

In [3]:
import model

# Run the model from command line

--it : total number of iterations for each fold of the n-fold cross validation

--cv : if 0, a normal validation with a train and test (or validation) set is used, and -iValIndex must also be set. Otherwise, n for n-fold cross validation

-iValIndex : path to the test index file (test_indexes.csv or val_indexes.csv)

--nstage : size of the pipeline: 0: auto (selects imputation when missing data is detected),
        1: only classifiers, 
        2: feature processing + classifier, 
        3: imputers + feature processors + classifier

--ensemble : include ensembles when fitting. Setting it to 0 currently raises an assertion error; this should be looked into.

--modelindexes : list of model indexes, chosen from:

0 Random Forest,
1 Gradient Boosting, 
2 XGBoost, 
3 Adaboost, 
4 Bagging, 
5 Bernoulli Naive Bayes, 
6 Gauss Naive Bayes, 
7 Multinomial Naive Bayes, 
8 Logistic Regression, 
9 Perceptron, 
10 Decision Trees, 
11 QDA, 
12 LDA, 
13 KNN, 
14 Linear SVM, 
15 Neural Network
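
The flags above can also be assembled programmatically. The following is a minimal sketch, assuming the index-to-model mapping listed above and the -i, --target and -o flags used in the commands later in this notebook. The names MODEL_INDEXES and build_command, and the placeholder paths, are hypothetical and not part of AutoPrognosis. The code only builds the argument list; it does not run anything.

```python
# Index-to-name mapping copied from the --modelindexes list above.
MODEL_INDEXES = {
    "Random Forest": 0,
    "Gradient Boosting": 1,
    "XGBoost": 2,
    "Adaboost": 3,
    "Bagging": 4,
    "Bernoulli Naive Bayes": 5,
    "Gauss Naive Bayes": 6,
    "Multinomial Naive Bayes": 7,
    "Logistic Regression": 8,
    "Perceptron": 9,
    "Decision Trees": 10,
    "QDA": 11,
    "LDA": 12,
    "KNN": 13,
    "Linear SVM": 14,
    "Neural Network": 15,
}


def build_command(input_csv, target, output_dir, models,
                  iterations=15, cv=3, nstage=0):
    """Return the argv list for one autoprognosis.py run (hypothetical helper)."""
    # Translate readable model names into the numeric indexes the CLI expects.
    indexes = [str(MODEL_INDEXES[name]) for name in models]
    return [
        "python3", "autoprognosis.py",
        "-i", input_csv,
        "--target", target,
        "-o", output_dir,
        "--it", str(iterations),
        "--cv", str(cv),
        "--nstage", str(nstage),
        "--modelindexes", *indexes,
    ]


# Placeholder file and directory names, for illustration only.
cmd = build_command("data.csv", "outcome", "outputs", ["XGBoost", "LDA"])
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains autoprognosis.py.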

In [123]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_cardio_data_17_feature_noNan.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 2s (2s) (30s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 3s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0
[ XGBoost ]
Iteration number: 3 5s (2s) (26s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.9876888921413338
[ XGBoost ]
Iteration number: 4 7s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0542197867363827
[ XGBoost ]
Iteration number: 5 9s (2s) (26s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.2441867625077925
[ XGBoost ]
Iteration number: 6 10s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.2091476146269782
[ XGBoost ]
Iteration number: 7 12s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.1728061592736738
[ XGBoost ]
Iteration number: 8 13s (2s) (25s)

In [12]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.636
classifier      aucprc 0.008
ensemble        aucroc 0.650
ensemble        aucprc 0.008

Report

best score single pipeline (while fitting)    0.644
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.652
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.0, 0.3874943289232847, 0.6125056710767153]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=0.57132313161386,\n              gamma=3.820546853194836, gpu_id=-1, importance_type='gain',\n              interaction_constraints='', learning_rate=0.18662621258764373,\n              max_delta_step=0, max_depth=4,\n              min_child_weight=0.
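
The Score block printed by autoprognosis_report.py is plain text, so it can be collected for later comparison. Here is a minimal sketch, assuming each score line has the form "model metric value" as in the output above. parse_scores is a hypothetical helper, not part of AutoPrognosis.

```python
def parse_scores(report_text):
    """Parse lines such as 'classifier      aucroc 0.636' into a dict."""
    scores = {}
    for line in report_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # score lines have exactly three fields
        model, metric, value = parts
        try:
            scores[(model, metric)] = float(value)
        except ValueError:
            continue  # third field is not a number, so not a score line
    return scores


# Score block copied from the report output above.
sample = """Score

classifier      aucroc 0.636
classifier      aucprc 0.008
ensemble        aucroc 0.650
ensemble        aucprc 0.008
"""

scores = parse_scores(sample)
print(scores[("ensemble", "aucroc")])
```

Keying on the (model, metric) pair keeps the classifier and ensemble values apart without assuming anything else about the report layout.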

## Run for image data set with different categories

In [139]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_green.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 30 \
--cv 3 \
--nstage 4 \
--modelindexes 0 1 2 8 12 \
--num_components 5 \
--kernel_freq 100

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=30.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 1 4s (4s) (133s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: 0.0
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 2 9s (4s) (134s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.0000000000001883
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 3 14s (5s) (140s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegress

[ LDA ]
Iteration number: 29 216s (7s) (223s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.2745986386869934
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 30 229s (8s) (229s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.28788968102842

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=30.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 1 5s (5s) (143s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: 0.0
[ Random Forest

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 28 214s (8s) (229s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -4.442000490349723
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 29 225s (8s) (233s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -3.3673582871001266
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 30 237s (8s) (237s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -2.8227209568175446

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
HBox(children=(FloatProgress(value=0.0, description='BO progress'

[ LDA ]
Iteration number: 26 164s (6s) (189s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.8584658457976386
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 27 171s (6s) (190s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.7927323288207488
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 28 177s (6s) (190s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticRegression ]]], [[[ LDA ]]], BO objective: -1.737319429077127
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]
[ LogisticRegression ]
[ LDA ]
Iteration number: 29 184s (6s) (190s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], [[[ LogisticReg

score_d (result.json) contains the first four scores: classifier/ensemble aucroc/aucprc. It holds the score and other information of the AP_mdl that is passed to evaluate_ens(). That function makes a deep copy of the model and returns it. In autoprognosis.py, where the function is called, report_d is then produced from the returned deep copy. It is unclear why this is so complicated and why the model is deep-copied.

report_d (report.json) contains the values in the middle of the output: after the word Report and before the long list. It is produced by APReport in model.

clf_d (result_clf.json) contains the long list of all pipelines.
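
The three files named in these notes can be inspected directly. This is a minimal sketch, assuming they are ordinary JSON files sitting in the output directory passed with -o. load_outputs is a hypothetical helper, and nothing is assumed about the keys inside each file.

```python
import json
from pathlib import Path

# File names as mentioned in the notes above.
OUTPUT_FILES = ("result.json", "report.json", "result_clf.json")


def load_outputs(output_dir):
    """Load whichever of the three JSON files exist in output_dir."""
    loaded = {}
    for name in OUTPUT_FILES:
        path = Path(output_dir) / name
        if path.is_file():
            with path.open() as fh:
                loaded[name] = json.load(fh)
    return loaded
```

For example, `load_outputs("../../../AutoPrognosisThings/outputs")` returns a dictionary keyed by file name, and missing files are simply skipped.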


Surprisingly, from the code it looks like the cross validation is done first, and for each train/test split the model is updated n times (n = number of iterations), instead of fixing a model and testing it on the train and test splits.
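
The difference between the two loop orders can be made concrete with a toy example. This is only an illustration of the observation above, not AutoPrognosis code: all function names are hypothetical, and the scoring function is a stand-in that returns the candidate value itself.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


def search_inside_each_fold(n_samples, n_folds, candidates, fit_and_score):
    """Loop order described in the note: every split runs its own search."""
    best_per_fold = []
    for train, test in kfold_indices(n_samples, n_folds):
        scores = [fit_and_score(c, train, test) for c in candidates]
        best_per_fold.append(max(scores))
    return best_per_fold


def evaluate_fixed_model(n_samples, n_folds, candidate, fit_and_score):
    """Alternative: fix one configuration, then score it on every split."""
    return [fit_and_score(candidate, train, test)
            for train, test in kfold_indices(n_samples, n_folds)]


def toy_score(candidate, train, test):
    """Stand-in for fitting and scoring: the score is the candidate itself."""
    return candidate


per_fold = search_inside_each_fold(6, 3, [0.1, 0.5, 0.3], toy_score)
fixed = evaluate_fixed_model(6, 3, 0.3, toy_score)
print(per_fold, fixed)
```

In the first function the best score of each fold is picked on the same split it is reported on, so different folds may end up reporting different configurations. The second function reports one configuration across all splits.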

In [140]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.693
classifier      aucprc 0.066
ensemble        aucroc 0.694
ensemble        aucprc 0.066

Report

best score single pipeline (while fitting)    0.688
model_names_single_pipeline                   [ LDA ]
best ensemble score (while fittng)            0.688
ensemble_pipelines                            ['[ LDA ]', '[ LDA ]', '[ LDA ]']
ensemble_pipelines_weight                     [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'LDA', 'hyperparameters': {'model': 'LinearDiscriminantAnalysis()'}}]
acquisition_type                              LCB
kernel_members                                0 ['Random Forest']
kernel_members                                1 ['Gradient Boosting']
kernel_members                                2 ['XGBoost']
kernel_members                                3 ['Logistic Regression']
kerne

157 GradientBoostingClassifier(learning_rate=0.05365375562536643, max_depth=6,
                           n_estimators=75)   1 0.660 0.054
158 XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=1.0, gamma=0.0,
              gpu_id=None, importance_type='gain', interaction_constraints=None,
              learning_rate=0.005, max_delta_step=None, max_depth=3,
              min_child_weight=2.0, missing=nan, monotone_constraints=None,
              n_estimators=50, n_jobs=None, num_parallel_tree=None,
              random_state=None, reg_alpha=None, reg_lambda=None,
              scale_pos_weight=None, subsample=0.5, tree_method=None,
              validate_parameters=None, verbosity=None)   1 0.660 0.057
159 XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=0.5099418135931173,
              gamma=6.7579411661680515, gpu_id=None, imp

In [93]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_orange.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 1s (1s) (13s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 2s (1s) (12s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.9999999999999857
[ XGBoost ]
Iteration number: 3 3s (1s) (13s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.9580813430923157
[ XGBoost ]
Iteration number: 4 3s (1s) (12s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.7569081159660461
[ XGBoost ]
Iteration number: 5 5s (1s) (14s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.6768837219889907
[ XGBoost ]
Iteration number: 6 6s (1s) (14s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.8654784343377803
[ XGBoost ]
Iteration number: 7 7s (1s) (14s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.7767834606108769
[ XGBoost ]
Iteration number: 8 7

In [94]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.630
classifier      aucprc 0.092
ensemble        aucroc 0.631
ensemble        aucprc 0.093

Report

best score single pipeline (while fitting)    0.630
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.631
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.8773589546635906, 0.08551817064586709, 0.03712287469054227]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.15286061750389995, max_delta_step=0, max_depth=3,\n              min_child_weight=1, missing=nan, 

In [95]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_Vascular.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 1s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 2s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0
[ XGBoost ]
Iteration number: 3 3s (1s) (14s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.307317326797437
[ XGBoost ]
Iteration number: 4 4s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.8121597643783444
[ XGBoost ]
Iteration number: 5 5s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.76916563629995
[ XGBoost ]
Iteration number: 6 6s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.8110097617956091
[ XGBoost ]
Iteration number: 7 7s (1s) (15s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.864485640692292
[ XGBoost ]
Iteration number: 8 8s (1s) (15s), Curre

In [96]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.598
classifier      aucprc 0.021
ensemble        aucroc 0.612
ensemble        aucprc 0.021

Report

best score single pipeline (while fitting)    0.586
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.621
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.5, 0.0, 0.5]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.1272613559635392, max_delta_step=0, max_depth=3,\n              min_child_weight=1, missing=nan, monotone_constraints='()',\n              n_estimators=40, n_jo

In [97]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_CEREBROVASCULAR.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 2s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 3s (1s) (19s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.999999999999955
[ XGBoost ]
Iteration number: 3 3s (1s) (17s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.409765912532007
[ XGBoost ]
Iteration number: 4 4s (1s) (17s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.454229611717222
[ XGBoost ]
Iteration number: 5 5s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.7596962489123629
[ XGBoost ]
Iteration number: 6 6s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.7001402864350179
[ XGBoost ]
Iteration number: 7 8s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.6953357913147914
[ XGBoost ]
Iteration number: 8 9s (

In [98]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.637
classifier      aucprc 0.009
ensemble        aucroc 0.639
ensemble        aucprc 0.009

Report

best score single pipeline (while fitting)    0.642
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.648
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.1622482204788163, 0.8377517795211836, 0.0]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.11525872241120932, max_delta_step=0, max_depth=2,\n              min_child_weight=1, missing=nan, monotone_constrai

In [99]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_CardiacArrhythmias.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 2s (2s) (25s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 3s (1s) (20s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.9999999999999916
[ XGBoost ]
Iteration number: 3 4s (1s) (18s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.799509500513086
[ XGBoost ]
Iteration number: 4 4s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.3563479123031468
[ XGBoost ]
Iteration number: 5 7s (1s) (21s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.6776867638929476
[ XGBoost ]
Iteration number: 6 10s (2s) (24s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.8014555792206507
[ XGBoost ]
Iteration number: 7 12s (2s) (26s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.8027143350739584
[ XGBoost ]
Iteration number: 8 

In [100]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.692
classifier      aucprc 0.025
ensemble        aucroc 0.690
ensemble        aucprc 0.026

Report

best score single pipeline (while fitting)    0.691
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.694
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.36874747272793673, 0.4924995424484424, 0.13875298482362083]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.34972595461412415, max_delta_step=0, max_depth=1,\n              min_child_weight=1, missing=nan, 

In [101]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_CoronaryArtery.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 1s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 2s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.999999999999997
[ XGBoost ]
Iteration number: 3 3s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.1148301364088888
[ XGBoost ]
Iteration number: 4 4s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.364061550597541
[ XGBoost ]
Iteration number: 5 5s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.1939144320767707
[ XGBoost ]
Iteration number: 6 6s (1s) (16s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.3456775868494286
[ XGBoost ]
Iteration number: 7 9s (1s) (18s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.5087008723121689
[ XGBoost ]
Iteration number: 8 10s

In [102]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.723
classifier      aucprc 0.035
ensemble        aucroc 0.723
ensemble        aucprc 0.035

Report

best score single pipeline (while fitting)    0.706
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.701
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.32575644822270206, 0.5646886559251522, 0.10955489585214578]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.23045397963992612, max_delta_step=0, max_depth=2,\n              min_child_weight=1, missing=nan, 
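
To compare the category runs at a glance, the aucroc values reported above can be put side by side. This is a small sketch that only restates numbers already printed in this section; the dictionary name and layout are illustrative. Note that the withImage_green run used five models and 30 iterations, while the other runs used XGBoost only with 15 iterations, so the values are not strictly comparable.

```python
# (classifier aucroc, ensemble aucroc) copied from the reports in this section.
reported_aucroc = {
    "withImage_green": (0.693, 0.694),
    "withImage_orange": (0.630, 0.631),
    "withImage_Vascular": (0.598, 0.612),
    "withImage_CEREBROVASCULAR": (0.637, 0.639),
    "withImage_CardiacArrhythmias": (0.692, 0.690),
    "withImage_CoronaryArtery": (0.723, 0.723),
}

# Sort by ensemble aucroc, highest first.
ranked = sorted(reported_aucroc.items(),
                key=lambda item: item[1][1], reverse=True)

for name, (clf, ens) in ranked:
    print(f"{name:<30} classifier={clf:.3f} ensemble={ens:.3f}")
```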

## Run for image data set with manual and auto imputation
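
Judging by the file names, the two runs in this section use one file without missing values (noNan) and one with missing values (nan). A quick way to check which case a CSV falls into is sketched here, using only the standard library. The helper name count_missing and the set of strings treated as missing are assumptions, not something AutoPrognosis defines.

```python
import csv

# Assumed spellings of a missing cell, compared case-insensitively.
MISSING_MARKERS = {"", "nan", "na", "null"}


def count_missing(csv_path):
    """Return {column name: number of missing cells} for a CSV with a header row."""
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        counts = {name: 0 for name in (reader.fieldnames or [])}
        for row in reader:
            for name in counts:
                value = row.get(name)
                # Short rows yield None for the columns they lack.
                if value is None or value.strip().lower() in MISSING_MARKERS:
                    counts[name] += 1
    return counts
```

If every count is zero, the file has no missing cells under these assumptions, and no imputation stage should be needed.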

In [57]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_cardio_data_17_feature_noNan.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 15 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', max=15.0, style=ProgressStyle(description_width='initial')), HTML(value='')))
[ XGBoost ]
Iteration number: 1 14s (14s) (207s), Current pipelines:  [[[ XGBoost ]]], BO objective: 0.0
[ XGBoost ]
Iteration number: 2 15s (8s) (114s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0
[ XGBoost ]
Iteration number: 3 17s (6s) (83s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.2975964266320794
[ XGBoost ]
Iteration number: 4 18s (4s) (67s), Current pipelines:  [[[ XGBoost ]]], BO objective: -0.9867661913394739
[ XGBoost ]
Iteration number: 5 19s (4s) (58s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.1269587298422807
[ XGBoost ]
Iteration number: 6 21s (3s) (52s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0290689464721487
[ XGBoost ]
Iteration number: 7 22s (3s) (47s), Current pipelines:  [[[ XGBoost ]]], BO objective: -1.0166550632772537
[ XGBoost ]
Iteration number: 8 23s (3

In [58]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.690
classifier      aucprc 0.070
ensemble        aucroc 0.691
ensemble        aucprc 0.070

Report

best score single pipeline (while fitting)    0.670
model_names_single_pipeline                   [ XGBoost ]
best ensemble score (while fittng)            0.678
ensemble_pipelines                            ['[ XGBoost ]', '[ XGBoost ]', '[ XGBoost ]']
ensemble_pipelines_weight                     [0.5686781842196585, 0.4313218157803414, 0.0]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.06711869359355345, max_delta_step=0, max_depth=1,\n              min_child_weight=1, missing=nan, monotone_constrai

In [59]:
!python3 autoprognosis.py \
-i ../../../AutoPrognosisThings/cardio_data/withImage_cardio_data_17_feature_nan.csv \
--target outcome \
-o ../../../AutoPrognosisThings/outputs \
--it 100 \
--cv 3 \
--nstage 4 \
--modelindexes 2 \
--num_components 1 \
--kernel_freq 100

[ iterative_extra_trees, XGBoost ]
HBox(children=(FloatProgress(value=0.0, description='BO progress', style=ProgressStyle(description_width='initial')), HTML(value='')))
[ iterative_k_neighbors, XGBoost ]
Iteration number: 1 2s (2s) (234s), Current pipelines:  [[[ most_frequent, XGBoost ]]], BO objective: 0.0
[ iterative_decision_tree, XGBoost ]
Iteration number: 2 5s (3s) (250s), Current pipelines:  [[[ iterative_extra_trees, XGBoost ]]], BO objective: -1.0
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 3 7s (2s) (228s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -1.0271024092597387
[ iterative_decision_tree, XGBoost ]
Iteration number: 4 10s (2s) (240s), Current pipelines:  [[[ iterative_extra_trees, XGBoost ]]], BO objective: -1.0673626352966024
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 5 12s (2s) (238s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -1.0174138264431307
[ iterative_decision_tree, XGBoo

Iteration number: 51 218s (4s) (427s), Current pipelines:  [[[ mean, XGBoost ]]], BO objective: -1.031968431639054
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 52 219s (4s) (422s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -1.050275006746963
[ iterative_k_neighbors, XGBoost ]
Iteration number: 53 222s (4s) (419s), Current pipelines:  [[[ iterative_decision_tree, XGBoost ]]], BO objective: -1.045394233255035
[ iterative_decision_tree, XGBoost ]
Iteration number: 54 225s (4s) (416s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -1.0636549188803426
[ mean, XGBoost ]
Iteration number: 55 226s (4s) (410s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -1.0436937873355774
[ iterative_extra_trees, XGBoost ]
Iteration number: 56 240s (4s) (428s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -1.02783203858372
[ iterative_decision_tree, XGBoost ]
Iteration number: 57 243s (4s) (425s), Current pipelines:  [[[ iterative_decision_

Iteration number: 3 18s (6s) (613s), Current pipelines:  [[[ iterative_bayesian_ridge, XGBoost ]]], BO objective: -0.8729723186504094
[ iterative_decision_tree, XGBoost ]
Iteration number: 4 21s (5s) (530s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -0.9476510760751177
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 5 23s (5s) (457s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -1.1392726680874161
[ mean, XGBoost ]
Iteration number: 6 24s (4s) (403s), Current pipelines:  [[[ iterative_bayesian_ridge, XGBoost ]]], BO objective: -0.9769788194852722
[ iterative_extra_trees, XGBoost ]
Iteration number: 7 39s (6s) (557s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -0.8656090165709337
[ iterative_extra_trees, XGBoost ]
Iteration number: 8 54s (7s) (672s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -0.8157112824691299
[ most_frequent, XGBoost ]
Iteration number: 9 56s (6s) (617s), Current pip

[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 55 240s (4s) (436s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -0.8993153308318
[ iterative_decision_tree, XGBoost ]
Iteration number: 56 242s (4s) (432s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -0.9052010385965472
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 57 244s (4s) (428s), Current pipelines:  [[[ iterative_extra_trees, XGBoost ]]], BO objective: -0.9153250503318896
[ iterative_decision_tree, XGBoost ]
Iteration number: 58 247s (4s) (425s), Current pipelines:  [[[ most_frequent, XGBoost ]]], BO objective: -0.9081656384101646
[ most_frequent, XGBoost ]
Iteration number: 59 249s (4s) (421s), Current pipelines:  [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -0.8959013138404976
[ median, XGBoost ]
Iteration number: 60 250s (4s) (417s), Current pipelines:  [[[ iterative_decision_tree, XGBoost ]]], BO objective: -0.8965922940858623
[ iterative_de

[ iterative_k_neighbors, XGBoost ]
Iteration number: 5 16s (3s) (330s), Current pipelines:  [[[ iterative_bayesian_ridge, XGBoost ]]], BO objective: -1.207921139577223
[ iterative_extra_trees, XGBoost ]
Iteration number: 6 31s (5s) (521s), Current pipelines:  [[[ mean, XGBoost ]]], BO objective: -1.2843175913964
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 7 33s (5s) (469s), Current pipelines:  [[[ most_frequent, XGBoost ]]], BO objective: -1.1226993768879385
[ most_frequent, XGBoost ]
Iteration number: 8 34s (4s) (430s), Current pipelines:  [[[ iterative_decision_tree, XGBoost ]]], BO objective: -1.0403802895765186
[ iterative_decision_tree, XGBoost ]
Iteration number: 9 37s (4s) (411s), Current pipelines:  [[[ mean, XGBoost ]]], BO objective: -0.9457716121130618
[ most_frequent, XGBoost ]
Iteration number: 10 38s (4s) (385s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -0.9526800613712112
[ most_frequent, XGBoost ]
Iteration number: 11 40s (4s) (363s), Curre

[ median, XGBoost ]
Iteration number: 58 212s (4s) (366s), Current pipelines:  [[[ mean, XGBoost ]]], BO objective: -1.0647413499119514
[ iterative_extra_trees, XGBoost ]
Iteration number: 59 227s (4s) (384s), Current pipelines:  [[[ mean, XGBoost ]]], BO objective: -1.064134085523948
[ median, XGBoost ]
Iteration number: 60 228s (4s) (380s), Current pipelines:  [[[ median, XGBoost ]]], BO objective: -1.054781121999606
[ most_frequent, XGBoost ]
Iteration number: 61 229s (4s) (376s), Current pipelines:  [[[ iterative_decision_tree, XGBoost ]]], BO objective: -1.0976555533603913
[ iterative_extra_trees, XGBoost ]
Iteration number: 62 244s (4s) (393s), Current pipelines:  [[[ iterative_decision_tree, XGBoost ]]], BO objective: -1.0917100506907544
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 63 245s (4s) (389s), Current pipelines:  [[[ iterative_bayesian_ridge, XGBoost ]]], BO objective: -1.0959376902852362
[ iterative_bayesian_ridge, XGBoost ]
Iteration number: 64 246s (4s) (3

In [60]:
!python3 autoprognosis_report.py -i ../../../AutoPrognosisThings/outputs --verbose 1

Score

classifier      aucroc 0.682
classifier      aucprc 0.066
ensemble        aucroc 0.685
ensemble        aucprc 0.068
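
The report shows an `aucroc` near 0.68 next to an `aucprc` near 0.07. The gap is typical of a rare positive class, because the precision-recall baseline for uninformative scores is roughly the positive-class prevalence, while the ROC baseline is 0.5. The sketch below illustrates this on synthetic data with scikit-learn. The 5% prevalence and the score distributions are assumptions made for illustration, and AutoPrognosis may compute its metrics differently.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n_samples = 20000

# Synthetic, imbalanced labels: roughly 5% positives (an assumed prevalence).
y_true = (rng.random(n_samples) < 0.05).astype(int)
prevalence = y_true.mean()

# Uninformative scores: independent of the labels.
random_scores = rng.random(n_samples)
# Weakly informative scores: positives are shifted upwards a little.
informative_scores = rng.random(n_samples) + 0.3 * y_true

roc_random = roc_auc_score(y_true, random_scores)
ap_random = average_precision_score(y_true, random_scores)
roc_informative = roc_auc_score(y_true, informative_scores)
ap_informative = average_precision_score(y_true, informative_scores)

print(f"prevalence:          {prevalence:.3f}")
print(f"random scores:       aucroc={roc_random:.3f}  aucprc={ap_random:.3f}")
print(f"informative scores:  aucroc={roc_informative:.3f}  aucprc={ap_informative:.3f}")
```

When reading an `aucprc` value, comparing it with the prevalence of the positive class is usually more informative than comparing it with 0.5.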

Report

best score single pipeline (while fitting)    0.684
model_names_single_pipeline                   [ mean, XGBoost ]
best ensemble score (while fittng)            0.683
ensemble_pipelines                            ['[ most_frequent, XGBoost ]', '[ iterative_extra_trees, XGBoost ]', '[ mean, XGBoost ]']
ensemble_pipelines_weight                     [0.5909853790741754, 0.4090146209258246, 0.0]
optimisation_metric                           aucroc
hyperparameter_properties                     [{'name': 'mean'}, {'name': 'XGBoost', 'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n              importance_type='gain', interaction_constraints='',\n              learning_rate=0.23614286922723263, max_delta_step=0, max_depth=

119 iterative_decision_treeXGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None, gamma=None,
              gpu_id=None, importance_type='gain', interaction_constraints=None,
              learning_rate=0.12543991573802482, max_delta_step=None,
              max_depth=3, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=20, n_jobs=None,
              num_parallel_tree=None, random_state=None, reg_alpha=None,
              reg_lambda=None, scale_pos_weight=None, subsample=None,
              tree_method=None, validate_parameters=None, verbosity=None)   1 0.678 0.061
120 most_frequentXGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
              colsample_bynode=None, colsample_bytree=None, gamma=None,
              gpu_id=None, importance_type='gain', interaction_constraints=None,
              learning_rate=0.28112780198453613, max_delta_step

# Run the model on the image dataset

## Run the model for 17 features

In [121]:
X_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/withImage/train_df_noNan")
X_.set_index('eid', inplace=True)
Y_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/withImage/train_df_noNan_outcome_green")
Y_.set_index('eid', inplace=True)
df_all= X_.join(Y_)
df_all.isnull().sum()

age-high-bp-diagnosed              0
average-dias-0                     0
average-sys-0                      0
average-pulse-0                    0
history-of-diabetes                0
gender                             0
age-0                              0
hypertention-medication-0          0
mother-smoker                      0
smoker                             0
ex-smoker                          0
non-smoker                         0
amount-combined                    0
ex-penalty                         0
average-BMI-0                      0
diff-age-and-agehighbpdiagnosed    0
diff-blood-pressures               0
outcome                            0
dtype: int64
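
The `isnull().sum()` check above confirms that no column contains missing values, which matters because the pipeline may otherwise need an imputation stage. The following minimal sketch runs the same check on a small hypothetical DataFrame (the values are invented for illustration) and shows how to detect and drop incomplete rows with pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the tutorial, values are made up.
toy = pd.DataFrame({
    "age-0": [54, 61, np.nan, 47],
    "smoker": [0, 1, 0, np.nan],
    "outcome": [0, 0, 1, 0],
})

# Number of missing entries per column, as in the cell above.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True if any column has at least one missing value.
has_missing = bool(missing_per_column.any())
print("missing data present:", has_missing)

# Keep only the rows without missing values.
complete_rows = toy.dropna()
print(len(complete_rows), "complete rows")
```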

In [122]:
df_all.to_csv('../../../AutoPrognosisThings/cardio_data/withImage_cardio_data_17_feature_noNan.csv')

In [46]:
X_= df_all.drop(columns=['outcome'])
Y_= df_all[['outcome']]

In [47]:
metric = 'aucprc'
# LCB is the default and preferred acquisition type, but it generates excessive warnings; MPI is a good compromise.
acquisition_type = 'MPI'
# kernel_freq and Gibbs_iter were both changed to 100.
model.nmax_model= 4
AP_mdl   = model.AutoPrognosis_Classifier(
    metric=metric, CV=3, num_iter=15, kernel_freq=100, ensemble=True,
    ensemble_size=3, Gibbs_iter=100, burn_in=50, num_components=3, 
    acquisition_type=acquisition_type, my_model_indexes=[2])

In [51]:
AP_mdl.fit(X_, Y_)

## Run the model for 7 features

In [8]:
X_= pd.read_csv("../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_cereb.csv")
X_.set_index('eid', inplace=True)
Y_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/withImage/train_df_noNan_outcome_CEREBROVASCULAR")
Y_.set_index('eid', inplace=True)
df_all= X_.join(Y_)

In [9]:
df_all= df_all[['gender', 'age-0', 'average-sys-0', 'history-of-diabetes', 'hypertention-medication-0', 'smoker',
                'average-BMI-0', 'prob', 'outcome']]
df_all.isnull().sum()

gender                       0
age-0                        0
average-sys-0                0
history-of-diabetes          0
hypertention-medication-0    0
smoker                       0
average-BMI-0                0
prob                         0
outcome                      0
dtype: int64

In [10]:
df_all.to_csv('../../../AutoPrognosisThings/cardio_data/withImage2/with_prob_no_test_cereb.csv')

In [63]:
X_= df_all.drop(columns=['outcome'])
Y_= df_all[['outcome']]

In [64]:
metric = 'aucprc'
# LCB is the default and preferred acquisition type, but it generates excessive warnings; MPI is a good compromise.
acquisition_type = 'MPI'
# kernel_freq and Gibbs_iter were both changed to 100.
model.nmax_model= 4
AP_mdl   = model.AutoPrognosis_Classifier(
    metric=metric, CV=3, num_iter=3, kernel_freq=100, ensemble=True,
    ensemble_size=3, Gibbs_iter=100, burn_in=50, num_components=3, 
    acquisition_type=acquisition_type, my_model_indexes=[0,1,2])

In [65]:
AP_mdl.fit(X_, Y_)

[ iterative_bayesian_ridge, Random Forest ]
[ most_frequent, Gradient Boosting ]
[ mean, XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ iterative_extra_trees, Random Forest ]
[ iterative_extra_trees, Gradient Boosting ]
[ iterative_extra_trees, XGBoost ]


Iteration number: 1 28s (28s) (83s), Current pipelines:  [[[ iterative_extra_trees, Random Forest ]]], [[[ median, Gradient Boosting ]]], [[[ mean, XGBoost ]]], BO objective: 0.0


[ iterative_k_neighbors, Random Forest ]
[ mean, Gradient Boosting ]
[ median, XGBoost ]


Iteration number: 2 41s (21s) (62s), Current pipelines:  [[[ median, Random Forest ]]], [[[ iterative_bayesian_ridge, Gradient Boosting ]]], [[[ iterative_bayesian_ridge, XGBoost ]]], BO objective: -1.0000000000000075


[ iterative_decision_tree, Random Forest ]
[ iterative_extra_trees, Gradient Boosting ]
[ iterative_decision_tree, XGBoost ]


Iteration number: 3 60s (20s) (60s), Current pipelines:  [[[ median, Random Forest ]]], [[[ iterative_decision_tree, Gradient Boosting ]]], [[[ iterative_k_neighbors, XGBoost ]]], BO objective: -1.383231689627314





**The best model is: **[ most_frequent, XGBoost ]

 |||| Now building the ensemble...

**Ensemble: **['[ iterative_decision_tree, XGBoost ]', '[ most_frequent, Gradient Boosting ]', '[ iterative_decision_tree, XGBoost ]']

**Ensemble weights: **[0.45932538 0.21936044 0.32131418]

**The ensemble did not help.**

[{'name': 'initial', 'aucprc': 0.06245598285917736},
 {'aucprc': 0.06340984877085147,
  'aucroc': 0.6752906925179878,
  'name': '[ iterative_extra_trees, Random Forest ]',
  'cv': 3,
  'iter': 0,
  'component_idx': 0,
  'hyperparameter_properties': [{'name': 'iterative_extra_trees'},
   {'name': 'Random Forest',
    'hyperparameters': {'model': "RandomForestClassifier(max_features='log2', min_samples_leaf=5,\n                       n_estimators=200)"}}],
  'model': '<pipelines.basePipeline.basePipeline object at 0x1a775ab450>'},
 {'aucprc': 0.06285786492454601,
  'aucroc': 0.6844028901948724,
  'name': '[ iterative_extra_trees, Gradient Boosting ]',
  'cv': 3,
  'iter': 0,
  'component_idx': 1,
  'hyperparameter_properties': [{'name': 'iterative_extra_trees'},
   {'name': 'Gradient Boosting',
    'hyperparameters': {'model': 'GradientBoostingClassifier(learning_rate=0.2750469990268981, max_depth=4,\n                           n_estimators=20)'}}],
  'model': '<pipelines.basePipeline.ba
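
The fit output above reports three ensemble members and a weight for each. One common way to use such weights is a weighted average of the members' predicted probabilities. The sketch below shows that idea with NumPy. The weights are copied from the output above, the member probabilities are invented, and whether AutoPrognosis combines its members in exactly this way is an assumption, not something the output confirms.

```python
import numpy as np

# Ensemble weights copied from the fit output above.
weights = np.array([0.45932538, 0.21936044, 0.32131418])

# Hypothetical positive-class probabilities, shape (n_members, n_samples).
member_probs = np.array([
    [0.10, 0.60, 0.30],
    [0.20, 0.40, 0.35],
    [0.05, 0.70, 0.25],
])

# Weighted average over the member axis gives one probability per sample.
blended = np.average(member_probs, axis=0, weights=weights)
print(blended)
```

Because the weights sum to one, each blended probability lies between the smallest and largest member probability for that sample.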

# Run the model with short, simple Python code

In [None]:
#X_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/ICD10 codes of the paper/train_df_noNan")
X_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/Green ICD10 codes/train_df_noNan")
X_.set_index('eid', inplace=True)
#Y_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/ICD10 codes of the paper/train_df_noNan_outcome")
Y_= pd.read_feather("../../../AutoPrognosisThings/cardio_data/Green ICD10 codes/train_df_noNan_outcome")
Y_.set_index('eid', inplace=True)
df_all= X_.join(Y_)

In [21]:
# make a small random dataset
df_all=df_all.reindex(np.random.permutation(df_all.index))
df_all=df_all[:20000]
df_all.to_csv('../../../AutoPrognosisThings/cardio_data/small_cardio_data.csv')
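
The cell above shuffles the index with `np.random.permutation` and keeps the first 20000 rows, so each run selects a different subset. If a repeatable subset is wanted, `DataFrame.sample` with a fixed `random_state` is one alternative. The sketch below uses a small hypothetical DataFrame; the sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_all, indexed by 'eid' as in the tutorial.
toy = pd.DataFrame(
    {"feature": np.arange(100), "outcome": np.arange(100) % 2},
    index=pd.Index(np.arange(1000, 1100), name="eid"),
)

# Draw 20 rows without replacement; the fixed seed makes the draw repeatable.
small_a = toy.sample(n=20, random_state=0)
small_b = toy.sample(n=20, random_state=0)

print(small_a.head())
print("same rows on both draws:", small_a.index.equals(small_b.index))
```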

In [14]:
X_= df_all.drop(columns=['outcome'])
Y_= df_all[['outcome']]

In [68]:
metric = 'aucprc'
# LCB is the default and preferred acquisition type, but it generates excessive warnings; MPI is a good compromise.
acquisition_type = 'MPI'
# kernel_freq and Gibbs_iter were both changed to 100.
model.nmax_model= 4
AP_mdl   = model.AutoPrognosis_Classifier(
    metric=metric, CV=3, num_iter=3, kernel_freq=100, ensemble=True,
    ensemble_size=3, Gibbs_iter=100, burn_in=50, num_components=3, 
    acquisition_type=acquisition_type, my_model_indexes=[0,1,2], is_nan=False)

In [69]:
AP_mdl.fit(X_, Y_)

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 7s (7s) (22s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 12s (6s) (18s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0000000000000056


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 18s (6s) (18s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.1698793364759563





**The best model is: **[ Random Forest ]

 |||| Now building the ensemble...

**Ensemble: **['[ Random Forest ]', '[ Gradient Boosting ]', '[ Gradient Boosting ]']

**Ensemble weights: **[0.46391801 0.41431131 0.12177068]

**The ensemble helps!**

[{'name': 'initial', 'aucprc': 0.0652095870850132},
 {'aucprc': 0.06393364520775931,
  'aucroc': 0.68003974748301,
  'name': '[ Random Forest ]',
  'cv': 3,
  'iter': 0,
  'component_idx': 0,
  'hyperparameter_properties': [{'name': 'Random Forest',
    'hyperparameters': {'model': 'RandomForestClassifier(max_features=0.8, min_samples_leaf=25, n_estimators=75)'}}],
  'model': '<pipelines.basePipeline.basePipeline object at 0x1a32b914d0>'},
 {'aucprc': 0.06817250055182285,
  'aucroc': 0.689237393416243,
  'name': '[ Gradient Boosting ]',
  'cv': 3,
  'iter': 0,
  'component_idx': 1,
  'hyperparameter_properties': [{'name': 'Gradient Boosting',
    'hyperparameters': {'model': 'GradientBoostingClassifier(learning_rate=0.28824961971127044, n_estimators=15)'}}],
  'model': '<pipelines.basePipeline.basePipeline object at 0x1a32bf3410>'},
 {'aucprc': 0.0625410339814459,
  'aucroc': 0.6848175831023787,
  'name': '[ XGBoost ]',
  'cv': 3,
  'iter': 0,
  'component_idx': 2,
  'hyperparameter_pr

## Computing model predictions

##### The first element of the output is the prediction of a single model; the second element is the prediction of the ensemble

In [70]:
AP_mdl.predict(X_)

(array([[0.99561158, 0.00438842],
        [0.98593672, 0.01406328],
        [0.92780567, 0.07219433],
        ...,
        [0.91539711, 0.08460289],
        [0.92031566, 0.07968434],
        [0.93252623, 0.06747377]]),
 array([[0.99561158, 0.00438842],
        [0.98593672, 0.01406328],
        [0.92780567, 0.07219433],
        ...,
        [0.91539711, 0.08460289],
        [0.92031566, 0.07968434],
        [0.93252623, 0.06747377]]))
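
The output above is a pair of arrays with two columns per row, and each row sums to one. The sketch below unpacks such a pair and extracts one column as the positive-class probability. It uses the first three rows printed above as stand-in data. Treating column 1 as the positive class follows the scikit-learn convention and is an assumption here, as is the 0.5 threshold.

```python
import numpy as np

# First three rows of the printed output, used as stand-in data.
single = np.array([
    [0.99561158, 0.00438842],
    [0.98593672, 0.01406328],
    [0.92780567, 0.07219433],
])
# The visible rows of the two printed arrays are identical, so reuse them here.
ensemble = single.copy()
prediction = (single, ensemble)

# Unpack the pair: single-model probabilities first, ensemble second.
single_proba, ensemble_proba = prediction

# Assumed: column 1 holds the probability of the positive class.
positive_proba = ensemble_proba[:, 1]

# Turn probabilities into hard labels with an illustrative 0.5 threshold.
labels = (positive_proba >= 0.5).astype(int)
print(positive_proba, labels)
```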

## Compute performance via multi-fold cross-validation

In [71]:
model.evaluate_ens(X_, Y_, AP_mdl, n_folds=5, visualize=True)

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 3s (3s) (10s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 7s (4s) (11s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 12s (4s) (12s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0267434062648066





**The best model is: **[ Gradient Boosting ]

 |||| Now building the ensemble...

**Ensemble: **['[ Gradient Boosting ]', '[ Gradient Boosting ]', '[ XGBoost ]']

**Ensemble weights: **[0.83517524 0.11811711 0.04670766]

**The ensemble helps!**

**Cross-validation score: **0.07562599409535037

**Cross-validation score with ensembles: **0.07572498446998366

---------------------------------------------------------
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 19s (19s) (57s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 37s (19s) (56s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0000000000000042


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 55s (18s) (55s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.3755183789040022





**The best model is: **[ Gradient Boosting ]

 |||| Now building the ensemble...

**Ensemble: **['[ Gradient Boosting ]', '[ Random Forest ]', '[ XGBoost ]']

**Ensemble weights: **[0.14452557 0.41327178 0.44220265]

**The ensemble helps!**

**Cross-validation score: **0.05012572646874525

**Cross-validation score with ensembles: **0.04972130907642249

---------------------------------------------------------
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 40s (40s) (121s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 45s (22s) (67s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0000000000000024


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 49s (16s) (49s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -0.8405921073846454





**The best model is: **[ Gradient Boosting ]

 |||| Now building the ensemble...

**Ensemble: **['[ Gradient Boosting ]', '[ Random Forest ]', '[ XGBoost ]']

**Ensemble weights: **[0.31593007 0.35383576 0.33023417]

**The ensemble helps!**

**Cross-validation score: **0.058340096116703016

**Cross-validation score with ensembles: **0.060923429387329836

---------------------------------------------------------
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 37s (37s) (110s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 41s (21s) (62s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.0000000000000004


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 46s (15s) (46s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.1633587787042599





**The best model is: **[ Random Forest ]

 |||| Now building the ensemble...

**Ensemble: **['[ Random Forest ]', '[ Random Forest ]', '[ XGBoost ]']

**Ensemble weights: **[0.36588068 0.02778245 0.60633687]

**The ensemble helps!**

**Cross-validation score: **0.057586347040475896

**Cross-validation score with ensembles: **0.0610369519261712

---------------------------------------------------------
[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


HBox(children=(FloatProgress(value=0.0, description='BO progress', max=3.0, style=ProgressStyle(description_wi…

[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 1 6s (6s) (18s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: 0.0


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 2 11s (6s) (17s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.000000000000002


[ Random Forest ]
[ Gradient Boosting ]
[ XGBoost ]


Iteration number: 3 21s (7s) (21s), Current pipelines:  [[[ Random Forest ]]], [[[ Gradient Boosting ]]], [[[ XGBoost ]]], BO objective: -1.233075662896327





**The best model is: **[ Random Forest ]

 |||| Now building the ensemble...

**Ensemble: **['[ Random Forest ]', '[ XGBoost ]', '[ XGBoost ]']

**Ensemble weights: **[0.33879514 0.30349428 0.35771058]

**The ensemble did not help.**

**Cross-validation score: **0.0812013417657555

**Cross-validation score with ensembles: **0.08574224400228753

---------------------------------------------------------


**Final Cross-validation score: **(0.064575901097406, 0.01033465836020563)

**Final Cross-validation score with ensembles: **(0.06662978377243894, 0.011071814830114343)

---------------------------------------------------------


((0.064575901097406, 0.01033465836020563),
 (0.06662978377243894, 0.011071814830114343),
 {'clf': {'roc_lst': [0.7143938111486617,
    0.6349281694883624,
    0.6760740091608752,
    0.6667807718990761,
    0.7030356254028869],
   'prc_lst': [0.07562599409535037,
    0.05012572646874525,
    0.058340096116703016,
    0.057586347040475896,
    0.0812013417657555],
   'roc_cur': 0.7030356254028869,
   'prc_cur': 0.0812013417657555},
  'clf_ens': {'roc_lst': [0.7150612582882759,
    0.6432583585886249,
    0.680006329036891,
    0.6994539177144062,
    0.7141445945835679],
   'prc_lst': [0.07572498446998366,
    0.04972130907642249,
    0.060923429387329836,
    0.0610369519261712,
    0.08574224400228753],
   'roc_cur': 0.7141445945835679,
   'prc_cur': 0.08574224400228753}},
 <model.AutoPrognosis_Classifier at 0x1a7bcc1310>,
 [[{'name': 'initial', 'aucprc': 0.0560354075737109},
   {'aucprc': 0.05190446332046906,
    'aucroc': 0.6475465080062162,
    'name': '[ Random Forest ]',
    'cv'
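
The final cross-validation score is printed as a pair of numbers. The first value matches the mean of the five per-fold `prc_lst` scores. The second appears consistent with a 95% confidence half-width, 1.96 times the population standard deviation divided by the square root of the number of folds. That interpretation is inferred from the numbers in the output, not from the library source. The sketch below recomputes both values for the single-model scores.

```python
import numpy as np

# Per-fold aucprc scores copied from 'clf' -> 'prc_lst' in the output above.
prc_per_fold = np.array([
    0.07562599409535037,
    0.05012572646874525,
    0.058340096116703016,
    0.057586347040475896,
    0.0812013417657555,
])

n_folds = len(prc_per_fold)
mean_score = prc_per_fold.mean()

# Assumed formula: 1.96 * population std (ddof=0) / sqrt(number of folds).
half_width = 1.96 * prc_per_fold.std() / np.sqrt(n_folds)

print(mean_score, half_width)
```

With only five folds the interval is wide relative to the mean, so the difference between the single-model and ensemble scores reported above is small compared with the fold-to-fold variation.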

## Visualize data...

In [13]:
AP_mdl.visualize_data(X_)

## Visualize the model...

In [12]:
AP_mdl.APReport()

***Ensemble Report***

**----------------------**

**Rank0:   [ XGBoost ],   Ensemble weight: 0.337141036259983**

**----------------------**

{'model_list': [<models.classifiers.XGboost object at 0x1a3466ae90>], 'explained': '[ *GBoost is an open-source software library which provides the gradient boosting framework for C++, Java, Python, R, and Julia.* ]', 'image_name': None, 'classes': None, 'num_stages': 1, 'pipeline_stages': ['classifier'], 'name': '[ XGBoost ]', 'analysis_mode': None, 'analysis_type': None}


**_____________________________________________**

[ *GBoost is an open-source software library which provides the gradient boosting framework for C++, Java, Python, R, and Julia.* ]

**Rank1:   [ AdaBoost ],   Ensemble weight: 0.33191778877869144**

**----------------------**

{'model_list': [<models.classifiers.Adaboost object at 0x1a34668790>], 'explained': "[ *AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.* ]", 'image_name': None, 'classes': None, 'num_stages': 1, 'pipeline_stages': ['classifier'], 'name': '[ AdaBoost ]', 'analysis_mode': None, 'analysis_type': None}


**_____________________________________________**

[ *AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for their work. It can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.* ]

**Rank2:   [ XGBoost ],   Ensemble weight: 0.33094117496132563**

**----------------------**

{'model_list': [<models.classifiers.XGboost object at 0x1a376f1e90>], 'explained': '[ *GBoost is an open-source software library which provides the gradient boosting framework for C++, Java, Python, R, and Julia.* ]', 'image_name': None, 'classes': None, 'num_stages': 1, 'pipeline_stages': ['classifier'], 'name': '[ XGBoost ]', 'analysis_mode': None, 'analysis_type': None}


**_____________________________________________**

[ *GBoost is an open-source software library which provides the gradient boosting framework for C++, Java, Python, R, and Julia.* ]

**----------------------**

***Kernel Report***

**Component 0**

**Members: ['XGBoost', 'Gradient Boosting', 'Random Forest', 'Neural Network']**

  Mat52.       |               value  |  constraints  |  priors
  variance     |  0.9999990030869095  |      +ve      |
  lengthscale  |  0.9031481051397683  |      +ve      |


**Component 1**

**Members: ['Multinomial Naive Bayes', 'Bernoulli Naive Bayes', 'Bagging', 'Adaboost']**

  Mat52.       |               value  |  constraints  |  priors
  variance     |  0.9720888366936934  |      +ve      |
  lengthscale  |   5.127609133763431  |      +ve      |


**Component 2**

**Members: ['Linear SVM', 'KNN', 'Decision Trees', 'Perceptron', 'Logistic Regression', 'Gauss Naive Bayes', 'QDA', 'LDA']**

  Mat52.       |               value  |  constraints  |  priors
  variance     |   46.03231114719721  |      +ve      |
  lengthscale  |  21.029530765403038  |      +ve      |


{'best_score_single_pipeline': 0.06294085111764766,
 'model_names_single_pipeline': '[ XGBoost ]',
 'ensemble_score': 0.06402736930011581,
 'ensemble_pipelines': ['[ XGBoost ]', '[ AdaBoost ]', '[ XGBoost ]'],
 'ensemble_pipelines_weight': [0.337141036259983,
  0.33191778877869144,
  0.33094117496132563],
 'optimisation_metric': 'aucprc',
 'hyperparameter_properties': [{'name': 'XGBoost',
   'hyperparameters': {'model': "XGBClassifier(base_score=0.5, booster=None, colsample_bylevel=1,\n       colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,\n       importance_type='gain', interaction_constraints=None,\n       learning_rate=0.06145542076570746, max_delta_step=0, max_depth=2,\n       min_child_weight=1, missing=nan, monotone_constraints=None,\n       n_estimators=253, n_jobs=0, num_parallel_tree=1,\n       objective='binary:logistic', random_state=0, reg_alpha=0,\n       reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method=None,\n       validate_parameters=False, verbosi
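
The kernel report above lists a `variance` and a `lengthscale` for each component under the name `Mat52`, which presumably refers to the Matérn 5/2 covariance function. Under that assumption, the covariance between two points at distance r is variance · (1 + √5·r/ℓ + 5·r²/(3·ℓ²)) · exp(−√5·r/ℓ), where ℓ is the lengthscale. The sketch below evaluates this formula with the Component 0 values. It is a standalone illustration, not the library's own implementation.

```python
import numpy as np


def matern52(r, variance, lengthscale):
    """Matérn 5/2 covariance for distances r >= 0 (standard textbook form)."""
    s = np.sqrt(5.0) * np.asarray(r, dtype=float) / lengthscale
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)


# Values copied from "Component 0" in the kernel report above.
variance = 0.9999990030869095
lengthscale = 0.9031481051397683

# Covariance at a few illustrative distances.
distances = np.array([0.0, 0.5, 1.0, 2.0])
covariances = matern52(distances, variance, lengthscale)
print(covariances)
```

At distance zero the covariance equals the variance, and it decays as the distance grows. A larger lengthscale, as in Component 2, makes that decay slower.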