# Comparison: GapNet vs. Vanilla

In this tutorial we follow the same steps explained in the GapNet tutorial notebook and compare the resulting GapNet model with a Vanilla neural network trained only on the subjects without missing values.

In [None]:
# Architecture of the GapNet
from IPython.display import Image
Image("assets/Gapnet.jpg", width = 500)

The figure shows a schematic representation of the dataset (on the left) and the GapNet approach (on the right). Training takes place in two stages: the connections drawn in black are trained in the first stage, and the connections drawn in gray are trained in the second.

In [None]:
# Architecture of the Vanilla
from IPython.display import Image
Image("assets/Vanilla.jpg", width = 500)

The figure shows a schematic representation of the dataset (on the left) and the Vanilla neural network (on the right). Only the complete cases are used for the Vanilla model.

## Initialization

First things first: we load the main gapnet functions.

In [None]:
from src import gapnet

## Load the dataset

We provide an example dataset adapted from the simulated dataset [Madelon](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html?highlight=madelon). It comes as two files: "X.npy" with the inputs and "Y.npy" with the targets.

The dataset consists of 1000 subjects, of which only 100 have all 40 features.

In [4]:
from numpy import isnan, load

X = load('data/X.npy') 
Y = load('data/Y.npy')

print("Number of features {}".format(X.shape[1]))
print("Number of subjects {}".format(X.shape[0]))

Number of features 40
Number of subjects 1000
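
The split between complete and incomplete subjects can be checked with a NaN mask. The sketch below assumes that missing entries are encoded as NaN (the `isnan` import suggests this, but the tutorial does not state it) and uses a small hypothetical matrix `X_toy` rather than the real data.

```python
import numpy as np

# Hypothetical toy matrix: 5 subjects x 3 features, NaN marks a missing value.
X_toy = np.array([
    [1.0, 2.0, 3.0],
    [np.nan, 2.5, 1.0],
    [0.5, np.nan, np.nan],
    [4.0, 1.0, 2.0],
    [3.0, 0.0, np.nan],
])

# A subject is a complete case if none of its features is NaN.
complete_mask = ~np.isnan(X_toy).any(axis=1)

n_complete = int(complete_mask.sum())
missing_per_feature = np.isnan(X_toy).sum(axis=0)

print("Complete cases: {} of {}".format(n_complete, X_toy.shape[0]))
print("Missing values per feature: {}".format(missing_per_feature))
```

Applying the same mask to the loaded `X` should give the number of subjects that have all 40 features.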


## Generate models

Now it is time to build both the GapNet and the Vanilla model.

This first requires defining, for each model, an object that holds all of its elements:
gapnet_model = gapnet.generate_gapnet_model()
vanilla_model = gapnet.generate_vanilla_model()

Afterwards, the build_model function creates the neural network architecture of each model:
gapnet_model.build_model()
vanilla_model.build_model()

Now the models are ready to be trained. The following functions take the training and validation sets as inputs:
gapnet_model.train_first_stage(X_train, Y_train, X_val, Y_val)
gapnet_model.train_second_stage(X_train, Y_train, X_val, Y_val)
vanilla_model.train_single_stage(X_overlap, Y_overlap, X_val, Y_val)


In [5]:
vanilla_model = gapnet.generate_vanilla_model(n_feature = X.shape[1],n_classes = 2)
vanilla_model.build_model(show_summary=True)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 80)                3280      
                                                                 
 dropout (Dropout)           (None, 80)                0         
                                                                 
 dense_1 (Dense)             (None, 80)                6480      
                                                                 
 dropout_1 (Dropout)         (None, 80)                0         
                                                                 
 dense_2 (Dense)             (None, 2)                 162       
                                                                 
Total params: 9,922
Trainable params: 9,922
Non-trainable params: 0
_________________________________________________________________
None
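
The parameter counts in the summary above can be reproduced by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

# Layer widths read off the summary above: 40 inputs -> 80 -> 80 -> 2 outputs.
layer_sizes = [40, 80, 80, 2]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts, total)  # [3280, 6480, 162] 9922
```

Dropout layers add no parameters, which is why they appear with a count of 0. The same formula gives 1300 for a layer with 25 inputs and 50 units, and 480 for a layer with 15 inputs and 30 units.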


In [6]:
gapnet_model = gapnet.generate_gapnet_model(cluster_sizes = [25,15], n_feature = X.shape[1],n_classes = 2)
gapnet_model.build_model(show_summary=True)

Generating the 1 neural network model ... 
Generating the 2 neural network model ... 
Generating the final gapnet model ... 
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_4 (InputLayer)           [(None, 25)]         0           []                               
                                                                                                  
 input_5 (InputLayer)           [(None, 15)]         0           []                               
                                                                                                  
 dense_9 (Dense)                (None, 50)           1300        ['input_4[0][0]']                
                                                                                                  
 dense_10 (Dense)               (None, 30)           480         ['i

## Train the models

In [7]:
from sklearn.model_selection import StratifiedKFold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
X_overlap, Y_overlap, X_incomplete, Y_incomplete = gapnet.separate_missing_data(X, Y)

for train_index, test_index in skf.split(X_overlap, Y_overlap):
    X_train, X_test = X_overlap[train_index], X_overlap[test_index]
    Y_train, Y_test = Y_overlap[train_index], Y_overlap[test_index]
    X_train, X_test, X_train_overall, Y_train_overall = gapnet.preprocess_standardization_with_missing_data(X_train, Y_train, X_test, X_incomplete, Y_incomplete)
    
    # train vanilla
    vanilla_model.train_single_stage(X_train, Y_train, X_test, Y_test)
    
    # train gapnet
    gapnet_model.train_first_stage(X_train_overall, Y_train_overall, X_test, Y_test)
    gapnet_model.train_second_stage(X_train, Y_train, X_test, Y_test)

Training process of vanilla is done.
Training process of first stage is done.
Training process of second stage is done.
Training process of vanilla is done.
Training process of first stage is done.
Training process of second stage is done.
Training process of vanilla is done.
Training process of first stage is done.
Training process of second stage is done.
Training process of vanilla is done.
Training process of first stage is done.
Training process of second stage is done.
Training process of vanilla is done.
Training process of first stage is done.
Training process of second stage is done.
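
The internals of `gapnet.preprocess_standardization_with_missing_data` are not shown in this tutorial. As a rough illustration of the general idea only, the sketch below standardizes features with statistics computed on the training data while ignoring NaN entries, then reuses those statistics for the test data. The helper name `standardize_ignoring_nan` and the toy arrays are hypothetical, and this is not necessarily what the library does.

```python
import numpy as np

def standardize_ignoring_nan(X_fit, *others):
    """Scale all arrays with NaN-ignoring mean/std of X_fit only (hypothetical helper)."""
    mean = np.nanmean(X_fit, axis=0)
    std = np.nanstd(X_fit, axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant features
    return [(X - mean) / std for X in (X_fit,) + others]

# Toy data: the second training subject is missing its second feature.
X_train_toy = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
X_test_toy = np.array([[3.0, 20.0]])

X_train_std, X_test_std = standardize_ignoring_nan(X_train_toy, X_test_toy)

print(X_train_std)  # missing entries stay NaN
print(X_test_std)
```

Fitting the statistics on the training fold alone keeps information from the test fold out of the preprocessing step.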


In [8]:
gapnet.present_results(vanilla_model)

Results :
best_epochs [21, 9, 1, 4, 13]
train_accuracy 0.664+/-0.107 : [0.7   0.712 0.47  0.65  0.788]
test_accuracy 0.580+/-0.103 : [0.55 0.6  0.4  0.65 0.7 ]
test_auc 0.596+/-0.124 : [0.545 0.68  0.395 0.6   0.76 ]
test_sens 0.567+/-0.106 : [0.5   0.6   0.4   0.636 0.7  ]
test_spec 0.593+/-0.104 : [0.6   0.6   0.4   0.667 0.7  ]
test_prec 0.591+/-0.111 : [0.556 0.6   0.4   0.7   0.7  ]


In [9]:
gapnet.present_results(gapnet_model)

Results :
best_epochs [127, 1, 159, 1, 1]
train_accuracy 0.945+/-0.039 : [0.875 0.95  0.938 0.988 0.975]
test_accuracy 0.810+/-0.097 : [0.75 0.7  0.95 0.75 0.9 ]
test_auc 0.930+/-0.060 : [0.828 0.91  1.    0.93  0.98 ]
test_sens 0.804+/-0.061 : [0.75  0.75  0.909 0.778 0.833]
test_spec 0.829+/-0.142 : [0.75  0.667 1.    0.727 1.   ]
test_prec 0.793+/-0.172 : [0.667 0.6   1.    0.7   1.   ]
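
Each summary line reports a mean and a spread over the five folds. The sketch below recomputes the `test_accuracy` summaries from the per-fold values printed above, assuming the spread is the population standard deviation (dividing by n rather than n - 1), which is what reproduces the printed numbers.

```python
from statistics import fmean, pstdev

# Per-fold test accuracies copied from the two result listings above.
test_accuracy = {
    "vanilla": [0.55, 0.6, 0.4, 0.65, 0.7],
    "gapnet": [0.75, 0.7, 0.95, 0.75, 0.9],
}

summary = {}
for name, scores in test_accuracy.items():
    # pstdev is the population standard deviation (divides by n, not n - 1).
    summary[name] = (round(fmean(scores), 3), round(pstdev(scores), 3))
    print("{} test_accuracy {:.3f}+/-{:.3f}".format(name, *summary[name]))
```

With only five folds of 20 complete subjects each, the spread is large relative to the gap between folds, so single-fold values should be read with caution.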


## Compare the performances

After training both models, we can inspect the results by plotting the ROC curve, the confusion matrix, and the loss, precision, and recall over the course of training.

In [None]:
import matplotlib as mpl
%matplotlib inline
mpl.rcParams['figure.figsize'] = (12, 10)

# show training progress of gapnet
gapnet.plot_metrics(gapnet_model.history['gapnet'])

In [None]:
# show training progress of vanilla
gapnet.plot_metrics(vanilla_model.history)

In [None]:
from sklearn.metrics import roc_auc_score
print('AUC-ROC for the GapNet structure: {:.3f}'.format(roc_auc_score(gapnet_model.val_y_labels, gapnet_model.val_y_preds)))
print('AUC-ROC for the Vanilla structure: {:.3f}'.format(roc_auc_score(vanilla_model.val_y_labels, vanilla_model.val_y_preds)))
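
For binary labels, the AUC-ROC equals the fraction of (positive, negative) pairs in which the positive subject receives the higher score, with ties counted as one half. The sketch below computes it directly from that definition on hypothetical toy data; `auc_from_scores` is an illustrative helper, not part of gapnet or scikit-learn.

```python
def auc_from_scores(labels, scores):
    """AUC-ROC as the fraction of correctly ranked (positive, negative) pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A pair counts 1 if the positive scores higher, 0.5 on a tie, 0 otherwise.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted scores.
y_true_toy = [0, 0, 1, 1]
y_score_toy = [0.1, 0.4, 0.35, 0.8]

auc_toy = auc_from_scores(y_true_toy, y_score_toy)
print("AUC-ROC: {:.3f}".format(auc_toy))  # AUC-ROC: 0.750
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two classes.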

In [None]:
# show roc curve of both models
# assumption: num_trials is the number of cross-validation folds used above
num_trials = skf.get_n_splits()
gapnet.plot_roc_avg("vanilla", vanilla_model.val_y_labels, vanilla_model.val_y_preds, num_trials, linestyle='solid', color='skyblue')
gapnet.plot_roc_avg("gapnet", gapnet_model.val_y_labels, gapnet_model.val_y_preds, num_trials, linestyle='solid', color='darkorange')

In [None]:
# show histogram for both models
gapnet.plot_hist(gapnet_model.val_aucs, 'gapnet', color='darkorange', alpha=0.5)
gapnet.plot_hist(vanilla_model.val_aucs, 'vanilla', color='skyblue', alpha=0.5)

In [None]:
# show confusion matrix for gapnet
gapnet.plot_cm(gapnet_model.val_y_labels, gapnet_model.val_y_preds, 0.5)

In [None]:
# show confusion matrix for vanilla
gapnet.plot_cm(vanilla_model.val_y_labels, vanilla_model.val_y_preds, 0.5)
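
The last argument of `plot_cm` is presumably the decision threshold applied to the predicted scores; that reading is an assumption, as is the convention that a score strictly above the threshold counts as positive. Under those assumptions, the sketch below shows how the four confusion-matrix counts, and the sensitivity, specificity, and precision reported earlier, follow from labels and scores. The helper `confusion_counts` and the toy data are hypothetical.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Return (tn, fp, fn, tp); scores strictly above threshold count as positive."""
    tn = fp = fn = tp = 0
    for y, s in zip(labels, scores):
        pred = 1 if s > threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

# Hypothetical labels and predicted scores for eight subjects.
labels_toy = [1, 1, 1, 0, 0, 0, 0, 1]
scores_toy = [0.9, 0.6, 0.3, 0.2, 0.7, 0.1, 0.4, 0.8]

tn, fp, fn, tp = confusion_counts(labels_toy, scores_toy)

sensitivity = tp / (tp + fn)  # share of true positives that are detected
specificity = tn / (tn + fp)  # share of true negatives that are rejected
precision = tp / (tp + fp)    # share of positive predictions that are correct

print(tn, fp, fn, tp, sensitivity, specificity, precision)
```

Raising the threshold trades sensitivity for specificity, which is the trade-off the ROC curve traces out.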