Model Ensemble: This code builds ensembles of the top-3 performing models from Model-C using several strategies: simple averaging, weighted averaging, majority voting, and stacking. For the stacking ensemble, a neural-network-based meta-learner learns to combine the predictions of the top-3 performing models from Model-C.
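
As a rough illustration of the first three strategies, the sketch below applies them with NumPy to made-up softmax outputs. The array `preds`, the `weights` values, and the `stack_features` name are assumptions for illustration only; they are not results or identifiers from this notebook.

```python
import numpy as np

# Hypothetical softmax outputs of three models for four samples and two classes.
preds = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4]],
    [[0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]],
    [[0.7, 0.3], [0.7, 0.3], [0.4, 0.6], [0.3, 0.7]],
])  # shape: (n_models, n_samples, n_classes)
n_classes = preds.shape[-1]

# Simple averaging: unweighted mean of the class probabilities.
simple_avg = preds.mean(axis=0)

# Weighted averaging: illustrative weights (an assumption) that sum to 1.
weights = np.array([0.5, 0.3, 0.2])
weighted_avg = np.tensordot(weights, preds, axes=1)

# Majority voting: each model votes for its argmax class, votes are counted per sample.
votes = preds.argmax(axis=-1)                     # shape: (n_models, n_samples)
vote_counts = np.eye(n_classes)[votes].sum(axis=0)  # shape: (n_samples, n_classes)
majority = vote_counts.argmax(axis=-1)

# Stacking: the per-model probabilities are concatenated into one feature
# vector per sample, which a meta-learner could then be trained on.
stack_features = np.concatenate(list(preds), axis=-1)  # shape: (n_samples, 6)

print(simple_avg.argmax(axis=-1), weighted_avg.argmax(axis=-1), majority)
```

With an even number of voters a tie is possible; `argmax` then silently picks the lowest class index, which is one reason to prefer an odd number of models for voting.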

In [None]:
#load libraries
from keras.models import Sequential, Model, Input, load_model
from keras.layers import Conv2D, Dense, MaxPooling2D, SeparableConv2D, BatchNormalization, ZeroPadding2D, GlobalAveragePooling2D, Flatten, Average, Dropout
import time
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn.metrics import matthews_corrcoef
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_squared_log_error
from sklearn.metrics import classification_report,confusion_matrix, roc_curve, auc, accuracy_score
import matplotlib.pyplot as plt
import scikitplot as skplt
from itertools import cycle
from sklearn.utils import class_weight
from keras.models import load_model
import numpy as np
import itertools
from keras.utils import plot_model
from keras.callbacks import ModelCheckpoint, TensorBoard, ReduceLROnPlateau
from keras.applications.vgg16 import VGG16
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.inception_v3 import InceptionV3
from keras.applications.xception import Xception
from keras.applications.densenet import DenseNet121
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import Adam, SGD
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score
from keras import applications
%matplotlib inline

In [None]:
#important note: the batch size matters here.
#check whether the numbers of train, validation and test samples are
#evenly divisible by the batch size. if not, add one extra step (+1)
#when fitting, evaluating and testing. do not use workers=1.
#reset the generators every time before using them; otherwise you will
#get weird results

img_width, img_height = 224,224
train_data_dir = 'cv5/train'
test_data_dir = 'cv5/test'
epochs = 64
batch_size = 8 #check that it evenly divides the train, validation and test sample counts; otherwise follow the procedure described above
num_classes= 2

# Since the models work with the data of the same shape, we 
#define a single input layer that will be used by every model.

input_shape = (img_width, img_height, 3)
model_input = Input(shape=input_shape)
print(model_input) 
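
The divisibility note above amounts to rounding the number of batches up. A minimal sketch, assuming a hypothetical helper `steps_for` and made-up sample counts, is:

```python
import math

def steps_for(num_samples, batch_size):
    """Number of generator batches needed to see every sample exactly once."""
    return math.ceil(num_samples / batch_size)

# Hypothetical sample counts, for illustration only.
print(steps_for(800, 8))  # evenly divisible: 100
print(steps_for(803, 8))  # remainder of 3 samples: 101
```

Unlike `num_samples // batch_size + 1`, rounding up does not request an extra batch when the sample count is already a multiple of the batch size.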

In [None]:
# custom definition for confusion matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False, #if True, each row is scaled to sum to 1
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    #normalize before drawing so that the image and the printed values agree
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    #two decimals for normalized values, integers for raw counts
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
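
To make the `normalize` option concrete, the sketch below applies the same row normalization to a made-up 2x2 matrix (the counts are an assumption, not output from these models). Each row is divided by its sum, so every entry becomes the fraction of that true class assigned to each predicted class.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[40, 10],
               [ 5, 45]])

# Divide each row by its total so that every row sums to 1.
cm_norm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print(cm_norm)
```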

In [None]:
#declare data generators

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=2,
      width_shift_range=0.1,
      height_shift_range=0.1,
      shear_range=0.1,
      zoom_range=0.5,
      horizontal_flip=False,
      fill_mode='nearest')

# Note that the validation data should not be augmented!
val_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical')

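#note: the validation generator reads from test_data_dir, so the validation
#and test generators iterate over the same images
validation_generator = val_datagen.flow_from_directory(
        test_data_dir,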
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',shuffle=False)

test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',shuffle=False)

#identify the number of samples
nb_train_samples = len(train_generator.filenames)
nb_validation_samples = len(validation_generator.filenames)
nb_test_samples = len(test_generator.filenames)

#check the class indices
print(train_generator.class_indices)
print(validation_generator.class_indices)
print(test_generator.class_indices)

#true labels
Y_test=test_generator.classes
print(Y_test.shape)

#convert test labels to categorical
Y_test1=to_categorical(Y_test, num_classes=num_classes, dtype='float32')
print(Y_test1.shape)

In [None]:
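#%% assign class weights to balance model training and penalize over-represented classes

#keyword arguments are used because recent scikit-learn releases require them
class_weights = class_weight.compute_class_weight(
               class_weight='balanced',
               classes=np.unique(train_generator.classes),
               y=train_generator.classes)
#Keras documents class_weight as a dict that maps class indices to weights
class_weights = dict(enumerate(class_weights))
print(class_weights)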
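
For reference, the `'balanced'` mode of scikit-learn weights each class by `n_samples / (n_classes * count)`. The sketch below reproduces that formula with NumPy on a made-up label vector (the 30/10 split is an assumption for illustration).

```python
import numpy as np

# Hypothetical labels: 30 samples of class 0 and 10 samples of class 1.
labels = np.array([0] * 30 + [1] * 10)

counts = np.bincount(labels)                             # samples per class
balanced_weights = len(labels) / (len(counts) * counts)  # n_samples / (n_classes * count)
print(balanced_weights)
```

The rarer class receives the larger weight, so its errors contribute more to the loss during training.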

In [None]:
#define the custom model

def custom_cnn(model_input):
    x = BatchNormalization()(model_input)
    x = Conv2D(64, (5, 5), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2), strides=(2,2))(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization()(x)
    x = Conv2D(128, (5, 5), padding='same', activation='relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Dropout(0.25)(x)
    x = BatchNormalization()(x)
    x = Conv2D(256, (5, 5), padding='same', activation='relu')(x)
    x = GlobalAveragePooling2D()(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=model_input, outputs=x, name='custom_cnn')
    return model

#instantiate the model
custom_model = custom_cnn(model_input)

#display model summary
custom_model.summary()

#plot the model
plot_model(custom_model, to_file='custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the custom model weights pretrained on a large-scale CXR collection

custom_model.load_weights('custom_cnn.45-0.8608.h5')

#print model summary
custom_model.summary()

In [None]:
#%% VGG16 model 

def vgg16_cnn(model_input):
    vgg16_cnn = VGG16(weights='imagenet', include_top=False, input_tensor=model_input)
    vgg16_cnn = Model(inputs=vgg16_cnn.input, outputs=vgg16_cnn.get_layer('block5_conv3').output)
    x = vgg16_cnn.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=vgg16_cnn.input, outputs=predictions, name='vgg16_custom')
    #append 'VGG' to the name of every VGG16 block layer (conv and pool)
    for layer in model.layers:
        if layer.name.startswith('block'):
            layer.name = layer.name + 'VGG'
    return model

#instantiate the model
vgg16_custom_model = vgg16_cnn(model_input)

#display model summary
vgg16_custom_model.summary()

#plot the model
plot_model(vgg16_custom_model, to_file='vgg16_custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the VGG16 model weights pretrained on a large-scale CXR collection

vgg16_custom_model.load_weights('vgg16_custom.13-0.8955.h5')

#print model summary
vgg16_custom_model.summary()

In [None]:
#%% Inception-V3 model 

def inceptionv3_cnn(model_input):
    inceptionv3_cnn = InceptionV3(weights='imagenet', include_top=False, input_tensor=model_input)
    x = inceptionv3_cnn.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inceptionv3_cnn.input, outputs=predictions, name='InceptionV3_custom')
    return model

#instantiate the model
inceptionv3_custom_model = inceptionv3_cnn(model_input)

#display model summary
inceptionv3_custom_model.summary()

#plot model
plot_model(inceptionv3_custom_model, to_file='inceptionv3_custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the CXR trained InceptionV3 model 

inceptionv3_custom_model.load_weights('InceptionV3_custom.17-0.8957.h5')

#print model summary
inceptionv3_custom_model.summary()

In [None]:
#%% InceptionResNetV2 model 

def incepres_cnn(model_input):
    incepres_cnn = InceptionResNetV2(weights='imagenet', include_top=False, input_tensor=model_input)
    x = incepres_cnn.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=incepres_cnn.input, outputs=predictions, name='InceptionResnet_custom')
    return model

#instantiate the model
inceptionresnet_custom_model = incepres_cnn(model_input)

#display model summary
inceptionresnet_custom_model.summary()

#plot model
plot_model(inceptionresnet_custom_model, to_file='inceptionresnet_custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the CXR trained InceptionResNet-V2 model

inceptionresnet_custom_model.load_weights('InceptionResnet_custom.24-0.8960.h5')

#print model summary
inceptionresnet_custom_model.summary()

In [None]:
#%% Xception model 

def xception_cnn(model_input):
    xception_cnn = Xception(weights='imagenet', include_top=False, input_tensor=model_input)
    x = xception_cnn.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=xception_cnn.input, outputs=predictions, name='xception_custom')
    return model

#instantiate the model
xception_custom_model = xception_cnn(model_input)

#display model summary
xception_custom_model.summary()

#plot model
plot_model(xception_custom_model, to_file='xception_custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the CXR trained Xception model

xception_custom_model.load_weights('xception_custom.06-0.8870.h5')

#print model summary
xception_custom_model.summary()

In [None]:
#%% DenseNet121 model

def densenet_cnn(model_input):
    densenet_cnn = DenseNet121(weights='imagenet', include_top=False, input_tensor=model_input)
    x = densenet_cnn.output
    x = GlobalAveragePooling2D()(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=densenet_cnn.input, outputs=predictions, name='densenet121_custom')
    return model

#instantiate the model
densenet_custom_model = densenet_cnn(model_input)

#display model summary
densenet_custom_model.summary()

#plot model
plot_model(densenet_custom_model, to_file='densenet_custom_model.png',show_shapes=True, show_layer_names=False)

In [None]:
#load the CXR trained DenseNet121 model

densenet_custom_model.load_weights('densenet121_custom.18-0.8966.h5')

#print model summary
densenet_custom_model.summary()

In [None]:
#compile and train the pretrained custom model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

filepath = 'weights/' + custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)

callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=epochs, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 
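
The two learning-rate mechanisms configured above can be sketched in plain Python. This assumes the legacy Keras 2.x behavior, where `decay` scales the rate as `lr / (1 + decay * iterations)` per update, and where `ReduceLROnPlateau` multiplies the rate by `factor` and clips it at `min_lr`. The function names are hypothetical.

```python
def decayed_lr(base_lr, decay, iteration):
    """Time-based decay as applied per update by the legacy SGD optimizer."""
    return base_lr / (1.0 + decay * iteration)

def reduce_on_plateau(lr, factor=0.2, min_lr=0.00001):
    """One reduction step, taken when the monitored metric has stalled."""
    return max(lr * factor, min_lr)

# With decay=1e-6 the rate only halves after one million updates.
print(decayed_lr(0.001, 1e-6, 0))
print(decayed_lr(0.001, 1e-6, 1000000))

# Successive plateau reductions starting from 0.001, floored at min_lr.
lr = 0.001
for _ in range(4):
    lr = reduce_on_plateau(lr)
    print(lr)
```

With these settings the decay term is nearly negligible over a few dozen epochs, so most of the schedule comes from the plateau callback.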

In [None]:
# plot and save performance graphs

N = epochs
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("custom_plot_modelc.png")

In [None]:
#%% compile and train the VGG16 model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  
vgg16_custom_model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy']) 
filepath = 'weights/' + vgg16_custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)
tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = vgg16_custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=epochs, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
# plot and save performance graphs

N = epochs
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("vgg16_custom_plot_modelc.png")

In [None]:
#%% compile and train the InceptionV3 model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  
inceptionv3_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 
filepath = 'weights/' + inceptionv3_custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)
tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = inceptionv3_custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=epochs, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
# plot and save performance graphs

N = epochs
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("InceptionV3_custom_plot_modelc.png")

In [None]:
#%% compile and train the InceptionResNet-V2 model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  
inceptionresnet_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 
filepath = 'weights/' + inceptionresnet_custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)
tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = inceptionresnet_custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=32, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
# plot and save performance graphs

N = len(history.history["loss"]) #epochs actually run (32 for this model, not the global epochs value)
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("inceptionresnet_custom_plot_modelc.png")

In [None]:
#%% compile and train the Xception model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  
xception_custom_model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy']) 
filepath = 'weights/' + xception_custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)
tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = xception_custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=32, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
# plot and save performance graphs

N = len(history.history["loss"]) #epochs actually run (32 for this model, not the global epochs value)
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("Xception_custom_plot_modelc.png")

In [None]:
#%% compile and train the DenseNet121 model

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)  
densenet_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy', metrics=['accuracy']) 
filepath = 'weights/' + densenet_custom_model.name + '.{epoch:02d}-{val_acc:.4f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, 
                             save_weights_only=False, save_best_only=True, mode='max', period=1)
reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.2, patience=5,
                              verbose=1, mode='max', min_lr=0.00001)

tensor_board = TensorBoard(log_dir='logs/', histogram_freq=0, batch_size=batch_size)
callbacks_list = [checkpoint, tensor_board, reduce_lr]

#reset generators
train_generator.reset()
validation_generator.reset()

#train the model
history = densenet_custom_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size + 1,
                                  epochs=32, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
# plot and save performance graphs

N = len(history.history["loss"]) #epochs actually run (32 for this model, not the global epochs value)
plt.style.use("ggplot")
plt.figure(figsize=(20,10), dpi=300)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("DenseNet_custom_plot_modelc.png")

In [None]:
#Evaluate the models by loading the best weights
#custom model

custom_model.load_weights('weights/custom_cnn.02-0.8561.h5')
custom_model.summary()

#compile the custom model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy
scorecustom = custom_model.evaluate_generator(validation_generator, nb_validation_samples // batch_size + 1, verbose = 1)
print("Validation Accuracy = ",scorecustom[1])

In [None]:
#measure performance on test data

#first reset the test generator; otherwise it gives weird results
test_generator.reset()

#generate predictions on the test data
custom_y_pred = custom_model.predict_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)

#print prediction shapes
print(custom_y_pred.shape)
print(Y_test.shape)

#measure performance metrics

accuracy = accuracy_score(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1))
print('The test accuracy of the Custom model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the Custom model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the Custom model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the Custom model is: ', custom_MCC)
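
The metrics above can be cross-checked by hand from the confusion counts. The sketch below does this on made-up label vectors (an assumption for illustration, not model output); it also shows that for binary 0/1 labels the mean squared error of the predicted labels is simply the error rate.

```python
import numpy as np

# Hypothetical true and predicted labels.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])

# Confusion counts, treating class 1 as the positive class.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))
tn = int(np.sum((y_true == 0) & (y_pred == 0)))
fp = int(np.sum((y_true == 0) & (y_pred == 1)))
fn = int(np.sum((y_true == 1) & (y_pred == 0)))

accuracy = (tp + tn) / len(y_true)
mse = float(np.mean((y_true - y_pred) ** 2))  # equals 1 - accuracy for 0/1 labels
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(accuracy, mse, mcc)
```

Because the MSE of hard labels carries no information beyond accuracy, the MCC is the more informative of these numbers when the classes are imbalanced.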

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

# Plot non-normalized confusion matrix using the custom function defined above
plt.figure(figsize=(10,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()
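
The area under a ROC curve can be read as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it that way for one score column on made-up data; `pairwise_auc` is a hypothetical helper, not a scikit-plot function.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Hypothetical labels and positive-class scores.
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.6, 0.4, 0.8, 0.9])

auc_value = pairwise_auc(y_true, scores)
print(auc_value)  # 8 of the 9 pairs are ranked correctly
```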

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()
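
The gray iso-F1 curves come from solving F1 = 2PR / (P + R) for precision, which gives P = F1 * R / (2R - F1). The sketch below checks that relation numerically for one F1 value; the variable names are chosen for illustration.

```python
import numpy as np

f_score = 0.6
recall = np.linspace(0.01, 1)  # 50 points, as in the plotting loop
precision = f_score * recall / (2 * recall - f_score)

# Keep only the part of the curve that lies inside the unit square.
valid = (precision >= 0) & (precision <= 1)
p, r = precision[valid], recall[valid]

# Every remaining point has the same F1 score.
f1 = 2 * p * r / (p + r)
print(f1.min(), f1.max())
```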

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()
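
The KS statistic is the largest gap between the cumulative score distributions of the two classes. A minimal NumPy sketch on made-up data follows; `ks_statistic` is a hypothetical helper, not the scikit-plot implementation.

```python
import numpy as np

def ks_statistic(y_true, scores):
    """Maximum distance between the empirical score CDFs of the two classes."""
    thresholds = np.unique(scores)
    pos = np.sort(scores[y_true == 1])
    neg = np.sort(scores[y_true == 0])
    # Fraction of each class with a score at or below each threshold.
    cdf_pos = np.searchsorted(pos, thresholds, side='right') / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side='right') / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

# Hypothetical labels and positive-class scores.
y_true = np.array([0, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.2, 0.6, 0.4, 0.8, 0.9])

ks_value = ks_statistic(y_true, scores)
print(ks_value)
```

A value of 1 means the two classes are perfectly separated by some threshold; a value near 0 means their score distributions are almost identical.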

In [None]:
#VGG-16

vgg16_custom_model.load_weights('weights/vgg16_custom.16-0.8731.h5')
vgg16_custom_model.summary()

#compile the model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
vgg16_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy; make sure the sample count is evenly divisible by the batch size, or add +1 to the steps
scorevgg16 = vgg16_custom_model.evaluate_generator(validation_generator, nb_validation_samples // batch_size + 1, verbose=1)
print("Validation Accuracy = ",scorevgg16[1])

#measure performance on test data; first reset the test generator, otherwise it gives weird results
test_generator.reset()

#generate predictions on the test data
vgg16_custom_y_pred = vgg16_custom_model.predict_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)

#print prediction shapes
print(vgg16_custom_y_pred.shape)
print(Y_test.shape)
print(Y_test1.shape)

In [None]:
#measure performance metrics

accuracy = accuracy_score(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1))
print('The test accuracy of the VGG16 model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the VGG16 model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the VGG16 model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the VGG16 model is: ', custom_MCC)

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),vgg16_custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

# Plot non-normalized confusion matrix using scikit learn
plt.figure(figsize=(10,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,vgg16_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

from itertools import cycle
colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        vgg16_custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], vgg16_custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   vgg16_custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, vgg16_custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()
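
The gray iso-F1 curves above come from solving F1 = 2PR / (P + R) for precision P at a fixed recall R, which gives P = F1 * R / (2R - F1), the expression used for `y` in the loop. A minimal sketch that checks this relation on one point; the helper names `iso_f1_precision` and `f1` are hypothetical and not part of the notebook:

```python
def iso_f1_precision(f_score, recall):
    """Precision that yields the given F1 at the given recall.

    Only meaningful for recall > f_score / 2; below that the formula
    goes negative, which is why the plot keeps only points with y >= 0.
    """
    return f_score * recall / (2 * recall - f_score)


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# illustrative values only
f_score, recall = 0.6, 0.9
precision = iso_f1_precision(f_score, recall)

# plugging the point back into the F1 definition recovers the target F1
print(round(precision, 4), round(f1(precision, recall), 4))
```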

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,vgg16_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()
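
The KS statistic is the largest gap between the cumulative score distributions of the two classes. The following is a minimal NumPy sketch of that definition on toy data, not scikit-plot's implementation; the function name `ks_statistic` and the scores are hypothetical:

```python
import numpy as np


def ks_statistic(y_true, scores):
    """Largest gap between the empirical CDFs of positive and negative scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = np.sort(scores[y_true == 1])
    neg = np.sort(scores[y_true == 0])
    thresholds = np.unique(scores)  # sorted unique score values
    # fraction of each class with score <= threshold
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))


# toy labels and class-1 scores, for illustration only
toy_labels = [0, 0, 0, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.6, 0.4, 0.8, 0.9]
ks = ks_statistic(toy_labels, toy_scores)
print(round(ks, 4))
```

A value near 1 means the scores separate the two classes almost completely; a value near 0 means the two score distributions overlap.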

In [None]:
#Evaluate the Xception model by loading the best weights

xception_custom_model.load_weights('weights/xception_custom.09-0.8507.h5')
xception_custom_model.summary()

#compile the model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
xception_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy; the +1 covers the final partial batch when the sample count is not a multiple of the batch size
scorexception = xception_custom_model.evaluate_generator(validation_generator, nb_validation_samples // batch_size + 1, verbose=1)
print("Validation Accuracy = ",scorexception[1])

#measure performance on test data; reset the test generator first, otherwise the predictions can be misaligned with the labels
test_generator.reset()

#generate predictions on the test data
xception_custom_y_pred = xception_custom_model.predict_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)
print(xception_custom_y_pred.shape)
print(Y_test.shape)
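
The step count passed to the generator calls above is `samples // batch_size + 1`. That matches the number of batches whenever the sample count is not a multiple of the batch size, but it requests one extra batch when it is. Since Keras directory iterators loop over the data, that extra step may return more rows than there are samples. A minimal sketch of a ceiling-based alternative; the helper name `steps_for` and the sample counts are hypothetical:

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to cover every sample exactly once."""
    return math.ceil(num_samples / batch_size)


# hypothetical sample counts, for illustration only
for n in (134, 128):
    floor_plus_one = n // 32 + 1
    print(n, steps_for(n, 32), floor_plus_one)
```

With 134 samples both expressions agree; with 128 samples the `+ 1` form asks for one batch more than the ceiling form.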

In [None]:
# evaluate the performance metrics

accuracy = accuracy_score(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1))
print('The test accuracy of the Custom model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the Custom model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the Custom model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the Custom model is: ', custom_MCC)
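
For a two-class problem, the Matthews correlation coefficient reported above can be written directly in terms of the confusion-matrix counts: MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal pure-Python sketch on toy labels. The function name `mcc_binary` is hypothetical, and returning 0.0 for a zero denominator is a convention chosen for this sketch:

```python
import math


def mcc_binary(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # a zero denominator means one row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


# toy labels, for illustration only
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_mcc = mcc_binary(toy_true, toy_pred)
print(toy_mcc)
```

MCC ranges from -1 to 1 and, unlike accuracy, uses all four confusion-matrix cells, which helps when the two classes are imbalanced.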

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),xception_custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,xception_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        xception_custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], xception_custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   xception_custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, xception_custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,xception_custom_y_pred,figsize=(20,10),
                                title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#Evaluate the DenseNet model by loading the best weights

densenet_custom_model.load_weights('weights/densenet121_custom.15-0.8657.h5')
densenet_custom_model.summary()

#compile the model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
densenet_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy; the +1 covers the final partial batch when the sample count is not a multiple of the batch size
scoredensenet = densenet_custom_model.evaluate_generator(validation_generator, nb_validation_samples // batch_size + 1, verbose=1)
print("Validation Accuracy = ",scoredensenet[1])

#measure performance on test data; reset the test generator first, otherwise the predictions can be misaligned with the labels
test_generator.reset()

#generate predictions on the test data
densenet_custom_y_pred = densenet_custom_model.predict_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)
print(densenet_custom_y_pred.shape)
print(Y_test.shape)

In [None]:
#Evaluate the performance metrics of the DenseNet model

accuracy = accuracy_score(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1))
print('The test accuracy of the Custom model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the Custom model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the Custom model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the Custom model is: ', custom_MCC)

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),densenet_custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,densenet_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        densenet_custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], densenet_custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   densenet_custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, densenet_custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,densenet_custom_y_pred,figsize=(20,10),
                                title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#Evaluate the Inception-V3 model by loading the best weights

inceptionv3_custom_model.load_weights('weights/InceptionV3_custom.13-0.8731.h5')
inceptionv3_custom_model.summary()

#compile the model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
inceptionv3_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy; the +1 covers the final partial batch when the sample count is not a multiple of the batch size
scoreinceptionv3 = inceptionv3_custom_model.evaluate_generator(validation_generator, 
                                                               nb_validation_samples // batch_size + 1, verbose=1)
print("Validation Accuracy = ",scoreinceptionv3[1])

#measure performance on test data; reset the test generator first, otherwise the predictions can be misaligned with the labels
test_generator.reset()

#generate predictions on the test data
inceptionv3_custom_y_pred = inceptionv3_custom_model.predict_generator(test_generator, 
                                                                       nb_test_samples//batch_size + 1, verbose=1)
print(inceptionv3_custom_y_pred.shape)
print(Y_test.shape)

In [None]:
#evaluate the performance metrics of the Inception-V3 model

accuracy = accuracy_score(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1))
print('The test accuracy of the Custom model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the Custom model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the Custom model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the Custom model is: ', custom_MCC)

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),inceptionv3_custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,inceptionv3_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        inceptionv3_custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], inceptionv3_custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   inceptionv3_custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, inceptionv3_custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,inceptionv3_custom_y_pred,figsize=(20,10),
                                title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#Evaluate the InceptionResnet model by loading the best weights

inceptionresnet_custom_model.load_weights('weights/InceptionResnet_custom.10-0.8582.h5')
inceptionresnet_custom_model.summary()

#compile the model
sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
inceptionresnet_custom_model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 

#reset validation generator
validation_generator.reset()

#calculate validation accuracy; the +1 covers the final partial batch when the sample count is not a multiple of the batch size
scoreinceptionresnet = inceptionresnet_custom_model.evaluate_generator(validation_generator,
                                                                       nb_validation_samples // batch_size + 1, verbose=1)
print("Validation Accuracy = ",scoreinceptionresnet[1])

#measure performance on test data; reset the test generator first, otherwise the predictions can be misaligned with the labels
test_generator.reset()

#generate predictions on the test data
inceptionresnet_custom_y_pred = inceptionresnet_custom_model.predict_generator(test_generator,
                                                                               nb_test_samples//batch_size + 1, verbose=1)
print(inceptionresnet_custom_y_pred.shape)
print(Y_test.shape)

In [None]:
#evaluate the performance metrics of the InceptionResnet model

accuracy = accuracy_score(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1))
print('The test accuracy of the Custom model is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the Custom model is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1))  
print('The Mean Squared Log Error of the Custom model is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the Custom model is: ', custom_MCC)

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=300)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

# Plot normalized confusion matrix using scikit plot
skplt.metrics.plot_confusion_matrix(Y_test1.argmax(axis=-1),inceptionresnet_custom_y_pred.argmax(axis=-1),
                                    normalize=True, x_tick_rotation=45, figsize=(20,10),
                                    title_fontsize='large', text_fontsize='medium')
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,inceptionresnet_custom_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        inceptionresnet_custom_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], inceptionresnet_custom_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   inceptionresnet_custom_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, inceptionresnet_custom_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,inceptionresnet_custom_y_pred,figsize=(20,10),
                                title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

After performing these evaluations, the Inception-V3, InceptionResNet-V2, and DenseNet-121 models from Model-C were found to be the top-3 performing models. They are used to create the ensemble models. Let's begin with majority voting.

In [None]:
#bind working names to the predictions; the argmax calls below rebind these names, so the original probability arrays stay untouched

inceptionv3_custom_y_pred1 = inceptionv3_custom_y_pred
densenet_custom_y_pred1 = densenet_custom_y_pred
inceptionresnet_custom_y_pred1 = inceptionresnet_custom_y_pred
print("The shape of inceptionv3 custom model prediction inceptionv3_custom_y_pred is = ", inceptionv3_custom_y_pred1.shape)
print("The shape of densenet custom model prediction densenet_custom_y_pred is  = ", densenet_custom_y_pred1.shape)
print("The shape of Inception ResNet custom model prediction inceptionresnet_custom_y_pred is  = ", inceptionresnet_custom_y_pred.shape)

#%% Max voting or majority voting
inceptionv3_custom_y_pred1 = inceptionv3_custom_y_pred1.argmax(axis=-1)
print(inceptionv3_custom_y_pred1)
densenet_custom_y_pred1 = densenet_custom_y_pred1.argmax(axis=-1)
print(densenet_custom_y_pred1)
inceptionresnet_custom_y_pred1 = inceptionresnet_custom_y_pred1.argmax(axis=-1)
print(inceptionresnet_custom_y_pred1)

In [None]:
#max voting begins: 
#mode comes from scipy.stats; imported here in case it was not imported earlier
from scipy.stats import mode

#stack the per-model label vectors as columns: shape (n_samples, 3)
results = np.concatenate((inceptionresnet_custom_y_pred1.reshape(-1,1),
                          inceptionv3_custom_y_pred1.reshape(-1,1),
                          densenet_custom_y_pred1.reshape(-1,1)),
                          axis=1)

#take the mode across each row in one vectorized call (no need for a loop)
max_voting = mode(results, axis=1)

#the result holds two things: the mode value for each row
#and the count of that mode within the row.
#keep the mode values (first element of max_voting); np.ravel gives a 1-D
#label vector whether or not this SciPy version keeps the reduced axis
max_voting_pred = np.ravel(max_voting[0])

#calculate majority voting accuracy
ensemble_model_max_voting_accuracy = accuracy_score(Y_test,max_voting_pred)
print("The max voting accuracy of the ensemble model is  = ", ensemble_model_max_voting_accuracy)
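
Majority voting as used above picks, for each test sample, the class label that most of the models predicted. The following is a minimal NumPy-only sketch of the same idea on toy label vectors. The function name `majority_vote` and the toy arrays are hypothetical, and ties (possible with an even number of models) are broken in favor of the lowest class index:

```python
import numpy as np


def majority_vote(label_columns, num_classes):
    """Row-wise most frequent class label across several models."""
    votes = np.stack(label_columns, axis=1)  # shape (n_samples, n_models)
    # count the votes each class receives, one row per sample
    counts = np.apply_along_axis(np.bincount, 1, votes, minlength=num_classes)
    return counts.argmax(axis=1)  # ties resolve to the lowest class index


# toy predicted labels from three models, for illustration only
model_a = np.array([0, 1, 1, 0])
model_b = np.array([0, 1, 0, 1])
model_c = np.array([1, 1, 0, 0])

voted = majority_vote([model_a, model_b, model_c], num_classes=2)
print(voted)
```

With three models and two classes, a tie cannot occur, so the tie-breaking rule only matters for other configurations.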

In [None]:
#plot confusion matrix

target_names = ['class 0(abnormal)', 'class 1(normal)'] #modify according to tasks

#print classification report
print(classification_report(Y_test,max_voting_pred,target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test,max_voting_pred)
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=100)
plot_confusion_matrix(cnf_matrix, classes=target_names,
                      title='Confusion matrix for Max Voting ensemble without normalization')

plt.show()

In [None]:
#save the predictions
np.savetxt('max_voting_y_pred.csv',max_voting_pred,fmt='%i',delimiter = ",")

#evaluate error
ensemble_model_maxvoting_mean_squared_error = mean_squared_error(Y_test,max_voting_pred)  
ensemble_model_maxvoting_mean_squared_log_error = mean_squared_log_error(Y_test,max_voting_pred)  
print("The max voting mean squared error of the ensemble model is  = ", ensemble_model_maxvoting_mean_squared_error)
print("The max voting mean squared log error of the ensemble model is  = ", ensemble_model_maxvoting_mean_squared_log_error)

In [None]:
#perform simple averaging of the class probabilities predicted by the individual models

average_pred=(inceptionresnet_custom_y_pred + inceptionv3_custom_y_pred +
              densenet_custom_y_pred)/3

#compute simple averaging accuracy
ensemble_model_averaging_accuracy = accuracy_score(Y_test,average_pred.argmax(axis=-1))
print("The averaging accuracy of the ensemble model is  = ", ensemble_model_averaging_accuracy)
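
Simple averaging works on the predicted probabilities rather than on the hard labels, so a single confident model can outweigh two marginal ones. A minimal sketch on toy softmax outputs that shows the averaged result differing from the majority vote; all arrays here are hypothetical:

```python
import numpy as np

# hypothetical softmax outputs of three models for two samples
p1 = np.array([[0.55, 0.45], [0.10, 0.90]])
p2 = np.array([[0.60, 0.40], [0.20, 0.80]])
p3 = np.array([[0.05, 0.95], [0.30, 0.70]])

# simple averaging: mean of the probabilities, then argmax
soft = np.mean([p1, p2, p3], axis=0)
soft_labels = soft.argmax(axis=-1)

# majority voting on the same outputs (binary case: class 1 wins with > half the votes)
hard_votes = np.stack([p.argmax(axis=-1) for p in (p1, p2, p3)], axis=1)
hard_labels = (hard_votes.sum(axis=1) > hard_votes.shape[1] / 2).astype(int)

print(soft_labels, hard_labels)
```

For the first sample, two models lean slightly toward class 0 while the third is strongly in favor of class 1, so averaging selects class 1 and voting selects class 0.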

In [None]:
#plot confusion matrix

target_names = ['class 0(abnormal)', 'class 1(normal)'] #modify according to tasks

#print classification report
print(classification_report(Y_test,average_pred.argmax(axis=-1),target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test,average_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=100)
plot_confusion_matrix(cnf_matrix, classes=target_names,
                      title='Confusion matrix for Average Ensemble without normalization')

plt.show()
#save the predictions
np.savetxt('averagring_y_pred.csv',average_pred.argmax(axis=-1),fmt='%i',delimiter = ",")

In [None]:
#plot roc curves

skplt.metrics.plot_roc(Y_test,average_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        average_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], average_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   average_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, average_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()


In [None]:
# evaluate simple averaging error

ensemble_model_averaging_mean_squared_error = mean_squared_error(Y_test,average_pred.argmax(axis=-1))  
ensemble_model_averaging_mean_squared_log_error = mean_squared_log_error(Y_test,average_pred.argmax(axis=-1))  
print("The averaging mean squared error of the ensemble model is  = ", ensemble_model_averaging_mean_squared_error)
print("The averaging mean squared log error of the ensemble model is  = ", ensemble_model_averaging_mean_squared_log_error)

Weighted averaging: this extends simple averaging by assigning each model a weight that reflects its importance to the final prediction, based on the model's performance. Because DenseNet-121 and InceptionResNet-V2 performed about equally well, they receive equal weights (0.25 each), and Inception-V3 receives the higher weight (0.5).

In [None]:
#weighted averaging
weighted_average_pred=(inceptionresnet_custom_y_pred * 0.25 +
                       inceptionv3_custom_y_pred * 0.5 +
                       densenet_custom_y_pred * 0.25)

#calculate weighted averaging accuracy
ensemble_model_weighted_averaging_accuracy = accuracy_score(Y_test,weighted_average_pred.argmax(axis=-1))
print("The weighted averaging accuracy of the ensemble model is  = ", ensemble_model_weighted_averaging_accuracy)
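
The weights above sum to 1, so each row of the weighted average is still a probability distribution. A minimal sketch that normalizes arbitrary non-negative weights before combining; the function name `weighted_average` and the toy arrays are hypothetical:

```python
import numpy as np


def weighted_average(preds, weights):
    """Weighted mean of per-model probability arrays of identical shape."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    stacked = np.stack(preds, axis=0)  # shape (n_models, n_samples, n_classes)
    return np.average(stacked, axis=0, weights=weights)


# hypothetical softmax outputs of three models for two samples
q1 = np.array([[0.55, 0.45], [0.10, 0.90]])
q2 = np.array([[0.60, 0.40], [0.20, 0.80]])
q3 = np.array([[0.05, 0.95], [0.30, 0.70]])

# same 0.25 / 0.5 / 0.25 split as in the cell above
combined = weighted_average([q1, q2, q3], [0.25, 0.5, 0.25])
print(combined)
print(combined.argmax(axis=-1))
```

Because of the normalization step, passing the weights as `[1, 2, 1]` gives the same result as `[0.25, 0.5, 0.25]`.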

In [None]:
#plot confusion matrix

target_names = ['class 0(abnormal)', 'class 1(normal)'] #modify according to tasks

#print classification report
print(classification_report(Y_test,weighted_average_pred.argmax(axis=-1),target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test,weighted_average_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,10), dpi=100)
plot_confusion_matrix(cnf_matrix, classes=target_names,
                      title='Confusion matrix for Weighted Average Ensemble without normalization')

plt.show()

#save the predictions
np.savetxt('weighted_averaging_y_pred.csv',weighted_average_pred.argmax(axis=-1),fmt='%i',delimiter = ",")

In [None]:
#plot roc curves

skplt.metrics.plot_roc(Y_test,weighted_average_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        weighted_average_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], weighted_average_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   weighted_average_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, weighted_average_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% evaluate error
ensemble_model_weighted_average_mean_squared_error = mean_squared_error(Y_test,weighted_average_pred.argmax(axis=-1))  
ensemble_model_weighted_average_mean_squared_log_error = mean_squared_log_error(Y_test,weighted_average_pred.argmax(axis=-1))  
print("The weighted averaging mean squared error of the ensemble model is  = ", ensemble_model_weighted_average_mean_squared_error)
print("The weighted averaging mean squared log error of the ensemble model is  = ", ensemble_model_weighted_average_mean_squared_log_error)

Stacking: we attempted a stacking ensemble by training a meta-learner that learns how best to combine the predictions from the sub-models and ideally performs better than any single sub-model. The first step is to load the saved models. Here the saved weights are loaded into the previously defined architectures with the Keras load_weights() method, and the models are collected in a Python list.

In [None]:
# load models from file
n_models = 3 #we have three models

def load_all_models(n_models):
    all_models = list()
    inceptionresnet_custom_model.load_weights('weights/InceptionResnet_custom.09-0.8955.h5')
    all_models.append(inceptionresnet_custom_model)
    densenet_custom_model.load_weights('weights/densenet121_custom.06-0.9179.h5')
    all_models.append(densenet_custom_model)
    inceptionv3_custom_model.load_weights('weights/InceptionV3_custom.11-0.9328.h5')
    all_models.append(inceptionv3_custom_model)
    return all_models

# We can call this function to load our three saved models from the “weights/” sub-directory.
# load all models

n_members = 3
members = load_all_models(n_members)
print('Loaded %d models' % len(members))

It would be useful to know how well the single models perform on the validation dataset and test dataset, as we would expect a stacking model to perform better. We can easily evaluate each single model on both datasets and establish a baseline of performance.

In [None]:
# evaluate standalone models on the validation dataset
for model in members:
    sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy']) 
    _, acc = model.evaluate_generator(validation_generator,nb_validation_samples // batch_size + 1, verbose=1)
    print('Model Accuracy: %.3f' % acc)

# evaluate standalone models on test dataset
for model in members:
    sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True) 
    model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
    _, acc = model.evaluate_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)
    print('Model Accuracy: %.3f' % acc)
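
The calls above pass `n // batch_size + 1` as the number of steps. A minimal sketch of the arithmetic (the helper name `steps_for` and the sample counts are illustrative, not taken from the notebook) suggests that this matches the ceiling of `n / batch_size` except when `n` is an exact multiple of the batch size, in which case it requests one extra batch.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of batches needed to see every sample exactly once."""
    return math.ceil(n_samples / batch_size)

# Compare the ceiling with the `// + 1` rule for a divisible and a
# non-divisible sample count (hypothetical numbers, batch size 16).
for n in (64, 70):
    print(n, steps_for(n, 16), n // 16 + 1)
```

When the extra batch is requested, the generator would typically start over to supply it, so some samples may be counted twice; using the ceiling avoids this without having to check divisibility by hand.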

Integrated Stacking Model: It may be desirable to use a neural network as a meta-learner. Specifically, the sub-networks can be embedded in a larger multi-headed neural network that then learns how to best combine the predictions from each input sub-model. It allows the stacking ensemble to be treated as a single large model. The benefit of this approach is that the outputs of the sub-models are provided directly to the meta-learner. Further, it is also possible to update the weights of the sub-models in conjunction with the meta-learner model, if this is desirable. This can be achieved using the Keras functional interface for developing models.

After the models are loaded as a list, a larger stacking ensemble model can be defined where each of the loaded models is used as a separate input head to the model. All of the layers in each of the loaded models are marked as not trainable so the weights cannot be updated when the new larger model is being trained. Keras also requires that each layer has a unique name, therefore the names of each layer in each of the loaded models have to be updated to indicate to which ensemble member they belong. Once the sub-models have been prepared, we can define the stacking ensemble model. In the general case, the input layer of each sub-model is used as a separate input head to this new model, which means that k copies of any input data have to be provided, where k is the number of input models, in this case, 3. The code below instead uses a single input head, which relies on the sub-models sharing one input tensor. The outputs of each of the models can then be merged. In this case, we will use a simple concatenation merge, where a single 6-element vector is created from the two class probabilities predicted by each of the 3 models.

We will then define a hidden layer to interpret this “input” to the meta-learner and an output layer that will make its own probabilistic prediction. A plot of the network graph is created when this function is called to give an idea of how the ensemble model fits together.

In [None]:
from keras.layers import concatenate  # merge function used below

def define_stacked_model(members):
    # update all layers in all models to not be trainable
    for i in range(len(members)):
        model = members[i]
        for layer in model.layers:
            # make not trainable
            layer.trainable = False
            # rename to avoid 'unique layer name' issue
            layer.name = 'ensemble_' + str(i+1) + '_' + layer.name
    # define the input head: this takes the input of the last model in `members`,
    # which only works if all sub-models were built on the same Input tensor
    ensemble_visible = [model.input]
    # concatenate merge output from each model
    ensemble_outputs = [model.output for model in members]
    merge = concatenate(ensemble_outputs)
    # hidden layer that interprets the 6 concatenated class probabilities
    hidden = Dense(6, activation='relu')(merge)
    drop1 = Dropout(0.1)(hidden)
    # output layer: the meta-learner's own 2-class probabilistic prediction
    output = Dense(2, activation='softmax')(drop1)
    model = Model(inputs=ensemble_visible, outputs=output, name = 'stacking_ensemble')
    # compile
    #sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# define ensemble model
stacked_model = define_stacked_model(members)
stacked_model.summary()

#plot model
plot_model(stacked_model, to_file='stacked_model.png',show_shapes=True, show_layer_names=False)
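
To illustrate what the meta-learner sees, the following NumPy sketch mimics the merge and the two dense layers at inference time, where dropout is inactive. The sub-model outputs and the layer weights are random, untrained placeholders, and all names (`sub_model_probs`, `w_hidden`, `w_out`, and so on) are assumptions made for this illustration rather than objects from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models, n_classes = 5, 3, 2

# Hypothetical sub-model outputs: each row is a 2-class probability vector.
raw = rng.random((n_models, n_samples, n_classes))
sub_model_probs = raw / raw.sum(axis=-1, keepdims=True)

# Concatenation merge: three (n, 2) arrays -> one (n, 6) array.
merged = np.concatenate(list(sub_model_probs), axis=-1)

# Randomly initialised (untrained) weights standing in for the Dense layers.
w_hidden, b_hidden = rng.normal(size=(6, 6)), np.zeros(6)
w_out, b_out = rng.normal(size=(6, 2)), np.zeros(2)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

hidden = np.maximum(0.0, merged @ w_hidden + b_hidden)  # Dense(6, relu)
meta_probs = softmax(hidden @ w_out + b_out)            # Dense(2, softmax)

# Parameters the meta-learner adds on top of the frozen sub-models.
n_meta_params = w_hidden.size + b_hidden.size + w_out.size + b_out.size
print(merged.shape, meta_probs.shape, n_meta_params)
```

With these shapes the meta-learner contributes 6*6 + 6 + 6*2 + 2 = 56 parameters, which should correspond to the trainable parameter count reported by `stacked_model.summary()` if all sub-model layers are frozen.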

Once the model is defined, it can be fit. Because the sub-models are not trainable, their weights will not be updated during training and only the weights of the new hidden and output layers will be updated. The stacking neural network model is fit on the training data for 300 epochs, with the holdout validation dataset used to monitor performance.

In [None]:
#reset generators
train_generator.reset()
validation_generator.reset()

#train the ensemble model
history = stacked_model.fit_generator(train_generator, steps_per_epoch=nb_train_samples // batch_size,
                                  epochs=300, validation_data=validation_generator,
                                  class_weight = class_weights,
                                  #callbacks=callbacks_list, 
                                  validation_steps=nb_validation_samples // batch_size + 1, verbose=1) 

In [None]:
#plot performance of the ensemble model

N = 300
plt.figure(figsize=(20,10), dpi=100)
plt.plot(np.arange(1, N+1), history.history["loss"], 'orange', label="train_loss")
plt.plot(np.arange(1, N+1), history.history["val_loss"], 'red', label="val_loss")
plt.plot(np.arange(1, N+1), history.history["acc"], 'blue', label="train_acc")
plt.plot(np.arange(1, N+1), history.history["val_acc"], 'green', label="val_acc")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower right")
plt.savefig("stacking_ensemble_plot.png")

In [None]:
# Once fit, we can use the new stacked model to make a prediction on new data.
# This is as simple as calling the predict_generator() function on the model. 

#first reset the test generator, otherwise it gives weird results
test_generator.reset()

#evaluate accuracy 
ensemble_y_pred = stacked_model.predict_generator(test_generator, nb_test_samples//batch_size + 1, verbose=1)

#print prediction shapes
print(ensemble_y_pred.shape)

#ground truth label
print(Y_test.shape)

#measure performance metrics of the stacked ensemble

accuracy = accuracy_score(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1))
print('The test accuracy of the stacking ensemble is: ', accuracy)

#evaluate mean squared error
custom_mse = mean_squared_error(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1))
print('The Mean Squared Error of the stacking ensemble is: ', custom_mse)

#evaluate mean squared log error
custom_msle = mean_squared_log_error(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1))
print('The Mean Squared Log Error of the stacking ensemble is: ', custom_msle)

#evaluate matthews correlation coefficient
custom_MCC = matthews_corrcoef(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1))
print('The Matthews correlation coefficient value (MCC) for the stacking ensemble is: ', custom_MCC)
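
Since the error metrics above are computed on hard 0/1 labels (after `argmax`) rather than on probabilities, they are closely tied to accuracy. The sketch below, on made-up labels that are not from the notebook, shows that for binary labels the mean squared error equals the misclassification rate (1 - accuracy) and the mean squared log error equals that rate scaled by (ln 2)^2.

```python
import numpy as np

# Made-up binary ground truth and hard predictions (illustrative only).
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

error_rate = np.mean(y_true != y_pred)          # 1 - accuracy
mse = np.mean((y_true - y_pred) ** 2)           # squared difference is 0 or 1
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(error_rate, mse, msle, error_rate * np.log(2) ** 2)
```

These two numbers therefore carry no information beyond accuracy in this setting; computing them on the predicted probabilities instead would give a measure that also reflects confidence.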

In [None]:
#%% print classification report and plot confusion matrix

target_names = ['class 0(abnormal)','class 1(normal)'] 
print(classification_report(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1),
                            target_names=target_names, digits=4))

# Compute confusion matrix
cnf_matrix = confusion_matrix(Y_test1.argmax(axis=-1),ensemble_y_pred.argmax(axis=-1))
np.set_printoptions(precision=4)

# Plot non-normalized confusion matrix using scikit learn
plt.figure(figsize=(10,10), dpi=100)
plot_confusion_matrix(cnf_matrix, classes=target_names)
plt.show()

In [None]:
#%% compute the ROC-AUC values

skplt.metrics.plot_roc(Y_test,ensemble_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

In [None]:
#compute precision-recall curves

colors = cycle(['red', 'blue', 'green', 'cyan', 'teal'])

plt.figure(figsize=(15,10), dpi=100)
f_scores = np.linspace(0.2, 0.8, num=4)
lines = []
labels = []
for f_score in f_scores:
    x = np.linspace(0.01, 1)
    y = f_score * x / (2 * x - f_score)
    l, = plt.plot(x[y >= 0], y[y >= 0], color='gray', alpha=0.2)
    plt.annotate('f1={0:0.1f}'.format(f_score), xy=(0.9, y[45] + 0.02))
    
# For each class
precision = dict()
recall = dict()
average_precision = dict()
for i in range(num_classes):
    precision[i], recall[i], _ = precision_recall_curve(Y_test1[:, i],
                                                        ensemble_y_pred[:, i])
    average_precision[i] = average_precision_score(Y_test1[:, i], ensemble_y_pred[:, i])

# A "micro-average": quantifying score on all classes jointly
precision["micro"], recall["micro"], _ = precision_recall_curve(Y_test1.ravel(),
   ensemble_y_pred.ravel())
average_precision["micro"] = average_precision_score(Y_test1, ensemble_y_pred,
                                                     average="micro")
print('Average precision score, micro-averaged over all classes: {0:0.4f}'
      .format(average_precision["micro"]))

lines.append(l)
labels.append('iso-f1 curves')
l, = plt.plot(recall["micro"], precision["micro"], color='gold', lw=2)
lines.append(l)
labels.append('micro-average Precision-recall (area = {0:0.4f})'
              ''.format(average_precision["micro"]))

for i, color in zip(range(num_classes), colors):
    l, = plt.plot(recall[i], precision[i], color=color, lw=2)
    lines.append(l)
    labels.append('Precision-recall for class {0} (area = {1:0.4f})'
                  ''.format(i, average_precision[i]))

fig = plt.gcf()
fig.subplots_adjust(bottom=0.05)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Extension of Precision-Recall curve to multi-class')
plt.legend(lines, labels, loc=(0, -.38), prop=dict(size=14))
plt.show()

In [None]:
#%% plot the KS statistic plot

skplt.metrics.plot_ks_statistic(Y_test,ensemble_y_pred,figsize=(20,10),
                       title_fontsize='large', text_fontsize='large')
plt.legend(loc="lower right")
plt.show()

Every model has its own weaknesses. The reasoning behind using an ensemble is that by stacking different models representing different hypotheses about the data, we can find a better hypothesis that is not in the hypothesis space of the models from which the ensemble is built. By using a very basic ensemble, a much lower error rate was achieved than when a single model was used, which demonstrates the effectiveness of ensembling. Of course, there are some practical considerations to keep in mind when using an ensemble for your machine learning task. Since ensembling means stacking multiple models together, it also means that the input data needs to be forward-propagated through each model. This increases the amount of compute that needs to be performed and, consequently, the evaluation (prediction) time. Increased evaluation time is not critical if you use an ensemble in research or in a Kaggle competition. However, it is a critical factor when designing a commercial product. Another consideration is the increased size of the final model, which, again, might be a limiting factor for ensemble use in a commercial product.