<a href="https://colab.research.google.com/github/mtwenzel/image-video-understanding/blob/master/Session_3_Explainable_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Session 3
# Explainable Convolutional Neural Networks

<div id="toc"></div>

## 1. Imports and preparations

In [None]:
#@title Imports  { display-mode: "form" }
#@markdown Import TensorFlow and TensorFlow Probability (TFP). Importing TFP may print some warnings; they can be ignored. The last lines display GPU information.
import tensorflow as tf
#tf.enable_eager_execution()

from tensorflow.keras.layers import Input, InputLayer, Conv2D, MaxPool2D, Flatten, Dense, UpSampling2D, LocallyConnected2D, SpatialDropout2D, BatchNormalization
from tensorflow.keras.models import Model, Sequential
import numpy as np
import tensorflow.keras.backend as K
import tensorflow_probability as tfp
tfkl = tf.keras.layers
tfd = tfp.distributions

from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import matplotlib.pyplot as plt
%matplotlib inline

from tensorflow.python.client import device_lib
# print all devices visible to tensorflow 
print(device_lib.list_local_devices())

## 2. Classification

This time, we use a Keras data generator for data augmentation. It can also split the data into training and validation subsets. The test data should be kept separate from both.

In [None]:
#@title Download and unzip data  { display-mode: "form" }
#@markdown The data will be downloaded from the Fraunhofer OwnCloud. It is about 150 MB, so it may take a few seconds. Afterwards, the data is unpacked to disk.

# importing required modules
from zipfile import ZipFile

# Download from Fraunhofer OwnCloud (skipped if the archive already exists)
!test -e train_val.zip || curl -L "https://owncloud.fraunhofer.de/index.php/s/vLTCGGJJI8hv3bM/download" --output train_val.zip
file_name = "train_val.zip"
with ZipFile(file_name, 'r') as zip_file:
    print('Extracting all the files now...')
    zip_file.extractall()
    print('Done!')
    

In [None]:
#@title Prepare global variables and training data  { display-mode: "form" }
#@markdown Set the image size and batch size according to your GPU memory. The settings below work for a 4 GB Nvidia GTX 1060; Colab allows larger batch sizes.
train_data_dir = "train_val/" #@param {type:"string"}
batch_size = 6 #@param {type:"integer"}
target_width = 512 #@param {type:"integer"}
target_height = 386 #@param {type:"integer"}

target_size = (target_height, target_width)  # Keras expects (height, width)

#@markdown Set data augmentation parameters
shear_range = 0.2 #@param {type:"slider", min:0.0, max:1.0, step:0.05}
zoom_range = 0.2 #@param {type:"slider", min:0.0, max:1.0, step:0.05}
width_shift_range = 0.2 #@param {type:"slider", min:0.0, max:1.0, step:0.05}
height_shift_range = 0.2 #@param {type:"slider", min:0.0, max:1.0, step:0.05}
rotation_range = 10 #@param {type:"slider", min:0, max:90, step:5}
horizontal_flip = True #@param {type:"boolean"}
vertical_flip = True #@param {type:"boolean"}
validation_split = 0.2 #@param {type:"slider", min:0.0, max:1.0, step:0.05}

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=shear_range,
    zoom_range=zoom_range,
    width_shift_range=width_shift_range,
    height_shift_range=height_shift_range,
    rotation_range=rotation_range,
    horizontal_flip=horizontal_flip,
    vertical_flip=vertical_flip,
    validation_split=validation_split) # set validation split

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=target_size,
    batch_size=batch_size,
    class_mode='categorical',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    train_data_dir, # same directory as training data
    target_size=target_size,
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation') # set as validation data

In [None]:
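# Wrap the two outputs of the last Dense layer in a unit-variance Normal distribution.
# By default, DistributionLambda converts its output to a tensor by sampling from this distribution.
d_fn = lambda t: tfd.Normal(loc=t, scale=1)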

model = tf.keras.Sequential(layers=[
    BatchNormalization(input_shape=target_size+(3,)),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu', strides=(2,2)),
    
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(3,3), activation='relu', strides=(2,2)),

    BatchNormalization(),
    Conv2D(filters=96, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=96, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=96, kernel_size=(3,3), activation='relu', strides=(2,2)),

    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu', strides=(2,2)),

    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu'),
    BatchNormalization(),
    Conv2D(filters=128, kernel_size=(3,3), activation='relu', strides=(2,2)),

    Flatten(),
    BatchNormalization(),
    Dense(256, activation='relu'),
    Dense(2),
    tfp.layers.DistributionLambda(d_fn)]
)

model.build()
model.summary()
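As a rough check on the model size, the sketch below traces the two spatial dimensions set above (512 and 386) through the five convolutional blocks, using the usual output-length formula for 'valid' convolutions, $\lfloor (n-k)/s \rfloor + 1$. The helper names are our own, not Keras functions, and BatchNormalization parameters are ignored.

```python
def conv_out(n, kernel=3, stride=1):
    """Length of one spatial axis after a 'valid' convolution."""
    return (n - kernel) // stride + 1


def trace_classifier(n, blocks=5):
    """Follow one axis through the blocks of the classifier above:
    two 3x3 convolutions with stride 1, then one 3x3 convolution with stride 2."""
    for _ in range(blocks):
        n = conv_out(n)
        n = conv_out(n)
        n = conv_out(n, stride=2)
    return n


h, w = trace_classifier(512), trace_classifier(386)
flat_features = h * w * 128                  # the last block has 128 filters
dense256_params = flat_features * 256 + 256  # weights + biases of Dense(256)
print(h, w, flat_features, dense256_params)  # 11 7 9856 2523392
```

So the Flatten layer yields fewer than 10,000 features, and Dense(256) holds about 2.5 million weights; the shapes reported by `model.summary()` should agree.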

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics = ['accuracy'])

In [None]:
history = model.fit(train_generator,
                    steps_per_epoch=train_generator.samples//batch_size,
                    epochs=2,
                    validation_data=validation_generator,
                    validation_steps=validation_generator.samples//batch_size)


### 2.2 Transfer Learning from pretrained networks

The following code sets up the selected pretrained network for the input size chosen above.

Try training it; afterwards, you can convert it into a probabilistic network, similar to the experiment above.

In [None]:
#@title Set up a transfer learning model { display-mode: "form" }
#@markdown Flattening the base model output directly creates too many parameters. If you choose to use a Flatten layer, first reduce the number of output channels (2048 for InceptionV3 and ResNet50, 1024 for DenseNet121).
#@markdown Even in the default setting, the three available models still have a large number of parameters to adjust, ranging from about 2 to about 4 million.
#@markdown The networks are usually trained on images of about $250^2$ pixels (224×224 for DenseNet121 and ResNet50, 299×299 for InceptionV3), but we apply them to the size set above.

apply_flatten = True #@param {type:'boolean'}
print_summary = False #@param {type:'boolean'}
base_model_name = "Inception" #@param ["Inception", "DenseNet121", 'ResNet50']

if base_model_name == 'Inception':
    base_model=tf.keras.applications.InceptionV3(weights='imagenet',include_top=False, input_shape=target_size+(3,)) 
if base_model_name == 'DenseNet121':
    base_model=tf.keras.applications.DenseNet121(weights='imagenet',include_top=False, input_shape=target_size+(3,)) 
if base_model_name == 'ResNet50':
    base_model=tf.keras.applications.ResNet50(weights='imagenet',include_top=False, input_shape=target_size+(3,)) 

x=base_model.output

if apply_flatten:
    x = BatchNormalization()(x)
    x = Conv2D(filters=128, kernel_size=1, strides=2)(x)
    x = Flatten()(x)
else:
    x = tf.keras.layers.GlobalAveragePooling2D()(x) 

x = BatchNormalization()(x)
x = Dense(512,activation='relu')(x)
x = BatchNormalization()(x)
x = Dense(256,activation='relu')(x)
x = BatchNormalization()(x)
preds = Dense(2,activation='softmax')(x) #final layer with softmax activation

model = Model(inputs=base_model.input,outputs=preds)

if print_summary:
    model.summary()
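The form text above warns that flattening the base model output directly creates too many parameters. The sketch below compares the first Dense(512) layer for the three options, assuming a hypothetical 16×12×2048 feature map; the real spatial size depends on the chosen network and input size (check `model.summary()`), and BatchNormalization parameters are ignored.

```python
def dense_layer_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


h, w, c = 16, 12, 2048   # hypothetical base-model output shape (assumption)

# Option 1: Flatten the raw feature map
flatten_raw = dense_layer_params(h * w * c, 512)

# Option 2 (apply_flatten=True above): 1x1 conv to 128 channels with stride 2, then Flatten
conv_params = c * 128 + 128
h2, w2 = (h - 1) // 2 + 1, (w - 1) // 2 + 1   # 'valid' 1x1 conv with stride 2
flatten_reduced = conv_params + dense_layer_params(h2 * w2 * 128, 512)

# Option 3 (apply_flatten=False above): GlobalAveragePooling2D keeps only the channel axis
gap = dense_layer_params(c, 512)

print(f"{flatten_raw:,} {flatten_reduced:,} {gap:,}")  # 201,327,104 3,408,512 1,049,088
```

The channel reduction shrinks the first dense layer by a factor of about 60, while global average pooling is cheaper still but discards all spatial layout.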

In [None]:
# first: train only the top layers (which were randomly initialized),
# i.e. freeze all layers of the pretrained base model
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics = ['accuracy'])

trainable_count = np.sum([np.prod(v.get_shape()) for v in model.trainable_weights])
non_trainable_count = np.sum([np.prod(v.get_shape()) for v in model.non_trainable_weights])

print('Total params: {:,}'.format(trainable_count + non_trainable_count))
print('Trainable params: {:,}'.format(trainable_count))
print('Non-trainable params: {:,}'.format(non_trainable_count))

Training this model takes considerable time, approximately 1 minute per epoch. We will not be able to train it during the course, but feel free to run it during a break or when you continue working on the notebook later.

Running it for 20 epochs at least shows an increase in training accuracy; on this small dataset, however, it usually overfits (the validation loss increases).
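To spot overfitting in numbers rather than plots, one can look up the epoch with the lowest validation loss in `history.history`. The helpers below are our own (not Keras APIs), and treating "the last epoch is worse than the best one" as overfitting is only a heuristic; Keras also offers the `EarlyStopping` callback to stop training automatically.

```python
def best_epoch(history_dict, key="val_loss"):
    """Return (epoch index, value) of the lowest entry in history_dict[key]."""
    values = history_dict[key]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx, values[idx]


def is_overfitting(history_dict, key="val_loss"):
    """Heuristic: True if the best epoch is not the last one, i.e. the loss rose again."""
    idx, _ = best_epoch(history_dict, key)
    return idx < len(history_dict[key]) - 1


# Usage with a Keras History object: best_epoch(history.history)
```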

In [None]:
# train the model on the new data for a few epochs, just to show that it works in principle.
history = model.fit(train_generator,
                    steps_per_epoch=train_generator.samples//batch_size,
                    epochs=2,
                    validation_data=validation_generator,
                    validation_steps=validation_generator.samples//batch_size)

In [None]:
print('training loss')
plt.plot(history.history['loss'])
plt.show()
print('training acc')
plt.plot(history.history['accuracy'])
plt.show()
print('validation loss')
plt.plot(history.history['val_loss'])
plt.show()
print('validation acc')
plt.plot(history.history['val_accuracy'])
plt.show()

## 3. Segmentation: Auto Encoder (AE)-Style

This time, all models are probabilistic in some way: they either use dropout at inference time or use probabilistic layers that learn a distribution and sample from it.

* We define short functions that return a model.
* The first is a simple architecture that collapses and expands an image into the desired mask, similar to an Auto Encoder (AE).
* The second is the famous U-Net.

Further reading (keywords for your favourite search engine):
* TensorFlow Probability (TFP)
* Variational Inference
* Bayes by Dropout (Monte Carlo dropout)
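The first model below uses `SpatialDropout2D` with `training=True`, so dropout stays active at inference time. Spatial dropout drops entire feature maps rather than individual pixels. A NumPy sketch of the idea, assuming channels-last arrays and the inverted-dropout scaling by 1/(1 - rate) used by standard dropout; the function name is ours:

```python
import numpy as np


def spatial_dropout(x, rate, rng):
    """Zero out whole channels of a (batch, height, width, channels) array.

    One Bernoulli draw per (sample, channel) is broadcast over height and width;
    surviving channels are scaled by 1 / (1 - rate) so the expected value is unchanged.
    """
    keep = rng.random((x.shape[0], 1, 1, x.shape[3])) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
toy_x = np.ones((2, 4, 4, 8))
toy_y = spatial_dropout(toy_x, rate=0.3, rng=rng)
# every channel of toy_y is either all zeros or all 1 / 0.7
```

Calling it repeatedly draws a new mask each time; this randomness at prediction time is what the Bayes by Dropout approach exploits.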

In [None]:
%%bash
test -e tmp_slices.npz || curl -L "https://drive.google.com/uc?export=download&id=1R2-H0dhhrj6XNK7Q-MazIWGeFDOf6Zya" --output tmp_slices.npz

In [None]:
#@title Prepare segmentation images and masks {display-mode:"form"}
TRAINING_SLICE_COUNT = 300 #@param {type:"slider", min:100, max:1500, step:100}

loaded = np.load('tmp_slices.npz')

x_train = loaded['x_train'][:TRAINING_SLICE_COUNT]
y_train = loaded['y_train'][:TRAINING_SLICE_COUNT]

x_test = loaded['x_train'][TRAINING_SLICE_COUNT:]
y_test = loaded['y_train'][TRAINING_SLICE_COUNT:]

assert len(x_train) == len(y_train)

example_test_slice = 1800 #@param {type:"integer"}

# merge the lesion labels (values 2..3) into the foreground label 1, giving binary masks
y_train_binary = y_train.clip(0, 1)
y_test_binary = y_test.clip(0, 1)
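Clipping maps every label above 1 to 1, so lesion pixels become part of the foreground mask rather than disappearing. A toy example with a hypothetical label array; a lesion-only mask could instead be obtained by thresholding:

```python
import numpy as np

labels = np.array([[0, 1, 1],
                   [1, 2, 3]])            # hypothetical slice: 0 background, 1 foreground, 2..3 lesion

binary_foreground = labels.clip(0, 1)     # foreground and lesion merged into one class
lesion_only = (labels >= 2).astype(labels.dtype)  # alternative: keep only the lesion labels

print(binary_foreground)  # [[0 1 1]
                          #  [1 1 1]]
print(lesion_only)        # [[0 0 0]
                          #  [0 1 1]]
```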

In [None]:
len(loaded["x_train"])

In [None]:
tf.compat.v1.disable_eager_execution()

In [None]:
def getDropoutBayesModel(_filters=32, filters_add=0, _kernel_size=(3,3), _padding='same', _activation='relu', _kernel_regularizer=None, _final_layer_nonlinearity='sigmoid'):
    # Accept any xy size, but only one channel (gray-value images). As a consequence, size mismatches are harder to debug.
    input_layer = Input(shape=(None,None,1))
    
    x = BatchNormalization()(input_layer)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer, name='firstConvolutionalLayer')(x)
    x = SpatialDropout2D(0.3)(x, training=True)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = SpatialDropout2D(0.3)(x, training=True)
    x = MaxPool2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = SpatialDropout2D(0.3)(x, training=True)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = SpatialDropout2D(0.3)(x, training=True)
    x = MaxPool2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+2*filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+2*filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = UpSampling2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = UpSampling2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    output_layer = Conv2D(1, kernel_size=(1,1), activation=_final_layer_nonlinearity)(x)
    
    model = Model(input_layer, output_layer)
    return model
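The hard-coded `padding = 20` in `pad_image_for_model` (defined further below) follows from this architecture: with 'valid' padding, every 3x3 convolution removes one pixel on each side. Tracing one spatial axis through the layers by hand (a sketch, not a Keras utility) shows that the output is 40 pixels smaller than the input, as long as the original image size is a multiple of 4 so that both MaxPool2D layers see even sizes:

```python
def valid_conv(n):
    """A 3x3 'valid' convolution loses one pixel on each side."""
    return n - 2


def dropout_bayes_output_size(n):
    """Trace one spatial axis through getDropoutBayesModel(_padding='valid')."""
    n = valid_conv(valid_conv(n)) // 2   # two convolutions, MaxPool2D
    n = valid_conv(valid_conv(n)) // 2   # two convolutions, MaxPool2D
    n = valid_conv(valid_conv(n)) * 2    # two convolutions, UpSampling2D
    n = valid_conv(valid_conv(n)) * 2    # two convolutions, UpSampling2D
    return valid_conv(valid_conv(n))     # two convolutions; the final 1x1 conv keeps the size


print(dropout_bayes_output_size(256 + 2 * 20))  # 256
```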

In [None]:
from tensorflow_probability.python.layers import Convolution2DFlipout

def getProbBayesModel(_filters=32, filters_add=0, _kernel_size=(3,3), _padding='same', _activation='relu', _kernel_regularizer=None, _final_layer_nonlinearity='sigmoid'):
    # Accept any xy size, but only one channel (gray-value images). As a consequence, size mismatches are harder to debug.
    input_layer = Input(shape=(None,None,1))
    
    x = BatchNormalization()(input_layer)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer, name='firstConvolutionalLayer')(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = MaxPool2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = MaxPool2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+2*filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+2*filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = UpSampling2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = UpSampling2D()(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    output_layer = Convolution2DFlipout(1, kernel_size=(1,1), activation=_final_layer_nonlinearity)(x)
    
    model = Model(input_layer, output_layer)
    return model


In [None]:
from tensorflow_probability.python.layers import Convolution2DFlipout
from tensorflow.keras.layers import Conv2DTranspose

def getProbBayesModernModel(_filters=32, filters_add=0, _kernel_size=(3,3), _padding='same', _activation='relu', _kernel_regularizer=None, _final_layer_nonlinearity='sigmoid'):
    # Accept any xy size, but only one channel (gray-value images). As a consequence, size mismatches are harder to debug.
    input_layer = Input(shape=(None, None,1))
    
    x = BatchNormalization()(input_layer)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, strides=(2,2), padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, strides=(2,2), padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+2*filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(filters=_filters+2*filters_add, kernel_size=_kernel_size, strides=2, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters+filters_add, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2DTranspose(filters=_filters+filters_add, kernel_size=_kernel_size, strides=2, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)
    x = BatchNormalization()(x)
    x = Conv2D(filters=_filters, kernel_size=_kernel_size, padding=_padding, activation=_activation, kernel_regularizer=_kernel_regularizer)(x)

    x = BatchNormalization()(x)
    output_layer = Convolution2DFlipout(1, kernel_size=(1,1), activation=_final_layer_nonlinearity)(x)
    
    model = Model(input_layer, output_layer)
    return model
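With `_padding='same'`, a stride-2 `Conv2D` produces ceil(n/2) pixels and a stride-2 `Conv2DTranspose` doubles the size, so the output of this model matches its input only when the image size is a multiple of 4. A sketch of that arithmetic (helper names are ours):

```python
import math


def same_conv(n, stride=1):
    """Output length of a 'same'-padded convolution: ceil(n / stride)."""
    return math.ceil(n / stride)


def modern_output_size(n):
    """Trace one spatial axis through getProbBayesModernModel(_padding='same')."""
    n = same_conv(n, stride=2)   # first strided Conv2D
    n = same_conv(n, stride=2)   # second strided Conv2D
    n = n * 2                    # first Conv2DTranspose (strides=2)
    n = n * 2                    # second Conv2DTranspose (strides=2)
    return n                     # the remaining stride-1 'same' layers keep the size


print(modern_output_size(512), modern_output_size(510))  # 512 512
```

For a 510-pixel axis the prediction would be 512 pixels long, so it could no longer be compared pixel by pixel with the reference mask.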


In [None]:
def pad_image_for_model(model, input_image):
    '''Determine the necessary amount of padding
    (difference between input and output size of the model)
    and apply it to an ndarray with one or more images.'''
    
    padding = 0
    if 'firstConvolutionalLayer' in [layer.name for layer in model.layers]:
        if model.get_layer('firstConvolutionalLayer').padding == 'valid':
            padding = 20 # WARNING: Hard-coded for above architecture!

            # determine in which dimension to apply this padding
            ndim_padding = []
            if np.ndim(input_image) > 2:
                # do not pad along batch dimension (if present)
                ndim_padding.append((0, 0))
            ndim_padding.append((padding, padding)) # pad above/below image (y dimension)
            ndim_padding.append((padding, padding)) # pad left/right of image (x dimension)
            if np.ndim(input_image) > 3:
                # do not pad along channel dimension (if present)
                ndim_padding.append((0, 0))

            input_image = np.pad(input_image, ndim_padding,
                                 #'constant', constant_values = 0)
                                 'reflect')

    return input_image, padding

from tensorflow.keras.callbacks import Callback

class VisualHistory(Callback):
    def on_train_begin(self, logs=None):
        # also show initial prediction
        plot_prediction(self.model, example_test_slice)
    
    def on_epoch_end(self, epoch, logs=None):
        # show prediction after every training epoch
        plot_prediction(self.model, example_test_slice)
        
vh_callback = VisualHistory()

def do_prediction(model, input_image, verbose = False):
    # first do padding of full slice
    input_image, padding = pad_image_for_model(model, input_image)
    
    # add batch and channel dimensions (network expects 4D arrays)
    input_array = input_image[np.newaxis,:,:,np.newaxis]
    if verbose:
        print("input shape:", input_array.shape)

    y_predicted = model.predict(input_array)
    if verbose:
        print("output shape:", y_predicted.shape)

    return input_image, input_array, y_predicted, padding

def plot_prediction(model, pred_slice_index):
    # get single slice
    input_image    = x_test[pred_slice_index]
    # could use y_train_binary here for the first half of the notebook, but in the end we want to see the lesion
    reference_mask = y_test[pred_slice_index]

    input_image, input_array, y_predicted, padding = do_prediction(model, input_image)
    
    padded_extent = np.array([0, input_array.shape[2], input_array.shape[1], 0]) - 0.5 - padding

    # display prediction for inspection
    f, ax = plt.subplots(1, 5 if padding else 4, figsize = (11 if padding else 8, 3), sharey = True)
    ax[0].imshow(x_test[pred_slice_index])
    ax[0].set_title('orig')
    if padding:
        ax[1].imshow(input_array[0,:,:,0], extent = padded_extent)
        ax[1].set_title('padded input')
    ax[-2].imshow(y_predicted[0,:,:,0])
    ax[-2].set_title('predicted mask')
    ax[-3].imshow(reference_mask.clip(0,1))
    ax[-3].set_title('reference mask')
    ax[-1].imshow(reference_mask.clip(0,1) - y_predicted[0,:,:,0])
    ax[-1].set_title('(ref - predicted)')
    ax[0].set_ylim(*padded_extent[2:])
    plt.show()

## Training

In [None]:
modelBayesValid = getDropoutBayesModel(_padding='valid')
modelBayesValid.compile(loss='binary_crossentropy', optimizer='adam')
print("Model parameters: {0:,}".format(modelBayesValid.count_params()))

In [None]:
model_ProbBayes = getProbBayesModel(_padding='valid')
model_ProbBayes.compile(loss='binary_crossentropy', optimizer='adam')
print("Model parameters: {0:,}".format(model_ProbBayes.count_params()))

In [None]:
model_ProbBayesModern = getProbBayesModernModel(_padding='same')
model_ProbBayesModern.compile(loss='binary_crossentropy', optimizer='adam')
print("Model parameters: {0:,}".format(model_ProbBayesModern.count_params()))

### Dropout Uncertainty

In [None]:
# This is for _padding = 'valid', with 'reflect' padding of 20 pixels per side
historyBayesValid = modelBayesValid.fit(np.pad(x_train[...,np.newaxis],
                                               [(0,0), (20,20), (20,20), (0,0)], 'reflect'),
                                        y_train_binary[...,np.newaxis],
                                        batch_size=20, epochs=5, callbacks=[vh_callback])

### Probabilistic Conv2D layers

In [None]:
# This is for _padding = 'valid', with 'reflect' padding of 20 pixels per side
historyProbBayesValid = model_ProbBayes.fit(np.pad(x_train[...,np.newaxis],
                                                   [(0,0), (20,20), (20,20), (0,0)], 'reflect'),
                                            y_train_binary[...,np.newaxis],
                                            batch_size=20, epochs=5, callbacks=[vh_callback])

In [None]:
# This is for _padding = 'same', which keeps the spatial size, so no padding is applied.
historyProbBayesModernValid = model_ProbBayesModern.fit(x_train[...,np.newaxis],
                              y_train_binary[...,np.newaxis],
                              batch_size=20, epochs=5, callbacks=[vh_callback])

## Prediction using Bayes by Dropout on the initial model

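For this approach, the prediction runs $n$ times with dropout active; the per-voxel mean of the results is the final prediction, and the per-voxel standard deviation serves as the uncertainty estimate.

Warning: this unconditionally puts all layers into training mode, including BatchNormalization, which then normalizes with the statistics of the current batch instead of its learned moving averages. The predictions therefore depend on the composition of the batch.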
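The averaging itself does not depend on Keras. In the sketch below, a hypothetical `toy_model` applies random inverted dropout to a fixed input; calling it $n$ times and taking the mean and standard deviation along the sample axis is the same summary that `predict_with_uncertainty` computes from the Keras function `f`.

```python
import numpy as np


def mc_dropout_predict(stochastic_predict, inputs, n_iter=20):
    """Call a stochastic predictor n_iter times; return per-pixel mean, std and all samples."""
    samples = np.stack([stochastic_predict(inputs) for _ in range(n_iter)], axis=0)
    return samples.mean(axis=0), samples.std(axis=0), samples


toy_rng = np.random.default_rng(42)


def toy_model(inputs, rate=0.3):
    """Hypothetical stand-in for a network whose dropout stays active at inference time."""
    keep = toy_rng.random(inputs.shape) >= rate
    return inputs * keep / (1.0 - rate)


toy_input = np.full((1, 8, 8, 1), 0.5)          # one 8x8 single-channel "image"
toy_mean, toy_std, toy_samples = mc_dropout_predict(toy_model, toy_input, n_iter=200)
print(toy_samples.shape)  # (200, 1, 8, 8, 1)
```

A deterministic predictor yields zero standard deviation everywhere; the spread only appears because each call draws a new dropout mask.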


In [None]:
modelBayesValid.layers[-1].output

In [None]:
# Define a prediction function from the model, setting the learning phase to 1 (training) so that dropout stays active.
f = K.function([modelBayesValid.layers[0].input, K.learning_phase()],
               [modelBayesValid.layers[-1].output])

def predict_with_uncertainty(f, x, no_classes, n_iter=20):
    # Collect n_iter stochastic predictions. The result shape is taken from the output of f,
    # which can differ from x (e.g. for 'valid' padding); no_classes is not needed for this.
    result = np.stack([f((x, 1))[0] for _ in range(n_iter)], axis=0)
    prediction = result.mean(axis=0)
    uncertainty = result.std(axis=0)
    return prediction, uncertainty, result

## Prediction on test examples

Adapt the following cells that show the principle of prediction, so that
* multiple images can be predicted
* all model types can be used again

In [None]:
x_test0 = x_test[0:2][...,np.newaxis]

print(x_test0.shape, len(x_test0))

In [None]:
x_test0, _ = pad_image_for_model(modelBayesValid, x_test0)  # returns (padded images, padding)
result = np.zeros((20,) + (len(x_test0),) + (x_test[0].shape) + (1,) )

In [None]:
print(len(f((x_test0,1))),f((x_test0,1))[0].shape)

In [None]:
for i in range(20):
    result[i,:,:,:] = f((x_test0,1))[0]
    
print(result.shape)

In [None]:
y_pred0 = do_prediction(modelBayesValid, x_test[0])
y_pred1 = do_prediction(modelBayesValid, x_test[1])
y_pred0[2].shape

In [None]:
prediction = result.mean(axis=0)
uncertainty = result.std(axis=0)
print(prediction.shape, uncertainty.shape)

In [None]:
def plot_result(in_image, in_mask, direct_prediction, prob_prediction, uncertainty):
    f, ax = plt.subplots(1, 5, figsize = (22, 6), sharey = True)
    ax[0].imshow(in_image)
    ax[0].set_title('orig')
    ax[1].imshow(in_mask)
    ax[1].set_title('mask')
    ax[2].imshow(direct_prediction)
    ax[2].set_title('direct prediction')
    ax[3].imshow(prob_prediction)
    ax[3].set_title('prob. prediction')
    ax[4].imshow(uncertainty)
    ax[4].set_title('uncertainty')
    plt.show()

In [None]:
plot_result(x_test[0], y_test[0], y_pred0[2][0,:,:,0], prediction[0,:,:,0],uncertainty[0,:,:,0])
plot_result(x_test[1], y_test[1], y_pred1[2][0,:,:,0], prediction[1,:,:,0],uncertainty[1,:,:,0])

## Prediction
Let's look at the predictions for some more example slices, using only the `x_test` slices that we did not use for training. (In a real scenario, we would separate training and test data at the patient level, *before* extracting slices, and we would also have a validation set.)
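A patient-level split can be sketched with NumPy, assuming a hypothetical `patient_ids` array that records, for each slice, the patient it came from (the data loaded above provides no such array, so this is illustrative only):

```python
import numpy as np


def split_by_patient(patient_ids, test_fraction=0.2, seed=0):
    """Return boolean train/test masks over slices so that no patient appears in both sets."""
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)
    n_test = max(1, int(round(test_fraction * len(patients))))
    test_mask = np.isin(patient_ids, patients[:n_test])
    return ~test_mask, test_mask


patient_ids = np.repeat(np.arange(10), 30)    # hypothetical: 10 patients, 30 slices each
train_mask, test_mask = split_by_patient(patient_ids)
print(train_mask.sum(), test_mask.sum())      # 240 60
```

Applying the same function again to the training patients would carve out a validation set.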

In [None]:
slice_indices = np.random.choice(x_test.shape[0], 6, replace=False)