## CNNs with MC dropout

MC dropout is a technique that exploits the dropout layers in a NN (usually used for regularization) by keeping them active at inference time as well as at training time (normally they are switched off at inference). This can be interpreted as giving rise to a Bayesian NN in which each node affected by a dropout layer is replaced by a Bernoulli distribution, in much the same way as nodes were replaced by a Gaussian approximate posterior when using reparametrization or flipout layers.

__Classic idea of dropout:__
- At _training time_, at each iteration of the training loop each node affected by a dropout layer is kept with probability $p$ and set to 0 with probability $1 - p$ (independently for each node). At each iteration the model is effectively different, as some of the connections between nodes have been cut. This helps prevent overfitting because the network needs to learn how to correctly connect input and output without always relying on the same nodes: no individual node (or sub-network) is responsible for a particular prediction. Backpropagation happens across all active nodes.
- At _inference time_, all nodes are kept active with a fixed value (dropout is turned off). If the value learned during training is $w$, though, the value actually used is $w^\star = w\, p$, reflecting that, because of dropout, each affected node was active in only a fraction $p$ of the training iterations.
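
The two bullets above can be illustrated with a minimal NumPy sketch. The names `keep_prob`, `train_step_values` and `inference_values` are hypothetical and introduced only for this illustration; `keep_prob` plays the role of $p$, the probability that a node stays active.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

keep_prob = 0.8          # p in the text: probability that a node stays active
w = rng.normal(size=5)   # hypothetical node values learned during training


def train_step_values(w, keep_prob, rng):
    """Training time: each node is kept with probability p, else set to 0."""
    mask = rng.random(w.shape) < keep_prob
    return w * mask


def inference_values(w, keep_prob):
    """Inference time (classic dropout): all nodes active, scaled by p."""
    return w * keep_prob


# Averaging many training-time samples approaches the inference-time values.
samples = np.stack(
    [train_step_values(w, keep_prob, rng) for _ in range(20_000)]
)
mc_average = samples.mean(axis=0)
w_star = inference_values(w, keep_prob)
```

Note that Keras' `Dropout` layer uses the equivalent "inverted" convention: its `rate` argument is the fraction of units to drop, and the kept units are scaled up by `1 / (1 - rate)` during training, so no rescaling is needed at inference time.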

__MC dropout:__
- At _training time_ everything remains the same as with classic dropout.
- At _inference time_ **dropout is kept active**, so each prediction effectively sees a different NN. As in probabilistic deep learning, repeated predictions on the same inputs give different outputs (each time a different set of weights is kept active): this is another example of a Bayesian NN, with a Bernoulli (discrete) probability distribution over the weights. For each weight affected by dropout the distribution is
$$
w^\star = \left\lbrace
\begin{array}{l}
w\quad\text{with probability $p$}\,,\\
0\quad\text{with probability $1 - p$}\,.
\end{array}
\right.
$$

Technically this is very easy to achieve: with Keras it's sufficient to use the model in "training mode" (the model state corresponding to active dropout) also at inference time.
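
As a rough illustration of the "training mode" idea, the sketch below calls a small model with the `training` flag set explicitly. `toy_model` and its layer sizes are hypothetical and unrelated to the CNN trained later in this notebook.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model, only meant to show the effect of the `training` flag.
toy_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(rate=0.5),
    tf.keras.layers.Dense(3, activation='softmax'),
])

x = tf.random.normal(shape=(1, 8))

# Inference mode: dropout is inactive, so repeated calls agree.
det_a = toy_model(x, training=False).numpy()
det_b = toy_model(x, training=False).numpy()

# Training mode: a new dropout mask is sampled on every call.
mc_samples = np.stack(
    [toy_model(x, training=True).numpy() for _ in range(20)]
)
```

Here `mc_samples` has shape `(20, 1, 3)`: one probability vector per stochastic forward pass over the same input.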

In [None]:
import os
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Dropout,
    BatchNormalization, Dense, Activation, Flatten)
from tensorflow.keras.activations import relu, softmax
from tensorflow.keras import Input, Model
import keras_cv
import matplotlib.pyplot as plt
import seaborn as sns

tfd = tfp.distributions

sns.set_theme()

## Load data: the CIFAR10 dataset

In [None]:
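def preprocess_data(x, y, n_classes=None, pixel_norm=255.):
    """
    Converts images to grayscale, rescales pixel values to [0, 1]
    (for the default `pixel_norm`) and one-hot encodes the labels.
    If `n_classes` is None, the number of classes is inferred from
    the largest label found in `y`.
    """
    # Turn images to grayscale.
    x_preprocessed = keras_cv.layers.Grayscale()(x)

    # Normalize pixel values.
    x_preprocessed = x_preprocessed / pixel_norm

    # Convert the labels (shape `(n, 1)`) to a 1D TensorFlow tensor.
    y_preprocessed = tf.constant(y[:, 0], dtype=tf.int32)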

    # One-hot encode the true labels.
    if n_classes is None:
        print('Inferring the number of classes', end='')
        
        depth = tf.reduce_max(y_preprocessed) + 1

        print(f' | {depth} classes found')
    else:
        depth = n_classes

    y_preprocessed = tf.one_hot(y_preprocessed, depth)

    return x_preprocessed, y_preprocessed

In [None]:
(x_train_raw, y_train_raw), (x_test_raw, y_test_raw) = tf.keras.datasets.cifar10.load_data()

x_train_raw.shape, y_train_raw.shape, x_test_raw.shape, y_test_raw.shape

In [None]:
x_train, y_train = preprocess_data(x_train_raw, y_train_raw)
x_test, y_test = preprocess_data(x_test_raw, y_test_raw)

## Build and train a model with dropout layers

__Idea:__ build a traditional CNN model and train it, then define the corresponding MC dropout model using the same (trained) layers, but in which the dropout layers are called with the `training=True` option.

__Notes:__
- The above idea needs the MC dropout model to be built using Keras' functional API, as we need to specify the `training=True` option right inside the call to the dropout layers.
- We could have switched the learning phase globally through the Keras backend (`tensorflow.keras.backend.set_learning_phase`), but that would have affected **all** the layers at the same time, including the `BatchNormalization` ones, which also behave differently between training and inference time and should be called with `training=False` at inference time.
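
To make the second note concrete, here is a small sketch (with hypothetical, freshly initialized layers and random data) of how the `training` flag changes the behavior of `Dropout` and `BatchNormalization` in different ways, which is why only the dropout layers should be forced into training mode.

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch of 64 samples with 4 features, far from zero mean.
x = tf.constant(
    np.random.default_rng(0).normal(loc=5., scale=2., size=(64, 4)),
    dtype=tf.float32
)

dropout = tf.keras.layers.Dropout(rate=0.5)
batch_norm = tf.keras.layers.BatchNormalization()

# Dropout: identity in inference mode, random masking (with the kept
# units rescaled by 1 / (1 - rate)) in training mode.
drop_inference = dropout(x, training=False).numpy()
drop_training = dropout(x, training=True).numpy()

# BatchNormalization: inference mode uses the stored moving statistics
# (freshly initialized here: mean 0, variance 1), while training mode
# uses the statistics of the current batch.
bn_inference = batch_norm(x, training=False).numpy()
bn_training = batch_norm(x, training=True).numpy()
```

With `training=True` the batch-normalized output is centered on the current batch, so a prediction on one sample would depend on whatever else is in the batch; this is not the kind of randomness MC dropout is after.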

### Build and train a CNN with dropout layers

In [None]:
model = tf.keras.Sequential([
    # Conv block.
    Conv2D(filters=8, kernel_size=(3, 3), padding='same'),
    Activation(relu),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    # Conv block.
    Conv2D(filters=16, kernel_size=(3, 3), padding='same'),
    Activation(relu),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),
    # Flatten.
    Flatten(),
    # Dense block.
    # Note: `rate` is the fraction of units dropped (here 90%).
    Dense(64),
    Activation(relu),
    BatchNormalization(),
    Dropout(rate=0.9),
    Dense(64),
    Activation(relu),
    BatchNormalization(),
    Dropout(rate=0.9),
    # Output layer.
    Dense(10),
    Activation(softmax)
])

In [None]:
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)

model.compile(
    loss='categorical_crossentropy',
    optimizer=optimizer,
    metrics=['accuracy']
)

# Call the model once on a single sample so that it gets built
# before printing the summary.
model(x_train[:1, ...])

model.summary()

In [None]:
epochs = 10
batch_size = 128

model.fit(
    x=x_train,
    y=y_train,
    validation_data=(x_test, y_test),
    epochs=epochs,
    batch_size=batch_size
)

In [None]:
save_model = False

models_dir = '../models/'

if save_model:
    model.save(os.path.join(models_dir, 'dropout_cnn.tf'))

### Build the equivalent MC dropout CNN model

In [None]:
class MCDropoutModel(tf.keras.Model):
    """
    Given a model with dropout layers (passed to the
    constructor), this class builds an equivalent model
    with the same exact layers (with trained parameters),
    but in which the dropout layers are always called with
    the `training` option set to True so the sampling
    happens at inference time as well.
    """
    def __init__(self, original_model):
        """
        Stores the trained model whose layers are going to be reused.
        """
        super().__init__()

        self.original_model = original_model

    def build(self):
        """
        Returns a functional model that chains the layers of the
        original model, calling dropout layers with `training=True`.
        """
        # Input shape of the grayscale CIFAR10 images.
        inputs = Input(shape=(32, 32, 1,))

        output = inputs

        for layer in self.original_model.layers:
            if isinstance(layer, Dropout):
                print(f'Dropout layer found: {layer.name}')

                output = layer(output, training=True)
            else:
                output = layer(output)

        return Model(inputs=inputs, outputs=output)

In [None]:
mc_dropout_model = MCDropoutModel(model).build()

In [None]:
# Generate `n_pred` predictions for a single sample.
n_pred = 100

pred = []

for _ in range(n_pred):
    pred.append(mc_dropout_model(x_test[:1, ...]))

pred = tf.concat(pred, axis=0)
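
The stacked predictions can be condensed into a few summary quantities. The sketch below is only an illustration: it uses a hypothetical stand-in array `mc_pred` (random probability vectors) in place of the notebook's `pred`, so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical stand-in for `pred`: 100 stochastic forward passes over
# 10 classes, each row a valid probability vector.
mc_pred = rng.dirichlet(alpha=np.ones(10), size=100)

# Predictive mean: Monte Carlo estimate of the class probabilities.
pred_mean = mc_pred.mean(axis=0)

# Per-class standard deviation across the forward passes.
pred_std = mc_pred.std(axis=0)

# Entropy of the mean prediction (in nats), a single-number summary
# of the uncertainty; the small constant avoids log(0).
pred_entropy = -np.sum(pred_mean * np.log(pred_mean + 1e-12))

predicted_class = int(np.argmax(pred_mean))
```

With the actual model outputs, `pred.numpy()` could be used in place of `mc_pred`.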

In [None]:
# Plot the distribution of predicted probabilities
# for each class.
fig = plt.figure(figsize=(14, 6))

for class_index in range(pred.shape[1]):
    sns.scatterplot(
        x=[class_index] * pred.shape[0],
        y=pred[:, class_index],
        alpha=.5,
        color=sns.color_palette()[0]
    )

plt.xticks(
    ticks=range(pred.shape[1]),
)

plt.xlabel('Class')
plt.ylabel('Predicted probability')
plt.title('Distribution of predicted probabilities for each class (1 sample)', fontsize=14)