
# Bayesian deep-learning

### Author: [Dr. Rahul Remanan](https://linkedin.com/in/rahulremanan); CEO, [Moad Computer](https://moad.computer)
### Contact: rahul@moad.computer

### [Launch in Google Colab](https://colab.research.google.com/drive/1W-lTnsAvb8paJK7SB8WeUecIQRx3F0tN)

A Bayesian deep-learning network that uses [dropout layers to perform Monte Carlo approximations](https://arxiv.org/pdf/1506.02142.pdf) to quantify model uncertainty.

# Overview

This notebook uses the Fashion MNIST dataset and a Bayesian deep-learning model. If a Google Cloud TPU is attached to the notebook, the model can use it to accelerate training and inference.

# Learning goals

*   Build a Bayesian deep-learning network in Keras
*   Create and compile the model under a distribution strategy that uses TPUs
*   Run Bayesian inference

# Instructions

<h3>  &nbsp;&nbsp;Train on Google Colab using TPU&nbsp;&nbsp; <a href="https://colab.research.google.com/"><img valign="middle" src="https://raw.githubusercontent.com/rahulremanan/python_tutorial/master/Machine_Vision/07_Bayesian_deep_learning/media/tpu-hexagon.png" width="50"></a></h3>

1. On the main menu, click Runtime and select **Change runtime type**. Set "TPU" as the hardware accelerator.
1. Click Runtime again and select **Runtime > Run All**. You can also run the cells manually with Shift-ENTER.

## A quick word about TPUs

TPUs are currently available only in the Google Cloud. They are designed to read data directly from Google Cloud Storage (GCS). Local datasets therefore need to be either held in the memory of the cloud instance and passed to the TPU from there, or stored in a GCS bucket that the TPU can access. For developers, this means that the typical generator functions written for CPUs or GPUs fail on TPUs, which necessitates custom TPU-specific generator functions. This notebook takes the first approach and holds the entire Fashion MNIST dataset in instance memory. Handling the dataset without a generator function works well in this particular case because the dataset is small.
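
As a rough check on why the in-memory approach is workable here, the arithmetic can be sketched as follows. The figures assume the standard Fashion MNIST split (60,000 training and 10,000 test images of 28x28 pixels, one byte per pixel as loaded); the helper name `dataset_megabytes` is hypothetical and not part of the notebook.

```python
# Back-of-the-envelope memory footprint of Fashion MNIST held in instance memory.
# Assumes the standard split: 60,000 training + 10,000 test images, 28x28, 1 channel.
NUM_IMAGES = 60_000 + 10_000
PIXELS_PER_IMAGE = 28 * 28


def dataset_megabytes(bytes_per_pixel):
    """Size of all images in megabytes (1 MB = 10**6 bytes). Hypothetical helper."""
    return NUM_IMAGES * PIXELS_PER_IMAGE * bytes_per_pixel / 1e6


uint8_mb = dataset_megabytes(1)    # pixels stored as uint8
float32_mb = dataset_megabytes(4)  # pixels cast to float32 before training

print(f"uint8:   {uint8_mb:.2f} MB")
print(f"float32: {float32_mb:.2f} MB")
```

Even after the cast to `float32`, the whole dataset stays in the range of a few hundred megabytes, which is why no streaming input pipeline is needed in this case.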

# Bayesian deep-learning using Fashion MNIST, Keras and TPUs

## Install HandBrake for transcoding video

In [None]:
%%capture
!apt install -y handbrake handbrake-cli

## Import

In [None]:
import os, sys, glob, time

import cv2
import requests
import numpy as np
import pandas as pd  # needed by the label_decoder branch of bayesian_class_activation_map
import tensorflow as tf

from PIL import Image
from tqdm import tqdm
from io import BytesIO
from matplotlib import pyplot as plt
from tensorflow.keras import backend as K

%matplotlib inline
print("Tensorflow version " + tf.__version__)

## Specify variables

In [None]:
WEIGHTS_FILE='./bayesian_fashionMNIST.h5'
GITHUB_REPO='https://github.com/rahulremanan/python_tutorial/'
WEIGHTS_URL='{}raw/master/Machine_Vision/07_Bayesian_deep_learning/weights/bayesian_fashionMNIST.h5'.format(GITHUB_REPO)
LABEL_NAMES = ['t_shirt','trouser','pullover','dress','coat','sandal','shirt','sneaker','bag','ankle_boots']

ENABLE_TRAINING = False

# Fashion MNIST dataset

The Fashion MNIST dataset ships with `tf.keras.datasets`.

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# add empty color dimension
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

# Define the Bayesian deep-learning model

The following example uses a single-layer convolutional network (conv-net) with a dropout layer that performs the Monte Carlo approximation during Bayesian inference.
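
The layer sizes of this network can be worked out by hand before building it. The sketch below does so for the layers used in this section (128 filters of 3x3, a 2x2 max-pool with stride 2, and a 10-unit dense layer). It assumes a 28x28x1 input and the Keras defaults for `Conv2D` ('valid' padding, stride 1, bias enabled); the helper names are hypothetical.

```python
# Hand-computed output shapes and parameter counts for a 28x28x1 input.
# Assumes Keras defaults for Conv2D: 'valid' padding, stride 1, bias enabled.
def conv2d_valid(h, w, c_in, filters, k):
    """Output shape and parameter count of a 'valid' kxk convolution."""
    out_shape = (h - k + 1, w - k + 1, filters)
    params = k * k * c_in * filters + filters  # kernel weights + one bias per filter
    return out_shape, params


def max_pool(shape, pool, stride):
    """Output shape of a max-pooling layer without padding."""
    h, w, c = shape
    return ((h - pool) // stride + 1, (w - pool) // stride + 1, c)


conv_shape, conv_params = conv2d_valid(28, 28, 1, filters=128, k=3)
pool_shape = max_pool(conv_shape, pool=2, stride=2)
flat_units = pool_shape[0] * pool_shape[1] * pool_shape[2]
dense_params = flat_units * 10 + 10  # weights + biases of the 10-class layer
total_params = conv_params + dense_params  # activation and dropout add none

print("Conv2D output:", conv_shape, "params:", conv_params)
print("MaxPool2D output:", pool_shape)
print("Flatten units:", flat_units)
print("Dense params:", dense_params, "total:", total_params)
```

These numbers can be compared against the output of `bayesian_model.summary()` later in the notebook.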

In [None]:
def fashionMNIST_model(input_data, dropout_rate=0.5,
                       model_name="Bayesian_fashionMNIST",
                       enable_bayesian_inference=True):
  inputs = tf.keras.Input(shape=(input_data.shape[1:]))
  x = tf.keras.layers.Conv2D(128, (3,3), name='Conv2D')(inputs)
  x = tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=(2,2),
                                   name='MaxPool2D')(x)
  x = tf.keras.layers.Activation('elu')(x)
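  # training=True keeps dropout active at inference time (Monte Carlo dropout);
  # training=None leaves the default Keras behaviour (dropout only while fitting)
  x = tf.keras.layers.Dropout(dropout_rate)(x,
                          training=True if enable_bayesian_inference else None)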
  x = tf.keras.layers.Flatten(name='Flatten')(x)
  x = tf.keras.layers.Dense(10, name='Predictions')(x)
  outputs = tf.keras.layers.Activation('softmax')(x)
  model = tf.keras.Model(inputs=inputs, outputs=outputs, name=model_name)
  return model

# Using the TPU

To use the TPU for training and inference, the TPU device must first be initialized. The model is then built and compiled inside the scope of the TPU distribution strategy.

In [None]:
try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
  print('Running on TPU ...')

  tf.config.experimental_connect_to_cluster(tpu)
  tf.tpu.experimental.initialize_tpu_system(tpu)

  strategy = tf.distribute.TPUStrategy(tpu)
  print('Replicas: ', strategy.num_replicas_in_sync)
except ValueError:
  tpu, strategy = None, None

In [None]:
if strategy is not None:
  with strategy.scope():
    bayesian_model = fashionMNIST_model(x_train, enable_bayesian_inference=True)
    bayesian_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                           loss=tf.keras.losses.sparse_categorical_crossentropy,
                           metrics=[tf.keras.metrics.sparse_categorical_accuracy])
else:
  bayesian_model = fashionMNIST_model(x_train, enable_bayesian_inference=True)
  bayesian_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                         loss=tf.keras.losses.sparse_categorical_crossentropy,
                         metrics=[tf.keras.metrics.sparse_categorical_accuracy])

In [None]:
bayesian_model.summary()

# Train

## Download pre-trained weights

In [None]:
if not os.path.exists(WEIGHTS_FILE):
  !wget {WEIGHTS_URL} -O {WEIGHTS_FILE}

In [None]:
if os.path.exists(WEIGHTS_FILE):
  bayesian_model.load_weights(WEIGHTS_FILE)
  print('Loaded pre-trained weights: {} ...'.format(WEIGHTS_FILE))

## Training the fashion MNIST Bayesian deep-learning model

In [None]:
if ENABLE_TRAINING:
  bayesian_model.fit(x_train.astype(np.float32),y_train.astype(np.float32),
                     epochs=5,
                     steps_per_epoch=60,
                     validation_data=(x_test.astype(np.float32),
                                      y_test.astype(np.float32)),
                     validation_freq=1)

  bayesian_model.save_weights(WEIGHTS_FILE, overwrite=True)

# Bayesian inference
The inference step is repeated many times to estimate the model uncertainty associated with each predicted class. Unlike in a regular deep-learning architecture, dropout stays active during inference, so each inference step returns a different set of class probabilities. The final prediction score (labelled "accuracy" in the code below) is the class-wise mean of these probabilities. The model uncertainty is represented numerically by their class-wise standard deviation.
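
The idea can be illustrated without TensorFlow. The following is a minimal NumPy sketch, not the notebook's model: a single fixed linear layer with made-up weights stands in for the trained network, and the names `stochastic_forward`, `features` and `weights` are hypothetical. It assumes inverted dropout (kept activations are scaled by `1 / (1 - rate)`), which is how the Keras `Dropout` layer behaves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: one fixed linear layer with 3 classes.
# The weights and the input are made up purely for illustration.
features = rng.normal(size=8)       # activations entering the dropout layer
weights = rng.normal(size=(8, 3))   # pretend these are trained weights
dropout_rate = 0.5


def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()


def stochastic_forward(x):
    """One forward pass with dropout left switched on (inverted dropout)."""
    keep_mask = rng.random(x.shape) >= dropout_rate
    dropped = x * keep_mask / (1.0 - dropout_rate)
    return softmax(dropped @ weights)


num_steps = 100
passes = np.stack([stochastic_forward(features) for _ in range(num_steps)])

mean_probs = passes.mean(axis=0)  # class-wise mean: the prediction score
std_probs = passes.std(axis=0)    # class-wise standard deviation: the uncertainty
predicted = int(np.argmax(mean_probs))

print("mean:", mean_probs.round(3))
print("std: ", std_probs.round(3))
print("predicted class:", predicted)
```

Because a new dropout mask is drawn on every pass, the rows of `passes` differ from one another; with dropout switched off they would all be identical and the standard deviation would be zero.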

In [None]:
if strategy is not None:
  with strategy.scope():
    bayesian_model = fashionMNIST_model(x_train, enable_bayesian_inference=True)
    bayesian_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                           loss=tf.keras.losses.sparse_categorical_crossentropy,
                           metrics=[tf.keras.metrics.sparse_categorical_accuracy])
else:
  bayesian_model = fashionMNIST_model(x_train, enable_bayesian_inference=True)
  bayesian_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                         loss=tf.keras.losses.sparse_categorical_crossentropy,
                         metrics=[tf.keras.metrics.sparse_categorical_accuracy])
bayesian_model.load_weights(WEIGHTS_FILE)

In [None]:
preds=[]
num_bayesian_inference_steps=10
for i in tqdm(range(num_bayesian_inference_steps)):
  preds.append(bayesian_model.predict(x_test[:16].astype(np.float32)))
mean_preds=np.mean(np.asarray(preds), axis=0)
stdev_preds=np.std(np.asarray(preds), axis=0)
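
The mean and standard deviation are not the only way to summarize the stacked predictions. As one possible complement (an assumption of this sketch, not something the notebook computes), the fraction of inference steps whose top class agrees with the final prediction can be reported per sample. The array `toy_preds` below is made-up data with the same layout as `preds`: inference steps, then samples, then classes.

```python
import numpy as np

# Hypothetical predictions: 4 inference steps, 2 samples, 3 classes.
toy_preds = np.array([
    [[0.7, 0.2, 0.1], [0.40, 0.50, 0.1]],
    [[0.6, 0.3, 0.1], [0.50, 0.40, 0.1]],
    [[0.8, 0.1, 0.1], [0.30, 0.60, 0.1]],
    [[0.7, 0.2, 0.1], [0.35, 0.55, 0.1]],
])

mean_toy = toy_preds.mean(axis=0)           # shape: (samples, classes)
final_class = mean_toy.argmax(axis=-1)      # class chosen from the mean
per_step_class = toy_preds.argmax(axis=-1)  # class chosen at each step

# Fraction of steps that vote for the final class, per sample.
agreement = (per_step_class == final_class).mean(axis=0)

print("final classes:", final_class)
print("agreement:", agreement)
```

A sample on which every step votes for the same class is a more stable prediction than one where the top class flips between steps, even if the two have similar mean probabilities.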

## Visualize predictions

In [None]:
def plot_predictions(images, ground_truths, preds_acc,
                     preds_stdev=None, label_names=None,
                     enable_bayesian_inference=True):
  n = images.shape[0]
  nr = int(np.ceil(n / 4))                      # rows needed for 4 columns
  f, axes = plt.subplots(nr, 4, squeeze=False)  # always a 2-D array of axes
  f.set_size_inches(16, 16)
  show_uncertainty = enable_bayesian_inference and preds_stdev is not None
  for i in range(nr * 4):
    row, col = divmod(i, 4)
    ax = axes[row, col]
    ax.axis('off')
    if i >= n:                                  # leave unused grid cells empty
      continue

    top = np.argmax(preds_acc[i])
    label = label_names[top]
    accuracy = preds_acc[i][top]
    ax.imshow(images[i])
    if show_uncertainty:
      ground_truth = label_names[ground_truths[i]]
      confidence = preds_stdev[i][top]
      ax.text(0.5, 0.5, '\nLabel (Actual): {} ({})'.format(label, ground_truth) +
                        '\nAccuracy: {}, \nUncertainty: {}\n'.format(str(round(accuracy, 2)),
                                                                     str(round(confidence, 2))),
              fontsize=10)
    else:
      ax.text(0.5, 0.5, '\nLabel: {}'.format(label) +
                        '\nAccuracy: {} \n'.format(str(round(accuracy, 2))),
              fontsize=10)

In [None]:
print(x_test.shape)

In [None]:
plot_predictions(np.squeeze(x_test[:16]), y_test[:16],
                 mean_preds, stdev_preds,
                 label_names=LABEL_NAMES,
                 enable_bayesian_inference=True)

# Visualizing uncertainty -- Bayesian class activation maps

In [None]:
def tensor_featureSizeExtractor(last_conv_layer):
  try:
    if len(last_conv_layer.output.get_shape().as_list()) == 4:
      feature_size = last_conv_layer.output.get_shape().as_list()[3]
      return feature_size
    else:
      print ('Received tensor shape: {} instead of expected shape: 4'.format(len(last_conv_layer.output.get_shape().as_list())))
      return None
  except AttributeError:
    if len(last_conv_layer.output.shape) == 4:
      feature_size = list(last_conv_layer.output.shape)[3]
      return feature_size
    else:
      print ('Received tensor shape: {} instead of expected shape: 4'.format(len(last_conv_layer.output.shape)))
      return None

In [None]:
def pre_process_input_image(input_image_file,
                            image_height,
                            image_width,
                            pre_processor=None,
                            url_mode=False,
                            file_mode=False):
  if input_image_file is None:
    print ('No input file specified to generate predictions ...')
    return

  if url_mode:
    response = requests.get(input_image_file)
    img = Image.open(BytesIO(response.content))
    img = img.resize((image_width, image_height))
  elif file_mode:
    img = input_image_file
  else:
    # Keras expects target_size as (height, width)
    img = tf.keras.preprocessing.image.load_img(input_image_file, target_size=(image_height, image_width))

  x = img

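  if not file_mode:
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = np.sum(x, axis=-1)                       # collapse the color channels into one
    x = (x - np.min(x))/(np.max(x) - np.min(x))  # min-max scale to [0, 1]
    x = np.expand_dims(x, axis=0)                # add the batch dimension
    x = np.expand_dims(x, axis=-1)               # add the channel dimension
    if pre_processor is not None:
      x = pre_processor(x)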
  return x

In [None]:
def get_img_array(img_path, size):
    img = tf.keras.utils.load_img(img_path, target_size=size)
    array = tf.keras.utils.img_to_array(img)
    array = np.expand_dims(array, axis=0)
    return array

def make_gradcam_heatmap(img_array, model, grad_cam_conv_layer_name, pred_index=None):
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(grad_cam_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]

    grads = tape.gradient(class_channel, last_conv_layer_output)

    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy(), preds.numpy()

In [None]:
def bayesian_class_activation_map(input_image_file=None,
                                  pre_processor=None,
                                  label_decoder=None,
                                  model=None,
                                  labels=None,
                                  image_width=299,
                                  image_height=299,
                                  gradcam_conv_layer='conv_7b',
                                  url_mode=False,
                                  file_mode=False,
                                  eval_steps=10,
                                  benchmark=True):
  """
     A function to visualize class activation maps.

     It also generates a Bayesian class activation map: a list of heatmaps,
     one per evaluation step, that summarizes the model uncertainty.

     Grad-CAMs are generated following the Keras 3 example notebook: https://keras.io/examples/vision/grad_cam/
  """
  x = pre_process_input_image(
        input_image_file=input_image_file,
        image_height=image_height,
        image_width=image_width,
        pre_processor=pre_processor,
        url_mode=url_mode,
        file_mode=file_mode
      )

  if model is None:
    print ('No input model specified to generate predictions ...')
    return

  heatmaps      = []
  pred_labels   = []
  out_labels    = []

  # one row of class probabilities is appended per evaluation step
  probabilities = np.empty((0, model.output_shape[-1]), float)

  for step in (range(eval_steps)):
    startTime = time.time()
    input_img = x
    heatmap, preds = make_gradcam_heatmap(x, model, gradcam_conv_layer)
    heatmaps.append(heatmap)
    probability = preds.flatten()
    probabilities = np.append(probabilities,
                              np.array([probability]),
                              axis=0)

    if labels is not None:
      pred_label = labels[np.argmax(probability)]
      pred_labels.append(pred_label)
      out_labels.append(pred_label)
      print('PREDICTION: {}'.format(pred_label))
      print('ACCURACY: {}'.format(np.max(probability)))
      del pred_label
    elif label_decoder is not None:
      pred_label = pd.DataFrame(label_decoder(preds, top=3)[0],
                                columns=['col1',
                                         'category',
                                         'probability']).iloc[:,1:]
      pred_labels.append(pred_label.loc[0,'category'])
      out_labels.append(pred_label.loc[0,'category'])
      print('PREDICTION:', pred_label.loc[0,'category'])
      del pred_label
    else:
      print ('No labels will be generated ...')

    pred_labels = list(set(pred_labels))  # keep only the distinct labels seen so far
    endTime = time.time()
    executionTime = endTime - startTime

    if benchmark:
      print ('Completed processing {} out of {} steps in {} seconds ...'.format(int(step+1), int(eval_steps), float(executionTime)))
  # average the heatmaps over all evaluation steps, treating NaNs as zeros
  mean_heatmap = np.mean([np.nan_to_num(h) for h in heatmaps], axis=0)

  # class-wise mean and standard deviation across the evaluation steps
  mean = np.mean(probabilities, axis=0)
  stdev = np.std(probabilities, axis=0)

  accuracy = mean[np.argmax(mean)]
  uncertainty = stdev[np.argmax(mean)]

  return [mean_heatmap, accuracy, uncertainty, pred_labels, heatmaps, out_labels, probabilities]
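
The way a stack of heatmaps can be condensed into summary maps is sketched below in plain NumPy. Random values stand in for real Grad-CAM heatmaps, and the names `toy_heatmaps`, `mean_map` and `std_map` are hypothetical. The pixel-wise standard deviation is an addition made for illustration: the function above returns the mean heatmap and the individual heatmaps, not a standard-deviation map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack of heatmaps from repeated stochastic passes:
# shape (eval_steps, height, width). Random values replace real Grad-CAM output.
toy_heatmaps = rng.random((10, 4, 4))

# A NaN can appear when a heatmap is normalized by a maximum of zero (0 / 0).
toy_heatmaps[3, 0, 0] = np.nan

clean = np.nan_to_num(toy_heatmaps)  # NaN -> 0 before aggregating
mean_map = clean.mean(axis=0)        # where the model attends on average
std_map = clean.std(axis=0)          # where the attention varies between passes

print(mean_map.round(2))
print(std_map.round(2))
```

Regions with a high value in `std_map` are the ones where the stochastic passes disagree most about what matters in the image.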

In [None]:
def heatmap_overlay(img, heatmap, threshold=0.8, read_file=True):
  if read_file:
    img = cv2.imread(img)
  # resize the heatmap to the image size; cv2.resize expects (width, height)
  heatmap = cv2.resize(np.nan_to_num(heatmap), (img.shape[1], img.shape[0]))
  heatmap = np.uint8(255*heatmap)
  heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
  # threshold is the blending weight of the image; the heatmap gets 1-threshold
  superimposed_img = cv2.addWeighted(img, threshold, heatmap, 1-threshold, 0)
  return superimposed_img, heatmap

In [None]:
bayesian_cam = True
IMG_URL = 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Reebok_Royal_Glide_Ripple_Clip_shoe.jpg/440px-Reebok_Royal_Glide_Ripple_Clip_shoe.jpg'

In [None]:
!wget {IMG_URL}
INPUT_IMAGE_FILE = './440px-Reebok_Royal_Glide_Ripple_Clip_shoe.jpg'

## Visualize input image

In [None]:
img = tf.keras.preprocessing.image.load_img(INPUT_IMAGE_FILE, target_size=(x_test.shape[1], x_test.shape[2]))
x = tf.keras.preprocessing.image.img_to_array(img)
print(x.shape)
print(np.max(x), np.min(x))
x= np.sum(x, axis=-1)
x = (x - np.min(x))/(np.max(x) - np.min(x))
plt.imshow(x)

In [None]:
bayesian_model = fashionMNIST_model(x_train, enable_bayesian_inference=True)
bayesian_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=[tf.keras.metrics.sparse_categorical_accuracy]
    )
bayesian_model.load_weights(WEIGHTS_FILE)

In [None]:
visualize_layers = ['Conv2D', 'MaxPool2D']

In [None]:
if bayesian_cam:
  outputs = []
  for visualizer_layer in visualize_layers:
    output = bayesian_class_activation_map(input_image_file=INPUT_IMAGE_FILE,
                                           url_mode=False,
                                           file_mode=False,
                                           pre_processor=None,
                                           model=bayesian_model,
                                           labels=LABEL_NAMES,
                                           image_height=x_test.shape[1],
                                           image_width=x_test.shape[2],
                                           gradcam_conv_layer=visualizer_layer,
                                           eval_steps=100)
    print (output[3])
    outputs.append(output)

In [None]:
for i, v in enumerate(visualize_layers):
  HEATMAP = outputs[i][0]

  plt.matshow(HEATMAP)
  plt.show()

In [None]:
if bayesian_cam:
  for v, output in zip(visualize_layers, outputs):
    heatmaps = output[4]
    labels = output[5]
    img_array = []

    for h_map, LABEL in zip(heatmaps, labels):
      superimposed_img, _ = heatmap_overlay(INPUT_IMAGE_FILE, h_map, threshold=0.7)
      height, width = superimposed_img.shape[:2]
      size = (width, height)  # cv2.VideoWriter expects (width, height)
      font = cv2.FONT_HERSHEY_DUPLEX
      cv2.putText(superimposed_img,'{}'.format(LABEL),(10,50), font, 2,(255,255,255),2)
      img_array.append(np.uint8(superimposed_img))

    out = cv2.VideoWriter(f'bayesian_class_activation_maps_{v}.avi',
                          cv2.VideoWriter_fourcc(*'DIVX'), 8, size)

    for frame in img_array:
      out.write(frame)
    out.release()

In [None]:
%%capture
for v in visualize_layers:
  !HandBrakeCLI -i ./bayesian_class_activation_maps_{v}.avi \
                -o ./bayesian_class_activation_maps_{v}.mp4 \
                -e x264 \
                -q 22   \
                -r 15   \
                -B 64   \
                -X 480  \
                -O

In [None]:
import io
import base64
from IPython.display import HTML

for v in visualize_layers:
  with open(f'./bayesian_class_activation_maps_{v}.mp4', 'rb') as video_file:
    video = video_file.read()
  encoded = base64.b64encode(video)
  display(HTML(data='''<video alt="test" controls>
                    <source src="data:video/mp4;base64,{0}" type="video/mp4" />
                 </video>'''.format(encoded.decode('ascii'))))

# Learning tasks
* Compare and contrast a regular deep-learning architecture and the Bayesian deep-learning architecture described in this notebook
* Implement a regular inference (non-Bayesian) using the same model architecture and model weights used in this notebook
* Describe why it is important to visualize uncertainty
* Discuss the limitations of describing model uncertainty using descriptive statistics alone
* Explain Anscombe's quartet and the importance of visual tools such as the Bayesian class activation maps described in this notebook
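
As a starting point for the last two tasks, the sketch below computes summary statistics for the four datasets of Anscombe's quartet using only the Python standard library. The variable names are hypothetical; the data values are the published quartet. The code prints the statistics so they can be compared across the four sets; plotting the points is left as part of the task.

```python
from statistics import mean, variance

# Anscombe's quartet: four small datasets of 11 (x, y) points each.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x_4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean and sample variance of y for each dataset, rounded to two decimals.
summary = {
    name: (round(mean(y), 2), round(variance(y), 2))
    for name, (x, y) in quartet.items()
}

for name, (y_mean, y_var) in summary.items():
    print(f"{name:>3}: mean(y) = {y_mean}, var(y) = {y_var}")
```

The same caution applies to a class-wise mean and standard deviation of predicted probabilities: two predictions can share those numbers while the underlying heatmaps look very different.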