# Deep Learning from Pre-Trained Models with Keras

## Introduction

ImageNet, an image recognition benchmark dataset*, helped trigger the modern AI explosion.  In 2012, the AlexNet architecture (a deep convolutional neural network) rocked the ImageNet benchmark competition, handily beating the next best entrant.  By 2014, all the leading competitors were based on deep learning.  Since then, accuracy scores have continued to improve, eventually surpassing human performance.

In this hands-on tutorial we will build on this pioneering work to create our own neural-network architecture for image recognition.  Participants will use the elegant Keras deep learning programming interface to build and train TensorFlow models for image classification tasks on the CIFAR-10 / MNIST datasets*.  We will demonstrate the use of transfer learning* (to give our networks a head start by building on top of existing, ImageNet pre-trained, network layers*), and explore how to improve model performance for standard deep learning pipelines.  We will use cloud-based interactive Jupyter notebooks to work through our explorations step by step.  Once participants have successfully trained their custom model, we will show them how to submit their model's predictions to Kaggle for scoring*.

This tutorial aims to prepare participants for the HPC Saudi 2020 Student AI Competition.

Participants are expected to bring their own laptops and sign up for free online cloud services (e.g., Google Colab, Kaggle).  They may also need to download free, open-source software prior to arriving for the workshop.

This tutorial assumes some basic knowledge of neural networks. If you’re not already familiar with neural networks, then you can learn the basic concepts behind neural networks at [course.fast.ai](https://course.fast.ai/).

* Tutorial materials are derived from:
  * [PyTorch Tutorials](https://github.com/kaust-vislab/pytorch-tutorials) by David Pugh.
  * [What is torch.nn really?](https://pytorch.org/tutorials/beginner/nn_tutorial.html) by Jeremy Howard, Rachel Thomas, Francisco Ingham.
  * [Machine Learning Notebooks](https://github.com/ageron/handson-ml2) (2nd Ed.) by Aurélien Géron.
  * *Deep Learning with Python* by François Chollet.

### Jupyter Notebooks

This is a Jupyter Notebook.  It provides a simple, cell-based IDE for developing and exploring complex ideas via code, visualizations, and documentation.

A notebook has two primary types of cells: i) `markdown` cells for textual notes and documentation, such as the one you are reading now, and ii) `code` cells, which contain snippets of code (typically *Python*, but also *bash* scripts) that can be executed.  

The currently selected cell appears within a box. A green box indicates that the cell is editable.  Clicking inside a *code* cell makes it selected and editable.  Double-click inside *markdown* cells to edit.

Use `Tab` for context-sensitive code-completion assistance when editing Python code in *code* cells.  For example, use code assistance after a `.` separator to find available object members.  For help documentation, create a new *code* cell and use commands like `dir(`*module*`)`, `help(`*topic*`)`, `?`*name*, or `??`*function*, substituting your own *module*, *topic*, variable *name*, or *function* name.  The magic `?` and `??` commands show documentation and source code, respectively, in a separate pane.

Clicking `[Run]` or pressing `Ctrl-Enter` executes the contents of a cell.  A *markdown* cell converts to its display version, and a *code* cell runs the code inside.  To the left of a *code* cell is a small text bracket `In [ ]:`.  If the bracket contains an asterisk, e.g., `In [*]:`, that cell is currently executing.  Only one cell executes at a time (if multiple cells are *Run*, they are queued up to execute in the order they were run).  When a *code* cell finishes executing, the bracket shows an execution count; each *code* cell execution increments the counter, which records the order in which cells were executed, e.g., `In [7]:` for the seventh cell to complete.

The output produced by a *code* cell appears at the bottom of that cell after it executes.  The output generated by a code cell includes anything printed to the output during execution (e.g., print statements, or thrown errors) and the final value generated by the cell (i.e., not the intermediate values).  The final value is 'pretty printed' by Jupyter.

Typically, notebooks are written to be executed in order, from top to bottom.  Behind the scenes, however, each notebook has a single Python state (the `kernel`), and each *code* cell that executes modifies that state.  It is possible to modify and re-run earlier cells; however, care must be taken to also re-run any other cells that depend upon the modified one.  List the Python state's global variables with the magic command `%whos`.  If the Python state becomes too confusing to fix manually, the *kernel* can be restarted to a known state and the cell output cleared (choose `Restart & Clear Output` from the Jupyter `Kernel` menu); this requires running each *code* cell again.

Complete user documentation is available at [jupyter-notebook.readthedocs.io](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html#notebook-user-interface). <br/>
Many helpful tips and techniques are collected in [28 Jupyter Notebook Tips, Tricks, and Shortcuts](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/).

## Setup

### Create a Kaggle Account

#### 1. Register for an account

In order to download Kaggle competition data you will first need to create a [Kaggle](https://www.kaggle.com/) account.

#### 2. Create an API key

Once you have registered for a Kaggle account you will need to create some [API credentials](https://github.com/Kaggle/kaggle-api#api-credentials) in order to be able to use the `kaggle` CLI to download data.

### Download MNIST Data

If you are using Binder to run this notebook, then the data is already downloaded and available.  Skip to the next step.

If you are using Google Colab to run this notebook, then you will need to download the data before proceeding.

#### Download MNIST from Kaggle

Provide your Kaggle username and API key in the cell below and execute the code to download the Kaggle [Digit Recognizer: Learn computer vision with the famous MNIST data](https://www.kaggle.com/c/digit-recognizer) competition data. 

**Note: Before attempting to download the competition data you will need to log in to your Kaggle account and accept the rules for this competition.**

In [None]:
%%bash
# NOTE: Replace YOUR_USERNAME and YOUR_API_KEY with actual credentials 
export KAGGLE_USERNAME="YOUR_USERNAME"
export KAGGLE_KEY="YOUR_API_KEY"
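# Download into the 'kaggle/' sub-directory that later cells read from, then extract the archive
kaggle competitions download -c digit-recognizer -p ../datasets/mnist/kaggle/
unzip -o ../datasets/mnist/kaggle/digit-recognizer.zip -d ../datasets/mnist/kaggle/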

#### (Alternative) Download MNIST from GitHub

If you are running this notebook using Google Colab, but did *not* create a Kaggle account and API key, then download the data from our GitHub repository by running the code in the following cells.

In [None]:
import os
import pathlib
import requests

#RAW_URL = "https://raw.githubusercontent.com/kaust-vislab/keras-tutorials/master/mnist/data/raw"
RAW_URL = "https://github.com/holstgr-kaust/keras-tutorials/raw/master/datasets/mnist"
DEST_DIR = pathlib.Path('../datasets/mnist')

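def fetch_mnist_data():
    for n in ["mnist.npz", "kaggle/train.csv", "kaggle/test.csv", "kaggle/sample_submission.csv"]:
        path = DEST_DIR / n
        # Create DEST_DIR and any sub-directory (e.g. 'kaggle/') before writing into it
        path.parent.mkdir(parents=True, exist_ok=True)
        response = requests.get(RAW_URL + "/" + n)
        response.raise_for_status()  # fail loudly instead of saving an HTTP error page
        with path.open(mode='wb') as f:
            f.write(response.content)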

In [None]:
fetch_mnist_data()

#### (Alternative) Download MNIST with Keras

If you are running this notebook using Google Colab, but did *not* create a Kaggle account and API key, then download the data using the Keras `load_data()` API by running the code in the following cell.  Note that this provides only the `mnist.npz` arrays, not the Kaggle CSV files.

In [None]:
from tensorflow.keras.datasets import mnist
mnist.load_data();

### Download CIFAR10 Data

If you are using Binder to run this notebook, then the data is already downloaded and available.  Skip to the next step.

If you are using Google Colab to run this notebook, then you will need to download the data before proceeding.

#### Download CIFAR10 from Kaggle

Provide your Kaggle username and API key in the cell below and execute the code to download the Kaggle [CIFAR-10 keras files](https://www.kaggle.com/guesejustin/cifar10-keras-files-cifar10load-data) dataset.

In [None]:
%%bash
# NOTE: Replace YOUR_USERNAME and YOUR_API_KEY with actual credentials
export KAGGLE_USERNAME="YOUR_USERNAME"
export KAGGLE_KEY="YOUR_API_KEY"
# https://www.kaggle.com/guesejustin/cifar10-keras-files-cifar10load-data/download
# This is a Kaggle dataset (not a competition), so it is fetched with `kaggle datasets download -d OWNER/DATASET`
kaggle datasets download -d guesejustin/cifar10-keras-files-cifar10load-data -p ../datasets/cifar10/ --unzip

#### (Alternative) Download CIFAR10 from GitHub

If you are running this notebook using Google Colab, but did *not* create a Kaggle account and API key, then download the data from our GitHub repository by running the code in the following cells.

In [None]:
import os
import pathlib
import requests

#RAW_URL = "https://raw.githubusercontent.com/kaust-vislab/keras-tutorials/master/datasets/cifar10"
RAW_URL = "https://github.com/holstgr-kaust/keras-tutorials/raw/master/datasets/cifar10"
DEST_DIR = pathlib.Path('../datasets/cifar10')

def fetch_cifar10_data():
    DEST_DIR.mkdir(parents=True, exist_ok=True)
    for n in ["cifar-10.npz", "cifar-10-batches-py.tar.gz"]:
        path = DEST_DIR / n
        response = requests.get(RAW_URL + "/" + n)
        response.raise_for_status()  # fail loudly instead of saving an HTTP error page
        with path.open(mode='wb') as f:
            f.write(response.content)

In [None]:
fetch_cifar10_data()

In [None]:
%%bash
DEST_DIR='../datasets/cifar10'
tar xvpf "${DEST_DIR}/cifar-10-batches-py.tar.gz" --directory="${DEST_DIR}" 

#### (Alternative) Download CIFAR10 with Keras

If you are running this notebook using Google Colab, but did *not* create a Kaggle account and API key, then download the data using the Keras `load_data()` API by running the code in the following cell.

In [None]:
from tensorflow.keras.datasets import cifar10
cifar10.load_data();

## Tutorial

### Python Initialization

Import the modules we will use.

In [None]:
import os
import pathlib
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
import tensorflow as tf
import tensorflow.keras as keras

`%matplotlib inline` enables plots to appear in cell output.

`%matplotlib notebook` enables semi-interactive plots that can be enlarged, zoomed, and cropped while the plot is active.  One issue with this option is that new plots appear in the active plot widget, not in the cell where the data was produced.

In [None]:
%matplotlib inline

In [None]:
print("executing_eagerly:", tf.executing_eagerly())
# tf.test.is_gpu_available() is deprecated; list the visible GPU devices instead
print("GPU devices:", tf.config.list_physical_devices('GPU'), tf.test.gpu_device_name())

### Dataset Pre-processing - MNIST

The previously acquired MNIST dataset is the essential input needed to train an image classification model. Before using the dataset, there are several preprocessing steps required to load the data, and create the correctly sized training, validation, and testing arrays used as input to the network.

The following data preparation steps are needed before the data can become inputs to the network:

* Cache the downloaded dataset (to use Keras `load_data()` functionality).
* Load the dataset (MNIST is small, and fits in memory).
    * Convert from textual CSV files into binary tensor arrays (https://www.tensorflow.org/tutorials/load_data/csv).
    * Reshape from (784,) to (28, 28, 1), then to (32, 32, 3).
* Verify the shape and type of the data, and understand it...
* Convert label indices into categorical vectors.
* Convert image data from integer to float values, and normalize.
  * Verify converted input data.
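The reshaping step in the list above can be sketched with NumPy. This is a minimal sketch, assuming each Kaggle CSV row holds 784 grayscale pixel values; it zero-pads each 28x28 image by 2 pixels on every side to reach 32x32 (the smallest size the VGG16 base used later accepts) and repeats the single channel three times. The helper `mnist_row_to_rgb32` is hypothetical, and padding (rather than resizing) is just one reasonable choice.

```python
import numpy as np

def mnist_row_to_rgb32(pixels):
    """Hypothetical helper: flat MNIST rows (n, 784) -> RGB-shaped images (n, 32, 32, 3)."""
    pixels = np.asarray(pixels)
    images = pixels.reshape(-1, 28, 28, 1)                     # (n, 784) -> (n, 28, 28, 1)
    images = np.pad(images, ((0, 0), (2, 2), (2, 2), (0, 0)))  # zero border -> (n, 32, 32, 1)
    return np.repeat(images, 3, axis=-1)                       # copy grey channel -> (n, 32, 32, 3)

rows = np.random.randint(0, 256, size=(5, 784), dtype=np.uint8)  # made-up pixel rows
rgb = mnist_row_to_rgb32(rows)
print(rgb.shape, rgb.dtype)  # (5, 32, 32, 3) uint8
```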

#### Cache Data

Make downloaded data available to Keras.  Provide dataset utility functions.

In [None]:
# Cache MNIST Datasets

for n in ["mnist.npz", "kaggle/train.csv", "kaggle/test.csv"]:
    #DATA_URL = "https://github.com/holstgr-kaust/keras-tutorials/raw/master/datasets/mnist/%s" % n
    # Build a well-formed file:// URI for the local copy (as_uri() needs an absolute path)
    DATA_URL = pathlib.Path("../datasets/mnist/%s" % n).resolve().as_uri()
    # Flatten e.g. 'kaggle/train.csv' into the cache file name 'kaggle-mnist-train.csv'
    data_file_path = tf.keras.utils.get_file(n.replace('/','-mnist-'), DATA_URL)
    print("cached file: %s" % n)

In [None]:
%%bash
find ~/.keras -name "*mnist*" -type f

In [None]:
def get_csv_dataset(file_path, **kwargs):
    dataset = tf.data.experimental.make_csv_dataset(
        file_path,
        batch_size=5, # Artificially small to make examples easier to show.
        label_name='label',
        na_value="?",
        num_epochs=1,
        ignore_errors=True, 
        **kwargs)
    return dataset

def pack(features, label):
    return tf.stack(list(features.values()), axis=-1), label
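For readers new to `tf.data`, the NumPy analogue below may make `pack` easier to follow. It assumes, as `make_csv_dataset` does, that each batch arrives as a dict mapping a CSV column name to a 1-D array with one value per row; the column names `pixel0` through `pixel783` follow the Kaggle MNIST CSV header, and the batch itself is made up. `pack_numpy` is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np

def pack_numpy(features, label):
    """NumPy analogue of `pack`: stack per-column arrays into one (batch, columns) matrix."""
    # dicts preserve insertion order, so the column order follows the CSV header
    return np.stack(list(features.values()), axis=-1), label

# Made-up batch of 5 rows: columns 'pixel0'..'pixel783', each a length-5 array
demo_features = {f"pixel{i}": np.full(5, i, dtype=np.int32) for i in range(784)}
demo_labels = np.arange(5)

packed, packed_labels = pack_numpy(demo_features, demo_labels)
print(packed.shape)  # (5, 784)
```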

In [None]:
def show_batch(dataset):
    for batch, label in dataset.take(1):
        print("{:20s}: {} :: {}".format('label', label, type(label)))
        for key, value in batch.items():
            print("{:20s}: {} :: {}".format(key, value.numpy(), type(value)))

def show_packed(packed_dataset):
    for features, labels in packed_dataset.take(1):
        print(features.numpy())
        print()
        print(labels.numpy())

#### Load Data

In [None]:
train_file_path = "../datasets/mnist/kaggle/train.csv"
test_file_path = "../datasets/mnist/kaggle/test.csv"

raw_train_data = get_csv_dataset(train_file_path)
packed_train_data = raw_train_data.map(pack)
train_data = packed_train_data.shuffle(500)

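# NOTE: the Kaggle test.csv is unlabelled (it has no 'label' column), so
#       get_csv_dataset(), which sets label_name='label', cannot load it as-is.
#raw_test_data = get_csv_dataset(test_file_path)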

(Alternative) Load data via Keras API.  This loads data into a `numpy` array, and the test examples are labelled.

In [None]:
# TODO: complete example and convert numpy array into Dataset
from tensorflow.keras.datasets import mnist

# The data, split between train and test sets:
(x_train_mnist, y_train_mnist), (x_test_mnist, y_test_mnist) = mnist.load_data()

#(x_train_mnist, y_train_mnist, x_test_mnist, y_test_mnist)

**TODO: Modify Explore Data examples to use Dataset, move into Explore Data section**

In [None]:
print('x_train type:', type(x_train_mnist), ',', 'y_train type:', type(y_train_mnist))
print('x_train dtype:', x_train_mnist.dtype, ',', 'y_train dtype:', y_train_mnist.dtype)
print('x_train shape:', x_train_mnist.shape, ',', 'y_train shape:', y_train_mnist.shape)
print('x_test shape:', x_test_mnist.shape, ',', 'y_test shape:', y_test_mnist.shape)
print(x_train_mnist.shape[0], 'train samples')
print(x_test_mnist.shape[0], 'test samples')

In [None]:
print('x_train (min, max, mean): (%s, %s, %s)' % (x_train_mnist.min(), x_train_mnist.max(), x_train_mnist.mean()))
print('y_train (min, max): (%s, %s)' % (y_train_mnist.min(), y_train_mnist.max()))

In [None]:
# Show array of random labelled images with matplotlib (re-run cell to see new examples)

fig = plt.figure(figsize=(16,8))

for i in range(40):
    plt.subplot(4, 10, i + 1)
    plt.xticks([])
    plt.yticks([])
    idx = int(random.uniform(0, x_train_mnist.shape[0]))
    plt.title(y_train_mnist[idx])
    plt.imshow(x_train_mnist[idx], cmap=plt.get_cmap('gray'))
plt.show()

In [None]:
hist, bins = np.histogram(y_train_mnist, bins = range(y_train_mnist.min(), y_train_mnist.max() + 2))

fig = plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.hist(y_train_mnist, bins = range(y_train_mnist.min(), y_train_mnist.max() + 2))
plt.xticks(range(y_train_mnist.min(), y_train_mnist.max() + 2))
plt.title("y_train histogram")
plt.subplot(1,2,2)
# Cast max() to int first: uint8 255 + 2 would overflow and leave a single histogram bin
plt.hist(x_train_mnist.flat, bins = range(x_train_mnist.min(), int(x_train_mnist.max()) + 2))
plt.title("x_train histogram")
plt.tight_layout()
plt.show()

print('y_train histogram counts:', hist)

#### Explore Data

Explore data types, shape, and value ranges.  Ensure they make sense and that you understand the data well.

In [None]:
show_batch(raw_train_data)

In [None]:
show_packed(packed_train_data)

In [None]:
packed_train_data

### CIFAR10 - Dataset Processing

The previously acquired CIFAR10 dataset is the essential input needed to train an image classification model. Before using the dataset, there are several preprocessing steps required to load the data, and create the correctly sized training, validation, and testing arrays used as input to the network.

The following data preparation steps are needed before the data can become inputs to the network:

* Cache the downloaded dataset (to use Keras `load_data()` functionality).
* Load the dataset (CIFAR10 is small, and fits into a `numpy` array).
* Verify the shape and type of the data, and understand it...
* Convert label indices into categorical vectors.
* Convert image data from integer to float values, and normalize.
  * Verify converted input data.

#### Cache Data

Make downloaded data available to Keras.  Provide dataset utility functions.

In [None]:
# Cache CIFAR10 Datasets

for n in ["cifar-10.npz", "cifar-10-batches-py.tar.gz"]:
    #DATA_URL = "https://github.com/holstgr-kaust/keras-tutorials/raw/master/datasets/cifar10/%s" % n
    # Build a well-formed file:// URI for the local copy (as_uri() needs an absolute path)
    DATA_URL = pathlib.Path("../datasets/cifar10/%s" % n).resolve().as_uri()
    data_file_path = tf.keras.utils.get_file(n, DATA_URL)
    print("cached file: %s" % n)

In [None]:
%%bash
find ~/.keras -name "cifar-10*" -type f

In [None]:
# Helper functionality to provide human-readable labels
cifar10_label_names = ['airplane', 'automobile', 
                       'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 
                       'ship', 'truck']

def cifar10_index_label(idx):
    return cifar10_label_names[idx]

def cifar10_category_label(cat):
    return cifar10_index_label(cat.argmax())

def cifar10_label(v):
    return cifar10_index_label(v) if np.isscalar(v) else cifar10_category_label(v)

#### Load Data

In [None]:
from tensorflow.keras.datasets import cifar10

# The data, split between train and test sets:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

#### Explore Data

Explore data types, shape, and value ranges.  Ensure they make sense and that you understand the data well.

In [None]:
print('x_train type:', type(x_train), ',', 'y_train type:', type(y_train))
print('x_train dtype:', x_train.dtype, ',', 'y_train dtype:', y_train.dtype)
print('x_train shape:', x_train.shape, ',', 'y_train shape:', y_train.shape)
print('x_test shape:', x_test.shape, ',', 'y_test shape:', y_test.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

In [None]:
print('x_train (min, max, mean): (%s, %s, %s)' % (x_train.min(), x_train.max(), x_train.mean()))
print('y_train (min, max): (%s, %s)' % (y_train.min(), y_train.max()))

In [None]:
# Show array of random labelled images with matplotlib (re-run cell to see new examples)

fig = plt.figure(figsize=(16,8))

for i in range(40):
    plt.subplot(4, 10, i + 1)
    plt.xticks([])
    plt.yticks([])
    idx = int(random.uniform(0, x_train.shape[0]))
    plt.title(cifar10_label(y_train[idx][0]))
    plt.imshow(x_train[idx], cmap=plt.get_cmap('gray'))
plt.show()

In [None]:
hist, bins = np.histogram(y_train, bins = range(y_train.min(), y_train.max() + 2))

fig = plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.hist(y_train, bins = range(y_train.min(), y_train.max() + 2))
plt.xticks(range(y_train.min(), y_train.max() + 2))
plt.title("y_train histogram")
plt.subplot(1,2,2)
# Cast max() to int first: uint8 255 + 2 would overflow and leave a single histogram bin
plt.hist(x_train.flat, bins = range(x_train.min(), int(x_train.max()) + 2))
plt.title("x_train histogram")
plt.tight_layout()
plt.show()

print('y_train histogram counts:', hist)

The data looks reasonable: every category has the same number of examples (5,000 each in `y_train`), and pixel values are spread across the full 0 to 255 range.

#### Data Conversion

However, the training data has type `uint8`, while the network's input type will be `float32`, so the data must be converted.  The data should also be normalized (scaled from the 0 to 255 pixel range to 0.0 to 1.0), and the labels need to be categorical.  That is, instead of each label being one of 10 integer values in a 1-D space, it must become a vector in a 10-D space: one dimension per category, with a 1 in the dimension of the category the example belongs to and a 0 in every other dimension (one-hot encoding).

* https://keras.io/examples/cifar10_cnn/
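Both conversions can be sketched in plain NumPy, which may help show what `keras.utils.to_categorical` and the `/= 255` step produce. This is an illustrative equivalent, assuming integer class labels that start at 0, not Keras's own implementation; `one_hot` is a hypothetical helper and the sample values are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: integer labels, shape (n,) or (n, 1), -> 0/1 float32 rows (n, num_classes)."""
    labels = np.asarray(labels).reshape(-1)         # CIFAR-10 labels arrive with shape (n, 1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0   # a single 1 per row, at the label's index
    return encoded

sample_labels = np.array([[3], [0], [9]], dtype=np.uint8)
encoded = one_hot(sample_labels, 10)
print(encoded[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]

# Normalize uint8 pixels in [0, 255] to float32 values in [0.0, 1.0]
sample_pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = sample_pixels.astype('float32') / 255
print(scaled)
```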

In [None]:
num_classes = (y_train.max() - y_train.min()) + 1
print('num_classes =', num_classes)

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

In [None]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

train_data = (x_train, y_train)
test_data = (x_test, y_test)

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator


'''
datagen = ImageDataGenerator(
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    zca_epsilon=1e-06,  # epsilon for ZCA whitening
    rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
    # randomly shift images horizontally (fraction of total width)
    width_shift_range=0.1,
    # randomly shift images vertically (fraction of total height)
    height_shift_range=0.1,
    shear_range=0.,  # set range for random shear
    zoom_range=0.,  # set range for random zoom
    channel_shift_range=0.,  # set range for random channel shifts
    # set mode for filling points outside the input boundaries
    fill_mode='nearest',
    cval=0.,  # value used for fill_mode = "constant"
    horizontal_flip=True,  # randomly flip images
    vertical_flip=False,  # randomly flip images
    # set rescaling factor (applied before any other transformation)
    rescale=None,
    # set function that will be applied on each input
    preprocessing_function=None,
    # image data format, either "channels_first" or "channels_last"
    data_format=None,
    # fraction of images reserved for validation (strictly between 0 and 1)
    validation_split=0.0)

# Compute quantities required for feature-wise normalization
# (std, mean, and principal components if ZCA whitening is applied).
datagen.fit(x_train)
''';

In [None]:
print('x_train type:', type(x_train))
print('x_train dtype:', x_train.dtype)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('y_train type:', type(y_train))
print('y_train dtype:', y_train.dtype)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

## Acquire Pre-Trained Network

Download an *ImageNet* pretrained VGG16 network[<sup>1</sup>](#fn1), sans classification layer, shaped for 32x32px colour images<sup>[*](https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5)</sup> (the smallest supported size).  This image-feature detection network is an example of a deep CNN (Convolutional Neural Network).

**Note:** The network must be frozen (not trainable): it was already trained on a very large dataset, so updating its weights on our much smaller dataset could cause it to un-learn valuable generic features.

<span id="fn1"><sup>[1]</sup> *Very Deep Convolutional Networks for Large-Scale Image Recognition* by Karen Simonyan and Andrew Zisserman, [arXiv (2014)](https://arxiv.org/abs/1409.1556).</span>

In [None]:
from tensorflow.keras.applications import VGG16

conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
conv_base.trainable = False
conv_base.summary()

The input layer shape and data type should match the input data:

*Note:* The first dimension of the shape will differ: the input layer shows `None` there to indicate that it accepts a batch of any size, made up of arrays with the remaining shape, while the data shape gives the number of samples it contains.

In [None]:
print("input layer shape:", conv_base.layers[0].input.shape)
print("input layer dtype:", conv_base.layers[0].input.dtype)
conv_base.layers[0].input

In [None]:
print("input data shape:", x_train.shape)
print("input data dtype:", x_train.dtype)
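The shape comparison above can also be done programmatically. The sketch below treats `None` as a wildcard for the batch dimension; `shapes_compatible` is a hypothetical helper, and the shapes in the usage lines are written out by hand (CIFAR-10 training data of shape `(50000, 32, 32, 3)` against an input layer of `(None, 32, 32, 3)`) rather than read from the model.

```python
def shapes_compatible(layer_shape, data_shape):
    """Hypothetical helper: True if data_shape fits layer_shape, where None matches any size."""
    if len(layer_shape) != len(data_shape):
        return False
    return all(expected is None or expected == actual
               for expected, actual in zip(layer_shape, data_shape))

print(shapes_compatible((None, 32, 32, 3), (50000, 32, 32, 3)))  # True
print(shapes_compatible((None, 32, 32, 3), (60000, 28, 28)))     # False: rank differs
```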

### Explore Convolutional Layers

In [None]:
def train_image_plot(image_index=None):
    # Pick a random training image unless an index is given (0 is a valid index)
    if image_index is None:
        image_index = random.randrange(x_train.shape[0])

    plt.imshow(x_train[image_index], cmap='gray')
    plt.title("%s (%s)" % (cifar10_label(y_train[image_index]), image_index))
    plt.show()
    
    return image_index

def get_model_layer(model, layer_name):
    if type(layer_name) == str:
        layer = model.get_layer(layer_name)
    else:
        m = model
        for ln in layer_name:
            model = m
            m = m.get_layer(ln)
        layer = m
    return (model, layer)

In [None]:
def visualize_conv_layer_weights(model, layer_name):
    (model, layer) = get_model_layer(model, layer_name)
    layer_weights = layer.weights[0]

    max_size = layer_weights.shape[3]
    col_size = 12
    row_size = int(np.ceil(float(max_size) / float(col_size)))

    print("conv layer: %s shape: %s size: (%s,%s) count: %s" % 
          (layer_name,
           layer_weights.shape,
           layer_weights.shape[0], layer_weights.shape[1],
           max_size))

    fig, ax = plt.subplots(row_size,col_size,figsize=(12, 1.2 * row_size))
    idx = 0

    for row in range(0,row_size):
        for col in range(0,col_size):
            ax[row][col].set_xticks([])
            ax[row][col].set_yticks([])
            if idx < max_size:
                ax[row][col].imshow(layer_weights[:, :, 0, idx], cmap='gray')
            else:
                fig.delaxes(ax[row][col])
            idx += 1

    plt.tight_layout()
    plt.show()

In [None]:
def visualize_conv_layer_output(model, layer_name, image_index=None):
    (model, layer) = get_model_layer(model, layer_name)
    layer_output = layer.output

    if image_index is None:
        image_index = train_image_plot()
        
    intermediate_model = keras.models.Model(inputs = model.input, outputs=layer_output) 
    intermediate_prediction = intermediate_model.predict(x_train[image_index].reshape(1,32,32,3))
  
    max_size = layer_output.shape[3]
    col_size = 10
    row_size = int(np.ceil(float(max_size) / float(col_size)))

    print("conv layer: %s shape: %s size: (%s,%s) count: %s" % 
          (layer_name,
           layer_output.shape,
           layer_output.shape[1], layer_output.shape[2],
           max_size))
    
    fig, ax = plt.subplots(row_size,col_size,figsize=(12, 1.2 * row_size))
    idx = 0

    for row in range(0,row_size):
        for col in range(0,col_size):
            ax[row][col].set_xticks([])
            ax[row][col].set_yticks([])
            if idx < max_size:
                ax[row][col].imshow(intermediate_prediction[0, :, :, idx], cmap='gray')
            else:
                fig.delaxes(ax[row][col])
            idx += 1

    plt.tight_layout()
    plt.show()

In [None]:
def generate_response_pattern(model, conv_layer_output, filter_index=0):
    """Gradient ascent on the input image to find a pattern that maximizes one filter's response."""
    epsilon = 1e-5

    def process_image(x):
        # Normalize: center on 0, scale to std 0.1, shift to mean 0.5, clip to [0, 1],
        # then convert to 8-bit pixel values in [0, 255]
        x -= x.mean()
        x /= (x.std() + epsilon)
        x *= 0.1
        x += 0.5
        x = np.clip(x, 0, 1)
        x *= 255
        x = np.clip(x, 0, 255).astype('uint8')
        return x

    # TODO: is this required?
    #with tf.device('/gpu:0'):
    # Start from a noisy grey image; as a trainable tf.Variable it is watched by GradientTape
    img_tensor = tf.Variable(tf.random.uniform((1, 32, 32, 3)) * 20 + 128.0, trainable=True)

    # Sub-model mapping the network input to the selected conv layer's output
    response_model = keras.models.Model(model.inputs, conv_layer_output)

    for _ in range(40):
        with tf.GradientTape() as gtape:
            layer_output = response_model(img_tensor)
            loss = tf.reduce_mean(layer_output[0, :, :, filter_index])
        # Compute and RMS-normalize the gradient outside the tape context
        grads = gtape.gradient(loss, img_tensor)
        grads /= (tf.sqrt(tf.reduce_mean(tf.square(grads))) + epsilon)
        # Gradient ascent step (step size 1.0), updating the image in place
        img_tensor.assign_add(grads)

    img = np.array(img_tensor[0])  # writable NumPy copy for process_image
    return process_image(img)

In [None]:
def visualize_conv_layer_response(model, layer_name):
    (model, layer) = get_model_layer(model, layer_name)
    layer_output = layer.output
    
    max_size = layer_output.shape[3]
    col_size = 12
    row_size = int(np.ceil(float(max_size) / float(col_size)))

    print("conv layer: %s shape: %s size: (%s,%s) count: %s" % 
          (layer_name,
           layer_output.shape,
           layer_output.shape[1], layer_output.shape[2],
           max_size))
    
    fig, ax = plt.subplots(row_size,col_size,figsize=(12, 1.2 * row_size))
    idx = 0

    for row in range(0,row_size):
        for col in range(0,col_size):
            ax[row][col].set_xticks([])
            ax[row][col].set_yticks([])
            if idx < max_size:
                img = generate_response_pattern(model, layer_output, idx)
                ax[row][col].imshow(img, cmap='gray')
                ax[row][col].set_title("%s" % idx)
            else:
                fig.delaxes(ax[row][col])
            idx += 1

    plt.tight_layout()
    plt.show()

In [None]:
for n in [l.name for l in conv_base.layers if isinstance(l, keras.layers.Conv2D)][:4]:
    visualize_conv_layer_weights(conv_base, n)

In [None]:
image_index = train_image_plot()
for n in [l.name for l in conv_base.layers if isinstance(l, keras.layers.Conv2D)][:7]:
    visualize_conv_layer_output(conv_base, n, image_index)

In [None]:
for n in [l.name for l in conv_base.layers if isinstance(l, keras.layers.Conv2D)][:4]:
    visualize_conv_layer_response(conv_base, n)

In [None]:
# NOTE: Visualize mid to higher level convolutional layers; 
#       lengthy operation, be prepared to wait...
for n in [l.name for l in conv_base.layers if isinstance(l, keras.layers.Conv2D)][4:]:
    visualize_conv_layer_response(conv_base, n)

### CNN Base + Classifier Model

Create a simple model that has the pre-trained CNN (Convolutional Neural Network) as a base, and adds a basic classifier on top.

Notice the split of total parameters (\~15 million) between trainable (\~0.3 million for our classifier) and non-trainable (\~14.7 million for the pre-trained CNN).
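The parameter split can be checked by hand. This sketch assumes the standard VGG16 convolutional configuration (13 conv layers with 3x3 kernels and 64 to 512 filters, five 2x2 max-pooling stages, and no parameters in pooling layers) plus the classifier built below; a 32x32 input is halved five times to a 1x1x512 feature map, so `Flatten` yields 512 values.

```python
# Standard VGG16 conv stack: (input channels, output channels) of each 3x3 conv layer
vgg16_convs = [(3, 64), (64, 64),
               (64, 128), (128, 128),
               (128, 256), (256, 256), (256, 256),
               (256, 512), (512, 512), (512, 512),
               (512, 512), (512, 512), (512, 512)]

def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out   # kernel weights + one bias per filter

def dense_params(n_in, n_out):
    return n_in * n_out + n_out           # weight matrix + biases

base_params = sum(conv_params(i, o) for i, o in vgg16_convs)

side = 32
for _ in range(5):                        # five 2x2 max-pooling stages
    side //= 2
flat = side * side * 512                  # Flatten output size: 1 * 1 * 512

head_params = dense_params(flat, 512) + dense_params(512, 10)
print(base_params, head_params, base_params + head_params)  # 14714688 267786 14982474
```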

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation, Dropout

model = Sequential()
model.add(conv_base)
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()

### Train Model

In [None]:
batch_size = 128 #32
epochs = 25 #100
learning_rate = 1e-3 #1e-4
decay = 1e-6
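To get a sense of scale for `decay`, the sketch below estimates the learning rate at the end of training. It assumes the legacy Keras time-based rule `lr = learning_rate / (1 + decay * iterations)`, one iteration per batch, and the 50,000-image CIFAR-10 training set; under those assumptions the decay lowers the learning rate by only about 1% over 25 epochs. `time_based_lr` is a hypothetical helper.

```python
import math

def time_based_lr(initial_lr, decay, step):
    """Hypothetical helper: legacy time-based decay, lr = initial_lr / (1 + decay * step)."""
    return initial_lr / (1.0 + decay * step)

steps_per_epoch = math.ceil(50000 / 128)   # 391 batches per epoch (the last one is partial)
total_steps = steps_per_epoch * 25         # 9775 optimizer steps over 25 epochs
print(time_based_lr(1e-3, 1e-6, 0), time_based_lr(1e-3, 1e-6, total_steps))
```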

In [None]:
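from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

# Newer Keras optimizers no longer accept `decay`; an InverseTimeDecay schedule with
# decay_steps=1 reproduces the legacy rule lr / (1 + decay * step)
lr_schedule = InverseTimeDecay(learning_rate, decay_steps=1, decay_rate=decay)

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=lr_schedule),
              metrics=['accuracy'])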

In [None]:
#history = model.fit_generator(train_data,
history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)

### Evaluate Model

Visualize accuracy and loss for training and validation.

* https://keras.io/visualization/

In [None]:
def history_plot(history):
    fig = plt.figure(figsize=(12,5))

    # Plot training & validation loss values (left y-axis)
    ax1 = fig.add_subplot(1, 1, 1)
    ax1.set_title('Model accuracy & loss')
    #ax1.set_ylim(0, 1.1 * max(history.history['loss']+history.history['val_loss']))
    ax1.set_prop_cycle(color=['green', 'red'])
    p1 = ax1.plot(history.history['loss'], label='Train Loss')
    p2 = ax1.plot(history.history['val_loss'], label='Test Loss')

    # Plot training & validation accuracy values (right y-axis)
    ax2 = ax1.twinx()
    ax2.set_ylim(0, 1.1 * max(history.history['accuracy']+history.history['val_accuracy']))
    ax2.set_prop_cycle(color=['blue', 'orange'])
    p3 = ax2.plot(history.history['accuracy'], label='Train Acc')
    p4 = ax2.plot(history.history['val_accuracy'], label='Test Acc')

    ax1.set_ylabel('Loss')
    ax1.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')

    pz = p3 + p4 + p1 + p2
    plt.legend(pz, [l.get_label() for l in pz], loc='center right')
    plt.show()

In [None]:
#history.history.keys()
history_plot(history)

In [None]:
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

In [None]:
def prediction_plot(model, test_data):
    (x_test, y_test) = test_data
    plt.figure(figsize=(16,8))
    correct = 0
    total = 0

    for i in range(40):
        plt.subplot(4, 10, i + 1)
        plt.xticks([])
        plt.yticks([])
        idx = random.randrange(x_test.shape[0])
        # Predicted class = index of the largest softmax output
        # (predict_classes is not available in newer TensorFlow releases)
        result = np.argmax(model.predict(x_test[idx:idx+1]), axis=-1)[0]
        rCorrect = cifar10_label(y_test[idx]) == cifar10_label(result)
        rSym = '✔' if rCorrect else '✘'
        correct += int(rCorrect)
        total += 1
        plt.title("%s %s" % (rSym, cifar10_label(result)))
        plt.imshow(x_test[idx], cmap=plt.get_cmap('gray'))
    plt.show()

    print("% 3.2f%% correct (%s/%s)" % (100.0 * correct / total, correct, total))
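`model.predict_classes()` and `model.predict_proba()` are not available on models in newer TensorFlow releases. For a softmax classifier, both reduce to `model.predict()` followed by NumPy, as this sketch shows. Here, `probs` stands in for the output of `model.predict(x_test)` and `labels` stands in for the one-hot `y_test`. Both are small made-up arrays, and the helper names are hypothetical.

```python
import numpy as np

def classes_from_probs(probs):
    # Same result as predict_classes: index of the largest softmax output per row
    return np.argmax(probs, axis=-1)

def accuracy(probs, one_hot_labels):
    # Fraction of rows whose predicted class matches the one-hot label
    return float(np.mean(classes_from_probs(probs) == np.argmax(one_hot_labels, axis=-1)))

# Toy softmax outputs for 3 images and 4 classes (made-up values, for illustration only)
probs = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.1]])
labels = np.eye(4)[[1, 0, 3]]  # one-hot labels for classes 1, 0, 3

print(classes_from_probs(probs))  # [1 0 2]
print(accuracy(probs, labels))    # 0.6666666666666666
```

For the probability bar charts, `model.predict(x)[0] * 100` plays the role that `predict_proba` did.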

In [None]:
def prediction_proba_plot(model, test_data):
    (x_test, y_test) = test_data
    plt.figure(figsize=(15,15))

    for i in range(10):
        plt.subplot(10, 2, (2*i) + 1)
        plt.xticks([])
        plt.yticks([])
        idx = random.randrange(x_test.shape[0])
        # Softmax outputs as percentages (predict_proba is not available in newer TensorFlow releases)
        result = model.predict(x_test[idx:idx+1])[0] * 100
        plt.title("%s (%s)" % (cifar10_label(y_test[idx]), idx))
        plt.imshow(x_test[idx], cmap=plt.get_cmap('gray'))

        ax = plt.subplot(10, 2, (2*i) + 2)
        plt.bar(np.arange(len(result)), result, label='%')
        # One tick per bar, labelled with its class name
        ax.set_xticks(np.arange(len(result)))
        ax.set_xticklabels(cifar10_label_names)
        plt.title("classifier probabilities")

    plt.tight_layout()
    plt.show()

In [None]:
prediction_plot(model, (x_test, y_test))

In [None]:
prediction_proba_plot(model, (x_test, y_test))

### CNN Classifier Model

Create a basic CNN (Convolutional Neural Network) based classifier from scratch.

Notice that every parameter is trainable here because there is no frozen pre-trained base. For 32×32×3 CIFAR-10 images, the network has \~1.25 million parameters, most of them (\~1.18 million) in the first `Dense(512)` layer.
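The sketch below tracks the feature-map shape and parameter count layer by layer. It assumes 32×32×3 inputs (`x_train.shape[1:]`) and `num_classes = 10`, and the helper functions are hypothetical. `Activation` and `Dropout` layers are skipped because they have no parameters.

```python
def conv2d(shape, filters, kernel=3, padding='valid'):
    # Output shape and parameter count of a stride-1 Conv2D layer
    h, w, c = shape
    if padding == 'valid':
        h, w = h - kernel + 1, w - kernel + 1  # 'same' keeps the spatial size
    return (h, w, filters), kernel * kernel * c * filters + filters

def max_pool(shape, pool=2):
    # Non-overlapping pooling floors odd sizes (e.g. 13 -> 6); no parameters
    h, w, c = shape
    return (h // pool, w // pool, c), 0

def dense(n_in, units):
    # Output size and parameter count of a Dense layer
    return units, n_in * units + units

feature_layers = [('conv', 32, 'same'), ('conv', 32, 'valid'), ('pool',),
                  ('conv', 64, 'same'), ('conv', 64, 'valid'), ('pool',)]

shape = (32, 32, 3)  # assumed CIFAR-10 input shape
total = 0
for spec in feature_layers:
    if spec[0] == 'conv':
        shape, n = conv2d(shape, spec[1], padding=spec[2])
    else:
        shape, n = max_pool(shape)
    total += n
    print(spec[0], shape, f"{n:,}")

# Flatten, then Dense(512) and Dense(num_classes)
units = shape[0] * shape[1] * shape[2]
for out_units in (512, 10):
    units, n = dense(units, out_units)
    total += n
    print('dense', units, f"{n:,}")

print(f"total trainable parameters: {total:,}")  # 1,250,858
```

The two unpadded convolutions and the pooling layers shrink the 32×32 image to a 6×6×64 feature map, which `Flatten` turns into 2,304 inputs for `Dense(512)`.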

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation, Dropout, Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same',
                 input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.summary()

In [None]:
batch_size = 128 #32
epochs = 25 #100
learning_rate = 1e-3 #1e-4
decay = 1e-6

In [None]:
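from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers.schedules import InverseTimeDecay

# lr = learning_rate / (1 + decay * step): time-based decay expressed as a schedule,
# since newer optimizers no longer accept a `decay` argument
lr_schedule = InverseTimeDecay(learning_rate, decay_steps=1, decay_rate=decay)

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=lr_schedule),
              metrics=['accuracy'])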

In [None]:
#history = model.fit_generator(train_data,
history = model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test),
          shuffle=True)

In [None]:
history_plot(history)

In [None]:
# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

In [None]:
prediction_plot(model, (x_test, y_test))

In [None]:
prediction_proba_plot(model, (x_test, y_test))

In [None]:
for n in [l.name for l in model.layers if isinstance(l, keras.layers.Conv2D)][:4]:
    visualize_conv_layer_weights(model, n)

In [None]:
image_index = train_image_plot()
for n in [l.name for l in model.layers if isinstance(l, keras.layers.Conv2D)]:
    visualize_conv_layer_output(model, n, image_index)

In [None]:
for n in [l.name for l in model.layers if isinstance(l, keras.layers.Conv2D)][:4]:
    visualize_conv_layer_response(model, n)

TODO: In progress...

In [None]:
import seaborn as sns

print(history.history.keys())
max(history.history['val_loss']+history.history['loss'])

In [None]:
train_data

TODO: Data Augmentation

TODO: Resize Data to (150, 150, 3)

TODO: Create joined model (combine two models above)

TODO: Mixed Precision
* https://www.tensorflow.org/guide/keras/mixed_precision
* https://developer.nvidia.com/automatic-mixed-precision
* https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html

```python
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt) 
```

TODO: Multi-GPU Example

In [None]:
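# Multi-GPU Example
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Activation, Dropout
from tensorflow.keras.optimizers import RMSprop

# Replicate the model on all visible GPUs; each batch is split across the replicas
strategy = tf.distribute.MirroredStrategy()
num_replicas = strategy.num_replicas_in_sync

# Scale global batch size and learning rate linearly with the number of replicas
# (e.g. 2 GPUs: batch size 64, learning rate 4e-5)
batch_size = num_replicas * 32
epochs = 100
learning_rate = num_replicas * 2e-5

with strategy.scope():
    # Model and optimizer variables must be created inside the strategy scope
    conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
    conv_base.trainable = False

    model = Sequential()
    model.add(conv_base)
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    # 'accuracy' yields the history keys ('accuracy', 'val_accuracy') that history_plot() expects
    model.compile(loss='categorical_crossentropy',
                  optimizer=RMSprop(learning_rate=learning_rate),
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(x_test, y_test),
                        shuffle=True)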

#conv_base.summary()
#model.summary()

## Speaker Bios

Glendon Holst is a Staff Scientist in the Visualization Core Lab at KAUST (King Abdullah University of Science and Technology) specializing in HPC workflow solutions for deep learning, image processing, and scientific visualization.

Mohsin Ahmed Shaikh is a Computational Scientist in the Supercomputing Core Lab at KAUST (King Abdullah University of Science and Technology) specializing in large-scale HPC applications and GPGPU support for users on Ibex (cluster) and Shaheen (supercomputer).  Mohsin holds a PhD in Computational Bioengineering from the University of Canterbury, New Zealand, where he also completed a postdoc.

## References

* https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
* https://www.cs.toronto.edu/~kriz/cifar.html
* http://yann.lecun.com/exdb/mnist/index.html
* https://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
* https://towardsdatascience.com/keras-transfer-learning-for-beginners-6c9b8b7143e
* https://machinelearningmastery.com/how-to-improve-performance-with-transfer-learning-for-deep-learning-neural-networks/
* https://arxiv.org/abs/1409.1556
* https://www.kaggle.com/c/digit-recognizer
* https://jupyter-notebook.readthedocs.io/en/stable/
* https://github.com/kaust-vislab/handson-ml2
* https://keras.io/examples/cifar10_cnn/

# TESTING / In-Progress

In [None]:
#x0 = tf.get_variable('x0', shape=(), dtype=tf.float32)
#x1 = tf.constant(3.)
#x = x0 + x1
x = tf.constant(3.0)
y = tf.constant(4.0)

img0 = tf.Variable(tf.zeros((1, 32, 32, 3)), trainable=True)
img = tf.Variable(tf.random.uniform((1, 32, 32, 3)) * 20 + 128.0, trainable=True)

gtape = tf.GradientTape(persistent=True)
with gtape as tape:
    tape.watch([img0, img, m.input, l_out, m.output, model_bp.input, model_bp.output])
    r0 = m.call(img0)
    print('img0', K.mean(img0), type(img0), img0.shape, img0.dtype)
    print('r0', K.mean(r0), type(r0), r0.shape, r0.dtype)
    print('l_out', K.mean(l_out), type(l_out), l_out.shape, l_out.dtype)
    r1 = m.call(img)
    print('img', K.mean(img), type(img), img.shape, img.dtype)
    print('r1', K.mean(r1), type(r1), r1.shape, r1.dtype)
    print('l_out', K.mean(l_out), type(l_out), l_out.shape, l_out.dtype)
    r2 = model_bp(img)
    print('r2', K.mean(r2), type(r2), r2.shape, r2.dtype)
    print('l_out', K.mean(l_out), type(l_out), l_out.shape, l_out.dtype)
    #m.input(img)
    #m.layers[0](img)
    loss = K.mean(l_out[0, :, :, 0])
    #r = m.predict(img)
    z = img + img


gradzxy = tape.gradient(l_out, img)
print("gradzxy", gradzxy)

loss = K.mean(l_out[0, :, :, 0])
#grads = tape.gradient(loss, m.input)
#grads = tape.gradient(l_out, m.input)
print("m.input:", m.input[0,:,:,:])
print("loss:", loss, loss.shape, loss.dtype)
#print("grads", grads)
#gtape.gradient?
#conv_base_bp.trainable_variables
#conv_base.variables
print(m.layers[0])
print(m.input)
m.layers[0](img)
l_out[0, :, :, 0]
loss

* https://stackoverflow.com/questions/58322147/how-to-generate-cnn-heatmaps-using-built-in-keras-in-tf2-0-tf-keras
* https://gist.github.com/haimat/10a53ad9675f8f5ac1290f06c3e4f973

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import load_model

from tensorflow.keras import preprocessing
from tensorflow.keras import backend as K
from tensorflow.keras import models

import tensorflow as tf
import numpy as np

image_size = 32

# Load the pre-trained VGG16 convolutional base (a random image below stands in for a real one)
#model = tf.keras.applications.vgg16.VGG16()
model = VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))

img_tensor = tf.Variable(tf.random.uniform((1, 32, 32, 3)) * 20 + 128.0, trainable=True)
print(type(img_tensor), img_tensor.shape, img_tensor.dtype)

conv_layer = model.get_layer("block5_conv3")
heatmap_model = models.Model([model.inputs], [conv_layer.output, model.output])

# Get gradient of the winner class w.r.t. the output of the (last) conv. layer
with tf.GradientTape() as gtape:
    conv_output, predictions = heatmap_model(img_tensor)
    print(type(conv_output), type(predictions))
    print(conv_output.shape, predictions.shape, np.argmax(predictions[0]))
    loss = predictions[:, :, :, np.argmax(predictions[0])]
    grads = gtape.gradient(loss, conv_output)
    #grads = gtape.gradient(loss, img_tensor)
    #grads = gtape.gradient(conv_output, img_tensor)
    pooled_grads = K.mean(grads, axis=(0, 1, 2))

print("grads", grads)

heatmap = tf.reduce_mean(tf.multiply(pooled_grads, conv_output), axis=-1)
heatmap = np.maximum(heatmap, 0)
max_heat = np.max(heatmap)
if max_heat == 0:
    max_heat = 1e-10
heatmap /= max_heat

print(heatmap.shape)
print(heatmap)

In [None]:
def process_image(x, epsilon=1e-5):
    # Normalize the tensor: center on 0 and scale to std 0.1 (without modifying the caller's array)
    x = x - x.mean()
    x /= (x.std() + epsilon)
    x *= 0.1
    # Shift to 0.5 and clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)
    # Convert to an 8-bit RGB array
    x *= 255
    x = np.clip(x, 0, 255).astype('uint8')
    return x

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications import vgg16
from tensorflow.keras import preprocessing
from tensorflow.keras import backend as K
from tensorflow.keras import optimizers
from tensorflow.keras import layers
from tensorflow.keras import models

import tensorflow as tf
import numpy as np

def _watch_layer(layer, tape):
    def decorator(func):
        def wrapper(*args, **kwargs):
            # Store the result of `layer.call` internally.
            layer.result = func(*args, **kwargs)
            # From this point onwards, watch this tensor.
            tape.watch(layer.result)
            # Return the result to continue with the forward pass.
            return layer.result

        return wrapper

    layer.call = decorator(layer.call)
    return layer


def _deprocess_tensor_to_image(tensor):
    # Normalize the tensor: center on 0 and scale to std 0.1 (without modifying the caller's array)
    tensor = tensor - tensor.mean()
    tensor /= tensor.std() + 1e-5
    tensor *= 0.1

    # Clip to [0,1]
    tensor += 0.5
    tensor = np.clip(tensor, 0, 1)

    # Converts to an RGB array
    tensor *= 255
    tensor = np.clip(tensor, 0, 255).astype("uint8")

    return tensor


image_size = 150
image_path = "/tmp/images/test-image.jpg"

# Create Keras model from pre-trained VGG16 and custom classifier
input_layer = layers.Input(shape=(image_size, image_size, 3), name="model_input")
vgg16_model = VGG16(weights="imagenet", include_top=False, input_tensor=input_layer)
model_head = vgg16_model.output
model_head = layers.Flatten(name="model_head_flatten")(model_head)
model_head = layers.Dense(256, activation="relu")(model_head)
model_head = layers.Dense(3, activation="softmax")(model_head)
model = models.Model(inputs=input_layer, outputs=model_head)
model.compile(loss="categorical_crossentropy", optimizer=optimizers.Adam(), metrics=["accuracy"])

# Load image to classify
image = preprocessing.image.load_img(image_path, target_size=(image_size, image_size))
img_tensor = preprocessing.image.img_to_array(image)
img_tensor = np.expand_dims(img_tensor, axis=0)
img_tensor = vgg16.preprocess_input(img_tensor)

# Get the gradient of the winner class with regard to the output of the (last) conv. layer.
# A sub-model returning both the conv. layer output and the predictions lets the tape record
# the forward pass eagerly (model.predict() returns NumPy arrays, which cannot be differentiated,
# and model.output / conv_layer.output are symbolic tensors).
conv_layer = model.get_layer("block5_conv3")
grad_model = models.Model(inputs=model.inputs, outputs=[conv_layer.output, model.output])
with tf.GradientTape() as gtape:
    conv_output, preds = grad_model(img_tensor)
    winner = int(np.argmax(preds[0]))
    model_prediction = preds[:, winner]
grads = gtape.gradient(model_prediction, conv_output)

pooled_grads = K.mean(grads, axis=(0, 1, 2))

# Get values of pooled grads and model conv. layer output as Numpy arrays
pooled_grads_value = pooled_grads.numpy()
conv_layer_output_value = conv_output[0].numpy()

# Multiply each channel in the feature-map array by "how important this channel is"
for i in range(pooled_grads_value.shape[0]):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]

# The channel-wise mean of the resulting feature map is the heatmap of the class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
heatmap = np.maximum(heatmap, 0)
max_heat = np.max(heatmap)
if max_heat == 0:
    max_heat = 1e-10
heatmap /= max_heat

'''
import cv2

# Load image via CV2
image = cv2.imread(image_path)

# Resize heatmap to original image size, scale it to 0-255 (uint8), apply color map (BGR output)
heatmap = cv2.resize(heatmap, (image.shape[1], image.shape[0]))
heatmap = cv2.normalize(heatmap, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
'''
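The heatmap step above follows the Grad-CAM recipe:

1. Average the gradients over the spatial axes to get one weight per channel.
2. Weight the feature-map channels.
3. Average over channels.
4. Clip negatives.
5. Scale to [0, 1].

The following NumPy-only sketch shows just that arithmetic. Random arrays stand in for `conv_layer_output_value` and the gradients; these are hypothetical inputs shaped like `block5_conv3` (9×9×512) for a 150×150 input. The function name `grad_cam_heatmap` is also hypothetical.

```python
import numpy as np

def grad_cam_heatmap(feature_map, grads):
    """feature_map, grads: arrays of shape (H, W, C) for a single image."""
    # One importance weight per channel: global-average-pool the gradients
    weights = grads.mean(axis=(0, 1))                 # shape (C,)
    # Weight each channel by its importance, then average over channels
    heatmap = (feature_map * weights).mean(axis=-1)   # shape (H, W)
    # Keep only regions that increase the class score
    heatmap = np.maximum(heatmap, 0)
    # Scale to [0, 1], guarding against an all-zero map
    max_heat = heatmap.max()
    return heatmap / (max_heat if max_heat > 0 else 1e-10)

rng = np.random.default_rng(0)
toy_features = rng.random((9, 9, 512))           # stand-in for the conv. layer output
toy_grads = rng.standard_normal((9, 9, 512))     # stand-in for the gradients
toy_heatmap = grad_cam_heatmap(toy_features, toy_grads)
print(toy_heatmap.shape, toy_heatmap.min(), toy_heatmap.max())
```

The notebook cell averages over channels (`np.mean`), whereas Grad-CAM is usually written as a sum. After the ReLU and the division by the maximum, both give the same heatmap because they differ only by the positive factor C.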