# Overview

In this tutorial we will explore best practices for building modern CNN models, including recommendations for baseline (simple but robust) design choices as well as advanced motifs such as separable convolutions combined with alternative normalization and activation functions. In addition, we will examine strategies for tracking model performance across a variety of network architectures and hyperparameter configurations. As a representative use case, we will build various convolutional neural networks (CNNs) for classification of pneumonia (lung infection) from chest radiographs, the most common imaging modality used to screen for pulmonary disease.

## Workshop Links

This tutorial focuses on specific considerations related to network architecture and hyperparameter tuning. For more detailed information on topics covered in this notebook, consider the following:

* Introduction to TensorFlow 2 and Keras: https://bit.ly/2VSYaop
* CNN for pneumonia classification: https://bit.ly/2D9ZBrX
* CNN for pneumonia segmentation: https://bit.ly/2VQMWk9

Other useful tutorials can be found at this link: https://github.com/peterchang77/dl_tutor/tree/master/workshops

# Environment

The following lines of code will configure your Google Colab environment for this tutorial.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install Jarvis library
%pip install jarvis-md

### Imports

Use the following lines to import any needed libraries:

In [None]:
import numpy as np, pandas as pd
import tensorflow as tf
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers, metrics
from jarvis.train import datasets, params
from jarvis.utils.display import imshow

# Data

The data used in this tutorial will consist of (frontal projection) chest radiographs from a subset of the RSNA / Kaggle pneumonia challenge (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge). From the complete cohort, a random subset of 1,000 exams will be used for training and evaluation.

### Download

The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/xr_pna`; as needed an alternate location may be specified using `datasets.download(name=..., path=...)`. 

In [None]:
# --- Download dataset
datasets.download(name='xr/pna-512')

### Python generators

Once the dataset is downloaded locally, Python generators to iterate through the dataset can be easily prepared using the `datasets.prepare(...)` method:

In [None]:
# --- Prepare generators
gen_train, gen_valid, client = datasets.prepare(name='xr/pna-512', keyword='cls-512')

The created generators, `gen_train` and `gen_valid`, are designed to yield two variables per iteration: `xs` and `ys`. Each is a dictionary of NumPy arrays containing the model input(s) and output(s), respectively, for a single *batch* of training. The use of Python generators provides a generic interface for data input for a number of machine learning libraries including TensorFlow 2 / Keras.

Note that any valid Python iteration construct can be used to loop through the generators indefinitely. For example, the Python built-in `next(...)` function will yield the next batch of data:

In [None]:
# --- Yield one example
xs, ys = next(gen_train)
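To make the generator interface concrete, here is a minimal sketch of a generator that yields `(xs, ys)` dictionaries in the same style. It is not the Jarvis implementation: `make_generator`, the toy arrays, the batch size of 4 and the `(N, 1)` label shape are all assumptions chosen for illustration.

```python
import numpy as np

def make_generator(images, labels, batch_size=4, seed=0):
    """Hypothetical stand-in for gen_train: yields (xs, ys) batches forever."""
    rng = np.random.default_rng(seed)
    while True:
        # --- Sample a random batch of example indices (without replacement)
        idx = rng.choice(len(images), size=batch_size, replace=False)
        xs = {'dat': images[idx]}   # model inputs, keyed by name
        ys = {'pna': labels[idx]}   # model outputs, keyed by name
        yield xs, ys

# --- Toy data: 8 blank "radiographs" and 8 labels (shapes are assumptions)
images = np.zeros((8, 1, 512, 512, 1), dtype='float32')
labels = np.zeros((8, 1), dtype='uint8')

gen = make_generator(images, labels)
xs_demo, ys_demo = next(gen)
print(xs_demo['dat'].shape, ys_demo['pna'].shape)
```

Because the `while True` loop never exits, `next(...)` can be called as many times as needed, which is the behavior `model.fit(...)` relies on when `steps_per_epoch` is specified.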

### Data exploration

To help facilitate algorithm design, each original chest radiograph has been resampled to a uniform `(512, 512)` matrix. Overall, the dataset comprises a total of `1,000` 2D images: `500` negative exams and `500` positive exams.

### `xs` dictionary

The `xs` dictionary contains a single batch of model inputs:

1. `dat`: input chest radiograph resampled to `(1, 512, 512, 1)` matrix shape

In [None]:
# --- Print keys 
for key, arr in xs.items():
    print('xs key: {} | shape = {}'.format(key.ljust(8), arr.shape))

### `ys` dictionary

The `ys` dictionary contains a single batch of model outputs:

1. `pna`: binary classification of pneumonia vs. not pneumonia chest radiographs

* 0 = negative
* 1 = positive for pneumonia

In [None]:
# --- Print keys 
for key, arr in ys.items():
    print('ys key: {} | shape = {}'.format(key.ljust(8), arr.shape))

### Visualization

Use the following lines of code to visualize a single input image using the `imshow(...)` method:

In [None]:
# --- Show the first image of a new batch
xs, ys = next(gen_train)
imshow(xs['dat'][0])

Use the following lines of code to visualize an N x N mosaic of all images in the current batch using the `imshow(...)` method:

In [None]:
# --- Show "montage" of all images
imshow(xs['dat'], figsize=(12, 12))

### Model inputs

For every input in `xs`, a corresponding `Input(...)` variable can be created and returned in an `inputs` dictionary for ease of model development:

In [None]:
# --- Create model inputs
inputs = client.get_inputs(Input)

In this example, the equivalent Python code to generate `inputs` would be:

```python
inputs = {}
inputs['dat'] = Input(shape=(1, 512, 512, 1))
```

# Hyperparameters

In this tutorial, all model hyperparameters are maintained in a CSV file organized such that each column represents a single hyperparameter and each row represents a unique combination of hyperparameter options. This strategy helps to record an archive of previous experiments and to improve the modularity and readability of code. Note that in a realistic workflow, a CSV file may be created and manipulated directly (either in a Jupyter notebook or other editor); in this tutorial, however, the CSV file will be generated programmatically using Pandas.

In [None]:
def create_hyperparameters(csv='./hyper.csv'):
    """
    Method to create CSV hyperparameter file
    
    """
    # --- Define hyperparameters
    p = {
        'name': ['exp-01', 'exp-02', 'exp-03', 'exp-04'],
        'filters': [8, 8, 8, 16],
        'n_blocks': [3, 4, 5, 5]}

    # --- Create Pandas DataFrame
    df = pd.DataFrame(p)

    # --- Create CSV file
    df.to_csv(csv, index=False)

To create our CSV file:

In [None]:
# --- Create hyperparameters
create_hyperparameters()

Once prepared, the `params` module, part of the `jarvis-md` library, will be used to read each row of hyperparameters into a Python dictionary which may be referenced as part of the model building code in subsequent sections.

In [None]:
# --- Load hyperparameters
p = params.load('./hyper.csv', row=0)
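For readers who want to see the idea without the Jarvis dependency, the following sketch reads one row of a hyperparameter CSV into a dictionary using only the standard library. It illustrates the concept and is not how `params.load(...)` is implemented; `load_row`, `p_demo` and the temporary file are hypothetical names introduced for this example.

```python
import csv
import os
import tempfile

def load_row(csv_path, row=0):
    """Hypothetical helper: read one row of a hyperparameter CSV into a dict."""
    with open(csv_path, newline='') as f:
        rows = list(csv.DictReader(f))
    # --- Convert integer-like strings to int; leave everything else as str
    return {k: int(v) if v.lstrip('-').isdigit() else v
            for k, v in rows[row].items()}

# --- Write a small CSV mirroring the columns used in this tutorial
csv_path = os.path.join(tempfile.mkdtemp(), 'hyper.csv')
with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'filters', 'n_blocks'])
    writer.writerow(['exp-01', 8, 3])
    writer.writerow(['exp-02', 8, 4])

p_demo = load_row(csv_path, row=1)
print(p_demo)  # {'name': 'exp-02', 'filters': 8, 'n_blocks': 4}
```

Selecting a different experiment is then a matter of changing the `row` argument, which keeps the model-building code itself unchanged.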

# Creating Model

In this section, we will define a template neural network architecture that dynamically references the hyperparameters defined in the `hyper.csv` file. Using this strategy, modifications to the network topology and training hyperparameters may be propagated through modification of the `hyper.csv` file only.

## Standard CNN

Based on historic best practices, the following design choices are recommended for a simple, baseline CNN approach:

* 3x3 convolutional kernel size
* batch normalization (after convolution and before nonlinearity)
* ReLU (or leaky ReLU) activation function
* stride-2 convolutions for subsampling

The following code block creates nested lambda functions to quickly implement CNN models using these strategies:

In [None]:
def create_blocks():
    """
    Method to define simple stride-1 and stride-2 convolutional blocks
    
      "block" = conv > norm > relu
      
    """
    # --- Define kwargs dictionary
    kwargs = {
        'kernel_size': (1, 3, 3),
        'padding': 'same'}

    # --- Define lambda functions
    conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
    norm = lambda x : layers.BatchNormalization()(x)
    relu = lambda x : layers.ReLU()(x)

    # --- Define stride-1, stride-2 blocks
    conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
    conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(1, 2, 2))))
    
    return conv1, conv2

## Advanced CNN

In the past few years, several key advances in design choice have become popular in most state-of-the-art designs. These include:

* separable convolutions (depthwise and pointwise operations)
* layer normalization (or group normalization)
* Gaussian error linear unit (GELU) activation

Many additional recent design advances are summarized well in the ConvNeXt (2022) paper: https://arxiv.org/pdf/2201.03545.pdf.

Note that while these design choices may yield small incremental gains, a proportionally larger dataset is commonly needed to realize the full benefit of more complex approaches.

The following code block creates nested lambda functions to quickly implement CNN models using these strategies:

In [None]:
def create_blocks():
    """
    Method to define advanced stride-1 and stride-2 convolutional blocks
    
      "block" = conv > norm > gelu
      
    """
    # --- Define kwargs dictionary
    kwargs_point = {
        'kernel_size': 1,
        'padding': 'same',
        'strides': 1}
    
    kwargs_depth = {
        'kernel_size': (1, 3, 3),
        'padding': 'same'}

    # --- Define separable conv as depthwise + pointwise convolutions
    conv = lambda x, filters, strides : layers.Conv3D(filters=filters, **kwargs_point)(
                                        layers.Conv3D(filters=x.shape[-1], strides=strides, groups=x.shape[-1], **kwargs_depth)(x))
    
    # --- Define lambda functions
    norm = lambda x : layers.LayerNormalization()(x)
    gelu = lambda x : tf.nn.gelu(x)

    # --- Define stride-1, stride-2 blocks
    conv1 = lambda filters, x : gelu(norm(conv(x, filters, strides=1)))
    conv2 = lambda filters, x : gelu(norm(conv(x, filters, strides=(1, 2, 2))))
    
    return conv1, conv2
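One motivation for separable convolutions is parameter efficiency. The sketch below counts trainable weights for a standard `(1, 3, 3)` convolution versus a depthwise + pointwise pair like the one defined above. The function names and the example channel counts (16 in, 32 out) are illustrative assumptions; bias terms are included, as in the default Keras convolution layers.

```python
def standard_conv_params(c_in, c_out, kernel=(1, 3, 3)):
    """Weights + biases of a standard convolution."""
    k = kernel[0] * kernel[1] * kernel[2]
    return k * c_in * c_out + c_out

def separable_conv_params(c_in, c_out, kernel=(1, 3, 3)):
    """Weights + biases of a depthwise conv followed by a 1x1 pointwise conv."""
    k = kernel[0] * kernel[1] * kernel[2]
    depthwise = k * c_in + c_in          # one spatial filter per input channel
    pointwise = c_in * c_out + c_out     # 1x1 mixing across channels
    return depthwise + pointwise

n_standard = standard_conv_params(16, 32)
n_separable = separable_conv_params(16, 32)
print(n_standard, n_separable)  # 4640 704
```

In this example the separable variant uses roughly 6.6 times fewer parameters, and the gap widens as the number of channels grows.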

## Creating Layers

In this tutorial, we will explore the following key variations in network topology:

* total number of feature maps (channels) for each convolutional operation
* total number of convolutional blocks

To define the model, we will use a for-loop to create a series of stride-2 and stride-1 convolutional blocks spanning a total of `n_blocks` repeats. After each subsampling operation (stride-2 convolution), the total number of features is scaled linearly based on the `filters` variable. 

After a series of convolutional blocks, a flatten operation is used to convert high-dimensional feature maps into a one-dimensional feature vector (note that you may alternatively implement a global pooling operation here as well). Next, a single hidden layer is defined using a dense matrix multiplication and ReLU nonlinearity. The final logit scores should be implemented using a two-element projection operation (non-activated matrix multiplication).

The following code block will flexibly define a CNN model using the hyperparameters defined above:

In [None]:
def create_layers(x, p, hidden_size=64):
    """
    Method to create model layers based on hyperparameters defined in p
    
    """
    # --- Create lambda functions for creating blocks
    conv1, conv2 = create_blocks()
    
    # --- Create lambda function for extracting last layer
    last = lambda blocks : list(blocks.values())[-1]
    
    # --- Create first conv layer
    blocks = {}
    blocks['l0'] = conv1(p['filters'], x)
    
    # --- Create remaining conv layers
    for i in range(p['n_blocks']):
        layer_key = 'l{}'.format(i + 1)
        n_filters = p['filters'] * (i + 2)
        blocks[layer_key] = conv1(n_filters, conv2(n_filters, last(blocks)))
    
    # --- Create hidden layer
    blocks['f0'] = layers.Flatten()(last(blocks))
    blocks['h0'] = layers.Dense(hidden_size, activation='relu')(blocks['f0'])
    
    # --- Create final logit scores
    blocks['pna'] = layers.Dense(2, name='pna')(blocks['h0'])

    return blocks

Let us test the code block here:

In [None]:
blocks = create_layers(x=inputs['dat'], p=p)
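To build intuition for the tensor sizes that `create_layers(...)` produces, the following sketch traces the spatial size and channel count after each block using plain arithmetic. It assumes a 512 x 512 input and `'same'` padding (so each stride-2 convolution halves the spatial size, rounding up); `summarize_shapes` is a hypothetical helper, not part of the tutorial code.

```python
import math

def summarize_shapes(filters, n_blocks, size=512):
    """Trace (height/width, channels) after each block of create_layers."""
    shapes = [(size, filters)]                  # l0: stride-1 block
    for i in range(n_blocks):
        size = math.ceil(size / 2)              # stride-2 conv, 'same' padding
        shapes.append((size, filters * (i + 2)))
    return shapes

shapes = summarize_shapes(filters=8, n_blocks=3)
print(shapes)  # [(512, 8), (256, 16), (128, 24), (64, 32)]

# --- Compare feature vector sizes: flatten vs. global pooling
size, channels = shapes[-1]
flat_features = size * size * channels          # 131072
pooled_features = channels                      # 32
print(flat_features, pooled_features)
```

With these values the flattened vector has 131,072 elements, so the 64-unit hidden layer alone holds over 8 million weights. A global pooling operation would reduce the vector to 32 elements, which is one reason it is worth considering as an alternative.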

# Model

Putting everything together, use the following cell to create and compile a convolutional neural network corresponding to the target `row` of hyperparameter values. The following initial configurations are good baseline values for training hyperparameters:

* Optimizer: Adam
* Loss: softmax cross-entropy
* Learning rate: 2e-4

In [None]:
# --- Load hyperparameters
p = params.load('./hyper.csv', row=0)

# --- Define blocks
blocks = create_layers(x=inputs['dat'], p=p)

# --- Create model
model = Model(inputs=inputs, outputs=blocks['pna'])

# --- Compile model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4), 
    loss={'pna': losses.SparseCategoricalCrossentropy(from_logits=True)}, 
    metrics={'pna': 'sparse_categorical_accuracy'})
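As a reference for what the loss computes, here is a minimal pure-Python sketch of softmax cross-entropy applied to raw logits for a single example with an integer label. It mirrors the mathematical definition only and is not the TensorFlow implementation; the function name is an assumption.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    # --- Log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # --- Negative log-probability of the true class
    return log_sum_exp - logits[label]

# --- Uninformative logits give ln(2) for a two-class problem
loss_uniform = sparse_softmax_cross_entropy([0.0, 0.0], label=0)

# --- Confident logits give a low loss when right, a high loss when wrong
loss_right = sparse_softmax_cross_entropy([2.0, 0.0], label=0)
loss_wrong = sparse_softmax_cross_entropy([2.0, 0.0], label=1)
print(loss_uniform, loss_right, loss_wrong)
```

This is why the final layer is left non-activated: with `from_logits=True` the softmax is folded into the loss, which is generally more numerically stable than applying softmax in the model and taking the logarithm afterwards.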

To check the properties of the created model object, use the `model.summary()` method:

In [None]:
# --- Print model summary
model.summary()

# Model Training

### In-Memory Data

The following line of code will load all training data into RAM. This strategy can be effective for increasing the speed of training for small to medium-sized datasets.

In [None]:
# --- Load data into memory
client.load_data_in_memory()

### Tensorboard

To use Tensorboard, create the necessary Keras callbacks:

In [None]:
from tensorflow.keras import callbacks  
tensorboard_callback = callbacks.TensorBoard('./logs/{}'.format(p['name']))

### Training

Once the model has been compiled and the data prepared (via a generator), training can be invoked using the `model.fit(...)` method. Ensure that both the training and validation data generators are used. In this particular example, we define arbitrary epochs of 50 steps each, and training proceeds for 20 epochs in total. Validation statistics will be assessed every fifth epoch. Tune these arguments as needed.

In [None]:
model.fit(
    x=gen_train, 
    steps_per_epoch=50, 
    epochs=20,
    validation_data=gen_valid,
    validation_steps=50,
    validation_freq=5,
    callbacks=[tensorboard_callback]
)
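The schedule implied by these arguments can be worked out with simple arithmetic. The sketch below assumes that an integer `validation_freq` triggers validation on every epoch whose 1-based index is a multiple of that value; the variable names are illustrative.

```python
steps_per_epoch = 50
epochs = 20
validation_steps = 50
validation_freq = 5

# --- Total number of training batches drawn from gen_train
total_train_steps = steps_per_epoch * epochs

# --- Epochs (1-based) at which validation is run
validation_epochs = [e for e in range(1, epochs + 1) if e % validation_freq == 0]

# --- Total number of validation batches drawn from gen_valid
total_valid_steps = validation_steps * len(validation_epochs)

print(total_train_steps, validation_epochs, total_valid_steps)
# 1000 [5, 10, 15, 20] 200
```

Validating less often shortens training time but yields fewer points on the TensorBoard validation curves, so `validation_freq` is a trade-off between speed and monitoring resolution.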

### Launching Tensorboard

After running several iterations, start Tensorboard using the following cells. After Tensorboard has registered the first several checkpoints, subsequent data will be updated automatically (asynchronously) and model training can be resumed:

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

## Saving and Loading a Model

After a model has been successfully trained, it can be saved and/or loaded by simply using the `model.save()` and `models.load_model()` methods. 

In [None]:
# --- Serialize a model
model.save('./model.hdf5')

In [None]:
# --- Load a serialized model
del model
model = models.load_model('./model.hdf5', compile=False)