# Overview

In this tutorial we will explore several strategies to mitigate CNN overfitting on small datasets. Topics covered include:

* use of multiple loss functions
* minimizing network parameters
* network regularizers (e.g. dropout, L2 regularization)

Ultimately, the goal of this tutorial (and class assignment) is to create a global classifier to detect pulmonary infection (pneumonia) on chest radiographs. 

This tutorial is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

# Google Colab

The following lines of code will configure your Google Colab environment for this tutorial.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Mount Google Drive

The Google Colab environment is transient and will reset after any prolonged break in activity. To retain important and/or large files between sessions, use the following lines of code to mount your personal Google drive to this Colab instance:

In [None]:
try:
    # --- Mount gdrive to /content/drive/My Drive/
    from google.colab import drive
    drive.mount('/content/drive')
    
except: pass

Throughout this tutorial we will use the following global `MOUNT_ROOT` variable to reference a location to store long-term data. If you are using a local Jupyter server and/or wish to store your data elsewhere, please update this variable now.

In [None]:
# --- Set data directory
MOUNT_ROOT = '/content/drive/My Drive'

### Select Tensorflow library version

This tutorial will use the Tensorflow 2.1 library. Use the following line of code to select and download this specific version:

In [None]:
# --- Select Tensorflow 2.x (only in Google Colab)
%tensorflow_version 2.x
%pip install tensorflow-gpu==2.1

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification, and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md

### Imports

Use the following lines to import any additional needed libraries:

In [None]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers, metrics
from jarvis.train import datasets, custom
from jarvis.train.client import Client
from jarvis.utils.general import overload, tools as jtools
from jarvis.utils.display import imshow

# Data

The data used in this tutorial will consist of (frontal projection) chest radiographs from the RSNA / Kaggle pneumonia challenge (https://www.kaggle.com/c/rsna-pneumonia-detection-challenge). The chest radiograph is the standard screening exam of choice to identify and trend changes in lung disease including infection (pneumonia). To simulate the problem of small dataset size, only 100 exams will be used for training (50 normal, 50 positive). A separate 100 exams will be used for independent testing.

The custom `datasets.download(...)` method can be used to download a local copy of the dataset subcohort. By default the dataset will be archived at `/data/raw/xr_pna`; as needed an alternate location may be specified using `datasets.download(name=..., path=...)`. 

In [None]:
# --- Download dataset
datasets.download(name='xr/pna-mul')

Once downloaded, the `datasets.prepare(...)` method can be used to generate the required python Generators to iterate through the dataset, as well as a `client` object for any needed advanced functionality. As needed, pass any custom configurations (e.g. batch size, normalization parameters, etc) into the optional `configs` dictionary argument. 

In [None]:
# --- Prepare generators
configs = {'batch': {'size': 8}}
gen_train, gen_valid, client = datasets.prepare(name='xr/pna-mul', configs=configs, keyword='mul')

The created generators yield a total of `n` training samples based on the specified batch size. As before, each iteration yields two variables, `xs` and `ys`, each representing a dictionary of model input(s) and output(s). Compared to prior tutorials with just a single input and output, there are two separate inputs in the `xs` dictionary *as well as* two separate outputs in the `ys` dictionary. 

For the `xs` dictionary, similar to prior tutorials, an extra array named `msk` is available to mask (and/or weight) the loss function. For the `ys` dictionary, the two outputs provided correspond to the two separate (classifier and segmentation) loss functions that will be used to train this algorithm.

Let us examine the generator data:

In [None]:
# --- Yield one example
xs, ys = next(gen_train)

# --- Print dict keys
print('xs keys: {}'.format(xs.keys()))
print('ys keys: {}'.format(ys.keys()))

In [None]:
# --- Print data shape
print('xs dat shape: {}'.format(xs['dat'].shape))
print('xs msk shape: {}'.format(xs['msk'].shape))
print('ys pna-cls shape: {}'.format(ys['pna-cls'].shape))
print('ys pna-seg shape: {}'.format(ys['pna-seg'].shape))

Use the following lines of code to visualize both the image data and corresponding mask label using the `imshow(...)` method:

In [None]:
# --- Show the first example, msk
xs, ys = next(gen_train)
imshow(xs['dat'][0], xs['msk'][0], radius=3)

In [None]:
# --- Show the first example, pna
xs, ys = next(gen_train)
imshow(xs['dat'][0], ys['pna-seg'][0], radius=3)

Use the `imshow(...)` function to create an N x N mosaic of all images:

In [None]:
# --- Show "montage" of all images, msk
imshow(xs['dat'], xs['msk'], figsize=(12, 12), radius=3)

In [None]:
# --- Show "montage" of all images, pna
imshow(xs['dat'], ys['pna-seg'], figsize=(12, 12), radius=3)

### Model inputs

For every input in `xs`, a corresponding `Input(...)` variable can be created and returned in an `inputs` dictionary for ease of model development:

In [None]:
# --- Create model inputs
inputs = client.get_inputs(Input)

print(inputs.keys())
print(inputs['dat'].shape)
print(inputs['msk'].shape)

In this example, the equivalent Python code to generate `inputs` would be:

```python
inputs = {}
inputs['dat'] = Input(shape=(1, 512, 512, 1))
inputs['msk'] = Input(shape=(1, 512, 512, 1))
```

### 3D operations

Note that the model input shapes for this exercise will be provided as 3D tensors. Even if your current model does not require 3D data (as in this current tutorial), all 2D tensors can be represented by a 3D tensor with a z-axis shape of 1. In addition, designing all models with this configuration (e.g. 3D operations) ensures that minimal code changes are needed when testing various 2D and 3D network architectures. 
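
As a small illustration of this convention, the following NumPy sketch promotes a single 2D image to the 3D layout used here. The array `image_2d` is a hypothetical stand-in for a radiograph, not data loaded through `jarvis`.

```python
import numpy as np

# Hypothetical 2D grayscale image (rows x cols); zeros stand in for real pixel data
image_2d = np.zeros((512, 512), dtype='float32')

# Add a z-axis of size 1 in front and a channel axis of size 1 at the end
image_3d = image_2d[np.newaxis, ..., np.newaxis]

print(image_3d.shape)  # (1, 512, 512, 1)
```

With this layout, a `(1, 3, 3)` kernel behaves like an ordinary 2D `3 x 3` convolution, since the kernel has no extent along the z-axis.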

# Model (single loss)

As a baseline, let us first develop a standard CNN architecture for binary classification. The goal of the downstream multiple-loss architectures will be to improve upon this baseline performance. To do so, we will use standard approaches and techniques discussed in the first few weeks of this class.

### Create model

In [None]:
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (1, 3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.LeakyReLU()(x)
tran = lambda x, filters, strides : layers.Conv3DTranspose(filters=filters, strides=strides, **kwargs)(x)

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(1, 2, 2))))
tran2 = lambda filters, x : relu(norm(tran(x, filters, strides=(1, 2, 2))))

Define a standard contracting network architecture with alternating stride-1 and stride-2 convolutions:

In [None]:
# --- Define contracting layers
l1 = conv1(8, inputs['dat'])
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))
l6 = conv1(80, conv2(80, l5))

**Checkpoint**: what is the size of the final `l6` feature map?

To convert this feature map (CNN) into a feature vector (MLP), we will utilize a simple flatten / reshape operation, which will then be projected directly to yield a two-element logit score for binary prediction. Note that while additional matrix multiply operations may typically be performed at this point in the network architecture, to minimize total number of model parameters, no such operations will be performed here. 
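
To make the size of this projection concrete, the arithmetic can be sketched in plain Python. This is only a back-of-the-envelope calculation, assuming a 512 x 512 input, `'same'` padding, and the five in-plane stride-2 blocks defined above; the helper `same_padding_output_size` is a hypothetical name introduced for illustration.

```python
def same_padding_output_size(size, stride):
    """Spatial size after a convolution with 'same' padding: ceil(size / stride)."""
    return -(-size // stride)

# Five stride-2 blocks are applied in-plane between l1 and l6
size = 512
for _ in range(5):
    size = same_padding_output_size(size, stride=2)

channels = 80   # number of filters in l6
classes = 2     # two-element logit output

# Length of the flattened feature vector
features = size * size * channels

# 1 x 1 x 1 projection: one weight per (feature, class) pair plus one bias per class
params = features * classes + classes

print(size, features, params)  # 16 20480 40962
```

Because the projection cost scales with the flattened vector length, the final feature map size directly controls the number of parameters in this last layer.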

In [None]:
# --- Flatten / reshape
c1 = layers.Reshape((-1, 1, 1, 16 * 16 * 80))(l6)

# --- Create logits
logits = {}
logits['pna-cls'] = layers.Conv3D(filters=2, kernel_size=(1, 1, 1), name='pna-cls')(c1)

Use the following cell to create the model:

In [None]:
# --- Create model
model = Model(inputs=inputs, outputs=logits) 

### Compile model

To compile this model, we will use a standard **softmax cross-entropy** loss function. Recall that a **sparse** variant of this loss function can be invoked if all prediction classes are mutually exclusive. Also recall that for numeric stability, we will pass the raw logit scores directly into the loss function (as opposed to manually performing a softmax normalization in a separate step). To track model performance, the overall accuracy (% correct predictions) will be monitored.
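
To illustrate why raw logits are preferred, here is a minimal pure-Python sketch of a sparse softmax cross-entropy computed with the log-sum-exp trick. It is an illustration of the idea only and not TensorFlow's actual implementation; the function name is hypothetical.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy for one example, computed directly from raw logits."""
    # Subtracting the max logit keeps every exponent <= 0, so exp() cannot overflow
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label]) is equal to log_sum_exp - logits[label]
    return log_sum_exp - logits[label]

# Equal logits: the loss is log(2)
print(sparse_softmax_cross_entropy([0.0, 0.0], label=1))

# Extreme logits: still finite, whereas a naive math.exp(1000.0) would overflow
print(sparse_softmax_cross_entropy([1000.0, -1000.0], label=0))
```

A separate softmax step would have to exponentiate the logits before taking the logarithm, which is where overflow or underflow can occur for large-magnitude scores.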

In [None]:
# --- Compile the model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4),
    loss={'pna-cls': losses.SparseCategoricalCrossentropy(from_logits=True)},
    metrics={'pna-cls': metrics.SparseCategoricalAccuracy()},
    experimental_run_tf_function=False)

# Training

### In-memory data

For moderately sized datasets that are too large to fit into the immediate hard-drive cache but small enough to fit into RAM, it is often a good idea to first load all training data into memory for faster training. The `client` can be used for this purpose as follows:

In [None]:
# --- Load data into memory for faster training
client.load_data_in_memory()

### Tensorboard

To use Tensorboard, create the necessary Keras callbacks:

In [None]:
from tensorflow.keras import callbacks  
tensorboard_callback = callbacks.TensorBoard('./logs')

Now, let us train the model:

In [None]:
# --- Train model
model.fit(
    x=gen_train, 
    steps_per_epoch=100, 
    epochs=6,
    validation_data=gen_valid,
    validation_steps=100,
    validation_freq=2,
    use_multiprocessing=True,
    callbacks=[tensorboard_callback])

### Launching Tensorboard

After running several iterations, start Tensorboard using the following cells. After Tensorboard has registered the first several checkpoints, subsequent data will be updated automatically (asynchronously) and model training can be resumed:

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

# Model (dual loss)

To help regularize the network, one or more additional loss functions may be used. In this particular example, in addition to the classification labels, segmentation masks of infection are also available (as are ground-truth lung masks). To combine both labels in a single network, consider that a standard classifier and a segmentation (contracting-expanding) network share essentially the same *backbone*: a series of contracting convolutional operations.
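
When a Keras model has several outputs, each with its own loss, the quantity minimized during training is a weighted sum of the per-output losses (all weights default to 1 unless `loss_weights` is passed to `model.compile`). The sketch below mimics that bookkeeping in plain Python; the loss values are hypothetical numbers chosen for illustration, and `total_loss` is not a library function.

```python
def total_loss(losses, loss_weights=None):
    """Weighted sum of per-output losses, keyed by output name."""
    loss_weights = loss_weights or {}
    # Outputs without an explicit weight contribute with weight 1.0
    return sum(loss_weights.get(name, 1.0) * value for name, value in losses.items())

# Hypothetical per-batch loss values for the two outputs
batch_losses = {'pna-cls': 0.70, 'pna-seg': 0.25}

# Default: both losses contribute equally
print(total_loss(batch_losses))

# Down-weight the segmentation loss by half
print(total_loss(batch_losses, loss_weights={'pna-seg': 0.5}))
```

Since both losses backpropagate through the shared contracting layers, the segmentation labels constrain the same features that the classifier relies on.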

### Create Client

Building upon the prior work with masked loss functions, the segmentation loss in this tutorial will utilize a similar approach. Specifically, the segmentation loss will be masked in areas of the image that do not contain lung or infection (i.e. the loss for these unimportant areas will be manually *masked* to zero).
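
As a rough illustration of what masking a loss means, the NumPy sketch below zeroes a per-pixel loss outside a binary mask and averages over the remaining pixels. The normalization by the mask sum is an assumption made for this sketch and is not necessarily how the `jarvis` loss is implemented; the arrays are hypothetical toy values.

```python
import numpy as np

def masked_mean_loss(pixel_loss, msk):
    """Average a per-pixel loss over the pixels where msk is nonzero."""
    weighted = pixel_loss * msk                 # loss outside the mask becomes zero
    return weighted.sum() / max(msk.sum(), 1)   # guard against an empty mask

# Hypothetical 2 x 3 per-pixel loss map and binary mask
pixel_loss = np.array([[0.2, 0.9, 0.4],
                       [0.7, 0.1, 0.3]])
msk = np.array([[1, 0, 1],
                [0, 0, 1]])

# Only the three masked-in pixels (0.2, 0.4, 0.3) contribute
print(masked_mean_loss(pixel_loss, msk))
```

Pixels outside the mask contribute neither to the loss value nor to its gradient, so the network is not penalized for its predictions in those regions.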

In [None]:
@overload(Client)
def preprocess(self, arrays, **kwargs):
    """
    Method to create a custom msk array for class weights and/or masks
    
    """
    # --- Create msk
    msk = np.zeros(arrays['xs']['dat'].shape)

    lng = arrays['xs']['msk']
    pna = arrays['ys']['pna-seg']

    msk[lng > 0] = 1
    msk[pna > 0] = 1

    arrays['xs']['msk'] = msk
    
    return arrays

After creating your custom `Client`, create the required `client` object and generators:

In [None]:
# --- Find client yml file
yml = '{}/data/ymls/client-mul.yml'.format(jtools.get_paths('xr/pna')['code'])

# --- Manually create Client
client = Client(yml)

# --- Manually create generators
gen_train, gen_valid = client.create_generators()

### Create model

To create the model, start by creating a standard contracting network architecture with alternating stride-1 and stride-2 convolutions (identical to the network above):

In [None]:
# --- Define contracting layers
l1 = conv1(8, inputs['dat'])
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))
l6 = conv1(80, conv2(80, l5))

Next, continue by creating a standard expanding network architecture with alternating stride-1 convolutions and convolutional transpose operations:

In [None]:
# --- Define expanding layers
l7  = tran2(64, l6)
l8  = tran2(48, conv1(64, l7  + l5))
l9  = tran2(32, conv1(48, l8  + l4))
l10 = tran2(16, conv1(32, l9  + l3))
l11 = tran2(8,  conv1(16, l10 + l2))
l12 = conv1(8,  conv1(8,  l11 + l1))

At this point, the `l12` feature map can be projected into logit scores for a segmentation loss function. However we still need to create an appropriate feature vector for classification. 

**Checkpoint**: which layer (`l1` to `l12`) should be used to continue building the classifier arm of this network?

In [None]:
# --- Create classifier feature vector
c1 = layers.Reshape((-1, 1, 1, 16 * 16 * 80))(l6)

Excellent. At this point, both the `l12` and `c1` layers are ready for creation of logit scores:

In [None]:
# --- Create logits
logits = {}
logits['pna-seg'] = layers.Conv3D(filters=2, name='pna-seg', **kwargs)(l12)
logits['pna-cls'] = layers.Conv3D(filters=2, kernel_size=(1, 1, 1), name='pna-cls')(c1)

Use the following cell to create the (dual loss) model:

In [None]:
# --- Create model
model = Model(inputs=inputs, outputs=logits) 

### Compile model

To compile this model, in addition to a standard **softmax cross-entropy** loss function (for classification), we will also use a **weighted** pixel-wise softmax cross-entropy loss for segmentation. Similarly, to track segmentation performance, we will use a standard Dice score metric. Both functions are available as custom losses and metrics as part of the `jarvis` library. Please refer to earlier tutorials on segmentation for further information.
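
For reference, the Dice score between two binary masks can be sketched in NumPy as follows. This is the textbook definition on hard (thresholded) masks, shown for intuition only; it is not the `custom.dsc` implementation, and the small `eps` term is an assumption added here to avoid division by zero when both masks are empty.

```python
import numpy as np

def dice_score(pred, true, eps=1e-7):
    """Dice = 2 * |pred AND true| / (|pred| + |true|) for binary masks."""
    pred = np.asarray(pred).astype(bool)
    true = np.asarray(true).astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Hypothetical flattened masks: one of two predicted pixels overlaps the ground truth
pred = [1, 1, 0, 0]
true = [1, 0, 1, 0]

print(dice_score(pred, true))  # approximately 0.5
```

A score of 1 indicates perfect overlap and a score near 0 indicates no overlap.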

In [None]:
# --- Compile the model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4),
    loss={
        'pna-seg': custom.sce(inputs['msk']),
        'pna-cls': losses.SparseCategoricalCrossentropy(from_logits=True)},
    metrics={
        'pna-seg': custom.dsc(weights=inputs['msk']),
        'pna-cls': metrics.SparseCategoricalAccuracy()
        },
    experimental_run_tf_function=False)

# Training

### In-memory data

For moderately sized datasets that are too large to fit into the immediate hard-drive cache but small enough to fit into RAM, it is often a good idea to first load all training data into memory for faster training. The `client` can be used for this purpose as follows:

In [None]:
# --- Load data into memory for faster training
client.load_data_in_memory()

### Tensorboard

To use Tensorboard, create the necessary Keras callbacks:

In [None]:
from tensorflow.keras import callbacks  
tensorboard_callback = callbacks.TensorBoard('./logs')

Now, let us train the model:

In [None]:
# --- Train model
model.fit(
    x=gen_train, 
    steps_per_epoch=100, 
    epochs=6,
    validation_data=gen_valid,
    validation_steps=100,
    validation_freq=2,
    use_multiprocessing=True,
    callbacks=[tensorboard_callback])

### Launching Tensorboard

After running several iterations, start Tensorboard using the following cells. After Tensorboard has registered the first several checkpoints, subsequent data will be updated automatically (asynchronously) and model training can be resumed:

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs

# Evaluation

To test the trained model, the following steps are required:

* load data
* use `model.predict(...)` to obtain logit scores
* compare prediction with ground-truth (accuracy)
* serialize in Pandas DataFrame

Recall that the generator used to train the model simply iterates through the dataset randomly. For model evaluation, the cohort must instead be loaded manually in an orderly way. For this tutorial, we will create new **test mode** data generators, which will simply load each example individually once for testing. 

In [None]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True)

To run prediction on a single (first) example from the generator:

In [None]:
# --- Run a single prediction
x, y = next(test_valid)
logits = model.predict(x)

Given that two separate loss functions are used, two separate logit scores are generated:

In [None]:
# --- Print logit information
print(len(logits))
print(logits[0].shape)
print(logits[1].shape)

In this tutorial, the overall classification accuracy is most important (`logits[0]`). To calculate prediction from raw logit score, use the `np.argmax(...)` function:

In [None]:
# --- Print prediction
pred = np.argmax(logits[0], axis=-1)
print('Pred: {}'.format(pred.squeeze()))

# --- Print ground-truth
print('True: {}'.format(y['pna-cls'].squeeze()))

### Running evaluation

In [None]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True)
accuracy = []

for x, y in test_valid:
    
    # --- Create prediction
    logits = model.predict(x)
    pred = np.argmax(logits[0], axis=-1)
    
    # --- Compare with ground truth
    accuracy.append(pred.squeeze() == y['pna-cls'].squeeze())

accuracy = np.array(accuracy)

### Saving results

In [None]:
# --- Define columns
df = pd.DataFrame(index=np.arange(accuracy.size))
df['accuracy'] = accuracy

## Saving and Loading a Model

After a model has been successfully trained, it can be saved and/or loaded by simply using the `model.save()` and `models.load_model()` methods. 

In [None]:
# --- Serialize a model
model.save('./lesion_characterization.hdf5')

In [None]:
# --- Load a serialized model
del model
model = models.load_model('./lesion_characterization.hdf5', compile=False)

# Exercises

### Exercise 1

The final `l6` layer of the example network architecture above is relatively large (`16 x 16`). By simply reducing the feature map size of your final layer, the overall number of model parameters can be decreased significantly.  

There are multiple ways to decrease the final feature map size. What are examples of several choices? Is there any option that requires **no additional parameters** to be added to the model?

Use the following code cell to experiment with collapsing the `16 x 16` feature map into an `8 x 8` feature map (or `4 x 4` feature map): 

In [None]:
# --- Convert L6 from (16 x 16) to (8 x 8) or (4 x 4)

# --- Create feature vector from (8 x 8) or (4 x 4) feature map

*Hint*: to reduce feature maps without any parameters, consider the options for reduction across any network. What original option was popular prior to strided convolutions?

**Checkpoint**: How many fewer parameters are now used in the model compared to the prior version?

### Exercise 2

Recall that several popular methods were described earlier in this curriculum to limit overfitting. Two of the most popular techniques include the use of **dropout** and **L2 regularization**. 

To implement dropout, use the `layers.Dropout(rate=...)` object. Use the following code cell to experiment:

*Hint*: which layer(s) are most ideal to use when implementing dropout?
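
For intuition about what a dropout layer does during training, here is a minimal NumPy sketch of "inverted" dropout: a random fraction of activations is set to zero and the survivors are scaled by `1 / (1 - rate)` so that the expected value is unchanged. The helper name `dropout` and the toy input are assumptions for illustration; in the model itself the `layers.Dropout` object handles this (and is inactive at inference time).

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a random fraction of x and rescale the rest."""
    keep = rng.random(x.shape) >= rate          # True for activations that survive
    return np.where(keep, x / (1.0 - rate), 0.0)

# Hypothetical activations: a vector of ones makes the effect easy to read
rng = np.random.default_rng(0)
x = np.ones(10000)
y = dropout(x, rate=0.5, rng=rng)

print((y == 0).mean())   # close to the dropout rate
print(y.mean())          # close to the original mean of 1.0
```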

To implement L2 regularization, specify a `kernel_regularizer` when creating a new layer (e.g. `layers.Conv3D(kernel_regularizer=...)`). Use the following code cell to experiment:
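
As a reminder of what the regularizer adds to the loss, the L2 penalty is the sum of squared kernel weights multiplied by a strength coefficient. The plain-Python sketch below computes that term for a hypothetical list of weights; the function name and the values are illustrative only.

```python
def l2_penalty(weights, l2):
    """L2 regularization term: l2 * sum of squared weights."""
    return l2 * sum(w * w for w in weights)

# Hypothetical flattened kernel weights and regularization strength
weights = [0.5, -1.0, 2.0]

# 0.01 * (0.25 + 1.0 + 4.0)
print(l2_penalty(weights, l2=0.01))
```

Because the penalty grows with the square of each weight, large weights are discouraged far more strongly than small ones.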