# Overview

In this tutorial we will review key modern CNN architecture motifs and discuss implementation strategies using Tensorflow 2.0 / Keras.

**Modern Architectures**

* residual connection
* bottleneck operation
* Inception module

This tutorial is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

# Google Colab

The following lines of code will configure your Google Colab environment for this tutorial.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Mount Google Drive

The Google Colab environment is transient and will reset after any prolonged break in activity. To retain important and/or large files between sessions, use the following lines of code to mount your personal Google Drive to this Colab instance:

In [None]:
try:
    # --- Mount gdrive to /content/drive/My Drive/
    from google.colab import drive
    drive.mount('/content/drive')
    
except: pass

Throughout this tutorial we will use the following global `MOUNT_ROOT` variable to reference a location to store long-term data. If you are using a local Jupyter server and/or wish to store your data elsewhere, please update this variable now.

In [None]:
# --- Set data directory
MOUNT_ROOT = '/content/drive/My Drive'

### Select Tensorflow library version

This tutorial will use the (new) Tensorflow 2.0 library. Use the following line of code to select this updated version:

In [None]:
# --- Select Tensorflow 2.x (only in Google Colab)
%tensorflow_version 2.x

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md

### Imports

Use the following lines to import any additional needed libraries:

In [None]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers
from jarvis.train import datasets
from jarvis.utils.display import imshow

# Data

The data used in this tutorial consists of prostate MRI exams. Each image belongs to one of four different sequences (T2, low b-value DWI, high b-value DWI, ADC). In this initial exercise, the goal is simply to develop an algorithm capable of differentiating between image types so that downstream models for cancer prediction can be used properly. The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/mr_prostatex`; as needed, an alternate location may be specified using `datasets.download(name=..., path=...)`.

In [None]:
# --- Download dataset
datasets.download(name='mr/prostatex')

Once downloaded, the `datasets.prepare(...)` method can be used to create the Python generators needed to iterate through the dataset, as well as a `client` object for any needed advanced functionality. As needed, pass any custom configurations (e.g. batch size, normalization parameters, etc.) into the optional `configs` dictionary argument.

In [None]:
# --- Prepare generators
configs = {'batch': {'size': 24}}
gen_train, gen_valid, client = datasets.prepare(name='mr/prostatex', configs=configs)

The created generators yield a total of `n` training samples based on the specified batch size. As before, each iteration yields two variables, `xs` and `ys`, each representing a dictionary of model input(s) and output(s). In the current example, there is just a single input and output. Let us examine the generator data:

In [None]:
# --- Yield one example
xs, ys = next(gen_train)

# --- Print dict keys
print('xs keys: {}'.format(xs.keys()))
print('ys keys: {}'.format(ys.keys()))

In [None]:
# --- Print data shape
print('xs shape: {}'.format(xs['dat'].shape))
print('ys shape: {}'.format(ys['class'].shape))

Use the following lines of code to visualize using the `imshow(...)` method:

In [None]:
# --- Show the first example
imshow(xs['dat'][0])

Pass the entire batch to `imshow(...)` to display an N x N mosaic (montage) of all images:

In [None]:
# --- Show "montage" of all images
imshow(xs['dat'])

As expected, the 24-element `ys['class']` vector (one entry per image in the batch) holds the ground-truth labels:

In [None]:
# --- Print ys['class']
print(ys['class'])

### MRI sequences

As above, the dataset comprises four different types of MRI sequences. Use the following cell to visualize examples of the different sequences:

In [None]:
# --- Select number of images and sequence
KEY = {
    0: 'T2 weighted',
    1: 'low b-value DWI',
    2: 'high b-value DWI',
    3: 'ADC'}

N = 4
sequence = 0

# --- All image labels are stored in client.db.header['sequence']
rows = np.nonzero((client.db.header['sequence'] == sequence).to_numpy())[0]
rows = rows[np.random.permutation(rows.size)][:N]

# --- Load using get
xs = []
for row in rows:
    arrs = client.get(row=row)
    xs.append(arrs['xs']['dat'])

# --- Show
imshow(np.stack(xs), title=KEY[sequence])

### Model inputs

For every input in `xs`, a corresponding `Input(...)` variable can be created and returned in an `inputs` dictionary for ease of model development:

In [None]:
# --- Create model inputs
inputs = client.get_inputs(Input)

print(inputs.keys())
print(inputs['dat'].shape)

In this example, the equivalent Python code to generate `inputs` would be:

```python
inputs = {}
inputs['dat'] = Input(shape=(256, 256, 1))
```

# Convolutional Operations

As in the prior tutorial, let us set up the same lambda functions for CNN definition:

In [None]:
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv2D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.LeakyReLU()(x)

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(2, 2))))

# Residual Layer

Recall the definition of a residual layer:
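
In the usual formulation (the notation here is assumed for illustration), a residual layer adds its input $x$ back onto the output of a learned mapping $F$:

$$
y = F(x) + x
$$

When the shapes of $F(x)$ and $x$ differ, a learned linear projection $W_s$ is applied to the shortcut path so that the two terms can be added:

$$
y = F(x) + W_s x
$$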

Implementation is reasonably straightforward, as tensors produced by recent versions of Tensorflow / Keras layers support the native Python addition `+` operator:

In [None]:
# --- Define blocks
l1 = conv1(16, inputs['dat'])
l2 = conv1(16, l1)

# --- Define third block with residual connection
l3 = conv1(16, l2) + l1

### Projection

Note that layers can be added **only if** the layer sizes match exactly. What happens if the total number of feature maps (i.e. layer depth) is different? The solution is to use a `1 x 1` projection matrix (a convolutional operation without a corresponding nonlinearity). Here the third and fourth dimensions of the convolutional kernel are designed to match the number of channels in the input and target output tensors, respectively:

```
filter size = I x J x C0 x C1

I  ==> 1
J  ==> 1
C0 ==> # of channels in input tensor
C1 ==> # of channels in output tensor
```

Recall that in Tensorflow, only the output layer channel size needs to be defined (the third dimension of the convolutional kernel is inferred from the input tensor). Consider the following example:

In [None]:
# --- Define blocks
l1 = conv1(16, inputs['dat'])
l2 = conv1(32, l1)
l3 = conv1(32, l2) # + l1 would not work

At this point, `l1` cannot be added since dimensions do not match. Thus consider the following projection operation:

In [None]:
# --- Define projection
proj = lambda filters, x : layers.Conv2D(
    filters=filters, 
    strides=1, 
    kernel_size=(1, 1),
    padding='same')(x)

# --- Define third block with residual connection
l3 = conv1(32, l2) + proj(32, l1)
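
To confirm that the projection kernel follows the `I x J x C0 x C1` layout described above, the kernel shape can be inspected directly. The following is a minimal sketch; the names `shortcut`, `block_out` and `proj_layer` and the tensor shapes are assumptions chosen for illustration, not part of the tutorial model:

```python
from tensorflow.keras import Input, layers

# --- Stand-in tensors (shapes are assumptions for illustration)
shortcut = Input(shape=(256, 256, 16))   # tensor with 16 channels
block_out = Input(shape=(256, 256, 32))  # tensor with 32 channels

# --- 1 x 1 projection: only the output depth (32) is specified
proj_layer = layers.Conv2D(filters=32, kernel_size=(1, 1), strides=1, padding='same')
projected = proj_layer(shortcut)

# --- Kernel layout is I x J x C0 x C1; C0 is inferred from the input tensor
kernel_shape = tuple(proj_layer.kernel.shape)
print(kernel_shape)

# --- Shapes now match, so the element-wise addition is valid
out = layers.Add()([block_out, projected])
out_shape = tuple(out.shape)
print(out_shape)
```

Here `layers.Add()` is used in place of the `+` operator; the two are equivalent for this purpose.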

What about differences not only in channel depth but also feature map size?

In [None]:
# --- Define blocks
l1 = conv1(16, inputs['dat'])
l2 = conv2(32, l1)
l3 = conv1(32, l2) # + l1 would not work

To match the subsampling operation, the projection must now be strided as well. Given this, it may be useful to increase the `kernel_size` of the projection operation so that all activations contribute to the output projection tensor (the cell below keeps a `1 x 1` kernel for simplicity):

In [None]:
# --- Define projection
proj = lambda filters, x : layers.Conv2D(
    filters=filters, 
    strides=2, 
    kernel_size=(1, 1),
    padding='same')(x)

# --- Define third block with residual connection
l3 = conv1(32, l2) + proj(32, l1)

### Bottleneck

In addition to creating matching layer sizes, projection matrices can be used to perform bottleneck operations for convolutional efficiency:

In [None]:
# --- Define projection
proj = lambda filters, x : layers.Conv2D(
    filters=filters, 
    strides=1, 
    kernel_size=(1, 1),
    padding='same')(x)

# --- Define standard conv-conv block
l1 = conv1(32, inputs['dat'])
l2 = conv1(32, l1)

# --- Define bottleneck conv-conv block
l1 = conv1(32, inputs['dat'])
l2 = proj(32, conv1(8, proj(8, l1)))

What is the computational efficiency of the bottleneck vs. the standard conv block in this example?
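
One way to approach this question is to count convolutional parameters. The sketch below compares only the second block (`l2`) in each version, counts kernel weights plus biases, and ignores batch normalization parameters; the helper `conv_params` is a hypothetical name introduced here:

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out

# --- Standard block: 3 x 3 conv, 32 > 32 channels
standard = conv_params(3, 32, 32)

# --- Bottleneck block: 1 x 1 (32 > 8), 3 x 3 (8 > 8), 1 x 1 (8 > 32)
bottleneck = conv_params(1, 32, 8) + conv_params(3, 8, 8) + conv_params(1, 8, 32)

ratio = round(standard / bottleneck, 2)
print(standard, bottleneck, ratio)
```

Because every layer in this example runs at the same spatial resolution, the number of multiply-accumulate operations scales with the kernel weight counts, so the bottleneck version is cheaper by a similar factor.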

# Inception

Recall the definition of an Inception module:

Let us first implement a naive Inception module without any bottlenecks. Recall that we will need four different paths implemented:

* 1x1 convolution
* 3x3 convolution
* 5x5 convolution
* 3x3 max-pool

Let us define these building blocks with the following lambda functions:

In [None]:
# --- Define lambda functions
conv = lambda x, filters,kernel_size : layers.Conv2D(
    filters=filters, 
    kernel_size=kernel_size, 
    padding='same')(x)

norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.LeakyReLU()(x)
pool = lambda x : layers.MaxPool2D(pool_size=(3, 3), strides=1, padding='same')(x)

# --- Define 1x1, 3x3 and 5x5 convs
conv1 = lambda filters, x : relu(norm(conv(x, filters, kernel_size=1)))
conv3 = lambda filters, x : relu(norm(conv(x, filters, kernel_size=3)))
conv5 = lambda filters, x : relu(norm(conv(x, filters, kernel_size=5)))
mpool = lambda x : relu(norm(pool(x)))

Note that in the above implementation, max-pooling is used as a standard layer without any subsampling.
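
The effect of a stride-1 max-pool with `'same'` padding can be illustrated on a small array. This is a NumPy sketch of the operation on a single 2D feature map, not the Keras implementation; the function name `max_pool_same` is hypothetical and the sketch assumes an odd pool size:

```python
import numpy as np

def max_pool_same(x, size=3):
    """Stride-1 max-pool over a 2D array with 'same' padding (odd size)."""
    pad = size // 2

    # --- Pad with -inf so that padded positions never win the max
    padded = np.pad(x, pad, mode='constant', constant_values=-np.inf)

    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].max()

    return out

x = np.arange(16, dtype='float64').reshape(4, 4)
y = max_pool_same(x)
print(x.shape, y.shape)
```

The output has the same height and width as the input, which is what allows the pooling path to be concatenated with the convolutional paths.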

Let us now use these lambda functions to create a test Inception module:

In [None]:
# --- Define first layer operation
l1 = conv3(16, inputs['dat'])

# --- Define four different paths
filters = 16
p1 = conv1(filters, l1)
p2 = conv3(filters, l1)
p3 = conv5(filters, l1)
p4 = mpool(l1)

# --- Concatenate
l2 = layers.Concatenate()([p1, p2, p3, p4])

As discussed, naive implementation of the Inception module yields large channel depths over time. To avoid this, use bottleneck operations (as above):

In [None]:
# --- Define first layer operation
l1 = conv3(16, inputs['dat'])

# --- Define four different paths
filters = 4
b1 = proj(filters, l1)

p1 = conv1(filters, b1)
p2 = conv3(filters, b1)
p3 = conv5(filters, b1)
p4 = proj(filters, mpool(l1))

# --- Concatenate
l2 = layers.Concatenate()([p1, p2, p3, p4])
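
The difference in channel growth between the two versions can be checked with simple arithmetic. The sketch below assumes several modules are stacked back to back with the same settings as the cells above (16 filters per path for the naive module, 4 for the bottleneck module); the helper names are hypothetical:

```python
def naive_channels(c_in, filters):
    # --- Three conv paths yield `filters` channels each; the pool path keeps c_in
    return 3 * filters + c_in

def bottleneck_channels(filters):
    # --- All four paths are reduced to `filters` channels
    return 4 * filters

depth_naive = 16
naive_history, bneck_history = [], []

# --- Stack four modules and record the output depth of each
for _ in range(4):
    depth_naive = naive_channels(depth_naive, filters=16)
    naive_history.append(depth_naive)
    bneck_history.append(bottleneck_channels(filters=4))

print(naive_history)
print(bneck_history)
```

In the naive module the pooling path passes every input channel through, so depth grows with each stacked module. In the bottleneck module the output depth is fixed by the chosen number of filters.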

# Model

Let us create a temporary model for the purpose of demonstrating the evaluation procedure:

In [None]:
# --- Define convolution parameters
kwargs = {
    'kernel_size': (3, 3),
    'padding': 'same',
    'kernel_initializer': 'he_normal'}

# --- Define block components
conv = lambda x, filters, strides : layers.Conv2D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.ReLU()(x)

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(2, 2)))) 
fconn = lambda outputs, x : relu(norm(layers.Dense(outputs)(x)))

In [None]:
# --- Define model
l1 = conv1(32, inputs['dat'])
l2 = conv1(48, conv2(48, l1))
l3 = conv1(64, conv2(64, l2))
l4 = conv1(80, conv2(80, l3))
l5 = conv1(96, conv2(96, l4))
l6 = conv1(128, conv2(128, l5))

f0 = layers.Flatten()(l6)

logits = {}
logits['class'] = layers.Dense(4, name='class')(f0)

# --- Create model
model = Model(inputs=inputs, outputs=logits)

# Evaluation

To test the trained model, the following steps are required:

* load data
* use `model.predict(...)` to obtain logit scores
* use `np.argmax(...)` to obtain prediction
* compare prediction with ground-truth
* serialize in Pandas DataFrame

Recall that the generator used to train the model simply iterates through the dataset randomly. For model evaluation, the cohort must instead be loaded manually in an orderly way. For this tutorial, we will create new **test mode** data generators, which will simply load each example individually once for testing. 

In [None]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True, expand=True)

**Important note**: although the model is trained using 2D slices, nothing precludes passing an entire 3D volume through the model at one time (i.e. the entire 3D volume is treated as a single *batch* of data). In fact, performance metrics for medical imaging models are commonly reported on a volume-by-volume basis (not slice-by-slice). Thus, use the `expand=True` flag in `client.create_generators(...)` as above to yield entire 3D volumes instead of slices.
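
The idea of treating a volume as a batch can be made concrete with array shapes. The sketch below uses a hypothetical volume of 20 slices; the exact shape yielded by the generators is an assumption here and may differ:

```python
import numpy as np

# --- Hypothetical volume: 1 exam x 20 slices x 256 x 256 x 1 channel (assumed shape)
volume = np.zeros((1, 20, 256, 256, 1), dtype='float32')

# --- Dropping the leading axis leaves 20 slices, i.e. a batch of twenty 2D images
batch = volume[0]
print(batch.shape)
```

Under this assumption, indexing with `[0]` turns the slice axis into the batch axis expected by a 2D model.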

In [None]:
# --- Run entire volume through model
x, y = next(test_train)
logits = model.predict(x['dat'][0])

The key to converting this per-slice output into a final global prediction is to apply some form of aggregation. The most common approach, shown below, uses the mean of the per-slice predictions as the final global classification.

Use the following lines of code to run prediction through the **valid** cohort generator:

In [None]:
trues = []
preds = []

for x, y in test_valid:
    
    # --- Predict
    logits = model.predict(x['dat'][0])

    if isinstance(logits, dict):
        logits = logits['class']

    # --- Argmax
    pred = np.argmax(logits, axis=1)

    trues.append(y['class'][0, 0])
    preds.append(int(np.round(pred.mean())))

trues = np.array(trues)
preds = np.array(preds)
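
Because the class labels here are categorical, averaging label indices can yield a class that no slice predicted. The sketch below compares that approach with two alternatives (a majority vote and an argmax over mean logits) on made-up logits for a hypothetical five-slice volume; all values are fabricated purely for illustration:

```python
import numpy as np

# --- Made-up logits: 5 slices x 4 classes (illustration only)
logits = np.array([
    [4.0, 0.0, 0.0, 1.0],
    [3.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 1.0, 4.0],
    [1.0, 0.0, 0.0, 3.0]])

# --- Per-slice predictions
pred = np.argmax(logits, axis=1)

# --- Option 1: rounded mean of label indices (as in the loop above)
mean_of_labels = int(np.round(pred.mean()))

# --- Option 2: majority vote across slices
majority_vote = int(np.bincount(pred, minlength=4).argmax())

# --- Option 3: argmax of the mean logit vector
mean_logits = int(np.argmax(logits.mean(axis=0)))

print(pred, mean_of_labels, majority_vote, mean_logits)
```

In this example two slices vote for class 0 and three for class 3, so the rounded mean lands on class 2 while the other two options select class 3. Which aggregation is appropriate depends on the task.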

Prepare results in Pandas DataFrame for ease of analysis and sharing:

In [None]:
# --- Create DataFrame
df = pd.DataFrame(index=np.arange(preds.size))

# --- Define columns
df['true'] = trues
df['pred'] = preds
df['corr'] = df['true'] == df['pred']

# --- Print accuracy
print(df['corr'].mean())

## Saving and Loading a Model

After a model has been successfully trained, it can be saved and/or loaded by simply using the `model.save()` and `models.load_model()` methods. 

In [None]:
# --- Serialize a model
model.save('./series_id.hdf5')

In [None]:
# --- Load a serialized model
del model
model = models.load_model('./series_id.hdf5', compile=False)

# Exercises

Up until this point, this tutorial has presented key implementation details for two advanced CNN motifs: the residual operation (with bottleneck) and the Inception module. However, the working examples above may become tedious to use in a large network architecture. To facilitate additional organization, consider the following helper methods to generically implement these motifs in various settings.

### Exercise 1

Create a general method to facilitate a residual connection between any arbitrary two tensors. Keep in mind that prior to the addition operation, one needs to account for potential feature map differences in:

* feature map size
* feature map depth

Use the following cell to implement this method:

In [None]:
def residual(a, b):
    """
    Method to implement residual connection between two arbitrary tensors (a + b)
    
    """
    pass

#### Hints

Consider the following pseudocode:

In [None]:
# --- Check to see if projection is needed (how to determine?)

# --- If projection is needed:

    # --- Account for potential change in feature map depth

    # --- Account for potential change in feature map size (subsample)

    # --- Modify kernel_size if needed

    # --- Perform projection

# --- Perform residual operation

### Exercise 2

Create a general method to facilitate an Inception module for any given single input tensor. Allow the total number of output feature maps (after concatenation) to be set dynamically as an argument to the method. Assume that each of the four Inception **paths** yields an equal number of channels.

Use the following cell to implement this method:

In [None]:
def inception(a, filters):
    """
    Method to implement Inception module
    
      p1 = 1 x 1 conv
      p2 = BN > 3 x 3 conv
      p3 = BN > 5 x 5 conv
      p4 = 3 x 3 pool > BN
      
      BN = bottleneck operation
    
    :return
    
      (tf.Tensor) None * i * j * c tensor
      
        i == a.shape[1]
        j == a.shape[2]
        c == filters
        
    """
    pass

#### Hints

Use the template code from above. The only minor modification needed is to automatically set the number of output filters in each individual pathway so that the concatenated output has the desired number of channels.

Consider the following pseudocode:

In [None]:
# --- Define lambda functions for: conv, proj, norm, relu, pool

# --- Define 1x1, 3x3 and 5x5 convs

# --- Define requisite filter size for each individual path

# --- Define four different paths

# --- Create bottlenecked operations

# --- Concatenate