# Assignment

In this assignment we will train a box localization algorithm derived from RetinaNet to perform identification of the prostate gland on MRI. The algorithm will be implemented using a feature pyramid network backbone. Accuracy will be calculated based on median IoU performance against ground-truth masks.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook (push to https://github.com/[username]/cs190/cnn/assignment.ipynb)
* final trained `*.hdf5` model file
* final compiled `*.csv` file with performance statistics

# Google Colab

The following lines of code will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

### Mount Google Drive

The Google Colab environment is transient and will reset after any prolonged break in activity. To retain important and/or large files between sessions, use the following lines of code to mount your personal Google Drive to this Colab instance:

In [46]:
try:
    # --- Mount gdrive to /content/drive/My Drive/
    from google.colab import drive
    drive.mount('/content/drive')
    
except: pass

Mounted at /content/drive


Throughout this assignment we will use the following global `MOUNT_ROOT` variable to reference a location to store long-term data. If you are using a local Jupyter server and/or wish to store your data elsewhere, please update this variable now.

In [0]:
# --- Set data directory
MOUNT_ROOT = '/content/drive/My Drive'
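
If the same notebook is run both in Colab and on a local Jupyter server, the choice of `MOUNT_ROOT` can be automated. The following is a minimal sketch, not part of the assignment template: the helper `choose_mount_root` and the local folder name `cs190_data` are hypothetical and only illustrate the idea.

```python
import os

# Location used when Google Drive is mounted in Colab
DRIVE_ROOT = '/content/drive/My Drive'

# Hypothetical local fallback; any writable folder would do
LOCAL_ROOT = os.path.join(os.getcwd(), 'cs190_data')

def choose_mount_root(drive_root=DRIVE_ROOT, local_root=LOCAL_ROOT):
    """Return drive_root if that directory exists, otherwise local_root."""
    return drive_root if os.path.isdir(drive_root) else local_root

MOUNT_ROOT = choose_mount_root()

# Paths are later built from MOUNT_ROOT with simple string formatting
example_path = '{}/models/box_localization/final.hdf5'.format(MOUNT_ROOT)
print(example_path)
```

The sketch only selects a path; it does not create any directories.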

### Select TensorFlow library version

This assignment will use the TensorFlow 2 library. Use the following lines of code to select this version:

In [1]:
# --- Select TensorFlow 2.x (only in Google Colab)
%tensorflow_version 2.x
%pip install tensorflow-gpu==2.1


# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [2]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md


### Imports

Use the following lines to import any additional needed libraries:

In [0]:
import numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers, metrics
from jarvis.train import datasets, custom
from jarvis.train.box import BoundingBox

# Data

As in the tutorial, data for this assignment will consist of prostate MRI exams. In prior work, an algorithm was created to separate out different MRI sequences. In this assignment, only T2-weighted images (isolated using the prior algorithm) will be used for localization. In prostate imaging, the T2-weighted sequence captures the greatest amount of anatomic detail and is thus ideal for delineation of prostate gland structures.

The following lines of code will download the dataset (if not already present) and prepare the required generators:

In [4]:
# --- Download dataset
datasets.download(name='mr/prostatex-seg')

# --- Prepare generators
configs = {'batch': {'size': 12}}
gen_train, gen_valid, client = datasets.prepare(name='mr/prostatex-seg', configs=configs, keyword='box')



# Training

In this assignment we will train a box localization network to localize the prostate gland.

### Define box parameters

Use the following cell block to define your `BoundingBox` object as discussed in the tutorial. Feel free to optimize hyperparameter choices for grid size, anchor shapes, anchor aspect ratios, and anchor scales: 

In [0]:
bb = BoundingBox(
    image_shape=(256, 256),
    classes=1,
    c=[3, 4, 5],
    anchor_shapes=[16, 32, 64],
    anchor_scales=[0, 1, 2],
    anchor_ratios=[0.5, 1, 2],
    iou_upper=0.5,
    iou_lower=0.2)
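
With three anchor scales and three anchor ratios, every grid location carries 3 x 3 = 9 anchors. The sketch below enumerates them under two assumptions that are not confirmed by the `BoundingBox` source: that `c=[3, 4, 5]` denotes pyramid levels with a stride of `2 ** c`, and that scales and ratios follow the RetinaNet paper convention (a scale exponent `k` maps to a multiplier of `2 ** (k / 3)`, and the ratio is height / width at constant area). The helper `anchor_sizes` is hypothetical.

```python
import math

anchor_shapes = [16, 32, 64]    # base anchor size for levels c3, c4, c5
anchor_scales = [0, 1, 2]
anchor_ratios = [0.5, 1, 2]

def anchor_sizes(base, scales, ratios):
    """Return (height, width) pairs for one pyramid level.

    Assumed convention: side = base * 2 ** (k / 3) for scale exponent k,
    ratio = height / width, anchor area held constant across ratios.
    """
    sizes = []
    for k in scales:
        side = base * 2 ** (k / 3)
        for r in ratios:
            sizes.append((side * math.sqrt(r), side / math.sqrt(r)))
    return sizes

for level, base in zip([3, 4, 5], anchor_shapes):
    grid = 256 // 2 ** level    # assumed grid size for a 256 x 256 image
    sizes = anchor_sizes(base, anchor_scales, anchor_ratios)
    total = grid * grid * len(sizes)
    print('c{}: {} x {} grid, {} anchors per location, {} anchors total'.format(
        level, grid, grid, len(sizes), total))
```

Changing the number of scales or ratios changes the anchors per location, so the output filter counts of the classification and regression heads need to change with it.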

### Define inputs

Use the following cell block to define the model inputs and the nested generators needed to convert raw masks into bounding box ground-truth targets:

In [0]:
inputs = client.get_inputs(Input)
inputs = bb.get_inputs(inputs, Input)

# --- Prepare generators
gen_train, gen_valid = client.create_generators()
gen_train, gen_valid = bb.create_generators(gen_train, gen_valid, msk='prostate')

### Define the model

Use the following cell block to define your feature pyramid network backbone and RetinaNet classification / regression networks:

In [0]:
# --- Define kwargs dictionary
kwargs = {
    'kernel_size': (1, 3, 3),
    'padding': 'same'}

# --- Define lambda functions
conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
norm = lambda x : layers.BatchNormalization()(x)
relu = lambda x : layers.LeakyReLU()(x)

# --- Define stride-1, stride-2 blocks
conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=(1, 2, 2))))

# --- Define zoom
zoom = lambda x : layers.UpSampling3D(
    size=(1, 2, 2))(x)

# --- Define 1 x 1 x 1 projection
proj = lambda filters, x : layers.Conv3D(
    filters=filters,
    strides=1,
    kernel_size=(1, 1, 1),
    padding='same',
    kernel_initializer='he_normal')(x)


In [0]:
# --- Define contracting layers
l1 = conv1(8, inputs['dat'])
l2 = conv1(16, conv2(16, l1))
l3 = conv1(24, conv2(24, l2))
l4 = conv1(32, conv2(32, l3))
l5 = conv1(48, conv2(48, l4))
l6 = conv1(64, conv2(64, l5))

# --- Define expanding layers
l7 = proj(64, l6)
l8 = conv1(64, zoom(l7) + proj(64, l5))
l9 = conv1(64, zoom(l8) + proj(64, l4))

# --- Determine filter sizes
logits = {}
K = 1
A = 9

# --- C3
c3_cls = conv1(64, conv1(64, l9))
c3_reg = conv1(64, conv1(64, l9))
logits['cls-c3'] = layers.Conv3D(filters=(A * K), name='cls-c3', **kwargs)(c3_cls)
logits['reg-c3'] = layers.Conv3D(filters=(A * 4), name='reg-c3', **kwargs)(c3_reg)

# --- C4
c4_cls = conv1(64, conv1(64, l8))
c4_reg = conv1(64, conv1(64, l8))
logits['cls-c4'] = layers.Conv3D(filters=(A * K), name='cls-c4', **kwargs)(c4_cls)
logits['reg-c4'] = layers.Conv3D(filters=(A * 4), name='reg-c4', **kwargs)(c4_reg)

# --- C5
c5_cls = conv1(64, conv1(64, l7))
c5_reg = conv1(64, conv1(64, l7))
logits['cls-c5'] = layers.Conv3D(filters=(A * K), name='cls-c5', **kwargs)(c5_cls)
logits['reg-c5'] = layers.Conv3D(filters=(A * 4), name='reg-c5', **kwargs)(c5_reg)


In [0]:
# --- Create model
model = Model(inputs=inputs, outputs=logits)
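
As a sanity check on the architecture above, the in-plane feature map size at each layer can be traced by hand. This sketch assumes a 256 x 256 input (the `image_shape` passed to `BoundingBox`) and relies only on the strides and upsampling factors in the model definition; it does not build any TensorFlow layers.

```python
# In-plane size after each contracting layer for a 256 x 256 input
size = 256
sizes = {'l1': size}
for name in ['l2', 'l3', 'l4', 'l5', 'l6']:
    size //= 2                  # each conv2 block uses strides=(1, 2, 2)
    sizes[name] = size

# Expanding layers: l7 keeps the size of l6, each zoom doubles it
sizes['l7'] = sizes['l6']
sizes['l8'] = sizes['l7'] * 2   # added to proj(l5), so it must match l5
sizes['l9'] = sizes['l8'] * 2   # added to proj(l4), so it must match l4

K, A = 1, 9                     # classes and anchors per location
heads = {'c3': 'l9', 'c4': 'l8', 'c5': 'l7'}
for level, layer in heads.items():
    s = sizes[layer]
    print('{}: cls output {} x {} x {}, reg output {} x {} x {}'.format(
        level, s, s, A * K, s, s, A * 4))
```

The three head resolutions (32, 16 and 8) correspond to strides of 8, 16 and 32 relative to the input.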

### Compile the model

Use the following cell block to compile your model. Recall the following requirements as described in the tutorial:

* use of a focal sigmoid (binary) cross-entropy loss function for classification
* use of a Huber loss function for regression
* use of masked loss functions to ensure only relevant examples are used for training
* use of appropriate metrics to track algorithm training
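
For intuition, the masked focal cross-entropy and Huber losses can be written in a few lines of NumPy. This is an illustrative sketch only and not the implementation behind `custom.focal_sigmoid_ce` or `custom.sl1`; the values `gamma=2.0`, `alpha=0.25` and `delta=1.0` are common defaults and are assumptions here.

```python
import numpy as np

def focal_sigmoid_ce(y_true, logits, mask, gamma=2.0, alpha=0.25):
    """Masked focal binary cross-entropy, averaged over unmasked entries."""
    p = 1.0 / (1.0 + np.exp(-logits))
    p_t = np.where(y_true == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class balancing weight
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return np.sum(loss * mask) / max(np.sum(mask), 1)

def huber(y_true, y_pred, mask, delta=1.0):
    """Masked Huber loss: quadratic for small errors, linear for large ones."""
    err = np.abs(y_true - y_pred)
    loss = np.where(err <= delta, 0.5 * err ** 2, delta * (err - 0.5 * delta))
    return np.sum(loss * mask) / max(np.sum(mask), 1)

# Classification: the third anchor is masked out and does not contribute
y_cls = np.array([1, 0, 0])
z_cls = np.array([0.0, -4.0, 3.0])
m_cls = np.array([1, 1, 0])
print(focal_sigmoid_ce(y_cls, z_cls, m_cls))

# Regression: only the first (positive) anchor contributes
y_reg = np.array([0.0, 0.0])
p_reg = np.array([0.5, 2.0])
m_reg = np.array([1, 0])
print(huber(y_reg, p_reg, m_reg))
```

The focal term `(1 - p_t) ** gamma` shrinks the contribution of anchors that are already classified confidently, which matters because most anchors are easy background.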

In [0]:
# --- Compile the model
model.compile(
    optimizer=optimizers.Adam(learning_rate=2e-4),
    loss={
        'cls-c3': custom.focal_sigmoid_ce(inputs['cls-c3-msk']),
        'cls-c4': custom.focal_sigmoid_ce(inputs['cls-c4-msk']),
        'cls-c5': custom.focal_sigmoid_ce(inputs['cls-c5-msk']),
        'reg-c3': custom.sl1(inputs['reg-c3-msk']),
        'reg-c4': custom.sl1(inputs['reg-c4-msk']),
        'reg-c5': custom.sl1(inputs['reg-c5-msk'])
        },
    metrics={
        'cls-c3': [custom.sigmoid_ce_sens(), custom.sigmoid_ce_ppv()],
        'cls-c4': [custom.sigmoid_ce_sens(), custom.sigmoid_ce_ppv()],
        'cls-c5': [custom.sigmoid_ce_sens(), custom.sigmoid_ce_ppv()]},
    experimental_run_tf_function=False)

### Train the model

Use the following cell block to train your model. **Note**: it is recommended to train for at least 10,000 iterations for convergence.
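
The number of training iterations is `steps_per_epoch * epochs`. A quick arithmetic check, assuming 500 steps per epoch, gives the number of epochs needed to reach the recommended 10,000 iterations:

```python
import math

steps_per_epoch = 500
target_iterations = 10000

# Smallest number of epochs that reaches the target
epochs_needed = math.ceil(target_iterations / steps_per_epoch)
print(epochs_needed)

# For comparison, 16 epochs at this step count
print(16 * steps_per_epoch)
```

At 500 steps per epoch, 16 epochs yield 8,000 iterations, so about 20 epochs are needed to reach the recommendation.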

In [31]:
client.load_data_in_memory()



In [32]:
# --- Train model
model.fit(
    x=gen_train, 
    steps_per_epoch=500, 
    epochs=16,
    validation_data=gen_valid,
    validation_steps=500,
    validation_freq=4,
    use_multiprocessing=True)


# Evaluation

Based on the tutorial discussion, use the following cells to calculate model performance. The following metrics should be calculated:

* median IoU
* 25th percentile IoU
* 75th percentile IoU

As in prior assignments, accuracy is determined on a patient by patient (volume by volume) basis, so please calculate the IoU prediction values for each 3D volume (not slice-by-slice).
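
As a reminder of the metric itself, the IoU of two axis-aligned boxes is the area of their intersection divided by the area of their union. The sketch below assumes boxes are given as `(y0, x0, y1, x1)` corner coordinates; the format used internally by `BoundingBox` may differ, and `box_iou` is a hypothetical helper, not a Jarvis function.

```python
def box_iou(a, b):
    """IoU of two boxes given as (y0, x0, y1, x1) corner coordinates."""
    # Height and width of the overlap, clipped at zero for disjoint boxes
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0

pred = (10, 10, 50, 50)
true = (20, 20, 60, 60)
print(box_iou(pred, true))
```

Here the overlap is 30 x 30 = 900 pixels and the union is 1600 + 1600 - 900 = 2300 pixels, so the IoU is 900 / 2300.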

### Performance

The following minimum performance metrics must be met for full credit:

* median IoU: >0.50
* 25th percentile IoU: >0.40
* 75th percentile IoU: >0.60
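
A simple way to compare results against these thresholds is sketched below. The IoU values are made up for illustration and are not results from any model.

```python
import numpy as np

# Hypothetical per-volume IoU values, for illustration only
volume_ious = np.array([0.42, 0.55, 0.61, 0.48, 0.70, 0.66, 0.58, 0.52])

stats = {
    'median': np.median(volume_ious),
    'p25': np.percentile(volume_ious, 25),
    'p75': np.percentile(volume_ious, 75)}

# Minimum values required for full credit
thresholds = {'median': 0.50, 'p25': 0.40, 'p75': 0.60}

for name, value in stats.items():
    status = 'pass' if value > thresholds[name] else 'fail'
    print('{}: {:.4f} (required > {:.2f}) {}'.format(
        name, value, thresholds[name], status))
```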

In [0]:
# --- Create test-mode generators for the training and validation cohorts
test_train, test_valid = client.create_generators(test=True, expand=True)
test_train, test_valid = bb.create_generators(test_train, test_valid, msk='prostate')

In [34]:
ious = {
    'med': [],
    'p25': [],
    'p75': []}

for x, y in test_train:
    
    # --- Predict
    box = model.predict(x)
    if isinstance(box, list):
        box = {k: v for k, v in zip(model.output_names, box)}
        
    # --- Convert predictions to anchors
    anchors_pred, _ = bb.convert_box_to_anc(box)
    
    # --- Convert ground-truth to anchors
    anchors_true, _ = bb.convert_box_to_anc(y)
    
    # --- Calculate IoUs
    curr = []
    for pred, true in zip(anchors_pred, anchors_true):
        for p in pred:
            iou = bb.calculate_ious(box=p, anchors=true)
            if iou.size > 0:
                curr.append(np.max(iou))
            else: 
                curr.append(0)
    
    if len(curr) == 0:
        curr = [0]
        
    ious['med'].append(np.median(curr))
    ious['p25'].append(np.percentile(curr, 25))
    ious['p75'].append(np.percentile(curr, 75))
    
ious = {k: np.array(v) for k, v in ious.items()}



In [36]:
# --- Define columns
df = pd.DataFrame(index=np.arange(ious['med'].size))
df['iou_median'] = ious['med']
df['iou_p-25th'] = ious['p25']
df['iou_p-75th'] = ious['p75']

# --- Print accuracy
print(df['iou_median'].median())
print(df['iou_p-25th'].median())
print(df['iou_p-75th'].median())

0.6470589637756348
0.5710867047309875
0.7327057421207428


### Results

When ready, create a `*.csv` file with your compiled **validation** cohort IoU statistics. There is no need to submit training performance accuracy.

In [44]:
ious = {
    'med': [],
    'p25': [],
    'p75': []}

for x, y in test_valid:
    
    # --- Predict
    box = model.predict(x)
    if isinstance(box, list):
        box = {k: v for k, v in zip(model.output_names, box)}
        
    # --- Convert predictions to anchors
    anchors_pred, _ = bb.convert_box_to_anc(box)
    
    # --- Convert ground-truth to anchors
    anchors_true, _ = bb.convert_box_to_anc(y)
    
    # --- Calculate IoUs
    curr = []
    for pred, true in zip(anchors_pred, anchors_true):
        for p in pred:
            iou = bb.calculate_ious(box=p, anchors=true)
            if iou.size > 0:
                curr.append(np.max(iou))
            else: 
                curr.append(0)
    
    if len(curr) == 0:
        curr = [0]
        
    ious['med'].append(np.median(curr))
    ious['p25'].append(np.percentile(curr, 25))
    ious['p75'].append(np.percentile(curr, 75))
    
ious = {k: np.array(v) for k, v in ious.items()}

# --- Define columns
df = pd.DataFrame(index=np.arange(ious['med'].size))
df['iou_median'] = ious['med']
df['iou_p-25th'] = ious['p25']
df['iou_p-75th'] = ious['p75']

# --- Print accuracy
print(df['iou_median'].median())
print(df['iou_p-25th'].median())
print(df['iou_p-75th'].median())

0.5526261702179909
0.7226076871156693
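
The validation statistics can then be written to a `*.csv` file with `DataFrame.to_csv`. The sketch below is self-contained, so it uses a small stand-in DataFrame and a temporary directory in place of `MOUNT_ROOT`; the file name `results.csv` and its folder are assumptions, not requirements of the assignment.

```python
import os
import tempfile

import pandas as pd

# Stand-in values; in the notebook, use the validation `df` built above
df = pd.DataFrame({
    'iou_median': [0.61, 0.55],
    'iou_p-25th': [0.52, 0.47],
    'iou_p-75th': [0.70, 0.66]})

# Stand-in for MOUNT_ROOT so that the sketch runs anywhere
MOUNT_ROOT = tempfile.mkdtemp()

# Assumed output location, next to the saved model
fname = '{}/models/box_localization/results.csv'.format(MOUNT_ROOT)
os.makedirs(os.path.dirname(fname), exist_ok=True)
df.to_csv(fname, index=False)

# Read the file back to confirm its contents
check = pd.read_csv(fname)
print(check.shape)
```

Passing `index=False` omits the row index; drop that argument if the per-volume index should appear in the file.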


# Submission

Use the following line to save your model for submission (in Google Colab this should save your model file into your personal Google Drive):

In [0]:
import os
# --- Serialize a model
fname = '{}/models/box_localization/final.hdf5'.format(MOUNT_ROOT)
os.makedirs(os.path.dirname(fname), exist_ok=True)
model.save(fname)

### Canvas

Once you have completed this assignment, download the necessary files from Google Colab and your Google Drive. You will then need to submit the following items:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv`
* final (trained) model: `[UCInetID]_model.hdf5`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has an email address of panteater@uci.edu, so his notebook file would be submitted under the name `panteater_assignment.ipynb`, his spreadsheet would be submitted under the name `panteater_results.csv` and his model file would be submitted under the name `panteater_model.hdf5`.