# Assignment

In this assignment we will train a box localization algorithm derived from RetinaNet to perform kidney localization on CT. The algorithm will be implemented using a feature pyramid network backbone. Accuracy will be calculated based on median IoU performance against ground-truth masks.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook (push to https://github.com/[username]/cs190/cnn/assignment.ipynb)
* final trained `*.hdf5` model file
* final compiled `*.csv` file with performance statistics

# Google Colab

The following steps will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md

### Imports

Use the following lines to import any additional needed libraries:

In [None]:
import numpy as np, pandas as pd
import tensorflow as tf
from tensorflow.keras import Input, Model, models, layers, optimizers
from jarvis.train import datasets
from jarvis.utils.display import imshow
from jarvis.train.box import BoundingBox

# Data

The data used in this assignment consists of kidney tumor CT exams derived from the Kidney Tumor Segmentation Challenge (KiTS). More information about the KiTS Challenge can be found here: https://kits21.kits-challenge.org/. The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/ct_kits`; as needed, an alternate location may be specified using `datasets.download(name=..., path=...)`.

The following lines of code will:

1. Download the dataset (if not already present) 
2. Prepare the necessary Python generators to iterate through the dataset

In [None]:
# --- Download dataset
datasets.download(name='ct/kits')

# --- Prepare generators and model inputs
gen_train, gen_valid, client = datasets.prepare(name='ct/kits', keyword='2d-bin', custom_layers=True)

# Training

In this assignment we will train a box localization network for kidney detection.

### Define box parameters

Use the following cell block to define your `BoundingBox` object as discussed in the tutorial. Feel free to optimize hyperparameter choices for grid size, anchor shapes, anchor aspect ratios, and anchor scales: 

In [None]:
bb = BoundingBox(...)
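
To build intuition for how grid size, anchor aspect ratios, and anchor scales interact, the following minimal sketch enumerates anchor boxes in plain Python. It is independent of Jarvis and makes no claim about how `BoundingBox` is implemented or which arguments it accepts; the helper name `make_anchors` and every parameter value below are hypothetical.

```python
from itertools import product

def make_anchors(image_size, grid_size, base_size, ratios, scales):
    """Return anchors as (y0, x0, y1, x1) tuples in pixel coordinates.

    Hypothetical helper for illustration only; not part of Jarvis.
    Each ratio is height / width; each scale multiplies base_size.
    """
    stride = image_size / grid_size
    anchors = []
    for row, col in product(range(grid_size), repeat=2):
        # --- Center of the current grid cell
        cy = (row + 0.5) * stride
        cx = (col + 0.5) * stride
        for ratio, scale in product(ratios, scales):
            # --- Keep the area at (base_size * scale) ** 2 while varying the shape
            side = base_size * scale
            h = side * ratio ** 0.5
            w = side / ratio ** 0.5
            anchors.append((cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2))
    return anchors

# --- Example values are assumptions, not recommended hyperparameters
anchors = make_anchors(image_size=256, grid_size=8, base_size=32,
                       ratios=[0.5, 1, 2], scales=[1, 1.5])
print(len(anchors))  # 8 * 8 grid cells * 3 ratios * 2 scales = 384
```

The total anchor count grows multiplicatively with each choice, which is worth keeping in mind when tuning these hyperparameters.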

### Define inputs

Use the following cell block to define the nested generators needed to convert raw masks into ground-truth bounding box targets:

In [None]:
def box_generator(G):
    
    for xs, _ in G:
        
        # --- Convert mask into bounding-box parameterization
        msk = xs.pop('lbl')
        box = bb.convert_msk_to_box(msk=msk)
        
        # --- Update xs dictionary
        xs.update(box)
        
        yield xs

In [None]:
# --- Prepare generators
gen_train, gen_valid = client.create_generators()
gen_train = box_generator(G=gen_train)
gen_valid = box_generator(G=gen_valid)
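
As a conceptual aid, the sketch below shows only the first step of a mask-to-box conversion: finding the tight box around the nonzero pixels of a 2D binary mask. It is a plain-Python illustration under the assumption of a single object per mask; `mask_to_box` is a hypothetical helper and does not reproduce what `bb.convert_msk_to_box(...)` returns, which is an anchor-based parameterization.

```python
def mask_to_box(mask):
    """Return (y0, x0, y1, x1) of the tight box around nonzero pixels.

    Hypothetical helper for illustration only; not part of Jarvis.
    y1 and x1 are exclusive, so height = y1 - y0 and width = x1 - x0.
    Returns None when the mask contains no foreground pixels.
    """
    # --- Rows that contain at least one foreground pixel
    rows = [r for r, line in enumerate(mask) if any(line)]
    if not rows:
        return None
    # --- Columns that contain at least one foreground pixel
    cols = [c for c in range(len(mask[0])) if any(line[c] for line in mask)]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = mask_to_box(mask)
print(box)  # (1, 1, 3, 4)
```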

### Define backbone model

Use the following cell block to define your feature pyramid network backbone and RetinaNet classification / regression networks:

In [None]:
# --- Define input
x = Input(shape=?, dtype='float32')

# --- Define model

# --- Define logits
logits = ...

# --- Create model
backbone = Model(inputs=x, outputs=logits)

### Define training model

Recall the following requirements as described in the tutorial:

* use of a focal sigmoid (binary) cross-entropy loss function for classification
* use of a Huber loss function for regression
* use of masked loss functions to ensure only relevant examples are used for training
* use of appropriate metrics to track algorithm training
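
The arithmetic behind the masked focal and Huber losses can be sketched per anchor in plain Python. This is a framework-free illustration, not the tutorial's TensorFlow implementation: the helper names are hypothetical, the defaults `gamma=2.0` and `alpha=0.25` are the values commonly quoted for RetinaNet, and the example numbers are made up.

```python
import math

def focal_bce(y_true, p_pred, gamma=2.0, alpha=0.25):
    """Focal binary cross-entropy for one anchor (p_pred is a sigmoid output)."""
    p_t = p_pred if y_true == 1 else 1.0 - p_pred
    a_t = alpha if y_true == 1 else 1.0 - alpha
    # --- The (1 - p_t) ** gamma factor down-weights easy, well-classified anchors
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-7))

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * (err - 0.5 * delta)

def masked_mean(losses, mask):
    """Average only the entries whose mask value is nonzero."""
    kept = [loss for loss, m in zip(losses, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0

# --- Classification: the third anchor is excluded by the mask
cls_true, cls_pred, cls_mask = [1, 0, 0], [0.9, 0.1, 0.8], [1, 1, 0]
cls_loss = masked_mean([focal_bce(t, p) for t, p in zip(cls_true, cls_pred)], cls_mask)

# --- Regression: only the positive (first) anchor contributes
reg_true, reg_pred, reg_mask = [0.2, 0.0, 0.0], [0.5, 3.0, 3.0], [1, 0, 0]
reg_loss = masked_mean([huber(t, p) for t, p in zip(reg_true, reg_pred)], reg_mask)

print(cls_loss, reg_loss)
```

Without the masks, the large regression errors on background anchors would dominate the loss even though those anchors have no meaningful box target.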

In [None]:
# --- Define inputs
inputs = {?}

# --- Define model
logits = backbone(inputs['dat'])

# --- Define loss

# --- Define metric

Now, we are ready to create the `training` model and add the corresponding loss and accuracy tensors. 

In [None]:
# --- Create model
training = Model(inputs=inputs, outputs=?)

# --- Add loss

# --- Add metric

### Compile the model

Use the following cell block to compile your model with an appropriate optimizer. 

### In-memory data

To speed up training, consider loading all your model data into RAM:

In [None]:
# --- Load data into memory for faster training
client.load_data_in_memory()

### Train the model

Use the following cell block to train your model.

# Evaluation

Based on the tutorial discussion, use the following cells to calculate model performance. The following metrics should be calculated:

* median IoU
* 25th percentile IoU
* 75th percentile IoU
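
The metrics listed above can be computed as in the following minimal sketch, assuming each predicted and ground-truth box has already been decoded into `(y0, x0, y1, x1)` corner coordinates. That coordinate convention, the helper names, and the example IoU values are assumptions for illustration; decoding the model output into boxes is covered in the tutorial and not shown here.

```python
from statistics import quantiles

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (y0, x0, y1, x1)."""
    # --- Corners of the overlapping region (empty if the boxes do not intersect)
    y0 = max(box_a[0], box_b[0])
    x0 = max(box_a[1], box_b[1])
    y1 = min(box_a[2], box_b[2])
    x1 = min(box_a[3], box_b[3])
    inter = max(0.0, y1 - y0) * max(0.0, x1 - x0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def summarize(ious):
    """Return the 25th, 50th, and 75th percentile of a list of IoU values."""
    q25, q50, q75 = quantiles(ious, n=4, method='inclusive')
    return {'iou_25th': q25, 'iou_median': q50, 'iou_75th': q75}

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175

# --- Placeholder IoU values, one per validation example
stats = summarize([0.2, 0.4, 0.6, 0.8, 1.0])
print(stats)
```

The `'inclusive'` method interpolates linearly between the sorted data points, which treats the list as the full cohort rather than a sample from a larger population.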

### Performance

The following minimum performance metrics must be met for full credit:

* median IoU: >0.50
* 25th percentile IoU: >0.40
* 75th percentile IoU: >0.60

In [None]:
# --- Create test generators
test_train, test_valid = client.create_generators(test=True, expand=True)
test_train = box_generator(test_train)
test_valid = box_generator(test_valid)

### Results

When ready, create a `*.csv` file with your compiled **validation** cohort IoU statistics. There is no need to submit training performance accuracy.
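
One possible way to serialize per-case IoU values is sketched below using the standard-library `csv` module. The column names, the case identifiers, and the IoU values are placeholders chosen for illustration; the layout expected for grading is not specified here, so adjust as needed. With pandas, `DataFrame.to_csv(...)` would serve the same purpose.

```python
import csv
import io

def write_iou_csv(handle, ious):
    """Write one row per case to an open text handle.

    Hypothetical helper for illustration only; ious maps case ID -> IoU.
    """
    writer = csv.writer(handle)
    writer.writerow(['case', 'iou'])
    for case, value in sorted(ious.items()):
        writer.writerow([case, f'{value:.4f}'])

# --- Placeholder values; an in-memory buffer stands in for a real file
buffer = io.StringIO()
write_iou_csv(buffer, {'case_002': 0.5512, 'case_001': 0.7234})
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the target with `open('./results.csv', 'w', newline='')` and pass the resulting handle to the same function.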

# Submission

Use the following line to save your model for submission (in Google Colab this should save your model file into your personal Google Drive):

In [None]:
# --- Serialize a model
backbone.save('./model.hdf5')

### Canvas

Once you have completed this assignment, download the necessary files from Google Colab and your Google Drive. You will then need to submit the following items:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv`
* final (trained) model: `[UCInetID]_model.hdf5`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has an email address of panteater@uci.edu, so his notebook file would be submitted under the name `panteater_assignment.ipynb`, his spreadsheet would be submitted under the name `panteater_results.csv`, and his model file would be submitted under the name `panteater_model.hdf5`.