# Overview

Congratulations! You have reached the end of the CS 190 curriculum. As a final summative exercise, you will develop a model that differentiates kidneys with tumors from normal kidneys using any of the approaches and tools you have learned this quarter. This brief tutorial introduces the dataset and provides some strategies to help guide exploration. Once you are familiar with the task, you are welcome to move on to the assignment, which contains more details regarding algorithm design requirements and submission.

This tutorial is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at: https://github.com/peterchang77/dl_tutor/tree/master/cs190.

# Google Colab

The following lines of code will configure your Google Colab environment for this tutorial.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md

### Imports

Use the following lines to import any additional needed libraries:

In [None]:
import numpy as np, pandas as pd
import tensorflow as tf
from tensorflow.keras import Input, Model, models, layers, losses, metrics, optimizers
from jarvis.train import datasets
from jarvis.utils.display import imshow

# Data

The data used in this tutorial will consist of kidney tumor CT exams derived from the Kidney Tumor Segmentation Challenge (KiTS). More information about the KiTS Challenge can be found here: https://kits21.kits-challenge.org/. The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/ct_kits`; as needed an alternate location may be specified using `datasets.download(name=..., path=...)`. 

In [None]:
# --- Download dataset
datasets.download(name='ct/kits')

Once downloaded, the `datasets.prepare(...)` method can be used to generate the required Python generators to iterate through the dataset, as well as a `client` object for any needed advanced functionality. As needed, pass any custom configurations (e.g. batch size, normalization parameters, etc.) into the optional `configs` dictionary argument.

In this exercise, we will be using abdominal CT volumes that have been preprocessed into 96 x 96 x 96 matrix volumes, each cropped to the right and left kidney, facilitating ease of algorithm training within the Google Colab platform. To support either model implementation strategy, both 2D and 3D data have been prepared for this exercise. To specify the correct Generator template file, pass a designated `keyword` string.

**2D dataset**: To select the 2D data of input size `(1, 96, 96, 1)` use the keyword `2d`:

In [None]:
# --- Prepare generators
configs = {'batch': {'size': 16, 'fold': 0}}
gen_train, gen_valid, client = datasets.prepare(name='ct/kits', keyword='2d', configs=configs, custom_layers=True)

**3D dataset**: To select the 3D data of input size `(96, 96, 96, 1)` use the keyword `3d`:

In [None]:
# --- Prepare generators
configs = {'batch': {'size': 2, 'fold': 0}}
gen_train, gen_valid, client = datasets.prepare(name='ct/kits', keyword='3d', configs=configs, custom_layers=True)

### Stratified Sampling

As needed, consider stratified sampling to increase and/or decrease the rate at which certain classes are presented to the model during the training process. Recall that a total of three different classes are defined: 

* class 0: background
* class 1: normal kidney
* class 2: tumor

To change the default sampling strategy, pass a distribution of sampling rates in the `sampling` entry within the `configs` dictionary:

```python
# --- Prepare configs dict
configs = {
    'batch': {'size': ...},
    'sampling': {
        'lbl-crp-00': 0.4,
        'lbl-crp-01': 0.3,
        'lbl-crp-02': 0.3}}
        
# --- Prepare generators
gen_train, gen_valid, client = datasets.prepare(name='ct/kits', keyword='2d', configs=configs, custom_layers=True)
        
```

See details presented in Week 8 (Class Imbalance) for further information.

### KiTS Data

Use the following lines of code to visualize the first example in a batch using the `imshow(...)` method:

In [None]:
# --- Show the first example
xs, _ = next(gen_train)
imshow(xs['dat'][0])

### Kidney masks

The ground-truth labels are three-class masks of the same matrix shape as the model input:

In [None]:
print(xs['lbl'][0].shape)

Use the `imshow(...)` method to visualize the ground-truth tumor mask labels:

In [None]:
# --- Show tumor masks overlaid on original data
imshow(xs['dat'], xs['lbl'])

# --- Show tumor masks isolated
imshow(xs['lbl'])

### Data Generator

In this final project, both classification and segmentation models will be evaluated. Depending on preference, 2D and/or 3D models may be created. Additionally, while three total classes are available for use during the training process, all models will produce a binary prediction result (tumor vs. no tumor).

To accommodate these various permutations, consider the following custom code to implement a nested generator strategy:

In [None]:
def G(gen, dims=2, task='cls', binarize=True):
    """
    Custom generator to modify raw labels for 2D/3D classification or segmentation tasks
    
    :params
    
      (generator) gen      : original unmodified generator
      (int)       dims     : 2D or 3D model
      (str)       task     : 'cls' or 'seg' 
      (bool)      binarize : whether or not to binarize original 3-class labels
    
    """
    assert task in ['cls', 'seg']

    for xs, _ in gen:

        # --- Convert segmentation into classification labels
        if task == 'cls':
            axis = (2, 3, 4) if dims == 2 else (1, 2, 3, 4)
            xs['lbl'] = np.max(xs['lbl'], axis=axis, keepdims=True)
            
        # --- Binarize
        if binarize:
            xs['lbl'] = xs['lbl'] == 2

        yield xs
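
To see what this wrapper does to the label shapes, the following minimal sketch applies the same reduction and binarization steps to a synthetic batch. The array `lbl` and its contents are invented for illustration; they only mimic the `(batch, 1, 96, 96, 1)` layout of the 2D generator, and no Jarvis data is involved.

```python
import numpy as np

# --- Synthetic 3-class labels shaped like a 2D batch: (batch, 1, 96, 96, 1)
lbl = np.zeros((4, 1, 96, 96, 1), dtype='uint8')
lbl[0, 0, 40:50, 40:50, 0] = 1   # slice 0: normal kidney only
lbl[1, 0, 40:50, 40:50, 0] = 1   # slice 1: kidney ...
lbl[1, 0, 44:46, 44:46, 0] = 2   # ... containing a small tumor

# --- Segmentation to classification: the max over each slice is the highest class present
cls = np.max(lbl, axis=(2, 3, 4), keepdims=True)

# --- Binarize: tumor (class 2) vs. everything else
cls_bin = cls == 2

print(cls.shape)               # (4, 1, 1, 1, 1)
print(cls.ravel().tolist())    # [1, 2, 0, 0]
print(cls_bin.ravel().tolist())
```

Only the second slice is labeled positive after binarization, because it is the only one that contains any class 2 voxels.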

# Models

### Blocks

To facilitate development of either 2D or 3D models, consider the following generic code template for creating 2D or 3D blocks:

In [None]:
def create_blocks(dims=2):
    
    kernel_size = (1, 3, 3) if dims == 2 else (3, 3, 3)
    strides = (1, 2, 2) if dims == 2 else (2, 2, 2)
    
    # --- Define kwargs
    kwargs = {
        'kernel_size': kernel_size,
        'padding': 'same',
        'kernel_initializer': 'he_normal'}

    # --- Define block components
    conv = lambda x, filters, strides : layers.Conv3D(filters=filters, strides=strides, **kwargs)(x)
    tran = lambda x, filters, strides : layers.Conv3DTranspose(filters=filters, strides=strides, **kwargs)(x)

    norm = lambda x : layers.BatchNormalization()(x)
    relu = lambda x : layers.ReLU()(x)

    conv1 = lambda filters, x : relu(norm(conv(x, filters, strides=1)))
    conv2 = lambda filters, x : relu(norm(conv(x, filters, strides=strides)))
    tran2 = lambda filters, x : relu(norm(tran(x, filters, strides=strides)))

    concat = lambda a, b : layers.Concatenate()([a, b])
                                     
    return conv1, conv2, tran2, concat

### Losses

For both classification and segmentation tasks, the default loss function of choice is standard softmax cross-entropy:

```python
sce = losses.SparseCategoricalCrossentropy(from_logits=True)(
    y_true=inputs['lbl'],
    y_pred=logits)
```

To emphasize the foreground (tumor) class, consider either **weighted** loss functions and/or focal loss. Again, these strategies may be applied to both the classification and segmentation tasks. See Week 8 (Class Imbalance) for further details.
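
As a rough illustration of what a weighted loss does, the sketch below computes a class-weighted softmax cross-entropy in plain NumPy. It is not the Keras implementation: the function name `weighted_sce`, the toy values, and the choice to normalize by the total weight are all assumptions made for this example. In a Keras model, per-example weights of this kind are typically supplied through the `sample_weight` argument when calling the loss.

```python
import numpy as np

def weighted_sce(labels, logits, class_weights):
    """
    Weighted softmax cross-entropy (NumPy illustration only)

      labels        : integer class labels, shape (N,)
      logits        : raw scores, shape (N, C)
      class_weights : per-class weights, shape (C,)
    """
    # --- Numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    # --- Negative log-likelihood of the true class
    nll = -log_p[np.arange(labels.size), labels]

    # --- Weight each example by its class, then normalize by the total weight
    w = class_weights[labels]
    return (w * nll).sum() / w.sum()

# --- Toy batch: three negatives predicted correctly, one tumor (class 1) missed
labels = np.array([0, 0, 0, 1])
logits = np.array([[2.0, 0.0]] * 4)

unweighted = weighted_sce(labels, logits, np.array([1.0, 1.0]))
weighted = weighted_sce(labels, logits, np.array([1.0, 3.0]))

print('unweighted: {:0.4f}'.format(unweighted))
print('weighted  : {:0.4f}'.format(weighted))
```

With the tumor class weighted three times higher, the single missed tumor dominates the loss, which is the effect a weighted loss is meant to produce on imbalanced data.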

### Metrics

For the classification task, the default metric of choice is accuracy:

```python
acc = metrics.sparse_categorical_accuracy(
    y_true=inputs['lbl'], 
    y_pred=logits)
```

For the segmentation task, the default metric of choice is Dice score.

```python
def calculate_dsc(y_true, y_pred, weights=None, c=1):
    """
    Method to calculate the Dice score coefficient (DSC) for a given class
    
    :params
    
      y_true  : ground-truth label
      y_pred  : predicted logit scores
      weights : optional mask; locations where weights == 0 are excluded
      c       : class to calculate DSC on
    
    """    
    true = y_true[..., 0] == c
    pred = tf.math.argmax(y_pred, axis=-1) == c 
    
    if weights is not None:
        true = true & (weights[..., 0] != 0)
        pred = pred & (weights[..., 0] != 0)

    A = tf.math.count_nonzero(true & pred) * 2
    B = tf.math.count_nonzero(true) + tf.math.count_nonzero(pred)
    
    return tf.math.divide_no_nan(
        tf.cast(A, tf.float32), 
        tf.cast(B, tf.float32))
```

# Classification

The first task is to create any classification model for binary tumor detection. A 2D model will predict tumor vs. no tumor on a slice-by-slice basis whereas a 3D model will predict tumor vs. no tumor on a volume basis. Regardless of implementation choice, all statistical analysis will be performed on a **volume basis**. For those that choose a 2D model, a reduction strategy must be implemented (see details further below).

An example simple 2D backbone implementation may be defined as follows:

```python
# --- Define model input 
x = Input(shape=(None, 96, 96, 1), dtype='float32')

# --- Define layers
l1 = conv1(8, x)
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))
l6 = conv1(80, conv2(80, l5))

# --- Reshape
f0 = layers.Reshape(...)(l6)

# --- Create logits
...
```

See details presented in Weeks 3 and 4 for further information.

# Segmentation

The second task is to create any segmentation model for binary tumor localization. A 2D model will predict tumor segmentation masks on a slice-by-slice basis whereas a 3D model will predict tumor segmentation masks on a volume basis. Regardless of implementation choice, all statistical analysis will be performed on a **volume basis**. To do so, a reduction strategy must be implemented (see details further below).

An example simple 2D backbone U-Net implementation may be defined as follows:

```python
# --- Define model input 
x = Input(shape=(None, 96, 96, 1), dtype='float32')

# --- Define contracting layers
l1 = conv1(8, x)
l2 = conv1(16, conv2(16, l1))
l3 = conv1(32, conv2(32, l2))
l4 = conv1(48, conv2(48, l3))
l5 = conv1(64, conv2(64, l4))

# --- Define expanding layers
l6  = tran2(48, l5)
l7  = tran2(32, conv1(48, concat(l4, l6)))
l8  = tran2(16, conv1(32, concat(l3, l7)))
l9  = tran2(8,  conv1(16, concat(l2, l8)))
l10 = conv1(8,  l9)

# --- Create logits
...
```

See details presented in Week 5 for further information.

# Reduction Strategies

Regardless of implementation strategy, the final goal of all models in this project is to produce a single binary classification result for tumor vs. no tumor on a per volume basis. Aside from a 3D binary classification model, all other implementations must define a **reduction strategy** to collapse multiple predictions into just a single global per-volume prediction.

The following examples demonstrate expected predictions for any given single `96 x 96 x 96 x 1` volume:

* 2D classification model: 96 predictions (one per slice)
* 2D or 3D segmentation models: 96 ** 3 predictions (one per voxel)

### Reduction

The first step in a reduction strategy is to define a single aggregate per exam **score** based on the available predictions. One simple implementation is to add together all binarized predictions. Alternatives include various statistical metrics derived from softmax-transformed logit scores.

The following code block demonstrates how to calculate an aggregate summed score across all validation exams:

In [None]:
# --- Create validation generator
test_train, test_valid = client.create_generators(test=True, expand=True)
test_train = G(test_train, dims=2, task='cls')
test_valid = G(test_valid, dims=2, task='cls')

preds = []
trues = []

for x in test_valid:
    
    # --- Aggregate preds
    pred = backbone.predict(x['dat'])
    preds.append(np.argmax(pred, axis=-1).sum())

    # --- Aggregate trues
    trues.append(x['lbl'].any())

# --- Create Numpy arrays
preds = np.array(preds)
trues = np.array(trues)
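
As an alternative to summing binarized predictions, a score can be derived from softmax probabilities. The following sketch uses random synthetic logits in place of real model output, flattened to one row per slice; the variable names and the choice of the top 5 slices are assumptions for illustration, not a recommended setting.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis"""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# --- Synthetic logits for one volume from a 2D classifier: 96 slices x 2 classes
rng = np.random.default_rng(0)
logits = rng.normal(size=(96, 2))

# --- Per-slice tumor probability
p_tumor = softmax(logits)[..., 1]

# --- Candidate per-volume scores
score_sum = (p_tumor > 0.5).sum()          # number of positive slices (matches summed argmax)
score_mean = p_tumor.mean()                # average tumor probability
score_max = p_tumor.max()                  # most suspicious slice
score_top = np.sort(p_tumor)[-5:].mean()   # mean of the 5 most suspicious slices

print(score_sum, score_mean, score_max, score_top)
```

Each of these scores yields one number per volume, so any of them can be substituted for the summed score before a threshold is applied.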

### Thresholds

To convert the raw reduction score into a prediction, consider application of various thresholds. Assuming that approximately half of the predictions need to be positive (tumor), consider using the mean or median prediction as a first-pass threshold.

In [None]:
# --- Apply threshold
thresh = np.median(preds)
preds_ = preds >= thresh

Use the following code cell to calculate accuracy, sensitivity, specificity, PPV and NPV:

In [None]:
# --- Calculate TP/TN/FN/FP
corr = preds_ == trues
tp = np.sum(corr & trues)
tn = np.sum(corr & ~trues)
fn = np.sum(~corr & trues)
fp = np.sum(~corr & ~trues)

# --- Calculate stats
acc = (tp + tn) / corr.size
sen = tp / (tp + fn)
spe = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

print('Acc: {:0.4f}'.format(acc))
print('Sen: {:0.4f}'.format(sen))
print('Spe: {:0.4f}'.format(spe))
print('PPV: {:0.4f}'.format(ppv))
print('NPV: {:0.4f}'.format(npv))

How can you automate the above to test various different thresholds?
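
One possible approach, sketched under stated assumptions, is to wrap the statistics in a function and loop over candidate thresholds. The helper name `evaluate` and the synthetic `preds` and `trues` values below are hypothetical; in practice the arrays produced by the reduction step would be used instead.

```python
import numpy as np

def evaluate(preds, trues, thresh):
    """Return accuracy, sensitivity and specificity at a given threshold"""
    preds_ = preds >= thresh
    tp = np.sum(preds_ & trues)
    tn = np.sum(~preds_ & ~trues)
    fn = np.sum(~preds_ & trues)
    fp = np.sum(preds_ & ~trues)

    acc = (tp + tn) / trues.size
    sen = tp / max(tp + fn, 1)   # guard against division by zero
    spe = tn / max(tn + fp, 1)

    return acc, sen, spe

# --- Synthetic per-volume scores and labels (hypothetical values)
preds = np.array([0, 1, 2, 3, 10, 12, 20, 25])
trues = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=bool)

# --- Try every unique score as a candidate threshold
results = [(t, *evaluate(preds, trues, t)) for t in np.unique(preds).tolist()]

for t, acc, sen, spe in results:
    print('thresh={:>3} acc={:0.3f} sen={:0.3f} spe={:0.3f}'.format(t, acc, sen, spe))

# --- Pick the threshold with the highest accuracy (first one wins on ties)
best = max(results, key=lambda r: r[1])
print('best threshold: {}'.format(best[0]))
```

Selecting by accuracy is only one option; depending on the goal, the same loop could select by sensitivity at a fixed specificity or by any other criterion.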

# Tips

To create the overall best model, consider the following.

### Classification

For classification models, consider using the advanced motifs described in Week 4:

* residual connection
* bottleneck operation
* Inception module
* Squeeze-and-Excite (SE) module

### Segmentation

For segmentation models, consider using the advanced motifs described in Week 5 and the midterm:

* deep supervision
* residual skip connections
* multiple skip operations

### Alternative models

* box localization networks (Week 7)
* unsupervised pretraining (Week 9)

### Loss 

Masked and/or weighted loss functions are likely to produce the best results on this task. To use these strategies, do *not* binarize the raw data e.g., `G(..., binarize=False)`. Use the different classes to create customized `sample_weight` tensors, but ensure that the final ground-truth label is binarized manually before passing into the loss function. Additionally, during the inference loop, ensure that the necessary modifications are applied to model predictions (e.g., mask out portions of your prediction that are masked during algorithm training).

See Week 8 for further details.
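
The steps above can be sketched in NumPy as follows. This is one possible arrangement rather than a prescribed recipe: the helper name `prepare_loss_inputs`, the weight values, and the decision to mask out background entirely are assumptions chosen for illustration.

```python
import numpy as np

def prepare_loss_inputs(lbl, w_kidney=1.0, w_tumor=5.0):
    """
    Build binary labels and per-voxel weights from raw 3-class labels

    w_kidney and w_tumor are hypothetical values chosen for illustration
    """
    # --- Background (class 0) keeps weight 0 and is therefore masked out
    weights = np.zeros(lbl.shape, dtype='float32')
    weights[lbl == 1] = w_kidney
    weights[lbl == 2] = w_tumor

    # --- Binarize manually: tumor vs. everything else
    binary = (lbl == 2).astype('uint8')

    return binary, weights

# --- Toy 3-class labels: background, background, kidney, kidney, tumor
lbl = np.array([0, 0, 1, 1, 2])
binary, weights = prepare_loss_inputs(lbl)

# --- At inference, apply the same mask to (toy) binarized predictions
pred = np.array([1, 0, 0, 1, 1], dtype=bool)
pred_masked = pred & (weights != 0)

print(binary.tolist())       # [0, 0, 0, 0, 1]
print(weights.tolist())      # [0.0, 0.0, 1.0, 1.0, 5.0]
print(pred_masked.tolist())  # [False, False, False, True, True]
```

The first prediction falls in a masked background location and is discarded, which mirrors the fact that the model was never penalized for its output there during training.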

### Data augmentation

Consider the following code block to implement data augmentation:

```python
def augmentation(inputs, factor=0.2):

    kwargs = {
        'interpolation': 'nearest',
        'fill_mode': 'constant',
        'fill_value': 0.0}

    a = layers.Concatenate()((inputs['dat'], inputs['lbl']))
    a = tf.reshape(a, (-1, 96, 96, 2))
    a = layers.experimental.preprocessing.RandomRotation(factor=factor, **kwargs)(a)
    a = layers.experimental.preprocessing.RandomTranslation(factor, factor, **kwargs)(a)
    a = layers.experimental.preprocessing.RandomZoom(factor, **kwargs)(a)
    a = tf.reshape(a, (inputs['dat'].shape[0], inputs['dat'].shape[1], 96, 96, 2)) 

    x = a[..., 0:1]
    y = a[..., 1:2]

    return x, y
```