# Final Project

The final class project is to develop a model to predict brain tumor patient survival using any of the approaches and tools you have learned this quarter. The goal is both to create a high-performing algorithm for the target task and to analyze performance across several different architecture permutations. At minimum, three different network designs of your choice will be tested. As each model is built and trained, we recommend that you serialize the final model as a `*.hdf5` file before moving to the next iteration.

This assignment is part of the class **Introduction to Deep Learning for Medical Imaging** at University of California Irvine (CS190); more information can be found at https://github.com/peterchang77/dl_tutor/tree/master/cs190.

### Submission

Once complete, the following items must be submitted:

* final `*.ipynb` notebook
* final trained `*.hdf5` model files for **all** models (each independently saved)
* final compiled `*.csv` file with performance statistics across the different architectures
* final write-up with methods and results of experiments

# Google Colab

The following lines of code will configure your Google Colab environment for this assignment.

### Enable GPU runtime

Use the following instructions to switch the default Colab instance into a GPU-enabled runtime:

```
Runtime > Change runtime type > Hardware accelerator > GPU
```

# Environment

### Jarvis library

In this notebook we will use Jarvis, a custom Python package to facilitate data science and deep learning for healthcare. Among other things, this library will be used for low-level data management, stratification and visualization of high-dimensional medical data.

In [None]:
# --- Install jarvis (only in Google Colab or local runtime)
%pip install jarvis-md

### Imports

Use the following lines to import any additional needed libraries (note that depending on architecture choices, various additional modules will need to be specified here):

In [None]:
import os, numpy as np, pandas as pd
from tensorflow import losses, optimizers
from tensorflow.keras import Input, Model, models, layers
from jarvis.train import datasets, custom

# Data

The data used in this final project will consist of brain tumor MRI exams derived from the MICCAI Brain Tumor Segmentation Challenge (BRaTS). More information about the BRaTS Challenge can be found here: http://braintumorsegmentation.org/. Each single 3D volume will consist of one of four different sequences (T2, FLAIR, T1 pre-contrast and T1 post-contrast). The custom `datasets.download(...)` method can be used to download a local copy of the dataset. By default the dataset will be archived at `/data/raw/mr_brats_2020`; as needed, an alternate location may be specified using `datasets.download(name=..., path=...)`.

In [None]:
# --- Download dataset
datasets.download(name='mr/brats-2020-096')

### Python generators

Once the dataset is downloaded locally, Python generators to iterate through the dataset can be easily prepared using the `datasets.prepare(...)` method. To specify the correct generator template file, pass a designated `keyword` string. In this exercise, we will be using brain MRI volumes that have been cropped to the boundaries of the tumor and resampled to a uniform 3D volume of shape (96, 96, 96, 4). Using this input, two separate target labels have been prepared:

* survival scores (use `096*glb-org` keyword)
* tumor segmentation labels (use `096*vox-org` keyword)

To select the correct template and generators for this task, use the keyword string as above.

### Model inputs

For every input in `xs`, a corresponding `Input(...)` variable can be created and returned in an `inputs` dictionary for ease of model development:

In [None]:
# --- Create model inputs
inputs = client.get_inputs(Input)

print(inputs.keys())
print(inputs['dat'].shape)

# Training

The goal of this project is to perform **global survival prediction** for each patient (i.e., for each 3D volume of data). In other words, regardless of algorithm choice, the final objective is to predict a global survival score. This does **not**, however, mean that you are required to use global regression networks only; in fact, a hybrid algorithm may well perform better overall on this task.

The task is designed to be open-ended on purpose. The only requirements are to:

* test at minimum three different network architectures
* one algorithm must use (at least) a global regression type loss function
* one algorithm must use (at least) a pretrained autoencoder strategy

While you can choose to be creative and employ any architecture that you like, the following discussion may help guide your development process.
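As one possible starting point for the global regression requirement, the sketch below builds a small 3D convolutional network with a single-unit regression head. It is a minimal illustration under stated assumptions, not a recommended or reference design: the function name `build_regression_model`, the filter counts, the output layer name `survival` and the choice of mean absolute error as the loss are all hypothetical. In the notebook itself, the input would normally come from `inputs['dat']` as returned by `client.get_inputs(Input)`, and the output name would need to match whatever target key the generators provide.

```python
from tensorflow.keras import Input, Model, layers

def build_regression_model(input_shape=(96, 96, 96, 4), filters=(8, 16, 32, 48)):
    """Hypothetical baseline: strided 3D conv blocks, then a global regression head."""
    # --- Stand-in for inputs['dat'] from client.get_inputs(Input)
    x = inputs = Input(shape=input_shape, name='dat')

    # --- Each block halves the spatial resolution (96 > 48 > 24 > 12 > 6)
    for f in filters:
        x = layers.Conv3D(f, kernel_size=3, strides=2, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)

    # --- Collapse the remaining feature map into one vector per volume
    x = layers.GlobalAveragePooling3D()(x)

    # --- Single linear unit = one global survival score per volume
    outputs = layers.Dense(1, name='survival')(x)

    return Model(inputs=inputs, outputs=outputs)

model = build_regression_model()

# --- Mean absolute error is one example of a global regression type loss
model.compile(optimizer='adam', loss='mae')
```

Calling `model.summary()` on the result reports the layer and parameter counts requested in the write-up.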

# Evaluation

For each of the three models, the following metrics should be calculated for **both the training and validation** cohorts:

* absolute error, mean
* absolute error, median
* absolute error, 25th percentile
* absolute error, 75th percentile

### Performance

The only requirement for full credit is that your top-performing model achieves an overall median absolute error of 0.085 or below. In addition, the **top three performing models** out of the entire class will each earn their authors a full letter bonus to the overall final grade (e.g. C to B, B to A, etc).

In [None]:
# --- Create test-mode generators for the training and validation cohorts
test_train, test_valid = client.create_generators(test=True)

### Results

When ready, create a `*.csv` file with your compiled **training and validation** cohort statistics for the different models. Consider the following table format (although any format that contains the required information is sufficient):

```
          TRAINING                                VALIDATION
          mean | median | 25th-tile | 75th-tile | mean | median | 25th-tile | 75th-tile
model 1
model 2
model 3
```


As above, tables for both training and validation should be provided.
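
One way to lay out such a table with pandas is sketched below. The column naming scheme (`train_*` / `valid_*`), the file name `panteater_results.csv` and the toy numbers are assumptions made for illustration; any layout that contains the required information is acceptable.

```python
import pandas as pd

# --- One column per (cohort, statistic) pair; one row per model
metrics = ['mean', 'median', '25th-tile', '75th-tile']
columns = [f'{cohort}_{m}' for cohort in ('train', 'valid') for m in metrics]
df = pd.DataFrame(index=['model 1', 'model 2', 'model 3'], columns=columns, dtype=float)

# --- Toy values for illustration only (replace with your measured statistics)
toy = {'mean': 0.08, 'median': 0.05, '25th-tile': 0.05, '75th-tile': 0.10}
for cohort in ('train', 'valid'):
    for m in metrics:
        df.loc['model 1', f'{cohort}_{m}'] = toy[m]

# --- Serialize; rows that have not been filled in are written as empty cells
df.to_csv('panteater_results.csv')
```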

In [None]:
# --- Create *.csv
                              
# --- Serialize *.csv

# Summary

In addition to algorithm training as above, a brief write-up is required for this project (minimum of one page). The goal is to *briefly* summarize algorithm design and key results. The write-up should be divided into three sections: methods; results; discussion.

### Methods

In this section, include details such as:

* **Data**: How much data was used? How many cases were utilized for training and validation?
* **Network design**: What are the different network architectures? How many layers and parameters? Were 2D or 3D operations used? Recall that `model.summary(...)` can be used to provide key summary statistics for this purpose. If desired, feel free to include a model figure or diagram.
* **Implementation**: How was training implemented? What are the key hyperparameters (e.g. learning rate, batch size, optimizer, etc)? How many training iterations were required for convergence? Did these hyperparameters change during the course of training?
* **Statistics**: What statistics do you plan to use to evaluate model accuracy?

### Results

In this section, briefly summarize experimental results (a few sentences), and include the result table(s) as derived above.

### Discussion

Were the results expected or unexpected? What accounts for the differences in performance between the algorithms? How did you choose the network architecture implemented in your final model? With more time and/or resources, how would you further optimize your top model? Feel free to elaborate on any additional observations noted during the course of this experiment.

# Submission


### Canvas

Once you have completed the final project, download the necessary files from Google Colab and your Google Drive. As in prior assignments, be sure to prepare:

* final (completed) notebook: `[UCInetID]_assignment.ipynb`
* final (results) spreadsheet: `[UCInetID]_results.csv` (compiled for all three models)
* final (trained) model: `[UCInetID]_model.hdf5` (three separate files, one for each of the three models)

In addition, submit the summary write-up in any common document format (`.docx`, `.tex`, `.pdf`, etc):

* final summary write-up: `[UCInetID]_summary.[docx|tex|pdf]`

**Important**: please submit all your files prefixed with your UCInetID as listed above. Your UCInetID is the part of your UCI email address that comes before `@uci.edu`. For example, Peter Anteater has an email address of panteater@uci.edu, so his notebook file would be submitted under the name `panteater_assignment.ipynb`, his spreadsheet would be submitted under the name `panteater_results.csv` and his model file would be submitted under the name `panteater_model.hdf5`.
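
As a quick pre-submission check, the naming patterns from the list above can be generated and verified with a few lines of Python. This is a minimal sketch: the helper names `submission_filenames` and `missing_prefix` are hypothetical, and since the handout does not specify how the three separate model files should be distinguished, only the base model name is shown.

```python
def submission_filenames(ucinetid, summary_ext='pdf'):
    """Return the expected file names for a given UCInetID (patterns from the list above)."""
    return [
        f'{ucinetid}_assignment.ipynb',
        f'{ucinetid}_results.csv',
        f'{ucinetid}_model.hdf5',
        f'{ucinetid}_summary.{summary_ext}',
    ]

def missing_prefix(filenames, ucinetid):
    """Return the file names that do not start with '<UCInetID>_'."""
    return [f for f in filenames if not f.startswith(f'{ucinetid}_')]

# --- Example using the UCInetID from the text
expected = submission_filenames('panteater')
print(expected)

# --- Flag any file that was saved without the required prefix
print(missing_prefix(['panteater_results.csv', 'model.hdf5'], 'panteater'))
```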