# Things to include in tutorial

- make nobrainer installable via pip, conda
- create a set of Google Colaboratory demos
   - how to create a network and train it on the tpu
   - how to insert data augmentation
   - how to use transfer learning on a trained network
- release a set of niflow applications
   - brain extraction
   - brain segmentation
   - tumor labeling
- release the corresponding set of trained models (with details on which data it was trained on)
- release Patrick's DWC PR and see if we can get the uncertainty mapping (by March 7)
   - It would be great if we can work with Patrick to have his version of the brain extraction and segmentation, together with uncertainty images, available as applications and pre-trained models. This would be in addition to the ones you have trained.
- if possible, a web version of the brain segmentor that runs on a NIfTI image

# Install Nobrainer

Nobrainer can be installed using `pip`. Use the extra `[gpu]` to install TensorFlow with GPU support or the extra `[cpu]` to install TensorFlow without GPU support.

In [None]:
!pip install --no-cache-dir https://github.com/kaczmarj/nobrainer-tf2/tarball/dev tf-nightly-gpu

# Accessing Nobrainer

## Command-line

Nobrainer provides the command-line program `nobrainer`, which contains various methods for preparing data, training and evaluating models, generating predictions, etc.

In [None]:
!nobrainer convert --help

In [None]:
# TMP: use !nobrainer convert --help
!PYTHONPATH='..' python -m nobrainer.cli.main --help

## Python

The `nobrainer` Python package can be imported as below.

In [None]:
# TMP
import sys; sys.path.append('..'); del sys

import nobrainer

# Prepare training data

You will need a nested list of features and labels volumes. One can store this as a CSV that looks like the following:

```
features,labels
/path/to/1_features.nii.gz,/path/to/1_labels.nii.gz
/path/to/2_features.nii.gz,/path/to/2_labels.nii.gz
/path/to/3_features.nii.gz,/path/to/3_labels.nii.gz
/path/to/4_features.nii.gz,/path/to/4_labels.nii.gz
```

and then read it with `nobrainer.io.read_csv`.
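
As a minimal sketch, such a CSV can be written with Python's standard library alone. The path pairs below are placeholders, and the read-back step uses `csv.reader` purely for illustration; it is not a substitute for `nobrainer.io.read_csv`, whose exact signature is not shown here.

```python
import csv
import os
import tempfile

# Hypothetical (features, labels) path pairs; substitute your own files.
pairs = [
    ("/path/to/1_features.nii.gz", "/path/to/1_labels.nii.gz"),
    ("/path/to/2_features.nii.gz", "/path/to/2_labels.nii.gz"),
]

csv_path = os.path.join(tempfile.mkdtemp(), "filepaths.csv")

# Write a header row followed by one row per features/labels pair.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["features", "labels"])
    writer.writerows(pairs)

# Read the file back into a list of (features, labels) tuples, skipping the header.
with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    next(reader)
    loaded = [tuple(row) for row in reader]
```

After this runs, `loaded` holds the same pairs that were written, one tuple per row.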

## Get sample data

Here, we download 10 T1-weighted brain scans and their corresponding FreeSurfer segmentations. These volumes take up about 46 MB and are saved to a temporary directory. The call below returns the path to a CSV file in which each row holds the path to a features volume and the path to the corresponding labels volume.

In [None]:
csv_path = nobrainer.utils.get_data()

!head $csv_path

## Convert volume files to TFRecords

To achieve the best performance, training data should be in TFRecords format, the preferred file format for TensorFlow. Training can be done on medical imaging volume files but will be slower.

Nobrainer has a command-line utility to convert volumes to TFRecords: `nobrainer convert`. This will verify that all of the volumes have the same shape and that the label volumes are an integer type or can be safely coerced to an integer type. 

Following successful verification, the volumes will be converted to TFRecords files. The dataset should be sharded into multiple TFRecords files so that the data can be shuffled more thoroughly. This is especially helpful for large datasets. Users can choose how many pairs of volumes (i.e., features and labels) will be saved to one TFRecords file. In this example, we will save 4 pairs of volumes per TFRecords file because our dataset is small. With a larger dataset, users should choose a larger shard value. For example, with 10,000 volumes, one might choose 100 volumes per TFRecords file.
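
The arithmetic behind the shard size can be sketched as follows. This assumes that the final TFRecords file simply holds whatever volumes remain; the helper name `n_shards` is hypothetical and is not part of Nobrainer.

```python
import math

def n_shards(n_volumes, volumes_per_shard):
    """Number of TFRecords files needed when each holds up to `volumes_per_shard` pairs."""
    return math.ceil(n_volumes / volumes_per_shard)

# The 10 sample volumes with --volumes-per-shard=4, as in the convert command below.
small = n_shards(10, 4)

# The larger example from the text: 10,000 volumes, 100 per file.
large = n_shards(10_000, 100)

print(small, large)
```

Under that assumption, the sample dataset yields three files (two full shards and one holding the last two pairs), while the larger example yields 100 files.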

In [None]:
# TMP: use !nobrainer convert --help
!PYTHONPATH='..' python -m nobrainer.cli.main convert --help

In [None]:
!mkdir -p tfrecords

In [None]:
# TMP: use !nobrainer convert --help
!PYTHONPATH='..' python -m nobrainer.cli.main convert \
    --csv='/tmp/nobrainer-data/filepaths.csv' \
    --tfrecords-template='tfrecords/data_shard-{shard:03d}.tfrecords' \
    --volumes-per-shard=4 \
    --volume-shape 256 256 256 \
    --num-parallel-calls=8 \
    --verbose

# Create input data pipeline

We will now create a data pipeline to feed our models with training data. The steps below create a `tensorflow.data.Dataset` object that is built according to [TensorFlow's guidelines](https://www.tensorflow.org/guide/performance/datasets). The basic pipeline is summarized below.

- Read data
- Separate volumes into non-overlapping sub-volumes
    - This is done to get around memory limitations with larger models.
    - For example, a volume with shape (256, 256, 256) can be separated into eight non-overlapping blocks of shape (128, 128, 128).
- Apply random rigid augmentations if requested.
- Standard-score the feature volumes (subtract the mean and divide by the standard deviation).
- Binarize labels if binary segmentation.
- Replace values according to some mapping if multi-class segmentation.
- Batch the results so every iteration yields `batch_size` elements.
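
Two of the steps above, separating a volume into non-overlapping sub-volumes and standard scoring, can be sketched in NumPy. This is an illustration only and not Nobrainer's implementation: the helper names `to_blocks` and `standard_score` are hypothetical, and scoring a single block rather than a whole volume is an assumption made to keep the example small.

```python
import numpy as np

def to_blocks(volume, block_shape):
    """Split a 3D volume into non-overlapping blocks of shape `block_shape`."""
    if any(v % b for v, b in zip(volume.shape, block_shape)):
        raise ValueError("volume shape must be divisible by block shape")
    counts = [v // b for v, b in zip(volume.shape, block_shape)]
    # Split each axis into (count, size), then move the three count axes to the front.
    split = volume.reshape(
        counts[0], block_shape[0],
        counts[1], block_shape[1],
        counts[2], block_shape[2])
    return split.transpose(0, 2, 4, 1, 3, 5).reshape(-1, *block_shape)

def standard_score(x):
    """Shift to zero mean and scale to unit standard deviation."""
    x = x.astype(np.float32)
    return (x - x.mean()) / x.std()

# A small stand-in volume. A (256, 256, 256) volume with (128, 128, 128) blocks
# splits the same way, also into 2 * 2 * 2 = 8 blocks.
volume = np.arange(8 * 8 * 8).reshape(8, 8, 8)
blocks = to_blocks(volume, (4, 4, 4))
scored = standard_score(blocks[0])
```

Here `blocks` has shape `(8, 4, 4, 4)`, and its first entry is the corner `volume[:4, :4, :4]`.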

In [None]:
# A glob pattern to match the files we want to train on.
file_pattern = 'tfrecords/data_shard-*.tfrecords'

# The number of classes the model predicts. A value of 1 means the model performs
# binary classification (i.e., target vs background).
n_classes = 1

# Batch size is the number of features and labels we train on with each step.
batch_size = 2

# The shape of the original volumes.
volume_shape = (256, 256, 256)

# The shape of the non-overlapping sub-volumes. Most models cannot be trained on
# full volumes because of hardware and memory constraints, so we train and evaluate
# on sub-volumes.
block_shape = (128, 128, 128)

# Whether or not to apply random rigid transformations to the data on the fly.
# This can improve model generalizability but increases processing time.
augment = False

# The tfrecords filepaths will be shuffled before reading, but we can also shuffle
# the data. This will shuffle 10 volumes at a time. Larger buffer sizes will require
# more memory, so choose a value based on how much memory you have available.
shuffle_buffer_size = 10

# Number of parallel processes to use.
num_parallel_calls = 6

In [None]:
!ls $file_pattern

In [None]:
dataset = nobrainer.volume.get_dataset(
    file_pattern=file_pattern,
    n_classes=n_classes,
    batch_size=batch_size,
    volume_shape=volume_shape,
    block_shape=block_shape,
    augment=augment,
    n_epochs=1,
    shuffle_buffer_size=shuffle_buffer_size,
    num_parallel_calls=num_parallel_calls)

dataset

# Train a model

## Instantiate a pre-defined `nobrainer` model

Users can find pre-defined models under the namespace `nobrainer.models`. All models are implemented using the `tf.keras` API, which keeps the definitions readable and easy to modify.

In [None]:
model = nobrainer.models.unet(n_classes=n_classes, input_shape=(*block_shape, 1))

In [None]:
# TODO: plot the model here?

## Compile the model

All Keras models must be compiled before they can be trained. This is where you choose the loss function and any other metrics that should be reported during training. Nobrainer implements several loss functions useful for semantic segmentation, including Dice, Generalized Dice, Jaccard, and Tversky losses.
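
To give a feel for two of these losses, here is a plain-Python sketch of the Dice and Jaccard overlap measures on flat binary masks, with each loss taken as one minus the coefficient. It illustrates the definitions only. The versions in `nobrainer.losses` operate on tensors and may differ in details such as smoothing terms, which are typically needed because these ratios are undefined when both masks are empty.

```python
def dice(y_true, y_pred):
    """Dice coefficient of two equal-length binary sequences: 2 * |A and B| / (|A| + |B|)."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    return 2 * intersection / (sum(y_true) + sum(y_pred))

def jaccard(y_true, y_pred):
    """Jaccard index of two equal-length binary sequences: |A and B| / |A or B|."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - intersection
    return intersection / union

# Toy masks: three foreground voxels each, two of which overlap.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

dice_loss = 1 - dice(y_true, y_pred)
jaccard_loss = 1 - jaccard(y_true, y_pred)
```

For these toy masks the Dice coefficient is 2/3 and the Jaccard index is 1/2, so the Jaccard loss penalizes the same mismatch more heavily.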

In [None]:
import tensorflow as tf

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-04),
    loss=nobrainer.losses.jaccard,
    metrics=[nobrainer.metrics.dice])

## Train on a single GPU
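
Because each volume is split into sub-volumes, the number of training steps in one epoch depends on the block shape and batch size as well as the number of volumes. The sketch below shows the arithmetic under two assumptions: blocks do not overlap, and a final partial batch counts as a step. The function names are hypothetical, and this is not necessarily how `nobrainer.volume.get_steps_per_epoch` is implemented.

```python
import math

def blocks_per_volume(volume_shape, block_shape):
    """Number of non-overlapping blocks that fit in one volume."""
    n = 1
    for v, b in zip(volume_shape, block_shape):
        n *= v // b
    return n

def steps_per_epoch_sketch(n_volumes, volume_shape, block_shape, batch_size):
    """Batches needed to see every block of every volume once (rounding up)."""
    n_blocks = n_volumes * blocks_per_volume(volume_shape, block_shape)
    return math.ceil(n_blocks / batch_size)

# 10 volumes, 8 blocks per volume, 2 blocks per batch.
steps = steps_per_epoch_sketch(
    n_volumes=10,
    volume_shape=(256, 256, 256),
    block_shape=(128, 128, 128),
    batch_size=2)
```

With the settings used in this notebook, the sketch gives 10 * 8 / 2 = 40 steps; doubling the batch size to 4 halves that to 20.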

In [None]:
steps_per_epoch = nobrainer.volume.get_steps_per_epoch(
    n_volumes=10, 
    volume_shape=volume_shape, 
    block_shape=block_shape, 
    batch_size=batch_size)

steps_per_epoch

In [None]:
model.fit(dataset, steps_per_epoch=steps_per_epoch)

## Train on multiple GPUs on the same machine

In [None]:
strategy = tf.distribute.MirroredStrategy()

In [None]:
with strategy.scope():
    model = nobrainer.models.unet(n_classes=n_classes, input_shape=(*block_shape, 1))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-04),
        loss=nobrainer.losses.jaccard,
        metrics=[nobrainer.metrics.dice])

In [None]:
# We can train on double the amount of data. With four 1080Ti GPUs, one can effectively
# train on an entire volume at a time!
batch_size = 4
    
dataset = nobrainer.volume.get_dataset(
    file_pattern=file_pattern,
    n_classes=n_classes,
    batch_size=batch_size,
    volume_shape=volume_shape,
    block_shape=block_shape,
    augment=augment,
    n_epochs=1,
    shuffle_buffer_size=shuffle_buffer_size,
    num_parallel_calls=num_parallel_calls
)

steps_per_epoch = nobrainer.volume.get_steps_per_epoch(
    n_volumes=10, 
    volume_shape=volume_shape, 
    block_shape=block_shape, 
    batch_size=batch_size)

steps_per_epoch

In [None]:
model.fit(dataset, steps_per_epoch=steps_per_epoch)

## Train on TPU

For a guide on how to train on TPUs (available via Google Cloud), please see the notebook [02-train_on_tpu.ipynb](02-train_on_tpu.ipynb).