# Overview

## Setting Up Your Environment

The following lines of code set up your Python/IPython shell with the requirements and environment variables needed to run this tutorial. They also download and prepare any necessary data. All required dependencies and additional tutorials are available at https://github.com/peterchang77/dl_core. This GitHub repository contains the `dl_core` module, which is used throughout the tutorial. If you already have a copy of this repository, pass the complete path to its root directory via the `DL_PATH` argument, as described below.

The following arguments may be passed to `prepare_environment()`:

* `DL_PATH`: complete path to the `dl_core` library (GitHub repository); if the repository is not present, this is the location where it will be cloned
* `DS_PATH`: complete path to the dataset used in this tutorial; if the data is not present, this is the location where it will be downloaded
* `DS_NAME`: name of the dataset to download
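The internals of `setenv.prepare_environment` are not shown in this tutorial. The following is a rough, hypothetical sketch of what the path-handling part of such a helper typically does. It exports the paths as environment variables, which is how later cells read `os.environ['DS_PATH']`. It also adds `DL_PATH` to `sys.path`, assuming the `dl_core` package directory sits directly under that path. The name `set_tutorial_env` is made up for illustration, and the sketch does not clone the repository or download any data.

```python
import os
import sys

def set_tutorial_env(DL_PATH, DS_PATH, DS_NAME):
    """Hypothetical helper, NOT the real setenv.prepare_environment.

    Exports tutorial paths as environment variables and makes the
    repository root importable. No cloning or downloading is performed.
    """
    dl_path = os.path.abspath(DL_PATH)
    os.environ['DL_PATH'] = dl_path
    os.environ['DS_PATH'] = os.path.abspath(DS_PATH)
    os.environ['DS_NAME'] = DS_NAME

    # --- Assumption: the dl_core package directory lives directly under DL_PATH
    if dl_path not in sys.path:
        sys.path.append(dl_path)

    return {k: os.environ[k] for k in ('DL_PATH', 'DS_PATH', 'DS_NAME')}

env = set_tutorial_env('../../', '/data/raw/brats', 'brats')
print(env)
```

Using environment variables instead of Python globals lets any later cell or module look up the dataset location without passing paths around.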

In [None]:
from setenv import prepare_environment
prepare_environment(
    DL_PATH='../../',
    DS_PATH='/data/raw/brats',
    DS_NAME='brats')

## Imports

The following modules will be used in this tutorial:

In [None]:
import glob, os
import numpy as np
from dl_core.io import hdf5
from dl_core.client import Client

# Data

The data downloaded above contains preprocessed images and labels in HDF5 format. You can load it directly with the `h5py` Python library or through the high-level API in the `dl_core.io.hdf5` module. Let us load an example image and label pair:

In [None]:
# --- Find data
dirs = sorted(glob.glob('%s/hdfs/*/' % os.environ['DS_PATH']))
print('A total of %i patients in dataset' % len(dirs))

# --- Load first example using hdf5 module
dat = '%sdat.hdf5' % dirs[0]
lbl = '%slbl.hdf5' % dirs[0]

dat = hdf5.load(dat)[0]
lbl = hdf5.load(lbl)[0]
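For comparison, the following is a minimal sketch of the direct `h5py` route. The internal dataset names inside the tutorial's `dat.hdf5` and `lbl.hdf5` files are not documented here. Rather than assume a key, the hypothetical helper `load_first_dataset` walks the file and returns the first dataset it finds. The demo writes its own small temporary file instead of touching the tutorial data.

```python
import os
import tempfile

import h5py
import numpy as np

def load_first_dataset(path):
    """Return (name, array) for the first dataset found in an HDF5 file."""
    found = []

    def visitor(name, obj):
        # --- Record the first h5py.Dataset; returning a non-None value stops traversal
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj[()]))
            return True
        return None

    with h5py.File(path, 'r') as f:
        f.visititems(visitor)

    if not found:
        raise ValueError('No datasets found in %s' % path)
    return found[0]

# --- Demo on a small temporary file (synthetic, not the tutorial data)
tmp = os.path.join(tempfile.mkdtemp(), 'example.hdf5')
with h5py.File(tmp, 'w') as f:
    f.create_dataset('data', data=np.zeros((4, 8, 8, 1), dtype='int16'))

name, arr = load_first_dataset(tmp)
print(name, arr.shape, arr.dtype)
```

On the tutorial files, `load_first_dataset(dirs[0] + 'dat.hdf5')` would return whichever dataset the file stores first. Any metadata that `dl_core.io.hdf5` may attach is not handled here.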

For more information about the HDF5 file format and the custom `hdf5` module in the `dl_core` library, see the following tutorial links (remote/local). Now let us take a closer look at the data structure and view the underlying pixel information:

In [None]:
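# --- Inspect
print(type(dat))
print(dat.shape)
print(lbl.shape)

# --- View middle slice (first channel if a trailing channel axis is present)
import matplotlib.pyplot as plt
z = dat.shape[0] // 2
img = dat[z] if dat.ndim == 3 else dat[z, ..., 0]
msk = lbl[z] if lbl.ndim == 3 else lbl[z, ..., 0]
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
axes[0].imshow(img, cmap='gray')
axes[1].imshow(msk)
plt.show()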
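Shapes alone do not show which values the arrays hold. A short summary makes it possible to check, for example, which label values are present before they are binarized later. `summarize` is a hypothetical helper, and the demo uses a synthetic label volume. On the tutorial data you would call `summarize(dat)` and `summarize(lbl)`.

```python
import numpy as np

def summarize(arr, max_unique=10):
    """Return basic statistics for an array; list unique values when there are few."""
    info = {
        'dtype': str(arr.dtype),
        'shape': arr.shape,
        'min': arr.min().item(),
        'max': arr.max().item(),
        'mean': float(arr.mean())}
    uniq = np.unique(arr)
    if uniq.size <= max_unique:
        info['unique'] = uniq.tolist()
    return info

# --- Synthetic label volume with a small labeled region (value 2)
lbl_demo = np.zeros((3, 4, 4, 1), dtype='uint8')
lbl_demo[1, 1:3, 1:3, 0] = 2
print(summarize(lbl_demo))
```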

## Data Client

To load the data for this tutorial efficiently, a custom `Client` class has been prepared. It handles many of the low-level tasks that training requires, such as tracking training/validation splits and randomizing the data between epochs. It also provides several medical-imaging-specific features, including stratified sampling by disease entity and statistics for image normalization. For more information about the `Client` class and how to customize it, see the following tutorial links (remote/local).
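The `Client` internals are not reproduced in this tutorial. As a generic illustration of two of the bookkeeping tasks described for the `Client` class, the sketch below does two things. It assigns patients to K folds (the tutorial later validates on fold 0 of 5), holding one fold out for validation. It then reshuffles the training set at the start of each epoch. The function names and the simple modulo-based fold assignment are assumptions for illustration, not the `dl_core` implementation.

```python
import numpy as np

def split_folds(patient_ids, n_folds=5, fold=0, seed=0):
    """Shuffle patients once with a fixed seed, assign each to one of n_folds
    folds, and return (train_ids, valid_ids) with `fold` held out."""
    rng = np.random.default_rng(seed)
    ids = np.array(sorted(patient_ids))
    rng.shuffle(ids)
    folds = np.arange(len(ids)) % n_folds
    return ids[folds != fold].tolist(), ids[folds == fold].tolist()

def epoch_order(train_ids, epoch, seed=0):
    """Return a different random ordering of the training set for each epoch."""
    rng = np.random.default_rng(seed + epoch)
    return rng.permutation(train_ids).tolist()

ids = ['pt%03i' % i for i in range(20)]
train, valid = split_folds(ids, n_folds=5, fold=0)
print(len(train), len(valid))
print(epoch_order(train, epoch=0)[:3])
```

The split is computed per patient rather than per slice, so slices from one patient never land in both the training and validation sets.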

For this tutorial, we subclass `Client` and override its `preprocess()` method with preprocessing specific to this experiment:

In [None]:
class MyClient(Client):
    
    def preprocess(self, arrays, meta):
        """
        Method to preprocess arrays
        
        :params
        
          (np.ndarray) arrays['dat']: input data
          (np.ndarray) arrays['lbl']: input labels
          (dict) meta : metadata information about current data
        
        """
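        # --- Preprocess data (no changes applied; arrays['dat'] is passed through)
        
        # --- Preprocess labels: collapse all nonzero classes (>= 1) into a single binary foreground class
        arrays['lbl'] = (arrays['lbl'] >= 1).astype('uint8')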
        
        return arrays
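Because `MyClient.preprocess` depends only on the `arrays` dictionary, its logic can be checked in isolation with NumPy. The sketch below is a standalone function (hypothetical name `preprocess_arrays`) that applies the same label binarization. As an example of something that could fill the empty data-preprocessing step, it also applies a per-channel z-score normalization. That normalization is an assumption and not part of this tutorial. The code treats the last axis as the channel axis.

```python
import numpy as np

def preprocess_arrays(arrays):
    """Standalone version of the label step in MyClient.preprocess, plus an
    illustrative (assumed) per-channel z-score of the input data."""
    dat = arrays['dat'].astype('float32')

    # --- Illustrative data step: z-score each channel (last axis) separately
    axes = tuple(range(dat.ndim - 1))
    mean = dat.mean(axis=axes, keepdims=True)
    std = dat.std(axis=axes, keepdims=True)
    arrays['dat'] = (dat - mean) / np.maximum(std, 1e-6)

    # --- Label step from the tutorial: any label value >= 1 becomes foreground (1)
    arrays['lbl'] = (arrays['lbl'] >= 1).astype('uint8')
    return arrays

rng = np.random.default_rng(0)
arrays = {
    'dat': rng.normal(100, 20, size=(2, 4, 4, 2)),
    'lbl': np.array([0, 1, 2, 4]).reshape(1, 2, 2, 1).repeat(2, axis=0)}
out = preprocess_arrays(arrays)
print(np.unique(out['lbl']))
```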

Now we will instantiate a new `client` object, set a stratified sampling rate, and prepare the first validation fold. In this experiment, we stratify sampling so that loaded examples are split evenly (50/50) between those containing a positive finding and those containing a negative finding.

In [None]:
# --- Instantiate the data client
client = MyClient()

# --- Set a 50/50 sampling rate across the two strata
client.set_sampling_rate({
    1: 0.5,
    2: 0.5})

# --- Validate on the first (of 5 total) folds of data
client.prepare(fold=0)
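Conceptually, a stratified sampling rate like the one above draws each example in two steps. It first picks a stratum according to the given probabilities, then picks uniformly within that stratum. The sketch below illustrates this with a hypothetical `stratified_sample` function and made-up stratum assignments. It is not the `Client` implementation, and which finding each key (`1`, `2`) denotes in `dl_core` is not specified here.

```python
import numpy as np

def stratified_sample(strata, rates, n, seed=0):
    """Draw n example indices: choose a stratum with probability rates[key],
    then choose uniformly among the examples belonging to that stratum."""
    rng = np.random.default_rng(seed)
    strata = np.asarray(strata)
    keys = list(rates.keys())
    probs = np.array([rates[k] for k in keys], dtype=float)
    probs /= probs.sum()

    # --- Indices of the examples belonging to each stratum
    members = {k: np.flatnonzero(strata == k) for k in keys}
    picks = rng.choice(len(keys), size=n, p=probs)
    return np.array([rng.choice(members[keys[i]]) for i in picks])

# --- Made-up imbalance: 90 examples in stratum 1, only 10 in stratum 2
strata = np.array([1] * 90 + [2] * 10)
idx = stratified_sample(strata, {1: 0.5, 2: 0.5}, n=1000)
print(np.mean(strata[idx] == 2))
```

Even though stratum 2 makes up only 10% of the examples, roughly half of the draws come from it. This is the point of balancing positive and negative findings during training.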

# Model

# Training

# Prediction