All the Jupyter Notebooks are available at https://github.com/neuro-ml/dpipe_tutorial

# Tutorials on Deep Pipe

The tutorials introduce **Deep Pipe**, a library for medical image analysis that covers preprocessing, data augmentation, performance validation, and final prediction.

## Tutorial 4: Putting it all together

In this tutorial we implement our own MNIST dataset, a `model_core` for classification, and a simple batch iterator, then assemble them into an experiment.

### Imports:

In [1]:
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


# import deep pipe library
# how to install: https://github.com/neuro-ml/deep_pipe/blob/master/README.md
import dpipe


## I. Data and batch iterator

Create toy MNIST dataset for classification:

In [2]:
from abc import ABC, abstractmethod

import numpy as np


class Dataset(ABC):
    @property
    @abstractmethod
    def ids(self):
        """Returns a tuple of ids of all objects in the dataset."""


class ClassificationDataset(Dataset):
    """Abstract class that describes a dataset for medical image classification."""
    @abstractmethod
    def load_image(self, identifier: str) -> np.ndarray:
        """
        Loads a dataset entry given its identifier.

        Parameters
        ----------
        identifier: str
            object's identifier

        Returns
        -------
        object:
            The entry corresponding to the identifier
        """
    
    @abstractmethod
    def load_label(self, identifier: str) -> int:
        """
        Loads a label for a given identifier.

        Parameters
        ----------
        identifier: str
            object's identifier

        Returns
        -------
        int:
            The label corresponding to the identifier
        """

    @property
    @abstractmethod
    def n_chans_image(self) -> int:
        """
        The number of channels in the input image's tensor
        
        Returns
        -------
        channels: int
        """
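
To make the interface concrete before loading real data, here is a minimal sketch of a dataset that satisfies it using random arrays. The class `RandomDataset` and its constructor arguments are hypothetical and exist only for illustration; the two abstract classes are repeated in condensed form so the block runs on its own.

```python
from abc import ABC, abstractmethod

import numpy as np


class Dataset(ABC):
    @property
    @abstractmethod
    def ids(self):
        """Returns a tuple of ids of all objects in the dataset."""


class ClassificationDataset(Dataset):
    @abstractmethod
    def load_image(self, identifier: str) -> np.ndarray:
        """Loads the image for the given identifier."""

    @abstractmethod
    def load_label(self, identifier: str) -> int:
        """Loads the label for the given identifier."""

    @property
    @abstractmethod
    def n_chans_image(self) -> int:
        """The number of channels in the image tensor."""


class RandomDataset(ClassificationDataset):
    """Hypothetical in-memory dataset filled with random images and labels."""

    def __init__(self, n_objects, n_chans, size, n_classes, seed=0):
        rng = np.random.RandomState(seed)
        # images are stored channels-first: (n_objects, n_chans, size, size)
        self._images = rng.rand(n_objects, n_chans, size, size).astype(np.float32)
        self._labels = rng.randint(0, n_classes, size=n_objects)
        self._ids = tuple(str(i) for i in range(n_objects))

    def load_image(self, identifier: str) -> np.ndarray:
        return self._images[int(identifier)]

    def load_label(self, identifier: str) -> int:
        return int(self._labels[int(identifier)])

    @property
    def ids(self):
        return self._ids

    @property
    def n_chans_image(self) -> int:
        return self._images.shape[1]


toy = RandomDataset(n_objects=4, n_chans=1, size=28, n_classes=10)
print(toy.ids, toy.load_image('2').shape, toy.n_chans_image)
```

Because every abstract member is overridden, `RandomDataset` can be instantiated, whereas calling `ClassificationDataset()` directly raises a `TypeError`.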

In [127]:
# note: this module ships only with TensorFlow 1.x
from tensorflow.examples.tutorials.mnist import input_data

class MNIST(ClassificationDataset):
    def __init__(self, path_to_data: str):
        # fetch mnist datasets
        mnist = input_data.read_data_sets(path_to_data, reshape=False, one_hot=False)
        X_train, y_train           = mnist.train.images, mnist.train.labels
        X_validation, y_validation = mnist.validation.images, mnist.validation.labels
        X_test, y_test             = mnist.test.images, mnist.test.labels
        
        # transpose the arrays so that `channels` is the second axis
        X_train = np.transpose(X_train, [0, 3, 1, 2])
        X_validation = np.transpose(X_validation, [0, 3, 1, 2])
        X_test = np.transpose(X_test, [0, 3, 1, 2])
        
        self._images = np.concatenate((X_train, X_validation, X_test), axis=0)
        self._labels = np.concatenate((y_train, y_validation, y_test), axis=0)
        self._n_chans_image = X_train.shape[1]
        self._ids = list(np.arange(self._labels.shape[0]).astype(str))
        
        
    def load_image(self, identifier: str) -> np.ndarray:
        return self._images[int(identifier), ...]
    
    def load_label(self, identifier: str) -> int:
        return self._labels[int(identifier)]
        
    @property
    def ids(self) -> list:
        return self._ids
    
    @property
    def n_chans_image(self) -> int:
        return self._n_chans_image
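
The key preprocessing step in the class above is the move from a channels-last to a channels-first layout. The following sketch reproduces it on a small synthetic array instead of the real MNIST download; the array contents are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for MNIST images in channels-last layout:
# (n_images, height, width, channels).
images_nhwc = np.zeros((5, 28, 28, 1), dtype=np.float32)
images_nhwc[2, 3, 4, 0] = 1.0  # mark one pixel so we can track it

# Move the channel axis to position 1: (n_images, channels, height, width).
images_nchw = np.transpose(images_nhwc, [0, 3, 1, 2])

n_chans_image = images_nchw.shape[1]
single_image = images_nchw[2]

print(images_nchw.shape)      # (5, 1, 28, 28)
print(single_image.shape)     # (1, 28, 28)
print(single_image[0, 3, 4])  # 1.0
```

The marked pixel keeps its row and column, and only the position of the channel axis changes.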
        

In [128]:
mnist = MNIST('MNIST_data/')

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [129]:
from dpipe.medim.visualize import slice3d

slice3d(mnist.load_image(mnist.ids[1]), axis=0)


In [130]:
from dpipe.batch_iter.slices import slices
from dpipe.train.batch_iter import make_batch_iter_from_finite
from functools import partial
from dpipe.medim.utils import load_by_ids
import pdp


def simple_iterator(ids, load_x, load_y, batch_size, *, shuffle=False):
    def simple():
        for x, y in load_by_ids(load_x, load_y, ids, shuffle):
            yield x, y

    return pdp.Pipeline(pdp.Source(simple(), buffer_size=5),
                        pdp.Many2One(chunk_size=batch_size, buffer_size=2),
                        pdp.One2One(pdp.combine_batches, buffer_size=3))
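
The pipeline above streams `(x, y)` pairs, groups them into chunks of `batch_size`, and combines each chunk into a batch. Below is a single-threaded sketch of that idea in plain Python and NumPy. It is an illustration only: the helper `batches` is hypothetical, it assumes that combining a chunk means stacking its elements along a new first axis, and it keeps the last incomplete chunk, which may differ from what `pdp` does.

```python
from itertools import islice

import numpy as np


def batches(pairs, batch_size):
    """Group an iterable of (x, y) pairs into stacked (xs, ys) batches."""
    pairs = iter(pairs)
    while True:
        chunk = list(islice(pairs, batch_size))
        if not chunk:
            return
        xs, ys = zip(*chunk)
        # stack along a new leading batch axis
        yield np.stack(xs), np.array(ys)


# ten fake single-channel 28x28 images with labels 0..9
samples = ((np.full((1, 28, 28), i, dtype=np.float32), i) for i in range(10))
shapes = [(x.shape, y.shape) for x, y in batches(samples, batch_size=4)]
print(shapes)
```

With ten samples and a batch size of four, this yields two full batches and one batch of two samples.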

In [131]:
batch_iter = make_batch_iter_from_finite(
    get_batch_iter=partial(
        simple_iterator,
        ids=mnist.ids[:16*3000],
        load_x=mnist.load_image,
        load_y=mnist.load_label,
        batch_size=16,
        shuffle=False
    )
)

## II. Model Core for classification

A model core is the neural network itself; it is later used to compute logits and losses and to make predictions. A model core must implement a `build` method, which builds the computational graph, including placeholders and operations, and returns the sequence of input placeholders together with the output logits. *Source: https://github.com/neuro-ml/deep_pipe/blob/develop/dpipe/model_core/base.py*

Implement a LeNet-5-style network (with `same` padding and batch normalization):

In [154]:
import tensorflow as tf
from tensorflow.contrib import slim


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv_block(inputs, output_channels, training, name, kernel_size=3, 
               strides=1, layer=tf.layers.conv2d, activation=tf.nn.relu):
    
    with tf.name_scope(name):
        inputs = layer(inputs, output_channels, kernel_size=kernel_size,
                           strides=strides, padding='same',
                           data_format='channels_first')
               
        inputs = slim.batch_norm(inputs, decay=0.9, scale=True,
                                     is_training=training,
                                     data_format='NCHW', fused=True)
               
        inputs = activation(inputs)
    return inputs


def pooling(inputs, name, pool_size=(2,2), strides=(2,2), layer=tf.layers.max_pooling2d):
    
    with tf.name_scope(name):
        inputs = layer(inputs, pool_size, strides, padding='same', data_format='channels_first')
    return inputs


def fc(inputs, num_outputs, name, activation=tf.nn.relu):
    with tf.name_scope(f'{name}/weights'):
        W = weight_variable((int(inputs.shape[1]), num_outputs))
    with tf.name_scope(f'{name}/bias'):
        b = bias_variable([num_outputs])
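    # apply the activation; without it consecutive fc layers collapse into a single linear map
    return activation(tf.nn.xw_plus_b(inputs, W, b, name=name))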


def build_model(inputs, classes, name, training):
    with tf.name_scope(name):
        inputs = conv_block(inputs, 6, training, 'layer1', kernel_size=5)
        inputs = pooling(inputs, 'pooling1')

        inputs = conv_block(inputs, 16, training, 'layer2', kernel_size=5)
        inputs = pooling(inputs, 'pooling2')
        inputs = tf.layers.flatten(inputs, 'flatten')
        inputs = fc(inputs, 120, 'fc1')
        inputs = fc(inputs, 84, 'fc2')
        inputs = fc(inputs, classes, 'fc3', activation=tf.identity)
        
        return inputs
    
    
def make_lenet(builder):
    class LeNet():
        def __init__(self, image_size, n_chans_img, classes):
            self.image_size = image_size
            self.n_chans_img = n_chans_img
            self.classes = classes

        def build(self, training_ph):
            x_ph = tf.placeholder(
                tf.float32, (None, self.n_chans_img, self.image_size, self.image_size), name='input'
            )

            logits = builder(x_ph, self.classes, 'lenet_2d', training_ph)
            return [x_ph], logits

    return LeNet


LeNet2D = make_lenet(build_model)

In [155]:
lenet = LeNet2D(28, 1, 10)
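
It can help to trace the spatial size of the feature maps through this network. With `padding='same'`, TensorFlow produces an output of size `ceil(input_size / stride)` along each spatial axis, so the sketch below follows a 28x28 input through the two convolution and pooling stages. The helper name `same_padding_output` is hypothetical.

```python
import math


def same_padding_output(size, stride):
    """Spatial size after a layer with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)


size = 28
size = same_padding_output(size, stride=1)  # layer1: 5x5 conv, stride 1 -> 28
size = same_padding_output(size, stride=2)  # pooling1: 2x2, stride 2   -> 14
size = same_padding_output(size, stride=1)  # layer2: 5x5 conv, stride 1 -> 14
size = same_padding_output(size, stride=2)  # pooling2: 2x2, stride 2   -> 7

# 16 channels after layer2, so the first fully connected layer sees:
flat_features = 16 * size * size
print(size, flat_features)  # 7 784
```

The classic LeNet-5 uses unpadded convolutions on 32x32 inputs and flattens to 16 * 5 * 5 = 400 features, so the first fully connected layer here is wider than in the original architecture.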

## III. Training

Now we can put everything together and start training.

In [162]:
from dpipe.tf.model import TFModel, TFFrozenModel
from dpipe.tf.utils import get_tf_optimizer, softmax, softmax_cross_entropy
from functools import partial

Create the optimizer function:

In [163]:
optimizer = partial(
    get_tf_optimizer,
    tf_optimizer_name='AdamOptimizer',
    beta1=0.899
)
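
`functools.partial` fixes some arguments now and leaves the rest to be supplied later, which is how the optimizer above can be configured before the model decides on a learning rate. The sketch below uses a hypothetical stand-in, `describe_optimizer`, instead of `get_tf_optimizer`; the assumption that the remaining argument is a learning rate named `lr` is ours and is not taken from the library.

```python
from functools import partial


def describe_optimizer(tf_optimizer_name, lr, **params):
    """Hypothetical stand-in: returns the configuration it was called with."""
    return {'name': tf_optimizer_name, 'lr': lr, **params}


# fix the optimizer name and beta1 now ...
make_adam = partial(describe_optimizer, tf_optimizer_name='AdamOptimizer', beta1=0.899)

# ... and provide the learning rate later, e.g. once per training step
config = make_adam(lr=1e-3)
print(config)
```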

Initialize the computation graph using the `TFModel` class with our `model_core`:

In [158]:
#graph = tf.Graph().as_default()
tf.reset_default_graph()
model = TFModel(lenet, softmax, softmax_cross_entropy, optimizer)
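
`TFModel` receives a function that turns logits into predictions (`softmax`) and a loss (`softmax_cross_entropy`). The NumPy sketch below shows the standard definitions these names usually refer to. It is an assumption that the dpipe helpers compute the same quantities on TensorFlow tensors; the function names `softmax_np` and `cross_entropy_np` are hypothetical.

```python
import numpy as np


def softmax_np(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy_np(logits, labels):
    """Mean negative log-probability of the true (integer) labels."""
    probs = softmax_np(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probs = softmax_np(logits)
loss = cross_entropy_np(logits, labels)
print(probs.sum(axis=1), loss)
```

A useful sanity check follows from the second row: an untrained 10-class model with uniform predictions has a loss of ln(10), roughly 2.3, so the training loss should start near that value and decrease.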

Import the function that wraps the training and validation process:

*Source: https://github.com/neuro-ml/deep_pipe/blob/develop/dpipe/train/train.py*

In [166]:
from dpipe.train.train import train_base
# here you may need to install `tensorboard_easy` package: https://pypi.python.org/pypi/tensorboard-easy/0.2.3
# via pip: `pip install tensorboard-easy`

from dpipe.train.lr_policy import Constant

train_base?

Create the `validator` function, which is invoked at the end of every epoch:

In [165]:
from dpipe.train.validator import validate
from sklearn.metrics import accuracy_score


def accuracy_metric(y_true, y_one_hot):
    y_pred = np.array(y_one_hot).squeeze().argmax(axis=1)
    return accuracy_score(y_true, y_pred)


val_metrics = {}
val_metrics['accuracy'] = accuracy_metric


def validate_fn(x, y):
    # a single object arrives with a scalar label: add a batch axis of size one
    try:
        len(y)
    except TypeError:
        x = x[np.newaxis, ...]
        y = np.array([y])
    return model.do_val_step(x, y)
  
    
validator = partial(
    validate,
    load_x=mnist.load_image,
    load_y=mnist.load_label,
    ids=mnist.ids[16*3000:],
    metrics=val_metrics,
    validate_fn=validate_fn
)
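
The two helpers in the cell above do simple array bookkeeping, which the following NumPy-only sketch isolates: accuracy is the fraction of objects whose most probable class matches the label, and a single object is promoted to a batch of size one before it is passed to the model. The names `accuracy` and `as_batch` are hypothetical, and the probabilities are made up.

```python
import numpy as np


def accuracy(y_true, probabilities):
    """Fraction of objects whose arg-max class equals the true label."""
    y_pred = np.asarray(probabilities).argmax(axis=1)
    return float(np.mean(np.asarray(y_true) == y_pred))


def as_batch(x, y):
    """Promote a single (image, label) pair to a batch of size one."""
    if np.ndim(y) == 0:  # scalar label means a single object
        x = np.asarray(x)[np.newaxis, ...]
        y = np.array([y])
    return x, y


probabilities = [[0.8, 0.1, 0.1],
                 [0.2, 0.7, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.6, 0.3, 0.1]]
y_true = [0, 1, 2, 1]
score = accuracy(y_true, probabilities)

x_batch, y_batch = as_batch(np.zeros((1, 28, 28)), 7)
print(score, x_batch.shape, y_batch.shape)
```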

Run the `train_base` function for 10 epochs with a constant learning rate, evaluating the validation loss and accuracy after each epoch:

In [161]:
n_epochs = 10
lr_init = 1e-3
train_base(model, batch_iter, n_epochs, lr_policy=Constant(lr_init), log_path='logs/', validator=validator)

We can open `tensorboard` and check that our losses are decreasing.