# Deep Learning and Medical Imaging - Radiology

## Problem

We want to classify an image patch from an MR scan into one of two categories, {non-tumor, tumor}. Given such a classifier, we could run it over all the patches in an image to obtain a segmentation mask.
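
As a rough illustration of how a patch classifier yields a segmentation mask, the sketch below labels every non-overlapping patch of a small 2D image. The names `segment` and `bright_patch` are hypothetical, and `bright_patch` is only a stand-in for a trained classifier. Plain Python lists are used so the example runs without any dependencies.

```python
def segment(image, classify_patch, patch_size=32):
    """Label every non-overlapping patch_size x patch_size patch of a 2D image."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            label = classify_patch(patch)
            # Write the predicted label to every pixel covered by the patch
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    mask[r][c] = label
    return mask


def bright_patch(patch):
    """Hypothetical stand-in classifier: 1 if the mean intensity exceeds 0.5."""
    values = [v for row in patch for v in row]
    return 1 if sum(values) / len(values) > 0.5 else 0


# Toy 4x4 "scan" whose top-right 2x2 corner is bright
image = [[1.0 if (r < 2 and c >= 2) else 0.0 for c in range(4)] for r in range(4)]
mask = segment(image, bright_patch, patch_size=2)
for row in mask:
    print(row)
```

Non-overlapping patches give a coarse mask; sliding the window one pixel at a time and labelling only the centre pixel would give a finer one at a higher computational cost.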

To do this we frame the problem as estimating the conditional probability of the class given the image pixels, $f(x)=p(y|x)$, where $f(x)$ is the function we are trying to learn. For $f(x)$ to be a valid probability distribution we only require that $f_{i}(x)\ge 0$ and that $\sum_{i=1}^n f_{i}(x) = 1$, where $n$ is the number of classes. A common trick in machine learning for converting any real-valued vector $z$ into a probability distribution is to define $g_{i}(z) = \frac{e^{z_{i}}}{\sum_{j=1}^n e^{z_{j}}}$; $g$ is called the softmax function. Applying the softmax to the output of a vector-valued function yields a valid probability distribution.
We will therefore learn an "unconstrained" real-valued function and apply a softmax to convert its output into a probability distribution.
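
The softmax can be sketched in a few lines of plain Python. The function name `softmax` and the example scores are illustrative only. Subtracting the maximum score before exponentiating is a common numerical safeguard that does not change the result.

```python
import math


def softmax(scores):
    """Map a list of unconstrained real scores to a probability distribution."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow in exp
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two unconstrained scores, e.g. for the classes {non-tumor, tumor}
probs = softmax([-1.0, 2.0])
print(probs)       # every entry is non-negative
print(sum(probs))  # the entries sum to 1 (up to rounding)
```

The larger score receives the larger probability, so taking the argmax of the scores gives the same class as taking the argmax of the probabilities.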

## Imports and data loading

In [None]:
# Run the below lines to download and unpack the data when running in Colab
!wget -O data.tar https://github.com/eseaflower/cmiv-ai-course/raw/master/data.tar
!tar -xvf data.tar

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Activation, BatchNormalization, Dropout, Dense, Flatten
from tensorflow.keras.regularizers import l2
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.datasets import mnist
from tensorflow.keras.callbacks import Callback
import tensorflow as tf

def get_device():
    device_string = '/cpu:0'
    gpu=0 # Set to None to avoid using the GPU
    if gpu is not None:
        device_string='/device:GPU:{0}'.format(gpu)
    return tf.device(device_string)

#def _init_keras():
    # Setup the session to dynamically allocate memory
#    config = tf.ConfigProto()
#    config.gpu_options.allow_growth = True
#    session = tf.Session(config=config)
#    K.set_session(session)

# Init the default session to be used by Keras
#_init_keras()


In [None]:
import os
import glob
import matplotlib.image
from sklearn.model_selection import train_test_split

def load_data():
    rootdir = os.path.abspath(os.getcwd())
    datadir = os.path.join(os.path.join(rootdir, "data"), "MR")
    pos_pattern = os.path.join(os.path.join(datadir,"positive_images"), "*.jpg")
    neg_pattern = os.path.join(os.path.join(datadir,"negative_images"), "*.jpg")
    pos_filenames = list(glob.glob(pos_pattern))
    neg_filenames = list(glob.glob(neg_pattern))

    pos_images = [matplotlib.image.imread(fname) for fname in pos_filenames]
    neg_images = [matplotlib.image.imread(fname) for fname in neg_filenames]
    X = np.vstack([np.array(pos_images, dtype=np.float32), np.array(neg_images, dtype=np.float32)])
    y = np.array([1]*len(pos_images) + [0]*len(neg_images), dtype=np.int32)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
    
    return (X_train, y_train), (X_test, y_test)

In [None]:
# Load the image patches from the unpacked data directory and split them into training and test sets.
(X_train_orig, y_train_orig), (X_test_orig, y_test_orig) = load_data()

Check the size of the training and test sets, as well as the dimensions of each array.

In [None]:
print(X_train_orig.shape, y_train_orig.shape, X_test_orig.shape, y_test_orig.shape)

We can visualize the image patches by plotting some of them

In [None]:
def plot_patches(X, y, y_true=None, to_plot=None):    
    to_plot = to_plot or len(X)
    plt.figure(figsize=(16,8))
    for i in range(to_plot):
        plt.subplot(1, to_plot, i+1)
        plt.imshow(X[i].reshape((32, 32)), interpolation='nearest', cmap='gray')
        plt.text(0, 0, y[i], color='black', 
                 bbox=dict(facecolor='white', alpha=1))
        if y_true is not None:
            plt.text(0, 32, y_true[i], color='black', 
                     bbox=dict(facecolor='white', alpha=1))
            
        plt.axis('off')
    plt.show()

In [None]:
plot_patches(X_train_orig, y_train_orig, to_plot=10)

## Train a model

*First we need to run some support code; you do not need to understand it.*

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Activation, BatchNormalization, Dropout, Dense, Flatten
from tensorflow.keras.regularizers import l2
import tensorflow.keras.backend as K
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.datasets import mnist
from tensorflow.keras.callbacks import Callback
import tensorflow as tf
import os
import glob
import matplotlib.image
from sklearn.model_selection import train_test_split  # used by load_data below

def get_device():
    device_string = '/cpu:0'
    gpu=0 # Set to None to avoid using the GPU
    if gpu is not None:
        device_string='/device:GPU:{0}'.format(gpu)
    return tf.device(device_string)

def load_data():
    rootdir = os.path.abspath(os.getcwd())
    datadir = os.path.join(os.path.join(rootdir, "data"), "MR")
    pos_pattern = os.path.join(os.path.join(datadir,"positive_images"), "*.jpg")
    neg_pattern = os.path.join(os.path.join(datadir,"negative_images"), "*.jpg")
    pos_filenames = list(glob.glob(pos_pattern))
    neg_filenames = list(glob.glob(neg_pattern))

    pos_images = [matplotlib.image.imread(fname) for fname in pos_filenames]
    neg_images = [matplotlib.image.imread(fname) for fname in neg_filenames]
    X = np.vstack([np.array(pos_images, dtype=np.float32), np.array(neg_images, dtype=np.float32)])
    y = np.array([1]*len(pos_images) + [0]*len(neg_images), dtype=np.int32)
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
    
    return (X_train, y_train), (X_test, y_test)

def crossentropy_logits(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True)

def to_tensors(X, y):
    return X[:, :, :, np.newaxis], to_categorical(y, num_classes=2)

def plot_patches(X, y, y_true=None, to_plot=None):    
    to_plot = to_plot or len(X)
    plt.figure(figsize=(16,8))
    for i in range(to_plot):
        plt.subplot(1, to_plot, i+1)
        plt.imshow(X[i].reshape((32, 32)), interpolation='nearest', cmap='gray')
        plt.text(0, 0, y[i], color='black', 
                 bbox=dict(facecolor='white', alpha=1))
        if y_true is not None:
            plt.text(0, 32, y_true[i], color='black', 
                     bbox=dict(facecolor='white', alpha=1))
            
        plt.axis('off')
    plt.show()
    
class LogCallback(Callback):            
    def on_epoch_end(self, epoch, logs=None):                                        
        print("{}: L: {:.7} A: {:.7} VL: {:.7} VA: {:.7}".format(epoch,                                                                            
                                                                 logs['loss'], 
                                                                 logs['accuracy'], 
                                                                 logs['val_loss'], 
                                                                 logs['val_accuracy']))
        
def Conv(n):
    return Conv2D(n, (3, 3), padding="same", activation='relu')

def ConvMax(n):
    return lambda x: MaxPooling2D(pool_size=(2, 2))(Conv(n)(x))

real_dense = Dense
def Dense(n):
    return real_dense(n, activation='relu')

To train the model we need to:
- Create the model
- Decide on a loss function
- Iteratively optimize the loss with respect to the model parameters
- (Visualize the training and result)
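
The loss used in this notebook is the categorical cross-entropy computed directly from logits (the `crossentropy_logits` helper passes `from_logits=True`). As a minimal sketch of what that quantity is for a single example, assuming a one-hot target, the plain-Python function below (the name `cross_entropy_from_logits` is hypothetical) combines the softmax and the negative log-likelihood in one numerically stable step:

```python
import math


def cross_entropy_from_logits(y_true, logits):
    """Cross-entropy between a one-hot target and softmax(logits), one example."""
    # Stable log-sum-exp: log(sum_j exp(z_j)) = m + log(sum_j exp(z_j - m))
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_norm for z in logits]
    return -sum(t * lp for t, lp in zip(y_true, log_probs))


target = [0.0, 1.0]  # one-hot encoding of the class "tumor"
loss_confident_right = cross_entropy_from_logits(target, [-2.0, 3.0])
loss_confident_wrong = cross_entropy_from_logits(target, [3.0, -2.0])
print(loss_confident_right, loss_confident_wrong)
```

The loss is small when the logit of the true class dominates and large when the model is confidently wrong, which is what the optimizer exploits when it adjusts the model parameters.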

In [None]:
def run_experiment(model, optimizer, epochs):

    (X_train_orig, y_train_orig), (X_test_orig, y_test_orig) = load_data()
    plot_patches(X_train_orig, y_train_orig, to_plot=10)

    X_train, y_train = to_tensors(X_train_orig, y_train_orig)
    X_test, y_test = to_tensors(X_test_orig, y_test_orig)

    with get_device():
        model.compile(optimizer=optimizer, loss = crossentropy_logits, metrics=['accuracy'])

        n_train = len(X_train)  # Lower this to train on a subset of the data
        logs = model.fit(X_train[:n_train], y_train[:n_train], 
                        validation_split=0.3, 
                        epochs=epochs,
                        verbose=0,
                        callbacks=[LogCallback()])
    plt.plot(logs.history['accuracy'], c='r', label='Train')
    plt.plot(logs.history['val_accuracy'], c='g', label='Validation')
    plt.legend()
    plt.show()

    with get_device():
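        # Predict on test data. The model outputs logits (no softmax is applied),
        # but the argmax of the logits equals the argmax of the probabilities.
        y_logits_test = model.predict(X_test)
    y_pred_test = np.argmax(y_logits_test, axis=-1)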
    y_true = np.argmax(y_test, axis=-1)
    errors = y_pred_test != y_true
    # Compute the accuracy    
    print("Accuracy: {}".format(1.0-np.mean(errors)))

    # Plot the first examples.
    to_evaluate = 15
    X_eval = X_test[:to_evaluate]    
    y_eval = y_pred_test[:to_evaluate]
    plot_patches(X_eval, y_eval, y_true=y_true[:to_evaluate])

    # Plot the first error examples
    X_eval = X_test[np.where(errors)][:to_evaluate]
    y_eval = y_pred_test[np.where(errors)][:to_evaluate]
    plot_patches(X_eval, y_eval)

## CNN

When dealing with images we often assume stationarity: pixels in a neighbourhood are correlated similarly across the entire image, so the absolute coordinate (x,y) of a pixel does not influence its distribution. Under this hypothesis we can share weights across the image, thus reducing the total number of parameters that need to be learned. This is the idea behind Convolutional Neural Networks.
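
A rough parameter count illustrates the saving from weight sharing. The sketch below compares a `Conv(32)` layer as used in this notebook (32 filters of size 3x3 on a 1-channel input) with a hypothetical fully connected layer that maps the 32x32 input to an output of the same size (32x32x32 values). The helper names are illustrative only.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights of one shared kernel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_param_count(n_inputs, n_outputs):
    """One weight per input/output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


conv_params = conv2d_param_count(3, 3, 1, 32)
dense_params = dense_param_count(32 * 32 * 1, 32 * 32 * 32)

print(conv_params)   # 320
print(dense_params)  # 33587200
```

The convolution's parameter count does not depend on the image size at all, because the same 3x3 kernels are reused at every position.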

In CNNs the parameters of the model are convolved across the image to produce feature maps. The feature maps produced in one layer can be used as input to another convolution layer, in the same way as layers are stacked in a feed-forward network. To introduce some translation invariance into the model, the output feature maps are pooled at certain stages. Pooling also effectively increases the receptive field of later convolutions, allowing them to "see" a larger part of the input.
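
The translation invariance introduced by pooling can be seen in a toy example. The sketch below implements 2x2 max pooling with stride 2 on nested Python lists (the function name `max_pool_2x2` is illustrative) and applies it to two feature maps whose single active pixel differs by a one-pixel diagonal shift within the same pooling window:

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    height, width = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, width - 1, 2)]
            for r in range(0, height - 1, 2)]


# The same activation at two positions inside the top-left pooling window
map_a = [[1, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
map_b = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

pooled_a = max_pool_2x2(map_a)
pooled_b = max_pool_2x2(map_b)
print(pooled_a)  # [[1, 0], [0, 0]]
print(pooled_b)  # [[1, 0], [0, 0]]
```

The invariance is only local: a shift that moves the activation into a neighbouring pooling window does change the pooled output.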

Here we define the input-to-output relationship that constitutes our neural network.

We need to decide:
- Input/output data dimensions
- Number of hidden layers
- Number of neurons in each layer
- Activation function

In [None]:
def build_cnn_model(cnn_backbone, mlp_top=()):
    # Images are 1 channel and 32x32
    inputs = Input(shape=(32, 32, 1))
    x = inputs
    for layer in cnn_backbone:
        x = layer(x)
    x = Flatten()(x)
    for layer in mlp_top:
        x = layer(x)
    # Final layer: 2 output units (one logit per class), no activation
    x = real_dense(2)(x)
    return Model(inputs, x)

## Experiment

We can experiment with a number of parameters for our learning problem:

Architecture:
- Change the number of filters (feature maps) in a layer `Conv(64)`
- Change the number of layers `[Conv(32), Conv(64)]`
- Add downsampling `[ConvMax(32), ConvMax(64)]` - `ConvMax` adds both a `Conv` and `MaxPool` layer.
- Add regularization (dropout) `[ConvMax(32), Dropout(0.2), ConvMax(64)]`
- (*Extra*) - Add an MLP classifier at the end `[Dropout(0.2), Dense(256)]`

Optimizer:
- We can try another optimizer `optimizer = Adam()`

Time:
- We can increase the number of epochs, i.e. passes over the training data, and thereby the number of steps the optimizer takes `epochs=30`
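
When comparing architectures it helps to track how the feature-map shape changes. The sketch below does this with plain arithmetic, assuming the settings used in this notebook: 3x3 convolutions with `padding="same"` (spatial size preserved) and 2x2 max pooling (spatial size halved). The helper names are hypothetical.

```python
def conv_shape(shape, filters):
    """Convolution with padding="same": height and width unchanged."""
    height, width, _ = shape
    return (height, width, filters)


def pool_shape(shape):
    """2x2 max pooling with stride 2: height and width halved."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


def flattened_size(shape):
    height, width, channels = shape
    return height * width * channels


input_shape = (32, 32, 1)

# [Conv(32), Conv(64)]: no downsampling
no_pool = conv_shape(conv_shape(input_shape, 32), 64)

# [ConvMax(32), ConvMax(64)]: each convolution is followed by max pooling
with_pool = pool_shape(conv_shape(pool_shape(conv_shape(input_shape, 32)), 64))

print(no_pool, flattened_size(no_pool))      # (32, 32, 64) 65536
print(with_pool, flattened_size(with_pool))  # (8, 8, 64) 4096
```

The flattened size is the number of inputs to the first dense layer, so downsampling shrinks that layer's weight matrix by a factor of 16 in this example.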


In [None]:
model = build_cnn_model([Conv(32)])
#model = build_cnn_model([Conv(32), Conv(64)])
#model = build_cnn_model([ConvMax(32), ConvMax(64)], [Dropout(0.2), Dense(256)])

optimizer = SGD()
#optimizer = Adam()

epochs = 5
#epochs = 30

# Run the experiment with the above specified model and parameters.
run_experiment(model, optimizer, epochs)