This notebook guides you through a simple example of classifying colorectal histology images using Deep Learning.

We will be using the [Collection of textures in colorectal cancer histology](https://zenodo.org/record/53169#.XjBUGOGCGXn) dataset, which contains 5000 histological images of 150 × 150 px each. Each image belongs to exactly one of eight tissue categories (specified by the folder name).

Let's get started by importing the necessary modules.

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import tensorflow_hub as hub

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers

AUTOTUNE = tf.data.experimental.AUTOTUNE

import pathlib
import IPython.display as display
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os

Now, set the paths of the images and load the class names.

In [None]:
data_dir = pathlib.Path("Kather_texture_2016_image_tiles_5000/")
image_count = len(list(data_dir.glob('*/*.tif')))
print("There are " + str(image_count) + " images")

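# Every sub-folder is a class; loose files such as LICENSE.txt are skipped.
# Sorting keeps the class order reproducible across runs.
CLASS_NAMES = np.array(sorted(item.name for item in data_dir.glob('*') if item.is_dir()))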
print("The class names are: " + str(CLASS_NAMES))

images_list_example = list(data_dir.glob('07_ADIPOSE/*'))
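
Before training, it can be worth checking how many images each class folder holds, since a strongly imbalanced dataset would make plain accuracy misleading. The sketch below is one way to do that with `pathlib` alone. The helper name `count_images_per_class` is hypothetical and not part of the original notebook, and it assumes the one-folder-per-class layout with `.tif` tiles used above.

```python
import pathlib
from collections import Counter


def count_images_per_class(data_dir, pattern="*.tif"):
    """Return a Counter mapping each class folder name to its image count."""
    data_dir = pathlib.Path(data_dir)
    counts = Counter()
    for class_dir in sorted(data_dir.iterdir()):
        # Skip loose files such as LICENSE.txt; only folders are classes.
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob(pattern))
    return counts


if __name__ == "__main__":
    root = pathlib.Path("Kather_texture_2016_image_tiles_5000/")
    if root.exists():
        for name, n in count_images_per_class(root).items():
            print(f"{name}: {n} images")
```

If the printed counts differ a lot between folders, a per-class metric may be more informative than overall accuracy.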

Display a single image from the list to see what it looks like. Feel free to change the index.

In [None]:
display.display(Image.open(str(images_list_example[1])))

Configure the image loaders. For this, the `ImageDataGenerator` class of Keras is used to define one loader for the training dataset and another one for the validation dataset.

In [None]:
BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224

# The 1./255 is to convert from uint8 to float32 in range [0,1].
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255, validation_split = 0.2)

train_data_gen = image_generator.flow_from_directory(
    seed=2020,
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    classes = list(CLASS_NAMES),
    subset='training')

val_data_gen = image_generator.flow_from_directory(
    seed=2020,
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=False,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    classes = list(CLASS_NAMES),
    subset='validation')
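
To make the effect of `validation_split = 0.2` concrete, the sketch below reproduces the arithmetic of a per-class split. It assumes that the split is applied to each class folder separately and that every class holds 625 images (5000 images over eight classes). The helper `split_counts` is hypothetical and only illustrates the numbers; it is not how the generators are implemented internally.

```python
def split_counts(images_per_class, num_classes, validation_split):
    """Return (training, validation) sample counts for a per-class split."""
    # Assumption: the validation share of each class is rounded down.
    val_per_class = int(images_per_class * validation_split)
    train_per_class = images_per_class - val_per_class
    return train_per_class * num_classes, val_per_class * num_classes


train_samples, val_samples = split_counts(625, 8, 0.2)
print(train_samples, val_samples)  # 4000 1000
```

Under these assumptions the two loaders would see 4000 and 1000 images. The values reported by `train_data_gen.samples` and `val_data_gen.samples` remain the authoritative ones.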

We will use Transfer Learning. TensorFlow Hub distributes pretrained models without the top classification layer, which makes them easy to reuse as feature extractors for a new task.

All we have to do is to download a model from the TensorFlow Hub and retrain the top layer of the model to recognize the classes in our dataset.

Any [TensorFlow 2 compatible image feature vector URL](https://tfhub.dev/s?module-type=image-feature-vector&q=tf2) from tfhub.dev will work here. Let's use the `MobileNetV2` model trained on the ImageNet dataset.

In [None]:
feature_extractor_url = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/2" #@param {type:"string"}

And now, we create the feature extractor layer using this URL.

In [None]:
feature_extractor_layer = hub.KerasLayer(feature_extractor_url, input_shape=(224,224,3))

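Uncommenting the following line explicitly freezes the variables in the feature extractor layer, so that training only modifies the new classifier layer. Note that `hub.KerasLayer` is created with `trainable=False` unless told otherwise, so the layer is already frozen by default.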

In [None]:
# feature_extractor_layer.trainable = False

Now it is time to wrap the hub layer in a `tf.keras.Sequential` model and add a new classification layer.

In [None]:
model = tf.keras.Sequential([
  feature_extractor_layer,
  layers.Dense(train_data_gen.num_classes, activation='softmax')
])

model.summary()
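
The `model.summary()` output lists the number of parameters per layer. For the new classification layer this number is easy to check by hand: a dense layer has one weight per input-output pair plus one bias per output. The sketch below assumes that the MobileNetV2 feature vector has 1280 elements and that there are eight classes; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(num_inputs, num_outputs):
    """Parameters of a fully connected layer: weights plus biases."""
    return num_inputs * num_outputs + num_outputs


# Assumed sizes: 1280-element feature vector, 8 tissue classes.
print(dense_param_count(1280, 8))  # 10248
```

If the summary shows a different count for the dense layer, the feature vector size or the number of classes differs from these assumptions.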

Before training the model, use `compile` to configure the training process:

In [None]:
model.compile(
  optimizer=tf.keras.optimizers.Adam(),
  loss='categorical_crossentropy',
  metrics=['acc'])
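
The `'categorical_crossentropy'` loss compares the one-hot label of each image with the softmax output of the model and penalizes low probability on the true class. The following is a minimal pure-Python sketch of that formula for a single example with three made-up classes. The function name, the clipping constant `eps` and the sample probabilities are illustrative assumptions; the Keras implementation works on batches of tensors.

```python
import math


def categorical_crossentropy(one_hot_label, predicted_probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # Clip to avoid log(0) when a probability is exactly zero.
    return -sum(
        y * math.log(min(max(p, eps), 1.0))
        for y, p in zip(one_hot_label, predicted_probs)
    )


label = [0, 0, 1]
confident = [0.05, 0.05, 0.90]
unsure = [0.40, 0.35, 0.25]
print(round(categorical_crossentropy(label, confident), 3))  # 0.105
print(round(categorical_crossentropy(label, unsure), 3))     # 1.386
```

The loss is small when the model puts most of the probability on the true class and grows as that probability shrinks.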

Now use the `.fit` method to train the model.

To keep this example short, train for only a few epochs (e.g. 3). To visualize the training progress, use a custom callback to log the loss and accuracy of each batch individually, instead of the epoch average.

In [None]:
class CollectBatchStats(tf.keras.callbacks.Callback):
  def __init__(self):
    # Initialize the Keras Callback base class before adding our own state.
    super().__init__()
    self.batch_losses = []
    self.batch_acc = []

  def on_train_batch_end(self, batch, logs=None):
    self.batch_losses.append(logs['loss'])
    self.batch_acc.append(logs['acc'])
    self.model.reset_metrics()

# np.ceil returns a float, but the number of steps must be an integer.
steps_per_epoch = int(np.ceil(train_data_gen.samples / train_data_gen.batch_size))

batch_stats_callback = CollectBatchStats()

history = model.fit(
    train_data_gen,
    epochs=3,
    steps_per_epoch=steps_per_epoch,
    validation_data=val_data_gen,
    callbacks=[batch_stats_callback])
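
The number of steps per epoch is the number of batches needed to see every training image once. The sketch below works through that arithmetic, assuming 4000 training and 1000 validation images (an 80/20 split of 5000) and a batch size of 32. The function `steps_per_epoch` is a hypothetical stand-in for the expression used in the cell above.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    # The last batch may be smaller, so round up instead of down.
    return math.ceil(num_samples / batch_size)


print(steps_per_epoch(4000, 32))  # 125
print(steps_per_epoch(1000, 32))  # 32
```

Unlike `np.ceil`, `math.ceil` returns a Python `int`, which suits a step count.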

Plot loss and accuracy to track the progress of the training.

In [None]:
plt.figure()
plt.ylabel("Loss")
plt.xlabel("Training Steps")
plt.ylim([0,2])
plt.plot(batch_stats_callback.batch_losses)

plt.figure()
plt.ylabel("Accuracy")
plt.xlabel("Training Steps")
plt.ylim([0,1])
plt.plot(batch_stats_callback.batch_acc)
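
Per-batch curves tend to be noisy, which can hide the overall trend. One common remedy is to plot a moving average next to the raw values. The sketch below is a plain-Python version; the helper `moving_average`, the window size and the sample values are illustrative assumptions and not part of the original notebook.

```python
def moving_average(values, window):
    """Average each value with up to `window - 1` preceding values."""
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


noisy_losses = [2.0, 1.0, 1.5, 0.5, 1.0, 0.0]
print(moving_average(noisy_losses, window=2))
# [2.0, 1.5, 1.25, 1.0, 0.75, 0.5]
```

In the notebook, the smoothed curve could be drawn with something like `plt.plot(moving_average(batch_stats_callback.batch_losses, window=20))`.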

Finally, use the trained model to check the predictions on the validation set. To do this, `val_data_gen` is created again so that the iteration starts from the first validation batch.

The code below iterates through each group of 32 (`BATCH_SIZE`) validation images, plotting them together with the predicted labels in green if the predictions are right and in red if the predictions are wrong.

To keep the example short, only two groups are plotted. To plot them all, uncomment the line `if batch_count == len(val_data_gen):` and comment out the line `if batch_count == 2:`.

In [None]:
val_data_gen = image_generator.flow_from_directory(
    seed=2020,
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=False,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    classes = list(CLASS_NAMES),
    subset='validation')


class_names = sorted(val_data_gen.class_indices.items(), key=lambda pair:pair[1])
class_names = np.array([key.title() for key, value in class_names])
class_names

batch_count = 0
for image_batch, label_batch in val_data_gen:
    batch_count += 1
    
    predicted_batch = model.predict(image_batch)
    predicted_id = np.argmax(predicted_batch, axis=-1)
    predicted_label_batch = class_names[predicted_id]

    label_id = np.argmax(label_batch, axis=-1)

    plt.figure(figsize=(10,19))
    plt.subplots_adjust(hspace=0.8)
    for n in range(image_batch.shape[0]):
        plt.subplot(8,4,n+1)
        plt.imshow(image_batch[n])
        correct = predicted_id[n] == label_id[n]
        color = "green" if correct else "red"
        # For wrong predictions, append the true label in parentheses.
        title = predicted_label_batch[n] if correct else predicted_label_batch[n] + " (" + class_names[label_id[n]] + ")"
        plt.title(title, color=color)
        plt.axis('off')
    # Set the figure title once per batch, after all tiles are drawn.
    _ = plt.suptitle("Model predictions (green: correct, red: incorrect) for batch " + str(batch_count))
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.show()

    # if batch_count == len(val_data_gen):
    if batch_count == 2:
        break
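
Looking at individual tiles gives a qualitative impression. A confusion matrix summarizes the same comparison of true and predicted class ids in numbers. The sketch below is a minimal pure-Python version with made-up ids for three classes; the helpers `confusion_matrix` and `accuracy` are hypothetical and not part of the original notebook.

```python
def confusion_matrix(true_ids, predicted_ids, num_classes):
    """Return a matrix whose entry [t][p] counts samples of class t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_ids, predicted_ids):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up ids for three classes, only to show the bookkeeping.
true_ids = [0, 0, 1, 1, 2, 2]
predicted_ids = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_ids, predicted_ids, num_classes=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm))  # 0.6666666666666666
```

In the notebook, the `label_id` and `predicted_id` arrays computed inside the loop could be collected across all validation batches and passed to these helpers, with `num_classes=len(class_names)`.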