## Lung and Colon Histopathology Classification using ResNet-50 V2


The purpose of this notebook is to train an image classification model to identify carcinomas in histopathological images. 

The model has been trained on the [Lung and Colon Cancer Histopathological Images](https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images) dataset. It identifies three classes: `Benign`, `Adenocarcinoma`, and `Squamous Cell Carcinoma`. Grouping the samples by tissue type rather than by organ is meant to produce a model that generalizes the diagnosis regardless of the organ the sample comes from. 

To improve the accuracy of the model, I have used transfer learning with the [ResNet-50 V2](https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5) model pretrained on ImageNet. This decision is based on the article [Pathologist-level Classification of Histologic Patterns on Resected Lung Adenocarcinoma Slides with Deep Neural Networks](https://arxiv.org/pdf/1901.11489v1.pdf) by Jason W. Wei et al., where the authors recommend using `ResNet-18` trained on the ImageNet dataset to classify histopathological images. 

## 1. Setup


In [None]:
import os
import sys
import numpy as np
import pandas as pd
from contextlib import ExitStack
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import (
    Sequential,
    layers,
    optimizers,
    regularizers,
    callbacks,
    metrics,
    losses,
    activations,
)
import tensorflow_hub as hub
from tensorflow.train import Example, Feature, Features, BytesList, Int64List, FloatList

In [None]:
BATCH_SIZE = 256
AUTOTUNE = tf.data.AUTOTUNE
SHUFFLE_BUFFER = 1000
DATASET_PATH = tf.io.gfile.join(os.environ["DATA_PATH"], "lung_colon_histopathology")
RAW_DATASET_PATH = tf.io.gfile.join(DATASET_PATH, "raw_data")
TFRECORDS_PATH = tf.io.gfile.join(DATASET_PATH, "tfrecord_data")
MODEL_PATH = os.path.join("..", "..", "..", "models", "chapter_14", "lung_colon_histopathology")
SEED = 1992

# List of all the diagnostics
TYPE_TISSUE = {
    0: "Benign",
    1: "Adenocarcinoma",
    2: "Squamous Cell Carcinoma",
}

NAME_CLASSES = ["Benign", "Adenocarcinoma", "Squamous Cell Carcinoma"]

IMG_SIZE = (768, 768)
IMG_CHANNELS = 3
NEW_IMG_SIZE = [448, 448, 3]

# Model Url
MODEL_URL = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/5"

### 1.1 Helper Functions

In [None]:
def exponential_decay_with_warmup(
    lr_start=1e-4,
    lr_max=1e-3,
    lr_min=1e-5,
    lr_rampup_epochs=4,
    lr_sustain_epochs=1,
    lr_exp_decay=0.8,
):
    """Implements an exponential decay learning rate schedule with warmup.
    
    Example:
        lr_function = exponential_decay_with_warmup()
        lr_cb = tf.keras.callbacks.LearningRateScheduler(lr_function)

    Args:
        lr_start (float, optional): Initial value of the learning rate. Defaults to 0.0001.
        lr_max (float, optional): Maximum value of the learning rate. Defaults to 0.001.
        lr_min (float, optional): Minimum value the learning rate decays towards. Defaults to 0.00001.
        lr_rampup_epochs (int, optional): Number of epochs over which the learning rate increases linearly up to lr_max. Defaults to 4.
        lr_sustain_epochs (int, optional): Number of epochs the learning rate stays at lr_max. Defaults to 1.
        lr_exp_decay (float, optional): Factor by which (lr - lr_min) is multiplied each epoch during the decay phase. Defaults to 0.8.

    Returns:
        Callable[[int], float]: Function that maps an epoch index to a learning rate.
    """

    def exponential_decay_fn(epoch):
        if epoch < lr_rampup_epochs:
            lr = (lr_max - lr_start) / lr_rampup_epochs * epoch + lr_start
        elif epoch < (lr_rampup_epochs + lr_sustain_epochs):
            lr = lr_max
        else:
            lr = (lr_max - lr_min) * lr_exp_decay ** (
                epoch - lr_rampup_epochs - lr_sustain_epochs
            ) + lr_min

        return lr

    return exponential_decay_fn

In [None]:
def balanced_split(dataset, percentages=[0.80, 0.10, 0.10], verbose=False):
    """
    Split a given dataset into three datasets, defined by percentages, according to the classes in the dataset.

    Args:
        dataset (tf.data.Dataset): The dataset to be splitted.
        percentages (List[float], optional): A list with 3 elements that defines the percentage of the dataset
            that will be used for each of the sets. The elements must be floats in the range [0, 1] and
            must sum up to 1. Defaults to [0.80, 0.10, 0.10].
        verbose (bool, optional): A boolean that defines whether the function will print the split for each
            class and the final split. Defaults to False.

    Returns:
        tuple: A tuple with three datasets, corresponding to the training, validation and testing sets, respectively.
    """
    # Obtain the different classes in the dataset and sort the list.
    # NOTE: `dataset` must yield its elements in the same order every time it is
    # iterated; otherwise the take/skip calls below would mix samples between sets.
    list_classes = dataset.map(lambda x, y: y, num_parallel_calls=AUTOTUNE).unique()
    list_classes = sorted(class_.numpy() for class_ in list_classes)

    # The sets start as None and are created from the first class processed,
    # so their element structure does not need to be known in advance
    train_set = None
    valid_set = None
    test_set = None

    # Keep track of the total samples per set
    samples_train = 0
    samples_valid = 0
    samples_test = 0

    for class_ in list_classes:
        # Get the samples that match every class and samples per set
        tmp_dataset = dataset.filter(lambda x, y: y == class_)
        n_samples = len(list(tmp_dataset.as_numpy_iterator()))

        n_valid = int(percentages[1] * n_samples)
        n_test = int(percentages[2] * n_samples)
        n_train = n_samples - n_valid - n_test

        samples_train += n_train
        samples_valid += n_valid
        samples_test += n_test

        # Separate the sets and concatenate them to the sets of the other classes
        tmp_train_set = tmp_dataset.take(n_train)
        tmp_valid_set = tmp_dataset.skip(n_train).take(n_valid)
        tmp_test_set = tmp_dataset.skip(n_train + n_valid)

        train_set = (
            tmp_train_set if train_set is None else train_set.concatenate(tmp_train_set)
        )
        valid_set = (
            tmp_valid_set if valid_set is None else valid_set.concatenate(tmp_valid_set)
        )
        test_set = (
            tmp_test_set if test_set is None else test_set.concatenate(tmp_test_set)
        )

        if verbose:
            print(f"\tSplit for class {class_} is [{n_train}, {n_valid}, {n_test}]")

    if verbose:
        print("\nThe split has been completed. The final split is the following:")
        print(
            f"\tTraining Set: {samples_train}\n\tValidation Set: {samples_valid}\n\tTesting Set: {samples_test}"
        )
    return train_set, valid_set, test_set
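As a quick check of the per-class arithmetic, here is a minimal pure-Python sketch of the same stratified take/skip logic on a plain list of `(item, label)` pairs. The function name `stratified_split` and the toy data are hypothetical; the sketch assumes the percentages are ordered train/valid/test, as in `balanced_split`.

```python
from collections import defaultdict


def stratified_split(pairs, percentages=(0.80, 0.10, 0.10)):
    """Split (item, label) pairs per class into train/valid/test lists.

    Mirrors the take/skip logic of balanced_split: for each class the
    validation and test sizes are truncated with int(), and the training
    set receives the remainder.
    """
    by_class = defaultdict(list)
    for item, label in pairs:
        by_class[label].append((item, label))

    train, valid, test = [], [], []
    for label in sorted(by_class):
        samples = by_class[label]
        n_valid = int(percentages[1] * len(samples))
        n_test = int(percentages[2] * len(samples))
        n_train = len(samples) - n_valid - n_test
        train += samples[:n_train]
        valid += samples[n_train:n_train + n_valid]
        test += samples[n_train + n_valid:]
    return train, valid, test


# Toy data: 50 samples of class 0 and 30 samples of class 1
pairs = [(f"img_{i}", 0) for i in range(50)] + [(f"img_{i}", 1) for i in range(50, 80)]
train, valid, test = stratified_split(pairs)
print(len(train), len(valid), len(test))  # 64 8 8
```

Because each class is split separately, every set keeps the class proportions of the original data (here 40/24, 5/3 and 5/3).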

In [None]:
def get_logdir(date_type="date", path_folder=None, id_value=None):
    """This function creates the name of a folder for Tensorboard
    logs using the current date or datetime

    Args:
        date_type (str, optional): Format of the second part of the folder name. Options "date", "datetime", "id".
            If selected "id", argument id must be provided. Defaults to "date".
        path_folder (str, optional): String of the path to add before the folder name. Defaults to None.
        id_value (str, optional): String used to create the name of the folder if option "id" has been selected.
            Defaults to None.

    Returns:
        str: Name of the folder or path of the folder.
    """
    log_id = ""
    if date_type == "date":
        log_id = datetime.now().strftime("%Y%m%d")
    elif date_type == "datetime":
        log_id = datetime.now().strftime("%Y%m%d_%H%M%S")
    elif date_type == "id":
        log_id = id_value
    
    log_folder = f"run_{log_id}"
    
    if path_folder:
        log_dir = os.path.join(path_folder, log_folder)
    else:
        log_dir = log_folder

    return log_dir

## 2. Data Pipeline


The images are provided in two separate folders as follows:
<pre>
├── colon_images
│   ├── colon_aca
│   └── colon_n
└── lung_images
    ├── lung_aca
    ├── lung_n
    └── lung_scc</pre>

The images are labeled according to the second-level folders. Since adenocarcinoma and benign tissue appear in both organs, the five folders map to only the three labels mentioned above. 
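The mapping from the five second-level folders to the three labels can be written out explicitly. This is a hypothetical sketch (`FOLDER_TO_LABEL` is not defined in the notebook) that uses the same label indices as `TYPE_TISSUE`. It also shows why the label order must be set explicitly: when labels are inferred, `image_dataset_from_directory` orders the classes alphanumerically unless `class_names` is given, and `*_aca` sorts before `*_n`.

```python
# Same indices as TYPE_TISSUE in the notebook
TYPE_TISSUE = {0: "Benign", 1: "Adenocarcinoma", 2: "Squamous Cell Carcinoma"}

# Hypothetical explicit mapping from folder name to tissue label
FOLDER_TO_LABEL = {
    "colon_n": 0,
    "lung_n": 0,
    "colon_aca": 1,
    "lung_aca": 1,
    "lung_scc": 2,
}

for folder, label in sorted(FOLDER_TO_LABEL.items()):
    print(f"{folder:<10} -> {label} ({TYPE_TISSUE[label]})")

# Alphanumerical order gives index 0 to the adenocarcinoma folder,
# not to the benign one
print(sorted(["colon_aca", "colon_n"]))  # ['colon_aca', 'colon_n']
```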

The first part of the data pipeline loads the images, adds the labels, and saves them as TFRecord files. The reason is that TensorFlow's `tf.data` API reads these files efficiently (sequential reads from several shards that can be read in parallel), which reduces the training time. 
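The sharding scheme used by `save_protobufs` below can be previewed without TensorFlow. Shard names follow the `{type_set}.tfrecord-XX-of-YY` pattern, and example `i` goes to shard `i % n_shards`. This pure-Python sketch only reproduces the naming and the assignment; `shard_plan` is a hypothetical helper.

```python
def shard_plan(n_examples, type_set="train", n_shards=10):
    """Return the shard file names and how many examples each one receives."""
    files = [
        f"{type_set}.tfrecord-{shard + 1:02d}-of-{n_shards:02d}"
        for shard in range(n_shards)
    ]
    counts = [0] * n_shards
    for index in range(n_examples):
        counts[index % n_shards] += 1  # round-robin assignment
    return dict(zip(files, counts))


plan = shard_plan(25, type_set="valid", n_shards=4)
for name, count in plan.items():
    print(name, count)
# valid.tfrecord-01-of-04 7
# valid.tfrecord-02-of-04 6
# valid.tfrecord-03-of-04 6
# valid.tfrecord-04-of-04 6
```

Round-robin writing keeps the shards almost the same size, so parallel reads are evenly balanced.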

In [None]:
def get_folders_images(filepath):
    """
    This function takes in a filepath as an input. It uses the TensorFlow library's
    `tf.io.gfile.listdir()` function to list the names of all the folders located in 
    the given filepath. It then creates a list of filepaths by joining the filepath 
    with the names of the folders. It then creates a dictionary where the keys are the 
    names of the folders and the values are the corresponding filepaths.

    Args:
        filepath: the filepath where the folders are located

    Returns:
        list_folders: a dictionary where keys are the folder names and values are the 
        corresponding filepaths
    """
    name_folders = tf.io.gfile.listdir(filepath)
    path_folders = [tf.io.gfile.join(filepath, folder) for folder in name_folders]

    list_folders = {folder: path for folder, path in zip(name_folders, path_folders)}
    return list_folders


def create_example(image, tissue_label):
    """
    This function takes in an image and a tissue label as inputs. It serializes the image
    using TensorFlow's `tf.io.serialize_tensor()` function, and creates a feature dictionary 
    with two keys: "image" and "tissue_label".

    Args:
        image: a Tensorflow tensor representing an image
        tissue_label: a Tensorflow tensor representing tissue label of the image

    Returns:
        example: TensorFlow Example object
    """
    image_data = tf.io.serialize_tensor(image)
    feature = {
        "image": Feature(bytes_list=BytesList(value=[image_data.numpy()])),
        "tissue_label": Feature(int64_list=Int64List(value=[tissue_label.numpy()])),
    }

    return Example(features=Features(feature=feature))


def save_protobufs(dataset, type_set="train", n_shards=10):
    """
    This function takes in a dataset, a type of the set (train, valid or test) and the 
    number of shards. It creates a folder named after type_set in the TFRECORDS_PATH 
    directory and builds the list of shard filepaths inside that folder. The dataset is
    shuffled if type_set is "train". It then creates n_shards TFRecord files in that
    folder and writes the examples to them in round-robin order.

    Args:
        dataset: a TensorFlow Dataset of (image, tissue_label) pairs
        type_set: string, either 'train', 'valid' or 'test'
        n_shards: number of TFRecord files to be created

    Returns:
        file_paths: a list of filepaths where the data is saved in the TFRecord format
    """
    set_folder = tf.io.gfile.join(TFRECORDS_PATH, type_set)
    tf.io.gfile.makedirs(set_folder)

    # Build the shard file names first, then join them with the folder path
    files = [
        f"{type_set}.tfrecord-{shard + 1:02d}-of-{n_shards:02d}"
        for shard in range(n_shards)
    ]
    file_paths = [tf.io.gfile.join(set_folder, filepath) for filepath in files]

    # Shuffle the training set before writing (shuffle returns a new dataset)
    if type_set == "train":
        dataset = dataset.shuffle(SHUFFLE_BUFFER)

    with ExitStack() as stack:
        writers = [
            stack.enter_context(tf.io.TFRecordWriter(file)) for file in file_paths
        ]
        # Each element is an (image, tissue_label) pair; the examples are
        # distributed across the shards in round-robin order
        for index, (image, tissue_label) in dataset.enumerate():
            shard = index % n_shards
            example = create_example(image, tissue_label)
            writers[shard].write(example.SerializeToString())
    return file_paths


def get_folders_tfrecords(type_set="train"):
    """
    This function takes in a type of the set (train, valid or test) and returns the list 
    of filepaths of the TFRecord files located in the corresponding folder in the 
    TFRECORDS_PATH directory.

    Args:
        type_set: string, either 'train', 'valid' or 'test'

    Returns:
        list_files: a list of filepaths where the data is saved in the tfRecord format
    """
    folder = tf.io.gfile.join(TFRECORDS_PATH, type_set)
    files = tf.io.gfile.listdir(folder)
    list_files = [tf.io.gfile.join(folder, filepath) for filepath in files]
    return list_files


def save_images_protobufs():
    """
    This function creates three datasets (train, valid, test) from the images located in 
    the RAW_DATASET_PATH directory, saves the datasets in tfRecord format, and returns 
    the list of filepaths where the data is saved. If the folder for the TFRecords already
    exists, it only returns the list of filepaths.
    """
    # Verify if the folder for the TFRecords exist to process the data
    if not tf.io.gfile.exists(TFRECORDS_PATH):

        tf.io.gfile.makedirs(TFRECORDS_PATH)

        list_folders = get_folders_images(RAW_DATASET_PATH)

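        # Create the dataset per folder and split it into train, valid and test sets.
        # `class_names` fixes the label order so that it matches TYPE_TISSUE
        # (0: Benign, 1: Adenocarcinoma, 2: Squamous Cell Carcinoma); the default
        # alphanumerical order would map "*_aca" to 0 and "*_n" to 1.
        # `shuffle=False` keeps the element order fixed, so the take/skip calls
        # in balanced_split do not mix samples between the sets.
        colon_dataset = tf.keras.preprocessing.image_dataset_from_directory(
            list_folders["colon_images"],
            batch_size=None,
            image_size=IMG_SIZE,
            class_names=["colon_n", "colon_aca"],
            shuffle=False,
        )
        train_colon, valid_colon, test_colon = balanced_split(colon_dataset)

        lung_dataset = tf.keras.preprocessing.image_dataset_from_directory(
            list_folders["lung_images"],
            batch_size=None,
            image_size=IMG_SIZE,
            class_names=["lung_n", "lung_aca", "lung_scc"],
            shuffle=False,
        )
        train_lung, valid_lung, test_lung = balanced_split(lung_dataset)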

        # Combines the datasets into a single one
        train_set = train_colon.concatenate(train_lung)
        valid_set = valid_colon.concatenate(valid_lung)
        test_set = test_colon.concatenate(test_lung)

        train_paths = save_protobufs(train_set, "train")
        valid_paths = save_protobufs(valid_set, "valid")
        test_paths = save_protobufs(test_set, "test")
    else:
        # If the folders exist, only return the list of filepaths
        train_paths = get_folders_tfrecords("train")
        valid_paths = get_folders_tfrecords("valid")
        test_paths = get_folders_tfrecords("test")

    return train_paths, valid_paths, test_paths


In [None]:
train_files, valid_files, test_files = save_images_protobufs()

The second part of the data pipeline is loading the images and preprocessing them. 

The original size of the images is `768x768`. The `ResNet-50 V2` model on TF Hub expects `224x224` images by default. Downscaling directly to that size would discard much of the spatial detail of the tissue. Instead, the images are resized to `448x448` (twice the recommended height and width). The first layer of the model, a `MaxPool2D` layer with its default `2x2` pool and stride of 2, then halves them to `224x224` by keeping the maximum value of each `2x2` window, which is expected to preserve more information than a direct resize.
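The effect of the `2x2` max pooling on the `448x448` images can be reproduced with NumPy. This sketch assumes the Keras `MaxPool2D` defaults (`pool_size=(2, 2)`, `strides` equal to the pool size, `padding="valid"`). `max_pool_2x2` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on an (H, W, C) array with even H and W."""
    h, w, c = image.shape
    # Group pixels into non-overlapping 2x2 windows and keep the maximum of each
    return image.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


rng = np.random.default_rng(1992)
image = rng.random((448, 448, 3), dtype=np.float32)
pooled = max_pool_2x2(image)
print(pooled.shape)  # (224, 224, 3)

tiny = np.array([[1, 2, 5, 0], [3, 4, 1, 1], [0, 0, 9, 8], [7, 0, 6, 2]])[..., None]
print(max_pool_2x2(tiny)[..., 0])
# [[4 5]
#  [7 9]]
```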

In [None]:
def get_record(tfrecord):
    """
    This function takes in a serialized TFRecord example and parses it using TensorFlow's
    `tf.io.parse_single_example()` function. It then deserializes the image tensor, reshapes
    it and resizes it to the desired size, and returns the image and tissue label as a tuple.
    
    Args:
        tfrecord: a TensorFlow TFRecord
    
    Returns:
        image: Tensorflow tensor representing the image
        tissue_label: Tensorflow tensor representing the tissue label of the image
    """
    feature_descriptions = {
        "image": tf.io.FixedLenFeature([], tf.string, default_value=""),
        "tissue_label": tf.io.FixedLenFeature([], tf.int64, default_value=-1),
    }

    example = tf.io.parse_single_example(tfrecord, feature_descriptions)
    image = tf.io.parse_tensor(example["image"], out_type=tf.float32)
    image = tf.reshape(image, shape=[IMG_SIZE[0], IMG_SIZE[1], IMG_CHANNELS])
    image = tf.image.resize(image, size=[NEW_IMG_SIZE[0], NEW_IMG_SIZE[1]])
    return image, example["tissue_label"]


def get_dataset(file_paths, cache=False, shuffle_buffer=None):
    """
    This function takes in a list of filepaths, loads the data stored in TFRecord format
    and returns a batched and prefetched TensorFlow dataset. The function can optionally
    cache and shuffle the dataset.
    
    Args:
        file_paths: list of filepaths where the data is saved in the TFRecord format
        cache: boolean, whether to cache the dataset or not
        shuffle_buffer: int, the buffer size for shuffling the dataset
    
    Returns:
        dataset: TensorFlow Dataset
    """
    dataset = tf.data.TFRecordDataset(file_paths, num_parallel_reads=AUTOTUNE)

    if cache:
        dataset = dataset.cache()
    if shuffle_buffer:
        dataset = dataset.shuffle(shuffle_buffer)

    dataset = (
        dataset.map(get_record, num_parallel_calls=AUTOTUNE)
        .batch(BATCH_SIZE)
        .prefetch(AUTOTUNE)
    )

    return dataset

In [None]:
train_set = get_dataset(train_files, shuffle_buffer=SHUFFLE_BUFFER)
valid_set = get_dataset(valid_files)
test_set = get_dataset(test_files)
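With `BATCH_SIZE = 256`, the number of batches per epoch follows from the split sizes. The Kaggle dataset (LC25000) contains 25,000 images, 5,000 in each of the five folders, so the `[0.80, 0.10, 0.10]` split applied to each folder gives 4,000/500/500 images per folder. The sketch below only does that arithmetic. The last batch of each set is partial because `batch()` keeps the remainder by default.

```python
import math

BATCH_SIZE = 256
IMAGES_PER_FOLDER = 5000  # LC25000: 5 folders x 5,000 images
N_FOLDERS = 5
PERCENTAGES = [0.80, 0.10, 0.10]

# Per-folder split, using the same int() truncation as balanced_split
n_valid = int(PERCENTAGES[1] * IMAGES_PER_FOLDER)
n_test = int(PERCENTAGES[2] * IMAGES_PER_FOLDER)
n_train = IMAGES_PER_FOLDER - n_valid - n_test

for name, per_folder in [("train", n_train), ("valid", n_valid), ("test", n_test)]:
    total = per_folder * N_FOLDERS
    steps = math.ceil(total / BATCH_SIZE)  # last batch is partial (drop_remainder=False)
    print(f"{name}: {total} images, {steps} batches")
# train: 20000 images, 79 batches
# valid: 2500 images, 10 batches
# test: 2500 images, 10 batches
```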

## 3. Training


In [None]:
# Clear the TensorFlow session
tf.keras.backend.clear_session()
tf.random.set_seed(SEED)


The model is trained with the following considerations:

- The images are normalized to the range `[0, 1]` by dividing them by `255.0`, which is the input range the TF Hub image models expect.
- The first layer of the model is a max pooling layer that halves the spatial dimensions (`448x448` to `224x224`) to match the suggested input size.
- The ResNet base model is frozen (not trainable) to reduce the training time; only the final `Dense` classification layer is trained.
- The article mentioned above used a learning rate with decay. Here, I use a learning rate with a linear warmup followed by exponential decay.
- The optimizer is Adam.
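To see the schedule that `train_model` passes to `LearningRateScheduler`, the values can be computed directly. `train_model` calls `exponential_decay_with_warmup` with `lr_start=learning_rate`, `lr_max=learning_rate * 10` and `lr_min=learning_rate / 10`, and the notebook trains with `learning_rate=5e-4`. The hypothetical `lr_schedule` below restates the formula of the helper from section 1.1 with those values, so the cell runs on its own.

```python
def lr_schedule(epoch, lr_start=5e-4, lr_max=5e-3, lr_min=5e-5,
                rampup_epochs=4, sustain_epochs=1, exp_decay=0.8):
    """Same formula as exponential_decay_with_warmup, with the training values."""
    if epoch < rampup_epochs:
        # Linear warmup from lr_start towards lr_max
        return (lr_max - lr_start) / rampup_epochs * epoch + lr_start
    if epoch < rampup_epochs + sustain_epochs:
        # Plateau at lr_max
        return lr_max
    # Exponential decay of the distance between the learning rate and lr_min
    return (lr_max - lr_min) * exp_decay ** (epoch - rampup_epochs - sustain_epochs) + lr_min


for epoch in range(8):
    print(epoch, f"{lr_schedule(epoch):.2e}")
```

The learning rate stays at `lr_max` for two epochs (4 and 5) because the decay term starts at `0.8 ** 0 = 1`. Over many epochs it approaches `lr_min = 5e-5`.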


In [None]:
def train_model(
    train_set,
    valid_set,
    learning_rate,
    epochs=100,
    trainable=False,
):
    """
    This function trains a ResNet model on the given train and validation sets.
    The function normalizes the data, loads the ResNet model, compiles the model,
    and sets up callbacks for learning rate scheduling, TensorBoard, model checkpointing, 
    and early stopping. The model is then fit on the train set and validated on the 
    validation set.
    
    Args:
        train_set: TensorFlow dataset for training the model
        valid_set: TensorFlow dataset for validation the model
        learning_rate: float, the initial learning rate for the optimizer
        epochs: int, the number of epochs for training the model
        trainable: boolean, whether to make the base model trainable or not
    
    Returns:
        None
    """
    # Normalization of the data
    def normalize(image, label):
        norm_image = image / 255.0
        return (norm_image, label)

    train_set_normalized = train_set.map(normalize, num_parallel_calls=AUTOTUNE)
    valid_set_normalized = valid_set.map(normalize, num_parallel_calls=AUTOTUNE)

    # Load the ResNet model
    base_model = hub.KerasLayer(
        MODEL_URL,
        trainable=trainable,
    )

    model = Sequential(
        [layers.MaxPool2D(), base_model, layers.Dense(3, activation="softmax")]
    )

    # Compilation of the model
    optimizer_ = optimizers.Adam(learning_rate=learning_rate)
    metrics_ = [
        "accuracy",
    ]
    
    loss_ = "sparse_categorical_crossentropy"

    model.compile(optimizer=optimizer_, metrics=metrics_, loss=loss_)

    # Callbacks
    exponential_decay_fn = exponential_decay_with_warmup(
        lr_start=learning_rate,
        lr_max=learning_rate * 10,
        lr_min=learning_rate / 10,
    )
    lr_scheduler_cb = callbacks.LearningRateScheduler(exponential_decay_fn)

    folder_logs = tf.io.gfile.join(
        "..", "..", "..", "reports", "logs", "chapter_14", "lung_colon_histopathology"
    )
    logdir = get_logdir(path_folder=folder_logs)
    tensorboard_cb = callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)

    model_path = tf.io.gfile.join(MODEL_PATH)
    model_checkpoint_cb = callbacks.ModelCheckpoint(
        filepath=model_path, save_best_only=True
    )

    early_stopping_cb = callbacks.EarlyStopping(patience=5)

    callbacks_list = [
        lr_scheduler_cb,
        tensorboard_cb,
        model_checkpoint_cb,
        early_stopping_cb,
    ]

    # Train the model
    model.fit(
        train_set_normalized,
        validation_data=valid_set_normalized,
        epochs=epochs,
        callbacks=callbacks_list,
    )


In [None]:

train_model(
    train_set,
    valid_set,
    learning_rate=5e-4,
    epochs=100,
    trainable=False,
)

The result of the training is shown in the images below:

<h4 align="center">Epochs vs Accuracy</h4>

<img src="https://storage.googleapis.com/mmenendezg-ml-bucket/models/lung_colon_histopathological/training_images/lung_colon_epoch_accuracy.png" alt="Epochs Vs Accuracy" style="width: 70%; margin-left: 15%; margin-right: 15%"/>

The chart above shows the evolution of the model's accuracy: `valid_accuracy` in blue and `train_accuracy` in red. At the beginning `valid_accuracy` is higher, but after epochs 11-12 `train_accuracy` overtakes it. This is expected: accuracy tends to be higher on the training data because that is the dataset used to tune the weights of the model.

After epochs 11-12 the gap between `train_accuracy` and `valid_accuracy` grows slightly, but it shrinks again in later epochs. 

<h4 align="center">Epochs vs Loss</h4>

<img src="https://storage.googleapis.com/mmenendezg-ml-bucket/models/lung_colon_histopathological/training_images/lung_colon_epoch_loss.png" alt="Epochs Vs Loss" style="width: 70%; margin-left: 15%; margin-right: 15%"/>

The second chart shows the evolution of the loss, with the same color coding: blue for `valid_loss` and red for `train_loss`. The trend mirrors the accuracy chart, and the same behavior can be observed. 

Overall, the model generalizes well: it neither overfits nor underfits the training dataset. 

We will evaluate the model in the following section.

## 4. Evaluation

This section evaluates the model using different metrics and the confusion matrix.

TensorFlow provides the `evaluate()` method of the `Model` object. Here it returns the `loss` and the `accuracy`, since `accuracy` is the only metric specified at compilation. 

However, `tf.keras.metrics` also offers the `Precision` and `Recall` classes. We will use them to evaluate the model and then plot the confusion matrix. 
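For three classes, precision and recall are computed per class and then averaged (macro average). Here is a minimal NumPy sketch that starts from a confusion matrix of counts, with rows as true labels and columns as predictions, which is the layout `tf.math.confusion_matrix` returns. The counts are made up for illustration and are not the notebook's results.

```python
import numpy as np

# Hypothetical counts: rows = true class, columns = predicted class
conf_matrix = np.array([
    [50, 0, 0],
    [1, 48, 1],
    [0, 2, 48],
])

true_positives = np.diag(conf_matrix)
precision_per_class = true_positives / conf_matrix.sum(axis=0)  # TP / predicted as the class
recall_per_class = true_positives / conf_matrix.sum(axis=1)     # TP / actually in the class

macro_precision = precision_per_class.mean()
macro_recall = recall_per_class.mean()
print(np.round(precision_per_class, 4), round(macro_precision, 4))
print(np.round(recall_per_class, 4), round(macro_recall, 4))
```

`tf.keras.metrics.Precision` and `Recall` are binary metrics. By default they threshold predictions at 0.5 and treat any non-zero label as positive, so multi-class values need a per-class computation such as this one, or the metrics' `class_id` argument applied to one-hot inputs.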

In [None]:
def normalize(image, label):
    """
    Normalize image by dividing each pixel by 255.0
    
    Args:
        image (ndarray): Image data in numpy array format
        label (int): Image label
    
    Returns:
        tuple: Normalized image in numpy array format and label
    """
    norm_image = image / 255.0
    return (norm_image, label)


def plot_confusion_matrix(confusion_matrix, precision, recall, labels=NAME_CLASSES):
    """
    Plot confusion matrix using seaborn heatmap.
    
    Args:
        confusion_matrix (ndarray): Confusion matrix in numpy array format
        precision (float): Precision score
        recall (float): Recall score
        labels (list, optional): List of class labels, defaults to NAME_CLASSES
    """
    plt.figure(figsize=(5, 5))
    sns.heatmap(
        confusion_matrix,
        annot=True,
        cmap="flare",
        cbar=False,
        fmt=".2f",
        xticklabels=labels,
        yticklabels=labels,
    )
    title_plot = f"Confusion Matrix\nPrecision: {precision * 100:.2f}%\nRecall: {recall * 100:.2f}%"
    plt.xticks(rotation=60)
    plt.yticks(rotation=60)
    plt.ylabel("Labels", weight="bold")
    plt.xlabel("Predictions", weight="bold")
    plt.title(title_plot)


def create_confusion_matrix(model_path, dataset):
    """
    Create a confusion matrix for a model given a dataset.
    
    Args:
        model_path (str): File path of the model.
        dataset (tf.data.Dataset): A dataset containing the images and labels
    """
    # Load the model
    model = tf.keras.models.load_model(model_path)
    # Preprocess the images to predict
    dataset_normalized = dataset.map(normalize, num_parallel_calls=AUTOTUNE)
    # Separate the images and the labels
    images = dataset_normalized.map(
        lambda image, label: image, num_parallel_calls=AUTOTUNE
    )
    labels = dataset_normalized.map(
        lambda image, label: label, num_parallel_calls=AUTOTUNE
    )
    labels = [label.numpy() for label in labels.unbatch()]
    # Create the confusion matrix with the predicted labels
    predictions = model.predict(images, verbose=0)
    predicted_classes = tf.argmax(predictions, axis=1).numpy()
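    # Calculate the macro-averaged precision and recall of the model.
    # Precision and Recall are binary metrics, so they are computed one class
    # at a time (class_id) on one-hot encoded labels and predictions.
    n_classes = len(NAME_CLASSES)
    y_true = tf.one_hot(labels, depth=n_classes)
    y_pred = tf.one_hot(predicted_classes, depth=n_classes)
    precisions, recalls = [], []
    for class_id in range(n_classes):
        precision_metric = metrics.Precision(class_id=class_id)
        precision_metric.update_state(y_true, y_pred)
        precisions.append(precision_metric.result().numpy())

        recall_metric = metrics.Recall(class_id=class_id)
        recall_metric.update_state(y_true, y_pred)
        recalls.append(recall_metric.result().numpy())
    precision = np.mean(precisions)
    recall = np.mean(recalls)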
    
    # Create the confusion matrix
    conf_matrix = tf.math.confusion_matrix(labels, predicted_classes)
    conf_matrix = conf_matrix.numpy() / conf_matrix.numpy().sum(axis=1)[:, np.newaxis]
    
    # Plot the results
    plot_confusion_matrix(conf_matrix, precision, recall)


def evaluate_model(model_path, test_set):
    """
    Evaluate a model given a test set.
    
    Args:
        model_path (str): File path of the model.
        test_set (tf.data.Dataset): A dataset containing the test images and labels
    """
    # Load the model
    model = tf.keras.models.load_model(model_path)
    evaluations = model.evaluate(test_set, verbose=0)
    print(
        f"The evaluation of the model is the following:\n\tLoss: {evaluations[0]:.2f}\n\tAccuracy: {evaluations[1] * 100:.2f}%"
    )


In [None]:
test_set_normalized = test_set.map(normalize, num_parallel_calls=AUTOTUNE)
evaluate_model(MODEL_PATH, test_set_normalized)


In [None]:
create_confusion_matrix(MODEL_PATH, test_set)

The confusion matrix above is row-normalized: each row corresponds to a true class and shows the fraction of its images assigned to each predicted class, so the main diagonal holds the per-class recall. Values of 0.95 and above on the main diagonal are consistent with the accuracy computed by the `evaluate()` method ($97.60\%$), and precision and recall are also very high.

Remember that in this case the model takes images from both organs and predicts a unified diagnosis, meaning that it has learned to generalize well across organs. 
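The row normalization used in `create_confusion_matrix` divides each row by its total, so each row sums to 1 and the diagonal equals the per-class recall. Accuracy, on the other hand, is the trace of the raw count matrix divided by the total number of images. With unequal class sizes (the test set has benign and adenocarcinoma images from both organs but squamous cell carcinoma images only from the lungs), accuracy is not the plain mean of the normalized diagonal. The counts below are hypothetical.

```python
import numpy as np

# Hypothetical counts with the test-set class sizes (1000, 1000, 500)
counts = np.array([
    [985, 10, 5],
    [8, 970, 22],
    [1, 30, 469],
])

# Same normalization as create_confusion_matrix: divide each row by its total
normalized = counts / counts.sum(axis=1)[:, np.newaxis]

per_class_recall = np.diag(normalized)
accuracy = np.trace(counts) / counts.sum()
print(normalized.sum(axis=1))             # each row sums to 1
print(per_class_recall)                   # recall of each true class
print(accuracy, per_class_recall.mean())  # 0.9696 vs. about 0.9643
```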

## 5. Making Predictions


The final section of the notebook makes predictions on images that the model was not trained on, taken from the test set.

The labels are removed from the test images, and the raw (unnormalized) images are passed to `make_prediction`, which normalizes them before feeding them to the model.
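The steps that `preprocess_image` and `make_prediction` apply to a single image can be sketched with NumPy: scale to `[0, 1]`, add a leading batch dimension, then take the `argmax` of the softmax output and its probability. The probability vector below is made up; in the notebook it comes from `model.predict`.

```python
import numpy as np

TYPE_TISSUE = {0: "Benign", 1: "Adenocarcinoma", 2: "Squamous Cell Carcinoma"}

# A dummy 448x448 RGB image with pixel values in [0, 255]
image = np.full((448, 448, 3), 128.0, dtype=np.float32)

# Normalize and add the batch dimension, as preprocess_image does with tf.expand_dims
batch = np.expand_dims(image / 255.0, axis=0)
print(batch.shape)  # (1, 448, 448, 3)

# Hypothetical softmax output for that batch of one image
predictions = np.array([[0.03, 0.91, 0.06]])
predicted_classes = np.argmax(predictions, axis=1)
label = predicted_classes[0]
prob_prediction = predictions[0][label] * 100
print(f"Type Tissue: {TYPE_TISSUE[label]}\nProbability: {prob_prediction:.2f}%")
```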

In [None]:
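def preprocess_image(image):
    """Normalize an image to [0, 1] and add a batch dimension of size 1."""
    # Normalize the image
    norm_image = image / 255.0

    # Add dimension at the beginning
    norm_image = tf.expand_dims(norm_image, axis=0)
    return norm_image


def make_prediction(dataset, model_path=MODEL_PATH, n_cols=4):
    """Predict the tissue type of each image in `dataset` and plot the images
    in a grid, titled with the predicted class and its probability."""
    # Load the model
    model = tf.keras.models.load_model(model_path)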
    # Preprocess the images to predict
    dataset_normalized = dataset.map(preprocess_image, num_parallel_calls=AUTOTUNE)
    predictions = model.predict(dataset_normalized, verbose=0)
    predicted_classes = tf.argmax(predictions, axis=1).numpy()

    # Determine the number of images to show
    n_images = len(predicted_classes)
    n_rows = (
        (n_images // n_cols) if (n_images % n_cols) == 0 else (n_images // n_cols) + 1
    )
    idx = 1
    plt.figure(figsize=(5 * n_cols, 5 * n_rows))
    for image, label in zip(dataset_normalized, predicted_classes):
        prob_prediction = predictions[idx - 1][label] * 100
        plt.subplot(n_rows, n_cols, idx)
        plt.imshow(image[0].numpy())
        plt.title(f"Type Tissue: {TYPE_TISSUE[label]}\nProbability: {prob_prediction:.2f}%")
        plt.axis("off")
        idx += 1

In [None]:
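preds = (
    test_set.map(lambda image, label: image, num_parallel_calls=AUTOTUNE)
    .unbatch()
    # reshuffle_each_iteration=False keeps the same 15 images when
    # make_prediction iterates the dataset twice (predict and plotting)
    .shuffle(2500, seed=SEED, reshuffle_each_iteration=False)
    .take(15)
)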
make_prediction(preds)


We can see that for most of the images, the model generalizes pretty well. The only case where the model is uncertain is the last image. 

More training images would help. Data augmentation such as contrast changes or rotations, which this notebook does not apply yet, would likely benefit the model as well. 
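As an illustration of the augmentations mentioned above, here is a NumPy sketch of a 90-degree rotation and a contrast change. The per-channel contrast formula `(x - mean) * factor + mean` is the one documented for `tf.image.adjust_contrast`. In a TensorFlow pipeline one would more likely use `tf.image.rot90` and `tf.image.random_contrast` inside a `map`. The function names here are hypothetical, and nothing like this is applied in the notebook.

```python
import numpy as np


def rotate_90(image, k=1):
    """Rotate an (H, W, C) image by k * 90 degrees in the height-width plane."""
    return np.rot90(image, k=k, axes=(0, 1))


def adjust_contrast(image, factor):
    """Scale each channel's deviation from its mean by `factor`."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    return (image - mean) * factor + mean


rng = np.random.default_rng(1992)
image = rng.random((448, 448, 3))  # values in [0, 1), like the normalized images

rotated = rotate_90(image)
# Clip back to [0, 1] because increasing the contrast can push values out of range
contrasted = np.clip(adjust_contrast(image, factor=1.3), 0.0, 1.0)
print(rotated.shape, contrasted.shape)
```

Both transformations keep the tissue type of the image, which is what makes them suitable as label-preserving augmentations for histopathology patches.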