Logging the training progress
=============================

This example shows how to use the `pymia.evaluation` package to log the training progress in deep learning projects.
The Jupyter notebook can be found at `./examples/evaluation/logging_torch.ipynb`.

<div class="alert alert-info">

Note

To be able to run this example:

- Get the example data by executing `./examples/example-data/pull_example_data.py`.
- Install torch (`pip install torch`).
- Install tensorboard (`pip install tensorboard`).
- Have a basic understanding of the `pymia.data` package; see the example "Data extraction and assembling".

</div>

Import the required modules.

In [1]:
import numpy as np
import pymia.data.assembler as assm
import pymia.data.backends.pytorch as pymia_torch
import pymia.data.conversion as conv
import pymia.data.definition as defs
import pymia.data.extraction as extr
import pymia.data.transformation as tfm
import pymia.evaluation.metric as metric
import pymia.evaluation.evaluator as eval_
import pymia.evaluation.writer as writer
import torch
import torch.nn as nn
import torch.utils.data as torch_data
import torch.utils.tensorboard as tensorboard

In this example, we evaluate the segmentations predicted by a neural network against a reference ground truth and log the resulting metrics. Common metrics in medical image segmentation are the Dice coefficient, an overlap-based metric, and the Hausdorff distance, a distance-based metric. We also evaluate the volume similarity, a metric that compares segmented volumes without considering their spatial overlap.

In [3]:
metrics = [metric.DiceCoefficient(), metric.HausdorffDistance(percentile=95, metric='HDRFDST95'), metric.VolumeSimilarity()]
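
To make the difference between an overlap-based metric and the volume similarity concrete, here is a minimal pure-Python sketch on flattened binary masks. It is not how pymia implements these metrics. The function names are hypothetical, and returning 1.0 for two empty masks is an assumption of this sketch.

```python
def dice_coefficient(prediction, reference):
    """Dice = 2 * |P and R| / (|P| + |R|) for two flat binary masks."""
    intersection = sum(1 for p, r in zip(prediction, reference) if p and r)
    total = sum(prediction) + sum(reference)
    # assumption of this sketch: two empty masks count as a perfect match
    return 2 * intersection / total if total else 1.0


def volume_similarity(prediction, reference):
    """VS = 1 - ||P| - |R|| / (|P| + |R|); voxel positions are ignored."""
    volume_p, volume_r = sum(prediction), sum(reference)
    total = volume_p + volume_r
    return 1 - abs(volume_p - volume_r) / total if total else 1.0


# two masks with the same volume that do not overlap at all
prediction = [1, 1, 0, 0]
reference = [0, 0, 1, 1]

dice = dice_coefficient(prediction, reference)
vs = volume_similarity(prediction, reference)
print(dice, vs)
```

The two masks share no voxel, so the Dice coefficient is at its minimum, while the volume similarity is at its maximum because both masks contain the same number of voxels.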

Now, we need to define the labels we want to log during the training. The provided example data contains five labels for different brain structures, but we are only interested in three of them: white matter, grey matter, and the thalamus.

In [4]:
labels = {1: 'WHITEMATTER',
          2: 'GREYMATTER',
          5: 'THALAMUS'
          }

Using the metrics and labels, we can initialize the evaluator.

In [5]:
evaluator = eval_.SegmentationEvaluator(metrics, labels)
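
Conceptually, evaluating a multi-label segmentation means turning each label of interest into a binary mask and computing every metric on that mask. The sketch below illustrates this idea in pure Python on flattened toy label maps. The helper names (`binarize`, `dice`, `evaluate_labels`) are hypothetical and are not part of pymia.

```python
labels = {1: 'WHITEMATTER', 2: 'GREYMATTER', 5: 'THALAMUS'}


def binarize(label_map, label):
    """Return a binary mask that is 1 where the label map equals `label`."""
    return [int(value == label) for value in label_map]


def dice(prediction_mask, reference_mask):
    """Dice coefficient of two flat binary masks (1.0 if both are empty, by assumption)."""
    intersection = sum(p * r for p, r in zip(prediction_mask, reference_mask))
    total = sum(prediction_mask) + sum(reference_mask)
    return 2 * intersection / total if total else 1.0


def evaluate_labels(prediction, reference, labels):
    """Compute one Dice value per label of interest; other labels are ignored."""
    return {name: dice(binarize(prediction, label), binarize(reference, label))
            for label, name in labels.items()}


# flattened toy label maps; labels 3 and 4 occur but are not evaluated
prediction = [1, 1, 2, 5, 3, 0]
reference = [1, 2, 2, 5, 4, 0]
scores = evaluate_labels(prediction, reference, labels)
print(scores)
```

Only the three labels in the dictionary produce a score; voxels carrying any other label contribute to none of the binary masks.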

The evaluator returns results for every subject in the dataset. However, we only want to log summary statistics, such as the mean and the standard deviation of each metric across all subjects. Therefore, we initialize a statistics aggregator.

In [None]:
functions = {'MEAN': np.mean, 'STD': np.std}
statistics_aggregator = writer.StatisticsAggregator(functions=functions)
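
The aggregation step can be pictured as grouping the per-subject results by label and metric and applying each function to the grouped values. The following sketch uses only the standard library. The `Result` record and the `aggregate` function are hypothetical stand-ins for illustration, not pymia's classes. `statistics.pstdev` is used because, like `np.std` with its default arguments, it computes the population standard deviation.

```python
import statistics
from collections import defaultdict, namedtuple

# hypothetical stand-in for a single evaluation result
Result = namedtuple('Result', ['id_', 'label', 'metric', 'value'])


def aggregate(results, functions):
    """Apply each function to the values of every (label, metric) group."""
    grouped = defaultdict(list)
    for result in results:
        grouped[(result.label, result.metric)].append(result.value)

    aggregated = []
    for (label, metric), values in sorted(grouped.items()):
        for name, function in functions.items():
            # the function name takes the place of the subject identifier
            aggregated.append(Result(name, label, metric, float(function(values))))
    return aggregated


per_subject = [
    Result('Subject_1', 'THALAMUS', 'DICE', 0.5),
    Result('Subject_2', 'THALAMUS', 'DICE', 1.0),
    Result('Subject_1', 'WHITEMATTER', 'DICE', 0.25),
    Result('Subject_2', 'WHITEMATTER', 'DICE', 0.75),
]
functions = {'MEAN': statistics.fmean, 'STD': statistics.pstdev}
summary = aggregate(per_subject, functions)
for result in summary:
    print(result)
```

Each group yields one entry per function, so two labels, one metric, and two functions give four aggregated results.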

[TensorBoard](https://www.tensorflow.org/tensorboard) is commonly used to visualize training in deep learning. PyTorch provides a module for logging to TensorBoard, which we will use.

In [6]:
log_dir = '../example-data/log'
tb = tensorboard.SummaryWriter(log_dir)

We now initialize the data handling. Please refer to the example "Data extraction and assembling" mentioned above for details on what is going on.

In [7]:
hdf_file = '../example-data/example-dataset.h5'
transform = tfm.Permute(permutation=(2, 0, 1), entries=(defs.KEY_IMAGES,))
dataset = extr.PymiaDatasource(hdf_file, extr.SliceIndexing(), extr.DataExtractor(categories=(defs.KEY_IMAGES,)), transform)
pytorch_dataset = pymia_torch.PytorchDatasetAdapter(dataset)
loader = torch_data.dataloader.DataLoader(pytorch_dataset, batch_size=2, shuffle=False)

assembler = assm.SubjectAssembler(dataset)
direct_extractor = extr.ComposeExtractor([
    extr.SubjectExtractor(),  # extraction of the subject name for evaluation
    extr.ImagePropertiesExtractor(),  # extraction of image properties (origin, spacing, etc.) for evaluation in physical space
    extr.DataExtractor(categories=(defs.KEY_LABELS,))  # extraction of "labels" entries for evaluation
])

Let's now define a dummy network that simply returns a random prediction.

In [None]:
class DummyNetwork(nn.Module):

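    def forward(self, x):
        # x has shape (B, C, H, W); return a random single-channel label map of shape (B, 1, H, W)
        return torch.randint(0, 5, (x.size(0), 1, *x.size()[2:]))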

dummy_network = DummyNetwork()
torch.manual_seed(0)  # set seed for reproducibility

We can now start the training loop. We will loop over the samples in our dataset, feed them to the "neural network", and assemble them back into entire volumetric predictions. As soon as a prediction is fully assembled, it is evaluated against its reference. We do this evaluation in physical space, as the spacing might be important for metrics like the Hausdorff distance (distances in mm rather than voxels). At the end of each epoch, we calculate the mean and standard deviation of the metrics across all subjects in the dataset, and log them to TensorBoard.
Note that this example is just for illustration: usually, you would want to log the performance on the validation set rather than on the training set.
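
Why the spacing matters for distance-based metrics can be seen in a small standalone sketch. It assumes an axis-aligned image (identity direction matrix) and a hypothetical anisotropic spacing. The helper `to_physical` is illustrative only and is not a pymia or SimpleITK function.

```python
import math


def to_physical(index, spacing, origin=(0.0, 0.0, 0.0)):
    """Map a voxel index to physical coordinates in mm (axis-aligned image assumed)."""
    return tuple(o + i * s for i, s, o in zip(index, spacing, origin))


spacing = (1.0, 1.0, 3.0)  # hypothetical spacing in mm: thick slices along the last axis
voxel_a = (10, 10, 5)
voxel_b = (10, 10, 9)

# the same two voxels, measured in voxel units and in millimeters
distance_voxels = math.dist(voxel_a, voxel_b)
distance_mm = math.dist(to_physical(voxel_a, spacing), to_physical(voxel_b, spacing))
print(distance_voxels, distance_mm)
```

With thick slices, a distance measured in voxels underestimates the physical distance along the slice axis, which is why the prediction and reference are converted back to images that carry their spacing before evaluation.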

In [8]:
nb_batches = len(loader)

for epoch in range(10):
    for i, batch in enumerate(loader):
        # get the data from batch and predict
        x, sample_indices = batch[defs.KEY_IMAGES], batch[defs.KEY_SAMPLE_INDEX]
        prediction = dummy_network(x)

        # translate the prediction to numpy and back to (B)HWC (channel last)
        numpy_prediction = prediction.numpy().transpose((0, 2, 3, 1))

        # add the batch prediction to the assembler
        is_last = i == nb_batches - 1
        assembler.add_batch(numpy_prediction, sample_indices.numpy(), is_last)

        # Process the subjects/images that are fully assembled
        for subject_index in assembler.subjects_ready:
            subject_prediction = assembler.get_assembled_subject(subject_index)

            # Extract the target and image properties via direct extract
            direct_sample = dataset.direct_extract(direct_extractor, subject_index)
            reference, image_properties = direct_sample[defs.KEY_LABELS], direct_sample[defs.KEY_PROPERTIES]

            # convert prediction and reference back to SimpleITK images, as we want to use the physical spacing for the Hausdorff distance metric
            prediction_image = conv.NumpySimpleITKImageBridge.convert(subject_prediction, image_properties)
            reference_image = conv.NumpySimpleITKImageBridge.convert(reference, image_properties)

            # evaluate the prediction against the reference
            evaluator.evaluate(prediction_image, reference_image, direct_sample[defs.KEY_SUBJECT])

    # calculate mean and standard deviation of each metric
    results = statistics_aggregator.calculate(evaluator.results)
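    # log to TensorBoard into category train; the label is part of the tag so that
    # the statistics of different labels do not end up in the same scalar plot
    for result in results:
        tb.add_scalar(f'train/{result.label}-{result.metric}-{result.id_}', result.value, epoch)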

    # clear results such that the evaluator is ready for the next evaluation
    evaluator.clear()


You can now start TensorBoard and point it to the log directory:

.. code-block:: bash

    cd <path_to_pymia>/examples/example-data/log
    tensorboard --logdir=.

Open a browser and navigate to `localhost:6006` to see the logged training progress.