Basic evaluation of results
===========================

This example shows how to use the :mod:`pymia.evaluation` package to evaluate results.
The Jupyter notebook can be found at ``./examples/evaluation/basic.ipynb``.

.. note::
   To be able to run this example:

   * Get the example data by executing ``./examples/example-data/pull_example_data.py``.
   * Install pandas (``pip install pandas``).

Import the required modules.

In [1]:
import glob
import os

import numpy as np
import pymia.evaluation.metric as metric
import pymia.evaluation.evaluator as eval_
import pymia.evaluation.writer as writer
import SimpleITK as sitk

Define the paths to the data and the result CSV files.

In [2]:
data_dir = '../example-data'

result_file = '../example-data/results.csv'
result_summary_file = '../example-data/results_summary.csv'

In this example, we show how to evaluate segmentations against a reference ground truth. Common metrics in medical image segmentation are the Dice coefficient, an overlap-based metric, and the Hausdorff distance (here, its 95th percentile), a distance-based metric. We also evaluate the volume similarity, a metric that compares the segmented volumes without considering their spatial overlap.

In [3]:
metrics = [metric.DiceCoefficient(), metric.HausdorffDistance(percentile=95), metric.VolumeSimilarity()]
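To make the two count-based metrics more concrete, the following minimal sketch computes the Dice coefficient and the volume similarity from the confusion counts of two small binary masks. It assumes the common definitions Dice = 2TP / (2TP + FP + FN) and VS = 1 - \|FN - FP\| / (2TP + FP + FN). The helper names and the toy masks are illustrative and not part of pymia, whose implementation may differ in details.

```python
def confusion_counts(prediction, reference):
    """Count true positives, false positives, and false negatives of flat binary masks."""
    tp = sum(1 for p, r in zip(prediction, reference) if p and r)
    fp = sum(1 for p, r in zip(prediction, reference) if p and not r)
    fn = sum(1 for p, r in zip(prediction, reference) if not p and r)
    return tp, fp, fn


def dice_coefficient(prediction, reference):
    """Overlap-based: penalizes every voxel on which the masks disagree."""
    tp, fp, fn = confusion_counts(prediction, reference)
    return 2 * tp / (2 * tp + fp + fn)


def volume_similarity(prediction, reference):
    """Volume-based: only compares the number of segmented voxels."""
    tp, fp, fn = confusion_counts(prediction, reference)
    return 1 - abs(fn - fp) / (2 * tp + fp + fn)


# toy masks (flattened): same volume, but shifted by one voxel
reference = [0, 1, 1, 1, 0, 0]
prediction = [0, 0, 1, 1, 1, 0]

dice = dice_coefficient(prediction, reference)
vs = volume_similarity(prediction, reference)
print(f'Dice: {dice:.3f}, volume similarity: {vs:.3f}')
```

The shifted mask loses overlap, which lowers the Dice coefficient, while the volume similarity stays at its maximum because both masks contain the same number of voxels.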

Now, we need to define the labels we want to evaluate as a dictionary that maps each label value in the image to a descriptive name. The labels of the provided example data are still a TODO, so a single placeholder label is used here.

In [4]:
labels = {1: 'DUMMY-LABEL',  # todo(fabianbalsiger): adapt labels to example
          }
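As a rough illustration of what evaluating several labels amounts to, the snippet below splits a toy label image into one binary mask per label, which could then be compared with the corresponding reference mask. The label names and the image are hypothetical, and the snippet only conveys the idea; it is not pymia's implementation.

```python
# hypothetical labels, for illustration only
labels = {1: 'STRUCTURE-A', 2: 'STRUCTURE-B'}

# toy 2-D label image, where 0 is the background
label_image = [
    [0, 1, 1],
    [2, 2, 0],
]

# one binary mask per label: 1 where the voxel carries that label value
binary_masks = {
    name: [[int(voxel == value) for voxel in row] for row in label_image]
    for value, name in labels.items()
}

for name, mask in binary_masks.items():
    print(name, mask)
```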

Finally, we can initialize the evaluator with the metrics and labels.

In [5]:
evaluator = eval_.SegmentationEvaluator(metrics, labels)

We can now loop over the subjects of the example data. For each subject, we load the ground truth image as the reference and create an artificial segmentation (prediction) by eroding the ground truth. Both images and the subject identifier are then passed to the evaluator.

In [6]:
# get subjects to evaluate
subject_dirs = [subject for subject in glob.glob(os.path.join(data_dir, '*')) if os.path.isdir(subject)]

for subject_dir in subject_dirs:
    subject_id = os.path.basename(subject_dir)
    print(f'Evaluating {subject_id}...')

    # load ground truth image and create artificial prediction by erosion
    ground_truth = sitk.ReadImage(os.path.join(subject_dir, f'{subject_id}_GT.mha'))
    prediction = sitk.BinaryErode(ground_truth)  # todo(fabianbalsiger): handle multi label dummy data (BRATS)

    # evaluate the "prediction" against the ground truth
    evaluator.evaluate(prediction, ground_truth, subject_id)

Evaluating Subject_1...
Evaluating Subject_2...
Evaluating Subject_3...
Evaluating Subject_4...
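The loop above relies on a particular directory layout: one sub-directory per subject, each containing a ground truth file named after the subject. The following sketch recreates that layout in a temporary directory with empty placeholder files and lists the subjects. The temporary directory, the placeholder files, and the helper name ``find_subject_dirs`` are assumptions made for illustration. The helper also sorts the result, since ``glob.glob`` does not guarantee any particular order.

```python
import glob
import os
import tempfile


def find_subject_dirs(data_dir):
    """Return the sub-directories of data_dir in sorted order."""
    candidates = glob.glob(os.path.join(data_dir, '*'))
    return sorted(path for path in candidates if os.path.isdir(path))


with tempfile.TemporaryDirectory() as data_dir:
    # build the expected layout: <data_dir>/<subject>/<subject>_GT.mha
    for subject_id in ('Subject_2', 'Subject_1'):
        os.makedirs(os.path.join(data_dir, subject_id))
        open(os.path.join(data_dir, subject_id, f'{subject_id}_GT.mha'), 'w').close()

    # a file at the top level (such as a result CSV) is skipped by the isdir check
    open(os.path.join(data_dir, 'results.csv'), 'w').close()

    subject_ids = [os.path.basename(path) for path in find_subject_dirs(data_dir)]
    ground_truth_exists = [
        os.path.isfile(os.path.join(data_dir, subject_id, f'{subject_id}_GT.mha'))
        for subject_id in subject_ids
    ]

print(subject_ids)
```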


After evaluating all subjects, we can use a CSV writer to write the evaluation results to a CSV file.

In [7]:
writer.CSVWriter(result_file).write(evaluator.results)

Further, we can use a console writer to display the results in the console.

In [8]:
print('\nSubject-wise results...')
writer.ConsoleWriter().write(evaluator.results)


Subject-wise results...
SUBJECT    LABEL        DICE   HDRFDST  VOLSMTY
Subject_1  DUMMY-LABEL  0.642  6.708    0.642  
Subject_2  DUMMY-LABEL  0.654  6.000    0.654  
Subject_3  DUMMY-LABEL  0.641  6.164    0.641  
Subject_4  DUMMY-LABEL  0.649  6.000    0.649  
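Note that the Dice coefficient and the volume similarity are identical for every subject. This is expected here: the prediction is an eroded version of the ground truth, so it is fully contained in the reference and there are no false positives. The short derivation below assumes the common count-based definitions (pymia's implementation may differ in details), with TP, FP, and FN denoting the numbers of true positive, false positive, and false negative voxels.

```latex
\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN},
\qquad
\mathrm{VS} = 1 - \frac{|FN - FP|}{2\,TP + FP + FN}

% with FP = 0 (prediction fully contained in the reference):
\mathrm{VS} = 1 - \frac{FN}{2\,TP + FN}
            = \frac{2\,TP}{2\,TP + FN}
            = \mathrm{Dice}
```

For real predictions, which typically contain both false positives and false negatives, the two metrics generally differ.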


We can also report statistics such as the mean and standard deviation among all subjects using dedicated statistics writers. Note that you can pass to the writers any function that takes a list of floats and returns a scalar value. Again, we will write a CSV file and display the results in the console.

In [9]:
functions = {'MEAN': np.mean, 'STD': np.std}
writer.CSVStatisticsWriter(result_summary_file, functions=functions).write(evaluator.results)
print('\nAggregated statistic results...')
writer.ConsoleStatisticsWriter(functions=functions).write(evaluator.results)


Aggregated statistic results...
LABEL        METRIC   STATISTIC  VALUE
DUMMY-LABEL  DICE     MEAN       0.647
DUMMY-LABEL  DICE     STD        0.005
DUMMY-LABEL  HDRFDST  MEAN       6.218
DUMMY-LABEL  HDRFDST  STD        0.291
DUMMY-LABEL  VOLSMTY  MEAN       0.647
DUMMY-LABEL  VOLSMTY  STD        0.005
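Because the writers only need callables that map a list of floats to a scalar, functions from outside NumPy should work as well. The sketch below applies functions from Python's ``statistics`` module to the four per-subject Dice values of this example (at the precision stored in the result CSV file). The dictionary keys ``SAMPLE_STD`` and ``MEDIAN`` are hypothetical names chosen for illustration. The sketch also shows that the ``STD`` reported above is the population standard deviation (``np.std`` uses ``ddof=0`` by default), which is smaller than the sample standard deviation for such a small number of subjects.

```python
import statistics

# per-subject Dice values of this example, at the precision stored in the result CSV
dice_values = [
    0.6420209637040026,
    0.6542393455768277,
    0.6412505192785168,
    0.6492026923399021,
]

# any callable that takes a list of floats and returns a scalar fits the pattern
functions = {
    'MEAN': statistics.fmean,
    'STD': statistics.pstdev,        # population standard deviation, like np.std's default
    'SAMPLE_STD': statistics.stdev,  # hypothetical extra entry: sample standard deviation
    'MEDIAN': statistics.median,     # hypothetical extra entry
}

summary = {name: fn(dice_values) for name, fn in functions.items()}
for name, value in summary.items():
    print(f'{name:<10} {value:.3f}')
```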


Finally, we clear the results in the evaluator so that it is ready for the next evaluation.

In [10]:
evaluator.clear()

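Now, let us have a look at the saved result CSV file. The file is semicolon-delimited, so reading it with the default comma separator of ``pd.read_csv`` yields a single column.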

In [11]:
import pandas as pd

pd.read_csv(result_file)

   SUBJECT;LABEL;DICE;HDRFDST;VOLSMTY
0  Subject_1;DUMMY-LABEL;0.6420209637040026;6.708...
1  Subject_2;DUMMY-LABEL;0.6542393455768277;6.0;0...
2  Subject_3;DUMMY-LABEL;0.6412505192785168;6.164...
3  Subject_4;DUMMY-LABEL;0.6492026923399021;6.0;0...


We can inspect the saved statistics CSV file in the same way.

In [12]:
pd.read_csv(result_summary_file)


   LABEL;METRIC;STATISTIC;VALUE
0  DUMMY-LABEL;DICE;MEAN;0.6466783802248124
1  DUMMY-LABEL;DICE;STD;0.005354753771402503
2  DUMMY-LABEL;HDRFDST;MEAN;6.218154483867086
3  DUMMY-LABEL;HDRFDST;STD;0.29078310604924873
4  DUMMY-LABEL;VOLSMTY;MEAN;0.6466783802248123
5  DUMMY-LABEL;VOLSMTY;STD;0.005354753771402463
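
To split the semicolon-delimited files into separate columns, the delimiter has to be specified explicitly; with pandas, this is done by passing ``sep=';'`` to ``pd.read_csv``. The same idea is sketched below with only the standard library's ``csv`` module. The rows of the statistics file shown above are used as in-memory text, so the snippet runs without the example data.

```python
import csv
import io

# content of the statistics file as displayed above (semicolon-delimited)
csv_text = """LABEL;METRIC;STATISTIC;VALUE
DUMMY-LABEL;DICE;MEAN;0.6466783802248124
DUMMY-LABEL;DICE;STD;0.005354753771402503
DUMMY-LABEL;HDRFDST;MEAN;6.218154483867086
DUMMY-LABEL;HDRFDST;STD;0.29078310604924873
DUMMY-LABEL;VOLSMTY;MEAN;0.6466783802248123
DUMMY-LABEL;VOLSMTY;STD;0.005354753771402463
"""

# specifying the delimiter yields one field per column
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=';'))

# look up a single statistic and convert the text field to a float
mean_dice = next(
    float(row['VALUE'])
    for row in rows
    if row['METRIC'] == 'DICE' and row['STATISTIC'] == 'MEAN'
)

print(list(rows[0].keys()))
print(mean_dice)
```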
