# WORC Tutorial: Simple

Welcome to the tutorial of WORC: a Workflow for Optimal Radiomics Classification! It will provide you with basic knowledge and practical skills for running WORC. For advanced topics and WORCflows, please see the other notebooks provided with this tutorial. For installation details, see the ReadMe.md provided with this tutorial.


This tutorial interacts with WORC through SimpleWORC and is especially suitable for first-time use. We first do some necessary imports.

In [1]:
# Import necessary packages
from WORC import SimpleWORC
import os

# These packages are only used in analysing the results
import pandas as pd
import json
import fastr
import glob

# If you don't want to use your own data, we use the following example set,
# see also the next code block in this example.
from WORC.exampledata.datadownloader import download_HeadAndNeck

# Define the folder this script is in, so we can easily find the example data
script_path = os.getcwd()

# NOTE: If on Google Colab, uncomment this line
# script_path = os.path.join(script_path, 'WORCTutorial')

# Determine whether you would like to use WORC for binary_classification,
# multiclass_classification or regression
modus = 'binary_classification'

[INFO] LightGCM classifier currently not available. Please see https://worc.readthedocs.io/en/latest/static/additionalfunctionality.html.
[INFO] Bayesian optimization through SMAC functionality currently not available. Please see https://worc.readthedocs.io/en/latest/static/additionalfunctionality.html.




---------------------------------------------------------------------------
Input
---------------------------------------------------------------------------
The minimal inputs to WORC are:
  - Images
  - Segmentations
  - Labels

In SimpleWORC, we assume you have a folder "datadir" that contains one
folder per patient, and that each patient folder contains an image.nii.gz and a mask.nii.gz:
 * Datadir
     * Patient_001
         * image.nii.gz
         * mask.nii.gz
     * Patient_002
         * image.nii.gz
         * mask.nii.gz
     * ...
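
Before handing a folder to SimpleWORC, it can help to confirm that it really follows the layout above. The following is a minimal sketch, not part of WORC: `check_datadir` is a hypothetical helper name, and the file names default to the ones used in this tutorial. The demo at the bottom builds a throwaway folder in which one patient deliberately lacks a mask.

```python
import os
import tempfile


def check_datadir(datadir, image_file_name="image.nii.gz",
                  segmentation_file_name="mask.nii.gz"):
    """Return a dict mapping patient folder name -> list of missing files."""
    problems = {}
    for patient in sorted(os.listdir(datadir)):
        patient_dir = os.path.join(datadir, patient)
        if not os.path.isdir(patient_dir):
            continue  # skip loose files next to the patient folders
        missing = [name for name in (image_file_name, segmentation_file_name)
                   if not os.path.isfile(os.path.join(patient_dir, name))]
        if missing:
            problems[patient] = missing
    return problems


# Demo on a temporary folder: Patient_002 has no mask.nii.gz.
with tempfile.TemporaryDirectory() as demo_dir:
    layout = {"Patient_001": ["image.nii.gz", "mask.nii.gz"],
              "Patient_002": ["image.nii.gz"]}
    for patient, files in layout.items():
        os.makedirs(os.path.join(demo_dir, patient))
        for name in files:
            open(os.path.join(demo_dir, patient, name), "w").close()
    demo_problems = check_datadir(demo_dir)

print(demo_problems)  # {'Patient_002': ['mask.nii.gz']}
```

On your own data you would call the helper on your data folder; an empty dict means every patient folder contains both files.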


You can skip this part if you use your own data.
In the example, we will use open-source data from the online XNAT platform
at https://xnat.bmia.nl/data/archive/projects/stwstrategyhn1. This dataset
consists of CT scans of patients with head and neck tumors.

In [6]:
# Download a subset of 10 patients in this folder. You can change this number if you want.
nsubjects = 10  # use "all" if you want to download all patients.
data_path = os.path.join(script_path, 'Data')
download_HeadAndNeck(datafolder=data_path, nsubjects=nsubjects)

Working on subject 1/137
	Downloading patient HN1331, experiment HN1331_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1331, experiment HN1331_20190402_CT, scan 1_3_6_1_4_1_40744_29_120873174302085915918976213152022016685.
resource is NIFTI

Working on subject 2/137
	Downloading patient HN1519, experiment HN1519_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1519, experiment HN1519_20190402_CT, scan 1_3_6_1_4_1_40744_29_178045394474815118369144065452878666404.
resource is NIFTI

Working on subject 3/137
	Downloading patient HN1088, experiment HN1088_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1088, experiment HN1088_20190402_CT, scan 1_3_6_1_4_1_40744_29_120903286350475892686849765504908597220.
resource is NIFTI

Working on subject 4/137
	Downloading patient HN1260, experiment HN1260_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1260, experiment HN1260_20190402_CT, scan 1_3_6_1_4_1_40744_29_161797156309701878529360999104102719772.
resource is NIFTI

Working on subject 5/137
	Downloading patient HN1192, experiment HN1192_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1192, experiment HN1192_20190402_CT, scan 1_3_6_1_4_1_40744_29_76038811511814195873478246519752470711.
resource is NIFTI

Working on subject 6/137
	Downloading patient HN1501, experiment HN1501_20190403_CT, scan 1.
resource is NIFTI
	Downloading patient HN1501, experiment HN1501_20190403_CT, scan 1_3_6_1_4_1_40744_29_294587997382261494217023015100501023442.
resource is NIFTI

Working on subject 7/137
	Downloading patient HN1259, experiment HN1259_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1259, experiment HN1259_20190402_CT, scan 1_3_6_1_4_1_40744_29_163029519172728433236936734659203480802.
resource is NIFTI

Working on subject 8/137
	Downloading patient HN1372, experiment HN1372_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1372, experiment HN1372_20190402_CT, scan 1_3_6_1_4_1_40744_29_144464477074766979879254219958116907520.
resource is NIFTI

Working on subject 9/137
	Downloading patient HN1560, experiment HN1560_20190402_CT, scan 1.
resource is NIFTI
	Downloading patient HN1560, experiment HN1560_20190402_CT, scan 1_3_6_1_4_1_40744_29_131924080354178326530168765579293823568.
resource is NIFTI

Working on subject 10/137
	Downloading patient HN1748, experiment HN1748_20190403_CT, scan 1.
resource is NIFTI
	Downloading patient HN1748, experiment HN1748_20190403_CT, scan 1_3_6_1_4_1_40744_29_33682544755686290289659265674027017323.
resource is NIFTI

Done downloading!


Define the inputs of our network

In [7]:
# Identify our data structure: change the fields below accordingly
# if you use your own data.
imagedatadir = os.path.join(data_path, 'stwstrategyhn1')
image_file_name = 'image.nii.gz'
segmentation_file_name = 'mask.nii.gz'

# File in which the labels (i.e. the outcomes you want to predict) are stated.
# Again, change this accordingly if you use your own data.
label_file = os.path.join(data_path, 'Examplefiles', 'pinfo_HN.csv')

# Name of the label you want to predict
if modus == 'binary_classification':
    # Classification: predict a binary (0 or 1) label
    label_name = ['imaginary_label_1']

elif modus == 'regression':
    # Regression: predict a continuous label
    label_name = ['Age']

elif modus == 'multiclass_classification':
    # Multiclass classification: predict several mutually exclusive binary labels together
    label_name = ['imaginary_label_1', 'complement_label_1']

# Determine whether we want to do a coarse quick experiment, or a full lengthy
# one. Again, change this accordingly if you use your own data.
coarse = True

# Give your experiment a name
experiment_name = 'Example_STWStrategyHN'

# Instead of the default tempdir, let's put the temporary output in a subfolder
# in the same folder as this script
tmpdir = os.path.join(script_path, 'WORC_' + experiment_name)
print(f"Temporary folder: {tmpdir}.")

Temporary folder: /home/lkeb-mgo1/WORCTutorial/WORC_Example_STWStrategyHN.
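
Before building the network, it can be worth confirming that the label file actually contains the columns named in `label_name`. The sketch below is a hedged illustration rather than WORC functionality: `missing_label_columns` is a hypothetical helper, it assumes a comma-separated file with a header row, and the `Patient` column name and the values in the demo are made up.

```python
import csv
import io


def missing_label_columns(csv_file, label_names):
    """Return the requested label names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = [column.strip() for column in next(reader)]
    return [name for name in label_names if name not in header]


# Demo with an in-memory CSV (assumed layout: one ID column, then label columns).
demo_csv = io.StringIO(
    "Patient,imaginary_label_1,Age\n"
    "Patient_001,0,61\n"
    "Patient_002,1,58\n"
)
demo_missing = missing_label_columns(
    demo_csv, ["imaginary_label_1", "complement_label_1"])

print(demo_missing)  # ['complement_label_1']
```

With a real file you would open it first, for example `with open(label_file, newline="") as f:`, and pass `f` together with `label_name`; an empty list means all requested labels are present.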



---------------------------------------------------------------------------
The actual experiment
---------------------------------------------------------------------------

NOTE: Precomputed features can be used instead of images and masks
by using ``experiment.features_from_this_directory()`` in a similar fashion to the calls below.

In [8]:
# Create a WORC object
experiment = SimpleWORC(experiment_name)

# Set the input data according to the variables we defined earlier
experiment.images_from_this_directory(imagedatadir,
                                      image_file_name=image_file_name,
                                      is_training=True)
experiment.segmentations_from_this_directory(imagedatadir,
                                             segmentation_file_name=segmentation_file_name,
                                             is_training=True)
experiment.labels_from_this_file(label_file)
experiment.predict_labels(label_name)

# Set the types of images WORC has to process. Used in fingerprinting
# Valid quantitative types are ['CT', 'PET', 'Thermography', 'ADC']
# Valid qualitative types are ['MRI', 'DWI', 'US']
experiment.set_image_types(['CT'])

# Use the standard workflow for your specific modus
if modus == 'binary_classification':
    experiment.binary_classification(coarse=coarse)
elif modus == 'regression':
    experiment.regression(coarse=coarse)
elif modus == 'multiclass_classification':
    experiment.multiclass_classification(coarse=coarse)

# Set the temporary directory
experiment.set_tmpdir(tmpdir)

Debug detected: False.
BigrCluster detected: False.
SnelliusCluster detected: False.
Debug detected: False.
BigrCluster detected: False.
SnelliusCluster detected: False.
Debug detected: False.
Debug detected: False.


In [9]:
# Run the experiment!
experiment.execute()

SimpleV validated: True.
MinSubjectsV validated: True.
Sample validated: True.
Label_type given is None, extracting all labels.
Label names to extract: Index(['imaginary_label_1', 'imaginary_label_2', 'Hospital', 'Age',
       'complement_label_1'],
      dtype='object')
InvalidLabelsV validated: True.
Debug detected: False.
Building training network...
 [INFO] basepluginmanager:0081 >> Could not load plugin file /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/reportingplugins/elasticsearchreporter.py
FastrOptionalModuleNotAvailableError from /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/reportingplugins/elasticsearchreporter.py line 46: Could not import the required elasticsearch for this plugin
 [INFO] basepluginmanager:0081 >> Could not load plugin file /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/ioplugins/s3filesystem.py
FastrImportError from /home/lkeb-mgo1

---------------------------------------------------------------------------
Analysis of results
---------------------------------------------------------------------------

There are two main outputs: the features for each patient/object, and the overall
performance. These are stored as .hdf5 and .json files, respectively. By
default, they are saved in the so-called "fastr output mount", in a subfolder
named after your experiment name.

In [6]:
# Locate output folder
outputfolder = fastr.config.mounts['output']
experiment_folder = os.path.join(outputfolder, 'WORC_' + experiment_name)

print(f"Your output is stored in {experiment_folder}.")

# Read the features for the first patient
# NOTE: we use the glob package for scanning a folder to find specific files
feature_files = glob.glob(os.path.join(experiment_folder,
                                       'Features',
                                       'features_*.hdf5'))
if len(feature_files) == 0:
    raise ValueError('No feature files found: your network has failed.')

feature_files.sort()
featurefile_p1 = feature_files[0]
features_p1 = pd.read_hdf(featurefile_p1)

# Read the overall performance
performance_file = os.path.join(experiment_folder, 'performance_all_0.json')
if not os.path.exists(performance_file):
    raise ValueError(f'No performance file {performance_file} found: your network has failed.')
    
with open(performance_file, 'r') as fp:
    performance = json.load(fp)

# Print the feature values and names
print("Feature values from first patient:")
for value, label in zip(features_p1.feature_values, features_p1.feature_labels):
    print(f"\t {label} : {value}.")

# Print the output performance
print("\n Performance:")
stats = performance['Statistics']
for k, v in stats.items():
    print(f"\t {k} {v}.")

Your output is stored in /home/lkeb-mgo1/WORC/output/WORC_Example_STWStrategyHN.


ValueError: No feature files found: your network has failed.

**NOTE:** when the experiment completes, the performance is probably poor,
which is expected as we ran the experiment on coarse settings. These settings
are recommended only for testing: see also below.
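
When the analysis cell stops with an error such as the `ValueError` above, a quick inventory of the experiment folder shows which outputs are missing. This is a minimal sketch under the assumptions already used in that cell (a `Features` subfolder with `features_*.hdf5` files and a `performance_all_0.json` file); `summarize_outputs` is a hypothetical helper, not a WORC function.

```python
import glob
import os
import tempfile


def summarize_outputs(experiment_folder):
    """Report which of the expected outputs exist in an experiment folder."""
    feature_files = glob.glob(
        os.path.join(experiment_folder, "Features", "features_*.hdf5"))
    performance_file = os.path.join(experiment_folder, "performance_all_0.json")
    return {
        "folder_exists": os.path.isdir(experiment_folder),
        "n_feature_files": len(feature_files),
        "has_performance": os.path.isfile(performance_file),
    }


# Demo on a temporary folder with one (empty) feature file and no performance file.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "Features"))
    open(os.path.join(demo_dir, "Features", "features_demo.hdf5"), "w").close()
    demo_summary = summarize_outputs(demo_dir)

print(demo_summary)
```

If `n_feature_files` is zero, the failure most likely happened before or during feature extraction; if only the performance file is missing, a later step failed. The fastr logs in the temporary folder are the place to look next.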


---------------------------------------------------------------------------
Tips and Tricks
---------------------------------------------------------------------------

For tips and tricks on running a full experiment instead of this simple
example, adding more evaluation options, debugging a crashed network, and so on,
please go to https://worc.readthedocs.io/en/latest/static/user_manual.html or
https://worc.readthedocs.io/en/latest/static/additionalfunctionality.html. If you
run into any issues, check the FAQ at https://worc.readthedocs.io/en/latest/static/faq.html,
open an issue on the WORC GitHub, or feel free to mail me.

We advise you to look at the docstrings of the SimpleWORC functions
introduced in this tutorial and to explore the other SimpleWORC functions,
as SimpleWORC offers much more functionality than presented here. See
the documentation: https://worc.readthedocs.io/en/latest/autogen/WORC.facade.html#WORC.facade.simpleworc.SimpleWORC


Some things we would advise you to always do:
  - Run actual experiments on the full settings (coarse=False):
  
      ``coarse = False``
      
      ``experiment.binary_classification(coarse=coarse)``
      
  **Note**: this will result in more computation time. We therefore recommend
  running this script on either a cluster or a high-performance PC. If so,
  you may change the execution to use multiple cores to speed up computation
  just before experiment.execute():
  
      ``experiment.set_multicore_execution()``

  - Add extensive evaluation before experiment.execute():
  
      ``experiment.add_evaluation()``
    
    See the documentation for more details on the evaluation outputs: https://worc.readthedocs.io/en/development/static/user_manual.html#outputs-and-evaluation-of-your-network.

Fields in the configuration (https://worc.readthedocs.io/en/latest/static/configuration.html)
can be changed with the add_config_overrides function, see below. We recommend doing this after the modus part, as the modus functions also apply config overrides. NOTE: all configuration fields have to be provided as strings.

In [None]:
overrides = {
    'Classification': {
        'classifiers': 'SVM',
    },
}

experiment.add_config_overrides(overrides)
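
Because every configuration value must be a string, it is easy to pass a number or boolean by accident. The sketch below shows one way to guard against that; `stringify_overrides` is a hypothetical helper, not part of WORC, and the `CrossValidation` / `N_iterations` names are an assumption used only to show a numeric value, so check them against the configuration documentation before relying on them.

```python
def stringify_overrides(overrides):
    """Return a copy of a {section: {field: value}} dict with every value as a string."""
    return {
        section: {field: str(value) for field, value in fields.items()}
        for section, fields in overrides.items()
    }


demo_overrides = stringify_overrides({
    "Classification": {"classifiers": "SVM"},
    # Assumed section and field name, shown only to illustrate a numeric value.
    "CrossValidation": {"N_iterations": 5},
})

print(demo_overrides)
# {'Classification': {'classifiers': 'SVM'}, 'CrossValidation': {'N_iterations': '5'}}
```

The result can then be passed on as before with `experiment.add_config_overrides(demo_overrides)`.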
