# WORC Tutorial: Basic

Welcome to the tutorial of WORC: a Workflow for Optimal Radiomics
Classification! This tutorial interacts with WORC through BasicWORC,
a subclass of SimpleWORC that provides additional functionality. For
first-time use, we recommend the WORCTutorialSimple, which uses SimpleWORC
and mentions tips and tricks that are also valid for BasicWORC.

In [8]:
# Import necessary packages
from WORC import BasicWORC
import os

# These packages are only used in analysing the results
import pandas as pd
import json
import fastr
import glob

# If you don't want to use your own data, we use the following example set,
# see also the next code block in this example.
from WORC.exampledata.datadownloader import download_HeadAndNeck

# Define the folder this script is in, so we can easily find the example data
script_path = os.getcwd()

# NOTE: If on Google Colab, uncomment this line
# script_path = os.path.join(script_path, 'WORCTutorial')

# Determine whether you would like to use WORC for binary_classification,
# multiclass_classification or regression
modus = 'binary_classification'



---------------------------------------------------------------------------
Input
---------------------------------------------------------------------------

This part largely follows the same steps as the SimpleWORC tutorial.

In [10]:
# Download a subset of 20 patients in this folder. You can change these if you want.
nsubjects = 20  # use "all" if you want to download all patients.
data_path = os.path.join(script_path, 'Data')
#download_HeadAndNeck(datafolder=data_path, nsubjects=nsubjects)

Define the inputs of our network

In [3]:
# Identify our data structure: change the fields below accordingly
# if you use your own data.
imagedatadir = os.path.join(data_path, 'stwstrategyhn1')
image_file_name = 'image.nii.gz'
segmentation_file_name = 'mask.nii.gz'

# File in which the labels (i.e. the outcomes you want to predict) are stated
# Again, change this accordingly if you use your own data.
label_file = os.path.join(data_path, 'Examplefiles', 'pinfo_HN.csv')

# Name of the label you want to predict
if modus == 'binary_classification':
    # Classification: predict a binary (0 or 1) label
    label_name = ['imaginary_label_1']

elif modus == 'regression':
    # Regression: predict a continuous label
    label_name = ['Age']

elif modus == 'multiclass_classification':
    # Multiclass classification: predict several mutually exclusive binary labels together
    label_name = ['imaginary_label_1', 'complement_label_1']

# Determine whether we want to do a coarse quick experiment, or a full lengthy
# one. Again, change this accordingly if you use your own data.
coarse = True

# Give your experiment a name
experiment_name = 'Example_STWStrategyHN_BasicWORC'

# Instead of the default tempdir, let's put the temporary output in a subfolder
# in the same folder as this script
tmpdir = os.path.join(script_path, 'WORC_' + experiment_name)
print(f"Temporary folder: {tmpdir}.")

Temporary folder: /home/lkeb-mgo1/WORCTutorial/WORC_Example_STWStrategyHN_BasicWORC.



---------------------------------------------------------------------------
The actual experiment
---------------------------------------------------------------------------

Here we will use BasicWORC. We could still use the ``..._from_this_directory`` SimpleWORC functions, but for
this tutorial we will instead provide the data to BasicWORC directly.
To this end, we need to create dictionaries, where the keys are the sample
names (e.g. patient IDs) and the values are the filenames. The keys are used
to match segmentations to images, and to match the files to the IDs provided in your
label file, so make sure everything corresponds.
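
Before handing the dictionaries to BasicWORC, it can help to check that every key occurs in both of them. The following is a minimal sketch in plain Python, not part of WORC: the helpers `sample_key` and `unmatched_keys` and the file paths are hypothetical and serve only as an illustration. The key is derived from the name of the folder that contains the file, plus a `_0` suffix, as in the cell below.

```python
import os


def sample_key(path, suffix="_0"):
    """Hypothetical helper: use the containing folder name as the sample name."""
    return os.path.basename(os.path.dirname(path)) + suffix


def unmatched_keys(images, segmentations):
    """Hypothetical helper: sample names present in only one of the dictionaries."""
    return sorted(set(images) ^ set(segmentations))


# Made-up paths for illustration; no files are read.
image_paths = [
    "Data/stwstrategyhn1/pat001/image.nii.gz",
    "Data/stwstrategyhn1/pat002/image.nii.gz",
]
segmentation_paths = [
    "Data/stwstrategyhn1/pat001/mask.nii.gz",
]

images = {sample_key(p): p for p in image_paths}
segmentations = {sample_key(p): p for p in segmentation_paths}

# pat002 has an image but no segmentation, so it is reported here
missing = unmatched_keys(images, segmentations)
print(missing)  # ['pat002_0']
```

An empty list means every image has a segmentation under the same key, and vice versa.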

In [5]:
# Create a WORC object
experiment = BasicWORC(experiment_name)

# Get the image files and convert to dictionary with patient names as keys
images = glob.glob(os.path.join(imagedatadir, "*", image_file_name))
images = {f"{os.path.basename(os.path.dirname(image))}_0": image for image in images}

# We now append this dictionary to the images_train object. The
# images_from_this_directory function from SimpleWORC also appends to this object.
experiment.images_train.append(images)

# We do the same with the segmentations
segmentations = glob.glob(os.path.join(imagedatadir, "*", segmentation_file_name))
segmentations = {f"{os.path.basename(os.path.dirname(segmentation))}_0": segmentation for segmentation in segmentations} 
experiment.segmentations_train.append(segmentations)
    
experiment.labels_from_this_file(label_file)
experiment.predict_labels(label_name)


# Set the types of images WORC has to process. Used in fingerprinting
# Valid quantitative types are ['CT', 'PET', 'Thermography', 'ADC']
# Valid qualitative types are ['MRI', 'DWI', 'US']
experiment.set_image_types(['CT'])

# Use the standard workflow for your specific modus
if modus == 'binary_classification':
    experiment.binary_classification(coarse=coarse)
elif modus == 'regression':
    experiment.regression(coarse=coarse)
elif modus == 'multiclass_classification':
    experiment.multiclass_classification(coarse=coarse)

# Set the temporary directory
experiment.set_tmpdir(tmpdir)

Debug detected: False.
BigrCluster detected: False.
SnelliusCluster detected: False.
Debug detected: False.
BigrCluster detected: False.
SnelliusCluster detected: False.
[]
Debug detected: False.
Debug detected: False.


There are various other objects you can interact with; see https://worc.readthedocs.io/en/latest/static/user_manual.html#attributes-sources
for an overview and the function of each attribute.

Note: You can keep appending dictionaries to these objects if you want to
use multiple images per patient, e.g. a T1 MRI and a T2 MRI. You should
provide a matching segmentation for each image, as WORC extracts the features
per image-segmentation pair. The exception is when you use special
workflows, e.g. image registration; see the WORC readthedocs.
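
As a rough sketch of what this looks like for two images per patient: the plain lists below are stand-ins for `experiment.images_train` and `experiment.segmentations_train`, so the snippet runs without WORC, and all patient IDs and paths are made up. Other settings such a setup may need, such as the image types, are not covered here.

```python
# Plain lists standing in for experiment.images_train and
# experiment.segmentations_train (assumption: used for illustration only).
images_train = []
segmentations_train = []

# One dictionary per image type, all keyed by the same sample names.
t1_images = {"pat001_0": "/data/pat001/T1.nii.gz",
             "pat002_0": "/data/pat002/T1.nii.gz"}
t2_images = {"pat001_0": "/data/pat001/T2.nii.gz",
             "pat002_0": "/data/pat002/T2.nii.gz"}
t1_masks = {"pat001_0": "/data/pat001/mask_T1.nii.gz",
            "pat002_0": "/data/pat002/mask_T1.nii.gz"}
t2_masks = {"pat001_0": "/data/pat001/mask_T2.nii.gz",
            "pat002_0": "/data/pat002/mask_T2.nii.gz"}

# Append images and their segmentations in the same order.
images_train.append(t1_images)
images_train.append(t2_images)
segmentations_train.append(t1_masks)
segmentations_train.append(t2_masks)

# Positions where the image and segmentation dictionaries disagree on keys.
mismatched = [
    position
    for position, (imgs, segs) in enumerate(zip(images_train, segmentations_train))
    if imgs.keys() != segs.keys()
]
print(mismatched)  # [] when every image has a matching segmentation
```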


In [6]:
# The rest remains the same as in SimpleWORC
experiment.labels_from_this_file(label_file)
experiment.predict_labels(label_name)

# Set the types of images WORC has to process. Used in fingerprinting
# Valid quantitative types are ['CT', 'PET', 'Thermography', 'ADC']
# Valid qualitative types are ['MRI', 'DWI', 'US']
experiment.set_image_types(['CT'])

# Use the standard workflow for your specific modus
if modus == 'binary_classification':
    experiment.binary_classification(coarse=coarse)
elif modus == 'regression':
    experiment.regression(coarse=coarse)
elif modus == 'multiclass_classification':
    experiment.multiclass_classification(coarse=coarse)

# Set the temporary directory
experiment.set_tmpdir(tmpdir)

Debug detected: False.
Debug detected: False.


In [7]:
# Run the experiment!
experiment.execute()

SimpleV validated: True.
MinSubjectsV validated: True.
Sample validated: True.
Label_type given is None, extracting all labels.
Label names to extract: Index(['imaginary_label_1', 'imaginary_label_2', 'Hospital', 'Age',
       'complement_label_1'],
      dtype='object')
InvalidLabelsV validated: True.
Debug detected: False.
Building training network...
 [INFO] basepluginmanager:0081 >> Could not load plugin file /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/reportingplugins/elasticsearchreporter.py
FastrOptionalModuleNotAvailableError from /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/reportingplugins/elasticsearchreporter.py line 46: Could not import the required elasticsearch for this plugin
 [INFO] basepluginmanager:0081 >> Could not load plugin file /home/lkeb-mgo1/anaconda3/envs/WORC/lib/python3.7/site-packages/fastr/resources/plugins/ioplugins/s3filesystem.py
FastrImportError from /home/lkeb-mgo1

---------------------------------------------------------------------------
Analysis of results
---------------------------------------------------------------------------

There are two main outputs: the features for each patient/object, and the overall
performance. These are stored as .hdf5 and .json files, respectively. By
default, they are saved in the so-called "fastr output mount", in a subfolder
named after your experiment name.

In [18]:
# Locate output folder
outputfolder = fastr.config.mounts['output']
experiment_folder = os.path.join(outputfolder, 'WORC_' + experiment_name)

print(f"Your output is stored in {experiment_folder}.")

# Read the features for the first patient
# NOTE: we use the glob package for scanning a folder to find specific files
feature_files = glob.glob(os.path.join(experiment_folder,
                                       'Features',
                                       'features_*.hdf5'))
if len(feature_files) == 0:
    raise ValueError('No feature files found: your network has failed.')

feature_files.sort()
featurefile_p1 = feature_files[0]
features_p1 = pd.read_hdf(featurefile_p1)

# Read the overall performance
performance_file = os.path.join(experiment_folder, 'performance_all_0.json')
if not os.path.exists(performance_file):
    raise ValueError(f'No performance file {performance_file} found: your network has failed.')
    
with open(performance_file, 'r') as fp:
    performance = json.load(fp)

# Print the feature values and names
print("Feature values from first patient:")
for value, label in zip(features_p1.feature_values, features_p1.feature_labels):
    print(f"\t {label} : {value}.")

# Print the output performance
print("\n Performance:")
stats = performance['Statistics']
for k, v in stats.items():
    print(f"\t {k} {v}.")

Your output is stored in /home/lkeb-mgo1/WORC/output/WORC_Example_STWStrategyHN_BasicWORC.
Feature values from first patient:
	 PREDICT_original_sf_compactness_avg_2.5D : 0.7879222543158045.
	 PREDICT_original_sf_compactness_std_2.5D : 0.08060247901394918.
	 PREDICT_original_sf_rad_dist_avg_2.5D : 20.811173767482426.
	 PREDICT_original_sf_rad_dist_std_2.5D : 3.223201511471018.
	 PREDICT_original_sf_roughness_avg_2.5D : 6.481709885787098.
	 PREDICT_original_sf_roughness_std_2.5D : 3.226638248593759.
	 PREDICT_original_sf_convexity_avg_2.5D : 0.9555305485687677.
	 PREDICT_original_sf_convexity_std_2.5D : 0.03275492752585615.
	 PREDICT_original_sf_cvar_avg_2.5D : 0.024139225149618523.
	 PREDICT_original_sf_cvar_std_2.5D : 0.005668792723860412.
	 PREDICT_original_sf_prax_avg_2.5D : 0.5317359255750429.
	 PREDICT_original_sf_prax_std_2.5D : 0.13583080835623526.
	 PREDICT_original_sf_evar_avg_2.5D : 0.024785293957166093.
	 PREDICT_original_sf_evar_std_2.5D : 0.009940605490807794.
	 PREDICT_or

**NOTE:** the performance is probably horrible, which is expected as we ran
the experiment on coarse settings. We recommend using these settings only
for testing: see also below.
