Bug: Re-Binarization of labels in pre-processing routine #18

jqmcginnis · 2022-12-30T13:46:12Z

Hi all 🙂,

I just discovered this small bug / missing post-processing step in the pre-processing routine.

Discovery / Problem

When running nnUNet's nnUNet_plan_and_preprocess -t 501 --verify_dataset_integrity on the pre-processed dataset, we get the following error message:

Unexpected labels found in file /home/jmcginnis/data/nnunet/nnUNet_raw_data_base/nnUNet_raw_data/Task501_MSSpineLesionPreprocessedAxialOnly/labelsTr/MSSpineLesionPreprocessedAxialOnly_047.nii.gz. 
Found these unexpected values (they should not be there)
[7.925189e-17, 8.585621e-17, 9.0809456e-17, 1.0401811e-16, 
1.0897135e-16, 1.386908e-16, 1.5024837e-16, ...]
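To make the error above concrete, here is a minimal sketch of the kind of check that produces it. It is not nnUNet's actual implementation; the function name `find_unexpected_labels` and the toy data are made up for illustration, and the label is represented as a flat Python list rather than a NIfTI volume.

```python
def find_unexpected_labels(voxels, expected=(0, 1)):
    """Return the sorted distinct values in `voxels` that are not in `expected`."""
    allowed = set(expected)
    return sorted({v for v in voxels if v not in allowed})


# Toy flattened label containing two near-zero interpolation leftovers
# (values borrowed from the error message above).
label = [0.0, 1.0, 7.925189e-17, 1.0, 0.0, 1.386908e-16]
print(find_unexpected_labels(label))
```

Any value that is not exactly 0 or 1 is reported, no matter how small it is, which is why leftovers on the order of 1e-17 are enough to fail the integrity check.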

Similarly, @kiristern is dealing with low Dice values for the Modified U-Net baseline:

2022-12-20 17:24:12.210 | INFO     | ivadomed.testing:test:88 - {'dice_score': 0.060521258921826665, 
'multi_class_dice_score': 0.060521258921826665, 
'precision_score': 0.06500272972600003, 
'recall_score': 0.0716620061524084, 
'specificity_score': 0.9973268868391573, 
'intersection_over_union': 0.03410365979066674, 
'accuracy_score': 0.996024251185431, 'hausdorff_score': 2.045652891236524}

Although I am not familiar with ivadomed, I suspect that multi_class_dice_score indicates that ivadomed faces a similar problem with the non-binary labels and interprets the task as a multi-class problem instead.

... but why?

When we resample the images to isotropic resolution, we introduce sampling artifacts: interpolation blurs the edges of the labels, leading to smoothed contours. Thus, we observe values other than {0,1} in the labels. This can easily be verified by inspecting one of the many labels in the dataset.
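The effect can be reproduced on a toy 1D mask. The sketch below uses plain linear interpolation as a stand-in; which interpolation sct_resample actually applies to the labels is an assumption here, and the helper `resample_linear` is hypothetical.

```python
def resample_linear(values, factor):
    """Linearly interpolate a 1D signal onto a grid `factor` times denser."""
    n_out = (len(values) - 1) * factor + 1
    out = []
    for i in range(n_out):
        pos = i / factor                    # position on the original grid
        lo = int(pos)                       # left neighbour
        hi = min(lo + 1, len(values) - 1)   # right neighbour, clamped at the end
        frac = pos - lo
        out.append((1 - frac) * values[lo] + frac * values[hi])
    return out


mask = [0, 0, 1, 1, 0]                  # binary label along one axis
resampled = resample_linear(mask, 2)    # twice the sampling density
print(resampled)
# [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0]
```

The two 0.5 entries sit exactly on the label borders: every new sample that falls between a 0 voxel and a 1 voxel receives a fractional value.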

Solution

We can mitigate this effect by adding the following post-processing step after this line:

bavaria-quebec/preprocessing/preprocess_data.sh

Line 137 in fc4f71d

    
           sct_resample -i ${file}_T2w_crop.nii.gz -mm 0.75x0.75x0.75 -o ${file}_T2w_crop_res.nii.gz

sct_maths -i ${file}_T2w_crop_res.nii.gz -bin 1e-12 -o ${file}_T2w_crop_res.nii.gz

I haven't looked into finding an optimal value for the threshold 1e-12, but I've chosen it to be extremely low so that we retain the full label borders.
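To illustrate what the choice of threshold changes, here is a small sketch on a made-up 1D soft label. It assumes that binarization maps values strictly above the threshold to 1 and everything else to 0; the exact comparison used by sct_maths -bin is not confirmed here, and `binarize` is a hypothetical helper.

```python
def binarize(values, threshold):
    """Map values strictly above `threshold` to 1 and all others to 0."""
    return [1 if v > threshold else 0 for v in values]


# Toy soft label: numerical noise (~1e-16), partial-volume borders, and a core.
soft = [0.0, 9.0e-17, 0.2, 0.6, 1.0, 0.4, 1.0e-16, 0.0]

print(binarize(soft, 1e-12))  # [0, 0, 1, 1, 1, 1, 0, 0]
print(binarize(soft, 0.5))    # [0, 0, 0, 1, 1, 0, 0, 0]
```

With 1e-12, the noise around 1e-16 is discarded but every border voxel with a real partial-volume contribution is kept, so the mask is larger than with a 0.5 threshold.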


jcohenadad · 2023-01-01T22:23:43Z

Good catch! This is definitely a problem when the model expects a binary mask as input. However, when using SoftSeg, the input mask should/can be non-binary (float ranging between 0 and 1).

Also, I think the problematic line referring to resampling the label (not the image) is this one:

bavaria-quebec/preprocessing/preprocess_data.sh

Line 157 in fc4f71d

    
           sct_resample -i ${file_gt}_crop.nii.gz -mm 0.75x0.75x0.75 -o ${file_gt}_crop_res.nii.gz

So, in order to address this issue, one possibility would be to output both a binary and a non-binary mask, and your training config file would fetch the appropriate label.

naga-karthik · 2023-01-03T15:36:42Z

Hey @jqmcginnis, thanks for notifying about this! I have a few comments on the ivadomed aspect of your issue. First off, ivadomed does not differentiate between soft and binarized versions of the GT, and computes the soft Dice coefficient by default (see the function here). Secondly, the multi_class_dice_score is essentially the dice_score function above called in a loop for the number of classes in the input (see this line). Hence, I don't think that this has anything to do with non-binary labels as it is implemented to work with soft labels in the first place.
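For readers unfamiliar with the soft Dice coefficient, here is a minimal sketch of the general formula, 2·Σ(p·g) / (Σp + Σg). This is not ivadomed's actual code; the function name, the `eps` smoothing term, and the toy arrays are assumptions made for illustration.

```python
def soft_dice(pred, target, eps=1e-8):
    """Soft Dice between two equally sized sequences of values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids a division by zero when both inputs are empty.
    return (2.0 * intersection + eps) / (total + eps)


pred = [0, 1, 1, 0]
binary_gt = [0, 1, 1, 0]
soft_gt = [0, 0.5, 1, 0.5]

print(soft_dice(pred, binary_gt))  # perfect overlap, value of 1
print(soft_dice(pred, soft_gt))    # approximately 0.75
```

The same function handles binary and fractional ground truth without any special casing, which is the point made above: non-binary labels lower the score gradually rather than turning the task into a multi-class problem.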

As for the low scores obtained by @kiristern, I would attribute them to sub-optimal hyperparameters and/or sizes of the input image patches (note, this was a 3D model with a relatively very large dimension in the S-I direction because of the stitching) rather than it being something to do with non-binarized labels.

Now, as for nnUNet expecting binarized labels, I had the same problem with MONAI's default parameters, so I had to use a preprocessing transform on MONAI's side to convert the soft labels to binarized labels (granted, this already loses information, but some MONAI functions are not ready to accept soft labels). HENCE, I was wondering if nnUNet had something similar as well? That is, conversion to binary labels before processing?

OR, as @jcohenadad suggested, we could modify the pre-processing script to output both binary and soft labels instead and choose whatever is appropriate for the model we're training.

jcohenadad · 2023-01-03T20:33:11Z

> note, this was a 3D model with a relatively very large dimension in the S-I direction because of the stitching

Don't you want to go with a cropped version of the input for the lesion segmentation task? (i.e., a cascaded approach)

jqmcginnis · 2023-01-04T08:23:04Z

> HENCE, I was wondering if nnUNet had something similar as well? That is, conversion to binary labels before processing?

I've looked into the different functions, and I have not found any function in the nnUNet repo that does this. I see two options:

1. Calculate a binary mask in the nnUNet renaming routine or MONAI routines.
2. Output a second binary mask as @jcohenadad recommended in the pre-processing script.

Personally, I would recommend option 2), as this ensures that each current model (and future models) will always use the same binarized version; otherwise, different libraries might use different thresholds to generate the binary label.
