
Segment SC from STIR/PSIR using contrast-agnostic model #46

Open
valosekj opened this issue Oct 8, 2023 · 24 comments
@valosekj (Member) commented Oct 8, 2023

I did a first run of the segment_sc_contrast-agnostic.sh script (PR #44) across the canproco first-session (ses-M0) PSIR and STIR images.
The script does the following:

  • segments the spinal cord from the PSIR/STIR images using the contrast-agnostic MONAI model
  • binarizes the prediction using sct_maths -bin 0.5 (to make the prediction compatible with sct_label_vertebrae)
  • performs vertebral labeling using sct_label_vertebrae
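The binarization step can be sketched in numpy (a minimal stand-in for `sct_maths -bin 0.5`; `pred` is assumed to be the soft prediction loaded as an array, e.g. via nibabel):

```python
import numpy as np

def binarize(pred, thr=0.5):
    # numpy sketch of `sct_maths -bin 0.5`: voxels strictly above
    # the threshold become 1, everything else becomes 0
    return (pred > thr).astype(np.uint8)
```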

Initial observations

Spinal cord segmentation:

  • The contrast-agnostic MONAI model works well on the STIR contrast (available for a single site, Calgary); see the first part of this QC video. The model missed some SC parts, but these are located mainly in the outer slices, where the SC-CSF contrast is lower. Generally, the segmentations are very good.

  • The contrast-agnostic MONAI model performs significantly worse on PSIR contrast; see the second part of the QC video (from 00:30). For the PSIR contrast, the model either missed SC parts or segmented structures outside of the SC.

Note that I used the default 64x160x320 cropping. I will try running the prediction for the second time with R-L cropping.

Vertebral labeling:

  • sct_label_vertebrae fails during automatic C2-C3 disc detection in ~160 subjects --> those subjects will need manual labeling corrections. I am in favour of labeling all discs (instead of labeling only the initial C2-C3 disc). This will be slightly more time-consuming, but we will be sure that the labeling is okay.

Processed data and QC are saved in ~/duke/projects/canproco/canproco_contrast-agnostic_2023-10-07.

Tagging @sandrinebedard, @naga-karthik, @plbenveniste.

@naga-karthik (Member) commented Oct 8, 2023

Hey @valosekj, thank you for running the predictions! The results on STIR contrast look good indeed!

The contrast-agnostic MONAI model performs significantly worse on PSIR contrast;

About this -- did you run the predictions on the raw PSIR images, or did you do any intensity rescaling? I think 1-2 weeks back @plbenveniste noticed that multiplying the PSIR images by -1 improved the segmentations (link to the discussion sent to you on Slack). Maybe you could do this and see if the results are better?

Note that I used the default 64x160x320 cropping. I will try running the prediction for the second time with R-L cropping.

You could try 64 x 192 x -1. The -1 is important because if the spinal cord exceeds 320 slices in S-I, the crop removes the top/bottom part of the cord. I am considering making -1 the default for S-I cropping in my script.

With these changes, I am confident that the results on PSIR will improve!

EDIT: I changed the default crop-size for the inference script in commit sct-pipeline/contrast-agnostic-softseg-spinalcord@c4cbd61

@valosekj (Member, Author) commented Oct 9, 2023

Thank you @naga-karthik! I am now using both suggestions:

  1. multiplication of the PSIR images by -1 to swap contrast from light cord and dark CSF to dark cord and light CSF (commit)
  2. 64 x 192 x -1 crop size (commit)
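Step 1 (contrast inversion) amounts to a simple sign flip; a minimal numpy sketch, assuming the PSIR volume has already been loaded as an array (e.g. via nibabel):

```python
import numpy as np

# `psir` stands in for the PSIR volume as a numpy array; this is an
# illustrative sketch, not the actual script.
def invert_contrast(psir):
    # multiplying by -1 swaps light cord / dark CSF into dark cord / light CSF
    return -1.0 * psir
```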

The contrast-agnostic MONAI model now works significantly better on the PSIR images; see QC video here!

Processed data and QC are saved in ~/duke/projects/canproco/canproco_contrast-agnostic_2023-10-09_PSIR_inv_fixed_patch_size.

@naga-karthik (Member)

Thanks @valosekj for these changes! Have to say that the predictions look much better now!

We were thinking of adding some intensity-based scaling during training or inference because of the PSIR contrast (which was not used in training the contrast-agnostic model). Relevant issue: sct-pipeline/contrast-agnostic-softseg-spinalcord#69

@jcohenadad (Member)

tagging @plbenveniste

@valosekj (Member, Author) commented Oct 10, 2023

Next steps for SC seg:

Next steps for vertebral labeling:

Additional next steps:

@plbenveniste (Collaborator) commented Oct 10, 2023

R-L flipping experiment

R-L flipping has proved to have an impact on the quality of the segmentation.
The following GIF shows the difference for subject sub-cal105 (left: original; right: flipped back):

[GIF: sub-cal105, original vs. flipped-back segmentation]

The process to obtain the two segmentations was:

  • one segmentation is obtained by running the contrast-agnostic model directly
  • for the other, the image is first flipped using sct_image -i file -o output -flip z, then the contrast-agnostic model is run, and finally the segmentation is flipped back (using the same command)
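The flip / predict / flip-back procedure can be sketched as follows (`predict` is a stand-in for the contrast-agnostic model, and the axis-to-anatomy mapping is an assumption that depends on the image orientation):

```python
import numpy as np

def predict_with_flip(img, predict, axis=2):
    # flip the image, run the model, flip the prediction back
    # (the numpy analogue of wrapping inference in `sct_image ... -flip z`)
    flipped = np.flip(img, axis=axis)
    pred = predict(flipped)
    return np.flip(pred, axis=axis)
```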

To better understand, here is a GIF showing what -flip z does to the image:

[GIF: effect of sct_image -flip z on the image]

Suggestion:

  • For the spinal cord computation, we could perform inference on both the original image and the flipped image, and then sum (or average) both segmentations before binarization.
  • Going further, we could also flip along the other directions (x and y) and take the average of the 4 predictions.

@jcohenadad (Member)

Nice demonstration @plbenveniste. So there might be something fishy in the prediction code?

@plbenveniste (Collaborator) commented Oct 10, 2023

I observed similar results when flipping along x (anterior-posterior) and y (inferior-superior): the predictions are different (sometimes better, sometimes not).
→ Overall, by running the model several times on modified versions of the same image, we can get complementary information. Our strategy is to run the model on the original image and on the image flipped along x (anterior-posterior), y (inferior-superior), and z (left-right). The final mask is the sum of the 4 masks, which is then binarized (using sct_maths -i {mask_path} -o {mask_path} -bin 0.5).

⚠ This will require some post-processing, as the spinal cord segmentation extends higher (too high?) into the brain because of the flip along y (inferior-superior). This will be done using the vertebral levels (which we will label manually, or use Nathan's model to do so).

-> Currently running this
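The strategy above (original image plus three flips, summed and binarized at 0.5) could look like this in numpy (`predict` is again a stand-in for the model; the axis ordering is an assumption):

```python
import numpy as np

def tta_sum_and_binarize(img, predict, thr=0.5):
    # sum the prediction on the original image and on the image flipped
    # along each axis (flipping each prediction back), then binarize
    summed = predict(img).astype(float)
    for axis in (0, 1, 2):
        flipped = np.flip(img, axis=axis)
        summed += np.flip(predict(flipped), axis=axis)
    return (summed > thr).astype(np.uint8)
```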

@jcohenadad (Member)

Our strategy is to run the model on the original image, the image flipped on x (anterior-posterior), the image flipped on y (inferior-superior) and the image flipped on z (left-right). The final mask is the sum of the 4 masks, which is then binarized (using sct_maths -i {mask_path} -o {mask_path} -bin 0.5 ).

Won't this lead to an over-segmentation? I would do an average instead of a sum. But if we do an average, we might still "miss" the segmentations that only show up with, e.g., the R-L flip.

Also, what is the rationale for binarizing the output segmentation? In the past we (ie: Naga) noticed that training a softseg model with a mix of soft and hard inputs biases the model towards less-soft predictions (@naga-karthik can confirm).

@plbenveniste (Collaborator) commented Oct 10, 2023

From our visual observations, we didn't see any over-segmentation, but yes, it is true that it can happen. Indeed, taking an average is not going to solve that problem; binarization, however, can. For now, I am using a threshold of 0.5 (anything above is changed to 1). However, because we now have 4 segmentations, the summed values range from 0 to 4, so we could raise the threshold to something higher, like 0.7 or above. Therefore, binarization can help prevent over-segmentation. Alternatively, we could modify the binarization so that anything below 0.7 becomes 0 and anything above 1 is capped at 1 (we would therefore still have a soft prediction?) (not so sure about this idea though).
-> To be investigated as well
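The modified binarization suggested here (zero out everything below 0.7, cap at 1, keep the in-between values soft) could be sketched as:

```python
import numpy as np

def soft_binarize(summed, lo=0.7):
    # anything below `lo` becomes 0, anything above 1 is capped at 1,
    # and values in [lo, 1] are kept soft
    return np.where(summed < lo, 0.0, np.minimum(summed, 1.0))
```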

@jcohenadad (Member)

We can then change the threshold to something higher like 0.7 or above. Therefore, binarization can help prevent over-segmentation. However, what we could do is modify binarization so that anything below 0.7 is 0 and anything above 1 is 1.

The only issue I see with this is that the contrast agnostic model is designed to be calibrated across contrasts (ie: a value of 0.8 is supposed to represent 80% of partial volume). If we play around with the output regression values, it defeats the purpose of this calibration. Which is why I was suggesting averaging instead of summing, but if averaging does not solve the issue of 'missing' spinal cord, then that's a problem...

Maybe the issue can be further investigated by digging a bit more into the inference pipeline?

@valosekj (Member, Author)

Also, what is the rationale for binarizing the output segmentation?

We binarize the output segmentation to make it compatible with sct_label_vertebrae; see lines here. We will also use the output segmentation for the registration to the template.

@jcohenadad (Member)

We binarize the output segmentation to make it compatible with sct_label_vertebrae; see lines here. We will also use the output segmentation for the registration to the template.

Right, but I would still keep the soft segmentation, because we need it for training. And the soft segmentation is the one that needs to be manually corrected (followed by binarization). With your current pipeline, you will end up manually correcting the binary segmentation, so we will end up with twice as much manual correction needed.

so to sum up, we need: pred_soft -> pred_soft_manual -> pred_soft_manual_bin

@naga-karthik (Member)

Sorry for the delay in response, had been following the updates in-person.

RL flipping has proved to have an impact on the quality of the segmentation

This is a great idea actually. Glad that we're looking into this. This is essentially test-time augmentation as done in nnUNet (it is called mirroring there).

I would do an average instead of a sum

This is what we should be doing, I believe. Even nnUNet takes the mean of the predictions (see this function).

But if we do an average, we might still "miss" the segmentations that only show up with, e.g., the R-L flip.

This is also true. If there is no prediction on either of the axes we might average it and make it "more soft" than needed.

And the soft segmentation is the one that needs to be manually corrected

@jcohenadad I don't understand how this can be done. With FSLeyes, all manually corrected labels have a value of 1, right? How can we have 0.2 or 0.3 at the SC boundaries, for example?

@jcohenadad (Member)

@jcohenadad I don't understand how this can be done. With fsleyes all manual corrected labels have a label value of 1, right? how can we have 0.2 or 0.3 at the SC boundaries for example?

Ah! I've been waiting for this question 😊 The only humanly reasonable way I see this, is by altering the soft mask with binary values. Eg: if the rater notices an undersegmentation, they would add 'ones' where the cord is supposed to be. So, 99.99% of the segmentation would still be soft, except for the part that was manually corrected. I believe this is still better than having 0% of the segmentation that is soft.
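The described correction (a mostly-soft mask, with binary 1s painted where the rater adds missing cord) amounts to a voxel-wise maximum; a sketch, where `manual_ones` is a hypothetical binary mask drawn by the rater:

```python
import numpy as np

def apply_manual_correction(soft_pred, manual_ones):
    # keep the soft prediction everywhere, but force 1s where the rater
    # painted the under-segmented cord
    return np.maximum(soft_pred, manual_ones.astype(float))
```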

@naga-karthik (Member) commented Oct 11, 2023

Ah okay, but in the case of undersegmentation won't the soft values be "sandwiched" between 1s of the actual prediction (say the center of the SC) and the 1s of the manual correction (at the SC boundary)?

Say the green arrow in this case indicates undersegmentation; then we would have to erase the orange-yellow parts just to the right of the arrow and add our manual corrections.

BUT, now that I see it the corrected label will still be soft as you mentioned earlier (and better than only 0s and 1s). So, in the end, it could be done then!

[screenshot: soft prediction, with the undersegmented region indicated by the green arrow]

EDIT: added figure

@jcohenadad (Member)

Ah okay, but in the case of undersegmentation won't the soft values be "sandwiched" between 1s of the actual prediction (say the center of the SC) and the 1s of the manual correction (at the SC boundary)? Say the green arrow in this case is undersegmentation, then we would have to erase the orange-yellow parts just to the right of the arrow and add our manual corrections

Indeed, for consistency we would need to do that (ie: make sure there is no value below 1 inside the spinal cord if 1s are added at the border).

BUT, now that I see it the corrected label will still be soft as you mentioned earlier (and better than only 0s and 1s). So, in the end, it could be done then!

Indeed, I would expect the prediction to be "good enough" that we wouldn't have to deal with too many of these cases. If there is a systematic undersegmentation, then I think we should refrain from manually correcting the scans and instead enrich/improve the generalizability of the model first, and then re-run the prediction on those scans.

@plbenveniste (Collaborator)

(moving conversation from issue 49 to this issue to centralize everything)

New QC results

QC of the SC segmentation done using the .sh script, which includes:

  • x, y, and z flipping
  • the keep_largest function from ivadomed: it only keeps the biggest continuous segmented object
  • the sum of the 4 masks
  • cropping with 64 x 192 x -1 (default)
  • a binarization threshold of 0.5

Note also that the inference script treats every voxel below 0.5 as background. Therefore, using a threshold of 0.5 on the sum means that a voxel is taken into account if it is labeled at least once; the summed values now range from 0.5 to 4.

Problematic images:

  • sub-edm029_ses-M0_PSIR: missing bottom
  • sub-edm075_ses-M0_PSIR: missing bottom
  • sub-edm156_ses-M0_PSIR: missing bottom (small chunk)
  • sub-mon014_ses-M0_PSIR: missing bottom
  • sub-mon104_ses-M0_PSIR: missing bottom
  • sub-mon113_ses-M0_PSIR: missing bottom + data quality issue (changing from black to white at the back of the neck)
  • sub-mon138_ses-M0_PSIR: missing bottom + data quality issue (blurry image)
  • sub-mon152_ses-M0_PSIR: data quality (changing from white to black at the back of the neck)
  • sub-tor092_ses-M0_PSIR: missing bottom + motion
  • sub-van116_ses-M0_PSIR: missing bottom
  • sub-van159_ses-M0_PSIR: missing bottom + motion
  • sub-van189_ses-M0_PSIR: missing bottom

The missing bottom of the segmentation is caused by the keep_largest function, which only keeps the largest continuous chunk. Therefore, while previously only a small chunk in the middle of the SC was missing, the pipeline now keeps only the larger part of the SC (often the top part).
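For reference, a keep-largest-component function in the spirit of the ivadomed one (a scipy-based sketch, not the actual ivadomed implementation) looks like:

```python
import numpy as np
from scipy import ndimage

def keep_largest(mask):
    # label the connected components (default scipy connectivity)
    # and keep only the largest one
    labeled, n = ndimage.label(mask > 0)
    if n == 0:
        return mask
    sizes = np.bincount(labeled.ravel())[1:]  # skip background (label 0)
    largest = 1 + int(np.argmax(sizes))
    return (labeled == largest).astype(mask.dtype)
```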

Example with sub-van189_ses-M0_PSIR:

[screenshot: without keep_largest function]

[screenshot: with keep_largest function]

@jcohenadad (Member) commented Oct 12, 2023

The problem with the missing bottom of the segmentation is caused by the keep_largest function which only keeps the longest continuous chunk. Therefore, while previously only a small chunk in the middle of the sc was missing, now the pipeline only keeps the bigger part of the sc (often being the top part).

Ouch! That's a problem indeed. There was also another function, something like "remove small objects" (link here). Maybe that would be more appropriate?

@plbenveniste (Collaborator) commented Oct 17, 2023

The function "remove small objects" successfully removes a "small" block which is segmented in the eye.
To reproduce the error of segmentation in the eye, I used:

  • run_single_inference (without "remove_small_objects" function)
  • image multiplied by -1
  • cropping: 100x320x320

Then to get the image without the small error in segmentation, I just added the "remove_small_objects" function in the script with min_size = 500 voxels.
The result is the following for subject sub-mon137:

[GIF: sub-mon137, with vs. without remove_small_objects]

The min_size can be discussed. In this case, the spinal cord segmentation without the error in the eye is 6812 voxels, and the segmentation in the eye is 120 voxels.
Should min_size be set at 500 voxels? Should it be set at a certain percentage of the total volume (e.g., 10%)? Should it be in mm³?
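A minimal scipy-based sketch of the remove-small-objects idea (not the exact function used in the script) could be:

```python
import numpy as np
from scipy import ndimage

def remove_small_objects(mask, min_size=500):
    # zero out every connected component smaller than `min_size` voxels
    labeled, n = ndimage.label(mask > 0)
    sizes = np.bincount(labeled.ravel())
    keep = sizes >= min_size
    keep[0] = False  # label 0 is the background; never keep it
    return keep[labeled].astype(mask.dtype)
```

With a cord of ~6812 voxels and an eye blob of ~120 voxels, any min_size between those two values (e.g., 500) removes the blob; a relative threshold (a percentage of the largest component) could replace the hard-coded value.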

@naga-karthik (Member)

In this case, the size segmentation of the spinal cord without the error in the eye is 6812 voxels

Is it in a similar range for a few more subjects? We could decide on a certain percentage of the total volume by getting an average estimate from a few more subjects. What do you think?

Also, having a percentage (instead of a raw number of voxels) is better imo.

@plbenveniste (Collaborator) commented Oct 18, 2023

QC of the final experiment

Script: segment_sc_contrast-agnostic.sh

Steps:

  • multiply PSIR images by -1 (to invert the contrast)
  • run_single_inference (with the remove_small_objects function, 0.5 threshold, and the default 64 x 192 x -1 cropping)
  • 4 inferences per image (one on the original image and one per flip (x, y, z))
  • sum all 4 predictions
  • add the lesion mask to the SC mask (because the lesions need to be inside the SC for a region-based nnUNet)
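The last step (adding the lesion mask to the SC mask) is a voxel-wise union of binary masks; a sketch:

```python
import numpy as np

def add_lesion_to_sc(sc_bin, lesion_bin):
    # voxel-wise union: every lesion voxel ends up inside the SC mask
    return np.clip(sc_bin + lesion_bin, 0, 1).astype(np.uint8)
```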

QC can be found at: ~/duke/projects/canproco/canproco_contrast-agnostic_2023-10-17_with_vert_labeling

Minor problem (10 files):
  • sub-cal095/ses-M0
  • sub-cal078/ses-M0
  • sub-cal115/ses-M0
  • sub-cal160/ses-M0
  • sub-cal198/ses-M0
  • sub-edm029/ses-M0
  • sub-mon041/ses-M0
  • sub-mon036/ses-M0
  • sub-van116/ses-M0
  • sub-van154/ses-M0
Major problem (10 files):
  • sub-edm019/ses-M0 (seg outside of sc)
  • sub-edm038/ses-M0 (seg outside of sc)
  • sub-edm075/ses-M0 (missing chunk)
  • sub-mon014/ses-M0 (missing chunk)
  • sub-mon104/ses-M0 (missing chunk)
  • sub-mon152/ses-M0 (almost no seg)
  • sub-mon180/ses-M0 (missing chunk)
  • sub-van159/ses-M0 (missing chunk)
  • sub-van189/ses-M0 (missing chunk)
  • sub-tor092/ses-M0 (missing chunk)

Manual correction process

YML files for PSIR and STIR:

# seg_to_correct_PSIR.yml
FILES_SEG:
- sub-edm029_ses-M0_PSIR.nii.gz
- sub-mon041_ses-M0_PSIR.nii.gz
- sub-mon036_ses-M0_PSIR.nii.gz
- sub-van116_ses-M0_PSIR.nii.gz
- sub-van154_ses-M0_PSIR.nii.gz
- sub-edm019_ses-M0_PSIR.nii.gz
- sub-edm038_ses-M0_PSIR.nii.gz
- sub-edm075_ses-M0_PSIR.nii.gz
- sub-mon014_ses-M0_PSIR.nii.gz
- sub-mon104_ses-M0_PSIR.nii.gz
- sub-mon152_ses-M0_PSIR.nii.gz
- sub-mon180_ses-M0_PSIR.nii.gz
- sub-van159_ses-M0_PSIR.nii.gz
- sub-van189_ses-M0_PSIR.nii.gz
- sub-tor092_ses-M0_PSIR.nii.gz
# seg_to_correct_STIR.yml
FILES_SEG:
- sub-cal095_ses-M0_STIR.nii.gz
- sub-cal078_ses-M0_STIR.nii.gz
- sub-cal115_ses-M0_STIR.nii.gz
- sub-cal160_ses-M0_STIR.nii.gz
- sub-cal198_ses-M0_STIR.nii.gz

Manual correction commands (note that -path-img and -path-label are the same because both the images and the SC segs are located under the same folders; also note that we have to run manual_correction.py twice because PSIR and STIR have different -suffix-files-seg):

# PSIR
python manual_correction.py -config data_processed/seg_to_correct_PSIR.yml -path-img data_processed/data_to_correct_PSIR -path-label data_processed/data_to_correct_PSIR -suffix-files-seg _mul_pred_sum_bin_with_lesion_bin

# STIR
python manual_correction.py -config data_processed/seg_to_correct_STIR.yml -path-img data_processed/data_to_correct_STIR -path-label data_processed/data_to_correct_STIR -suffix-files-seg _pred_sum_bin_with_lesion_bin

Next steps:

  • Manual correction of major issues
  • Manual correction of minor issues
  • Remove SC seg of sub-mon152
  • Conversion to BIDS format into canproco dataset (creation of .json files)
  • Verification of dataset integrity
  • Push to git-annex

@valosekj Can you update the above TODO list if I have missed something?

@plbenveniste (Collaborator)

In the above list, sub-mon152 was not manually corrected, as it was part of exclude.yml but was forgotten in the config file.
Should the SC seg be deleted for this subject?

@valosekj (Member, Author)

In the above list, sub-mon152 was not manually corrected as it was part of exclude.yml but was forgotten in the config file.
Should sc seg be deleted for this subject ?

If the SC seg (and the image quality) is bad, then yes, we will not include the SC seg for this subject in git-annex.
