Segment SC from STIR/PSIR using contrast-agnostic model #46
Hey @valosekj, thank you for running the predictions! The results on STIR contrast look good indeed!
About this -- did you run the predictions on the raw PSIR images? Or did you do any intensity rescaling? I think 1-2 weeks back @plbenveniste noticed that multiplying the PSIR images by
You could try this. With these changes, I am confident that the results on PSIR will improve! EDIT: I changed the default crop-size for the inference script in commit sct-pipeline/contrast-agnostic-softseg-spinalcord@c4cbd61
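For reference, a minimal sketch of what such a rescaling step could look like before inference. The exact multiplier mentioned above is not preserved in this thread, so the -1 (contrast inversion) below is purely an assumption, as is the file name:

```python
import nibabel as nib

# Hypothetical file name; the rescaling factor is an assumption, since the
# exact multiplier mentioned above is not preserved in this thread.
img = nib.load("sub-xxx_ses-M0_PSIR.nii.gz")
rescaled = img.get_fdata() * -1  # assumed: invert the PSIR contrast
nib.save(nib.Nifti1Image(rescaled, img.affine, img.header),
         "sub-xxx_ses-M0_PSIR_rescaled.nii.gz")
```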
Thank you @naga-karthik! I am now using both suggestions:
The contrast-agnostic MONAI model now works significantly better on the PSIR images; see QC video here! Processed data and QC are saved in
Thanks @valosekj for these changes! I have to say that the predictions look much better now! We were thinking of adding some intensity-based scaling during training or inference because of the PSIR contrast (which was not used in training the contrast-agnostic model). Relevant issue: sct-pipeline/contrast-agnostic-softseg-spinalcord#69
Tagging @plbenveniste.
Next steps for SC seg:
Next steps for vertebral labeling:
Additional next steps:
Nice demonstration @plbenveniste. So there might be something fishy in the prediction code?
I observed similar results for flipping on x (anterior-posterior) and y (inferior-superior). The predictions are different (sometimes better, sometimes not). ⚠ This will require some post-processing, as the spinal cord segmentation extends too high into the brain because of the flip on y (inferior-superior); we will handle this using the vertebral levels (which we will label manually or with Nathan's model). -> Currently running this (see the sketch below).
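A minimal sketch of this flip-based test-time augmentation; `predict` is a hypothetical stand-in for the actual model inference call, and the axis-to-orientation mapping is an assumption:

```python
import numpy as np

def predict(volume):
    """Hypothetical stand-in for the contrast-agnostic model inference call."""
    raise NotImplementedError

def tta_predictions(volume):
    """Return the original prediction plus predictions on flipped volumes,
    each flipped back to the original orientation before being collected."""
    preds = [predict(volume)]
    for axis in (0, 1, 2):  # assumed mapping: 0=R-L, 1=A-P, 2=I-S
        flipped = np.flip(volume, axis=axis)
        preds.append(np.flip(predict(flipped), axis=axis))
    return preds  # 4 predictions, combined downstream by sum or mean
```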
Won't this lead to over-segmentation? I would take an average instead of a sum. But if we do an average, we might still "miss" the segmentations that only show up with, e.g., the R-L flip. Also, what is the rationale for binarizing the output segmentation? In the past, we (i.e., Naga) noticed that training a softseg model with a mix of soft and hard inputs biases the model towards less soft predictions (@naga-karthik can confirm).
From our visual observation, we didn't see any over-segmentation. But yes, it is true that it can happen. Taking an average is not going to solve that problem; binarization, however, can. For now, I am using a threshold of 0.5 (anything above is changed to 1). Because we now have 4 segmentations, the summed prediction ranges from 0 to 4, so we could raise the threshold to something like 0.7 or above. Binarization can therefore help prevent over-segmentation. Alternatively, we could modify the binarization so that anything below 0.7 becomes 0 and anything above 1 is clamped to 1 (we would therefore still have a soft prediction?). (Not so sure about this idea, though; see the sketch below.)
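A minimal numpy sketch of the options discussed here, reusing the hypothetical `tta_predictions` helper from the sketch above (the 0.7 threshold matches the value floated in this comment):

```python
import numpy as np

# Voxel-wise sum of the 4 predictions (original + 3 flips), values in [0, 4].
summed = sum(tta_predictions(volume))

# Current approach: hard binarization (threshold 0.5, i.e. "labeled at least once").
hard = (summed > 0.5).astype(np.float32)

# Raised threshold, as suggested: require stronger agreement across flips.
hard_strict = (summed > 0.7).astype(np.float32)

# Clamp variant floated above: zero out weak voxels, cap the rest at 1,
# which keeps values between 0.7 and 1 soft.
soft = np.where(summed < 0.7, 0.0, np.clip(summed, 0.0, 1.0))
```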
The only issue I see with this is that the contrast-agnostic model is designed to be calibrated across contrasts (i.e., a value of 0.8 is supposed to represent 80% partial volume). If we play around with the output regression values, it defeats the purpose of this calibration. Which is why I was suggesting averaging instead of summing; but if averaging does not solve the issue of 'missing' spinal cord, then that's a problem... Maybe the issue can be further investigated by digging a bit more into the inference pipeline?
We binarize the output segmentation to make it compatible with `sct_label_vertebrae`.
Right, but I would still keep the soft segmentation because we need it for training. And the soft segmentation is the one that needs to be manually corrected (followed by binarization). With your current pipeline, you will end up manually correcting the binary segmentation, so we will end up with twice as much manual correction needed. So, to sum up, we need: `pred_soft -> pred_soft_manual -> pred_soft_manual_bin`
Sorry for the delay in responding; I had been following the updates in person.
This is a great idea, actually. Glad that we're looking into this. This is essentially the test-time augmentation done in nnUNet (it is called mirroring there).
This is what we should be doing, I believe. Even nnUNet takes the mean of the predictions (see this function).
This is also true. If there is no prediction on one of the axes, averaging might make the result "softer" than needed.
@jcohenadad I don't understand how this can be done. With FSLeyes, all manually corrected labels have a value of 1, right? How can we have 0.2 or 0.3 at the SC boundaries, for example?
Ah! I've been waiting for this question 😊 The only humanly reasonable way I see this is by altering the soft mask with binary values. E.g., if the rater notices an under-segmentation, they would add 'ones' where the cord is supposed to be. So, 99.99% of the segmentation would still be soft, except for the part that was manually corrected. I believe this is still better than having 0% of the segmentation be soft.
Indeed, for consistency we would need to do that (i.e., make sure there is no value below 1 inside the spinal cord if 1s are added at the border); a sketch of such a correction is below.
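A minimal numpy/scipy sketch of this kind of consistency fix; the file names are hypothetical stand-ins for the soft prediction and the rater's binary additions:

```python
import numpy as np
import nibabel as nib
from scipy.ndimage import binary_fill_holes

# Hypothetical file names for the soft prediction and the rater's additions.
soft_pred = nib.load("pred_soft.nii.gz").get_fdata()
manual_mask = nib.load("manual_additions.nii.gz").get_fdata()

# Overwrite manually corrected voxels with 1; everything else stays soft.
corrected = np.where(manual_mask > 0, 1.0, soft_pred)

# Consistency fix: if 1s were added at the border, no voxel enclosed by
# them should keep a value below 1, so fill the hard region's interior.
hard_region = binary_fill_holes(corrected >= 1.0)
corrected = np.where(hard_region, 1.0, corrected)
```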
Indeed, I would expect the predictions to be "good enough" that we wouldn't have to deal with too many of these cases. If there is a systematic under-segmentation, then I think we should refrain from manually correcting the scans and instead enrich/improve the generalizability of the model first, and then re-run the prediction on those scans.
(Moving the conversation from issue #49 to this issue to centralize everything.)

New QC results
We should also note that the inference script treats every voxel below 0.5 as background. Therefore, taking a threshold of 0.5 means that a voxel is kept if it is labeled at least once; the summed values now range from 0.5 to 4.

Problematic images (problems with the following files):
The problem with the missing bottom of the segmentation is caused by the `keep_largest` function, which only keeps the largest continuous chunk. Therefore, while previously only a small chunk in the middle of the SC was missing, now the pipeline keeps only the bigger part of the SC (often the top part). Example with sub-van189_ses-M0_PSIR: (screenshots: without `keep_largest` / with `keep_largest`)
Ouch! That's a problem indeed. There was also another function, something like "remove small objects" (link here). Maybe that would be more appropriate?
The function "remove small objects" successfully removes a "small" block which is segmented in the eye.
Then to get the image without the small error in segmentation, I just added the "remove_small_objects" function in the script with min_size = 500 voxels. The min_size can be discussed. In this case, the size segmentation of the spinal cord without the error in the eye is 6812 voxels and the segmentation in the eye is 120 voxels |
Is it in a similar range for a few more subjects? We can decide on a certain percentage of the total volume by getting an average estimate from a few more subjects. What do you think? Also, having a percentage (instead of a raw number) of voxels is better IMO; see the sketch below.
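A minimal scikit-image sketch covering both variants: the fixed `min_size = 500` used above, and a percentage-based threshold (the 5% figure below is only an illustration, not a value agreed on in this thread; the file name is hypothetical):

```python
import nibabel as nib
from skimage.morphology import remove_small_objects

# Hypothetical file name for the binarized prediction.
seg = nib.load("pred_bin.nii.gz").get_fdata().astype(bool)

# Fixed-size variant used above: drop connected components smaller than 500 voxels.
cleaned = remove_small_objects(seg, min_size=500)

# Percentage-based variant suggested here: drop components smaller than,
# e.g., 5% of the total segmented volume (the 5% figure is an assumption).
cleaned_pct = remove_small_objects(seg, min_size=int(0.05 * seg.sum()))
```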
QC of the final experiment

Script: `segment_sc_contrast-agnostic.sh`

Steps:
QC can be found at:

Minor problem (10 files):
Major problem (10 files):
Manual correction process

YML files for PSIR and STIR:

```yaml
FILES_SEG:
- sub-edm029_ses-M0_PSIR.nii.gz
- sub-mon041_ses-M0_PSIR.nii.gz
- sub-mon036_ses-M0_PSIR.nii.gz
- sub-van116_ses-M0_PSIR.nii.gz
- sub-van154_ses-M0_PSIR.nii.gz
- sub-edm019_ses-M0_PSIR.nii.gz
- sub-edm038_ses-M0_PSIR.nii.gz
- sub-edm075_ses-M0_PSIR.nii.gz
- sub-mon014_ses-M0_PSIR.nii.gz
- sub-mon104_ses-M0_PSIR.nii.gz
- sub-mon152_ses-M0_PSIR.nii.gz
- sub-mon180_ses-M0_PSIR.nii.gz
- sub-van159_ses-M0_PSIR.nii.gz
- sub-van189_ses-M0_PSIR.nii.gz
- sub-tor092_ses-M0_PSIR.nii.gz
```

```yaml
FILES_SEG:
- sub-cal095_ses-M0_STIR.nii.gz
- sub-cal078_ses-M0_STIR.nii.gz
- sub-cal115_ses-M0_STIR.nii.gz
- sub-cal160_ses-M0_STIR.nii.gz
- sub-cal198_ses-M0_STIR.nii.gz
```

Manual correction commands (note that
Next steps:
@valosekj Can you update the above TODO list if I have missed something?
In the above list,
If the SC seg (and the image quality) is bad, then yes, we will not include the SC seg for this subject in git-annex.
I ran the first run of the `segment_sc_contrast-agnostic.sh` script (PR #44) across canproco first-session (`ses-M0`) PSIR and STIR images. The script does the following:

- `sct_maths -bin 0.5` (to make the prediction compatible with `sct_label_vertebrae`)
- `sct_label_vertebrae`
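A minimal sketch of these two steps with hypothetical file names; the `-c t1` contrast flag for `sct_label_vertebrae` is an assumption, since the tool only accepts t1/t2 and the thread does not say which option was used for PSIR/STIR:

```python
import subprocess

img = "sub-xxx_ses-M0_PSIR.nii.gz"            # hypothetical file names
pred = "sub-xxx_ses-M0_PSIR_pred.nii.gz"
seg_bin = "sub-xxx_ses-M0_PSIR_seg_bin.nii.gz"

# Binarize the soft prediction at 0.5 so sct_label_vertebrae accepts it.
subprocess.run(["sct_maths", "-i", pred, "-bin", "0.5", "-o", seg_bin], check=True)

# Vertebral labeling; "-c t1" is an assumption (see lead-in above).
subprocess.run(["sct_label_vertebrae", "-i", img, "-s", seg_bin, "-c", "t1"], check=True)
```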
Initial observations
Spinal cord segmentation:
The contrast-agnostic MONAI model works well on STIR contrast (available for a single site (Calgary)); see the first part of this QC video. The model missed some SC parts, but they are located mainly in the outer slices where SC-CSF contrast is lower. Generally, the segmentations are very good.
The contrast-agnostic MONAI model performs significantly worse on PSIR contrast; see the second part of the QC video (from 00:30). For the PSIR contrast, the model either missed SC parts or segmented structures outside of the SC.
Note that I used the default 64x160x320 cropping. I will try running the prediction a second time with R-L cropping.
Vertebral labeling:
`sct_label_vertebrae` fails during automatic C2-C3 disc detection in ~160 subjects --> those subjects will need manual labeling corrections. I am in favour of labeling all discs (instead of labeling only the initial C2-C3 disc). This will be slightly more time-consuming, but we will be sure that the labeling is okay.

Processed data and QC are saved in `~/duke/projects/canproco/canproco_contrast-agnostic_2023-10-07`.

Tagging @sandrinebedard, @naga-karthik, @plbenveniste.