[FIX] make identifying region names more robust #4289

Merged
merged 27 commits
Mar 20, 2024

Conversation

Collaborator

@Remi-Gau Remi-Gau commented Feb 27, 2024

Changes proposed in this pull request:

  • add tests to check that:
    • deterministic atlases with floats or non-continuous values as region_ids are handled properly
    • atlases can be handled properly whether they contain the label for "background" (independent of case) or not
  • adapt code to make sure all those tests pass
  • sanitize labels passed to the NiftiLabelsMasker by casting them to a list of strings, and throw a warning if labels is not a list of strings.
  • throw a warning if number of labels does not match the number of regions at instance construction or during fitting.
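
For readers skimming the thread, the label sanitizing described above could look roughly like the sketch below. It is only an illustration under assumptions: `sanitize_labels` is a hypothetical helper name, not the function added by this PR, and the warning text is made up.

```python
import warnings


def sanitize_labels(labels):
    """Return labels as a list of str, warning if a conversion was needed.

    Hypothetical helper for illustration; not nilearn's implementation.
    """
    if labels is None:
        return None
    # Decide up front whether the caller passed exactly a list of str.
    needs_warning = not (
        isinstance(labels, list) and all(isinstance(x, str) for x in labels)
    )
    if isinstance(labels, (str, bytes)):
        # A bare string would otherwise be split into characters.
        labels = [labels]
    elif hasattr(labels, "tolist"):
        # NumPy arrays (used by some atlases) expose tolist().
        labels = labels.tolist()
    if needs_warning:
        warnings.warn(
            "'labels' should be a list of strings; converting.",
            stacklevel=2,
        )
    return [x.decode() if isinstance(x, bytes) else str(x) for x in labels]
```

With this sketch, `sanitize_labels(range(3))` warns and returns `["0", "1", "2"]`, while a plain list of strings passes through silently.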

Contributor

👋 @Remi-Gau Thanks for creating a PR!

Until this PR is ready for review, you can include the [WIP] tag in its title, or leave it as a github draft.

Please make sure it is compliant with our contributing guidelines. In particular, be sure it checks the boxes listed below.

  • PR has an interpretable title.
  • PR links to Github issue with mention Closes #XXXX (see our documentation on PR structure)
  • Code is PEP8-compliant (see our documentation on coding style)
  • Changelog or what's new entry in doc/changes/latest.rst (see our documentation on PR structure)

For new features:

  • There is at least one unit test per new function / class (see our documentation on testing)
  • The new feature is demoed in at least one relevant example.

For bug fixes:

  • There is at least one test that would fail under the original bug conditions.

We will review it as quickly as possible; feel free to ping us with questions if needed.

@@ -58,10 +59,12 @@ class NiftiLabelsMasker(BaseMasker, _utils.CacheMixin):
Region definitions, as one image of labels.

labels : :obj:`list` of :obj:`str`, optional
Collaborator Author

the input type of labels is never checked

this was not a big issue for version < 0.10.3 as those were only used when generating masker reports, but this can now cause trouble

we should probably update the docstring to say that we can accept a sequence of strings

though this may not be enough as some nilearn atlases have their labels in NumPy arrays

tmp.py Outdated
Comment on lines 48 to 62
if "labels" in atlas:
    labels = atlas.labels
elif "rsn_indices" in atlas:
    labels = atlas.rsn_indices
elif f == fetch_atlas_basc_multiscale_2015:
    labels = range(64)
elif f == fetch_atlas_yeo_2011:
    labels = range(17)

if f == fetch_atlas_schaefer_2018:
    labels = np.insert(labels, 0, "Background")

labels_img = (
    atlas["thick_17"] if f == fetch_atlas_yeo_2011 else atlas.maps
)
Collaborator Author

in a separate PR, we should probably further standardize what our atlases return so that users do not have to wrangle atlas outputs this way.

Member

Yes, this has been a longstanding goal; e.g., #2037!

f"{len(labels_after_resampling)} labels "
"(including background)."
)
self._resample_labels(imgs_)
Collaborator Author

this refactoring is unrelated but this transform_single_imgs was getting way too long...

Comment on lines 601 to 616
@pytest.mark.parametrize(
    "with_background",
    [True, False],  # In case the list of labels includes one for background
)
@pytest.mark.parametrize(
    "dtype", ["int32", "float32"]  # In case regions are labelled with floats
)
@pytest.mark.parametrize(
    "affine_data",
    [
        None,
        np.diag(
            (4, 4, 4, 4)
        ),  # region_names_ matches signals after resampling drops labels
    ],
)
Collaborator Author

I think this test parametrization should cover all the different use cases

Collaborator Author

at least all of those that we have in our atlases and that fail
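
As a side note on coverage: stacked `pytest.mark.parametrize` decorators produce the Cartesian product of their values, so the three decorators quoted above yield 2 × 2 × 2 = 8 test cases. The sketch below only enumerates those combinations with the standard library; the tuple `(4, 4, 4, 4)` is a stand-in for `np.diag((4, 4, 4, 4))` so the snippet has no NumPy dependency.

```python
from itertools import product

with_background = [True, False]
dtypes = ["int32", "float32"]
# Stand-in for the affine values: None, or the diagonal of a 4 mm affine.
affine_diagonals = [None, (4, 4, 4, 4)]

# One tuple per test case generated by the stacked decorators.
cases = list(product(affine_diagonals, dtypes, with_background))

for affine_diag, dtype, background in cases:
    print(f"affine={affine_diag}, dtype={dtype}, with_background={background}")
```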

tmp.py Outdated
Collaborator Author

this file does not need to be kept in the end but was useful to make sure all our atlases can be used

Collaborator Author

This file must be removed before merging, though it is potentially useful to keep around to make sure our atlases can be used with our maskers.

Collaborator Author

@bthirion @man-shu

After discussing with @mtorabi59 we figured it may be good to keep this script "around" and run it at regular intervals (but not on every PR).

Currently our tests:

  • test the label maskers on dummy atlases
  • test the fetchers with mocks

But we don't check (I think) that all Nilearn atlases work with our maskers, so this script basically runs the label maskers on the atlases provided by Nilearn.

The logic of the script also shows the kind of branching logic hoops users have to jump through because of the inconsistent structure of our atlases.

Why not run it on every PR? Because the fetchers may fail when the download request fails (the reason why we mock them during testing).

What do you think of the idea? If we think this is valuable to keep around, where should this go? nilearn/maint_tools? in the sandbox repo? somewhere else?

Member

nilearn/maint_tools is probably the best thing to do. I'm just afraid of seeing this kind of thing growing.
What should actually be done with it should be documented.

Collaborator Author

OK with me if we put it in maint_tools: this would be done in a separate PR to make it easier to review and see how to best document it.

@mtorabi59
Contributor

@Remi-Gau you can ignore my reviewing request :)

@bthirion
Member

bthirion commented Mar 5, 2024

Sorry, what is the status of this PR ?

@Remi-Gau
Collaborator Author

Remi-Gau commented Mar 5, 2024

Sorry, what is the status of this PR ?

  • Had a first pass.
  • Met with @mtorabi59 so he could cross-check my reasoning about what the problem is and comment on it.
  • Will now try to implement these
  • We should meet in person with @mtorabi59 to make sure all the Ts are crossed and bring this one to a happy resolution

Comment on lines 711 to 724
  self.region_names_ = None
  if self.labels is not None:
      lower_case_labels = {x.lower() for x in self.labels}
      knwon_backgrounds = {"background"}
      background_in_labels = any(
          knwon_backgrounds.intersection(lower_case_labels)
      )
      offset = 1 if background_in_labels else 0
      self.region_names_ = {
-         key: self.labels[region_id]
+         key: self.labels[key + offset]
          for key, region_id in region_ids.items()
          if region_id != self.background_label
      }
  else:
      self.region_names_ = None

Collaborator Author

@mtorabi59
this is a refactored version of your suggestion

if self.labels is not None:
    lower_case_labels = {x.lower() for x in self.labels}
    knwon_backgrounds = {"background"}
Collaborator Author

This could be expanded if we encounter atlases that use another "keyword" for the background (bckgrd, bg...)

Collaborator Author

discussed with @mtorabi59:

  • to make our life easier we should probably require that the list of labels passed to the constructors MUST include background or Background and that this should be the first item in the list. This cannot be done in this PR and should be part of some refactoring to standardize our atlases with a deprecation cycle: it won't break the API but will change some of the objects returned by the fetchers.
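
To make the offset logic discussed in this thread easier to follow, here is a minimal standalone sketch. It assumes `region_ids` maps integer signal indices to values in the label image, and `name_regions` and `KNOWN_BACKGROUNDS` are hypothetical names used only for this illustration.

```python
# Keywords treated as a background label, compared case-insensitively.
# Could be extended if atlases use other keywords ("bckgrd", "bg", ...).
KNOWN_BACKGROUNDS = {"background"}


def name_regions(labels, region_ids, background_label=0):
    """Map each signal index to a region name.

    region_ids maps signal index -> value in the label image
    (assumed layout, for illustration only).
    """
    lower_case_labels = {x.lower() for x in labels}
    background_in_labels = bool(KNOWN_BACKGROUNDS & lower_case_labels)
    # Skip the first label when it describes the background.
    offset = 1 if background_in_labels else 0
    return {
        key: labels[key + offset]
        for key, region_id in region_ids.items()
        if region_id != background_label
    }
```

Because names are looked up by position rather than by region value, float or non-continuous region values such as `{0: 10.0, 1: 20.0}` do not break the lookup. It does still rely on the background label, when present, being the first item in the list.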


codecov bot commented Mar 6, 2024

Codecov Report

Attention: Patch coverage is 92.15686%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.15%. Comparing base (abb80ff) to head (9e73330).
Report is 33 commits behind head on main.

Files                                    Patch %   Lines
nilearn/maskers/nifti_labels_masker.py   93.87%    1 Missing and 2 partials ⚠️
nilearn/maskers/nifti_masker.py          0.00%     0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4289      +/-   ##
==========================================
+ Coverage   91.85%   92.15%   +0.29%     
==========================================
  Files         144      143       -1     
  Lines       16419    16496      +77     
  Branches     3434     3463      +29     
==========================================
+ Hits        15082    15202     +120     
+ Misses        792      749      -43     
  Partials      545      545              
Flag Coverage Δ
macos-latest_3.11_test_plotting 91.94% <92.15%> (+0.09%) ⬆️
macos-latest_3.12_test_plotting 91.94% <92.15%> (?)
macos-latest_3.8_test_plotting 91.91% <92.15%> (?)
macos-latest_3.9_test_plotting 91.91% <92.15%> (?)
ubuntu-latest_3.10_test_plotting 91.94% <92.15%> (+0.09%) ⬆️
ubuntu-latest_3.11_test_plotting 91.94% <92.15%> (?)
ubuntu-latest_3.12_test_plotting 91.94% <92.15%> (?)
ubuntu-latest_3.8_test_min 68.83% <76.47%> (?)
ubuntu-latest_3.8_test_plot_min 91.61% <92.15%> (?)
ubuntu-latest_3.8_test_plotting 91.91% <92.15%> (?)
ubuntu-latest_3.9_test_plotting 91.91% <92.15%> (?)
windows-latest_3.10_test_plotting 91.92% <92.15%> (?)
windows-latest_3.11_test_plotting 91.92% <92.15%> (?)
windows-latest_3.12_test_plotting 91.92% <92.15%> (?)
windows-latest_3.8_test_plotting 91.88% <92.15%> (?)
windows-latest_3.9_test_plotting 91.89% <92.15%> (?)

@Remi-Gau Remi-Gau requested a review from mtorabi59 March 6, 2024 22:08
@Remi-Gau
Collaborator Author

After the resampling of the label image, some regions are dropped (values 42 and 117 are dropped: see masker.labels_ below). However, when checking masker.region_names_, you see that both regions are reported to be part of the region signals that were extracted.

However 3 regions that should be in masker.region_names_ are not:

  • (148, 'R S_temporal_inf')
  • (149, 'R S_temporal_sup')
  • (150, 'R S_temporal_transverse')

OK, I don't think this specific issue with that atlas (or atlases with similar problems) will get resolved until we clean up what our atlas fetchers return, so this should be tackled in another issue / PR.

Comment on lines -297 to -311
# Number of regions excluding the background
number_of_regions = np.sum(
    np.unique(labels_image_data) != self.background_label
)
# Basic safety check to ensure we have as many labels as we
# have regions (plus background).
if (
    self.labels is not None
    and len(self.labels) != number_of_regions + 1
):
    raise ValueError(
        "Mismatch between the number of provided labels "
        f"({len(self.labels)}) and the number of regions in "
        f"provided label image ({number_of_regions + 1})."
    )
Collaborator Author

@Remi-Gau Remi-Gau Mar 19, 2024

extracted this into separate methods so they can be used to check the labels:

  • in the constructor
  • and also when checking self.labels_ in the transform_single_imgs method
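
A rough sketch of what such an extracted check might look like is given below. The `tolerant` flag mirrors the `tolerant=False` call visible in the traceback further down this thread, but the standalone function, its signature, and its exact behavior here are assumptions for illustration, not the merged code.

```python
import warnings


def check_mismatch_labels_regions(
    labels, region_ids, background_label=0, tolerant=True
):
    """Compare the number of labels with the number of regions.

    As in the removed check, labels are expected to contain one entry
    per region plus one for the background. Illustrative sketch only.
    """
    if labels is None:
        return
    # Number of distinct regions, excluding the background value.
    n_regions = len({x for x in region_ids if x != background_label})
    if len(labels) != n_regions + 1:
        msg = (
            "Mismatch between the number of provided labels "
            f"({len(labels)}) and the number of regions in "
            f"provided label image ({n_regions + 1})."
        )
        if tolerant:
            # Warn only: used where a mismatch should not stop processing.
            warnings.warn(msg, stacklevel=2)
        else:
            raise ValueError(msg)
```

Having one helper with a switch lets the constructor and the transform step warn, while stricter callers (such as report generation in the traceback) can raise.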

@Remi-Gau Remi-Gau requested a review from bthirion March 19, 2024 21:45
@Remi-Gau
Collaborator Author

@bthirion
OK I will stop tinkering with this one because the bug is fixed (no more crashes) even if some of the region names are still wrong for some of our atlases.
But to properly fix this, it will be easier to clean our atlases first.

So I would suggest:

  • merge this PR
  • open an issue to report the problem that for some atlases the region names in the reports and the content of region_names_ are not correct
  • "standardize our atlas"
  • fix the remaining issue

self.background_label = background_label
self._original_region_ids = self._get_labels_values(self.labels_img)
Collaborator Author

introducing a new private attribute to keep track of the labels of the image:

  • this allows checking that the labels and number of regions match at instantiation
  • should later allow us to know which regions were dropped during resampling

@Remi-Gau
Collaborator Author

failure of the pre-release workflow is unrelated

@bthirion
Member

Can you clarify what you mean with " some of the region names are still wrong for some of our atlases" ? How do you diagnose that ?

Member

@bthirion bthirion left a comment

The changes LGTM

@Remi-Gau
Collaborator Author

Can you clarify what you mean with " some of the region names are still wrong for some of our atlases" ? How do you diagnose that ?

Sorry that was a reference to a message above. Copying the important bit below.

For the destrieux atlas, some regions are dropped (values 42 and 117 are dropped: see masker.labels_). However, when checking masker.region_names_, you see that both regions are reported to be part of the region signals that were extracted.

However 3 regions that should be in masker.region_names_ are not:

  • (148, 'R S_temporal_inf')
  • (149, 'R S_temporal_sup')
  • (150, 'R S_temporal_transverse')

@Remi-Gau
Collaborator Author

Actually, I was also checking what the reports would look like with these "misnamed" regions, and it turns out that report generation fails with several atlases:

  • fetch_atlas_destrieux_2009
  • fetch_atlas_aal
  • fetch_atlas_basc_multiscale_2015
  • fetch_atlas_yeo_2011
  File "/home/remi/github/nilearn/nilearn/../tmp.py", line 80, in main
    report = masker.generate_report()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/remi/github/nilearn/nilearn/nilearn/maskers/nifti_labels_masker.py", line 350, in generate_report
    return generate_report(self)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/remi/github/nilearn/nilearn/nilearn/reporting/html_report.py", line 236, in generate_report
    return _create_report(estimator, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/remi/github/nilearn/nilearn/nilearn/reporting/html_report.py", line 241, in _create_report
    overlay, image = _define_overlay(estimator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/remi/github/nilearn/nilearn/nilearn/reporting/html_report.py", line 164, in _define_overlay
    displays = estimator._reporting()
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/remi/github/nilearn/nilearn/nilearn/maskers/nifti_labels_masker.py", line 387, in _reporting
    self._check_mismatch_labels_regions(label_values, tolerant=False)
  File "/home/remi/github/nilearn/nilearn/nilearn/maskers/nifti_labels_masker.py", line 333, in _check_mismatch_labels_regions
    raise ValueError(msg)
ValueError: Mismatch between the number of provided labels (151) and the number of regions in provided label image (149).

@Remi-Gau
Collaborator Author

OK, just checked: the report generation problem is not due to the latest release, as I could reproduce it with older versions (at least 0.10.1, have not checked further).

Will open an issue for this one as well, even if it is related to the problem mentioned above.

Development

Successfully merging this pull request may close these issues.

[BUG] NiftiLabelsMasker transform() error with Schaefer atlas
4 participants