[ENH] integrate load_confounds into first_level_from_bids #4103
Conversation
👋 @Remi-Gau Thanks for creating a PR! Until this PR is ready for review, you can include the [WIP] tag in its title, or leave it as a github draft. Please make sure it is compliant with our contributing guidelines. In particular, be sure it checks the boxes listed below.
For new features:
For bug fixes:
We will review it as quickly as possible; feel free to ping us with questions if needed.
nilearn/_utils/bids.py
Had to move `create_bids_filename` to a different module to help avoid circular imports.
Some of the changes in here are black-related and may conflict with #3285.
```python
models, m_imgs, m_events, m_confounds = first_level_from_bids(
    dataset_path=bids_path,
    task_label="main",
    space_label="MNI",
    img_filters=[("desc", "preproc")],
    slice_time_ref=None,
    confounds_strategy=("motion", "wm_csf", "scrub"),
    confounds_motion="full",
    confounds_wm_csf="basic",
    confounds_scrub=1,
    confounds_fd_threshold=0.2,
    confounds_std_dvars_threshold=3,
)
```
This is what the "API" could look like to select some confounds: does it look sensible?
I guess this could be made lighter if we expect to have predefined configurations for confounds, but I don't think that we're at that point.
At least I find the current API quite explicit.
Actually, we might get rid of `confounds_strategy`, because the next three arguments are redundant with the provided list?
Note this test just shows a subset of the possible arguments that can be passed to `load_confounds`.
> I guess this could be made lighter if we expect to have predefined configurations for confounds, but I don't think that we're at that point.

> Actually, we might get rid of `confounds_strategy`, because the next three arguments are redundant with the provided list?
Actually I am just reusing the API from `load_confounds`, so it can almost be passed as is, and we can let `load_confounds` do the argument validation:
https://nilearn.github.io/dev/modules/generated/nilearn.interfaces.fmriprep.load_confounds.html
In short, `strategy` defines what type of confounds to include, and all the other parameters give more details on how to include them.
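To make the relationship concrete, here is a minimal sketch (illustrative only, not nilearn code — the mapping and helper are hypothetical): detail keyword arguments only matter when their confound family appears in the `strategy` tuple.

```python
# Illustrative sketch (not nilearn code): each entry in `strategy`
# switches on a family of confounds, and a matching keyword argument
# refines how that family is included. This mapping is hypothetical.
DETAIL_PARAMS = {"motion": "motion", "wm_csf": "wm_csf", "scrub": "scrub"}

def relevant_details(strategy, **kwargs):
    """Keep only the detail kwargs whose confound family was requested."""
    return {
        param: kwargs[param]
        for family, param in DETAIL_PARAMS.items()
        if family in strategy and param in kwargs
    }

# `scrub=1` is dropped because "scrub" is absent from the strategy tuple.
details = relevant_details(
    ("motion", "wm_csf"), motion="full", wm_csf="basic", scrub=1
)
```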
Will update the docstring to try to explain this, so we can see if it makes sense.
One thing we may want to check, and send a warning about, is the compatibility between some confound strategies and first-level arguments, especially regarding high-pass filters.
```python
confounds, metadata = get_legal_confound()
confounds.to_csv(
    confounds_path, sep="\t", index=None, encoding="utf-8"
)
with open(confounds_path.with_suffix(".json"), "w") as f:
    json.dump(metadata, f)
```
One issue with this approach is that the "legal confounds" have a set number of time points, so when creating a fake BIDS dataset we end up with images whose number of time points does not match the number of time points in the confounds.
This does not affect any tests AFAICT, but it may lead to confusing errors when testing down the line.
You mean, because of scrubbing? Sorry if I'm missing something obvious.
Nope, let me try to rephrase.

The way we used to generate "fake confounds" for the fake fMRIPrep datasets used in testing would only create 6 confounds for the realignment parameters, filled with random data, for a specified number of time points.

To allow testing `load_confounds` we need more realistic confounds, with more columns whose names match what's in an actual fMRIPrep dataset.

To do this I reuse the strategy used to test the `load_confounds` functions: take an actual confound file from an fMRIPrep dataset and copy its content every time it is needed in the fake fMRIPrep dataset.

But this "template" confound file has only a limited number of time points, so we end up with fake fMRIPrep datasets that have nifti images with 100 volumes but confounds with only 30 time points.

Possible solutions:
- easy: set the number of volumes to match the number of time points in the confounds
- hard(er): adapt the content of the confounds to the number of time points

Hope this is clearer.
For now I will go for the easy solution, but we may have to implement the harder option in the future if we want to test more "exotic" stuff.
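The easy option could be sketched like this (illustrative only; the real test helpers such as `get_legal_confound` are not reproduced here, and the template confounds are stand-in random data): the number of volumes of the fake image is derived from the confounds table rather than hard-coded independently.

```python
import numpy as np

# Stand-in for the template confounds file copied from a real fMRIPrep
# dataset: 30 time points, 6 columns (values are random placeholders).
rng = np.random.default_rng(0)
template_confounds = rng.normal(size=(30, 6))

# "Easy" solution: the fake BOLD image takes its number of volumes
# (last axis) from the confounds, so the two can never disagree.
n_volumes = template_confounds.shape[0]
fake_bold = np.zeros((4, 4, 4, n_volumes))
```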
Let's go for the easy one. The number of volumes should be a parameter of the data simulation function anyhow?
For now it is not: it is hard-coded. I would keep it that way until we need more flexibility during testing.
But I will change the place where it is hard-coded so it is easier to adapt in the future, and add a comment to explain why this value was chosen.
```python
with open(confounds_path.with_suffix(".json"), "w") as f:
    json.dump(metadata, f)
```
Minor change: using legal confounds allows adding metadata files for the confounds in the fake BIDS derivatives. Some tests had to be changed to account for this.
Codecov report:

```
@@            Coverage Diff             @@
##             main    #4103      +/-   ##
==========================================
+ Coverage   91.85%   91.86%   +0.01%
==========================================
  Files         145      146       +1
  Lines       16360    16384      +24
  Branches     3424     3432       +8
==========================================
+ Hits        15027    15051      +24
+ Misses        792      788       -4
- Partials      541      545       +4
```
Thought: should this functionality be demoed in the examples?
```rst
kwargs: :obj:`dict`

    .. added:: 0.11.0

    Keyword arguments to be passed to functions called within this function.

    Kwargs prefixed with ``confound_``
    will be passed to :func:`~nilearn.interfaces.fmriprep.load_confounds`.
    This allows ``first_level_from_bids`` to return
    a specific set of confounds by relying on the confound loading strategies
    defined in :func:`~nilearn.interfaces.fmriprep.load_confounds`.
    If no kwargs are passed, ``first_level_from_bids`` will return
    all the confounds available in the confounds TSV files.
```
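The prefix routing the docstring describes could look roughly like this (hypothetical helper, not nilearn's actual implementation): kwargs carrying the confound prefix are stripped of it and forwarded, the rest are kept for other callees.

```python
# Hypothetical helper (not the actual nilearn implementation): route
# kwargs carrying the confound prefix to load_confounds, keep the rest.
def split_confound_kwargs(kwargs, prefix="confound_"):
    confound_kwargs = {
        key[len(prefix):]: value
        for key, value in kwargs.items()
        if key.startswith(prefix)
    }
    other_kwargs = {
        key: value
        for key, value in kwargs.items()
        if not key.startswith(prefix)
    }
    return confound_kwargs, other_kwargs

confound_kwargs, other_kwargs = split_confound_kwargs(
    {"confound_strategy": ("motion",), "confound_motion": "full", "verbose": 1}
)
```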
@bthirion let me know if this helps clarify how to use this.
I prefer to add examples here and refer users to the doc of `load_confounds` for the details, to avoid doc duplication.
Agreed!
Is this ready for review?
I would say yes, though I need to better check how to handle the second TODO mentioned in the top post of the PR.
LGTM overall!
```rst
    .. code-block:: python

        models, m_imgs, m_events, m_confounds = first_level_from_bids(
```
Suggested change:

```diff
- models, m_imgs, m_events, m_confounds = first_level_from_bids(
+ models, imgs, events, confounds = first_level_from_bids(
```

(not sure what the `m_` means)
That was copy-pasted from the code we have in the tests: we could probably change it there too.
FYI those were shorter forms of the names of the return arguments:
- models_run_imgs
- models_events
- models_confounds
So maybe, even if this makes the doc more verbose, I should use their full names to be internally consistent in the docstring?
I would prefer `models, imgs, events, confounds = first_level_from_bids`
@bthirion I renamed those variables in the docstrings.
They also appear like this in a few tests; I could do a bit of renaming there too, just for internal consistency.
Looking good!
I'm not against demoing this in an example. I would just consider the amount of additional build time, whether it is already well documented (which the added docstring does very well), and whether parts of an existing example can be replaced or minimally tweaked to improve them using this new functionality.
I will do a draft in a separate PR, but I was considering a minimal tweak to an already existing example.
Co-authored-by: Yasmin <63292494+ymzayek@users.noreply.github.com>
"Conflict" to resolve:

- `load_confounds`: when the compcor strategy is requested, "high_pass" must be requested as well.
- GLM first level: `drift_model` can be cosine, polynomial, or none.

There are some combinations of the above that could lead to "strange" design matrices.
To keep things simple I would say that we let the GLM machinery handle the high-pass filtering setting and ignore in `load_confounds` anything that has to do with high-pass filtering. This means that we should also ignore (for now) the compcor strategy.
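To see why the two high-pass mechanisms overlap: both the "high_pass" confounds and a cosine drift model amount to a discrete cosine basis over the scan duration. A minimal sketch (the cutoff convention is an assumption for illustration, not nilearn's or fMRIPrep's exact implementation):

```python
import numpy as np

# Build cosine (DCT) drift regressors covering periods longer than
# `cutoff` seconds. Both the GLM cosine drift model and fMRIPrep's
# cosine_XX confounds are bases of this kind, hence the redundancy.
def cosine_basis(n_scans, t_r, cutoff=128.0):
    """DCT regressors with periods longer than `cutoff` seconds."""
    duration = n_scans * t_r
    n_regressors = int(np.floor(2 * duration / cutoff))
    timesteps = np.arange(n_scans)
    return np.column_stack(
        [
            np.cos(np.pi * (timesteps + 0.5) * k / n_scans)
            for k in range(1, n_regressors + 1)
        ]
    )

# 100 volumes at TR = 2 s -> 200 s of data -> 3 cosine regressors.
drift = cosine_basis(n_scans=100, t_r=2.0, cutoff=128.0)
```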
@bthirion before I start on this, can you tell me if the brief description above roughly makes sense as to what the problem is?
You mean, the problem of duplication of high-pass filtering with the use of compcor? My view on this is that this is the user's responsibility. You can't prevent people from doing wrong things. The point is to help them diagnose issues easily using a visualization of the design matrix they created. Does that answer your concern?
You can't prevent people from doing wrong things, but you can add more friction to make it harder for them to do so. 😋 I think my views may be a bit more "paternalistic" (more tainted by automated pipelines and avoiding options that would be considered wrong in most cases), but you have way more experience than me in the "nilearn philosophy", so I will gladly follow your lead on this. I will at least add some explicit warnings to tell users they have chosen options that may be redundant or clash with each other.
Agreed. My impression is that when we impose constraints, we may not foresee all practical consequences.
Currently we do safeguard the compcor and high pass in
```diff
- models, m_imgs, m_events, m_confounds = first_level_from_bids(
+ models, imgs, events, confounds = first_level_from_bids(
```
renamed the variables as mentioned in the PR discussion
```python
if drift_model is not None and kwargs_load_confounds is not None:
    if "high_pass" in kwargs_load_confounds.get("strategy"):
        if drift_model == "cosine":
            verb = "duplicate"
        if drift_model == "polynomial":
            verb = "conflict with"

        warn(
            f"""Confounds will contain a high pass filter,
            that may {verb} the {drift_model} one used in the model.
            Remember to visualize your design matrix before fitting your model
            to check that your model is not overspecified.""",
```
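As a standalone sketch of this check (simplified from the diff above; the function name `check_high_pass` is hypothetical), the warning can be exercised with the `warnings` module:

```python
import warnings

# Simplified, self-contained version of the compatibility check: warn
# when load_confounds would add a high-pass filter on top of the GLM's
# own drift model.
def check_high_pass(drift_model, kwargs_load_confounds):
    if drift_model is None or kwargs_load_confounds is None:
        return
    if "high_pass" in kwargs_load_confounds.get("strategy", ()):
        verb = "duplicate" if drift_model == "cosine" else "conflict with"
        warnings.warn(
            f"Confounds will contain a high pass filter "
            f"that may {verb} the {drift_model} one used in the model."
        )

# Trigger and capture the warning for a cosine drift model.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_high_pass("cosine", {"strategy": ("motion", "high_pass")})
```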
Not sure about the phrasing of the warning.
looks reasonable to me
To me too.
LGTM, thx.
To me too.
If I don't hear back from anyone I will merge this on Monday.
LGTM, thanks for implementing this!
Integrate `load_confounds` into `first_level_from_bids` #4090

Changes proposed in this pull request:

- `confounds_*` arguments are passed

TODO