Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add function to enable smart path search #1098 #1103

Merged
merged 22 commits into from Nov 22, 2022

Conversation

moritz-gerster
Copy link
Contributor

@moritz-gerster moritz-gerster commented Nov 21, 2022

PR Description

Fixes #1098. The added function find_matching_paths returns all found bids_paths similar to BIDSPath().match(). However, in this function, it is possible to filter for several entities which can be passed as lists. For example

bids_paths = find_matching_paths(root, subjects=["01", "02", "03"], sessions="MedOn", tasks=["Rest", "Move"])

Since this function is very similar to BIDSPath().match() I added the private functions _fnames_to_bidspaths and _return_root_paths and refactored the match() method to avoid code repetition. Also note, that the private function _filter_fnames was extended to enable passing lists of strings as input arguments.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"

@welcome
Copy link

welcome bot commented Nov 21, 2022

Hello! 👋 Thanks for opening your first pull request here! ❤️ We will try to get back to you soon. 🚴🏽‍♂️


regexp = (
leading_path_str +
sub_str + ses_str + task_str + acq_str + run_str + proc_str +
rec_str + space_str + split_str + desc_str + suffix_str + ext_str
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I fixed a small bug. "space" must be searched before "recording", according to their order in the basename.

@hoechenberger
Copy link
Member

Thanks for the contribution! I merged main, let's see how CI like this 🤓

@codecov
Copy link

codecov bot commented Nov 21, 2022

Codecov Report

Merging #1103 (2e000ba) into main (b97ce0b) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1103      +/-   ##
==========================================
+ Coverage   95.24%   95.28%   +0.03%     
==========================================
  Files          24       24              
  Lines        3873     3900      +27     
==========================================
+ Hits         3689     3716      +27     
  Misses        184      184              
Impacted Files Coverage Δ
mne_bids/path.py 97.70% <100.00%> (+0.08%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@hoechenberger
Copy link
Member

All green! I don't have time to review, but maybe @sappelhoff, @agramfort, or @adam2392 can find a couple minutes :)

@hoechenberger
Copy link
Member

@moritz-gerster We'll definitely need a changelog entry though :) in whats_new.rst

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need to add the new function to the API docs, and I'd be very happy to see it in action in any one of our existing examples.

Thanks for your contribution, Moritz!

@@ -44,6 +44,7 @@ Detailed list of changes
- Add :meth:`mne_bids.BIDSPath.find_matching_sidecar` to find the sidecar file associated with a given file path by `Eric Larson`_ (:gh:`1093`)
- When writing data via :func:`~mne_bids.write_raw_bids`, it is now possible to specify a custom mapping of :class:`mne.Annotations` descriptions to event codes via the ``event_id`` parameter. Previously, passing this parameter would always require to also pass ``events``, and using a custom event code mapping for annotations was impossible, by `Richard Höchenberger`_ (:gh:`1084`)
- Improve error message when :obj:`~mne_bids.BIDSPath.fpath` cannot be uniquely resolved by `Eric Larson`_ (:gh:`1097`)
- Add :func:`mne_bids.find_matching_paths` to get all bids paths that match the requirements by :newcontrib:`Moritz Gerster` (:gh:`1103`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct @hoechenberger? In the mne_bids contribution guide it is not described how to add lines so I followed the mne contributing guide.

Here I didn't add my name because I guess this works automatically?

The following authors contributed for the first time. Thank you so much! 🤩

* ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this works automatically

unfortunately not :-)

I followed the mne contributing guide.

it's a bit different here. You just add your name to the list you quoted above, and we don't have the sphinx role ":new_contrib:", so you can just follow the style of previous changelog entries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I checked the old versions of whats_new.rst and understood that you don't have :newcontrib: as in mne.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want the function to be available as mne_bids.find_matching_paths, for example, you'd need to make additions here:

from mne_bids import commands
from mne_bids.report import make_report
from mne_bids.path import (BIDSPath, get_datatypes, get_entity_vals,
print_dir_tree, get_entities_from_fname,
search_folder_for_text, get_bids_path_from_fname)
from mne_bids.read import get_head_mri_trans, read_raw_bids
from mne_bids.utils import get_anonymization_daysback
from mne_bids.write import (make_dataset_description, write_anat,
write_raw_bids, mark_channels,
write_meg_calibration, write_meg_crosstalk,
get_anat_landmarks, anonymize_dataset)
from mne_bids.sidecar_updates import update_sidecar_json, update_anat_landmarks
from mne_bids.inspect import inspect_dataset
from mne_bids.dig import (template_to_head, convert_montage_to_ras,
convert_montage_to_mri)

for example look at BIDSPath in that file --> if it weren't where it is, you would always need to do mne_bids.path.BIDSPath.

And for the API docs, you need to add it here (provided you followed the above):

mne-bids/doc/api.rst

Lines 19 to 22 in b97ce0b

.. autosummary::
:toctree: generated/
write_raw_bids

Thanks @sappelhoff! Added with 1bc097e.

@moritz-gerster
Copy link
Contributor Author

We also need to add the new function to the API docs, and I'd be very happy to see it in action in any one of our existing examples.

@sappelhoff, how do I add the function to the API docs? I have never done that before. I thought this happens automatically via the docstring.

@sappelhoff
Copy link
Member

if you want the function to be available as mne_bids.find_matching_paths, for example, you'd need to make additions here:

from mne_bids import commands
from mne_bids.report import make_report
from mne_bids.path import (BIDSPath, get_datatypes, get_entity_vals,
print_dir_tree, get_entities_from_fname,
search_folder_for_text, get_bids_path_from_fname)
from mne_bids.read import get_head_mri_trans, read_raw_bids
from mne_bids.utils import get_anonymization_daysback
from mne_bids.write import (make_dataset_description, write_anat,
write_raw_bids, mark_channels,
write_meg_calibration, write_meg_crosstalk,
get_anat_landmarks, anonymize_dataset)
from mne_bids.sidecar_updates import update_sidecar_json, update_anat_landmarks
from mne_bids.inspect import inspect_dataset
from mne_bids.dig import (template_to_head, convert_montage_to_ras,
convert_montage_to_mri)

for example look at BIDSPath in that file --> if it weren't where it is, you would always need to do mne_bids.path.BIDSPath.

And for the API docs, you need to add it here (provided you followed the above):

mne-bids/doc/api.rst

Lines 19 to 22 in b97ce0b

.. autosummary::
:toctree: generated/
write_raw_bids

doc/whats_new.rst Outdated Show resolved Hide resolved
@moritz-gerster
Copy link
Contributor Author

moritz-gerster commented Nov 22, 2022

I have a question. What do you think about extending this proposed function to enable ignoring certain entity values?

Suggestion:

def find_matching_paths(root, subjects=None, sessions=None, tasks=None,
                        acquisitions=None, runs=None, processings=None,
                        recordings=None, spaces=None, splits=None,
                        descriptions=None, suffixes=None, extensions=None,
                        datatypes=None, ignore_subjects=None,
                        ignore_sessions=None, ignore_tasks=None,
                        ignore_runs=None, ignore_processings=None,
                        ignore_spaces=None, ignore_acquisitions=None,
                        ignore_splits=None, ignore_descriptions=None,  # ignore_modalities=None, # <- what's this?
                        ignore_datatypes=None, check=True):
    """Get list of all matching paths for all matching entity values.

...

    Returns
    -------
    bids_paths : list of mne_bids.BIDSPath
        The matching paths.

    """
    if datatypes is None and ignore_datatypes is not None:
        datatypes = get_entity_vals(root, 'datatype',
                                    ignore_datatypes=ignore_datatypes)
    fpaths = _return_root_paths(root, datatype=datatypes, ignore_json=False)

    if subjects is None and ignore_subjects is not None:
        subjects = get_entity_vals(root, 'subject',
                                   ignore_subjects=ignore_subjects)
    if sessions is None and ignore_sessions is not None:
        sessions = get_entity_vals(root, 'session',
                                   ignore_sessions=ignore_sessions)
    if tasks is None and ignore_tasks is not None:
        tasks = get_entity_vals(root, 'task', ignore_tasks=ignore_tasks)
    if runs is None and ignore_runs is not None:
        runs = get_entity_vals(root, 'run', ignore_runs=ignore_runs)
    if processings is None and ignore_processings is not None:
        processings = get_entity_vals(root, 'processing',
                                      ignore_processings=ignore_processings)
    if spaces is None and ignore_spaces is not None:
        spaces = get_entity_vals(root, 'space', ignore_spaces=ignore_spaces)
    if acquisitions is None and ignore_acquisitions is not None:
        acquisitions = get_entity_vals(root, 'acquisition',
                                       ignore_acquisitions=ignore_acquisitions)
    if splits is None and ignore_splits is not None:
        splits = get_entity_vals(root, 'split', ignore_splits=ignore_splits)
    if descriptions is None and ignore_descriptions is not None:
        descriptions = get_entity_vals(root, 'description',
                                       ignore_descriptions=ignore_descriptions)

    fpaths_filtered = _filter_fnames(fpaths,
                                     subject=subjects,
                                     session=sessions,
                                     task=tasks,
                                     acquisition=acquisitions,
                                     run=runs,
                                     processing=processings,
                                     recording=recordings,
                                     space=spaces,
                                     split=splits,
                                     description=descriptions,
                                     suffix=suffixes,
                                     extension=extensions)

Reason:

In the current implementation it is somewhat tedious to avoid certain entities. For example, if I want to avoid the subjects "emptyroom" and "03" and I want all tasks except for "Rest" and I don't need the ".tsv" files, I have to write

subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

bids_paths = find_matching_paths(root, subjects=["01", "02", "04", "05", "06", "07, "08", "09", "10", "11", "emptyroom"],
                                                      tasks=["Hold", "Move", "Dance", "Read"])

If the study is ongoing and they collect more subjects they would have to manually edit "subjects" many times.

With the new function this would become

bids_paths = find_matching_paths(root, ignore_subjects=['03', 'emptyroom'], ignore_tasks='Rest')

What do you think? I haven't tested the code but I think it would work. If both "subjects" and "ignore_subjects" is provided, "ignore_subjects" would be ignored.

mne_bids/path.py Outdated
@@ -1951,8 +1951,8 @@ def _filter_fnames(fnames, *, subject=None, session=None, task=None,
if split else r'(|_split-([^_]+))')
desc_str = (r'_desc-(' + '|'.join(description) + ')'
if description else r'(|_desc-([^_]+))')
suffix_str = (r'_(' + '|'.join(suffixe) + ')' if suffix
else (r'_(' + '|'.join(ALLOWED_FILENAME_SUFFIX) + ')'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the function _filter_fnames() only returns paths with valid BIDS suffixes even if the two mother functions find_matching_paths() and BIDSPath.match() have the keyword argument check=False.

In theory, one could add the argument check=False to _filter_fnames(check=False) and then code

    if suffix:
        suffix_str = r'_(' + '|'.join(suffix) + ')'
    elif suffix is None and check:
        # only return files with suffixes that are valid in BIDS
        suffix_str = (r'_(' + '|'.join(ALLOWED_FILENAME_SUFFIX) + ')'))
    elif suffix is None and not check:
        suffix_str = r'_([^_]+)'

However, this seems overkill to me since the BIDS conformity is checked afterwards using _fnames_to_bidspaths(check=check).

@sappelhoff
Copy link
Member

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

A good reason to advertise that function in an example with the new find_matching_paths 😉 (are you going to expand one of the existing examples to somehow use your new function? --> this would also create a link from the API docs to that example, which is very useful for people to see your feature in action)

Personally, I find your example

subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)

straight forward. On the other hand having "ignore" keywords in the function signature makes it veeeery long and messy 🤔

@moritz-gerster
Copy link
Contributor Author

Personally, I find your example

subjects = get_entity_vals(root, "subject", ignore_subjects=['03', 'emptyroom'])
tasks = get_entity_vals(root, "task", ignore_tasks='Rest')
bids_paths = find_matching_paths(root, subjects=subjects, tasks=tasks)

straight forward. On the other hand having "ignore" keywords in the function signature makes it veeeery long and messy 🤔

Ok, I agree :) Let's keep it as it is

@moritz-gerster
Copy link
Contributor Author

This becomes really tedious for users who don't know this function (I don't think everyone does), they might have to write

A good reason to advertise that function in an example with the new find_matching_paths 😉 (are you going to expand one of the existing examples to somehow use your new function? --> this would also create a link from the API docs to that example, which is very useful for people to see your feature in action)

I have looked through the examples and I didn't find a good place to insert it. Because if I exchange a BIDSPath().match() example with this function, then .match() is no longer documented. However, if I add the function and show that it yields the same, the doc becomes longer and contains more than just the most important code snippets. But I guess this would be ok?

I'll have a look again. Let me know if you find a good place to use it.

@moritz-gerster
Copy link
Contributor Author

Ok, I think I found a good example. I'll push it soon.

# Note that this is the same as running:
session = 'off'
bids_path = BIDSPath(root=bids_root, session=session, datatype=datatype)
print(bids_path.match(ignore_json=True))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we print the same list twice to show that the result is the same or should we comment out

# print(bids_path.match(ignore_json=True))

and just say that this would yield the same?

I went for printing it twice to make it as obvious as possible.

@moritz-gerster
Copy link
Contributor Author

moritz-gerster commented Nov 22, 2022

I'm done working on it @sappelhoff :)

I added two examples.

Co-authored-by: Richard Höchenberger <richard.hoechenberger@gmail.com>
Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add yourself to this file, please? Just below Denis

- given-names: Denis

thanks!

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy with this, great work @moritz-gerster.

+1 to merge once CIs are green

@hoechenberger hoechenberger merged commit 5a3e3b1 into mne-tools:main Nov 22, 2022
@welcome
Copy link

welcome bot commented Nov 22, 2022

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

@hoechenberger
Copy link
Member

Thanks @moritz-gerster!

@moritz-gerster
Copy link
Contributor Author

Thank you two for all your effort!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend method BIDSPath().match() to enable smart path search
3 participants