Extend method BIDSPath().match() to enable smart path search #1098

Closed
moritz-gerster opened this issue Nov 14, 2022 · 10 comments · Fixed by #1103


@moritz-gerster
Contributor

moritz-gerster commented Nov 14, 2022

Describe the problem

I love the BIDSPath().match() method for returning all BIDS paths in my root directory as a list. I can then loop over this list to, for example, preprocess all my files.

However, I often want to filter that list and loop over only some subjects or some tasks. Unfortunately, this is not possible at the moment.

Describe your solution

One could add all the keyword arguments from the BIDSPath class to the match() method. The allowed kwarg types should include both strings (as for BIDSPath) and lists of strings.

If I have subjects=["sub-01", "sub-02", "sub-03"] and I want to loop over the first two, I could get

paths = BIDSPath().match(subject=["sub-01", "sub-02"])

or

paths = BIDSPath().match(task="Rest")

or

paths = BIDSPath().match(task=["Rest", "Move"])
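As a rough sketch of the requested behaviour (not mne-bids code), the snippet below filters BIDS-style basenames by entity values that may be either a string or a list of strings. The helpers `parse_entities` and `match_entities`, the `ENTITY_KEYS` mapping, and the example filenames are all hypothetical and exist only for illustration. Values are given without the `sub-` prefix.

```python
# Hypothetical helpers for illustration only; not part of mne-bids.
# Maps the long keyword names used in this thread to BIDS filename keys.
ENTITY_KEYS = {"subject": "sub", "session": "ses", "task": "task", "run": "run"}


def parse_entities(fname):
    """Split a BIDS-style basename into a {key: value} dict."""
    stem = fname.split(".")[0]
    return dict(part.split("-", 1) for part in stem.split("_") if "-" in part)


def match_entities(fnames, **filters):
    """Keep basenames whose entities satisfy every filter.

    Each filter value may be a single string or a list of strings.
    """
    matched = []
    for fname in fnames:
        entities = parse_entities(fname)
        for name, wanted in filters.items():
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if entities.get(ENTITY_KEYS[name]) not in allowed:
                break  # one filter failed, skip this file
        else:
            matched.append(fname)  # no filter failed
    return matched


# Made-up filenames, assumed for the example.
fnames = [
    "sub-01_task-Rest_ieeg.vhdr",
    "sub-02_task-Move_ieeg.vhdr",
    "sub-03_task-Rest_ieeg.vhdr",
]
first_two = match_entities(fnames, subject=["01", "02"])
rest_only = match_entities(fnames, task="Rest")
both = match_entities(fnames, subject=["01", "02"], task="Rest")
```

Normalising a single string to a one-element list keeps the string case and the list case on the same code path.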

Describe possible alternatives

One could also consider using the ignore kwargs from mne_bids.get_entity_vals():

paths = BIDSPath().match(ignore_subjects=["sub-03"])

Additional context

No response

@moritz-gerster
Contributor Author

`BIDSPath.entities` object. Ignores ``.json`` files.

I also have a use case for not ignoring JSON files. I would like to save my FOOOF fit data as JSON files and get the paths using the match method. However, I understand if this is not supported, as BIDS does not define FOOOF derivatives.

@agramfort
Member

agramfort commented Nov 19, 2022 via email

@moritz-gerster
Contributor Author

Yes exactly @agramfort.

If I want to analyze only certain files, for example

subjects = ["01", "02", "03"]
tasks = ["Rest", "Move"]
sessions = ["MedOn", "MedOff"]

it would be nice if

paths = []
bids_path = BIDSPath(root=root)
for subject in subjects:
    bids_path.update(subject=subject)
    for session in sessions:
        bids_path.update(session=session)
        for task in tasks:
            paths.extend(bids_path.update(task=task).match())

would reduce to

paths = BIDSPath(root=root).match(task=tasks, subject=subjects, session=sessions)
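Until something like that exists, one way to flatten the nested loop is `itertools.product`. This is only a sketch: it builds the entity combinations with the standard library, and the commented-out lines show how each combination could be fed to the existing `update()` and `match()` calls from the loop above (assuming `root` is defined as there).

```python
from itertools import product

subjects = ["01", "02", "03"]
tasks = ["Rest", "Move"]
sessions = ["MedOn", "MedOff"]

# One dict per (subject, session, task) combination, in the same order
# as the nested loops above (the last factor varies fastest).
combos = [
    dict(subject=subject, session=session, task=task)
    for subject, session, task in product(subjects, sessions, tasks)
]

# With mne-bids installed, each combination can replace one pass of the loop:
# bids_path = BIDSPath(root=root)
# paths = []
# for combo in combos:
#     paths.extend(bids_path.update(**combo).match())
```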

@agramfort
Member

agramfort commented Nov 19, 2022 via email

@moritz-gerster
Contributor Author

That would solve the issue 🙂

@agramfort
Member

agramfort commented Nov 19, 2022 via email

@moritz-gerster
Contributor Author

mne-bids/mne_bids/path.py

Lines 817 to 830 in 1b04da8

# allow searching by datatype
# all other entities are filtered below
if self.datatype is not None:
    search_str = f'*/{self.datatype}/*'
else:
    search_str = '*.*'
paths = self.root.rglob(search_str)
# Only keep files (not directories), and omit the JSON sidecars.
paths = [p for p in paths
         if p.is_file() and p.suffix != '.json']
fnames = _filter_fnames(paths, suffix=self.suffix,
                        extension=self.extension,
                        **self.entities)

I don't understand lines 817-822. Why do we search for datatype separately instead of filtering it together with all the other self.entities in _filter_fnames?

@agramfort
Member

agramfort commented Nov 20, 2022 via email

@moritz-gerster
Contributor Author

I now understand this.

_filter_fnames only considers the basenames, not the full path (hence "_filter_fnames" and not "_filter_bidspaths"). The datatype is not present in the basename, only in the directory part of the path, so it must be filtered earlier, at the glob step.

bids_path.fpath = 'rawdata/sub-EL002/ses-EcogLfpMedOff02/ieeg/sub-EL002_ses-EcogLfpMedOff02_task-Rest_acq-StimOff_run-1_proc-cleaned_rec-TMSi_channels.tsv'

-> f'*/{self.datatype}/*'

fname = 'sub-EL002_ses-EcogLfpMedOff02_task-Rest_acq-StimOff_run-1_proc-cleaned_rec-TMSi_channels.tsv'
-> does not contain datatype=ieeg
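The same point can be checked with the standard library alone. The sketch below uses `pathlib.PurePosixPath` on the example path above, so it runs without touching the file system. It is not the mne-bids implementation, only an illustration of where the datatype lives.

```python
from pathlib import PurePosixPath

fpath = PurePosixPath(
    "rawdata/sub-EL002/ses-EcogLfpMedOff02/ieeg/"
    "sub-EL002_ses-EcogLfpMedOff02_task-Rest_acq-StimOff_run-1_"
    "proc-cleaned_rec-TMSi_channels.tsv"
)

basename = fpath.name             # what a basename-only filter gets to see
datatype_dir = fpath.parent.name  # the datatype is the parent directory name

# The datatype never appears in the basename...
datatype_in_basename = datatype_dir in basename
# ...but a glob pattern that includes the directory does match the full path.
matches_glob = fpath.match(f"*/{datatype_dir}/*")
```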

@moritz-gerster
Contributor Author

great ! could open a PR to do this? thx

Yes, I'm working on it.
