Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] allow read_raw_bids to take additional input parameters #244

Closed
wants to merge 12 commits into from

Conversation

sappelhoff
Copy link
Member

closes #203

Merge checklist

Maintainer, please confirm the following before merging:

  • All comments resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • PR description includes phrase "closes <#issue-number>"
  • Commit history does not contain any merge commits

@sappelhoff sappelhoff marked this pull request as ready for review July 22, 2019 08:58
@sappelhoff sappelhoff changed the title allow read_raw_bids to take additional input parameters [MRG] allow read_raw_bids to take additional input parameters Jul 22, 2019
@codecov-io
Copy link

codecov-io commented Jul 22, 2019

Codecov Report

Merging #244 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #244      +/-   ##
==========================================
+ Coverage   98.41%   98.43%   +0.02%     
==========================================
  Files          15       15              
  Lines        1573     1597      +24     
==========================================
+ Hits         1548     1572      +24     
  Misses         25       25
Impacted Files Coverage Δ
mne_bids/tests/test_read.py 100% <100%> (ø) ⬆️
mne_bids/read.py 99.26% <100%> (+0.09%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2547049...d94e7d3. Read the comment docs.

@jasmainak
Copy link
Member

I don't quite understand this PR. Shouldn't BIDS structure tell you where these additional files are? You shouldn't have to expose additional arguments for this.

if ext in ['.con', '.sqd']:
read_params = dict(elp=opt.electrode, hsp=opt.hsp, hpi=opt.hpi)
elif ext == '.pdf':
read_params = dict(config_fname=opt.config, head_shape_fname=opt.hsp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't have to specify them. It should come from BIDS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gets passed to _read_raw so it's not yet in BIDS AFAIK.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the write utility of the CLI ... so the data is not yet in BIDS and has to be supplied "manually"

parameters of MNE-Python's read_raw_* function for the data format you
intend to read. For example, when reading fif data you can specify
`read_params` as {allow_maxshield=False}. Non-fitting parameters will
be ignored. Not all parameters are currently supported. If None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dangerous route!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100. Don't silently ignore things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am raising Userwarning in verbose mode ... should I

  1. always raise a warning?
  2. always throw an error?
  3. ... do something else? what?

The keys of the dictionary must correspond with potential input
parameters of MNE-Python's read_raw_* function for the data format you
intend to read. For example, when reading fif data you can specify
`read_params` as {allow_maxshield=False}. Non-fitting parameters will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't bother adding a new argument just for this. Let's do use-case driven software engineering. Ask people around you to try mne-bids and raise issues. It will give you a sense of gratification but also give a sense of priority.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully agreed. Also besides allow_maxshield all the other params contain info that should be present in the BIDS sidecar files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my response: #244 (comment)

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I would have relied on something like

read_raw_foo(fname, ..., **read_params)

if you have a parameter that is not valid you get a clear Python exception

if ext in ['.con', '.sqd']:
read_params = dict(elp=opt.electrode, hsp=opt.hsp, hpi=opt.hpi)
elif ext == '.pdf':
read_params = dict(config_fname=opt.config, head_shape_fname=opt.hsp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this gets passed to _read_raw so it's not yet in BIDS AFAIK.

The keys of the dictionary must correspond with potential input
parameters of MNE-Python's read_raw_* function for the data format you
intend to read. For example, when reading fif data you can specify
`read_params` as {allow_maxshield=False}. Non-fitting parameters will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fully agreed. Also besides allow_maxshield all the other params contain info that should be present in the BIDS sidecar files.

parameters of MNE-Python's read_raw_* function for the data format you
intend to read. For example, when reading fif data you can specify
`read_params` as {allow_maxshield=False}. Non-fitting parameters will
be ignored. Not all parameters are currently supported. If None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+100. Don't silently ignore things.

@sappelhoff
Copy link
Member Author

I don't quite understand this PR

I prepared this PR to target #203

In #203 I argued that our private function _read_raw supports digitization files as inputs ... but these are not exposed to its "parent read_raw_bids.

That is what I wanted to do with this PR (no more, no less).

@sappelhoff
Copy link
Member Author

Shouldn't BIDS structure tell you where these additional files are? You shouldn't have to expose additional arguments for this.

I agree, we should support that in the future. Let me talk only about MEG first:

It will be a major effort: We have to rely on these BIDS files:

and from these files, we have to extract the data that we need for a given data format:

  • fif --> only allow_maxshield should be a problem (rest contained in fif file)
  • bti --> needs headshape_fname
  • ctf --> should be no problem (all contained in .ds directory)
  • kit --> needs mrk, elp, hsp

(what about itab and kriss by the way? It seems like we are not supporting them at the moment? see BIDS MEG formats)

so we need to write reading functions for the BIDS files ... and then code to ...

  1. get the relevant data from the BIDS files for a given data format
  2. enrich the raw data of that format with the additional data (headshape, etc.)

I would not take this on currently, but rather wait for a feature request or user issue.

This present PR however provides an intermediate solution, until the "real deal" is implemented.

@sappelhoff
Copy link
Member Author

One last comment: I would separate this discussion between MEG and EEG, because the situation seems much simpler in EEG.

in EEG, we have electrodes.tsv with x,y,z coordinates and coordsystem.json with info on how to interpret the electrodes.tsv

The coordsystem.json also contains coordinates for LPA, RPA, and Nasion.

Reading these would be quite simple, and if we have LPA, RPA, Nasion, and the electrode positions with respect to these three landmarks, we can easily convert this into a Neuromag coordsystem as used in FIF, and we'll get rid of all problems :-)

@agramfort
Copy link
Member

FYI there is current a rewamping of the digitization / montage mechanism by @massich

the hsp, elp etc... should be handled outside of the read raw function. Vision is to be able to read them outside and then call a set_montage method on the raw.

@sappelhoff
Copy link
Member Author

FYI there is current a rewamping of the digitization / montage mechanism by @massich

yes, I really like that - do we know something about the timescale when this will be finished?

@agramfort
Copy link
Member

agramfort commented Jul 24, 2019 via email

@agramfort
Copy link
Member

@sappelhoff I would simplify this code. See something like this:

read_params = dict(elp=..., hsp=..., )
read_raw_bla(fname, **read_params)

by doing this no need to check for ignored params. It's easy to document as you need to point to mne-python documentation etc.

@jasmainak
Copy link
Member

It will be a major effort: We have to rely on these BIDS files:
*_coordsystem.json
*_headshape.pos
note: this is also in the coordsystem section
note 2: the headshape business is a bit unclear: there is an ongoing discussion about it, which has been open for a long time; community interest seems low

yes, but if not a GSoC, when will we undertake such a major effort? :) Then again, I think you should focus your efforts on the study template so you are the user and you figure out what are the issues. We have so far not hit this issue in the study template, or have we?

and from these files, we have to extract the data that we need for a given data format:
fif --> only allow_maxshield should be a problem (rest contained in fif file)

agreed.

bti --> needs headshape_fname

No, your headshape.pos should do the job

ctf --> should be no problem (all contained in .ds directory)
kit --> needs mrk, elp, hsp

These follow the BIDS naming specifications, e.g., sub-control01_ses-001_task-rest_run-01_markers.<mrk,sqd> as in the specification example.

so we need to write reading functions for the BIDS files

you'd simply read in a text file, so that should not be too difficult? Based on the link you posted, it seems to me that you just need to write a reader for CTF. We can ignore KRISS for now and for the rest you already have the digitizer info in the files themselves.

This present PR however provides an intermediate solution, until the "real deal" is implemented.

hmm ... maybe, but I'm not too enthusiastic about this solution.

@sappelhoff
Copy link
Member Author

yes, but if not a GSoC, when will we undertake such a major effort? :)

fair enough :-)

hen again, I think you should focus your efforts on the study template so you are the user and you figure out what are the issues.

ok

We have so far not hit this issue in the study template, or have we?

we hit it once, when we wanted to plot sensors, but Alex then fixed it by changing an argument and it's fine now.

This present PR however provides an intermediate solution, until the "real deal" is implemented.

hmm ... maybe, but I'm not too enthusiastic about this solution.

okay, let's be pragmatic: Should I invest some more work and incorporate alex's feedback and then we merge, or should we close this?

@teonbrooks
Copy link
Member

drive by comment:

  • for read bids folders in, it should rely on the basic reader and the sidecar files.
  • for writing, hopefully the new digization refactoring will make it clean and straightforward for non-FIF system to load their dig points and a simple function and be added to the raw object to ultimately be written out to the sidecars.
  • I think the cosmetics things are fine to merge
  • I think the additional read_params param should be avoided (should be handled in other like above).

is there a way to set allow_maxshield with FIF objects without adding a param?

@sappelhoff
Copy link
Member Author

Thanks for weighing in @teonbrooks :-)

I think the cosmetics things are fine to merge

are there any cosmetic things though? The main refactoring of this PR is the introduction of the _get_read_params function, which is then used for the read_params param.

However, if we are not exposing read_params, I don't think we need the _get_read_params function.

Please advise:

@agramfort
Copy link
Member

agramfort commented Jul 29, 2019 via email

@massich
Copy link

massich commented Jul 29, 2019

yes, I really like that - do we know something about the timescale when this will be finished?

@sappelhoff this is my top priority right now. And the sooner I get done with it the better. Ping me anytime.

@teonbrooks
Copy link
Member

Again I would just add a read_params dict() param to read_raw_bids saying that these are kwargs that can be passed to mne read functions cf mne documentation

that makes sense. this is me playing the 4 year long game of mne-tools/mne-python#1930 😄

@jasmainak
Copy link
Member

Let me take over this PR for a moment, so that @sappelhoff you can concentrate on the study-template :) You can review this when you are bored.

mrk=hpi, preload=False)
mrk = _find_matching_sidecar(bids_fname, bids_root,
'_markers.???')
raw = io.read_raw_kit(raw_fpath, elp=None, hsp=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mne-bids only copies the mrk file but mne-python expects also elp and hsp to be provided. This tells me that the writing of this file is already broken. cc @sappelhoff

@jasmainak jasmainak changed the title [MRG] allow read_raw_bids to take additional input parameters [WIP] allow read_raw_bids to take additional input parameters Jul 30, 2019
@jasmainak
Copy link
Member

@agramfort I'm fine with adding **kwargs if you insist ... so @teonbrooks finally has what he wished for 4 years ago ;-)

But, how do you propose to deal with conflicting kwargs? E.g.: read_raw_edf has an argument called eog and then we also read the channel types from the BIDS sidecar files. If you don't expose this parameter, it's clear that the BIDS sidecar should be used to set the EOG channel. However, now there is a conflict.

Also, I feel doing this will hide some of the issues that you have in BIDS and mne-bids. For example, when I wrote the tests for reading the hsp files in the KIT reader, it's when I realized that all the files necessary are not written by mne-bids.

@sappelhoff I left some todos for you. I think the PR is back to you as I won't have more time :-) Sorry I made a bit of mess while I was playing around, feel free to add issues and revert my commits if that's more efficient.

@agramfort
Copy link
Member

agramfort commented Jul 30, 2019 via email

('y', y),
('z', z)])
_write_tsv(fname, data, overwrite=overwrite, verbose=verbose)
return fname
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is slightly problematic, because like this, we will write electrodes.tsv data that might originate from an idealized Montage as opposed to a measured DigMontage 🤔

for BIDS data, we only want to write electrode positions (x,y,z) if these have been measured/digitized ... not some template or idealized positions

cc @jasmainak

I'll have to figure out if it's possible in MNE to distinguish between a DigMontage and a Montage ... currently it seems like once they are set, they are treated equally.

Ideally, I would vote for only writing electrodes.tsv files if we know that

  • the electrode positions are MEASURED
  • we have NAS, PRA, LPA to scale the positions to FIFF space

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that? Does the BIDS specification make this mandate somewhere? I couldn't find it with a quick search ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think much consideration has been given toward template data in the digitization in the bids discussion. i think it would be nice to have this information but if the analysis ultimately rest on the template, the digpoint data should exist somewhere (in the electrode file or in the attached template). it would be incumbent on the researcher to describe it in their methods somehow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me either you have it described in your electrode file, if you have a digitization or in the template if you are using a template. (and the template structure should be a replica of electrode file).

Copy link
Member Author

@sappelhoff sappelhoff Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like researchers / data-providers should only provide measured data (=digitized, true sensor positions) ... template positions etc. should be provided by the analysis software.

Sharing template positions without an indicator that the positions in question are in fact just some "standard", "idealized" positions can easily lead to misunderstandings, where somebody thinks that these are true positions (...and this would lead to errors in source localization)

With this reasoning, a researcher would write an electrodes.tsv and accompanying coordsystem.json only if they actually have true positions ... else, they just omit this info and the analysis software can provide a template based on the data format.

Think about it this way: If researchers / data-providers also shared an electrodes.tsv for template positions for each dataset and datafile, we'd have a lot of duplication and clutter ... because the template positions are always the same, depending on the sensor name and manufacturer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is true for all kinds of sensors (templates can almost always be unambiguously guessed)

This is not true because if this were true, you would not need raw.set_montage method or montage argument in the raw readers. As we are practical people, I invite you to try writing code to unambiguously guess it ;-) If it cannot be guessed in 5 tries, then the burden of sharing the electrode locations should lie on the researcher sharing the data, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it cannot be guessed in 5 tries, then the burden of sharing the electrode locations should lie on the researcher sharing the data, no?

okay, fair enough --> I already agreed that this could/should be fixed in BIDS (by adding a key-vaul pair whether the electrode locations are template positions or digitized positions)

but not everybody is using BIDS, so we will also have to deal with this in MNE-Python. The suggestion by @larsoner seems good to me at a first glance (using BOTH his points)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just fyi, this is what Robert Oostenveld thinks about this: bids-standard/pyedf#7 (comment)

... the thread might be an interesting read for everybody working on digitization in MNE-Python

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me BIDS (MEG, EEG and iEEG) so far have always been on raw (i.e. measured and unprocessed) data, not on processed (template) or model data (like electrodes on a sphere). That might have had to be stressed more.

I would argue that this should be fixed in the BIDS specification not in MNE. If you share template positions, there should be clear indicator in the specification that it's a template. Ultimately, many researchers won't care as long as they can run their pipeline.

Agreed, I propose that the BIDS spec for raw data is updated to clarify that people should include measured data, not templates. Should that also be extended to template anatomical MRIs? I.e. would people right now also be including for example colin27 as anat/sub-01_T1w.nii, in case they don't have an anatomical MRI?

If you share the electrode locations only in some BIDS datasets, how is the researcher supposed to guess which template it belongs to?

That has been given explicit thought and is part of the specification: you SHOULD specify CapManufacturersModelName and EEGPlacementScheme. These are more precise than merely relying on electrode names, since cap designs vary between manufacturers and sometimes between Caucasian/Asian fit. If you (as experimenter) tell me what cap you used - I (as software developer) can provide you with the template - which I can also update and improve over the years. E.g. I might start with a spherical 10-20 template, then in a later version of the software update that for a more accurate one. Ideally EEG cap manufacturers would get themselves a digitizer, scan all of their cap designs and share those (as we are arguing for here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue for this on BIDS-Specification: bids-standard/bids-specification#318

for MNE-BIDS, we "solved" this in our last call, I summarize the solution here: #264

@sappelhoff
Copy link
Member Author

closing, because part of the work will be done in #263

and the rest of this PR can be trashed because it's not (yet) reliant on BIDS.

Feel free to re-open if somebody plans to work on this in the near future - but I won't have the time for this particular thing

@sappelhoff sappelhoff closed this Aug 23, 2019
@sappelhoff sappelhoff deleted the enh_reader branch March 22, 2020 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_raw_bids should read digitization files
8 participants