[MRG, API] Start of automatic methods section with create_methods_paragraph #457

Merged
merged 30 commits into mne-tools:master on Jul 30, 2020

Conversation

adam2392
Member

@adam2392 adam2392 commented Jun 22, 2020

PR Description

Addresses preliminarily: #347

A summary of MEEG data (MEG, EEG, iEEG):
I don't want this to be too much of a headache to start with, so I figured that the easiest, most robust summary we can provide is modularized as follows:

  1. dataset description: subject, session, kinds, and dataset_description.json file
  2. participants.tsv file summary per subject: age, sex, and hand, as supported in mne-bids. Note this file is only RECOMMENDED. I put a lot of effort into getting this to work without making the report look crazy ugly for now, because I figured this is one of the most crucial summaries every study should have; but since its structure is not strictly imposed by BIDS, it's hard to summarize consistently.
  3. modality-agnostic-summary: per session scans, and their length, sfreq, channel counts, etc.
  4. modality-specific-summary: adding iEEG channel counts (e.g. SEEG, ECoG, etc.). Similarly, I suppose if someone wants to add MEG/EEG, it can be a relatively short summary here. (tabled to future)
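
As a rough illustration of item 2, a participants.tsv summary could look something like the sketch below. This is not the code in this PR: the helper name `summarize_participants` is hypothetical, and the sketch assumes the file has `age`, `sex`, and `hand` columns with `n/a` marking missing values.

```python
import csv
import io
import statistics
from collections import Counter


def summarize_participants(tsv_text):
    """Summarize age, sex and hand columns of a participants.tsv (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    ages = []
    n_unknown_age = 0
    for row in rows:
        try:
            ages.append(float(row.get("age", "n/a")))
        except (TypeError, ValueError):
            # "n/a", empty, or missing values count as unknown.
            n_unknown_age += 1
    summary = {
        "n_participants": len(rows),
        "sex": Counter(row.get("sex", "n/a") for row in rows),
        "hand": Counter(row.get("hand", "n/a") for row in rows),
        "n_unknown_age": n_unknown_age,
    }
    if ages:
        summary["age_min"] = min(ages)
        summary["age_max"] = max(ages)
        summary["age_mean"] = round(statistics.mean(ages), 2)
        # Population std is an assumption; a sample std (statistics.stdev) is equally defensible.
        summary["age_std"] = round(statistics.pstdev(ages), 2)
    return summary


# Made-up example content, not taken from any of the datasets in this PR.
example = (
    "participant_id\tage\tsex\thand\n"
    "sub-01\t25\tM\tR\n"
    "sub-02\t31\tF\tn/a\n"
    "sub-03\tn/a\tF\tL\n"
)
print(summarize_participants(example))
```

Because the file is only RECOMMENDED, a caller would presumably skip this step entirely when participants.tsv is absent.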

TODO:

  • Update convert_group_studies to use create_methods_paragraph.
  • Have maintainers first review the groundwork to make sure this is in the right direction.
  • Add documentation to the docstrings
  • Add a summary function for channels.tsv and sidecar.json files for kind=ieeg data (idk how to add this for 'eeg' or 'meg', so I would prefer someone else add that functionality)
  • Add REQUIRED elements from dataset_description.json and the reference/DOI for mne-bids.
  • Add example outputs from OpenNeuro datasets (i.e. iEEG)
  • How to deal w/ emptyroom subjects? I don't work w/ these, so skipping them for now. Assuming this is okay, I added a XXX to the inline comments for someone else to fix.
  • How to add MEG/iEEG/EEG specific data, such as "Gradiometers" and "Magnetometers" for MEG data?
  • Adding scan-level summary even without the *_scans.tsv files, which is considered "RECOMMENDED" and not "REQUIRED".
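
For the channels.tsv item in the list above, a per-file channel count could be sketched as follows. The helper `count_channels` is hypothetical and not the function added in this PR; the sketch assumes the BIDS `type` column is present and treats a missing or `n/a` `status` as good.

```python
import csv
import io
from collections import Counter


def count_channels(tsv_text):
    """Count channel types and good/bad status in a channels.tsv (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    types = Counter(row["type"] for row in rows)
    # "status" is optional; anything other than "bad" is counted as good here.
    n_bad = sum(1 for row in rows if (row.get("status") or "good").lower() == "bad")
    return {
        "n_channels": len(rows),
        "types": dict(types),
        "n_bad": n_bad,
        "n_good": len(rows) - n_bad,
    }


# Made-up example content for illustration only.
example = (
    "name\ttype\tunits\tstatus\n"
    "LA1\tSEEG\tV\tgood\n"
    "LA2\tSEEG\tV\tbad\n"
    "G1\tECOG\tV\tgood\n"
    "MEG0111\tMEGMAG\tT\tgood\n"
)
print(count_channels(example))
```

Counting by `type` would also cover the MEG question above, since magnetometers and gradiometers carry distinct type labels in channels.tsv.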

Example Output from Local/OpenNeuro datasets
These are the datasets I ran the method generation w/:

  1. local_dataset I have, that will get put onto openneuro.
  2. ds001779
  3. ds002778
  4. ds002904
  5. ds000246 (https://github.com/OpenNeuroDatasets/ds000246)
  6. ds000248 (https://github.com/OpenNeuroDatasets/ds000248)
  7. ds000117 (https://github.com/OpenNeuroDatasets/ds000117)
  8. ds001810 (https://github.com/OpenNeuroDatasets/ds001810)
  9. ds001971 (https://github.com/OpenNeuroDatasets/ds001971)
  10. somato

Ran datasets w/ the following code:

methods_paragraph = create_methods_paragraph(bids_root)
print(methods_paragraph)

See output on PR here: ./report.txt.

Merge checklist

Maintainer, please confirm the following before merging:

  • All comments resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • PR description includes phrase "closes <#issue-number>"
  • Commit history does not contain any merge commits

@adam2392 adam2392 changed the title Automethods [WIP, API] Start of automatic methods section with create_methods_paragraph Jun 22, 2020
@jasmainak
Member

nice, can you share an example paragraph generated by this report?

@jasmainak
Member

maybe update the ds000117 example?

@adam2392
Member Author

maybe update the ds000117 example?

Done! lmk what you think.

nice, can you share an example paragraph generated by this report?

It is copied into the PR description.

@codecov-commenter

codecov-commenter commented Jun 23, 2020

Codecov Report

Merging #457 into master will decrease coverage by 1.03%.
The diff coverage is 83.06%.


@@            Coverage Diff             @@
##           master     #457      +/-   ##
==========================================
- Coverage   93.70%   92.66%   -1.04%     
==========================================
  Files          11       13       +2     
  Lines        1762     1950     +188     
==========================================
+ Hits         1651     1807     +156     
- Misses        111      143      +32     
Impacted Files                          Coverage Δ
mne_bids/commands/mne_bids_report.py    0.00% <0.00%> (ø)
mne_bids/write.py                       96.73% <ø> (ø)
mne_bids/report.py                      91.17% <91.17%> (ø)
mne_bids/config.py                      95.55% <100.00%> (+0.10%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2798b3...d357e9d. Read the comment docs.

@jasmainak
Member

jasmainak commented Jun 23, 2020

There are 2 datasets (364.69 +/- 116.31 seconds) with sampling rates 1000.0 (n=1), 999.0 (n=1).

I don't get this part.

Can you make the text almost like a copy-paste for publication? I would also add information about:

  1. Manufacturer of device (e.g., Vectorview)
  2. Number of channels
  3. Filtering information
  4. Sampling frequency

what's the output of pybids for the same dataset? hopefully not the same?

@jasmainak
Member

This is the output for ds000117:

The dataset consists of 1 patients with 1 sessions (01) consisting of 1 kinds of data (meg). The dataset consists of 1 subjects (10.0 +/- 0.0; 1 right ; 1 male ). There are 6 datasets (447.33 +/- 106.20 seconds) with sampling rates 1100.0 (n=6)

This seems off. There aren't "6 datasets". We should test on a few datasets to ensure it gives something reasonable. I would pick 10 ephys datasets at random from openneuro and try on them.

@adam2392
Member Author

There are 2 datasets (364.69 +/- 116.31 seconds) with sampling rates 1000.0 (n=1), 999.0 (n=1).
Can you make the text almost like a copy-paste for publication?

What would be the copy-paste version of this part? Do you have a specific structure in mind?

  1. Manufacturer of device (e.g., Vectorview)
  2. Number of channels
  3. Filtering information
  4. Sampling frequency

By filtering info, I suppose you mean the SoftwareFilters in sidecar.json? 1/2/4 can be added.

what's the output of pybids for the same dataset? hopefully not the same?

pybids doesn't give any output at all. See the output pasted in the corresponding issue of this PR. I think it's due to the fact that the pybids-report still only supports nifti files.

@adam2392
Member Author

This is the output for ds000117:
This seems off. There aren't "6 datasets". We should test on a few datasets to ensure it gives something reasonable. I would pick 10 ephys datasets at random from openneuro and try on them.

https://mne.tools/mne-bids/stable/auto_examples/convert_group_studies.html#sphx-glr-auto-examples-convert-group-studies-py

There are 6 .fif files tho, so aren't there 6 datasets?

@agramfort
Member

@adam2392 please paste here what you obtain on various datasets to get a feeling of how it reads. Thx

@jasmainak
Member

There are 6 .fif files tho, so aren't there 6 datasets?

I think they are 6 runs, not datasets. To me, a dataset is everything inside bids_root.
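
To make the run-versus-dataset distinction concrete, here is a minimal sketch of how such a sentence could be assembled from per-run values while calling them scans. The function name `describe_scans` and the input numbers are made up for illustration; this is not the template used in the PR.

```python
import statistics
from collections import Counter


def describe_scans(durations, sfreqs):
    """Build one summary sentence from per-scan durations (s) and sampling rates (Hz)."""
    n_scans = len(durations)
    mean = statistics.mean(durations)
    # Population std is an assumption of this sketch.
    std = statistics.pstdev(durations)
    counts = Counter(sfreqs)
    rates = ", ".join(
        f"{rate} (n={count})" for rate, count in sorted(counts.items(), reverse=True)
    )
    return (
        f"There are {n_scans} scans ({mean:.2f} +/- {std:.2f} seconds) "
        f"with sampling rates {rates}."
    )


# Illustrative values only: two scans of 300 s and 420 s, both sampled at 1000 Hz.
print(describe_scans([300.0, 420.0], [1000.0, 1000.0]))
```

With this wording the unit being counted is each recording file under bids_root, and "dataset" stays reserved for the whole bids_root.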

@adam2392
Member Author

A few thoughts:

  1. participants.tsv summary is usually desired (from all pubs I've seen and written), but it also is difficult to achieve consistency since it is a RECOMMENDED file + format, so optionally, we should be able to summarize report w/o it.
  2. the BIDS spec version is currently 1.4.0, so that can be updated in the code
  3. MEG_TEMPLATE and EEG_TEMPLATE can be pretty easily added at the end of create_methods_paragraph(). If someone wants to add that in, the naive thing to do would be just grab different channels from the sidecar.json file I suppose?

@jasmainak
Member

@adam2392 can you put this comment in the PR description?

@adam2392
Member Author

@adam2392 can you put this comment in the PR description?

@jasmainak Besides the current breaking of the bids-validator, I was wondering if you could lmk how the direction currently feels.

Some notes:

  • I can fix rounding very easily to two decimal places.
  • We could add a dynamic check on participants.tsv file since participants is always a desired summary. But currently, we have no way of determining if the file complies with how we assume the formatting to be (e.g. M vs male vs man etc.). Idk how desirable this is, versus just "switching off the summary of participants because we lack control".
  • I added the total # of scans (e.g. total # runs summed across entire bids_root), but didn't update the summaries yet.
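
One way to picture the dynamic check mentioned in the second note is a small alias table that maps free-form entries onto a fixed vocabulary and falls back to "unknown". The mapping `SEX_ALIASES` and the helper `normalize_sex` below are hypothetical, not part of mne-bids, and the alias list is only an assumption about what datasets contain.

```python
# Hypothetical alias table; real datasets may use other spellings.
SEX_ALIASES = {
    "m": "male",
    "male": "male",
    "man": "male",
    "f": "female",
    "female": "female",
    "woman": "female",
}


def normalize_sex(value):
    """Map a free-form participants.tsv sex entry to 'male', 'female', or 'unknown'."""
    if value is None:
        return "unknown"
    return SEX_ALIASES.get(value.strip().lower(), "unknown")


for raw in ["M", "male", "Man ", "F", "n/a", None]:
    print(repr(raw), "->", normalize_sex(raw))
```

Anything outside the table lands in "unknown", so the report can still be generated without branching on every possible spelling.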

@jasmainak
Member

"switching off the summary of participants because we lack control".

not sure I understand this point. But I would say, don't add too much code complexity and branching. It will make life harder for future developers.

can you add a couple more examples in the description? At least 6 or 7 in total? Just to get a sense of how the methods paragraph might be useful/handy for researchers. I'll ask a couple of my colleagues to provide feedback on what else might be useful to include.

mne_bids/report.py (outdated, resolved)
@jasmainak
Member

Dataset was created with BIDS version 1.2.2
using MNE-BIDS

We don't really know if it was MNE-BIDS? If so, we should leave it out

@adam2392
Member Author

Dataset was created with BIDS version 1.2.2
using MNE-BIDS

We don't really know if it was MNE-BIDS? If so, we should leave it out

Is this still true in the context of #460 ?

mne_bids/config.py (outdated, resolved)
mne_bids/report.py (outdated, resolved)
mne_bids/report.py (outdated, resolved)
@jasmainak
Member

@adam2392 take a look at the datasets tested in MNE-study-template. Would you mind posting the description for these datasets as well? If we have a substantial number (around 10), we can start to see if this looks good. And then in the study template, you could add a line to add the generated paragraph to the MNE report using add_htmls_to_section so there is an additional layer of testing for MNE-BIDS.

@adam2392
Member Author

adam2392 commented Jun 27, 2020

@adam2392 take a look at the datasets tested in MNE-study-template. Would you mind posting the description for these datasets as well? If we have a substantial number (around 10), we can start to see if this looks good. And then in the study template, you could add a line to add the generated paragraph to the MNE report using add_htmls_to_section so there is an additional layer of testing for MNE-BIDS.

Okay, so I took those datasets, ran them locally, and pasted the results into the PR description. Also, FYI, some of the descriptions aren't up to date w/ some of the minor changes, but I don't want to re-download the datasets locally and redo them (the differences are mainly spelling or grammar issues, or logic in the template string itself, which have since been updated). If this is absolutely needed, I can re-download, but it takes a bit of time on my current home internet :p.

I don't know what you mean by adding it into the study template. Do you mean in the CircleCI config of mne-study-template? If so, which file? Logistically, do I simply add mne-bids to the install, generate the report using this PR (once it's merged), and then call add_htmls_to_section, passing in the mne-bids generated report?

A problem I had w/ a large dataset
_summarize_sidecars and _summarize_channels make the summary generation very slow for larger datasets (e.g. I have 102 patients). What's the opinion on parallelization?
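
As a sketch of what parallelization could look like here, assuming the slowdown comes mostly from reading many small files: the per-file work is mapped over a thread pool from the standard library. The helpers `read_sidecar` and `summarize_sidecars_parallel` are hypothetical stand-ins, not the private functions named above, and the file names and values in the demo are made up.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_sidecar(path):
    """Load one sidecar JSON file and pull out the fields to summarize."""
    with open(path, encoding="utf-8") as fid:
        sidecar = json.load(fid)
    return sidecar.get("SamplingFrequency"), sidecar.get("PowerLineFrequency")


def summarize_sidecars_parallel(paths, max_workers=4):
    """Read sidecars concurrently; results keep the order of ``paths``."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_sidecar, paths))


# Tiny self-contained demo with two made-up sidecar files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for idx, sfreq in enumerate([1000.0, 2400.0], start=1):
        path = Path(tmp) / f"sub-{idx:02d}_task-rest_ieeg.json"
        path.write_text(
            json.dumps({"SamplingFrequency": sfreq, "PowerLineFrequency": 60}),
            encoding="utf-8",
        )
        paths.append(path)
    results = summarize_sidecars_parallel(paths)
print(results)
```

Threads mainly help when the work is I/O-bound; if parsing dominates, a process pool would be the alternative, at the cost of the extra complexity raised earlier in this thread.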

@agramfort
Member

@adam2392 please paste here examples of results so we don't all need to test ourselves to give you more feedback. Any dataset is fine but the more the better. we just need to read some examples.

@adam2392
Member Author

almost good to go from my end. Can you make the PR title to MRG when all green? thx @adam2392

Not sure why the docs are failing... Do you know if it's something I can fix?

mne_bids/report.py (outdated, resolved)
Co-authored-by: Stefan Appelhoff <stefan.appelhoff@mailbox.org>
@agramfort
Member

agramfort commented Jul 21, 2020 via email

@adam2392
Member Author

One issue I'm having is... is there a way one can "add" Template objects together? It would make handling participants string easier.
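
For what it's worth, `string.Template` does not define `+`, but each instance exposes its raw pattern as the `.template` attribute, so two templates can be combined by concatenating those strings and wrapping the result in a new `Template`. A minimal sketch, with made-up template names and text rather than the ones in this PR:

```python
from string import Template

# Hypothetical templates for illustration.
DATASET_TEMPLATE = Template("The dataset consists of $n_subjects participants")
PARTICIPANTS_TEMPLATE = Template(" (ages ranged from $min_age to $max_age).")

# Concatenate the raw template strings, then substitute once.
combined = Template(DATASET_TEMPLATE.template + PARTICIPANTS_TEMPLATE.template)
text = combined.substitute(n_subjects=16, min_age=23.0, max_age=31.0)
print(text)
```

The other option is to call `substitute` on each template separately and join the resulting strings, which makes it easy to drop the participants part when participants.tsv is missing.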

@jasmainak
Member

please go ahead and incorporate @hoechenberger and @sappelhoff 's suggestions :-) I am lagging behind ...

@adam2392
Member Author

Okay I added in the feedback and I think things are all good to go.

@agramfort
Member

here is what I get

(base) alex@:mne-bids(automethods)$ mne_bids report --bids_root ~/mne_data/ds000117/
Summarizing participants.tsv /Users/alex/mne_data/ds000117/participants.tsv...
Summarizing scans.tsv files [PosixPath('/Users/alex/mne_data/ds000117/sub-13/ses-meg/sub-13_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-14/ses-meg/sub-14_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090506/sub-emptyroom_ses-20090506_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090601/sub-emptyroom_ses-20090601_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090515/sub-emptyroom_ses-20090515_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20091208/sub-emptyroom_ses-20091208_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090409/sub-emptyroom_ses-20090409_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20091126/sub-emptyroom_ses-20091126_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090518/sub-emptyroom_ses-20090518_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-emptyroom/ses-20090511/sub-emptyroom_ses-20090511_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-15/ses-meg/sub-15_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-12/ses-meg/sub-12_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-08/ses-meg/sub-08_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-01/ses-meg/sub-01_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-06/ses-meg/sub-06_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-07/ses-meg/sub-07_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-09/ses-meg/sub-09_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-10/ses-meg/sub-10_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-11/ses-meg/sub-11_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-16/ses-meg/sub-16_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-05/ses-meg/sub-05_ses-meg_scans.tsv'), 
PosixPath('/Users/alex/mne_data/ds000117/sub-02/ses-meg/sub-02_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-03/ses-meg/sub-03_ses-meg_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000117/sub-04/ses-meg/sub-04_ses-meg_scans.tsv')]...
The participant template found: comprised of 9 men and 7 women;
handedness were all unknown; ages ranged from 23.0 to 31.0 (mean = 26.38, std = 2.76; 1 with unknown age)
------------------------------------ REPORT ------------------------------------
The Multisubject, multimodal face processing dataset was created with BIDS
version 1.0.2 by Wakeman, DG, and Henson, RN. This report was generated with
MNE-BIDS (https://doi.org/10.21105/joss.01896). The dataset consists of 16
participants (comprised of 9 men and 7 women; handedness were all unknown; ages
ranged from 23.0 to 31.0 (mean = 26.38, std = 2.76; 1 with unknown age))and 2
recording sessions: meg, and mri. Data was recorded using a MEG system
(Elekta/Neuromag manufacturer) sampled at 1100 Hz with line noise at 50 Hz using
SpatialCompensation. There were 104 scans in total. For each dataset, there were
on average 404.0 (std = 0.0) recording channels per scan, out of which 404.0
(std = 0.0) were used in analysis (0.0 +/- 0.0 were removed from analysis).
(base) alex@:mne-bids(automethods)$ mne_bids report --bids_root ~/mne_data/ds000248
Summarizing scans.tsv files [PosixPath('/Users/alex/mne_data/ds000248/sub-01/sub-01_scans.tsv')]...
------------------------------------ REPORT ------------------------------------
The ds000248 dataset was created with BIDS version 1.2 by Alexandre Gramfort,
and Matti S Hämäläinen. This report was generated with MNE-BIDS
(https://doi.org/10.21105/joss.01896). The dataset consists of 1 participants
(). Data was recorded using a MEG system (Elekta manufacturer) sampled at 600.61
Hz with line noise at 60 Hz. There was 1 scan in total. For each dataset, there
were on average 376.0 (std = 0.0) recording channels per scan, out of which
374.0 (std = 0.0) were used in analysis (2.0 +/- 0.0 were removed from
analysis).
(base) alex@:mne-bids(automethods)$ mne_bids report --bids_root ~/mne_data/ds000246
Summarizing participants.tsv /Users/alex/mne_data/ds000246/participants.tsv...
Summarizing scans.tsv files [PosixPath('/Users/alex/mne_data/ds000246/sub-emptyroom/sub-emptyroom_scans.tsv'), PosixPath('/Users/alex/mne_data/ds000246/sub-0001/sub-0001_scans.tsv')]...
The participant template found: sex were all unknown;
handedness were all unknown; ages ranged from 25.0 to 25.0 (mean = 25.0, std = 0.0; 1 with unknown age)
------------------------------------ REPORT ------------------------------------
The MEG-BIDS Brainstorm data sample dataset was created with BIDS version 1.0.2
by Elizabeth Bock, Peter Donhauser, Francois Tadel, Guiomar Niso, and Sylvain
Baillet. This report was generated with MNE-BIDS
(https://doi.org/10.21105/joss.01896). The dataset consists of 1 participants
(sex were all unknown; handedness were all unknown; ages ranged from 25.0 to
25.0 (mean = 25.0, std = 0.0; 1 with unknown age)). Data was recorded using a
MEG system (CTF manufacturer) sampled at 2400 Hz with line noise at 60 Hz using
SpatialCompensation with parameters 3rd GradientOrder. There were 3 scans in
total. Recording durations ranged from 360 to 360 seconds (mean = 360.0, std =
0.0), for a total of 720 seconds of data recorded over all scans. For each
dataset, there were on average 340.0 (std = 0.0) recording channels per scan,
out of which 340.0 (std = 0.0) were used in analysis (0.0 +/- 0.0 were removed
from analysis).

@jasmainak
Member

Looks pretty fair!

Member

@jasmainak jasmainak left a comment

I'm +1 on merging this!

0.5 automation moved this from In progress to Reviewer approved Jul 22, 2020
@jasmainak
Member

We can keep tweaking more in future PRs but this is good for a start.

@agramfort
Member

ok for you @hoechenberger ?

@adam2392
Member Author

Any remaining changes needed here? Don't want this to go stale and we forget what happened :p.

@sappelhoff
Member

I can't make enough time to review this right now, sorry :|

@agramfort
Member

agramfort commented Jul 30, 2020 via email

@sappelhoff
Member

maybe we merge this and improve later?

+1, I hate stale PRs and love iterations :-)

@jasmainak jasmainak merged commit 5c7ed7e into mne-tools:master Jul 30, 2020
0.5 automation moved this from Reviewer approved to Done Jul 30, 2020
@jasmainak
Member

Merged, thanks a ton @adam2392 !!! This is great :-)

@hoechenberger
Member

Argh, I was just working on a review… but yes, let's iterate, then! :)

@jasmainak
Member

jasmainak commented Jul 30, 2020 via email

@adam2392 adam2392 deleted the automethods branch August 27, 2020 19:14
6 participants