[MRG] Sleep tutorial #5718

Merged
merged 83 commits into from Jan 14, 2019

Slasnista
Contributor

@Slasnista Slasnista commented Nov 16, 2018

This PR adds a tutorial to illustrate how to analyze sleep data. (see #5684)

The purpose is to show how one can:

  1. download the data
  2. extract the raw signals and the annotations
  3. extract some features from the raw signals
  4. perform sleep stage classification with a random forest
  5. quantify the performance of the classifier

Todo

  • Make the data available
    • add fetcher
  • tutorial

Edited by @massich

@codecov

codecov bot commented Nov 16, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@e0df416).
The diff coverage is 97.51%.

@@            Coverage Diff            @@
##             master    #5718   +/-   ##
=========================================
  Coverage          ?   88.63%           
=========================================
  Files             ?      373           
  Lines             ?    69321           
  Branches          ?    11665           
=========================================
  Hits              ?    61440           
  Misses            ?     5030           
  Partials          ?     2851

@larsoner
Member

  1. Choose dataset name. sleep is a good choice here.
  2. Create a folder following our naming conventions. Since these are data we are providing, we can call it MNE-sleep-data.
  3. The structure is generally MNE-sleep-data/subject-name-or-id/raw_data.fmt so maybe MNE-sleep-data/anonymous/whatever.edf
  4. Add a version text file MNE-sleep-data/version.txt with 0.1 though we don't really use it yet
  5. tar czfv MNE-sleep-data.tar.gz MNE-sleep-data
  6. Upload to OSF
  7. Look at the "revisions" for the file, it will list the MD5sum
  8. Copy the datasets/sample directory over to datasets/sleep
  9. For other changes, take a look here https://github.com/mne-tools/mne-python/pull/5525/files
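
Steps 4, 5 and 7 of the list above can be sketched with the Python standard library. This is only an illustration: the helper name `build_archive` is hypothetical, the raw recording is not copied here, and the MD5 that OSF lists under "revisions" is the authoritative value. Gzip archives embed a timestamp, so the digest changes each time the archive is rebuilt.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def build_archive(root, name="MNE-sleep-data", version="0.1"):
    """Create <name>/version.txt under root, tar it, return (archive, md5)."""
    data_dir = Path(root) / name
    data_dir.mkdir(parents=True, exist_ok=True)
    # Step 4: version text file (the recording itself would also live here).
    (data_dir / "version.txt").write_text(version + "\n")

    archive = Path(root) / (name + ".tar.gz")
    # Step 5: similar in spirit to `tar czfv MNE-sleep-data.tar.gz MNE-sleep-data`.
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(data_dir), arcname=name)

    # Step 7: the checksum to compare against the one shown by the host.
    md5 = hashlib.md5(archive.read_bytes()).hexdigest()
    return archive, md5


with tempfile.TemporaryDirectory() as tmp:
    archive, md5 = build_archive(tmp)
    print(archive.name, md5)
```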

@agramfort
Member

here is what the example returns:

figure_6

saturday hacking session...

@agramfort
Member

@Slasnista @massich I just spent 1 hour writing a fetcher and cleaning up the example.

I think I did what needs a deep understanding of mne internals. You should be able to finish this PR without me.

thanks for your efforts

@Slasnista
Contributor Author

Hi,

I have just added a few lines of code to pre-process the annotations. This way:

  1. the annotations are given on 30s samples of signals which corresponds to the traditional way of annotating sleep stages
  2. "Sleep stage 3" and "Sleep stage 4" are merged into a single sleep stage to have annotations closer to the AASM rules currently used.

Should such steps be implemented in the read_annotation function or should we let the user choose to perform them ?

annotations = mne.read_annotations(hyp_fname)

##############################################################################
# preprocessing annotations
Contributor

@massich massich Dec 12, 2018

I don't understand all this pre-processing. What is its purpose?

if we only want to merge Sleep stage 3 and Sleep stage 4 we should be able to do so with a function.

And I don't understand the resampling problem. Is it only a problem of this example or an implementation error that we need to test and fix in mne?

Contributor Author

The problem comes from the format of annotations. Sleep stages are traditionally annotated over 30s of PSG signal (by both experts and algorithms). For this reason, it could be useful to output annotations that are already resampled to 30s segments of signal.

Regarding the merging of sleep stage 3 and 4, nowadays people generally work with 5 sleep stages instead of 6 and merge potential samples of label 'sleep stage 4' with samples of label 'sleep stage 3'.

A way to overcome the resampling problem might be to have a resampling parameter inside the read_annotations function that allows the user to get already resampled annotations. What do you think ?
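
The stage merging described in this comment can be sketched as a plain label mapping. This is a minimal illustration, not code from the PR: the merged label name and every example label other than "Sleep stage 3" and "Sleep stage 4" are assumptions.

```python
# Hypothetical name for the merged stage; the PR may use a different one.
MERGED_LABEL = "Sleep stage 3/4"


def merge_deep_sleep(descriptions):
    """Map 'Sleep stage 3' and 'Sleep stage 4' onto a single label."""
    mapping = {"Sleep stage 3": MERGED_LABEL, "Sleep stage 4": MERGED_LABEL}
    # Labels that are not in the mapping are passed through unchanged.
    return [mapping.get(d, d) for d in descriptions]


# Example labels; only stages 3 and 4 are quoted in the thread.
stages = ["Sleep stage W", "Sleep stage 2", "Sleep stage 3",
          "Sleep stage 4", "Sleep stage R"]
merged = merge_deep_sleep(stages)
print(merged)
```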

Contributor

I don't really like it. read_annotations should only read. If this type of analysis is always needed, then we should do some preprocessing or something.

This example also makes me question if we want to always read the annotations of a file. Maybe this is better:

psg_annotations = read_annotations(edf_file).to_psg()
raw = read_raw_edf(edf_file, annotations=psg_annotations)

or

psg_annotations = read_annotations(edf_file).to_psg()
raw = read_raw_edf(edf_file)
raw.set_annotations(psg_annotations)

instead of

raw = read_raw_edf(edf_file)
raw.annotations.to_psg(inplace=True)

Member

I don't really like it. read_annotations should only read.

Agreed

This example also makes me question if we want to always read the annotations of a file.

I think we should. We should avoid adding options / kwargs to read_raw_* to modify how things are read / represented. The job of the read_raw_* functions should ideally be to transparently read all data from disk in a given file, in as close to the storage format as possible. Then we can have other functions to modify Raw instances as necessary for use cases. (For example, the montage argument of read_raw_* should really not be there, either -- we should do raw.set_montage(...) instead.)

Of your three code snippets, this one looks cleanest to me:

raw = read_raw_edf(edf_file)
raw.annotations.to_psg(inplace=True)

But I think the following is better, since it's more explicit and avoids an otherwise redundant inplace kwarg:

raw = read_raw_edf(edf_file)
raw.set_annotations(raw.annotations.to_psg())

Member

I am -1 on the to_psg method. This adds some Sleep specific code to an Annotations object that should be agnostic.

I'd rather prefer a

df = annotations.to_dataframe()

and then you use df.resample or some other pandas code to do what you want.
Then you can recreate the annotations.
If it's a mess, maybe some simple numpy code can do the job?

annot = annot.resample('30s').ffill()
annot.reset_index(inplace=True)
annot.onset = annot.onset.dt.total_seconds()
annot["duration"] = 30.
Member

I see it now. That's a bit long but it's exactly what I had in mind. Maybe just make it a function so we know where
pandas kungfu ends.

@massich
Contributor

massich commented Dec 12, 2018 via email

@agramfort
Member

agramfort commented Dec 13, 2018 via email

@massich
Contributor

massich commented Dec 13, 2018

After giving it some thought, I don't think it's a good idea to modify the data representation
(aka, the raw.annotations) so that a process/consumer (aka, events_from_annotations) can produce
the expected data (aka, event onsets). A natural solution is to create another process/consumer
that transforms the data, and prepend it to the original process (pre-process the data).
The problem here is that this pre-processing step neither changes the data into another form nor
cleans it up. It is a hack to avoid touching the process/consumer.

Therefore events_from_annotations should do this task. Then the question becomes: what's
the difference between the new and the existing behavior, and how do we trigger one or the other?
The current behavior only returns the onsets of the annotations. So we could add a parameter
that returns as many onsets as fit (with a separation time) within the annotation duration.

@agramfort
Member

yes agreed. If you add a chunk_duration param to events_from_annotations then you can pass the annotations untouched and get the valid results. It also avoids the pandas kung fu.

we can do this in this PR or in the next PR ie merge the pandas kung fu for now as the main objective here is to reach a basic sleep scoring task using scikit-learn.
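
The chunking idea discussed above can be sketched in plain Python. This is not MNE's implementation: `chunked_events` is a hypothetical helper, annotations are represented as simple `(onset, duration, description)` tuples, and the sampling frequency and event codes are made up. The `(sample, 0, id)` layout follows the usual convention for MNE event arrays.

```python
def chunked_events(annotations, event_id, sfreq, chunk_duration=30.0):
    """Emit one event per chunk_duration that fits inside each annotation.

    annotations: iterable of (onset_s, duration_s, description) tuples.
    """
    events = []
    for onset, duration, description in annotations:
        if description not in event_id:
            continue  # ignore annotations we have no event code for
        # Only chunks that fit entirely within the annotation are kept.
        n_chunks = int(duration // chunk_duration)
        for k in range(n_chunks):
            sample = int(round((onset + k * chunk_duration) * sfreq))
            events.append((sample, 0, event_id[description]))
    return events


annotations = [(0.0, 90.0, "Sleep stage 2"), (90.0, 60.0, "Sleep stage 3")]
event_id = {"Sleep stage 2": 2, "Sleep stage 3": 3}
events = chunked_events(annotations, event_id, sfreq=100.0)
for event in events:
    print(event)
```

With this approach the annotations themselves stay untouched, and a 90 s annotation simply yields three 30 s events.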

@mmagnuski
Member

merge the pandas kung fu for now

you probably meant this:

@agramfort
Member

@Slasnista @massich I heavily simplified the code thanks to the chunk_duration parameter of events_from_annotations. No more pandas kung fu

kinds = ['%s (%s)' % (kind, sum(d.lower().startswith(kind)
for d in self.description))
for kind in kinds]
counter = collections.Counter(self.description)
Contributor

+1000
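
The hunk above appears to swap a prefix-matching count for `collections.Counter`. A small sketch of what that gives for a list of description strings; the labels and the summary format are illustrative assumptions, not the PR's exact output:

```python
import collections

# Hypothetical annotation descriptions.
descriptions = ["Sleep stage 2", "Sleep stage 3", "Sleep stage 2", "Sleep stage 4"]

# Counter tallies identical strings, so no prefix matching is needed.
counter = collections.Counter(descriptions)
summary = ", ".join("%s (%d)" % (kind, count)
                    for kind, count in sorted(counter.items()))
print(summary)
```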

@mmagnuski
Member

No more pandas kung fu

@Slasnista
Contributor Author

yes I have already done it.

Shall I extract the features from another record to evaluate the model properly ?

@agramfort
Member

agramfort commented Dec 21, 2018 via email

raw_train, events_train, event_id_train,
tmin=0., tmax=tmax, baseline=None)

tmax = 30. - 1. / raw_test.info['sfreq']  # tmax is included
Member

no need to duplicate this
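
The hunk above subtracts one sample period from 30 s because the end point `tmax` is included in the epoch. A short arithmetic sketch of why that matters; the helper name and the 100 Hz sampling frequency are assumptions for illustration:

```python
def n_samples_in_epoch(tmin, tmax, sfreq):
    """Number of samples when both tmin and tmax are included."""
    return int(round((tmax - tmin) * sfreq)) + 1


sfreq = 100.0  # assumed sampling frequency, for illustration only

# With tmax = 30 s the epoch holds one sample more than 30 s of signal.
too_long = n_samples_in_epoch(0.0, 30.0, sfreq)
# Subtracting one sample period gives exactly 30 s worth of samples.
exact = n_samples_in_epoch(0.0, 30.0 - 1.0 / sfreq, sfreq)
print(too_long, exact)
```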

prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
[line.set_color(color) for line, color in zip(ax.get_lines(), colors)]
plt.legend(list(epochs_train.event_id.keys()))
Member

Start by showing PSDs, as this explains why power features can be good predictors. Then you can move on to the code for classification. I would actually load a second subject here and not before, so the beginning is easier to follow.
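
To illustrate why band power can separate signals with different spectral content, here is a minimal sketch of relative band power. It is not the tutorial's code: it uses a naive periodogram from the standard library rather than Welch's method, and the band edges, function names and test signal are assumptions.

```python
import cmath
import math

# Assumed band edges in Hz, as half-open intervals [low, high).
FREQ_BANDS = {"delta": (0.5, 4.5), "theta": (4.5, 8.5), "alpha": (8.5, 11.5),
              "sigma": (11.5, 15.5), "beta": (15.5, 30.0)}


def periodogram(signal, sfreq):
    """Naive one-sided periodogram (O(n^2)); fine for short signals."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(centered))
        freqs.append(k * sfreq / n)
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def relative_band_power(signal, sfreq, bands=FREQ_BANDS):
    """Fraction of the total power that falls into each band."""
    freqs, power = periodogram(signal, sfreq)
    total = sum(power)
    if total == 0:
        return {name: 0.0 for name in bands}
    return {name: sum(p for f, p in zip(freqs, power) if lo <= f < hi) / total
            for name, (lo, hi) in bands.items()}


# Synthetic 2 s signal at 100 Hz: strong 10 Hz component, weak 2 Hz component.
sfreq = 100.0
signal = [math.sin(2 * math.pi * 10 * i / sfreq)
          + 0.2 * math.sin(2 * math.pi * 2 * i / sfreq) for i in range(200)]
rel = relative_band_power(signal, sfreq)
print({name: round(value, 3) for name, value in rel.items()})
```

A classifier fed with such per-band fractions sees a large "alpha" value for this signal and a small "delta" value, which is the kind of contrast PSD plots make visible.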

@agramfort
Member

FunctionTransformer was a great idea.

I would still need to do power ratios and maybe adjust welch parameters.

@Slasnista what do you think?

@massich
Contributor

massich commented Jan 14, 2019

I changed the tests to use pytest's tmpdir, which removes a couple of errors and gives a slightly better setup time

~/code/mne-python remotes/slasnista/sleep_tutorial*
(mne) ❯ pytest mne/datasets/sleep_physionet/tests -vv
Test session starts (platform: linux, Python 3.6.6, pytest 4.0.0, pytest-sugar 0.9.2)
cachedir: .pytest_cache
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: sugar-0.9.2, pudb-0.7.0, faulthandler-1.5.0, cov-2.6.0
collecting ... 
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records ✓                                          25% ██▌       
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age ✓                                             50% █████     
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records ✓                                    75% ███████▌  
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam ✓                                      100% ██████████
======================================================== slowest 20 test durations =========================================================
11.80s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
3.18s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
1.90s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
1.89s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
1.85s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records

Results (21.41s):
       4 passed
Exception ignored in: <bound method _TempDir.__del__ of '/tmp/tmp_mne_tempdir_190hcdox'>
Traceback (most recent call last):
  File "/home/sik/code/mne-python/mne/utils.py", line 548, in __del__
  File "/home/sik/miniconda3/envs/mne/lib/python3.6/shutil.py", line 494, in rmtree
TypeError: 'NoneType' object is not callable


~/code/mne-python sleep_tutorial* ⇡ 23s
(mne) ❯ pytest mne/datasets/sleep_physionet/tests -vv
Test session starts (platform: linux, Python 3.6.6, pytest 4.0.0, pytest-sugar 0.9.2)
cachedir: .pytest_cache
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: sugar-0.9.2, pudb-0.7.0, faulthandler-1.5.0, cov-2.6.0
collecting ... 
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records ✓                                          25% ██▌       
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age ✓                                             50% █████     
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records ✓                                    75% ███████▌  
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam ✓                                      100% ██████████
======================================================== slowest 20 test durations =========================================================
13.42s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
3.28s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
1.92s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
1.91s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
1.39s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records

Results (22.48s):
       4 passed

@massich
Contributor

massich commented Jan 14, 2019

~/code/mne-python sleep_tutorial* 6s
(mne) ❯ pytest mne/datasets/sleep_physionet/tests -vv                        
Test session starts (platform: linux, Python 3.6.6, pytest 4.0.0, pytest-sugar 0.9.2)
cachedir: .pytest_cache
rootdir: /home/sik/code/mne-python, inifile: setup.cfg
plugins: sugar-0.9.2, pudb-0.7.0, mock-1.10.0, faulthandler-1.5.0, cov-2.6.0
collecting ... 
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records ✓                                          25% ██▌       
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age ✓                                             50% █████     
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records ✓                                    75% ███████▌  
 mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam ✓                                      100% ██████████
======================================================== slowest 20 test durations =========================================================
1.92s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
1.90s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
1.35s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
0.01s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s call     mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records
0.00s setup    mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_temazepam
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_age_records
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_sleep_physionet_age
0.00s teardown mne/datasets/sleep_physionet/tests/test_physionet.py::test_run_update_temazepam_records

Results (5.70s):
       4 passed

~/code/mne-python sleep_tutorial* 7s

"""Test Sleep Physionet URL handling."""
mm = mocker.patch('mne.datasets.sleep_physionet._utils._fetch_file',
Member

I don't get how this can work. mocker is not imported. Your mocker function does not write any file to disk

Contributor

what do you mean that the mocker is not imported? pytest-mock exposes this fixture called mocker which takes care of setting it up and tearing down.

And I'm aware that I'm writing nothing. But _fetch_one delegates all the writing to the original _fetch_file, and I'm bypassing that, so _fetch_one works just as expected even though _fake_fetch_file does not write a thing.

from ._utils import _fetch_one, _data_path, BASE_URL, TEMAZEPAM_SLEEP_RECORDS
from ._utils import _check_subjects

SLEEP_RECORDS = 'physionet_sleep_records.npy'
Member

this should disappear

Contributor

+1 I thought I had removed them all. My bad.

The subjects to use. Can be in the range of 0-21 (inclusive).
drug : bool
If True it's the data with the Temazepam and if False it's
the placebo.
Member

left over from old code of mine

@massich
Contributor

massich commented Jan 14, 2019

If everything is ok, everything should be green except 3.7, which should turn green in #5834

@larsoner
Member

@larsoner larsoner left a comment

At last benchmark one was over 10 sec, thus the decorator. But maybe they are faster now

@massich
Contributor

massich commented Jan 14, 2019

At last benchmark one was over 10 sec, thus the decorator. But maybe they are faster now

Not the run_update_xx ones; these were fast already, since they only download small .xls/.csv files with the records of the subjects and the hashes of the recordings.

@agramfort
Member

agramfort commented Jan 14, 2019 via email

@massich
Contributor

massich commented Jan 14, 2019

This is weird: in the second env in Travis there was a test that failed using _fetch_file. I restarted it. I guess it was just a network issue, because pytest-mock should clean up automatically and there should be no side effects.

@massich
Contributor

massich commented Jan 14, 2019

green !!

@massich
Contributor

massich commented Jan 14, 2019

Thx a lot to everyone !!!
