[MRG] Sleep tutorial #5718
Codecov Report

```
@@           Coverage Diff            @@
##           master    #5718   +/-   ##
=========================================
  Coverage        ?   88.63%
=========================================
  Files           ?      373
  Lines           ?    69321
  Branches        ?    11665
=========================================
  Hits            ?    61440
  Misses          ?     5030
  Partials        ?     2851
```
@Slasnista @massich I just spent an hour writing a fetcher and cleaning up the example. I think I did the parts that need a deep understanding of MNE internals. You should be able to finish this PR without me. Thanks for your efforts.
Hi, I have just added a few lines of code to pre-process the annotations. Should such steps be implemented in the read_annotations function, or should we let the user choose whether to perform them?
tutorials/plot_sleep.py (outdated)

```python
annotations = mne.read_annotations(hyp_fname)

##############################################################################
# preprocessing annotations
```
I don't understand all this pre-processing. What is its purpose? If we only want to merge `Sleep stage 3` and `Sleep stage 4`, we should be able to do so with a function.
And I don't understand the resampling problem. Is it only a problem in this example, or an implementation error that we need to test and fix in MNE?
The problem comes from the format of the annotations. Sleep stages are traditionally annotated over 30 s windows of PSG signal (by both experts and algorithms). For this reason, it could be useful to output annotations already resampled and associated with 30 s of signal.
Regarding the merging of sleep stages 3 and 4: nowadays people generally work with 5 sleep stages instead of 6, and merge samples labeled 'Sleep stage 4' with samples labeled 'Sleep stage 3'.
A way to overcome the resampling problem might be a resampling parameter inside the read_annotations function that lets the user get already-resampled annotations. What do you think?
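For what it's worth, the merging itself can be expressed purely through an event mapping, without touching the annotations. A minimal sketch (label strings follow the Sleep-EDF convention; the integer codes here are arbitrary choices, not a fixed API):

```python
# Mapping two annotation labels to the same integer code merges them:
# events/epochs built from this mapping treat stages 3 and 4 as one class.
event_id = {'Sleep stage W': 1,
            'Sleep stage 1': 2,
            'Sleep stage 2': 3,
            'Sleep stage 3': 4,
            'Sleep stage 4': 4,  # merged with stage 3
            'Sleep stage R': 5}
```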
I don't really like it. read_annotations should only read. If this type of analysis is always needed, then we should do some preprocessing or something.
This example also makes me question whether we want to always read the annotations of a file. Maybe this is better:

```python
psg_annotations = read_annotations(edf_file).to_psg()
raw = read_raw_edf(edf_file, annotations=psg_annotations)
```

or

```python
psg_annotations = read_annotations(edf_file).to_psg()
raw = read_raw_edf(edf_file)
raw.set_annotations(psg_annotations)
```

instead of

```python
raw = read_raw_edf(edf_file)
raw.annotations.to_psg(inplace=True)
```
> I don't really like it. read_annotations should only read.

Agreed.

> This example also makes me question if we want to always read the annotations of a file.

I think we should. We should avoid adding options / kwargs to read_raw_* to modify how things are read / represented. The job of the read_raw_* functions should ideally be to transparently read all data from a given file on disk, in as close to the storage format as possible. Then we can have other functions to modify Raw instances as necessary for each use case. (For example, the montage argument of read_raw_* should really not be there either -- we should do raw.set_montage(...) instead.)

Of your three code snippets, this one looks cleanest to me:

```python
raw = read_raw_edf(edf_file)
raw.annotations.to_psg(inplace=True)
```

But I think the following is better, since it's more explicit and avoids an otherwise redundant inplace kwarg:

```python
raw = read_raw_edf(edf_file)
raw.set_annotations(raw.annotations.to_psg())
```
I am -1 on the to_psg method. This adds sleep-specific code to an Annotations object that should be agnostic.
I'd rather prefer a

```python
df = annotations.to_dataframe()
```

and then you use df.resample or some other pandas code to do what you want. Then you can recreate the annotations. If it's a mess, maybe some simple numpy code can do the job?
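To make the pandas route concrete, here is a hedged, self-contained sketch (toy onsets and stage labels, not real Sleep-EDF data) that round-trips the annotation columns through a DataFrame and resamples the stage labels onto a regular 30 s grid:

```python
import pandas as pd

# Toy annotations: onset (s), stage label, duration (s)
annot = pd.DataFrame({
    "onset": [0.0, 60.0, 150.0],
    "description": ["Sleep stage W", "Sleep stage 1", "Sleep stage 2"],
    "duration": [60.0, 90.0, 30.0],
})

# Temporary sentinel row so the last stage is carried to its end
last = annot.iloc[-1]
annot.loc[len(annot)] = [last["onset"] + last["duration"], "end", 0.0]

# Resample onto a 30 s grid, forward-filling the stage labels
annot = annot.set_index(pd.to_timedelta(annot["onset"], unit="s"))
annot = annot.drop(columns="onset").resample("30s").ffill()
annot = annot.reset_index()
annot["onset"] = annot["onset"].dt.total_seconds()
annot["duration"] = 30.0
annot = annot[annot["description"] != "end"]  # drop the sentinel

# From here one could rebuild an mne.Annotations object
```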
tutorials/plot_sleep.py (outdated)

```python
annot = annot.resample('30s').ffill()
annot.reset_index(inplace=True)
annot.onset = annot.onset.dt.total_seconds()
annot["duration"] = 30.
```
I see it now. That's a bit long, but it's exactly what I had in mind. Maybe just make it a function so we know where the pandas kung fu ends.
I would go with something like
```
from mne.annotations import enforce_psg_annotation_protocol
raw = read_raw_edf(fname)
raw.set_annotations(enforce_psg_annotation_protocol(raw.annotations))
```
But I do like
```
enforce_psg_annotation_protocol(raw.annotations, inplace=True)
```
For reference, the snippet in question from tutorials/plot_sleep.py:

```python
annot = pd.DataFrame()
annot["onset"] = annotations.onset
annot["description"] = annotations.description
annot["duration"] = annotations.duration

# add temporarily a last event to have a correct resampling
last_onset = annot.onset.values[-1]
last_duration = annot.duration.values[-1]
annot.loc[annot.shape[0]] = [last_onset + last_duration, "end", 0]

annot = annot.set_index('onset')
annot.index = pd.to_timedelta(annot.index, unit='s')
annot = annot.resample('30s').ffill()
annot.reset_index(inplace=True)
annot.onset = annot.onset.dt.total_seconds()
annot["duration"] = 30.
```
This would work for me.
Not sure about the word "enforce", as we tend to use make_ or setup_, but it's a nitpick.
After giving it some thought, I don't think it's a good idea to modify the data representation. Therefore the
Yes, agreed. If you add a we can do this in this PR or in the next one, i.e. merge the pandas kung fu for now, as the main objective here is to reach a basic sleep scoring task using scikit-learn.
@Slasnista @massich I heavily simplified the code thanks to the chunk_duration parameter of events_from_annotations. No more pandas kung fu.
```python
kinds = ['%s (%s)' % (kind, sum(d.lower().startswith(kind)
                                for d in self.description))
         for kind in kinds]
counter = collections.Counter(self.description)
```
+1000
Yes, I have already done it. Shall I extract the features from another record to evaluate the model properly?

yes
tutorials/plot_sleep.py (outdated)

```python
    raw_train, events_train, event_id_train,
    tmin=0., tmax=tmax, baseline=None)

tmax = 30. - 1. / raw_test.info['sfreq']  # tmax is included
```
no need to duplicate this
tutorials/plot_sleep.py (outdated)

```python
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
[line.set_color(color) for line, color in zip(ax.get_lines(), colors)]
plt.legend(list(epochs_train.event_id.keys()))
```
Start by showing PSDs, as this explains why power features can be good predictors. Then you can do the code for classification. I would actually load the second subject here and not before, so the beginning is easier to follow.
FunctionTransformer was a great idea. I would still need to do power ratios and maybe adjust the Welch parameters. @Slasnista what do you think?
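To make the FunctionTransformer idea concrete, here is a hedged sketch: random toy signals, illustrative band edges, and scipy's welch in place of MNE's PSD helpers, so none of the names below come from the tutorial itself.

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

def eeg_power_band(X, sfreq=100.):
    """Relative power in classical sleep bands; X is (n_epochs, n_times)."""
    bands = [(0.5, 4.5), (4.5, 8.5), (8.5, 11.5), (11.5, 15.5), (15.5, 30.)]
    freqs, psd = welch(X, fs=sfreq, nperseg=256, axis=-1)
    psd /= psd.sum(axis=-1, keepdims=True)  # normalize to relative power
    return np.column_stack([psd[:, (freqs >= lo) & (freqs < hi)].sum(axis=-1)
                            for lo, hi in bands])

# FunctionTransformer turns the feature extractor into a pipeline step
pipe = make_pipeline(FunctionTransformer(eeg_power_band, validate=False),
                     RandomForestClassifier(n_estimators=50, random_state=42))

# Toy data: 20 epochs of 30 s at 100 Hz, two fake stages
rng = np.random.RandomState(42)
X, y = rng.randn(20, 3000), rng.randint(0, 2, 20)
pipe.fit(X, y)
```

Power ratios would then just be extra columns computed inside eeg_power_band, keeping the classifier untouched.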
I changed to use pytest's tmpdir, which removes a couple of errors and gives a slightly better setup time.
```python
"""Test Sleep Physionet URL handling."""
mm = mocker.patch('mne.datasets.sleep_physionet._utils._fetch_file',
```
I don't get how this can work. mocker is not imported, and your mock function does not write any file to disk.
What do you mean that mocker is not imported? pytest-mock exposes a fixture called `mocker` which takes care of setting it up and tearing it down.
And I'm aware that I'm writing nothing. But `_fetch_one` delegates all the writing to the original `_fetch_file`, and I'm bypassing this, so `_fetch_one` works just as expected even though `_fake_fetch_file` does not write a thing.
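The pattern described here can be sketched with the standard library alone (pytest-mock's `mocker.patch` is a thin wrapper around `unittest.mock.patch`; the helpers below are stand-ins named after the PR's functions, not the real ones):

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for mne.datasets.sleep_physionet._utils
_utils = SimpleNamespace()

def _fetch_file(url, file_name):
    raise RuntimeError('real download: not allowed in a unit test')

_utils._fetch_file = _fetch_file

def _fetch_one(fname, url):
    # delegates all the writing to _fetch_file, as in the PR
    _utils._fetch_file(url, fname)
    return fname

# Patching _fetch_file means _fetch_one runs end to end without
# touching the network or writing anything to disk.
with mock.patch.object(_utils, '_fetch_file') as fake_fetch:
    result = _fetch_one('SC4001E0-PSG.edf', 'https://example.org/record')

fake_fetch.assert_called_once_with('https://example.org/record',
                                   'SC4001E0-PSG.edf')
```

The test can then assert on the URL handling (what the mock was called with) without any file ever being written.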
```python
from ._utils import _fetch_one, _data_path, BASE_URL, TEMAZEPAM_SLEEP_RECORDS
from ._utils import _check_subjects

SLEEP_RECORDS = 'physionet_sleep_records.npy'
```
this should disappear
+1, I thought I had removed them all. My bad.
```python
    The subjects to use. Can be in the range of 0-21 (inclusive).
drug : bool
    If True it's the data with the Temazepam and if False it's
    the placebo.
```
left over from old code of mine
If everything is OK, everything should be green except 3.7, which should be green in #5834.
Real flake8 error: https://travis-ci.org/mne-tools/mne-python/jobs/479456801#L3071
At the last benchmark one was over 10 sec, hence the decorator. But maybe they are faster now.
Not the
Yes, all tests are really fast now.
This is weird: in the second env on Travis there was a test that failed using
green!!
Thanks a lot to everyone!!!
This PR adds a tutorial to illustrate how to analyze sleep data (see #5684). The purpose is to show how one can:

Todo

Edited by @massich