MRG: refactor PSD functions #2710
Conversation
I can't seem to see your commit. Did you update the ...
```
scikit-learn pipelines. It relies heavily on the multitaper_psd
function found in the time_frequency module. Running the transform
method will create attributes for the estimated psd as well as the
frequencies used, on top of return these values.
```
can you update `See Also`?
what does "on top of return these values." mean?
It means that `freqs` and `data` will be created as attributes when you call `transform`, on top of being returned by `transform`. (Though it's only `data` that is returned, not `freqs`; it's a poorly written phrase.)
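For readers following along, the pattern being discussed can be sketched like this. Everything here is a toy illustration with made-up names (`ToyPSDTransformer`, the FFT-based "PSD"); the real class is MNE's `PSDEstimator`, which delegates to an actual multitaper estimator:

```python
import numpy as np


class ToyPSDTransformer(object):
    """Toy sketch of the attribute-storing pattern (not MNE's class).

    After ``transform``, the estimated PSD and the frequencies are kept
    on the object (``psd_``, ``freqs_``), and the PSD is also returned.
    """

    def __init__(self, sfreq=1000.0, n_fft=256):
        self.sfreq = sfreq
        self.n_fft = n_fft

    def transform(self, X):
        # Stand-in for a real PSD estimate (Welch/multitaper in MNE):
        # squared magnitude of an rFFT along the time axis.
        spect = np.abs(np.fft.rfft(X, n=self.n_fft, axis=-1)) ** 2
        freqs = np.fft.rfftfreq(self.n_fft, d=1.0 / self.sfreq)
        self.psd_ = spect    # stored as an attribute ...
        self.freqs_ = freqs  # ... along with the frequencies ...
        return spect         # ... but only the PSD is returned


est = ToyPSDTransformer()
X = np.random.RandomState(0).randn(5, 10, 512)  # epochs x channels x times
psd = est.transform(X)
```

This makes concrete why the docstring phrase was confusing: `freqs_` exists only as an attribute, while the PSD is both stored and returned.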
I added a `See Also` section:

```
See Also
--------
multitaper_psd, compute_epochs_psd, compute_raw_psd
```
did you try building the doc? does it hyperlink properly? You can do:

```
$ cd doc/
$ make dev-html_noplot
```
I think the convention for attributes is that you add it to the ...
Ah I see what you were referring to. Regarding underscores and such, that totally makes sense. For putting it in ...
For now, I'd say ... let's not add the attributes here. This calls for an update to the time-frequency module as we discussed here: #2290
FYI you have a flake error
I just spent 5 minutes looking for an anti-cornflakes meme on the internet, no luck. Instead I will focus on doing actual work... It sounds like this PR will change a bit either way. It sounds like the best thing is to add epochs functionality to the ... So I guess the question is: does this warrant its own function (e.g., ...)?
I vote for a new argument called ...
Also, please remember to update a couple of tests :)
naturally :)
let us know when you've made the necessary changes.
will do - trying to finish a PR in h5io right now but I will try to get to it.
OK there's a first step. I made 2 main changes:
More updates - the `PSDEstimator` should now work using `compute_epochs_psd`, so it's a pretty simple class at this point. Also fixed up the attribute creation etc. If these look reasonable then I'll make some tests as well...
```diff
@@ -143,6 +147,8 @@ def compute_epochs_psd(epochs, picks=None, fmin=0, fmax=np.inf, tmin=None,
         to be <= n_fft.
     proj : bool
         Apply SSP projection vectors.
+    method : 'welch' | 'multitaper'
+        The method to use in calculating the PSD.
```
Can you explain in one line what each method does? For details, we should add references in the manual
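A possible one-line wording for each method (my suggestion, not text from the PR) might look like:

```rst
method : 'welch' | 'multitaper'
    The method to use in calculating the PSD. 'welch' averages
    periodograms computed over sliding time windows, while 'multitaper'
    averages periodograms computed with a set of orthogonal DPSS tapers.
```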
I noticed that you added a new argument ...
```python
if picks is not None:
    data = np.dot(proj[picks][:, picks], data)
else:
    data = np.dot(proj, data)
```
I would do:

```python
if picks is not None:
    proj = proj[picks][:, picks]
data = np.dot(proj, data)
```

Saves you one line ;)
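For anyone reading along: `proj[picks][:, picks]` selects the square sub-matrix of the projector restricted to the picked channels, equivalent to `proj[np.ix_(picks, picks)]`. A quick check with random data (all shapes here are invented for illustration):

```python
import numpy as np

rng = np.random.RandomState(42)
proj = rng.randn(6, 6)       # projector over 6 channels
picks = np.array([0, 2, 5])  # channels actually used
data = rng.randn(3, 100)     # data already restricted to the picks

# Row-then-column indexing pulls out the 3 x 3 sub-projector ...
sub = proj[picks][:, picks]
assert sub.shape == (3, 3)
# ... which is what np.ix_ gives in a single step
assert np.array_equal(sub, proj[np.ix_(picks, picks)])

projected = np.dot(sub, data)  # apply it to the picked data
```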
@choldgraf thanks for making this contribution. Please go ahead and add the tests :)
```diff
@@ -248,6 +248,12 @@ def inverse_transform(self, X, y=None):
 class PSDEstimator(TransformerMixin):
     """Compute power spectrum density (PSD) using a multi-taper method
 
+    This structures data so that it can be easily incorporated into
+    scikit-learn pipelines. It relies heavily on the multitaper_psd
```
This sentence will go away once we add the `method` argument to the init
Agreed - I wasn't sure if that's something people would want but can do that for sure
hmm ... actually, thinking about it some more -- the problem with a `method` argument is that you will have to pass the different options for multitaper / welch too. Maybe, we can leave it for now ...
Yeah, that's also why I didn't put it in there in the first place. Functions like `spectral_connectivity` (and the newly-modified `compute_epochs_psd`) handle it by having a bunch of kwargs for the respective functions (e.g., `mt_adaptive`, `mt_normalize`). I decided to pass on that this time around, though it shouldn't be a big deal to extend the function to make it work with a `method` parameter.
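The dispatch-plus-kwargs pattern being discussed could look roughly like this. This is a hedged sketch: `toy_psd_welch`, `toy_psd_multitaper`, and all their parameters are invented stand-ins, not MNE's API, and the "estimators" are deliberately simplistic:

```python
import numpy as np


def toy_psd_welch(x, sfreq, n_fft=256, n_overlap=0):
    """Toy stand-in for a Welch estimator: average windowed periodograms."""
    step = n_fft - n_overlap
    segs = [x[..., i:i + n_fft]
            for i in range(0, x.shape[-1] - n_fft + 1, step)]
    spect = np.mean([np.abs(np.fft.rfft(s, axis=-1)) ** 2 for s in segs],
                    axis=0)
    freqs = np.fft.rfftfreq(n_fft, 1. / sfreq)
    return spect, freqs


def toy_psd_multitaper(x, sfreq, bandwidth=4.):
    """Toy stand-in for a multitaper estimator (one 'taper' for brevity)."""
    spect = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(x.shape[-1], 1. / sfreq)
    return spect, freqs


def compute_psd(x, sfreq, method='welch', **method_kw):
    """Dispatch on `method`, forwarding the method-specific kwargs."""
    funcs = {'welch': toy_psd_welch, 'multitaper': toy_psd_multitaper}
    if method not in funcs:
        raise ValueError('method must be one of %s' % sorted(funcs))
    return funcs[method](x, sfreq, **method_kw)


x = np.random.RandomState(0).randn(2, 1024)
psd_w, freqs_w = compute_psd(x, 1000., method='welch', n_fft=256)
psd_m, freqs_m = compute_psd(x, 1000., method='multitaper')
```

The `**method_kw` forwarding is one way to avoid a flat argument list of `mt_*` / `welch_*` options, which is exactly the awkwardness raised above.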
```diff
@@ -33,6 +38,9 @@
 epochs = mne.Epochs(raw, events, event_id, tmin, tmax,
                     proj=True, baseline=(None, 0), preload=True,
                     reject=dict(grad=4000e-13, eog=150e-6))
+# Pull 2**n points to speed up computation
+epochs = EpochsArray(epochs.get_data()[..., :1024], epochs.info,
+                     epochs.events, tmin, epochs.event_id)
```
why not?

```python
tmax = tmin + 1023. / raw.info['sfreq']
```

and skip this line altogether?
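The arithmetic behind that suggestion: with an inclusive `[tmin, tmax]` window sampled every `1/sfreq` seconds, 1023 intervals give 1024 samples. A minimal check (the `sfreq` value below is only an assumed stand-in for `raw.info['sfreq']`):

```python
sfreq = 600.614924  # assumed stand-in for raw.info['sfreq']
tmin = -1.
tmax = tmin + 1023. / sfreq

# 1023 intervals of 1/sfreq seconds, plus the tmin endpoint itself
n_samples = int(round((tmax - tmin) * sfreq)) + 1
```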
It seemed to me when I tried this before that there were some inconsistencies with the indexing using times (e.g., it would be off by 1 index in one direction or the other), so if I ever want my signal to be length 2**N I just index the data directly. I can use tmin/tmax if you think that's better.
How long ago did you have that problem? We changed to using something that is a little bit more lenient with the floating point arithmetic that goes on under the hood about a year ago IIRC.
It was a while ago - ok I'll give it a shot and see if it works.
OK so it sort of works but there's still weird stuff that happens. For example:

```python
tmin = -1.
tmax = tmin + 1023. / raw.info['sfreq']
epochs = mne.Epochs(raw, events, event_id, tmin, tmax)
print(epochs.get_data().shape)
```

will print shape `(..., 1024)`
but if I do

```python
tmin, tmax = -1., 1.
raw.info['bads'] += ['MEG 2443']  # bads
epochs = mne.Epochs(raw, events, event_id, tmin, tmax,
                    proj=True, baseline=(None, 0), preload=True,
                    reject=dict(grad=4000e-13, eog=150e-6))
tmax = tmin + 1023. / raw.info['sfreq']
epochs.crop(tmin, tmax)
print(epochs.get_data().shape)
```

it now prints shape `(..., 1023)`.
So depending on whether you define the tmin/tmax before turning into epochs, vs cropping the epochs after the fact, you end up off by one. So I think I ran into this inconsistency and just decided I would index manually because I wanted to be sure that it was correct.
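A pure-NumPy sketch of how this kind of off-by-one can arise (this illustrates inclusive-bound cropping in general, not MNE's actual implementation; the sample grid is invented): when the epoch's first sample lands just before the requested `tmin`, an inclusive time mask applied after the fact drops it.

```python
import numpy as np

sfreq = 1000.
# Epoch cut so its first sample is at -1.001 s, just before tmin = -1.0 s
times = np.arange(-1001, 23) / sfreq  # 1024 samples
tmin = -1.
tmax = tmin + 1023. / sfreq

# Inclusive [tmin, tmax] mask, applied after the epoch was cut
keep = (times >= tmin) & (times <= tmax)
n_kept = int(keep.sum())  # the first sample (-1.001 s) is excluded
```

Here `n_kept` comes out as 1023 even though the epoch had 1024 samples, matching the behavior described above.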
Can you open an issue with a small code snippet using RawArray or the sample dataset to reproduce? We should fix that if possible. For now I'm fine leaving this example to manually set the bounds.
I think the relevant lines are in `epochs.py`: in the init at 290, `get_data` at 1154, and the cropping function at 1435. The issue is the first index. E.g.:

```python
tmin = -1.
tmax = tmin + 1023. / raw.info['sfreq']
epochs = mne.Epochs(raw, events, event_id, tmin, tmax)
print(epochs.times)
epochs.crop(tmin, tmax)
print(epochs.times)
```

gives:

```
[-1.00064103 -0.99897607 -0.99731111 ...,  0.69928325  0.70094821  0.70261317]
[-0.99897607 -0.99731111 -0.99564615 ...,  0.69928325  0.70094821  0.70261317]
```

I feel like the outputs should be the same, but in the second case the first index gets lopped off.
If this is worth making consistent between the two, my intuition is that we should use the behavior of the `crop` function. If a user wants to have a length 1024 array, to me it makes the most sense to do `tmin + 1024. / sfreq`, not do `tmin + 1023. / sfreq` and have to remember that it's inclusive on tmin so it keeps an extra index.
ah just saw your comment, I'll open an issue and use the crop function here.
Yeah we'll have to discuss further in the issue. We have to be careful about changes like this because they could break people's code.
other than these minor comments, LGTM. I am offline tomorrow but if you fix these, I'll merge the day after.
OK I think I've addressed all the comments there - see my changes to the docstrings/example/whatsnew and LMKWYT.
Force-pushed 8037702 to fedd0f8
great! ... but coveralls is complaining. @Eric89GXL is this something to be concerned about or is coveralls broken?
```python
# Combining/reshaping to original data shape
psd = psd.reshape(np.hstack([dshape, -1]))
if ndim_in == 1:
    psd = psd.squeeze()
```
`squeeze` is evil. It squeezes all the dimensions equal to 1. I've been bitten by this in the past. If you know it's the first dim that is 1, just do as it was before: `psd = psd[0, :]`
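To illustrate the pitfall (shapes invented for the example): `squeeze()` removes every length-1 axis, not just the one you have in mind, while explicit indexing only drops the axis you know is singleton.

```python
import numpy as np

psd = np.ones((1, 1, 129))  # e.g. 1 "epoch", 1 channel, 129 frequencies

# squeeze() collapses *both* singleton axes, losing the channel axis too
assert psd.squeeze().shape == (129,)

# Indexing the axis you know is singleton keeps the rest intact
assert psd[0].shape == (1, 129)
```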
that's a good point, will revert it
done with my review. Looks really nice. thanks @choldgraf let me know when you addressed my comments
I suspect coveralls is broken because it was reporting bad coverage for my PR too
@choldgraf don't give up. You are almost there :)
Ummm, I think @agramfort's suggestions may have uncovered another bug. I tried creating sinusoidal data by basing RawArray off of Raw, and ran into some strange behavior when creating Epochs. Basically, if I build an Epochs object off of RawArray, then no Epochs get pulled. I was able to recreate it with this code:

```python
# These are paths to the test data
raw_fname = '/Users/choldgraf/src/python/mne-python/mne/io/tests/data/test_raw.fif'
ev_fname = '/Users/choldgraf/src/python/mne-python/mne/io/tests/data/test-eve.fif'

# Load data
raw = mne.io.Raw(raw_fname)
raw_arr = mne.io.RawArray(raw[:, :][0], raw.info)
ev = mne.read_events(ev_fname)

# Create epochs objects and print shape
ep = mne.Epochs(raw, ev, 1, -.5, 1, preload=True)
ep_arr = mne.Epochs(raw_arr, ev, 1, -.5, 1, preload=True)
for e in [ep, ep_arr]:
    print(e._data.shape)
```

The output shows that when Epochs is created using a RawArray as input, none of the events are being pulled... I should have known that this PR would get more complicated ;)
The first_samp is probably different, and that gets factored into the events. Not really a bug but could be better documented. You can confirm by looking at the drop_log entries.
Yeah - the differences between the drop logs are that all the non-ignored ...
Correct, it's not in info. It's a Neuromag thing having to do with when raw data recording actually started. Subtract the first_samp from the event times and it should work for the raw array.
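A minimal sketch of that suggested fix (the `first_samp` value and event rows below are invented for illustration): events carry absolute sample indices in their first column, while a `RawArray` starts at sample 0, so the indices need shifting before epoching against it.

```python
import numpy as np

first_samp = 25800  # hypothetical raw.first_samp value

# Events: [sample index, previous value, event id]
events = np.array([[27977, 0, 2],
                   [28345, 0, 3]])

# Shift absolute sample indices so they are relative to the RawArray start
events_arr = events.copy()
events_arr[:, 0] -= first_samp
```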
OK - in the latest commit it now does the same stuff but on a signal w/ ...
I think you'd also need to copy `_last_samps` and/or `_raw_lengths` to be complete. But subtracting from the events seems a bit cleaner to me because it avoids private attributes entirely.
Force-pushed ac7cd1e to 74f92d1
ok cool - see latest push. that look good? I'm just doing `ev[:, 0] -= ...`
Yeah that should be fine
I am happy! +1 for merge, thx heaps @choldgraf. @Eric89GXL or @jasmainak, I'll let you merge if you're happy
Bravo @choldgraf! congrats on the merge 🍻
🍻!
thanks for all the help @jasmainak @Eric89GXL @agramfort
now you need to find another occupation for today sorry :)
This is a quick update that adds information to the docstring so people know where to look for the under-the-hood PSD estimation. It also adds code to keep the estimated PSD/frequencies with the object after the transform method is called.