
MRG: refactor PSD functions #2710

Merged (21 commits) on Jan 17, 2016

Conversation

@choldgraf (Contributor)

This is a quick update that adds information to the docstring so people know where to look for the under-the-hood PSD estimation. It also adds code to keep the estimated PSD/frequencies on the object after the transform method is called.

@choldgraf mentioned this pull request Dec 15, 2015
@jasmainak (Member)

I can't seem to see your commit. Did you update the .rst file only, or did you add the output HTML files to the commit as well?

scikit-learn pipelines. It relies heavily on the multitaper_psd
function found in the time_frequency module. Running the transform
method will create attributes for the estimated psd as well as the
frequencies used, on top of return these values.
Member

can you update See Also?

Contributor

what does "on top of return these values." mean?

Contributor Author

It means that freqs and data will be created as attributes when you call transform, in addition to being returned by transform (though it's only data that is returned, not freqs... it's a poorly written phrase).

@choldgraf (Contributor Author)

I added a See Also section - let me know if that's what you meant. Regarding creating the attributes, I'm not sure how to do it differently (unless I shouldn't do it at all)


See Also
--------
multitaper_psd, compute_epochs_psd, compute_raw_psd
Member

did you try building the doc? does it hyperlink properly? You can do:

$ cd doc/
$ make dev-html_noplot

@jasmainak (Member)

I think the convention for attributes is that you add them in the fit method, not the transform method, and they end with an underscore. See here: http://scikit-learn.org/stable/tutorial/statistical_inference/settings.html#estimators-objects. I don't think you can add an attribute in a transform method.
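The convention being described can be sketched with a toy transformer. This is a minimal, hypothetical example (class and attribute names are illustrative, not MNE's actual API): parameters keep plain names in `__init__`, quantities estimated from data are set in `fit` with a trailing underscore, and `transform` creates no new public attributes.

```python
import numpy as np


class DemoPSDEstimator(object):
    """Toy transformer illustrating the sklearn fit/transform convention."""

    def __init__(self, sfreq=1000.):
        # Parameters passed by the user keep their plain names.
        self.sfreq = sfreq

    def fit(self, X, y=None):
        # Estimated state gets a trailing underscore, signalling it was
        # derived from the data rather than set by the user.
        self.n_channels_ = X.shape[1]
        return self

    def transform(self, X):
        # transform only uses existing state; it does not create attributes.
        return np.abs(np.fft.rfft(X, axis=-1)) ** 2


X = np.random.RandomState(0).randn(5, 3, 64)  # epochs x channels x times
est = DemoPSDEstimator().fit(X)
psd = est.transform(X)
print(est.n_channels_, psd.shape)  # 3 (5, 3, 33)
```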

@choldgraf (Contributor Author)

Ah, I see what you were referring to. Regarding underscores and such, that totally makes sense. As for putting it in fit vs. transform, this object already handles that strangely: the fit method currently does nothing, it only returns self... so I wasn't sure what to do there.

@jasmainak (Member)

For now, I'd say ... let's not add the attributes here. This calls for an update to the time-frequency module as we discussed here: #2290

@larsoner (Member)

FYI you have a flake error

@choldgraf (Contributor Author)

I just spent 5 minutes looking for an anti-cornflakes meme on the internet, no luck. Instead I will focus on doing actual work...

It sounds like this PR will change a bit either way. The best thing is probably to add epochs functionality to the time_frequency module, and then have this sklearn structure call that function instead.

So I guess the question is: does this warrant its own function (e.g., multitaper_psd_epochs), or should it be added as a flag to compute_epochs_psd, e.g. compute_epochs_psd(..., kind='mt')?

@jasmainak (Member)

I vote for a new argument called method, which is a string. So, no new function.
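The method-string dispatch being proposed might look roughly like this. Everything here is a hypothetical sketch, not MNE's real API: the name `compute_epochs_psd_sketch` is made up, and a plain periodogram stands in for both spectral estimators just to show the dispatch pattern.

```python
import numpy as np


def compute_epochs_psd_sketch(data, sfreq, method='welch'):
    """Dispatch on a `method` string instead of having two functions."""
    if method == 'welch':
        # placeholder: real code would average over overlapping segments
        psd = np.abs(np.fft.rfft(data, axis=-1)) ** 2 / sfreq
    elif method == 'multitaper':
        # placeholder: real code would average over DPSS tapers
        psd = np.abs(np.fft.rfft(data, axis=-1)) ** 2 / sfreq
    else:
        raise ValueError('method must be "welch" or "multitaper", '
                         'got %r' % (method,))
    freqs = np.fft.rfftfreq(data.shape[-1], 1. / sfreq)
    return psd, freqs


data = np.ones((2, 4, 128))  # epochs x channels x times
psd, freqs = compute_epochs_psd_sketch(data, sfreq=256., method='welch')
print(psd.shape, freqs[-1])  # (2, 4, 65) 128.0
```

An unknown method string raises immediately, which is friendlier than silently falling back to a default.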

@jasmainak (Member)

Also, please remember to update a couple of tests :)

@choldgraf (Contributor Author)

naturally :)


@agramfort (Member)

Let us know when you have made the necessary changes.

@choldgraf (Contributor Author)

will do - trying to finish a PR in h5io right now but I will try to get to this soon thereafter


@choldgraf (Contributor Author)

OK, there's a first step. I made 2 main changes:

  1. Added a method kw to the compute_epochs_psd function; this lets you do either Welch or multitaper PSD estimation.
  2. Added support for arrays instead of only Epochs objects. Since the PSDEstimator expects an array, this would allow it to call this function instead of doing its own multitaper estimation. Let me know if that's over-reaching and we shouldn't add the code to support arrays... it just wasn't much extra effort.

@choldgraf (Contributor Author)

More updates - the PSDEstimator should now work using compute_epochs_psd, so it's a pretty simple class at this point. Also fixed up the attribute creation, etc. If these look reasonable, then I'll write some tests as well.

@@ -143,6 +147,8 @@ def compute_epochs_psd(epochs, picks=None, fmin=0, fmax=np.inf, tmin=None,
        to be <= n_fft.
    proj : bool
        Apply SSP projection vectors.
    method : 'welch' | 'multitaper'
        The method to use in calculating the PSD.
Member

Can you explain in one line what each method does? For details, we should add references in the manual

@jasmainak (Member)

Added support for arrays instead of only Epochs objects. Since the PSDEstimator expects an array this would allow it to call this function instead of doing its own multitaper estimation. Let me know if that's over-reaching and we shouldn't add the code to support arrays...it just wasn't much extra effort...

I noticed that you added a new argument Fs to support this. I would instead support only Epochs objects in the public API, and have a private function that handles Fs + array + epochs. When called from compute_epochs_psd, it would pass the epochs; when called from the PSDEstimator, it would pass Fs + the array. I can't think of a good reason to support arrays in the public compute_epochs_psd API. Even the name of the function has epochs in it :)
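The public/private split being suggested can be sketched as follows. All names are hypothetical (`_psd_from_array`, `compute_epochs_psd_sketch`), and `FakeEpochs` stands in for mne.Epochs just to make the sketch self-contained: the public function takes only an Epochs-like object, while the private worker takes a plain array plus sampling frequency and can be called directly by the PSDEstimator.

```python
import numpy as np


def _psd_from_array(data, sfreq):
    """Private worker: operates on an array plus sampling frequency."""
    psd = np.abs(np.fft.rfft(data, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(data.shape[-1], 1. / sfreq)
    return psd, freqs


def compute_epochs_psd_sketch(epochs):
    """Public API: Epochs in; arrays never appear in the signature."""
    return _psd_from_array(epochs.get_data(), epochs.info['sfreq'])


class FakeEpochs(object):
    """Stand-in for mne.Epochs, just enough for the sketch."""

    def __init__(self, data, sfreq):
        self._data, self.info = data, {'sfreq': sfreq}

    def get_data(self):
        return self._data


epochs = FakeEpochs(np.zeros((3, 2, 100)), sfreq=200.)
psd, freqs = compute_epochs_psd_sketch(epochs)
print(psd.shape)  # (3, 2, 51)
```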

if picks is not None:
    data = np.dot(proj[picks][:, picks], data)
else:
    data = np.dot(proj, data)
Member

I would do:

if picks is not None:
    proj = proj[picks][:, picks]
data = np.dot(proj, data)

Saves you one line ;)

@jasmainak (Member)

@choldgraf thanks for making this contribution. Please go ahead and add the tests :)

@@ -248,6 +248,12 @@ def inverse_transform(self, X, y=None):
class PSDEstimator(TransformerMixin):
"""Compute power spectrum density (PSD) using a multi-taper method

This structures data so that it can be easily incorporated into
scikit-learn pipelines. It relies heavily on the multitaper_psd
Member

This sentence will go away once we add the method argument to the init

Contributor Author

Agreed - I wasn't sure if that's something people would want but can do that for sure

Member

hmm ... actually, thinking about it some more -- the problem with a method argument is that you will have to pass the different options for multitaper / welch too. Maybe, we can leave it for now ...

Contributor Author

Yeah, that's also why I didn't put it in there in the first place. Functions like spectral_connectivity (and the newly-modified compute_epochs_psd) handle it by having a bunch of kwargs for the respective functions (e.g., mt_adaptive, mt_normalize). I decided to pass on that this time around, though it shouldn't be a big deal to extend the function to make it work with a method parameter.


@@ -33,6 +38,9 @@
epochs = mne.Epochs(raw, events, event_id, tmin, tmax,
                    proj=True, baseline=(None, 0), preload=True,
                    reject=dict(grad=4000e-13, eog=150e-6))
# Pull 2**n points to speed up computation
epochs = EpochsArray(epochs.get_data()[..., :1024], epochs.info,
                     epochs.events, tmin, epochs.event_id)
Member

why not?

tmax = tmin + 1023. / raw.info['sfreq']

and skip this line altogether?

Contributor Author

It seemed to me when I tried this before that there were some inconsistencies with the indexing using times (e.g., it would be off by 1 index in one direction or the other), so if I ever want my signal to be length 2**N I just index the data directly. I can use tmin/tmax if you think that's better.

Member

How long ago did you have that problem? We changed to using something that is a little bit more lenient with the floating point arithmetic that goes on under the hood about a year ago IIRC.

Contributor Author

It was a while ago - ok I'll give it a shot and see if it works.


Contributor Author

OK, so it sort of works, but there's still weird stuff that happens. For example:

tmin = -1.
tmax = tmin + 1023. / raw.info['sfreq']
epochs = mne.Epochs(raw, events, event_id, tmin, tmax)
print(epochs.get_data().shape)

will print shape (..., 1024)

but if I do

tmin, tmax = -1., 1.
raw.info['bads'] += ['MEG 2443']  # bads
epochs = mne.Epochs(raw, events, event_id, tmin, tmax,
                    proj=True, baseline=(None, 0), preload=True,
                    reject=dict(grad=4000e-13, eog=150e-6))
tmax = tmin + 1023. / raw.info['sfreq']
epochs.crop(tmin, tmax)
print(epochs.get_data().shape)

Now it prints shape (..., 1023)

So depending on whether you define tmin/tmax before creating the Epochs, vs. cropping the Epochs after the fact, you are off by 1. I think I ran into this inconsistency before and decided to index manually because I wanted to be sure it was correct.

Member

Can you open an issue with a small code snippet using RawArray or the sample dataset to reproduce? We should fix that if possible. For now I'm fine leaving this example to manually set the bounds.

Contributor Author

I think the relevant lines are in epochs.py: in the init at line 290, get_data at line 1154, and the cropping function at line 1435.

The issue is the first index. E.g.:

tmin = -1.
tmax = tmin + 1023. / raw.info['sfreq']
epochs = mne.Epochs(raw, events, event_id, tmin, tmax)
print(epochs.times)
epochs.crop(tmin, tmax)
print(epochs.times)

gives:
[-1.00064103 -0.99897607 -0.99731111 ..., 0.69928325 0.70094821
0.70261317]
[-0.99897607 -0.99731111 -0.99564615 ..., 0.69928325 0.70094821
0.70261317]

I feel like the outputs should be the same, but in the second case the first index gets lopped off.

Contributor Author

If this is worth making consistent between the two, my intuition is that we should use the behavior of the crop function. If a user wants a length-1024 array, to me it makes the most sense to do tmin + 1024. / sfreq, rather than doing tmin + 1023. / sfreq and having to remember that it's inclusive of tmin, so it keeps an extra index.

Contributor Author

Ah, just saw your comment - I'll open an issue and use the crop function here.

Member

Yeah we'll have to discuss further in the issue. We have to be careful about changes like this because they could break people's code.

@jasmainak (Member)

other than these minor comments, LGTM. I am offline tomorrow but if you fix these, I'll merge the day after.

@choldgraf (Contributor Author)

OK I think I've addressed all the comments there - see my changes to the docstrings/example/whatsnew and LMKWYT.

@jasmainak (Member)

great! ... but coveralls is complaining. @Eric89GXL is this something to be concerned about or is coveralls broken?

# Combining/reshaping to original data shape
psd = psd.reshape(np.hstack([dshape, -1]))
if ndim_in == 1:
    psd = psd.squeeze()
Member

squeeze is evil. It squeezes all the dimensions equal to 1; I've been bitten by this in the past. If you know it's the first dim that is 1, just do as it was before: psd = psd[0, :]
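A quick illustration of the pitfall: np.squeeze drops every singleton axis, not just the one you have in mind, so an array with one epoch and one channel loses two dimensions at once, while explicit indexing removes only the axis you intend.

```python
import numpy as np

psd = np.zeros((1, 1, 129))   # (n_epochs, n_channels, n_freqs)

print(np.squeeze(psd).shape)  # (129,) - the channel axis is gone too
print(psd[0].shape)           # (1, 129) - only the epoch axis removed
```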

Contributor Author

that's a good point, will revert it

@agramfort (Member)

done with my review. Looks really nice. thanks @choldgraf

let me know when you addressed my comments

@larsoner (Member)

larsoner commented Jan 16, 2016 via email

@jasmainak (Member)

@choldgraf don't give up. You are almost there :)

@choldgraf (Contributor Author)

Ummm, I think @agramfort's suggestions may have uncovered another bug. I tried creating sinusoidal data by basing a RawArray off of a Raw object, and ran into some strange behavior when creating Epochs. Basically, if I build an Epochs object off of a RawArray, no Epochs get pulled. I was able to recreate it with this code:

# These are paths to the test data
raw_fname = '/Users/choldgraf/src/python/mne-python/mne/io/tests/data/test_raw.fif'
ev_fname = '/Users/choldgraf/src/python/mne-python/mne/io/tests/data/test-eve.fif'

# Load data
raw = mne.io.Raw(raw_fname)
raw_arr = mne.io.RawArray(raw[:, :][0], raw.info)
ev = mne.read_events(ev_fname)

# Create epochs objects and print shape
ep = mne.Epochs(raw, ev, 1, -.5, 1, preload=True)
ep_arr = mne.Epochs(raw_arr, ev, 1, -.5, 1, preload=True)
for e in [ep, ep_arr]:
    print(e._data.shape)

This outputs:

(7, 376, 902)
(0, 376, 902)

So when Epochs is created using a RawArray as input, none of the events are being pulled...

I should have known that this PR would get more complicated ;)

@larsoner (Member)

larsoner commented Jan 16, 2016 via email

@choldgraf (Contributor Author)

Yeah - the differences between the drop logs are that all the non-ignored epochs are marked as "TOO_SHORT". And yep, the first_samp attributes are different between the two. I'm not really sure what first_samp is for... does it mark where the file starts reading in the data or something? And I guess that's not embedded in the 'info' file.

On Sat, Jan 16, 2016, Eric Larson wrote:

    The first_samp is probably different, and that gets factored into the events. Not really a bug, but it could be better documented. You can confirm by looking at the drop_log entries.

@larsoner (Member)

larsoner commented Jan 16, 2016 via email

@choldgraf (Contributor Author)

OK - in the latest commit it now does the same stuff but on a signal with sinusoidal waves. Rather than subtracting from the events file, I just copied over _first_samps, if that's ok. WDYT?

On Sat, Jan 16, 2016, Eric Larson wrote:

    Correct, it's not in info. It's a Neuromag thing having to do with when raw data recording actually started. Subtract the first_samp from the event times and it should work for the RawArray.

@larsoner (Member)

larsoner commented Jan 16, 2016 via email

@choldgraf (Contributor Author)

ok cool - see latest push. That look good? I'm just doing ev[:, 0] -= first_samp

On Sat, Jan 16, 2016, Eric Larson wrote:

    I think you'd also need to copy _last_samps and/or _raw_lengths to be complete. But subtracting from the events seems a bit cleaner to me, because it avoids private attributes entirely.

@larsoner (Member)

larsoner commented Jan 16, 2016 via email

@agramfort (Member)

I am happy! +1 for merge

thx heaps @choldgraf

@Eric89GXL or @jasmainak I'll let you merge if you're happy

jasmainak added a commit that referenced this pull request Jan 17, 2016
@jasmainak merged commit 8e7010f into mne-tools:master Jan 17, 2016
@jasmainak (Member)

Bravo @choldgraf ! congrats on the merge 🍻

@agramfort (Member)

🍻 !


@choldgraf (Contributor Author)

thanks for all the help @jasmainak @Eric89GXL @agramfort

@agramfort (Member)

agramfort commented Jan 17, 2016 via email
