MRG+4: Epochs metadata #4414
Conversation
Aaand I broke the epochs tests. OK, so I think this is probably a problem w/ the logic that happens when you call How should we handle this? Essentially the question is "how does epochs know when a string input corresponds to a pandas query, vs. when it corresponds to a 'current behavior' field"?
You could try the string behavior; if it fails, fall back to Pandas; if that fails, throw an error. Tell people not to have their Pandas entries and
So right now I'm doing the opposite, I think :-) basically:
Though I agree, I think it should be the other way around...lemme try that
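For concreteness, a minimal sketch of the "event_id first, pandas query as fallback" order being discussed (the helper name and error handling are assumptions, not the PR's actual implementation):

import numpy as np

def _string_to_epoch_idx(epochs, key):
    """Hypothetical helper: resolve a selection string to epoch indices."""
    if key in epochs.event_id:
        # "current behavior": the string names an event type
        return np.where(epochs.events[:, 2] == epochs.event_id[key])[0]
    try:
        # otherwise treat the string as a pandas query on the metadata
        return np.where(epochs.metadata.eval(key).values)[0]
    except Exception as err:
        raise KeyError('%r is neither an event_id key nor a valid '
                       'metadata query (%s)' % (key, err))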
mne/epochs.py
Outdated

    """Compute average of epochs.

    Parameters
    ----------
    picks : array-like of int | None
        If None, only MEG, EEG, SEEG, ECoG, and fNIRS channels are kept;
        otherwise the channel indices in picks are kept.
    by : string | list of strings | None
        If ``self.metadata`` is a DataFrame, return averages grouped
"grouped by"? Do you get a dict, lists, ..?
good call - need to add that in the docstring. it'll be a dictionary
mne/epochs.py
Outdated

                     'order to use `by`')
        metadata = self.metadata.reset_index()
        groups = {}
        for name, ixs in metadata.groupby(by=by):
When did Alex allow a soft dependency on Pandas :D
it took a lot of sweet-talking
and a couple of 🍻 :)
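As a rough illustration of the groupby-based averaging in the snippet above, a hedged sketch of what a dict-returning helper could look like (the signature and the example column name are assumptions, not the final API):

def average_by(epochs, by):
    """Hypothetical helper: one averaged Evoked per metadata group."""
    metadata = epochs.metadata.reset_index()
    evokeds = {}
    for name, group in metadata.groupby(by=by):
        evokeds[name] = epochs[group.index.values].average()
    return evokeds

# e.g. one Evoked per level of a 'frequency_bin' column:
# evokeds = average_by(epochs, by='frequency_bin')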
tutorials/plot_metadata_epochs.py
Outdated

# Load the data from the interwebz (XXX need to fix this)
varname = 'https://www.dropbox.com/s/5y2rv7vlgilh52y/KWORD_VARIABLES_DGMH2015.txt?dl=1'
dataname = 'https://www.dropbox.com/s/6mpunoswlxaa9bi/KWORD_ERP_LEXICAL_DECISION_DGMH2015.txt?dl=1'
I've not heard back yet from Grainger et al if we can repost this ...
tutorials/plot_metadata_epochs.py
Outdated

Loading the data
================
First we'll load the data...this is unnecessarily complex right now because
the data is stored as a dataframe...we should fix this :-)
Yeah, if we get the OK to rehost, we'll just use an epochs object.
we can always use another dataset too - we just need something that's got complex trial structure. I can probably make my data available for this but it is ECoG and we probably want something that's MEG/EEG for a tutorial like this (at least until ECoG is more of a first-class citizen in MNE)
One way to make ECoG a first-class citizen is by using it more in examples. I think we intend for it to be one, so +1 for using that dataset if it shows the functionality better (or at least as well).
fair enough - I guess in my mind it's not really a first-class citizen until the same or similar behavior can be expected for EEG/MEG/ECoG, and right now there is no visualization support at all for ecog, no?
> right now there is no visualization support at all for ecog, no?

There is for raw/epochs/evoked (right), but no source-level stuff AFAIK. Can you open an issue (or comment on an existing one) about what to do? I wonder if we can do something cool with PySurfer for integrating data. We can brainstorm on Gitter, too.
@choldgraf can we use your ECoG data to make one single clean tutorial and get rid of examples/preprocessing/plot_metadata_query.py / unify it here?
I haven't thought about it in too much detail. Just one suggestion: would there be a way to more directly integrate it with "/"-matching?
I'm -1 on adding a regress method
mne/epochs.py
Outdated

@@ -818,6 +869,59 @@ def _compute_mean_or_stderr(self, picks, mode='ave'):
        return self._evoked_from_epoch_data(data, self.info, picks, n_events,
                                            kind, self._name)

    def regress(self, on, fit_intercept=True, by=None):
Why put this in the object??
I don't think that's the right way to go. We should get users to understand how to implement whatever analysis they need using standard packages, not create wrappers in all objects.
There already exist multiple ways to do linear regression in mne, e.g. mne.stats.linear_regression
pinging @agramfort on this one...he wrote it into the prototype that we put together and nobody else complained about it when we showed folks on the gitter so I wrote it into here as well. I do see @kingjr's point...WDYT alex?
+1 for leaving it out for now. I agree with @kingjr that it's better to keep it separate if possible. We can always add it later if we need to, let's proceed in multiple PRs to simplify API considerations.
cool, will do
IMO the linear_regression code is a bit arcane. I don't mind seeing an extra method. Although I don't like "on".
How close can we get to sklearn style? How about fit_transform(X)?
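For reference, the analysis such a method would wrap can already be expressed with the existing function mentioned above; a hedged sketch, assuming an epochs object with a 'word_frequency' metadata column (both names are made up for illustration):

import numpy as np
from mne.stats import linear_regression

names = ['intercept', 'word_frequency']
design = np.ones((len(epochs), 2))            # intercept column of ones
design[:, 1] = epochs.metadata['word_frequency'].values
results = linear_regression(epochs, design, names=names)
beta = results['word_frequency'].beta         # Evoked of regression coefficients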
mne/epochs.py
Outdated

            # Try metadata matching
            idx = np.where(self.metadata.eval(keys[0]).values)[0]
            return idx
        except:
specify exception
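For instance, the bare except above could be narrowed to the errors a bad query is expected to raise; the exact exception list below is an assumption (pandas raises different errors depending on the query string):

import numpy as np
import pandas as pd

def _metadata_query_idx(metadata, key):
    """Hypothetical helper: indices matching a pandas query, or None."""
    try:
        return np.where(metadata.eval(key).values)[0]
    except (NameError, SyntaxError, ValueError, TypeError):
        # not a valid metadata query; let the caller fall back to event_id
        return None

metadata = pd.DataFrame({'freq': [1.0, 5.0, 2.0]})
print(_metadata_query_idx(metadata, 'freq > 1.5'))     # [1 2]
print(_metadata_query_idx(metadata, 'auditory/left'))  # None -> fall back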
@@ -0,0 +1,118 @@
"""
many lines are too long in this tutorial no?
yeah this tutorial is in very rough form - it'll probably change a lot but I wanted it up here to give an idea for the API etc
Please add these to documentation.rst so we can see where they will live
I have the feeling that this will require some community discussion :) Shall we keep this for the next sprint, early 2018? This PR is already a big step forward.
It seems like if we remove the
OK then :) let's remove regress and find a good dataset we can host to demo this!
+1 I'll just change the module name to NeuroPandas and we can merge.
+np.finfo('float128').max :)
haha - I will try to get to another iteration on this over the weekend or next week...in the meantime I had to deal with a last-second Berkeley bureaucracy graduation crisis :-)
Yes, let's indeed discuss face to face. Also -1 on the regress method for now. If you want to experiment, make an example that exposes the functionality in the sense of a demo / pre-API idea.
so @Eric89GXL, what's the way to handle I/O here? I don't have a ton of experience with the Elekta binary files...
From @agramfort on Gitter:
So we need to add a new constant to One thing I'm not sure about is how we're going to store the column header names. Currently in MNE we turn a list of strings into a colon-separated single string for writing (
yes, we use ':' so far. It's maybe not super robust. I'll take suggestions.
would it break things if we used JSON?

In [16]: data.to_json()
Out[16]: '{"a":{"0":1,"1":3},"b":{"0":2,"1":4}}'

In [17]: pd.read_json(data.to_json())
Out[17]:
   a  b
0  1  2
1  3  4
It should work, I think we used it for serialization in ICA -> fif. Worth a try. I remember there were some annoying corner cases though. Not everything could be nicely serialized.
This will probably become inefficient for binary data (e.g., float), not sure if it would matter.
If we do this string dump we can just use the FIFF_DESCRIPTION string field, and thus avoid a new constant.
so it seems like the event IDs are stored like this:

mapping_ = ';'.join([k + ':' + str(v) for k, v in
                     epochs.event_id.items()])

does that mean that I could just store the metadata by doing

mapping_ += ';EVENT_METADATA: %s' % self.metadata.to_json()

?
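A rough sketch of the round trip this would imply; the EVENT_METADATA tag and the parsing below are assumptions to illustrate the idea, not the final I/O code:

import pandas as pd

event_id = {'auditory/left': 1, 'visual/right': 3}
metadata = pd.DataFrame({'rt': [0.41, 0.39], 'word': ['tree', 'lantern']})

# writing: append the serialized metadata to the event_id mapping string
mapping_ = ';'.join(k + ':' + str(v) for k, v in event_id.items())
mapping_ += ';EVENT_METADATA:%s' % metadata.to_json()

# reading: split the metadata back out before parsing the event_id pairs
head, _, md_json = mapping_.partition(';EVENT_METADATA:')
metadata_back = pd.read_json(md_json)
event_id_back = dict((k, int(v)) for k, v in
                     (pair.split(':') for pair in head.split(';')))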
I can imagine scenarios where this will get inefficient. What about going column-wise and, depending on dtype, using fiff functions to write a float/int matrix, serializing if string?
Another option would be to see if columns can be grouped by type. Then you could store blocks of the same dtype and save an index / column names separately for reading.
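A sketch of what that column-wise split could look like (purely illustrative; the helper and the choice of JSON for the string columns are assumptions):

import json
import pandas as pd

def split_metadata_by_dtype(metadata):
    """Hypothetical: numeric blocks as matrices, string columns as JSON."""
    float_block = metadata.select_dtypes(include=['float']).values
    int_block = metadata.select_dtypes(include=['int']).values
    string_cols = {col: metadata[col].tolist()
                   for col in metadata.select_dtypes(include=['object'])}
    return float_block, int_block, json.dumps(string_cols), list(metadata.columns)

metadata = pd.DataFrame({'rt': [0.41, 0.39], 'n_chars': [4, 7],
                         'word': ['tree', 'lantern']})
floats, ints, strings_json, column_order = split_metadata_by_dtype(metadata)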
we could do that...though it sounds significantly more complex. I'm not sure performance is something that will really be noticed until people start doing stuff with really big dataframes. I just tried out the following:
so that's not too bad, no? also way more metadata than one would usually use...
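(The snippet and numbers referenced here did not survive extraction; a hypothetical way to stress-test the JSON round-trip, with made-up sizes and column names, would be something like the following.)

import numpy as np
import pandas as pd
from timeit import timeit

# a deliberately oversized metadata table
n_epochs, n_cols = 10000, 50
metadata = pd.DataFrame(np.random.randn(n_epochs, n_cols),
                        columns=['col%d' % ii for ii in range(n_cols)])

# time a full serialize/deserialize round trip
print(timeit(lambda: pd.read_json(metadata.to_json()), number=10))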
I would also be in favor of going column-wise and, depending on dtype, using fiff functions to write a float/int matrix, serializing if string. Also it means that it will be trivial to load our files in other languages like MATLAB. It will also be a way to check that dataframe dtypes are controlled (float, int or string).
hmmm...ok, that'll add a fair amount of extra complexity to this PR then. Can you give me some pseudocode of how you imagine this working?
> it means that it will be trivial to load our files in other languages like MATLAB.

To/from JSON is standard enough that I suspect it will actually be easier than recreating the necessary conditionals for writing. If MATLAB has some seamless JSON support, would you be convinced?

> It will also be a way to check that dataframe dtypes are controlled (float, int or string)

IIUC JSON already only supports some reasonable set of types like this. IIRC, when I've tried to write JSON with even np.float64 scalars instead of Python floats I've gotten errors.
here it is my friend:
https://gist.github.com/agramfort/21d4f43caefef40efaf93b6d2ba2a90a
I needed a hacking moment ...
doc/whats_new.rst
Outdated

@@ -19,7 +19,7 @@ Current
Changelog
~~~~~~~~~

- Nothing yet
- Add support for metadata in :class:`mne.Epochs` by `Chris Holdgraf`_, `Jona Sassenhagen`_, and `Eric Larson`_
we should add @agramfort in there as well...he and I mapped out the original prototypes for this at SciPy 2017!
my ego is flattered :)
Go neuropandas
I think this is what we expect it to look like: @jona-sassenhagen you have some work to do on the
(hopefully #4526 can follow closely behind this one)
I'll be on it as soon as this is merged!
🍻
This is a rough draft of the metadata attribute for the Epochs object. The code is mostly there, and I put together a tutorial + some tests that still need tweaking. Let's see what the rendered CircleCI output looks like and then decide whether we like it or not :-)

The main thing this does is:

- Adds a metadata attribute to Epochs objects. This is a dataframe that can be stored w/ the object.
- __getattr__ method in Epochs (see example).

Todo

ping @agramfort @Eric89GXL @jona-sassenhagen @kingjr

Follow-up PRs

- groupby functionality for making Evoked (or Epochs?) instances