MRG+4: Epochs metadata #4414

choldgraf · 2017-07-21T19:07:03Z

This is a rough draft of the metadata attribute for the Epochs object. The code is mostly there, and I put together a tutorial + some tests that still need tweaking. Let's see what the rendered CircleCI output looks like and then decide whether we like it or not :-)

The main thing this does is:

Lets you add a metadata attribute to Epochs objects. This is a dataframe that can be stored w/ the object.
Lets you do pandas query-style things with the __getitem__ method in Epochs. (see example)

Todo

Add I/O stuff
tutorial

ping @agramfort @Eric89GXL @jona-sassenhagen @kingjr

Follow-up PRs

Deal with groupby functionality for making Evoked (or Epochs?) instances

choldgraf · 2017-07-21T19:48:49Z

Aaand I broke the epochs tests. OK so I think this is probably a problem w/ the logic that happens when you call __getitem__ with epochs.

How should we handle this? Essentially the question is "how does epochs know when a string input corresponds to a pandas query, vs. when it corresponds to a 'current behavior' field"?

larsoner · 2017-07-21T20:02:20Z

How should we handle this? Essentially the question is "how does epochs know when a string input corresponds to a pandas query, vs. when it corresponds to a 'current behavior' field"?

You could try the string behavior, if it fails, fall back to Pandas, if it fails, throw error. Tell people not to have their Pandas entries and event_ids overlap.

choldgraf · 2017-07-21T20:11:25Z

So right now I'm doing the opposite I think :-)

basically:

see if pandas is installed, if so:
Try running the string as a query
If that succeeds then proceed, if that fails then try:
Running the string as the current string behavior
If that fails, then error

Though I agree I think it should be the other way around...lemme try that
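For reference, the order larsoner suggests (event names first, then a pandas query, then an error) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: `select_epoch_indices` and its arguments are hypothetical names, and the event lookup is reduced to an exact name match.

```python
import numpy as np
import pandas as pd


def select_epoch_indices(key, event_id, event_codes, metadata=None):
    """Hypothetical helper: resolve a string key to epoch indices."""
    # 1. Current behavior first: the key is the name of an entry in event_id.
    if key in event_id:
        return np.where(event_codes == event_id[key])[0]
    # 2. No metadata to fall back on, so the key cannot be resolved.
    if metadata is None:
        raise KeyError('%r is not an event name' % (key,))
    # 3. Otherwise treat the key as a pandas expression on the metadata.
    #    If the expression is invalid, pandas raises its own error here.
    mask = metadata.eval(key)
    return np.where(mask.values)[0]
```

With this order, a metadata column that shares its name with an event name is shadowed by the event, which is why overlapping names are best avoided.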

jona-sassenhagen · 2017-07-25T10:46:47Z

mne/epochs.py

        """Compute average of epochs.

        Parameters
        ----------
        picks : array-like of int | None
            If None only MEG, EEG, SEEG, ECoG, and fNIRS channels are kept
            otherwise the channels indices in picks are kept.
+        by : string | list of strings | None
+            If ``self.metadata`` is a DataFrame, return averages grouped


"grouped by"? Do you get a dict, lists, ..?

good call - need to add that in the docstring. it'll be a dictionary
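A rough sketch of what "a dictionary of grouped averages" could mean, assuming the epochs data is a plain array of shape (n_epochs, n_channels, n_times). `average_by` is a hypothetical name, and NumPy arrays stand in here for whatever averaged objects the method ends up returning.

```python
import numpy as np
import pandas as pd


def average_by(data, metadata, by):
    """Hypothetical sketch: average epochs within each metadata group.

    Returns a dict mapping each group name to the mean over its epochs.
    """
    # Make the index positional so it can be used to index into `data`.
    metadata = metadata.reset_index(drop=True)
    groups = {}
    for name, rows in metadata.groupby(by=by):
        # rows.index holds the positions of the epochs in this group.
        groups[name] = data[rows.index.values].mean(axis=0)
    return groups
```

The keys are the group labels that pandas produces for `by`, so grouping on a single column name gives one entry per unique value in that column.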

jona-sassenhagen · 2017-07-25T10:47:19Z

mne/epochs.py

+                             'order to use `by`')
+        metadata = self.metadata.reset_index()
+        groups = {}
+        for name, ixs in metadata.groupby(by=by):


When did Alex allow a soft dependency on Pandas :D

it took a lot of sweet-talking

and a couple of 🍻 :)

jona-sassenhagen · 2017-07-25T10:49:56Z

tutorials/plot_metadata_epochs.py

+
+# Load the data from the interwebz (XXX need to fix this)
+varname = 'https://www.dropbox.com/s/5y2rv7vlgilh52y/KWORD_VARIABLES_DGMH2015.txt?dl=1'
+dataname = 'https://www.dropbox.com/s/6mpunoswlxaa9bi/KWORD_ERP_LEXICAL_DECISION_DGMH2015.txt?dl=1'


I've not heard back yet from Grainger et al if we can repost this ...

jona-sassenhagen · 2017-07-25T10:50:31Z

tutorials/plot_metadata_epochs.py

+Loading the data
+================
+First we'll load the data...this is unnecessarily complex right now because
+the data is stored as a dataframe...we should fix this :-)


Yeah, if we get the OK to rehost, we'll just use an epochs object.

we can always use another dataset too - we just need something that's got complex trial structure. I can probably make my data available for this but it is ECoG and we probably want something that's MEG/EEG for a tutorial like this (at least until ECoG is more of a first-class citizen in MNE)

One way to make ECoG a first-class citizen is by using it more in examples. I think we intend for it to be one, so +1 for using that dataset if it shows the functionality better (or at least as well).

fair enough - I guess in my mind it's not really a first-class citizen until the same or similar behavior can be expected for EEG/MEG/ECoG, and right now there is no visualization support at all for ecog, no?

right now there is no visualization support at all for ecog, no?

There is for raw/epochs/evoked (right), but no source-level stuff AFAIK. Can you open an issue (or comment on an existing one) about what to do? I wonder if we can do something cool with PySurfer for integrating data. We can brainstorm on Gitter, too.

@choldgraf can we use your ECoG data to make one single clean tutorial and get rid of examples/preprocessing/plot_metadata_query.py / unify it here?

jona-sassenhagen · 2017-07-25T10:51:42Z

I haven't thought about it in too much detail. Just one suggestion: would there be a way to more directly integrate it with "/"-matching?

kingjr

I'm -1 on adding a regress method

kingjr · 2017-07-25T13:06:45Z

mne/epochs.py

@@ -818,6 +869,59 @@ def _compute_mean_or_stderr(self, picks, mode='ave'):
        return self._evoked_from_epoch_data(data, self.info, picks, n_events,
                                            kind, self._name)

+    def regress(self, on, fit_intercept=True, by=None):


? Why putting this in the object??

I don't think that's the right way to go. We should get users to understand how to implement whatever analysis they need using standard packages, not creating wrappers in all objects.

There already exist multiple ways to do linear regression in mne, e.g. mne.stats.linear_regression

pinging @agramfort on this one...he wrote it into the prototype that we put together and nobody else complained about it when we showed folks on the gitter so I wrote it into here as well. I do see @kingjr's point...WDYT alex?

+1 for leaving it out for now. I agree with @kingjr that it's better to keep it separate if possible. We can always add it later if we need to, let's proceed in multiple PRs to simplify API considerations.

cool, will do

IMO the linear_regression code is a bit arcane. I don't mind seeing an extra method. Although I don't like "on".
How close can we get to sklearn style? How about fit_transform(X)?

kingjr · 2017-07-25T13:07:17Z

mne/epochs.py

+                # Try metadata matching
+                idx = np.where(self.metadata.eval(keys[0]).values)[0]
+                return idx
+            except:


specify exception
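What this review comment asks for might look like the sketch below. Which exceptions `DataFrame.eval` raises depends on the expression and the pandas version, so the tuple here is an assumption: it covers unknown column names (pandas raises a `NameError` subclass) and malformed expressions (`SyntaxError`). `metadata_match` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def metadata_match(metadata, expr):
    """Hypothetical helper: indices where `expr` holds, or None on failure."""
    try:
        return np.where(metadata.eval(expr).values)[0]
    except (NameError, SyntaxError):
        # Only the failures expected from a bad query are swallowed;
        # anything else (e.g. a bug in our own code) still propagates.
        return None
```

Returning None lets the caller fall through to the next matching strategy without hiding unrelated errors the way a bare `except:` would.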

kingjr · 2017-07-25T13:10:45Z

tutorials/plot_metadata_epochs.py

@@ -0,0 +1,118 @@
+"""


many lines are too long in this tutorial no?

yeah this tutorial is in very rough form - it'll probably change a lot but I wanted it up here to give an idea for the API etc

Please add these to documentation.rst so we can see where they will live

agramfort · 2017-07-28T20:25:55Z

I have the feeling that this will require some community discussion :)

shall we keep this for the next sprint early 2018? this PR is already a big step forward.

larsoner · 2017-07-28T20:27:29Z

I have the feeling that this will require some community discussion :)

It seems like if we remove the regress function, then we have already converged, no?

agramfort · 2017-07-28T20:29:37Z

ok then :) let's remove regress and find a good dataset we can host to demo this !

choldgraf · 2017-07-28T20:30:42Z

+1

I'll just change the module name to NeuroPandas and we can merge.

agramfort · 2017-07-28T20:32:09Z

+np.finfo['float128'].max :)

choldgraf · 2017-07-29T00:54:45Z

haha - I will try to get to another iteration on this over the weekend or next week...in the meantime I had to deal with a last-second berkeley bureaucracy graduation crisis :-)

dengemann · 2017-07-29T08:54:01Z

Yes let's indeed discuss face to face. Also -1 on regress method now. If you want to experiment make an example that exposes the functionality in the sense of a demo / pre-API idea.


choldgraf · 2017-07-31T15:12:10Z

so @Eric89GXL , what's the way to handle I/O here? I don't have a ton of experience with the Elekta binary files...

larsoner · 2017-07-31T15:36:51Z

From @agramfort on Gitter:

I would create a new FIFFB_EPOCHS_METADATA block. We need to store the columns as list of strings and then arrays of int, float or str

So we need to add a new constant to mne.io.constants.FIFF for this new block. Then based on what @agramfort wrote, if you look at the existing Epochs I/O I think it will already make sense to you. There are pretty simple functions to open/close the new block, and write/read chunks / tags of data as necessary.

One thing I'm not sure about is how we're going to store the column header names. Currently in MNE we turn a list of strings into a colon-separated single string for writing (*_name_list). @agramfort do we require no : in the header titles, or do some sanitizing during I/O? (@choldgraf you can proceed with trying to implement the solution before this question is answered.)

agramfort · 2017-07-31T15:42:14Z

yes we use : so far. It's maybe not super robust. I take suggestions
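To make the concern concrete, here is a small sketch of a colon-separated name list and of the stricter option (refusing names that contain the separator). `join_names` and `split_names` are hypothetical helpers written for this illustration, not existing MNE functions.

```python
def join_names(names):
    """Join names into one colon-separated string, refusing ambiguous ones."""
    for name in names:
        if ':' in name:
            raise ValueError('name %r contains the separator ":"' % (name,))
    return ':'.join(names)


def split_names(joined):
    """Inverse of join_names."""
    return joined.split(':')


# Without the check, a header such as 'time:ms' would not survive the trip:
naive = ':'.join(['rt', 'time:ms']).split(':')  # three names instead of two
```

The alternative would be to escape or replace the separator on write and undo that on read, which keeps arbitrary header names working at the cost of a little more I/O code.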

choldgraf · 2017-07-31T16:01:45Z

would it break things if we used JSON?

In [16]: data.to_json()
Out[16]: '{"a":{"0":1,"1":3},"b":{"0":2,"1":4}}'

In [17]: pd.read_json(data.to_json())
Out[17]:
   a  b
0  1  2
1  3  4

dengemann · 2017-07-31T16:04:11Z

It should work, I think we used it for serialization in ICA -> fif. Worth a try. I remember there were some annoying corner cases though. Not everything could be nicely serialized.


larsoner · 2017-07-31T16:08:09Z

This will probably become inefficient for binary data (e.g., float), not sure if it would matter. If we do this string dump we can just use FIFF_DESCRIPTION string field, and thus avoid a new constant

choldgraf · 2017-07-31T17:26:27Z

so it seems like the event IDs are stored like this:

mapping_ = ';'.join([k + ':' + str(v) for k, v in
                         epochs.event_id.items()])

does that mean that I could just store the metadata by doing

mapping_ += ';EVENT_METADATA: %s' % self.metadata.to_json()

?

dengemann · 2017-07-31T17:36:17Z

I can imagine scenarios where this will get inefficient. What about going column wise and depending on dtype use fiff functions to write float/int matrix, if string then serialize.
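A rough sketch of the column-wise idea, with the actual FIFF tag writing left out: numeric columns are kept as typed arrays and everything else is serialized to a string. `pack_columns` and `unpack_columns` are hypothetical names, and treating every non-numeric column as text is a simplifying assumption.

```python
import json

import numpy as np
import pandas as pd


def pack_columns(df):
    """Hypothetical sketch: one (kind, payload) entry per column."""
    packed = {}
    for name in df.columns:
        col = df[name]
        if col.dtype.kind in 'iu':
            packed[name] = ('int', col.values.astype(np.int64))
        elif col.dtype.kind == 'f':
            packed[name] = ('float', col.values.astype(np.float64))
        else:
            # Anything else (including bool) is treated as text here and
            # serialized to a single JSON string.
            packed[name] = ('str', json.dumps([str(v) for v in col]))
    return packed


def unpack_columns(packed):
    """Rebuild a DataFrame from the packed columns."""
    data = {}
    for name, (kind, payload) in packed.items():
        data[name] = json.loads(payload) if kind == 'str' else payload
    return pd.DataFrame(data)
```

In a real writer each numeric payload would go through the existing int/float matrix writers and each string payload through the string writer, with the column names stored separately.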


dengemann · 2017-07-31T17:37:51Z

Another option would be to see if columns can be grouped by type. Then you could store blocks of same dtype and save an index / column names separately for reading.

choldgraf · 2017-07-31T17:46:34Z

we could do that...though it sounds significantly more complex. I'm not sure whether performance is something that won't be really noticed until people start doing stuff with really big dataframes. I just tried out the following:

# 1000 trials, 100 values, all floats
data = np.random.randn(1000, 100)
data = pd.DataFrame(data, columns=['a%s' % ii for ii in range(data.shape[-1])])
%timeit json = data.to_json(); pd.read_json(json)

10 loops, best of 3: 81.4 ms per loop

so that's not too bad, no? also way more metadata than one would usually use...

agramfort · 2017-07-31T18:46:04Z

I would also be in favor of column wise and depending on dtype use fiff functions to write float/int matrix, if string then serialize. Also it means that it will be trivial to load our files in other languages like matlab. It will also be a way to check that dataframe dtypes are controlled (float, int or string)

choldgraf · 2017-07-31T18:54:38Z

hmmm...ok, that'll add a fair amount of extra complexity to this PR then. Can you give me some pseudocode of how you imagine this working?

larsoner · 2017-07-31T19:14:53Z

it means that it will be trivial to load our files in other languages like matlab.

To/from json is standard enough I suspect it will actually be easier than recreating the necessary conditionals for writing. If MATLAB has some seamless json support would you be convinced?

It will also be a way to check that dataframe dtypes are controlled (float, int or string)

IIUC json already only supports some reasonable set of types like this. IIRC, when I've tried to write json with even np.float64 scalars instead of Python float I've gotten errors.
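A quick way to check which values the standard-library encoder accepts is sketched below (`is_json_serializable` is a hypothetical helper). On Python 3, NumPy integer scalars such as `np.int64` are rejected with a `TypeError`, so values generally need converting to built-in types before a plain `json.dumps`.

```python
import json

import numpy as np


def is_json_serializable(value):
    """Return True if the stdlib json encoder accepts `value`."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


plain_float_ok = is_json_serializable(1.5)             # built-in float
numpy_int_ok = is_json_serializable(np.int64(3))       # NumPy integer scalar
converted_ok = is_json_serializable(int(np.int64(3)))  # cast to built-in int
```

This restriction applies to the stdlib `json` module; `DataFrame.to_json` does its own conversion of column values.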

agramfort · 2017-07-31T19:46:44Z

here it is my friend: https://gist.github.com/agramfort/21d4f43caefef40efaf93b6d2ba2a90a I needed a hacking moment ...

choldgraf · 2017-10-19T15:05:27Z

doc/whats_new.rst

@@ -19,7 +19,7 @@ Current
 Changelog
 ~~~~~~~~~

- Nothing yet
+- Add support for metadata in :class:`mne.Epochs` by `Chris Holdgraf`_, `Jona Sassenhagen`_, and `Eric Larson`_


we should add @agramfort in there as well...he and I mapped out the original prototypes for this at SciPy 2017!

agramfort · 2017-10-19T15:16:11Z

my ego is flattered :)

jona-sassenhagen · 2017-10-19T16:23:31Z

Go neuropandas

larsoner · 2017-10-19T16:30:58Z

I think this is what we expect it to look like:

https://6216-1301584-gh.circle-artifacts.com/0/tmp/circle-artifacts.Ykx8l1p/html/auto_tutorials/plot_metadata_epochs.html

@jona-sassenhagen you have some work to do on the plot_compare_evoked :)

larsoner · 2017-10-19T16:31:15Z

(hopefully #4526 can follow closely behind this one)

jona-sassenhagen · 2017-10-19T16:43:12Z

I'll be on it as soon as this is merged!

choldgraf · 2017-10-19T18:44:05Z

agramfort · 2017-10-19T19:11:01Z

🍻

jona-sassenhagen reviewed Jul 25, 2017


kingjr previously requested changes Jul 25, 2017


larsoner and others added 10 commits October 19, 2017 10:19

FIX: No numexpr

6c1c9f7

tutorial

ea3ce62

FIX: Many fixes

ead914c

add ref info

c3be7cd

add ref info

afdca11

FIX: Refs

2686c02

jona comments

e066cfd

FIX: Minor tweaks

a0a86a8

docs

4d401d0

FIX: Fix docstring

3f70019

larsoner force-pushed the epochs_meta branch from d0c5acf to 1666e7e Compare October 19, 2017 14:22

FIX: Fix flake

afc1b99

larsoner force-pushed the epochs_meta branch from 1666e7e to afc1b99 Compare October 19, 2017 14:24

FIX: whats_new [ci skip]

60a4097

choldgraf commented Oct 19, 2017


larsoner added 2 commits October 19, 2017 11:55

FIX: Fix cmap for CircleCI

afcbbff

FIX: whats_new [ci skip]

d35bee1

larsoner merged commit 7ad946f into mne-tools:master Oct 19, 2017

larsoner deleted the epochs_meta branch October 19, 2017 18:30

larsoner restored the epochs_meta branch October 19, 2017 18:30

larsoner mentioned this pull request Dec 8, 2017

FIX: metadata index behavior does not comply with pandas #4823

Closed

larsoner mentioned this pull request Aug 11, 2018

Allow @ queries for metadata #5400

Closed

choldgraf mentioned this pull request Apr 9, 2020

serializing nested ntbk metadata items executablebooks/MyST-NB#147

Merged
