MRG+4: Epochs metadata #4414

Merged
merged 26 commits into from
Oct 19, 2017

Contributor

@choldgraf choldgraf commented Jul 21, 2017

This is a rough draft of the metadata attribute for the Epochs object. The code is mostly there, and I put together a tutorial + some tests that still need tweaking. Let's see what the rendered CircleCI output looks like and then decide whether we like it or not :-)

The main thing this does is:

  • Lets you add a metadata attribute to Epochs objects. This is a dataframe that can be stored w/ the object.
  • Lets you do pandas query-style things with the __getitem__ method in Epochs. (see example)
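
A rough illustration of the query-style selection described in the two points above, using plain pandas and NumPy rather than the Epochs code in this PR. The column names, values, and the `data` array are made up for the example; only `DataFrame.eval` and `DataFrame.query` are real pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical per-epoch metadata: one row per epoch (values are made up).
metadata = pd.DataFrame({
    'word': ['cat', 'house', 'dog', 'window'],
    'n_letters': [3, 5, 3, 6],
    'concreteness': [4.9, 4.6, 4.8, 4.1],
})

# Stand-in for the epochs data array: (n_epochs, n_channels, n_times).
data = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)

# A pandas query string evaluates to a boolean mask over the rows ...
mask = metadata.eval('n_letters > 3 and concreteness > 4.5').values
# ... which gives the positions of the matching epochs.
idx = np.where(mask)[0]

selected_data = data[idx]
selected_meta = metadata.query('n_letters > 3 and concreteness > 4.5')
```

Keeping the metadata row-aligned with the first axis of the data is what makes the same index usable for both.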

Todo

  • Add I/O stuff
  • tutorial

ping @agramfort @Eric89GXL @jona-sassenhagen @kingjr

Follow-up PRs

  • Deal with groupby functionality for making Evoked (or Epochs?) instances

@choldgraf
Contributor Author

Aaand I broke the epochs tests. OK so I think this is probably a problem w/ the logic that happens when you call __getitem__ with epochs.

How should we handle this? Essentially the question is "how does epochs know when a string input corresponds to a pandas query, vs. when it corresponds to a 'current behavior' field"?

@larsoner
Member

How should we handle this? Essentially the question is "how does epochs know when a string input corresponds to a pandas query, vs. when it corresponds to a 'current behavior' field"?

You could try the string behavior, if it fails, fall back to Pandas, if it fails, throw error. Tell people not to have their Pandas entries and event_ids overlap.

@choldgraf
Contributor Author

So right now I'm doing the opposite I think :-)

basically:

  1. see if pandas is installed, if so:
  2. Try running the string as a query
  3. If that succeeds then proceed, if that fails then try:
  4. Running the string as the current string behavior
  5. If that fails, then error

Though I agree I think it should be the other way around...lemme try that
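
The ordering suggested above (event-name matching first, then a pandas query, then an error) could look roughly like the sketch below. `select_indices` and its arguments are hypothetical names for illustration, not the code in mne/epochs.py.

```python
import numpy as np
import pandas as pd


def select_indices(key, event_id, events, metadata=None):
    """Return epoch indices for ``key`` (hypothetical helper).

    First try ``key`` as an event name; if that fails and metadata is
    available, try it as a pandas query; otherwise raise KeyError.
    """
    # 1. Existing behavior: exact match against the event_id names.
    if key in event_id:
        return np.where(events == event_id[key])[0]
    # 2. Fall back to evaluating the string against the metadata.
    if metadata is not None:
        try:
            mask = metadata.eval(key)
        except Exception as exp:  # pandas raises several error types here
            raise KeyError('%r is neither an event name nor a valid '
                           'metadata query (%s)' % (key, exp))
        return np.where(np.asarray(mask, dtype=bool))[0]
    # 3. Nothing matched.
    raise KeyError('Event %r is not in event_id' % (key,))


event_id = {'auditory': 1, 'visual': 2}
events = np.array([1, 2, 2, 1])  # event code of each epoch
metadata = pd.DataFrame({'rt': [0.31, 0.52, 0.47, 0.29]})

by_name = select_indices('visual', event_id, events, metadata)
by_query = select_indices('rt < 0.4', event_id, events, metadata)
```

With this order, a string that happens to be both an event name and a valid query resolves to the event name, which is why overlapping names are best avoided.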

mne/epochs.py Outdated
"""Compute average of epochs.

Parameters
----------
picks : array-like of int | None
If None only MEG, EEG, SEEG, ECoG, and fNIRS channels are kept
otherwise the channels indices in picks are kept.
by : string | list of strings | None
If ``self.metadata`` is a DataFrame, return averages grouped
Contributor

"grouped by"? Do you get a dict, lists, ..?

Contributor Author

good call - need to add that in the docstring. it'll be a dictionary

mne/epochs.py Outdated
'order to use `by`')
metadata = self.metadata.reset_index()
groups = {}
for name, ixs in metadata.groupby(by=by):
Contributor

When did Alex allow a soft dependency on Pandas :D

Contributor Author

it took a lot of sweet-talking

Member

and a couple of 🍻 :)


# Load the data from the interwebz (XXX need to fix this)
varname = 'https://www.dropbox.com/s/5y2rv7vlgilh52y/KWORD_VARIABLES_DGMH2015.txt?dl=1'
dataname = 'https://www.dropbox.com/s/6mpunoswlxaa9bi/KWORD_ERP_LEXICAL_DECISION_DGMH2015.txt?dl=1'
Contributor

I've not heard back yet from Grainger et al if we can repost this ...

Loading the data
================
First we'll load the data...this is unnecessarily complex right now because
the data is stored as a dataframe...we should fix this :-)
Contributor

Yeah, if we get the OK to rehost, we'll just use an epochs object.

Contributor Author

we can always use another dataset too - we just need something that's got complex trial structure. I can probably make my data available for this but it is ECoG and we probably want something that's MEG/EEG for a tutorial like this (at least until ECoG is more of a first-class citizen in MNE)

Member

One way to make ECoG a first-class citizen is by using it more in examples. I think we intend for it to be one, so +1 for using that dataset if it shows the functionality better (or at least as well).

Contributor Author

fair enough - I guess in my mind it's not really a first-class citizen until the same or similar behavior can be expected for EEG/MEG/ECoG, and right now there is no visualization support at all for ecog, no?

Member

right now there is no visualization support at all for ecog, no?

There is for raw/epochs/evoked (right?), but no source-level stuff AFAIK. Can you open an issue (or comment on an existing one) about what to do? I wonder if we can do something cool with PySurfer for integrating data. We can brainstorm on Gitter, too.

Member

@choldgraf can we use your ECoG data to make one single clean tutorial and get rid of examples/preprocessing/plot_metadata_query.py / unify it here?

@jona-sassenhagen
Contributor

I haven't thought about it in too much detail. Just one suggestion: would there be a way to more directly integrate it with "/"-matching?

kingjr
kingjr previously requested changes Jul 25, 2017
Member

@kingjr kingjr left a comment

I'm -1 on adding a regress method

mne/epochs.py Outdated
@@ -818,6 +869,59 @@ def _compute_mean_or_stderr(self, picks, mode='ave'):
return self._evoked_from_epoch_data(data, self.info, picks, n_events,
kind, self._name)

def regress(self, on, fit_intercept=True, by=None):
Member

Why put this in the object?

I don't think that's the right way to go. We should get users to understand how to implement whatever analysis they need using standard packages, not creating wrappers in all objects.

There already exist multiple ways to do linear regression in mne, e.g. mne.stats.linear_regression

Contributor Author

pinging @agramfort on this one...he wrote it into the prototype that we put together and nobody else complained about it when we showed folks on the gitter so I wrote it into here as well. I do see @kingjr's point...WDYT alex?

Member

+1 for leaving it out for now. I agree with @kingjr that it's better to keep it separate if possible. We can always add it later if we need to, let's proceed in multiple PRs to simplify API considerations.

Contributor Author

cool, will do

Contributor

IMO the linear_regression code is a bit arcane. I don't mind seeing an extra method. Although I don't like "on".
How close can we get to sklearn style? How about fit_transform(X)?

mne/epochs.py Outdated
# Try metadata matching
idx = np.where(self.metadata.eval(keys[0]).values)[0]
return idx
except:
Member

specify exception
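
To illustrate the point about naming the exception: a bare `except:` also swallows things like KeyboardInterrupt and unrelated bugs, so it is usually safer to list only the errors that mean "no match". The toy helper below is hypothetical; a dictionary lookup stands in for the matching code under review.

```python
def lookup_event(key, event_id):
    """Return the event code for ``key`` or None if it is unknown.

    Hypothetical helper: only KeyError is treated as "no match", so
    unrelated errors (or KeyboardInterrupt) still propagate.
    """
    try:
        return event_id[key]
    except KeyError:
        return None


event_id = {'auditory': 1, 'visual': 2}
known = lookup_event('visual', event_id)
unknown = lookup_event('rt < 0.4', event_id)
```

An unhashable key such as a list still raises TypeError here, which is the behavior a bare `except:` would have hidden.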

@@ -0,0 +1,118 @@
"""
Member

many lines are too long in this tutorial no?

Contributor Author

yeah this tutorial is in very rough form - it'll probably change a lot but I wanted it up here to give an idea for the API etc

Member

Please add these to documentation.rst so we can see where they will live

@agramfort
Member

I have the feeling that this will require some community discussion :)

shall we keep this for the next sprint early 2018? this PR is already a big step forward.

@larsoner
Member

I have the feeling that this will require some community discussion :)

It seems like if we remove the regress function, then we have already converged, no?

@agramfort
Member

ok then :) let's remove regress and find a good dataset we can host to demo this !

@choldgraf
Contributor Author

+1

I'll just change the module name to NeuroPandas and we can merge.

@agramfort
Member

agramfort commented Jul 28, 2017 via email

@choldgraf
Contributor Author

haha - I will try to get to another iteration on this over the weekend or next week...in the meantime I had to deal with a last-second berkeley bureaucracy graduation crisis :-)

@dengemann
Member

dengemann commented Jul 29, 2017 via email

@choldgraf
Contributor Author

so @Eric89GXL , what's the way to handle I/O here? I don't have a ton of experience with the Elekta binary files...

@larsoner
Member

From @agramfort on Gitter:

I would create a new FIFFB_EPOCHS_METADATA block. We need to store the columns as list of strings and then arrays of int, float or str

So we need to add a new constant to mne.io.constants.FIFF for this new block. Then based on what @agramfort wrote, if you look at the existing Epochs I/O I think it will already make sense to you. There are pretty simple functions to open/close the new block, and write/read chunks / tags of data as necessary.

One thing I'm not sure about is how we're going to store the column header names. Currently in MNE we turn a list of strings into a colon-separated single string for writing (*_name_list). @agramfort do we require no : in the header titles, or do some sanitizing during I/O? (@choldgraf you can proceed with trying to implement the solution before this question is answered.)
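
The concern about `:` in header titles can be seen with a plain join/split. The `encode_names`/`decode_names` pair below shows one possible sanitizing scheme (backslash escaping); all three function names are hypothetical and this is not necessarily what MNE does.

```python
def join_names(names):
    """Colon-join a list of names, as described for ``*_name_list``."""
    return ':'.join(names)


def encode_names(names):
    """Escape backslashes and colons before joining (one possible scheme)."""
    return ':'.join(n.replace('\\', '\\\\').replace(':', '\\:')
                    for n in names)


def decode_names(text):
    """Invert ``encode_names`` by scanning for unescaped colons."""
    names, current, chars = [], [], iter(text)
    for ch in chars:
        if ch == '\\':
            current.append(next(chars, ''))  # keep the escaped character
        elif ch == ':':
            names.append(''.join(current))
            current = []
        else:
            current.append(ch)
    names.append(''.join(current))
    return names


columns = ['word', 'rt:ms', 'freq']
naive = join_names(columns).split(':')      # 'rt:ms' is split in two
safe = decode_names(encode_names(columns))  # round-trips intact
```

The alternative raised in the question, simply rejecting column names that contain `:`, would avoid the escaping logic at the cost of restricting what users can name their columns.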

@agramfort
Member

agramfort commented Jul 31, 2017 via email

@choldgraf
Contributor Author

would it break things if we used JSON?

In [16]: data.to_json()
Out[16]: '{"a":{"0":1,"1":3},"b":{"0":2,"1":4}}'

In [17]: pd.read_json(data.to_json())
Out[17]:
   a  b
0  1  2
1  3  4
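
The same round trip written as a standalone script, as a minimal sketch. Recent pandas versions deprecate passing a literal JSON string to `read_json`, so the string is wrapped in `io.StringIO` here, which also works on older versions.

```python
import io

import pandas as pd

data = pd.DataFrame({'a': [1, 3], 'b': [2, 4]})

# Serialize to a single JSON string.
as_json = data.to_json()

# Wrapping the string in StringIO works across pandas versions.
restored = pd.read_json(io.StringIO(as_json))
```

Note that `to_json` writes floats with a default `double_precision` of 10 digits, so float columns may not round-trip bit-for-bit.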

@dengemann
Member

dengemann commented Jul 31, 2017 via email

@larsoner
Member

larsoner commented Jul 31, 2017 via email

@choldgraf
Contributor Author

choldgraf commented Jul 31, 2017

so it seems like the event IDs are stored like this:

mapping_ = ';'.join([k + ':' + str(v) for k, v in
                         epochs.event_id.items()])

does that mean that I could just store the metadata by doing

mapping_ += ';EVENT_METADATA: %s' % self.metadata.to_json()

?
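
One thing to watch for with this idea: JSON text itself contains `:` characters (and string values may contain `;`), so any reader that splits the mapping on those delimiters would mis-parse the appended entry. `parse_mapping` below is a hypothetical reader written only to show the collision; the actual MNE reading code may differ.

```python
import json

event_id = {'auditory': 1, 'visual': 2}
# Same construction as in the snippet above.
mapping_ = ';'.join([k + ':' + str(v) for k, v in event_id.items()])


def parse_mapping(text):
    """Hypothetical reader: split entries on ';' and key/value on ':'."""
    out = {}
    for entry in text.split(';'):
        key, value = entry.split(':')  # fails if the value contains ':'
        out[key] = int(value)
    return out


restored = parse_mapping(mapping_)  # round-trips the plain event ids

# Appending JSON adds extra ':' characters, which breaks this reader.
metadata_json = json.dumps({'a': {'0': 1, '1': 3}})
combined = mapping_ + ';EVENT_METADATA: %s' % metadata_json
try:
    parse_mapping(combined)
    parsed_ok = True
except ValueError:
    parsed_ok = False
```

Storing the JSON string in its own tag or block, separate from the event-id mapping, would sidestep the delimiter problem.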

@dengemann
Member

dengemann commented Jul 31, 2017 via email

@dengemann
Member

dengemann commented Jul 31, 2017 via email

@choldgraf
Contributor Author

we could do that...though it sounds significantly more complex. I'm not sure performance is something that will really be noticed until people start doing stuff with really big dataframes. I just tried out the following:

# 1000 trials, 100 values, all floats
data = np.random.randn(1000, 100)
data = pd.DataFrame(data, columns=['a%s' % ii for ii in range(data.shape[-1])])
%timeit json = data.to_json(); pd.read_json(json)

10 loops, best of 3: 81.4 ms per loop

so that's not too bad, no? also way more metadata than one would usually use...
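
The `%timeit` line only works inside IPython. A roughly equivalent measurement as a plain script, using the standard-library `timeit` module, might look like this; the result depends on the machine and pandas version, so no expected number is given. `round_trip` and `n_runs` are names chosen for the example.

```python
import io
import timeit

import numpy as np
import pandas as pd

# 1000 trials, 100 values, all floats (as in the comment above).
values = np.random.randn(1000, 100)
columns = ['a%s' % ii for ii in range(values.shape[-1])]
data = pd.DataFrame(values, columns=columns)


def round_trip():
    """Serialize the frame to JSON and read it back."""
    return pd.read_json(io.StringIO(data.to_json()))


n_runs = 3
seconds_per_loop = timeit.timeit(round_trip, number=n_runs) / n_runs
restored = round_trip()
```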

@agramfort
Member

agramfort commented Jul 31, 2017 via email

@choldgraf
Contributor Author

hmmm...ok, that'll add a fair amount of extra complexity to this PR then. Can you give me some pseudocode of how you imagine this working?

@larsoner
Member

larsoner commented Jul 31, 2017 via email

@agramfort
Member

agramfort commented Jul 31, 2017 via email

@@ -19,7 +19,7 @@ Current
Changelog
~~~~~~~~~

- Nothing yet
- Add support for metadata in :class:`mne.Epochs` by `Chris Holdgraf`_, `Jona Sassenhagen`_, and `Eric Larson`_
Contributor Author

we should add @agramfort in there as well...he and I mapped out the original prototypes for this at SciPy 2017!

@agramfort
Member

agramfort commented Oct 19, 2017 via email

@jona-sassenhagen
Contributor

jona-sassenhagen commented Oct 19, 2017 via email

@larsoner
Member

I think this is what we expect it to look like:

https://6216-1301584-gh.circle-artifacts.com/0/tmp/circle-artifacts.Ykx8l1p/html/auto_tutorials/plot_metadata_epochs.html

@jona-sassenhagen you have some work to do on the plot_compare_evoked :)

@larsoner
Member

(hopefully #4526 can follow closely behind this one)

@jona-sassenhagen
Contributor

I'll be on it as soon as this is merged!

@larsoner larsoner merged commit 7ad946f into mne-tools:master Oct 19, 2017
@larsoner larsoner deleted the epochs_meta branch October 19, 2017 18:30
@larsoner larsoner restored the epochs_meta branch October 19, 2017 18:30
@choldgraf
Contributor Author

@agramfort
Member

agramfort commented Oct 19, 2017 via email


8 participants