epochs event_id as dict and support for multiple marker numbers #133
Sounds easy enough. I can do it while I'm digging around in there anyway. Speaking of which, I'd like to add support for saving the standard errors, too, if it's not already in there. Did I just miss it somewhere?
something like epochs.average(comment='auditory', stderr=True/False) would be nice
very nice idea, +1 for that
glad you both like this idea. Regarding capturing the standard error, what you want is a logger; see for inspiration (or not, if too complex) this PR on scikit-learn: scikit-learn/scikit-learn#1171. But we could do something simpler...
I'm not sure what you mean by standard error---does it have something to do with the subject making errors? I just meant the standard deviation across epochs divided by the sqrt of the number of epochs, for each point in time (to complement the average at each point in time, for plotting purposes, generally)...
i.e., the data that gets put in the evoked files by putting "stderr" in the .ave files in the C code. As you can see, I'm a little hung up on the C code functionality :)
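For reference, the quantity being discussed (the standard deviation across epochs divided by the square root of the number of epochs, per channel and time point) can be sketched with plain NumPy; the array shapes and random data here are illustrative assumptions, not MNE code:

```python
import numpy as np

rng = np.random.RandomState(42)
# Hypothetical epochs array: (n_epochs, n_channels, n_times)
data = rng.randn(40, 2, 100)

n_epochs = data.shape[0]
# Mean and standard error across epochs: one value per channel and
# time point, so stderr can be plotted alongside the average.
mean = data.mean(axis=0)
stderr = data.std(axis=0, ddof=1) / np.sqrt(n_epochs)

print(mean.shape, stderr.shape)  # (2, 100) (2, 100)
```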
LOL. By the way, you probably know it's possible to compute the std in one pass:
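One well-known one-pass approach is Welford's algorithm (my example here, not necessarily the method the comment was linking to), which keeps a running mean and sum of squared deviations so no second sweep over the data is needed:

```python
import numpy as np

def one_pass_std(values):
    """Welford's one-pass algorithm: update a running mean and sum of
    squared deviations per sample, instead of computing the mean first
    and then sweeping the data a second time."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return (m2 / n) ** 0.5  # population std, matching np.std's default

vals = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_pass_std(vals))  # 2.0
print(np.std(vals))        # 2.0
```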
I did not know that. I wonder which one numpy.std uses. |
you can read the code : https://github.com/numpy/numpy :) |
lol too :-) no, it was just related to a discussion alex and i had about smartening up the epochs objects. btw. -- somewhat related -- i'm about to finalize my ica WIP-PR soon. I was thinking about allowing the mixing / unmixing matrices to be included in the raw object, similar to projs, so that you could toggle the raw data between ica and channel space. does that make sense to you?
I would suggest enhancing epochs with an ultimate goal of also capturing single-trial properties. I have been working on some classes to represent events, which would provide a large range of functionality. If people like this idea, these classes could be used in one way or another to manage events. For the classes, look at an example or the documentation I have made so far. I have made a branch where I used these objects to implement basically the functionality you are asking for (there is an example script, but you would have to have my package installed). The branch adds a

I think such an event representation would be even more useful for loading events in the first place. This would make it easier to assign more than one label to each event, and to specify e.g. multifactorial designs. Each experimental paradigm would then only need a single label_events function that would add labels based on IDs (example). The whole selection of events could then be made using labels rather than IDs, e.g.:

```
>>> import loader
>>> ds = loader.load_evts('xxxxx/MEG/sample/sample_audvis_raw.fif')
Read a total of 3 projection items:
    PCA-v1 (1 x 102)  idle
    PCA-v2 (1 x 102)  idle
    PCA-v3 (1 x 102)  idle
Adding average EEG reference projection.
Created an SSP operator (subspace dimension = 4)
4 projection items activated
>>> print ds[:10]
eventID   i_start   condition   side   modality
-----------------------------------------------
2         27977     RA          R      A
3         28345     LV          L      V
1         28771     LA          L      A
4         29219     RV          R      V
2         29652     RA          R      A
3         30025     LV          L      V
1         30450     LA          L      A
4         30839     RV          R      V
2         31240     RA          R      A
3         31665     LV          L      V
>>> ds = ds.subset(ds['modality'] == 'A')
>>> print ds[:10]
eventID   i_start   condition   side   modality
-----------------------------------------------
2         27977     RA          R      A
1         28771     LA          L      A
2         29652     RA          R      A
1         30450     LA          L      A
2         31240     RA          R      A
1         32101     LA          L      A
2         32935     RA          R      A
1         33712     LA          L      A
2         34532     RA          R      A
1         35428     LA          L      A
>>>
```

And then epochs could be loaded just for the selected events. Minor comment: the
While we're on the topic of re-coding events: it came to my attention a bit ago that having different numbers of trials in two conditions A and B will cause a bias when you estimate the difference in the magnitude of the activities in the two conditions (e.g., when using the magnitude of the dSPM or current values in the two conditions). For an intuitive sense of this, consider that your noise will go down as sqrt(N) with N trials, so if there are 4x the number of trials in condition A versus condition B, you'll have half the noise amplitude in condition A---once you take the magnitude, noise that used to be zero-mean Gaussian now has a folded normal distribution with a non-zero mean, which biases condition B more than condition A. In any case, if we can build a way to do trial-count equalization at some level into this code, that would also be great. Right now we do it all offline with list-file I/O...
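The bias being described can be demonstrated with a small simulation (pure NumPy, with made-up trial counts): averaging more trials shrinks the noise by sqrt(N), and taking magnitudes turns zero-mean Gaussian noise into a folded normal whose mean grows with the noise level, so the condition with fewer trials is biased upward more.

```python
import numpy as np

rng = np.random.RandomState(0)
n_trials_a, n_trials_b, n_sims = 400, 100, 5000

# True signal is zero in both conditions; only the trial count differs,
# so the averages' noise stds are ~1/sqrt(400) vs ~1/sqrt(100).
noise_a = rng.randn(n_sims, n_trials_a).mean(axis=1)
noise_b = rng.randn(n_sims, n_trials_b).mean(axis=1)

# Taking magnitudes folds the zero-mean Gaussians into folded normals,
# whose means scale with the noise std: B is biased more than A.
bias_a = np.abs(noise_a).mean()
bias_b = np.abs(noise_b).mean()
print(bias_a < bias_b)  # True: fewer trials -> larger magnitude bias
```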
wow, that sounds really interesting. http://patsy.readthedocs.org/en/latest/index.html -- maybe one could create some kind of interface to the stats world via your approach / patsy / pandas.
good that you name it.
I'm working on adding functionality to save stderr as well as means in evoked (including from epochs.average()). It looks like the best way to do it is to modify the guts of the Evoked object to support multiple data sets, i.e. make evoked.data a list. This most closely mirrors how the files are stored. While we could make multiple Evoked objects per condition (or mean / stderr) you wanted to have, that makes for a bunch of unnecessary copies of channel data, etc., that we'd have to add conditions for, so I lean against that solution. From the bit of work I've done, it looks like we'd have to make the following things lists: evoked.data. Other than that, we should be able to leave the structures as-is. Unfortunately this would break backward compatibility for people reading these fields directly, but it seems like the cleanest option from a coding standpoint. What do people think? If backward compatibility is critical, I can instead work on improving read_evoked and write_evoked: instead of just calling the Evoked() and evoked.save() functions, respectively, I can try to get them to do something intelligent to combine (or split) these calls when reading or writing the evoked files. That would get us closer to what the C code did, storing every item in one place, at least.
In any case, it might make sense for now just to extend the functionality of read/write_evoked to allow reading and writing multiple event codes to FIF files in order to maintain backward compatibility. That's what I've implemented in PR 135; let me know what you think.
+1 for that, and +1 for denis' suggestion to find a way to expose epochs as data-frame-like objects: df = epochs.as_dataframe()?
and +1 for adding a summary method to Epochs to print things like:

    eventID   i_start   condition
    2         27977     RA

I am really open to suggestions. And print in seconds rather than index numbers, too.
since a trigger is always one integer, it would be better to do: event_id = {1: ['left', 'auditory'], 2: ['right', 'auditory'], 3: ['left', 'visual'], 4: ['right', 'visual']} @christianmbrodbeck would that suit your needs better?
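As a sketch of how such a trigger-to-labels dict could drive event selection (the select_events helper and the events array below are hypothetical illustrations, not existing MNE API):

```python
import numpy as np

# Proposed mapping: one integer trigger -> list of labels
event_id = {1: ['left', 'auditory'], 2: ['right', 'auditory'],
            3: ['left', 'visual'], 4: ['right', 'visual']}

# Events in MNE's (sample, previous value, trigger) layout, made up here
events = np.array([[27977, 0, 2], [28345, 0, 3],
                   [28771, 0, 1], [29219, 0, 4]])

def select_events(events, event_id, label):
    """Keep only events whose trigger carries the given label."""
    wanted = {trig for trig, labels in event_id.items() if label in labels}
    return events[np.isin(events[:, 2], list(wanted))]

auditory = select_events(events, event_id, 'auditory')
print(auditory[:, 2])  # triggers 2 and 1
```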
@ Data frames: actually I did something like that; it would look like this (inside Epochs):

```python
def to_data_frame(self, frame=True):
    """Get the epochs as a Pandas panel of data frames.

    Parameters
    ----------
    frame : bool
        If True, a data frame will be returned with a hierarchical
        epochs * time-slices index, else a panel object of
        channels * time-slices data frames for each epoch.

    Returns
    -------
    out : depending on arguments
        Data frame object or panel object.
    """
    import numpy as np
    import pandas as pa

    data = self.get_data()
    epoch_ids = ["Epoch %i" % (i + 1) for i in np.arange(data.shape[0])]
    ret = pa.Panel(data=data, items=epoch_ids, major_axis=self.ch_names)
    if frame:
        ret = ret.swapaxes(0, 1).to_frame()
        ret.index.names = ["epochs", "tsl"]
        return ret
    else:
        # the original snippet dropped this result; assign and return it
        return ret.swapaxes(1, 2)
```
And here is a minimal usage example:

```python
import mne
import numpy as np
from mne.fiff import Raw
from mne.datasets import sample
from pandas.stats.api import rolling_mean

data_path = sample.data_path('examples/')
raw_fname = data_path + '/MEG/sample/sample_audvis_filt-0-40_raw.fif'
raw = Raw(raw_fname)
events = mne.find_events(raw, stim_channel='STI 014')
exclude = raw.info['bads'] + ['MEG 2443', 'EEG 053']
picks = mne.fiff.pick_types(raw.info, meg=True, eeg=True, eog=True,
                            stim=False, exclude=exclude)
event_id = 1
tmin, tmax = -0.2, 0.5
baseline = (None, 0)
reject = dict(grad=4000e-13, mag=4e-12, eog=150e-6)
epochs = mne.Epochs(raw, events, event_id, tmin, tmax, proj=True, picks=picks,
                    baseline=baseline, preload=False, reject=reject)

epochs_df = epochs.to_data_frame()
meg_chs = [c for c in epochs.ch_names if c.startswith("MEG")]

# display some channels
epochs_df.ix[:, :10].head(20)

# split time slices
grouped_tsl = epochs_df[meg_chs].groupby(level='tsl')

# then create a quick average plot
grouped_tsl.mean().plot(legend=0)

# or a trellis plot on a few channels
grouped_tsl.mean()[meg_chs[:10]].plot(subplots=1)

# use median instead
grouped_tsl.median().plot(legend=0)

# use a custom numpy function
grouped_tsl.agg(np.std).plot(legend=0)

# average, then smooth using a rolling mean, and finally plot, in one single line!
grouped_tsl.apply(lambda x: rolling_mean(x.mean(), 10)).plot(legend=0)

# apply a different function for each channel
grouped_tsl.agg({"MEG 0113": np.mean, "MEG 0213": np.median})

# investigate epochs and create a string table for dumping into a file
grouped_epochs = epochs_df[meg_chs].groupby(level='epochs')
result_table = (grouped_epochs.max().ix[:, 1:3] * 1e15).to_string()

# investigate a specific channel's std across epochs
grouped_epochs.std()["MEG 0113"].plot()
```

What do you think of that, worth a PR?
@dengemann no, I haven't come across patsy, thanks for pointing it out! I will have to have a closer look at it. I have come across Pandas, but I understood it's not made for higher-dimensional data. For simplifying sensor analysis I've been using an additional "ndvar" class which I have not documented externally yet; the main idea is to have a numpy array that "knows" about its dimensions, which can be used in indexing and plotting.
Here are some examples with random data, and here is an example with the mne sample data. If that's of any use, it would be great to integrate it with other packages. A limitation is representing data from different types (e.g. gradiometer and magnetometer) together. However, I think source estimates could well be represented with ndvars, the relevant dimension being a source space.
That would be somewhat more flexible, but might get somewhat complicated with multiple factors. Also, what I mean by trial information is something like e.g. lexical frequency for words, which often cannot be encoded in the triggers because there are not enough values. Rather, we'd have an external list with those values and would have to use that information together with the data. Having a proper data model for the epochs would provide a natural way to interact with that information. However, especially because of the .fif file format limitations (?), maybe it would make more sense to store the data model externally? The epochs object could restrict itself to representing the case index in some way (so that after rejecting epochs the correspondence to the model can be reconstructed), but epochs could of course also represent simple label codes for situations where a more complex label is unnecessary; i.e., someone with a simple model would not have to worry about the model representation?
@christianmbrodbeck Hi chris, that looks very appealing, nice! The pandas folks are working on nd-dataframes. However, in fact the data frame already supports higher-dimensional structures via a hierarchical or multi-index, that is, a tuple-based index: http://pandas.pydata.org/pandas-docs/stable/indexing.html -- also see the recent WIP #137. You can easily achieve what we are looking for with pandas, e.g. subjects x conditions x trials x timeslices indices. So for now pandas might be a good choice for doing analysis-related restructuring. We could then catch up with the API step by step, at our own pace, and see where to set the internal / foreign functionality border. Does that make sense?
Hi @agramfort @Eric89GXL @mluessi @christianmbrodbeck --- what actually happened to this? Thinking about future directions, 'sexing' / 'smartening' up the Epochs object still is a nice one. Where are we at with this?
not far... I guess we need a dedicated contributor to embrace the project ;)
You damn pointer ;-)
i can only second that, each sentence.
Sure. @agramfort, @christianmbrodbeck, @mluessi, speak now or forever hold your peace...
Just to add a few thoughts, or just one, basically. Before the 0.5 release i am planning to refactor the pandas export. I think the dict features will make it straightforward to export proper design tables to pandas / R -- so you could easily feed back stats to an mne session at a critical point, the epochs processing. the equalisation features would be nice to obtain the equal cells expected by most procedures.
I like the idea.. but I think we should try not to include a bunch of new features right before the 0.5 release. Given that the idea is to have a release in less than 2 weeks, features that we include now will only receive minimal testing, and we may end up having a release that is buggy.
that's true too. so let's first make sure the new exciting dict feature does what we want, and also there are still other things pending / waiting...
Makes sense. Are we going to split development between stable 0.5 (patching bugs and the like) and development (0.6dev) versions? That would prevent delays on 0.6 feature development. But it is possible to just hold off on merging changes if that's easier to manage.
@Eric89GXL Sounds useful, and makes me think of my earlier proposal to integrate my factors to manage events: with those factors you could just use an interaction (I used the
internally you could then use those factors:
@christianmbrodbeck that would be cool, too. I think supporting both modes would be convenient, like a "simple" versus "advanced" mode. Sorry for being dense, but what is the ('None', 'None') in that example...?
i think that would be brilliant.
@mluessi I realized it might be as simple as sending my pull request for this design into a 0.6dev fork (which would also get the commits from 0.5 as they roll in). Is that right? |
@Eric89GXL
On 'smiley' and 'button' trials, modality and side are = 'None' :), so
@Eric89GXL I am answering your long text above. +1 for 1/. For 2/, this is somehow related to the mne.merge_events function; I am wondering if this logic should be done in Epochs or a priori, working with the events array. Just a remark: we should be careful not to have too-clever objects. Think about how you would explain it to a new user.
right, forgot about merge_events. so as it is now, you could at any time merge events before you create new epochs.
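A minimal sketch of that kind of pre-epoching merge, in the spirit of mne.merge_events (the helper below is an illustration, not the actual MNE implementation):

```python
import numpy as np

def merge_event_ids(events, old_ids, new_id):
    """Collapse several trigger codes into one: rows keep their sample
    index, only the trigger column is rewritten."""
    events = events.copy()
    mask = np.isin(events[:, 2], old_ids)
    events[mask, 2] = new_id
    return events

events = np.array([[100, 0, 1], [200, 0, 2], [300, 0, 3], [400, 0, 4]])
# Collapse left/right auditory (1, 2) into a single 'auditory' code 10
merged = merge_event_ids(events, [1, 2], 10)
print(merged[:, 2])  # [10 10  3  4]
```

Epochs created from `merged` would then treat the two original conditions as one, which is exactly the "merge before you create new epochs" workflow described above.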
Maybe it's 5 lines of code, so yes, but it gets messy; let's avoid too much complexity.
@agramfort I do expect it only to be about 5 lines of code. I'll draft a PR (for 0.6) and you can see what you think. The advantage to having the object be smarter is that, the way I've talked about it, you can load all the data exactly once (and save the epochs FIF), and then take a subset as required by the analysis. It should be substantially faster than having to re-load from raw every time.
as i proposed, if the reluctance is about the fear of polluting our Epochs object, we could write a function that lives in mne, just like merge_events.
this function could then do the remapping in a flexible fashion, to also support factors, patsy and the other smart things from #133. so the object may remain a bit less smart, and we can eat and have our cakes...
Ahh, I see what you mean. I'll do that! |
Great! Let's think about how we do it in a way that is extendible, either
…tuff to what's new. @chrstianmbrodbeck you also made contributions to 0.5, you should mention these. Added you to the dict as the rest of the #133 crew (hope no one is missing).
@dengemann, @agramfort, @christianmbrodbeck now that we've made some progress on this, my vote is to close this issue, and start a new, more clearly titled issue when someone formulates the next good potential extension (for 0.6, presumably).
Good bye #133 and thanks for the fruitful as well as nice discussion we had. Let's continue this in subsequent issues and PRs. |
I'd like to be able to write
and evoked_auditory.comment would then contain "Evoked auditory", and not "unknown" as it does now.
cc/ @Eric89GXL @mluessi @christianmbrodbeck @dengemann if you want to give it a try...