Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: single class to represent multidimensional data #352

Closed
wants to merge 12 commits into from

Conversation

christianbrodbeck
Copy link
Member

To take up discussion from #309.

base on tmin, tstep, nsteps
- __repr__, indexing: only use name if not None
- subdata(): index dim with index (not arg)
- RM "properties" kwarg
use dimensons' dimrepr
Allow accessig single dimension data with a string
@christianbrodbeck
Copy link
Member Author

@agramfort

which is the origin of my worries about implementing a complex object
with named axis.

What exactly were your worries? I tried to keep NdVar as un-complex as possible (e.g., the numpy array is always directly accessible at the .x attribute, and accessed form there when NdVar operations are performed). Limiting ourselves to a few dimension types that are relevant for mne should also keep it simpler.

@mluessi
Copy link
Contributor

mluessi commented Jan 11, 2013

I appreciate the input but I have to agree with @agramfort; I feel that adding this adds a lot of complexity which is something we should avoid.

For M/EEG there are not that many different cases, which IMHO would justify the use of an NdVar object with named dimensions. Usually, the first dimension is space (sensor or source) and the second time. For time-freq we have a third dimension, which is frequency. Right now, it is very clear how the underlying data is organized, so a user could just do stc.data[10, :] to get a time course, whereas with the NdVar one would have to do something like much more complicated (I'm not sure how you would actually do that..).

In general, I think it is better to use e.g. a dict to hold stc's for different conditions, then you can e.g. use stc['left'], stc['right'] etc. and we can keep SourceEstimate simple.

@christianbrodbeck
Copy link
Member Author

For M/EEG there are not that many different cases, which IMHO would justify the use of an NdVar object with named dimensions.

There is sensor and source space data with time, frequency and trial/case, that already makes a large number of possible combinations. Something like the NdVar would make handling of these dimensions consistent. Currently for example when you do single trial stcs you just get a list of stcs, which I think is not very useful. IMO there should be a consistent way of handling those axes, even if not through an NdVar.

Right now, it is very clear how the underlying data is organized, so a user could just do stc.data[10, :] to get a time course, whereas with the NdVar one would have to do something like much more complicated (I'm not sure how you would actually do that..).

The user could in fact still use stc.x[10, :] to get an array if it is known that this object has a sensor by time arrangement. However, the user could also do y = stc.subdata(sensor='MEG 10') which would be more explicit, and return an object that still contains a time axis, i.e., a plotting function taking this into account could be as simple as plot_uts(y) and would have a formatted time axis.

In general, I think it is better to use e.g. a dict to hold stc's for different conditions, then you can e.g. use stc['left'], stc['right'] etc. and we can keep SourceEstimate simple.

With single trial stcs this is not a good option. Also, as I illustrated with the resample function, the NdVar like object would allow this function to be used on any kind of source estimates, including single trial stcs. I'm also using an uncorrected t-test between two conditions, for example; with the list/dict representation that would be rather complicated and involve rewriting for each data type. With ndvars, I can use a single function for objects with different dimensions.

Again, I think it would be good to have a more consistent way of handling axes, even if not through an NdVar objects. Do people agree on this?

@dengemann
Copy link
Member

Hi @christianmbrodbeck , thanks for preparing this. As impressed as I am by your NdVar object I'm somewhat reluctant supporting it for mne for similar reasons as brought forward by @agramfort and @mluessi. My main point is that it seems easier to me to convey to newcomers what we are doing here if we keep it more flat. Even if it might be less elegant than what you practice, I think its still easier to deal with. Who already has seen a nunmpy array will easily find his way though our data structures which are already enough to learn, I guess. Also it implies less object magic (@agramfort, wow, I'm getting old....). I agree that we should more explore the use of dicts to support labeled access and stuff. Where exactly do you see the biggest inconsistencies with reagrd to indexing and accessing data you would like to improve. How can we get there?

@christianbrodbeck
Copy link
Member Author

Where exactly do you see the biggest inconsistencies with reagrd to indexing and accessing data you would like to improve. How can we get there?

Here are some points I've been thinking about:

  • data objects should have a consistent indexing interface (currently, epochs support numpy indexing such that epochs[[1,5,6]] returns the corresponding trials, but after transforming epochs to single trial source estimates the stcs are contained in a list so that stcs[[1,5,6]] does not work anymore)
  • indexing should return objects that retain their data attributes and can be plotted without supplying additional arguments that the object could provide (e.g. plotting a topomap of a specific time point in an Evoked should be something like plot_topomap(Evoked[time]))
  • The option for meaningful slicing could make things more explicit (time=x instead of [:,:,x]) and simple (for example directly using a time value in seconds rather than first looking up the index of the desired time value in x.times ...)
  • It should be possible to plot objects with the same dimensions using the same functions: for example, a single topomap plotting function should plot a time slice through an evoked, a time slice through a single epoch, an average across a time window in an evoked, a PCA projection, ....
  • It would also be nice to do math operations with objects (e.g., do baseline correction by subtracting a topography object with several trials from an epochs object with the same number of trials)
  • Certain operations (e.g., filtering, element-wise t-test for cluster tests) could be defined so that they work for all objects with the required dimensional structure rather than defining them separately for different object types

@dengemann
Copy link
Member

data objects should have a consistent indexing interface (currently, epochs support numpy indexing such that epochs[[1,5,6]] returns the corresponding trials, but after transforming epochs to single trial source estimates the stcs are contained in a list so that stcs[[1,5,6]] does not work anymore)

I've been thinking a few times about an stc object for epochs in source space ... valid point.

indexing should return objects that retain their data attributes and can be plotted without supplying additional arguments that the object could provide (e.g. plotting a topomap of a specific time point in an Evoked should be something like plot_topomap(Evoked[time]))

Yes. This is presumably minor and very doable, and certainly a good idea.

The option for meaningful slicing could make things more explicit (time=x instead of [:,:,x]) and simple (for example directly using a time value in seconds rather than first looking up the index of the desired time value in x.times ...)

Mhmm. This does not sound like something that would look very pythonic using square brackets. Maybe getters could do the job.

It should be possible to plot objects with the same dimensions using the same functions: for example, a single topomap plotting function should plot a time slice through an evoked, a time slice through a single epoch, an average across a time window in an evoked, a PCA projection, ....

This does not sound like something we could get done by 0.6 wdyt? Too many internals to consider ...

It would also be nice to do math operations with objects (e.g., do baseline correction by subtracting a topography object with several trials from an epochs object with the same number of trials)

We do have support for math in epochs, evokeds and stcs ... But it seems that's not enough?

Certain operations (e.g., filtering, element-wise t-test for cluster tests) could be defined so that they work for all objects with the required dimensional structure rather than defining them separately for different object types

I think the clustering is not such a valid example here, it's quite abstract I would say. But I agree that the requirements could be made more explicit. But see #590 and the discussion on #580.
I think in many cases we already addressed this by adding the same functionality once with more context as a method and then as a function. Maybe things like this would be easier with RawBase et al. And sometimes it may be of advantage not to have a method on / work with each object ... E.g. fitting our ICA on an evoked objects might not be a good idea without certain pre-requirements met, or filtering evokeds, etc.

@christianbrodbeck
Copy link
Member Author

I've been thinking a few times about an stc object for epochs in source space ... valid point.

I wonder if it would be feasible to merge Epochs and Evoked objects into a single class that represents sensor space data (and allows multiple averages in an Evoked), and then make an analogous class for source estimates?

indexing should return objects that retain their data attributes and can be plotted without supplying additional arguments that the object could provide (e.g. plotting a topomap of a specific time point in an Evoked should be something like plot_topomap(Evoked[time]))

Yes. This is presumably minor and very doable, and certainly a good idea.

How would you approach that? My solution was that the objects keep information about dimensions -- i.e. a topography is like an Epochs object without the time dimension.

The option for meaningful slicing could make things more explicit (time=x instead of [:,:,x]) and simple (for example directly using a time value in seconds rather than first looking up the index of the desired time value in x.times ...)

Mhmm. This does not sound like something that would look very pythonic using square brackets. Maybe getters could do the job.

Agree, methods seem to be more natural for this than direct indexing

It would also be nice to do math operations with objects (e.g., do baseline correction by subtracting a topography object with several trials from an epochs object with the same number of trials)

We do have support for math in epochs, evokeds and stcs ... But it seems that's not enough?

What I had in mind was extending that math to objects with different dimensions, in the way that baseline correction is subtracting a topography that has no time dimension from an Epochs object. It's not a necessity, but I think it would be nice to be able to do something like evoked_blc = evoked - evoked.average(time=(None, 0)).

Certain operations (e.g., filtering, element-wise t-test for cluster tests) could be defined so that they work for all objects with the required dimensional structure rather than defining them separately for different object types

I think in many cases we already addressed this by adding the same functionality once with more context as a method and then as a function. ...

Yes that works too, just seems to require somewhat more maintenance especially if now new subclasses arise for which some methods might break...

@dengemann
Copy link
Member

I wonder if it would be feasible to merge Epochs and Evoked objects into a single class that represents sensor space data (and allows multiple averages in an Evoked), and then make an analogous class for source estimates?

sounds reasonable to me. But that would probably break APIs + user code ... @agramfort will love it.
For the stcs the case is maybe a bit different, here we don't have a proper object for representing epochs.
Mhm, we could create an analogy between reading epochs from disk and projecting epochs into source space, to allow for any lazy on-demand processing of source space single trials...
To me this sounds like we first would like to have an Eposch subclass for stcs.

maths, polymorphism, etc.

It sounds like this like we first get the evoked topo plot PR merged to be able to more concretely discuss these other points mentioned. The topo case might be a good model candidate to expose these points without scaring object-phobic people ;-)

@christianbrodbeck
Copy link
Member Author

sounds reasonable to me. But that would probably break APIs + user code ... @agramfort will love it.

haha yeah, but might be minimal if the new object's behavior depends on whether it has a "trial" dimension or not

lazy on-demand processing of source space single trials

That would be great, since source space data can be huge. What do you have in mind? An stc object that references Epochs in the same way that Epochs can reference a Raw?

maths, polymorphism, etc.

It sounds like this like we first get the evoked topo plot PR merged to be able to more concretely discuss these other points mentioned. The topo case might be a good model candidate to expose these points without scaring object-phobic people ;-)

Actually, working on the topomap PR was part of the reason I am coming back to this now ... with the ndvars I had a sensors dimension and could plot any time slice with a one liner, whereas for the PR I had to handle all those arguments again...

@dengemann
Copy link
Member

haha yeah, but might be minimal if the new object's behavior depends on whether it has a "trial" dimension or not

You mean just add a trial dimension to evoked? Mhm, but still the epochs api is a bit different from the evoked. I guess nothing prevents you from dummy-initializing an Epochs object and filling it wich 'trials' of evoked responses ...

An stc object that references Epochs in the same way that Epochs can reference a Raw?

Yes, exactly. A subclass of Epochs that takes an inverse operater on init and then computes the source trials on demand + a 'preload' that also allows us to compute the single-trial stcs in one stel and keep them in memory.

stc_epochs = SourceEstimatesEpochs(epochs=epochs, inverse_func=lcmv_epochs, preload=False)

for stc in stc_epochs:
    stc.data.max()

Mhm. but this is something we can already do without such a constructor. I think this would only be of additional value if we could clarify any Epochs levels methods that operate on the single trial stcs. Does that makes any sense? At least stc_epochs.average() would be one, e.g. in the TFR context. Second getitem would be cool so we can handle multiple conditions in source space. Wdyt? More pros to add? Any cons? | cc @mluessi @Eric89GXL @agramfort

Actually, working on the topomap PR was part of the reason I am coming back to this now ...

Than this seems like being your PR ;-)

@agramfort
Copy link
Member

sorry for getting back to this from the start

Here are some points I've been thinking about:

data objects should have a consistent indexing interface (currently, epochs support numpy indexing such that epochs[[1,5,6]] returns the corresponding trials, but after transforming epochs to single trial source estimates the stcs are contained in a list so that stcs[[1,5,6]] does not work anymore)

true and to make this work we need an object that is a list of stc. I would do:

apply_inverse_epochs(epochs[[1,5,6]], …)

or

[stcs[k] for k in [1, 5, 6]]

it's easy, requires no code and everybody can write and read it.

indexing should return objects that retain their data attributes and can be plotted without supplying additional arguments that the object could provide (e.g. plotting a topomap of a specific time point in an Evoked should be something like plot_topomap(Evoked[time]))

evoked.plot_topomap(time)

is what I have in mind. It's true that we could have plot_topomap that
works with some objects as soon as they respect an interface such as
having an info attribute, a data or get_data() and then slice the last
dim which is always time by convention so data[…, time_idx] should do
the job for 2d and 3d data.

The option for meaningful slicing could make things more explicit (time=x instead of [:,:,x]) and simple (for example directly using a time value in seconds rather than first looking up the index of the desired time value in x.times ...)

that's a debate if we want to use pythonic getitem or a custom method.
I was thinking that if we pass float we interpret it as time in
seconds or ints as time samples but maybe it's dangerous for users...

It should be possible to plot objects with the same dimensions using the same functions: for example, a single topomap plotting function should plot a time slice through an evoked, a time slice through a single epoch, an average across a time window in an evoked, a PCA projection, ....

let's agree on an interface and use duck typing.

It would also be nice to do math operations with objects (e.g., do baseline correction by subtracting a topography object with several trials from an epochs object with the same number of trials)

the difficulty with this is that you need to make sure that the same
epochs where dropped in both Epochs instances.

Certain operations (e.g., filtering, element-wise t-test for cluster tests) could be defined so that they work for all objects with the required dimensional structure rather than defining them separately for different object types

can you tell where you think we have bad code duplication in the stats
? most methods take ndarrays and the first dimension is always the
number of observations. But maybe I want to keep things too simple

@agramfort agramfort closed this Sep 25, 2013
@christianbrodbeck christianbrodbeck deleted the ndvar branch July 15, 2016 12:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants