Q: correct way to do permutation clustering for multiple label time courses #1176

dengemann · 2014-03-10T10:29:39Z

I'm wondering what would be the appropriate way to do clustering permutation stats with a bunch of time courses extracted from multiple labels. Currently I'm using 'permutation_cluster_test' with an f-test. However this can lead to pseudo-spatial clusters just because one label is put next to another label in the data ndarray. My hunch would be to pass a connectivity matrix that only connects temporal features but not the spatial ones. Any thoughts on that @christianmbrodbeck @agramfort

mluessi · 2014-03-10T13:57:09Z

AFAIK you would have to construct a custom connectivity matrix that connects the columns but not rows and then pass it using the connectivity kwarg. Basically, if your data is size N x T, you would make a (N * T) x (N * T) sparse matrix and set the elements to one to define your neighbors. I need to think a minute for the precise structure.. it will be mostly band-diagonal.

dengemann · 2014-03-10T14:02:23Z

@mluessi yes, that's where I'm stuck --- setting up the correct structure ....

mluessi · 2014-03-10T14:02:32Z

Using a custom matrix will be slow (because it can't use the optimized cluster finding code). I think in your case you could pass the data as a 1D signal, i.e., concatenate all rows together and add one-element zero values between the rows (to make sure there can't be clusters that span more than one row).
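
A minimal sketch of the concatenate-and-pad idea, assuming the data are a NumPy array whose last two axes are (n_rows, n_times). `flatten_with_gaps` is a hypothetical helper name, not an mne-python function. Whether an all-zero column stays below the cluster threshold depends on the stat function used, which is not checked here.

```python
import numpy as np


def flatten_with_gaps(data):
    """Flatten the last two axes (n_rows, n_times) into one 1D signal per
    observation, with a single zero between consecutive rows."""
    data = np.asarray(data, dtype=float)
    *lead, n_rows, n_times = data.shape
    # append one zero column to every row, then lay the rows end to end
    padded = np.zeros((*lead, n_rows, n_times + 1))
    padded[..., :n_times] = data
    flat = padded.reshape(*lead, n_rows * (n_times + 1))
    # the zero after the final row separates nothing, so drop it
    return flat[..., :-1]


flat = flatten_with_gaps([[1, 2, 3], [4, 5, 6]])
print(flat)
```

Clusters found on the flattened signal then have to be mapped back to (row, time) using the padded row length `n_times + 1`.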

dengemann · 2014-03-10T14:06:45Z

yeah, I thought of that as well, the zero padding idea was missing though :-)

dgwakeman · 2014-03-10T14:08:10Z

Might this be beneficial on the mne_analysis list, where others could follow?

dengemann · 2014-03-10T14:08:35Z

Anyways, with 10 time courses performance should not be an issue. Would actually be nice to expose some custom connectivity matrix example in the -- to-be-started -- cookbook or so

mluessi · 2014-03-10T14:18:31Z

Yes, I agree, that would be nice. I also agree with @dgwakeman, I think it is good to send general analysis related questions to the mailing list. This question is a good example of a question about an advanced feature available only in mne-python, so it would be a good idea to expose more users to it. At the same time, it is not a good idea to spam the mailing list ;).

larsoner · 2014-03-10T14:22:46Z

Maybe a good compromise would be to post the original question to the list,
saying that you've opened an issue for further discussion with a link to
it. That way people only get one email, but they're still made aware of it.

dengemann · 2014-03-11T14:26:32Z

Hi folks, another obvious solution is to put empty time courses between the label time courses. An array of size (16, 10, 500) would then be (16, 20, 500). This should be equivalent to @mluessi's proposal but has the advantage that the returned cluster indices are bool ndarrays instead of slices. Subsequent visualization seems more straightforward to me this way, at least given my spatio-temporal viz code. I think it would be nice to properly support this use case, maybe by providing a temporal-only connectivity matrix.
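
A small sketch of the interleaving, assuming the array layout is (n_observations, n_labels, n_times) as in the sizes quoted above. `interleave_empty_rows` is a hypothetical name used only for illustration.

```python
import numpy as np


def interleave_empty_rows(data):
    """Insert an all-zero time course after every label time course."""
    data = np.asarray(data, dtype=float)
    n_obs, n_labels, n_times = data.shape
    out = np.zeros((n_obs, 2 * n_labels, n_times))
    # real label time courses go to the even rows, odd rows stay zero
    out[:, ::2, :] = data
    return out


rng = np.random.RandomState(0)
X = rng.randn(16, 10, 500)
X_padded = interleave_empty_rows(X)
print(X_padded.shape)
```

Label `i` of the original array ends up in row `2 * i` of the padded one, which is the mapping needed when reading the boolean cluster masks.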

agramfort · 2014-03-11T14:34:37Z

the use case is also relevant for multi sensors without the spatial
connectivity between sensors.

see this very old example:

http://martinos.org/mne/dev/auto_examples/stats/plot_sensor_permutation_test.html#example-stats-plot-sensor-permutation-test-py

one could avoid the mean over the time axis

dengemann · 2014-03-11T14:35:22Z

> one could avoid the mean over the time axis

yes +1

dengemann · 2014-03-11T16:09:24Z

So we would need to set up a conn matrix which comprises different blocks of features. The temporal features would be connected via the diagonal and diagonal + 1 (for the upper triangle). The spatial feature block would only have zeros, just as the spatio-temporal cross-section.
One then only needs to know which columns correspond to time points and which ones correspond to locations.

Does that make sense? @Eric89GXL @mluessi @agramfort

dengemann · 2014-03-11T16:11:35Z

That being said, it would of course be nice to have a top-level function that allows you to set e.g. one type of connectivity {regular lattice, ...} selectively for one block of features, e.g., {frequency only, time only, space only}.

dengemann · 2014-03-11T16:15:18Z

Probably it would also be nice to set the degree (1, 2, 3, neighbours ...)

dengemann · 2014-03-11T16:21:12Z

So what would people think about a function that covers the creation of the most relevant connectivity matrices? It could be used internally.

dengemann · 2014-03-11T16:24:19Z

Alternatively: if good sklearn / scipy utils exist, it would be worth adding an example that covers this use case: custom connectivity + multiple label time series.

larsoner · 2014-03-11T16:36:55Z

I doubt those packages provide use cases. I'm +1 for adding relevant connectivity matrices like this.

dengemann · 2014-03-11T16:40:59Z

So we would create a matrix of zeros with n_columns = n_features, e.g. n_times * n_locations.
The dimension that shall have a simple neighbor connectivity would then be filled with something like np.eye(n_sub_features, k=1). If all features shall have such a connectivity we could construct it like that: np.eye(n_features, k=1). Does that make sense?
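
To make the block structure concrete, here is a small dense illustration with NumPy. It assumes the features are ordered location by location with time running fastest, i.e. an (n_locations, n_times) array flattened row by row; the sizes are arbitrary.

```python
import numpy as np

n_locations, n_times = 2, 4
n_features = n_locations * n_times

# first superdiagonal: feature i is linked to feature i + 1
conn = np.eye(n_features, k=1)

# rows holding the last time point of every location except the final one
last = np.arange(n_times - 1, n_features - 1, n_times)
# cut the link from that sample to the first sample of the next location
conn[last, last + 1] = 0

print(conn.astype(int))
```

Indexing a dense array by (row, column) keeps the boundary entries unambiguous, which makes this a handy reference when checking a sparse version.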

agramfort · 2014-03-11T17:23:12Z

we should start a list of todos for the sprint in june...

dengemann · 2014-03-12T10:49:41Z

Ok folks, here's a first draft on a temporal only connectivity function:

from scipy import sparse

def get_temporal_connectivity(n_locations, n_times):
    """Create temporal connectivity matrix

    Useful for analyses of non-neighbouring labels.
    Note. It is assumed that time is the last dimension

    Parameters
    ----------
    n_locations : int
        the number of locations.
    n_times : int
        the number of time points.
    """
    n_features = n_locations + n_times
    c1 = sparse.eye(n_features, k=1)
    c1.data[:, :n_locations + 1] = 0
    c2 = sparse.eye(n_features, k=-1)
    c2.data[:, :n_locations] = 0
    return c1 + c2

dengemann · 2014-03-12T10:50:22Z

Produces a matrix like this when called with n_locations=10 and n_times = 100

dengemann · 2014-03-12T12:20:51Z

Ok I think I now see what's wrong here (the dimensions...) I'm currently testing and will let you know once everything works with my use case.

dengemann · 2014-03-12T12:40:15Z

Ok, here we go:

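from scipy import sparse


def get_temporal_connectivity(n_locations, n_times):
    """Create temporal connectivity matrix

    Useful for analyses of non-neighbouring labels.
    Note. It is assumed that time is the last dimension

    Parameters
    ----------
    n_locations : int
        the number of locations.
    n_times : int
        the number of time points.
    """
    n_features = n_locations * n_times
    # upper off-diagonal: each feature is linked to the next one
    connectivity = sparse.eye(n_features, k=1)
    # DIA storage: data[0, j] is the entry in column j, i.e. (j - 1, j) for k=1.
    # Zero the links from the last sample of one location to the first of the next.
    connectivity.data[:, n_times::n_times] = 0

    return connectivity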

works on my machine ;-) any thoughts? shall we add this as a function or make an example?

@agramfort @mluessi @Eric89GXL
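
For small sizes, a temporal-only matrix can also be built block by block and checked against the expected structure. The sketch below is an alternative under stated assumptions, not the function proposed in this thread: it uses `scipy.sparse.kron` to place one time chain per location (data assumed flattened with time as the last axis) and `scipy.sparse.csgraph.connected_components` to confirm that the graph splits into exactly `n_locations` pieces. `temporal_connectivity_kron` is a hypothetical name.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components


def temporal_connectivity_kron(n_locations, n_times):
    """Upper-triangular connectivity linking t and t + 1 within each location."""
    chain = sparse.eye(n_times, k=1)  # one location: t -> t + 1
    # block-diagonal matrix with one independent chain per location
    conn = sparse.kron(sparse.eye(n_locations), chain, format="csr")
    conn.eliminate_zeros()  # make sure no explicit zeros are stored
    return conn


conn = temporal_connectivity_kron(n_locations=3, n_times=4)
# directed=False treats the upper-triangular entries as undirected links
n_components, labels = connected_components(conn, directed=False)
print(n_components)
```

The same component count can be run on any hand-built matrix; a count other than `n_locations` points to an off-by-one in which diagonal entries were zeroed.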

dengemann · 2014-03-12T12:41:51Z

For generalization: a higher order use case would then be to disconnect labels while connecting times and frequencies for multiple TFR analyses. And so on.
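
One possible sketch of that time-frequency generalization, again with Kronecker products. It assumes each observation is flattened in C order from shape (n_labels, n_freqs, n_times), so time runs fastest; `label_tf_connectivity` is a hypothetical name and not part of mne-python.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.csgraph import connected_components


def label_tf_connectivity(n_labels, n_freqs, n_times):
    """Upper-triangular connectivity: neighbours in time and in frequency
    are linked, different labels are never linked."""
    chain_t = sparse.eye(n_times, k=1)  # t -> t + 1
    chain_f = sparse.eye(n_freqs, k=1)  # f -> f + 1
    # 4-neighbour lattice over one label's (n_freqs, n_times) grid
    lattice = (sparse.kron(sparse.eye(n_freqs), chain_t, format="csr")
               + sparse.kron(chain_f, sparse.eye(n_times), format="csr"))
    # one independent copy of the lattice per label
    conn = sparse.kron(sparse.eye(n_labels), lattice, format="csr")
    conn.eliminate_zeros()
    return conn


tf_conn = label_tf_connectivity(n_labels=2, n_freqs=2, n_times=3)
n_components, _ = connected_components(tf_conn, directed=False)
print(n_components)
```

The Kronecker order has to match the order in which the data axes are flattened; with a different axis order the factors would need to be swapped accordingly.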

mluessi · 2014-03-12T12:46:12Z

Shouldn't it be n_features = n_locations * n_times? Also, I think you would have to insert zeros after every n_times elements, i.e.,

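n_features = n_locations * n_times
c1 = sparse.eye(n_features, k=1)
# DIA storage: data[0, j] sits in column j, so for k=1 the boundary links are at j = n_times, 2 * n_times, ...
c1.data[:, n_times::n_times] = 0
c2 = sparse.eye(n_features, k=-1)
# for k=-1 the boundary links sit in columns n_times - 1, 2 * n_times - 1, ...
c2.data[:, n_times - 1::n_times] = 0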

This assumes that your data is originally an n_locations x n_times matrix and you flatten it by concatenating the rows. I may be wrong.. I only had an insufficiently strong coffee this morning ;).

dengemann · 2014-03-12T12:48:31Z

> Shouldn't it be n_features = n_locations * n_times?
> Also, I think you would have to insert zeros after every n_times elements, i.e.,

see updated code ...

mluessi · 2014-03-12T12:50:22Z

lol.. I see you figured it out while I was writing my comment. The solution is always the same :), but you also need the -1 diagonal.

dengemann · 2014-03-12T12:52:03Z

> The solution is always the same :)

parallel discoveries ;-)

> , but you also need the -1 diagonal.

... why actually? The documentation says only the upper triangle is used.

mluessi · 2014-03-12T13:06:04Z

Oh.. I guess that makes sense :). You should be all set then. I wonder how using this custom matrix compares in terms of computation time to using the flattening trick we discussed earlier.. I assume that using the matrix will be much slower..

dengemann · 2014-03-12T13:08:46Z

@mluessi for that use case it does not matter at all: 10 time courses ;-)

dengemann added the QUESTION label Mar 10, 2014

mluessi closed this as completed Mar 11, 2014

mluessi reopened this Mar 11, 2014

dengemann closed this as completed Nov 10, 2014
