Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: correct way to do permutation clustering for multiple label time courses #1176

Closed
dengemann opened this issue Mar 10, 2014 · 30 comments
Closed

Comments

@dengemann
Copy link
Member

I'm wondering what would be the appropriate way to do clustering permutation stats with a bunch of time courses extracted from multiple labels. Currently I'm using 'permutation_cluster_test' with an f-test. However this can lead to pseudo-spatial clusters just because one label is put next to another label in the data ndarray. My hunch would be to pass a connectivity matrix that only connects temporal features but not the spatial ones. Any thoughts on that @christianmbrodbeck @agramfort

@mluessi
Copy link
Contributor

mluessi commented Mar 10, 2014

AFAIK you would have to construct a custom connectivity matrix that connects the columns but not rows and then pass it using the connectivity kwarg. Basically, if your data is size N x T, you would make a (N * T) x (N * T) sparse matrix and set the elements to one to define your neighbors. I need to think a minute for the precise structure.. it will be mostly band-diagonal.

@dengemann
Copy link
Member Author

@mluessi yes, that's where I'm stuck --- setting up the correct structure ....

@mluessi
Copy link
Contributor

mluessi commented Mar 10, 2014

Using a custom matrix will be slow (because in it can't use the optimized cluster finding code). I think in your case you could pass the data as a 1D signal, i.e., concatenate all rows together and add one-element zero values between the rows (to make sure there cant be clusters that span more than one row).

@dengemann
Copy link
Member Author

yeah, I though of that as well, the zero padding idea was missing though :-)

@dgwakeman
Copy link
Contributor

Might this be beneficial on the mne_analysis list, where others could follow?

@dengemann
Copy link
Member Author

Anyways, with 10 time courses performance should not be an issue. Would actually be nice to expose some custom connectivity matrix example in the -- to-be-started -- cookbook or so

@mluessi
Copy link
Contributor

mluessi commented Mar 10, 2014

Yes, I agree, that would be nice. I also agree with @dgwakeman, I think it is good to send general analysis related questions to the mailing list. This question is a good example of a question about an advanced feature available only in mne-python, so it would be a good idea to expose more users to it. At the same time, it is not a good idea to spam the mailing list ;).

@larsoner
Copy link
Member

Maybe a good compromise would be to post the original question to the list,
saying that you've opened an issue for further discussion with a link to
it. That way people only get one email, but they're still made aware of it.

@dengemann
Copy link
Member Author

Hi folks, another obvious solution to handling this to put empty time courses between the label time courses. An array of size (16, 10, 500) would then be (16, 20, 500). This should be equivalent of @mluessi proposal but has as advantage that the returned cluster indices are bool ndarray instead of slices. Subsequent visualization seems more straighforward to me this way, at least given my spatio-temporal viz code. I think it would be nice to properly support this use case, maybe by providing a temporal-only connectivity matrix.

@agramfort
Copy link
Member

the use case is also relevant for multi sensors without the spatial
connectivity between sensors.

see this very old example:

http://martinos.org/mne/dev/auto_examples/stats/plot_sensor_permutation_test.html#example-stats-plot-sensor-permutation-test-py

one could avoid the mean over the time axis

@dengemann
Copy link
Member Author

one could avoid the mean over the time axis

yes +1

@dengemann
Copy link
Member Author

So we would need to setup a conn matrix which comprises different blocks of features. The temporal features would be connected via the diagonal and diagonal + 1 (for the upper triangle). The spatial feature block would only have zeros, just as the spatio-temporal cross-section.
One then only needs to know which columns correspond to time points and which ones correspond to locations.

Does that make sense? @Eric89GXL @mluessi @agramfort

@dengemann
Copy link
Member Author

That being said, it would of course be nice to have a top-level function that allows you to set e.g. one type of connectivity {regular lattice, ...} selectively for one block of features, e.g., {frequency only, time only, space only}.

@dengemann
Copy link
Member Author

Probably it would also be nice to set the degree (1, 2, 3, neighbours ...)

@mluessi mluessi closed this as completed Mar 11, 2014
@mluessi mluessi reopened this Mar 11, 2014
@dengemann
Copy link
Member Author

So how would people think about a function that covers the creation of the most relevant connectivity matrices? It could be used internally.

@dengemann
Copy link
Member Author

Alternatively: if good sklearn / scipy utils should exist, it would be worth adding an example that covers this usecase: custom connectivity + multiple label time series.

@larsoner
Copy link
Member

I doubt those packages provide use cases. I'm +1 for adding relevant connectivity matrices like this.

@dengemann
Copy link
Member Author

So we would create a matrix of zeros with n_columns = n_features, e.g. n_times * n_locations.
The dimension that shall have a simple neighbor connectivity would then be filled with something like np.eye(n_sub_features, k=1). If all features shall have such a connectivity we could construct it like that: np.eye(n_features, k=1). Does that make sense?

@agramfort
Copy link
Member

we should start a list of todos for the sprint in june...

@dengemann
Copy link
Member Author

Ok folks, here's a first draft on a temporal only connectivity function:

from scipy import sparse

def get_temporal_connectivity(n_locations, n_times):
    """Create temporal connectivity matrix

    Useful for analyses of non neighouring labels.
    Note. It is assumed that time is the last dimension

    Parameters
    ----------
    n_locations : int
        the number of locations.
    n_times : int
        the number of time points.
    """
    n_features = n_locations + n_times
    c1 = sparse.eye(n_features, k=1)
    c1.data[:, :n_locations + 1] = 0
    c2 = sparse.eye(n_features, k=-1)
    c2.data[:, :n_locations] = 0
    return c1 + c2

@dengemann
Copy link
Member Author

Produces a matrix like this when called with n_locations=10 and n_times = 100

connectivity

@dengemann
Copy link
Member Author

Ok I think I now see what's wrong here (the dimensions...) I'm currently testing and will let you know once everything works with my use case.

@dengemann
Copy link
Member Author

Ok, here we go:

def get_temporal_connectivity(n_locations, n_times):
    """Create temporal connectivity matrix

    Useful for analyses of non neighouring labels.
    Note. It is assumed that time is the last dimension

    Parameters
    ----------
    n_locations : int
        the number of locations.
    n_times : int
        the number of time points.
    """
    n_features = n_locations * n_times
    connectivity = sparse.eye(n_features, k=1)
    connectivity.data[:, n_times - 1::n_times] = 0

    return connectivity

works on my machine ;-) any thoughts? shall we add this as a function or make an example?

@agramfort @mluessi @Eric89GXL

@dengemann
Copy link
Member Author

For generalization: a higher order use case would then be to disconnect labels while connecting times and frequencies for multiple TFR analyses. And so on.

@mluessi
Copy link
Contributor

mluessi commented Mar 12, 2014

Shouldn't it be n_features = n_locations * n_times? Also, I think you would have to insert zeros after every n_times elements, i.e,

n_features = n_locations * n_times
c1 = sparse.eye(n_features, k=1)
c1.data[:, n_times - 1::n_times] = 0
c2 = sparse.eye(n_features, k=-1)
c2.data[:, n_times - 1::n_times] = 0

This assumes that your data is originally a n_locations x n_times matrix and you flatten it by concatenating the rows. I may be wrong.. I only had a insufficiently strong coffee this morning ;).

@dengemann
Copy link
Member Author

Shouldn't it be n_features = n_locations * n_times?
Also, I think you would have to insert zeros after every n_times elements, i.e,

see updated code ...

@mluessi
Copy link
Contributor

mluessi commented Mar 12, 2014

lol.. I see you figured it out while I was writing my comment. The solution is always the same :), but you also need the -1 diagonal.

@dengemann
Copy link
Member Author

The solution is always the same :)

parallel discoveries ;-)

, but you also need the -1 diagonal.

... why actually? The documentation says only the upper triangle is uesed.

@mluessi
Copy link
Contributor

mluessi commented Mar 12, 2014

Oh.. I guess that makes sense :). You should be all set then. I wonder how using this custom matrix compares in terms of computation time to using the flattening trick we discussed earlier.. I assume that using the matrix will be much slower..

@dengemann
Copy link
Member Author

@mluessi for that use case it does not matter at all: 10 time courses ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants