
WIP/ENH: add repeated measures twoway anova function #580

Merged
merged 24 commits into mne-tools:master on May 14, 2013

Conversation

dengemann
Member

This addresses #226 and some aspects of #535

Adapted from my gist:
https://gist.github.com/dengemann/5427106

which is a translation of MATLAB code by Rik Henson:
http://www.mrc-cbu.cam.ac.uk/people/rik.henson/personal/repanova.m

and, to a lesser extent, of Python code from pyvttbl by Roger Lew:
http://code.google.com/p/pyvttbl/

While there is a new WIP PR in statsmodels related to this (statsmodels/statsmodels#786), this minimal version aims at supporting our (mass-univariate) use case.

Some features:

  • supports joblib (on my 2011 MacBook Air it took 7 minutes to compute
    1,000,000 repeated measures ANOVAs for 18 subjects and all three
    effects from a 2 x 2 design using 2 jobs). I might find a more
    efficient way to do this, but I think this is definitely a start.
  • supports sphericity correction for factor levels > 2

Current limitations are:

  • to keep things simple, I constrained this function to only estimate models with 2 factors. This should make sense in an MEG context.
  • both factors are expected to be repeated, so no 'between-subject' effects.

Tests and examples are coming, so please hold off on strict reviewing for now.
For the API, however, I could already use some feedback. I was wondering whether to add iterator support for retrieving the data matrices for the desired effects sequentially (assuming you don't want all three 20484 x (e.g.) 150 matrices at once when this is applied in a source-space analysis). Another option would be a mini formula language allowing you to request one effect, e.g. using R conventions: 'A' or 'B' for the main effects, 'A:B' for just the interaction, or 'A*B' for the full model. I think offering both iteration and a formula interface would be too much.
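For illustration only, here is a minimal sketch of what the mini formula language option could map to (the function name and mapping are made up for this example, not part of the PR):

def parse_effects(spec):
    """Map an effects string to the list of requested effects."""
    table = {'A': ['A'], 'B': ['B'], 'A:B': ['A:B'],
             'A*B': ['A', 'B', 'A:B']}
    if spec not in table:
        raise ValueError('Unknown effects specification: %r' % spec)
    return table[spec]

parse_effects('A*B')  # ['A', 'B', 'A:B']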

The significance threshold.
correction : bool.
The correction method to be employed. If True, sphericity
correction will using the Greenhouse-Gyser method will be applied
Member

Geyser

Member Author

... and 'will' ;-)

Member

I meant to say "Geisser", rather. Sheesh, need to correct my own corrections...

@larsoner
Member

I have been using some code of a collaborator that uses a two-way ANOVA with time as one of the "ways" to gain statistical power... I should be able to compare these two bits of code at some point. I'm happy to send the snippet to you if you'd like to :)

@dengemann
Member Author

I'm happy to send the snippet to you if you'd like to :)

Sure! Times should also work with this. The second factor may have multiple levels.

@larsoner
Member

See if this makes sense to you:

https://gist.github.com/Eric89GXL/1d59fc1085c0a4798318

@dengemann
Member Author

Looks cool, but also looks like a slightly different approach / use case. I'd be happy to add it though. Maybe don't mix it with this PR? Unless you immediately see how to get all this done in one function ;-)

@larsoner
Member

I'm wondering if the approaches can be made equivalent somewhere... maybe I'll try using your code to see if I can get similar results.

@dengemann
Member Author

Here is a minimal example:

import numpy as np
from mne.stats.parametric import r_anova_twoway

data = np.random.random([18, 4, 100])
print(r_anova_twoway(data, [2, 2], n_jobs=2))

I think the idea of this one is slightly different. This one is a brute-force mass-univariate function that just goes through whichever chain of n_subj x condition matrices to compute interactions. But maybe we can combine them somehow, even if it's just about the computation.
If you're happy to wait I'll soon add a TFR example for a 2x2 interaction image.
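To make the "chain of n_subj x condition matrices" idea above concrete, here is a minimal sketch of how the data get sliced per observation (illustrative only, not the PR's code):

import numpy as np

data = np.random.random([18, 4, 100])  # subjects x conditions x observations
for obs_idx in range(data.shape[-1]):
    y = data[:, :, obs_idx]  # one 18 x 4 (subjects x conditions) matrix per observation
    # ... compute the three F values (A, B, A:B) from this matrix ...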

@larsoner
Member

The example should be helpful. So long as the function returns F values for each condition / interaction, we should be able to make them equal. I'll see what I can get.

@dengemann
Member Author

The example should be helpful.

Yes, it's coming.

So long as the function returns F values for each condition / interaction, we should be able to make them equal.

Yes, as it is at the moment it returns 3 F values (A, B, A:B) for each of the observations (last dimension).

@larsoner
Member

@dengemann if I use your example but change data = np.random.random([18, 6, 100]) and r_anova_twoway(data, [2, 3]) I get an error about matrices not being aligned. Am I being stupid with the input arguments? There should be a test for making sure the user doesn't do anything dumb with the condition counts.

@dengemann
Member Author

@dengemann if I use your example but change data = np.random.random([18, 6, 100]) and r_anova_twoway(data, [2, 3]) I get an error about matrices not being aligned. Am I being stupid with the input arguments?

certainly not.

There should be a test for making sure the user doesn't do anything dumb with the condition counts.

yes, there seems to be a bug; the code from my gist was quite freshly refactored. Time to write tests :-)
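As a sketch of the kind of check such a test could enforce (the function name and message are made up here, not the PR's actual code):

import numpy as np

def check_factor_levels(data, factor_levels):
    n_conditions = np.prod(factor_levels)
    if data.shape[1] != n_conditions:
        raise ValueError('Got %d conditions, but factor levels %r imply %d'
                         % (data.shape[1], list(factor_levels), n_conditions))

check_factor_levels(np.random.random([18, 6, 100]), [2, 3])  # passes
# check_factor_levels(np.random.random([18, 6, 100]), [2, 2])  # would raise ValueError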

@dengemann
Member Author

... the counting certainly will not support anything above 2 x 2. I'll have to revisit that more carefully. This was the result of trying to get rid of a weird binary counter function that, together with the iterator, was producing the correct indices in the original code.

@larsoner
Member

Alright, let me know when the second factor (or first factor) supports multiple levels beyond 2, and I'll see how the two methods compare.

@dengemann
Member Author

@Eric89GXL this should now be safe enough to play with. Test added (passes) and coding bug fixed.

@larsoner
Member

Okay, I'll take a look and also update my gist...

@dengemann
Member Author

Damn it, wrong tab ... deleted my post.
But here we go: Cool. But beware of sphericity correction. That part is still buggy.

@dengemann
Member Author

@Eric89GXL at least formally the GG correction works. No tests at the value level so far.

@larsoner
Member

Comparison Gist updated, gotta run now...

@dengemann
Member Author

Cool, looks like the next thing to check out. Something seems wrong with the meshgrid call ...

@dengemann
Member Author

... ok, I'm using an older numpy version ...

@dengemann
Member Author

So, here we go with a new repeated measures example that detects a potential interaction between stimulation modality and location in the beta band.

@dengemann
Member Author

Besides the open question of how to possibly unify Eric's gist with this proposal, I think this is ready for a first review. If you feel inclined please pull and take a look at the example ;-)

@dengemann
Member Author

@Eric89GXL so this is how the functions compare, actually t-values vs. F-values.

[image: compare_anova]

The number of levels per factor.
alpha : float
The significance threshold.
correction : bool.
Member

No period.

@larsoner
Member

@dengemann how consistent is the layout and use of this function with the f_oneway function we already have? Would it be possible to add a 2-way within-subjects clustering example? The sample dataset has Aud/Vis and L/R, so those could be your two ways. It might be nice to see how this method could be used alongside clustering to look for interactions using a non-parametric statistical test. It would be really cool to have an example comparing the FDR-corrected parametric method (using this function) to the FWER-corrected non-parametric permutation test that uses this function for initial thresholding.
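As a rough, illustrative sketch of how this function could be plugged into the clustering machinery (how r_anova_twoway's return value is indexed and the effect ordering are assumptions here, not the settled API):

import numpy as np
from mne.stats.parametric import r_anova_twoway

def stat_fun(*args):
    # permutation_cluster_test passes one (n_observations, n_tests) array per
    # condition; stack them into subjects x conditions x tests for the ANOVA
    data = np.swapaxes(np.asarray(args), 0, 1)
    fvals = r_anova_twoway(data, [2, 2])  # assumed to return F values per effect
    return fvals[2]  # assumed ordering (A, B, A:B): keep only the interaction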

@dengemann
Member Author

The sample dataset has Aud/Vis and L/R, so those could be your two ways. It might be nice to see how this method could be used alongside clustering to look for interactions using a non-parametric statistical test. It would be really cool to have an example comparing the FDR-corrected parametric method (using this function) to the FWER-corrected non-parametric permutation test that uses this function for initial thresholding.

Did you have a glance at the TFR example I posted, which uses Aud/Vis and L/R as the two ways? I think it would be straightforward to extend it to do exactly what you propose (actually I had exactly that in mind as the next thing to add).

for obs in data if data.ndim == 3 else data[None:, ]:
    this_fvals, this_epsilon = [], []
    for (c1, c2, c3) in iter_contrasts:
        c_ = np.kron(sc[0][c1], sc[c3][c2])  # compute design matrix
Member

If you switch the order of these loops, then you can compute c_ far fewer times. Also, once that switch is done, it might be possible to vectorize the code, which would speed it up and (on systems with MKL numpy/scipy) make the use of python-level parallelization unnecessary. Have you thought about this? Such optimizations become relevant once non-parametric cluster methods are considered.
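A rough sketch of the reordering idea, using the variable names from the excerpt above (whether the inner loop can then be fully vectorized is left open):

for (c1, c2, c3) in iter_contrasts:
    c_ = np.kron(sc[0][c1], sc[c3][c2])  # design matrix, built once per effect
    for obs in data if data.ndim == 3 else data[None:, ]:
        pass  # ... reuse c_ here to compute this observation's F value and epsilon ...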

Member Author

Yes, I think this is still incompletely refactored. I was on my way to see how far I could vectorize it but then paid more attention to getting it actually working. E.g. you can see that the sy / cy are already computed outside this loop, so there should be no reason to keep this here as the result will always be the same. Let's do this and see what the next opportunity for improving efficiency is.

@larsoner
Member

Using the TFR example would work, too.

@dengemann
Member Author

Using the TFR example would work, too.

Why not have both ;-) I think both are worth it. The example you mentioned is actually the one that motivated this PR, so let's add that too.

tail = 1  # f-test, so tail > 0
n_permutations = 256  # Save some time (the test won't be too sensitive ...)
T_obs, clusters, cluster_p_values, h0 = mne.stats.permutation_cluster_test(
    epochs_power, stat_fun=stat_fun, threshold=f_thresh, tail=tail, n_jobs=2,
Member

same here. n_jobs=1

@agramfort
Member

For plot_cluster_stats_time_frequency_repeated_measures_anova you should play with n_cycles, especially increasing it at high frequencies. Besides that, it's a great PR.

I resisted asking to make tests+examples faster though :)
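For reference, one common way to follow the n_cycles suggestion above is to let the number of cycles grow with frequency rather than keeping it fixed (the values below are illustrative, not taken from the example):

import numpy as np

freqs = np.arange(7, 30, 3)  # frequencies of interest
n_cycles = freqs / 2.0       # more cycles at higher frequencies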

@dengemann
Member Author

On 13.05.2013, at 22:55, Alexandre Gramfort notifications@github.com wrote:

For plot_cluster_stats_time_frequency_repeated_measures_anova you should play with n_cycles, especially increasing it at high frequencies. Besides that, it's a great PR.

good point, and thanks!

I resisted asking to make tests+examples faster though :)

the tests should be quite fast. We could shrink the number of 'iter-cases' tested if you think that helps.
The examples and the implementation are already optimized, as you correctly saw. The good news is that for real-world scenarios this is quite fast: on my work computer, for 18 subjects, 30 time bins and the full source space, the clustering passes within 20 minutes. If you see something in the examples we can improve to make it faster (without over-simplifying things), don't hesitate to tell me.



@larsoner
Member

@dengemann once you've had a chance to make those changes and look at the cycles issue, let me know and I'll merge.

@dengemann
Member Author

I'll dispatch a message to seattle-merge-service before I go to sleep ;-)


@larsoner
Member

Yes, the lesser-known SMS acronym translation.

@dengemann
Member Author

Exactly! ^^


@dengemann
Member Author

@Eric89GXL last chance: confine the analysis in the spatio-temporal example to the left hemisphere to save some time?

@larsoner
Member

Sounds reasonable to me.

@dengemann
Member Author

@Eric89GXL is there a quick way to compute the connectivity for only one hemisphere? It seems spatial_tris_connectivity expects the full source space ....

@larsoner
Member

Just take the one computed for the whole brain, and take the upper left part of the matrix.

@larsoner
Member

For the left<->left hemi connectivity, that's what it would be. Lower right would be right<->right.
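A minimal sketch of the slicing idea with a toy matrix (n_lh is a stand-in for the number of left-hemisphere vertices; with real data it would come from the source space):

import numpy as np
from scipy import sparse

n_vertices, n_lh = 8, 4  # toy sizes: whole brain and left hemisphere
connectivity = sparse.coo_matrix(np.eye(n_vertices))  # stand-in for whole-brain connectivity
lh_connectivity = connectivity.tocsr()[:n_lh, :n_lh].tocoo()  # left<->left block (upper left)
print(lh_connectivity.shape)  # (4, 4)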

@dengemann
Member Author

Yeah, it was a tired / late indexing-expression issue:
connectivity = spatial_tris_connectivity(source_space[source_space[:, 0] < 10242]) does the job

- addresses discussion
- lobotomizes right hemisphere in spatio-temporal example and
  optimizes parameters, we're now at 5 minutes on my Air.
- fixes plotting bug.
@dengemann
Member Author

@Eric89GXL here's the SMS dispatch ;-)

@agramfort
Member

Don't bother

Le 14 mai 2013 à 00:19, "Denis A. Engemann" notifications@github.com a écrit :

@Eric89GXL is there a quick way to compute the connectivity for only one hemisphere? It seems spatial_tris_connectivity expects the full source space ....



@agramfort
Member

Great

Le 14 mai 2013 à 01:11, "Denis A. Engemann" notifications@github.com a écrit :

@Eric89GXL here's the SMS dispatch ;-)



@dengemann
Member Author

... @agramfort / @Eric89GXL if we can merge this without rebasing (as the green button indicates), that would come in handy. I just tried rebasing and it's going to be hell due to renamed / deleted files, etc.

@larsoner
Member

What did you try to do that was difficult? Rebasing?

@dengemann
Member Author

Just an innocent git pull --rebase upstream master ... inducing 30 merge conflicts due to the many changes on this branch. That was too much for git to reapply cleanly.

@larsoner
Member

I think your method of rebasing isn't what I usually use. I did:

git checkout master
git fetch upstream
git pull upstream master
git fetch dengemann
git checkout -b rm_anova dengemann/rm_anova
git rebase master

And it worked just fine. I thought this was the correct way, but maybe yours is...?

@larsoner
Member

I /think/ my way tries to replay your commits on top of the current master, whereas yours tries to replay master's commits on top of your work (creating problems). In any case, the rebase was fine, so I'm merging now.

larsoner added a commit that referenced this pull request May 14, 2013
ENH: Add repeated measures twoway anova function
@larsoner larsoner merged commit 3fe86a3 into mne-tools:master May 14, 2013