
WIP/ENH: add repeated measures twoway anova function #580

Merged
merged 24 commits into mne-tools:master on May 14, 2013

Conversation

dengemann
Member

This addresses #226 and some aspects of #535

Adapted from my gist:
https://gist.github.com/dengemann/5427106

which is a translation of MATLAB code by Rik Henson:
http://www.mrc-cbu.cam.ac.uk/people/rik.henson/personal/repanova.m

and, to a lesser extent, of Python code from pyvttbl by Roger Lew:
http://code.google.com/p/pyvttbl/

While there is a new WIP PR in statsmodels related to this (statsmodels/statsmodels#786), this minimal version aims at supporting our (mass-univariate) use case.

Some features:

  • supports joblib (on my 2011 MacBook Air it took 7 minutes to compute
    1,000,000 repeated measures ANOVAs for 18 subjects and all three
    effects from a 2 x 2 design using 2 jobs). I might find a more
    efficient way to do this, but I think this is definitely a start.
  • supports sphericity correction for factor levels > 2

Current limitations are:

  • to keep things simple, I constrained this function to only estimate models with 2 factors. This should make sense in an MEG context.
  • both factors are expected to be repeated, so no 'between-subject' effects.

Tests and examples are coming, so please hold off on strict reviewing for now.
For the API, however, I could already use some feedback. I was wondering whether to add iterator support for retrieving the data matrices for the desired effects sequentially (assuming you don't want all three 20484 x (e.g.) 150 matrices at once when this is applied in a source-space analysis). Another option would be a mini formula language allowing you to request one effect, e.g. using R conventions: 'A' or 'B' for the main effects, 'A:B' for just the interaction, or 'A*B' for the full model. I think offering both iteration and a formula interface would be too much.
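For illustration only, here is a minimal sketch of what the mini formula language option could map to (the function name and mapping are made up for this example, not part of the PR):

def parse_effects(spec):
    """Map an effects string to the list of requested effects."""
    table = {'A': ['A'], 'B': ['B'], 'A:B': ['A:B'],
             'A*B': ['A', 'B', 'A:B']}
    if spec not in table:
        raise ValueError('Unknown effects specification: %r' % spec)
    return table[spec]

parse_effects('A*B')  # ['A', 'B', 'A:B']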

The significance threshold.
correction : bool.
The correction method to be employed. If True, sphericity
correction will using the Greenhouse-Gyser method will be applied
Member

Geyser

Member Author

... and 'will' ;-)

Member

I meant to say "Geisser", rather. Sheesh, need to correct my own corrections...

@larsoner
Member

I have been using some code of a collaborator that uses a two-way ANOVA with time as one of the "ways" to gain statistical power... I should be able to compare these two bits of code at some point. I'm happy to send the snippet to you if you'd like to :)

@dengemann
Member Author

I'm happy to send the snippet to you if you'd like to :)

Sure! Times should also work with this. The second factor may have multiple levels.

@larsoner
Member

See if this makes sense to you:

https://gist.github.com/Eric89GXL/1d59fc1085c0a4798318

@dengemann
Member Author

Looks cool, but also looks like a slightly different approach / use case. I'd be happy to add it though. Maybe don't mix it with this PR? Unless you immediately see how to get all this done in one function ;-)

@larsoner
Member

I'm wondering if the approaches can be made equivalent somewhere... maybe I'll try using your code to see if I can get similar results.

@dengemann
Member Author

Here is a minimal example:

import numpy as np
from mne.stats.parametric import r_anova_twoway

data = np.random.random([18, 4, 100])
print(r_anova_twoway(data, [2, 2], n_jobs=2))

I think the idea of this one is slightly different. This one is a brute-force mass-univariate function that just goes through whichever chain of n_subj x condition matrices to compute interactions. But maybe we can combine them somehow, even if it's just about the computation.
If you're happy to wait I'll soon add a TFR example for a 2x2 interaction image.
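To make the "chain of n_subj x condition matrices" idea above concrete, here is a minimal sketch of how the data get sliced per observation (illustrative only, not the PR's code):

import numpy as np

data = np.random.random([18, 4, 100])  # subjects x conditions x observations
for obs_idx in range(data.shape[-1]):
    y = data[:, :, obs_idx]  # one 18 x 4 (subjects x conditions) matrix per observation
    # ... compute the three F values (A, B, A:B) from this matrix ...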

@larsoner
Member

The example should be helpful. So long as the function returns F values for each condition / interaction, we should be able to make them equal. I'll see what I can get.

@dengemann
Member Author

The example should be helpful.

Yes, it's coming.

So long as the function returns F values for each condition / interaction, we should be able to make them equal.

Yes, as it is at the moment it returns 3 F values (A, B, A:B) for each of the observations (last dimension).

@larsoner
Member

@dengemann if I use your example but change data = np.random.random([18, 6, 100]) and r_anova_twoway(data, [2, 3]) I get an error about matrices not being aligned. Am I being stupid with the input arguments? There should be a test for making sure the user doesn't do anything dumb with the condition counts.

@dengemann
Member Author

@dengemann if I use your example but change data = np.random.random([18, 6, 100]) and r_anova_twoway(data, [2, 3]) I get an error about matrices not being aligned. Am I being stupid with the input arguments?

certainly not.

There should be a test for making sure the user doesn't do anything dumb with the condition counts.

yes, there seems to be a bug; the code from my gist was quite freshly refactored. Time to write tests :-)
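As a sketch of the kind of check such a test could enforce (the function name and message are made up here, not the PR's actual code):

import numpy as np

def check_factor_levels(data, factor_levels):
    n_conditions = np.prod(factor_levels)
    if data.shape[1] != n_conditions:
        raise ValueError('Got %d conditions, but factor levels %r imply %d'
                         % (data.shape[1], list(factor_levels), n_conditions))

check_factor_levels(np.random.random([18, 6, 100]), [2, 3])  # passes
# check_factor_levels(np.random.random([18, 6, 100]), [2, 2])  # would raise ValueError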

@dengemann
Member Author

... the counting certainly will not support anything above 2 x 2. I'll have to revisit that more carefully. This was the result of trying to get rid of a weird binary counter function that, together with the iterator, was producing the correct indices in the original code.

@larsoner
Member

Alright, let me know when the second factor (or first factor) supports multiple levels beyond 2, and I'll see how the two methods compare.

@dengemann
Member Author

@Eric89GXL this should now be safe enough to play with. Test added (passes) and coding bug fixed.

@larsoner
Member

Okay, I'll take a look and also update my gist...

@dengemann
Member Author

Damn it, wrong tab ... deleted my post.
But here we go: Cool. But beware of sphericity correction. That part is still buggy.

@dengemann
Member Author

@Eric89GXL at least formally the GG correction works. No tests at the value level so far.

@larsoner
Member

Comparison Gist updated, gotta run now...

@dengemann
Member Author

Cool, looks like the next thing to check out. Something seems wrong with the meshgrid call ...

@dengemann
Member Author

... ok, I'm using an older numpy version ...

@dengemann
Member Author

So, here we go with a new repeated measures example that detects a potential interaction between stimulation modality and location in the beta band.

@dengemann
Member Author

Besides the open question of how to possibly unify Eric's gist with this proposal, I think this is ready for a first review. If you feel inclined please pull and take a look at the example ;-)

@dengemann
Member Author

@Eric89GXL so this is how the functions compare, actually t-values vs. F-values.

[image: compare_anova]

The number of levels per factor.
alpha : float
The significance threshold.
correction : bool.
Member

No period.

@larsoner
Member

@dengemann how consistent is the layout and use of this function with the f_oneway function we already have? Would it be possible to add a 2-way within-subjects clustering example? The sample dataset has Aud/Vis and L/R, so those could be your two ways. It might be nice to see how this method could be used alongside clustering to look for interactions using a non-parametric statistical test. It would be really cool to have an example comparing the FDR-corrected parametric method (using this function) to the FWER-corrected non-parametric permutation test that uses this function for initial thresholding.
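As a rough, illustrative sketch of how this function could be plugged into the clustering machinery (how r_anova_twoway's return value is indexed and the effect ordering are assumptions here, not the settled API):

import numpy as np
from mne.stats.parametric import r_anova_twoway

def stat_fun(*args):
    # permutation_cluster_test passes one (n_observations, n_tests) array per
    # condition; stack them into subjects x conditions x tests for the ANOVA
    data = np.swapaxes(np.asarray(args), 0, 1)
    fvals = r_anova_twoway(data, [2, 2])  # assumed to return F values per effect
    return fvals[2]  # assumed ordering (A, B, A:B): keep only the interaction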

@dengemann
Member Author

The sample dataset has Aud/Vis and L/R, so those could be your two ways. It might be nice to see how this method could be used alongside clustering to look for interactions using a non-parametric statistical test. It would be really cool to have an example comparing the FDR-corrected parametric method (using this function) to the FWER-corrected non-parametric permutation test that uses this function for initial thresholding.

Did you have a glance at the TFR example I posted, which uses Aud/Vis and L/R as the two ways? I think it would be straightforward to extend it to do exactly what you propose (actually I had exactly that in mind as the next thing to add).

for obs in data if data.ndim == 3 else data[None:, ]:
    this_fvals, this_epsilon = [], []
    for (c1, c2, c3) in iter_contrasts:
        c_ = np.kron(sc[0][c1], sc[c3][c2])  # compute design matrix
Member

If you switch the order of these loops, then you can compute c_ far fewer times. Also, once that switch is done, it might be possible to vectorize the code, which would speed it up and (on systems with MKL numpy/scipy) make the use of python-level parallelization unnecessary. Have you thought about this? Such optimizations become relevant once non-parametric cluster methods are considered.
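A rough sketch of the reordering idea, using the variable names from the excerpt above (whether the inner loop can then be fully vectorized is left open):

for (c1, c2, c3) in iter_contrasts:
    c_ = np.kron(sc[0][c1], sc[c3][c2])  # design matrix, built once per effect
    for obs in data if data.ndim == 3 else data[None:, ]:
        pass  # ... reuse c_ here to compute this observation's F value and epsilon ...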

Member Author

Yes, I think this is still incompletely refactored. I was on my way to see how far I could vectorize it but then paid more attention to getting it actually working. E.g. you can see that the sy / cy are already computed outside this loop, so there should be no reason to keep this here as the result will always be the same. Let's do this and see what the next opportunity for improving efficiency is.

@larsoner
Member

Using the TFR example would work, too.

@dengemann
Member Author

Using the TFR example would work, too.

Why not have both ;-) I think both are worth it. The example you mentioned is actually the one that motivated this PR, so let's add that too.

tail = 1  # f-test, so tail > 0
n_permutations = 256  # Save some time (the test won't be too sensitive ...)
T_obs, clusters, cluster_p_values, h0 = mne.stats.permutation_cluster_test(
    epochs_power, stat_fun=stat_fun, threshold=f_thresh, tail=tail, n_jobs=2,
Member

same here. n_jobs=1

@agramfort
Member

For plot_cluster_stats_time_frequency_repeated_measures_anova you should play with n_cycles, especially increasing it at high frequencies. Besides that, it's a great PR.

I resisted asking to make tests+examples faster though :)
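For reference, one common way to follow the n_cycles suggestion above is to let the number of cycles grow with frequency rather than keeping it fixed (the values below are illustrative, not taken from the example):

import numpy as np

freqs = np.arange(7, 30, 3)  # frequencies of interest
n_cycles = freqs / 2.0       # more cycles at higher frequencies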

@dengemann
Member Author

On 13.05.2013, at 22:55, Alexandre Gramfort notifications@github.com wrote:

For plot_cluster_stats_time_frequency_repeated_measures_anova you should play with n_cycles, especially increasing it at high frequencies. Besides that, it's a great PR.

good point, and thanks!

I resisted asking to make tests+examples faster though :)

the tests should be quite fast. We could shrink the number of 'iter-cases' tested if you think that helps.
The examples and the implementation are already optimized, as you correctly saw. The good news is that for real-world scenarios this is quite fast: on my work computer, for 18 subjects, 30 time bins and the full source space, the clustering passes within 20 minutes. If you see something in the examples we can improve to make it faster (without over-simplifying things), don't hesitate to tell me.



@larsoner
Member

@dengemann once you've had a chance to make those changes and look at the cycles issue, let me know and I'll merge.

@dengemann
Member Author

I'll dispatch a message to seattle-merge-service before I go to sleep ;-)


@larsoner
Member

Yes, the lesser-known SMS acronym translation.

@dengemann
Member Author

Exactly! ^^


@dengemann
Member Author

@Eric89GXL last chance: confine the analysis in the spatio-temporal example to the left hemisphere to save some time?

@larsoner
Member

Sounds reasonable to me.

@dengemann
Member Author

@Eric89GXL is there a quick way to compute the connectivity for only one hemisphere? It seems spatial_tris_connectivity expects the full source space ....

@larsoner
Member

Just take the one computed for the whole brain, and take the upper left part of the matrix.

@larsoner
Member

For the left<->left hemi connectivity, that's what it would be. Lower right would be right<->right.
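A minimal sketch of the slicing idea with a toy matrix (n_lh is a stand-in for the number of left-hemisphere vertices; with real data it would come from the source space):

import numpy as np
from scipy import sparse

n_vertices, n_lh = 8, 4  # toy sizes: whole brain and left hemisphere
connectivity = sparse.coo_matrix(np.eye(n_vertices))  # stand-in for whole-brain connectivity
lh_connectivity = connectivity.tocsr()[:n_lh, :n_lh].tocoo()  # left<->left block (upper left)
print(lh_connectivity.shape)  # (4, 4)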

@dengemann
Member Author

Yeah, it was a tired / late indexing-expression issue:
connectivity = spatial_tris_connectivity(source_space[source_space[:, 0] < 10242]) does the job

- addresses discussion
- lobotomizes right hemisphere in spatio-temporal example and
  optimizes parameters, we're now at 5 minutes on my Air.
- fixes plotting bug.
@dengemann
Member Author

@Eric89GXL here's the SMS dispatch ;-)

@agramfort
Member

Don't bother

Le 14 mai 2013 à 00:19, "Denis A. Engemann" notifications@github.com a écrit :

@Eric89GXL is there a quick way to compute the connectivity for only one hemisphere? It seems spatial_tris_connectivity expects the full source space ....



@agramfort
Member

Great

Le 14 mai 2013 à 01:11, "Denis A. Engemann" notifications@github.com a écrit :

@Eric89GXL here's the SMS dispatch ;-)



@dengemann
Member Author

... @agramfort / @Eric89GXL if we can merge this without rebasing (as the green button indicates), that would come in handy. I just tried rebasing and it's going to be hell due to renamed / deleted files, etc.

@larsoner
Member

What did you try to do that was difficult? Rebasing?

@dengemann
Member Author

Just an innocent git pull --rebase upstream master ... inducing 30 merge conflicts due to the many changes on this branch. That was too much for git to reapply cleanly.

@larsoner
Member

I think your method of rebasing isn't what I usually use. I did:

git checkout master
git fetch upstream
git pull upstream master
git fetch dengemann
git checkout -b rm_anova dengemann/rm_anova
git rebase master

And it worked just fine. I thought this was the correct way, but maybe yours is...?

@larsoner
Member

I /think/ my way tries to replay your commits on top of the current master, whereas yours tries to replay master's commits on top of your work (creating problems). In any case, the rebase was fine, so I'm merging now.

larsoner added a commit that referenced this pull request May 14, 2013
ENH: Add repeated measures twoway anova function
@larsoner larsoner merged commit 3fe86a3 into mne-tools:master May 14, 2013