WIP/ENH: add repeated measures twoway anova function #580
Conversation
The significance threshold.
correction : bool.
    The correction method to be employed. If True, sphericity
    correction will using the Greenhouse-Gyser method will be applied
Geyser
... and 'will' ;-)
I meant to say "Geisser", rather. Sheesh, need to correct my own corrections...
I have been using some code of a collaborator that uses a two-way ANOVA with time as one of the "ways" to gain statistical power... I should be able to compare these two bits of code at some point. I'm happy to send the snippet to you if you'd like :)
Sure! Time should also work with this. The second factor may have multiple levels.
See if this makes sense to you:
Looks cool, but also looks like a slightly different approach / use case. I'd be happy to add it though. Maybe don't mix it with this PR? Unless you immediately see how to get all this done in one function ;-)
I'm wondering if the approaches can be made equivalent somewhere... maybe I'll try using your code to see if I can get similar results.
Here's a minimal example:

```python
import numpy as np
from mne.stats.parametric import r_anova_twoway

data = np.random.random([18, 4, 100])
print(r_anova_twoway(data, [2, 2], n_jobs=2))
```

I think the idea of this one is slightly different. This one is a brute-force mass-univariate function that just goes through whichever chain of n_subj x condition matrices to compute interactions. But maybe we can combine it somehow, even if it's just about the computation.
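For intuition, here is a minimal sketch of what such a mass-univariate function computes for a 2 x 2 design. This is not the PR's implementation; the function name and condition ordering are assumptions. Each effect is a 1-df within-subject contrast, and for a 1-df contrast the F value is simply the squared paired t statistic of the contrast scores:

```python
import numpy as np

def f_twoway_rm_2x2_sketch(data):
    """Hypothetical sketch: F values for a 2 x 2 repeated measures design.

    data : array, shape (n_subjects, 4, n_observations)
        Conditions assumed ordered A1B1, A1B2, A2B1, A2B2.
    Returns F values, shape (3, n_observations), for A, B and A:B,
    each with df = (1, n_subjects - 1).
    """
    n_subjects = data.shape[0]
    # 1-df within-subject contrasts (Kronecker ordering of the levels)
    contrasts = np.array([[1, 1, -1, -1],    # main effect A
                          [1, -1, 1, -1],    # main effect B
                          [1, -1, -1, 1]])   # interaction A:B
    # contrast scores per effect, subject and observation
    y = np.einsum('ec,sco->eso', contrasts, data)
    # F = n * mean(y)^2 / var(y), i.e. the squared paired t statistic
    return n_subjects * y.mean(axis=1) ** 2 / y.var(axis=1, ddof=1)

fvals = f_twoway_rm_2x2_sketch(np.random.random([18, 4, 100]))
print(fvals.shape)  # (3, 100)
```

Because the contrasts sum to zero, subject-specific offsets shared across conditions cancel out, which is what gives the repeated measures design its power.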
The example should be helpful. So long as the function returns F values for each condition / interaction, we should be able to make them equal. I'll see what I can get.
Yes, it's coming.
Yes, as it is at the moment it returns 3 F values for A, B, A:B for each of the observations (last dimension).
@dengemann if I use your example but change |
certainly not.
yes, there's supposed to be a bug, the code from my gist is quite freshly refactored. Time for writing tests :-)
... the counting certainly will not support anything above 2 x 2. I'll have to revisit that more carefully. This was the result of trying to get rid of a weird binary counter function that, together with the iterator, was producing the correct indices in the original code.
Alright, let me know when the second factor (or first factor) supports multiple levels beyond 2, and I'll see how the two methods compare.
@Eric89GXL this should now be safe enough to play with. Test added (passes) and coding bug fixed.
Okay, I'll take a look and also update my gist...
Damn it, wrong tab ... deleted my post.
@Eric89GXL at least formally GG works. No tests at value levels so far.
Comparison gist updated, gotta run now...
Cool, looks like the next thing to check out. Something seems wrong with the meshgrid call ...
... ok, I'm using an older numpy version ...
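One value-level sanity check that could feed such tests: the Greenhouse-Geisser epsilon computed from within-subject contrast scores is bounded between 1/df and 1, and is exactly 1 for 1-df effects, so a 2 x 2 design needs no correction at all. A hedged sketch (function name hypothetical; the standard formula assumes orthonormalized contrasts):

```python
import numpy as np

def gg_epsilon(y):
    """Greenhouse-Geisser epsilon from contrast scores y, shape (n_subjects, df).

    Assumes y holds (ideally orthonormalized) within-subject contrast
    scores; the result lies in [1/df, 1].
    """
    S = np.atleast_2d(np.cov(y, rowvar=False))  # covariance of contrast scores
    d = S.shape[0]
    return np.trace(S) ** 2 / (d * np.trace(S @ S))
```

For a 1-df effect the numerator and denominator coincide, so epsilon is identically 1, which is one reason the 2 x 2 case is a convenient starting point.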
So, here we go with a new repeated measures example that detects a potential interaction between stimulation modality and location in the beta band.
Besides the open question of how to possibly unify Eric's gist with this proposal, I think this is ready for a first review. If you feel inclined please pull and take a look at the example ;-)
The number of levels per factor.
alpha : float
    The significance threshold.
correction : bool.
No period.
@dengemann how consistent is the layout and use of this function with the |
Did you have a glance at the TFR example posted, which uses Aud/Vis and L/R as the two ways? I think it would be straightforward to extend this to do exactly what you propose (actually I had exactly that in mind as the next thing to add).
```python
for obs in (data if data.ndim == 3 else data[None, :]):
    this_fvals, this_epsilon = [], []
    for (c1, c2, c3) in iter_contrasts:
        c_ = np.kron(sc[0][c1], sc[c3][c2])  # compute design matrix
```
If you switch the order of these loops, then you can compute c_
far fewer times. Also, once that switch is done, it might be possible to vectorize the code, which would speed it up and (on systems with MKL numpy/scipy) make the use of Python-level parallelization unnecessary. Have you thought about this? Such optimizations become relevant once non-parametric cluster methods are considered.
Yes, I think this is still incompletely refactored. I was on my way to seeing how far I could vectorize it but then paid more attention to getting it actually working. E.g. you can see that the sy / cy are already computed outside this loop, so there should be no reason to keep this here, as the result will always be the same. Let's do this and see what's the next opportunity for improving efficiency.
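A minimal sketch of the suggested direction, with illustrative stand-ins for the PR's sc / iter_contrasts machinery (the names and the 2 x 2 setup are assumptions): the per-effect design matrices are built once, and the contrast projection is vectorized over observations instead of looping:

```python
import numpy as np

# illustrative 2 x 2 stand-ins for the PR's contrast machinery
ones = np.ones((1, 2))
main = np.array([[1, -1]])
# one design matrix per effect, computed once, outside any per-observation loop
designs = [np.kron(main, ones),   # A
           np.kron(ones, main),   # B
           np.kron(main, main)]   # A:B

rng = np.random.RandomState(0)
data = rng.randn(18, 4, 100)  # subjects x conditions x observations
# vectorized over the observation axis: no Python-level loop per observation
effects = np.stack([np.einsum('kc,sco->so', d, data) for d in designs])
print(effects.shape)  # (3, 18, 100)
```

The remaining per-effect statistics (means, variances, F values) can then also be computed with array operations along the subject axis.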
Using the TFR example would work, too.
Why not have both ;-) I think both are worth it. The example you mentioned is actually the one that motivated this PR, so let's add that too.
tail = 1  # f-test, so tail > 0
n_permutations = 256  # save some time (the test won't be too sensitive ...)
T_obs, clusters, cluster_p_values, h0 = mne.stats.permutation_cluster_test(
    epochs_power, stat_fun=stat_fun, threshold=f_thresh, tail=tail, n_jobs=2,
same here. n_jobs=1
For plot_cluster_stats_time_frequency_repeated_measures_anova you should play with the n_cycles, especially increasing it with high frequencies. Besides that, it's a great PR. I resisted asking to make tests + examples faster, though :)
On 13.05.2013, at 22:55, Alexandre Gramfort notifications@github.com wrote:
good point, and thanks!
@dengemann once you've had a chance to make those changes and look at the cycles issue, let me know and I'll merge.
I'll dispatch a message to seattle-merge-service before I go to sleep ;-) On 13.05.2013, at 23:09, Eric89GXL notifications@github.com wrote:
Yes, the lesser-known SMS acronym translation.
Exactly! ^^ On 13.05.2013, at 23:13, Eric89GXL notifications@github.com wrote:
@Eric89GXL last chance: confine the analysis in the spatio-temporal example to the left hemisphere to save some time?
Sounds reasonable to me.
@Eric89GXL is there a quick way to compute the connectivity for only one hemisphere? It seems spatial_tris_connectivity expects the full source space ....
Just take the one computed for the whole brain, and take the upper left part of the matrix.
For the left<->left hemi connectivity, that's what it would be. Lower right would be right<->right.
Yeah, was a tired / late indexing expression issue:
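A hedged sketch of the upper-left-block idea (sizes are made up; in a real source space analysis the vertex counts would come from the source space, and left-hemisphere vertices are assumed to come first):

```python
import numpy as np
from scipy import sparse

n_lh, n_rh = 4, 3  # hypothetical vertex counts, left hemisphere first
n = n_lh + n_rh
rng = np.random.RandomState(0)
# stand-in for a whole-brain connectivity matrix such as the one
# returned by spatial_tris_connectivity
conn = sparse.csr_matrix((rng.rand(n, n) > 0.5).astype(int))
conn_lh = conn[:n_lh, :n_lh]  # upper-left block: left<->left connectivity
print(conn_lh.shape)  # (4, 4)
```

The lower-right block `conn[n_lh:, n_lh:]` would correspondingly give the right<->right connectivity.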
- addresses discussion
- lobotomizes right hemisphere in spatio-temporal example and optimizes parameters; we're now at 5 minutes on my Air
- fixes plotting bug
@Eric89GXL here's the SMS dispatch ;-)
Don't bother. On 14 May 2013 at 00:19, "Denis A. Engemann" notifications@github.com wrote:
Great. On 14 May 2013 at 01:11, "Denis A. Engemann" notifications@github.com wrote:
... @agramfort / @Eric89GXL if we can merge this without rebasing (as the green button indicates), this would come in handy. Just tried it and it's gonna be hell due to renamed / deleted files, etc.
What did you try to do that was difficult? Rebasing? |
Just an innocent `git pull --rebase upstream master` ... inducing 30 merge conflicts due to the many changes on this branch. That was too much for git to reapply once more.
I think your method of rebasing isn't what I usually use. I did:
And it worked just fine. I thought this was the correct way, but maybe yours is...?
I *think* my way tries to replay your commits on top of the current master, whereas yours tries to play master's commits on top of your work (creating problems). In any case, rebase was fine, so I'm merging now.
ENH: Add repeated measures twoway anova function
This addresses #226 and some aspects of #535
Adapted from my gist:
https://gist.github.com/dengemann/5427106
which is a translation of MATLAB code by Rik Henson:
http://www.mrc-cbu.cam.ac.uk/people/rik.henson/personal/repanova.m
and, to a lesser extent, of Python code from pyvttbl by Roger Lew.
http://code.google.com/p/pyvttbl/
While there is a new WIP PR in statsmodels related to this (statsmodels/statsmodels#786), this minimal version aims at supporting our (mass-univariate) use case.
Some features:
- compute 1,000,000 repeated measures ANOVAs for 18 subjects and all three effects from a 2 x 2 design using 2 jobs. I might find a more efficient way to do this, but I think this is definitely a start.
Current limitations are:
Tests and examples are coming, so please wait with strict reviewing.
For the API, however, I could already use some feedback. I was wondering whether to add some iterator support for retrieving data matrices for the desired effects sequentially (assuming you don't want all three 20484 x e.g. 150 matrices at once when applied in a source space analysis). Another option would be to add some mini formula language allowing one effect to be requested, e.g. using R conventions: 'A' or 'B' for the main effects, 'A:B' for just the interaction, or 'A*B' for the full model. I think both iteration and a formula language might be too much.
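To make the mini-formula option concrete, here is a hypothetical sketch; the function name and return convention are invented for illustration and are not part of this PR:

```python
def parse_effects(spec):
    """Map an R-style effects string to the list of effects to compute (sketch)."""
    known = ('A', 'B', 'A:B')
    if spec == 'A*B':  # full model expands to both main effects plus interaction
        return list(known)
    if spec in known:
        return [spec]
    raise ValueError('unknown effects specification: %r' % spec)

print(parse_effects('A*B'))  # ['A', 'B', 'A:B']
```

The ANOVA function could then compute (or yield) only the requested subset, which would also address the memory concern for source space data.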