fixed two-sample permutation testing (#323) #347

Closed
bthirion wants to merge 3 commits

Conversation

bthirion
Contributor

This is supposed to solve issue #323.
As rightly pointed out, the code was not doing what it was supposed to do.

Note again that I don't want to put any more effort into it. It is very low quality anyhow.

@matthew-brett
Member

Thanks - can you think of a test?

@jwirsich - can you somehow give us the results of your test against the perm R package so we can make sure the results continue to be correct?

@bthirion
Contributor Author

The test is working (and Travis passed; I think there is a clash between different Travis runs?).

@jwirsich

jwirsich commented Sep 2, 2015

I can give you here the R code I used to test, run against a MATLAB variable containing two n x m arrays x and y, though I did not test the tmax correction.

library('perm')
library(R.matlab)

# path points to the .mat file holding the two n x m arrays x and y
data <- readMat(path)
x1 <- as.matrix(data$x)
y1 <- as.matrix(data$y)
for (i in 1:length(x1[1, ])) {
    DV <- c(x1[, i], y1[, i])
    IV <- factor(rep(c("A", "B"), c(length(x1[, i]), length(y1[, i]))))
    valuer <- permTS(DV ~ IV, alternative = "greater", method = "exact.mc",
                     control = permControl(nmc = 10^3 - 1))$estimate

    if (valuer < 0.05) {
        print(paste(i, ' ', valuer, sep = ''))
    }
}

@matthew-brett
Member

@jwirsich - thank you - that's very helpful. Sorry to punish your kindness like this, but would you consider sending the x, y arrays you used, say saved as a MATLAB .mat file?

@matthew-brett
Member

OK - sorry about this - I am afraid I don't understand what I am doing. But, here are my scripts:

# write_perm_data.py
import numpy as np
import scipy.io as sio

n = 1000
m = 30
rng = np.random.RandomState(42)
x = rng.normal(size=(n, m))
y = rng.normal(size=(n, m))
sio.savemat('xy.mat', dict(x=x, y=y))

and

# estimate_perm_data.R
library('perm')
library(R.matlab)

path = 'xy.mat'
data <- readMat(path)
x1 <- as.matrix(data$x)
y1 <- as.matrix(data$y)
estimates = rep(NA, length(x1[1,]))
for(i in 1:length(x1[1,])) {
    DV <- c(x1[,i], y1[,i])
    IV <- factor(rep(c("A", "B"), c(length(x1[,i]), length(y1[,i]))))
    valuer = permTS(DV~IV,
                    alternative="greater",
                    method="exact.mc",
                    control=permControl(nmc=10^3-1))$estimate
    estimates[i] = valuer
}
writeMat('xy_estimates.mat', estimates=estimates)

and

# check_perm_data.py
import numpy as np
import scipy.io as sio
import nipy.labs.group.permutation_test as pt

matfile = sio.loadmat('xy.mat')
x, y = matfile['x'], matfile['y']

P = pt.permutation_test_twosample(x, y, None, ndraws=100-1)

estimates = sio.loadmat('xy_estimates.mat')['estimates']

print(np.abs(estimates - P.Tvalues))

I ran these in IPython:

In [1]: run write_perm_data.py
In [2]: !Rscript estimate_perm_data.R
R.matlab v3.2.0 (2015-02-24) successfully loaded. See ?R.matlab for help.
Attaching package: ‘R.matlab’

The following objects are masked from ‘package:base’:
    getOption, isOpen

In [3]: run check_perm_data.py
[  1.61345199e-04   3.66017869e-05   1.14590729e-05   2.62798356e-04
   9.42071930e-04   3.97339359e-05   3.43087771e-04   6.82364911e-05
   1.06188055e-03   5.40881411e-04   1.33743680e-03   8.58267197e-04
   7.61348354e-05   1.55652935e-04   6.67128049e-04   2.51306563e-04
   5.30961087e-05   1.03552428e-04   1.18789211e-04   5.39055155e-04
   4.89608611e-04   6.62367729e-05   6.13974412e-04   1.76433731e-04
   1.06311059e-05   3.57039125e-04   3.60571430e-05   3.22796150e-04
   1.40873196e-05   2.46184657e-04]

This compares the R estimate with the two-sample test Tvalues. Is that the right comparison? Should these be more similar?

@jwirsich

jwirsich commented Sep 9, 2015

Sorry for the late answer. The test case looks fine, but the R code uses 1000 permutations while the Python code seems to use only 100 draws: you might want to change that. Also, the R code uses a right-sided test (alternative="greater"); you can change the parameter to alternative="two.sided" to get the two-sided estimates.
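
(For reference, the matching call on the Python side might look like the sketch below, reusing the names from check_perm_data.py above; whether ndraws maps one-to-one onto the R script's nmc setting is an assumption.)

# Sketch: match the R script's 10^3 - 1 Monte-Carlo permutations
# (assumes x, y and the pt module are set up as in check_perm_data.py).
P = pt.permutation_test_twosample(x, y, None, ndraws=10 ** 3 - 1)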

@matthew-brett
Member

Thanks for the feedback. Actually, the number of draws doesn't change the values output by permutation_test_twosample - anything from ndraws=1 to ndraws=10000 gives exactly the same estimates.

The same seems to be true for the nmc value and the alternative value in the R script.

Bertrand - do you have any comment on whether this test is valid?

@bthirion
Contributor Author

As far as I can see, this looks right. At least, the relative error is small.

@matthew-brett
Member

Bertrand - is it possible these tests I've done aren't actually testing the permutation?

@bthirion
Contributor Author

Indeed, you need to invoke the calibrate method in order to run the permutation test. The problem is that the whole module is really bad, so I don't think that this additional test would help a lot.
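
For concreteness, here is a minimal sketch of invoking calibrate on the same data as check_perm_data.py. The nperms keyword and the structure of the return value (voxel/cluster/region results with a 'p_values' key) are assumptions, so check the nipy source before relying on them:

# calibrate_sketch.py - a sketch, not a documented recipe
import scipy.io as sio
import nipy.labs.group.permutation_test as pt

matfile = sio.loadmat('xy.mat')
x, y = matfile['x'], matfile['y']

P = pt.permutation_test_twosample(x, y, None, ndraws=10 ** 3 - 1)
# calibrate() runs the actual permutation loop; P.Tvalues, compared
# earlier, is just the observed statistic, which would explain why it
# does not depend on ndraws. The unpacking and the 'p_values' key
# below are assumptions about the API, not documented behaviour.
voxel_res, cluster_res, region_res = P.calibrate(nperms=10 ** 3 - 1)
print(voxel_res['p_values'])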

@matthew-brett
Member

Bertrand - are you saying that you aren't confident that the results are correct in general?

@bthirion
Contributor Author

Not exactly: I have added some tests in this PR that confirm that, roughly speaking, the permutation p-values are correct. So I don't see why they should be wrong.

I was simply saying that (i) your test does not check the permutation p-values, and (ii) the permutation_test code is really bad and would need a complete redesign, which I think nobody is going to do, since we have decided to stop the development of nipy.

In concrete terms, I don't think that comparing the p-values with those obtained with R is necessary.

@matthew-brett
Member

I completely agree that my current tests are not testing the right thing.

On the other hand, it would be a shame if we released this version of nipy with code that we thought might give the wrong answer. So it seems worth checking that it is at least roughly right. Comparing against the R package seems like one way of doing that, and it doesn't seem like too much work for someone who understands what the code is doing.
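
That comparison might look something like the sketch below. It assumes the R script is changed to save permTS(...)$p.value (a standard htest field) rather than $estimate - the quantity the current scripts never check - into a hypothetical xy_pvalues.mat, and it reuses the assumed calibrate return structure from the earlier sketch:

# compare_perm_pvalues.py - a sketch under the assumptions above
import numpy as np
import scipy.io as sio
import nipy.labs.group.permutation_test as pt

matfile = sio.loadmat('xy.mat')
x, y = matfile['x'], matfile['y']

P = pt.permutation_test_twosample(x, y, None, ndraws=10 ** 3 - 1)
voxel_res, _, _ = P.calibrate(nperms=10 ** 3 - 1)

# 'xy_pvalues.mat' is hypothetical: it would hold permTS(...)$p.value
# for each column, saved by a modified estimate_perm_data.R.
r_pvals = sio.loadmat('xy_pvalues.mat')['pvalues'].ravel()
py_pvals = np.asarray(voxel_res['p_values']).ravel()

# Both sides are Monte-Carlo estimates, so expect agreement up to
# sampling noise rather than exact equality.
print(np.abs(r_pvals - py_pvals).max())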

@matthew-brett
Member

OK - @jwirsich - unless you'd like to add a test here (that would be great) I'll merge this soon.

@jwirsich

I haven't checked the tmax correction. But I am not sure it's worth putting more work into this, so no problem with merging soon.

@matthew-brett
Member

Thanks for the feedback - rebased and merged as 556ec73
