Speed of mean.difference and test statistics in general #6

Open
jwbowers opened this Issue Jul 12, 2011 · 0 comments

jwbowers commented Jul 12, 2011

So, I've discovered that mean.difference is much slower than mann.whitney.u:

> system.time(replicate(1000,mean.difference(R,Z,B)))
   user  system elapsed 
  2.909   0.013   2.921 
> system.time(replicate(1000,mann.whitney.u(R,Z,B)))
   user  system elapsed 
  0.073   0.001   0.074 

Part of the problem is the preprocessing of the data for blocks:

> system.time(replicate(1000,paired.sgnrank.sum(R,Z,B)))
   user  system elapsed 
  1.484   0.004   1.489 

But that does not account for all of the difference. Here are a couple of ideas:

mean.diff.lsfit <- function(ys, z, blocks) {
  ## Try using something that calls compiled code.
  ## Gives the same answer as mean.difference for balanced blocks and should be
  ## like harmonic.mean.difference for unbalanced blocks.
  lsfit(x = model.matrix(ys ~ z + blocks), y = ys, intercept = FALSE)[["coefficients"]][["z"]]
}

> system.time(replicate(1000,mean.diff.lsfit(R,Z,B)))
   user  system elapsed 
  1.793   0.004   1.797 

mean.diff.vect <- function(ys, z, blocks) {
  X <- model.matrix(ys ~ z + blocks)
  ## Alternative, to handle near-singular X: qr.coef(qr(X, LAPACK = TRUE), ys)
  solve(qr(X, LAPACK = TRUE), ys)[2]
}

> system.time(replicate(1000,mean.diff.vect(R,Z,B)))
   user  system elapsed 
  1.741   0.001   1.742

I suspect that as long as we allow blocks to be a factor and use model.matrix, we may not get much more speed. Any ideas are welcome, of course, since this is the function we call most often.
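
One possible way to drop model.matrix entirely, as a rough sketch only: the coefficient on z in a least-squares fit of ys on z plus block indicators equals a weighted average of the within-block treated-minus-control mean differences, with block weights n_t * n_c / n. Those per-block sums can be computed with rowsum() and tabulate(), which call compiled code. The name mean.diff.rowsum is hypothetical and not part of the package. The sketch assumes z is coded 0/1 and that every block contains at least one treated and one control unit. It targets the quantity that mean.diff.lsfit returns, which may or may not match what mean.difference does internally for unbalanced blocks, and it has not been timed here.

```r
## Hypothetical helper (not in the package): weighted average of within-block
## mean differences, without building a model matrix.
mean.diff.rowsum <- function(ys, z, blocks) {
  z <- as.numeric(z)                    # assumption: z is coded 0/1
  b <- as.integer(factor(blocks))       # integer block codes; factor() drops unused levels

  n     <- tabulate(b)                  # units per block
  n.t   <- rowsum(z, b)[, 1]            # treated units per block
  n.c   <- n - n.t                      # control units per block
  sum.t <- rowsum(ys * z, b)[, 1]       # sum of treated outcomes per block
  sum.c <- rowsum(ys, b)[, 1] - sum.t   # sum of control outcomes per block

  d <- sum.t / n.t - sum.c / n.c        # within-block mean differences
  w <- n.t * n.c / n                    # weights implied by the block fixed-effects fit
  sum(w * d) / sum(w)
}

## Made-up example data with unbalanced blocks (2 treated units in each block).
set.seed(1)
sizes <- c(4, 6, 8, 10)
B <- factor(rep(seq_along(sizes), times = sizes))
Z <- unlist(lapply(sizes, function(n) rep(c(1, 0), c(2, n - 2))))
R <- rnorm(sum(sizes)) + Z

mean.diff.rowsum(R, Z, B)
coef(lm(R ~ Z + B))[["Z"]]              # should agree up to floating-point error
```

If the block structure is fixed across replications, the factor() conversion and tabulate(b) could also be computed once outside the function, though whether that helps in practice would need to be measured.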
