Conversation
|
I got the following failure in a test: Error ('test-aldex-mem.R:37:3'): aldex mem correct naming
Error ('test-aldex-mem.R:57:3'): aldex mem correct mean estimate
[ FAIL 2 | WARN 0 | SKIP 0 | PASS 40 ] Also, will leave some comments on the code. |
R/aldex-mem.R
Outdated
| } | ||
|
|
||
| ## Process Cols (i.e., per-taxa) | ||
| results <- lapply_func(1:ncol(logWm), function(j) { |
There was a problem hiding this comment.
Is it just me or is this function actually copying the entire logWm object to each worker.. I suspect this is going to be slow and very memory inefficient. Rather than just sending each column to each worker. I think the later could be done by splitting logWm into a list with each element in the list being a column of the current array. This would drastically reduce memory and likely improve spead as that copy must be slow.
There was a problem hiding this comment.
Sure I'll do that
|
|
||
| ## Random Effects | ||
| re_df <- as.data.frame(lme4::VarCorr(fit)) | ||
| re_df$var1[is.na(re_df$var1)] <- "(Intercept)" |
There was a problem hiding this comment.
handling NAs in random effcts
R/aldex-mem.R
Outdated
| ## To array, get names, build return list | ||
| P <- nrow(results[[1]]$fixed_coefs) | ||
| Pr <- ncol(results[[1]]$random_coefs) | ||
| fixed_arr <- do.call(rbind, lapply(results, function(x) x$fixed_coefs)) |
There was a problem hiding this comment.
could you preallocate the arrays, likely far more efficient than repeated rbinds. rbind is super inefficient in R (when applied like this). Unless do.call is doing some magic behind the scenes.
There was a problem hiding this comment.
Yeah I think so
|
|
||
| ## Fixed Effects | ||
| coefs <- coef(summary(fit))[,1:3] | ||
| coefs <- cbind(coefs, pt(coefs[,1]/coefs[,2], coefs[,3], lower.tail=T)) |
There was a problem hiding this comment.
whats happening here with binding what looks like p-values in?
There was a problem hiding this comment.
Upper and lower p value calculation. default p-value returned is 2-sided which is not what we want
|
Huh the unittests work for me, even after pulling repo in new branch... I think I need to add clusterEvalQ, the parallel workers must not be getting the libraries passed into them automatically. |
|
Ok updated, try unit tests again now |
| results <- lapply_func(logWm_list, function(y) { | ||
| data_temp <- data | ||
| data_temp$y <- y | ||
| fit <- suppressMessages(lmerTest::lmer(update(formula, y~.), data=data_temp)) |
There was a problem hiding this comment.
Are you sure you are using lme4 and not the much slower lmer? I notice there are no calls to lme4 here... This seems to be the only place where any lmm is actually being fit. I also notice there is no interface to pass additional options (e.g., covaraince structure etc...). There are usually alot more options than just the formula right?
The code is rediculously slow and it all comes down to this one line.
There was a problem hiding this comment.
lmer is the main function for lmerTest. lmerTest is lme4 but with the Satterthwaite degrees of freedom calculation added. Also even in lme4 there is no lme4 function, lme4's main function is lmer.
The code is indeed slow. You are doing reml optimization potentially thousands of times. There may be some clever way to speed it up, but to be clear lmerTest, building on lme4, uses a bunch of C-optimization. It's just inherently slow to do REML thousands of times.
There was a problem hiding this comment.
In terms of additional options, I have not yet added those.
lmerTest/lme4 cannot handle correlation. I'll do a second PR to add nlme, which can handle correlations. I was planning on adding more functionality for specifying additional parameters in that PR. This pr just adds minimal working functionality.
There was a problem hiding this comment.
In a more distant future PR, there are probably Z'Z and X'X type matrix algebra in the optimization that could be de-duplicated for speed, but that would involve hand-rolling our own REML optimization.
|
Alright, will merge. Will need more work but for the moment this is probably ok. |
This PR adds lme4 modeling with parallelization built in to ALDEx3.