Skip to content

Add lme4 to ALDEx3#14

Merged
jsilve24 merged 4 commits intojsilve24:mainfrom
kyle-mcgovern:integrate_mem
Apr 21, 2025
Merged

Add lme4 to ALDEx3#14
jsilve24 merged 4 commits intojsilve24:mainfrom
kyle-mcgovern:integrate_mem

Conversation

@kyle-mcgovern
Copy link
Contributor

@kyle-mcgovern kyle-mcgovern commented Apr 17, 2025

This PR adds lme4 modeling with parallelization built in to ALDEx3.

  • Adds
    • aldex-mem.R: A new function that returns the same list structure as fflm, except with random effects added.
    • 8 unit tests for sanity checking, including checking mean estimation
    • method (default "lm"): specify regression method but can also pass "lme4"
    • n.cores (default detectCores()-1): argument, does not currently apply when method="lm" (i.e., using fflm), only when method="lme4"
    • clr simulation can now simulate random effects, used for testing
    • "random.effects" as option for return.pars
    • new dependencies: lme4, parallel, lmerTest (for proper p-value calculation)
    • version: 0.2.0 -> 0.3.0
  • Fixes:
    • test-aldex-summary: test was failing because formula was ~condition, should have been ~disease
  • Future Plan
    • add nlme

@jsilve24
Copy link
Owner

I got the following failure in a test:

Error ('test-aldex-mem.R:37:3'): aldex mem correct naming
Error in checkForRemoteErrors(val): 2 nodes produced errors; first error: could not find function "lmer"
Backtrace:

  1. └─ALDEx3::aldex(...) at test-aldex-mem.R:37:3
  2. └─ALDEx3:::aldex.mem(...) at ALDEx3/R/aldex.R:132:7
  3. └─ALDEx3 (local) lapply_func(...) at ALDEx3/R/aldex-mem.R:36:3
    
  4.   └─parallel::parLapply(cl, X, FUN) at ALDEx3/R/aldex-mem.R:30:5
    
  5.     ├─base::do.call(...)
    
  6.     └─parallel::clusterApply(...)
    
  7.       └─parallel:::staticClusterApply(cl, fun, length(x), argfun)
    
  8.         └─parallel:::checkForRemoteErrors(val)
    

Error ('test-aldex-mem.R:57:3'): aldex mem correct mean estimate
Error in checkForRemoteErrors(val): 13 nodes produced errors; first error: could not find function "lmer"
Backtrace:

  1. └─ALDEx3::aldex(...) at test-aldex-mem.R:57:3
  2. └─ALDEx3:::aldex.mem(...) at ALDEx3/R/aldex.R:132:7
  3. └─ALDEx3 (local) lapply_func(...) at ALDEx3/R/aldex-mem.R:36:3
    
  4.   └─parallel::parLapply(cl, X, FUN) at ALDEx3/R/aldex-mem.R:30:5
    
  5.     ├─base::do.call(...)
    
  6.     └─parallel::clusterApply(...)
    
  7.       └─parallel:::staticClusterApply(cl, fun, length(x), argfun)
    
  8.         └─parallel:::checkForRemoteErrors(val)
    

[ FAIL 2 | WARN 0 | SKIP 0 | PASS 40 ]

Also, will leave some comments on the code.

R/aldex-mem.R Outdated
}

## Process Cols (i.e., per-taxa)
results <- lapply_func(1:ncol(logWm), function(j) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just me or is this function actually copying the entire logWm object to each worker.. I suspect this is going to be slow and very memory inefficient. Rather than just sending each column to each worker. I think the later could be done by splitting logWm into a list with each element in the list being a column of the current array. This would drastically reduce memory and likely improve spead as that copy must be slow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll do that


## Random Effects
re_df <- as.data.frame(lme4::VarCorr(fit))
re_df$var1[is.na(re_df$var1)] <- "(Intercept)"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is going on here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handling NAs in random effcts

R/aldex-mem.R Outdated
## To array, get names, build return list
P <- nrow(results[[1]]$fixed_coefs)
Pr <- ncol(results[[1]]$random_coefs)
fixed_arr <- do.call(rbind, lapply(results, function(x) x$fixed_coefs))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you preallocate the arrays, likely far more efficient than repeated rbinds. rbind is super inefficient in R (when applied like this). Unless do.call is doing some magic behind the scenes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think so


## Fixed Effects
coefs <- coef(summary(fit))[,1:3]
coefs <- cbind(coefs, pt(coefs[,1]/coefs[,2], coefs[,3], lower.tail=T))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whats happening here with binding what looks like p-values in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upper and lower p value calculation. default p-value returned is 2-sided which is not what we want

@kyle-mcgovern
Copy link
Contributor Author

Huh the unittests work for me, even after pulling repo in new branch... I think I need to add clusterEvalQ, the parallel workers must not be getting the libraries passed into them automatically.

@kyle-mcgovern
Copy link
Contributor Author

Ok updated, try unit tests again now

results <- lapply_func(logWm_list, function(y) {
data_temp <- data
data_temp$y <- y
fit <- suppressMessages(lmerTest::lmer(update(formula, y~.), data=data_temp))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you are using lme4 and not the much slower lmer? I notice there are no calls to lme4 here... This seems to be the only place where any lmm is actually being fit. I also notice there is no interface to pass additional options (e.g., covaraince structure etc...). There are usually alot more options than just the formula right?

The code is rediculously slow and it all comes down to this one line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lmer is the main function for lmerTest. lmerTest is lme4 but with the Satterthwaite degrees of freedom calculation added. Also even in lme4 there is no lme4 function, lme4's main function is lmer.

The code is indeed slow. You are doing reml optimization potentially thousands of times. There may be some clever way to speed it up, but to be clear lmerTest, building on lme4, uses a bunch of C-optimization. It's just inherently slow to do REML thousands of times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of additional options, I have not yet added those.

lmerTest/lme4 cannot handle correlation. I'll do a second PR to add nlme, which can handle correlations. I was planning on adding more functionality for specifying additional parameters in that PR. This pr just adds minimal working functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a more distant future PR, there are probably Z'Z and X'X type matrix algebra in the optimization that could be de-duplicated for speed, but that would involve hand-rolling our own REML optimization.

@jsilve24
Copy link
Owner

Alright, will merge. Will need more work but for the moment this is probably ok.

@jsilve24 jsilve24 closed this Apr 21, 2025
@jsilve24 jsilve24 reopened this Apr 21, 2025
@jsilve24 jsilve24 merged commit 09ef39b into jsilve24:main Apr 21, 2025
@kyle-mcgovern kyle-mcgovern deleted the integrate_mem branch April 21, 2025 15:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants