Add lme4 to ALDEx3 by kyle-mcgovern · Pull Request #14 · jsilve24/ALDEx3

kyle-mcgovern · 2025-04-17T10:20:35Z

This PR adds lme4 modeling with parallelization built in to ALDEx3.

Adds
- aldex-mem.R: A new function that returns the same list structure as fflm, except with random effects added.
- 8 unit tests for sanity checking, including checking mean estimation
- method (default "lm"): specify regression method but can also pass "lme4"
- n.cores (default detectCores()-1): argument, does not currently apply when method="lm" (i.e., using fflm), only when method="lme4"
- clr simulation can now simulate random effects, used for testing
- "random.effects" as option for return.pars
- new dependencies: lme4, parallel, lmerTest (for proper p-value calculation)
- version: 0.2.0 -> 0.3.0
Fixes:
- test-aldex-summary: test was failing because formula was ~condition, should have been ~disease
Future Plan
- add nlme

jsilve24 · 2025-04-17T16:59:32Z

I got the following failure in a test:

Error ('test-aldex-mem.R:37:3'): aldex mem correct naming
Error in checkForRemoteErrors(val): 2 nodes produced errors; first error: could not find function "lmer"
Backtrace:
▆

└─ALDEx3::aldex(...) at test-aldex-mem.R:37:3
└─ALDEx3:::aldex.mem(...) at ALDEx3/R/aldex.R:132:7

└─ALDEx3 (local) lapply_func(...) at ALDEx3/R/aldex-mem.R:36:3

  └─parallel::parLapply(cl, X, FUN) at ALDEx3/R/aldex-mem.R:30:5

```
    ├─base::do.call(...)
```
```
    └─parallel::clusterApply(...)
```

      └─parallel:::staticClusterApply(cl, fun, length(x), argfun)

        └─parallel:::checkForRemoteErrors(val)

Error ('test-aldex-mem.R:57:3'): aldex mem correct mean estimate
Error in checkForRemoteErrors(val): 13 nodes produced errors; first error: could not find function "lmer"
Backtrace:
▆

└─ALDEx3::aldex(...) at test-aldex-mem.R:57:3
└─ALDEx3:::aldex.mem(...) at ALDEx3/R/aldex.R:132:7

└─ALDEx3 (local) lapply_func(...) at ALDEx3/R/aldex-mem.R:36:3

  └─parallel::parLapply(cl, X, FUN) at ALDEx3/R/aldex-mem.R:30:5

```
    ├─base::do.call(...)
```
```
    └─parallel::clusterApply(...)
```

      └─parallel:::staticClusterApply(cl, fun, length(x), argfun)

        └─parallel:::checkForRemoteErrors(val)

[ FAIL 2 | WARN 0 | SKIP 0 | PASS 40 ]

Also, will leave some comments on the code.

jsilve24 · 2025-04-17T17:01:47Z

R/aldex-mem.R

+  }
+
+  ## Process Cols (i.e., per-taxa)
+  results <- lapply_func(1:ncol(logWm), function(j) {


Is it just me or is this function actually copying the entire logWm object to each worker.. I suspect this is going to be slow and very memory inefficient. Rather than just sending each column to each worker. I think the later could be done by splitting logWm into a list with each element in the list being a column of the current array. This would drastically reduce memory and likely improve spead as that copy must be slow.

Sure I'll do that

jsilve24 · 2025-04-17T17:02:27Z

R/aldex-mem.R

+
+    ## Random Effects
+    re_df <- as.data.frame(lme4::VarCorr(fit))
+    re_df$var1[is.na(re_df$var1)] <- "(Intercept)"


what is going on here?

handling NAs in random effcts

jsilve24 · 2025-04-17T17:03:13Z

R/aldex-mem.R

+  ## To array, get names, build return list
+  P <- nrow(results[[1]]$fixed_coefs)
+  Pr <- ncol(results[[1]]$random_coefs)
+  fixed_arr <- do.call(rbind, lapply(results, function(x) x$fixed_coefs))


could you preallocate the arrays, likely far more efficient than repeated rbinds. rbind is super inefficient in R (when applied like this). Unless do.call is doing some magic behind the scenes.

Yeah I think so

jsilve24 · 2025-04-17T17:03:50Z

R/aldex-mem.R

+
+    ## Fixed Effects
+    coefs <- coef(summary(fit))[,1:3]
+    coefs <- cbind(coefs, pt(coefs[,1]/coefs[,2], coefs[,3], lower.tail=T))


whats happening here with binding what looks like p-values in?

Upper and lower p value calculation. default p-value returned is 2-sided which is not what we want

kyle-mcgovern · 2025-04-17T17:48:46Z

Huh the unittests work for me, even after pulling repo in new branch... I think I need to add clusterEvalQ, the parallel workers must not be getting the libraries passed into them automatically.

kyle-mcgovern · 2025-04-17T18:42:19Z

Ok updated, try unit tests again now

jsilve24 · 2025-04-21T14:30:07Z

R/sr-mem.R

+  results <- lapply_func(logWm_list, function(y) {
+    data_temp <- data
+    data_temp$y <- y
+    fit <- suppressMessages(lmerTest::lmer(update(formula, y~.), data=data_temp))


Are you sure you are using lme4 and not the much slower lmer? I notice there are no calls to lme4 here... This seems to be the only place where any lmm is actually being fit. I also notice there is no interface to pass additional options (e.g., covaraince structure etc...). There are usually alot more options than just the formula right?

The code is rediculously slow and it all comes down to this one line.

lmer is the main function for lmerTest. lmerTest is lme4 but with the Satterthwaite degrees of freedom calculation added. Also even in lme4 there is no lme4 function, lme4's main function is lmer.

The code is indeed slow. You are doing reml optimization potentially thousands of times. There may be some clever way to speed it up, but to be clear lmerTest, building on lme4, uses a bunch of C-optimization. It's just inherently slow to do REML thousands of times.

In terms of additional options, I have not yet added those.

lmerTest/lme4 cannot handle correlation. I'll do a second PR to add nlme, which can handle correlations. I was planning on adding more functionality for specifying additional parameters in that PR. This pr just adds minimal working functionality.

In a more distant future PR, there are probably Z'Z and X'X type matrix algebra in the optimization that could be de-duplicated for speed, but that would involve hand-rolling our own REML optimization.

jsilve24 · 2025-04-21T15:12:05Z

Alright, will merge. Will need more work but for the moment this is probably ok.

add lme4 functionality

5c8aecb

jsilve24 reviewed Apr 17, 2025

View reviewed changes

add lme4, memory efficiency updated

c509d0a

kyle-mcgovern added 2 commits April 18, 2025 17:38

rename aldex.mem to sr.mem

293618e

change name aldex-mem.R to sr-mem.R

7695564

jsilve24 reviewed Apr 21, 2025

View reviewed changes

jsilve24 closed this Apr 21, 2025

jsilve24 reopened this Apr 21, 2025

jsilve24 merged commit 09ef39b into jsilve24:main Apr 21, 2025

kyle-mcgovern deleted the integrate_mem branch April 21, 2025 15:13

jsilve24 mentioned this pull request Apr 21, 2025

Improvements for Mixed Effects Models #15

Open

3 tasks

Conversation

kyle-mcgovern commented Apr 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jsilve24 commented Apr 17, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kyle-mcgovern commented Apr 17, 2025

Uh oh!

kyle-mcgovern commented Apr 17, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jsilve24 commented Apr 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kyle-mcgovern commented Apr 17, 2025 •

edited

Loading