How does the order of random effects in lme4 influence the estimation? #449
Comments
Definitely seems to be something funny going on here; as commented on StackOverflow, we're almost certainly going to need a reproducible example in order to figure out what's going on. I do note that we get a singular fit in the first case and not in the second ... the fact that the variance terms end up in a different order probably has something to do with it (but I'm also guessing that the fit is somewhat unstable? I guess that's exactly what's been demonstrated here ...)
Check the update on SO to see if it makes sense: https://stats.stackexchange.com/questions/323957/how-does-the-order-of-random-effects-in-lme4-influence-the-estimation
We now have a reproducible example: see https://github.com/lme4/lme4/blob/master/misc/issues/lme4_order.Rmd
I'm also slightly baffled: in @dmbates's "not-yet-book" and in the JSS paper, I thought we had explained that we (i.e., the algorithm, when building up Z and consequently \Lambda_t) order the random-effects terms so that the number of levels is largest for the first term and decreases from there, the reason being that the relevant Cholesky factorization LL' = ... then has the least fill-in, or needs the least permutation towards a Cholesky factor with little fill-in. But as you write, @bbolker, in the reproducible example the order of the columns differs in the Z matrix, hence the sparse Cholesky factorization will behave differently, and, in the case of singularities, can behave even more differently:
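A minimal sketch of how that internal ordering can be inspected, using the sleepstudy data shipped with lme4 as a stand-in for the original example (the two fits below differ only in the written order of the random-effects terms):

```r
library(lme4)
## two fits that differ only in the written order of the RE terms
fm1 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), sleepstudy)
fm2 <- lmer(Reaction ~ Days + (0 + Days | Subject) + (1 | Subject), sleepstudy)
## per-term random-effects column names, in the order lme4 uses internally;
## with a single grouping factor (so no reordering by number of levels),
## the formula order is kept
getME(fm1, "cnms")
getME(fm2, "cnms")
## the same ordering is reflected in Z and in the relative covariance factor
dim(getME(fm1, "Z"))
getME(fm1, "Lambdat")[1:4, 1:4]
```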
Another example, from https://stat.ethz.ch/pipermail/r-sig-mixed-models/2018q3/027178.html:
With the following order of variables in the random-effects structure, [...] but not with this order: [...]
Why does the order of the random effects matter when PHASE is still [...]?
I'm following up on a similar problem that I already posted to the R-sig-mixed-models email list; @bbolker told me to comment here. The CSV data can be found at: https://figshare.com/s/04e6e4fb9b73c0e247bd

Problem: the standard errors of the fixed effects are drastically different when the order of variables in the model changes (hence ANOVA and post-hoc testing are also affected). best_mod is the model with the lowest AIC returned by MuMIn::dredge (a weird order, with random effects in the middle of the fixed effects); best_mod_reorder is the same model but reorganised in a more classical fashion (i.e., fixed effects followed by random effects). Here are the code and the two models: [...]

Could anyone please shed some light on this? Agronomically, the model makes sense.
At present it looks like @GuillaumeA2's problem is fixed by centering the [...]
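For reference, a minimal sketch of that kind of fix (centering a numeric predictor before fitting), shown on the sleepstudy data rather than the original CSV:

```r
library(lme4)
## center the covariate; as.numeric() drops the matrix attributes that
## scale() attaches to its result
sleepstudy$Days_c <- as.numeric(scale(sleepstudy$Days, scale = FALSE))
fit_c <- lmer(Reaction ~ Days_c + (Days_c | Subject), sleepstudy)
```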
A follow-up question: does the order of the random effects correspond to the order in rePCA? When I change the order of the random effects, the numbers in the rePCA output don't change. How should I know which random effect is unnecessary?
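For what it's worth, a small sketch of rePCA() usage (again on sleepstudy). rePCA() summarizes the eigendecomposition of the estimated random-effects covariance matrix, and eigenvalues do not change when rows and columns are permuted, which would explain why reordering the terms leaves the numbers unchanged; the principal components are rotations of the original terms, so a near-zero component flags an over-specified structure rather than pointing at one particular term:

```r
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
## PCA of the estimated random-effects covariance matrix: components with
## (near-)zero variance suggest the RE structure could be simplified
summary(rePCA(fm))
```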
@bbolker Hi all, I just found this sensitivity to the ordering of input data rows in lme4, and then found this thread (I was not aware of it before). In my example it even means that with one ordering lmerTest::lmer() converges, while with another ordering no successful convergence is achieved (actually trying multiple optimization algorithms). What is your current thinking on this observation? I am considering just reordering the data set before feeding it to lme4/lmerTest, but wonder if there is a better way, maybe even one that leads to more models converging. Any thoughts?
It's important to distinguish between "got a convergence warning" and "the results actually differ in some way that matters for my conclusions". If you have already done everything you can to increase numerical stability (primarily, scaling and centering the predictor variables; possibly, discarding some complexity in the random effects if you're not interested in it; possibly, lowering the fitting tolerance of the optimizers to make them work harder), the gold standard is to see whether different optimizers reach similar solutions. If they do, then it doesn't really matter much if they all report convergence warnings; our assumption is that a variety of optimizers using different algorithms is unlikely to falsely converge to the same solution.
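A minimal sketch of that gold-standard check with allFit(), on sleepstudy for concreteness:

```r
library(lme4)
fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
aa <- allFit(fm)   # refit with every optimizer available on this system
ss <- summary(aa)
ss$msgs            # convergence messages, per optimizer
ss$fixef           # fixed effects: do the optimizers agree?
ss$llik            # log-likelihoods: essentially identical?
```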
@bbolker That is very interesting, thanks a lot! If I understand correctly, the best practice (for a given covariance structure) is:

1. Scale and center the predictor variables.
2. Discard any complexity in the random effects that is not of interest.
3. Lower the fitting tolerance of the optimizers so that they work harder.
4. Refit the model with a variety of optimizers.
5. Check whether the different optimizers reach similar solutions.
Currently I have done only steps 2 and 3, not 1, 4, and 5, so I will try adding those. I am still curious about the reordering question, though. Is there an additional step between 1 and 2 above that could address the ordering of subjects within the input data set? Since, as we have seen, the results are not the same with different orderings. Would it be useful if I supplied you with an example data set that shows the reordering sensitivity, or not?
I would be interested in example data. To be completely honest, I don't have a detailed understanding of why reordering the groups would make a difference; I just know that numerical computation can always be sensitive to ordering. For example, floating-point addition is not associative.
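A three-line illustration (the printed values assume standard IEEE-754 double precision):

```r
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)  # FALSE
print((0.1 + 0.2) + 0.3, digits = 17)   # 0.60000000000000009
print(0.1 + (0.2 + 0.3), digits = 17)   # 0.59999999999999998
```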
Cool, thanks Ben. I will work on getting a data set that shows this behavior and that I can share externally. It definitely makes sense that numerical computations can be order-sensitive; I guess this is one of the first times I have seen that have a relevant impact in practice, so it is kind of interesting.
@bbolker Thanks Ben. OK, so here comes a hopefully reproducible example. As far as I understand the conversation so far, such situations can partly be expected.
Still, I am curious whether there is also an "optimal ordering" for this kind of data set, in terms of optimizing the internal computations in lme4.

```r
random_data <- function(seed,
                        n_obs = 250,
                        n_ids = 70,
                        n_visits = 4) {
  set.seed(seed, kind = "Mersenne-Twister")
  ## draw n_obs subject IDs, at most n_visits observations per subject
  id <- factor(sample(
    rep(seq_len(n_ids), each = n_visits),
    size = n_obs,
    replace = FALSE
  ))
  unique_ids <- sort(unique(id))
  ## visit number within each subject
  visit <- factor(stats::ave(
    seq_len(n_obs),
    id,
    FUN = seq_along
  ))
  ## log-normal response
  y <- stats::rlnorm(
    n = n_obs,
    meanlog = 1,
    sdlog = 0.6
  )
  ## subject-level covariates, looked up by ID below
  x <- factor(sample(
    0:1,
    size = length(unique_ids),
    replace = TRUE
  ))
  x <- setNames(x, unique_ids)
  arm <- factor(sample(
    c("A", "B", "C"),
    size = length(unique_ids),
    replace = TRUE
  ))
  arm <- setNames(arm, unique_ids)
  data <- data.frame(
    id = id,
    visit = visit,
    y = y,
    x = x[id],
    arm = arm[id]
  )
  return(data)
}
```
```r
# Already for i = 2 there is a discrepancy in the lme4 fits between
# the unordered and the ordered data sets.
library(lme4)
library(dplyr)

dat <- random_data(2)
dat_ordered <- dat %>%
  dplyr::arrange(id, visit)

# Define the model we would like to fit: "unstructured" covariance matrix.
form <- y ~ x + arm * visit + (0 + visit | id)

# First try with the unordered data set.
fit <- lme4::lmer(
  formula = form,
  data = dat,
  control = lme4::lmerControl(
    check.nobs.vs.nRE = "ignore"
  )
)
fit_all <- lme4::allFit(fit)
summary(fit_all)$msgs
# Here all optimizers show warnings.

# Now try with the ordered data set.
fit_ordered <- lme4::lmer(
  formula = form,
  data = dat_ordered,
  control = lme4::lmerControl(
    check.nobs.vs.nRE = "ignore"
  )
)
fit_ordered_all <- lme4::allFit(fit_ordered)
summary(fit_ordered_all)$msgs
# So here the "nlminbwrap" optimizer works without any warnings.
```

I ran this in R 3.6.1 with the latest versions of lme4 etc.; see:

```
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.1 lme4_1.1-23 Matrix_1.2-17
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 magrittr_1.5 splines_3.6.1 MASS_7.3-51.4
[6] tidyselect_1.1.0 statmod_1.4.32 lattice_0.20-41 R6_2.4.0 rlang_0.4.7
[11] optimx_2020-4.2 minqa_1.2.4 tools_3.6.1 grid_3.6.1 dfoptim_2018.2-1
[16] nlme_3.1-141 ellipsis_0.3.0 Rserve_1.7-3.1 tibble_3.0.3 lifecycle_0.2.0
[21] numDeriv_2016.8-1.1 crayon_1.3.4 purrr_0.3.3 nloptr_1.2.1 vctrs_0.3.2
[26] glue_1.4.1 compiler_3.6.1 pillar_1.4.6 generics_0.0.2 boot_1.3-23
[31] pkgconfig_2.0.3
```
@bbolker @mmaechler @dmbates Not sure if you saw the above example; any ideas? I wonder if there is an "optimal ordering" that we could use as a default in our functions.
I think I looked at this briefly. The short answer is that I'd be pleasantly surprised if we could come up with a general rule. The proximal causes of the differences are presumably down to numeric differences in the fairly complex linear algebra we're doing, possibly interacting with the nonlinear solvers. I will take another look when I get a chance, but I wouldn't hold your breath ...
When I ran these just now I got various warnings (not identical) for all optimizers except [...]. Will think about this some more.
Thanks a lot, Ben!
I've come across a similar (maybe isomorphic? not sure) problem: changing the order of the levels of the random-effects grouping variable can change the output of lmer. I found this because I was analyzing a repeated-measures human-subjects experiment, and when I changed the names of my subjects (to anonymize the data for public posting), it changed the statistics slightly. I posted a minimal example here: https://stackoverflow.com/questions/66309877/changing-the-labels-of-your-random-effects-grouping-variable-changes-the-results Does this seem like the same issue as what's been discussed here? If not, I can open a new ticket.
@adammmorris I think this is the same issue.
@bbolker Any thoughts? Do we maybe just need to live with the fact that the output is not exactly "deterministic" if we don't fix the row order?
This may be a consequence of the fact that floating-point addition is not associative, because of round-off. The sum of a series of floating-point numbers, and hence the dot product of floating-point vectors, etc., can depend on the order in which you perform the operations.
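A sketch of that order dependence for sums (R's built-in sum() may accumulate in long double, so an explicit left-to-right reduction is used here to make the effect visible):

```r
set.seed(1)
v <- rnorm(1e5)
lr_sum <- function(z) Reduce(`+`, z)  # plain left-to-right accumulation
lr_sum(v) == lr_sum(rev(v))           # typically FALSE ...
lr_sum(v) - lr_sum(rev(v))            # ... by a tiny round-off amount
```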
I'm going to post the reprex from the StackOverflow post here:

```r
require(dplyr)
require(lme4)
require(digest)

df = faithful %>% mutate(subject = rep(as.character(1:8), each = 34),
                         subject2 = rep(as.character(9:16), each = 34))
summary(lmer(eruptions ~ waiting + (waiting | subject), data = df))$coefficients[2, 1]   # = 0.07564181
summary(lmer(eruptions ~ waiting + (waiting | subject2), data = df))$coefficients[2, 1]  # = 0.07567655
```

As Doug says, it's not surprising that doing the arithmetic in a different order might give slightly different answers, but it would be nice to be able to track the issue more precisely, to know whether there might be a tweak that would make the answers identical in a wider range of cases. (A brute-force solution to this problem would be to internally reorder the data into a canonical form before fitting; see the sketch below.) Doug, just out of curiosity, what happens with MixedModels.jl?
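A sketch of that brute-force idea, continuing from the reprex above. canonicalize() is a hypothetical helper: it sorts the rows into a deterministic order (by the grouping factor, then by the remaining columns), so that any permutation of the same rows yields bitwise-identical model matrices. Note that this addresses row-order sensitivity for a given data set, not the subject/subject2 relabelling difference:

```r
library(lme4)

## hypothetical helper: deterministic row and level order
canonicalize <- function(data, group) {
  data[[group]] <- factor(as.character(data[[group]]))  # sorted levels
  cols <- c(group, setdiff(names(data), group))
  data[do.call(order, data[cols]), , drop = FALSE]      # sorted rows
}

df_shuffled <- df[sample(nrow(df)), ]  # df from the reprex above
fit1 <- lmer(eruptions ~ waiting + (waiting | subject),
             data = canonicalize(df, "subject"))
fit2 <- lmer(eruptions ~ waiting + (waiting | subject),
             data = canonicalize(df_shuffled, "subject"))
fixef(fit1) - fixef(fit2)  # should be exactly zero
```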
The two models converge to two different optima, which is another effect of slight changes in the evaluation of the objective producing slightly different values. The log-likelihood surface is likely quite flat, and minor changes in the objective at some point can take the optimizer onto a different path. The second optimum is slightly better than the first, but both are on the boundary.

```julia
julia> m1 = fit(MixedModel, @formula(eruptions ~ 1 + waiting + (1 + waiting|subject)), df)
Minimizing 108 Time: 0:00:00 ( 6.45 ms/it)
objective: 389.02461819568583
Linear mixed model fit by maximum likelihood
eruptions ~ 1 + waiting + (1 + waiting | subject)
logLik -2 logLik AIC AICc BIC
-194.5123 389.0246 401.0246 401.3416 422.6594
Variance components:
Column Variance Std.Dev. Corr.
subject (Intercept) 0.00187455 0.04329602
waiting 0.00000034 0.00058011 -1.00
Residual 0.24465390 0.49462501
Number of obs: 272; levels of grouping factors: 8
Fixed-effects parameters:
─────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept) -1.87442 0.160291 -11.69 <1e-30
waiting 0.0756335 0.00221991 34.07 <1e-99
─────────────────────────────────────────────────────
julia> show(m1.θ)
[0.08753301883756691, -0.0011728288060489592, 0.0]
julia> last(m1.β)
0.07563351392240815
julia> m2 = fit(MixedModel, @formula(eruptions ~ 1 + waiting + (1 + waiting|subject2)), df)
Linear mixed model fit by maximum likelihood
eruptions ~ 1 + waiting + (1 + waiting | subject2)
logLik -2 logLik AIC AICc BIC
-194.5094 389.0189 401.0189 401.3358 422.6537
Variance components:
Column Variance Std.Dev. Corr.
subject2 (Intercept) 0.000648156 0.025458908
waiting 0.000000116 0.000340941 -1.00
Residual 0.244692054 0.494663577
Number of obs: 272; levels of grouping factors: 8
Fixed-effects parameters:
─────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
─────────────────────────────────────────────────────
(Intercept) -1.87416 0.159809 -11.73 <1e-31
waiting 0.0756299 0.00221367 34.16 <1e-99
─────────────────────────────────────────────────────
julia> show(m2.θ)
[0.051467116521091434, -0.0006892388201978146, 0.0]
julia> last(m2.β)
0.07562988347804948
```
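A side note on "both are on the boundary": in lme4 such singular fits can be checked for directly (a sketch, reusing df from the reprex above):

```r
library(lme4)
fit <- lmer(eruptions ~ waiting + (waiting | subject), data = df)
isSingular(fit)  # likely TRUE here: the RE correlation sits at the boundary
VarCorr(fit)     # inspect the estimated variance components
```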
We observe the following model: [...]

I was just a bit baffled when I found out that lme4 makes a difference between mod, which expands to (1 + X + Condition + X:Condition) for the random effects, and

mod1 <- X*Condition + (1 + Condition + X + X:Condition | subject)

The difference is not great, but why does this happen, and is either of the two ways of specifying the model "more correct"? The anova() function calls also give different results.

Both models were specified with control = glmerControl(optimizer = "nloptwrap", calc.derivs = FALSE). The same thing happens with bobyqa.

P.S. The question was also posted on SO here.
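As an aside, a small base-R sketch showing that the two random-effects specifications contain the same terms, just with the columns in a different order (toy two-level factors X and Condition standing in for the question's variables):

```r
d <- expand.grid(X = factor(0:1), Condition = factor(0:1))
## (1 + X * Condition) expands with the main effects in formula order ...
colnames(model.matrix(~ 1 + X * Condition, d))
## ... while writing the terms out by hand puts Condition's column first
colnames(model.matrix(~ 1 + Condition + X + X:Condition, d))
```

Since the two specifications are reparameterizations of the same model, presumably neither is "more correct"; the small differences in the fits come down to the numerical order sensitivity discussed throughout this thread.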