
A closer look at dispformula=~0 - never needing the R-side in gaussian mixed models again? #653

Closed
SchmidtPaul opened this issue Dec 17, 2020 · 6 comments


@SchmidtPaul

SchmidtPaul commented Dec 17, 2020

In the plant sciences and related fields we often deal with linear mixed models with non-iid variance structures:

  • sometimes for random effects (on the G-side)
  • but more often for the residual (R-side).

Therefore, in the past I was forced to use nlme with its weights = argument and basically hoped that my model would not get too complex. When it did, I switched to the asreml package or even SAS's PROC MIXED with its REPEATED statement. Both are certainly great, but not open source. I was happy to learn about glmmTMB's ability to deal with variance structures while giving me the advantages of lme4, and I was really happy when I found this in the documentation:

In Gaussian mixed models, dispformula=~0 fixes the residual variance to be 0 (actually a small non-zero value: at present it is set to sqrt(.Machine$double.eps)), forcing variance into the random effects.

Fixing the residual variance to 0 and then mimicking it with an additional observation-level random effect should theoretically lead to identical results. In addition, I can then use all the available variance structures on this pseudo error term. So all the problems are solved! What, then, is this issue about? Basically this:

How well does dispformula=~0 really do the trick? What are the caveats and is there more info/citeable proof somewhere of how well it works?

Here is an example of how the pseudo-error-approach can lead to obtaining essentially identical variance component estimates and AICc values.

library(agridat)
library(AICcmodavg)
library(glmmTMB)
library(mixedup)
library(tidyverse)

dat <- agridat::mcconway.turnip %>% 
  as_tibble() %>% 
  mutate(unit = 1:n()) %>% 
  mutate_at(vars(density, unit), as.factor)

m1 <- glmmTMB(yield ~ gen*date*density + 
                (1 | block), 
              dispformula = ~ 1, # default i.e. iid error variance
              REML = TRUE,
              data = dat)

m2 <- glmmTMB(yield ~ gen*date*density + 
                (1 | block) +
                (1 | unit),      # pseudo error term
              dispformula = ~ 0, # fix error variance to be 0
              REML = TRUE,
              data = dat)
#> Warning in fitTMB(TMBStruc): Model convergence problem; false convergence (8).
#> See vignette('troubleshooting')

mixedup::extract_vc(m1)
#>      group    effect variance    sd sd_2.5 sd_97.5 var_prop
#> 1    block Intercept    2.812 1.677  0.635   4.431    0.227
#> 2 Residual              9.591 3.097     NA      NA    0.773

mixedup::extract_vc(m2)
#>      group    effect variance    sd sd_2.5 sd_97.5 var_prop
#> 1    block Intercept    2.810 1.676  0.634   4.430    0.227
#> 2     unit Intercept    9.592 3.097  2.519   3.808    0.773
#> 3 Residual              0.000 0.000     NA      NA    0.000

AICcmodavg::aictab(list(m1, m2), c("m1", "m2"))
#> 
#> Model selection based on AICc:
#> 
#>     K   AICc Delta_AICc AICcWt Cum.Wt      LL
#> m1 18 323.34          0    0.5    0.5 -136.07
#> m2 18 323.34          0    0.5    1.0 -136.07

Created on 2020-12-17 by the reprex package (v0.3.0.9001)

Although this works to my satisfaction, two caveats already become apparent:

  1. As is the case here, we sometimes get a false convergence warning for the pseudo-error-model but not for the standard model.
  2. The variance components will never be exactly identical, if only because the error variance is not actually fixed to 0 but to "a small non-zero value". You address this in a comment in glmmTMB.R which reads "FIXME: Depending on the final estimates, we should somehow check that this fixed dispersion is small enough."

It can be argued that these are minor, acceptable issues that are not very relevant in practice. But this is basically how far my train of thought has come, and I wanted to reach out to you because I am interested in your thoughts on this.

Thanks!

More stuff on this:
Shamelessly referring to my GitHub page with more examples of how different mixed models with variance structures translate between packages.
Also, this issue bbolker/broom.mixed#97 is related

@bbolker
Contributor

bbolker commented Dec 22, 2020

One piece of low-hanging fruit would be to allow the small non-zero value (currently log(sqrt(.Machine$double.eps))) to be user-settable via glmmTMBControl ...
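A sketch of what that could look like from the user side; the control-argument name zerodisp_val is purely illustrative here, not a documented API, and the value stays on the log scale to match the current internal default:

```r
## Hypothetical usage sketch -- the argument name `zerodisp_val` is an
## assumption for illustration, not an existing glmmTMBControl option.
m2b <- glmmTMB(yield ~ gen * date * density + (1 | block) + (1 | unit),
               dispformula = ~ 0,
               control = glmmTMBControl(zerodisp_val = log(1e-10)),
               REML = TRUE,
               data = dat)
```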

@SchmidtPaul
Author

SchmidtPaul commented Dec 28, 2020

Do you think that experimenting with even smaller non-zero values would actually resolve issues in some cases? I assume you chose sqrt(.Machine$double.eps) deliberately, with plain .Machine$double.eps leading to problems because it is too small (?). (For clarity: I know the code stores the log() of that small number, but only because exp() is taken again later, right?) Overall I have the feeling that although this may well be a low-hanging fruit, it is not necessarily worth the effort.
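For reference, the magnitudes involved are:

```r
.Machine$double.eps             # 2.220446e-16 (IEEE double machine epsilon)
sqrt(.Machine$double.eps)       # 1.490116e-08, the fixed residual SD
log(sqrt(.Machine$double.eps))  # -18.02183, the value stored on the log scale
```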

To me this issue was actually more about finding out whether you had any objections to or apprehensions about the approach described. It seems that you don't, so I am fine with closing this issue, and in the future I can refer skeptical colleagues to it 😀

P.S. I wrote a mini documentation on this now

@bbolker
Contributor

bbolker commented Jan 2, 2021

There is an example here (you'll need to remotes::install_github("glmmTMB/glmmTMB/glmmTMB@dispzero_flex") first ...)

I admit I don't understand the behaviour of this example: the random-effects variance decreases a lot when the fixed residual variance is decreased. I don't know whether this represents a bug (although the code to implement this feature is pretty simple, it's hard to see what could go wrong ...?) or an aspect of the stats I don't understand. It might be worth trying with a less mis-specified model (e.g. Reaction ~ Days + (Days|Subject)) and/or trying it in a model with an observation-level random effect, where var(obs_random) + fixed_sigma^2 should equal the estimated sigma^2 in the 'null' model (no obs-level random effect) ...
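The observation-level check suggested above could be sketched roughly like this (assuming the branch above is installed; with dispformula = ~ 0, sigma() returns the small fixed residual SD):

```r
library(glmmTMB)
data("sleepstudy", package = "lme4")
sleepstudy$obs <- factor(seq_len(nrow(sleepstudy)))

## 'null' model: ordinary residual variance
m0 <- glmmTMB(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## observation-level random effect absorbs the residual variance
m_obs <- glmmTMB(Reaction ~ Days + (Days | Subject) + (1 | obs),
                 dispformula = ~ 0, data = sleepstudy)

## expectation: var(obs random effect) + fixed_sigma^2 should be close
## to the estimated sigma^2 of the 'null' model
vc_obs <- VarCorr(m_obs)$cond$obs[1, 1]
c(split = vc_obs + sigma(m_obs)^2, null = sigma(m0)^2)
```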

@bbolker
Contributor

bbolker commented Jan 7, 2021

This is now in #659 (and there are some better tests).

@bbolker
Contributor

bbolker commented Jan 9, 2021

Thoughts on this? Shall we go ahead and close it once #659 is accepted? I admit I don't have much to say about this, beyond the fact that adding this flexibility seems like something that has advantages for people who want to use it and no big downside (a tiny increase in code complexity). @SchmidtPaul do you want to play around with it a bit?

As far as resolving errors goes: in my experience the "false convergence" warning has always been fairly mysterious. From the documentation:

the gradient ∇f(x) may be computed incorrectly, the other stopping tolerances may be too tight, or either f or ∇f may be discontinuous near the current iterate x

In this case it's hard to believe the gradient is computed incorrectly (since it's computed by AD) or that the objective function or gradient are discontinuous, which leaves "the other stopping tolerances may be too tight". I don't even know what that means. My guess would be that any improvement on the fit/change in warning behaviour would be haphazard.

@SchmidtPaul
Author

Thank you for the effort and your comments on this! As stated above, it would be fine for me to close this issue now as I may or may not find time to play around with it in the future.
