
Fix any variance component to any value #833

Closed
SchmidtPaul opened this issue Jun 27, 2022 · 9 comments

Comments

@SchmidtPaul

Is it correct that it is currently not possible to fix (some of the) variance components to values of my choice?

How SAS does it via parms and NOPROFILE/NOITER

source: SAS documentation
[Screenshot: PROC MIXED example with PARMS, NOPROFILE, and NOITER from the SAS documentation]
This linear mixed model has two random effects. Because of the additional i.i.d. residual variance, three values need to be provided in the PARMS statement. As the documentation says, this model will not iterate at all, but stick to the provided values:

The NOPROFILE option requests PROC MIXED to refrain from profiling the residual variance parameter during its calculations, thereby enabling its value to be held at 6 as specified in the PARMS statement. The NOITER option prevents any Newton-Raphson iterations so that the subsequent results are based on the given variance components.

How SAS does it via parms and HOLD =

source: SAS documentation
[Screenshot: PROC MIXED example with PARMS and HOLD= from the SAS documentation]
An even more flexible option is to individually choose what subset of the variance components should be fixed to a given value during the model fitting.

specifies which parameter values PROC MIXED should hold to equal the specified values. For example, the following statement constrains the first and third covariance parameters to equal 5 and 2, respectively

What glmmTMB() can do - I think?

Correct me if I am wrong, but the only variance component in a linear (mixed) model fit via glmmTMB() that can be fixed to a value is that of the i.i.d. residual, and it can only be fixed to (almost) zero via dispformula = ~ 0 (see e.g. #653).

Relevance

In my field, two-stage analyses are a relatively big deal. They are important enough that this very limitation is a major reason for not switching from SAS to R more often. I wrote it out in more detail, with citable references, in #638. Note that, as far as I know, this limitation is not specific to {glmmTMB} but applies to every linear mixed model package in R - I'd be glad to be corrected on this.

@bbolker
Contributor

bbolker commented Jul 6, 2022

You can do a lot, maybe everything you want. Any parameter in the model can be held at a specified value by using the map argument. The analogue of parms is the start= argument (mapping holds the parameters fixed at their starting values, which are usually set to default values of 0 but can be specified), and hold can be specified via map: I'm pretty sure that

parms (5) (3) (2) (3) / hold=1,3;

would be equivalent to

start = list(theta = c(log(5), 0, log(2), 0)),
map   = list(theta = factor(c(NA, 1, NA, 2)))

(I used the default starting point, 0, for the non-fixed values, and used log(5) to highlight that glmmTMB parameterizes covariance matrices on the log-sd scale; I'm not sure what SAS's scale is, but obviously it matters! The model output should show whether you fixed the parameter to what you wanted, though.)
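For concreteness, a small sketch of the scale conversion (assuming the target value from the SAS PARMS statement is a variance):

```r
## Sketch: converting a target variance to glmmTMB's internal theta
## (log-standard-deviation) scale, and back again
target_var <- 5
theta_val  <- log(sqrt(target_var))  # internal scale: log(SD)
exp(theta_val)^2                     # back-transform; recovers 5
```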

It gets tricky if you want to constrain correlation parameters in a model with a random-effects vector of length 3 or more: see the 'mappings' section of http://glmmtmb.github.io/glmmTMB/articles/covstruct.html and https://github.com/glmmTMB/glmmTMB/blob/master/misc/glmmTMB_corcalcs.ipynb

map: a list specifying which parameter values should be fixed to a
constant value rather than estimated. ‘map’ should be a named
list containing factors corresponding to a subset of the
internal parameter names (see ‘start’ parameter). Distinct
factor values are fitted as separate parameter values, ‘NA’
values are held fixed: e.g.,
‘map=list(beta=factor(c(1,2,3,NA)))’ would fit the first
three fixed-effect parameters of the conditional model and
fix the fourth parameter to its starting value. In general,
users will probably want to use ‘start’ to specify
non-default starting values for fixed parameters. See
‘MakeADFun’ for more details.
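The mechanism quoted above applies to any of the internal parameter vectors, not just theta. A minimal sketch that fixes a fixed-effect coefficient instead (illustrative only; assumes lme4's sleepstudy data set is available):

```r
library(glmmTMB)
data("sleepstudy", package = "lme4")

## Estimate the intercept but hold the Days slope at its starting value (0);
## NA entries in the map factor mark parameters that are held fixed.
m <- glmmTMB(Reaction ~ Days + (1 | Subject),
             start = list(beta = c(250, 0)),         # starting values; 0 for Days
             map   = list(beta = factor(c(1, NA))),  # estimate intercept, fix Days
             data  = sleepstudy)
fixef(m)$cond  # the Days coefficient stays at 0
```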

@SchmidtPaul
Author

SchmidtPaul commented Jul 11, 2022

Thanks for the detailed answer. This sounded great so I tried to apply it.

mod1

As a first check I fitted a model with a single basic random effect and fixed its variance and it worked just as intended:

library(agridat)
library(broom.mixed)
library(glmmTMB)
library(tidyverse)

dat <- as_tibble(agridat::john.alpha)

theta_start <- c(5, 3, 2, 3) %>% sqrt() %>% log()

# Fix variance of single random effect ------------------------------------
mod1 <- glmmTMB(
  yield ~ rep + (1 | gen),
  REML  = TRUE,
  start = list(theta = theta_start[1]),
  map   = list(theta = factor(c(NA))),
  data  = dat
)

tidy(mod1, effects = "ran_pars", scales = "vcov")
#> # A tibble: 2 x 5
#>   effect   component group    term             estimate
#>   <chr>    <chr>     <chr>    <chr>               <dbl>
#> 1 ran_pars cond      gen      var__(Intercept)    5    
#> 2 ran_pars cond      Residual var__Observation    0.134


mod2

However, as you can see above, the parms statement in SAS also covers the error variance, i.e. the R-side of the model. Am I doing it wrong, or is it not possible to fix the residual variance this way in glmmTMB?

mod2 <- glmmTMB(
  yield ~ rep + (1 | gen),
  REML = TRUE,
  start = list(theta = theta_start[1:2]),
  map   = list(theta = factor(c(NA, NA))),
  data = dat
)
#> Error: parameter vector length mismatch: in ‘start’, length(theta)==2, should be 1

Created on 2022-07-11 by the reprex package (v2.0.1)

@bbolker
Contributor

bbolker commented Jul 12, 2022

Two corrections: (1) the error variance model is parameterized by betad, not by additional terms in theta; (2) apparently (I had forgotten this) for the Gaussian model the parameters are on the log-variance scale, not the log-SD scale:

mod2 <- glmmTMB(
    yield ~ rep + (1 | gen),
    REML = TRUE,
    start = list(theta = theta_start[1],
                 betad = log(3)),
    map   = list(theta = factor(c(NA)),
                 betad = factor(c(NA))),
    data = dat
)
sigma(mod2)^2 ## 3                                                                      
VarCorr(mod2)$cond$gen[1,1]  ## 5                                                       

@SchmidtPaul
Author

Oh wow, this is fantastic. It might be quite the game changer for my field.
So I am assuming that the following two models are more or less equivalent. Or can you comment on what is different? I mean, at least their VarCorr outputs are not displayed identically.

library(agridat)
library(glmmTMB)

dat <- agridat::john.alpha

mod3 <- glmmTMB(yield ~ rep + (1 | gen),
                dispformula = ~ 0,
                REML = TRUE,
                data = dat)

mod4 <- glmmTMB(yield ~ rep + (1 | gen),
                start = list(betad = log(sqrt(.Machine$double.eps))),
                map   = list(betad = factor(c(NA))),
                REML = TRUE,
                data = dat)


sigma(mod3)^2
#> [1] 1.490116e-08
sigma(mod4)^2
#> [1] 1.490116e-08

VarCorr(mod3)
#> 
#> Conditional model:
#>  Groups Name        Std.Dev.
#>  gen    (Intercept) 0.4518
VarCorr(mod4)
#> 
#> Conditional model:
#>  Groups   Name        Std.Dev.  
#>  gen      (Intercept) 0.45179614
#>  Residual             0.00012207

Created on 2022-07-13 by the reprex package (v2.0.1)

@mebrooks
Contributor

Yes, they are the same. One way to confirm that they are identical is to compare the variance-covariance matrices returned by vcov:

> vcov(mod3)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -6.208817e-10 -6.208817e-10
repR2       -6.208817e-10  1.241763e-09  6.208817e-10
repR3       -6.208817e-10  6.208817e-10  1.241763e-09

> vcov(mod4)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -6.208817e-10 -6.208817e-10
repR2       -6.208817e-10  1.241763e-09  6.208817e-10
repR3       -6.208817e-10  6.208817e-10  1.241763e-09

> mod5 <- glmmTMB(yield ~ rep + (1 | gen),
+                 start = list(betad = log(sqrt(2*.Machine$double.eps))),
+                 map   = list(betad = factor(c(NA))),
+                 REML = TRUE,
+                 data = dat)
> 
> vcov(mod5)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -8.780593e-10 -8.780593e-10
repR2       -8.780593e-10  1.756119e-09  8.780593e-10
repR3       -8.780593e-10  8.780593e-10  1.756119e-09

@SchmidtPaul
Author

I apologize for having so many questions, but this is very fruitful. Thank you very much.

I noticed that when fixing all variances in a model, having REML = TRUE leads to problems. But I am now assuming that this should not concern me since no variances are being estimated so I really don't care if REML or ML are being used... since they are not being used.

Would you agree or am I missing other potential issues?

library(agridat)
library(broom.mixed)
library(glmmTMB)

dat <- agridat::john.alpha

mod6_REML <- glmmTMB(yield ~ rep + (1 | gen),
                     start = list(theta =   log(sqrt(3)), betad =       log(5)),
                     map   = list(theta =  factor(c(NA)), betad = factor(c(NA))),
                     REML = TRUE,
                     data = dat)
#> Warning in fitTMB(TMBStruc): Model convergence problem; extreme or very small
#> eigenvalues detected. See vignette('troubleshooting')

tidy(mod6_REML, effects = "ran_pars", scales = "vcov")
#> Error in solve.default(as.matrix(Qm)): 'a' is 0-dimensional

mod6_ML <- glmmTMB(yield ~ rep + (1 | gen),
                   start = list(theta =   log(sqrt(3)), betad =       log(5)),
                   map   = list(theta =  factor(c(NA)), betad = factor(c(NA))),
                   REML = FALSE,
                   data = dat)

tidy(mod6_ML, effects = "ran_pars", scales = "vcov")
#> # A tibble: 2 x 5
#>   effect   component group    term             estimate
#>   <chr>    <chr>     <chr>    <chr>               <dbl>
#> 1 ran_pars cond      gen      var__(Intercept)        3
#> 2 ran_pars cond      Residual var__Observation        5

Created on 2022-07-13 by the reprex package (v2.0.1)

@bbolker
Contributor

bbolker commented Jul 13, 2022

Yes, I would agree. These problems are both edge cases - we didn't foresee anyone fitting a REML model with all random effects parameters fixed (for some reason this also leads to a loss of dimension names in one place, which isn't hard to work around). The new REML_zero branch (probably to be integrated into master shortly) fixes these problems and gives us the same answers for both approaches.

@SchmidtPaul
Author

SchmidtPaul commented Jul 14, 2022

Ok, thanks again. I'd say it's fine if you closed the issue now.
Maybe as a last question, since it is only partly related: Is it still correct that for the error/residual variance I cannot fit any variance structure except the default iid? I mean you more or less confirmed this in #653 , but I just want to make sure I'm up-to-date and that the approach described in that issue is still the best workaround (see also #561 for example)

@bbolker
Contributor

bbolker commented Jul 15, 2022

Yes, glmmTMB does not have a separate structure for "R-side modeling". I think the description in #653 is still the best. I did a little bit of work on trying to set up "true" zero-residual models, but these require some fairly different internal structure as well as more thought than I've had a chance to devote to the problem ...
