
Fix any variance component to any value #833

Closed
SchmidtPaul opened this issue Jun 27, 2022 · 9 comments

Comments

@SchmidtPaul

Is it correct that it is currently not possible to fix (some of the) variance components to values of my choice?

How SAS does it via parms and NOPROFILE/NOITER

source: SAS documentation
[Screenshot: PROC MIXED example with PARMS, NOPROFILE, and NOITER from the SAS documentation]
This linear mixed model has two random effects. Because of the additional i.i.d. residual variance, three values need to be provided in the PARMS statement. As the documentation says, this model will not iterate at all, but stick to the provided values:

The NOPROFILE option requests PROC MIXED to refrain from profiling the residual variance parameter during its calculations, thereby enabling its value to be held at 6 as specified in the PARMS statement. The NOITER option prevents any Newton-Raphson iterations so that the subsequent results are based on the given variance components.

How SAS does it via parms and HOLD =

source: SAS documentation
[Screenshot: PROC MIXED example with PARMS and HOLD= from the SAS documentation]
An even more flexible option is to individually choose what subset of the variance components should be fixed to a given value during the model fitting.

specifies which parameter values PROC MIXED should hold to equal the specified values. For example, the following statement constrains the first and third covariance parameters to equal 5 and 2, respectively

What glmmTMB() can do - I think?

Correct me if I am wrong, but the only variance component in a linear (mixed) model fit via glmmTMB() that can be fixed to a value is that of the i.i.d. residual, and it can only be fixed to (almost) zero via dispformula = ~ 0 (see e.g. #653).

Relevance

In my field, two-stage analyses are a relatively big deal. They are important enough that this very limitation is a major reason for not switching from SAS to R more often. I wrote it out in more detail, with citable references, in #638. Note that, as far as I know, this limitation is not specific to {glmmTMB} but applies to every linear mixed model package in R - I'd be glad to be corrected on this.

@bbolker
Contributor

bbolker commented Jul 6, 2022

You can do a lot, maybe everything you want. Any parameter in the model can be held at a specified value by using the map argument. The analogue of parms is the start= argument (mapping holds the parameters fixed at their starting values, which are usually set to default values of 0 but can be specified), and hold can be specified via map: I'm pretty sure that

parms (5) (3) (2) (3) / hold=1,3;

would be equivalent to

start = list(theta = c(log(5), 0, log(2), 0)),
map   = list(theta = factor(c(NA, 1, NA, 2)))

(I used the default starting point, 0, for the non-fixed values, and used log(5) to highlight that glmmTMB parameterizes covariance matrices on the log-sd scale; I'm not sure what SAS's scale is, but obviously it matters! The model output should show whether you fixed the parameter to what you wanted, though.)
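For concreteness, a small sketch of the scale conversion (assuming the target value from the SAS PARMS statement is a variance):

```r
## Sketch: converting a target variance to glmmTMB's internal theta
## (log-standard-deviation) scale, and back again
target_var <- 5
theta_val  <- log(sqrt(target_var))  # internal scale: log(SD)
exp(theta_val)^2                     # back-transform; recovers 5
```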

It gets tricky if you want to constrain correlation parameters in a model with a random-effects vector of length 3 or more: see the 'mappings' section of http://glmmtmb.github.io/glmmTMB/articles/covstruct.html and https://github.com/glmmTMB/glmmTMB/blob/master/misc/glmmTMB_corcalcs.ipynb

map: a list specifying which parameter values should be fixed to a
constant value rather than estimated. ‘map’ should be a named
list containing factors corresponding to a subset of the
internal parameter names (see ‘start’ parameter). Distinct
factor values are fitted as separate parameter values, ‘NA’
values are held fixed: e.g.,
‘map=list(beta=factor(c(1,2,3,NA)))’ would fit the first
three fixed-effect parameters of the conditional model and
fix the fourth parameter to its starting value. In general,
users will probably want to use ‘start’ to specify
non-default starting values for fixed parameters. See
‘MakeADFun’ for more details.
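The mechanism quoted above applies to any of the internal parameter vectors, not just theta. A minimal sketch that fixes a fixed-effect coefficient instead (illustrative only; assumes lme4's sleepstudy data set is available):

```r
library(glmmTMB)
data("sleepstudy", package = "lme4")

## Estimate the intercept but hold the Days slope at its starting value (0);
## NA entries in the map factor mark parameters that are held fixed.
m <- glmmTMB(Reaction ~ Days + (1 | Subject),
             start = list(beta = c(250, 0)),         # starting values; 0 for Days
             map   = list(beta = factor(c(1, NA))),  # estimate intercept, fix Days
             data  = sleepstudy)
fixef(m)$cond  # the Days coefficient stays at 0
```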

@SchmidtPaul
Author

SchmidtPaul commented Jul 11, 2022

Thanks for the detailed answer. This sounded great so I tried to apply it.

mod1

As a first check I fitted a model with a single basic random effect and fixed its variance and it worked just as intended:

library(agridat)
library(broom.mixed)
library(glmmTMB)
library(tidyverse)

dat <- as_tibble(agridat::john.alpha)

theta_start <- c(5, 3, 2, 3) %>% sqrt() %>% log()

# Fix variance of single random effect ------------------------------------
mod1 <- glmmTMB(
  yield ~ rep + (1 | gen),
  REML  = TRUE,
  start = list(theta = theta_start[1]),
  map   = list(theta = factor(c(NA))),
  data  = dat
)

tidy(mod1, effects = "ran_pars", scales = "vcov")
#> # A tibble: 2 x 5
#>   effect   component group    term             estimate
#>   <chr>    <chr>     <chr>    <chr>               <dbl>
#> 1 ran_pars cond      gen      var__(Intercept)    5    
#> 2 ran_pars cond      Residual var__Observation    0.134


mod2

However, as you can see above, the parms statement in SAS also covers the error variance, i.e. the R-side of the model. Am I doing it wrong, or is it not possible to fix the residual variance this way in glmmTMB?

mod2 <- glmmTMB(
  yield ~ rep + (1 | gen),
  REML = TRUE,
  start = list(theta = theta_start[1:2]),
  map   = list(theta = factor(c(NA, NA))),
  data = dat
)
#> Error: parameter vector length mismatch: in ‘start’, length(theta)==2, should be 1

Created on 2022-07-11 by the reprex package (v2.0.1)

@bbolker
Contributor

bbolker commented Jul 12, 2022

Two corrections: (1) the error variance model is parameterized by betad, not by additional terms in theta; (2) apparently (I had forgotten this) for the Gaussian model the parameters are on the log-variance scale, not the log-SD scale:

mod2 <- glmmTMB(
    yield ~ rep + (1 | gen),
    REML = TRUE,
    start = list(theta = theta_start[1],
                 betad = log(3)),
    map   = list(theta = factor(c(NA)),
                 betad = factor(c(NA))),
    data = dat
)
sigma(mod2)^2 ## 3                                                                      
VarCorr(mod2)$cond$gen[1,1]  ## 5                                                       

@SchmidtPaul
Author

Oh wow, this is fantastic. It might be quite the game changer for my field.
So I am assuming that the following two models are more or less equivalent. Or can you comment on what is different? I mean, at least their VarCorr outputs are not displayed identically.

library(agridat)
library(glmmTMB)

dat <- agridat::john.alpha

mod3 <- glmmTMB(yield ~ rep + (1 | gen),
                dispformula = ~ 0,
                REML = TRUE,
                data = dat)

mod4 <- glmmTMB(yield ~ rep + (1 | gen),
                start = list(betad = log(sqrt(.Machine$double.eps))),
                map   = list(betad = factor(c(NA))),
                REML = TRUE,
                data = dat)


sigma(mod3)^2
#> [1] 1.490116e-08
sigma(mod4)^2
#> [1] 1.490116e-08

VarCorr(mod3)
#> 
#> Conditional model:
#>  Groups Name        Std.Dev.
#>  gen    (Intercept) 0.4518
VarCorr(mod4)
#> 
#> Conditional model:
#>  Groups   Name        Std.Dev.  
#>  gen      (Intercept) 0.45179614
#>  Residual             0.00012207

Created on 2022-07-13 by the reprex package (v2.0.1)

@mebrooks
Contributor

Yes, they are the same. One way to confirm that they are identical is to compare the variance-covariance matrices returned by vcov:

> vcov(mod3)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -6.208817e-10 -6.208817e-10
repR2       -6.208817e-10  1.241763e-09  6.208817e-10
repR3       -6.208817e-10  6.208817e-10  1.241763e-09

> vcov(mod4)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -6.208817e-10 -6.208817e-10
repR2       -6.208817e-10  1.241763e-09  6.208817e-10
repR3       -6.208817e-10  6.208817e-10  1.241763e-09

> mod5 <- glmmTMB(yield ~ rep + (1 | gen),
+                 start = list(betad = log(sqrt(2*.Machine$double.eps))),
+                 map   = list(betad = factor(c(NA))),
+                 REML = TRUE,
+                 data = dat)
> 
> vcov(mod5)
Conditional model:
              (Intercept)         repR2         repR3
(Intercept)  8.504990e-03 -8.780593e-10 -8.780593e-10
repR2       -8.780593e-10  1.756119e-09  8.780593e-10
repR3       -8.780593e-10  8.780593e-10  1.756119e-09

@SchmidtPaul
Author

I apologize for having so many questions, but this is very fruitful. Thank you very much.

I noticed that when fixing all variances in a model, having REML = TRUE leads to problems. But I am now assuming that this should not concern me since no variances are being estimated so I really don't care if REML or ML are being used... since they are not being used.

Would you agree or am I missing other potential issues?

library(agridat)
library(broom.mixed)
library(glmmTMB)

dat <- agridat::john.alpha

mod6_REML <- glmmTMB(yield ~ rep + (1 | gen),
                     start = list(theta =   log(sqrt(3)), betad =       log(5)),
                     map   = list(theta =  factor(c(NA)), betad = factor(c(NA))),
                     REML = TRUE,
                     data = dat)
#> Warning in fitTMB(TMBStruc): Model convergence problem; extreme or very small
#> eigenvalues detected. See vignette('troubleshooting')

tidy(mod6_REML, effects = "ran_pars", scales = "vcov")
#> Error in solve.default(as.matrix(Qm)): 'a' is 0-dimensional

mod6_ML <- glmmTMB(yield ~ rep + (1 | gen),
                   start = list(theta =   log(sqrt(3)), betad =       log(5)),
                   map   = list(theta =  factor(c(NA)), betad = factor(c(NA))),
                   REML = FALSE,
                   data = dat)

tidy(mod6_ML, effects = "ran_pars", scales = "vcov")
#> # A tibble: 2 x 5
#>   effect   component group    term             estimate
#>   <chr>    <chr>     <chr>    <chr>               <dbl>
#> 1 ran_pars cond      gen      var__(Intercept)        3
#> 2 ran_pars cond      Residual var__Observation        5

Created on 2022-07-13 by the reprex package (v2.0.1)

@bbolker
Contributor

bbolker commented Jul 13, 2022

Yes, I would agree. These problems are both edge cases - we didn't foresee anyone fitting a REML model with all random effects parameters fixed (for some reason this also leads to a loss of dimension names in one place, which isn't hard to work around). The new REML_zero branch (probably to be integrated into master shortly) fixes these problems and gives us the same answers for both approaches.

@SchmidtPaul
Author

SchmidtPaul commented Jul 14, 2022

Ok, thanks again. I'd say it's fine if you closed the issue now.
Maybe as a last question, since it is only partly related: Is it still correct that for the error/residual variance I cannot fit any variance structure except the default iid? I mean you more or less confirmed this in #653 , but I just want to make sure I'm up-to-date and that the approach described in that issue is still the best workaround (see also #561 for example)

@bbolker
Contributor

bbolker commented Jul 15, 2022

Yes, glmmTMB does not have a separate structure for "R-side modeling". I think the description in #653 is still the best. I did a little bit of work on trying to set up "true" zero-residual models, but these require some fairly different internal structure as well as more thought than I've had a chance to devote to the problem ...
