deviance applied to a glmer model returns the GLM deviance #375

Open
dmbates opened this Issue Apr 25, 2016 · 9 comments

Owner

dmbates commented Apr 25, 2016

This may be intentional but I was a bit surprised after running library(lme4); example(cbpp) to get

> m1
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
   Data: cbpp
     AIC      BIC   logLik deviance df.resid 
194.0531 204.1799 -92.0266 184.0531       51 
Random effects:
 Groups Name        Std.Dev.
 herd   (Intercept) 0.6421  
Number of obs: 56, groups:  herd, 15
Fixed Effects:
(Intercept)      period2      period3      period4  
    -1.3983      -0.9919      -1.1282      -1.5797  
> deviance(m1)
[1] 73.47428
> getME(m1, "devcomp")
$cmp
       ldL2       ldRX2        wrss        ussq       pwrss       drsum        REML         dev     sigmaML   sigmaREML 
 16.8969305   9.6315140  63.0674275   9.7252112  72.7926387  73.4742836          NA 184.0531328          NA          NA 
   tolPwrss 
  0.0000001 

$dims
      N       n       p     nmp       q     nth    nAGQ compDev   useSc  reTrms    spFe    REML    GLMM    NLMM 
     56      56       4      52      15       1       1       1       0       1       0       0       1       0 

The deviance reported in the model summary is the Laplace approximation to -2 * log-likelihood, but the call to the deviance function returns the sum of the squared deviance residuals (i.e., drsum from devcomp$cmp).

> packageVersion("lme4")
[1] ‘1.1.12’
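The discrepancy can be reproduced directly. A minimal sketch (assuming lme4 is installed; the formula is the one used in example(cbpp), refit here under the name gm1):

```r
## Sketch: the two different "deviance" values for the same fit.
library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

deviance(gm1)      ## sum of squared deviance residuals (drsum), ~73.47
-2 * logLik(gm1)   ## Laplace-approximate deviance shown in the summary, ~184.05
getME(gm1, "devcomp")$cmp[c("drsum", "dev")]  ## both values side by side
```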
Owner

bbolker commented Apr 25, 2016

I'm not sure if this helps, but we have thought about this at least a little bit: the bit below is at the bottom of ?deviance.merMod. To be honest, I don't know if this is the best way of handling it -- comments?

Deviance and log-likelihood of GLMMs:

One must be careful when defining the deviance of a GLM. For
example, should the deviance be defined as minus twice the
log-likelihood or does it involve subtracting the deviance for a
saturated model? To distinguish these two possibilities we refer
to absolute deviance (minus twice the log-likelihood) and relative
deviance (relative to a saturated model, e.g. Section 2.3.1 in
McCullagh and Nelder 1989). With GLMMs however, there is an
additional complication involving the distinction between the
likelihood and the conditional likelihood. The latter is the
likelihood obtained by conditioning on the estimates of the
conditional modes of the spherical random effects coefficients,
whereas the likelihood itself (i.e. the unconditional likelihood)
involves integrating out these coefficients. The following table
summarizes how to extract the various types of deviance for a
‘glmerMod’ object:

                     conditional        unconditional 
   relative   ‘deviance(object)’         NA in ‘lme4’ 
   absolute  ‘object@resp$aic()’  ‘-2*logLik(object)’ 
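The table can be read off a fitted object directly. A sketch, where m is a placeholder for any glmerMod fit (e.g. from example(cbpp)):

```r
## Sketch: extracting the three available cells of the table above
## for a fitted glmerMod object `m` (placeholder name).
rel_cond   <- deviance(m)     ## relative, conditional
abs_cond   <- m@resp$aic()    ## absolute, conditional (-2 * conditional log-lik)
abs_uncond <- -2 * logLik(m)  ## absolute, unconditional (Laplace/AGQ approx.)
## relative, unconditional: not available in lme4
```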
Owner

dmbates commented Apr 25, 2016

Okay, I was missing the conditional versus unconditional distinction.

I should know this but can you point me to the evaluation of the Laplace approximation to -2 * log-likelihood for a glmer model. I am trying to work from the ldL2, ussq, drsum and n elements of devcomp to the dev value.

Owner

bbolker commented Apr 25, 2016 edited

Are you looking for https://github.com/lme4/lme4/blob/0701fdc758103e2e8b82234a255c1bd4722368b7/R/lmer.R#L657 (devCrit function -- but actually this just looks up the value in the structure)?

or here
https://github.com/lme4/lme4/blob/e07f8360701d07a19656ef89e151ba9439b93266/R/utilities.R#L801 (this takes the objective function value at the end of the optimization and stores it in the merMod object to be created)

or here https://github.com/lme4/lme4/blob/master/src/respModule.cpp#L161 (glmResp::Laplace, which returns ldL2 + sqrL + aic()): the aic method is

double glmResp::aic() const {
    return d_fam.aic(d_y, d_n, d_mu, d_weights, resDev());
}

where I think aic() is actually returning -2 times the log-likelihood, as explicated in
@stevencarlislewalker 's notes on deviance definitions here: https://github.com/lme4/lme4/blob/0701fdc758103e2e8b82234a255c1bd4722368b7/misc/notes/deviance.rmd
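The ldL2 + sqrL + aic() decomposition can be checked numerically against the devcomp values from the top of the thread (a sketch; note that sqrL is stored under the name "ussq" in devcomp$cmp):

```r
## Sketch: back out the implied aic() term from the cbpp devcomp numbers,
## using dev = ldL2 + sqrL + aic() from glmResp::Laplace.
ldL2 <- 16.8969305
ussq <- 9.7252112   ## sqrL, stored as "ussq"
dev  <- 184.0531328
aic_val <- dev - ldL2 - ussq  ## implied -2 * conditional log-likelihood
aic_val                       ## ~157.43; note this is NOT drsum (73.47)
```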

Owner

dmbates commented Apr 25, 2016

I had gotten to the devCrit function but I thought the -2 * log-likelihood calculation would use "drsum" not "wrss". They often end up being close to one another but it seems that "drsum" would be favored because it is based on the actual probability distribution in use.

I'll look at the notes by @stevencarlislewalker

Owner

stevencarlislewalker commented Apr 25, 2016 edited by bbolker

It's been a while now, but I believe this file is in better shape than those notes.

https://github.com/lme4/lme4/blob/master/misc/logLikGLMM/logLikGLMM.R

Steve

Owner

dmbates commented Apr 25, 2016

@stevencarlislewalker Thanks. Looks like it will help a lot.

Owner

bbolker commented Apr 27, 2016 edited

@dmbates , do you think there is an issue/something we should change here, or do you think the current behaviour (with the current documentation) is adequate?

PS: in response to your "I had gotten to the devCrit function" comment above: I believe that all the references to pwrss there are actually used in the cases where we are adjusting REML to ML or vice versa, which by definition do not arise in the GLMM case ... when we request the ML deviance criterion for an ML-fitted model (the only GLMM-relevant case), the code reduces to cmp[["dev"]], which gets us back to whatever was computed by the objective function at the optimal point and stored in the merMod object ...

Owner

dmbates commented Apr 27, 2016

The issue that confused me initially is that deviance(m) does not return the deviance of the fitted model m; it returns the deviance of another model, the conditional model. To me this violates the principle of least surprise.

By the way, I prefer not to apply adjectives like "conditional" to deviance. To me the deviance is a property of a model, data set and parameter values. So I think it is meaningful to say "the deviance of the conditional model" but not "the conditional deviance of the model".

I have said many times that I don't know how to define the deviance of a GLMM or an LMM because I don't know what would constitute the saturated model. Things get tricky when you have scale parameters. However, I think that for the common cases of Bernoulli, Binomial and Poisson where there are no scale parameters it is possible to define a saturated GLMM model. Set the covariance of the random effects to zero, forcing the random effects themselves to be zero, and define the mean for each response to be the response itself. The deviance for a particular fitted model then becomes the sum of the squared deviance residuals + ldL2 + sqrL which was where we started at one time.
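Under this proposal, the relative deviance can be computed from the devcomp values quoted at the top of the thread (a sketch; again sqrL is stored as "ussq"):

```r
## Sketch of the proposed saturated-model deviance for the cbpp fit:
## relative deviance = drsum + ldL2 + sqrL.
drsum <- 73.4742836
ldL2  <- 16.8969305
ussq  <- 9.7252112   ## sqrL
drsum + ldL2 + ussq  ## ~100.10, vs. the absolute (Laplace) deviance 184.05
```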

When we have a scale factor we may need to adjust the sqrL and ldL2 values to correspond to a scale factor of 1 in the conditional model.

Does any of this make sense or am I engaging in chasing my tail around the definitions again?

Owner

bbolker commented Apr 27, 2016 edited

I think it is meaningful to say "the deviance of the conditional model" but not "the conditional deviance of the model"

that makes perfect sense, although it's nice not to have to say "the deviance of the conditional model" every time you want to refer to this quantity ...

Models with scale parameters (e.g. Gamma, Gaussian) often coincide with continuous distributions, in which case the saturated model has probability 1 / log-likelihood 0, which simplifies some things. (Are there any discrete distributions in the exponential family with scale parameters ... ?)

Other thoughts:

  • thanks for the "saturated GLMM deviance" def., that will actually fill in our 2x2 table nicely
  • I see your point about least surprise, but I'm hesitant about changing the definition of the deviance again: if I were 100% sure it would be the last time we wanted to change it I would be OK with that, but zig-zagging with a widely used package feels like a bad idea (this is really one of the worst parts of lme4 maintenance/development, in my mind; there are so many things that are sub-optimal but it's so hard to be sure that a given replacement is really optimal ...)

Desiderata for a default deviance method:

  • least surprise
  • matches glm() deviance in the zero-variance case
  • ??? useful for assessing overdispersion
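The second desideratum can be probed directly (a sketch, assuming lme4 and the cbpp data): when the random-effect variance is zero, the sum of squared deviance residuals returned by deviance() on the glmer fit should coincide with the glm() deviance of the corresponding fixed-effects-only model.

```r
## Sketch: compare deviance() across glm and glmer on cbpp.
library(lme4)
g0 <- glm(cbind(incidence, size - incidence) ~ period,
          data = cbpp, family = binomial)
g1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)
deviance(g0)  ## GLM relative deviance
deviance(g1)  ## sum of squared deviance residuals; on a comparable scale,
              ## and the two would coincide if the herd variance were zero
```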