Convergence error, unexplained / uninformative error message. #425

Open
alexWhitworth opened this Issue Jun 23, 2017 · 10 comments


alexWhitworth commented Jun 23, 2017 edited

I'm using lme4 to build a collaborative filter and am running into convergence issues. I tried the suggestions in the following resources:

  1. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html

  2. https://stats.stackexchange.com/questions/242109/model-failed-to-converge-warning-in-lmer

and am now getting a new error:

Error in ans.ret[meth, ] <- c(ans$par, ans$value, ans$fevals, ans$gevals, :
number of items to replace is not a multiple of replacement length

This occurred after the model had been running toward convergence for ~48 hours.

I have a model that's structured as follows:

library(lme4); library(optimx)
library(stringi)
library(data.table)

set.seed(1423L)
# highly imbalanced outcome variable
y <- sample.int(2L, size= 910000, replace=T, prob= c(0.98, 0.02)) - 1L
# product biases
prod <- sample(letters, size= 910000, replace=T)
#  user biases
my_grps <- stringi::stri_rand_strings(n= 35000, length= 10)
grps <- rep(my_grps, each= 26)
x1 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x2 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x3 <- sample.int(2L, size= 910000, replace=T, prob= c(0.9, 0.1)) - 1L
x4 <- sample(LETTERS[1:5], size= 910000, replace=T)

dt <- data.table(y= y,
             prod= prod, grps= grps,
             x1= x1, x2= x2, x3= x3, x4= x4)

lmer1 <- glmer(y ~ -1 + prod + x1 + x2 + x3 + x4 + (1|grps),
               data= dt, family= binomial(link= "logit"),
               control = glmerControl(optimizer ='optimx', optCtrl=list(method='nlminb')))

I haven't guaranteed that the above data reproduces the error, but that's the model setup. I don't understand the error message at all.

NOTE: in my true use case, I have closer to 15.5M observations and 30-50 products, where each product has a different average response rate (y).

Owner

bbolker commented Jun 27, 2017

A few more thoughts on this (I started to run the example case above and got impatient):

  • convergence warnings are extremely likely to be false positives
  • worth trying with glmerControl(optimizer="nloptwrap") (which defaults to using the BOBYQA implementation in nloptr)
  • lme4 is slowest when there are many top-level parameters (which for GLMMs = variance-covariance parameters + fixed-effect parameters). Two ways to get around this are (1) try nAGQ=0 (although this may sacrifice too much accuracy); (2) make the prod term (which is a 26-level factor, i.e. adds 26 columns to the fixed-effects model matrix) into a random effect, but force its variance to be large (in effect reverting to treating it as a fixed effect). This has the dual advantages of giving us a sparse model matrix and marginalizing out (not quite the right word) those parameters in the fitting process (that is, those parameters get fitted in the PIRLS loop rather than in the top-level nonlinear optimization).
  • it would be worth comparing glmmTMB for this example. Almost certainly slower than Julia, but probably faster than lme4 (glmmTMB is about twice as fast as lme4 when fitting variants of the Contraception data set)
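The optimizer switches suggested above can be tried along these lines. This is a minimal sketch on a small simulated dataset of the same shape as the original example (the variable names and sizes here are illustrative, not the 910k-row problem):

```r
library(lme4)

set.seed(1)
# toy data in the same shape as the original example, shrunk for speed
n <- 2000L
toy <- data.frame(
  y    = rbinom(n, 1L, 0.1),
  prod = sample(letters[1:5], n, replace = TRUE),
  grps = sample(sprintf("g%03d", 1:100), n, replace = TRUE),
  x1   = rbinom(n, 1L, 0.1)
)
form <- y ~ -1 + prod + x1 + (1 | grps)

# nloptr's BOBYQA implementation via the nloptwrap optimizer
fit_nlopt <- glmer(form, data = toy, family = binomial,
                   control = glmerControl(optimizer = "nloptwrap"))

# skip the final nAGQ = 1 refinement entirely (faster, possibly less accurate)
fit_nagq0 <- glmer(form, data = toy, family = binomial, nAGQ = 0)
```

Both calls use only arguments documented in lme4 (`glmerControl(optimizer=)` and `glmer(nAGQ=)`); on the real data the relative timings will of course differ.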
Owner

dmbates commented Jun 27, 2017

Are the convergence diagnostics programmed to take into account that convergence on the boundary does not require the elements of the gradient to be near zero? I have a vague recollection that they don't, which means they will always be false positives in such a case.

In my experience with the Julia code, the difference between the optimum with nAGQ=0L and nAGQ=1L is usually less than 0.5 on the deviance scale, which is not great but not terrible either. I wish I could work out a way of avoiding putting all of the fixed-effects parameters into the general optimization.
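The nAGQ=0 vs nAGQ=1 gap described above can be checked directly on any given dataset. A hedged sketch on a small simulated example (the half-unit figure quoted above is from the Julia code; the gap on other data will vary):

```r
library(lme4)

set.seed(2)
# small binary-response dataset with one grouping factor
n <- 1500L
d <- data.frame(
  y = rbinom(n, 1L, 0.15),
  g = sample(sprintf("g%02d", 1:60), n, replace = TRUE)
)

# nAGQ = 0: fixed effects estimated inside the PIRLS step only
f0 <- glmer(y ~ 1 + (1 | g), data = d, family = binomial, nAGQ = 0)
# nAGQ = 1: the default Laplace approximation
f1 <- glmer(y ~ 1 + (1 | g), data = d, family = binomial, nAGQ = 1)

# difference between the two fits on the deviance scale
abs(deviance(f0) - deviance(f1))
```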

@bbolker Can you provide example code for the below and I'll test them on the production data?

Two ways to get around this are (1) try nAGQ=0 (although this may sacrifice too much accuracy)? (2) make the prod term (which is a 26-level factor, i.e. adds 26 columns to the fixed-effects model matrix) into a random effect, but force its variance to be large (in effect reverting to treating it as a fixed effect).

Thanks, Alex

Owner

bbolker commented Jun 27, 2017

Posted some example code in my latest commit, as misc/GH_425.R (temporarily reduced N to 26000 for testing purposes ...)

Owner

bbolker commented Jun 29, 2017 edited

Quick interim results for the N=26000 case:

load("GH_425.rda")
times <- mget(ls(pattern = "^t[0-9]"))    # timing results saved as t1, t2, ...
times2 <- sapply(times, "[", "elapsed")   # elapsed seconds
fits <- mget(ls(pattern = "^fit[0-9]"))   # fitted model objects
library(lme4)
library(glmmTMB)
LLs <- -sapply(fits, logLik)              # negative log-likelihoods
desc <- c("fakeRE", "glmmTMB", "glmer_nAGQ0_noderiv",
          "glmer_bobyqa_noderiv", "glmer_bobyqa_deriv",
          "glmer_nlminb_noderiv")
dd <- data.frame(desc, times2, LLs)
rownames(dd) <- NULL

Results (method, elapsed time in seconds, negative log-likelihood):

##                   desc  times2      LLs
## 1               fakeRE  24.697 2827.279
## 2              glmmTMB 106.940 2608.670
## 3  glmer_nAGQ0_noderiv   5.338 2608.829
## 4 glmer_bobyqa_noderiv 117.654 2608.759
## 5   glmer_bobyqa_deriv 528.013 2608.759
## 6 glmer_nlminb_noderiv 920.135 2608.889
  • I think the difference in the fakeRE log-likelihood is spurious/definitional. (Which outcomes/parameter estimates are you interested in from this analysis, and which (if any) are nuisances/covariates?)
  • computing the derivative (by brute force finite differences) for the convergence checks takes a significant amount of time
  • I expect most of this stuff to scale roughly linearly with number of observations but haven't tried it yet
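The "noderiv" variants in the table correspond to switching off the finite-difference derivative computation that lme4 runs at the optimum for its convergence checks, which `glmerControl` exposes via `calc.derivs`. A sketch on toy data (note this skips the derivative-based convergence warnings entirely rather than making them cheaper):

```r
library(lme4)

set.seed(3)
d <- data.frame(y = rbinom(1000L, 1L, 0.2),
                g = sample(letters[1:20], 1000L, replace = TRUE))

# calc.derivs = FALSE skips the finite-difference gradient/Hessian
# at the optimum, and with it the derivative-based convergence checks
fit_nd <- glmer(y ~ 1 + (1 | g), data = d, family = binomial,
                control = glmerControl(optimizer = "bobyqa",
                                       calc.derivs = FALSE))
```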
Owner

bbolker commented Jul 4, 2017

I haven't had any time to work on this further. Have you made any progress?

alexWhitworth commented Jul 8, 2017 edited

@bbolker you may find Norm Matloff's (github @matloff) discussion here useful... Though it appears (from the comments) that @dmbates has already investigated this in Julia

Owner

bbolker commented Jul 9, 2017

I may look at this at some point. I could certainly write a 'chunked' version of glmer that distributed the computation, but I don't know when I'll get to it.
