open thread for mgcv smooth issues #928
Comments
Good to see this moving forward! I had long hoped to get around to this myself. A few things: there are a large number of options available within s(). Some examples of settings that will fail with unhelpful messages:
library(glmmTMB)
data("sleepstudy", package = "lme4")
m3 <- glmmTMB(Reaction ~ s(Days, bs = "re"), data = sleepstudy, REML = TRUE)
#> Error in dimnames(x) <- dn: length of 'dimnames' [2] not equal to array extent
m4 <- glmmTMB(Reaction ~ s(Days, fx = TRUE), data = sleepstudy, REML = TRUE)
#> Error in t.default(s$re$rand$Xr): argument is not a matrix
I also noticed some strange behaviour around the REML argument, which is important to note when testing:
library(glmmTMB)
library(gamm4)
set.seed(0)
dat <- mgcv::gamSim(1, n = 400, scale = 2)
#> Gu & Wahba 4 term additive model
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~ fac - 1) %*% rnorm(20) * .5
dat$y <- as.numeric(dat$y)
## defaults should be REML = FALSE
## doesn't match!?
br <- gamm4(y ~ s(x0), data = dat, random = ~ (1 | fac))
br1 <- glmmTMB(y ~ s(x0) + (1 | fac), data = dat)
logLik(br$mer) # looks like REML = TRUE
#> 'log Lik.' -1127.131 (df=5)
logLik(br1)
#> 'log Lik.' -1126.471 (df=5)
## but this matches:
br <- gamm4(y ~ s(x0), data = dat, random = ~ (1 | fac), REML = FALSE)
br1 <- glmmTMB(y ~ s(x0) + (1 | fac), data = dat, REML = FALSE)
logLik(br$mer)
#> 'log Lik.' -1126.471 (df=5)
logLik(br1)
#> 'log Lik.' -1126.471 (df=5)
## and so does this:
br <- gamm4(y ~ s(x0), data = dat, random = ~ (1 | fac), REML = TRUE)
br1 <- glmmTMB(y ~ s(x0) + (1 | fac), data = dat, REML = TRUE)
logLik(br$mer)
#> 'log Lik.' -1127.131 (df=5)
logLik(br1)
#> 'log Lik.' -1127.131 (df=5)
Some packages seem to have chosen different defaults here, and I have had challenges getting prediction on new data to work. In general, I haven't seen problems fitting with ML. I'm happy to help with testing and working out a good plotting interface—possibly interfacing with gratia. Rolling your own ends up being a pain given the wide array of smoother configurations. |
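The mismatch demonstrated above is consistent with the two functions' documented defaults: gamm4() fits via lme4 with REML = TRUE by default, while glmmTMB() defaults to REML = FALSE. A quick sketch to check the defaults directly (assuming both packages are installed):

```r
## inspect the default REML settings of each fitting function
library(glmmTMB)
library(gamm4)

formals(glmmTMB)$REML   # FALSE: glmmTMB maximizes ML by default
formals(gamm4)$REML     # TRUE: gamm4 (via lme4) uses REML by default
```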
Naive attempts to fit a 2D thin plate spline didn't go too well ...
|
NA handling is not working correctly; it seems to be fitting the smooth before handling NAs:
vs
|
Transformations not being handled in smooths
|
thanks @authagag. Now that this is exposed in a CRAN version, I can see that I (or someone???) am going to have some work to do getting this from "experimental" to "robust and ready for a wide range of use cases" ... |
I was writing my own version for gamlss so I could use smooths with ordered beta regression for a paper I am working on. This came along and is saving me a lot of work. I have a vested interest :) |
Hmm. The
|
These all work now, with the
|
There are some more complex examples that still don't work, which I need to fix here. Here |
Hi, I don't think that fully addresses all the amazing (or stupid) transformations that could be expressed in s(); that only seems to look for "+" in specials, but one could imagine something like s(cos((y^2*3)/2), ...) in the formula. Would that be picked up? I am still trying to figure out how to download the different branches, so I can't test it. I remember the survival package has an untangle.specials() function that handles a lot of formula tom-foolery pretty robustly. I know that things like this are an absolute bear for functions like predict. |
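As a sketch of the survival approach mentioned above (assuming only that survival is installed; the formula is made up): untangle.specials() operates on a terms object and pulls out special terms even when they wrap arbitrary transformations:

```r
library(survival)

## build a terms object declaring "s" as a special, with an
## arbitrarily nasty transformation inside one special term
tt <- terms(y ~ s(cos((x^2 * 3) / 2)) + z + s(w), specials = "s")

## untangle.specials returns the special terms ($vars) and their
## positions in the terms object ($terms)
survival::untangle.specials(tt, "s")
```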
What needs to be done to get the output to work with the gratia, emmeans, and marginaleffects packages? |
I think remotes::install_github("glmmTMB/glmmTMB/glmmTMB@smooth_fixes") should work for installing the current smooth-fix development branch. Your weird example suggested above does work with the current branch. I know nothing about all the downstream packages -- it would be great to at least start testing them (I can imagine that |
FWIW, 2D splines now appear to work (at least at a very quick glance): https://github.com/glmmTMB/glmmTMB/blob/smooth_fixes/misc/spline_volcano.R I don't have a reproducible version for the example in this comment, but this works now:
library(glmmTMB) ## with smooth_fixes branch
library(mgcv)
data("sleepstudy", package = "lme4")
m1 <- glmmTMB(Reaction ~ s(log(Days+1),k=4), data = sleepstudy, REML = TRUE)
m2 <- gam(Reaction ~ s(log(Days+1),k=4), data = sleepstudy, method = "REML")
I'm not sure this is actually fully working, though:
|
Handling The main thing is that you are returning the |
I agree with the comment above that
I also found these other things that break.
Since specifying k seemed to be the problem, I tried omitting it, but then had an issue with the number of unique covariate combinations being fewer than the maximum degrees of freedom.
|
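For context, that last error reflects a constraint from mgcv itself: a smooth's basis dimension k cannot exceed the number of unique covariate values. A minimal sketch with plain mgcv::gam (made-up data):

```r
library(mgcv)

set.seed(1)
d <- data.frame(x = rep(1:5, each = 20))  # only 5 unique x values
d$y <- d$x + rnorm(nrow(d))

length(unique(d$x))                       # 5 unique values
m_ok <- gam(y ~ s(x, k = 4), data = d)    # fine: k = 4 <= 5
## gam(y ~ s(x), data = d) would fail here: the default k = 10 exceeds
## the number of unique covariate combinations
```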
Are you using the smooth_fixes branch? This:
m1 <- glmmTMB(Reaction ~ s(log(Days+1),k=4), data = sleepstudy, REML = TRUE, start = list(theta=-3))
works for me with that branch. However, I can replicate your next problem. update: fixed the |
Yeah @bbolker, sorry about that. I didn't see that it was on a separate branch. Thanks for the PR. I just merged it. |
I'm helping a student with their data, and based on analyses with splines, we could use the
item from the wishlist above. So I'm thinking about trying to do it. I don't see any changes so far; am I missing anything obvious?
EDIT: following up in #997
I don't think you're missing anything obvious, but I think this could be a medium- to large-size job because there could be a lot of places to fix. In the short term I would be tempted to fall back to non-penalized splines ( |
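A sketch of that fallback using splines::ns(), one possible non-penalized basis (the model below is illustrative, not from the student's data):

```r
library(glmmTMB)
library(splines)

data("sleepstudy", package = "lme4")

## natural cubic spline with fixed df in place of a penalized s() term
m_ns <- glmmTMB(Reaction ~ ns(Days, df = 4) + (1 | Subject),
                data = sleepstudy)
summary(m_ns)
```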
Predictions for bivariate smooths don't seem to work. The model object outputs 2 columns to represent the 2D smooth, so there is a mismatch in dimensions between the data frame on which to evaluate the smooth (test data) and the model object's output. It works fine for s(U) + s(V) and s(U)*s(V) etc. Using t2() does not work (I assume the "t2" function hasn't been "copied" into glmmTMB yet). I've tried manual approaches with PredictMat() etc., but I'm out of my depth with how mgcv handles this.
> summary(testmod)
Family: gaussian ( identity )
Formula: spd ~ s(time) + s(U, V)
Data: train_data
AIC BIC logLik deviance df.resid
-17179.7 -17130.9 8596.9 -17193.7 7867
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
dummy dummy1 8.835e-05 0.009399 0.00 (homdiag)
dummy.1 dummy1 5.317e+01 7.292069 0.00 (homdiag)
Residual 6.325e-03 0.079530
Number of obs: 7870, groups: dummy, 1
Dispersion estimate for gaussian family (sigma^2): 0.00633
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.1540193 0.0008965 5749 <2e-16 ***
s(time)1 -0.0014914 0.0047952 0 0.756
s(U,V)1 0.1492392 0.0059513 25 <2e-16 ***
s(U,V)2 -0.2936664 0.0052754 -56 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> predict(testmod, test_data, type="response")
Error in names(dat) <- object$term :
'names' attribute [2] must be the same length as the vector [1] |
Thanks for the report @AllInCade. Could we have a reproducible example (we can make one up ourselves, but it will go faster if you can construct one ...) please ? |
I can upload the data set I've been using (it's a small subset of a huge dataset), which I have cleaned up etc. I've tried all kinds of things and tricks that have solved prediction errors before, but can't get this to work (haven't tried very hard manually, but the automatic way seems to not work). I may have dreamed this, but I could have sworn that I made predictions and plotted etc with 2D smooths with glmmTMB a few weeks ago. Certainly don't have anything saved that works...
I may add that a colleague of mine and I have been using and testing smooths with glmmTMB extensively, and have found quite a few things that need work; we have solutions for several of them. We've also been implementing ridge-regularized GAM(M)s in glmmTMB using the smoothCon and smooth2random machinery, with our own automatic smoothness-selection procedure (GCV-based). We've found that these models tend to outperform the equivalent mgcv::gam models in terms of predictive accuracy (as measured by RMSE) and not suffer from undersmoothing with MLE (rather than REML). So perhaps it could be an idea to implement a control option for the regularization method (ISSD vs ridge) in glmmTMB, especially for when undersmoothing (overfitting) is a concern and predictive accuracy and computational time are the primary goals (i.e. forecasting etc.). Additionally, we've seen that attempts have been made to compare glmmTMB models with smooth predictors to mgcv::gam models, which are not exactly equivalent frameworks, since the smooths from gam (directly from smooth.construct / smoothCon) are naturally parameterized, not re-parameterized for mixed-model estimation (via smooth2random). mgcv::gamm() and gamm4() models are identically equal to glmmTMB models with equivalent model formulas (i.e. given the family parameterizations are equal etc.). The basis function matrices are exactly identical, and so are all model outputs between all the mixed-model frameworks. We'll release the full paper on glmmTMB spline regression in a couple of months, so stay tuned. We'll have everything (code, data etc.) uploaded to GitHub as well. |
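For readers unfamiliar with the machinery mentioned above: smoothCon() and smooth2random() are mgcv functions. A minimal sketch of the reparameterization step (data and settings made up):

```r
library(mgcv)

set.seed(1)
dat <- data.frame(x = runif(100))

## build the smooth, absorbing the identifiability constraint
sm <- smoothCon(s(x), data = dat, absorb.cons = TRUE)[[1]]

## re-parameterize into fixed + random pieces for mixed-model estimation
## (type = 2 is the form used by gamm4-style fitting)
re <- smooth2random(sm, vnames = "", type = 2)
str(re$rand)  # random-effect design matrices (the penalized part)
str(re$Xf)    # unpenalized (fixed-effect) part of the basis
```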
I can't replicate this with the current devel (on-its-way-to-CRAN) version, currently version 1.1.9. What version are you using? |
That's interesting. Like I said I am almost certain that I've had it work in some other files / models. I was/am using version 1.1.8. Edit: It seems to work in 1.1.9, so that's good :) |
@bbolker One addition that would be useful would be to store in each smooth which "coefficients" belong to that smooth. In mgcv this info is typically stored in the first.para and last.para components of each smooth object.
Following a quick look at ranef(), the names are not unique:
> ranef(m_glmmTMB2)
$dummy
dummy1 dummy2 dummy3 dummy4 dummy5 dummy6 dummy7 dummy8 dummy1 dummy2
1 0.02046113 -0.08115577 -0.5104386 -0.385678 -0.1893127 1.407065 -0.2008775 1.480433 0.1304175 0.285876
dummy3 dummy4 dummy5 dummy6 dummy7 dummy8 dummy1 dummy2 dummy3 dummy4 dummy5
1 0.4071222 -0.1196957 0.4838889 -1.190392 -0.1950292 -1.356725 -15.68131 15.02009 -21.55647 -11.08096 -40.61811
dummy6 dummy7 dummy8
1 13.18978 -19.92904 9.275013
for this model:
library("gratia")
library("glmmTMB")
df <- data_sim("eg1", seed = 2)
m_glmmTMB2 <- glmmTMB(y ~ s(x0) + s(x1) + s(x2),
                   data = df, REML = TRUE)
I presume the duplicated names arise because the three smooths all share the "dummy" grouping name? Do these elements come in a predictable order? (And thinking ahead, for when smooths are allowed in other linear predictors, how are those different linear predictors identified when extracting model "coefficients"?) |
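For reference, mgcv records the per-smooth coefficient ranges in the first.para and last.para components of each smooth object; a sketch (plain mgcv, not glmmTMB; simulated data):

```r
library(mgcv)

set.seed(2)
df <- gamSim(1, n = 200, verbose = FALSE)
m <- gam(y ~ s(x0) + s(x1) + s(x2), data = df)

## each element of m$smooth knows which coefficients are its own
t(sapply(m$smooth, function(s)
  c(label = s$label, first = s$first.para, last = s$last.para)))
```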
See notes. Currently known outstanding issues:
- mgcv (?)
- gratia (methods?)