Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

smooth_samples() throws error messages for gam models with random effects #121

Closed
isabellaghement opened this issue Dec 3, 2021 · 6 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@isabellaghement
Copy link

Hi Gavin,

I'm trying to use gratia's smooth_samples() function as part of a "by hand" derivatives calculation but am getting error messages for gam models which include random effects.

Using the dataset normtimedata.csv available at https://uofi.app.box.com/s/jtzww0bmgo9216qgpxu6d7avpaa15sey via the Download button, I put together some R code which illustrates one type of error message that I get. The models used in this code were inspired by this post: https://marissabarlaz.github.io/portfolio/gam/.

library(readr)
library(dplyr)
library(magrittr)
library(gratia)
library(mgcv)

setwd("G:\\")

normtimeBP <- read_csv("normtimedata.csv",
                       col_types = 
                         c(Speaker = col_character(),
                          Word = col_character(),
                          Vowel = col_character(),
                          Nasality = col_character(),
                          RepNo = col_double(),
                          NormTime = col_double(),
                          Type = col_character(),
                          label = col_character(),
                          Time = col_double(),
                          F1 = col_double(),
                          F2 = col_double(),
                          F3 = col_double(),
                          F2_3 = col_double()))

BPa = normtimeBP %>% 
      filter(Vowel=="a", 
         Speaker %in% c("BP02", "BP04", "BP05", "BP10", "BP14"))

BPa$Speaker <- factor(BPa$Speaker)
BPa$Nasality <- factor(BPa$Nasality)


model1 <- gam(F1 ~ Nasality + s(NormTime, by = Nasality), 
             data = BPa, 
             method = "REML", 
             family = gaussian())


model2a <- gam(F1 ~ Nasality + s(NormTime, by = Nasality) + 
                    s(Speaker, bs = "re"), 
             data = BPa, 
             method = "REML", 
             family = gaussian())


model2b <- gam(F1 ~ Nasality + s(NormTime, by = Nasality)+ 
                    s(Speaker, Nasality, bs = "re"),
              data = BPa, 
              method = "REML", 
              family = gaussian())

range(BPa$F1)

BPa$F1bin <- ifelse(BPa$F1 > median(BPa$F1, na.rm = TRUE), 1, 0)

model3 <- gam(F1bin ~ Nasality + s(NormTime, by = Nasality) + s(Speaker, bs = "re"), 
              data = BPa, 
              method = "REML", 
              family = binomial(link = "logit"))

model1_smooth_samples <- 
  smooth_samples(model1, 
                 select = "s(NormTime)", 
                 partial_match = TRUE,
                 n = 5, seed = 42)

model2a_smooth_samples <- 
  smooth_samples(model2a, 
                 select = "s(NormTime)", 
                 partial_match = TRUE,
                 n = 5, seed = 42)

model2b_smooth_samples <- 
  smooth_samples(model2b, 
                 select = "s(NormTime)", 
                 partial_match = TRUE,
                 n = 5, seed = 42)

model3_smooth_samples <- 
  smooth_samples(model3, 
                 select = "s(NormTime)", 
                 partial_match = TRUE,
                 n = 5, seed = 42)

The error message I get from R for each of the above gam models expect for the one without random effects is:

Error in names(summ_names) <- paste0(".x", seq_along(summ_names)) :
'names' attribute [1] must be the same length as the vector [0]

For my own dataset, I actually get a different error message, which I could not reproduce with the examples listed above:

Error: New columns must be compatible with .data.
x New column has 15000 rows.
i .data has 7000 rows.

But it seems to me that smooth_samples() is struggles with gam models that include random effects and, depending on the model specification, it throws different types of error messages.

If you could look into this and let me know if there is a fix, that would be greatly appreciated.

Thanks very much!

Isabella

@gavinsimpson
Copy link
Owner

gavinsimpson commented Dec 3, 2021

The issue with these data is annoyingly simple (and you can avoid the bug by specifying the four smooths arising from the factor by term as the only smooths to evaluate). The function is working; all the hard work gets done, but it falls over when adding information about the variables involved in the smooth because in handling factor-by terms I am ignoring factors at some point in the code and for random effects the factor is the data hence I'm also ignoring the very thing I need.

Now I just need to figure out what the right fix is...

@gavinsimpson gavinsimpson self-assigned this Dec 3, 2021
@gavinsimpson gavinsimpson added the bug Something isn't working label Dec 3, 2021
@rroyaute
Copy link

rroyaute commented Dec 20, 2021

Thanks for the clarification Gavin,
I am still struggling with some aspects of selecting the proper term with smooth_samples(). I hope you may clarify further. Here's some example code following the "factor by" example from the documentation:

library(mgcv); library(gratia)

set.seed(2)
dat <- gamSim(4)

m2 <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)

sms <- smooth_samples(m2, n = 5, seed = 42)
draw(sms) # Works!

sms <- smooth_samples(m2, n = 5, seed = 42, term = "s(x2)")
draw(sms)
Returns: "Error in which_smooths.bam(model, term) :
None of the terms matched a smooth."

smooths(m2) # Select smooths specific to each level of "fac"
sms.1 <- smooth_samples(m2, n = 5, seed = 42, term = "s(x2):fac1")
draw(sms.1) # Works!

sms.2 <- smooth_samples(m2, n = 5, seed = 42, term = "s(x2):fac2")
draw(sms.2) # Still returns fac1

sms.3 <- smooth_samples(m2, n = 5, seed = 42, term = "s(x2):fac2")
draw(sms.3) # Still returns fac1

I hope I didn't make any mistake here, but it looks like the lines above only works for the first level of "fac". I am not able to select and plot the next levels however.

Hope this can be solved easily, I really like the package otherwise and it's really improved my understanding of these complicated classes of model!

Thanks!

Raphaël

@gavinsimpson
Copy link
Owner

gavinsimpson commented Dec 20, 2021

Those last two should work; that they don't is a bug. Looks like I'm not pulling out the correct factor level or label somewhere.

I will take a look at this and @isabellaghement's problem in the next few days or over Christmas when I have some time away from work work.

@gavinsimpson
Copy link
Owner

@rroyaute I have fixed the problem you reported here - that was caused by a silly thinko in some of the code.

@gavinsimpson
Copy link
Owner

@isabellaghement I have now fixed the main issue you reported; I have decided to just ignore random effect smooths for now as they aren't really what I had in mind for this function and would be tricky to fit into the current paradigm.

@gavinsimpson gavinsimpson added this to the 0.7 milestone Jan 15, 2022
@gavinsimpson
Copy link
Owner

@isabellaghement The example data from your issue are all working with the GitHub version of {gratia} now. You mentioned another issue with your real data but that isn't raised in the reproducible example you provided and from the error message you mention seems unrelated to the other error messages.

I'm going to close this one as fixed now, but can you install from GitHub and try your actual data example again and file a new issue if the other problem persists?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants