Duplicate rows in mgcv::gam() with ti(age, by = grade, bs ='fs') #150
I just had a quick look. The duplicate rows are due to this specific line: Line 138 in 7f4d5d4
Here we are trying to compute the number of observations for the smooth terms, and this is the code that fails: it computes two rows per interaction term.
At that stage: Line 159 in 7f4d5d4
We should keep only one row per term, but what would be the expected value to keep? The maximum? The minimum? NA if two computed values?
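To make the options concrete, here is a minimal base-R sketch of the "NA if the computed values disagree" rule. It is only an illustration: the column names mimic the `term.k` pattern of the model matrix, all counts are made up, and `collapse_n()` is a hypothetical helper, not something from broom.helpers.

```r
# Hypothetical per-column counts; values are invented for illustration only.
n_by_col <- c(
  "gradeII" = 57, "gradeIII" = 58,
  "s(age).1" = 179, "s(age).2" = 170,
  "ti(age):gradeI.1" = 64, "ti(age):gradeI.2" = 64,
  "ti(age):gradeII.1" = 57, "ti(age):gradeII.2" = 57
)

# Drop the trailing ".k" basis index to recover one label per term.
term <- sub("\\.[0-9]+$", "", names(n_by_col))

# Hypothetical rule: keep the count if all columns of a term agree, else NA.
collapse_n <- function(v) {
  v <- unique(unname(v))
  if (length(v) == 1) v else NA_real_
}

# One value per term instead of one per model-matrix column.
n_by_term <- vapply(split(n_by_col, term), collapse_n, numeric(1))
n_by_term
```

Swapping `collapse_n()` for `max` or `min` would give the other two options mentioned above; which one is right is exactly the open question.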
I think the N should be consistent with a typical continuous-by-categorical interaction term we report:
library(gtsummary)
lm(marker ~ age:grade, trial) %>%
tbl_regression() %>%
add_n(location = "level") %>%
as_kable()
Created on 2022-02-22 by the reprex package (v2.0.1)
The header row reports the overall N, and the level rows report the Ns from the categorical variable. I think that makes sense here as well.
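For what it's worth, one way to get such per-level Ns for a continuous-by-categorical interaction is to count the non-zero entries in each model-matrix column. The sketch below uses a small made-up data frame (`d`, `y`, `x`, `g` are all hypothetical) and assumes the continuous variable is never exactly zero; it is not necessarily how gtsummary or broom.helpers compute N.

```r
# Toy data: 9 rows, factor g with 4, 3 and 2 observations per level.
d <- data.frame(
  y = c(1.2, 0.7, 1.9, 2.4, 0.3, 1.1, 2.8, 1.5, 0.9),
  x = c(31, 45, 52, 60, 38, 47, 66, 29, 58),
  g = factor(rep(c("I", "II", "III"), times = c(4, 3, 2)))
)

# Interaction-only model, analogous to marker ~ age:grade.
fit <- lm(y ~ x:g, data = d)

# Each x:g column is non-zero only for rows in that level of g,
# so counting non-zero entries gives the N per level.
mm <- model.matrix(fit)
n_by_col <- colSums(mm != 0)
n_by_col
# (Intercept) 9, x:gI 4, x:gII 3, x:gIII 2
```

This works here because each interaction term maps to exactly one column; it is the one-column-per-term assumption that breaks down for smooth terms.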
Then it should be 64, 57 and 58. One of the issues here is that:
library(broom.helpers)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
mod <- gam(
marker ~ s(age, bs = 'ad', k = -1) + grade + ti(age, by = grade, bs ='fs'),
data = gtsummary::trial,
method = 'REML',
family = gaussian
)
mod %>% model_get_model_matrix() %>% colnames()
#> [1] "(Intercept)" "gradeII" "gradeIII"
#> [4] "s(age).1" "s(age).2" "s(age).3"
#> [7] "s(age).4" "s(age).5" "s(age).6"
#> [10] "s(age).7" "s(age).8" "s(age).9"
#> [13] "s(age).10" "s(age).11" "s(age).12"
#> [16] "s(age).13" "s(age).14" "s(age).15"
#> [19] "s(age).16" "s(age).17" "s(age).18"
#> [22] "s(age).19" "s(age).20" "s(age).21"
#> [25] "s(age).22" "s(age).23" "s(age).24"
#> [28] "s(age).25" "s(age).26" "s(age).27"
#> [31] "s(age).28" "s(age).29" "s(age).30"
#> [34] "s(age).31" "s(age).32" "s(age).33"
#> [37] "s(age).34" "s(age).35" "s(age).36"
#> [40] "s(age).37" "s(age).38" "s(age).39"
#> [43] "ti(age):gradeI.1" "ti(age):gradeI.2" "ti(age):gradeI.3"
#> [46] "ti(age):gradeI.4" "ti(age):gradeII.1" "ti(age):gradeII.2"
#> [49] "ti(age):gradeII.3" "ti(age):gradeII.4" "ti(age):gradeIII.1"
#> [52] "ti(age):gradeIII.2" "ti(age):gradeIII.3" "ti(age):gradeIII.4"
Created on 2022-02-22 by the reprex package (v2.0.1)
Because there are 2 columns for each term, there is a bug in the computation of N. The other thing to consider is that I'm not familiar with these tensor interaction terms, and I'm not sure how to deal with them properly.
Hmm, I agree with you that it's not clear what to do... What do you think of this:
Note: the easier fix would be to remove
If we want to continue to compute the number of observations for smooth terms, we need to identify such terms and interactions with such terms, and decide how to handle multiple smooth columns in the model matrix.
I would prefer to switch model_get_n() to the default behavior, so that the number of observations is computed only for parametric terms that are classic terms.
OK, that sounds like a good option.
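As a rough illustration of that option (not the actual change in the package), one could flag smooth columns by name and report N only for the remaining parametric columns. The column names, the counts and the regex below are all assumptions made for the sketch; the regex simply matches the mgcv smooth constructors s(), te(), ti() and t2() at the start of a column name.

```r
# Hypothetical model-matrix column names and per-column counts.
col_names <- c(
  "(Intercept)", "gradeII", "gradeIII",
  "s(age).1", "s(age).2",
  "ti(age):gradeI.1", "ti(age):gradeII.1"
)
n_by_col <- setNames(c(179, 57, 58, 179, 179, 64, 57), col_names)

# Assumption: smooth columns can be recognised by their name prefix.
is_smooth <- grepl("^(s|te|ti|t2)\\(", col_names)

# Keep N for parametric columns, report NA for anything smooth.
n_obs <- n_by_col
n_obs[is_smooth] <- NA
n_obs
```

The upside of this design is that no rule is needed for collapsing several smooth columns into one row; the downside is that smooth terms simply get no N.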
Please have a look at #151
library(broom.helpers)
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
mod <- gam(
marker ~ s(age, bs = 'ad', k = -1) + grade + ti(age, by = grade, bs ='fs'),
data = gtsummary::trial,
method = 'REML',
family = gaussian
)
mod %>%
tidy_plus_plus(tidy_fun = gtsummary::tidy_gam) %>%
dplyr::select(term, variable, var_type, estimate, n_obs, parametric)
#> # A tibble: 7 x 6
#>   term             variable         var_type    estimate n_obs parametric
#>   <chr>            <chr>            <chr>          <dbl> <dbl> <lgl>
#> 1 gradeI           grade            categorical    0        64 NA
#> 2 gradeII          grade            categorical   -0.390    57 TRUE
#> 3 gradeIII         grade            categorical   -0.125    58 TRUE
#> 4 s(age)           s(age)           continuous    NA        NA FALSE
#> 5 ti(age):gradeI   ti(age):gradeI   continuous    NA        NA FALSE
#> 6 ti(age):gradeII  ti(age):gradeII  continuous    NA        NA FALSE
#> 7 ti(age):gradeIII ti(age):gradeIII continuous    NA        NA FALSE
Created on 2022-02-22 by the reprex package (v2.0.1)
Reported on https://stackoverflow.com/questions/71215280/gtsummary-output-with-mgcv-gam
I hadn't seen the interaction specification in mgcv::gam() using ti(age, by = grade, bs ='fs'), and it looks like there is a duplication-of-rows issue with these models. Example below! I think it's related to the calculation of n_obs.
Created on 2022-02-21 by the reprex package (v2.0.1)