Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parametric_effect with a factor variable #212

Closed
fhui28 opened this issue Feb 24, 2023 · 4 comments
Closed

parametric_effect with a factor variable #212

fhui28 opened this issue Feb 24, 2023 · 4 comments
Labels
bug Something isn't working
Milestone

Comments

@fhui28
Copy link

fhui28 commented Feb 24, 2023

Hi Gavin,

@chrishaak and I are having some problems with the parametric_effects function. when a factor is involved. Below is an simple example illustrating the problem.

library(tidyverse)
library(gratia)
library(mgcv)
#> Loading required package: nlme
#> 
#> Attaching package: 'nlme'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.

set.seed(2) ## simulate some data... 
dat <- gamSim(1,n=400,dist="normal",scale = 2)
#> Gu & Wahba 4 term additive model
dat$fac <- rep(c("a","b"), c(250,150)) %>% factor
b <- gam(y ~ poly(x0, 2, raw = TRUE) + poly(x1, 2, raw = TRUE) + poly(x2, 2, raw = TRUE) + poly(x3, 2, raw = TRUE), data = dat)
summary(b)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> y ~ poly(x0, 2, raw = TRUE) + poly(x1, 2, raw = TRUE) + poly(x2, 
#>     2, raw = TRUE) + poly(x3, 2, raw = TRUE)
#> 
#> Parametric coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                6.0113     0.7206   8.342 1.27e-15 ***
#> poly(x0, 2, raw = TRUE)1   6.7088     1.8445   3.637 0.000312 ***
#> poly(x0, 2, raw = TRUE)2  -6.0551     1.7979  -3.368 0.000833 ***
#> poly(x1, 2, raw = TRUE)1  -2.3022     1.7919  -1.285 0.199635    
#> poly(x1, 2, raw = TRUE)2   6.9919     1.7172   4.072 5.65e-05 ***
#> poly(x2, 2, raw = TRUE)1   9.0104     1.8371   4.905 1.38e-06 ***
#> poly(x2, 2, raw = TRUE)2 -14.7168     1.7574  -8.374 1.00e-15 ***
#> poly(x3, 2, raw = TRUE)1   0.2623     1.8219   0.144 0.885603    
#> poly(x3, 2, raw = TRUE)2  -1.0502     1.7434  -0.602 0.547271    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> R-sq.(adj) =  0.486   Deviance explained = 49.6%
#> GCV = 7.2072  Scale est. = 7.045     n = 400
parametric_effects(b) 
#'  Works fine!
#> Model contains no factor terms
#> # A tibble: 1,600 × 5
#>    term                    type    value[,"1"] [,"2"] partial    se
#>    <chr>                   <chr>         <dbl>  <dbl>   <dbl> <dbl>
#>  1 poly(x0, 2, raw = TRUE) numeric       0.185 0.0342   1.03  0.282
#>  2 poly(x0, 2, raw = TRUE) numeric       0.702 0.493    1.72  0.485
#>  3 poly(x0, 2, raw = TRUE) numeric       0.573 0.329    1.86  0.505
#>  4 poly(x0, 2, raw = TRUE) numeric       0.168 0.0282   0.956 0.261
#>  5 poly(x0, 2, raw = TRUE) numeric       0.944 0.891    0.938 0.430
#>  6 poly(x0, 2, raw = TRUE) numeric       0.943 0.890    0.940 0.430
#>  7 poly(x0, 2, raw = TRUE) numeric       0.129 0.0167   0.765 0.209
#>  8 poly(x0, 2, raw = TRUE) numeric       0.833 0.695    1.39  0.444
#>  9 poly(x0, 2, raw = TRUE) numeric       0.468 0.219    1.81  0.491
#> 10 poly(x0, 2, raw = TRUE) numeric       0.550 0.302    1.86  0.504
#> # … with 1,590 more rows

b <- gam(y ~ fac + poly(x0, 2, raw = TRUE) + poly(x1, 2, raw = TRUE) + poly(x2, 2, raw = TRUE) + poly(x3, 2, raw = TRUE), data = dat)
summary(b)
#> 
#> Family: gaussian 
#> Link function: identity 
#> 
#> Formula:
#> y ~ fac + poly(x0, 2, raw = TRUE) + poly(x1, 2, raw = TRUE) + 
#>     poly(x2, 2, raw = TRUE) + poly(x3, 2, raw = TRUE)
#> 
#> Parametric coefficients:
#>                           Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                5.92964    0.72500   8.179 4.08e-15 ***
#> facb                       0.28194    0.27640   1.020 0.308333    
#> poly(x0, 2, raw = TRUE)1   6.77753    1.84562   3.672 0.000274 ***
#> poly(x0, 2, raw = TRUE)2  -6.08916    1.79816  -3.386 0.000780 ***
#> poly(x1, 2, raw = TRUE)1  -2.31667    1.79191  -1.293 0.196828    
#> poly(x1, 2, raw = TRUE)2   7.00464    1.71712   4.079 5.48e-05 ***
#> poly(x2, 2, raw = TRUE)1   9.00978    1.83705   4.904 1.38e-06 ***
#> poly(x2, 2, raw = TRUE)2 -14.74233    1.75747  -8.388 9.14e-16 ***
#> poly(x3, 2, raw = TRUE)1   0.09841    1.82884   0.054 0.957113    
#> poly(x3, 2, raw = TRUE)2  -0.90758    1.74893  -0.519 0.604100    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> R-sq.(adj) =  0.486   Deviance explained = 49.8%
#> GCV = 7.2249  Scale est. = 7.0443    n = 400
parametric_effects(b) 
#' This does not work and either prints an error like the below or leads to a session abort.
#> Error in pillar::type_sum(x): cannot have attributes on a CHARSXP

sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Linux Mint 19
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
#> 
#> locale:
#>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
#>  [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
#>  [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] mgcv_1.8-40     nlme_3.1-149    gratia_0.7.3    forcats_0.5.2  
#>  [5] stringr_1.5.0   dplyr_1.1.0     purrr_1.0.1     readr_2.1.2    
#>  [9] tidyr_1.3.0     tibble_3.1.8    ggplot2_3.4.1   tidyverse_1.3.2
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.10          lubridate_1.8.0      lattice_0.20-41     
#>  [4] assertthat_0.2.1     digest_0.6.29        utf8_1.2.3          
#>  [7] R6_2.5.1             cellranger_1.1.0     backports_1.2.1     
#> [10] reprex_2.0.2         evaluate_0.19        httr_1.4.2          
#> [13] highr_0.9            pillar_1.8.1         rlang_1.0.6         
#> [16] googlesheets4_1.0.1  readxl_1.3.1         rstudioapi_0.13     
#> [19] mvnfast_0.2.7        Matrix_1.2-18        rmarkdown_2.18      
#> [22] splines_3.6.3        googledrive_2.0.0    munsell_0.5.0       
#> [25] broom_1.0.1          compiler_3.6.3       modelr_0.1.9        
#> [28] xfun_0.35            pkgconfig_2.0.3      htmltools_0.5.1.1   
#> [31] tidyselect_1.2.0     fansi_1.0.4          crayon_1.5.1        
#> [34] tzdb_0.3.0           dbplyr_2.2.1         withr_2.5.0         
#> [37] grid_3.6.3           jsonlite_1.7.2       gtable_0.3.1        
#> [40] lifecycle_1.0.3      DBI_1.1.0            magrittr_2.0.3      
#> [43] scales_1.2.1         cli_3.6.0            stringi_1.7.12      
#> [46] fs_1.5.0             xml2_1.3.3           ellipsis_0.3.2      
#> [49] generics_0.1.3       vctrs_0.5.2          tools_3.6.3         
#> [52] glue_1.6.2           hms_1.1.2            yaml_2.2.1          
#> [55] colorspace_2.1-0     gargle_1.2.1         rvest_1.0.3         
#> [58] knitr_1.41           haven_2.5.1          patchwork_1.0.1.9000

Created on 2023-02-24 with reprex v2.0.2

The error produced from the second model seems to vary depending on the machine. When @chrishaak tried it on his newer machine with gratia (0.8.1), he instead gets

 "Error in `list_unchop()`:
! `..2` must be a vector, not a symbol.

Thank heaps, and apologies if we are missing something obvious!

@gavinsimpson
Copy link
Owner

I'm not surprised that model is causing problems; the poly() terms can result in more than one column and it's not something I ever thought to test these functions against (I'd just fit a smooth).

The segfault is happening in something related to the unnest() function, which is in the tidyr package. I'm sure I'm not handling these models right, but unnest() (or something it calls) shouldn't be segfaulting. I would be good to look into that side of things (I suspect the tidyr devs will need more to go on than your example above as it is many steps removed from my using unnest() and thence their code), but I can look into working properly for models like this.

@gavinsimpson gavinsimpson added the bug Something isn't working label Feb 24, 2023
@fhui28
Copy link
Author

fhui28 commented Feb 24, 2023

Brilliant! Thank you so much Gavin!

@gavinsimpson gavinsimpson added this to the 0.9 milestone Feb 25, 2023
gavinsimpson added a commit that referenced this issue Feb 25, 2023
… difficulties with environments in R CMD check, so we have to help by passing the teardown env and data
@gavinsimpson
Copy link
Owner

This is now fixed in the GH main branch. {gratia} requires an R >= 4.1.0 (since the 0.8 release) so you'll need to update the R on the system you used to report this bug to take advantage of the fixes.

parametric_effects() should now work for terms like log() etc as well as poly() in a model formula.

@gavinsimpson
Copy link
Owner

Thanks for reporting @fhui28

gavinsimpson added a commit that referenced this issue Feb 25, 2023
gavinsimpson added a commit that referenced this issue Feb 25, 2023
gavinsimpson added a commit that referenced this issue Feb 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants