Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draw() error when trying to plot 2d smooths with bs='sz' or 'fs' #249

Closed
chrishaak opened this issue Jan 31, 2024 · 8 comments
Closed

draw() error when trying to plot 2d smooths with bs='sz' or 'fs' #249

chrishaak opened this issue Jan 31, 2024 · 8 comments

Comments

@chrishaak
Copy link

I'm fitting an HGAM with a global 2d smooth and a factor smooth interaction of the 2d smooth (via bs='sz' or 'fs'). For example:

library("gratia")
library("mgcv")
data("iris")

mod2 <- gam(Petal.Width ~ 
              s(Sepal.Length, Sepal.Width, bs='ds', m=c(1,0.5)) 
            + s(Species, Sepal.Length, Sepal.Width, bs='sz', xt=list(bs='ds', m=c(1,0.5))),
            method = "REML",
            data = iris)

mod3 <- gam(Petal.Width ~ 
              s(Sepal.Length, Sepal.Width, bs='ds', m=c(1,0.5)) 
            + s(Species, Sepal.Length, Sepal.Width, bs='fs', xt=list(bs='ds', m=c(1,0.5))),
            method = "REML",
            data = iris)

When attempting to plot these with draw(), I receive (altogether different) errors for both. For the 'sz' smooth, I am getting:

Error in map():
ℹ In index: 1.
Caused by error in mutate():
ℹ In argument: .sz_var = interaction(object[variables[fs]], sep = ":", lex.order = TRUE).
Caused by error:
! .sz_var must be size 150 or 1, not 30000.

while for the 'fs' smooth I am getting:

Error in seq.default(from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), :
'length.out' must be a non-negative number

If I omit the second (factor-smooth interaction) term via select=c(-2) in the call to draw(), the first term plots fine for both of the fits.

Finally, using the 'by' argument (instead of 'fs' or 'sz'), both terms plot fine. for example:

mod4 <- gam(Petal.Width ~ 
              s(Sepal.Length, Sepal.Width, bs='ds', m=c(1,0.5)) 
            + s(Sepal.Length, Sepal.Width, bs='ds', m=c(1,0.5), by=Species),
            method = "REML",
            data = iris)

...draws both terms as expected. So it seems to be an issue with the 'fs' and 'sz' in particular...

Apologies if I am missing something here?

Thanks!
Chris

@gavinsimpson
Copy link
Owner

Thanks for the report and easy to implement examples to reproduce the issue. I think the fundamental issue is that I didn't consider 2d TPR or Duchon splines when I wrote the interval plotters for the sz and fs basis.

Would you mind adding the output from sessionInfo() (or session info::session_info() if you have it) to your report.

I'll confirm what's going on later this week – both models should be expected to work with gratia – and see what needs to be changed to make them work.

@chrishaak
Copy link
Author

Thanks for the lightning-fast response Gavin. Sorry, I actually had the sessioninfo in my code but forgot to paste it!

R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS: /home/christopher/miniconda3/envs/r423_blis/lib/libblis.so.4.0.0
LAPACK: /home/christopher/miniconda3/envs/r423_blis/lib/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] mgcv_1.9-0 nlme_3.1-163 gratia_0.8.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.12 pillar_1.9.0 compiler_4.2.3 RColorBrewer_1.1-3 tools_4.2.3 mvnfast_0.2.8 lifecycle_1.0.4
[8] tibble_3.2.1 gtable_0.3.4 lattice_0.22-5 pkgconfig_2.0.3 rlang_1.1.2 Matrix_1.6-2 cli_3.6.1
[15] rstudioapi_0.15.0 patchwork_1.1.3 withr_2.5.2 dplyr_1.1.3 stringr_1.5.0 generics_0.1.3 vctrs_0.6.4
[22] isoband_0.2.7 grid_4.2.3 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1 fansi_1.0.5 ggplot2_3.4.4
[29] purrr_1.0.2 tidyr_1.3.0 farver_2.1.1 magrittr_2.0.3 scales_1.2.1 splines_4.2.3 colorspace_2.1-0
[36] labeling_0.4.3 utf8_1.2.4 stringi_1.8.1 munsell_0.5.0

@gavinsimpson
Copy link
Owner

gavinsimpson commented Feb 2, 2024

Thanks for the report. As I suspected, the different errors were ultimately due the underlying code not knowing how to handle multivariate base smoothers.

"fs" smooths

The "fs" smooth issue raised a different error because I was not careful enough to handle users specifying the factor anywhere in the smooth definition. I was assuming users would use s(x1, x2, f) so factor last. I have fixed this particular issue such that if you use s(f, x1, x2) or s(x1, f, x2) smooth_estimates() will now work correctly.

"sz" smooths

The error with that model was purely due to the code not anticipating a multivariate base smoother.

I now catch this and do something rather than let the obscure error pass through.

… however

None of this actually helps you as in both cases {gratia} can't currently do anything useful from the plot side of things.

Right now I am just emitting messages for both your model examples:

r$> draw(i_fs)                                                              
Can't currently plot multivariate 'fs' smooths.
Skipping: s(Sepal.Length,Sepal.Width,Species)

r$> draw(i_sz)                                                              
Can't currently plot multivariate 'sz' smooths.
Skipping: s(Species,Sepal.Length,Sepal.Width)

but it will plot any other smooths it can plot.

It's not clear to me what to plot in the fs case. I could plot a few of the smooth surfaces, randomly selecting which to plot? For the sz case, I could plot the smooth "difference" surfaces, but how to actually do this in the draw() output? Should I treat it like a by smooth and draw a separate plot for each surface, or should I group them so that I only generate a single plot but use facetting to show the difference-from-reference surfaces for each level of the factor(s)?

A similar issue crops up with HGAMs with random tensor product smooths: t2(x1, x2, f, bs = c("cr", "cr", "re"), full = TRUE). Which of these should I plot and how?

So, back to you @chrishaak. I assume you weren't modelling the iris data in reality, so what would you like to see plotted in both your model specifications?

I'll push fixed code shortly; you'll need to install from github or r-universe (if it builds; there's an issues with the dependency on the Matrix pkg which I need to figure out there) when I have pushed. Check that you get version >= 0.8.2.58. At least then you'll be able to use smooth_estimates() to evaluate the smooths and then you can choose to plot them however you wish using {ggplot2}.

@chrishaak
Copy link
Author

chrishaak commented Feb 2, 2024 via email

@gavinsimpson
Copy link
Owner

...I vaguely remember there was a requirement somewhere along the way that I needed to put it first to plot 'sz' smooths?

I also thought this was the case, but it turns out you can put it anywhere.

I'll see what I can do. Right now, by smooths work easily because from mgcv's point of view these are really just entirely separate smooths - they are a separate entry in the $smooth list that is return by gam(), bam() etc. "fs" and "sz" smooths are different; there is just one element in $smooth for the entire "fs" or "sz" smooth. This is fine with univariate base smooths as I can produce one plot showing all levels (even if it gets messy, with the sz basis in particular), but that won't work for surfaces as I can't plot them on top of one another.

The solution I have used with trivariate and quadvariate smooths is to use facetting; the one "plot" would then have multiple facets, one per level of the factor.

Otherwise I'll need to look at how to return multiple ggplot objects for a single smooth if I am to follow the by smooth convention...

@chrishaak
Copy link
Author

chrishaak commented Feb 2, 2024 via email

@gavinsimpson
Copy link
Owner

I'm fitting to scaled covariate data (using 'scale'), but I'd like to see
the "real" (back-transformed) predictor values in my visualizations

I assume you mean that you have y ~ s(x) where x is the result of x <- scale(x_orig)[,1]? There isn't a good solution to that currently. Right now I would suggest that you just evaluate the smooths with smooth_estimates(), and then build your own plots. I'm looking at ways to make building your own plots that look like draw()s plots even easier.

Until then, you could modify the output from smooth_estimates() and then use its draw() method:

library("dplyr")
library("mgcv")
library("gratia")

df <- data_sim("eg1") |>
  mutate(x1_scl = (x1 - mean(x1)) / sd(x1))
m <- gam(y ~ s(x0) + s(x1_scl) + s(x2) + s(x3), data = df, method = "REML")

sm <- smooth_estimates(m)

# unscale x1_scl
x1_mean <- with(df, mean(x1))
x1_sd <- with(df, sd(x1))

sm <- sm |>
  mutate(x1_scl = (x1_scl * x1_sd) + x1_mean)
draw(sm)

You don't get things like partial residuals or a rug plot, but it is pretty close to what draw() will produce (as draw.gam() calls the draw.smooth_estimates() for each smooth internally).

@chrishaak
Copy link
Author

chrishaak commented Feb 2, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants