Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametric term evaluation for ziplss models #45

Open
pboesu opened this issue Aug 8, 2019 · 4 comments
Open

Parametric term evaluation for ziplss models #45

pboesu opened this issue Aug 8, 2019 · 4 comments
Assignees
Labels

Comments

@pboesu
Copy link

@pboesu pboesu commented Aug 8, 2019

I'm working with models from the ziplss family that have parametric (i.e. non-smooth) terms, and these cause the evaluate_parametric_term method (and hence draw) to fail with an error message like

Error in evaluate_parametric_term.gam(object, term = terms[i]) : Term is not in the parametric part of model: <x3>,

because it does not handle the list structure of the two predictor formulae gracefully.

Reprex collapsed in here

#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * 
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)

## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1) 
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence

## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))

## Fit ZIP model... 
b <- gam(list(y~s(x2)+s(x3),~s(x0)+s(x1)),family=ziplss())
draw(b)


b1 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b1) # fails
plot(b1,pages=1, all.terms = TRUE) #works

I've traced the issue to

vars <- labels(tt) # names of all parametric terms

which doesn't behave as expected as tt is now a two-element list with labels "1" "2".

My hacky attempt to fix this by changing the above line to

vars <- unlist(lapply(tt, labels))

produces reasonable results if there is only a single parameteric predictor in the model, but falls over with a cryptic plot_grid error that I can trace to align_margin, but which I don't understand:

#after hack-fixing evaluate_parametric_term

b2 <- gam(list(y~s(x2)+x3,~s(x0)+s(x1),family=ziplss())
draw(b2) #this works

b3 <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b3) # this fails

#Error in 1:(grep("null", x)[1] - 1) : NA/NaN argument
#In addition: Warning message:
#In predict.gam(object, newdata = mf, type = "terms", terms = term,  :
#  non-existent terms requested - ignoring
#Called from: FUN(X[[i]], ...)
sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gratia_0.2-8 mgcv_1.8-28  nlme_3.1-140

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2       rstudioapi_0.10  magrittr_1.5     splines_3.6.1    tidyselect_0.2.5 munsell_0.5.0   
 [7] cowplot_1.0.0    colorspace_1.4-1 lattice_0.20-38  R6_2.4.0         rlang_0.4.0      dplyr_0.8.3     
[13] tools_3.6.1      packrat_0.5.0    grid_3.6.1       gtable_0.3.0     lazyeval_0.2.2   assertthat_0.2.1
[19] tibble_2.1.3     crayon_1.3.4     Matrix_1.2-17    tidyr_0.8.3      purrr_0.3.2      ggplot2_3.2.0   
[25] glue_1.3.1       labeling_0.3     compiler_3.6.1   pillar_1.4.2     scales_1.0.0     mvtnorm_1.0-11  
[31] pkgconfig_2.0.2 

@pboesu pboesu changed the title Parameteric term evaluation for ziplss models Parametric term evaluation for ziplss models Aug 8, 2019
@gavinsimpson
Copy link
Owner

@gavinsimpson gavinsimpson commented Aug 8, 2019

Hi Philipp, yeah, handling these extended formula models (distributional regression models) is not something gratia does at the moment; Things work for the smooths as Simon just stores them as if they were just a single model with many smooths. The parametric terms code was written without considering those more general models.

This is on the ToDo list that I hope to get to once I get a new course written/start of term, whichever happened first.

@gavinsimpson gavinsimpson self-assigned this Aug 8, 2019
gavinsimpson added a commit that referenced this issue Sep 14, 2019
gavinsimpson added a commit that referenced this issue Sep 14, 2019
@gavinsimpson
Copy link
Owner

@gavinsimpson gavinsimpson commented Sep 14, 2019

I worked out what was going on; a combination of parametric_terms() returning a list for models with more than one linear predictor, and then needing to rename terms according to mgcvs weird convention of add .1, .2 to all smooths and terms in the higher linear predictors.

This should now work for all LSS models, but I have thus far only check the ziplss examples you provided.

Thanks for reporting this.

@pboesu
Copy link
Author

@pboesu pboesu commented Sep 20, 2019

Hi Gavin,
thanks so much for looking into this and expanding the functionality of this great package!

Unfortunately I discovered that there's a further complication in my use case that wasn't covered by my original reprex. It appears that categorical predictors cause additional problems.

Amended reprex:

#gratia error reprex
library(mgcv)
library(gratia)
## simulate some data...
f0 <- function(x) 2 * sin(pi * x); f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * 
  (10 * x)^3 * (1 - x)^10
n <- 500;set.seed(5)
x0 <- runif(n); x1 <- runif(n)
x2 <- runif(n); x3 <- runif(n)
x4 <- sample(factor(c('a','b','c')), size = n, replace = TRUE)

## Simulate probability of potential presence...
eta1 <- f0(x0) + f1(x1) - 3
p <- binomial()$linkinv(eta1) 
y <- as.numeric(runif(n)<p) ## 1 for presence, 0 for absence

## Simulate y given potentially present (not exactly model fitted!)...
ind <- y>0
eta2 <- f2(x2[ind])/3
y[ind] <- rpois(exp(eta2),exp(eta2))

## Fit ZIP model... 
b <- gam(list(y~s(x2)+x3,~s(x0)+x1),family=ziplss())
draw(b)

#ZIP model with a categorical predictor
b0 <- gam(list(y~s(x2)+x4,~s(x0)+x1),family=ziplss())
draw(b0)# works but warning issued
plot(b0,pages=1, all.terms = TRUE) #works

#ZIP model with linear and categorical predictor
b1 <- gam(list(y~s(x2)+x3+x4,~s(x0)+x1),family=ziplss())
draw(b1) # fails
draw(b1, parametric = FALSE) # works
plot(b1,pages=1, all.terms = TRUE) #works

Having a single categorical predictor yields a warning, but the partial effect plot appears.

Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored

Having a categorical predictor and another term leads to a failure

Error: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
Call `rlang::last_error()` to see a backtrace
In addition: Warning messages:
1: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
2: In model.matrix.default(Terms[[i]], mf, contrasts = object$contrasts) :
  variable 'x4' is absent, its contrast will be ignored
3: In predict.gam(object, newdata = mf, type = "terms", terms = mgcv_names[ind],  :
  non-existent terms requested - ignoring

with the following traceback

<error>
message: Columns 3, 4, 5, 6, 7, ... (and 3 more) must be named.
Use .name_repair to specify repair.
class:   `rlang_error`
backtrace:
  1. gratia::draw(b1)
  2. gratia:::draw.gam(b1)
  4. gratia:::evaluate_parametric_term.gam(object, term = terms[i])
  6. tibble:::as_tibble.data.frame(evaluated)
  7. tibble:::as_tibble.list(unclass(x), ..., .rows = .rows, .name_repair = .name_repair)
  8. tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
  9. tibble:::set_repaired_names(x, .name_repair)
 14. tibble:::repaired_names(names(x), .name_repair = .name_repair)
 15. tibble:::check_unique(new_name)
Call `rlang::last_trace()` to see the full backtrace

which to me appears to indicate that names are being lost as they are being passed from the model frame to the prediction frames, but my knowledge of programming with tibbles is very limited. Any clues?

@gavinsimpson
Copy link
Owner

@gavinsimpson gavinsimpson commented Sep 20, 2019

Thanks for the revised example; I was likely missing some subtleties when implementing this. mgcv can fit such a range of models and this was worked on for simpler models; guess I need to expand the test suites to include some more complex examples.

I’ll take a look at why this is failing.

@gavinsimpson gavinsimpson reopened this Sep 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.