Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support all pp_check types for censored y #1327

Merged

Conversation

hdrab127
Copy link
Contributor

Hopefully addresses #1326.

The diff got a bit confused, it's mostly just moving the ppc_arg assignments, like group,lw,x,psis above the censor check, then subsetting them there if they are used.

Also allows censored values for the type = "km_overlay" plot (https://mc-stan.org/bayesplot/reference/PPC-censoring.html).

sysdata.rda just has brms:::brmsfit_example7 included for the censored tests, I didn't re-run the other example fits.

Re-running some of the examples in the issue:

library(brms)
data(kidney)
fit <- brm(
  formula = time | cens(censored) ~ age * sex + disease + (1 | patient),
  data = kidney, family = lognormal(),
  prior = c(set_prior("normal(0, 5)", class = "b"),
            set_prior("cauchy(0, 2)", class = "sd")),
  seed = 0, warmup = 1000, iter = 2000, chains = 4,
  control = list(adapt_delta = 0.95)
)
#> Compiling Stan program...
#> Start sampling
#> ...

pp_check(fit, type = 'dens_overlay_grouped', group = 'sex')
#> Using 10 posterior draws for ppc type 'dens_overlay_grouped' by default.
#> Warning: Censored responses are not shown in 'pp_check'.

pp_check(fit, status_y = kidney$censored, type = 'km_overlay')
#> Using 10 posterior draws for ppc type 'km_overlay' by default.

pp_check(fit, x = 'patient', group = 'sex', type = 'intervals_grouped')
#> Using all posterior draws for ppc type 'intervals_grouped' by default.
#> Warning: Censored responses are not shown in 'pp_check'.

pp_check(fit, type = 'loo_intervals')
#> Using all posterior draws for ppc type 'loo_intervals' by default.
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

#> Warning: Censored responses are not shown in 'pp_check'.
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

pp_check(fit, type = 'loo_pit_overlay')
#> Using 10 posterior draws for ppc type 'loo_pit_overlay' by default.
#> Warning: Not enough tail samples to fit the generalized Pareto distribution in some or all columns of matrix of log importance ratios. Skipping the following columns: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... [66 more not printed].

#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> Warning: Censored responses are not shown in 'pp_check'.
#> NOTE: The kernel density estimate assumes continuous observations and is not optimal for discrete observations.

pp_check(fit, type = 'loo_pit_qq')
#> Using 10 posterior draws for ppc type 'loo_pit_qq' by default.
#> Warning: Not enough tail samples to fit the generalized Pareto distribution in some or all columns of matrix of log importance ratios. Skipping the following columns: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... [66 more not printed].
#> Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
#> Warning: Censored responses are not shown in 'pp_check'.

Created on 2022-03-23 by the reprex package (v2.0.1)

Copy link
Owner

@paul-buerkner paul-buerkner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR! I think it looks good mostly and I have just some minor things to change.

R/pp_check.R Outdated Show resolved Hide resolved
@@ -84,16 +84,24 @@ brmsfit_example6 <- brm(
stan_model_args = stan_model_args, rename = FALSE
)

brmsfit_example7 <- brm(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These internal models take quite some space so I would like to avoid saving another one. Can you add this test to the local tests in tests/local/? Perhaps there is already a censored model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea thanks, have reverted those changes and moved. Though when running the local tests beforehand I noticed the add_rstan_model function fails for that kidney survival example. Is this the same on your computer, and if so, is it okay to ignore for this PR and handle separately?

Test fail:

-- Error (tests.models_new.R:89:3): Survival model from brm doc works correctly ---------------------------
Error in `stanc(file = file, model_code = model_code, model_name = model_name, 
    verbose = verbose, obfuscate_model_name = obfuscate_model_name, 
    allow_undefined = allow_undefined, isystem = isystem)`: failed to parse Stan model '530b15df4662e9b168e0f021510d8a97' due to the above error.
Backtrace:
 1. brms::add_rstan_model(fit3)
      at tests.models_new.R:89:2
 4. rstan::stan(...)
      at brms/R/brmsfit-helpers.R:761:4
 5. rstan::stan_model(...)
 6. rstan::stanc(...)

Checking internally looks like the arrays from the cmdstanr backend:

rstan::stan(
+       model_code = stancode(x, threads = threading()),
+       data = standata(x), chains = 0
+     )
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
 error in 'modeldb4faf53a2_530b15df4662e9b168e0f021510d8a97' at line 8, column 2
  -------------------------------------------------
     6:   int<lower=1> N; // total number of observations
     7:   vector[N] Y; // response variable
     8:   array[N] int<lower=-1, upper=2> cens; // indicates censoring
         ^
     9:   int<lower=1> K; // number of population-level effects
  -------------------------------------------------

PARSER EXPECTED: <one of the following:
  a variable declaration, beginning with type,
      (int, real, vector, row_vector, matrix, unit_vector,
       simplex, ordered, positive_ordered,
       corr_matrix, cov_matrix,
       cholesky_corr, cholesky_cov
  or '}' to close variable declarations>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '530b15df4662e9b168e0f021510d8a97' due to the above error.

Seems to fix if I add rtsan backend to the recompiling code (

model_code = stancode(x, threads = threading()),
):

model_code = stancode(x, threads = threading(), backend = 'rstan'),
                                              ^^^^^^^^^^^^^^^^^^^

But I can submit that in different request? Or you check/change? Whatever's easiest

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just because of new array syntax in the latest version of Stan (>= 2.29.0). You can safely ignore that here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fixed it on master.

R/pp_check.R Show resolved Hide resolved
@hdrab127 hdrab127 force-pushed the issue-pp-check-grouped-cens branch from 912072e to a75cce3 Compare March 23, 2022 19:35
@hdrab127
Copy link
Contributor Author

hdrab127 commented Mar 23, 2022

Sorry I wasn't sure the cleanest way to revert the sysdata.rda file back to how it was before so did a reset then force push. Can see the requested changes from your last check to now here.

@codecov-commenter
Copy link

Codecov Report

Merging #1327 (a75cce3) into master (cffd336) will decrease coverage by 3.15%.
The diff coverage is 76.33%.

@@            Coverage Diff             @@
##           master    #1327      +/-   ##
==========================================
- Coverage   86.31%   83.15%   -3.16%     
==========================================
  Files          42       70      +28     
  Lines       15511    18958    +3447     
==========================================
+ Hits        13388    15765    +2377     
- Misses       2123     3193    +1070     
Impacted Files Coverage Δ
R/brmsfit-methods.R 77.72% <ø> (-6.21%) ⬇️
R/families.R 87.03% <ø> (-0.71%) ⬇️
R/family-lists.R 100.00% <ø> (ø)
R/formula-ac.R 87.61% <ø> (ø)
R/formula-ad.R 93.75% <ø> (ø)
R/formula-cs.R 100.00% <ø> (ø)
R/formula-gp.R 72.95% <ø> (ø)
R/formula-re.R 92.87% <ø> (ø)
R/formula-sm.R 94.87% <ø> (ø)
R/formula-sp.R 89.84% <ø> (ø)
... and 89 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9410572...a75cce3. Read the comment docs.

@paul-buerkner
Copy link
Owner

Thank you! Looks good to me! Merging now!

@paul-buerkner paul-buerkner merged commit 5b26151 into paul-buerkner:master Mar 24, 2022
@hdrab127 hdrab127 deleted the issue-pp-check-grouped-cens branch March 24, 2022 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants