plots for model predictions without y variable #151

wds15 · 2018-05-14T18:56:49Z

ppc_* are great functions for doing model checking on the data fitted. However, it would be great to also allow these function to visualize predictions in absence of any data.

Possibilites to achieve this would be

make the y argument optional in these functions
expand the mcmc_* functions with things like the x argument part of a number of ppc function (as we usually do predictions as a function of some variable x)

Generally, the x argument in the ppc functions is great and it could be available in more places.

The text was updated successfully, but these errors were encountered:

jgabry · 2018-05-14T23:03:34Z

I’ve been wanting to do this for a while. Thanks for opening the issue. If y is not included then the prefix ppc_ isn’t quite appropriate anymore. pp_ I think makes more sense. @tjmahr maybe we could have pp_ versions that don’t require y and then have the ppc_ functions be wrappers for those that also require y and have additional y-specific arguments when necessary? It would be easier to just not require y in the existing functions, but it’s really not a ppc anymore in that case, it’s just a visualization of the posterior predictive distribution.

wds15 · 2018-05-15T06:32:11Z

Introducing pp_ functions sounds like a clean way of implementing it... at the cost of having more exposed functions. This is a decision maintainers need to make, but as a user that would make me very happy.

related to #151 [ci skip]

jgabry · 2019-05-22T22:38:41Z

This is the next major thing I want to work on getting into bayesplot. I think the prefix ppd_ makes sense for plots just of the posterior (or prior) predictive distribution.

The ppd functions will share most of the same code as the existing ppc functions so both should rely on the same internals wherever possible.

jgabry · 2019-05-23T18:54:54Z

@tjmahr I can work on the implementations of these, but just want to check that you’re ok with the plan (or if you have suggestions or a totally different proposal). Here’s an example function signature for these ppd functions:

ppd_dens_overlay(ypred, ..., same_args)

prefix is ppd instead of ppc
argument name is ypred
same_args won’t actually be there, we’ll have the same arguments as the corresponding ppc function but without the y and yrep

tjmahr · 2019-05-24T13:56:48Z

So, here's the implementation of ppc_dens_overlay()

  check_ignored_arguments(...)
  data <- ppc_data(y, yrep)

  ggplot(data) +
    aes_(x = ~ value) +
    stat_density(
      aes_(group = ~ rep_id, color = "yrep"),
      data = function(x) dplyr::filter(x, !.data$is_y),
      geom = "line",
      position = "identity",
      size = size,
      alpha = alpha,
      trim = trim,
      bw = bw,
      adjust = adjust,
      kernel = kernel,
      n = n_dens
    ) +
    stat_density(
      aes_(color = "y"),
      data = function(x) dplyr::filter(x, .data$is_y),
      geom = "line",
      position = "identity",
      lineend = "round",
      size = 1,
      trim = trim,
      bw = bw,
      adjust = adjust,
      kernel = kernel,
      n = n_dens
    ) +
    scale_color_ppc_dist() +
    bayesplot_theme_get() +
    xlab(y_label()) +
    dont_expand_axes() +
    yaxis_title(FALSE) +
    xaxis_title(FALSE) +
    yaxis_text(FALSE) +
    yaxis_ticks(FALSE)

To implement this, we want to

have a ppd_data() function create the tidy dataframe
remove the second stat_density() layer
change the colors and labels to use something like y_pred instead of y_rep.

Is that basically what you had in mind? Following the format of the other functions?

I don't want us to be too clever and have functions that switch between ppd_ and ppc_ styles based on whether y is NULL. I know the temptation is there but I worry it make things harder to change down the road.

jgabry · 2019-05-24T15:43:59Z

Yeah basically just what you said is what I was planning.

But something to consider: the existing ppc_ functions can be thought of as just modifications to a ppd_ plot, adding stuff about y (and changing ypred label to yrep). So at least in theory it would make sense to have, for example, ppc_dens_overlay() first call ppd_dens_overlay() and then add the y stuff to the plot.

So conceptually something like the following would be really elegant:

# pseudocode (we don't actually have overlaid_densities() and add_y_density())
ppd_dens_overlay <- function(ypred, ...) {
   data <- ppd_data(ypred, ...)
   ggplot(data, ...) + 
     overlaid_densities(...)
}
ppc_dens_overlay <- function (y, yrep, ...) {
  ppd_dens_overlay(ypred = yrep, ...) + 
    add_y_density(y)
}

But it could be that it makes more practical sense to do this (again this is some hybrid of real and pseudocode for the moment):

# don't call ppd_dens_overlay, just call the same internal overlaid_densities() helper
ppc_dens_overlay <- function(y, yrep, ...) {
  data <- ppc_data(y, yrep, ...)
  ggplot(filter(data, !is_y)) + 
     overlaid_densities(...) + 
     add_y_density(filter(data, is_y))
}

jgabry · 2019-05-24T16:11:41Z

I don't want us to be too clever and have functions that switch between ppd_ and ppc_ styles based on whether y is NULL. I know the temptation is there but I worry it make things harder to change down the road.

I think I totally agree with this and my goal is definitely to do this in a way that makes it as easy as possible to maintain everything going forward, although that may require some non-trivial refactoring.

But just so I'm clear on what you mean, what would be the issue with something like the following?

# exported
ppd_data <- function(ypred, ...) {
  .ppd_data(predictions = ypred, observations = NULL, ...)
}
# exported
ppc_data <- function(y, yrep, ...) {
  .ppd_data(predictions = yrep, observations = y, ...)
}

# internal
.ppd_data <- function(predictions, observations = NULL, ...) {
  # if observations is NULL don't include `y` stuff in the returned object
}

tjmahr · 2019-05-24T16:59:42Z

That seems fine at the data level. It's problem of creating six-different plots from an internal plotting function (like .mcmc_trace()... I don't know if it's really six. Bear with me.) that I worry about. All of those plots are really tightly coupled and it's hard to work through. In comparison, it's really clear what does what in the ppc_dens_overlay() code. I think using helper functions, scales and geoms can help achieve consistency without over-coupling.

jgabry · 2019-05-24T18:44:14Z

Ok I see what you mean, thanks. I agree about avoiding things like what .mcmc_trace currently does. It’s probably a good idea to refactor and break that apart at some point to make it easier to work with.

advances #151

jgabry · 2019-05-26T00:31:50Z

The internal .ppc_intervals is similar to .mcmc_trace in that it responsible for way too many plots. I’m going to get rid of it as part of this process. I’ve started breaking it up into helper functions that can be shared by the relevant individual ppc and ppd functions.

jgabry · 2019-05-28T00:04:24Z

After looking into it, for some ppc functions it would work to call the ppd function inside the ppc function and then add the y stuff, but for others it’s either not possible or would require starting the implementation from scratch. For
consistency,I will just rely on shared helper functions and never call ppd inside a ppc.

However, I realized there’s a different opportunity for reducing code duplication. Without too much work, for both ppc and ppd plots we can have the the _grouped functions call their ungrouped counterpart and then add the facet layer.

ppc_ribbon_grouped <- function(y, yrep, x, group, ... facet_args, whatever...) {
call <- ungroup_call(match.call(expand.dots=FALSE)
eval(call) + add_facet_layer(facet_args)
}

So basically each _grouped function can be reduced to a few lines of code, which seems preferable to copying everything when the only difference is adding facets. The only change to the ungrouped function would be to have it allow the group argument via ... so it can create the grouped data frame when called internally by
the _grouped function.

It kind of sounds complicated in words but the code is really clean and simple.

Anyway, not saying we have to do it this way, just exploring the possibility.

——

ungroup_call() is a small helper function:

bayesplot/R/helpers-ppc.R

Line 8 in 5b1a937

ungroup_call <- function(call) {

jgabry · 2019-05-28T20:14:33Z

Didn’t mean to close this. Reopening.

tjmahr · 2019-05-29T10:45:33Z

ungroup_call <- function(fn, call) {
  args <- rlang::call_args(call)
  args$called_from_internal <- TRUE
  args$... <- NULL
  rlang::call2(.fn = fn, !!!args, .ns = "bayesplot")
}

This is pretty tricky stuff. Not a judgment, just an observation.

jgabry · 2019-05-29T15:41:31Z

Do you mean constructing calls in general is tricky or something particular about this use case? I'm not attached to doing it this way, just trying it out. We can drop the ungroup_call stuff and just to do something like this, which is much clearer:

ppc_intervals_grouped <- function(y, yrep, x = NULL, group = NULL, ..., facet_args = list(), other args) {
  g <- ppc_intervals(y, yrep, x, group, called_from_internal = TRUE)
  g + intervals_group_layer(facet_args)
}

That still requires ppc_intervals() to know it's being called from ppc_intervals_grouped() but avoids creating a call object. How do you feel about that?

tjmahr · 2019-05-29T15:59:01Z

Hmm, I take it back.

My first hunch would be something like this,

f <- function(x, group, ...) {
  g(x, ...)
}

g <- function(x, ...) {
  print(x)
}

f(1, "a", "things")

But that requires passing along all the arguments (except group) to g() which is tedious and gives you something else to update if the arguments to f() or g() change. Your way seems like the smarter, more automatic way to go about the problem.

jgabry · 2019-05-29T18:08:00Z

Ok I’ll proceed that way for now

jgabry · 2019-05-29T21:25:30Z

On another note, something else I’m doing as a part of this (because it makes sense to do at the same time as the ppd stuff) is adding all the ppc_*_data() functions that we are still missing.

bnicenboim · 2019-11-28T13:06:43Z

Hi, I also wanted to make new predictions, or prior predictive checks with the ppc_ functions, and I got stuck having to have y that are not NAs.
One super simple solution was to just change the internal function validate_y so that it accepts NA. Maybe I'm being too naive, but why is this a problem?

tjmahr · 2019-12-12T16:09:24Z

@jgabry do you need help for this on the https://github.com/stan-dev/bayesplot/compare/ppd-functions branch?

jgabry · 2019-12-20T18:18:48Z

I need to get back to working on this! @tjmahr If you have time and want to help that branch move forwards that would be great!

jgabry · 2020-06-16T19:15:38Z

@tjmahr So I'm finally getting back to working on this! I just fixed a bunch of merge conflicts and cleaned up some stuff. Currently this branch has added all the PPD versions for the "distributions", "intervals", and "test-statistics" categories (and also a bunch of missing _data() functions). I wonder if it makes sense for us to try to get this merged soon, and then the remaining PPD categories can be added after. What do you think?

tjmahr · 2020-06-16T19:17:54Z

Sounds good.

jgabry · 2020-06-16T19:25:53Z

Ok I'll get a PR ready. It's going to be a massive one (sorry!)

jgabry added feature new plot labels May 14, 2018

jgabry added a commit that referenced this issue May 8, 2019

ppd_data, ppd_hist, ppd_boxplot

eb53604

related to #151 [ci skip]

jgabry changed the title ~~extend ppc functions to plots for model predictions~~ plots for model predictions without y variable May 22, 2019

jgabry mentioned this issue May 23, 2019

Document that PPC functions are also for prior PCs #196

Open

jgabry added this to In progress in PPD module May 25, 2019

jgabry added a commit that referenced this issue May 25, 2019

ppd_ecdf_overlay, ppd_freqpoly, ppd_freqpoly_grouped

5222279

advances #151

jgabry closed this as completed May 28, 2019

jgabry reopened this May 28, 2019

jgabry mentioned this issue Jun 16, 2020

add ppc_*_overlay_grouped functions #212

Merged

jgabry mentioned this issue Jun 16, 2020

Add new PPD module for plots without y variable #222

Merged

jgabry closed this as completed in #222 Mar 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

plots for model predictions without y variable #151

plots for model predictions without y variable #151

wds15 commented May 14, 2018

jgabry commented May 14, 2018

wds15 commented May 15, 2018

jgabry commented May 22, 2019

jgabry commented May 23, 2019 •

edited

Loading

tjmahr commented May 24, 2019

jgabry commented May 24, 2019

jgabry commented May 24, 2019 •

edited

Loading

tjmahr commented May 24, 2019 •

edited

Loading

jgabry commented May 24, 2019

jgabry commented May 26, 2019

jgabry commented May 28, 2019 •

edited

Loading

jgabry commented May 28, 2019

tjmahr commented May 29, 2019

jgabry commented May 29, 2019

tjmahr commented May 29, 2019

jgabry commented May 29, 2019

jgabry commented May 29, 2019

bnicenboim commented Nov 28, 2019

tjmahr commented Dec 12, 2019

jgabry commented Dec 20, 2019

jgabry commented Jun 16, 2020 •

edited

Loading

tjmahr commented Jun 16, 2020

jgabry commented Jun 16, 2020

plots for model predictions without y variable #151

plots for model predictions without y variable #151

Comments

wds15 commented May 14, 2018

jgabry commented May 14, 2018

wds15 commented May 15, 2018

jgabry commented May 22, 2019

jgabry commented May 23, 2019 • edited Loading

tjmahr commented May 24, 2019

jgabry commented May 24, 2019

jgabry commented May 24, 2019 • edited Loading

tjmahr commented May 24, 2019 • edited Loading

jgabry commented May 24, 2019

jgabry commented May 26, 2019

jgabry commented May 28, 2019 • edited Loading

jgabry commented May 28, 2019

tjmahr commented May 29, 2019

jgabry commented May 29, 2019

tjmahr commented May 29, 2019

jgabry commented May 29, 2019

jgabry commented May 29, 2019

bnicenboim commented Nov 28, 2019

tjmahr commented Dec 12, 2019

jgabry commented Dec 20, 2019

jgabry commented Jun 16, 2020 • edited Loading

tjmahr commented Jun 16, 2020

jgabry commented Jun 16, 2020

jgabry commented May 23, 2019 •

edited

Loading

jgabry commented May 24, 2019 •

edited

Loading

tjmahr commented May 24, 2019 •

edited

Loading

jgabry commented May 28, 2019 •

edited

Loading

jgabry commented Jun 16, 2020 •

edited

Loading