plots for model predictions without y variable #151
Comments
I’ve been wanting to do this for a while. Thanks for opening the issue. If y is not included then the prefix |
Introducing |
This is the next major thing I want to work on getting into bayesplot. I think the prefix The ppd functions will share most of the same code as the existing ppc functions so both should rely on the same internals wherever possible. |
@tjmahr I can work on the implementations of these, but just want to check that you’re ok with the plan (or if you have suggestions or a totally different proposal). Here’s an example function signature for these ppd functions:
ppd_dens_overlay(ypred, ..., same_args)
|
So, here's the implementation of
To implement this, we want to
Is that basically what you had in mind? Following the format of the other functions? I don't want us to be too clever and have functions that switch between ppd_ and ppc_ styles based on whether y is NULL. I know the temptation is there, but I worry it would make things harder to change down the road. |
Yeah basically just what you said is what I was planning. But something to consider: the existing So conceptually something like the following would be really elegant:
# pseudocode (we don't actually have overlaid_densities() and add_y_density())
ppd_dens_overlay <- function(ypred, ...) {
  data <- ppd_data(ypred, ...)
  ggplot(data, ...) +
    overlaid_densities(...)
}
ppc_dens_overlay <- function(y, yrep, ...) {
  ppd_dens_overlay(ypred = yrep, ...) +
    add_y_density(y)
}
But it could be that it makes more practical sense to do this (again this is some hybrid of real and pseudocode for the moment):
# don't call ppd_dens_overlay, just call the same internal overlaid_densities() helper
ppc_dens_overlay <- function(y, yrep, ...) {
  data <- ppc_data(y, yrep, ...)
  ggplot(filter(data, !is_y)) +
    overlaid_densities(...) +
    add_y_density(filter(data, is_y))
} |
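To make the first of the two pseudocode options concrete, here is a minimal sketch in which the ppc plot is built by calling the ppd plot and adding a layer for y. It assumes ggplot2 is installed. The names sketch_ppd_dens_overlay(), sketch_ppc_dens_overlay(), and long_draws() are hypothetical and invented for illustration; this is not bayesplot's implementation, and stat_density() stands in for the overlaid_densities() and add_y_density() helpers that the pseudocode says do not exist.

```r
library(ggplot2)

# Hypothetical helper: reshape a draws-by-observations matrix to long format.
# as.vector() reads the matrix column by column, so rep_id cycles fastest.
long_draws <- function(ypred) {
  data.frame(
    rep_id = rep(seq_len(nrow(ypred)), times = ncol(ypred)),
    value  = as.vector(ypred)
  )
}

# Predictive-distribution plot: one density line per draw, no observed data.
sketch_ppd_dens_overlay <- function(ypred, alpha = 0.5) {
  ggplot(long_draws(ypred), aes(x = value, group = rep_id)) +
    stat_density(geom = "line", position = "identity",
                 colour = "steelblue", alpha = alpha)
}

# Predictive-check plot: reuse the ppd plot, then overlay the density of y.
# inherit.aes = FALSE keeps the y layer from looking for rep_id in its data.
sketch_ppc_dens_overlay <- function(y, yrep, ...) {
  sketch_ppd_dens_overlay(ypred = yrep, ...) +
    stat_density(data = data.frame(value = y), mapping = aes(x = value),
                 inherit.aes = FALSE, geom = "line", position = "identity",
                 colour = "black")
}
```

This composition works here because the y layer only needs to be drawn on top. As the thread goes on to note, that is not true for every plot type.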
I think I totally agree with this and my goal is definitely to do this in a way that makes it as easy as possible to maintain everything going forward, although that may require some non-trivial refactoring. But just so I'm clear on what you mean, what would be the issue with something like the following?
# exported
ppd_data <- function(ypred, ...) {
  .ppd_data(predictions = ypred, observations = NULL, ...)
}
# exported
ppc_data <- function(y, yrep, ...) {
  .ppd_data(predictions = yrep, observations = y, ...)
}
# internal
.ppd_data <- function(predictions, observations = NULL, ...) {
  # if observations is NULL don't include `y` stuff in the returned object
}
|
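As a rough illustration of the data-level sharing proposed above, the following base R sketch fills in one possible body for the internal helper. The function names (.sketch_long_data(), sketch_ppd_data(), sketch_ppc_data()) and the column names are assumptions made for this example and may not match what bayesplot actually returns.

```r
# Hypothetical internal helper: long-format data, with rows for y only when
# observations are supplied.
.sketch_long_data <- function(predictions, observations = NULL) {
  stopifnot(is.matrix(predictions))
  n_draws <- nrow(predictions)
  n_obs <- ncol(predictions)

  # as.vector() reads the matrix column by column, so rep_id cycles fastest.
  out <- data.frame(
    rep_id = rep(seq_len(n_draws), times = n_obs),
    obs_id = rep(seq_len(n_obs), each = n_draws),
    is_y   = FALSE,
    value  = as.vector(predictions)
  )

  if (!is.null(observations)) {
    stopifnot(length(observations) == n_obs)
    y_rows <- data.frame(
      rep_id = NA_integer_,
      obs_id = seq_len(n_obs),
      is_y   = TRUE,
      value  = as.numeric(observations)
    )
    out <- rbind(y_rows, out)
  }
  out
}

# Thin wrappers, mirroring the ppd_data()/ppc_data() split in the proposal.
sketch_ppd_data <- function(ypred) {
  .sketch_long_data(predictions = ypred, observations = NULL)
}

sketch_ppc_data <- function(y, yrep) {
  .sketch_long_data(predictions = yrep, observations = y)
}
```

With a 3 x 4 matrix of draws, sketch_ppd_data() returns 12 rows with is_y all FALSE, and sketch_ppc_data() returns the same rows plus 4 rows for y.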
That seems fine at the data level. It's the problem of creating six different plots from an internal plotting function (like |
Ok I see what you mean, thanks. I agree about avoiding things like what .mcmc_trace currently does. It’s probably a good idea to refactor and break that apart at some point to make it easier to work with. |
The internal .ppc_intervals is similar to .mcmc_trace in that it is responsible for way too many plots. I’m going to get rid of it as part of this process. I’ve started breaking it up into helper functions that can be shared by the relevant individual ppc and ppd functions. |
After looking into it, for some ppc functions it would work to call the ppd function inside the ppc function and then add the y stuff, but for others it’s either not possible or would require starting the implementation from scratch. For However, I realized there’s a different opportunity for reducing code duplication. Without too much work, for both ppc and ppd plots we can have the the
ppc_ribbon_grouped <- function(y, yrep, x, group, ..., facet_args, whatever...) {
  call <- ungroup_call(match.call(expand.dots = FALSE))
  eval(call) + add_facet_layer(facet_args)
}
So basically each It kind of sounds complicated in words but the code is really clean and simple. Anyway, not saying we have to do it this way, just exploring the possibility.
ungroup_call() is a small helper function: Line 8 in 5b1a937
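Since the body of ungroup_call() is not shown in the thread, here is a minimal base R sketch of the general call-rewriting technique, under stated assumptions. drop_group_args() is a hypothetical stand-in, not the real helper, and the "plotting" functions return data frames so that the example runs without ggplot2.

```r
# Ungrouped function: stands in for a plotting function such as ppc_ribbon().
ribbon_summary <- function(y, yrep, x = seq_along(y)) {
  data.frame(x = x, y = y, yrep_mean = colMeans(yrep))
}

# Hypothetical stand-in for ungroup_call(): point the captured call at the
# ungrouped function and drop the arguments only the grouped version knows.
drop_group_args <- function(call, target) {
  call[[1]] <- as.name(target)
  call$group <- NULL
  call$facet_args <- NULL
  call
}

# Grouped version: rewrite its own call, evaluate it in the caller's frame,
# then add the grouping information (a facet layer in the real plots).
ribbon_summary_grouped <- function(y, yrep, x = seq_along(y), group,
                                   facet_args = list()) {
  call <- drop_group_args(match.call(), "ribbon_summary")
  out <- eval(call, parent.frame())
  out$group <- group
  out
}
```

The call is evaluated in parent.frame() because match.call() captures the argument expressions as the caller wrote them, so they have to be looked up where the caller's variables live.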
|
Didn’t mean to close this. Reopening. |
This is pretty tricky stuff. Not a judgment, just an observation. |
Do you mean constructing calls in general is tricky or something particular about this use case? I'm not attached to doing it this way, just trying it out. We can drop the
ppc_intervals_grouped <- function(y, yrep, x = NULL, group = NULL, ..., facet_args = list(), other args) {
  g <- ppc_intervals(y, yrep, x, group, called_from_internal = TRUE)
  g + intervals_group_layer(facet_args)
}
That still requires |
Hmm, I take it back. My first hunch would be something like this,
But that requires passing along all the arguments (except |
Ok I’ll proceed that way for now |
On another note, something else I’m doing as a part of this (because it makes sense to do at the same time as the ppd stuff) is adding all the ppc_*_data() functions that we are still missing. |
Hi, I also wanted to make new predictions, or prior predictive checks with the |
@jgabry do you need help for this on the https://github.com/stan-dev/bayesplot/compare/ppd-functions branch? |
I need to get back to working on this! @tjmahr If you have time and want to help that branch move forwards that would be great! |
@tjmahr So I'm finally getting back to working on this! I just fixed a bunch of merge conflicts and cleaned up some stuff. Currently this branch has added all the PPD versions for the "distributions", "intervals", and "test-statistics" categories (and also a bunch of missing |
Sounds good. |
Ok I'll get a PR ready. It's going to be a massive one (sorry!) |
ppc_* are great functions for doing model checking on the fitted data. However, it would be great to also allow these functions to visualize predictions in the absence of any data.
Possibilities to achieve this would be
- y argument optional in these functions
- x argument part of a number of ppc functions (as we usually do predictions as a function of some variable x)
Generally, the x argument in the ppc functions is great and it could be available in more places.