Lineribbons for random quantities and similar problems #179
I believe Michael Betancourt has examples of plots like what you're describing (see e.g. Step 14 here), though he's not a ggplot user so they wouldn't have been made using it ;). I think a workflow for making these would definitely fit in ggdist. I've thought about implementing similar things in the past, like so-called probability boxes (see #45), which are basically envelopes around a set of CDFs. What I would probably want is something that generalizes the current slab stat / geometry into a slab that has a lineribbon around it, and then allows people to add a lineribbon to either the pdf or cdf (or some function of these I suppose), where the pdf may be estimated using either a kernel density estimator or a histogram.
Thanks for the link to Michael Betancourt's case study. For histograms, things should be a lot easier than for kernel density estimates. Perhaps histograms even fit into the existing lineribbon framework? And perhaps ECDF plots as well (maybe this is what you wanted to point out with probability boxes)? So perhaps kernel density estimates are really the only hard special case. If that's the case, I would be happy with the other solutions mentioned above, even though a solution for kernel density estimates would be nice as well, of course. What you suggest for the implementation/user interface sounds reasonable to me, although I have to admit that I'm not too familiar (yet) with the slab stat / geom.
exactly :)
I'm not sure what the issue would be for KDEs: if you use the same x grid to generate each density, it should be straightforward to calculate the ribbon (unless I'm missing something?)
Yes, if the same x grid is used for the KDEs, then ribbons for them should be as "easy" as for other curves (in particular, as easy as for histograms and ECDF plots, for which I was implicitly assuming the same x grid to be used across all curves as well). I'm not familiar with the details of KDEs and somehow was assuming that, in general, the x grid would differ between the different KDEs drawn in such a plot.
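To illustrate the shared-grid point for ECDFs, here is a minimal base R sketch. It is not ggdist functionality; the simulated `draws` matrix, the grid size of 101, and the 90% level are all assumptions made for illustration. Each draw's ECDF is evaluated on one common grid, and pointwise quantiles across draws give the band.

```r
set.seed(1234)

# Hypothetical setup: 500 draws of 40 values each, one draw per column
draws <- matrix(rnorm(20000), nrow = 40, ncol = 500)

# One x grid shared by all draws
grid <- seq(min(draws), max(draws), length.out = 101)

# Evaluate every draw's ECDF on the grid: rows = grid points, columns = draws
cdfs <- apply(draws, 2, function(d) ecdf(d)(grid))

# Pointwise 5%, 50% and 95% quantiles across draws at each grid point
band <- apply(cdfs, 1, quantile, probs = c(0.05, 0.5, 0.95))
```

The rows of `band` (lower limit, median, upper limit) can then be drawn against `grid`, for example as a ribbon plus a line. Note that this is a pointwise band, not a simultaneous envelope for the whole curve.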
Right, typically you might get something like this:

    library(ggplot2)
    library(ggdist)

    set.seed(1234)
    # 500 draws of 40 values each (draw is recycled along x)
    df = data.frame(x = rnorm(20000), draw = 1:500)
    # one density per draw; each is computed over its own range
    df |> ggplot(aes(x, group = draw)) +
      stat_slab(fill = NA, color = "black", alpha = 0.1, density = "unbounded")

ggdist does let you ensure that the densities fill the full scale using

    set.seed(1234)
    df = data.frame(x = rnorm(20000), draw = 1:500)
    df |> ggplot(aes(x, group = draw)) +
      stat_slab(fill = NA, color = "black", alpha = 0.1, density = "unbounded", trim = FALSE, expand = TRUE)

When using

    library(dplyr)

    # shared limits, so every density is evaluated on the same x grid
    from = min(df$x)
    to = max(df$x)
    df |>
      group_by(draw) |>
      # one KDE per draw, returned as rows of (x, y)
      reframe(with(density(x, from = from, to = to), data.frame(x, y))) |>
      ggplot(aes(x, y)) +
      stat_lineribbon() +
      scale_fill_brewer()

Something like this might be sufficient for what you want in bayesplot? For ggdist, I'd like this stat/geom to act similarly to stat_slab, i.e. to allow another variable to be mapped onto the y axis, which means a bit more work... :)
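The same shared-grid idea should carry over to the histograms mentioned earlier in the thread. The following is a rough base R sketch under stated assumptions: the simulated `draws` matrix and the bin width of 0.5 are made up for illustration, and nothing here is an existing ggdist or bayesplot feature. Using one set of breaks for every draw yields multiple y values per bin midpoint, which is the shape of data a lineribbon needs.

```r
set.seed(1234)

# Hypothetical setup: 500 draws of 40 values each, one draw per column
draws <- matrix(rnorm(20000), nrow = 40, ncol = 500)

# Shared breaks covering the full range of all draws (bin width is an assumption)
breaks <- seq(floor(min(draws)), ceiling(max(draws)), by = 0.5)
mids <- head(breaks, -1) + diff(breaks) / 2

# Histogram density per draw on the shared breaks: rows = bins, columns = draws
dens <- apply(draws, 2, function(d) hist(d, breaks = breaks, plot = FALSE)$density)

# Long format: one row per (bin midpoint, draw) pair
long <- data.frame(
  x = rep(mids, times = ncol(dens)),
  y = as.vector(dens),
  draw = rep(seq_len(ncol(dens)), each = nrow(dens))
)
```

A data frame like `long` could then be passed to `ggplot(aes(x, y)) + stat_lineribbon()` in the same way as the KDE data frame above.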
Great, thanks a lot!
First of all: Thank you very much for developing this great package!
I have a feature request which is related to Bayesian posterior predictive checks (PPCs), but which might also be helpful in other settings.
In "overlay" PPCs with a large number of posterior draws, I often find that rendering such a plot takes very long, due to the large number of separate lines to plot (I guess). This affects not just the rendering within RStudio's "Plot" pane, but also saving the plot to a PDF file and then opening that PDF file in a PDF viewer. Here is an example, adapted from the ?bayesplot::ppc_dens_overlay examples:

That's why I've been thinking about some kind of a "lineribbon" plot in such settings, i.e., the data ($y$) line gets plotted as before (e.g., a kernel density estimate of the observed response values), but the generated response values ($y_{\text{rep}}$) are not drawn as one line per posterior draw, but as a shaded area with some pre-specified coverage probability (defaulting, e.g., to 90%) or as a gradient-colored (possibly "ramped") ribbon. That would also allow using the full number of posterior draws instead of having to choose a subset of them.
I'm not sure about the best way to solve this mathematically, nor about the best way to implement it, so I guess some work needs to be done on that first.
Furthermore, I'm not sure if ggdist is really the best place for this; bayesplot might be another good place. But I've recently found the lineribbon plots here in ggdist, so I thought the feature request might fit in here. And as I said above, this feature could also be useful for plots other than PPCs, which would be another argument for having it in ggdist and not in bayesplot. In any case, I'm also tagging @jgabry in case he has already thought about this as well.
The reason why I think the existing lineribbon plots cannot be used for this is that they require multiple y-axis values for each x-axis value, but in PPCs (and possibly other settings), we usually don't have that (because we have random quantities on the x-axis).
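One possible way to obtain multiple y-axis values per x-axis value, sketched in base R: evaluate the kernel density estimate of every $y_{\text{rep}}$ draw on one shared grid and take pointwise quantiles. Everything in this sketch is an assumption for illustration (the simulated `y` and `yrep`, the grid size of 256, and the 90% level); it is not how ggdist or bayesplot implement anything.

```r
set.seed(1234)

# Stand-ins for observed data and posterior predictive draws (one draw per column)
y <- rnorm(40)
yrep <- matrix(rnorm(20000), nrow = 40, ncol = 500)

# Shared limits so that all KDEs are evaluated at the same x values
from <- min(y, yrep)
to <- max(y, yrep)
kde <- function(v) density(v, from = from, to = to, n = 256)

grid <- kde(y)$x     # the shared x grid
dens_y <- kde(y)$y   # density of the observed data

# KDE of each draw on the shared grid: rows = grid points, columns = draws
dens_rep <- apply(yrep, 2, function(v) kde(v)$y)

# Pointwise 90% band across draws
band <- apply(dens_rep, 1, quantile, probs = c(0.05, 0.95))
```

Here `dens_y` would be drawn as the single data line and the two rows of `band` as the shaded area, so the plot contains one ribbon instead of 500 lines. The coverage is pointwise at each x value, not simultaneous over the whole curve.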