Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ppc_intervals_data() and ppc_ribbon_data() #101

Merged
merged 9 commits into from
Aug 8, 2017

Conversation

tjmahr
Copy link
Collaborator

@tjmahr tjmahr commented Aug 7, 2017

See issue #99.

I added functions ppc_intervals_data() and ppc_ribbon_data(). These return the data that would be plotted by the ppc_intervals family of plotters.

I refactored the internal .ppc_intervals_data() function. Before it stacked replication intervals and observed ys on top of each other and told them apart using a column is_y. That is:

> bayesplot:::.ppc_intervals_data(y, yrep, x = 1:50)
# A tibble: 100 x 5
# Groups:   x [?]
      x   is_y           lo          mid           hi
  <int> <fctr>        <dbl>        <dbl>        <dbl>
1     1   TRUE -1.576958028 -1.576958028 -1.576958028
2     1  FALSE -1.741401157  0.274514926  2.519152806
3     2   TRUE -0.375700240 -0.375700240 -0.375700240
4     2  FALSE -2.462380258 -0.204469907  1.976864727
5     3   TRUE -1.289742511 -1.289742511 -1.289742511
6     3  FALSE -2.623870953 -0.186001644  2.371562257
7     4   TRUE  0.008464543  0.008464543  0.008464543
8     4  FALSE -2.217317808  0.094052618  2.737946453
9     5   TRUE  0.873829347  0.873829347  0.873829347
10    5  FALSE -2.983087056  0.010274642  2.592691328
# ... with 90 more rows

But now, those observed values are stored in a column called y_obs:

> bayesplot:::ppc_intervals_data(y, yrep, x = 1:50)
# A tibble: 50 x 6
   y_id        y_obs     x        lo         mid       hi
  <int>        <dbl> <int>     <dbl>       <dbl>    <dbl>
1     1 -1.576958028     1 -1.741401  0.27451493 2.519153
2     2 -0.375700240     2 -2.462380 -0.20446991 1.976865
3     3 -1.289742511     3 -2.623871 -0.18600164 2.371562
4     4  0.008464543     4 -2.217318  0.09405262 2.737946
5     5  0.873829347     5 -2.983087  0.01027464 2.592691
6     6  0.925084871     6 -2.402087 -0.23957047 2.484187
7     7  0.056003860     7 -2.272656  0.10470365 2.227581
8     8  0.472664991     8 -2.251600  0.07247611 2.527019
9     9 -0.153056388     9 -2.526754  0.02610329 2.337524
10   10  0.215333434    10 -2.425294  0.24194109 2.644617
# ... with 40 more rows

This format makes creating the plots a little easier.

I also updated the unit tests to match this new output format.

I don't like relying on is.missing() when setting default values for missing function arguments. (It's much harder to test during interactive programming.) When I ran into this issue, I instead used the pattern...

f <- function(x = NULL) {
  if (is.null(x) {
    # deal with missing value
  }
}

I also imported rlang and started to use the new system of nonstandard evaluation for the tidyverse.

Visual tests

These are the plots generated by the examples in ?ppc_intervals:

library(bayesplot)
#> This is bayesplot version 1.3.0
#> Plotting theme set to bayesplot::theme_default()
y <- rnorm(50)
yrep <- matrix(rnorm(5000, 0, 2), ncol = 50)

color_scheme_set("brightblue")
ppc_ribbon(y, yrep)

unnamed-chunk-1-1

ppc_intervals(y, yrep)

unnamed-chunk-1-2

color_scheme_set("teal")
year <- 1950:1999
ppc_ribbon(y, yrep, x = year, alpha = 0, size = 0.75) + ggplot2::xlab("Year")

unnamed-chunk-1-3

color_scheme_set("pink")
year <- rep(2000:2009, each = 5)
group <- gl(5, 1, length = 50, labels = LETTERS[1:5])
ppc_ribbon_grouped(y, yrep, x = year, group) +
 ggplot2::scale_x_continuous(breaks = pretty)

unnamed-chunk-1-4

ppc_ribbon_grouped(
y, yrep, x = year, group,
facet_args = list(scales = "fixed"),
alpha = 1,
size = 2
) +
xaxis_text(FALSE) +
xaxis_ticks(FALSE) +
panel_bg(fill = "gray20")

unnamed-chunk-1-5

ppc_data <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data
#> # A tibble: 50 x 6
#>     y_id      y_obs     x         lo         mid        hi
#>    <int>      <dbl> <int>      <dbl>       <dbl>     <dbl>
#>  1     1  2.6106399  2000 -1.9121028 -0.30346588 1.2438269
#>  2     2  0.2924113  2000 -0.9975955  0.24825805 1.5434840
#>  3     3 -0.7725254  2000 -1.6513193 -0.11780907 1.2272995
#>  4     4  0.8634162  2000 -1.1818673  0.23238388 1.6475749
#>  5     5  1.1283432  2000 -1.6518806 -0.48487352 0.9016187
#>  6     6  2.1456789  2001 -1.3075440  0.41361380 1.1956589
#>  7     7  0.1427364  2001 -1.3490220 -0.26051485 0.8740965
#>  8     8 -0.8945021  2001 -1.0472378  0.23367006 1.5635030
#>  9     9 -0.4810363  2001 -0.8543759 -0.02456840 1.4952441
#> 10    10 -0.5175339  2001 -1.3181571  0.04460883 1.5440722
#> # ... with 40 more rows

ppc_group_data <- ppc_intervals_data(y, yrep, x = year, group, prob = 0.5)
ppc_group_data
#> # A tibble: 50 x 7
#>     y_id      y_obs  group     x         lo         mid        hi
#>    <int>      <dbl> <fctr> <int>      <dbl>       <dbl>     <dbl>
#>  1     1  2.6106399      A  2000 -1.9121028 -0.30346588 1.2438269
#>  2     2  0.2924113      B  2000 -0.9975955  0.24825805 1.5434840
#>  3     3 -0.7725254      C  2000 -1.6513193 -0.11780907 1.2272995
#>  4     4  0.8634162      D  2000 -1.1818673  0.23238388 1.6475749
#>  5     5  1.1283432      E  2000 -1.6518806 -0.48487352 0.9016187
#>  6     6  2.1456789      A  2001 -1.3075440  0.41361380 1.1956589
#>  7     7  0.1427364      B  2001 -1.3490220 -0.26051485 0.8740965
#>  8     8 -0.8945021      C  2001 -1.0472378  0.23367006 1.5635030
#>  9     9 -0.4810363      D  2001 -0.8543759 -0.02456840 1.4952441
#> 10    10 -0.5175339      E  2001 -1.3181571  0.04460883 1.5440722
#> # ... with 40 more rows

ppc_loo_intervals() build on top of the ppc_intervals() function, so here are some plots generated in the test-ppc-loo.R file. (I would use the example but it takes forever to run.) This check just confirms that everything looks okay.

image

image

image

@jgabry
Copy link
Member

jgabry commented Aug 7, 2017

Awesome. I will take a look at the changes to the code shortly but what you describe sounds good.

On another note, for now don't worry about travis-ci errors as long as the tests pass locally for you. I've been having issues with travis lately (as described in #102)

Copy link
Member

@jgabry jgabry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great. Just one tiny change that I think needs to be made before I merge it:

Currently we have dplyr (>= 0.4.3) in Imports, but with your changes I think we need to bump that up to something like dplyr (>= 0.7.2) right?

@@ -1,7 +1,7 @@
library(bayesplot)
context("PPC: intervals & ribbon")

source("data-for-ppc-tests.R")
source(test_path("data-for-ppc-tests.R"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't believe I hadn't seen the testthat::test_path function before! Much better than having to manually change the path when working locally with the file.

...,
prob = 0.9,
alpha = 0.33,
size = 0.25) {
check_ignored_arguments(...)
y <- validate_y(y)
if (!missing(intervals)) {
if (!is.null(intervals)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call on switching from missing to is.null here and elsewhere

@jgabry jgabry added the feature label Aug 7, 2017
@tjmahr
Copy link
Collaborator Author

tjmahr commented Aug 8, 2017

Now requiring dplyr 7.1.

Also ppc_intervals_data() now returns the probability mass / interval width to enable stacking of multiple intervals and quick plotting like this:

ppc_data <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data
#> # A tibble: 50 x 7
#>     y_id       y_obs     x  prob         lo         mid       hi
#>    <int>       <dbl> <int> <dbl>      <dbl>       <dbl>    <dbl>
#>  1     1  0.53805957  2000   0.5 -1.3676763  0.04949865 1.234838
#>  2     2 -0.08530312  2000   0.5 -1.4076316  0.02063800 1.525351
#>  3     3 -2.07159502  2000   0.5 -1.4066701 -0.21142794 1.142865
#>  4     4  0.50558133  2000   0.5 -0.9525499 -0.07389371 1.325387
#>  5     5 -0.53260805  2000   0.5 -1.6223410 -0.12686028 1.460296
#>  6     6 -0.58404397  2001   0.5 -1.3369523  0.24740165 1.662270
#>  7     7 -1.70230154  2001   0.5 -1.0900023  0.24664589 1.483820
#>  8     8 -1.93219789  2001   0.5 -1.1667882  0.12664362 1.487778
#>  9     9  1.13020785  2001   0.5 -1.2858419  0.04993576 1.298840
#> 10    10 -2.06319327  2001   0.5 -1.0138175  0.13137916 1.391382
#> # ... with 40 more rows


ppc_data1 <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data2 <- ppc_intervals_data(y, yrep, x = year, prob = 0.8)
ppc_data3 <- ppc_intervals_data(y, yrep, x = year, prob = 0.9)

library(ggplot2)
theme_set(theme_grey())
ggplot(rbind(ppc_data1, ppc_data2, ppc_data3)) + 
  aes(x = x, y = mid, ymin = lo, ymax = hi) + 
  geom_linerange(aes(size = -prob)) + 
  guides(size = "none") + 
  theme_grey()

image

@jgabry
Copy link
Member

jgabry commented Aug 8, 2017

Good idea on including prob in the returned data frame. This PR looks ready to me and it passes R CMD CHECK so I'm going to go ahead and merge this.

@jgabry jgabry merged commit bbdacc8 into stan-dev:master Aug 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants