add ppc_intervals_data() and ppc_ribbon_data() #101

tjmahr · 2017-08-07T20:30:53Z

See issue #99.

I added functions ppc_intervals_data() and ppc_ribbon_data(). These return the data that would be plotted by the ppc_intervals family of plotters.

I refactored the internal .ppc_intervals_data() function. Before it stacked replication intervals and observed ys on top of each other and told them apart using a column is_y. That is:

> bayesplot:::.ppc_intervals_data(y, yrep, x = 1:50)
# A tibble: 100 x 5
# Groups:   x [?]
      x   is_y           lo          mid           hi
  <int> <fctr>        <dbl>        <dbl>        <dbl>
1     1   TRUE -1.576958028 -1.576958028 -1.576958028
2     1  FALSE -1.741401157  0.274514926  2.519152806
3     2   TRUE -0.375700240 -0.375700240 -0.375700240
4     2  FALSE -2.462380258 -0.204469907  1.976864727
5     3   TRUE -1.289742511 -1.289742511 -1.289742511
6     3  FALSE -2.623870953 -0.186001644  2.371562257
7     4   TRUE  0.008464543  0.008464543  0.008464543
8     4  FALSE -2.217317808  0.094052618  2.737946453
9     5   TRUE  0.873829347  0.873829347  0.873829347
10    5  FALSE -2.983087056  0.010274642  2.592691328
# ... with 90 more rows

But now, those observed values are stored in a column called y_obs:

> bayesplot:::ppc_intervals_data(y, yrep, x = 1:50)
# A tibble: 50 x 6
   y_id        y_obs     x        lo         mid       hi
  <int>        <dbl> <int>     <dbl>       <dbl>    <dbl>
1     1 -1.576958028     1 -1.741401  0.27451493 2.519153
2     2 -0.375700240     2 -2.462380 -0.20446991 1.976865
3     3 -1.289742511     3 -2.623871 -0.18600164 2.371562
4     4  0.008464543     4 -2.217318  0.09405262 2.737946
5     5  0.873829347     5 -2.983087  0.01027464 2.592691
6     6  0.925084871     6 -2.402087 -0.23957047 2.484187
7     7  0.056003860     7 -2.272656  0.10470365 2.227581
8     8  0.472664991     8 -2.251600  0.07247611 2.527019
9     9 -0.153056388     9 -2.526754  0.02610329 2.337524
10   10  0.215333434    10 -2.425294  0.24194109 2.644617
# ... with 40 more rows

This format makes creating the plots a little easier.

I also updated the unit tests to match this new output format.

I don't like relying on is.missing() when setting default values for missing function arguments. (It's much harder to test during interactive programming.) When I ran into this issue, I instead used the pattern...

f <- function(x = NULL) {
  if (is.null(x) {
    # deal with missing value
  }
}

I also imported rlang and started to use the new system of nonstandard evaluation for the tidyverse.

Visual tests

These are the plots generated by the examples in ?ppc_intervals:

library(bayesplot)
#> This is bayesplot version 1.3.0
#> Plotting theme set to bayesplot::theme_default()
y <- rnorm(50)
yrep <- matrix(rnorm(5000, 0, 2), ncol = 50)

color_scheme_set("brightblue")
ppc_ribbon(y, yrep)

ppc_intervals(y, yrep)

color_scheme_set("teal")
year <- 1950:1999
ppc_ribbon(y, yrep, x = year, alpha = 0, size = 0.75) + ggplot2::xlab("Year")

color_scheme_set("pink")
year <- rep(2000:2009, each = 5)
group <- gl(5, 1, length = 50, labels = LETTERS[1:5])
ppc_ribbon_grouped(y, yrep, x = year, group) +
 ggplot2::scale_x_continuous(breaks = pretty)

ppc_ribbon_grouped(
y, yrep, x = year, group,
facet_args = list(scales = "fixed"),
alpha = 1,
size = 2
) +
xaxis_text(FALSE) +
xaxis_ticks(FALSE) +
panel_bg(fill = "gray20")

ppc_data <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data
#> # A tibble: 50 x 6
#>     y_id      y_obs     x         lo         mid        hi
#>    <int>      <dbl> <int>      <dbl>       <dbl>     <dbl>
#>  1     1  2.6106399  2000 -1.9121028 -0.30346588 1.2438269
#>  2     2  0.2924113  2000 -0.9975955  0.24825805 1.5434840
#>  3     3 -0.7725254  2000 -1.6513193 -0.11780907 1.2272995
#>  4     4  0.8634162  2000 -1.1818673  0.23238388 1.6475749
#>  5     5  1.1283432  2000 -1.6518806 -0.48487352 0.9016187
#>  6     6  2.1456789  2001 -1.3075440  0.41361380 1.1956589
#>  7     7  0.1427364  2001 -1.3490220 -0.26051485 0.8740965
#>  8     8 -0.8945021  2001 -1.0472378  0.23367006 1.5635030
#>  9     9 -0.4810363  2001 -0.8543759 -0.02456840 1.4952441
#> 10    10 -0.5175339  2001 -1.3181571  0.04460883 1.5440722
#> # ... with 40 more rows

ppc_group_data <- ppc_intervals_data(y, yrep, x = year, group, prob = 0.5)
ppc_group_data
#> # A tibble: 50 x 7
#>     y_id      y_obs  group     x         lo         mid        hi
#>    <int>      <dbl> <fctr> <int>      <dbl>       <dbl>     <dbl>
#>  1     1  2.6106399      A  2000 -1.9121028 -0.30346588 1.2438269
#>  2     2  0.2924113      B  2000 -0.9975955  0.24825805 1.5434840
#>  3     3 -0.7725254      C  2000 -1.6513193 -0.11780907 1.2272995
#>  4     4  0.8634162      D  2000 -1.1818673  0.23238388 1.6475749
#>  5     5  1.1283432      E  2000 -1.6518806 -0.48487352 0.9016187
#>  6     6  2.1456789      A  2001 -1.3075440  0.41361380 1.1956589
#>  7     7  0.1427364      B  2001 -1.3490220 -0.26051485 0.8740965
#>  8     8 -0.8945021      C  2001 -1.0472378  0.23367006 1.5635030
#>  9     9 -0.4810363      D  2001 -0.8543759 -0.02456840 1.4952441
#> 10    10 -0.5175339      E  2001 -1.3181571  0.04460883 1.5440722
#> # ... with 40 more rows

ppc_loo_intervals() build on top of the ppc_intervals() function, so here are some plots generated in the test-ppc-loo.R file. (I would use the example but it takes forever to run.) This check just confirms that everything looks okay.

… column.

jgabry · 2017-08-07T20:42:04Z

Awesome. I will take a look at the changes to the code shortly but what you describe sounds good.

On another note, for now don't worry about travis-ci errors as long as the tests pass locally for you. I've been having issues with travis lately (as described in #102)

jgabry

This is really great. Just one tiny change that I think needs to be made before I merge it:

Currently we have dplyr (>= 0.4.3) in Imports, but with your changes I think we need to bump that up to something like dplyr (>= 0.7.2) right?

jgabry · 2017-08-07T20:48:45Z

tests/testthat/test-ppc-intervals.R

@@ -1,7 +1,7 @@
 library(bayesplot)
 context("PPC: intervals & ribbon")

-source("data-for-ppc-tests.R")
+source(test_path("data-for-ppc-tests.R"))


I can't believe I hadn't seen the testthat::test_path function before! Much better than having to manually change the path when working locally with the file.

jgabry · 2017-08-07T20:49:19Z

R/ppc-loo.R

           ...,
           prob = 0.9,
           alpha = 0.33,
           size = 0.25) {
    check_ignored_arguments(...)
    y <- validate_y(y)
-    if (!missing(intervals)) {
+    if (!is.null(intervals)) {


Good call on switching from missing to is.null here and elsewhere

tjmahr · 2017-08-08T13:45:58Z

Now requiring dplyr 7.1.

Also ppc_intervals_data() now returns the probability mass / interval width to enable stacking of multiple intervals and quick plotting like this:

ppc_data <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data
#> # A tibble: 50 x 7
#>     y_id       y_obs     x  prob         lo         mid       hi
#>    <int>       <dbl> <int> <dbl>      <dbl>       <dbl>    <dbl>
#>  1     1  0.53805957  2000   0.5 -1.3676763  0.04949865 1.234838
#>  2     2 -0.08530312  2000   0.5 -1.4076316  0.02063800 1.525351
#>  3     3 -2.07159502  2000   0.5 -1.4066701 -0.21142794 1.142865
#>  4     4  0.50558133  2000   0.5 -0.9525499 -0.07389371 1.325387
#>  5     5 -0.53260805  2000   0.5 -1.6223410 -0.12686028 1.460296
#>  6     6 -0.58404397  2001   0.5 -1.3369523  0.24740165 1.662270
#>  7     7 -1.70230154  2001   0.5 -1.0900023  0.24664589 1.483820
#>  8     8 -1.93219789  2001   0.5 -1.1667882  0.12664362 1.487778
#>  9     9  1.13020785  2001   0.5 -1.2858419  0.04993576 1.298840
#> 10    10 -2.06319327  2001   0.5 -1.0138175  0.13137916 1.391382
#> # ... with 40 more rows


ppc_data1 <- ppc_intervals_data(y, yrep, x = year, prob = 0.5)
ppc_data2 <- ppc_intervals_data(y, yrep, x = year, prob = 0.8)
ppc_data3 <- ppc_intervals_data(y, yrep, x = year, prob = 0.9)

library(ggplot2)
theme_set(theme_grey())
ggplot(rbind(ppc_data1, ppc_data2, ppc_data3)) + 
  aes(x = x, y = mid, ymin = lo, ymax = hi) + 
  geom_linerange(aes(size = -prob)) + 
  guides(size = "none") + 
  theme_grey()

jgabry · 2017-08-08T19:57:53Z

Good idea on including prob in the returned data frame. This PR looks ready to me and it passes R CMD CHECK so I'm going to go ahead and merge this.

tjmahr and others added 7 commits August 7, 2017 11:10

import rlang package

7235fa9

don't use is.missing() in validate_x(). use {}'s

3ee7391

refactor ppc_intervals_data() and plotting functions to remove is_y…

f43a65d

… column.

document and export ppc_intervals_data and ppc_ribbon_data

e457ebe

Merge branch 'master' into master

8f9a549

fix global variable warnings from R CMD check

a0bcab3

simplify standard evaluation code

1b1485d

jgabry requested changes Aug 7, 2017

View reviewed changes

jgabry added the feature label Aug 7, 2017

tjmahr added 2 commits August 8, 2017 08:07

require newer dplyr version

271a668

have ppc_intervals_data return prob value. improve tests.

972e290

jgabry approved these changes Aug 8, 2017

View reviewed changes

jgabry merged commit bbdacc8 into stan-dev:master Aug 8, 2017

jgabry mentioned this pull request Aug 8, 2017

provide ppc_intervals_data() and ppc_ribbon_data() #99

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add ppc_intervals_data() and ppc_ribbon_data() #101

add ppc_intervals_data() and ppc_ribbon_data() #101

tjmahr commented Aug 7, 2017 •

edited

Loading

jgabry commented Aug 7, 2017

jgabry left a comment

jgabry Aug 7, 2017

jgabry Aug 7, 2017

tjmahr commented Aug 8, 2017 •

edited

Loading

jgabry commented Aug 8, 2017 •

edited

Loading

add ppc_intervals_data() and ppc_ribbon_data() #101

add ppc_intervals_data() and ppc_ribbon_data() #101

Conversation

tjmahr commented Aug 7, 2017 • edited Loading

Visual tests

jgabry commented Aug 7, 2017

jgabry left a comment

Choose a reason for hiding this comment

jgabry Aug 7, 2017

Choose a reason for hiding this comment

jgabry Aug 7, 2017

Choose a reason for hiding this comment

tjmahr commented Aug 8, 2017 • edited Loading

jgabry commented Aug 8, 2017 • edited Loading

tjmahr commented Aug 7, 2017 •

edited

Loading

tjmahr commented Aug 8, 2017 •

edited

Loading

jgabry commented Aug 8, 2017 •

edited

Loading