Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ppc_km_overlay(): PPC for (right-)censored data #234

Merged
merged 36 commits into from Oct 1, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
f37a86b
Created ppc_km_overlay(), a function for performing a PPC on right-ce…
fweber144 Aug 12, 2020
149b2f1
Added a check for package "survival" in ppc_km_overlay().
fweber144 Aug 12, 2020
d4433ff
DESCRIPTION file: Added packages "survival" and "ggfortify" to "Sugge…
fweber144 Aug 12, 2020
029f5e0
Added a description of ppc_km_overlay() to the roxygen2 documentation.
fweber144 Aug 12, 2020
0837f99
Minor improvement in the roxygen2 documentation of ppc_km_overlay().
fweber144 Aug 12, 2020
5f9acd7
Another minor improvement in the roxygen2 documentation of ppc_km_ove…
fweber144 Aug 12, 2020
552a37a
Updated the documentation using devtools::document().
fweber144 Aug 12, 2020
e4ed9e3
Another minor improvement in the roxygen2 documentation of ppc_km_ove…
fweber144 Aug 12, 2020
606cabc
Added unit tests for ppc_km_overlay().
fweber144 Aug 12, 2020
d9e5759
Fixed a bug in the unit tests for ppc_km_overlay() (had to add 'statu…
fweber144 Aug 12, 2020
70c0e0e
Fixed a bug in the unit tests for ppc_km_overlay() (had to name argum…
fweber144 Aug 12, 2020
c0116eb
Ran vdiffr::manage_cases() to validate the ppc_km_overlay() figures f…
fweber144 Aug 12, 2020
d518368
In ppc_km_overlay(): Use aes_() instead of aes() for two reasons:
fweber144 Aug 12, 2020
a02442e
In ppc_km_overlay(): Use '.data$' in the call to dplyr::mutate() to p…
fweber144 Aug 12, 2020
4d6ae9d
In ppc_km_overlay(): Indent the call to dplyr::mutate().
fweber144 Aug 12, 2020
ce5df55
Fix the NOTE "release_questions: missing arguments not allowed in cal…
fweber144 Aug 26, 2020
b9d53c5
Add argument "check_unq" to validate_group() telling it whether or no…
fweber144 Aug 26, 2020
52036c8
Use the new argument "check_unq" of validate_group() in ppc_km_overla…
fweber144 Aug 26, 2020
6bdc102
Re-ran roxygen2 to update the documentation.
fweber144 Aug 26, 2020
b9345af
remove check for unique group values
jgabry Sep 10, 2020
c7cb255
move suggested package checks to top
jgabry Sep 10, 2020
34a445e
In ppc_km_overlay(): Ensure that the observed data gets plotted last …
fweber144 Sep 18, 2020
72cea0e
Re-ran roxygen2 to update the documentation.
fweber144 Sep 18, 2020
5323a1d
Update the tests
fweber144 Sep 18, 2020
43fe549
Update the expected results for the visual ppc_km_overlay() tests.
fweber144 Sep 18, 2020
079617e
Update NEWS and add Frank Weber as contributor
jgabry Sep 28, 2020
7cd815d
add missing comma in Suggests
jgabry Sep 28, 2020
d50b7dd
Moved ppc_km_overlay() from ppc-distributions.R to ppc-censoring.R to…
fweber144 Sep 30, 2020
33d5432
Adopt the example for censored data.
fweber144 Sep 30, 2020
fe6eb2b
Add a reference for Kaplan and Meier (1958).
fweber144 Sep 30, 2020
3836da1
For Kaplan and Meier (1958), follow the citation style from "referenc…
fweber144 Sep 30, 2020
14bfa1a
Re-ran roxygen2 to update the documentation.
fweber144 Sep 30, 2020
fdd5d47
PPC-overview: Add a link to PPC-censoring.
fweber144 Oct 1, 2020
6e06cc7
PPC-overview: Improve the label for the link to page PPC-censoring.
fweber144 Oct 1, 2020
b639113
Re-ran roxygen2 to update the documentation.
fweber144 Oct 1, 2020
bbd7767
Add a note requesting suggestions of additional plots for censored data
jgabry Oct 1, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
5 changes: 4 additions & 1 deletion DESCRIPTION
Expand Up @@ -7,7 +7,8 @@ Authors@R: c(person("Jonah", "Gabry", role = c("aut", "cre"), email = "jsg2201@c
person("Tristan", "Mahr", role = "aut"),
person("Paul-Christian", "Bürkner", role = "ctb"),
person("Martin", "Modrák", role = "ctb"),
person("Malcolm", "Barrett", role = "ctb"))
person("Malcolm", "Barrett", role = "ctb"),
person("Frank", "Weber", role = "ctb"))
Maintainer: Jonah Gabry <jsg2201@columbia.edu>
Description: Plotting functions for posterior analysis, MCMC diagnostics,
prior and posterior predictive checks, and other visualizations
Expand Down Expand Up @@ -36,6 +37,7 @@ Imports:
tidyselect,
utils
Suggests:
ggfortify,
gridExtra (>= 2.2.1),
hexbin,
knitr (>= 1.16),
Expand All @@ -47,6 +49,7 @@ Suggests:
rstantools (>= 1.5.0),
scales,
shinystan (>= 2.3.0),
survival,
testthat (>= 2.0.0),
vdiffr
RoxygenNote: 7.1.1
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Expand Up @@ -117,6 +117,7 @@ export(ppc_hist)
export(ppc_intervals)
export(ppc_intervals_data)
export(ppc_intervals_grouped)
export(ppc_km_overlay)
export(ppc_loo_intervals)
export(ppc_loo_pit)
export(ppc_loo_pit_overlay)
Expand Down
6 changes: 6 additions & 0 deletions NEWS.md
Expand Up @@ -8,13 +8,19 @@
* Items for next release go here
-->

* New plotting function `ppc_km_overlay()` for outcome variables that are
right-censored. Empirical CCDF estimates of `yrep` are compared with the
Kaplan-Meier estimate of `y`. (#233, #234, @fweber144)

* CmdStanMCMC objects (from CmdStanR) can now be used with extractor
functions `nuts_params()`, `log_posterior()`, `rhat()`, and
`neff_ratio()`. (#227)

* Added missing `facet_args` argument to `mcmc_rank_overlay()`. (#221, @hhau)

* Size of points and interval lines can set in
`mcmc_intervals(..., outer_size, inner_size, point_size)`. (#215, #228, #229)

* `mcmc_areas()` tries to use less blank vertical blank space. (#218, #230)


Expand Down
2 changes: 1 addition & 1 deletion R/bayesplot-package.R
Expand Up @@ -98,6 +98,6 @@ NULL
# release reminders (for devtools)
release_questions <- function() { # nocov start
c(
"Have you reduced the size of the vignettes for CRAN?",
"Have you reduced the size of the vignettes for CRAN?"
)
} # nocov end
7 changes: 2 additions & 5 deletions R/helpers-ppc.R
Expand Up @@ -70,7 +70,8 @@ validate_yrep <- function(yrep, y) {
#' Checks that grouping variable has same length as `y` and is either a vector or
#' factor variable.
#'
#' @param group,y The user's `group` object and the `y` object returned by `validate_y()`.
#' @param group,y The user's `group` object and the `y` object returned by
#' `validate_y()`.
#' @return Either throws an error or returns `group` (coerced to a factor).
#' @noRd
validate_group <- function(group, y) {
Expand All @@ -88,10 +89,6 @@ validate_group <- function(group, y) {
abort("length(group) must be equal to length(y).")
}

if (length(unique(group)) == 1) {
abort("'group' must have more than one unique value.")
}

unname(group)
}

Expand Down
136 changes: 136 additions & 0 deletions R/ppc-censoring.R
@@ -0,0 +1,136 @@
#' PPC censoring
#'
#' @description Compare the empirical distribution of censored data `y` to the
#' distributions of simulated/replicated data `yrep` from the posterior
#' predictive distribution. See the **Plot Descriptions** section, below, for
#' details.
#'
#' Although some of the other plots can be used with censored data,
#' `ppc_km_overlay()` is currently the only plotting function designed
#' *specifically* for censored data. We encourage you to suggest or contribute
#' additional plots at [https://github.com/stan-dev/bayesplot](github.com/stan-dev/bayesplot).
#'
#'
#'
#' @name PPC-censoring
#' @family PPCs
#'
#' @template args-y-yrep
#' @param size,alpha Passed to the appropriate geom to control the appearance of
#' the `yrep` distributions.
#' @param ... Currently unused.
#'
#' @template return-ggplot
#'
#' @section Plot Descriptions:
#' \describe{
#' \item{`ppc_km_overlay()`}{
#' Empirical CCDF estimates of each dataset (row) in `yrep` are overlaid,
#' with the Kaplan-Meier estimate (Kaplan and Meier, 1958) for `y` itself
#' on top (and in a darker shade). This is a PPC suitable for
#' right-censored `y`. Note that the replicated data from `yrep` is assumed
#' to be uncensored.
#' }
#' }
#'
#' @templateVar bdaRef (Ch. 6)
#' @template reference-bda
#' @template reference-km
#'
#' @examples
#' color_scheme_set("brightblue")
#' y <- example_y_data()
#' # For illustrative purposes, (right-)censor values y > 110:
#' status_y <- as.numeric(y <= 110)
#' y <- pmin(y, 110)
#' # In reality, the replicated data (yrep) would be obtained from a
#' # model which takes the censoring of y properly into account. Here,
#' # for illustrative purposes, we simply use example_yrep_draws():
#' yrep <- example_yrep_draws()
#' dim(yrep)
#' \donttest{
#' ppc_km_overlay(y, yrep[1:25, ], status_y = status_y)
#' }
NULL

#' @export
#' @rdname PPC-censoring
#' @param status_y The status indicator for the observations from `y`. This must
#' be a numeric vector of the same length as `y` with values in \{0, 1\} (0 =
#' right censored, 1 = event).
ppc_km_overlay <-
function(y,
yrep,
...,
status_y,
size = 0.25,
alpha = 0.7) {
check_ignored_arguments(...)

if(!requireNamespace("survival", quietly = TRUE)){
abort("Package 'survival' required.")
}
if(!requireNamespace("ggfortify", quietly = TRUE)){
abort("Package 'ggfortify' required.")
}

# Checks for 'status_y':
stopifnot(is.numeric(status_y))
stopifnot(all(status_y %in% c(0, 1)))

# Create basic PPC dataset:
data <- ppc_data(y, yrep, group = status_y)

# Modify the status indicator:
# * For the observed data ("y"), convert the status indicator back to
# a numeric.
# * For the replicated data ("yrep"), set the status indicator
# to 1 ("event"). This way, the Kaplan-Meier estimator reduces
# to "1 - ECDF" with ECDF denoting the ordinary empirical cumulative
# distribution function.
data <- data %>%
dplyr::mutate(group = ifelse(.data$is_y,
as.numeric(as.character(.data$group)),
1))

# Create 'survfit' object and 'fortify' it
sf <- survival::survfit(
survival::Surv(value, group) ~ rep_label,
data = data
)
fsf <- fortify(sf)

# Add variables specifying color, size, and alpha:
fsf$is_y_color <- as.factor(sub("\\[rep\\] \\(.*$", "rep", sub("^italic\\(y\\)", "y", fsf$strata)))
fsf$is_y_size <- ifelse(fsf$is_y_color == "yrep", size, 1)
fsf$is_y_alpha <- ifelse(fsf$is_y_color == "yrep", alpha, 1)

# Ensure that the observed data gets plotted last by reordering the
# levels of the factor "strata":
fsf$strata <- factor(fsf$strata, levels = rev(levels(fsf$strata)))

# Plot:
ggplot(data = fsf,
mapping = aes_(x = ~ time,
y = ~ surv,
color = ~ is_y_color,
group = ~ strata,
size = ~ is_y_size,
alpha = ~ is_y_alpha)) +
geom_step() +
hline_at(
c(0, 0.5, 1),
size = c(0.2, 0.1, 0.2),
linetype = 2,
color = get_color("dh")
) +
scale_size_identity() +
scale_alpha_identity() +
scale_color_ppc_dist() +
scale_y_continuous(breaks = c(0, 0.5, 1)) +
xlab(y_label()) +
yaxis_title(FALSE) +
xaxis_title(FALSE) +
yaxis_ticks(FALSE) +
bayesplot_theme_get()
}
3 changes: 3 additions & 0 deletions R/ppc-overview.R
Expand Up @@ -87,6 +87,9 @@
#' multinomial outcomes.
#' * [LOO predictive checks][PPC-loo]: PPC functions for predictive checks
#' based on (approximate) leave-one-out (LOO) cross-validation.
#' * [Censored data][PPC-censoring]: PPC functions comparing the empirical
#' distribution of censored data `y` to the distributions of individual
#' simulated datasets (rows) in `yrep`.
#'
#' @section Providing an interface for predictive checking from another package:
#'
Expand Down
4 changes: 4 additions & 0 deletions man-roxygen/reference-km.R
@@ -0,0 +1,4 @@
#' @references Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation
#' from incomplete observations.
#' *Journal of the American Statistical Association*. 53(282), 457--481.
#' doi:10.1080/01621459.1958.10501452.
92 changes: 92 additions & 0 deletions man/PPC-censoring.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-discrete.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-distributions.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-errors.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-intervals.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-loo.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions man/PPC-overview.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/PPC-scatterplots.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.