Skip to content

Commit

Permalink
Merge pull request #127 from MarcinKosinski/log-comp
Browse files Browse the repository at this point in the history
use survMisc::comp to enable using weighted Log-rank tests - fix #17
  • Loading branch information
kassambara committed Jan 29, 2017
2 parents e336ac8 + a0fb34b commit 49c299e
Show file tree
Hide file tree
Showing 6 changed files with 242 additions and 26 deletions.
5 changes: 3 additions & 2 deletions DESCRIPTION
Expand Up @@ -26,12 +26,13 @@ Imports:
survival,
stats,
broom,
tidyr
tidyr,
survMisc
Suggests:
knitr,
RTCGA(>= 1.4.0),
RTCGA.clinical(>= 20151101.4.0),
survMisc
KMsurv
VignetteBuilder: knitr
URL: http://www.sthda.com/english/rpkgs/survminer/
BugReports: https://github.com/kassambara/survminer/issues
Expand Down
2 changes: 2 additions & 0 deletions NAMESPACE
Expand Up @@ -27,4 +27,6 @@ importFrom(stats,pchisq)
importFrom(stats,predict)
importFrom(stats,resid)
importFrom(stats,residuals)
importFrom(survMisc,comp)
importFrom(survMisc,ten)
importFrom(survival,coxph)
8 changes: 6 additions & 2 deletions NEWS.md
Expand Up @@ -2,6 +2,7 @@

## New features

- New possibilities to compare survival curves with `pval.method` and `log.rank.weights` parameters in `ggsurvplot()`, by specifing weights in Log-rank test. Functionality based on `survMisc::comp`.
- New function `ggforest()` added for drawing forest plot for the Cox model.
- New argument `sline` in the `ggcoxdiagnostics()` function for adding loess smoothed trend on the residual plots. This will make it easier to spot some problems with residuals (like quadratic relation). ([@pbiecek, #119](https://github.com/kassambara/survminer/issues/119)).
- New function `pairwise_survdiff()` for multiple comparisons of survival Curves ([#97](https://github.com/kassambara/survminer/issues/97)).
Expand All @@ -14,7 +15,6 @@
- It is possible to specify subtitle (param `risk.table.subtitle`) and caption (param `risk.table.caption`) for the `table` part of the `ggsurvplot`.
- It is possible to specify title (param `ncens.plot.title`), subtitle (param `ncens.plot.subtitle`) and caption (param `ncens.plot.caption`) for the `ncens` part of the `ggsurvplot`.
- ncens.plot part of the `ggsurvplot` has no integer values on y axis (not real as before), as it only can take integer values.
- A new vignette and a `ggsurvplot` example was added to present new functionalities of possible texts and fonts customizations.
- README was extended with `uber platinium customization` example.

- The design of `ggcoxfunctional()` has been changed to be consistent with the other functions in the survminer package. Now, `ggcoxfunctional()` works with coxph objects not formulas. The arguments formula and data are now deprecated ([@pbiecek, #115](https://github.com/kassambara/survminer/issues/115)).
Expand All @@ -26,7 +26,7 @@

- The R package `maxstat` doesn't support very well an object of class `tbl_df`. To fix this issue, now, in the `surv_cutpoint()` function, the input data is systematically transformed into a standard data.frame format ([@MarcinKosinski, #104](https://github.com/kassambara/survminer/issues/104)).

- It's now possible to print the output the survminer packages in a powerpoint created with the ReporteRs package. Thanks to ([@abossenbroek, #110](https://github.com/kassambara/survminer/issues/110)). You should use the argument *newpage = FALSE* in the `print()` function when printing the output in the powerpoint. For instance:
- It's now possible to print the output the survminer packages in a powerpoint created with the ReporteRs package. Thanks to ([@abossenbroek, #110](https://github.com/kassambara/survminer/issues/110)) and ([@zzawadz, #111](https://github.com/kassambara/survminer/issues/110)). You should use the argument *newpage = FALSE* in the `print()` function when printing the output in the powerpoint. For instance:


```r
Expand Down Expand Up @@ -62,6 +62,10 @@ writeDoc(doc, "test.pptx")

- Now, `ggsurvplot()` works properly in the situation where `strata()` is included in the cox formula ([#109](https://github.com/kassambara/survminer/issues/109)).

## Vignettes and examples

- A new vignette and a `ggsurvplot` example was added to present new functionalities of possible texts and fonts customizations.
- A new vignette and a `ggsurvplot` example was added to present new functionalities of possible weights specification in a Log-rank test.

# survminer 0.2.4
Expand Down
48 changes: 42 additions & 6 deletions R/ggsurvplot.R
@@ -1,6 +1,8 @@
#' @include utilities.R theme_classic2.R surv_summary.R
#' @importFrom methods is
#' @importFrom stats pchisq
#' @importFrom survMisc ten comp
#' @importFrom utils capture.output
NULL
#'Drawing Survival Curves Using ggplot2
#'@description Drawing survival curves using ggplot2
Expand Down Expand Up @@ -85,6 +87,13 @@
#' \code{\link[ggplot2]{theme}}.
#'@param ... other arguments to be passed to ggplot2 geom_*() functions such as
#' linetype, size, ...
#'@param log.rank.weights The name for the type of weights to be used in computing the p-value for log-rank test.
#'By default \code{survdiff} is used to calculate regular log-rank test (with weights == 1). A user can specify
#'\code{"1", "n", "sqrtN", "S1", "S2", "FH"} to use weights specified in \link[survMisc]{comp}, so that weight correspond to the test as
#': 1 - log-rank, n - Gehan-Breslow (generalized Wilcoxon), sqrtN - Tarone-Ware, S1 - Peto-Peto's modified survival estimate, S2 - modified Peto-Peto (by Andersen), FH - Fleming-Harrington(p=1, q=1).
#'@param pval.method whether to add a text with the test name used for calculating the pvalue, that corresponds to survival curves' comparison - used only when \code{pval=TRUE}
#'@param pval.method.size the same as \code{pval.size} but for displaying \code{log.rank.weights} name
#'@param pval.method.coord the same as \code{pval.coord} but for displaying \code{log.rank.weights} name
#'@details \strong{legend position}: The argument \strong{legend} can be also a
#' numeric vector c(x,y). In this case it is possible to position the legend
#' inside the plotting area. x and y are the coordinates of the legend box.
Expand Down Expand Up @@ -274,6 +283,8 @@ ggsurvplot <- function(fit, fun = NULL,
conf.int = FALSE, conf.int.fill = "gray", conf.int.style = "ribbon",
censor = TRUE,
pval = FALSE, pval.size = 5, pval.coord = c(NULL, NULL),
pval.method = FALSE, pval.method.size = pval.size, pval.method.coord = c(NULL, NULL),
log.rank.weights = c("survdiff", "1", "n", "sqrtN", "S1", "S2", "FH_p=1_q=1"),
main = NULL, submain = NULL, caption = NULL, xlab = "Time", ylab = "Survival probability",
font.main = c(16, "plain", "black"), font.submain = c(15, "plain", "black"),
font.caption = c(15, "plain", "black"),
Expand Down Expand Up @@ -301,6 +312,8 @@ ggsurvplot <- function(fit, fun = NULL,
if(is.null(ylim) & is.null(fun)) ylim <- c(0, 1)
if(!is(legend, "numeric")) legend <- match.arg(legend)
surv.median.line <- match.arg(surv.median.line)
stopifnot(log.rank.weights %in% c("survdiff", "1", "n", "sqrtN", "S1", "S2","FH_p=1_q=1"))
log.rank.weights <- match.arg(log.rank.weights)
# Adapt ylab value according to the value of the argument fun
ylab <- .check_ylab(ylab, fun)
# Check and get linetypes
Expand Down Expand Up @@ -417,14 +430,21 @@ ggsurvplot <- function(fit, fun = NULL,

# Add pvalue
if(pval & !is.null(fit$strata)){
pval <- .get_pvalue(fit)
pvaltxt <- ifelse(pval < 1e-04, "p < 0.0001",
paste("p =", signif(pval, 2)))
pval <- .get_pvalue(fit, method = log.rank.weights)
pvaltxt <- ifelse(pval$val < 1e-04, "p < 0.0001",
paste("p =", signif(pval$val, 2)))

pval.x <- ifelse(is.null(pval.coord[1]), 0.1*max(fit$time), pval.coord[1])
pval.y <- ifelse(is.null(pval.coord[2]), 0.2, pval.coord[2])
p <- p + ggplot2::annotate("text", x = pval.x, y = pval.y,
label = pvaltxt, size = pval.size)
if(pval.method){
pvalmethod <- pval$method
pval.method.x <- ifelse(is.null(pval.method.coord[1]), 0.1*max(fit$time), pval.method.coord[1])
pval.method.y <- ifelse(is.null(pval.method.coord[2]), 0.3, pval.method.coord[2])
p <- p + ggplot2::annotate("text", x = pval.method.x, y = pval.method.y,
label = pvalmethod, size = pval.method.size)
}
}

# Drawing a horizontal line at 50% survival
Expand Down Expand Up @@ -646,17 +666,33 @@ p <- p + theme(legend.key.height = NULL, legend.key.width = NULL,


# get survdiff pvalue
.get_pvalue <- function(fit){
.get_pvalue <- function(fit, method){
# One group
if(length(levels(summary(fit)$strata)) == 0) return(NULL)
if(length(levels(summary(fit)$strata)) == 0) return(list(val = NULL, method = NULL))

if(method == "survdiff") {
ssubset <- fit$call$subset
if(is.null(ssubset))
sdiff <- survival::survdiff(eval(fit$call$formula), data = eval(fit$call$data))
else
sdiff <- survival::survdiff(eval(fit$call$formula), data = eval(fit$call$data),
subset = eval(fit$call$subset))
pvalue <- stats::pchisq(sdiff$chisq, length(sdiff$n) - 1, lower.tail = FALSE)
return (pvalue)
return(list(val = pvalue, method = "Log-rank (survdiff)"))
} else {
tenfit <- ten(eval(fit$call$formula), data = eval(fit$call$data))
capture.output(comp(tenfit)) -> null_dev
# comp modifies tenfit object (ten class: ?survMisc::ten)
# and adds attributes with tests
attributes(tenfit)$lrt -> tests
# check str(tests) -> W:weights / pNorm:p-values
pvalue <- round(tests$pNorm[tests$W == method], 4)
test_name <- c("Log-rank (comp)", "Gehan-Breslow (generalized Wilcoxon)",
"Tarone-Ware", "Peto-Peto's modified survival estimate",
"modified Peto-Peto (by Andersen)", "Fleming-Harrington (p=1, q=1)")
# taken from ?survMisc::comp
return(list(val = pvalue, method = test_name[tests$W == method]))
}
}

# Draw risk table
Expand Down
46 changes: 30 additions & 16 deletions man/ggsurvplot.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

159 changes: 159 additions & 0 deletions vignettes/Specifiyng_weights_in_log-ran_comparisons.Rmd
@@ -0,0 +1,159 @@
---
title: Specifiyng weights in Log-rank comparisons
author: Marcin Kosinski
date: "29-01-2017"
output:
html_document:
mathjax: default
fig_caption: true
toc: true
section_numbering: true
css: ggsci.css
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Specifiyng weights in Log-rank comparisons}
---
```{r include = FALSE}
library(knitr)
opts_chunk$set(
comment = "",
fig.width = 12,
message = FALSE,
warning = FALSE,
tidy.opts = list(
keep.blank.line = TRUE,
width.cutoff = 150
),
options(width = 150),
eval = TRUE
)
```

```{r}
library("survminer")
```


> This vignette covers changes between versions 0.2.4 and 0.2.5 for specifiyng weights in the log-rank comparisons done in `ggsurvplot()`.
# Log-rank statistic for 2 groups


As it is stated in the literature, the Log-rank test for comparing survival (estimates of survival curves) in 2 groups ($A$ and $B$) is based on the below statistic

$$LR = \frac{U^2}{V} \sim \chi(1),$$

where $$U = \sum_{i=1}^{T}w_{t_i}(o_{t_i}^A-e_{t_i}^A), \ \ \ \ \ \ \ \ V = Var(U) = \sum_{i=1}^{T}(w_{t_i}^2\frac{n_{t_i}^An_{t_i}^Bd_i(n_{t_i}-o_{t_i})}{n_{t_i}^2(n_{t_i}-1)})$$
and

- $t_i$ for $i=1, \dots, T$ are possible event times,
- $n_{t_i}$ is the overall risk set size on the time $t_i$ ($n_{t_i} = n_{t_i}^A+n_{t_i}^B$),
- $n_{t_i}^A$ is the risk set size on the time $t_i$ in group $A$,
- $n_{t_i}^B$ is the risk set size on the time $t_i$ in group $B$,
- $o_{t_i}$ overall observed events in the time $t_i$ ($o_{t_i} = o_{t_i}^A+o_{t_i}^B$),
- $o_{t_i}^A$ observed events in the time $t_i$ in group $A$,
- $o_{t_i}^B$ observed events in the time $t_i$ in group $B$,
- $e_{t_i}$ number of overall expected events in the time $t_i$ ($e_{t_i} = e_{t_i}^A+e_{t_i}^B$),
- $e_{t_i}^A$ number of expected events in the time $t_i$ in group $A$,
- $e_{t_i}^B$ number of expected events in the time $t_i$ in group $B$,
- $w_{t_i}$ is a weight for the statistic,

also remember about few notes

$$e_{t_i}^A = n_{t_i}^A \frac{o_{t_i}}{n_{t_i}}, \ \ \ \ \ \ \ \ \ \ e_{t_i}^B = n_{t_i}^B \frac{o_{t_i}}{n_{t_i}},$$
$$e_{t_i}^A + e_{t_i}^B = o_{t_i}^A + o_{t_i}^B$$

that's why we can substitute group $A$ with $B$ in $U$ and receive same results.

# Weighted Log-rank extensions

Regular Log-rank comparison uses $w_{t_i} = 1$ but many modifications to that approach have been proposed. The most popular modifications, called weighted Log-rank tests, are available in `?survMisc::comp`

- `n` Gehan and Breslow proposed to use $w_{t_i} = n_{t_i}$ (this is also called generalized Wilcoxon),
- `srqtN` Tharone and Ware proposed to use $w_{t_i} = \sqrt{n_{t_i}}$,
- `S1` Peto-Peto's modified survival estimate $w_{t_i} = S1({t_i}) = \prod_{i=1}^{T}(\frac{1-e_{t_i}}{n_{t_i}+1})$,
- `S2` modified Peto-Peto (by Andersen) $w_{t_i} = S2({t_i}) = \frac{S1({t_i})n_{t_i}}{n_{t_i}+1}$,
- `FH` Fleming-Harrington $w_{t_i} = S(t_i)^p(1 - S(t_i))^q$.

> Watch out for `FH` as I [submitted an info on survMisc repository](https://github.com/dardisco/survMisc/issues/15) where I think their mathematical notation is misleading for Fleming-Harrington.
## Why are they useful?

The regular Log-rank test is sensitive to detect differences in late survival times, where Gehan-Breslow and Tharone-Ware propositions might be used if one is interested in early differences in survival times. Peto-Peto modifications are also useful in early differences and are more robust (than Tharone-Whare or Gehan-Breslow) for situations where many observations are censored. The most flexible is Fleming-Harrington method for weights, where high `p` indicates detecting early differences and high `q` indicates detecting differences in late survival times. But there is always an issue on how to detect `p` and `q`.

> Remember that test selection should be performed at the research design level! Not after looking in the dataset.
# Plots

```{r}
library("survival")
data("kidney", package="KMsurv")
fit <- survfit(Surv(time=time, event=delta) ~ type, data=kidney)
```

After preparing a functionality for this GitHub's issue [Other tests than log-rank for testing survival curves and Log-rank test for trend](https://github.com/kassambara/survminer/issues/17) we are now able to compute p-values for various Log-rank test in survminer package. Let as see below examples on executing all possible tests.

## Log-rank (survdiff)
```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE)
```

## Log-rank (comp)

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "1")
```

## Gehan-Breslow (generalized Wilcoxon)

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "n", pval.method.coord = c(200, 0.1),
pval.method.size = 3)
```

## Tharone-Ware

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "sqrtN", pval.method.coord = c(100, 0.1),
pval.method.size = 4)
```

## Peto-Peto's modified survival estimate

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "S1", pval.method.coord = c(200, 0.1),
pval.method.size = 3)
```

## modified Peto-Peto's (by Andersen)

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "S2", pval.method.coord = c(200, 0.1),
pval.method.size = 3)
```

## Fleming-Harrington (p=1, q=1)

```{r}
ggsurvplot(fit, pval = TRUE, pval.method = TRUE,
log.rank.weights = "FH_p=1_q=1",
pval.method.coord = c(200, 0.1), pval.method.size = 4)
```


# References

- Gehan A. A Generalized Wilcoxon Test for Comparing Arbitrarily Singly-Censored Samples. Biometrika 1965 Jun. 52(1/2):203–23. [JSTOR](www.jstor.org/stable/2333825)

- Tarone RE, Ware J 1977 On Distribution-Free Tests for Equality of Survival Distributions. Biometrika;64(1):156–60. [JSTOR](http://www.jstor.org/stable/2335790)

- Peto R, Peto J 1972 Asymptotically Efficient Rank Invariant Test Procedures. J Royal Statistical Society 135(2):186–207. [JSTOR](http://www.jstor.org/stable/2344317)

- Fleming TR, Harrington DP, O'Sullivan M 1987 Supremum Versions of the Log-Rank and Generalized Wilcoxon Statistics. J American Statistical Association 82(397):312–20. [JSTOR](http://www.jstor.org/stable/2289169)

- Billingsly P 1999 Convergence of Probability Measures. New York: John Wiley & Sons. [Wiley (paywall)](dx.doi.org/10.1002/9780470316962)

0 comments on commit 49c299e

Please sign in to comment.