New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

post-hoc analysis for Logrank-test: Multiple comparisons of survival curves #97

Closed
kassambara opened this Issue Dec 15, 2016 · 26 comments

Comments

Projects
None yet
6 participants
@kassambara
Owner

kassambara commented Dec 15, 2016

In this paper - the molecular classification of multiple myeloma, the authors classify multiple myeloma patients into 7 groups based on gene expression profiling. The survival curves of these 7 groups are shown in the figure here-after (Panel A). In panel B, groups with similar survival curves have ben merged into high and low risk groups.

In a such situation, one would like to perform a pairewise comparison between groups in order to decide the groups to be merged. Groups might be merged, if the long-rank test fails to be significant.

The function survdiff()[in survival package] returns a global p-value whether to reject or not the null hypothesis. It returns a pooled p-value even when we have a strata including multiple groups (more than 2). With this, you know that a difference exists between groups, but you don't know where. You can't know until you test each combination.

The survival package does not provide any function to perform pairwise comparison between groups.

An implementation of an easy-to-use function - pairewise_survdiff() or pairewise_logrank() - might be useful for multiple log-rank tests between pairs of groups . This function should return an adjusted p-value for multiple testing correction.

molecular-classification-of-multiple

@MarcinKosinski

This comment has been minimized.

Contributor

MarcinKosinski commented Jan 8, 2017

What if we have

  • p-value for post hoc for: group A vs group B - result insignificant
  • p-value for post hoc for: group B vs group C - result insignificant
  • p-value for post hoc for: group A vs group C - result significant

which groups will you merge then :) ?

@kassambara

This comment has been minimized.

Owner

kassambara commented Jan 8, 2017

A very good question!!

I don't really have a definitive answer. In a such difficult case, an additional external information about the groups might be required in order to help the analyst to decide which groups to merge.

I think that the primary goal of the pairwise comparison should be only descriptive, investigating where the difference is, when the global test is significant.

If the analyst wants to merge some groups, then the adjusted p-values might help in some cases:

For example,

if:
A vs B, adj.pval = 0.1
B vs C, adj.pval = 0.8
A vs C, adj.pval <= 0.05

Then:
it's tempting to merge B & C, leading to a classification with 2 groups (Low vs High risk).

@MarcinKosinski

This comment has been minimized.

Contributor

MarcinKosinski commented Jan 8, 2017

Sounds reasonable :)

kassambara added a commit that referenced this issue Jan 28, 2017

@kassambara

This comment has been minimized.

Owner

kassambara commented Jan 28, 2017

New function pairwise_survdiff() added.

  • Survival curves with global p-value
library(survival)
library(survminer)
# Survival curves with global p-value
data(myeloma)
fit <- survfit(Surv(time, event) ~ molecular_group, data = myeloma)
ggsurvplot(fit, legend.labs = levels(myeloma$molecular_group),
           pval = TRUE)

rplot03

  • Pairwise survdiff
# Pairwise survdiff
res <- pairwise_survdiff(Surv(time, event) ~ molecular_group,
     data = myeloma)
res
	Pairwise comparisons using Log-Rank test 

data:  myeloma and molecular_group 

                 Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF   MMSET
Cyclin D-2       0.723      -          -            -                -     -    
Hyperdiploid     0.328      0.103      -            -                -     -    
Low bone disease 0.644      0.447      0.723        -                -     -    
MAF              0.943      0.723      0.103        0.523            -     -    
MMSET            0.103      0.038      0.527        0.485            0.038 -    
Proliferation    0.723      0.988      0.103        0.485            0.644 0.062

P value adjustment method: BH 
  • Symbolic number coding
# Symbolic number coding
symnum(res$p.value, cutpoints = c(0, 0.0001, 0.001, 0.01, 0.05, 0.1, 1),
   symbols = c("****", "***", "**", "*", "+", " "),
   abbr.colnames = FALSE, na = "")
                 Cyclin D-1 Cyclin D-2 Hyperdiploid Low bone disease MAF MMSET
Cyclin D-2                                                                    
Hyperdiploid                                                                  
Low bone disease                                                              
MAF                                                                           
MMSET                       *                                        *        
Proliferation                                                            +    
attr(,"legend")
[1] 0 ‘****’ 1e-04 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘+’ 0.1 ‘ ’ 1 \t    ## NA: ‘’

kassambara added a commit that referenced this issue Jan 28, 2017

@MarcinKosinski

This comment has been minimized.

Contributor

MarcinKosinski commented Jan 28, 2017

Outstanding!

I am working locally on extending:
#17
but can't work on this today and probably will smash that tomorrow

@kassambara

This comment has been minimized.

Owner

kassambara commented Jan 28, 2017

Ok, that would be great!

Have a nice weekend:-)!

@kassambara

This comment has been minimized.

Owner

kassambara commented Jan 28, 2017

I think that we can close this issue

@kassambara kassambara closed this Jan 28, 2017

@moripe

This comment has been minimized.

moripe commented Jun 5, 2017

Hi! First I´m sorry for my bad English
I´m using the function pairwise_survdiff because I want to compare survival curves.
When I have installed the package from repository Cran R project and from Github too, in both cases I get the following error when I load the library in RStudio:

Error in get(method, envir = home) :
lazy-load database 'C:/Users/drimay_Monica/Documents/R/win-library/3.3/survminer/R/survminer.rdb' is corrupt
In addition: Warning messages:
1: In .registerS3method(fin[i, 1], fin[i, 2], fin[i, 3], fin[i, 4], :
restarting interrupted promise evaluation
2: In get(method, envir = home) :
restarting interrupted promise evaluation
3: In get(method, envir = home) : internal error -3 in R_decompress1
Error: package or namespace load failed for ‘survminer’

Although I got that mistake, I run the function and I get the following error:

Error in paste(deparse(substitute(data)), "and", .collapse(group_var, :
could not find function ".collapse"

Could you help me please?

Thanks!

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 5, 2017

Hi,

This error is not reproducible on my computer.

Suggestion:

  • Re-start your R session
  • Re-install the developmental version of survminer from github
  • Load survminer and Try pairwise_survdiff() again.

If it still fails, then send to me the output of devtools::session_info().

Thanks

@moripe

This comment has been minimized.

moripe commented Jun 5, 2017

I re-start my R session and Re-install the developmental version of survminer from github but I get the same error

the output of devtools::session_info().

devtools::session_info()
Session info ----------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, mingw32
ui RStudio (1.0.136)
language (EN)
collate Spanish_Spain.1252
tz Europe/Paris
date 2017-06-05

Packages --------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.1 2013-12-06 CRAN (R 3.3.2)
base * 3.3.2 2016-10-31 local
broom 0.4.2 2017-02-13 CRAN (R 3.3.3)
cmprsk 2.2-7 2014-06-17 CRAN (R 3.3.3)
colorspace 1.3-2 2016-12-14 CRAN (R 3.3.2)
curl 2.3 2016-11-24 CRAN (R 3.3.2)
data.table 1.10.0 2016-12-03 CRAN (R 3.3.2)
datasets * 3.3.2 2016-10-31 local
DBI 0.5-1 2016-09-10 CRAN (R 3.3.2)
devtools * 1.13.1 2017-05-13 CRAN (R 3.3.3)
digest 0.6.10 2016-08-02 CRAN (R 3.3.2)
dplyr 0.5.0 2016-06-24 CRAN (R 3.3.2)
foreign 0.8-67 2016-09-13 CRAN (R 3.3.2)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.3.3)
ggpubr * 0.1.3 2017-06-05 Github (kassambara/ggpubr@62ae7d2)
git2r 0.18.0 2017-01-01 CRAN (R 3.3.3)
graphics * 3.3.2 2016-10-31 local
grDevices * 3.3.2 2016-10-31 local
grid 3.3.2 2016-10-31 local
gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.2)
gtable 0.2.0 2016-02-26 CRAN (R 3.3.2)
httr 1.2.1 2016-07-03 CRAN (R 3.3.3)
km.ci 0.5-2 2009-08-30 CRAN (R 3.3.3)
KMsurv 0.1-5 2012-12-03 CRAN (R 3.3.2)
knitr 1.15.1 2016-11-22 CRAN (R 3.3.2)
lattice 0.20-34 2016-09-06 CRAN (R 3.3.2)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.2)
magrittr * 1.5 2014-11-22 CRAN (R 3.3.2)
Matrix 1.2-7.1 2016-09-01 CRAN (R 3.3.2)
memoise 1.0.0 2016-01-29 CRAN (R 3.3.2)
methods * 3.3.2 2016-10-31 local
mnormt 1.5-5 2016-10-15 CRAN (R 3.3.2)
munsell 0.4.3 2016-02-13 CRAN (R 3.3.2)
nlme 3.1-128 2016-05-10 CRAN (R 3.3.2)
parallel 3.3.2 2016-10-31 local
plyr 1.8.4 2016-06-08 CRAN (R 3.3.2)
psych 1.6.12 2017-01-08 CRAN (R 3.3.2)
purrr 0.2.2.2 2017-05-11 CRAN (R 3.3.3)
R6 2.2.0 2016-10-05 CRAN (R 3.3.2)
Rcpp 0.12.8 2016-11-17 CRAN (R 3.3.2)
reshape2 1.4.2 2016-10-22 CRAN (R 3.3.2)
rlang 0.1.1 2017-05-18 CRAN (R 3.3.3)
scales 0.4.1 2016-11-09 CRAN (R 3.3.2)
splines 3.3.2 2016-10-31 local
stats * 3.3.2 2016-10-31 local
stringi 1.1.2 2016-10-01 CRAN (R 3.3.2)
stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
survival 2.40-1 2016-10-30 CRAN (R 3.3.2)
survminer * 0.3.1.999 2017-06-05 Github (8511dee)
survMisc 0.5.4 2016-11-23 CRAN (R 3.3.3)
tibble 1.3.3 2017-05-28 CRAN (R 3.3.3)
tidyr 0.6.3 2017-05-15 CRAN (R 3.3.3)
tools 3.3.2 2016-10-31 local
utils * 3.3.2 2016-10-31 local
withr 1.0.2 2016-06-20 CRAN (R 3.3.3)
xtable 1.8-2 2016-02-05 CRAN (R 3.3.2)
zoo 1.7-14 2016-12-16 CRAN (R 3.3.2)

Thanks!

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 5, 2017

This is strange!

This kind of issues has been reported many times on stackoverflow (see this, this or this forum) and in all the cases, re-starting R fixes the issue.

Have you try to remove the old survminer before re-installation?

If the answer is no, then try this:

  • Re-start R
  • Remove the old survminer packages with remove.packages("survminer").
  • Re-install again

And, please let-me know if it works.

@moripe

This comment has been minimized.

moripe commented Jun 6, 2017

Hi!
Everything you say, I did it yesterday. Today I have done again but I get the same error.
I have tried on another pc but the same error still happens.
thank you anyway!!!!

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 6, 2017

I have no idea what might cause this issue.

@moripe

This comment has been minimized.

moripe commented Jun 6, 2017

I think the problem is the function ".collapse", which you use in:
DNAME <- paste(deparse(substitute(data)), "and", collapse(group_var, sep = " + " ))
The issue is: Error in paste(deparse(substitute(data)), "and", .collapse(group_var, :
could not find function ".collapse"
I´m finding about .collapse but I don´t find anything.
I need to use the function pairwise_survdiff so I have to compare the survival curves and I don´t know other function that I can use for it :(

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 6, 2017

As the error you obtained is not reproducible on my computer,

please, let me know if the following code works on your system?

library("survminer")
library(survival)
data(mgus)
res.cox <- coxph(Surv(futime, death) ~ mspike + log(mspike) + I(mspike^2) +
    age + I(log(age)^2) + I(sqrt(age)), data = mgus)
ggcoxfunctional(res.cox,  data = mgus, point.col = "blue", point.alpha = 0.5)
@moripe

This comment has been minimized.

moripe commented Jun 6, 2017

Yes, the code has worked perfectly

kassambara added a commit that referenced this issue Jun 6, 2017

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 6, 2017

.collapse() function pasted in the pairwise_survdiff.R file

Please, re-install survminer from github and let me know, if pairwise_survdiff() works. Thanks

@moripe

This comment has been minimized.

moripe commented Jun 6, 2017

I re-install survminer from github and I get the same issue with my dataset and with dataset "mymeloma" (http://www.sthda.com/english/wiki/survminer-0-3-0). I've tried using with the two dataset the function pairwise_survdiff() so maybe it's my data problem.

I run the function .collapse (.collapse pasted in pairwise_survdiff.R #97) and I get other issue
Error in survdiff.fit(y, groups, strata.keep, rho) :
There is only 1 group
so I use the dataset "mymeloma" and yes that works!!!!!!!

There is definitely a problem in my data!!!!! I don´t understand because the error say there is only 1 group if I have 7 factor different but I will found about it
Thanks a lot of and sorry for the inconvenience.

@moripe

This comment has been minimized.

moripe commented Jun 6, 2017

I finally got it!!!!! I was able to use the function pairwise_survdiff()

The ultim problem I have solved it. I tell you where my mistake was. In my initial dataset, my predictor variable has 9 levels but I´m working with 7 levels of a subdataset because the other two levels haven´t got values. So I have defined the 7 levels that have value and pairwise_survdiff() works!!!!!
I´m very vey happy
Thanks again, I wanted you to know that I already got it :)

I hope you understand what I've explained

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 6, 2017

I'm happy that it works. Thank you for your feedback, I appreciate it and it will help me to improve pairwise_survdiff() to handle such situation. This means that I should include the function droplevels() to drop unused levels.

kassambara added a commit that referenced this issue Jun 6, 2017

@kassambara

This comment has been minimized.

Owner

kassambara commented Jun 6, 2017

droplevels() added now, thanks

@ccapizzano

This comment has been minimized.

ccapizzano commented Jun 6, 2017

I first wanted to say you prepared and delivered a fantastic post on the use of log-rank tests to identify differences in survivorship over time between groups.

My quick question builds off of @MarcinKosinski in regards to merging data to provide more generalized groups and thus interpretations of the data. I agree it is logical to merge factor levels that are not considered to be statistically different. However, how does that change when you have three or more factor levels. The Zhan et al. (2006) article you refer to, for instance, appears to have visually analyzed the data with the KM estimator of survival and created two risk groups. As such, they could theoretically divide levels into whatever groups they deem necessary, such as placing MF with the low risk group.

Any critiques or suggestions to merging three or more factor levels like in the Zhan article?

Thank you

@raph06

This comment has been minimized.

raph06 commented Oct 31, 2017

Hello and thank you for this interesting thread,

I find it very nice to take interest in subgroups, I tried the pairewise_survdiff() function for one of my dataset.

I do not manage to understand how my global pvalue can be <0.05 and none of the pairewise_survdiff() pvalues are below this same threshold.

Also I've been struggling to code the pairewise_survdiff() into a loop to iterate on every covariates.
It seems that the "data" argument prevents me from using the syntax data$covariates or data[,n].
I get the following:

Error in [.data.frame(data, , group_var, drop = FALSE) :
undefined columns selected

Thank you for your answers.

Raphaël

@kassambara

This comment has been minimized.

Owner

kassambara commented Oct 31, 2017

Hi,

You're writing in closed issue.

Please, kindly open a new issue and provide a reproducible example with a demo data set so that we can work on it, thanks

@audyavar

This comment has been minimized.

audyavar commented Nov 15, 2018

I am trying to run pairwise_survdiff() in a loop where I use
res <- pairwise_survdiff(Surv(x$"PFI.Time"/365,x$"PFI") ~ x[,k],data=x)
It says "Error in [.data.frame(data, , group_var, drop = FALSE) :
undefined columns selected". I tried it multiple ways with get(k) as well, but still same error.

@MarcinKosinski

This comment has been minimized.

Contributor

MarcinKosinski commented Nov 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment