ggsurvplot removes points when a named vector is passed as a palette #100

Closed
oneillkza opened this issue Dec 20, 2016 · 6 comments

@oneillkza oneillkza commented Dec 20, 2016

Expected behavior

Survival plot with points, using the provided palette.

Actual behavior

Warning message:
Removed 61 rows containing missing values (geom_point). 

Steps to reproduce the problem

library(survival)
library(survminer)
fit <- survfit(Surv(time, status) ~ sex, data = lung)
the.pal <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02")
named.pal <- the.pal
names(named.pal) <- 1:6

# Works as expected:
ggsurvplot(fit, palette = the.pal)

# Fails:
ggsurvplot(fit, palette = named.pal)

Note: this is mostly just an annoyance, but I have a use case where I defined a palette for another plot, and wanted the colours in my survival plot to match. It was fairly confusing trying to trace this, and it is slightly annoying having to copy the palette to a new variable and scrub the names before it works.
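The name-scrubbing workaround mentioned above does not need a second variable: base R's `unname()` returns a copy without names, so it can be applied inline. A minimal sketch, assuming the same palette as in the report; the plotting step only runs if survminer is installed:

```r
# Palette defined for another plot, with names that are not strata names
the.pal <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02")
named.pal <- the.pal
names(named.pal) <- 1:6

# unname() returns an unnamed copy; named.pal itself is left untouched
plain.pal <- unname(named.pal)
identical(plain.pal, the.pal)

# Pass the unnamed copy straight to ggsurvplot()
if (requireNamespace("survminer", quietly = TRUE)) {
  library(survival)
  library(survminer)
  fit <- survfit(Surv(time, status) ~ sex, data = lung)
  print(ggsurvplot(fit, palette = unname(named.pal)))
}
```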

session_info()

Session info ----------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, darwin13.4.0        
 ui       RStudio (1.0.44)            
 language (EN)                        
 collate  en_CA.UTF-8                 
 tz       America/Vancouver           
 date     2016-12-20                  

Packages --------------------------------------------------------------------------------------------------------------------------------------------------
 package         * version  date       source        
 AnnotationDbi   * 1.36.0   2016-10-18 Bioconductor  
 assertthat        0.1      2013-12-06 CRAN (R 3.3.0)
 BH              * 1.60.0-2 2016-05-07 CRAN (R 3.3.0)
 Biobase         * 2.34.0   2016-10-18 Bioconductor  
 BiocGenerics    * 0.20.0   2016-10-18 Bioconductor  
 BiocParallel      1.8.1    2016-10-30 Bioconductor  
 chron             2.3-47   2015-06-24 CRAN (R 3.3.0)
 circlize        * 0.3.9    2016-09-26 CRAN (R 3.3.0)
 cluster         * 2.0.5    2016-10-08 CRAN (R 3.3.2)
 clusterProfiler * 3.2.5    2016-11-15 Bioconductor  
 codetools         0.2-15   2016-10-05 CRAN (R 3.3.2)
 colorspace        1.3-0    2016-11-10 CRAN (R 3.3.2)
 data.table        1.9.6    2015-09-19 CRAN (R 3.3.0)
 DBI               0.5-1    2016-09-10 CRAN (R 3.3.0)
 devtools          1.12.0   2016-06-24 CRAN (R 3.3.0)
 digest            0.6.10   2016-08-02 CRAN (R 3.3.0)
 DO.db             2.9      2016-11-17 Bioconductor  
 doParallel        1.0.10   2015-10-14 CRAN (R 3.3.0)
 DOSE            * 3.0.6    2016-11-15 Bioconductor  
 evaluate          0.10     2016-10-11 CRAN (R 3.3.0)
 fastmatch         1.0-4    2012-01-21 CRAN (R 3.3.0)
 fgsea             1.0.1    2016-10-29 Bioconductor  
 foreach           1.4.3    2015-10-13 CRAN (R 3.3.0)
 gdata           * 2.17.0   2015-07-04 CRAN (R 3.3.0)
 ggplot2         * 2.2.0    2016-11-11 CRAN (R 3.3.2)
 GlobalOptions     0.0.10   2016-04-17 CRAN (R 3.3.0)
 GO.db             3.4.0    2016-11-17 Bioconductor  
 googleVis       * 0.6.1    2016-09-01 CRAN (R 3.3.0)
 GOSemSim          2.0.1    2016-11-11 Bioconductor  
 graph             1.52.0   2016-10-18 Bioconductor  
 graphite          1.20.1   2016-10-19 Bioconductor  
 gridBase          0.4-7    2014-02-24 CRAN (R 3.3.0)
 gridExtra         2.2.1    2016-02-29 CRAN (R 3.3.0)
 gtable            0.2.0    2016-02-26 CRAN (R 3.3.0)
 gtools            3.5.0    2015-05-29 CRAN (R 3.3.0)
 highr             0.6      2016-05-09 CRAN (R 3.3.0)
 HiveR           * 0.2.55   2016-03-26 CRAN (R 3.3.0)
 htmltools         0.3.5    2016-03-21 CRAN (R 3.3.0)
 htmlwidgets       0.8      2016-11-09 CRAN (R 3.3.2)
 httpuv            1.3.3    2015-08-04 CRAN (R 3.3.0)
 igraph            1.0.1    2015-06-26 CRAN (R 3.3.0)
 IRanges         * 2.8.1    2016-11-08 Bioconductor  
 iterators         1.0.8    2015-10-13 CRAN (R 3.3.0)
 jpeg              0.1-8    2014-01-23 CRAN (R 3.3.0)
 jsonlite          1.1      2016-09-14 CRAN (R 3.3.0)
 KernSmooth        2.23-15  2015-06-29 CRAN (R 3.3.2)
 knitr           * 1.15     2016-11-09 CRAN (R 3.3.2)
 labeling          0.3      2014-08-23 CRAN (R 3.3.0)
 lattice           0.20-34  2016-09-06 CRAN (R 3.3.2)
 lazyeval          0.2.0    2016-06-12 CRAN (R 3.3.0)
 magrittr          1.5      2014-11-22 CRAN (R 3.3.0)
 Matrix            1.2-7.1  2016-09-01 CRAN (R 3.3.0)
 memoise           1.0.0    2016-01-29 CRAN (R 3.3.0)
 mime              0.5      2016-07-07 CRAN (R 3.3.0)
 misc3d            0.8-4    2013-01-25 CRAN (R 3.3.0)
 munsell           0.4.3    2016-02-13 CRAN (R 3.3.0)
 mvtnorm           1.0-5    2016-02-02 CRAN (R 3.3.0)
 NMF             * 0.20.6   2015-05-26 CRAN (R 3.3.0)
 org.Hs.eg.db    * 3.4.0    2016-11-17 Bioconductor  
 pheatmap        * 1.0.8    2015-12-11 CRAN (R 3.3.0)
 pkgmaker        * 0.22     2014-05-14 CRAN (R 3.3.0)
 plyr            * 1.8.4    2016-06-08 CRAN (R 3.3.0)
 png               0.1-7    2013-12-03 CRAN (R 3.3.0)
 qvalue            2.6.0    2016-10-18 Bioconductor  
 R6                2.2.0    2016-10-05 CRAN (R 3.3.0)
 rappdirs          0.3.1    2016-03-28 CRAN (R 3.3.0)
 RColorBrewer    * 1.1-2    2014-12-07 CRAN (R 3.3.0)
 Rcpp            * 0.12.8   2016-11-17 CRAN (R 3.3.2)
 reactome.db       1.58.0   2016-11-17 Bioconductor  
 ReactomePA      * 1.18.1   2016-11-11 Bioconductor  
 registry        * 0.3      2015-07-08 CRAN (R 3.3.0)
 reshape2          1.4.2    2016-10-22 CRAN (R 3.3.0)
 rmarkdown         1.1      2016-10-16 CRAN (R 3.3.0)
 rngtools        * 1.2.4    2014-03-06 CRAN (R 3.3.0)
 RSQLite           1.0.0    2014-10-25 CRAN (R 3.3.0)
 S4Vectors       * 0.12.0   2016-10-18 Bioconductor  
 scales            0.4.1    2016-11-09 CRAN (R 3.3.2)
 sfsmisc           1.1-0    2016-02-23 CRAN (R 3.3.0)
 shape             1.4.2    2014-11-05 CRAN (R 3.3.0)
 shiny             0.14.2   2016-11-01 CRAN (R 3.3.0)
 stringi           1.1.2    2016-10-01 CRAN (R 3.3.0)
 stringr           1.1.0    2016-08-19 CRAN (R 3.3.0)
 survival        * 2.40-1   2016-10-30 CRAN (R 3.3.0)
 survminer       * 0.2.4    2016-12-11 CRAN (R 3.3.2)
 tibble            1.2      2016-08-26 CRAN (R 3.3.0)
 tidyr             0.6.0    2016-08-12 CRAN (R 3.3.0)
 withr             1.0.2    2016-06-20 CRAN (R 3.3.0)
 xtable            1.8-2    2016-02-05 CRAN (R 3.3.0)
 yaml              2.1.14   2016-11-12 CRAN (R 3.3.2)
Author

@oneillkza oneillkza commented Dec 21, 2016

To add to this, it looks like I also need to ensure that my groups are in the correct order when I pass them to survfit, for them to match up correctly. It might also be good if the names on the palette would match to the group (e.g. in this example, for sex==1 the colour would be the one named "1" on the palette.)

Author

@oneillkza oneillkza commented Dec 21, 2016

Oh, hmmm... It actually looks like it automatically sorts the group names into alphabetical order, then assigns them to the palette. I have one data set with clusters numbered 1:11, for which I want to use the colours in order. However, ggsurvplot currently assigns them to colours in the order 1, 10, 11, 2, 3, 4...

This can be hacked around by alphabetically reordering the palette, but that's a little convoluted. Probably the ideal behaviour would be for it to respect the order of factor levels where the grouping variable is a factor. I believe that's how ggplot handles colours, and would allow for custom ordering of groups by manipulating the order of the factor levels.
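The ordering described above can be reproduced in base R without any survival code. This is only a sketch of the sorting behaviour; the `clusters` vector is made up for illustration:

```r
# Cluster labels stored as character strings sort alphabetically, not numerically
clusters <- as.character(1:11)
sorted <- sort(clusters, method = "radix")
sorted
# "1" "10" "11" "2" "3" "4" "5" "6" "7" "8" "9"

# A factor with explicit levels keeps the intended numeric order
clusters.f <- factor(clusters, levels = 1:11)
levels(clusters.f)
# "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11"
```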

Owner

@kassambara kassambara commented Dec 21, 2016

Hi @oneillkza,

Thank you for reporting these behaviors of ggsurvplot. However, I think most of them are caused by the way the data are prepared rather than by a problem in ggsurvplot.

A) ggsurvplot() does not re-order the group names if your data are well prepared before fitting survival curves.

In the lung data, by default the levels of the sex variable are c(1, 2). In the example below, I'll create a second variable sex2 with levels c(2, 1). Next, I'll fit survival curves for sex and sex2. The following plots show that ggsurvplot always keeps the order of the factor levels.

library(survival)
library(survminer)

# Create the variable sex2
data("lung")
lung$sex2 <- factor(lung$sex, levels = c(2, 1))

# Fit survival curves for the variable sex
fit1 <- survfit(Surv(time, status) ~ sex, data = lung)
ggsurvplot(fit1, risk.table = TRUE)

[Image: rplot]

# Fit survival curves for the variable sex2
fit2 <- survfit(Surv(time, status) ~ sex2, data = lung)
ggsurvplot(fit2, risk.table = TRUE)

[Image: rplot03]

Please make sure that your cluster variable is a factor with levels = 1:11 before fitting survival curves. If it is not, the default behavior of survfit() is to order the strata alphabetically. This is not a ggsurvplot() problem.

mydata$clusters <- factor(mydata$clusters, levels = 1:11)
fit <- survfit(Surv(time, status) ~ clusters, data = mydata)
ggsurvplot(fit)
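One way to confirm the strata order before plotting is to look at the names stored on the fitted object. A minimal sketch using the lung data from the survival package; `lung2` is just a local copy made for this illustration:

```r
library(survival)

# Work on a copy so the packaged data set is left untouched
lung2 <- lung
lung2$sex2 <- factor(lung2$sex, levels = c(2, 1))

fit1 <- survfit(Surv(time, status) ~ sex, data = lung2)
fit2 <- survfit(Surv(time, status) ~ sex2, data = lung2)

# Strata names have the form "variable=level" and follow the level order
names(fit1$strata)
names(fit2$strata)
```

If the order printed here is not the one you want, reorder the factor levels and refit before calling ggsurvplot().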

B) Color palettes

In ggsurvplot(), by default the first color in the palette is used to color the first level of the factor variable. That is the default behavior of ggplot2. In fit2, the first level is sex=2.

For example:

ggsurvplot(fit2, risk.table = TRUE, palette = c("black", "lightgray"))

[Image: rplot04]

This default behavior of the ggplot2-based plotting system can be changed by passing a correctly named vector. That is, in our example, the names of the colors should match the names of the strata as generated by the survival::survfit() function. If they don't match, ggplot generates a warning message, as in your case.
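The effect of mismatched names can be illustrated with plain name-based indexing in base R. This is only an analogy for matching colors to strata by name, not a copy of what ggplot2 does internally, and the `strata` vector below is typed in by hand:

```r
# Strata names as survfit() would produce them for sex2
strata <- c("sex2=2", "sex2=1")

# Palette named "1", "2": no name matches a stratum, so every lookup is NA
bad.pal <- c("1" = "#1B9E77", "2" = "#D95F02")
bad.pal[strata]

# Palette named after the strata: each stratum finds its color
good.pal <- c("sex2=1" = "black", "sex2=2" = "lightgray")
good.pal[strata]
```

A color of NA is consistent with the "Removed 61 rows containing missing values" warning in the original report.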

For example, suppose I now want to color sex2=2 in lightgray and sex2=1 in black:

pals = c("black", "lightgray")
names(pals) <- paste0("sex2=", c(1, 2))
ggsurvplot(fit2, risk.table = TRUE, palette = pals)

[Image: rplot05]
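Instead of typing the strata names by hand, the palette names can be taken from the fitted object. A sketch under the assumption that the formula uses `data =` rather than `$` notation, so that the strata names and the legend names coincide; `lung2` is a local copy used only here:

```r
library(survival)

lung2 <- lung
lung2$sex2 <- factor(lung2$sex, levels = c(2, 1))
fit2 <- survfit(Surv(time, status) ~ sex2, data = lung2)

# Colors are listed in strata order: first stratum lightgray, second black
pals <- setNames(c("lightgray", "black"), names(fit2$strata))
pals
```

The resulting `pals` can then be passed as `palette = pals`.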

Please let me know if it works with your data.

Best regards,
/A

Author

@oneillkza oneillkza commented Dec 21, 2016

Thanks for the detailed and thoughtful answer! After some poking, that does actually work for me. I hadn't realised it had to match the strata names, or that survfit derived those from the formula. This might be useful information to include in the documentation for ggsurvplot.

Also, interestingly, it did not work when I used $ notation in the formula passed to survfit (such that the strata names contained $).

Owner

@kassambara kassambara commented Dec 22, 2016

I agree with you and have updated the documentation. More information about color palettes has now been added to the details section of the ggsurvplot() documentation.

About the dollar ($) notation, we generally prefer that users separate the formula and the data as follows (case 1):

library(survival)
fit1 <- survfit(Surv(time, status) ~ sex, data = lung)

Instead of using this (case 2):

library(survival)
fit2 <- survfit(Surv(lung$time, lung$status) ~ lung$sex)

However, even though we don't recommend the script used in case 2, the current version of ggsurvplot() (survminer 0.2.4) fully handles this situation (at least in my hands).

For example:

# Default plot
library(survminer)
ggsurvplot(fit2, risk.table = TRUE, pval = TRUE, conf.int = TRUE)

[Image: rplot]

# Change color palette (1/2): sex=1 in "black"; sex=2 in "grey"
pals <- c("black", "grey")
names(pals) <- c("sex=1", "sex=2")

ggsurvplot(fit2, risk.table = TRUE, pval = TRUE, conf.int = TRUE,
           palette = pals)

[Image: rplot06]

# Change color palette (2/2): sex=1 in "grey"; sex=2 in "black"
pals <- c("black", "grey")
names(pals) <- c("sex=2", "sex=1")

ggsurvplot(fit2, risk.table = TRUE, pval = TRUE, conf.int = TRUE,
           palette = pals)

[Image: rplot07]

As you mentioned, in case 2 the strata names returned by survfit() are in a very long format: c("lung$sex=1", "lung$sex=2"). To make the legend readable, the ggsurvplot() function simplifies them to c("sex=1", "sex=2"). So, as a general conclusion about the palette argument, the names of the colors should match the names of the strata as shown by ggsurvplot() in the legend. I updated the documentation accordingly (details section).
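For palettes built programmatically, the same shortening can be approximated with a regular expression. This is a hypothetical sketch and not the code survminer uses; the strata names are typed in to match the case 2 output quoted above:

```r
# Strata names as returned by survfit() when the formula uses $ notation
strata.names <- c("lung$sex=1", "lung$sex=2")

# Drop everything up to and including the "$" to get the legend-style names
legend.names <- sub("^.*\\$", "", strata.names)
legend.names

# Name the palette after the shortened names
pals <- setNames(c("black", "grey"), legend.names)
pals
```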

Let me know if it's ok for you so that we can close this issue.

Have a good day,
/A

kassambara added a commit that referenced this issue Dec 22, 2016
Author

@oneillkza oneillkza commented Dec 22, 2016

Looks good to me! Thanks so much for all your hard work!

@kassambara kassambara closed this Dec 22, 2016