add loading colors in legend (feature request) #27

Closed
ginolhac opened this issue Feb 7, 2017 · 9 comments

ginolhac commented Feb 7, 2017

Me again,

Now that the invisible option is fixed in #26 (thanks again!), my goal is to color the quanti.sup while hiding the variables (or loadings). This works fine, but it would be great to add them to the legend. In my case, the quanti.sup names are experiments and the colors should represent the treated cells.

The ellipses are filled, so they take the fill legend. Great. The remaining issue is the color of the individuals, which should be, let's say, black; otherwise I cannot get a legend for the quanti.sup.

A plot explains the problem better:

library(FactoMineR)  # PCA()
library(factoextra)  # fviz_pca_biplot() and the decathlon2 data set
pca_deca <- PCA(decathlon2, scale.unit = TRUE, graph = FALSE, quanti.sup = 11:12, quali.sup = c(13))
fviz_pca_biplot(pca_deca, invisible = "var", habillage = "Competition",
                addEllipses = TRUE, col.ind = "black", pointshape = 19,
                col.quanti.sup = c("purple", "darkblue"))

image

Note that the quanti.sup are properly colored but don't show up in the legend. And my attempt to use "black" for the individuals was a bit naive.

Since I am not sure how to solve this, here is a toy example of what should be achieved:

# example adapted from this answer
# http://stackoverflow.com/a/20291006/1395352
library(FactoMineR)
library(tidyverse)
pca    <- prcomp(iris[, 1:4], retx = TRUE, scale. = TRUE) # scaled pca [exclude species col]
pca_iris <- PCA(iris[, 1:4], graph = FALSE)
var_iris <- pca_iris$var$coord %>%
  as.data.frame() %>%
  rownames_to_column(var = "var") %>%
  separate(var, into = c("flower", "measure"), sep = "\\.") %>%
  as_tibble()
scores <- pca$x[, 1:3]                        # scores for first three PC's

# k-means clustering [assume 3 clusters]
km     <- kmeans(scores, centers = 3, nstart = 5)
ggdata <- data.frame(scores, Cluster = km$cluster, Species = iris$Species)
# get some custom colors
my_col_var <- ggsci::pal_npg("nrc")(4)
my_col_ell <- ggsci::pal_uchicago()(3)

ggplot(ggdata) +
  geom_point(aes(x = PC1, y = PC2, shape = factor(Cluster)), size = 2) +
  stat_ellipse(aes(x = PC1, y = PC2, fill = factor(Cluster)),
               geom = "polygon", level = 0.95, alpha = 0.4) +
  geom_segment(data = var_iris, aes(x = 0, xend = Dim.1 * 2, colour = flower,
                                    y = 0, yend = Dim.2 * 2), size = 1.2, arrow = arrow(length = unit(0.03, "npc"))) +
  geom_text(data = var_iris, aes(x = Dim.1 * 2, colour = flower, label = measure,
                                 y = Dim.2 * 2), nudge_x = 0.2, nudge_y = 0.3, show.legend = FALSE) +
  scale_fill_manual(values = my_col_ell) +
  scale_colour_manual(values = my_col_var) +
  labs(fill = "cluster",
       shape = "cluster",
       colour = "loadings") +
  theme_bw(14)  

image

Note that this allows adding more information while reducing the text length. I think the shape mapping is not mandatory.

@ginolhac
Author

ginolhac commented Feb 7, 2017

I wrote loadings, but I should have written quanti.sup for my specific need. However, I guess it would be useful for both.
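
For the quanti.sup case specifically, here is a rough sketch of the target plot built by hand with ggplot2. It is only an illustration under a few assumptions: it uses the `decathlon` data shipped with FactoMineR (not the `decathlon2` data from the first example), reads the supplementary coordinates from `pca$quanti.sup$coord`, and stretches the arrows by an arbitrary factor `k` so they are visible next to the individuals. It is not how factoextra draws the biplot.

```r
library(FactoMineR)
library(ggplot2)

data(decathlon)  # FactoMineR data: columns 11:12 are Rank and Points, 13 is Competition
pca <- PCA(decathlon, scale.unit = TRUE, graph = FALSE,
           quanti.sup = 11:12, quali.sup = 13)

# Individuals: first two dimensions plus the grouping variable
ind <- data.frame(pca$ind$coord[, 1:2], Competition = decathlon$Competition)

# Supplementary quantitative variables: first two dimensions plus their names
sup <- data.frame(pca$quanti.sup$coord[, 1:2])
sup$variable <- rownames(sup)

k <- 3  # arbitrary stretch factor for the arrows (assumption, purely cosmetic)

p <- ggplot() +
  # shape 21 is drawn with a fill, so individuals use the fill legend
  geom_point(data = ind, aes(Dim.1, Dim.2, fill = Competition),
             shape = 21, colour = "black", size = 2) +
  # the colour aesthetic stays free for the quanti.sup and gets its own legend
  geom_segment(data = sup,
               aes(x = 0, y = 0, xend = Dim.1 * k, yend = Dim.2 * k,
                   colour = variable),
               arrow = arrow(length = unit(0.03, "npc"))) +
  scale_colour_manual(values = c(Rank = "purple", Points = "darkblue")) +
  labs(fill = "Competition", colour = "quanti.sup") +
  theme_bw()

print(p)
```

Because the individuals use `fill` and the arrows use `colour`, ggplot2 builds two independent legends, which is the behaviour requested here.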

@kassambara
Owner

I really appreciate this very well written request.

The idea is to be able to color variables (active and supplementary) by groups so that they will appear in the legends.

I think that this is an interesting feature and I will implement it as soon as possible.

Let me know if you have any other suggestions.

Have a great day,
/A

@ginolhac
Author

ginolhac commented Feb 7, 2017

Thanks a lot! You summarized the (long) request very well. I have more ideas but will open separate issues later. After watching François Husson talk about PCA, its real diagnostic power became clear to me! For example, if you see genes that belong to one group but are found in another, you can investigate further. The same goes for genes that clearly belong to one group but were not included. A great tool.
Have a great day too!

ginolhac changed the title from "add loadings colors in legend (feature request)" to "add loading colors in legend (feature request)" on Feb 7, 2017
@ginolhac
Author

Hello @kassambara, any chance you have time to look into this feature request?

@kassambara
Owner

I think that the current developmental version of factoextra already includes a quick solution to your question.

Please install the latest developmental version and try this:

res.pca <- prcomp(iris[, -5], scale. = TRUE)
fviz_pca_biplot(res.pca, label = "var",
             col.ind = iris$Species,
             col.var = c("sepal", "sepal", "petal", "petal"),
             repel = TRUE,
             palette = "jco",
             legend.title = "Group"
             )

What do you think about that?

@ginolhac
Author

That is nice indeed, but I'd like col.var to be in the legend on its own. A trick I have used before is shape = 21 for the points, so that the fill aesthetic colors the individuals and the colour aesthetic is left for the loadings.

See below, reusing the objects from the example above:

ggplot(ggdata) +
  geom_point(aes(x = PC1, y = PC2, fill = factor(Species)), size = 2, shape = 21, colour = "grey90") +
  geom_segment(data = var_iris, aes(x = 0, xend = Dim.1 * 2, colour = flower,
                                    y = 0, yend = Dim.2 * 2), size = 1.2, arrow = arrow(length = unit(0.03, "npc"))) +
  geom_text(data = var_iris, aes(x = Dim.1 * 2, colour = flower, label = measure,
                                 y = Dim.2 * 2), nudge_x = 0.2, nudge_y = 0.3, show.legend = FALSE) +
  scale_fill_manual(values = my_col_ell) +
  scale_colour_manual(values = my_col_var) +
  labs(fill = "species",
       colour = "loadings") +
  theme_bw(14)

image

kassambara added a commit that referenced this issue Aug 15, 2017
kassambara added a commit that referenced this issue Aug 15, 2017
@kassambara
Owner

New arguments fill.var and fill.ind added.

The following R code should work:

library(factoextra)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
fviz_pca_biplot(res.pca, 
                
                # Fill individuals by groups
                geom.ind = "point",
                pointshape = 21,
                pointsize = 2,
                fill.ind = iris$Species,
                col.ind = "white",
                
                # Color variable by groups
                col.var = factor(c("sepal", "sepal", "petal", "petal")),
                
                repel = TRUE
             )+
  ggpubr::color_palette("npg")+
  ggpubr::fill_palette("jco")+
  labs(fill = "Species", color = "Clusters")

rplot13

@kassambara
Owner

After installing the latest developmental version of ggpubr and factoextra, the following R code should work:

library(factoextra)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
fviz_pca_biplot(res.pca, 
                # Individuals
                geom.ind = "point",
                fill.ind = iris$Species, col.ind = "white",
                pointshape = 21, pointsize = 2,
                palette = "jco",
                addEllipses = TRUE,
                # Variables
                alpha.var ="contrib", col.var = "contrib",
                gradient.cols = "RdBu"
                )+
  labs(fill = "Species", color = "Contrib", alpha = "Contrib") # Change legend title

rplot

@ginolhac
Author

Looks superb! Thanks a lot for your much appreciated efforts.
