Node aesthetics using independently computed data for each node #31

Closed
kotliary opened this issue May 14, 2019 · 5 comments

Comments

@kotliary

I compute some statistics from predictive models for each cluster and save them in a separate data.frame, but I cannot figure out how to color the clustree nodes by these values. I tried modifying the data of the returned ggplot object directly, but the join converts it to a plain data.frame, losing all information on edges.

ct = clustree(sobj, prefix = "K", node_colour="sig", node_colour_aggr = "mean")
class(ct$data)
[1] "layout_igraph" "layout_ggraph" "data.frame"
ct$data = ct$data %>% left_join(df.nodes, by="node")
class(ct$data)
[1] "data.frame"
ct
Error: edges must contain the columns from, to, x, y, xend, yend and circular

Any ideas or a workaround?

Thanks for the great package!

@lazappi
Owner

lazappi commented May 15, 2019

Hi @kotliary

Thanks for giving clustree a go! This is probably a bit hard to do at the moment, but hopefully we can come up with a solution that works. I'm sure other people want to do the same thing.

Could you provide a full reproducible example that shows what you have tried doing so far, perhaps using the iris_clusts dataset? I'm not sure exactly what is in all of those objects so it's a bit hard for me to play around with it. The reprex package may be helpful for doing this.

In the meantime I'll have a think about what might work and see if I can come up with something.

@kotliary
Author

kotliary commented May 17, 2019

Thank you for the response. Please see an example below. I added a group column to the iris_clusts dataset with random 0s and 1s. Then I run a t.test comparing Sepal.Length between these groups within each cluster for each K.

Then I'd like to generate a clustree plot with nodes colored by -log10(p.value) from df.test. This is a simple example, and it could probably be done with an aggregation function. My real case is more complex, though, and it would be great if clustree nodes could be colored by data from an external table.

library(tidyverse)
library(clustree)

data("iris_clusts")

set.seed(123)
iris_with_groups = iris_clusts %>% 
  mutate(group = sample(0:1, size = nrow(.), replace = TRUE))

df.test = data.frame()
for(k in paste0("K",1:5)) {
  for (cl in iris_with_groups %>% pull(k) %>% unique()) {
    idx = iris_with_groups %>% pull(k) == cl
    df = iris_with_groups[idx,]
    df.test = rbind(df.test,
                    broom::glance(t.test(Sepal.Length~group, data=df)) %>% 
                      mutate(N = nrow(df), K = k, cluster = cl)
                    )
  }
}
df.test = df.test %>% mutate(node = paste0(K,"C",cluster))
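
The value to be mapped to node colour, -log10(p.value), is described above but never added as a column. A minimal base-R sketch of that step follows, under stated assumptions: the built-in iris data with Species standing in for one clustering column, and the hypothetical names dat, df.sketch and neg_log10_p, none of which come from the thread.

```r
# Base-R sketch: one t-test per cluster, then -log10(p.value) per cluster.
# iris$Species stands in for a single clustering column such as K3;
# the random `group` column mirrors the example above.
set.seed(123)
dat <- iris
dat$group <- sample(0:1, size = nrow(dat), replace = TRUE)

# Run the test within each cluster and keep one summary row per cluster
per_cluster <- lapply(split(dat, dat$Species), function(d) {
  tt <- t.test(Sepal.Length ~ group, data = d)
  data.frame(cluster = as.character(d$Species[1]),
             N = nrow(d),
             p.value = tt$p.value)
})
df.sketch <- do.call(rbind, per_cluster)

# Hypothetical column name for the value to show as node colour
df.sketch$neg_log10_p <- -log10(df.sketch$p.value)
```

Building the rows with lapply and binding them once avoids growing a data.frame inside nested loops, but the loop above gives the same kind of table.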

@lazappi
Owner

lazappi commented May 20, 2019

Thanks for the example, that helps a lot!

This is something that clustree should be able to do but can't at the moment. I would like to implement it, but I am working on some other things right now and don't have the time.

Here is a workaround that should be OK for now. Basically, we take the layout object, modify it, and build the plot manually. Starting from your code above:

# Get the graph layout
layout <- clustree(iris_with_groups, prefix = "K", return = "layout") %>%
    arrange(node)

# Add the statistic column (or whatever you want to show)
df.test <- arrange(df.test, node)
layout$statistic <- df.test$statistic

# Plot the graph (modified from clustree.R)
gg <- ggraph(layout)

# Add the edges
gg <- gg + geom_edge_link(arrow = arrow(length = unit(1.5 * 5, "points"),
                                        ends = "last"),
                          end_cap = circle(9.5 * 1.5, "points"),
                          start_cap = circle(9.5 * 1.5, "points"),
                          aes_(colour = ~count,
                               alpha = ~in_prop,
                               edge_width = ~is_core)) +
    scale_edge_width_manual(values = c(1.5, 1.5),
                            guide = "none") +
    scale_edge_colour_gradientn(colours = viridis::viridis(256)) +
    scale_edge_alpha(limits = c(0, 1))

# Add the node points (replace "statistic" with the column you want to show)
gg <- gg + clustree:::add_node_points("statistic", "size", 1, colnames(layout))

# Add the node text
gg <- gg + geom_node_text(aes_(label = ~cluster), size = 3,
                          colour = "black")

# Plot theme
gg <- gg + scale_size(range = c(4, 15)) +
    ggraph::theme_graph(base_family = "",
                        plot_margin = ggplot2::margin(2, 2, 2, 2))

Which gives us this plot.

[image: clustree plot with nodes coloured by the statistic column]

I've shown the statistic column here but you could use any of the others. Hope that helps!

@kotliary
Author

Works great! Thank you so much!

One modification: you cannot use arrange(node) on the layout object. It converts the layout object to a plain data.frame, and ggraph does not work with a plain data.frame. It's better to make sure that the order of nodes in the external data.frame matches the order of nodes in the layout.

iord = match(layout$node, df.test$node)
layout$statistic <- df.test$statistic[iord]

Other than that it works perfectly. I hope this feature will be integrated into the package at some point.
Thanks again!
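
The match-based alignment can be tried on toy data without clustree or ggraph. This is only a sketch under stated assumptions: the layout object here is a hand-built stand-in carrying an extra "layout_ggraph" class, and df.nodes is a hypothetical external table. The real layout comes from clustree(..., return = "layout") as in the workaround above.

```r
# Toy stand-in for the clustree layout (assumption, not the real object).
# The extra class mimics the "layout_ggraph" class that ggraph expects.
layout <- data.frame(node = c("K1C1", "K2C1", "K2C2", "K3C1"),
                     stringsAsFactors = FALSE)
class(layout) <- c("layout_ggraph", "data.frame")

# Hypothetical external table, deliberately in a different row order
df.nodes <- data.frame(node = c("K3C1", "K2C2", "K1C1", "K2C1"),
                       statistic = c(3.3, 2.2, 1.1, 0.5),
                       stringsAsFactors = FALSE)

# Position of each layout node within the external table
iord <- match(layout$node, df.nodes$node)

# Fail early if any layout node has no row in the external table
stopifnot(!anyNA(iord))

# Column assignment keeps the layout's class, unlike the join tried earlier
layout$statistic <- df.nodes$statistic[iord]
```

The anyNA check is an addition to the snippet above: match returns NA for nodes missing from the external table, and those would otherwise turn into NA values in the new column without any warning.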

@lazappi
Owner

lazappi commented May 21, 2019

You are right about arrange. I thought I had fixed that, but I must have forgotten.

I'm going to close this issue now, but I have made a note in #26 about trying to add this as a feature.

lazappi closed this as completed May 21, 2019