Create Subgraph/cluster with version 0.9.2 #257

Closed
heichi opened this issue Nov 24, 2017 · 20 comments

@heichi

heichi commented Nov 24, 2017

Hello,
Are clusters/subgraphs supported in version 0.9.2 of DiagrammeR?

Thanks

@Enchufa2

It seems so (see #260). However, I don't know whether these graphs can be created using create_graph (I've already asked in #236).

@flyaflya
Contributor

flyaflya commented Jan 8, 2018

Thx @rich-iannone for the great package; I think the importance of programmatically generating DAGs and similar graphs from dataframes cannot be overstated.

With regard to this issue, I am wondering if there is any guidance on @Enchufa2's comment/inquiry. Can the "subgraph cluster_X" capabilities be replicated using create_graph? I would like to use a node dataframe to mimic basic plate model notation. Thanks!!

@rich-iannone
Owner

@flyaflya if I recall correctly, there is some capability to bundle/coalesce nodes together (I think by using the cluster node attribute), but I need to revisit the state of this.

Could you provide a concrete example of what you’d like to achieve from inputs to output? Should the functionality not be available, I’d be more than happy to include it into the package as a new feature.

@Enchufa2

I see:

library(DiagrammeR)

create_node_df(n=4, label=TRUE, cluster=rep(c("A", "B"), 2)) %>%
  create_graph() %>%
  set_global_graph_attrs("layout", "dot", "graph") %>%
  render_graph()
#> Error: `replacement` must be a character vector

Not a very helpful message, BTW.

@flyaflya
Contributor

@rich-iannone, thanks for the follow-up. As a simple example, I would like to reproduce the output of this code:

grViz("
digraph G {
  compound=true;
  a;
  subgraph cluster1 {
    b [label = 'b@_{n}']; c [label = 'c@_{n}'];
    label = 'Data N';
    labelloc = b;
    labeljust = r;

  }
  a -> c;
  b -> c;
}
")

which produces this picture:
image

The code I would like to use is something like this:

nodeDF = data.frame(id = c(1,2,3), 
                    label = c("a","b","c"),
                    cluster = c(NA,"data","data"))
edgeDF = data.frame(from = c(1,2),
                    to = c(3,3))

create_graph() %>%
  add_nodes_from_table(nodeDF, label_col = label) %>%
  add_edges_from_table(edgeDF, from, to, id_external) %>%
  render_graph()

However, I get the same error as @Enchufa2 (i.e. Error: `replacement` must be a character vector). If you omit the cluster column from the dataframe and rerun this code, you only get this:
image

Thanks for any help/guidance/new features/etc. I would love to make this tool my default over using tikz.
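
The idea behind the request above (nodes that share a `cluster` value end up inside one `subgraph cluster_...` block) can be sketched in base R alone. This is only an illustration under stated assumptions: the helper name `nodes_to_dot` is hypothetical and not part of DiagrammeR, it only builds DOT text, and it assumes cluster names are valid DOT identifiers.

```r
# Hypothetical helper (not a DiagrammeR function): build DOT text in which
# nodes sharing a non-NA `cluster` value are wrapped in `subgraph cluster_<name>`.
nodes_to_dot <- function(nodes, edges) {
  cluster <- as.character(nodes$cluster)
  node_lines <- sprintf('  %s [label = "%s"]', nodes$id, nodes$label)
  in_cluster <- !is.na(cluster)

  # One subgraph block per distinct cluster name
  cluster_blocks <- vapply(
    unique(cluster[in_cluster]),
    function(cl) {
      members <- node_lines[which(cluster == cl)]
      paste0(
        "  subgraph cluster_", cl, " {\n",
        '    label = "', cl, '";\n',
        paste0("  ", members, ";", collapse = "\n"), "\n",
        "  }"
      )
    },
    character(1)
  )

  edge_lines <- sprintf("  %s -> %s;", edges$from, edges$to)

  paste(
    c("digraph G {",
      sprintf("%s;", node_lines[!in_cluster]),  # nodes outside any cluster
      cluster_blocks,
      edge_lines,
      "}"),
    collapse = "\n"
  )
}

nodeDF <- data.frame(id = c(1, 2, 3),
                     label = c("a", "b", "c"),
                     cluster = c(NA, "data", "data"),
                     stringsAsFactors = FALSE)
edgeDF <- data.frame(from = c(1, 2), to = c(3, 3))

dot <- nodes_to_dot(nodeDF, edgeDF)
cat(dot, "\n")
```

The resulting string could then be passed to grViz(), as in the DOT example earlier in this comment. The sketch does not reproduce the labelloc/labeljust settings or the subscripted labels.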

@stevepowell99

+1 for this - essential / critical for me

@stevepowell99

Also, it sounds like DiagrammeR should support nested subgraphs too (though obviously this won't work if ordinary subgraphs don't work).

@tvarju
Contributor

tvarju commented Aug 22, 2018

@Enchufa2 I tracked down this bug to line 499 of the file generate_dot.R.
It is clearly a syntax error that is always raised when 'cluster' %in% colnames(nodes_df) in line 468 of the same file evaluates to TRUE.

I cannot fix it, since I do not really get what should happen here.

@rich-iannone This bug was there in the first version of the file. Even the indentation gives away that something is wrong. Unfortunately, there is no unit test for generate_dot. The feature is undocumented, so the comment of @Enchufa2 is the only place where it comes to the surface.

@Enchufa2

@tvarju Good catch! But it's not a simple syntax error. That code is supposed to strip out nodes from the node_block and put them inside a cluster subgraph, but AFAICT the whole implementation is flawed in several ways (apart from the obvious syntax error).

@tvarju
Contributor

tvarju commented Aug 24, 2018

@stevepowell99 nested subgraphs with a dot notation syntax like
nodes$cluster <- c('A', 'A', 'B', 'B.sub', 'B.sub', 'B.sub.subsub')

I could implement it right now.
Let us wait and see what happens to my pull request #306!

More extensive support for cluster attributes (like the label positioning in @flyaflya's example) requires design decisions.
Design decisions are better in the hands of maintainers than in the hands of an external contributor.
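
To make the dot-notation proposal concrete, here is a minimal base R sketch of how such cluster names could be expanded into a parent/depth table. It is an assumption about how the notation might be interpreted, not the code in the pull request; `cluster_hierarchy` is a hypothetical name.

```r
# Hypothetical sketch: derive each cluster's parent and depth from
# dot-separated names such as "B.sub.subsub".
cluster_hierarchy <- function(cluster) {
  cluster <- as.character(cluster)
  ids <- unique(cluster[!is.na(cluster)])
  parts <- strsplit(ids, ".", fixed = TRUE)

  # Include implied ancestors, e.g. "B" and "B.sub" for "B.sub.subsub"
  all_ids <- as.character(unique(unlist(lapply(parts, function(p) {
    vapply(seq_along(p),
           function(i) paste(p[seq_len(i)], collapse = "."),
           character(1))
  }))))

  # Depth 0 means a top-level cluster
  depth <- lengths(strsplit(all_ids, ".", fixed = TRUE)) - 1L
  # Parent is the name with its last ".segment" removed
  parent <- ifelse(depth == 0L, NA_character_, sub("\\.[^.]*$", "", all_ids))

  data.frame(id = all_ids, parent = parent, depth = depth,
             stringsAsFactors = FALSE)
}

h <- cluster_hierarchy(c("A", "A", "B", "B.sub", "B.sub", "B.sub.subsub"))
print(h)
```

A table like this would tell a DOT generator which `subgraph` blocks to emit inside which others; it says nothing about per-cluster attributes.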

@stevepowell99

@tvarju sounds promising! I am so eager for clusters to happen! Sure, there are design decisions .... how to introduce 1) global cluster attributes and 2) cluster-by-cluster attributes and 3) information on the cluster hierarchy? The way you suggest sounds fine for 3 but doesn't help with 1 or 2. 1 and 2 could be achieved with

  • adding a cluster option to add_global_graph_attrs, delete_global_graph_attrs etc.
  • and introducing something like set_cluster_attrs.
    ... now, that solution could also include a cluster-only attribute like "child of ..." in which case one wouldn't really need the dot mechanism you suggest (though it could be useful).

This would avoid having a separate mechanism for clusters like the one for nodes and edges, which might be overkill.

@rich-iannone
Owner

Thanks @tvarju for the PR and everyone else in this issue. The PR is now merged :)

@stevepowell99

Thanks @tvarju and @rich-iannone! Could you give a quick example of how this can now be used to create a graph? I can run @flyaflya's example (https://github.com/rich-iannone/DiagrammeR/issues/257#issuecomment-357470193), which no longer crashes but produces this:
image.
If I replace the NA in the cluster assignment with data, I get a correct diagram but no cluster boundaries drawn. If I then change one of the cluster assignments to data.b, I get nonsense again.

@Enchufa2

You need to add "dot" to the layout attribute, but still, similar issue:

nodeDF = data.frame(id = c(1,2,3), 
                    label = c("a","b","c"),
                    cluster = c(NA,"data","data"))
edgeDF = data.frame(from = c(1,2),
                    to = c(3,3))

create_graph() %>%
  add_nodes_from_table(nodeDF, label_col = label) %>%
  add_edges_from_table(edgeDF, from, to, id_external) %>%
  add_global_graph_attrs("layout", "dot", "graph") %>%
  render_graph()

image

The implementation needs refinement and more test cases, because there are problems with edges.

@tvarju tvarju mentioned this issue Aug 27, 2018
@tvarju
Contributor

tvarju commented Aug 27, 2018

@Enchufa2 My bad. Did not notice that another part of the code is also messing with the cluster attribute. My unit test did not catch it. Your example is now one of the unit tests, and the fix is in the PR.
Thanks for testing!

@stevepowell99

Thanks @tvarju @rich-iannone @Enchufa2 ... so this does not include the suggested dot notation for nested clusters? This doesn't produce nested clusters:

nodeDF = data.frame(id = c(1,2,3), 
                    label = c("a","b","c"),
                    cluster = c(NA,"data","data.sub"))
edgeDF = data.frame(from = c(1,2),
                    to = c(3,3))

create_graph() %>%
  add_nodes_from_table(nodeDF, label_col = label) %>%
  add_edges_from_table(edgeDF, from, to, id_external) %>%
  add_global_graph_attrs("layout", "dot", "graph") %>%
  render_graph()

@tvarju
Contributor

tvarju commented Aug 28, 2018

@stevepowell99 No, it does not implement the dot notation. As you wrote in a previous comment, it would only solve one of the three requirements. Since this issue is now closed, I think it is time to open a new one for that discussion. Keep the issue tracker tidy: one issue per thread.

@flyaflya
Contributor

Thanks @tvarju @rich-iannone and @Enchufa2 ! This is great. The cluster label placement is also changeable via:
%>% add_global_graph_attrs(attr = "labelloc", value = "b", attr_type = "graph")

Here is a pic of an example I made:

image

@vorpalvorpal

I have implemented nested clusters by creating a clusters_df dataframe with the following structure:

clusters_df <- tibble(id = c(1,2,3,4),
                    graph = list(list(label = "parent"), list(label = "child"), list(label = "child"), list(label = "grandchild")),
                    node = NA,
                    edge = NA,
                    id.parent = c(NA, 1,1,2))

where graph, node and edge are lists of named lists with each named list representing graph/node/edge [foo = bar] and id.parent is the id of the parent cluster.

I then changed the cluster section of generate_dot.R to be the following:

      } else if ('cluster' %in% colnames(nodes_df)) {
        # add required columns if they don't exist
        if(!"graph" %in% colnames(clusters_df)){
          clusters_df <- add_column(clusters_df, graph = NA)
        }
        if(!"node" %in% colnames(clusters_df)){
          clusters_df <- add_column(clusters_df, node = NA)
        }
        if(!"edge" %in% colnames(clusters_df)){
          clusters_df <- add_column(clusters_df, edge = NA)
        }
        if(!'id.parent' %in% colnames(clusters_df)){
          clusters_df <- add_column(clusters_df, id.parent = NA)
        }
        clusters_df$id.parent.parent <- clusters_df$id.parent
        clusters_df <- mutate(clusters_df, id.depth = if_else(is.na(id.parent), 0, 1))
       
        # Find depth of youngest child
        while(length(unique(clusters_df$id.parent.parent)) > 1){
          clusters_df <-
            clusters_df %>% 
            select(-id.parent.parent) %>%
            left_join(select(clusters_df, id, id.parent.parent),
                      by = c("id.parent" = "id")) %>% 
            mutate(id.depth = if_else(is.na(id.parent.parent), 
                                       id.depth, 
                                       id.depth + 1))
        }
        # arrange so that most deep cluster is added first
        clusters_df <- 
          clusters_df %>% 
          arrange(desc(id.depth)) %>% 
          select(id, id.parent, id.depth, graph, node, edge) 
        
        df <- tribble(~id.parent, ~id.depth, ~gv.str)
        walk(unique(clusters_df$id.depth), 
                          \(d){
          pwalk(filter(clusters_df, id.depth == d), 
                        \(id, id.parent, id.depth, graph, node, edge){
            # convert named lists to dot character strings
            if(!(is.null(graph) | is.na(graph))){
              graph <- paste0("graph [", 
                              paste(names(graph), 
                                    graph, 
                                    sep = " = ", collapse = ", "), 
                              "]\n")
              }else graph <- '' 
            if(!(is.null(node) | is.na(node))){
              node <- paste0("node [", 
                             paste(names(node), 
                                   node, 
                                   sep = " = ", collapse = ", "), 
                             "]\n")
              }else node <- '' 
            if(!(is.null(edge) | is.na(edge))){
              edge <- paste0("edge [", 
                             paste(names(edge), 
                                   edge, 
                                   sep = " = ", collapse = ", "), 
                             "]\n")
              }else edge <- '' 
            subclusters <-
              df %>% 
              filter(id.parent == id) %>% 
              pull(gv.str) %>% 
              str_trim() %>% 
              paste0(collapse = ";\n")
            nodes <- 
              node_block[which(nodes_df$cluster == id)] %>% 
              str_trim() %>% 
              paste0(collapse = ";\n")
            # combine dot strings into single cluster string           
            gv.str <- paste0("subgraph cluster", id, " {\n", 
                             graph, node, edge,
                             subclusters, "\n",
                             nodes, "\n}\n"
            )
            df <<- add_case(df, id.parent = id.parent, id.depth = d, gv.str = gv.str)
          })
        })
        gv.str <-
          df %>% 
          filter(id.depth == 0) %>% 
          pull(gv.str) %>% 
          paste0(collapse = "\n")
        
        node_block <- c(gv.str, node_block[which(is.na(nodes_df$cluster))])
        
        # clean up
        rm(df, gv.str)
      }

This is a bit ugly, but if you don't mind the approach I could try submitting a PR.
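
Two steps of the approach above (turning a named list into a DOT attribute statement, and finding each cluster's depth from id.parent) can be sketched more compactly in base R. This is a simplified illustration, not the code posted above: the helper names `attrs_to_dot` and `cluster_depth` are hypothetical, attribute values are quoted here (the posted code does not quote them), and the parent links are assumed to be free of cycles.

```r
# Hypothetical helper: turn a named list such as list(label = "parent")
# into a DOT statement like: graph [label = "parent"]
attrs_to_dot <- function(type, attrs) {
  # NULL, empty, or NA-only input yields no statement at all
  if (is.null(attrs) || length(attrs) == 0L || all(is.na(attrs))) {
    return("")
  }
  paste0(type, " [",
         paste0(names(attrs), ' = "', unlist(attrs), '"', collapse = ", "),
         "]")
}

# Hypothetical helper: depth of each cluster, counted as the number of
# ancestors reached by following id.parent (assumes no cycles).
cluster_depth <- function(id, id.parent) {
  vapply(seq_along(id), function(i) {
    depth <- 0L
    p <- id.parent[i]
    while (!is.na(p)) {
      depth <- depth + 1L
      p <- id.parent[match(p, id)]  # step up to the parent's own parent
    }
    depth
  }, integer(1))
}

graph_stmt <- attrs_to_dot("graph", list(label = "parent"))
depths <- cluster_depth(id = c(1, 2, 3, 4), id.parent = c(NA, 1, 1, 2))
```

Emitting clusters in order of decreasing depth, as the posted code does, means each child's DOT string already exists when its parent block is assembled.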

@trenvort

Hi, can you show me how this map looks as an image?
digraph MapaConceptual {
rankdir=LR;
node [shape=box, style="filled, rounded", fillcolor="#F9F9F9"];

subgraph cluster_0 {
    label="Evaluaciones en México y América Latina en 2012";
    style=filled;
    fillcolor="#D9EDF7";
    fontcolor="#154360";
    fontname="Arial";
    
    node [fillcolor="#2980B9", fontcolor="white"];
    PruebasDeAprendizaje [label="Pruebas de aprendizaje"];
    ExperienciaMéxico [label="Experiencia de México en pruebas estandarizadas"];
    
    node [fillcolor="#3498DB"];
    Normal [label="Exámenes de ingreso a normal"];
    IDANIS [label="IDANIS"];
    EstudioPrimaria [label="Estudio de Evaluación de Primaria"];
    
    node [fillcolor="#2980B9"];
    ANMEB [label="Acuerdo Nacional para la Modernización de la Educación Básica (ANMEB)"];
    ProgramaCarreraMagisterial [label="Programa de Carrera Magisterial"];
    
    node [fillcolor="#3498DB"];
    PARE [label="Programa para Abatir el Rezago Educativo (PARE)"];
    
    node [fillcolor="#2980B9"];
    OCDE [label="Ingreso de México a la OCDE"];
    PISA [label="Participación en el proyecto PISA"];
    TIMSS [label="Participación en el Tercer Estudio Internacional de Matemáticas y Ciencias (TIMSS)"];
    LLECE [label="Participación en el Laboratorio Latinoamericano de Evaluación de la Calidad Educativa (LLECE)"];
    
    node [fillcolor="#3498DB"];
    ReformaCurricular [label="Reforma curricular"];
    PruebasEstándaresNacionales [label="Pruebas de Estándares Nacionales"];
    
    node [fillcolor="#2980B9"];
    IDANIS [label="IDANIS"];
    OlimpiadasConocimiento [label="Olimpiadas del Conocimiento"];
    
    node [fillcolor="#3498DB"];
    INEE [label="Instituto Nacional para la Evaluación de la Educación (INEE)"];
    
    node [fillcolor="#2980B9"];
    EXCALE [label="Exámenes de la Calidad y el Logro Educativo (EXCALE)"];
    ENLACE [label="Exámenes Nacionales del Logro Educativo en Centros Escolares (ENLACE)"];
}

subgraph cluster_1 {
    label="Desarrollo de pruebas y consecuencias negativas";
    style=filled;
    fillcolor="#E8DAEF";
    fontcolor="#6C3483";
    fontname="Arial";
    
    node [fillcolor="#913D88", fontcolor="white"];
    DesarrolloPruebas [label="Desarrollo de nuevos instrumentos de evaluación"];
    MejoraCalidad [label="Mejora de la calidad técnica del INEE"];
    
    node [fillcolor="#9B59
