Draw entire "single" graphs for assembly graphs #10

fedarko · 2016-07-22T17:15:56Z

This would make understanding really large graphs a bit easier, but for close analysis using a double graph is probably better. However, having this option available (I guess we'd figure it out in the Python script) would make things easier.

Maybe we could have the Python script automatically lay out a "single graph," and then store both the RC and non-RC information in the hypothetical database file generated? This would allow the Javascript UI to overlay RC nodes on top of non-RC nodes, if the user requests a double graph. I think this is similar to what Bandage does.

Note that this is only really feasible for contig assembly inputs (e.g. LastGraph files), not for scaffold inputs (e.g. GraphML files from BAMBUS).

fedarko · 2016-07-28T14:53:37Z

Also, worth noting that we can/should probably omit arrows on edges when drawing single graphs, since each contig is directionless. This would make drawing the graph even faster.

fedarko · 2016-08-08T20:56:38Z

^^^^Actually, that's the opposite of what we should do for single graphs, isn't it? Keep arrows on edges to indicate what node is the source and what node is the sink, but change the node shapes to squares, or hexagons, or rectangles, or something to indicate that the "single" nodes lack direction.

^^^ Update: wait, never mind, edges are directionless in single graphs

fedarko · 2016-08-10T00:26:49Z

Okay. So, what AsmViz currently does for contig assemblies is it draws a
"double graph," representing both positive and negative contigs (equivalent to
3' -> 5' and 5' -> 3' DNA sequences) as distinct nodes in the graph. This lets
us more clearly examine the directionality of the graph, since we know all
edges have directionality.

However, another way to go about visualizing a contig assembly graph is
drawing a "single graph," in which +/- nodes are collapsed into a single node.
This halves the number of nodes and edges in the graph, making it easier to
understand.

As a point of reference, the following edge definitions (copied
from sample2_LastGraph) would result in the following edges added for double
graphs:

ARC 1 -60 78
1 -> c60, 60 -> c1
ARC -1 -54 73
c1 -> c54, 54 -> 1
ARC 2 45 66
2 -> 45, c45 -> c2
ARC -2 -30 71
c2 -> c30, 30 -> 2
ARC 3 48 86
3 -> 48, c48 -> c3
ARC -3 9 78
c3 -> 9, c9 -> 3
ARC 4 -57 69
4 -> c57, 57 -> c4
ARC -4 -57 74
c4 -> c57, 57 -> 4
ARC 5 -30 70
5 -> c30, 30 -> c5
ARC -5 -20 73
c5 -> c20, 20 -> 5

It makes sense, right? In LastGraph, each edge definition actually represents
two edges: one from A to B, and one from -B to -A (with same multiplicity).
These edges have clear directionality, and you can clearly check for
structural patterns in the resulting graph.

Now, here's the resulting edge additions generated for a single graph:

ARC 1 -60 78
1 -- 60
ARC -1 -54 73
1 -- 54
ARC 2 45 66
2 -- 45
ARC -2 -30 71
2 -- 30
ARC 3 48 86
3 -- 48
ARC -3 9 78
3 -- 9
ARC 4 -57 69
4 -- 57
ARC -4 -57 74
4 -- 57
ARC 5 -30 70
5 -- 30
ARC -5 -20 73
5 -- 20

Not only are nodes directionless, but edges are also directionless.
This means that single graphs can contain duplicate edges, as seen with the
4 -- 57 edges. (We can resolve this in Cytoscape.js by just
adding arbitrary numbers on to the end of the edge ID, if we detect duplicates
-- that part should not be that difficult.)

The real issue, in my opinion, comes when detecting structural patterns in the
graph in the python script. All the patterns we detect (bubbles, frayed ropes, chains, cyclic
chains) are significant because of the ordering of their underlying nodes,
right? So if edges are directionless, it seems like detecting these patterns
wouldn't really be feasible--or might give incorrect information (?).

I could very well be wrong here, so please feel free to correct me -- but it seems to me like single graphs might not work well with pattern detection.

fedarko · 2016-08-10T15:39:19Z

So here's what I'm thinking: if we switch from storing nodes' neighbors as outgoing_nodes and incoming_nodes to storing nodes' neighbors as adjacent_nodes, then we could probably approximate most of pattern detection with that. I was thinking we could store neighbors as adjacent_to_positive_nodes and adjacent_to_negative_nodes, where both groups of neighbors are connected to the node but separated by port/etc, but that wouldn't work, right? Since nodes have only half the edges and the edges don't really have +/- data associated with them (despite what Bandage says).

There are going to be some strange things associated with this, including:

Where do we decide to put the headport/tailport for a node's edges? If we took the second approach to storing adjacencies then we could, say, make all connections to the "positive" side of the node hit the headport of this node, and make all connections to the "negative" side of the node hit the tailport of this node. But that's necessarily going to look sort of weird for some cases, since we only pick one edge from each edge declaration in LastGraph...?

fedarko · 2016-08-19T20:27:26Z

Evidently Jay (in the Pop Lab) has done work on this, and we might be able to utilize some of what he's done.

fedarko · 2016-12-26T06:37:46Z

So, wait a sec. Didn't the first version of collate.py I wrote in the Summer create single graphs?

Maybe just use that and stop overthinking this?

Just draw out some examples and see where node patterns would be identified, how edges should be positioned, etc. It'll take some time but the clarification will be worth it.

fedarko · 2017-01-20T00:39:34Z

So there's options here:

double graph (double nodes, double edges) -- what we currently support
single graph (single nodes, single edges) -- what Bandage considers "single graphs"
single graph with toggling supported between a node and its reverse complement, with edge ports indicating directionality -- what ABySS-Explorer does

I like ABySS-Explorer's approach: it seems like it reveals more information than Bandage, while having the simplicity of a single graph layout.

See #10 for the full story here. Basically, single graphs might not be that useful for the sort of analysis we want to do using this tool -- in Bambus 3's output, contigs already have an orientation while in LastGraph/GFA output that isn't a guarantee. Our current behavior is to auto-draw double graphs for LastGraph and GFA files, and auto-draw just the graph structure (I don't know if you'd call it a "single graph") for Bambus 3 GML files. Using explicit "double graphs" for GML files just results in 2x the connected components with no real added information (since nodes already have an orientation -- all nodes are "positive", so reverse complementing all nodes in the graph just creates new connections between the new negative nodes). Using "single graphs" for LastGraph/GFA files in general, though, does not keep the same level of information -- there exists the possibility for a given node and its RC to be in the same connected component, and nodes + edges must be treated as directionless. The effect of this is that analyzing the structural properties of the graph becomes a decent amount harder (not to mention that just laying out the graph hierarchically depends on the graph being a digraph). Essentially, we could include this functionality but it wouldn't really tie in with any of the Bambus 3 stuff or with most of the other features AsmViz has right now. As long as we're explicit in the README about what sorts of drawings are produced, I think keeping the current policy (default structure for Bambus 3 GML files, double graphs for other filetypes) should be alright. The tool kind of focuses on scaffold graphs, anyway.

fedarko · 2017-03-01T23:05:18Z

Main takeaway: our current approach is fine for Bambus 3 GML output. Just be explicit in the README/documentation/etc. and this should be alright.

We can eventually use these edges to draw "single graphs" using, say, neato/circo/fdp/sfdp/twopi. (Of course, that'd be using PyGraphViz.) Furthermore, single graph information like this will be useful in generating SPQR trees (since the input graphs for those are assumed to be undirected) for the corresponding graph. Also it might be a cool idea to eventually draw single graphs with "double" edges, where instead of using 2 nodes we just give each edge a specific headport/tailport configuration. However, that's not really a super important feature at present -- so not going to bother with that for now. (something to think of re: #10)

fedarko · 2017-06-15T04:19:40Z

When no biconnected components are present within a given connected component of a graph, a "single graph" is basically drawn. It's certainly possible to extend this functionality to draw single graphs for LastGraph/GFA files, but since pattern detection wouldn't be an option then (without the use of polarity ports or something) I don't really know how useful this would be.

fedarko · 2017-07-26T22:35:30Z

Somewhat related: it might be worth adding a command-line option to collate.py that, for a GFA/LastGraph/etc file, assumes that contigs are already oriented (and thus doesn't create implied negative nodes/edges). Since MetaCarvel can produce GFA output, implementing something like this would be worthwhile.

(We'd also probably have to adjust the JS code for the viewer interface to -- instead of just treating ASM_FILETYPE === "GML" as the basis for graphs with oriented contigs -- getting some corresponding bool from the db re: the contigs being oriented or unoriented, and operating accordingly.)

fedarko · 2017-08-09T21:40:14Z

This issue was moved to marbl/MetagenomeScope#4

fedarko added the highpriorityfeature label Jul 22, 2016

fedarko self-assigned this Jul 22, 2016

fedarko added lowpriorityfeature and removed highpriorityfeature labels Jul 26, 2016

fedarko added highpriorityfeature and removed lowpriorityfeature labels Aug 8, 2016

fedarko added this to the Important Things milestone Aug 8, 2016

fedarko added the help wanted label Aug 10, 2016

fedarko mentioned this issue Aug 10, 2016

"Symmetric" Bubble/Frayed Rope Detection #11

Closed

fedarko added a commit that referenced this issue Jan 3, 2017

#10 prep for single graph support in collate

f82eefa

fedarko added the graphlayout label Jan 5, 2017

This was referenced Jun 2, 2017

Lay out internal structure of metanodes; orient metaedges in SPQR trees; implement SPQR trees in the viewer #173

Closed

Support conditional generation of different graph view types #198

Closed

fedarko added lowpriorityfeature and removed graphlayout help wanted highpriorityfeature labels Jun 15, 2017

fedarko changed the title ~~Draw single graphs (with 3'->5' and 5'->3' nodes collapsed into a single node)~~ Draw entire "single" graphs for assembly graphs Jun 15, 2017

fedarko mentioned this issue Jul 27, 2017

Add command-line option to assume contigs are already oriented #244

Closed

fedarko mentioned this issue Aug 9, 2017

Support discarding redundant connected components for pre-oriented GFA/LastGraph/etc. files marbl/MetagenomeScope#4

Closed

fedarko closed this as completed Aug 9, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draw entire "single" graphs for assembly graphs #10

Draw entire "single" graphs for assembly graphs #10

fedarko commented Jul 22, 2016 •

edited

fedarko commented Jul 28, 2016

fedarko commented Aug 8, 2016 •

edited

fedarko commented Aug 10, 2016

fedarko commented Aug 10, 2016

fedarko commented Aug 19, 2016

fedarko commented Dec 26, 2016 •

edited

fedarko commented Jan 20, 2017

fedarko commented Mar 1, 2017

fedarko commented Jun 15, 2017

fedarko commented Jul 26, 2017

fedarko commented Aug 9, 2017

Draw entire "single" graphs for assembly graphs #10

Draw entire "single" graphs for assembly graphs #10

Comments

fedarko commented Jul 22, 2016 • edited

fedarko commented Jul 28, 2016

fedarko commented Aug 8, 2016 • edited

fedarko commented Aug 10, 2016

fedarko commented Aug 10, 2016

fedarko commented Aug 19, 2016

fedarko commented Dec 26, 2016 • edited

fedarko commented Jan 20, 2017

fedarko commented Mar 1, 2017

fedarko commented Jun 15, 2017

fedarko commented Jul 26, 2017

fedarko commented Aug 9, 2017

fedarko commented Jul 22, 2016 •

edited

fedarko commented Aug 8, 2016 •

edited

fedarko commented Dec 26, 2016 •

edited