Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Draw entire "single" graphs for assembly graphs #10

Closed
fedarko opened this issue Jul 22, 2016 · 11 comments
Closed

Draw entire "single" graphs for assembly graphs #10

fedarko opened this issue Jul 22, 2016 · 11 comments

Comments

@fedarko
Copy link
Owner

fedarko commented Jul 22, 2016

This would make understanding really large graphs a bit easier, but for close analysis using a double graph is probably better. However, having this option available (I guess we'd figure it out in the Python script) would make things easier.

Maybe we could have the Python script automatically lay out a "single graph," and then store both the RC and non-RC information in the hypothetical database file generated? This would allow the Javascript UI to overlay RC nodes on top of non-RC nodes, if the user requests a double graph. I think this is similar to what Bandage does.

Note that this is only really feasible for contig assembly inputs (e.g. LastGraph files), not for scaffold inputs (e.g. GraphML files from BAMBUS).

@fedarko
Copy link
Owner Author

fedarko commented Jul 28, 2016

Also, worth noting that we can/should probably omit arrows on edges when drawing single graphs, since each contig is directionless. This would make drawing the graph even faster.

@fedarko
Copy link
Owner Author

fedarko commented Aug 8, 2016

^^^^Actually, that's the opposite of what we should do for single graphs, isn't it? Keep arrows on edges to indicate what node is the source and what node is the sink, but change the node shapes to squares, or hexagons, or rectangles, or something to indicate that the "single" nodes lack direction.

^^^ Update: wait, never mind, edges are directionless in single graphs

@fedarko
Copy link
Owner Author

fedarko commented Aug 10, 2016

Okay. So, what AsmViz currently does for contig assemblies is it draws a
"double graph," representing both positive and negative contigs (equivalent to
3' -> 5' and 5' -> 3' DNA sequences) as distinct nodes in the graph. This lets
us more clearly examine the directionality of the graph, since we know all
edges have directionality.

However, another way to go about visualizing a contig assembly graph is
drawing a "single graph," in which +/- nodes are collapsed into a single node.
This halves the number of nodes and edges in the graph, making it easier to
understand.

As a point of reference, the following edge definitions (copied
from sample2_LastGraph) would result in the following edges added for double
graphs:

ARC 1 -60 78
1 -> c60, 60 -> c1
ARC -1 -54 73
c1 -> c54, 54 -> 1
ARC 2 45 66
2 -> 45, c45 -> c2
ARC -2 -30 71
c2 -> c30, 30 -> 2
ARC 3 48 86
3 -> 48, c48 -> c3
ARC -3 9 78
c3 -> 9, c9 -> 3
ARC 4 -57 69
4 -> c57, 57 -> c4
ARC -4 -57 74
c4 -> c57, 57 -> 4
ARC 5 -30 70
5 -> c30, 30 -> c5
ARC -5 -20 73
c5 -> c20, 20 -> 5

It makes sense, right? In LastGraph, each edge definition actually represents
two edges: one from A to B, and one from -B to -A (with same multiplicity).
These edges have clear directionality, and you can clearly check for
structural patterns in the resulting graph.

Now, here's the resulting edge additions generated for a single graph:

ARC 1 -60 78
1 -- 60
ARC -1 -54 73
1 -- 54
ARC 2 45 66
2 -- 45
ARC -2 -30 71
2 -- 30
ARC 3 48 86
3 -- 48
ARC -3 9 78
3 -- 9
ARC 4 -57 69
4 -- 57
ARC -4 -57 74
4 -- 57
ARC 5 -30 70
5 -- 30
ARC -5 -20 73
5 -- 20

Not only are nodes directionless, but edges are also directionless.
This means that single graphs can contain duplicate edges, as seen with the
4 -- 57 edges. (We can resolve this in Cytoscape.js by just
adding arbitrary numbers on to the end of the edge ID, if we detect duplicates
-- that part should not be that difficult.)

The real issue, in my opinion, comes when detecting structural patterns in the
graph in the python script. All the patterns we detect (bubbles, frayed ropes, chains, cyclic
chains) are significant because of the ordering of their underlying nodes,
right? So if edges are directionless, it seems like detecting these patterns
wouldn't really be feasible--or might give incorrect information (?).

I could very well be wrong here, so please feel free to correct me -- but it seems to me like single graphs might not work well with pattern detection.

@fedarko
Copy link
Owner Author

fedarko commented Aug 10, 2016

So here's what I'm thinking: if we switch from storing nodes' neighbors as outgoing_nodes and incoming_nodes to storing nodes' neighbors as adjacent_nodes, then we could probably approximate most of pattern detection with that. I was thinking we could store neighbors as adjacent_to_positive_nodes and adjacent_to_negative_nodes, where both groups of neighbors are connected to the node but separated by port/etc, but that wouldn't work, right? Since nodes have only half the edges and the edges don't really have +/- data associated with them (despite what Bandage says).

There are going to be some strange things associated with this, including:

  • Where do we decide to put the headport/tailport for a node's edges? If we took the second approach to storing adjacencies then we could, say, make all connections to the "positive" side of the node hit the headport of this node, and make all connections to the "negative" side of the node hit the tailport of this node. But that's necessarily going to look sort of weird for some cases, since we only pick one edge from each edge declaration in LastGraph...?

@fedarko
Copy link
Owner Author

fedarko commented Aug 19, 2016

Evidently Jay (in the Pop Lab) has done work on this, and we might be able to utilize some of what he's done.

@fedarko
Copy link
Owner Author

fedarko commented Dec 26, 2016

So, wait a sec. Didn't the first version of collate.py I wrote in the Summer create single graphs?

Maybe just use that and stop overthinking this?

Just draw out some examples and see where node patterns would be identified, how edges should be positioned, etc. It'll take some time but the clarification will be worth it.

@fedarko
Copy link
Owner Author

fedarko commented Jan 20, 2017

So there's options here:

  • double graph (double nodes, double edges) -- what we currently support
  • single graph (single nodes, single edges) -- what Bandage considers "single graphs"
  • single graph with toggling supported between a node and its reverse complement, with edge ports indicating directionality -- what ABySS-Explorer does

I like ABySS-Explorer's approach: it seems like it reveals more information than Bandage, while having the simplicity of a single graph layout.

fedarko added a commit that referenced this issue Mar 1, 2017
See #10 for the full story here. Basically, single graphs might
not be that useful for the sort of analysis we want to do using this
tool -- in Bambus 3's output, contigs already have an orientation
while in LastGraph/GFA output that isn't a guarantee.

Our current behavior is to auto-draw double graphs for LastGraph and
GFA files, and auto-draw just the graph structure (I don't know if
you'd call it a "single graph") for Bambus 3 GML files.

Using explicit "double graphs" for GML files just results in 2x the
connected components with no real added information (since nodes
already have an orientation -- all nodes are "positive", so reverse
complementing all nodes in the graph just creates new connections
between the new negative nodes).

Using "single graphs" for
LastGraph/GFA files in general, though, does not keep the same level
of information -- there exists the possibility for a given node and
its RC to be in the same connected component, and nodes + edges
must be treated as directionless. The effect of this is that
analyzing the structural properties of the graph becomes a decent
amount harder (not to mention that just laying out the graph
hierarchically depends on the graph being a digraph). Essentially,
we could include this functionality but it wouldn't really tie in
with any of the Bambus 3 stuff or with most of the other features
AsmViz has right now.

As long as we're explicit in the README about what sorts of drawings
are produced, I think keeping the current policy (default structure
for Bambus 3 GML files, double graphs for other filetypes) should be
alright. The tool kind of focuses on scaffold graphs, anyway.
@fedarko
Copy link
Owner Author

fedarko commented Mar 1, 2017

Main takeaway: our current approach is fine for Bambus 3 GML output. Just be explicit in the README/documentation/etc. and this should be alright.

fedarko added a commit that referenced this issue May 29, 2017
We can eventually use these edges to draw "single graphs" using, say,
neato/circo/fdp/sfdp/twopi. (Of course, that'd be using PyGraphViz.)

Furthermore, single graph information like this will be useful in
generating SPQR trees (since the input graphs for those are assumed
to be undirected) for the corresponding graph.

Also it might be a cool idea to eventually draw single graphs with
"double" edges, where instead of using 2 nodes we just give
each edge a specific headport/tailport configuration. However, that's
not really a super important feature at present -- so not going to
bother with that for now. (something to think of re: #10)
@fedarko
Copy link
Owner Author

fedarko commented Jun 15, 2017

When no biconnected components are present within a given connected component of a graph, a "single graph" is basically drawn. It's certainly possible to extend this functionality to draw single graphs for LastGraph/GFA files, but since pattern detection wouldn't be an option then (without the use of polarity ports or something) I don't really know how useful this would be.

@fedarko fedarko changed the title Draw single graphs (with 3'->5' and 5'->3' nodes collapsed into a single node) Draw entire "single" graphs for assembly graphs Jun 15, 2017
@fedarko
Copy link
Owner Author

fedarko commented Jul 26, 2017

Somewhat related: it might be worth adding a command-line option to collate.py that, for a GFA/LastGraph/etc file, assumes that contigs are already oriented (and thus doesn't create implied negative nodes/edges). Since MetaCarvel can produce GFA output, implementing something like this would be worthwhile.

(We'd also probably have to adjust the JS code for the viewer interface to -- instead of just treating ASM_FILETYPE === "GML" as the basis for graphs with oriented contigs -- getting some corresponding bool from the db re: the contigs being oriented or unoriented, and operating accordingly.)

@fedarko
Copy link
Owner Author

fedarko commented Aug 9, 2017

This issue was moved to marbl/MetagenomeScope#4

@fedarko fedarko closed this as completed Aug 9, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant