Draw entire "single" graphs for assembly graphs #10
Comments
Also worth noting: we can (and probably should) omit arrows on edges when drawing single graphs, since each contig is directionless. This would also make drawing the graph faster.
^ Actually, that's the opposite of what we should do for single graphs, isn't it? Keep arrows on edges to indicate which node is the source and which is the sink, but change the node shapes to squares, hexagons, rectangles, or something else to indicate that "single" nodes lack direction. Update: wait, never mind, edges are directionless in single graphs.
So here's what I'm thinking: if we switch from storing nodes' neighbors as There are going to be some strange things associated with this, including:
Evidently Jay (in the Pop Lab) has done work on this, and we might be able to use some of what he's done.
So, wait a sec. Didn't the first version of collate.py I wrote over the summer create single graphs? Maybe just use that and stop overthinking this? Just draw out some examples and see where node patterns would be identified, how edges should be positioned, etc. It'll take some time, but the clarification will be worth it.
So there are options here:
I like ABySS-Explorer's approach: it seems like it reveals more information than Bandage, while having the simplicity of a single graph layout.
See #10 for the full story here. Basically, single graphs might not be that useful for the sort of analysis we want to do with this tool: in Bambus 3's output, contigs already have an orientation, while in LastGraph/GFA output that isn't guaranteed. Our current behavior is to auto-draw double graphs for LastGraph and GFA files, and to auto-draw just the graph structure (I don't know if you'd call it a "single graph") for Bambus 3 GML files. Using explicit "double graphs" for GML files just results in 2x the connected components with no real added information (nodes already have an orientation, and all nodes are "positive", so reverse complementing every node in the graph just creates new connections between the new negative nodes). Using "single graphs" for LastGraph/GFA files in general, though, does not preserve the same level of information: a given node and its RC can end up in the same connected component, and nodes and edges must be treated as directionless. The effect is that analyzing the structural properties of the graph becomes a decent amount harder (not to mention that laying out the graph hierarchically depends on the graph being a digraph). Essentially, we could include this functionality, but it wouldn't really tie in with any of the Bambus 3 stuff or with most of the other features AsmViz has right now. As long as we're explicit in the README about what sorts of drawings are produced, I think keeping the current policy (default structure for Bambus 3 GML files, double graphs for other filetypes) should be alright. The tool kind of focuses on scaffold graphs, anyway.
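To make the double-graph versus single-graph trade-off above concrete, here is a minimal sketch in plain Python. It is not AsmViz's actual code: the `"1+"` / `"1-"` node ID convention and the function names `rc`, `double_graph`, `single_graph`, and `weak_components` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rc(node_id):
    """Flip the orientation suffix of a node ID (assumed form: '1+' or '1-')."""
    return node_id[:-1] + ("-" if node_id.endswith("+") else "+")


def double_graph(edges):
    """For each oriented edge (u, v), also include the implied edge (rc(v), rc(u))."""
    out = set()
    for u, v in edges:
        out.add((u, v))
        out.add((rc(v), rc(u)))
    return out


def single_graph(edges):
    """Drop orientations: each edge becomes an undirected pair of contig names."""
    return {frozenset((u[:-1], v[:-1])) for u, v in edges}


def weak_components(edges):
    """Connected components of the graph, ignoring edge direction."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Scaffold-like input where every node is "positive": doubling it only
# produces a mirrored copy of the same component.
scaffold_like = double_graph([("1+", "2+"), ("2+", "3+")])

# Contig-like input where a node and its RC are linked through a neighbor:
# both orientations of contig 1 land in one component.
contig_like = double_graph([("1+", "2+"), ("2+", "1-")])
```

With these toy inputs, `weak_components(scaffold_like)` yields two mirrored components, while `weak_components(contig_like)` yields a single component containing both `1+` and `1-`; `single_graph(contig_like)` collapses to one undirected edge between contigs 1 and 2, which is the information loss described above.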
Main takeaway: our current approach is fine for Bambus 3 GML output. Just be explicit in the README/documentation/etc. and this should be alright.
We can eventually use these edges to draw "single graphs" using, say, neato/circo/fdp/sfdp/twopi. (Of course, that'd be using PyGraphviz.) Furthermore, single graph information like this will be useful in generating SPQR trees for the corresponding graph, since the input graphs for those are assumed to be undirected. It might also be a cool idea to eventually draw single graphs with "double" edges, where instead of using 2 nodes we just give each edge a specific headport/tailport configuration. However, that's not really a super important feature at present, so I'm not going to bother with it for now. (Something to think of re: #10.)
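As a rough illustration of the idea above, the sketch below builds DOT text for an undirected "single graph" from oriented edges, using only the standard library. The `"1+"` / `"1-"` node ID convention and the function name `single_graph_dot` are assumptions made for this example, not part of the existing scripts.

```python
def single_graph_dot(edges, name="single"):
    """Return DOT source for an undirected graph built from oriented edges.

    Node IDs are assumed to end in '+' or '-'; that suffix is dropped so
    both orientations of a contig map to one node. Duplicate undirected
    edges are emitted only once.
    """
    seen = set()
    lines = ["graph %s {" % name]
    for u, v in edges:
        key = tuple(sorted((u[:-1], v[:-1])))
        if key in seen:
            continue
        seen.add(key)
        lines.append('    "%s" -- "%s";' % key)
    lines.append("}")
    return "\n".join(lines)


dot_text = single_graph_dot([("1+", "2+"), ("2-", "1-"), ("2+", "3-")])
print(dot_text)
```

The resulting text could be saved to a file and laid out with any of the undirected Graphviz engines, for example `neato -Tpng single.dot -o single.png`, or loaded into PyGraphviz instead of being written by hand.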
When no biconnected components are present within a given connected component of a graph, what gets drawn is basically a "single graph". It's certainly possible to extend this functionality to draw single graphs for LastGraph/GFA files, but since pattern detection wouldn't be an option then (without the use of polarity ports or something), I don't really know how useful this would be.
Somewhat related: it might be worth adding a command-line option to (We'd also probably have to adjust the JS code for the viewer interface to -- instead of just treating
This issue was moved to marbl/MetagenomeScope#4
This would make understanding really large graphs a bit easier, but for close analysis using a double graph is probably better. However, having this option available (I guess we'd figure it out in the Python script) would make things easier.
Maybe we could have the Python script automatically lay out a "single graph," and then store both the RC and non-RC information in the hypothetical database file it generates? This would allow the JavaScript UI to overlay RC nodes on top of non-RC nodes if the user requests a double graph. I think this is similar to what Bandage does.
Note that this is only really feasible for contig assembly inputs (e.g. LastGraph files), not for scaffold inputs (e.g. GraphML files from BAMBUS).