Notes on graph/connected component/file size limits #134

fedarko · 2017-01-10T02:22:23Z

For GraphViz (dot)

I'm not sure here. Per Emden Gansner, there isn't any set upper limit on graph size, although lots of nodes/edges can cause "an explosion in the problem size" due to how dot handles edges crossing multiple levels.

Gansner recommends a general upper limit of 1000 nodes per drawing (see the second link in the previous paragraph), which is not really practical for our purposes at all -- however, we know from experience with the Shakya dataset that laying out larger components isn't that bad (the first component of Shakya has around 6k nodes and 9.8k edges, and takes around five minutes to lay out using dot). Furthermore, since the conceit of AsmViz involves precomputing the layouts, I don't think worrying about efficiency at this part of the program is a super significant issue.

For huge, huge connected components, we have a few options (and almost certainly more that I'm not aware of):

Try using desktop Cytoscape's hierarchical graph layout option -- not sure how fast it is in comparison to dot, but it's worth a shot. If it's much faster than dot, then we can just refer users to desktop Cytoscape from collate (???).
Use the nslimit/nslimitl parameters to bound the upper limit of iterations in dot at the cost of worse graph drawing. I really like this option -- we could even have this be up to the user of collate, so that they could personally make the tradeoff of drawing quality vs. time/memory used.
Force the user to split up the graph at certain places within components (???)

Worth noting that for one of the datasets Jay sent (it's in my testgraphs/ directory as 20170226_oriented.gml) has a largest connected component of around 19,000 nodes. Trying to lay out the .gv file generated for this component (which, due to backfilling, has 15,569 nodes and 31,322 edges) just crashes dot with a segfault -- running the layout operation from within PyGraphviz also results in a crash, naturally. So the upper limit in connected component size is maybe somewhere around 15k nodes as a most optimistic guess? (The massive amount of edges probably also factors into this, also.) At this size of connected component we can focus on SPQR decompositions/desktop Cytoscape/etc.

For sqlite3 .db files

Assuming we go the "no DNA" approach, we can assume that most large graphs will generally approach having their main "size"-causing factors be node count and edge count (this relies upon the assumption that the amount of connected components and clusters, as related to the number of nodes/edges, will be somewhat uniform).

I've read that, at least in one environment, the upper limit of .db files in sql.js is around 120M.

Since the size of the shakya.db file is about 4.6M on my system (and would be, likely, somewhere around 4.8M if it included edge multiplicity and node G/C content data), we can do some (very approximate) math using this + the 120M figure as a basis to estimate the maximum total number of nodes and edges:

(my phone's camera is not that great, sorry)

So we can say that, only considering the .db side of things, the maximum amount of nodes is somewhere around 500,000.

For Cytoscape.js' renderer

I know it's somewhere on the order of tens of thousands of nodes, but I'd have to check. (TODO: add here)

The text was updated successfully, but these errors were encountered:

fedarko · 2017-08-09T21:44:31Z

This issue was moved to marbl/MetagenomeScope#28

fedarko added note vieweroptimization labels Jan 10, 2017

fedarko self-assigned this Jan 10, 2017

fedarko mentioned this issue Aug 9, 2017

Notes on graph/connected component/file size limits marbl/MetagenomeScope#28

Open

fedarko closed this as completed Aug 9, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Notes on graph/connected component/file size limits #134

Notes on graph/connected component/file size limits #134

fedarko commented Jan 10, 2017 •

edited

fedarko commented Aug 9, 2017

Notes on graph/connected component/file size limits #134

Notes on graph/connected component/file size limits #134

Comments

fedarko commented Jan 10, 2017 • edited

For GraphViz (dot)

For sqlite3 .db files

For Cytoscape.js' renderer

fedarko commented Aug 9, 2017

fedarko commented Jan 10, 2017 •

edited