Add defined behavior for duplicate edges #75

fedarko · 2017-08-09T21:50:07Z

From @fedarko on August 4, 2017 1:41

I think they're used in layout but not rendered at present? ==> Yep, just looked over my code and that seems to be the case. Options:

raise error in collate.py
give a warning and don't incorporate them in layout -- leaning toward this, or the above
incorporate them in layout and don't render them in JS (this isn't a good idea, but it's what's happening now)
incorporate them in layout and render them in JS
incorporate them in layout and have them toggleable in JS?

Copied from original issue: fedarko/MetagenomeScope#258

fedarko · 2017-08-09T21:50:08Z

also need to consider how these edges affect statistics; right now they're all included in the total edge count, but that behavior isn't necessarily set in stone

fedarko · 2017-08-09T21:50:08Z

Another downside of implicit edge IDs (described in the message for commit d7eaf4a above) is that it slightly messes up the ID displayed for edges adjacent to collapsed clusters (since the source/target change). All that's different is the ID displayed in the selected info table is different -- not a big deal, but still the sort of thing that shows that this workaround is an inconvenience and should eventually be reverted.

fedarko · 2018-01-20T04:40:28Z

It's looking like certain GFA files, in particular, can have these defined (I think fastg2gfa can produce them). So supporting this behavior (likely through just the "warning" thing, or maybe just ignoring them unless the user requests the warnings to be output) would be a good option.

We could also output, like, a warnings.txt file or something that contains all duplicated edges. That'd avoid the costs of printing all those duplicates to stdout.

Anyway, this is kind of a lower priority issue since tools like gfaview can apparently be used to remove redundant edges in files like this. I'd still like to get this addressed, though.

I also should get to work on adding a solution to #75, in the vein of fully supporting the output of tools like fastg2gfa. But this is a cool first-ish step in that direction! This sort of solution might be desirable to implement for LastGraph and MetaCarvel GML files, but since I don't think either of those filestypes usually have links interspersed with node definitions just supporting this in GFA files should be fine for now. (I can make a separate issue for that later, if that's requested.)

fedarko · 2018-01-20T07:29:52Z

A simple solution for this is to just check (upon parsing any given edge declaration) if the edge in question currently exists in the graph. (Either by checking nodeid2outgoingnodeids for GFA and FASTG files, or by checking nodeid2obj[src_id].outgoing_nodes for LastGraph and GML files.) If so, add the edge; if not, ignore it.

Pseudocode for the GFA/FASTG approach:

if src_id in nodeid2outgoingnodeids:
    if snk_id in nodeid2outgoingnodeids[src_id]:
        continue

Pseudocode for the LastGraph/GML approach:

if nodeid2obj[snk_id] in nodeid2obj[src_id].outgoing_nodes:
    continue

For filetypes which have implied edges (LastGraph/sometimes GFA), this also takes care of redundant edges brought about by implication like following --

A -> -B
B -> -A

-- in which the first edge implies the second edge. In this case, we'd just parse the first edge declaration, and then when we'd get to the second one it would already be in the graph, so we could just ignore it.

-Reorder the post-clean HMP dataset (august1) to be higher up in the order, so it gets generated earlier. -Comment out the pre-clean HMP dataset's generation invocation, since it's super slow. Once we fix the duplicate edges thing (#75), it should be possible to re-add this back into the script. -Add the -spqr flag to the Marygold paper graph, since that one we definitely want to have SPQR data for (I'm pretty sure this version of the demo graph generation script predates the -spqr option in the preprocessing script, which is why none of the invocations here had -spqr flags).

Also added comments addressing #75, #67, #71 in test code. Writing preprocessing script tests should be a lot quicker now, ideally.

fedarko · 2019-09-15T04:49:37Z

Yeah, easiest solution is just throwing an error if these are found. no need to support these as far as i can tell

still should account for duplicate edges, though (#75)

Towards #75

Progress on #75! Now just need to generalize this behavior to other parsers ;)

fedarko · 2020-11-04T22:46:23Z

If we want to support showing de Bruijn graphs in the more "conventional" way (e.g. SPAdes graphs with edges actually shown as edges, unlike Bandage where the edges are shown as nodes) then we'll need to cross this bridge eventually. Likely not urgent for now, though, since this shift would likely involve a lot of other changes besides this (e.g. completely different pattern detection algorithms, I think...?).

FWIW AGB does support visualizing these sorts of graphs

Although the parsing functions return nx.MultiDiGraph objects, they can't actually handle graphs containing duplicate edges. This is due to the other libraries we are using to parse these graphs -- GfaPy and pyfastg, respectively. For FASTG, I'm not sure there is a need to update this behavior, but for GFA it might be worth addressing this. Although this would involve making a change to GfaPy, I think -- that or doing some weird custom stuff to define our own link classes? -- e.g. https://github.com/ggonnella/gfapy/blob/f150d9e0ca6f82f522bdb5b61669ff27837bd294/gfapy/line/common/connection.py#L99-L110 ... So it might be easier to just revert to using an ad hoc line-by- line GFA parser, rather than GfaPy. But let's not worry about that for now.

fedarko self-assigned this Aug 9, 2017

fedarko added the highpriorityfeature label Aug 9, 2017

fedarko mentioned this issue Aug 9, 2017

Add defined behavior for duplicate edges fedarko/MetagenomeScope#258

Closed

fedarko added a commit that referenced this issue Jan 20, 2018

Add comments to collate.py re: #75

0f62f1c

fedarko mentioned this issue Jul 9, 2018

Automating index.html demo dialog generation #113

Closed

fedarko added a commit that referenced this issue Oct 17, 2018

Add utils helper functions and loop.gfa test

0c535cb

Also added comments addressing #75, #67, #71 in test code. Writing preprocessing script tests should be a lot quicker now, ideally.

fedarko added a commit that referenced this issue Sep 15, 2019

TST: done-ish with unit-testing LG validation

23ddb62

still should account for duplicate edges, though (#75)

fedarko added a commit that referenced this issue Sep 15, 2019

MAINT: ensure gmls directed and non-multigraph

e6397ba

Towards #75

fedarko added a commit that referenced this issue Sep 15, 2019

MAINT/TST: Throw err if dupl. edges found in LG!

9850d49

Progress on #75! Now just need to generalize this behavior to other parsers ;)

fedarko mentioned this issue Dec 2, 2020

Explicitly handle/test duplicate edges in the decomposed graph #187

Open

fedarko mentioned this issue Mar 31, 2023

Support duplicate (aka parallel) edges in GFA / FASTG files? #239

Open

2 tasks

fedarko added a commit to fedarko/MetagenomeScope-1 that referenced this issue Mar 31, 2023

TST: upd8 gml parsing tests re allow dup edges marbl#75

06400f7

fedarko linked a pull request May 16, 2023 that will close this issue

[WIP] Support DOT inputs (and multigraphs); much improved boundary node splitting in pattern identification; add back frayed rope support; lots of other improvements #244

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add defined behavior for duplicate edges #75

Add defined behavior for duplicate edges #75

fedarko commented Aug 9, 2017

fedarko commented Aug 9, 2017

fedarko commented Aug 9, 2017

fedarko commented Jan 20, 2018 •

edited

Loading

fedarko commented Jan 20, 2018 •

edited

Loading

fedarko commented Sep 15, 2019

fedarko commented Nov 4, 2020

Add defined behavior for duplicate edges #75

Add defined behavior for duplicate edges #75

Comments

fedarko commented Aug 9, 2017

fedarko commented Aug 9, 2017

fedarko commented Aug 9, 2017

fedarko commented Jan 20, 2018 • edited Loading

fedarko commented Jan 20, 2018 • edited Loading

fedarko commented Sep 15, 2019

fedarko commented Nov 4, 2020

fedarko commented Jan 20, 2018 •

edited

Loading

fedarko commented Jan 20, 2018 •

edited

Loading