Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add defined behavior for duplicate edges #75

Open
fedarko opened this issue Aug 9, 2017 · 6 comments · May be fixed by #244
Open

Add defined behavior for duplicate edges #75

fedarko opened this issue Aug 9, 2017 · 6 comments · May be fixed by #244
Assignees

Comments

@fedarko
Copy link
Member

fedarko commented Aug 9, 2017

From @fedarko on August 4, 2017 1:41

I think they're used in layout but not rendered at present? ==> Yep, just looked over my code and that seems to be the case. Options:

  • raise error in collate.py
  • give a warning and don't incorporate them in layout -- leaning toward this, or the above
  • incorporate them in layout and don't render them in JS (this isn't a good idea, but it's what's happening now)
  • incorporate them in layout and render them in JS
  • incorporate them in layout and have them toggleable in JS?

Copied from original issue: fedarko/MetagenomeScope#258

@fedarko
Copy link
Member Author

fedarko commented Aug 9, 2017

also need to consider how these edges affect statistics; right now they're all included in the total edge count, but that behavior isn't necessarily set in stone

@fedarko
Copy link
Member Author

fedarko commented Aug 9, 2017

Another downside of implicit edge IDs (described in the message for commit d7eaf4a above) is that it slightly messes up the ID displayed for edges adjacent to collapsed clusters (since the source/target change). All that's different is the ID displayed in the selected info table is different -- not a big deal, but still the sort of thing that shows that this workaround is an inconvenience and should eventually be reverted.

@fedarko
Copy link
Member Author

fedarko commented Jan 20, 2018

It's looking like certain GFA files, in particular, can have these defined (I think fastg2gfa can produce them). So supporting this behavior (likely through just the "warning" thing, or maybe just ignoring them unless the user requests the warnings to be output) would be a good option.

We could also output, like, a warnings.txt file or something that contains all duplicated edges. That'd avoid the costs of printing all those duplicates to stdout.

Anyway, this is kind of a lower priority issue since tools like gfaview can apparently be used to remove redundant edges in files like this. I'd still like to get this addressed, though.

fedarko added a commit that referenced this issue Jan 20, 2018
I also should get to work on adding a solution to #75, in the
vein of fully supporting the output of tools like fastg2gfa. But this
is a cool first-ish step in that direction!

This sort of solution might be desirable to implement for LastGraph
and MetaCarvel GML files, but since I don't think either of those
filestypes usually have links interspersed with node definitions
just supporting this in GFA files should be fine for now. (I can
make a separate issue for that later, if that's requested.)
@fedarko
Copy link
Member Author

fedarko commented Jan 20, 2018

A simple solution for this is to just check (upon parsing any given edge declaration) if the edge in question currently exists in the graph. (Either by checking nodeid2outgoingnodeids for GFA and FASTG files, or by checking nodeid2obj[src_id].outgoing_nodes for LastGraph and GML files.) If so, add the edge; if not, ignore it.

Pseudocode for the GFA/FASTG approach:

if src_id in nodeid2outgoingnodeids:
    if snk_id in nodeid2outgoingnodeids[src_id]:
        continue

Pseudocode for the LastGraph/GML approach:

if nodeid2obj[snk_id] in nodeid2obj[src_id].outgoing_nodes:
    continue

For filetypes which have implied edges (LastGraph/sometimes GFA), this also takes care of redundant edges brought about by implication like following --

A -> -B
B -> -A

-- in which the first edge implies the second edge. In this case, we'd just parse the first edge declaration, and then when we'd get to the second one it would already be in the graph, so we could just ignore it.

fedarko added a commit that referenced this issue Jan 20, 2018
fedarko added a commit that referenced this issue Sep 10, 2018
-Reorder the post-clean HMP dataset (august1) to be higher up in the
 order, so it gets generated earlier.
-Comment out the pre-clean HMP dataset's generation invocation, since
 it's super slow. Once we fix the duplicate edges thing (#75), it
 should be possible to re-add this back into the script.
-Add the -spqr flag to the Marygold paper graph, since that one we
 definitely want to have SPQR data for (I'm pretty sure this version
 of the demo graph generation script predates the -spqr option in the
 preprocessing script, which is why none of the invocations here had
 -spqr flags).
fedarko added a commit that referenced this issue Oct 17, 2018
Also added comments addressing #75, #67, #71 in test code.

Writing preprocessing script tests should be a lot quicker now,
ideally.
@fedarko
Copy link
Member Author

fedarko commented Sep 15, 2019

Yeah, easiest solution is just throwing an error if these are found. no need to support these as far as i can tell

fedarko added a commit that referenced this issue Sep 15, 2019
still should account for duplicate edges, though (#75)
fedarko added a commit that referenced this issue Sep 15, 2019
fedarko added a commit that referenced this issue Sep 15, 2019
Progress on #75! Now just need to generalize this behavior to
other parsers ;)
@fedarko
Copy link
Member Author

fedarko commented Nov 4, 2020

If we want to support showing de Bruijn graphs in the more "conventional" way (e.g. SPAdes graphs with edges actually shown as edges, unlike Bandage where the edges are shown as nodes) then we'll need to cross this bridge eventually. Likely not urgent for now, though, since this shift would likely involve a lot of other changes besides this (e.g. completely different pattern detection algorithms, I think...?).

FWIW AGB does support visualizing these sorts of graphs

fedarko added a commit to fedarko/MetagenomeScope-1 that referenced this issue Mar 31, 2023
Although the parsing functions return nx.MultiDiGraph objects,
they can't actually handle graphs containing duplicate edges.

This is due to the other libraries we are using to parse these
graphs -- GfaPy and pyfastg, respectively.

For FASTG, I'm not sure there is a need to update this behavior,
but for GFA it might be worth addressing this. Although this would
involve making a change to GfaPy, I think -- that or doing some weird
custom stuff to define our own link classes? -- e.g.
https://github.com/ggonnella/gfapy/blob/f150d9e0ca6f82f522bdb5b61669ff27837bd294/gfapy/line/common/connection.py#L99-L110
... So it might be easier to just revert to using an ad hoc line-by-
line GFA parser, rather than GfaPy. But let's not worry about that for
now.
fedarko added a commit to fedarko/MetagenomeScope-1 that referenced this issue Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment