paths from rGFA #11

egoltsman · 2020-04-15T02:57:15Z

Hello,
I'm working with a large pan-genome and starting to explore options for working with it in a graph context. I created an rGFA graph using minigraph (testing with just 3 samples for now). Now I'd like to derive from it a set of sample-coherent paths. Reading the discussion you had in issue #1, it sounds like at the time you were in the process of formalizing this feature in gfatools. Has there been much progress in this area? It seems like one should be able to pick a vertex and do a BFS where only edges to segments with a rank equal to or lower than that of the starting segment (sample) are considered. Am I interpreting the meaning of rank correctly here?
Thanks!
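The rank-restricted BFS proposed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a gfatools or minigraph feature: it reads the SR (rank) tag from `S` lines and the segment pairs from `L` lines, ignores link orientation (a simplification), and runs on a made-up toy graph. The function names `parse_rgfa` and `rank_limited_bfs` are hypothetical.

```python
from collections import deque


def parse_rgfa(lines):
    """Return (rank, adj): the SR rank of each segment and an undirected adjacency map.

    Simplification: the +/- orientation columns of L lines are ignored.
    """
    rank, adj = {}, {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "S":
            adj.setdefault(f[1], set())
            for tag in f[3:]:
                if tag.startswith("SR:i:"):
                    rank[f[1]] = int(tag[5:])
        elif f[0] == "L":
            a, b = f[1], f[3]
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return rank, adj


def rank_limited_bfs(start, rank, adj):
    """Segments reachable from `start` through segments with rank <= rank[start]."""
    limit = rank[start]
    seen, order, queue = {start}, [start], deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):  # sorted only to make the visit order deterministic
            # Segments without an SR tag are treated as out of bounds.
            if v not in seen and rank.get(v, limit + 1) <= limit:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order


# Made-up toy graph: s1-s2-s3 is the rank-0 reference, s4 is a rank-1
# alternative to s2, and s5 is a rank-2 segment between s4 and s3.
TOY = [
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "S\ts3\tTGCTC\tSN:Z:chr1\tSO:i:8\tSR:i:0",
    "S\ts4\tATT\tSN:Z:sampleA\tSO:i:5\tSR:i:1",
    "S\ts5\tGG\tSN:Z:sampleB\tSO:i:7\tSR:i:2",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts3\t+\t0M",
    "L\ts1\t+\ts4\t+\t0M",
    "L\ts4\t+\ts3\t+\t0M",
    "L\ts4\t+\ts5\t+\t0M",
    "L\ts5\t+\ts3\t+\t0M",
]

if __name__ == "__main__":
    rank, adj = parse_rgfa(TOY)
    print(rank_limited_bfs("s4", rank, adj))
```

On this toy graph, starting from the rank-1 segment s4 returns s4, s1, s3 and s2, so both branches of the bubble between s1 and s3 come back even though s4 was built as an alternative to s2. The result is a reachable set rather than an ordered walk, which is one concrete way this kind of traversal can mislead.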

lh3 · 2020-04-15T03:09:41Z

#1 hasn't been formalized. gfatools may support paths, but it won't happen soon.

> It seems like one should be able to pick a vertex and do a BFS where only edges to ranks equal or lower than the starting segment (sample) are considered.

This works in simple cases, but it is generally not reliable. The better approach is to map sequences back to the graph and trace the alignment path.
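Tracing an alignment path can look roughly like the sketch below. The thread does not name a mapper or output format; this assumes the alignments are in GAF (which minigraph can write), where the 6th tab-separated column is the path: either a bare sequence name, or a run of steps such as `>s1<s2>s3` (`>` forward, `<` reverse), optionally with stable-coordinate intervals like `>chr1:100-200`. The helper names `parse_gaf_path` and `path_of_gaf_line` are hypothetical.

```python
import re

# One path step: orientation, name, optional ":start-end" interval.
# The lazy name plus the lookahead lets a name contain ':' when no interval follows.
STEP = re.compile(r"([><])([^\s><]+?)(?::(\d+)-(\d+))?(?=[><]|$)")


def parse_gaf_path(path):
    """Split a GAF path column into (strand, name, start, end) tuples."""
    if not path or path[0] not in "><":
        # A bare name: the alignment lies on a single sequence, forward strand.
        return [("+", path, None, None)]
    steps, pos = [], 0
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"malformed GAF path at offset {pos}: {path!r}")
        strand = "+" if m.group(1) == ">" else "-"
        start = int(m.group(3)) if m.group(3) is not None else None
        end = int(m.group(4)) if m.group(4) is not None else None
        steps.append((strand, m.group(2), start, end))
        pos = m.end()
    return steps


def path_of_gaf_line(line):
    """Ordered steps of one GAF record (the path is the 6th tab-separated column)."""
    return parse_gaf_path(line.rstrip("\n").split("\t")[5])


if __name__ == "__main__":
    print(parse_gaf_path(">s1<s2>s3"))
    print(parse_gaf_path(">chr1:100-200<chr2:5-10"))
```

Applied to the alignments of one sample's contigs, these step lists give that sample's walk through the graph in order; stitching together overlapping or split alignments is deliberately left out of the sketch.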

egoltsman · 2020-04-15T19:34:13Z

I see. But if the segment coordinates are stable relative to the linear reference (if my understanding is correct), then at least all the sample-specific segments, i.e., any new segments formed after the sample is added, can be "traced" along the reference, right? Why is it better to re-map and trace the alignment path?

egoltsman · 2020-04-15T20:07:56Z

Here's where my confusion is coming from: the description in doc/rGFA.md, and what I'm observing in my results, tell me that the SO tag specifies the "offset on the stable sequence", which I presume to mean the input chromosome/scaffold/contig from the added sample. The illustration in that document suggests instead that the SO offset is relative to the reference sequence (only one segment, s1:chr1, shows an offset of 0), which implies a single coordinate system for all sample segments. If it's the former, then my suggestion in the previous comment obviously won't work.

Or am I not interpreting the illustration correctly?

Thank you
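To make the "offset on the stable sequence" reading concrete, here is a small sketch that turns rGFA `S` lines into intervals. In the rGFA spec, SN names the stable sequence a segment was derived from, SO is the offset on that sequence and SR is the rank; the sketch treats SO as 0-based, falls back to the LN tag when the sequence field is `*`, and runs on made-up segments. The function name `stable_intervals` is hypothetical.

```python
def stable_intervals(lines):
    """Map each rGFA segment name to (SN, start, end, SR).

    start/end are 0-based, half-open coordinates on the segment's own stable
    sequence SN, i.e. [SO, SO + segment length).
    """
    out = {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] != "S":
            continue
        tags = {}
        for t in f[3:]:
            key, typ, val = t.split(":", 2)
            tags[key] = int(val) if typ == "i" else val
        # Use the sequence length, or the LN tag when the sequence is "*".
        length = tags["LN"] if f[2] == "*" else len(f[2])
        out[f[1]] = (tags["SN"], tags["SO"], tags["SO"] + length, tags["SR"])
    return out


# Made-up segments: two from the reference chr1 and one introduced by a
# second sample's contig "ctg7".
TOY = [
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tACG\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "S\ts4\tTT\tSN:Z:ctg7\tSO:i:120\tSR:i:1",
]

if __name__ == "__main__":
    for name, (sn, start, end, sr) in stable_intervals(TOY).items():
        print(f"{name}\t{sn}:{start}-{end}\trank {sr}")
```

Grouping the output by SN shows that each stable sequence carries its own coordinate system: s4's offset of 120 is a position on ctg7, not on chr1.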

egoltsman · 2020-06-12T18:25:20Z

OK, I see it now. The illustration is for a case where all three samples have the first 5 bases in common, and so all have the same 0th coordinate, so to speak.

lh3 added the "question" label (Further information is requested) on Apr 15, 2020

egoltsman closed this as completed Jun 12, 2020