paths from rGFA #11

Closed
egoltsman opened this issue Apr 15, 2020 · 4 comments
Labels
question Further information is requested

Comments

@egoltsman

Hello,
I'm working with a large pan-genome and starting to explore options of working with it in a graph context. I created an rGFA graph using minigraph (testing with just 3 samples for now). Now I'd like to derive from it a set of sample-coherent paths. Reading the discussion you had in issue #1 it sounds like at the time you were in the process of formalizing this feature in gfatools. Has there been much progress in this area? It seems like one should be able to pick a vertex and do a BFS where only edges to ranks equal or lower than the starting segment (sample) are considered. Am I interpreting the meaning of rank correctly here?
Thanks!
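The rank-restricted BFS proposed above can be sketched roughly as follows. This is a minimal illustration, not a gfatools feature. It assumes tab-separated rGFA with the rank in the `SR` tag of each `S` line, ignores link orientation, and uses a made-up toy graph. The function names (`parse_rgfa`, `rank_limited_bfs`) are hypothetical.

```python
from collections import deque

# Hypothetical toy rGFA: reference chr1 = s1 s2 s4 (rank 0); s3 (rank 1) and
# s5 (rank 2) are alternative alleles to s2. All names here are made up.
EXAMPLE = "\n".join([
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tT\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "S\ts3\tG\tSN:Z:sampleA.chr1\tSO:i:5\tSR:i:1",
    "S\ts4\tACGT\tSN:Z:chr1\tSO:i:6\tSR:i:0",
    "S\ts5\tC\tSN:Z:sampleB.chr1\tSO:i:5\tSR:i:2",
    "L\ts1\t+\ts2\t+\t0M",
    "L\ts2\t+\ts4\t+\t0M",
    "L\ts1\t+\ts3\t+\t0M",
    "L\ts3\t+\ts4\t+\t0M",
    "L\ts1\t+\ts5\t+\t0M",
    "L\ts5\t+\ts4\t+\t0M",
])


def parse_rgfa(text):
    """Return ({segment: rank}, undirected adjacency) from S and L lines."""
    rank, adj = {}, {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            tags = {}
            for f in fields[3:]:
                key, _typ, value = f.split(":", 2)  # e.g. "SR:i:0"
                tags[key] = value
            rank[fields[1]] = int(tags.get("SR", "0"))
            adj.setdefault(fields[1], set())
        elif fields[0] == "L":
            # Orientation (fields[2], fields[4]) is ignored in this sketch.
            a, b = fields[1], fields[3]
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return rank, adj


def rank_limited_bfs(start, rank, adj):
    """BFS from `start`, never entering a segment of higher rank than `start`."""
    limit = rank[start]
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        seg = queue.popleft()
        order.append(seg)
        for nxt in sorted(adj.get(seg, ())):  # sorted for a deterministic order
            if nxt not in seen and rank[nxt] <= limit:
                seen.add(nxt)
                queue.append(nxt)
    return order


if __name__ == "__main__":
    rank, adj = parse_rgfa(EXAMPLE)
    print(rank_limited_bfs("s3", rank, adj))
```

Starting from s3, the search also returns s2, the rank-0 allele that s3 replaces in this toy graph. The result is therefore a neighbourhood rather than a sample-coherent path, which is one way this simple rank rule can over-collect.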

lh3 added the question (Further information is requested) label on Apr 15, 2020
@lh3
Owner

lh3 commented Apr 15, 2020

#1 hasn't been formalized. gfatools may support paths, but it won't happen soon.

> It seems like one should be able to pick a vertex and do a BFS where only edges to ranks equal or lower than the starting segment (sample) are considered.

This works in simple cases, but it is generally not reliable. The better approach is to map the sequences back to the graph and trace each alignment path.
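The alternative suggested here is to align each sample's sequences back to the graph and read the path off each alignment. As a minimal sketch, assuming the alignments are stored in GAF (the format minigraph writes) with the path in column 6 written as oriented steps such as `>s1>s3<s4`, the walk can be extracted as below. The helper names are hypothetical, not a gfatools API, and the example record is made up.

```python
import re

# One oriented step of a GAF path: '>' = forward, '<' = reverse, followed by an
# ID (a segment name, or a stable interval such as chr1:0-5).
STEP = re.compile(r"([><])([^><]+)")


def parse_gaf_path(path):
    """Split a GAF path such as '>s1>s3<s4' into (id, strand) pairs."""
    if path[:1] not in ("<", ">"):
        # Assumption: a path with no '<'/'>' is a single ID on the forward strand.
        return [(path, "+")]
    return [(name, "+" if arrow == ">" else "-") for arrow, name in STEP.findall(path)]


def walks_from_gaf(lines):
    """Yield (query name, walk) per GAF record; the path is column 6 (index 5)."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        yield fields[0], parse_gaf_path(fields[5])


if __name__ == "__main__":
    # Hypothetical GAF record: a sampleA contig aligned through s1, s3 and s4.
    record = "sampleA.chr1\t10\t0\t10\t+\t>s1>s3>s4\t10\t0\t10\t10\t10\t60"
    for name, walk in walks_from_gaf([record]):
        print(name, walk)
```

Collecting these walks over all of one sample's contigs gives, roughly, that sample's route through the graph without relying on segment ranks.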

@egoltsman
Author

I see. But if the segment coordinates are stable relative to the linear reference (if I understand correctly), then at least all the sample-specific segments, i.e., any new segments created when the sample is added, can be "traced" along the reference, right? Why is it better to re-map and trace the alignment path?

@egoltsman
Author

egoltsman commented Apr 15, 2020

Here's where my confusion comes from: the description in doc/rGFA.md, and what I observe in my results, tell me that the SO tag specifies the "offset on the stable sequence", which I take to mean the input chromosome/scaffold/contig of the added sample. The illustration in that document instead suggests that the SO offset is relative to the reference sequence (only one segment, s1:chr1, shows an offset of 0), which would imply a single coordinate system for all sample segments. If it's the former, then the suggestion in my previous comment obviously won't work.

[Image: example1]

Or am I not interpreting the illustration correctly?

Thank you
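Under the reading that `SO` is the offset on the segment's own stable sequence (named by `SN`), grouping segments by `SN` and sorting by `SO` recovers the order of one input contig's own segments. The gaps between them are then, presumably, the stretches that contig shares with segments from earlier-added sequences. The sketch below assumes tab-separated `S` lines carrying `SN`, `SO` and optionally `LN` tags; the function names and the segment and contig names in the example are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: sampleA.chr1 shares its first 5 bases with chr1 (s1).
EXAMPLE = "\n".join([
    "S\ts1\tCTGAA\tSN:Z:chr1\tSO:i:0\tSR:i:0",
    "S\ts2\tT\tSN:Z:chr1\tSO:i:5\tSR:i:0",
    "S\ts4\tACGT\tSN:Z:chr1\tSO:i:6\tSR:i:0",
    "S\ts3\tG\tSN:Z:sampleA.chr1\tSO:i:5\tSR:i:1",
    "S\ts6\tTT\tSN:Z:sampleA.chr1\tSO:i:12\tSR:i:1",
])


def segments_by_stable_seq(text):
    """Map each stable sequence (SN) to its segments as sorted (SO, name, length)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] != "S":
            continue
        tags = {}
        for f in fields[3:]:
            key, _typ, value = f.split(":", 2)
            tags[key] = value
        # Prefer LN; otherwise use the sequence length (assumes it is not '*').
        length = int(tags["LN"]) if "LN" in tags else len(fields[2])
        groups[tags["SN"]].append((int(tags["SO"]), fields[1], length))
    return {sn: sorted(segs) for sn, segs in groups.items()}


def uncovered(segs):
    """Intervals from 0 to the last segment end not covered by these segments."""
    gaps, pos = [], 0
    for so, _name, length in segs:
        if so > pos:
            gaps.append((pos, so))
        pos = max(pos, so + length)
    return gaps


if __name__ == "__main__":
    for sn, segs in segments_by_stable_seq(EXAMPLE).items():
        print(sn, [name for _so, name, _ln in segs], uncovered(segs))
```

In the toy data, sampleA.chr1's own segments start at SO 5 because its first 5 bases fall in s1, and the gap (6, 12) is another stretch it does not own. Under this reading, SN/SO alone do not give the sample's full path; a trailing gap after the last segment is also invisible without the contig length.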

@egoltsman
Author

OK, I see it now. The illustration shows a case where all three samples share the first 5 bases, and therefore all start from the same 0th coordinate, so to speak.
