Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overeager link-adding in odgi flip #496

Open
anshumanmohan opened this issue Apr 18, 2023 · 2 comments
Open

Overeager link-adding in odgi flip #496

anshumanmohan opened this issue Apr 18, 2023 · 2 comments

Comments

@anshumanmohan
Copy link

anshumanmohan commented Apr 18, 2023

I think flip is adding new links too eagerly.

Minimal

Here's a minimal example:

H	VN:Z:1.0
S	1	A
S	2	T
S	3	G
P	x	1+,2+,3+	*
L	1	+	2	+	0M

Clearly the path does not require a flip. More subtly, though, the graph is not well-formed: it will fail validate. Running flip and then view has an interesting effect:

H	VN:Z:1.0
S	1	A
S	2	T
S	3	G
P	x	1+,2+,3+	*
L	1	+	2	+	0M
L	2	+	3	+	0M

Now this graph is valid! Interesting, but IMO not flip's job!

Why this exists

I'll create a valid graph that is in need of a flip:

H	VN:Z:1.0
S	1	A
S	2	TTT
S	3	G
P	x	1+,2-,3+	*
L	1	+	2	+	0M
L	2	+	3	+	0M

And now flip and then view shows a reasonable output: a path was flipped, and two links were added in support of the new path.

H	VN:Z:1.0
S	1	A
S	2	TTT
S	3	G
P	x_inv	3-,2+,1-	*
L	1	+	2	+	0M
L	2	+	1	-	0M
L	2	+	3	+	0M
L	3	-	2	+	0M

Fix

The paths that have just been generated need to be siloed off, and only links that are needed by those paths should be added.

Something else is also going on

I do still think that flip is doing something else that's a bit fishy. See note5.gfa from your test suite. I don't think that fixing the above would fix flip's behavior when run against note5. See my comment here: #485 (comment)

@sampsyo
Copy link
Contributor

sampsyo commented Apr 18, 2023

Interesting! Can I ask some follow-up questions to ensure that I'm understanding what's interesting about these examples?

  • First, I think what you're deducing here is that the algorithm roughly consists of these steps: (1) flip the paths, and then (2) add any new links to cover all the paths.
  • Under "Minimal:"
    • Is your concern here that step (2) generated new links to cover old paths also? And you're inferring that it should maybe only generate new links to cover newly flipped paths?
    • I think there may be an argument here about preconditions: that is, odgi flip's contract may say, "if you give me a valid GFA, I will do X," but if that premise is falsified, then all bets are off and it can do anything it likes. From that perspective, commands' behavior on invalid graphs may not matter much?
  • Under "Why this exists:"
    • Is the "why" that this illustrates accurately summarized as: odgi flip adds links to cover any steps in flipped paths that were not previously covered by old links?

@anshumanmohan
Copy link
Author

anshumanmohan commented Apr 18, 2023

Thanks for this, Adrian!

tl;dr: I think leaving it as-is would be fine, I just worry about downstream reliances on this "fixing" behavior.

  • Yup I think this is what the algorithm is doing, and I think that's a hair too much. I think it should just add links for those paths that it just flipped.
  • Minimal
    • Yes that's right; I think it could get away with being more restrained, and this would have the nice feature that the graph, even if invalid, remained (loosely speaking) the same kind of invalid as it started.
    • I agree that "garbage in, garbage out" could work, and in fact that is what is currently happening. Just stylistically, I don't like that the present behavior can take garbage, fix it quite carefully, and then let the user proceed. I think it could create bad reliances downstream that will suddenly be broken someday. If any user truly relies on this "fixing" behavior, I can imagine a separate odgi command makevalid that backfills any missing links.
  • Why this exists
    • This bit was just me checking my understanding and showing the "good case".
    • I agree that your summary is accurate in this case.
    • A little more verbosely: odgi flip adds links to cover any steps in all paths that were not previously covered by old links. In this illustration, this happens to mean that it just needed to add links to cover steps in those paths that it just flipped. That is why it collapses into your summary just above, and why I call it the good case.

anshumanmohan added a commit to cucapra/pollen that referenced this issue Apr 18, 2023
We agree with odgi on all the handmade files except note5.gfa
I think odgi is doing something fishy there;
see pangenome/odgi#485 (comment)

We agree with odgi on all the handmade files except flip4.gfa
flip4 is not a valid graph and so arguably it does not matter what we do with it,
but `odgi flip` does currently _fix_ flip4, so I think it's worth keeping around.
The point of divergence is that we do not currently fix graphs like flip4, though it's easy enough to do if that were desired.
See pangenome/odgi#496
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants