Feature request: Add INFO about cause of rejection in reject VCF #10

Open
davmlaw opened this issue May 2, 2024 · 7 comments

Comments

@davmlaw

davmlaw commented May 2, 2024

Liftover can fail for a variety of reasons...

Many of the liftover functions return -1 on any error, and callers only check whether the result is < 0.

If you defined a range of negative constants, you could return specifically what went wrong, then look it up and add it to the INFO of rejected variants.

@freeseek
Owner

freeseek commented May 3, 2024

If a variant is dropped, it means that neither of the two anchors could be mapped using the chain files, or that the two anchors mapped to different chains, or that the two anchors mapped to locations too far from each other. Compared to other liftover tools, the reasons for dropping a variant are more limited, and basically it is always related to the chain breaking or missing at the locus. Would this information really be useful?

@davmlaw
Author

davmlaw commented May 7, 2024

When a liftover of their data fails, the bio people pick a half dozen variants, then we do a deep dive to try and work out what happened. This is to verify that it's not due to a bug in the code.

I think it reassures them that things are working correctly, and they much prefer seeing an error like "two anchors mapped to locations too far from each other" to "liftover failed".

I can try and write a pull request but haven't written C in close to 20 years...

@freeseek
Owner

freeseek commented May 7, 2024

Maybe give me some minimal documentation for how you would like to see the INFO field engineered, given the three possible return codes, in a way that you deem usable by bio people, and I will write the code myself

@davmlaw
Author

davmlaw commented May 9, 2024

For my particular use case, I'll be processing the VCF file then displaying it in a GUI for the bio people.

There are no enums in VCF types, so I guess just make it a string and fill it in with constants. I'd expect the INFO to look a bit like:

##INFO=<ID=REJECTION_REASON,Number=A,Type=String,Description="Reason for rejection. Will be 1 of NO_MAP (neither of the two anchors could be mapped using the chain files), DIFFERENT_CHAINS (two anchors mapped to different chains) or DISTANCE_EXCEEDED (the two anchors mapped to locations too far from each other)">

But feel free to change to whatever you think best. Thanks a lot!
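
To make the idea concrete, here is a minimal C sketch that combines the negative-constant suggestion from the start of the thread with the reason names proposed above. Everything in it is an assumption for illustration: the `LIFT_ERR_*` constants, `reject_reason()` and `format_reject_info()` are hypothetical names, not taken from the plugin's source, and the code does not touch the real VCF writing path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical negative return codes (assumption: names and values are
   illustrative only). Each stays below zero, so an existing "< 0" check
   keeps working unchanged. */
enum {
    LIFT_ERR_NO_MAP            = -1, /* neither anchor could be mapped */
    LIFT_ERR_DIFFERENT_CHAINS  = -2, /* anchors mapped to different chains */
    LIFT_ERR_DISTANCE_EXCEEDED = -3  /* anchors mapped too far apart */
};

/* Look up the string constant for a negative return code. */
static const char *reject_reason(int code)
{
    switch (code) {
    case LIFT_ERR_NO_MAP:            return "NO_MAP";
    case LIFT_ERR_DIFFERENT_CHAINS:  return "DIFFERENT_CHAINS";
    case LIFT_ERR_DISTANCE_EXCEEDED: return "DISTANCE_EXCEEDED";
    default:                         return "UNKNOWN";
    }
}

/* Write "REJECTION_REASON=<reason>" into buf.
   Returns the value of snprintf (length that would have been written). */
static int format_reject_info(int code, char *buf, size_t len)
{
    return snprintf(buf, len, "REJECTION_REASON=%s", reject_reason(code));
}
```

A caller that currently tests `ret < 0` could pass `ret` to `format_reject_info()` before writing the record to the reject file. Whether the reason ends up in INFO or elsewhere is a separate design choice.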

@davmlaw
Author

davmlaw commented Jun 6, 2024

Variants can also end up in the reject list for other reasons, e.g. the contig not being defined in the header. There should be a code for this (e.g. VCF record error) to distinguish it from the ones to do with chains.

@freeseek
Owner

Try to see if the development version here fixes both problems

@davmlaw
Author

davmlaw commented Jun 18, 2024

Hi, thanks, looks good.

Ran the dev version with --write-reject, and the VCF was written with FILTER containing the reason.
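
For downstream processing of such a reject file (for example, before showing it in a GUI), the reason can be read from the FILTER column, which is the seventh tab-separated field of a VCF data line. The following C sketch extracts that field from one line. The helper name `vcf_filter_field()` is hypothetical, and the thread does not state the actual FILTER strings the development version writes, so none are hard-coded here.

```c
#include <stdio.h>
#include <string.h>

/* Copy the FILTER column (7th tab-separated field) of a VCF data line
   into out. Returns 0 on success, -1 for header lines, records with
   fewer than 7 fields, an empty FILTER, or a too-small output buffer. */
static int vcf_filter_field(const char *line, char *out, size_t len)
{
    if (line[0] == '#' || len == 0)
        return -1;

    const char *p = line;
    for (int col = 0; col < 6; col++) {   /* skip CHROM..QUAL */
        p = strchr(p, '\t');
        if (!p)
            return -1;
        p++;                              /* step past the tab */
    }

    size_t n = strcspn(p, "\t\r\n");      /* length of the FILTER field */
    if (n == 0 || n >= len)
        return -1;

    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Calling this on each line read with `fgets()` and counting the distinct values would give a per-reason tally of rejected variants. A production tool would more likely use HTSlib to parse records rather than splitting text by hand.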
