Extend vcf.h API to distinguish between the two indel types #1454

Open
pd3 opened this issue Jun 14, 2022 · 1 comment

pd3 commented Jun 14, 2022

This request is motivated by samtools/bcftools#1704. The idea is to extend the htslib VCF interface as follows:

#define VCF_REF      0
#define VCF_SNP      1
#define VCF_MNP      2
//
#define VCF_OTHER    8
#define VCF_BND     16      // breakend
#define VCF_OVERLAP 32      // overlapping deletion, ALT=*
#define VCF_INS     64      // short insertion
#define VCF_DEL    128      // short deletion
#define VCF_INDEL  (VCF_INS|VCF_DEL)
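
To illustrate what the two new bits would let a caller express, here is a minimal sketch that classifies REF/ALT pairs by length alone. The flag values are copied from the proposal above; `classify_indel()` and `site_indel_types()` are hypothetical helpers written for this example, not htslib functions, and the real classification in htslib handles more cases (symbolic alleles, breakends, `*`).

```c
#include <string.h>

/* Proposed values from the list above; an assumption, not current htslib. */
#define VCF_INS     64      /* short insertion */
#define VCF_DEL    128      /* short deletion */
#define VCF_INDEL  (VCF_INS|VCF_DEL)

/* Hypothetical helper: classify one REF/ALT pair by length only.
 * Returns VCF_INS, VCF_DEL, or 0 when both alleles have the same length. */
static int classify_indel(const char *ref, const char *alt)
{
    size_t rlen = strlen(ref), alen = strlen(alt);
    if (alen > rlen) return VCF_INS;
    if (alen < rlen) return VCF_DEL;
    return 0;
}

/* Hypothetical helper: OR together the per-allele results for one site,
 * so a site carrying both an insertion and a deletion sets both bits. */
static int site_indel_types(const char *ref, const char *const *alts, int n_alt)
{
    int types = 0;
    for (int i = 0; i < n_alt; i++)
        types |= classify_indel(ref, alts[i]);
    return types;
}
```

With these definitions, a site with REF=AC and ALT=A,ACGT would report both bits, while a site with only the ALT=A allele would report `VCF_DEL` alone.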

daviesrob commented

The tricky part here is how to do this without breaking the expectations of programs that think VCF_INDEL is 4.

These values are currently set by bcf_set_variant_type(), which external callers reach only through bcf_get_variant_types() or bcf_get_variant_type().

One way around the problem might be to make a new interface that replaces the old ones and takes a parameter that indicates which of the various types the caller expects to see. As a bonus, it could also return bcf_variant_t::n which is currently a bit difficult to access.

I think for the best compatibility, VCF_INDEL would have to remain as 4, and VCF_INS and VCF_DEL would be extras. The existing interfaces would not return the new values so old code still works. The old interfaces could be marked as deprecated, with advice to use the new interface.
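
A rough sketch of how this could look, to make the idea concrete. Everything here is an assumption for illustration: `toy_variant` is a stand-in for the cached per-record type information (not the real bcf_variant_t layout), `legacy_types()` and `variant_types_ex()` are hypothetical names, and the VCF_INS and VCF_DEL values are borrowed from the proposal above.

```c
#include <stddef.h>

/* VCF_INDEL stays 4; VCF_INS and VCF_DEL are extra bits (values assumed). */
#define VCF_SNP      1
#define VCF_INDEL    4
#define VCF_INS     64
#define VCF_DEL    128

/* Toy stand-in for the cached variant info; not htslib's actual struct. */
typedef struct {
    int type;   /* OR of VCF_* bits */
    int n;      /* assumed: signed length change, negative for deletions */
} toy_variant;

/* Hypothetical legacy view: hides the new bits so old callers see
 * exactly the values they were compiled against. */
static int legacy_types(const toy_variant *v)
{
    return v->type & ~(VCF_INS | VCF_DEL);
}

/* Hypothetical new interface: the caller passes the set of bits it
 * understands and only those are reported. Optionally returns n too. */
static int variant_types_ex(const toy_variant *v, int wanted, int *n_out)
{
    if (n_out) *n_out = v->n;
    return v->type & wanted;
}
```

In this sketch a deletion would be stored as `VCF_INDEL | VCF_DEL`, so the legacy view still reports `VCF_INDEL` while a caller that asks for `VCF_DEL` can see the finer distinction.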

While this would require some small code changes for users, it would alert them to the need to update and reduce the chance of indels suddenly going missing. If, as suggested above, VCF_INDEL were defined as (VCF_INS|VCF_DEL), code looking just for VCF_INDEL might fail to spot cases which set only one of the bits. Making the need to change explicit makes it more likely that code would be altered to do the right thing.
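
The failure mode can be shown with a small sketch. The macro names `OLD_VCF_INDEL` and `NEW_VCF_INDEL` and the three test functions are hypothetical and exist only to put both definitions side by side. Which of these patterns real callers use is not known here, so this is one way the breakage could happen rather than a survey of existing code.

```c
/* Both definitions side by side; the names are made up for this example. */
#define OLD_VCF_INDEL   4
#define VCF_INS        64
#define VCF_DEL       128
#define NEW_VCF_INDEL (VCF_INS | VCF_DEL)

/* Equality test against the redefined macro: misses records that set
 * only one of the two bits. */
static int is_indel_exact(int type)
{
    return type == NEW_VCF_INDEL;
}

/* Bitwise-overlap test against the redefined macro: still matches. */
static int is_indel_overlap(int type)
{
    return (type & NEW_VCF_INDEL) != 0;
}

/* Code built against the old value 4 (for example an unrecompiled
 * binary): sees nothing if the library now sets only the new bits. */
static int is_indel_stale(int type)
{
    return (type & OLD_VCF_INDEL) != 0;
}
```

For a pure deletion reported as `VCF_DEL`, only the overlap test recognises an indel; the equality test and the stale-constant test both miss it silently.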
