Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected handling of invalid (non-integer) POS values #1570

Closed
colin-nolan opened this issue Feb 24, 2023 · 1 comment · Fixed by #1575
Closed

Unexpected handling of invalid (non-integer) POS values #1570

colin-nolan opened this issue Feb 24, 2023 · 1 comment · Fixed by #1575
Assignees

Comments

@colin-nolan
Copy link

According to the VCF specification, POS must be an integer:

POS — position: The reference position, with the 1st base having position 1. ... Telomeres are indicated by using positions 0 or N+1, where N is the length of the corresponding chromosome or contig. (Integer, Required)

If a variant's POS value contains a string that doesn't encode an integer:

$ cat example.vcf
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20150218
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##source=1000GenomesPhase3Pipeline
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=1,assembly=b37,length=51304566>
##SAMPLE=<ID=SAMPLE_1>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE_1
1       stuff   ID1     G       T       .       PASS    .       GT      1/0
1       1.5   ID2    G       T       .       PASS    .       GT      1/0

BCFtools silently ignores the issue and parses these invalid positions into valid ones:

$ ./bcftools view -H example.vcf
1       0       ID1     G       T       .       PASS    .       GT      1/0
1       1       ID2     G       T       .       PASS    .       GT      1/0
$ echo $?
0

This happens because hts_str2int doesn't have any checks for non-int values: https://github.com/samtools/htslib/blob/008eabd/textutils_internal.h#L218 (which I think mirrors strtol's behaviour).

It would be a nice to have for BCFtools to fail upon detection of invalid POS values. For example, column swapping would get detected, and not potentially result in a valid VCF full of 0positions! A valid argument against doing anything about this is GIGO. However, BCFtools certainly validates some things - and after doing an upgrade, I noticed the number of checks are increasing.

@colin-nolan
Copy link
Author

Thanks @pd3 for moving (I'm never sure which repo bcftools/htslib I should report things on).

@daviesrob daviesrob assigned daviesrob and pd3 and unassigned daviesrob Feb 28, 2023
pd3 added a commit to pd3/htslib that referenced this issue Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants