Jkbonfield vcf int64 #1003

Closed
wants to merge 9 commits into from

Commits on Dec 9, 2019

  1. Fixes BCF in-memory decoding with 64-bit integers.

    Any 64-bit INFO field that wasn't the last in the list would cause
    subsequent fields to be decoded incorrectly.
    
    This commit fixes that, plus updates the tests accordingly so the bug
    could be triggered.
    
    Fixes the first part of samtools#999 (test1.vcf), but doesn't fix the second
    part (BCF output silently being broken).
    
    Fixes samtools/bcftools#1123
    jkbonfield committed Dec 9, 2019
    ed4c281
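The class of bug described in this commit can be illustrated with a minimal sketch of typed-integer decoding, in which the read pointer must advance by the full width of each value, 8 bytes included. The type codes and the names `int_type_size` and `decode_int` are hypothetical and chosen for illustration; this is not the htslib code the commit changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes for this sketch; not taken from htslib headers. */
enum int_type { T_INT8 = 1, T_INT16 = 2, T_INT32 = 3, T_INT64 = 4 };

/* Width in bytes of one value of the given type, or 0 if the type is unknown. */
static size_t int_type_size(int type)
{
    switch (type) {
    case T_INT8:  return 1;
    case T_INT16: return 2;
    case T_INT32: return 4;
    case T_INT64: return 8;   /* forgetting this case would misalign later fields */
    default:      return 0;
    }
}

/*
 * Decode one little-endian signed integer of the given type from p into *val.
 * Returns a pointer just past the value, or NULL for an unknown type.
 */
static const uint8_t *decode_int(const uint8_t *p, int type, int64_t *val)
{
    size_t n = int_type_size(type);
    uint64_t u = 0;

    assert(p != NULL && val != NULL);
    if (n == 0)
        return NULL;

    /* Assemble the value byte by byte, least significant byte first. */
    for (size_t i = 0; i < n; i++)
        u |= (uint64_t)p[i] << (8 * i);

    /* Sign-extend values narrower than 64 bits. */
    if (n < 8 && (u & ((uint64_t)1 << (8 * n - 1))))
        u |= ~(uint64_t)0 << (8 * n);

    memcpy(val, &u, sizeof u);   /* reinterpret the bits as a signed value */
    return p + n;                /* advance by the full width of the value */
}
```

Calling `decode_int` repeatedly and feeding each returned pointer into the next call walks a buffer of mixed-width values; a 64-bit value in the middle then leaves the following fields correctly aligned.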

Commits on Dec 10, 2019

  1. Make 64-bit optional and don't compile by default.

    This commit addresses several problems introduced by the recently added
    support of 64-bit coordinates:
    - 64-bit positions were silently modified on BCF output
    - any erroneous out-of-range INFO value resulted in unparseable BCF
      output (in contrast to the previous behavior of a silent under/overflow)
    - conversion between VCF and BCF is no longer possible
    
    Large genomes are a niche area, and the limit can easily be worked around
    by splitting the largest chromosome into two. This commit makes the
    compilation of 64-bit values optional, favoring the interoperability of BCF
    with VCF, a feature heavily relied on in many pipelines.
    
    The support of large genomes in this commit is limited to POS and INFO tags
    defined as Number=1 (motivated by INFO/END); all other extreme values
    are replaced with a missing value ("."). This changes the original ad-hoc
    approach, which supported the first field of any INFO tag array.
    
    The default 32-bit mode adds checks to prevent under/overflow of INFO and
    FORMAT integer values and prints a warning when extreme values are replaced
    with a missing value. An additional check is added to prevent overflow of
    64-bit coordinates on BCF output.
    
    Note that the 64-bit support is not compiled in by default because:
    - it breaks the VCF/BCF specification
    - it works only for VCF but cannot produce a functional BCF
    - there is currently no front-end capable of indexing VCFs with 64-bit
      coordinates (n_lvls is not accessible via `bcftools index`)
    
    Note that full 64-bit support for BCF would require significantly more
    work and large changes to the codebase, API, and the BCF specification;
    it is therefore not planned at the moment.
    
    The limited POS and INFO/END support for VCF can be compiled in by
    editing htslib/vcf.c and setting
        #define ALLOW_INT64 1
    pd3 committed Dec 10, 2019
    4e1d32a
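The default 32-bit behavior described in this commit (range-check a value, warn, and substitute a missing value rather than letting it wrap) can be sketched as follows. The function name `store_int32_checked`, the macros, and the warning text are hypothetical. The sentinel layout, with the lowest eight 32-bit values treated as reserved and the lowest one used as "missing", is an assumption of this sketch and should be checked against the BCF specification.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed for this sketch: the lowest 32-bit value marks "missing" and the
 * lowest eight values are reserved, so they cannot be stored as data. */
#define MISSING_INT32   INT32_MIN
#define MIN_VALID_INT32 ((int64_t)INT32_MIN + 8)
#define MAX_VALID_INT32 ((int64_t)INT32_MAX)

/*
 * Narrow a parsed 64-bit value to 32 bits. Out-of-range values are replaced
 * with the missing sentinel and reported on stderr instead of silently
 * wrapping. If n_replaced is non-NULL, it counts the replacements.
 */
static int32_t store_int32_checked(int64_t v, const char *tag, int *n_replaced)
{
    if (v < MIN_VALID_INT32 || v > MAX_VALID_INT32) {
        fprintf(stderr, "[W] value %lld of %s is out of range, set to missing\n",
                (long long)v, tag);
        if (n_replaced)
            (*n_replaced)++;
        return MISSING_INT32;
    }
    return (int32_t)v;   /* safe: v is known to fit in 32 bits */
}
```

A caller would pass each parsed INFO or FORMAT integer through this function before writing it, so an extreme value surfaces as "." in the output rather than as a corrupted number.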

Commits on Dec 11, 2019

  1. 448de2e
  2. 31aa365

Commits on Dec 12, 2019

  1. Allow compilation w/o modifying vcf.c

    and temporarily comment out failing tests that assume 64-bit positions
    are compiled in by default.
    pd3 committed Dec 12, 2019
    2965196
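One common way to let a compile-time switch be set without editing the source file is to guard its default with `#ifndef`, so that a compiler flag can override it. The sketch below assumes that pattern; it is not confirmed to be what this commit does. Only the macro name `ALLOW_INT64` comes from the commit messages. `pos_val_t`, `POS_VAL_MAX`, and `pos_fits` are hypothetical names.

```c
#include <stdint.h>

/* Off by default. Assumed usage: override at build time with a compiler
 * flag such as -DALLOW_INT64=1 instead of editing this file. */
#ifndef ALLOW_INT64
#define ALLOW_INT64 0
#endif

#if ALLOW_INT64
typedef int64_t pos_val_t;          /* wide positions when opted in */
#define POS_VAL_MAX INT64_MAX
#else
typedef int32_t pos_val_t;          /* default: 32-bit positions */
#define POS_VAL_MAX INT32_MAX
#endif

/* Returns 1 if pos can be stored in pos_val_t, 0 otherwise. */
static int pos_fits(int64_t pos)
{
    return pos >= 0 && pos <= POS_VAL_MAX;
}
```

With the default build, `pos_fits` rejects any coordinate above `INT32_MAX`; building with the macro set to 1 widens the accepted range without touching the source.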

Commits on Dec 13, 2019

  1. 5073db3
  2. aa0a090
  3. 4929c3d

Commits on Dec 16, 2019

  1. Polish a few minor details

    - bug fix in `bcf_dec_typed_int1_safe()`: 64-bit values could not be read
      correctly by this function
    
    - cosmetic change: always declare val1 as int64_t with an explicit cast to
      int32_t, and add an explicit LL suffix to a 64-bit constant in case some
      compilers are fussy
    pd3 committed Dec 16, 2019
    ecc69cf