Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible over strict testing in vcflib #365

Open
PeteClapham opened this issue Oct 27, 2022 · 3 comments
Open

Possible over strict testing in vcflib #365

PeteClapham opened this issue Oct 27, 2022 · 3 comments
Assignees
Labels
bug Genuine bug
Milestone

Comments

@PeteClapham
Copy link

Only bug reports!

When running vg construct (https://github.com/vgteam/vg), the vcflib component checking appears to identify the vcf file as having an incorrect field order. A similar bug has previously been filed and addressed within htslib:

VCF Header: must Number be before Type? · Issue #642
samtools/hts-specs#642

The over strict checking rejects files where the field order may vary in the header. An example of the impact using vg, is described below:

$ vg construct -r /data/refs/hs38DH.fa
-v1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.v
cf.g z

1kGP_high_coverage_Illumina.chr1.filtered.SNV_INDEL_SV_phased_panel.vc
f.g
z.vg
header parse error at:
fields[2] != "Number"
##INFO=<ID=END2,Type=Integer,Number=1,Description="Position of
breakpoint on CHR2">

The vcf's used are from the 1k genomes project and are of known verified quality. Downloads are available from here:
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20220422_3202_phased_SNV_INDEL_SV/

In the example case a single new vg file, describing the variants in a graph format should be produced.

The 1k genomes repository is an excellent test set, that was designed to aid and accelerate tools development and is widely used. The current changes to the order testing will therefore be an issue that impacts anyone looking to use vcf tools based on the current vcflib releases

It has been suggested that the issue may be related to the vcflib code here:

if (fields[2] != "Number") {

With thanks to local team members for their help in tracking this issue to the above code base and many thanks to the vcflib team for making these tools available

Pete

@pjotrp
Copy link
Contributor

pjotrp commented Oct 29, 2022

Thanks @PeteClapham. This refers to some 12 year old code ;). I haven't read everything, but I think the gist is that vcflib should not check for the field position, though it may check whether the field exists or not. Note, this refers to the VCF header, not to the column layout of the file itself. I think that is fair. Do you want to try and write a patch?

@pjotrp pjotrp self-assigned this Oct 29, 2022
@pjotrp pjotrp added this to the v1.0.4 milestone Oct 29, 2022
@pjotrp pjotrp modified the milestones: v1.0.4, Later Jan 14, 2023
pjotrp added a commit to pjotrp/vcflib that referenced this issue Jan 15, 2023
@pjotrp
Copy link
Contributor

pjotrp commented Jan 16, 2023

In v1.0.5 release of vcflib.

@pjotrp pjotrp closed this as completed Jan 16, 2023
pjotrp added a commit to pjotrp/vcflib that referenced this issue Feb 7, 2023
commit 6e3ede4
Author: Pjotr Prins <pjotr.public01@thebird.nl>
Date:   Sat Jan 14 23:29:50 2023 -0600

    Allow for different field order - see vcflib#365

because it introduced a bug discovered by the Debian packagers. Thank atille and
garguatua_kerr!
@pjotrp pjotrp reopened this Feb 18, 2023
@pjotrp
Copy link
Contributor

pjotrp commented Feb 18, 2023

had to reopen on reversal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Genuine bug
Projects
None yet
Development

No branches or pull requests

2 participants