bcftools norm -m-both lost some record after about 1K records #336

Closed
wangyugui opened this issue Oct 14, 2015 · 15 comments

Comments

@wangyugui

bcftools norm -m-both works correctly for roughly the first 1K records, but drops some records after that.

bcftools version 1.2 does NOT have this problem, but the latest source on GitHub does.

Features of this VCF file:
1) VCF 4.2
2) It has 200 samples

@wangyugui
Author

Not only may the second allele of a multiallelic record be lost; some single-allele records may be lost as well.

@winni2k
Contributor

winni2k commented Oct 14, 2015

I suspect that the error is caused by a particular line in your VCF.

Can you post a minimal file (maybe as a gist?), made up or real, that reproduces this error?

@winni2k
Contributor

winni2k commented Oct 14, 2015

PS Thank you for the bug report!

@wangyugui
Author

I failed to create a public gist, so I sent the VCF file as an attachment to wkretzsch@gmail.com.

Best Regards
Wang Yugui

@winni2k
Contributor

winni2k commented Oct 15, 2015

Thanks, I'll take a look.

@winni2k
Contributor

winni2k commented Oct 16, 2015

I can confirm the existence of a bug. I have narrowed down the problem to a small VCF with 79 sites. If I remove a site from the front or the end, then the bug disappears. I have posted the file here. However, I am not familiar with bcftools norm, so someone else needs to take a look at the source code.
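Narrowing a failing VCF down by dropping sites from the front or the end can be sketched with a small helper. This is only an illustration: `vcf_subset` is a hypothetical name, not a bcftools command, and it assumes an uncompressed, tab-delimited VCF.

```shell
# Hypothetical helper (not part of bcftools): keep every header line plus
# data records number $2 through $3 (1-based, inclusive) of the VCF in $1.
vcf_subset() {
    awk -v first="$2" -v last="$3" '
        /^#/ { print; next }              # always keep the header
        { n++ }                           # count data records only
        n >= first && n <= last { print } # keep records inside the window
    ' "$1"
}
```

For example, `vcf_subset in.vcf 2 79 > trimmed.vcf` would drop the first site; the trimmed file can then be run through `bcftools norm -m-both` again to see whether the record loss persists.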

This is the code I used to replicate the error:

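function test {
    # 1) records written by norm after splitting multiallelic sites
    ../bcftools norm -m-both "$1" | grep -v '^#' | wc -l
    # 2) records in the input file
    grep -v '^#' "$1" | wc -l
    # 3) commas in the ALT column, i.e. extra alleles that get their own line
    grep -v '^#' "$1" | cut -f5 | grep -o ',' | wc -l
}
test issue_336_minimal.vcf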

The first number should equal the sum of the second and third numbers.
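The same expectation can be computed in one pass without running bcftools at all. The sketch below is an illustration, not part of bcftools: `expected_split_count` is a hypothetical helper, and it assumes an uncompressed, tab-delimited VCF in which every comma in the ALT column separates two alleles.

```shell
# Hypothetical helper (not part of bcftools): number of records that
# splitting multiallelic sites should produce for the VCF file in $1.
expected_split_count() {
    awk -F'\t' '
        /^#/ { next }                   # skip header lines
        {
            n++                         # one record per data line
            n += gsub(/,/, ",", $5)     # plus one per extra ALT allele
        }
        END { print n + 0 }             # print 0 for a file with no records
    ' "$1"
}

# Usage, assuming bcftools is on PATH and in.vcf is your input:
#   actual=$(bcftools norm -m-both in.vcf | grep -vc '^#')
#   [ "$actual" -eq "$(expected_split_count in.vcf)" ] || echo "records lost"
```

A mismatch between the two counts only says that records were lost or gained; it does not say which ones.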

I am using bcftools version:

../bcftools --version
bcftools 1.2-157-g8deae27
Using htslib 1.2.1-226-g1e5c377
Copyright (C) 2015 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

@winni2k
Contributor

winni2k commented Oct 16, 2015

It appears as if the first site is being dropped (position 1:789241), but only if the entire file is read. Running bcftools norm -m-both on just a subset of the sites does not cause the omission.
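To find out which sites go missing rather than only how many, one option is to compare CHROM:POS keys between the input and the output of norm. This is a rough sketch: `missing_sites` is a hypothetical helper, and it assumes both files are uncompressed VCFs and that splitting alone (no left-alignment against a reference) leaves POS unchanged.

```shell
# Hypothetical helper (not part of bcftools): print the CHROM:POS key of
# every data record in the input VCF ($1) that has no record with the same
# key in the normalized output VCF ($2).
missing_sites() {
    awk -F'\t' '
        /^#/ { next }                                      # skip headers in both files
        FILENAME == ARGV[1] { seen[$1 ":" $2] = 1; next }  # first file read: norm output
        !(($1 ":" $2) in seen) { print $1 ":" $2 }         # input site absent from output
    ' "$2" "$1"
}
```

With the output of norm saved to a file, `missing_sites in.vcf normed.vcf` would list sites such as the one reported above if they were dropped.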

@adamspargo
Contributor

Yes. Also the number of lines modified is not updated:

Lines total/modified/skipped: 78/0/0

I'm having a look, but I'm also not familiar with bcftools norm.

@wangyugui
Author

The number of lines modified is not updated in bcftools 1.2 either.

@adamspargo
Contributor

I can prevent specific examples from dropping lines by changing buffer limits (e.g. increasing the limit above 100 in vcfnorm.c line 1486 makes the example above work), but there is a fundamental over-writing issue that will take a while to figure out.

@wangyugui
Author

vcfnorm.c line 1486 is the same as in bcftools 1.2, so maybe a different, more fundamental over-writing issue is the real cause?

@adamspargo
Contributor

Sure.

pd3 added a commit to pd3/bcftools that referenced this issue Oct 21, 2015
pd3 added a commit to pd3/bcftools that referenced this issue Oct 21, 2015
@pd3 pd3 closed this as completed in f6831b5 Oct 22, 2015
@mcshane
Contributor

mcshane commented Oct 22, 2015

@wangyugui Should be fixed on the test case from you and @wkretzsch, but let us know if it hasn't fixed up the issue with your original VCF.

@winni2k
Contributor

winni2k commented Oct 22, 2015

PS I can confirm this code passes fine using bcftools 1.2 (using htslib 1.2.1).

@wangyugui
Author

This patch fixes the problem in my case.
Thanks for the patch.
