Dear freebayes devs,
I pulled and compiled freebayes from GIT today. It identifies itself as version 0.9.5. I have hundreds of BAM files but recent versions of freebayes (0.9.4 and 0.9.5) are not able to handle a particular site in one of these files. Version 0.8.9 handled the site (in that it didn't crash) but overall seemed considerably slower at SNP calling this data. Recent freebayes versions crash with an error message when it gets to a certain position at chromosome "Group12" in this file:
freebayes -b 10154640-10164640.bam -f ref.fasta -v test_Group12.20120503.vcf -r Group12:10154651..
variant at Group12:10154652
alt is the same as the reference
C == C
freebayes: ResultData.cpp:166: vcf::Variant& Results::vcf(vcf::Variant&, long double, long double, Samples&, std::string, std::vector >&, std::map, std::allocator >, int, std::less, std::allocator > >, std::allocator, std::allocator >, int> > >, std::vector, std::allocator >, std::allocator, std::allocator > > >&, int, GenotypeCombo&, bool, std::map, std::allocator >, std::vector >, std::less, std::allocator > >, std::allocator, std::allocator >, std::vector > > > >&, std::map >, std::less, std::allocator > > > >&, std::vector, std::allocator >, std::allocator, std::allocator > > >&, AlleleParser_): Assertion `_a != var.ref' failed.
Aborted (core dumped)
Samtools tview of that position reveals that 10154651 is followed by an "nc" insertion in a single read:
10154641 10154651 10154661
It looks rather odd and probably represents an edge case that freebayes does not currently handle. I prepared an archive for Erik a while ago with a small BAM file and the reference genome. The archive although zipped is rather large and I'd like to keep the data at least pseudo-private for now, so contact me if more information and the data itself is needed to fix this bug.
In the meantime, is there anything I can do to make freebayes tolerate the site and just ignore? I suppose I can tweak my pipeline to for instance run freebayes twice, before and after the site, and then merge the outputs but that is an ugly solution :-)
Hmm, I actually get this error at several sites in several chromosomes with 0.9.5. I'll see if I can produce similar alignment snippets for those.
This is still an issue in 0.9.6 as of 2012-05-16.
Erik and Andreas;
I am running into this problem as well with the latest FreeBayes. Here is a small reproducible example BAM, which looks to have similar characteristics to Andreas' example:
$ freebayes -f GRCh37.fa -b alt_ref_same_problem.bam -v test.vcf -r 2:133004350..133004400
Here is a further test for this bug (https://xythos.uiowa.edu/tbair%40iowa.uiowa.edu/Public/test.bam), it is ABI-Solid data and your latest fix has taken care of it but hopefully this will help figure out a better way to deal with the N's
The error generated by my test case seems to be solved now. Freebayes prints the error message but does not crash after the recent patch.
If it is fixed for others, I suggest closing this issue.