Use @gmod/vcf #1227
Whew, finally ready I think. Here's some test data I've been using: data.tar.gz. That's got the same VCFs indexed with both tabix and tribble so you can compare them. There are a few changes and bugfixes. One change is with the way some feature descriptions are displayed:
There were also some tribble VCFs that wouldn't load, and then feature callbacks (like the red-blue heterozygous/homozygous one in the Volvox data) didn't work with tribble. Both issues are fixed now.
I've tested on a bigger dbSNP VCF and didn't see any performance issues, but feel free to hammer it and see how it performs. There are probably a couple places I could optimize if I need to.
I think this is looking really good! I did find a potential bug with a 1000 Genomes structural variant (SV) dataset, though.
I tested with http://s3.amazonaws.com/1000genomes/phase3/integrated_sv_map/ALL.wgs.integrated_sv_map_v2.20130502.svs.genotypes.vcf.gz and found that it gives an error when displaying a feature, e.g. in the region 1:67477101..67604500. The error is "cannot read property 'undefined' of undefined".
Also, on some datasets the layout is a little weird and not optimal, which is especially noticeable if you zoom out. Here is the chrY data for human.
Gotcha. I guess BAM might be a little more optimized because you know the exact byte boundaries of each feature and can simply pass a byte buffer straight to crc32. With tabix/VCF the parser still has to work through text chunks, so it doesn't necessarily get byte boundaries it could use the same way.
There is one more small concern: percent-encoded characters such as %2C are not URL-decoded. I think this only applies to the INFO field.
For what it's worth, the old VCF parser did not do this either, but the VCF spec does say these should be decoded.
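To make the decoding concern concrete, here is a minimal sketch of how INFO values could be percent-decoded. The helper names `decodeInfoField` and `safeDecode` are hypothetical and are not part of @gmod/vcf; the sketch also assumes that `decodeURIComponent` is an acceptable stand-in for the spec's short list of reserved characters. The one detail that matters is the order: split on `;` and `,` first and decode afterwards, so an encoded comma stays inside its value.

```javascript
// Hypothetical helpers for illustration only; not the @gmod/vcf API.

// Decode one value, falling back to the raw text if it contains a
// malformed escape (decodeURIComponent throws URIError in that case).
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    return value;
  }
}

// Parse a raw INFO column into an object of key -> array of decoded values.
// Keys without '=' are treated as flags and set to true.
function decodeInfoField(info) {
  const result = {};
  if (info === '.' || info === '') {
    return result;
  }
  for (const entry of info.split(';')) {
    if (!entry) {
      continue;
    }
    const eq = entry.indexOf('=');
    if (eq === -1) {
      result[entry] = true;
      continue;
    }
    const key = entry.slice(0, eq);
    // Split on literal commas BEFORE decoding so %2C is not mistaken
    // for a value separator.
    result[key] = entry
      .slice(eq + 1)
      .split(',')
      .map(safeDecode);
  }
  return result;
}
```

With this sketch, `decodeInfoField('NOTE=a%2Cb,c')` yields two values for `NOTE`, the first of which contains a literal comma.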
Nov 6, 2018
Btw, if you're looking for additional VCF4.3 datasets, the only tool I know of that generates them is octopus.