I would like to use bcftools annotate as follows:
bcftools annotate -a b.vcf.gz a.vcf.gz
And have every marker in file a.vcf.gz have a binary flag when the same marker is present in b.vcf.gz
However, either this is not possible or it is too complicated to understand from the help section.
If it is possible to do this, please show an example in the help section
OK, I added a section with examples to the manpage, hopefully that will make the usage clearer? http://samtools.github.io/bcftools/bcftools.html#annotate
I still don't understand if what I want to do is possible or not. In my case:
bcftools annotate -Oz -a kgp.vcf.gz in.vcf.gz -o out.vcf.gz
Let's say that kgp.vcf.gz is a VCF file with all variants in the 1000 Genomes project with no INFO tags and no IDs. I would just like to add a flag to the in.vcf.gz file whenever the variant is included in the kgp.vcf.gz file. I don't understand if and how I can do this from the examples.
Ah, I did not answer your question, sorry. No, it's not possible to do this in a single step. But one can do it in two steps:
bcftools query -f'%CHROM\t%POS\t%REF\t%ALT\t1\n' 1kgp.vcf | bgzip -c > 1kgp.tab.gz
tabix -s1 -b2 -e2 1kgp.tab.gz
bcftools annotate -c CHROM,POS,REF,ALT,1000GP -a 1kgp.tab.gz -h 1kgp.hdr in.vcf.gz
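The annotate command above reads `1kgp.hdr` through `-h`, but none of the commands shown creates that file. A minimal sketch of how it could be written, assuming the new tag is a presence flag; the Description text is an assumption and not part of the original answer:

```shell
# Write the single header line that declares the new INFO tag.
# Number=0,Type=Flag makes 1000GP a presence/absence flag; the "1" in the
# last column of 1kgp.tab.gz then sets the flag for each matching site.
# The Description wording is illustrative only.
printf '%s\n' '##INFO=<ID=1000GP,Number=0,Type=Flag,Description="Site present in 1kgp.vcf">' > 1kgp.hdr
```

The file has to exist before the `bcftools annotate` step runs.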
Okay, but this is more like a hack and many users might be familiar with bcftools annotate and not with bcftools query, which has a much steeper learning curve than bcftools annotate. Also, it takes away from the very elegant architecture of bcftools which allows piping one command into another. That said, I believe it would greatly help if this example was in the documentation.
Well, I disagree about the "hack" part. The tools were meant to be combined this way. It gives more flexibility, there cannot be an option for everything.
I am not sure that the man page is the best place for this; the wiki might be more appropriate.
I assume the manpage is http://samtools.github.io/bcftools/bcftools.html#annotate rather than what you get when you run "bcftools annotate" from the command line. Anyway, the reason I asked this is because I was trying to move a GATK pipeline in which I was using:
java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T VariantAnnotator --comp:FOO ...
(explained at http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.html)
That said, bcftools is hundreds of times faster than GATK and I am truly grateful to the people who wrote it.
Yeah, the link points to the actual man page (man bcftools) rendered in html format.
I'll reopen this issue as a feature request; it might get implemented at some point.
This is now supported via bcftools annotate --mark-sites, added by 6ccecd1.
bcftools annotate --mark-sites
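For the original question, a single-step call might look like the sketch below. It assumes the `+TAG` form of `-m/--mark-sites` described in the bcftools manual, where `+` flags sites that are present in the `-a` file. The tag name `KGP` is a made-up placeholder, the file names are the ones used earlier in this thread, and `kgp.vcf.gz` is assumed to be bgzipped and indexed.

```shell
# Hypothetical INFO tag name; the leading "+" in "-m +TAG" marks sites
# that are present in the -a file.
tag="KGP"
cmd="bcftools annotate -a kgp.vcf.gz -m +${tag} -Oz -o out.vcf.gz in.vcf.gz"

# Show the command that would be run.
echo "$cmd"

# Run it only when bcftools and both input files are available.
if command -v bcftools >/dev/null 2>&1 && [ -f kgp.vcf.gz ] && [ -f in.vcf.gz ]; then
    $cmd
fi
```

Sites in `in.vcf.gz` that also occur in `kgp.vcf.gz` should then carry the `KGP` flag in their INFO column.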