feature request: can "annotate" mark variants present in -a file? #108

Closed
freeseek opened this Issue Sep 6, 2014 · 8 comments


@freeseek
freeseek commented Sep 6, 2014

I would like to use bcftools annotate as follows:
bcftools annotate -a b.vcf.gz a.vcf.gz
and have every marker in a.vcf.gz receive a binary flag when the same marker is present in b.vcf.gz.
However, either this is not possible or it is too complicated to understand from the help section.
If it is possible to do this, please show an example in the help section.

@pd3
Member
pd3 commented Sep 9, 2014

OK, I added a section with examples to the manpage, hopefully that will make the usage clearer? http://samtools.github.io/bcftools/bcftools.html#annotate

@pd3 pd3 closed this Sep 9, 2014
@freeseek
freeseek commented Sep 9, 2014

I still don't understand if what I want to do is possible or not. In my case:
bcftools annotate -Oz -a kgp.vcf.gz in.vcf.gz -o out.vcf.gz
Let's say that kgp.vcf.gz is a VCF file with all variants in the 1000 Genomes project with no INFO tags and no IDs. I would just like to add a flag to the in.vcf.gz file whenever the variant is included in the kgp.vcf.gz file. I don't understand if and how I can do this from the examples.

@pd3
Member
pd3 commented Sep 10, 2014

Ah, I did not answer your question, sorry. No, it's not possible to do this in a single step. But one can do it in two steps:

bcftools query -f'%CHROM\t%POS\t%REF\t%ALT\t1\n' 1kgp.vcf | bgzip -c > 1kgp.tab.gz
tabix -s1 -b2 -e2 1kgp.tab.gz
bcftools annotate -c CHROM,POS,REF,ALT,1000GP -a 1kgp.tab.gz -h 1kgp.hdr in.vcf.gz
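
The 1kgp.hdr file passed to -h is not created by the commands above. A minimal sketch of how it could be written, assuming the flag keeps the name 1000GP used in the -c column list; the Description text is made up for illustration:

```shell
# Declare the new INFO tag as a flag (Number=0, Type=Flag).
# The ID must match the last column name given to -c (1000GP here);
# the Description wording is an assumption, not taken from bcftools.
printf '##INFO=<ID=1000GP,Number=0,Type=Flag,Description="Site present in 1kgp.vcf">\n' > 1kgp.hdr

# Show the header line that bcftools annotate will read via -h.
cat 1kgp.hdr
```

Because the tag is declared as a flag, the constant 1 written in the fifth column by bcftools query only serves as a marker that the site is listed.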
@freeseek

Okay, but this is more like a hack and many users might be familiar with bcftools annotate and not with bcftools query, which has a much steeper learning curve than bcftools annotate. Also, it takes away from the very elegant architecture of bcftools which allows piping one command into another. That said, I believe it would greatly help if this example was in the documentation.

@pd3
Member
pd3 commented Sep 10, 2014

Well, I disagree about the "hack" part. The tools were meant to be combined this way. It gives more flexibility; there cannot be an option for everything.

I am not sure that the man page is the best place for this; the wiki might be more appropriate:
https://github.com/samtools/bcftools/wiki/HOWTOs

@freeseek

I assume the manpage is http://samtools.github.io/bcftools/bcftools.html#annotate rather than what you get when you run "bcftools annotate" from the command line. Anyway, the reason I asked is that I was trying to migrate a GATK pipeline in which I was using:
java -Xmx2g -jar GenomeAnalysisTK.jar -R ref.fasta -T VariantAnnotator --comp:FOO ...
(explained at http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.html)
That said, bcftools is hundreds of times faster than GATK and I am truly grateful to the people who wrote it.

@pd3
Member
pd3 commented Sep 10, 2014

Yeah, the link points to the actual man page (man bcftools) rendered in html format.

I'll reopen this issue as a feature request; it might get implemented at some point.

@pd3 pd3 reopened this Sep 10, 2014
@pd3 pd3 changed the title from bcftools annotate basic usage to feature request: can "annotate" mark variants present in -a file? Sep 10, 2014
@pd3 pd3 added the enhancement label Sep 10, 2014
@pd3
Member
pd3 commented Jan 22, 2016

This is now supported via bcftools annotate --mark-sites, added by 6ccecd1.
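
For reference, a sketch of what the one-step invocation might look like with this option. The file names kgp.vcf.gz, in.vcf.gz and out.vcf.gz are the ones used earlier in the thread; the tag name IN_KGP and the wrapper function mark_known_sites are hypothetical. The "+" prefix is assumed to mean "mark sites present in the -a file", as described in the manual page, so check the usage text of your bcftools version before relying on it:

```shell
# Hypothetical wrapper: flag every site of the input VCF that is also
# listed in the annotation VCF with an INFO/IN_KGP flag.
#   $1 = annotation VCF (bgzipped and indexed)
#   $2 = input VCF
#   $3 = output VCF (written bgzipped because of -Oz)
mark_known_sites() {
    bcftools annotate -a "$1" -m +IN_KGP -Oz -o "$3" "$2"
}

# Run only when bcftools and the example files are actually available.
if command -v bcftools >/dev/null 2>&1 && [ -f kgp.vcf.gz ] && [ -f in.vcf.gz ]; then
    mark_known_sites kgp.vcf.gz in.vcf.gz out.vcf.gz
fi
```

This replaces the query, bgzip and tabix steps of the two-step workaround with a single annotate call.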

@pd3 pd3 closed this Jan 22, 2016