Viral Annotation DefineR: classification and annotation of viral sequences based on RefSeq annotation

VADR - Viral Annotation DefineR

Version 1.0.5; March 2020

https://github.com/nawrockie/vadr.git

VADR is a suite of tools for classifying and analyzing sequences homologous to a set of reference models of viral genomes or gene families. It has been mainly tested for analysis of Norovirus and Dengue virus sequences in preparation for submission to the GenBank database.

The VADR v-annotate.pl script classifies a sequence by determining which of a set of reference models it is most similar to, and then annotates that sequence based on that most similar model. Example usage of v-annotate.pl can be found here. Another VADR script, v-build.pl, creates the models from NCBI RefSeq sequences or from input multiple sequence alignments, potentially with secondary structure annotation. v-build.pl stores the RefSeq feature annotation in the model, and v-annotate.pl maps that annotation (e.g. CDS coordinates) onto the sequences it annotates. VADR includes 197 prebuilt models of Flaviviridae and Caliciviridae viral RefSeq genomes, described here. Example usage of v-build.pl can be found here.

v-annotate.pl identifies unexpected or divergent attributes of the sequences it annotates (e.g. invalid or early stop codons in CDS features) and reports them to the user in the form of alerts. A subset of alerts are fatal and cause a sequence to fail. A sequence passes if zero fatal alerts are reported for it. VADR is used by GenBank staff to evaluate incoming sequence submissions of some viruses (currently Norovirus and Dengue virus). Submitted sequences that pass v-annotate.pl are accepted into GenBank.
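
The pass/fail rule described above can be illustrated with a minimal Python sketch. It is not VADR code: the `Alert` class, the alert labels, and the sequence names are hypothetical and chosen only for illustration, and real v-annotate.pl alert codes and output files are not modeled here. The sketch only shows the stated rule that a sequence passes when none of its reported alerts is fatal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """One reported alert. The labels used below are hypothetical, not VADR alert codes."""
    label: str
    fatal: bool


def sequence_passes(alerts: List[Alert]) -> bool:
    """A sequence passes if zero fatal alerts are reported for it."""
    return not any(alert.fatal for alert in alerts)


def summarize(alerts_by_sequence: Dict[str, List[Alert]]) -> Dict[str, str]:
    """Map each sequence name to 'PASS' or 'FAIL' under the rule above."""
    return {
        name: "PASS" if sequence_passes(alerts) else "FAIL"
        for name, alerts in alerts_by_sequence.items()
    }


if __name__ == "__main__":
    # Hypothetical example data, assumed for illustration only.
    reported = {
        "seq1": [],                                               # no alerts at all
        "seq2": [Alert("minor_divergence", fatal=False)],         # only a non-fatal alert
        "seq3": [Alert("minor_divergence", fatal=False),
                 Alert("early_stop_codon_in_cds", fatal=True)],   # one fatal alert
    }
    for name, status in summarize(reported).items():
        print(name, status)
```

Note that non-fatal alerts never change the outcome in this sketch: a sequence with any number of them still passes, while a single fatal alert is enough to fail it.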

The homology search and alignment components of VADR scripts, which are the most computationally expensive steps, are performed by the Infernal and BLAST software packages. Both packages are downloaded and installed as part of the VADR installation.


Available VADR models

You can download pre-built models to validate and annotate viruses or cox1 genes, as listed below. Importantly, to use a set of models other than the default set that is installed with VADR, you will need to use the -m, -i and -b options as described here.

Pre-built models are available for:

  • Norovirus and Dengue virus RefSeqs, along with other Flaviviridae and Caliciviridae RefSeqs (this is the "default" set of models that is installed with VADR)
  • Coronaviridae RefSeqs, including 2019-nCoV (NC_045512)
  • metazoan Cytochrome c oxidase I (COX1)

See this page for more information.


VADR documentation


Reference

  • The recommended citation for using VADR is: Alejandro A Schaffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank; bioRxiv 852657; doi: https://doi.org/10.1101/852657.

Questions, comments or feature requests? Send an email to eric.nawrocki@nih.gov.
