viralVerify rewrite/refactor for PyPI packaging and distribution, maintainability and clarity.
NOTE: The BLAST+ search option has been removed. The results output table differs from that of the original viralVerify. The Naive Bayes classifier training script has not been ported yet.
- Free software: MIT license
- Documentation: https://viral-verify.readthedocs.io.
- Gene prediction with Prodigal in metagenomic mode
- HMMer3 hmmsearch for protein domains in predicted genes
- Naive Bayes classification of contigs as viral/not viral based on HMMer3 results
- Output of detailed contig classification results table in CSV format
- Output of contigs based on classification into separate FASTA files
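To make the classification step more concrete, here is a minimal sketch of a Naive Bayes style log-odds score with an uncertainty band. It is an illustration only, not the code used by viral_verify: the table layout, the domain names, and the functions log_odds and classify are all hypothetical, and the way the threshold is applied is an assumption based on the --uncertainty-threshold option.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical frequency table: domain name -> (frequency in viral contigs,
# frequency in non-viral contigs). This is NOT the real classifier table format.
FreqTable = Dict[str, Tuple[float, float]]

# Made-up domain names and frequencies, for illustration only.
TOY_TABLE: FreqTable = {
    "capsid_like": (0.9, 0.1),
    "ribosomal_like": (0.1, 0.9),
}


def log_odds(domains: Iterable[str], table: FreqTable) -> float:
    """Sum ln(viral_freq / nonviral_freq) over the domains found in the table.

    Domains missing from the table are skipped. Frequencies must be > 0.
    """
    score = 0.0
    for name in domains:
        if name in table:
            viral, nonviral = table[name]
            score += math.log(viral) - math.log(nonviral)
    return score


def classify(domains: Iterable[str], table: FreqTable, threshold: float = 3.0) -> str:
    """Label a contig from its domain hits (assumed decision rule).

    Scores within +/- threshold of zero are reported as uncertain.
    """
    score = log_odds(domains, table)
    if score > threshold:
        return "viral"
    if score < -threshold:
        return "non-viral"
    return "uncertain"


if __name__ == "__main__":
    print(classify(["capsid_like", "capsid_like"], TOY_TABLE))
    print(classify(["capsid_like"], TOY_TABLE))
```

With the toy table, a single hit contributes ln(9), about 2.2, which stays inside the default band of 3.0, so one hit alone is reported as uncertain while two concordant hits are not.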
An HMMer3 HMM database is required. For example, the latest version of Pfam-A HMM:
NOTE: Please extract any compressed HMM DB ($ gunzip Pfam-A.hmm.gz)
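If gunzip is not available, the same extraction can be done from Python with the standard library. This is a small sketch; the helper name decompress_gzip is hypothetical, and unlike gunzip it leaves the compressed file in place.

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional, Union


def decompress_gzip(src: Union[str, Path], dest: Optional[Union[str, Path]] = None) -> Path:
    """Decompress a .gz file and return the path of the extracted file.

    If dest is not given, the trailing ".gz" suffix is dropped,
    e.g. "Pfam-A.hmm.gz" -> "Pfam-A.hmm". The source file is kept.
    """
    src = Path(src)
    dest = Path(dest) if dest is not None else src.with_suffix("")
    # Stream the data in chunks so large databases are not loaded into memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, decompress_gzip("Pfam-A.hmm.gz") would write Pfam-A.hmm next to the archive, and that path can then be passed to --hmm-db.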
Software dependencies: Prodigal and HMMer3 (hmmsearch)
Python dependencies:
It's recommended that you use Conda to install the required software (Prodigal and HMMer3) and Python dependencies.
$ conda env create -f environment.yml
If you have Prodigal and HMMer3 installed in your $PATH, and Python 3.6 or greater, you can use pip to install viral_verify:
$ pip install viral_verify
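Before running the tool after a pip install, it can help to confirm that the external programs are actually visible on your $PATH. The sketch below assumes the executables are installed under their usual names, prodigal and hmmsearch; the helper missing_tools is hypothetical and not part of viral_verify.

```python
import shutil
from typing import List, Sequence

# Assumed executable names for Prodigal and the HMMer3 search program.
REQUIRED_TOOLS = ("prodigal", "hmmsearch")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("Prodigal and hmmsearch were found on PATH.")
```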
$ viral_verify --help
    Usage: viral_verify [OPTIONS]

      HMM and Naive Bayes classification of contig sequences as either viral,
      plasmid or chromosomal.

      Requires Prodigal for gene prediction and hmmsearch from HMMer3 for
      searching for Pfam HMM profiles.

    Options:
      -i, --input-fasta PATH          Input fasta file  [required]
      -o, --outdir PATH               Output directory  [required]
      -H, --hmm-db PATH               Path to Pfam-A HMM database  [required]
      -t, --threads INTEGER           Number of threads (default=16)
      -p, --output-plasmids-separately
                                      Output predicted plasmids separately?
      --prefix TEXT                   Output file prefix (default: None)
      --uncertainty-threshold FLOAT   Uncertainty threshold (Natural log
                                      probability) (default=3.0)
      --naive-bayes-classifier-table PATH
                                      Table of protein domain frequencies to use
                                      for Naive Bayes classification
                                      (default="/home/pkruczkiewicz/repos/viral_verify/viral_verify/data/classifier_table.txt")
      -v, --verbose                   Logging verbosity
      --version                       Show the version and exit.
      --help                          Show this message and exit.
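When calling viral_verify from a pipeline script, one option is to assemble the command line from the options listed in the help text and pass it to subprocess. This is a sketch under that assumption: build_command and run_viral_verify are hypothetical helper names, the file names in the example are placeholders, and only flags shown in the help output are used.

```python
import subprocess
from typing import List, Optional


def build_command(
    input_fasta: str,
    outdir: str,
    hmm_db: str,
    threads: Optional[int] = None,
    plasmids_separately: bool = False,
    prefix: Optional[str] = None,
) -> List[str]:
    """Build a viral_verify argument list from the documented options."""
    cmd = [
        "viral_verify",
        "--input-fasta", str(input_fasta),
        "--outdir", str(outdir),
        "--hmm-db", str(hmm_db),
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if plasmids_separately:
        cmd.append("--output-plasmids-separately")
    if prefix:
        cmd += ["--prefix", prefix]
    return cmd


def run_viral_verify(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; printing only, so nothing is executed here.
    print(" ".join(build_command("contigs.fasta", "results", "Pfam-A.hmm", threads=4)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.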
The original source code, design and conception can be found at viralVerify. This is merely a rewrite for easier packaging via PyPI, adding some CI with Travis-CI and organizing the code for maintainability and clarity.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.