peterk87/viral_verify

viral_verify

A rewrite/refactor of viralVerify for PyPI packaging and distribution, and for maintainability and clarity.

NOTE: The BLAST+ search option has been removed, so the results output table differs from that of the original viralVerify. The Naive Bayes classifier training script has not been ported yet.

Features

  • Gene prediction with Prodigal in metagenomic mode
  • Protein domain search of the predicted genes with HMMer3 hmmsearch
  • Naive Bayes classification of contigs as viral/not viral based on the HMMer3 results
  • Detailed contig classification results table in CSV format
  • Contigs written to separate FASTA files according to their classification
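The Naive Bayes step can be pictured with a toy sketch. This is not viral_verify's implementation: the frequency tables, domain names and numbers are made up, class priors are assumed to be uniform, unseen domains get a hypothetical pseudo-frequency, and the `threshold` rule (call a class only when it beats the runner-up by at least that many natural-log units) is only an assumption about how an uncertainty threshold such as `--uncertainty-threshold` might be applied.

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical table: class -> {domain name -> P(domain | class)}.
# The real classifier table format used by viral_verify may differ.
DomainFreqs = Dict[str, Dict[str, float]]


def classify(
    domains: Iterable[str],
    freqs: DomainFreqs,
    pseudo: float = 1e-6,
    threshold: float = 3.0,
) -> Tuple[Optional[str], Dict[str, float]]:
    """Return (best class, or None if uncertain) and the log-likelihood per class."""
    if len(freqs) < 2:
        raise ValueError("need at least two classes to compare")
    domains = list(domains)
    scores = {}
    for cls, table in freqs.items():
        # Naive Bayes: treat domain hits as independent and sum log P(domain | class).
        # Domains missing from a class table get a small pseudo-frequency instead of log(0).
        scores[cls] = sum(math.log(table.get(d, pseudo)) for d in domains)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Assumed uncertainty rule: require a margin of `threshold` nats over the runner-up.
    if scores[best] - scores[runner_up] < threshold:
        return None, scores
    return best, scores


# Made-up example frequencies, for illustration only.
FREQS = {
    "virus": {"Phage_integrase": 0.05, "Terminase_1": 0.04},
    "chromosome": {"Phage_integrase": 0.01, "Ribosomal_L2": 0.03},
}
label, scores = classify(["Terminase_1", "Phage_integrase"], FREQS)
print(label)  # virus
```

A contig with no domain hits scores 0.0 for every class, so under this rule it is reported as uncertain (`None`) rather than forced into a class.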

Requirements

An HMMer3 HMM database is required. For example, the latest version of Pfam-A HMM:

NOTE: Please decompress any compressed HMM database before use (e.g. $ gunzip Pfam-A.hmm.gz)
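If you would rather do this from Python (for example in a setup script), a minimal sketch using only the standard library's gzip and shutil modules could look like this. The helper name `gunzip` is hypothetical. Unlike the gunzip command, it keeps the original .gz file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src: Path) -> Path:
    """Decompress e.g. Pfam-A.hmm.gz to Pfam-A.hmm in the same directory; return the new path."""
    if src.suffix != ".gz":
        raise ValueError("expected a .gz file, got {}".format(src))
    dest = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(str(src), "rb") as fin, open(str(dest), "wb") as fout:
        # Stream in chunks; a full Pfam-A database is large.
        shutil.copyfileobj(fin, fout)
    return dest
```

Usage: `gunzip(Path("Pfam-A.hmm.gz"))` writes `Pfam-A.hmm` next to the archive.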

Software dependencies:

  • Prodigal
  • HMMer3 (hmmsearch)

Python dependencies:

Installation

Conda

It's recommended that you use Conda to install the required software (Prodigal and HMMer3) and Python dependencies.

$ conda env create -f environment.yml

Pip

If Prodigal and HMMer3 are installed and on your $PATH, and you have Python 3.6 or greater, you can install viral_verify with pip:

$ pip install viral_verify
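A quick pre-flight check for the requirements above can be sketched with `shutil.which`. The helper names (`missing_tools`, `python_ok`) are hypothetical, and the check only looks for executables named `prodigal` and `hmmsearch` on $PATH; it does not verify their versions.

```python
import shutil
import sys
from typing import List, Sequence

# Executables this README says viral_verify relies on: Prodigal and HMMer3's hmmsearch.
REQUIRED_TOOLS = ("prodigal", "hmmsearch")


def missing_tools(names: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names that shutil.which cannot find on $PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_ok(minimum=(3, 6)) -> bool:
    """True if the running interpreter meets the Python 3.6+ requirement."""
    return sys.version_info >= minimum


missing = missing_tools()
print("Python OK:", python_ok())
print("Missing from $PATH:", ", ".join(missing) if missing else "none")
```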

Usage

$ viral_verify --help
Usage: viral_verify [OPTIONS]

  HMM and Naive Bayes classification of contig sequences as either viral,
  plasmid or chromosomal.

  Requires Prodigal for gene prediction and hmmsearch from HMMer3 for
  searching for Pfam HMM profiles.

Options:
  -i, --input-fasta PATH          Input fasta file  [required]
  -o, --outdir PATH               Output directory  [required]
  -H, --hmm-db PATH               Path to Pfam-A HMM database  [required]
  -t, --threads INTEGER           Number of threads (default=16)
  -p, --output-plasmids-separately
                                  Output predicted plasmids separately?
  --prefix TEXT                   Output file prefix (default: None)
  --uncertainty-threshold FLOAT   Uncertainty threshold (Natural log
                                  probability) (default=3.0)

  --naive-bayes-classifier-table PATH
                                  Table of protein domain frequencies to use
                                  for Naive Bayes classification
                                  (default="/home/pkruczkiewicz/repos/viral_verify/viral_verify/data/classifier_table.txt")

  -v, --verbose                   Logging verbosity
  --version                       Show the version and exit.
  --help                          Show this message and exit.
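To call viral_verify from a Python workflow, one option is to build the command line from the long option names shown in the help text above and run it with `subprocess`. This is a sketch, not part of viral_verify's own API: the wrapper names (`viral_verify_cmd`, `run_viral_verify`) and the example file names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def viral_verify_cmd(
    input_fasta: Path,
    outdir: Path,
    hmm_db: Path,
    threads: int = 16,
    prefix: Optional[str] = None,
    plasmids_separately: bool = False,
    uncertainty_threshold: float = 3.0,
) -> List[str]:
    """Build a viral_verify argument list using only options listed in --help."""
    cmd = [
        "viral_verify",
        "--input-fasta", str(input_fasta),
        "--outdir", str(outdir),
        "--hmm-db", str(hmm_db),
        "--threads", str(threads),
        "--uncertainty-threshold", str(uncertainty_threshold),
    ]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if plasmids_separately:
        cmd.append("--output-plasmids-separately")
    return cmd


def run_viral_verify(**kwargs) -> None:
    # check=True raises subprocess.CalledProcessError if viral_verify exits non-zero.
    subprocess.run(viral_verify_cmd(**kwargs), check=True)
```

For example, `run_viral_verify(input_fasta=Path("contigs.fasta"), outdir=Path("out"), hmm_db=Path("Pfam-A.hmm"), threads=8)` runs viral_verify with 8 threads. Passing a list instead of a shell string avoids quoting problems with unusual file names.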

Credits

The original source code, design and concept come from viralVerify. This package is only a rewrite for easier packaging via PyPI, with added CI (Travis-CI) and code reorganized for maintainability and clarity.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.