hivmmer
An alignment and variant-calling pipeline for Illumina deep sequencing of HIV-1, based on the probabilistic aligner HMMER.
Pipeline steps
- Constructs an amino acid profile Hidden Markov Model (pHMM) from a multiple sequence alignment of all HIV-1 Group M amino acid sequences publicly available in the Los Alamos HIV Sequence Database for a given gene or region of the HIV genome.
- Preprocesses the NGS data using the paired-end read merging tool PEAR and consolidates duplicate sequences using FASTX-Toolkit. The number of duplicates is tracked to enable correct inference of frequencies later in the pipeline.
- Translates each de-duplicated sequence into all six possible frames (forward and reverse), retaining only the translated sequences that contain no stop codons.
- Aligns the translated reads to the reference pHMM with hmmsearch, producing a multiple sequence alignment of translated reads.
- Constructs a sample-specific amino acid pHMM from the multiple sequence alignment of translated reads.
- Repeats the HMMER alignment against the sample-specific pHMM for increased sensitivity.
- Maps the translated amino acid coordinates in the alignment to the original frame and coordinates in the nucleotide reads to construct a codon frequency table (adjusting the counts for duplicate reads).
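The duplicate consolidation and six-frame translation steps above can be sketched in a few lines of Python. This is a minimal illustration, not hivmmer's actual implementation: the function names are hypothetical, the pipeline itself relies on FASTX-Toolkit for duplicate consolidation, and the sketch assumes the standard genetic code, translating any codon with an ambiguous base (such as N) to X.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption of
# this sketch); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def collapse_duplicates(reads):
    """Map each distinct read to the number of times it was observed."""
    return Counter(reads)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def stop_free_translations(seq):
    """Return (strand, offset, protein) for each of the six frames that
    contains no stop codon."""
    frames = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(template[offset:])
            if protein and "*" not in protein:
                frames.append((strand, offset, protein))
    return frames
```

Keeping the strand and offset alongside each translation is what later allows amino acid coordinates to be mapped back to codons in the original read, and the counts from `collapse_duplicates` are what let each unique read contribute its full weight to the codon frequency table.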
Usage
hivmmer --id ID --fq1 FASTQ1 --fq2 FASTQ2 --ref REFERENCE [--cpu N]
[-h|--help] [-v|--version]
ID specifies a name for the analysis that will be used as the basename for all output.
FASTQ1 and FASTQ2 are the forward and reverse Illumina reads.
REFERENCE is the amino acid pHMM to align against (see Preparing a pHMM reference).
Optionally, you can use N threads to speed up the HMMER stages of the pipeline.
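When processing many samples, it can be convenient to drive the command line from a script. The following is a minimal sketch of a Python wrapper; the function names are hypothetical and are not part of hivmmer, and only the flags shown in the usage line above are used.

```python
import subprocess


def build_hivmmer_command(sample_id, fq1, fq2, ref, cpu=None):
    """Assemble the hivmmer argument list from the documented options."""
    cmd = ["hivmmer", "--id", sample_id, "--fq1", fq1, "--fq2", fq2, "--ref", ref]
    if cpu is not None:
        # --cpu is optional and sets the thread count for the HMMER stages.
        cmd += ["--cpu", str(cpu)]
    return cmd


def run_hivmmer(sample_id, fq1, fq2, ref, cpu=None):
    """Run hivmmer (which must be in PATH) and raise CalledProcessError
    if it exits with a non-zero status."""
    subprocess.run(build_hivmmer_command(sample_id, fq1, fq2, ref, cpu), check=True)
```

For example, `run_hivmmer("sample1", "sample1_R1.fastq", "sample1_R2.fastq", "pol.hmm", cpu=4)` would run the pipeline on one sample with four threads; the file names here are placeholders.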
Installation
hivmmer requires Python 3.6.
Quick install with Anaconda Python
On 64-bit Linux, hivmmer can be installed using prebuilt packages from the kantorlab Anaconda channel.
First, install the Anaconda or Miniconda distribution of Python 3.
Once the conda command is in your PATH, hivmmer and all its dependencies can be installed into an isolated conda environment with a single command:
conda create -c kantorlab -n hivmmer hivmmer
Once installed, activate the hivmmer conda environment with:
source activate hivmmer
This will place hivmmer and all its dependencies in your PATH.
We have primarily tested hivmmer on CentOS 6.8, but in theory it should run on any 64-bit Linux system with glibc >= 2.12.
All relevant conda recipes are available from the Kantor Lab's conda-recipes repository.
Quick install with Docker
On systems other than 64-bit Linux, you can run hivmmer via a Docker container.
First, visit the Docker website to download and install Docker for your host operating system.
Second, pull the pre-compiled hivmmer Docker image, which includes all dependencies, from DockerHub:
docker pull kantorlab/hivmmer
Each time you want to use hivmmer, run the Docker image with:
docker run -it kantorlab/hivmmer
This will launch a new Docker container with hivmmer and provide an interactive prompt inside the container.
Manual installation
hivmmer can be installed with pip using the included setup.py, and has the following dependencies on external tools (which must be in your PATH):
- FASTX-Toolkit 0.0.14
- HMMER 3.2.1
- PEAR 0.9.11
Note: PEAR source code is available under an academic license from https://www.h-its.org/en/research/sco/software/#NextGenerationSequencingSequenceAnalysis.
Preparing a pHMM reference
hivmmer comes with prebuilt amino acid pHMM references for the pol, int and env regions based on curated multiple-sequence alignments downloaded from the Los Alamos HIV Sequence Database.
A custom reference can be created using the included hivmmer-trim-reference utility and the hmmbuild program from HMMER. For example, the pol reference was created using the commands:
hivmmer-trim-reference HIV1_ALL_2016_2253-3870_DNA.fasta >HIV1_ALL_2016_2253-3870_DNA.trimmed.aa.fasta
hmmbuild HIV1_ALL_2016_2253-3870_DNA.trimmed.aa.hmm HIV1_ALL_2016_2253-3870_DNA.trimmed.aa.fasta
The resulting pHMM HIV1_ALL_2016_2253-3870_DNA.trimmed.aa.hmm can then be passed to hivmmer with the --ref option.
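The two commands above can also be scripted when building references for several regions. The following is a minimal sketch, assuming both tools are in your PATH and that hivmmer-trim-reference writes the trimmed alignment to standard output, as the redirect in the example suggests; the helper names are hypothetical and the output names simply follow the naming pattern of the pol example.

```python
import subprocess
from pathlib import Path


def reference_paths(alignment_fasta):
    """Derive the trimmed-alignment and pHMM file names from the input name,
    following the pattern used for the pol reference."""
    source = Path(alignment_fasta)
    trimmed = source.with_name(source.stem + ".trimmed.aa.fasta")
    hmm = source.with_name(source.stem + ".trimmed.aa.hmm")
    return trimmed, hmm


def build_reference(alignment_fasta):
    """Trim the alignment, then build a pHMM from it with hmmbuild."""
    trimmed, hmm = reference_paths(alignment_fasta)
    # Capture the trimmed alignment from standard output, like the shell redirect.
    with open(trimmed, "w") as out:
        subprocess.run(
            ["hivmmer-trim-reference", str(alignment_fasta)],
            stdout=out,
            check=True,
        )
    # hmmbuild takes the output HMM file first, then the input alignment.
    subprocess.run(["hmmbuild", str(hmm), str(trimmed)], check=True)
    return hmm
```

The path returned by `build_reference` is the file to pass to hivmmer with --ref.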
Authors
Mark Howison mhowison@brown.edu
For bug reports and questions, please create an issue on GitHub.
License
Copyright 2018, Brown University, Providence, RI. All Rights Reserved.
See LICENSE for full terms of use.