Variant calling in deeply sequenced viral populations
This set of scripts provides an automated pipeline for identifying mutations in viral populations using Illumina deep sequencing. Coupled with our deep sequencing variant calling tool, it provides a full workflow to move from Illumina fastq sequences to a VCF file of population mutations and amino acid changes.
Configuration files in YAML format define all inputs to the process. An example configuration file is a useful starting point. With this file, the entire run consists of a single command line:
python scripts/variant_identify.py <your_config.yaml>
This creates a variation directory containing files named raw_your-run-name-sort-realign.tsv, which hold detailed statistics about each position with aligned reads. These values feed directly into the variant calling framework.
What does it do?
The build script performs the following steps to prepare for variant calling:
Collapses the input fastq reads into unique reads. At high sequencing depth, we expect extensive read duplication, and this step avoids the unnecessary overhead of aligning identical reads multiple times.
Aligns collapsed fastq reads to the reference genome. This handles ambiguous reference genomes with IUPAC characters, which is useful for error matching in viral populations with known variant regions.
Re-aligns reads, avoiding inconsistent and incorrect alignments due to indels.
Summarizes unique reads at each position with read quality score, alignment quality score and percent representation of the k-mer surrounding region. These metrics feed directly into variant calling.
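The first and last steps above can be illustrated with a minimal sketch in plain Python. This is not the pipeline's own code: the function names `read_fastq`, `collapse_reads` and `kmer_percent` are hypothetical, the fastq input is assumed to use exactly four lines per record, and "percent representation" is assumed here to mean a k-mer's share of all k-mers observed, weighted by read count. The actual scripts rely on khmer for k-mer counting and may define the metric differently.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record fastq handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def collapse_reads(handle):
    """Count how many times each distinct read sequence occurs."""
    return Counter(seq for _, seq, _ in read_fastq(handle))


def kmer_percent(sequences, k):
    """Return each k-mer's share, in percent, of all k-mers seen.

    `sequences` maps a read sequence to its count, so duplicated reads
    contribute proportionally (an assumption of this sketch).
    """
    counts = Counter()
    for seq, n in sequences.items():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += n
    total = sum(counts.values())
    if not total:
        return {}
    return {kmer: 100.0 * c / total for kmer, c in counts.items()}


# Three toy reads, two of them identical.
example = StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGA\n+\nIIII\n"
)
unique = collapse_reads(example)
percent = kmer_percent(unique, 3)
```

Here `unique` holds two distinct sequences instead of three reads, so only two alignments would be needed, while the stored counts preserve the original depth for later summaries.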
The pipeline leverages these freely available tools:
- novoalign -- alignment to the reference genome
- Picard -- manipulation of BAM alignment files
- GATK -- re-alignment of reads around indels
- khmer -- counting of k-mer regions surrounding each variant
The CloudBioLinux project provides automated installation scripts with all of these dependencies.
Following installation of these, run:
$ python setup.py build
$ sudo python setup.py install
to install required Python libraries.