Read mapping, variant calling and consensus sequence generation workflow for Ion Torrent AmpliSeq sequence data of foot-and-mouth disease virus (FMDV) and classical swine fever virus (CSFV).
NB: Built-in Ion Torrent AmpliSeq panels for Zika virus (ZIKV), Ebola virus (EBOV) and SARS-CoV-2 will be added shortly!
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.
- Install nextflow.
- Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs).
- Download the pipeline and test it on a minimal dataset with a single command:
nextflow run peterk87/nf-ionampliseq -profile test,<docker/singularity/conda/institute>
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
- Start running your own analysis!
nextflow run peterk87/nf-ionampliseq -profile <docker/singularity/conda/institute> --input '/path/to/iontorrent/*.bam'
See usage docs for all of the available options when running the pipeline.
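Before launching, it can help to confirm that the prerequisites from the steps above are on your PATH. The following is a minimal sketch, not part of the pipeline: `have_tool` and `preflight` are hypothetical helper names, and the preference for Docker over Singularity is an arbitrary assumption.

```shell
# Return success if the named command is available on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Check for nextflow, then print the first container engine found.
# Assumption: Docker is tried before Singularity; swap the order if you prefer.
preflight() {
  have_tool nextflow || { echo "nextflow not found on PATH" >&2; return 1; }
  if have_tool docker; then
    echo docker
  elif have_tool singularity; then
    echo singularity
  else
    echo "neither docker nor singularity found" >&2
    return 1
  fi
}

# Example use (not executed here):
#   profile=$(preflight) && \
#     nextflow run peterk87/nf-ionampliseq -profile "$profile" --input '/path/to/iontorrent/*.bam'
```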
The peterk87/nf-ionampliseq pipeline comes with documentation, which you can read at https://peterk87/nf-ionampliseq/docs or find in the docs/ directory.
This workflow includes several built-in analysis packages for Ion Torrent AmpliSeq sequence data of CSFV and FMDV. Users can also specify their own analysis packages; however, these files must be compatible with the Ion Torrent Software Suite, including tmap and tvc.
There are three methods of specifying input: --input; --rundir; or --sample_sheet together with --panel (or --ref_fasta and --bed_file). For all input modes, the primary input is BAM files generated by Torrent Suite tmap tooling.
The simplest way of running this workflow is with --input pointing at the BAM files produced by the Ion Torrent Torrent Suite:
nextflow run peterk87/nf-ionampliseq -profile <docker/singularity> --input '/path/to/*.bam'
With BAM file inputs specified via --input, the sample name and correct AmpliSeq panel (either CSFV or FMDV) will be determined from the BAM file headers.
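To see what sample information a BAM header carries before running the workflow, you can inspect its read group lines. The sketch below is an illustration only: it reads the standard SAM `SM` (sample) tag from `@RG` lines, and it is an assumption that this is the field of interest; the pipeline's own extraction logic may use other header fields. `rg_sample_names` is a hypothetical helper name.

```shell
# Print the sample name (SM tag) from each @RG line of SAM header text on stdin.
rg_sample_names() {
  awk -F '\t' '
    /^@RG/ {
      for (i = 2; i <= NF; i++) {
        # Each tag has the form KEY:value; keep the value of SM.
        if ($i ~ /^SM:/) print substr($i, 4)
      }
    }
  '
}

# Example use with samtools (not executed here):
#   samtools view -H /path/to/sample.bam | rg_sample_names
```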
You can also specify the Ion Torrent sequencing run directory as input with --rundir. All BAM files matching IonCode_*_rawlib.bam will be run through the workflow, with sample names retrieved from the ion_params_00.json file.
nextflow run peterk87/nf-ionampliseq -profile <docker/singularity> --rundir /path/to/rundir
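To preview which BAM files a run directory would contribute, a small shell check along these lines may be useful. It is a sketch under assumptions: `list_rundir_bams` is a hypothetical helper, and it assumes both the BAM files and ion_params_00.json sit directly in the run directory rather than in subfolders.

```shell
# List BAM files in a run directory that match IonCode_*_rawlib.bam.
list_rundir_bams() {
  local rundir="$1"
  local bam
  # Assumption: ion_params_00.json lives at the top level of the run directory.
  [ -f "$rundir/ion_params_00.json" ] || \
    echo "warning: $rundir/ion_params_00.json not found" >&2
  for bam in "$rundir"/IonCode_*_rawlib.bam; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$bam" ] || continue
    printf '%s\n' "$bam"
  done
}

# Example use (not executed here):
#   list_rundir_bams /path/to/rundir
```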
If you wish to use custom names and a specific AmpliSeq panel, it is recommended that you specify the following:
- --sample_sheet - CSV file with 2 columns:
  - Column 1: sample name
  - Column 2: path to raw BAM file from Ion Torrent (absolute path recommended)
- --panel - either csf or fmd for a built-in AmpliSeq panel; otherwise, the user will need to specify a reference genome(s) FASTA file (--ref_fasta) and detailed BED file (--bed_file)
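A sample sheet in the two-column layout described above can be generated from a directory of BAM files. This is a minimal sketch with several assumptions: `make_sample_sheet` is a hypothetical helper, sample names are taken from the BAM file names, and no header row is written (this section does not say whether one is expected, so check the usage docs).

```shell
# Write "sample name,absolute BAM path" lines for every BAM in a directory.
# Assumptions: no header row; sample name = file name without ".bam".
make_sample_sheet() {
  local bam_dir="$1"
  local abs_dir bam
  # Resolve to an absolute path, as recommended for column 2.
  abs_dir=$(cd "$bam_dir" && pwd) || return 1
  for bam in "$abs_dir"/*.bam; do
    [ -e "$bam" ] || continue
    printf '%s,%s\n' "$(basename "$bam" .bam)" "$bam"
  done
}

# Example use (not executed here):
#   make_sample_sheet /path/to/bams > samplesheet.csv
#   nextflow run peterk87/nf-ionampliseq -profile docker \
#     --sample_sheet samplesheet.csv --panel fmd
```

Edit the first column by hand afterwards if the file names are not the sample names you want in the reports.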
- BAM Sample Info - Sample info extracted from BAM file headers
- FASTQ Reads - BAM to FASTQ output
- FastQC - Read quality control
- Mash - Top reference genome determination by Mash screen
- TMAP - Read mapping using the Thermo Fisher mapper tmap
- Samtools - Read mapping stats calculation with Samtools
- Mosdepth - Coverage stats calculated by Mosdepth
- TVC - Variant calling using the Thermo Fisher variant caller tvc
- Bcftools - Variant filtering for majority consensus sequence generation and variant statistics for MultiQC report.
- Consensus Sequence - Majority consensus sequence with N masking of low/no coverage positions.
- Edlib Pairwise Alignment - Pairwise global alignment and edit distance between reference and consensus sequences.
- Coverage Plots - Coverage plots with/without low/no coverage and/or variants highlighted with linear and log10 scaling of y-axis depth values.
- MultiQC - Aggregate report describing results from the whole pipeline. Consensus sequences are embedded in the MultiQC HTML report and can be downloaded from it.
- Pipeline information - Report metrics generated during the workflow execution
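To illustrate the idea behind masking low/no coverage positions in the consensus step, the sketch below merges consecutive low-depth positions into BED intervals. It is not the pipeline's implementation: `low_coverage_bed` is a hypothetical helper, the depth threshold is a value you choose, and the input is assumed to be tab-separated chromosome, 1-based position and depth (the layout produced by `samtools depth -a`).

```shell
# Merge consecutive positions with depth below a threshold into BED intervals.
# stdin: tab-separated chrom, 1-based position, depth.
# stdout: tab-separated chrom, 0-based start, end (BED coordinates).
low_coverage_bed() {
  local min_depth="$1"
  awk -F '\t' -v OFS='\t' -v min="$min_depth" '
    # Print the interval being built, if any, and reset it.
    function emit() {
      if (start != "") print chrom, start - 1, end
      start = ""
    }
    {
      if ($3 + 0 < min + 0) {
        if (start != "" && $1 == chrom && $2 == end + 1) {
          end = $2                      # extend the current interval
        } else {
          emit()
          chrom = $1; start = $2; end = $2
        }
      } else {
        emit()                          # adequate depth closes the interval
      }
    }
    END { emit() }
  '
}

# Example use (not executed here; the threshold 10 is arbitrary):
#   samtools depth -a sample.bam | low_coverage_bed 10 > low_coverage.bed
```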
For more information about the analysis steps and output of the pipeline, see the output documentation.
peterk87/nf-ionampliseq was originally written by Peter Kruczkiewicz.
If you encounter any issues when running this pipeline, please see the documentation above.
If you would like to contribute to this pipeline, please see the contributing guidelines.
The development of this pipeline tries to follow the guidelines and best-practices established by nf-core and was bootstrapped using nf-core tools. One day this pipeline may be added to nf-core.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.