peterk87/nf-ionampliseq

Read mapping, variant calling and consensus sequence generation workflow for Ion Torrent Ampliseq sequence data of FMDV and CSFV.

NB: Built-in Ion Torrent AmpliSeq panels for Zika virus (ZIKV), Ebola virus (EBOV) and SARS-CoV-2 will be added shortly!

Introduction

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It ships with Docker containers, which makes installation straightforward and results highly reproducible.

Quick Start

  1. Install Nextflow

  2. Install either Docker or Singularity for full pipeline reproducibility (please only use Conda as a last resort; see docs)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run peterk87/nf-ionampliseq -profile test,<docker/singularity/conda/institute>

    Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.

  4. Start running your own analysis!

    nextflow run peterk87/nf-ionampliseq -profile <docker/singularity/conda/institute> --input '/path/to/iontorrent/*.bam'

See usage docs for all of the available options when running the pipeline.

Documentation

The peterk87/nf-ionampliseq pipeline comes with documentation, which you can read at https://peterk87/nf-ionampliseq/docs or find in the docs/ directory.

This workflow includes several built-in analysis packages for Ion Torrent AmpliSeq sequence data of CSFV and FMDV. Users can also specify their own analysis packages; however, these files must be compatible with the Ion Torrent Software Suite, including tmap and tvc.

Input

There are three methods of specifying input: (1) --input; (2) --rundir; or (3) --sample_sheet together with --panel (or with --ref_fasta and --bed_file). For all input modes, the primary input is BAM files generated by Torrent Suite tmap tooling.

The simplest way of running this workflow is with --input pointing at the BAM files produced by Ion Torrent Torrent Suite:

nextflow run peterk87/nf-ionampliseq -profile <docker/singularity> --input '/path/to/*.bam'

With BAM file inputs specified via --input, the sample name and correct AmpliSeq panel (either CSFV or FMDV) will be determined from the BAM file headers.
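To check what a BAM header contains before running the workflow, you can inspect it yourself. The sketch below is illustrative only: it assumes the sample name is stored in the SM field of the @RG header lines (the standard SAM location for it), which may not be exactly how the pipeline reads it. The function name rg_sample_names is hypothetical.

```shell
# Print the unique SM (sample) values found on @RG header lines read from stdin.
# Assumption: the sample name lives in the SM field of @RG lines (SAM convention).
rg_sample_names() {
  awk -F '\t' '
    /^@RG/ {
      # Scan every tab-separated field after the record type for an SM: tag
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SM:/) { print substr($i, 4) }
      }
    }
  ' | sort -u
}

# Example (requires samtools to be installed):
#   samtools view -H /path/to/sample.bam | rg_sample_names
```

If this prints nothing for a BAM file, the header may lack the read group information needed for automatic detection, and specifying a sample sheet and panel explicitly is probably the safer route.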

You can also specify the Ion Torrent sequencing run directory as input with --rundir. All BAM files matching IonCode_*_rawlib.bam will be run through the workflow, with sample names retrieved from the ion_params_00.json file.

nextflow run peterk87/nf-ionampliseq -profile <docker/singularity> --rundir /path/to/rundir
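Before launching a run with --rundir, it can help to preview which files match the IonCode_*_rawlib.bam pattern. This is a minimal sketch, not the pipeline's own logic: it assumes the BAM files sit at the top level of the run directory, and the function name list_rundir_bams is hypothetical.

```shell
# List BAM files in a run directory that match the IonCode_*_rawlib.bam pattern.
# Assumption: only the top level of the run directory is checked.
list_rundir_bams() {
  local rundir bam
  rundir="$1"
  for bam in "$rundir"/IonCode_*_rawlib.bam; do
    # Skip the unexpanded pattern when nothing matches
    [ -e "$bam" ] || continue
    printf '%s\n' "$bam"
  done
}

# Example: list_rundir_bams /path/to/rundir
```

Files such as index files (.bai) or BAM files with other naming schemes are not listed, since they do not match the pattern.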

If you wish to use custom sample names and a specific AmpliSeq panel, it is recommended that you specify the following:

  • --sample_sheet
    • CSV file with 2 columns:
      • Column 1: sample name
      • Column 2: path to raw BAM file from Ion Torrent (absolute path recommended)
  • --panel
    • either csf or fmd for a built-in AmpliSeq panel; otherwise, specify a reference genome(s) FASTA file (--ref_fasta) and a detailed BED file (--bed_file)
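A two-column sample sheet can be generated from a directory of BAM files. The sketch below rests on assumptions not stated above: that the CSV has no header row, and that the file name without its .bam extension is an acceptable sample name. The function name make_sample_sheet and the file name samplesheet.csv are hypothetical.

```shell
# Write "sample name,absolute BAM path" rows for every BAM file in a directory.
# Assumptions: no header row; sample name is the file name minus ".bam".
make_sample_sheet() {
  local bam_dir abs_dir bam
  bam_dir="$1"
  # Resolve the directory once so every row gets an absolute path
  abs_dir="$(cd "$bam_dir" && pwd)" || return 1
  for bam in "$abs_dir"/*.bam; do
    # Skip the unexpanded pattern when the directory has no BAM files
    [ -e "$bam" ] || continue
    printf '%s,%s\n' "$(basename "$bam" .bam)" "$bam"
  done
}

# Example: make_sample_sheet /path/to/bams > samplesheet.csv
```

After editing the sample names as needed, the file could be passed to the workflow along with a panel, for example:

    nextflow run peterk87/nf-ionampliseq -profile <docker/singularity> --sample_sheet samplesheet.csv --panel fmd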

Steps

  1. BAM Sample Info - Sample info extracted from BAM file headers
  2. FASTQ Reads - BAM to FASTQ output
  3. FastQC - Read quality control
  4. Mash - Top reference genome determination by Mash screen
  5. TMAP - Read mapping using the Thermo Fisher mapper tmap
  6. Samtools - Read mapping stats calculation with Samtools
  7. Mosdepth - Coverage stats calculated by Mosdepth
  8. TVC - Variant calling using the Thermo Fisher variant caller tvc
  9. Bcftools - Variant filtering for majority consensus sequence generation and variant statistics for MultiQC report.
  10. Consensus Sequence - Majority consensus sequence with N masking of low/no coverage positions.
  11. Edlib Pairwise Alignment - Pairwise global alignment and edit distance between reference and consensus sequences.
  12. Coverage Plots - Coverage plots with/without low/no coverage and/or variants highlighted with linear and log10 scaling of y-axis depth values.
  13. MultiQC - Aggregate report describing results from the whole pipeline. Consensus sequences are embedded in the MultiQC HTML report and can be downloaded from it.
  14. Pipeline information - Report metrics generated during the workflow execution
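To illustrate the idea behind the N masking in step 10, the sketch below turns per-base depth values into BED intervals of low or no coverage, which a masking tool could then replace with N. It is not the pipeline's implementation: the input format (tab-separated chromosome, 1-based position, depth, as produced by samtools depth -a), the threshold value, and the function name low_coverage_bed are all assumptions.

```shell
# Read "chrom<TAB>pos<TAB>depth" lines (1-based positions) from stdin and print
# BED intervals (0-based start, exclusive end) where depth is below a threshold.
low_coverage_bed() {
  local min_depth
  min_depth="$1"
  awk -F '\t' -v OFS='\t' -v min="$min_depth" '
    # Print the currently open interval, if any, and close it
    function emit() { if (chrom != "") print chrom, start, last; chrom = "" }
    ($3 + 0) < (min + 0) {
      if ($1 == chrom && $2 - 1 == last) {
        last = $2                      # adjacent position: extend the interval
      } else {
        emit(); chrom = $1; start = $2 - 1; last = $2
      }
      next
    }
    { emit() }                         # sufficient depth: close any open interval
    END { emit() }
  '
}

# Example (requires samtools; the threshold of 10 is an arbitrary illustration):
#   samtools depth -a sample.bam | low_coverage_bed 10 > low_coverage.bed
```

Merging adjacent positions into intervals keeps the BED file small even when long stretches of a genome have no coverage.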

Output

For more information about the analysis steps and output of the pipeline, see the output documentation.

Credits

peterk87/nf-ionampliseq was originally written by Peter Kruczkiewicz.

Contributions and Support

If you encounter any issues when running this pipeline, please see the documentation above.

If you would like to contribute to this pipeline, please see the contributing guidelines.

The development of this pipeline tries to follow the guidelines and best-practices established by nf-core and was bootstrapped using nf-core tools. One day this pipeline may be added to nf-core.

Citation

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.