CFIA-NCFAD/nf-flu - Influenza A Virus Genome Assembly Nextflow Workflow

Introduction

nf-flu is a bioinformatics analysis pipeline for assembly and H/N subtyping of Influenza A virus. The pipeline supports both Illumina and Nanopore sequencing platforms. Because the Influenza A genome consists of 8 gene segments, and each segment may have one or more suitable references to align against, the pipeline automatically pulls the top-matching references for each segment. To do this, the pipeline downloads the Influenza database from NCBI; users can also provide their own reference database. The pipeline then performs read mapping against each reference segment, variant calling and genome assembly.

The pipeline is implemented in Nextflow.

Pipeline summary

  1. Download latest NCBI Influenza DB sequences and metadata (or use user-specified files)
  2. Merge reads of re-sequenced samples (cat) (if needed)
  3. Assembly of Influenza gene segments with IRMA using the built-in FLU module
  4. Nucleotide BLAST search against NCBI Influenza DB
  5. Automatically pull top match references for segments
  6. H/N subtype prediction and Excel XLSX report generation based on BLAST results
  7. Variant calling and genome assembly for all segments
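
As a rough illustration of steps 4 and 5, the sketch below picks the highest-scoring BLAST hit for each gene segment from tabular BLAST output. It is not the pipeline's actual implementation: it assumes the default BLAST+ `-outfmt 6` column order, and the function names, the `<sample>_<segment>` query naming scheme and the reference IDs are all hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def segment_from_query_id(query_id):
    """Hypothetical naming scheme: '<sample>_<segment number>', e.g. 'sampleA_4'."""
    return int(query_id.rsplit("_", 1)[1])


def top_match_per_segment(blast_tsv, segment_of=segment_from_query_id):
    """Return {segment: (subject_id, bitscore)} keeping the best hit per segment."""
    best = {}
    reader = csv.DictReader(blast_tsv, fieldnames=BLAST_COLUMNS, delimiter="\t")
    for row in reader:
        segment = segment_of(row["qseqid"])
        score = float(row["bitscore"])
        # Keep the hit with the highest bitscore seen so far for this segment.
        if segment not in best or score > best[segment][1]:
            best[segment] = (row["sseqid"], score)
    return best


# Made-up hits for segments 4 (HA) and 6 (NA) of one sample.
example = io.StringIO(
    "sampleA_4\tREF_H5_1\t99.1\t1700\t15\t0\t1\t1700\t1\t1700\t0.0\t3050\n"
    "sampleA_4\tREF_H5_2\t97.0\t1700\t51\t0\t1\t1700\t1\t1700\t0.0\t2850\n"
    "sampleA_6\tREF_N1_1\t98.5\t1400\t21\t0\t1\t1400\t1\t1400\t0.0\t2480\n"
)
top = top_match_per_segment(example)
for segment, (subject, score) in sorted(top.items()):
    print(segment, subject, score)
```

Ranking by bitscore alone is a simplification made here for brevity; a real selection could also weigh alignment length or percent identity.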

Quick Start

  1. Install Nextflow (>=21.04.0).

  2. Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort).

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run CFIA-NCFAD/nf-flu -profile test,<docker/singularity/podman/shifter/charliecloud/conda>
    • If you are using Singularity, the pipeline will auto-detect this and attempt to download the Singularity images directly rather than converting them from Docker images. If you persistently observe timeout or network issues when downloading Singularity images directly, use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to pre-download all of the required containers with the nf-core download command before running the pipeline, and to set the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options so that the images are stored in and re-used from a central location for future pipeline runs.
    • If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
  4. Run your own analysis

    • [Optional] Generate an input samplesheet from a directory containing Illumina FASTQ files (e.g. /path/to/illumina_run/Data/Intensities/Basecalls/) with the included Python script fastq_dir_to_samplesheet.py before you run the pipeline (requires Python 3 installed locally) e.g.

      python ~/.nextflow/assets/CFIA-NCFAD/nf-flu/bin/fastq_dir_to_samplesheet.py \
        -i /path/to/illumina_run/Data/Intensities/Basecalls/ \
        -o samplesheet.csv
    • Typical command for the Illumina platform:

      nextflow run CFIA-NCFAD/nf-flu \
        --input samplesheet.csv \
        --platform illumina \
        -profile <docker/singularity/podman/shifter/charliecloud/conda>
    • Typical command for the Nanopore platform:

      nextflow run CFIA-NCFAD/nf-flu \
        --input samplesheet.csv \
        --platform nanopore \
        -profile <docker/singularity/conda>

Documentation

The nf-flu pipeline comes with:

Resources

Credits

The nf-flu pipeline was originally developed by Peter Kruczkiewicz from CFIA-NCFAD. Hai Nguyen extended the pipeline for Nanopore data analysis.
