nf-flu is a bioinformatics analysis pipeline for assembly and H/N subtyping of Influenza A virus. The pipeline supports both Illumina and Nanopore platforms. Because the Influenza A genome consists of 8 gene segments, and one or more references may be suitable for each of them, the pipeline automatically pulls the top-matching reference for each segment. To do this, the pipeline downloads the Influenza database from NCBI; users can also provide their own reference database. The pipeline then performs read mapping against each reference segment, variant calling and genome assembly.
The pipeline is implemented in Nextflow.
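The idea of pulling the top-matching reference per segment can be pictured with standard tabular BLAST output. This is a minimal sketch, not the pipeline's own code: it assumes BLAST `-outfmt 6` columns (bitscore in column 12), ranks hits by bitscore, and uses hypothetical query IDs of the form `sample_segment` and hypothetical reference names.

```shell
# Toy BLAST table in -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
blast_tsv="$(mktemp)"
printf 'sampleA_4\tREF_H3\t91.0\t1700\t150\t0\t1\t1700\t1\t1700\t0.0\t2100\n' >> "$blast_tsv"
printf 'sampleA_4\tREF_H1\t98.5\t1700\t25\t0\t1\t1700\t1\t1700\t0.0\t3000\n' >> "$blast_tsv"
printf 'sampleA_6\tREF_N1\t97.0\t1400\t40\t0\t1\t1400\t1\t1400\t0.0\t2500\n' >> "$blast_tsv"
printf 'sampleA_6\tREF_N2\t88.0\t1400\t160\t0\t1\t1400\t1\t1400\t0.0\t1400\n' >> "$blast_tsv"

tab="$(printf '\t')"

# Sort by query ID, then by bitscore (descending), and keep the first
# (highest-scoring) hit seen for each query.
top_hits="$(sort -t "$tab" -k1,1 -k12,12nr "$blast_tsv" \
  | awk -F '\t' '!seen[$1]++ { print $1 "\t" $2 }')"

printf '%s\n' "$top_hits"
```

With this toy table, `REF_H1` is kept for segment 4 (HA) and `REF_N1` for segment 6 (NA), since they have the highest bitscores for their queries.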
- Download latest NCBI Influenza DB sequences and metadata (or use user-specified files)
- Merge reads of re-sequenced samples (`cat`) (if needed)
- Assembly of Influenza gene segments with IRMA using the built-in FLU module
- Nucleotide BLAST search against NCBI Influenza DB
- Automatically pull top match references for segments
- H/N subtype prediction and Excel XLSX report generation based on BLAST results
- Variant calling and genome assembly for all segments
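The read-merging step in the list above relies on plain concatenation. The sketch below shows why that works for gzip-compressed FASTQ files: concatenated gzip streams decompress as one continuous file. The file names are hypothetical and the two tiny FASTQ records are made up for illustration; the pipeline's actual merge logic may differ in detail.

```shell
# Two hypothetical sequencing runs of the same sample, one read each.
workdir="$(mktemp -d)"
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/sampleA_run1_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/sampleA_run2_R1.fastq.gz"

# Merge the runs by concatenating the compressed files directly.
cat "$workdir/sampleA_run1_R1.fastq.gz" "$workdir/sampleA_run2_R1.fastq.gz" \
  > "$workdir/sampleA_merged_R1.fastq.gz"

# The merged file decompresses to both records (4 lines per read).
merged_lines="$(gzip -dc "$workdir/sampleA_merged_R1.fastq.gz" | wc -l | tr -d ' ')"
echo "merged FASTQ has $merged_lines lines"
```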
- Install Nextflow (`>=21.04.0`).
- Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort).
- Download the pipeline and test it on a minimal dataset with a single command:
  `nextflow run CFIA-NCFAD/nf-flu -profile test,<docker/singularity/podman/shifter/charliecloud/conda>`
  - If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the `nf-core download` command to pre-download all of the required containers before running the pipeline and to set the `NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir` Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
  - If you are using `conda`, it is highly recommended to use the `NXF_CONDA_CACHEDIR` or `conda.cacheDir` settings to store the environments in a central location for future pipeline runs.
- Run your own analysis
  - [Optional] Generate an input samplesheet from a directory containing Illumina FASTQ files (e.g. `/path/to/illumina_run/Data/Intensities/Basecalls/`) with the included Python script `fastq_dir_to_samplesheet.py` before you run the pipeline (requires Python 3 installed locally), e.g.
    `python ~/.nextflow/assets/CFIA-NCFAD/nf-flu/bin/fastq_dir_to_samplesheet.py -i /path/to/illumina_run/Data/Intensities/Basecalls/ -o samplesheet.csv`
  - Typical command for the Illumina platform:
    `nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform illumina -profile <docker/singularity/podman/shifter/charliecloud/conda>`
  - Typical command for the Nanopore platform:
    `nextflow run CFIA-NCFAD/nf-flu --input samplesheet.csv --platform nanopore -profile <docker/singularity/conda>`
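The cache recommendations above can be set up once per shell session before launching the pipeline. This is a minimal sketch: the environment variable names `NXF_SINGULARITY_CACHEDIR` and `NXF_CONDA_CACHEDIR` come from the notes above, while `NF_CACHE_ROOT` and the default directory are hypothetical choices made here for illustration.

```shell
# Hypothetical shared location; override by setting NF_CACHE_ROOT beforehand.
cache_root="${NF_CACHE_ROOT:-$HOME/nf-flu-cache}"

# Point Nextflow at central cache directories so container images and
# Conda environments can be re-used across pipeline runs.
export NXF_SINGULARITY_CACHEDIR="$cache_root/singularity"
export NXF_CONDA_CACHEDIR="$cache_root/conda"
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$NXF_CONDA_CACHEDIR"

# The quick-start test command, here with the singularity profile.
# It is only printed so the snippet runs without Nextflow installed;
# drop the leading "echo" to launch it.
echo nextflow run CFIA-NCFAD/nf-flu -profile test,singularity
```

Note that Nextflow's own options, such as `-profile`, take a single dash, whereas pipeline parameters, such as `--input` and `--platform`, take two.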
The nf-flu pipeline comes with:
- NCBI Influenza FTP site
- IRMA Iterative Refinement Meta-Assembler
The nf-flu pipeline was originally developed by Peter Kruczkiewicz from CFIA-NCFAD. Hai Nguyen extended the pipeline for Nanopore data analysis.
- nf-core project for establishing Nextflow workflow development best-practices, nf-core tools and nf-core modules
- nf-core/viralrecon for inspiration and setting a high standard for viral sequence data analysis pipelines
- Conda and Bioconda project for making it easy to install, distribute and use bioinformatics software.
- Biocontainers for automatic creation of Docker and Singularity containers for bioinformatics software in Bioconda