This Nextflow pipeline is designed to process metagenomic sequencing data, characterize overall taxonomic composition, and identify and quantify reads mapping to viruses infecting certain host taxa of interest. It was developed as part of the Nucleic Acid Observatory project.
The pipeline currently consists of four workflows:
- INDEX: Creates indices and reference files used by the RUN and RUN_VALIDATION workflows.[1]
- RUN: Performs the main analysis, including QC, viral identification, taxonomic profiling, and optional BLAST validation.
- RUN_VALIDATION: Performs the part of the RUN workflow dedicated to validating taxonomic classification with BLAST.[2]
- DOWNSTREAM: Performs downstream analysis of the results from the RUN workflow.[3]
Footnotes
[1] The INDEX workflow is intended to be run first, after which many instantiations of the RUN workflow can use the same index output files.
[2] The RUN_VALIDATION workflow is intended to be run after the RUN workflow if the optional BLAST validation was not selected during the RUN workflow. Typically, this workflow is run on a subset of the host viral reads identified in the RUN workflow, to evaluate the sensitivity and specificity of the viral identification process.
[3] The DOWNSTREAM workflow is designed to handle tasks that require comparisons across reads, potentially spanning multiple runs, such as marking duplicate reads.
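Duplicate marking is a cross-read task, which is why it sits in DOWNSTREAM rather than in per-read steps. The minimal Python sketch below illustrates the idea. It assumes each read has already been reduced to a hypothetical record with a run identifier and an alignment key (reference, start, end, strand). The `Read` type, the `mark_duplicates` helper, and the key definition are all illustrative assumptions. The pipeline's actual duplicate criteria are not described here and may differ.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record (assumption, not the pipeline's schema).
    read_id: str
    run_id: str
    reference: str
    start: int
    end: int
    strand: str


def mark_duplicates(reads: list[Read]) -> dict[str, bool]:
    """Return {read_id: is_duplicate}; the first read seen per key is kept."""
    groups: dict[tuple, list[Read]] = defaultdict(list)
    for read in reads:
        # Pool reads from all runs: the key deliberately ignores run_id,
        # so duplicates can be detected across runs, not just within one.
        key = (read.reference, read.start, read.end, read.strand)
        groups[key].append(read)

    flags: dict[str, bool] = {}
    for members in groups.values():
        # Within each group, every read after the first is a duplicate.
        for i, read in enumerate(members):
            flags[read.read_id] = i > 0
    return flags
```

For example, given reads `r1` (run A) and `r2` (run B) with identical alignment keys plus an unrelated `r3`, only `r2` is flagged. That result depends on seeing reads from both runs together, which a step that processes each read or each run in isolation cannot do.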