Nextstrain build for Zika virus tutorial

This repository provides the data and scripts associated with the Zika virus tutorial. See the original Zika build repository for more about the public build.

This repo converts the standard Snakemake workflow into the Nextflow workflow language. To run it from a local clone:

git clone https://github.com/nextstrain/zika-tutorial-nextflow.git
cd zika-tutorial-nextflow
nextflow run main.nf

Help Statement

Assuming you have a working installation of Nextflow, print the help statement with:

nextflow run nextstrain/zika-tutorial-nextflow -r main --help

This prints:

N E X T F L O W  ~  version 21.10.6
Launching `main.nf` [naughty_pasteur] - revision: f50f9d379a

  Usage:
   The typical command for running the pipeline are as follows:
   nextflow run nextflow/zika-tutorial-nextflow -r main -profile docker
   
   Input Files:
   --sequences                        Sequences fasta [default: 'false']
   --metadata                         Metadata tsv file [default: 'false']
   --exclude                          List of excluded sequences file [default: 'false']
   --reference                        Reference genbank file [default: 'false']
   --colors                           Colors tsv file [default: 'false']
   --lat_longs                        Latitude and longituide file [default: 'false']
   --auspice_config                   Auspice config file [default: 'false']
   Optional augur arguments
   --filter_args                      Parameters passed to filter [default: '--group-by country year month --sequences-per-group 20 --min-date 2012']
   --align_args                       Parameters passed to align [default: '--fill-gaps']
   --tree_args                        Parameters passed to tree [default: '']
   --refine_args                      Parameters passed to refine [default: '--timetree --coalescent opt --date-confidence --date-inference marginal --clock-filter-iqd 4']
   --ancestral_args                   Parameters passed to ancestral [default: '--inference joint']
   --traits_args                      Parameters passed to traits [default: '--columns region country --confidence']
   Optional arguments:
   --augur_app                        Augur executable [default: 'augur']
   --outdir                           Output directory to place final output [default: 'results']
   --help                             This usage statement.
   --check_software                   Check if software dependencies are available.
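
As a hedged illustration of the optional augur arguments listed above, the sketch below overrides `--filter_args` for a single run. The filter values (10 sequences per group, a 2014 minimum date), the `run_zika` helper, and the `DRY_RUN` variable are made up for this example and are not part of the repository. The `=` form is used only as a precaution, because the value itself starts with `--`. With `DRY_RUN` set, the command is printed rather than executed.

```shell
#!/bin/sh
# Illustrative values only; the pipeline default is
# '--group-by country year month --sequences-per-group 20 --min-date 2012'.
FILTER_ARGS="--group-by country year --sequences-per-group 10 --min-date 2014"

# Hypothetical helper: run the pipeline with a custom filter string.
# When DRY_RUN is non-empty, the command is echoed instead of executed.
run_zika() {
  ${DRY_RUN:+echo} nextflow run nextstrain/zika-tutorial-nextflow -r main \
    --sequences data/sequences.fasta \
    --metadata data/metadata.tsv \
    --exclude data/dropped_strains.txt \
    --reference data/zika_outgroup.gb \
    --colors data/colors.tsv \
    --lat_longs data/lat_longs.tsv \
    --auspice_config data/auspice_config.json \
    --filter_args="$FILTER_ARGS"
}

DRY_RUN=1
run_zika
```

Unset `DRY_RUN` to launch the run for real.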

Demonstration

Augur commands were wrapped in Nextflow processes (similar to Snakemake's rules) and placed in modules/augur.nf. The processes were then imported into main.nf and connected via Nextflow channels:

sequence_ch 
 | index                   // INDEX
 | combine(metadata_ch) 
 | combine(exclude_ch)
 | combine(channel.of("--group-by country year month --sequences-per-group 20 --min-date 2012"))
 | filter                  // FILTER
 | combine(reference_ch ) 
 | combine(channel.of("--fill-gaps"))
 | align                   // ALIGN
 | combine(channel.of(""))
 | tree                    // TREE
 | combine(align.out) 
 | combine(metadata_ch) 
 | combine(channel.of("--timetree --coalescent opt --date-confidence --date-inference marginal --clock-filter-iqd 4")) 
 | refine                  // REFINE
...

See main.nf for full details.
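
For orientation, the first five steps of the chain above roughly correspond to the augur commands printed by the sketch below. The argument strings are the ones shown in the chain. The input and output flags follow the augur command-line interface as commonly used in the Zika tutorial, and the `results/...` file names and the `augur_steps` function are assumptions for illustration, not the paths the processes actually use. The script only prints the commands; it does not run augur.

```shell
#!/bin/sh
# Argument strings copied from the channel chain in main.nf.
FILTER_ARGS="--group-by country year month --sequences-per-group 20 --min-date 2012"
ALIGN_ARGS="--fill-gaps"
TREE_ARGS=""
REFINE_ARGS="--timetree --coalescent opt --date-confidence --date-inference marginal --clock-filter-iqd 4"

# Print one augur command per step (index, filter, align, tree, refine).
# Output file names are hypothetical. ${VAR:+ $VAR} appends the extra
# arguments only when the variable is non-empty.
augur_steps() {
  echo "augur index --sequences data/sequences.fasta --output results/sequence_index.tsv"
  echo "augur filter --sequences data/sequences.fasta --sequence-index results/sequence_index.tsv --metadata data/metadata.tsv --exclude data/dropped_strains.txt --output results/filtered.fasta${FILTER_ARGS:+ $FILTER_ARGS}"
  echo "augur align --sequences results/filtered.fasta --reference-sequence data/zika_outgroup.gb --output results/aligned.fasta${ALIGN_ARGS:+ $ALIGN_ARGS}"
  echo "augur tree --alignment results/aligned.fasta --output results/tree_raw.nwk${TREE_ARGS:+ $TREE_ARGS}"
  echo "augur refine --tree results/tree_raw.nwk --alignment results/aligned.fasta --metadata data/metadata.tsv --output-tree results/tree.nwk --output-node-data results/branch_lengths.json${REFINE_ARGS:+ $REFINE_ARGS}"
}

augur_steps
```

Each `combine(...)` in the chain adds one more input to the tuple passed to the next process, which is why later commands take more arguments than earlier ones.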

To run the workflow:

# (1) Install nextstrain or activate the nextstrain conda environment
conda activate nextstrain

# (2) Install nextflow via conda or mamba
conda install -c bioconda nextflow

# (3) Place the input files in the current directory in a "data" folder

# (4) Run pipeline on a set of input files
nextflow run nextstrain/zika-tutorial-nextflow \
         --sequences "data/sequences.fasta" \
         --metadata "data/metadata.tsv" \
         --colors "data/colors.tsv" \
         --auspice_config "data/auspice_config.json" \
         --lat_longs "data/lat_longs.tsv" \
         --exclude "data/dropped_strains.txt" \
         --reference "data/zika_outgroup.gb" \
         -resume

#> N E X T F L O W  ~  version 21.10.6
#> Launching `main.nf` [maniac_hypatia] - revision: d136460fdb
#> executor >  local (9)
#> [69/fb06ea] process > index (1)     [100%] 1 of 1 ✔
#> [14/54db50] process > filter (1)    [100%] 1 of 1 ✔
#> [c7/1a6fc7] process > align (1)     [100%] 1 of 1 ✔
#> [68/c3cfc9] process > tree (1)      [100%] 1 of 1 ✔
#> [1e/1d2fd1] process > refine (1)    [100%] 1 of 1 ✔
#> [f4/813036] process > ancestral (1) [100%] 1 of 1 ✔
#> [52/02e75d] process > translate (1) [100%] 1 of 1 ✔
#> [b7/55d75f] process > traits (1)    [100%] 1 of 1 ✔
#> [71/98c68b] process > export (1)    [100%] 1 of 1 ✔
#> WARN: Task runtime metrics are not reported when using macOS without a container engine
#> Completed at: 11-Feb-2022 10:54:14
#> Duration    : 1m 23s
#> CPU hours   : (a few seconds)
#> Succeeded   : 9
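
Because step (3) relies on the input files being present in a "data" folder, a small pre-flight check can catch a missing file before the pipeline starts. The sketch below is not part of the repository; the `check_inputs` function is a hypothetical helper, and the file names are the ones used in the run command above.

```shell
#!/bin/sh
# Hypothetical helper: verify that the expected input files exist.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_inputs() {
  dir="${1:-data}"
  missing=0
  for f in sequences.fasta metadata.tsv colors.tsv auspice_config.json \
           lat_longs.tsv dropped_strains.txt zika_outgroup.gb; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example use (left as a comment so the sketch has no side effects):
#   check_inputs data && nextflow run nextstrain/zika-tutorial-nextflow ...
```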

The output folder should look like:

results/
|_ 01_Index/          #<= contains output files for each step
|_ 02_Filter/
|_ 03_Align/
|_ 04_Tree/
|_ 05_Refine/
|_ 06_Ancestral/
|_ 07_Translate/
|_ 08_Traits/
|_ auspice/           #<= Final files! Use "nextstrain view results/auspice"
|
|_ report.html
|_ timeline.html      #<= runtime and memory use at each step

Based on Nextflow's timeline report (timeline.html), the refine step appears to take the longest.
