## Modularity in Snakemake workflows

The Snakemake workflow engine has proven itself useful in creating, maintaining, and extending a variety of our bioinformatics analysis workflows.
We use Snakemake to create both single-purpose workflows and workflows that we expect to use and re-use frequently over an extended period of time.
Accordingly, we face two recurring questions.

1. Which parts of which workflows can be re-used?
2. What is the best strategy for modularity and re-use?

We don't yet have enough information to give a detailed and confident answer to those questions in the context of NBFAC workflows.
The purpose of this workshop is to describe three complementary strategies for creating workflows or workflow components that can be maintained in a single place and re-used in numerous contexts.

## Synopsis

1. **Subworkflows** are the best way to connect distinct workflows, when each of those workflows is capable of running on its own. Each workflow can have its own configuration.
2. **Includes** are typically a small set of re-usable rules that get embedded into a larger workflow and share its configuration. This is the simplest way to re-use workflow components.
3. **Wrappers** are used at the rule level, and provide the smallest building block for constructing workflows from re-usable parts. Snakemake maintains a public repository of wrappers for common tools, but Snakemake can also make use of private or local wrapper definitions.

## Example workflow

![Workflow](workflow.png)

The demo workflow for this workshop is fairly simple and linear. In summary, it:

- Downsamples Illumina reads to a user-specified number of read pairs using seqtk
- Assembles the downsampled reads using SPAdes
- Maps the original reads back to the assembled contigs using Bowtie2
- Converts the alignment output in SAM format to BAM format, and sorts the aligned reads by position using SAMtools
- Calculates some summary statistics from the read mappings using SAMtools

The workflow consists of 9 rules: 3 related to preprocessing and assembly, 5 related to read mapping, and the 1 "default" rule to rule them all.
With subworkflows and includes, we can explore how to re-use groups of related rules.
With wrappers, we can explore how to replace rules that run commonly used software with standardized invocations.

## Subworkflows

asdf

## Includes

Snakemake's `include` statement allows a user to import the contents of another Snakefile.
This other file can implement an entire workflow itself, or represent only a fragment of a workflow.
Snakemake operates as if the user had copied and pasted the contents of the `include`d file into the main Snakefile (except that the default rule is not affected).
As a result, the main workflow and all included workflows share a single scope, and thus use a single shared configuration.
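
The shared scope can be illustrated with a small plain-Python analogy. This is only a sketch of the "copy and paste" behavior described above, not Snakemake code and not Snakemake's actual implementation; the `include` helper, the fragment contents, the rule names, and the `threads` config key are all hypothetical stand-ins.

```python
# Plain-Python analogy of Snakemake's `include` statement; NOT Snakemake code.
# All names below (fragment contents, rule names, config keys) are hypothetical.

# Stand-in for an included Snakefile fragment (e.g. a file of mapping rules).
# It uses `config` and `rules`, which it never defines itself.
MAPPING_FRAGMENT = '''
rules.append("bowtie2_map")
threads = config["threads"]
'''


def include(source, scope):
    """Execute `source` inside `scope`, as if it had been pasted into the main file."""
    exec(compile(source, "<included fragment>", "exec"), scope)


# Stand-in for the main Snakefile's single global scope.
main_scope = {"config": {"threads": 8}, "rules": ["all"]}
include(MAPPING_FRAGMENT, main_scope)

print(main_scope["rules"])    # the fragment's rule was added after the default rule
print(main_scope["threads"])  # the value was read from the main workflow's config
```

Note that the fragment only works because the main scope already defines `config` and `rules`. Executed on its own, it would fail with a `NameError`, which is the kind of coupling a shared scope makes easy to introduce.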

- pros
    - simplest to implement
    - included Snakefiles don't have to be free-standing workflows
    - ???
- cons
    - shared scope encourages tight coupling of loosely related steps
    - shared config not easy to implement

## Wrappers

asdf