This is the companion repo for a tutorial on workflow managers hosted on my blog. The overall idea is to demonstrate two workflow managers commonly used in scientific computing, Nextflow and Snakemake, by automating a "pipeline" that requires no special software or domain knowledge to use or understand. The purpose of the analysis is to determine whether the words used in a set of books are more similar within genres than between genres. The input data, then, are 13 files obtained from Project Gutenberg, each containing the text of a public-domain book. They are very loosely organized into three genres: children's literature, science fiction, and Shakespeare.

The pipeline proceeds in four main steps. First, the input files are stripped of Project Gutenberg-specific headers and footers. Next, a word-count distribution is calculated for each book. These distributions are then compared across all pairs of books using a metric called the Jensen-Shannon divergence, and finally the results are aggregated within and between genres.
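As a rough illustration of the comparison step, the Jensen-Shannon divergence between two word-count distributions can be computed with SciPy. The function below is a minimal sketch rather than the repo's actual code, and `js_divergence` is a hypothetical name:

```python
# Minimal sketch of the pairwise comparison step; illustrative only.
from collections import Counter
from scipy.spatial.distance import jensenshannon

def js_divergence(counts_a: Counter, counts_b: Counter) -> float:
    """Jensen-Shannon divergence between two word-count distributions."""
    # Align both distributions on a shared vocabulary.
    vocab = sorted(set(counts_a) | set(counts_b))
    p = [counts_a.get(word, 0) for word in vocab]
    q = [counts_b.get(word, 0) for word in vocab]
    # SciPy normalizes p and q internally and returns the JS *distance*,
    # i.e. the square root of the divergence, so square the result.
    return jensenshannon(p, q) ** 2
```

Aligning on the union vocabulary ensures both probability vectors index the same words; SciPy handles the normalization of the raw counts.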
The pipeline's major components are largely written in Python and use SciPy and pandas for statistical functions and tabular data manipulation, respectively (the exact versions are detailed in env.yaml). Executing the workflow files requires working installations of Nextflow and Snakemake. The Snakemake workflow additionally depends on an inline Bash script that uses a few standard Unix command-line programs.
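For orientation, a conda environment file for a pipeline like this might look roughly as follows. The contents here are a hypothetical sketch; the repo's env.yaml is the authoritative record of the exact package names and versions:

```yaml
# Hypothetical sketch only; see the repo's env.yaml for the real pinned versions.
name: books-pipeline   # illustrative name
channels:
  - conda-forge
dependencies:
  - python
  - scipy    # Jensen-Shannon divergence and other statistics
  - pandas   # tabular aggregation of the pairwise results
```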
To run the Nextflow workflow, use:
```
NXF_CONDA_ENABLED=true nextflow run workflow.nf
```
To run the Snakemake workflow, use:
```
snakemake --use-conda --conda-frontend conda -c 1 -s workflow.smk
```