Snakemake


The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can include a description of required software, which will be automatically deployed to any execution environment.

Snakemake is highly popular, with about 3 new citations per week.

Quick Example

Snakemake workflows are essentially Python scripts extended by declarative code to define rules. Rules describe how to create output files from input files.

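# Pseudo-rule that lists the final target files of the workflow.
rule targets:
    input:
        "plots/myplot.pdf"

# Create one transformed CSV per dataset; {dataset} is a wildcard.
# The command runs inside the given container image.
rule transform:
    input:
        "raw/{dataset}.csv"
    output:
        "transformed/{dataset}.csv"
    singularity:
        "docker://somecontainer:v1.0"
    shell:
        "somecommand {input} {output}"

# Combine the transformed files for datasets 1 and 2 into a single plot,
# using the software defined in the Conda environment file.
rule aggregate_and_plot:
    input:
        expand("transformed/{dataset}.csv", dataset=[1, 2])
    output:
        "plots/myplot.pdf"
    conda:
        "envs/matplotlib.yaml"
    script:
        "scripts/plot.py"
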
  • Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.
  • For each target and intermediate file, you create rules that define how they are created from input files.
  • Snakemake determines the rule dependencies by matching file names.
  • Input and output files can contain multiple named wildcards.
  • Rules can use shell commands, plain Python code, or external Python or R scripts to create output files from input files.
  • Snakemake workflows can be easily executed on workstations, clusters, the grid, and in the cloud without modification. Job scheduling can be constrained by arbitrary resources such as available CPU cores, memory or GPUs.
  • Snakemake can automatically deploy required software dependencies of a workflow using Conda or Singularity.
  • Snakemake can use Amazon S3, Google Storage, Dropbox, FTP, WebDAV, SFTP and iRODS to access input or output files and further access input files via HTTP and HTTPS.
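
To make the file-name matching described above more concrete, here is a minimal Python sketch of how a target file could be traced back to the rules that produce it. It is not Snakemake's implementation: the ``RULES`` table and the helpers ``expand``, ``match_wildcards`` and ``plan`` are hypothetical names written for this illustration, each wildcard is assumed to match any non-empty string, and a wildcard name may appear only once per pattern.

```python
import itertools
import re


def expand(pattern, **wildcards):
    """Fill a pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = itertools.product(*(wildcards[n] for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


def match_wildcards(pattern, filename):
    """Return the wildcard values if filename fits pattern, else None."""
    # re.split with a capturing group alternates literal text (even
    # positions) and wildcard names (odd positions).
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)
        else:
            # Assumption: a wildcard matches any non-empty string.
            regex += f"(?P<{part}>.+)"
    m = re.fullmatch(regex, filename)
    return m.groupdict() if m else None


# Hypothetical rule table mirroring the example workflow:
# (rule name, input patterns, output pattern)
RULES = [
    ("transform", ["raw/{dataset}.csv"], "transformed/{dataset}.csv"),
    ("aggregate_and_plot",
     expand("transformed/{dataset}.csv", dataset=[1, 2]),
     "plots/myplot.pdf"),
]


def plan(target, jobs=None):
    """Return the (rule, wildcards) jobs needed for target, dependencies first."""
    jobs = [] if jobs is None else jobs
    for name, inputs, output in RULES:
        wildcards = match_wildcards(output, target)
        if wildcards is None:
            continue
        # Resolve each input of the matching rule before the rule itself.
        for inp in inputs:
            plan(inp.format(**wildcards), jobs)
        job = (name, wildcards)
        if job not in jobs:
            jobs.append(job)
        return jobs
    # No rule produces this file: treat it as an existing source file.
    return jobs


if __name__ == "__main__":
    for rule_name, wildcards in plan("plots/myplot.pdf"):
        print(rule_name, wildcards)
```

Requesting ``plots/myplot.pdf`` in this sketch yields two ``transform`` jobs (one per dataset) followed by ``aggregate_and_plot``. The order of rules in the table does not matter here, because dependencies are found purely by matching output patterns against requested file names.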

Getting started

To get a first impression, see our introductory slides or watch the live demo video. News about Snakemake is published via Twitter. To learn Snakemake, please work through the :ref:`tutorial`, and see the :ref:`FAQ <project_info-faq>`.

Support

Citation

Köster, Johannes and Rahmann, Sven. "Snakemake - A scalable bioinformatics workflow engine". Bioinformatics 2012.

See :doc:`Citations <project_info/citations>` for more information.

Resources

Snakemake Wrappers Repository
    The Snakemake Wrapper Repository is a collection of reusable wrappers that allow you to quickly use popular tools from Snakemake rules and workflows.
Snakemake Workflows Project
    This project provides a collection of high-quality, modularized and reusable workflows. The provided code should also serve as a best-practice example of how to build production-ready workflows with Snakemake. Everybody is invited to contribute.
Snakemake Profiles Project
    This project provides Snakemake configuration profiles for various execution environments. Please consider contributing your own if your environment is not yet covered.
Bioconda
    Bioconda can be used from Snakemake to create completely reproducible workflows by defining the software versions used and providing binaries.

Publications using Snakemake

The following is an incomplete list of publications that use Snakemake for their analyses. Please consider adding your own.

.. toctree::
   :caption: Getting started
   :name: getting_started
   :hidden:
   :maxdepth: 1

   getting_started/installation
   tutorial/tutorial
   tutorial/short


.. toctree::
  :caption: Executing workflows
  :name: execution
  :hidden:
  :maxdepth: 1

  executing/cli
  executing/cluster-cloud
  executing/caching
  executing/interoperability

.. toctree::
    :caption: Defining workflows
    :name: snakefiles
    :hidden:
    :maxdepth: 1

    snakefiles/writing_snakefiles
    snakefiles/rules
    snakefiles/configuration
    snakefiles/modularization
    snakefiles/remote_files
    snakefiles/utils
    snakefiles/deployment
    snakefiles/reporting


.. toctree::
    :caption: API Reference
    :name: api-reference
    :hidden:
    :maxdepth: 1

    api_reference/snakemake
    api_reference/snakemake_utils
    api_reference/internal/modules


.. toctree::
    :caption: Project Info
    :name: project-info
    :hidden:
    :maxdepth: 1

    project_info/citations
    project_info/more_resources
    project_info/faq
    project_info/contributing
    project_info/authors
    project_info/history
    project_info/license