We are an organization/community with the goal of making snakemake pipeline development easier and faster, and a bit more structured.
We do this by providing snakemake modules that can be combined to create a complete analysis or included in already existing pipelines. All modules are subjected to extensive testing to make sure that new releases don't unexpectedly break existing pipelines or deviate from guidelines and best practices on how to write code.
Current tests:
- pycodestyle: (--max-line-length 130)
- snakefmt: (line-length 130)
- snakemake --lint
- snakemake dry-run
- execution test with small dataset
- execution: integration test with complete dataset (planned, not yet implemented)
The small execution test makes sure that the pipeline can actually be executed, i.e. that it runs and generates data. The integration test with the complete dataset will also evaluate the generated results to make sure that they do not change unless the change is deliberate.
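As a rough illustration, the static checks and the dry-run listed above could be collected in a small local script such as the one below. This is a sketch, not the project's actual CI configuration: the `workflow/Snakefile` and `workflow` paths and the helper names `build_check_commands` and `run_checks` are assumptions made for the example, and a real dry-run would normally also need a configuration file, which is left out here.

```python
import shlex
import subprocess

MAX_LINE_LENGTH = 130  # line length limit quoted in the test list


def build_check_commands(snakefile="workflow/Snakefile", source_dir="workflow"):
    """Return the checks as argument lists (both paths are hypothetical)."""
    return [
        ["pycodestyle", f"--max-line-length={MAX_LINE_LENGTH}", source_dir],
        ["snakefmt", "--check", "--line-length", str(MAX_LINE_LENGTH), source_dir],
        ["snakemake", "--lint", "-s", snakefile],
        ["snakemake", "-n", "-s", snakefile],  # -n requests a dry-run
    ]


def run_checks(commands):
    """Run each command in order and return the programs that exited non-zero."""
    failed = []
    for cmd in commands:
        print("$", shlex.join(cmd))
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    return failed
```

Calling `run_checks(build_check_commands())` from the root of a module would then report which tools failed, provided that pycodestyle, snakefmt and snakemake are installed.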
We also have a development document describing how a hydra-genetics pipeline/module should be structured and named, and what a rule should contain.
Please visit our ReadTheDocs for more information about using hydra-genetics.
A list of pipelines built with Hydra-Genetics can be found at https://github.com/hydra-genetics/hydra-genetics-pipelines.
The current repositories can be divided into the following sections.
Command line interface for creating new modules/pipelines or adding a new rule to an existing project. It also provides libraries that make it easier for people not used to pandas to extract information from the samples and units dataframes. These dataframes are generated from the units.tsv (schema definition) and samples.tsv (schema definition) files and are used as input for hydra-genetics.
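To give a feel for the kind of lookup these helper libraries are meant to simplify, the sketch below reads a samples table and a units table and collects the fastq pairs for one sample. It deliberately uses only the Python standard library instead of pandas and is not the hydra-genetics API: the column names (`sample`, `tumor_content`, `type`, `lane`, `fastq1`, `fastq2`), the file contents and the function names are assumptions chosen for illustration, and the schema definitions remain the authoritative description of the real files.

```python
import csv
import io

# Tiny in-memory stand-ins for samples.tsv and units.tsv (illustrative columns only).
SAMPLES_TSV = "sample\ttumor_content\nNA12878\t0.5\n"
UNITS_TSV = (
    "sample\ttype\tlane\tfastq1\tfastq2\n"
    "NA12878\tT\tL001\treads/a_R1.fq.gz\treads/a_R2.fq.gz\n"
    "NA12878\tT\tL002\treads/b_R1.fq.gz\treads/b_R2.fq.gz\n"
)


def read_tsv(text):
    """Parse tab-separated text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def units_for_sample(units, sample, unit_type):
    """Select all unit rows that belong to one sample and one unit type."""
    return [u for u in units if u["sample"] == sample and u["type"] == unit_type]


def fastq_pairs(units, sample, unit_type):
    """Return the (fastq1, fastq2) pair of every matching unit, in file order."""
    return [(u["fastq1"], u["fastq2"]) for u in units_for_sample(units, sample, unit_type)]


samples = read_tsv(SAMPLES_TSV)
units = read_tsv(UNITS_TSV)
for row in samples:
    print(row["sample"], fastq_pairs(units, row["sample"], "T"))
```

The point of the example is the one-to-many relation: a sample listed once in the samples table can own several rows in the units table, for instance one per lane, and a rule usually needs all of them.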
Collection of Docker files with bioinformatic tools used to execute snakemake with singularity. All images are automatically uploaded to Docker Hub after being merged into master.
Example of configuration profiles for executing Snakemake in various computing environments.
Repositories used to set up an environment for users on Windows or OSX, or any other system that doesn't support singularity.
Build script for a vagrant machine that can be used by Windows/OSX/Linux users to run snakemake in combination with singularity. It will not work on systems with ARM CPUs.
Snakemake module containing processing steps that are performed during sequence alignment.
Collection of rules to annotate vcf files.
Collection of rules used to calculate biomarkers like MSI, TMB and HRD.
Collection of rules used to call structural variants.
Rules used to compress files.
Collection of variant filters.
Collection of fusion callers.
Module containing rules that are general and most likely will be used by multiple modules.
Snakemake module for mitochondrial short variant discovery.
Snakemake module containing an array of steps provided by the Parabricks toolkit.
Snakemake module containing processing steps that are performed before sequence alignment.
Collection of rules performing QC and generating reports.
Collection of rules used to create references, panel of normals (PoN), and background filters.
Sentieon tools
Collection of rules used to call SNVs and small indels.