Skip to content

ivargr/snakehelp

Repository files navigation

Test of snakemake pipeline

Snakehelp

A small Python package for making Snakemake pipelines easier to write and maintain.

Installation

pip install snakehelp

About and usage

Snakemake-pipelines often end up with long wildcards paths, which are hard to read, often copied between rules and hard to debug when something doesn't work:

rule simulate:
    output:
        simulated_file = "data/{param1}/{param2}/{param3}/{param4}/data.csv"

rule some_analysis:
    output:
        simulated_file = "data/{param1}/{param2}/{param3}/{param4}/plot.png"

Often the {param1}/{param2}/.. and so on are copy-pasted and hardcoded into several rules, so that when you want to add a new parameter or do changes, you have to do the changes many places (and some are often forgotten).

This packages lets you instead define parameters as Python dataclasses, and then use these dataclasses in the Snakemake rules:

from snakehelp import parameters
from typing import Literal

@parameters
class SimulatedData:
    param1: str = "some_default_value"
    param2: float = 3.14
    param3: Literal["a", "b", "c"] = "a"
    param4: int = 100
    file_ending = ".csv"
rule simulation:
    output:
        simulated_file = SimulatedData.path()

This looks a bit magic, but the only thing .path() does is creating a valid Snakemake wildcard path:

'{param1,\\w+}/{param2,[+-]?([0-9]*[.])?[0-9]+}/{param3,a|b|c}/{param4,\\d+}.csv'

The addede benefits are:

  • You only change the dataclass when you want to change something
  • You can add @parameters to existing dataclasses to make them compatible with Snakemake rules (as long as the fields are either of classes with @parameters or base types)
  • Regexes are automatically generated for you and added to the snakemake path (making errors and ambigious rules less likely)
  • Classes can have references to other @parameter-classes, so you can easily create complex structures without duplicating code (or writing many complex paths manually)

Some useful features

The path() method can take fixed parameters, which is useful when defining rules that require some specific value for some parameter:

print(SimulatedData.path(param1="fixed_value"))
fixed_value/{param2,[+-]?([0-9]*[.])?[0-9]+}/{param3,a|b|c}/{param4,\d+}.csv

You can nest @parameters-decorated classes:

@parameters
class A:
    param1: int = 1
    param2: int = 2

@parameters
class B:
    a: A
    param3: int = 3
    file_ending = ".csv"

# B.path() will now give you a path that includes the parameters from A:
print(B.path())
# gives: {param1,\d+}/{param2,\d+}/{param3,\d+}.csv

You can create objects from Parameter-classes by calling the from_flat_params()-method. Here you can specify any parameters, also parameters for nested classes. This makes it easier to tests rules without having to manually write long paths:

rule test_something:
    output:
        B.from_flat_params(param1=100).file_path()
        # gives: '100/2/3.csv'

Note that file_path() can be called on objects to create an actual path, and path() can be called on the classes to generate a wildcard path.

Gathering results and plotting

Snakehelp can also help you gather results produced by rules into a Pandas dataframe.

Assume you have some results that depend on a set of parameters:

class SomeResult:
    param1: int
    param2: str
    param3: Literal["a", "b", "c"]
    param4: float
    file_ending = ".csv"

You typically have a rule that creates some results from these parameters:

rule RunSomeAnalysis:
    output:
        SomeResult.path()
    run:
        # do some analysis
        # write to SomeResult
        with open(output[0], "w") as f:
            f.write(123)

(This section has not been finished written)

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published