bio-pipeline

Common pipeline tasks. This Bio module does not do the work of a job scheduler; for that you can use our simple Ruby Queue (rq) or one of the many other schedulers.

bio-pipeline, meanwhile, addresses the don't-repeat-yourself (DRY) principle for creating tasks at the job level, and aims for convention-over-configuration (CoC). For example, bio-pipeline comes with a library of templates, mostly based on YAML and ERB, for common bioinformatics tasks.

Another feature of bio-pipeline is the run-once command, which caches results and won't calculate the same result twice. This makes the pipeline resilient: when one or more jobs fail, just rerun the pipeline. The pipeline can also be interrupted and will resume where it left off.

You do not need to know Ruby to use bio-pipeline, but it may be interesting to note that other successful tools for cluster deployment use similar ideas. For example, Chef uses Ruby, YAML and ERB for configuring machines. It may be an idea to combine Chef with bio-pipeline.

Note: this software is under active development! Feel free to pitch in.

task files as YAML/erb templates

To describe a job that can be run in a pipeline, we introduce a data structure in YAML, the task file, which also acts as a template preprocessed by erb. An example for running an alignment program would be

    # task file: muscle.yaml
    :inputs:
      - <%= in_file = 'aa.fa' %>      # here we set in_file too!
    :commands:
      - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
    :outputs:
      - <%= output_dir %>             # defaults to ./output

Note that in_file gets defined in the YAML task file, while muscle_bin and output_dir are defined by the calling context. Run this command from the command line with

    ./bin/runner -c muscle.yaml

The idea here is to allow richer metadata: rather than typing commands on the command line, we can easily share common tasks and add context, paths, and features such as creating and copying the output_dir.
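The way the muscle.yaml task file above gets its values can be sketched in a few lines of plain Ruby. This is only an illustration using the standard erb and yaml libraries, not the runner's actual code: the helper name render_task and the path /usr/bin/muscle are assumptions made for the example.

```ruby
require 'erb'
require 'yaml'

# The task file from above, inlined so the sketch is self-contained.
TASK_TEMPLATE = <<~'TASK'
  :inputs:
    - <%= in_file = 'aa.fa' %>
  :commands:
    - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
  :outputs:
    - <%= output_dir %>
TASK

# Hypothetical helper (not part of bio-pipeline): render a task template
# with values supplied by the calling context, then parse the YAML.
def render_task(template, muscle_bin:, output_dir: './output')
  # muscle_bin and output_dir are locals of this method, so the template
  # sees them through the binding; in_file is assigned inside the template.
  yaml = ERB.new(template).result(binding)
  YAML.load(yaml)
end

# '/usr/bin/muscle' is a made-up path for the example.
task = render_task(TASK_TEMPLATE, muscle_bin: '/usr/bin/muscle')
puts task[:commands].first
```

Because erb runs before the YAML parser, the parsed task holds only plain strings under the :inputs, :commands and :outputs keys.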

Parameters can also be set or overridden outside the template by adding them on the command line as switches:

    ./bin/runner -c muscle.yaml -output_dir tmp -muscle_bin /opt/muscle/bin/muscle

The runner handles this by copying the switches into the namespace, using some nice Ruby magic.
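One way such switch handling could work is sketched below. It is an assumption about the mechanism, not the runner's implementation: parse_switches and render_with_switches are hypothetical helpers, and the sketch assumes every switch is a single dash, a name, and exactly one value.

```ruby
require 'erb'

# Hypothetical helper: turn ['-output_dir', 'tmp', ...] into
# { output_dir: 'tmp', ... }.
def parse_switches(argv)
  argv.each_slice(2).to_h do |switch, value|
    unless switch.start_with?('-') && value
      raise ArgumentError, "expected -name value, got #{switch.inspect}"
    end
    [switch.delete_prefix('-').to_sym, value]
  end
end

# Hypothetical helper: make each switch a local variable in the binding
# that erb evaluates the template in.
def render_with_switches(template, switches)
  context = binding
  switches.each { |name, value| context.local_variable_set(name, value) }
  ERB.new(template).result(context)
end

switches = parse_switches(%w[-output_dir tmp -muscle_bin /opt/muscle/bin/muscle])
puts render_with_switches('<%= muscle_bin %> -o <%= output_dir %>/aa-align.fa', switches)
# prints: /opt/muscle/bin/muscle -o tmp/aa-align.fa
```

Binding#local_variable_set is what lets a name that only exists on the command line show up as an ordinary variable inside the template.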

erb executes the Ruby between <% and %> when compiling the template. After this, at runtime, commands can run Ruby programs as scripts, but they can also call into the bio-pipeline engine and libraries. Each command is first checked against the methods in the engine's namespace: if the command exists as a method, the rest of the command is executed as Ruby in the local interpreter. For example

    :commands:
      - BioPipeline::report(<%= in_file %>,<%= output_dir %>/aa-align.fa)

Within the task file commands section, commands are simply executed in sequence.
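The dispatch rule described above (engine method first, otherwise an external command, all in sequence) might look roughly like this. It is a sketch under assumptions: DemoPipeline stands in for the engine namespace, engine_call? and run_commands are hypothetical names, and the arguments are quoted so that the line is valid Ruby. How the real engine handles unquoted arguments is not covered here.

```ruby
# Stand-in for the engine namespace; the real BioPipeline module may differ.
module DemoPipeline
  def self.report(input, output)
    "report: #{input} -> #{output}"
  end
end

# Hypothetical check: does the command start with a call to a method
# that the engine module actually responds to?
def engine_call?(command, engine)
  match = command.match(/\A#{Regexp.escape(engine.name)}(?:::|\.)(\w+)/)
  !match.nil? && engine.respond_to?(match[1])
end

# Hypothetical dispatcher: run the commands one after another.
def run_commands(commands, engine)
  commands.map do |command|
    if engine_call?(command, engine)
      # Evaluate as Ruby in the local interpreter.
      # Only sensible for task files you trust.
      eval(command)
    else
      # Anything else goes to the shell; stop at the first failure.
      system(command) or raise "command failed: #{command}"
    end
  end
end

results = run_commands(
  ["DemoPipeline::report('aa.fa', 'output/aa-align.fa')", 'echo done'],
  DemoPipeline
)
puts results.first
```

Checking respond_to? before evaluating means a shell command that merely happens to start with the module name still goes to the shell.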

Chaining task files

Chaining tasks allows modularising work in task files, so that each task file represents as few steps as possible. To chain tasks we want

to call the next task file
to pass in new inputs (including output of the current task)

(more soon)

run-once

(coming soon)

map reduce and dependencies

(coming soon)

Installation

(sorry, not ready yet!)

    gem install bio-pipeline

Usage

    require 'bio-pipeline'

The API doc is online. For more code examples see the test and feature files in the source tree.

Project home page

For information on the source tree, documentation, examples, issues and how to contribute, see

http://github.com/pjotrp/bioruby-pipeline

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

If you use this software, please cite one of

Biogems.info

This Biogem is published at #bio-pipeline
