pjotrp/bioruby-pipeline

bio-pipeline

Common pipeline tasks. This Bio module does not do the work of a job scheduler; for that you can use our simple Ruby Queue (rq) or one of the many other schedulers.

bio-pipeline, meanwhile, addresses do-not-repeat-yourself (DRY) principles for creating tasks at the job level, and aims for convention-over-configuration (CoC). For example, bio-pipeline comes with a library of templates, mostly based on YAML and ERB, for common bioinformatics tasks.

Another feature of bio-pipeline is the run-once command, which caches results and won't calculate the same result twice. This makes the pipeline resilient: when one or more jobs fail, just rerun the pipeline. The pipeline can also be interrupted and will resume where it left off.

You do not need to know Ruby to use bio-pipeline, but it may be interesting to note that other successful tools for cluster deployment use similar ideas. For example, Chef uses Ruby, YAML and ERB for configuring machines. It may be an idea to combine Chef with bio-pipeline.

Note: this software is under active development! Feel free to pitch in.

task files as YAML/erb templates

To describe a job that can be run in a pipeline, we introduce a YAML data structure, the task file, which also acts as a template that is preprocessed by erb. An example for running an alignment program would be:

    # task file: muscle.yaml
    :inputs:
      - <%= in_file = 'aa.fa' %>      # here we set in_file too!
    :commands:
      - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
    :outputs:
      - <%= output_dir %>             # defaults to ./output

Note that in_file is defined in the YAML task file itself, while muscle_bin and output_dir are defined by the calling context. Run this task from the command line with:

    ./bin/runner -c muscle.yaml

The idea is to allow richer metadata than a bare command line: task files make it easy to share common tasks and to add context, paths, and features such as creating and copying the output_dir.
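The two-stage processing of a task file (erb first, YAML second) can be sketched with the Ruby standard library alone. This is a minimal illustration, not the runner's actual code: `render_task` and its keyword arguments are hypothetical names chosen for this example, and the `./output` default simply mirrors the comment in the task file above.

```ruby
require 'erb'
require 'yaml'

# Hypothetical helper, not the bio-pipeline API: expand a task template
# with ERB, then parse the expanded text as YAML.
def render_task(template, muscle_bin:, output_dir: './output')
  # The keyword arguments are local variables of this method, so the
  # template can see them through the method's binding.
  rendered = ERB.new(template).result(binding)
  # The task file uses symbol keys (:inputs:), which must be permitted
  # explicitly when loading YAML safely.
  YAML.safe_load(rendered, permitted_classes: [Symbol])
end

TEMPLATE = <<~TASK
  :inputs:
    - <%= in_file = 'aa.fa' %>
  :commands:
    - <%= muscle_bin %> -i <%= in_file %> -o <%= output_dir %>/aa-align.fa
  :outputs:
    - <%= output_dir %>
TASK

task = render_task(TEMPLATE, muscle_bin: 'muscle', output_dir: 'tmp')
puts task[:commands].first
```

Because in_file is assigned inside the template, it is available to the later lines of the same template, while muscle_bin and output_dir come from the caller.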

Parameters can also be set or overridden outside the template by adding them as switches on the command line:

    ./bin/runner -c muscle.yaml -output_dir tmp -muscle_bin /opt/muscle/bin/muscle

The runner handles this by copying the switches into the namespace, using some nice Ruby magic.
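One way such "magic" could work is sketched below; the runner's real implementation may differ. The helper names `parse_switches` and `namespace_for` are hypothetical, as is the choice to store the `-c` argument under `:task_file`. The sketch relies on `Binding#local_variable_set`, which creates a local variable inside a binding at runtime.

```ruby
require 'erb'

# Hypothetical sketch: turn "-name value" pairs into a hash of parameters.
def parse_switches(argv)
  params = {}
  args = argv.dup
  until args.empty?
    arg = args.shift
    next unless arg.start_with?('-')

    name = arg.sub(/\A-+/, '')
    # "-c" names the task file in the examples above; keep it separate.
    key = name == 'c' ? :task_file : name.to_sym
    params[key] = args.shift
  end
  params
end

# Copy every parameter into a binding as a local variable, so that a
# template evaluated with that binding can refer to it by name.
def namespace_for(params)
  scope = binding
  params.each { |name, value| scope.local_variable_set(name, value) }
  scope
end

argv = %w[-c muscle.yaml -output_dir tmp -muscle_bin /opt/muscle/bin/muscle]
scope = namespace_for(parse_switches(argv))
puts ERB.new('<%= muscle_bin %> -o <%= output_dir %>').result(scope)
```

With this approach any switch becomes a template variable without being declared in advance, which fits the convention-over-configuration aim.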

erb executes the Ruby between <% and %> when the template is compiled. After this, at runtime, you can run Ruby programs as scripts, but you can also call into the bio-pipeline engine and libraries. Each command is first checked against the methods in the engine's namespace. If the command exists as a method, the rest of the command is executed as Ruby in the local interpreter. For example:

    :commands:
      - BioPipeline::report(<%= in_file %>,<%= output_dir %>/aa-align.fa)

Within the task file commands section, commands are simply executed in sequence.
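The lookup-then-execute rule can be illustrated with a small dispatcher. This is a simplified sketch under stated assumptions: `Engine` is a hypothetical stand-in for the engine namespace, commands are written as space-separated words, and arguments are passed to the method as plain strings rather than evaluated as Ruby.

```ruby
# Hypothetical stand-in for the engine namespace; the real BioPipeline
# module and its methods may look different.
module Engine
  def self.report(*files)
    "report on #{files.join(', ')}"
  end
end

# Run one command line: if its first word names a method defined on the
# engine, call that method in-process; otherwise hand the line to the shell.
def run_command(line, engine: Engine)
  name, *args = line.split
  return nil if name.nil?

  if engine.singleton_methods(false).include?(name.to_sym)
    engine.public_send(name, *args)
  else
    system(line)
  end
end

# Commands are executed strictly in the order in which they are listed.
def run_commands(commands, engine: Engine)
  commands.map { |line| run_command(line, engine: engine) }
end
```

Checking only the methods defined directly on the engine avoids accidentally treating a shell command as Ruby just because every object happens to respond to a method of the same name.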

Chaining task files

Chaining tasks allows modularising work in task files, so that each task file represents as few steps as possible. To chain tasks we want

  1. to call the next task file
  2. to pass in new inputs (including output of the current task)

(more soon)

run-once

(coming soon)
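Until this section is written, the behaviour described in the introduction (cache results, never compute the same result twice, rerun only what failed) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `run_once` method name, the `.run-once` cache directory, and the use of a marker file named after a hash of the command. The actual run-once command may work differently.

```ruby
require 'digest/sha2'
require 'fileutils'

# Hypothetical sketch of run-once: remember which commands have already
# succeeded by writing a marker file named after a hash of the command.
def run_once(command, cache_dir: '.run-once')
  FileUtils.mkdir_p(cache_dir)
  marker = File.join(cache_dir, Digest::SHA256.hexdigest(command))
  return :cached if File.exist?(marker)

  # Only a successful run leaves a marker, so a failed job is retried
  # the next time the pipeline is started.
  return :failed unless system(command)

  File.write(marker, command)
  :ran
end
```

Rerunning the whole pipeline then skips every command that already has a marker, which also covers the case where the pipeline was interrupted halfway.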

map reduce and dependencies

(coming soon)

more documentation

Features describe the behaviour of bio-pipeline. More documentation can also be found

Installation

(sorry, not ready yet!)

    gem install bio-pipeline

Usage

    require 'bio-pipeline'

The API doc is online. For more code examples see the test and feature files in the source tree.

Project home page

For information on the source tree, documentation, examples, issues and how to contribute, see

http://github.com/pjotrp/bioruby-pipeline

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

If you use this software, please cite one of

Biogems.info

This Biogem is published at #bio-pipeline

Copyright

Copyright (c) 2012 Pjotr Prins. See LICENSE.txt for further details.
