
Releases: suvayu/genome-genie

Data pipeline improvements

20 Mar 20:54
  • Aggregate batch job logs into the pipeline summary dataframe.
  • Use logs to determine job status (finished/failed).
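A minimal sketch of log-based status detection follows. It assumes hypothetical log markers: a line starting with `job finished` on success, and a line starting with `error` or `traceback` on failure. The markers and the helpers `job_status` and `summarise` are illustrative only and are not genome-genie's API. A plain list of rows stands in for the summary dataframe; passing it to `pandas.DataFrame(rows)` would produce one.

```python
import re

# Hypothetical log markers; the real batch logs may use a different format.
FINISHED = re.compile(r"^job finished", re.MULTILINE)
FAILED = re.compile(r"^(error|traceback)", re.IGNORECASE | re.MULTILINE)


def job_status(log_text: str) -> str:
    """Classify a batch job from the contents of its log file."""
    if FAILED.search(log_text):
        return "failed"
    if FINISHED.search(log_text):
        return "finished"
    return "unknown"  # still running, or the log was truncated


def summarise(logs: dict[str, str]) -> list[dict[str, str]]:
    """Build one summary row per job, ordered by job name."""
    return [{"job": name, "status": job_status(text)}
            for name, text in sorted(logs.items())]


if __name__ == "__main__":
    logs = {
        "align-1": "starting\njob finished\n",
        "align-2": "starting\nTraceback (most recent call last):\n",
    }
    for row in summarise(logs):
        print(row)
```

Failure markers are checked first, so a log that reports an error and then prints a completion line is still counted as failed.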

Data pipeline

18 Mar 23:03

This release includes a simple, shell-template-based data pipelining infrastructure. With the given API, you can specify a dependency graph of jobs (a DAG), but the jobs themselves run as regular shell jobs under an execution engine such as SGE, PBS, or LSF. It also provides some automatic parallelism based on the number of input files. Advanced features such as rerun, pause, and resume are not supported.
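The following sketch illustrates the DAG-plus-fan-out idea using the standard library's `graphlib.TopologicalSorter`. It is not the genome-genie API: the stage names, the `schedule` helper, and the one-job-per-input-file expansion are assumptions made for illustration. Each returned "wave" holds jobs with no mutual dependencies. Those jobs could each be rendered into a shell script and submitted to the batch scheduler together.

```python
from graphlib import TopologicalSorter


def schedule(deps: dict[str, set[str]], inputs: list[str]) -> list[list[str]]:
    """Group jobs into waves; jobs within a wave can run in parallel.

    deps maps each (hypothetical) stage to the set of stages it depends on.
    Every stage is fanned out into one job per input file.
    """
    ts = TopologicalSorter(deps)
    ts.prepare()  # also raises CycleError if the graph is not a DAG
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        # Automatic parallelism: one job per input file for each ready stage.
        waves.append([f"{stage}:{path}" for stage in ready for path in inputs])
        ts.done(*ready)
    return waves


if __name__ == "__main__":
    deps = {"align": set(), "sort": {"align"}, "qc": {"align"}, "call": {"sort"}}
    for wave in schedule(deps, ["a.fq", "b.fq"]):
        print(wave)
```

This is deliberately coarse. It waits for a whole stage to finish before starting the next, whereas a per-file pipeline could start `sort:a.fq` as soon as `align:a.fq` is done.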

A job script (pipeline-job.py) that makes use of some of these features has also been provided; it may be used with a JSON pipeline configuration. A second script (debug-templates.py) helps with debugging templates during development.
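As a rough illustration of how a JSON configuration and shell templates might fit together, the sketch below loads a small config and renders one job's command with Python's `string.Template`. It reports unfilled placeholders up front, which is the kind of check a template-debugging tool could perform. The config schema, the `bwa` command, and the helpers `missing_placeholders` and `render` are all hypothetical. They do not describe the format that pipeline-job.py or debug-templates.py actually uses.

```python
import json
from string import Template

# Hypothetical configuration; the schema expected by pipeline-job.py may differ.
CONFIG = """
{
  "jobs": {
    "align": {"template": "bwa mem $ref $input > $output", "ref": "hg38.fa"}
  }
}
"""


def missing_placeholders(template: str, values: dict[str, str]) -> set[str]:
    """Return placeholder names used in the template but absent from values."""
    names = {m.group("named") or m.group("braced")
             for m in Template.pattern.finditer(template)}
    names.discard(None)  # escaped "$$" and invalid matches carry no name
    return names - set(values)


def render(job: dict[str, str], input_file: str, output_file: str) -> str:
    """Fill a job's shell template, failing loudly on unfilled placeholders."""
    values = {k: v for k, v in job.items() if k != "template"}
    values.update(input=input_file, output=output_file)
    missing = missing_placeholders(job["template"], values)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(missing)}")
    return Template(job["template"]).substitute(values)


if __name__ == "__main__":
    config = json.loads(CONFIG)
    print(render(config["jobs"]["align"], "a.fq", "a.sam"))
```

Listing every missing name at once is more useful while developing a template than `Template.substitute` alone, which stops at the first missing key.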

Happy running!