Skip to content
Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Parallel Workflow extension for Rake, runs on multicores, clusters, clouds.

  • Author: Masahiro Tanaka

README in Japanese, GitHub Repository, RubyGems


  • Pwrake executes a workflow written in Rakefile in parallel.
    • The specification of Rakefile is same as Rake.
    • The tasks which do not have mutual dependencies are automatically executed in parallel.
    • The multitask which is a parallel task definition of Rake is no more necessary.
  • Parallel and distributed execution is possible using a computer cluster which consists of multiple compute nodes.
    • Cluster settings: SSH login (or MPI), and the directory sharing using a shared filesystem, e.g., NFS, Gfarm.
    • Pwrake automatically connects to remote hosts using SSH. You do not need to start a daemon.
    • Remote host names and the number of cores to use are provided in a hostfile.
  • Gfarm file system utilizes storage of compute nodes. It provides the high-performance parallel I/O.
    • Parallel I/O access to local storage of compute nodes enables scalable increase in the I/O performance.
    • Gfarm schedules a compute node to store an output file, to local storage.
    • Pwrake schedules a compute node to execute a task, to a node where input files are stored.
    • Other supports for Gfarm: Automatic mount of the Gfarm file system, etc.


  • Ruby version 2.2.3 or later
  • UNIX-like OS
  • For distributed processing using multiple computers:
    • SSH command
    • distributed file system (NFS, Gfarm, etc.)


Install with RubyGems:

$ gem install pwrake

Or download source tgz/zip and expand, cd to subdirectory and install:

$ ruby setup.rb

If you use rbenv, your system may fail to find pwrake command after installation:

-bash: pwrake: command not found

In this case, you need the rehash of command paths:

$ rbenv rehash


Parallel execution using 4 cores at localhost:

$ pwrake -j 4

Parallel execution using all cores at localhost:

$ pwrake -j

Parallel execution using total 2*2 cores at remote 2 hosts:

  1. Share your directory among remote hosts via distributed file system such as NFS, Gfarm.

  2. Allow passphrase-less access via SSH in either way:

    • Add passphrase-less key generated by ssh-keygen. (Be careful)
    • Add passphrase using ssh-add.
  3. Make hosts file in which remote host names and the number of cores are listed:

     $ cat hosts
     host1 2
     host2 2
  4. Run pwrake with an option --hostfile or -F:

     $ pwrake -F hosts

Sustitute MPI for SSH to start remote worker (Experimental)

  1. Setup MPI on your cluster.

  2. Install MPipe gem. (requires mpicc)

  3. Run pwrake-mpi command.

     $ pwrake-mpi -F hosts


Pwrake command line options (in addition to Rake option)

-F, --hostfile FILE              [Pw] Read hostnames from FILE
-j, --jobs [N]                   [Pw] Number of threads at localhost (default: # of processors)
-L, --log, --log-dir [DIRECTORY] [Pw] Write log to DIRECTORY
    --ssh-opt, --ssh-option OPTION
                                 [Pw] Option passed to SSH
    --filesystem FILESYSTEM      [Pw] Specify FILESYSTEM (nfs|gfarm)
    --gfarm                      [Pw] FILESYSTEM=gfarm
-A, --disable-affinity           [Pw] Turn OFF affinity (AFFINITY=off)
-S, --disable-steal              [Pw] Turn OFF task steal
-d, --debug                      [Pw] Output Debug messages
    --pwrake-conf [FILE]         [Pw] Pwrake configuration file in YAML
    --show-conf, --show-config   [Pw] Show Pwrake configuration options
    --report LOGDIR              [Pw] Generate `report.html' (Report of workflow statistics) in LOGDIR and exit.
    --report-image IMAGE_TYPE    [Pw] Gnuplot output format (png,jpg,svg etc.) in report.html.
    --clear-gfarm2fs             [Pw] Clear gfarm2fs mountpoints left after failure.


  • If pwrake_conf.yaml exists at current directory, Pwrake reads options from it.

  • Example (in YAML form):

      HOSTFILE: hosts
      LOG_DIR: true
      DISABLE_STEAL: true
      FAILED_TARGET: delete
      PASS_ENV :
       - ENV1
       - ENV2
  • Option list:

      HOSTFILE, HOSTS   nil(default, localhost)|filename
      LOG_DIR, LOG      nil(default, No log output)|true(dirname="Pwrake%Y%m%d-%H%M%S")|dirname
      LOG_FILE          default="pwrake.log"
      TASK_CSV_FILE     default="task.csv"
      COMMAND_CSV_FILE  default="command.csv"
      GC_LOG_FILE       default="gc.log"
      WORK_DIR          default=$PWD
      FILESYSTEM        default(autodetect)|gfarm
      SSH_OPTION        SSH option
      PASS_ENV          (Array) Environment variables passed to SSH
      HEARTBEAT         default=240 - Hearbeat interval in seconds
      RETRY             default=1 - The number of task retry
      HOST_FAILURE      default=2 - The number of allowed continuous host failure (since v2.3)
      FAILED_TARGET     rename(default)|delete|leave - Treatment of failed target files
      FAILURE_TERMINATION wait(default)|kill|continue - Behavior of other tasks when a task is failed
      QUEUE_PRIORITY          LIFO(default)|FIFO|LIHR(LIfo&Highest-Rank-first; obsolete)
      DISABLE_RANK_PRIORITY   false(default)|true - Disable rank-aware task scheduling (since v2.3)
      RESERVE_NODE            false(default)|true - Reserve a node for tasks with ncore>1 (since v2.3)
      SHELL_START_INTERVAL    default=0.012 (sec)
      GRAPH_PARTITION         false(default)|true
      REPORT_IMAGE            default=png
  • Options for Gfarm system:

      DISABLE_AFFINITY    default=false
      DISABLE_STEAL       default=false
      GFARM_BASEDIR       default="/tmp"
      GFARM_PREFIX        default="pwrake_$USER"
      GFARM_SUBDIR        default='/'
      MAX_GFWHERE_WORKER  default=8
      GFARM2FS_OPTION     default=""
      GFARM2FS_DEBUG      default=false
      GFARM2FS_DEBUG_WAIT default=1

Task Properties

  • Task properties are specified in desc strings above task definition in Rakefile.

Example of Rakefile:

desc "ncore=4 allow=ourhost*" # desc has no effect on rule in original Rake, but it is used for task property in Pwrake.
rule ".o" => ".c" do
  sh "..."

(1..n).each do |i|
  desc "ncore=2 steal=no" # desc should be inside of loop because it is effective only for the next task.
  file "task#{i}" do
    sh "..."

Properties (The leftmost item is default):

ncore=integer     - The number of cores used by this task.
exclusive=no|yes  - Exclusively execute this task in a single node.
reserve=no|yes    - Gives higher priority to this task if ncore>1. (reserve a host)
allow=hostname    - Allow this host to execute this task. (accepts wild card)
deny=hostname     - Deny this host to execute this task. (accepts wild card)
order=deny,allow|allow,deny - The order of evaluation.
steal=yes|no      - Allow task stealing for this task.
retry=integer     - The number of retry for this task.

Note for Gfarm

  • Gfarm file-affinity scheduling is achieved by gfwhere-pipe script bundled in the Pwrake package. This script accesses through Fiddle (a Ruby's standard module) since Pwrake ver.2.2.7. Please set the environment variable LD_LIBRARY_PATH correctly to find

Scheduling with Graph Partitioning



This work is supported by:

You can’t perform that action at this time.