Add support for --jobs NUMBER (or -j) #112

Closed
wants to merge 3 commits


michaeljbishop commented Apr 18, 2012

PROBLEM SUMMARY:

Rake can be unusable for builds invoking large numbers of concurrent external processes.

PROBLEM DESCRIPTION:

Rake makes it easy to maximize concurrency in builds with its "multitask" function. When Rake is used to build non-Ruby projects, it often needs to run shell commands to process files. Unfortunately, when executing multitasks, Rake spawns a new thread for each task prerequisite. This shouldn't cause problems when the build code is pure Ruby (for green threads), but when the tasks execute external processes, the sheer number of spawned processes can cause the machine to thrash. Additionally, Ruby can reach the maximum number of open files (presumably because it's reading stdout for all those processes).
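As a rough illustration of the behavior described above, the sketch below defines a multitask with eight prerequisites, all of which Rake is free to run concurrently. The task names are hypothetical. `Rake::Task.define_task` and `Rake::MultiTask.define_task` are the class-level equivalents of the `task` and `multitask` Rakefile keywords. A real build would typically call `sh` inside each block, which is where the number of external processes grows.

```ruby
require 'rake'
require 'thread'

# Thread-safe record of which prerequisites actually ran.
done = Queue.new

# Hypothetical task names, one per file to "convert".
names = (1..8).map { |i| "convert_#{i}" }

names.each do |name|
  # A real Rakefile would shell out here, e.g. with `sh`.
  Rake::Task.define_task(name) { done << name }
end

# A multitask runs its prerequisites concurrently instead of one by one.
Rake::MultiTask.define_task('convert_all' => names)

Rake::Task['convert_all'].invoke
puts "prerequisites run: #{done.size}"
```

With hundreds of prerequisites instead of eight, unbounded concurrency is what leads to the thrashing described above.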

SOLUTION SUMMARY:

This request includes the code to add support for a "--jobs NUMBER (-j)" command-line option to specify the number of simultaneous tasks to execute.

SOLUTION:

The solution creates a work queue, to which blocks that call the task prerequisites are added, and a thread pool to process them. To prevent deadlock, the task that added the prerequisites also processes items on the queue (alongside the thread pool) until its own prerequisites have been processed.
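The idea can be sketched roughly as follows. This is not the code from this pull request: `SketchPool`, `run_all`, and `shutdown` are hypothetical names, and the sketch polls the queue for simplicity rather than blocking. It shows a fixed set of workers draining a queue while the caller that queued the work helps drain it, so a nested call cannot deadlock even when every worker is busy.

```ruby
require 'thread'

# Hypothetical sketch of a work queue plus thread pool.
class SketchPool
  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker runs queued blocks until it pops the :stop sentinel.
        while (job = @queue.pop) != :stop
          job.call
        end
      end
    end
  end

  # Queue one block per prerequisite, then help process the queue until
  # every block queued by this call has finished.
  def run_all(blocks)
    lock = Mutex.new
    remaining = blocks.size
    blocks.each do |blk|
      # Wrap each block with a little bookkeeping: count it as done.
      @queue << lambda do
        begin
          blk.call
        ensure
          lock.synchronize { remaining -= 1 }
        end
      end
    end
    until lock.synchronize { remaining.zero? }
      job = begin
        @queue.pop(true)   # non-blocking pop; raises ThreadError when empty
      rescue ThreadError
        nil
      end
      job ? job.call : Thread.pass
    end
  end

  # Call only after every run_all has returned.
  def shutdown
    @workers.size.times { @queue << :stop }
    @workers.each(&:join)
  end
end

# A single worker and a nested run_all: this would deadlock if the
# waiting caller did not process queue items itself.
pool = SketchPool.new(1)
finished = Queue.new
outer = lambda do
  pool.run_all([-> { finished << :a }, -> { finished << :b }])
  finished << :outer
end
pool.run_all([outer])
pool.shutdown
puts finished.size
```

The per-call counter is what lets a caller know when its own prerequisites are done, even though it may have executed blocks queued by other callers in the meantime.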

To maintain backward compatibility, not passing -j reverts to the old behavior of unlimited concurrent tasks.

REQUIREMENTS:

The Ruby version requirements remain the same. "multi_task.rb" adds two new requirements: 'thread' and 'set'.

Rake now supports a --jobs <n> command-line option.

DESCRIPTION
-----------
The new option: "--jobs number (-j)" specifies the maximum number of
concurrent tasks. The suggested value is equal to the number of CPUs.

Sample values:
  default: unlimited concurrent tasks (standard rake behavior)
  1: one task at a time

The code consists of two major edits. The first is a change to
`application.rb` to support the parsing of the option.

The second is a more substantial change to `multi_task.rb` which
replaces the multi-task scheduling algorithm. Instead of spawning a new
thread for every prerequisite that needs to be executed, a block that
calls the prerequisite is created and added to a Queue.

Additionally, a thread-pool is created to pull the blocks off the queue
and execute them. Finally, the MultiTask queueing up its prerequisites
will itself participate in the block-processing while waiting for its
prerequisites to finish processing.

It can tell when its prerequisites are finished by enveloping the
queued blocks in another block that adds a little bookkeeping.

VERSION REQUIREMENTS
--------------------
Rake ruby version requirements remain unchanged.
@@ -31,12 +31,20 @@ Options are:
[<tt>--execute-print</tt> _code_ (-p)]
Execute some Ruby code, print the result, and exit.
-[<tt>--execute-continue</tt> _code_ (-p)]
+[<tt>--execute-continue</tt> _code_ (-E)]

michaeljbishop Apr 19, 2012


This has nothing to do with the -j option. I just found that the docs differ from the implementation here, so this updates them to match the implementation.

michaeljbishop added some commits Apr 20, 2012

Fixed a bug where the MultiTask tests were not testing the thread pool

This happened because I had left in the original code path, which
simply passed through and spawned unlimited threads.

Now the code always uses the thread pool, and sets the initial
limit to be the maximum Fixnum (which means virtually unlimited)

This is nice because it means the rake MultiTask tests are now
stressing the thread pool implementation.

Additionally, the thread pool size can now be changed dynamically
to adjust to load by changing 'application.options.thread_pool_size'
while rake is running.

No code observes load yet, but some certainly could, adjusting the
pool size as it saw fit.

Notes:
  While the threads are in their processing loop, they add
  other threads to the pool as needed to meet the limit.
  Additionally, if the thread pool size is larger than the
  application preference, the thread exits.

Fixed bug where the stack was being blown

The previous "add blocks to queue" method worked, but had the
unnecessary side effect of blowing the stack for large numbers of
prerequisites.

This is a new thread pool implementation which retains all the
advantages of the original, but keeps the stack size the same as the
pre-thread-pool rake.

It's closer to the pre-thread-pool rake implementation with the
addition of checking the thread pool size before spawning a new thread.
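A minimal sketch of that "check before spawning" approach might look like the following. The class and method names are hypothetical, and running the block on the calling thread when the limit has been reached is an assumption; the commit message does not say what happens in that case.

```ruby
require 'thread'

# Hypothetical sketch: spawn a thread per block only while under a limit.
class BoundedSpawner
  def initialize(limit)
    @limit = limit
    @lock = Mutex.new
    @live = 0          # number of threads currently spawned by this object
  end

  # Run every block and return once all of them have finished.
  def run_all(blocks)
    threads = []
    blocks.each do |blk|
      if reserve_slot
        threads << Thread.new do
          begin
            blk.call
          ensure
            release_slot
          end
        end
      else
        blk.call       # assumption: at the limit, run on the calling thread
      end
    end
    threads.each(&:join)
  end

  private

  # Atomically claim a slot; returns false when the limit is reached.
  def reserve_slot
    @lock.synchronize do
      next false if @live >= @limit
      @live += 1
      true
    end
  end

  def release_slot
    @lock.synchronize { @live -= 1 }
  end
end

results = Queue.new
BoundedSpawner.new(2).run_all(Array.new(6) { |i| -> { results << i } })
puts results.size
```

Because no blocks are queued for later, a caller never executes unrelated work while waiting, so this shape avoids that source of extra stack depth.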

michaeljbishop commented Apr 21, 2012

Closing pull request for further testing...
