Add -m option to change all tasks into multitasks #133

Merged
merged 1 commit into from Oct 24, 2012


@michaeljbishop

This pull request includes code which:

Adds a new method Rake::Task#invoke_prerequisites_concurrently.

Rake::MultiTask#invoke_prerequisites always calls
Rake::Task#invoke_prerequisites_concurrently and now
Rake::Task#invoke_prerequisites will call it when
Rake.application.options.always_multitask == true.

Passing -m at the command-line sets
Rake.application.options.always_multitask to true.

New tests are added as well as documentation.
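The dispatch described above can be sketched with toy classes (this is a simplified illustration, not the actual Rake source): a plain task invokes prerequisites concurrently only when `options.always_multitask` is set, while a multitask always does.

```ruby
require 'ostruct'

# Toy stand-ins for Rake::Task and Rake::MultiTask, showing only the
# dispatch logic the PR describes.
class Task
  attr_reader :invoked_concurrently

  def initialize(options)
    @options = options
    @invoked_concurrently = false
  end

  def invoke_prerequisites
    if @options.always_multitask
      invoke_prerequisites_concurrently
    else
      # serial prerequisite invocation would go here
    end
  end

  def invoke_prerequisites_concurrently
    @invoked_concurrently = true
  end
end

class MultiTask < Task
  # A multitask is always concurrent, regardless of the option.
  def invoke_prerequisites
    invoke_prerequisites_concurrently
  end
end

options = OpenStruct.new(always_multitask: false)
plain = Task.new(options)
plain.invoke_prerequisites        # serial path taken

multi = MultiTask.new(options)
multi.invoke_prerequisites        # concurrent path taken

options.always_multitask = true   # as if -m had been passed
flagged = Task.new(options)
flagged.invoke_prerequisites      # concurrent path taken
```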

@michaeljbishop michaeljbishop Added -m option which changes all tasks into multitasks
Adds a new method `Rake::Task#invoke_prerequisites_concurrently`.

`Rake::MultiTask#invoke_prerequisites` always calls
`Rake::Task#invoke_prerequisites_concurrently` and now
`Rake::Task#invoke_prerequisites` will call it when
`Rake.application.options.always_multitask == true`.

Passing `-m` at the command-line sets
`Rake.application.options.always_multitask` to `true`.
c249a0a
@jimweirich
Owner

There seems to be some debate whether we do this or merge drake. I'm going to let the debate continue a bit before I move on this one way or the other.

@michaeljbishop

Yes, I saw the discussion. I've added my thoughts to the mailing list.


@jimweirich
Owner

I'm wondering if it would be worthwhile to be able to get some post-run statistics out of the thread pool. Maximum number of tasks created is what I'm particularly interested in. Since the pool will create extra threads if one is blocked, I'd be interested in how many threads were really needed to finish a task.

Thoughts?
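The bookkeeping could look something like this (a rough sketch with hypothetical names, not Rake's ThreadPool): track the count of simultaneously active threads under a monitor and record its peak for post-run inspection.

```ruby
require 'monitor'

# Hypothetical instrumentation sketch: counts concurrently active
# threads and remembers the high-water mark.
class InstrumentedPool
  attr_reader :max_active_threads

  def initialize
    @mon = Monitor.new
    @active = 0
    @max_active_threads = 0
    @threads = []
  end

  def queue(&block)
    @threads << Thread.new do
      @mon.synchronize do
        @active += 1
        @max_active_threads = [@max_active_threads, @active].max
      end
      block.call
      @mon.synchronize { @active -= 1 }
    end
  end

  def join
    @threads.each(&:join)
  end
end

pool = InstrumentedPool.new
4.times { pool.queue { sleep 0.05 } }
pool.join
pool.max_active_threads  # peak concurrency, up to 4 depending on scheduling
```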

@michaeljbishop

That's a great idea. I'll see if I can work in some private accessors for that kind of data.

I have another idea which might possibly remove some extra threads and that kind of data would be helpful for me to see if it does.

Also, there's the potential for the thread-pool to create and delete a lot of threads and I wonder if there isn't some way to throttle that down so threads are reused more than re-created.

More to look into…
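One classic way to throttle thread churn (again a sketch, not Rake's implementation) is to keep a fixed set of worker threads pulling jobs from a shared queue, so threads are reused rather than created per job:

```ruby
# Fixed-size pool sketch: workers loop on a shared Queue until they
# receive a :stop sentinel, so the same threads serve every job.
class ReusingPool
  def initialize(size)
    @jobs = Queue.new
    @workers = size.times.map do
      Thread.new do
        while (job = @jobs.pop) != :stop
          job.call
        end
      end
    end
  end

  def queue(&block)
    @jobs << block
  end

  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

thread_ids = Queue.new
pool = ReusingPool.new(2)
10.times { pool.queue { thread_ids << Thread.current.object_id } }
pool.shutdown
# All 10 jobs were served by at most 2 distinct threads.
distinct = Array.new(10) { thread_ids.pop }.uniq.size
```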


@jimweirich jimweirich merged commit c249a0a into jimweirich:master Oct 24, 2012
@michaeljbishop

I checked in a change that adds some instrumentation to the ThreadPool. I'm a little unfamiliar with how this works. Would I need to issue a pull request for you to get it, or is checking it in good enough?

michaeljbishop@a89a68d


@michaeljbishop

I was thinking about optimizations that might reduce the number of overall threads. One thought I had was to have threads actively process the queue while they are waiting on their futures.

Here's what this would look like: A thread calls #value on a future, but if another thread has the lock on the actual processing of the future, the current thread executes some blocks out of the queue while waiting for the other thread to finish processing the future.

One danger might be that calling blocks on the queue while waiting would take a longer time to complete than simply waiting for the future to complete. But it would save threads.
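The idea can be sketched roughly like this (hypothetical, simplified code): a future's `#value` first tries to run the computation itself, and while another thread holds the lock it drains other jobs from the pool's queue instead of blocking.

```ruby
# Sketch of a "work while waiting" future. Not the PR's code; names
# and structure are illustrative.
class HelpfulFuture
  def initialize(queue, &block)
    @queue = queue   # the pool's job queue
    @mutex = Mutex.new
    @done = false
    @block = block
  end

  # Run the computation if nobody else is; bail if the lock is taken.
  def try_run
    return if @done
    if @mutex.try_lock
      begin
        unless @done
          @result = @block.call
          @done = true
        end
      ensure
        @mutex.unlock
      end
    end
  end

  def value
    until @done
      try_run
      break if @done
      # Another thread holds the lock: do other queued work
      # instead of sleeping.
      other = (@queue.pop(true) rescue nil)
      other.call if other
    end
    @result
  end
end

queue = Queue.new
queue << -> { :other_work }             # some unrelated pending job
future = HelpfulFuture.new(queue) { 21 * 2 }
result = future.value                   # => 42
```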


@jimweirich
Owner

I see two possible solutions for this issue. One is the one you offer above. I like that because I see no reason for a thread to wait on a future if there is other work to be done. I'm not overly concerned about it taking slightly longer because it is processing a job when it could continue, that's the oddities of thread scheduling. I'd love to see what you come up with.

Another solution might be to look at comp_tree, the gem used by drake, and mimic how that handles it (but without the prebuilt computation tree it wants to use).

-- Jim Weirich
-- jim.weirich@gmail.com

@jimweirich
Owner

So, I've pushed your statistics stuff into master and tied a few command-line options to it to display the event history. Then I created a Rakefile that looks like this:

#!/usr/bin/ruby -wKU

require 'rake/clean'

N=10

task :default => (1..N).map { |i| "t#{i}" }

SUBJOBS = N.times.map { |i| "s#{i+1}" }

(1..N).each do |i|
  task "t#{i}" => SUBJOBS do
  end
  task "s#{i}" => "g#{i}" do
  end
  task "g#{i}" => "k#{i}" do
  end
  task "k#{i}" => "m#{i}" do
  end
  task "m#{i}" => :slow do
  end
end

task :mid => :slow

task :slow do
  sleep 2
end

And run a rake command like so:

time ruby -I../lib ../bin/rake  -j3 -m --job-stats=history

I get a max of 5 tasks in play at any one time. I've tried all kinds of combinations and variations in the Rakefile but was unable to get more than 2 tasks above the -j limit.

That either means we are good, or I don't understand how to cause more tasks to wait on futures all at once. (I suspect the latter).

Feel free to play with it (it's in master). I've tweaked the statistics stuff that you provided a bit, but the history stuff is cool. Great idea.

@michaeljbishop

Oh, good. I'm glad you liked it and I liked your tweaks as well.

test/test_rake_thread_pool.rb might yield some good stress tests.

If I'm honest, the results are a little surprising to me. I thought that kind of arrangement would generate more sleeping threads. Even test_pool_always_has_max_threads_doing_work will generate 7 threads and it's doing less than the sample Rakefile you sent.

All this makes me wonder if there is an interaction between Rake and the ThreadPool that artificially keeps the count low and I'll need to look at it a little more closely.


@jimweirich
Owner

Tried this:

M = 500

task :x => "x1"

M.times do |i|
  task "x#{i}" => "x#{i+1}" do
    sleep 0.001
  end
end

t = task "x#{M}" do
  sleep 0.001
end

This created a deep chain of 500 dependencies and I got the threads-in-play count up to 418 (why not closer to 500? I have no idea).

I'm not too worried about this scenario. I would be more concerned if the broad dependency trees (which could be 1000s wide on really large projects) grew the threads-in-play count. I doubt deep chains of dependencies will be deeper than we can handle with threads (we will probably run out of stack space before we run out of the ability to create threads).

@jimweirich
Owner

I can push 418 up to about 490 by increasing the sleep time. So evidently some of the prerequisites are completing before the parent task is able to wait on the future. Ok, that makes sense.

@michaeljbishop

I'm glad you are not worried because that looked bad to me :)

I'll still (later tonight) see if I can't reduce the total number of threads.


@michaeljbishop

By the way, I should mention that the statistics retrieved are not guaranteed to be in order (but they should be mostly in order).

You'll need to sort them yourself before displaying them.

A simple

thread_pool.history.sort_by { |s| s[:time] }

should do it.


@michaeljbishop

Or perhaps best to put it inside ThreadPool itself (but outside the monitor)...

def history                 # :nodoc:
  @history_mon.synchronize { @history.dup }.sort_by { |s| s[:time] }
end


@jimweirich
Owner

done

@michaeljbishop

I've created a version of the ThreadPool class which doesn't need to create threads beyond the maximum to compensate for threads about to sleep on futures.

The solution in the pull request is not exactly either of the solutions we proposed. It's a little different in that the future consists of 3 parts:

core - Executes the future's block with the future's parameters and stores the result.
promise - Synchronizes on the future's lock and executes the core.
worker - Tries to take the future's lock and, if successful, executes the core; if not, exits.

The 'promise' is what is returned from the #future method. If called, it will block until the future has results.

The 'worker' is what is put on the ThreadPool queue. It tries to execute the future, but if it can't get the lock, that means another thread is actively processing it so it bails.

So the ThreadPool is essentially split in two. The front-facing part returns blocking futures. The rear-facing part has a pool of threads, buzzing along trying to force all the futures in the queue, assuming that if it can't get a lock on a future, another thread will still process it.

At first glance, it seems to work pretty well, but I'm curious to see how it performs under more rigorous testing.
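The three-part arrangement can be condensed into a sketch like this (simplified and hypothetical, not the PR's exact code): the promise blocks on the mutex, while the worker only runs the core if it can take the lock without blocking.

```ruby
# Sketch of the core/promise/worker split described above.
class TriPartFuture
  def initialize(&block)
    @mutex = Mutex.new
    @done = false
    # core: runs the block once and stores the result.
    @core = lambda do
      unless @done
        @result = block.call
        @done = true
      end
    end
  end

  # promise: blocks until the core has run, then returns the result.
  def value
    @mutex.synchronize { @core.call }
    @result
  end

  # worker: runs the core only if the lock is free; if another thread
  # holds it, that thread is already processing the future, so bail.
  def work
    if @mutex.try_lock
      begin
        @core.call
      ensure
        @mutex.unlock
      end
    end
  end
end

f = TriPartFuture.new { :computed }
f.work         # a pool thread forces the future
answer = f.value   # returns immediately; already done
```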

Here's the pull request.

#139


@ktns
ktns commented Oct 29, 2012

Hi.

First of all, thank you very much for implementing this feature.

But, with version 10.0.0.beta.2, I found that rake -m doesn't execute top-level tasks simultaneously.

I think it would be nicer if they were also executed simultaneously, wouldn't it?

@jimweirich
Owner

ktns - I gave this some thought and I think I prefer the current behavior. From the command line I often type:

rake clobber build

and expect the clobber to be run before the build task.

@ktns
ktns commented Nov 3, 2012

I understand.
Thanks for paying attention.
