New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pool#terminate() waits until subprocesses are dead to fulfill #176
Conversation
- `Pool#_removeWorker()` is now async - ensure task resolver handles cancellations & timeouts in sequence - mock behavior of `subprocess.kill()` for worker threads (returns a boolean) - mock `subprocess.killed` for worker threads (a boolean) - many changes to tests to support async nature of terminate - some tests changed to use native Promise-handling functionality of Mocha (where it made sense)
Of note, I had to remove a few assertions which no longer made sense. In other words, when I don't particularly like how that works, but changing it would require a somewhat gnarly refactor, and I'm not sure it's even desired behavior. For that reason, this PR is probably a breaking change. |
Thanks a lot Christopher, this looks good on first sight! I'll review your PR asap, I hope coming weekend. |
I think what's worrisome is that the tests appear to mainly target worker threads instead of child processes, unless I missing something... would seem that some portion of the tests should be repeated across all supported worker implementations |
src/Pool.js
Outdated
this._removeWorkerFromList(worker); | ||
return new Promise(function(resolve, reject) { | ||
worker.terminate(false, function(err) { | ||
me._removeWorkerFromList(worker); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new solution, termination is async, so there is a temporary situation where the worker is still in the worker list but is being terminated. Should we change this to remove the worker from the list first, and after that call .terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking me? 😄 I was keeping the series of instructions the same; terminate then remove.
However, the only place in the codebase that calls _removeWorker()
removes it synchronously beforehand:
Lines 211 to 214 in 57e6795
me._removeWorkerFromList(worker); | |
// If minWorkers set, spin up new workers to replace the crashed ones | |
me._ensureMinWorkers(); | |
return me._removeWorker(worker); |
It appears the call to _removeWorkerFromList()
here is not needed unless you expect _removeWorker()
to be called for some other reason.
// If minWorkers set, spin up new workers to replace the crashed ones | ||
me._ensureMinWorkers(); | ||
return me._removeWorker(worker); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can change this to something like:
return me._removeWorker(worker).then(function () {
// If minWorkers set, spin up new workers to replace the crashed ones
me._ensureMinWorkers();
})
So we don't have to clear the worker from the list first? Having to do that feels a bit off, should ideally be only needed to call from within _removeWorker
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I should probably read all the comments before replying. But yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... no. I think that would have an additional negative affect on performance--you have to wait for termination before spinning up a new one. I think that just leaving this as-is and removing the call to _removeWorkerFromList
in _removeWorker
makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just moving it to before the Promise
. Let's just do that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense 👍
} | ||
cleanup(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like cleanup()
is called twice: one time right away, and one time on callback of this.worker.once('exit'
. Should we split this in two functions? Maybe set this.terminating = true
when starting to terminate, and this.terminated = true
after termination finished?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not called twice (unless somehow the child process has already been killed, then emits an additional exit
event. I have no idea if this is possible):
workerpool/src/WorkerHandler.js
Lines 392 to 401 in 57e6795
if (typeof this.worker.kill === 'function') { | |
// child process | |
// cleanup once the child process has exited | |
this.worker.once('exit', function() { | |
cleanup(); | |
}); | |
if (!this.worker.killed && !this.worker.kill()) { | |
cleanup(new Error('Failed to send SIGTERM to worker')); | |
} | |
return; |
However, that should probably change to:
if (typeof this.worker.kill === 'function') {
// child process
if (!this.worker.killed && !this.worker.kill()) {
cleanup(new Error('Failed to send SIGTERM to worker'));
} else {
// cleanup once the child process has exited
this.worker.once('exit', function() {
cleanup();
});
}
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I'm not entirely confident that just listening for the exit
event is enough. Maybe a strategy like signal-exit would be more robust--if that's possible with a child process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I think you're right
Thanks a lot Christoper for all your work! It looks like a solid solution. I've placed a couple of feedbacks inline in the code. The custom Promise implementation is not ideal, sorry about that. It's needed in order to be able to cancel the promise after a timeout or custom action. What would be nice is if we can make the behavior more standard compliant to have less WTF's when working with it. Or maybe we can come up with a totally different solution which doesn't require extending Promise with cancellation. I'm open to ideas in this regard! About the unit tests for child_process vs worker_threads: that's a good point. Both should be tested thoroughly of course. It's currently a bit implicit. Maybe we can rewrite the relevant unit tests such that they are executed twice: once for 'process', and once for 'thread'. |
@boneskull I see this PR is still open. Do you expect to have time to finish it? |
Yes |
…ens no more than once
@josdejong I've sent an update, please take a look. |
Thanks for the updates Christopher, looks really solid now, will merge your improvements. |
Your fixes are published now in |
Pool#_removeWorker()
is now asyncsubprocess.kill()
for worker threads (returns a boolean)subprocess.killed
for worker threads (a boolean)I spent quite a bit of time with this one, as I was struggling a bit with the builtin
Promise
implementation. Hopefully there are no unintended consequences here 😄Ref: #32
Ref: #175