Pool#terminate() waits until subprocesses are dead to fulfill #176

boneskull · 2020-08-11T01:00:08Z

Pool#_removeWorker() is now async
ensure task resolver handles cancellations & timeouts in sequence
mock behavior of subprocess.kill() for worker threads (returns a boolean)
mock subprocess.killed for worker threads (a boolean)
many changes to tests to support async nature of terminate
some tests changed to use native Promise-handling functionality of Mocha (where it made sense)

I spent quite a bit of time with this one, as I was struggling a bit with the builtin Promise implementation. Hopefully there are no unintended consequences here 😄

Ref: #32
Ref: #175

- `Pool#_removeWorker()` is now async - ensure task resolver handles cancellations & timeouts in sequence - mock behavior of `subprocess.kill()` for worker threads (returns a boolean) - mock `subprocess.killed` for worker threads (a boolean) - many changes to tests to support async nature of terminate - some tests changed to use native Promise-handling functionality of Mocha (where it made sense)

boneskull · 2020-08-11T01:06:18Z

Of note, I had to remove a few assertions which no longer made sense. Pool#terminate() will kill the subprocesses then fulfill, but if a promise is canceled (or it times out, or the task fails, etc.), the subprocesses are not killed before rejection of the Promise returned by exec.

In other words, when catch-ing a canceled Promise, you are not guaranteed n workers within the Pool instance.

I don't particularly like how that works, but changing it would require a somewhat gnarly refactor, and I'm not sure it's even desired behavior.

For that reason, this PR is probably a breaking change.

josdejong · 2020-08-12T19:47:23Z

Thanks a lot Christopher, this looks good on first sight! I'll review your PR asap, I hope coming weekend.

boneskull · 2020-08-12T20:30:55Z

I think what's worrisome is that the tests appear to mainly target worker threads instead of child processes, unless I missing something... would seem that some portion of the tests should be repeated across all supported worker implementations

josdejong · 2020-08-19T12:06:21Z

src/Pool.js

-  this._removeWorkerFromList(worker);
+  return new Promise(function(resolve, reject) {
+    worker.terminate(false, function(err) {
+      me._removeWorkerFromList(worker);


With the new solution, termination is async, so there is a temporary situation where the worker is still in the worker list but is being terminated. Should we change this to remove the worker from the list first, and after that call .terminate?

Are you asking me? 😄 I was keeping the series of instructions the same; terminate then remove.

However, the only place in the codebase that calls _removeWorker() removes it synchronously beforehand:

workerpool/src/Pool.js

Lines 211 to 214 in 57e6795

me._removeWorkerFromList(worker);

// If minWorkers set, spin up new workers to replace the crashed ones

me._ensureMinWorkers();

return me._removeWorker(worker);

It appears the call to _removeWorkerFromList() here is not needed unless you expect _removeWorker() to be called for some other reason.

josdejong · 2020-08-19T12:08:21Z

src/Pool.js

              // If minWorkers set, spin up new workers to replace the crashed ones
              me._ensureMinWorkers();
+              return me._removeWorker(worker);


Maybe we can change this to something like:

return me._removeWorker(worker).then(function () { // If minWorkers set, spin up new workers to replace the crashed ones me._ensureMinWorkers(); })

So we don't have to clear the worker from the list first? Having to do that feels a bit off, should ideally be only needed to call from within _removeWorker.

OK, I should probably read all the comments before replying. But yes.

Actually... no. I think that would have an additional negative affect on performance--you have to wait for termination before spinning up a new one. I think that just leaving this as-is and removing the call to _removeWorkerFromList in _removeWorker makes more sense.

Or just moving it to before the Promise. Let's just do that...

This makes sense 👍

josdejong · 2020-08-19T12:13:51Z

src/WorkerHandler.js

    }
+    cleanup();


It looks like cleanup() is called twice: one time right away, and one time on callback of this.worker.once('exit'. Should we split this in two functions? Maybe set this.terminating = true when starting to terminate, and this.terminated = true after termination finished?

It's not called twice (unless somehow the child process has already been killed, then emits an additional exit event. I have no idea if this is possible):

workerpool/src/WorkerHandler.js

Lines 392 to 401 in 57e6795

if (typeof this.worker.kill === 'function') {

// child process

// cleanup once the child process has exited

this.worker.once('exit', function() {

cleanup();

});

if (!this.worker.killed && !this.worker.kill()) {

cleanup(new Error('Failed to send SIGTERM to worker'));

}

return;

However, that should probably change to:

if (typeof this.worker.kill === 'function') { // child process if (!this.worker.killed && !this.worker.kill()) { cleanup(new Error('Failed to send SIGTERM to worker')); } else { // cleanup once the child process has exited this.worker.once('exit', function() { cleanup(); }); } return;

FWIW I'm not entirely confident that just listening for the exit event is enough. Maybe a strategy like signal-exit would be more robust--if that's possible with a child process.

👍 I think you're right

josdejong · 2020-08-19T12:20:14Z

Thanks a lot Christoper for all your work! It looks like a solid solution.

I've placed a couple of feedbacks inline in the code.

The custom Promise implementation is not ideal, sorry about that. It's needed in order to be able to cancel the promise after a timeout or custom action. What would be nice is if we can make the behavior more standard compliant to have less WTF's when working with it. Or maybe we can come up with a totally different solution which doesn't require extending Promise with cancellation. I'm open to ideas in this regard!

About the unit tests for child_process vs worker_threads: that's a good point. Both should be tested thoroughly of course. It's currently a bit implicit. Maybe we can rewrite the relevant unit tests such that they are executed twice: once for 'process', and once for 'thread'.

josdejong · 2020-09-23T14:05:09Z

@boneskull I see this PR is still open. Do you expect to have time to finish it?

boneskull · 2020-09-25T22:16:00Z

Yes

…ens no more than once

boneskull · 2020-09-25T22:50:34Z

@josdejong I've sent an update, please take a look.

josdejong · 2020-10-03T09:33:29Z

Thanks for the updates Christopher, looks really solid now, will merge your improvements.

josdejong · 2020-10-03T10:52:08Z

Your fixes are published now in v6.0.2 🎉

josdejong reviewed Aug 19, 2020

View reviewed changes

improve worker removal logic & ensure worker termination cleanup happ…

4d4da78

…ens no more than once

boneskull mentioned this pull request Oct 2, 2020

Broken IntelliJ integration when tests are running in parallel mochajs/mocha#4403

Closed

4 tasks

josdejong merged commit e5f2fb4 into josdejong:master Oct 3, 2020

This was referenced Oct 3, 2020

Better worker termination #32

Closed

workerpool does not wait for child processes to terminate #175

Closed

boneskull mentioned this pull request Oct 5, 2020

make guarantees about orphaned processes mochajs/mocha#4465

Merged

boneskull mentioned this pull request Oct 16, 2020

nyc reports no coverage in parallel mode in powershell/cmd mochajs/mocha#4372

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pool#terminate() waits until subprocesses are dead to fulfill #176

Pool#terminate() waits until subprocesses are dead to fulfill #176

boneskull commented Aug 11, 2020

boneskull commented Aug 11, 2020

josdejong commented Aug 12, 2020

boneskull commented Aug 12, 2020

josdejong Aug 19, 2020

boneskull Sep 25, 2020 •

edited

josdejong Aug 19, 2020

boneskull Sep 25, 2020

boneskull Sep 25, 2020

boneskull Sep 25, 2020

josdejong Oct 3, 2020

josdejong Aug 19, 2020

boneskull Sep 25, 2020 •

edited

boneskull Sep 25, 2020

josdejong Oct 3, 2020

josdejong commented Aug 19, 2020

josdejong commented Sep 23, 2020

boneskull commented Sep 25, 2020

boneskull commented Sep 25, 2020

josdejong commented Oct 3, 2020

josdejong commented Oct 3, 2020

	me._removeWorkerFromList(worker);
	// If minWorkers set, spin up new workers to replace the crashed ones
	me._ensureMinWorkers();
	return me._removeWorker(worker);

	if (typeof this.worker.kill === 'function') {
	// child process
	// cleanup once the child process has exited
	this.worker.once('exit', function() {
	cleanup();
	});
	if (!this.worker.killed && !this.worker.kill()) {
	cleanup(new Error('Failed to send SIGTERM to worker'));
	}
	return;

Pool#terminate() waits until subprocesses are dead to fulfill #176

Pool#terminate() waits until subprocesses are dead to fulfill #176

Conversation

boneskull commented Aug 11, 2020

boneskull commented Aug 11, 2020

josdejong commented Aug 12, 2020

boneskull commented Aug 12, 2020

Choose a reason for hiding this comment

boneskull Sep 25, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

boneskull Sep 25, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

josdejong commented Aug 19, 2020

josdejong commented Sep 23, 2020

boneskull commented Sep 25, 2020

boneskull commented Sep 25, 2020

josdejong commented Oct 3, 2020

josdejong commented Oct 3, 2020

boneskull Sep 25, 2020 •

edited

boneskull Sep 25, 2020 •

edited