Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pool#terminate() waits until subprocesses are dead to fulfill #176

Merged
merged 2 commits into from Oct 3, 2020

Conversation

boneskull
Copy link
Contributor

  • Pool#_removeWorker() is now async
  • ensure task resolver handles cancellations & timeouts in sequence
  • mock behavior of subprocess.kill() for worker threads (returns a boolean)
  • mock subprocess.killed for worker threads (a boolean)
  • many changes to tests to support async nature of terminate
  • some tests changed to use native Promise-handling functionality of Mocha (where it made sense)

I spent quite a bit of time with this one, as I was struggling a bit with the builtin Promise implementation. Hopefully there are no unintended consequences here 😄

Ref: #32
Ref: #175

- `Pool#_removeWorker()` is now async
- ensure task resolver handles cancellations & timeouts in sequence
- mock behavior of `subprocess.kill()` for worker threads (returns a boolean)
- mock `subprocess.killed` for worker threads (a boolean)
- many changes to tests to support async nature of terminate
- some tests changed to use native Promise-handling functionality of Mocha (where it made sense)
@boneskull
Copy link
Contributor Author

Of note, I had to remove a few assertions which no longer made sense. Pool#terminate() will kill the subprocesses then fulfill, but if a promise is canceled (or it times out, or the task fails, etc.), the subprocesses are not killed before rejection of the Promise returned by exec.

In other words, when catch-ing a canceled Promise, you are not guaranteed n workers within the Pool instance.

I don't particularly like how that works, but changing it would require a somewhat gnarly refactor, and I'm not sure it's even desired behavior.

For that reason, this PR is probably a breaking change.

@josdejong
Copy link
Owner

Thanks a lot Christopher, this looks good on first sight! I'll review your PR asap, I hope coming weekend.

@boneskull
Copy link
Contributor Author

I think what's worrisome is that the tests appear to mainly target worker threads instead of child processes, unless I missing something... would seem that some portion of the tests should be repeated across all supported worker implementations

src/Pool.js Outdated
this._removeWorkerFromList(worker);
return new Promise(function(resolve, reject) {
worker.terminate(false, function(err) {
me._removeWorkerFromList(worker);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new solution, termination is async, so there is a temporary situation where the worker is still in the worker list but is being terminated. Should we change this to remove the worker from the list first, and after that call .terminate?

Copy link
Contributor Author

@boneskull boneskull Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking me? 😄 I was keeping the series of instructions the same; terminate then remove.

However, the only place in the codebase that calls _removeWorker() removes it synchronously beforehand:

workerpool/src/Pool.js

Lines 211 to 214 in 57e6795

me._removeWorkerFromList(worker);
// If minWorkers set, spin up new workers to replace the crashed ones
me._ensureMinWorkers();
return me._removeWorker(worker);

It appears the call to _removeWorkerFromList() here is not needed unless you expect _removeWorker() to be called for some other reason.

// If minWorkers set, spin up new workers to replace the crashed ones
me._ensureMinWorkers();
return me._removeWorker(worker);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can change this to something like:

return me._removeWorker(worker).then(function () {
  // If minWorkers set, spin up new workers to replace the crashed ones
  me._ensureMinWorkers();
})

So we don't have to clear the worker from the list first? Having to do that feels a bit off, should ideally be only needed to call from within _removeWorker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I should probably read all the comments before replying. But yes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... no. I think that would have an additional negative affect on performance--you have to wait for termination before spinning up a new one. I think that just leaving this as-is and removing the call to _removeWorkerFromList in _removeWorker makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just moving it to before the Promise. Let's just do that...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense 👍

}
cleanup();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like cleanup() is called twice: one time right away, and one time on callback of this.worker.once('exit'. Should we split this in two functions? Maybe set this.terminating = true when starting to terminate, and this.terminated = true after termination finished?

Copy link
Contributor Author

@boneskull boneskull Sep 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not called twice (unless somehow the child process has already been killed, then emits an additional exit event. I have no idea if this is possible):

if (typeof this.worker.kill === 'function') {
// child process
// cleanup once the child process has exited
this.worker.once('exit', function() {
cleanup();
});
if (!this.worker.killed && !this.worker.kill()) {
cleanup(new Error('Failed to send SIGTERM to worker'));
}
return;

However, that should probably change to:

      if (typeof this.worker.kill === 'function') {
        // child process
        if (!this.worker.killed && !this.worker.kill()) {
          cleanup(new Error('Failed to send SIGTERM to worker'));
        } else {          
          // cleanup once the child process has exited
          this.worker.once('exit', function() {
            cleanup();
          });
        }
        return;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'm not entirely confident that just listening for the exit event is enough. Maybe a strategy like signal-exit would be more robust--if that's possible with a child process.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I think you're right

@josdejong
Copy link
Owner

Thanks a lot Christoper for all your work! It looks like a solid solution.

I've placed a couple of feedbacks inline in the code.

The custom Promise implementation is not ideal, sorry about that. It's needed in order to be able to cancel the promise after a timeout or custom action. What would be nice is if we can make the behavior more standard compliant to have less WTF's when working with it. Or maybe we can come up with a totally different solution which doesn't require extending Promise with cancellation. I'm open to ideas in this regard!

About the unit tests for child_process vs worker_threads: that's a good point. Both should be tested thoroughly of course. It's currently a bit implicit. Maybe we can rewrite the relevant unit tests such that they are executed twice: once for 'process', and once for 'thread'.

@josdejong
Copy link
Owner

@boneskull I see this PR is still open. Do you expect to have time to finish it?

@boneskull
Copy link
Contributor Author

Yes

@boneskull
Copy link
Contributor Author

@josdejong I've sent an update, please take a look.

@josdejong
Copy link
Owner

Thanks for the updates Christopher, looks really solid now, will merge your improvements.

@josdejong josdejong merged commit e5f2fb4 into josdejong:master Oct 3, 2020
@josdejong
Copy link
Owner

Your fixes are published now in v6.0.2 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants