cluster:RR - Added public API pausing/unpausing a worker #10369

Closed
wants to merge 2 commits into from

Conversation

Yemanu

@Yemanu Yemanu commented Dec 21, 2016

cluster:RR - Added public API pausing/unpausing a worker

We have the following two use-cases:

  1. Run offline GC. We have a script that coordinates offline GC using the pause and unpause API. Once cluster.pause(worker) is called, the master process stops distributing new requests to that worker. The paused worker can then force a full GC after draining all of its in-flight requests. Once it is done, the master calls cluster.unpause(worker) and the worker resumes serving new requests.

  2. Respawn a worker after it has served X requests. The master should not distribute requests to the new worker until it has primed its cache, which may take a few seconds, so we need to pause the worker while its cache is being primed.

This has enabled us to reduce long-tail latency.

There has been discussion about how best to implement this feature (#7695).
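For readers following along, the pause/unpause behaviour described in the two use cases can be sketched as a small standalone simulation. This is only an illustration under stated assumptions: `PausableRoundRobin` and `demo` are hypothetical names, the class is a stand-in for the master's round-robin handler rather than Node's internal code, and `cluster.pause` / `cluster.unpause` are the API proposed in this PR, not an existing one.

```javascript
// Hypothetical stand-in for the master's round-robin distribution.
// It does not use the cluster module; it only models the selection logic.
class PausableRoundRobin {
  constructor() {
    this.free = [];          // worker ids in rotation order
    this.paused = new Set(); // worker ids the master must skip
  }

  add(id) {
    this.free.push(id);
  }

  // Models the proposed cluster.pause(worker): stop handing out new work.
  pause(id) {
    this.paused.add(id);
  }

  // Models the proposed cluster.unpause(worker): resume handing out work.
  unpause(id) {
    this.paused.delete(id);
  }

  // Pick the next unpaused worker, rotating each inspected worker to the
  // back of the queue. Returns undefined when every worker is paused.
  next() {
    for (let i = 0; i < this.free.length; i++) {
      const id = this.free.shift();
      this.free.push(id);
      if (!this.paused.has(id)) return id;
    }
    return undefined;
  }
}

// Walk through use case 1: w3 is paused (for example, to drain and run a
// full GC), skipped while paused, and rejoins the rotation once unpaused.
function demo() {
  const rr = new PausableRoundRobin();
  ['w1', 'w2', 'w3'].forEach((id) => rr.add(id));

  const picks = [rr.next(), rr.next()];
  rr.pause('w3');
  picks.push(rr.next(), rr.next());
  rr.unpause('w3');
  picks.push(rr.next(), rr.next());
  return picks;
}
```

Use case 2 follows the same pattern: a freshly spawned worker would be added in the paused state and unpaused once it reports that its cache is primed.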

Yemanu and others added 2 commits December 21, 2016 00:20
…er not to distribute new requests to a paused worker until unpaused by setting cluster.unpause(worker)
@nodejs-github-bot nodejs-github-bot added cluster Issues and PRs related to the cluster subsystem. lts-watch-v6.x labels Dec 21, 2016
@cjihrig
Contributor

cjihrig commented Dec 21, 2016

I'm not really a fan of this approach. I'd prefer allowing the user to define the scheduling policy, which I believe was discussed in #7695.
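As a rough illustration of what a user-defined scheduling policy could look like, here is a minimal standalone sketch. It is not taken from #7695 or from Node's cluster module: `createScheduler`, `leastBusy`, and the `inFlight` / `draining` fields are hypothetical names chosen for this example, and the policy is assumed to be a plain function that receives the worker list and returns the worker to use.

```javascript
// Hypothetical scheduler that delegates worker selection to a user callback.
function createScheduler(policy) {
  const workers = [];
  return {
    add(worker) {
      workers.push(worker);
    },
    // The policy receives a shallow copy of the worker list and returns the
    // chosen worker, or undefined to hold the connection for now.
    dispatch() {
      const chosen = policy(workers.slice());
      if (chosen) chosen.inFlight += 1;
      return chosen;
    },
  };
}

// Example policy: skip workers flagged as draining and pick the one with
// the fewest in-flight requests. Ties go to the earliest worker in the list.
function leastBusy(workers) {
  return workers
    .filter((w) => !w.draining)
    .reduce(
      (best, w) => (best === undefined || w.inFlight < best.inFlight ? w : best),
      undefined
    );
}
```

With a hook of this shape, pausing a worker becomes one policy among many: the policy simply excludes any worker it considers unavailable.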

@Yemanu
Author

Yemanu commented Dec 21, 2016

Colin - Thanks for your quick response.

I think in the previous discussion we tried to avoid exposing the RR handler in the public API. My thinking was that an explicit pause/unpause API would be more useful across various use cases.

How can a user-defined scheduling policy be used without exposing the RR handler in the public API?

@Yemanu
Author

Yemanu commented Dec 21, 2016

@cjihrig FYI: we (at Yahoo) also tried a different approach using sockets, https://github.com/bengl/toor, but it doesn't work if the keepAlive option is on.

@Trott
Member

Trott commented Dec 21, 2016

/cc @bengl

@bengl
Member

bengl commented Dec 21, 2016

@Yemanu I opened an issue on bengl/toor. Can you please add further detail there about the keep-alive issue?

@Yemanu
Author

Yemanu commented Dec 22, 2016

@bengl one of the issues with toor is that http-shutdown closes all idle keepAlive connections, which causes additional connection overhead.

@Yemanu
Author

Yemanu commented Dec 22, 2016

Toor also uses the shimmer module, and we need to fully understand the implications of using shimmer like that for anything other than debugging :)

@jasnell
Member

jasnell commented Mar 24, 2017

Updates on this one?

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Mar 24, 2017
@bnoordhuis
Member

This has been superseded by #11546. I'll close it out.

@bnoordhuis bnoordhuis closed this Mar 26, 2017
This pull request was closed.
Labels
cluster Issues and PRs related to the cluster subsystem. stalled Issues and PRs that are stalled.
8 participants