cluster:RR - Added public API pausing/unpausing a worker #10369
Conversation
We have the following two use cases:

1. Run offline GC. We have a script that coordinates offline GC using the pause and unpause APIs: the master stops distributing new requests to a paused worker once we call cluster.pause(worker), the paused worker forces a full GC after draining all of its in-flight requests, and the worker resumes serving new requests after the master unpauses it with cluster.unpause(worker).

2. Respawn a worker after it has served X requests. The master should not distribute requests to the freshly spawned worker until it has primed its cache, which can take a few seconds, so we need to pause the worker while the cache is being primed.

This has enabled us to reduce long-tail latency. There has been prior discussion about how best to implement this feature (#7695).
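A minimal sketch of how the proposed API could drive the offline-GC use case. The cluster.pause()/cluster.unpause() calls are the API this PR proposes; the message names and the --expose-gc-based GC trigger are illustrative assumptions, not part of the patch:

```js
const cluster = require('cluster');
const http = require('http');

if (cluster.isMaster) {
  const worker = cluster.fork();

  worker.on('message', (msg) => {
    // Hypothetical message name: the worker reports that it has
    // drained its in-flight requests and finished a full GC.
    if (msg === 'gc-done') cluster.unpause(worker);
  });

  // Proposed API: stop routing new connections to this worker.
  setTimeout(() => {
    cluster.pause(worker);
    worker.send('drain-and-gc'); // hypothetical trigger message
  }, 60 * 1000);
} else {
  http.createServer((req, res) => {
    res.end('ok');
  }).listen(8000);

  process.on('message', (msg) => {
    if (msg === 'drain-and-gc') {
      // Illustrative only: a real worker would wait for in-flight
      // requests to drain before forcing GC (requires --expose-gc).
      if (global.gc) global.gc();
      process.send('gc-done');
    }
  });
}
```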
I'm not really a fan of this approach. I'd prefer allowing the user to define the scheduling policy, which I believe was discussed in #7695.
Colin - Thanks for your quick response. I think in the previous discussion we tried to avoid exposing the RR handler to the public API. I was thinking an explicit pause/unpause API makes it more useful for various use cases. How can a user-defined scheduling policy be used without exposing the RR handler to the public API?
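For illustration only, here is one hypothetical shape a user-defined scheduling policy could take, under which pause/unpause reduces to filtering. cluster.setScheduler() does not exist, and nothing like it was agreed on in #7695:

```js
const cluster = require('cluster');

const paused = new Set();
let next = 0;

// Hypothetical hook: the master would call this function to pick
// the worker that should receive the next incoming connection.
cluster.setScheduler((workers) => {
  // Round-robin over the workers that are not paused.
  const eligible = workers.filter((w) => !paused.has(w.id));
  return eligible[next++ % eligible.length];
});

// pause/unpause then become simple set operations:
const pause = (worker) => paused.add(worker.id);
const unpause = (worker) => paused.delete(worker.id);
```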
@cjihrig FYI: we (at Yahoo) also tried a different approach using sockets, https://github.com/bengl/toor, but it doesn't work if the keepAlive option is on.
/cc @bengl
@Yemanu I opened an issue on
@bengl one of the issues with toor is that http-shutdown closes all idle keepAlive connections, so it causes additional connection overhead.
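For context, a rough sketch of why a socket-level approach interacts badly with keep-alive; this is illustrative and not toor's actual implementation:

```js
const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(8000);

function pauseViaSocket() {
  // server.close() only stops the worker from accepting *new*
  // connections. Established keep-alive sockets stay open, so
  // clients can still send new requests to the "paused" worker.
  server.close();

  // Destroying the idle keep-alive sockets (as http-shutdown does)
  // truly stops traffic, but forces clients to reconnect later,
  // which is the extra connection overhead mentioned above.
}
```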
Toor also uses the shimmer module, and we need to fully understand the implications of using shimmer like that for anything other than debugging :)
Any updates on this one?
This has been superseded by #11546. I'll close it out.