WIP: cluster: make scheduler configurable #11546
  const { sendHelper } = require('internal/cluster/utils');
- const getOwnPropertyNames = Object.getOwnPropertyNames;
+ const { create, getOwnPropertyNames } = Object;
not really sure I'll ever get used to that one
The variable name aliasing for this style is even harder to remember/get used to.
}
shutdown() {
  for (var handle; handle = this.handles.shift(); handle.close())
    ;
possibly neater/more readable as
var handle;
while (handle = this.handles.shift()) {
handle.close();
}
linter might complain less too?
This class-based approach seems way overkill. Why is plugging in a custom schedule function not sufficient?
I agree, and I think that it should be sufficient (see "Remaining questions" in the PR description). It really just comes down to whether or not people who do this need to define their own data structures for managing the workers and handles, or if we can standardize on the current data structures.
Ping @Yemanu and @redonkulus. I'd like your feedback before dropping the class-based approach.
@cjihrig, we still want to use all functionality provided by the RR scheduler, but we want to customize the schedule(callback) method. It looks like RR is no longer visible to the public API, right?
@yemanett all of the functionality will still be available, but you'll be responsible for scheduling connections to the worker yourself. For RR, that's pretty simple: you grab the first worker out of an array, then put that worker in the back of the array. My question is whether or not you need special hooks for managing the data structures for the connection queue and workers. My guess is no.
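The round-robin step described in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: `roundRobin` and the plain `{ id }` worker objects are hypothetical names, not code from this PR or from the cluster module.

```javascript
'use strict';

// Hypothetical helper (not the PR's code): pick the next worker
// round-robin style from a plain array of workers.
function roundRobin(workers) {
  if (workers.length === 0) return undefined;
  const worker = workers.shift(); // grab the first worker out of the array
  workers.push(worker);           // put that worker at the back of the array
  return worker;
}

// Stand-in worker objects; real cluster workers carry much more state.
const pool = [{ id: 1 }, { id: 2 }, { id: 3 }];
const picks = [];
for (let i = 0; i < 4; i++) picks.push(roundRobin(pool).id);
console.log(picks); // [ 1, 2, 3, 1 ]
```

The sketch rotates an array that it owns. It says nothing about whether an array handed to a user-supplied scheduler may be rotated in the same way.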
@cjihrig sorry I missed line https://github.com/cjihrig/node-1/blob/c47fe62a1d53959a12c741731608b7efaf5856e1/lib/internal/cluster/master.js#L300 which led me to my previous incorrect comment. Either way should be fine, but the class-based approach feels more intuitive. In that case we need just addWorker(), removeWorker(), and schedule().
@yemanett may I ask how you plan to use the If I recall correctly, your original use case was to take workers offline for some amount of time. In that scenario, I would recommend that the child process sends an IPC message to the cluster master saying that it needs to go offline. The master can then take that information into account when calling
@cjihrig Basically, we are planning to use it with our application runner module, which manages the node processes, including re-spawning a new worker when a child process dies. The application runner requires cluster. The runner loads the scheduler script in the master process. The scheduler takes a process out of rotation based on its algorithm by calling the removeWorker(w) method, which removes w from the ‘this.free’ list and notifies the worker via an IPC message. The worker can then perform some tasks and notify the master via IPC to put it back in rotation, in which case the master calls addWorker(w) to add w back into the this.free list.
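The out-of-rotation idea discussed here could look roughly like the sketch below. Everything in it is an assumption made for illustration: `outOfRotation`, `setRotation`, and `schedule` are hypothetical names, the workers are plain objects, and the IPC exchange is replaced by direct calls so the example runs on its own.

```javascript
'use strict';

// Hypothetical sketch: round-robin that skips workers marked out of rotation.
// In a real setup the master might call setRotation() from a handler for
// messages sent by the worker; here it is called directly.
const outOfRotation = new Set();

function setRotation(workerId, inRotation) {
  if (inRotation) outOfRotation.delete(workerId);
  else outOfRotation.add(workerId);
}

function schedule(workers) {
  // Rotate at most once through the list, returning the first eligible worker.
  for (let i = 0; i < workers.length; i++) {
    const worker = workers.shift();
    workers.push(worker);
    if (!outOfRotation.has(worker.id)) return worker;
  }
  return undefined; // every worker is out of rotation
}

const workers = [{ id: 1 }, { id: 2 }, { id: 3 }];
setRotation(2, false); // worker 2 asks to go offline
const ids = [schedule(workers).id, schedule(workers).id, schedule(workers).id];
setRotation(2, true);  // worker 2 is back in rotation
ids.push(schedule(workers).id);
console.log(ids); // [ 1, 3, 1, 2 ]
```

Keeping the offline state in a separate set means the worker list itself never loses entries, so a worker that comes back keeps its place in the rotation.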
That's not how
Based on this, it still seems like Also not how
@cjihrig yes schedule is what we need, but currently it takes worker and handle, how do we get the handle?
What do you need the handle for? That said,
Our use case is simple: take a worker out of rotation (OOR) so that the master won't distribute new incoming requests to it while it is OOR. We still want to continue to use the RR scheduler; we just need to add the ability to take a worker offline. The sequence diagram looks something like this
@bnoordhuis do you have any preference between:
I'd say option 3, probably with a way to opt in so people with no need for the handle don't have to pay the overhead.
b20bf08 to 63ec46c
@bnoordhuis I've gotten rid of the scheduler class. You would just have to pass in a scheduler function now. You can request that the socket be passed in by adding a flag to the scheduler function. Before I go about writing docs and tests, what do you think?
Looks like an alright approach to me.
socket = new net.Socket({
  handle,
  readable: false,
  writable: false,
readable = writable = false. Is that intentional?
I figured that the socket should only be used to determine where to pass the handle, and not actually read or written.
e36a038 to 6a01cda
@cjihrig, the scheduler works nicely for me. The changes look good to me. Thanks,
@bnoordhuis updated.
I went with
LGTM although the wording in the documentation about not modifying the array could still be stronger.
@bnoordhuis do you have any specific wording in mind? I'm happy to try to scare users.
"The array should under no condition be mutated"?
OK, I'll update to that, although we do mutate the array in the round robin scheduler.
Yes, but we are allowed to break our own rules.
Updated to the suggested wording. CI: https://ci.nodejs.org/job/node-test-pull-request/7265/. Of course Windows wouldn't work.
The following test is failing: test/windows-fanned
@cjihrig - do you know why test/windows-fanned is failing? The 'Detail' link returns status 502.
@yemanett the CI results are only kept around temporarily. It looks like those ones are no longer available.
@cjihrig - when will this be merged?
@yemanett It needs a rebase on the documentation, and some work to make the CI pass on Windows.
@cjihrig are you still actively working on this PR? If so, when do you think it can be completed?
@redonkulus still working on it, but if you wanted to debug the test on Windows, I wouldn't try to stop you.
@cjihrig do you think the PR issue will be resolved and merged soon?
Labeling
Someone else can take this over if they want.
This is a WIP for making the cluster scheduler configurable by users. This still needs docs, tests, etc.
In the current implementation, the cluster module exports a Scheduler class. The class provides the following methods/hooks:
constructor() - Used to configure data structures for managing connections (handles) and cluster workers.
addConnection() - Called when a new connection is received. This is where the user would store the incoming connection handle to be distributed later.
addWorker() - Called when a new cluster worker is added. This is where the user would add the worker into the pool of workers.
removeWorker() - Called when a cluster worker is being removed from the pool.
shutdown() - Called when the scheduler is finished working. This would be used to clean up any lingering resources such as handles.
schedule() - This is the scheduling algorithm. The output of the algorithm should be a connection handle to distribute, and the worker to distribute it to.
I tried to minimize the exposure of the cluster's inner workings to incoming connection handles. Worker objects are already part of the public API.
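The hooks listed above might fit together as in the following round-robin sketch. It is not the class from this PR: the method signatures, the `{ handle, worker }` return shape, and the stub handles with a `close()` method are all assumptions made so the example can run on its own.

```javascript
'use strict';

// Hypothetical scheduler built on the hooks described above.
// Signatures and the return shape of schedule() are assumptions.
class RoundRobinScheduler {
  constructor() {
    this.handles = []; // pending connection handles
    this.free = [];    // workers available for scheduling
  }

  addConnection(handle) { this.handles.push(handle); }

  addWorker(worker) { this.free.push(worker); }

  removeWorker(worker) {
    const index = this.free.indexOf(worker);
    if (index !== -1) this.free.splice(index, 1);
  }

  shutdown() {
    // Close any handles that were never distributed.
    let handle;
    while ((handle = this.handles.shift()) !== undefined) handle.close();
  }

  schedule() {
    if (this.handles.length === 0 || this.free.length === 0) return null;
    const worker = this.free.shift();
    this.free.push(worker); // rotate the worker to the back
    return { handle: this.handles.shift(), worker };
  }
}

// Usage with stub handles that record when they are closed.
const closed = [];
const makeHandle = (name) => ({ name, close() { closed.push(name); } });
const scheduler = new RoundRobinScheduler();
scheduler.addWorker({ id: 1 });
scheduler.addWorker({ id: 2 });
['a', 'b', 'c'].forEach((name) => scheduler.addConnection(makeHandle(name)));
const first = scheduler.schedule();  // handle 'a' goes to worker 1
const second = scheduler.schedule(); // handle 'b' goes to worker 2
scheduler.shutdown();                // closes the remaining handle 'c'
```

Compared with a bare schedule function, the class owns both queues, which is the extra flexibility (and extra surface area) debated in the review comments.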
Remaining questions:
addConnection() hook.
addWorker() and removeWorker() hooks.
Scheduler class, and just provide a schedule() function that takes the available workers and handles as input, and schedules accordingly. Ideally, I'd like to go this simpler route, but want to check with some people who are actually doing this in practice to make sure it would be flexible enough.
cc: @Yemanu and @redonkulus from RFC cluster: make scheduler pluggable #10880. Would this approach work for your needs?
Closes #10880
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
cluster