
cluster: implement API for pluggable distribute() for round-robin scheduler #6001

Closed
kaero opened this issue Aug 6, 2013 · 6 comments

kaero commented Aug 6, 2013

It looks useful to provide an API in the cluster module that lets users define their own distribute method.

Something like the following:

cluster.setupMaster({
  schedulingPolicy: cluster.SCHED_RR,
  distributeRequest: function() {
    // ...
    return worker;
  }
});

There is a previous discussion with @bnoordhuis in issue #4435 (after it was closed).
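
For illustration, a user-supplied callback under this hypothetical API might look something like the sketch below. Note that distributeRequest is only the proposed hook, not an existing cluster API, so this shows the intended shape rather than something that works today:

var cluster = require('cluster');

var nextIndex = 0;

cluster.setupMaster({
  schedulingPolicy: cluster.SCHED_RR,
  // Hypothetical hook: called by the master for each incoming connection;
  // it must return the worker that should handle it.
  distributeRequest: function() {
    // Rotate over whichever workers the master currently knows about.
    var ids = Object.keys(cluster.workers);
    var worker = cluster.workers[ids[nextIndex++ % ids.length]];
    return worker;
  }
});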

kaero commented Aug 6, 2013

\cc @indutny

indutny was assigned Aug 6, 2013

indutny commented Aug 6, 2013

Assigning to me.

bnoordhuis commented

For posterity, I haven't decided yet if making distribute() configurable is actually a good thing.

There are a number of corner cases that an implementer will need to deal with, like workers coming online/going offline.

There's also the fact that distribute() is currently an implementation detail. Making it configurable effectively means making it public and freezing it for all eternity. I don't know if I'm ready to commit to that.
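
To make those corner cases concrete: a pluggable balancer would, at minimum, have to maintain its own view of which workers can currently accept connections. A rough sketch of that bookkeeping, using only events the cluster module already emits in the master, might look like this:

var cluster = require('cluster');

// Pool of workers that are currently able to accept connections.
var pool = [];

cluster.on('online', function(worker) {
  pool.push(worker);
});

cluster.on('disconnect', function(worker) {
  // The worker is shutting down; stop routing new connections to it.
  pool = pool.filter(function(w) { return w.id !== worker.id; });
});

cluster.on('exit', function(worker) {
  pool = pool.filter(function(w) { return w.id !== worker.id; });
});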

indutny added a commit to indutny/node that referenced this issue Aug 6, 2013
`distribute` is a user-specified callback for asynchronous balancing of
incoming connections to cluster workers.

fix nodejs#6001

andrewdep commented

I understand your concerns about making distribute() configurable.
I'll explain our clustering need. We would definitely be willing to deal with the complexity of implementation if distribute() were configurable. We would also be fine with distribute() remaining unconfigurable, as long as there is some way to either choose an option similar to what I describe below, or at least contribute this as another policy.
We have a wide variety of server side functions available in our API that can all be called within the context of a user session. Some of these functions are CPU intensive and can end up blocking a worker for a bit.
In addition, most of these functions make heavy use of session state. That session state is stored in memcache, but could also be cached locally in the worker as a sort of super-fast "L1" cache with this sort of distribute policy.
We would love to keep particular client sessions sort of sticky to a particular Node worker in order to minimize memcache access. That is, we would route all of a client's requests to the same worker, but switch to another worker if the "sort of sticky" worker is busy or under load at the moment of the API request. In that case the new worker would have to hit memcache if it hasn't cached that user's session or its copy is out of sync (which it most likely will be). Performance can be further boosted if the secondary worker for a particular user is consistent (whenever the first worker is busy or under load, go to the same secondary worker; whenever that is busy or under load, go to the same third worker).
In testing we've found this arrangement can give us enormous boosts in scalability. We've experimented with routing between the master and workers manually, but the overhead of doing this without direct native access to the socket and event loop is quite a bit larger than Node's built-in clustering.
If distribute() gave us even slightly more access to the underlying mechanisms for routing requests to workers then I believe much of that overhead could be eliminated.
One unique (and complicating) aspect of this policy is the introduction of some sort of state or argument to the distribution algorithm. It would have to know the user/client of a particular request and also know how to map that consistently to the first, second, third, etc worker. We used a hashing algorithm in our tests, but just getting access to the identifier of the request is an interesting complication.
Anyway, food for thought.
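
For illustration, here is a rough sketch of the "sticky with consistent fallback" selection described above. It assumes the master is handed some per-request client identifier and can tell whether a worker is under load; pickWorker and isBusy are hypothetical helpers, not part of any existing API:

var crypto = require('crypto');

// clientId: string identifying the user/session (hypothetical; how the
//           master obtains it is exactly the complication mentioned above).
// workers:  array of live cluster workers.
// isBusy:   hypothetical predicate reporting whether a worker is under load.
function pickWorker(clientId, workers, isBusy) {
  var hex = crypto.createHash('md5').update(String(clientId)).digest('hex');
  var start = parseInt(hex.slice(0, 8), 16) % workers.length;

  // Try the primary worker for this client, then a consistent secondary,
  // tertiary, and so on.
  for (var i = 0; i < workers.length; i++) {
    var candidate = workers[(start + i) % workers.length];
    if (!isBusy(candidate)) {
      return candidate;
    }
  }
  return workers[start]; // everyone is busy; fall back to the primary
}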

defunctzombie commented

@indutny @bnoordhuis One thing this would help with is users who want to use servers which require "sticky session"-like functionality. We have this issue with socket.io users who want to use the cluster module. Having access to a distribute method would allow us to support that use case.

While it may be a bit "dangerous" to expose this given certain edge cases, not exposing it, or some other way to control where requests are routed, makes cluster unusable for certain classes of modules, which confuses users who expect things shipped with core to work pretty transparently (even though the docs for cluster do indeed call out the sticky-session limitations).

Fishrock123 commented

@defunctzombie +1
