Disadvantages of using cluster api without any listening sockets #970

Closed
pimlie opened this Issue Nov 14, 2017 · 4 comments

pimlie commented Nov 14, 2017

I was wondering if there are any major disadvantages of using the Cluster api to parallelise cpu-bound tasks that don't use/require any listening sockets?

It seems that at the moment the Cluster api is the only built-in api that can be used for general master/worker setups. E.g., it provides all the functionality needed to create a master process that delegates jobs to workers, regardless of whether those workers use listening sockets or not. But the documentation only talks about using the cluster module for clustering connections.

To give a real-life example, I am currently using Cluster in the nuxt-generate-cluster package. This package adds a cli command to nuxt that generates all dynamic pages using multiple processes. That works great, but it still feels like using the Cluster api is wrong because the cli command just runs once and doesn't listen for incoming connections.
That said, I looked at implementing a master/worker solution myself using child_process directly, but I don't see any benefit in that other than giving me extra work because of the bugs I might introduce.

So what is the general opinion about this? Should you only use the Cluster api when using listening sockets? Or is it ok to use it without any?

Follow-up questions:

  • If it is 👍
    Should the docs reflect that it is ok as well?
  • If it is 👎
    Would it be an idea to split the master/worker implementation from Cluster into a separate api which Cluster itself will extend as well?

Member

bnoordhuis commented Nov 14, 2017

Seems fine to me, no reason you couldn't or shouldn't use it that way.

As to whether it should be called out in the documentation, I'm leaning towards no. Documentation should be as simple and straightforward as possible. Docs that wander all over the place are terrible for getting things done.

Cluster's primary use case is networking because that's one of the biggest if not the biggest use case for node, so IMO it's proper that networking is what the documentation talks about.

Reasonable / unreasonable?


pimlie commented Nov 14, 2017

Thanks for your quick response. I looked at the cluster implementation but unfortunately wasn't really sure how child.js and a worker relate to each other. I guess child.js is used somewhere deeper within node. But is it correct there is no (or no major) overhead due to the networking part as that is only introduced once you really start listening for a socket? Afaik that overhead is only introduced once a worker has called listen() on the http/net api which will trigger the queryServer method?

I agree that documentation should be as simple as possible, but on the other hand you could say the docs are a bit ambiguous, which is also not good. E.g. all the examples in the docs indeed only cover Cluster's primary use for networking, and even the name Cluster screams 'just for networking' much more than not. I would think that if you can use Cluster networking-less without any overhead/disadvantages, a configuration option like schedulingPolicy would at least mention that it only applies when you actually use networking?

Maybe just a small note would be enough under How it works? Something like, Note: although the primary use for the cluster module is networking, this module can also be used without additional/major(?) overhead for a generic networking-less master/worker implementation?

Just because what's trivial for one (probably you?) may be inconclusive for another (I guess me) 😉


Member

bnoordhuis commented Nov 14, 2017

> But is it correct there is no (or no major) overhead due to the networking part as that is only introduced once you really start listening for a socket?

Correct. It's at its core just a wrapper around child_process with some builtin smarts for sending sockets across processes.

> Maybe just a small note would be enough under How it works?

I can't promise it gets accepted but you're welcome to try a pull request.


pimlie commented Nov 14, 2017

Great, thanks for the support!

pimlie closed this Nov 14, 2017

pimlie added a commit to pimlie/node that referenced this issue Nov 14, 2017

docs: add note about using cluster networking-less
Although the primary use-case for the cluster module is networking, the
module provides a generic master/worker interface that could also be used
if you don't use networking at all. Currently the docs are a bit ambiguous
about this, as only the primary use-case is ever mentioned. This remark
should clarify that the cluster module can also be used without
disadvantages if you don't use networking.

Refs: nodejs/help#970

pimlie added a commit to pimlie/node that referenced this issue Nov 14, 2017

docs: add note about using cluster without networking

addaleax added a commit to nodejs/node that referenced this issue Nov 18, 2017

doc: add note about using cluster without networking
Although the primary use-case for the cluster module is networking, the
module provides a generic master/worker interface that could also be
used if you don't use networking at all. Currently the docs are a bit
ambiguous about this, as only the primary use-case is ever mentioned.
This remark should clarify that the cluster module can also be used
without disadvantages if you don't use networking.

PR-URL: #17031
Refs: nodejs/help#970
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>

MylesBorins added a commit to nodejs/node that referenced this issue Dec 12, 2017

doc: add note about using cluster without networking

gibfahn added a commit to nodejs/node that referenced this issue Dec 19, 2017

doc: add note about using cluster without networking

MylesBorins added a commit to nodejs/node that referenced this issue Dec 19, 2017

doc: add note about using cluster without networking

gibfahn added a commit to nodejs/node that referenced this issue Dec 20, 2017

doc: add note about using cluster without networking

msoechting added a commit to hpicgs/node that referenced this issue Feb 7, 2018

doc: add note about using cluster without networking