make it possible to locally configure cluster #3367
Conversation
May I ask what the long-running process does? |
The project is an http server that uses cluster for two different purposes depending on how it's invoked. The first is for hot (re)loading code, where a new cluster worker is created for each request (as well as for subsequent requests within a short period). I'm aware that alternative approaches exist for this (e.g. watching for filesystem changes), but this works pretty well and is the way I prefer to do it. The second use is more typical: using cluster to make use of multiple CPUs. The primary mechanism for running this server is as a standalone program (i.e. one that would not need a cluster module whose configuration has global side-effects), but it also provides a library interface. Part of the motivation for this is to be able to drive the functionality from unit tests (via a framework that runs each test within the same process), but I think it might be useful beyond this. Either way, I find that avoiding global side-effects to make things testable is generally a good pattern. |
No for 0.8. Maybe for 0.9. Let's review this more closely in a few weeks. |
Cool. Let me know if there's stuff I can do to improve the proposed change when the time comes. Thanks Tom |
I hope I understood the problem correctly: you want to create Worker processes with different settings (a different executable or different command line arguments) than those provided to setupMaster. If I understood correctly, an alternative solution would be allowing fork() calls to override the cluster setup:

```js
var cluster = require('cluster');

// Setting up cluster:
cluster.setupMaster({ exec: "some/file.js", args: "blah blah" }); // Global config.

// A worker with global settings:
worker = new cluster.Worker();

// Executable overridden, arguments kept:
custom_worker = new cluster.Worker({ exec: "some/other/file.js" });

// Both executable and arguments are overridden:
custom_worker_2 = new cluster.Worker({ exec: "some/another/file.js", args: "even more blah blah" });
```
|
Yeah, that would be fine if it resulted in a worker being forked. Currently I don't think you can create a forked worker like that; you need to use cluster.fork(), which only takes the env parameter. Tom |
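To make the limitation concrete, here is a small self-contained sketch of the merge semantics that per-fork overrides would need: per-call settings layered over the global setup. The `forkSettings` helper and the settings values are purely illustrative; this is not node's actual cluster implementation.

```javascript
'use strict';

// Hypothetical global setup, as setupMaster() would record it.
// (Illustrative only; node's cluster.fork() in 0.8 accepts just env.)
var globalSettings = { exec: 'some/file.js', args: ['blah', 'blah'] };

// Merge per-fork overrides over the global settings.
function forkSettings(overrides) {
  var merged = {};
  var key;
  for (key in globalSettings) merged[key] = globalSettings[key];
  for (key in (overrides || {})) merged[key] = overrides[key];
  return merged;
}

var plain = forkSettings();
var custom = forkSettings({ exec: 'some/other/file.js' });

console.log(plain.exec);  // some/file.js
console.log(custom.exec); // some/other/file.js
console.log(custom.args); // [ 'blah', 'blah' ]  (kept from the global setup)
```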
I am developing a web server on Node; it has a master process and multiple child processes, one per core. I really did not like using the same module for both kinds of work and separating code blocks with isMaster/isWorker checks, so I designed a master process that runs from one js file and child processes that run from another. To achieve this in v0.6 I had to patch it. Half of what I did was in fact already documented, but not implemented in code; for the other half, I did what I explained in my previous comment. I was going to fork and open a pull request, but after seeing 1900+ forks of the project and seeing that the cluster module had already changed a lot in 0.7, I did not. 0.8 provides half of what I wanted. I originally designed the system so that child processes are created with custom command-line arguments. Since 0.8 now provides the main need, having different sources for master and child, I did not want to patch it again; instead, I did what I wanted to do with command-line arguments via an initial message sent to each child process after it is created. I now like this even more than my original approach, since I can send much more complex objects instead of the basic strings command-line arguments allow. I hope this can be a solution for your problem too. |
I'm still not convinced this is a good idea. @hasanyasin, your API proposal raises the most concern. I often get mails from people who ask how they can have different workers, separated into different files, so they can route incoming requests depending on their IP or some signature. The problem is that this is not possible with any type of cluster that uses the OS to load balance: it distributes the TCP requests (almost) equally between each listener attached to the main handle (fd). Having different workers is simply a bad design decision. However, if that is what you need, you can still do it; the procedure is to create your own load balancer in node by creating workers with … Furthermore, you speak about two halves; could you describe them more precisely, so we can understand the use case? A personal request, if I may: please describe both what you want and how you did it. Often I only get what someone did, and that almost always shows that there could be some improvements, or that it isn't the purpose of cluster. @tomyan I'm not totally against your API proposal, but it seems very complicated to support both the current and the new API. Could it not be simplified down to always having to call … Also note that I'm not sure why code reloading should be done with cluster; could you explain that? It seems to me that this could just as well, without any more complexity, be implemented with … In any case it sounds to me that your different implementations will never run at the same time: one for development and one for production. If that is the case, I don't see the need for precreating two different cluster configurations, when the different configurations can simply be selected at runtime.
That in itself doesn't of course make the API bad; what really concerns me is the following. Extendibility: the main purpose of the cluster rewrite was to make it more extendable by exposing the necessary information directly from cluster, without creating an opinionated plugin API. For instance, an "auto restart" module could be implemented with:

```js
var cluster = require('cluster');

module.exports = function autorestart() {
  cluster.on('exit', function () {
    cluster.fork();
  });
};
```

Of course this could just as well be implemented with:

```js
var cluster = require('cluster');

module.exports = function autorestart(settings) {
  var master = cluster.setupMaster(settings);
  master.on('exit', function () {
    master.fork();
  });
  return master;
};
```

But the second design cannot be extended further by other modules that use the same structure, since each of those would just create a new master. The first design can be, because it takes only a cluster object. Master principle: I also often get mails about how to restart the … Of course this is all about protecting userland, and node.js doesn't do that. But … have been watching … |
Hey Andreas, thanks very much for all the details and explanation. I was just misunderstood. :) What I was trying to say was that I am really happy with the state in 0.8. What I wanted to separate was the Master and Worker processes, not workers from each other. This had already been documented in 0.6; below is from cluster.js in 0.6.19:

```js
// options.workerFilename
// Specifies the script to execute for the child processes. Default is
// process.argv[1]
```

However, the function implementation did not have this, so I just added the argument and made it possible to have different sources for the Master and Worker processes. With 0.8 we can now design Master and Worker processes with different sources. I do not mean having master.js, worker1.js, worker2.js, etc.; what I mean is having just one master.js and one worker.js, to get cleaner code instead of keeping both in one file and using if/else blocks to keep things separate. In addition to this, I had also made it possible to call Worker processes with different arguments. This was not to create completely different child processes; it was just to pass some variables. Now I send a message to each worker instead of calling them with different arguments. This makes me even happier than using arguments, since I already use messages between the Master and Worker processes for many things, and it keeps things much more consistent. In summary, I am very happy with the current state of child process creation. UPDATE: I now realize that the cause of the misunderstanding was my first comment, not the previous one. There I had tried to give an alternative implementation for tomyan's request. In my own use, I do not have different worker processes, just one. |
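The initial-message approach described above can be sketched briefly. Because cluster/child_process messages are JSON-serialized, a structured config object survives the trip to the worker intact, which is the advantage over flat command-line strings. The config shape and field names below are made up for illustration; they are not from the actual project discussed here.

```javascript
'use strict';

// Structured config to hand to a worker after it is created.
// (Illustrative shape, not from the project discussed above.)
var config = {
  role: 'cache',
  limits: { memoryMb: 256, connections: 100 }
};

// In the master one would do something like:
//   worker.send({ cmd: 'init', config: config });
// What the worker's 'message' handler receives is the JSON round
// trip of that message, simulated here without forking:
var received = JSON.parse(JSON.stringify({ cmd: 'init', config: config }));

console.log(received.cmd);                    // init
console.log(received.config.limits.memoryMb); // 256
```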
@hasanyasin okay, I think I got it. You are speaking about the confusing comment at https://github.com/joyent/node/blob/v0.6/lib/cluster.js#L58. You should know that it is a leftover from the original cluster, which worked by calling …
Are you speaking about the API proposal or an actual implementation? |
I had thought that it was documented as designed but not yet implemented. FYI, below is what I had done. I am not posting this as a suggestion or anything; I had just tried to solve my problem quickly this way, and as you asked, I am writing it down.

```diff
@@ line 60 @@
- // options.workerFilename
+ // options.path
@@ line 71 @@
- function startMaster() {
+ function startMaster(options) {
@@ lines 77-78 @@
- workerFilename = process.argv[1];
- workerArgs = process.argv.slice(2);
+ if (options) {
+   workerFilename = options.path || process.argv[1];
+   workerArgs = options.args || process.argv.slice(2);
+ }
+ else {
+   workerFilename = process.argv[1];
+   workerArgs = process.argv.slice(2);
+ }
@@ line 145 @@
- cluster.fork = function() {
+ cluster.fork = function(args) {
@@ line 161 @@
- var worker = fork(workerFilename, workerArgs, { env: envCopy });
+ var worker = fork(workerFilename, (args || workerArgs), { env: envCopy });
```
|
Thanks for taking the time to go through it. Comments inline...
I like the idea of having
I was having trouble doing it with a single listening socket between reloads. I wasn't able to bind to a socket without accepting connections in the parent process, which I think I would need (see #2289). I could do it by accepting the connections in the parent process and then passing the handles to the child process, I guess, but it seems like cluster does almost exactly what I need. I'm also not too keen on per-request handle passing between processes, as I've tried this before in another server (and language) and it didn't perform as well as accepting in each child. This isn't too much of an issue for the code-reloading use case, as that would only be used in development, but I'd like to avoid the complexity if possible.
They run in the same process in my tests, or at least I would like them to. I'd also like it to work as a library without having to caveat its usage.
I think auto-restart is a feature I'd love to see in cluster, but that might be ignoring your point :-). On your direct point (about extendibility), I think that extensions are better where they don't have a global (side-)effect. This kind of pattern can lead to action at a distance, and hence non-obvious behaviour and difficult-to-track-down bugs. I'm therefore much more in favour of your second API (in most cases there will only be one master anyway).
That's only an issue if the modules you use make use of global data. The modules I tend to use (and write) avoid global mutable data, as this makes the code more reusable within different parts of an application without those parts treading on each other's toes (I guess this is the principle I'm trying to push here).
What did you think of the structural change in my pull request? I found it much easier to implement the changes by splitting the code shared between parent and child from the code for the child and the code for the parent. If you (and @isaacs) agree to the API proposal, it would be good to know that the approach to adding the feature is okay too. Anyway, thanks for the feedback and for the module, it's really useful. Tom |
It means … To make it totally clear:
Ahh, I think I understand. The problem is that the port isn't cleared fast enough, but using
Well, abstracting test cases between programs is quite easy; I don't see that as a problem. Also, the only case where running both … As for your second comment, I don't understand it.
Well, if global is the only thing there can be, there won't be any confusion. If both global and local can exist, that will be a disaster; that is also why I'm in favour of not supporting the current API if the patch lands.
I must say, when I discovered that the first line of real change would introduce a major bug, I did not read the rest. Also, I think we need to decide how/if this should be backward compatible before spending too much time on the code. But don't worry, I will read it carefully once that is decided :) |
On 6 July 2012 11:46, Andreas Madsen <
Not sure I'm 100% clear yet. Do you mean that I'd call … Why not make this work as it does today:
..and then deprecate setupMaster, introducing a new method to get a master
That seems to make the common case simple, whilst making the less
Exactly.
I don't have a separate set of development and production tests, I just … As for your second comment, I don't understand it.
Yeah, I think I agree with you on that. At least, I think it would be
That's regrettable. Let me know what it is and I'll add a unit test that
Cool, I'll hold off on any more changes until you guys have had a chance to Thanks Tom
|
I'm cool with using:

```js
exports = module.exports = new Master();
exports.setupMaster();

exports.createMaster = function (settings) {
  var o = new Master();
  o.setupMaster(settings);
  return o;
};
```
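To show the isolation this factory shape is meant to provide, here is a minimal self-contained stand-in. The `Master` object below is a mock written for illustration, not the real implementation from the patch; only the createMaster pattern itself comes from the discussion above.

```javascript
'use strict';

// Mock Master with purely local settings (illustrative stand-in for
// the real cluster Master discussed above).
function Master() {
  this.settings = {};
}
Master.prototype.setupMaster = function (settings) {
  var key;
  for (key in (settings || {})) this.settings[key] = settings[key];
};

// The proposed factory: each call yields an independent instance.
function createMaster(settings) {
  var o = new Master();
  o.setupMaster(settings);
  return o;
}

var a = createMaster({ exec: 'a.js' });
var b = createMaster({ exec: 'b.js' });

console.log(a.settings.exec); // a.js
console.log(b.settings.exec); // b.js  (unaffected by a)
```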
A module of mine (thintalk) is an RPC abstraction that provides the same API over two different communication protocols. Since the API is the same, the expected result should also be the same. As you can see in the test cases, I have separate files for …
I think that was my comment. |
```diff
@@ -33,85 +35,17 @@ var debug;
 if (process.env.NODE_DEBUG && /cluster/.test(process.env.NODE_DEBUG)) {
   debug = function(x) {
     var prefix = process.pid + ',' +
-        (process.env.NODE_UNIQUE_ID ? 'Worker' : 'Master');
+        (process.env.NODE_CLUSTER_ID ? 'Worker' : 'Master');
```
This will introduce a bug, since `child_process.fork()` should be called with `NODE_UNIQUE_ID` in its env, and it is used in src/node.js.
On 7 July 2012 14:58, Andreas Madsen <
I was suggesting that we remove (i.e. deprecate) setupMaster altogether, so
Thanks Tom
|
You will have to ask @isaacs, but if |
Can one of the admins verify this patch? |
What is the status of this change? How can we help? I would like to use this in a module which sets up its own workers without polluting cluster's global state for other users. I am considering temporarily ripping cluster.js from the node sources into my module so that the state is local, but preferably I would fix node.js itself. |
Possibly relevant to the original motivation, as of #7671 cluster.setupMaster() can be called multiple times to reconfigure some aspects of the cluster's behaviour over time. Being able to create a cluster.Master() instance looks like a useful step toward a less automagical and more explicit cluster API. Sounds like A Good Thing ™️ Given the code freeze/slush, does this still belong in the v0.12 milestone? |
rmg: I'd be happy to see this cleaned up and applied. Could look at it at some point, but probably not for a week or two. Thanks Tom |
@tomyan ... I know it's been a while, but is this still something you want to pursue? If yes, the PR would need to be updated significantly. If not, we should likely go ahead and close. |
Closing due to lack of activity. Can revisit if someone is interested in updating the PR. Given that this is a feature request, however, it would need to be targeted at either http://github.com/nodejs/io.js or http://github.com/nodejs/node |
The goal of the change is to make it possible to configure and use the cluster module without modifying global config. The change leaves the current interface entirely intact, but additionally allows configuration of a cluster with the following:
Using the existing interface still works:
..except that the global master instance has no effect on the other master, and vice versa. This is kinda similar to how you can configure a local http.Agent without affecting the global one.
The reason I wanted to propose this change was because I have a project that uses cluster to do some long running work. The project exposes a library interface and I would like users to be able to use that library interface without having side-effects other than the one they have initiated by using the library (i.e. does not modify global cluster config).