Cluster master process reload without downtime #5050

Open
anton-kotenko opened this Issue · 3 comments

4 participants

@anton-kotenko

According to the documentation, the main purpose of the cluster module is to make it possible to use more than one processor/core in a reasonably convenient way. It is implemented as multiple worker processes, which also adds reliability to the cluster as a whole and makes it possible to reload worker code without downtime.
However, the current cluster module interface provides no way to reload the cluster's master process without downtime. To avoid downtime you need at least two running clusters and some kind of load balancer in front of them (for example nginx).
So the cluster module lacks this feature to address reliability in addition to scalability.
Technically, it can be implemented by forking the master process and sending the cluster's shared socket table to the child using process.send(message, socket).
Below is proof-of-concept code that uses undocumented node features to get access to the shared socket table (the cluster module's private serverHandlers variable):

// Patch the cluster module source at runtime to expose its private shared-socket table (serverHandlers)
process.binding('natives').cluster =
  process.binding('natives').cluster + '\ncluster.serverHandlers = serverHandlers';
var Cluster = require('cluster');
var Http = require('http');
var Util = require('util');
var ChildProcess = require('child_process');
var argv = {};
process.argv.forEach(function (arg) {
  arg = arg.split('=');
  argv[arg[0]] = arg[1] !== undefined ? arg[1] : true;
});
argv.generation = Number(argv.generation) || 0;
// Worker body: serve HTTP requests; on a /reload/* request notify the master to reload
var serve = function () {
  Http.createServer(function(req, res) {
    if (req.url.match(/\/reload.*/)) {
      process.send('reload');
    }
    console.log(argv.generation);
    var cnt = new Buffer("hello world\n");
    res.writeHead(200, {'content-length': cnt.length, 'content-type': 'text/plain'});
    res.end(cnt);
  }).listen(8000);
};
// Fork one worker and listen for its reload notification
var fork = function () {
  Cluster.fork().on('message', function (msg) {
    if (msg === 'reload') {
      console.log('asked to reload');
      var secondMaster,
        key;
      //fork second master process
      secondMaster = ChildProcess.fork(__filename, ['generation=' + (argv.generation + 1) ]);
      // Send the first shared socket handle to the second master (test implementation only)
      key = Object.keys(Cluster.serverHandlers)[0];
      secondMaster.send(key, Cluster.serverHandlers[key]);
      // Stop the current master immediately; ideally all workers would be
      // stopped gracefully once the second master is ready
      secondMaster.unref();
      process.exit();
    }
  });
};
var start = function () {
  if (Cluster.isMaster) {
    console.log('start cluster master ' + argv.generation);
    // If process.send exists, we were forked by another master,
    // so wait to receive the shared socket handles
    if (process.send) {
      process.on('message', function (message, socket) {
        Cluster.serverHandlers[message] = socket;
        fork();
      });
    } else {
      fork();
    }
  } else {
    console.log('start worker ' + argv.generation);
    serve();
  }
};
start();
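
To exercise the proof of concept: save the script, start it with node, request http://localhost:8000/ a few times, then request http://localhost:8000/reload. The master should fork a second master with the generation increased by one, hand over the shared socket handle, and exit, while port 8000 keeps serving requests from the new generation's worker.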

So the question is:
Is it possible to change the cluster module interface to support all this without hacks, or would that make the cluster module too complicated?

@puzrin

I would also be glad to see this feature in node. Here is a link to a detailed description of the reload algorithm in nginx: http://nginx.org/en/docs/control.html#upgrade

@freeformsystems

Likewise, I would be keen to see this implemented. I was researching how to implement a hot reload similar to nginx's, but currently it does not seem to be possible without hacking access to node internals.

A public API to allow this would indeed be very useful.

I was thinking that if the cluster module exposed access to the server in the master process, then intercepting SIGUSR2 could respawn the program and pass a reference to the existing server; the replacement master and worker processes would then listen on that server handle to prevent EADDRINUSE errors.

Once the replacement master is up and running, QUIT can be sent to close connections on the original process(es).

Would this approach work? Is it feasible to easily expose a reference to the server in the master process within the cluster module?
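
For illustration, here is a minimal sketch of how that handoff might look, assuming a hypothetical public API (Cluster.getServerHandles() / Cluster.setServerHandle()) standing in for the private serverHandlers table; the signal handling and handle passing only use documented process, cluster and child_process features:

var Cluster = require('cluster');
var Http = require('http');
var ChildProcess = require('child_process');

if (Cluster.isMaster) {
  if (process.send) {
    // Replacement master: receive the listening handle(s) from the old
    // master, register them, then fork workers that reuse the same socket.
    process.on('message', function (key, handle) {
      Cluster.setServerHandle(key, handle); // hypothetical API
      Cluster.fork();
    });
  } else {
    Cluster.fork();
  }

  // SIGUSR2: respawn the program and hand over the existing listening
  // socket so the replacement can accept on the same port (no EADDRINUSE).
  process.on('SIGUSR2', function () {
    var replacement = ChildProcess.fork(__filename);
    var handles = Cluster.getServerHandles(); // hypothetical API
    var key = Object.keys(handles)[0];        // single listening socket in this sketch
    replacement.send(key, handles[key]);
  });

  // QUIT (sent once the replacement is up): stop workers gracefully and exit.
  process.on('SIGQUIT', function () {
    Cluster.disconnect(function () {
      process.exit(0);
    });
  });
} else {
  // Worker: listen() goes through the master's shared-handle table, so the
  // replacement's workers end up accepting on the socket handed over above.
  Http.createServer(function (req, res) {
    res.end('hello from pid ' + process.pid + '\n');
  }).listen(8000);
}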

@jeffbski

+1
