Writeup on how to make node.js cluster and socket.io play nice
Latest commit a80a3ac Jan 7, 2018

README.md

node-cluster-socket.io

Socket.IO doesn't work out of the box with a node.js cluster. This writeup is based on sticky-session and the Socket.IO multiple node documentation. It explains how to make cluster and Socket.IO play nice if you don't feel like wrapping your server code with sticky-session.

It is a brief explanation of how this is done, with heavily commented code that should be easy to integrate into your project.

I use Node.js + Express.js + Socket.IO + cluster intentionally to show how it works with all those pieces.

Assumptions:

What we need to do

  • Proxy connections from the master to the workers, making sure that connections originating from the same IP address end up in the same worker
  • Storage shared between workers (the redis adapter instead of the default in-memory one)
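
To make the first requirement concrete, here is a minimal sketch of a "same IP, same worker" mapping. It uses only Node's built-in crypto module; `hash32` and `workerIndexFor` are hypothetical names for illustration, and SHA-256 is a stand-in rather than the hash used in the listing further down.

```javascript
const crypto = require('crypto');

// Hypothetical stand-in hash: take the first 4 bytes of a SHA-256 digest
// as an unsigned 32-bit integer. Not the hash used later in this writeup.
function hash32(ip) {
  return crypto.createHash('sha256').update(ip).digest().readUInt32BE(0);
}

// Map an IP address string to a worker slot in the range [0, len).
function workerIndexFor(ip, len) {
  return hash32(ip) % len;
}

const numWorkers = 4;
const first = workerIndexFor('203.0.113.7', numWorkers);
const second = workerIndexFor('203.0.113.7', numWorkers);

// The same address always lands in the same slot.
console.log(first === second);
```

Because the mapping depends only on the address string and the number of slots, it stays stable as long as the number of workers does not change.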

How does it work

Say your server runs on port 3000:

var express = require('express'),
    cluster = require('cluster'),
    sio = require('socket.io');

var port = 3000,
    num_processes = require('os').cpus().length;

if (cluster.isMaster) {
	for (var i = 0; i < num_processes; i++) {
		cluster.fork();
	}
} else {
	var app = new express();

	// Here you might use middleware, attach routes, etc.

	var server = app.listen(port),
	    io = sio(server);

	// Here you might use Socket.IO middleware for authorization etc.
}

Instead of starting the node.js server on that port and listening in each worker, we need to introduce a tiny proxy layer to make sure that connections from the same host end up in the same worker.

The way to do this is to create a single server listening on port 3000 and consistently map source IP addresses to our workers. We then pass the connection to the worker, which emits a connection event on its server. To prevent data loss where data is sent on the connection before it has been passed to the worker, the server sets the pauseOnConnect option. That way, connections are paused immediately and workers can .resume() them to receive data when they're ready. Processing then proceeds as normal:
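
The effect of pauseOnConnect can be seen in isolation with a small sketch that uses only the net module. `demoPauseOnConnect` is a hypothetical helper written for this illustration: the server accepts a paused socket, waits a moment, records what it has read so far, and only then calls .resume().

```javascript
const net = require('net');

// Hypothetical demo helper. Resolves with what the server side had read
// before and after it resumed the paused socket.
function demoPauseOnConnect(delayMs) {
  return new Promise(function (resolve, reject) {
    const server = net.createServer({ pauseOnConnect: true }, function (socket) {
      let received = '';
      let beforeResume = null;

      socket.setEncoding('utf8');
      socket.on('data', function (chunk) { received += chunk; });
      socket.on('end', function () {
        server.close();
        resolve({ beforeResume: beforeResume, afterResume: received });
      });

      // The socket starts paused, so nothing is read until resume() runs.
      setTimeout(function () {
        beforeResume = received;
        socket.resume();
      }, delayMs);
    });

    server.on('error', reject);
    // Port 0 asks the OS for any free port.
    server.listen(0, '127.0.0.1', function () {
      const client = net.connect(server.address().port, '127.0.0.1', function () {
        client.end('hello');
      });
      client.on('error', reject);
    });
  });
}

demoPauseOnConnect(50).then(function (result) {
  console.log('read before resume:', JSON.stringify(result.beforeResume));
  console.log('read after resume:', JSON.stringify(result.afterResume));
});
```

In the real setup the delay is the time it takes to hand the connection to a worker, and the worker is the one that calls .resume().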

var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis'),
    farmhash = require('farmhash');

var port = 3000,
    num_processes = require('os').cpus().length;

if (cluster.isMaster) {
	// This stores our workers. We need to keep them to be able to reference
	// them based on source IP address. It's also useful for auto-restart,
	// for example.
	var workers = [];

	// Helper function for spawning worker at index 'i'.
	var spawn = function(i) {
		workers[i] = cluster.fork();

		// Optional: Restart worker on exit
		workers[i].on('exit', function(code, signal) {
			console.log('respawning worker', i);
			spawn(i);
		});
	};

	// Spawn workers.
	for (var i = 0; i < num_processes; i++) {
		spawn(i);
	}

	// Helper function for getting a worker index based on IP address.
	// This is a hot path so it should be really fast. The way it works
	// is by hashing the IP address string to a 32-bit number with
	// farmhash, then compressing it to the number of slots we have.
	//
	// Compared against "real" hashing (from the sticky-session code) and
	// "real" IP number conversion, this function is on par in terms of
	// worker index distribution, only much faster.
	var worker_index = function(ip, len) {
		return farmhash.fingerprint32(ip) % len; // Farmhash is the fastest and works with IPv6, too
	};

	// Create the outside facing server listening on our port.
	var server = net.createServer({ pauseOnConnect: true }, function(connection) {
		// We received a connection and need to pass it to the appropriate
		// worker. Get the worker for this connection's source IP and pass
		// it the connection.
		var worker = workers[worker_index(connection.remoteAddress, num_processes)];
		worker.send('sticky-session:connection', connection);
	}).listen(port);
} else {
	// Note we don't use a port here because the master listens on it for us.
	var app = new express();

	// Here you might use middleware, attach routes, etc.

	// Don't expose our internal server to the outside.
	var server = app.listen(0, 'localhost'),
	    io = sio(server);

	// Tell Socket.IO to use the redis adapter. By default, the redis
	// server is assumed to be on localhost:6379. You don't have to
	// specify them explicitly unless you want to change them.
	io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

	// Here you might use Socket.IO middleware for authorization etc.

	// Listen to messages sent from the master. Ignore everything else.
	process.on('message', function(message, connection) {
		if (message !== 'sticky-session:connection') {
			return;
		}

		// Emulate a connection event on the server by emitting the
		// event with the connection the master sent us.
		server.emit('connection', connection);

		connection.resume();
	});
}
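
One edge case the listing above does not handle: Node's documentation notes that a socket's remoteAddress may be undefined if the socket has already been destroyed, for example because the client disconnected. Below is a hedged sketch of a guard around the worker lookup. `pickWorker`, `hashFn` and `toyHash` are hypothetical names for this illustration, with `hashFn` standing in for whatever hash the master uses.

```javascript
// Hypothetical guard around the worker lookup. Returns null when there is
// no usable address or no worker, so the caller can drop the connection.
function pickWorker(workers, remoteAddress, hashFn) {
  if (typeof remoteAddress !== 'string' || workers.length === 0) {
    return null;
  }
  return workers[hashFn(remoteAddress) % workers.length];
}

// Toy hash for the demo only: sum of character codes.
function toyHash(text) {
  let sum = 0;
  for (let i = 0; i < text.length; i++) {
    sum += text.charCodeAt(i);
  }
  return sum;
}

const slots = ['worker-0', 'worker-1'];
console.log(pickWorker(slots, undefined, toyHash)); // null
console.log(pickWorker(slots, '10.0.0.1', toyHash) === pickWorker(slots, '10.0.0.1', toyHash));
```

In the master, a null result could be handled by calling connection.destroy() instead of sending the connection to a worker.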

That should do it. Please let me know if this doesn't work or if you have any comments.

Benchmarks

There's a script you can run to test the various hashing functions. It generates a million random IP addresses and then hashes them using each of five hashing algorithms to get a consistent IP address -> array index mapping.

The time each algorithm took is printed in milliseconds (lower is better), along with the distribution of IP addresses across array indexes (the more even the distribution, the better).

To run:

$ node benchmark <num_workers>

Here's output from my machine:

$ node benchmark 4
IPv4
----------
benchmarking int31...
  time (ms): 874
  scatter: { '0': 249145, '1': 250189, '2': 249969, '3': 250697 }
benchmarking numeric_real...
  time (ms): 441
  scatter: { '0': 249084, '1': 251221, '2': 250609, '3': 249086 }
benchmarking simple_regex...
  time (ms): 281
  scatter: { '0': 247994, '1': 249241, '2': 251699, '3': 251066 }
benchmarking simple_loop...
  time (ms): 559
  scatter: { '0': 247994, '1': 249241, '2': 251699, '3': 251066 }
benchmarking farmhash...
  time (ms): 234
  scatter: { '0': 249192, '1': 250640, '2': 250570, '3': 249598 }


IPv6
----------
benchmarking int31...
  time (ms): 418
  scatter: { '0': 543029, '1': 143286, '2': 141239, '3': 172446 }
benchmarking numeric_real...
  time (ms): 821
  scatter: { NaN: 1000000 }
benchmarking simple_regex...
  time (ms): 714
  scatter: { NaN: 1000000 }
benchmarking simple_loop...
  time (ms): 1261
  scatter: { '0': 890953, '1': 34728, '2': 38949, '3': 35370 }
benchmarking farmhash...
  time (ms): 314
  scatter: { '0': 249034, '1': 250866, '2': 249923, '3': 250177 }

$

The algorithm used in the example above is "farmhash."
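
For a feel of how such a measurement can be put together, here is a rough sketch. It is not the repository's benchmark script: `randomIPv4`, `ipv4ToNumber` and `benchmark` are hypothetical names, the hash is a simple stand-in that only handles IPv4, and the sample is smaller than a million addresses.

```javascript
// Hypothetical helper: build a random dotted-quad IPv4 address.
function randomIPv4() {
  const octets = [];
  for (let i = 0; i < 4; i++) {
    octets.push(Math.floor(Math.random() * 256));
  }
  return octets.join('.');
}

// Stand-in hash for illustration only: fold the four octets into one
// unsigned 32-bit number. IPv4 only.
function ipv4ToNumber(ip) {
  return ip.split('.').reduce(function (acc, octet) {
    return (acc * 256 + Number(octet)) >>> 0;
  }, 0);
}

// Hash every address, count how many land in each slot, and time the run.
function benchmark(hashFn, ips, numWorkers) {
  const scatter = {};
  const start = Date.now();
  for (const ip of ips) {
    const index = hashFn(ip) % numWorkers;
    scatter[index] = (scatter[index] || 0) + 1;
  }
  return { timeMs: Date.now() - start, scatter: scatter };
}

const ips = Array.from({ length: 100000 }, randomIPv4);
const result = benchmark(ipv4ToNumber, ips, 4);
console.log('time (ms):', result.timeMs);
console.log('scatter:', result.scatter);
```

Timings from a sketch like this depend on the machine and the Node version, so they are only meaningful when comparing hash functions within the same run.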
