handshake propagation issue when using cluster #952

ashwinphatak · 2012-07-11T09:10:41Z

I'm using node 0.8.1, socket.io 0.9.6, websockets and the cluster module. The Redis module uses pubsub to communicate handshaking events across processes, but looking at the code, it might not behave correctly under heavy load due to timing issues.

I've experienced the following problem trying to replace Redis with RabbitMQ, but as far as I can tell the problem is timing related and independent of what pubsub tool we use.

Scenario:

Let's say there are two worker processes in the cluster: W1 & W2

The initial request to allocate a client session/websocket (say /socket.io/1/?t=1341994956158) comes to W1, which updates its list of 'handshaken' clients. It also publishes this handshake event so that the other processes can update their lists.

Due to clustering, W2 receives the HTTP Upgrade request (say /socket.io/1/websocket/1860678371557773727) before it gets the 'handshake' event published by W1.

W2 doesn't find 1860678371557773727 in the list of 'handshaken' clients, and discards the transport with a "client not handshaken - should reconnect" error.

When the browser reconnects, the same story repeats (with the workers interchanged), so the browser fails to establish a websocket connection with the server even after multiple retries.

If the 'handshake' event sent by W1 reaches W2 before the HTTP Upgrade request, everything seems to work fine.
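To make the two orderings concrete, here is a minimal toy model of the scenario. It is only a sketch: createWorker, simulate and the inFlight queue are hypothetical names made up for illustration, not socket.io internals, and the pub/sub layer is reduced to an array that is drained either before or after the Upgrade request.

```javascript
// Toy model of the race. All names are hypothetical, not socket.io internals.
function createWorker() {
  const handshaken = new Set();
  return {
    // Initial /socket.io/1/ request: record the id locally, then publish it.
    handshake(id, publish) {
      handshaken.add(id);
      publish(id);
    },
    // Pub/sub delivery of a handshake that happened on another worker.
    onHandshakeEvent(id) {
      handshaken.add(id);
    },
    // HTTP Upgrade request: succeeds only if this worker already knows the id.
    upgrade(id) {
      return handshaken.has(id)
        ? 'connected'
        : 'client not handshaken - should reconnect';
    }
  };
}

// deliverFirst controls whether the published event reaches W2
// before or after the Upgrade request.
function simulate(deliverFirst) {
  const w1 = createWorker();
  const w2 = createWorker();
  const inFlight = []; // published but not yet delivered
  const id = '1860678371557773727';

  w1.handshake(id, function (msg) { inFlight.push(msg); });

  const deliver = function () {
    while (inFlight.length) w2.onHandshakeEvent(inFlight.shift());
  };

  if (deliverFirst) deliver();
  const result = w2.upgrade(id);
  if (!deliverFirst) deliver();
  return result;
}

console.log(simulate(true));  // connected
console.log(simulate(false)); // client not handshaken - should reconnect
```

The only difference between the two runs is the delivery order, which is why the choice of pub/sub backend should not matter.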

Has anyone faced this or similar issues? Or, am I missing something?

ashwinphatak · 2012-07-11T09:35:10Z

Manager.prototype.handleUpgrade = function (req, socket, head) {
  var data = this.checkRequest(req)
    , self = this;

  if (!data) {
    if (this.enabled('destroy upgrade')) {
      socket.end();
      this.log.debug('destroying non-socket.io upgrade');
    }

    return;
  }

  req.head = head;

  // HOT FIX
  setTimeout(function() {
    self.handleClient(data, req);
  }, 1000);

  // ORIGINAL this.handleClient(data, req);
};

If we introduce an artificial delay during the Upgrade as above, the 'handshake' events get enough time to propagate, and the "client not handshaken - should reconnect" errors go away.

I'm not in any way suggesting this as a fix, just using it to illustrate the issue better.
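For comparison, one possible alternative to a fixed delay is to park the Upgrade until the handshake event for that id actually arrives, with an upper bound on the wait. The following is a rough sketch under that assumption, written in modern JavaScript; createHandshakeGate, markHandshaken and whenHandshaken are hypothetical helpers, not part of socket.io, and this has not been tried against the real Manager.

```javascript
// Hypothetical helper, not part of socket.io: parks Upgrade requests for
// ids that are not yet known and releases them when the handshake arrives.
function createHandshakeGate(maxWaitMs) {
  const handshaken = new Set();
  const waiting = new Map(); // id -> { callbacks: [], timer }

  return {
    // Call when a handshake is made locally or received over pub/sub.
    markHandshaken(id) {
      handshaken.add(id);
      const entry = waiting.get(id);
      if (!entry) return;
      clearTimeout(entry.timer);
      waiting.delete(id);
      entry.callbacks.forEach(function (cb) { cb(null); });
    },

    // Call from the Upgrade path. cb(err) fires once the id is known,
    // or with an error after maxWaitMs, whichever comes first.
    whenHandshaken(id, cb) {
      if (handshaken.has(id)) return cb(null);
      let entry = waiting.get(id);
      if (!entry) {
        entry = {
          callbacks: [],
          timer: setTimeout(function () {
            waiting.delete(id);
            entry.callbacks.forEach(function (fn) {
              fn(new Error('client not handshaken'));
            });
          }, maxWaitMs)
        };
        waiting.set(id, entry);
      }
      entry.callbacks.push(cb);
    }
  };
}

// Usage: the Upgrade arrives first, the handshake event second.
const gate = createHandshakeGate(1000);
const log = [];
gate.whenHandshaken('1860678371557773727', function (err) {
  log.push(err ? 'rejected' : 'accepted');
});
gate.markHandshaken('1860678371557773727');
console.log(log);
```

Compared with the setTimeout above, requests whose handshake is already known are not delayed at all, and late ones wait only as long as the event takes to arrive.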

trungnb · 2012-07-23T00:10:54Z

I'm using nodejs, socket.io 0.9.6, nginx patched with the tcp_proxy module, and redis for scaling socket processes. I'm stuck in a situation similar to yours. The client cannot "handshake" with the server (although it sometimes succeeds!), and in the log file I see:

debug: websocket writing 2::
debug: set heartbeat timeout for client 4767100961459878228
debug: got heartbeat packet
debug: cleared heartbeat timeout for client 4767100961459878228
debug: set heartbeat interval for client 4767100961459878228

The client's connect request does not succeed, so it keeps resending the request. I would very much appreciate any advice. Thanks.

agubler · 2013-11-13T15:02:35Z

@guille @LearnBoost Are there any plans to address this issue?

gkorland · 2014-02-08T10:44:34Z

+1 blocker

gkorland · 2014-02-21T09:09:03Z

Does anyone know if the upcoming Socket.io 1.0 still has this issue?

mkoryak · 2014-03-08T00:57:56Z

+1

This was referenced Oct 7, 2013

Error "handshake authorized" in Cluster #1244

Closed

AssertionError: Invalid topic "disconnect:3jjhnPfKZVtle28NqGCa". strongloop/strong-cluster-socket.io-store#8

Closed

bajtos mentioned this issue Apr 24, 2017

Add support for mixed sync/async bindings loopbackio/loopback-next#193

Merged

This issue was closed.
