handshake propagation issue when using cluster #952

Closed
ashwinphatak opened this issue Jul 11, 2012 · 6 comments

@ashwinphatak

I'm using node 0.8.1, socket.io 0.9.6, websockets and the cluster module. The Redis module uses pubsub to communicate handshaking events across processes, but looking at the code, it might not behave correctly under heavy load due to timing issues.

I've experienced the following problem trying to replace Redis with RabbitMQ, but as far as I can tell the problem is timing related and independent of what pubsub tool we use.

Scenario:

Let's say there are two worker processes in the cluster: W1 & W2

The initial request to allocate a client session/websocket (say /socket.io/1/?t=1341994956158) comes to W1, which updates its list of 'handshaken' clients. It also publishes this handshaking event so that other processes can update their lists.

Due to clustering, W2 receives the HTTP Upgrade request (say /socket.io/1/websocket/1860678371557773727) before it gets the 'handshake' event published by W1.

W2 doesn't find 1860678371557773727 in the list of 'handshaken' clients, and discards the transport with a "client not handshaken - should reconnect" error.

When the browser attempts to reconnect, the same sequence repeats (with the workers swapped), so the browser fails to establish a websocket connection with the server even after multiple retries.

If the 'handshake' event sent by W1 reaches W2 before the HTTP Upgrade request, everything seems to work fine.
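The ordering problem in the scenario above can be reproduced with a small in-process model. This is only a sketch: it does not use socket.io, Redis, or RabbitMQ, and the names `Worker`, `inbox`, and `deliverPending` are hypothetical, introduced purely for illustration. The pub/sub channel is modelled as a per-worker inbox so that the delivery order can be controlled explicitly.

```javascript
// Minimal in-process model of the race; no socket.io or pub/sub library involved.
// Worker, inbox, and deliverPending are hypothetical names for illustration.
class Worker {
  constructor(name) {
    this.name = name;
    this.handshaken = new Set(); // session ids this worker knows are handshaken
    this.inbox = [];             // published 'handshake' events not yet applied
  }

  // Initial /socket.io/1/ request: record locally, then publish to peers.
  handshake(id, peers) {
    this.handshaken.add(id);
    for (const peer of peers) peer.inbox.push(id); // in flight, not yet applied
  }

  // Apply the 'handshake' events that have arrived so far.
  deliverPending() {
    for (const id of this.inbox) this.handshaken.add(id);
    this.inbox = [];
  }

  // HTTP Upgrade request for a given session id.
  upgrade(id) {
    return this.handshaken.has(id)
      ? 'ok'
      : 'client not handshaken - should reconnect';
  }
}

const w1 = new Worker('W1');
const w2 = new Worker('W2');
const id = '1860678371557773727';

w1.handshake(id, [w2]);
const early = w2.upgrade(id); // Upgrade reaches W2 before the event is applied
w2.deliverPending();
const late = w2.upgrade(id);  // Upgrade reaches W2 after the event is applied

console.log(early); // client not handshaken - should reconnect
console.log(late);  // ok
```

The only difference between the failing and the succeeding upgrade is whether the published event was applied on W2 first, which matches the behaviour described above.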

Has anyone faced this or similar issues? Or, am I missing something?

@ashwinphatak
Author

Manager.prototype.handleUpgrade = function (req, socket, head) {
  var data = this.checkRequest(req)
    , self = this;

  if (!data) {
    if (this.enabled('destroy upgrade')) {
      socket.end();
      this.log.debug('destroying non-socket.io upgrade');
    }

    return;
  }

  req.head = head;

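  // HOT FIX (illustration only): delay handling the upgrade by 1 second so
  // the 'handshake' event has time to reach this worker.
  setTimeout(function() {
    self.handleClient(data, req);
  }, 1000);

  // ORIGINAL: this.handleClient(data, req);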
};

If we introduce an artificial delay during Upgrade as above, it gives the 'handshake' events enough time to propagate, and the "client not handshaken - should reconnect" errors go away.

I'm not in any way suggesting this as a fix, just using it to illustrate the issue better.
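For comparison, a fixed delay could in principle be replaced by waiting for the specific 'handshake' event, bounded by a timeout. The following is a minimal sketch of that idea, not socket.io's API and not a proposed patch: `HandshakeView` and `waitForHandshake` are hypothetical names, and the sketch assumes the worker is notified locally whenever a handshake event arrives from the pub/sub layer.

```javascript
const { EventEmitter } = require('events');

// Hypothetical local view of handshaken ids, fed by whatever pub/sub is in use.
class HandshakeView extends EventEmitter {
  constructor() {
    super();
    this.ids = new Set();
  }

  // Called when a 'handshake' event for `id` reaches this worker.
  add(id) {
    this.ids.add(id);
    this.emit('handshake', id);
  }
}

// Calls done(true) as soon as `id` is known, or done(false) after timeoutMs.
function waitForHandshake(view, id, timeoutMs, done) {
  if (view.ids.has(id)) return done(true); // already propagated, no waiting

  const onHandshake = (seen) => {
    if (seen !== id) return; // event for a different client
    clearTimeout(timer);
    view.removeListener('handshake', onHandshake);
    done(true);
  };
  const timer = setTimeout(() => {
    view.removeListener('handshake', onHandshake);
    done(false); // give up; the caller can reject the upgrade as before
  }, timeoutMs);
  view.on('handshake', onHandshake);
}

const view = new HandshakeView();
const results = [];

// Upgrade arrives first; the handshake event arrives afterwards.
waitForHandshake(view, 'abc', 1000, (ok) => results.push(ok));
view.add('abc');

// Upgrade arrives after the event: resolved immediately.
waitForHandshake(view, 'abc', 1000, (ok) => results.push(ok));

console.log(results); // [ true, true ]
```

Compared with the fixed 1000 ms delay, this only waits when the id is not yet known and continues as soon as the event arrives. Whether something like it would fit into handleUpgrade is an open question here.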

@trungnb

trungnb commented Jul 23, 2012

I'm using nodejs, socket.io 0.9.6, nginx patched with the tcp_proxy module, and redis for scaling socket processes. I'm stuck in a situation similar to yours: the client cannot complete the handshake with the server (though it sometimes succeeds), and in the log file I see:

debug: websocket writing 2::
debug: set heartbeat timeout for client 4767100961459878228
debug: got heartbeat packet
debug: cleared heartbeat timeout for client 4767100961459878228
debug: set heartbeat interval for client 4767100961459878228

The client sends a connect request that does not succeed, so it repeatedly sends the request again. I would really appreciate any advice. Thanks.

@agubler

agubler commented Nov 13, 2013

@guille @LearnBoost Are there any plans to address this issue?

@gkorland

gkorland commented Feb 8, 2014

+1 blocker

@gkorland

Does anyone know if the upcoming Socket.io 1.0 still has this issue?

@mkoryak

mkoryak commented Mar 8, 2014

+1

This issue was closed.
5 participants