CORS pre-flight breaks socket.io behind load balancer #279

Closed
dmkc opened this issue Sep 2, 2014 · 43 comments

Comments

@dmkc

dmkc commented Sep 2, 2014

I ran into an issue on our servers. We are running socket.io v1.0.6 on multiple server instances behind a load balancer. For polling, the requests go through the ELB with sticky sessions turned on. Our real-time service is on a subdomain, and thanks to CORS pre-flight requests, socket.io fucks up. Here is what happens on the client when the polling transport is used:

  1. A socket.io handshake POST request occurs. The response comes back valid with an sid, and the headers include the AWS ELB cookie.
  2. Next, a pre-flight OPTIONS request is made by the browser. The ELB cookie is not included by the browser here. As a result, the OPTIONS request is routed to a potentially different server which will not recognize the sid in the query string.
  3. When the request is routed to the wrong server, socket.io responds with a 400 HTTP status code and a "Session ID unknown" error.
  4. Since the pre-flight request fails, the browser also fails the actual GET polling request and tries to redo the handshake from the beginning.
  5. Possibly due to the headers being sent, the browser sends the OPTIONS pre-flight request fairly regularly as opposed to doing it only once, so this cycle repeats over and over.
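
The sequence above can be reproduced with a toy model. This is only a sketch: `makeBackend`, `makeBalancer` and the numeric sticky cookie are hypothetical stand-ins for the socket.io nodes and the ELB, and round-robin is assumed as the fallback when no cookie is present.

```javascript
// Toy model of two socket.io nodes behind a cookie-based sticky load balancer.
// All names here are hypothetical; nothing is taken from socket.io or AWS.
function makeBackend(name) {
  var sessions = {};
  return {
    name: name,
    handle: function (req) {
      if (!req.sid) {
        // Handshake: create a session and hand back its id.
        var sid = name + '-' + Object.keys(sessions).length;
        sessions[sid] = true;
        return { status: 200, sid: sid };
      }
      // Any later request, including OPTIONS, is checked against known sids.
      return sessions[req.sid]
        ? { status: 200 }
        : { status: 400, error: 'Session ID unknown' };
    }
  };
}

function makeBalancer(backends) {
  var next = 0;
  return function route(req) {
    var index = req.cookie;
    if (index === undefined) {
      // No sticky cookie: fall back to round-robin (an assumption).
      index = next;
      next = (next + 1) % backends.length;
    }
    var res = backends[index].handle(req);
    res.cookie = index; // the balancer pins the client to this backend
    return res;
  };
}

var route = makeBalancer([makeBackend('a'), makeBackend('b')]);

var handshake = route({ method: 'POST' });
// The browser omits cookies on the pre-flight, so stickiness is lost.
var preflight = route({ method: 'OPTIONS', sid: handshake.sid });
// With the cookie attached, the same sid reaches the right node.
var poll = route({ method: 'GET', sid: handshake.sid, cookie: handshake.cookie });

console.log(preflight.status, poll.status); // 400 200
```

In the model, the only difference between the failing and the succeeding request is whether the sticky cookie travels with it.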

The fix on our end currently is to respond to all OPTIONS requests with a 200 and all the usual Access-Control-Allow-… headers the browser knows and loves. We do this before they even get to socket.io in our nginx config.

Now, engine.io appears to already handle this case here: https://github.com/Automattic/engine.io/blob/master/lib/transports/polling-xhr.js#L40

However, that check is only reached if the sid is valid here: https://github.com/Automattic/engine.io/blob/master/lib/server.js#L180

which it isn't, of course. I can submit a PR but I'd like to know how you guys think it'd be best to handle this. AFAIK, if a request method is OPTIONS, we can make the assumption that we are polling. But, since we don't have a valid sid to look up a client by, this might mean moving fairly transport-specific logic into server.js which sounds less than ideal.

Thoughts?

@rauchg
Contributor

rauchg commented Sep 2, 2014

This is one of the best issue descriptions I've read in a while. Thanks for taking the time.

Can you send me the complete headers of the request that yields OPTIONS? Ideally we wouldn't need the pre-flight. I'm suspecting you're sending binary data which is resulting in a Content-Type switch?

Requests that do not need pre-flight according to MDN are:

A simple cross-site request is one that:

Only uses GET, HEAD or POST. If POST is used to send data to the server, the Content-Type of the data sent to the server with the HTTP POST request is one of application/x-www-form-urlencoded, multipart/form-data, or text/plain.
Does not set custom headers with the HTTP Request (such as X-Modified, etc.)
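
As a rough illustration of those two criteria, here is a simplified check. `needsPreflight` is a hypothetical helper, it only looks at headers set by the script (not ones the browser adds itself, such as Cookie or Origin), and the real browser algorithm has more rules than this.

```javascript
// Simplified check based only on the criteria quoted above.
// `needsPreflight` is a hypothetical helper, not a browser or socket.io API.
var SIMPLE_METHODS = ['GET', 'HEAD', 'POST'];
var SIMPLE_CONTENT_TYPES = [
  'application/x-www-form-urlencoded',
  'multipart/form-data',
  'text/plain'
];
// Headers a script may set without triggering a pre-flight.
var SAFELISTED_HEADERS = ['accept', 'accept-language', 'content-language', 'content-type'];

function needsPreflight(method, headers) {
  if (SIMPLE_METHODS.indexOf(method.toUpperCase()) === -1) return true;
  return Object.keys(headers || {}).some(function (name) {
    var lower = name.toLowerCase();
    if (SAFELISTED_HEADERS.indexOf(lower) === -1) return true; // custom header
    if (lower !== 'content-type') return false;
    // Ignore parameters such as "; charset=UTF-8".
    var type = headers[name].split(';')[0].trim().toLowerCase();
    return SIMPLE_CONTENT_TYPES.indexOf(type) === -1;
  });
}

console.log(needsPreflight('POST', { 'Content-Type': 'application/octet-stream' })); // true
console.log(needsPreflight('POST', { 'Content-Type': 'text/plain;charset=UTF-8' })); // false
console.log(needsPreflight('GET', { 'X-Modified': '1' })); // true
```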

Before proceeding with the OPTIONS fix, I want to make sure it's happening for a good reason.

@mokesmokes
Contributor

Plus, pre-flights come with a performance hit and additional server load... do what you can to avoid them as part of your connection.

@rauchg
Contributor

rauchg commented Sep 2, 2014

+1 :datfeel:

@defunctzombie
Contributor

There are two issues here. We should be supporting OPTIONS unconditionally so that even if the browser makes a pre-flight it doesn't break the existing session stickiness. And we should also see if we can make engine.io not do the pre-flight request at all.

@mokesmokes
Contributor

I think the default behavior should stay as today (break) since in 90% of cases it's unintended behavior and we don't want this happening silently. We can add a flag that if explicitly set the server sends OK on every OPTIONS request.

@rauchg
Contributor

rauchg commented Sep 3, 2014

Agreed with @mokesmokes



@dmkc
Author

dmkc commented Sep 3, 2014

@guille Thanks for looking into this so quickly! The Content-Type for the requests is indeed application/octet-stream. According to MDN that makes it a preflighted request. Hrm. Here are the complete headers:

Accept:*/*
Accept-Encoding:gzip,deflate
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Content-Length:65
Content-type:application/octet-stream
Cookie:io=Cu4ixu0L9h01xctVABFa; AWSELB=D92B29591C6D9DF48A890DD33BA2DB332633D5E9B0B9494FF3E6BBC16DECE6A0CB3F6B058AEAE907E103688331ECDF2318C3C96EE3035ABD6F0AAE7F5EE38DAA554DC79045
Host:subdomainof.awesome.internal.site.com
Origin:http://awesome.internal.site.com
Referer:http://awesome.internal.site.com
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36

@defunctzombie
Contributor

Why don't we want this happening silently? I would think we want this to just work without messing about with settings.

@mokesmokes
Contributor

@defunctzombie because in many cases preflighted requests occur due to careless coding, and can be avoided. I think if they occur the dev should be fully aware that they are happening, not make it a silent default decision. Just my $0.02 :)

@dmkc
Author

dmkc commented Oct 9, 2014

@mokesmokes Here the preflight requests are occurring because of the request's content-type. Is choosing this content-type a decision made by socket.io? I am not configuring it anywhere explicitly. If that's the case then the default treatment of the OPTIONS req makes sense.

@defunctzombie
Contributor

We should fix this. OPTIONS preflights can happen when doing CORS stuff and this behavior is really annoying to an end user to deal with. It basically renders this module unusable behind the amazon ELB even when sticky sessions are on. I do not think this has anything to do with careless coding and is happening under the simplest uses of this module.

@defunctzombie
Contributor

For anyone who runs into this before we fix it: you can work around it by handling requests manually (in the http server's request listener) and responding to OPTIONS yourself.

@mokesmokes
Contributor

Yup, it's not the user's fault, it's engine.io: https://github.com/Automattic/engine.io-client/blob/master/lib/transports/polling-xhr.js#L166
Binary transfers are the culprit. In this case wouldn't it just be better to encode/parse the text in engine.io rather than doing xhr.setRequestHeader('Content-type', 'application/octet-stream');? We really should avoid OPTIONS at all costs, it's a total performance killer and resource hog on the server.

@mokesmokes
Contributor

So basically what I think we need is:

  1. OPTIONS turned off by default on engine.io server side
  2. By default forceBase64 for XHR if client detects CORS scenario (WS is of course fine)
  3. Add a forceBinaryXHRCors flag (default=false) which will disable (2) above, thus the browser will emit OPTIONS requests, and the developer will need to turn on the OPTIONS flag server-side.

So basically, by default all works smoothly and no OPTIONS as most people would probably prefer. If people insist on using octet-stream then they should manually configure this, but it should be supported. I really don't think we should just send binary data blindly in CORS scenarios.
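
A minimal sketch of point (2), assuming Node's `Buffer` for the base64 step (a browser would need a different encoder). `isCrossOrigin` and `encodePayload` are hypothetical helpers and not engine.io's implementation; the host names are the ones from the headers posted earlier in this thread.

```javascript
// Sketch only: `isCrossOrigin` and `encodePayload` are hypothetical helpers,
// not engine.io's implementation.
function isCrossOrigin(pageUrl, serverUrl) {
  return new URL(pageUrl).origin !== new URL(serverUrl).origin;
}

function encodePayload(bytes, crossOrigin) {
  if (crossOrigin) {
    // text/plain is a "simple" content type, so no OPTIONS request is needed.
    return {
      contentType: 'text/plain;charset=UTF-8',
      body: Buffer.from(bytes).toString('base64')
    };
  }
  // Same origin: no pre-flight is involved, so raw binary is fine.
  return { contentType: 'application/octet-stream', body: Buffer.from(bytes) };
}

var crossOrigin = isCrossOrigin(
  'http://awesome.internal.site.com/',
  'http://subdomainof.awesome.internal.site.com/socket.io/'
);
var encoded = encodePayload([1, 2, 3], crossOrigin);
console.log(crossOrigin, encoded.contentType, encoded.body); // true text/plain;charset=UTF-8 AQID
```

The cost is size: base64 turns every 3 bytes into 4 characters, so payloads grow by roughly a third.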

@rauchg
Contributor

rauchg commented Dec 1, 2014

+1. Big time priority.


@3rd-Eden
Contributor

3rd-Eden commented Dec 1, 2014

Just to get this straight, you would rather have an increased file size for the data that you transfer (for every single message) instead of a one-time OPTIONS request? The upgrading of transports is already happening through probing so it shouldn't delay the current connection, and I highly doubt that a single extra request to your server is so heavy that it would affect performance or dramatically increase server load.

The suggested solutions only add more odd edge cases on the client instead of a simple fix on the server. This doesn't make a lot of sense to me.

@rauchg
Contributor

rauchg commented Dec 1, 2014

Yes. Bandwidth is hardly ever the bottleneck, latency is.

@rauchg
Contributor

rauchg commented Dec 1, 2014

In this case, an additional OPTIONS request guarantees an extra roundtrip for us. The extra bandwidth cost for frames would hardly ever result in extra roundtrips.

@3rd-Eden
Contributor

3rd-Eden commented Dec 1, 2014

@guille But latency isn't an issue here as you are already connected using jsonp

@mokesmokes
Contributor

@3rd-Eden it's only a one-time request if Access-Control-Max-Age is honored by the browser, and I'm not sure (bears checking) if we satisfy the requirements. Even then, browser performance is inconsistent, see http://stackoverflow.com/questions/23543719/cors-access-control-max-age-is-ignored

EDIT: I think we will see OPTIONS on every request due to our cache busting :-/
So I think in this case we agree that in CORS we much prefer base64 binary transfers.
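
To illustrate the cache-busting concern, here is a toy model that assumes the browser keys its pre-flight cache on the method and the full URL, query string included. `makeClient` and the rt.example.com URL are hypothetical; real browsers also take origin, credentials and request headers into account and cap the max-age.

```javascript
// Toy pre-flight cache keyed by method and full URL (query string included).
// `makeClient` is a hypothetical helper, not a browser API.
function makeClient() {
  var cache = {};
  var preflights = 0;
  return {
    request: function (method, url) {
      var key = method + ' ' + url;
      if (!cache[key]) {
        preflights++; // an OPTIONS round trip would happen here
        cache[key] = true;
      }
    },
    preflightCount: function () { return preflights; }
  };
}

var busted = makeClient();
var stable = makeClient();
for (var i = 0; i < 5; i++) {
  // A changing t= parameter makes every URL unique, so nothing is reused.
  busted.request('POST', 'http://rt.example.com/socket.io/?transport=polling&t=' + i);
  stable.request('POST', 'http://rt.example.com/socket.io/?transport=polling');
}
console.log(busted.preflightCount(), stable.preflightCount()); // 5 1
```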

@robertjustjones

+1

@mokesmokes
Contributor

Need this for the fix: socketio/engine.io-parser#36 , so we can encode binary data when sending in a CORS situation

@defunctzombie
Contributor

Mildly related: #300

@ashaffer

+1 for this issue

@ashaffer

For anyone else with this issue, the following code will fix it:

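module.exports = function(srv) {
  // Detach the existing request listeners (including socket.io's) so that
  // OPTIONS requests can be answered before they reach them.
  var listeners = srv.listeners('request').slice(0);
  srv.removeAllListeners('request');
  srv.on('request', function(req, res) {
    if(req.method === 'OPTIONS' && req.url.indexOf('/socket.io') === 0) {
      // Answer the pre-flight directly, without any session lookup.
      var headers = {};
      if (req.headers.origin) {
        // Credentialed requests need the origin echoed back, not '*'.
        headers['Access-Control-Allow-Credentials'] = 'true';
        headers['Access-Control-Allow-Origin'] = req.headers.origin;
      } else {
        headers['Access-Control-Allow-Origin'] = '*';
      }

      headers['Access-Control-Allow-Headers'] = 'origin, content-type, accept';
      res.writeHead(200, headers);
      res.end();
    } else {
      // Everything else goes to the original listeners unchanged.
      listeners.forEach(function(fn) {
        fn.call(srv, req, res);
      });
    }
  });
};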

Apply it after socket.io has been attached to the server (handleOptions is the function exported above), e.g.:

io.listen(server);
handleOptions(server);

@sacheendra

There is a small addition to the above code that is needed to fix it.
There needs to be an Access-Control-Allow-Methods header as well:

module.exports = function(srv) {
  var listeners = srv.listeners('request').slice(0);
  srv.removeAllListeners('request');
  srv.on('request', function(req, res) {
    if(req.method === 'OPTIONS' && req.url.indexOf('/socket.io') === 0) {
      var headers = {};
      if (req.headers.origin) {
        headers['Access-Control-Allow-Credentials'] = 'true';
        headers['Access-Control-Allow-Origin'] = req.headers.origin;
      } else {
        headers['Access-Control-Allow-Origin'] = '*';
      }

      headers['Access-Control-Allow-Methods'] = 'GET,HEAD,PUT,PATCH,POST,DELETE';
      headers['Access-Control-Allow-Headers'] = 'origin, content-type, accept';
      res.writeHead(200, headers);
      res.end();
    } else {
      listeners.forEach(function(fn) {
        fn.call(srv, req, res);
      });
    }
  });
};

This solved the problem for me.

@darrachequesne
Member

Has this been fixed yet?

  1. OPTIONS turned off by default on engine.io server side
  2. By default forceBase64 for XHR if client detects CORS scenario (WS is of course fine)
  3. Add a forceBinaryXHRCors flag (default=false) which will disable (2) above, thus the browser will emit OPTIONS requests, and the developer will need to turn on the OPTIONS flag server-side.

@MiLk

MiLk commented Feb 27, 2017

Hello,

I'm trying to set up socket.io servers behind AWS ALB.
The stickiness is using cookies as stated in http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#sticky-sessions

Application Load Balancers support load balancer-generated cookies only. The name of the cookie is AWSALB. The contents of these cookies are encrypted using a rotating key. You cannot decrypt or modify load balancer-generated cookies.

Because the web application and the ALB are not on the same domain, OPTIONS requests are sent during the handshake phase for long-polling.
However the OPTIONS requests don't have cookies attached to them, and are sent following the ALB distribution method instead of being sticky.
Because the OPTIONS requests are handled late, inside the polling-xhr transport, an earlier verification step fails, since the session is not valid on all the servers.
All of that leads to the handshake failure.

While I agree that the aforementioned points should be addressed to avoid doing OPTIONS requests, OPTIONS requests should not trigger all the described logic.

The rfc2616 says:

The OPTIONS method represents a request for information about the communication options available on the request/response chain identified by the Request-URI. This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
From my understanding, OPTIONS should only specify the headers for CORS allowing subsequent requests to be sent.
It SHOULD NOT:

  • Verify that the session is valid: this.verify(req, false, function (err, success) {
  • Initiate the handshake: self.handshake(req._query.transport, req);
  • Try to initiate a connection (engine.io/lib/server.js, lines 316 to 339 in bd1e81e):

    var socket = new Socket(id, this, transport, req);
    var self = this;

    if (false !== this.cookie) {
      transport.on('headers', function (headers) {
        headers['Set-Cookie'] = cookieMod.serialize(self.cookie, id,
          {
            path: self.cookiePath,
            httpOnly: self.cookiePath ? self.cookieHttpOnly : false
          });
      });
    }

    transport.onRequest(req);

    this.clients[id] = socket;
    this.clientsCount++;

    socket.once('close', function () {
      delete self.clients[id];
      self.clientsCount--;
    });

    this.emit('connection', socket);

I've come up with a way to bypass the verification steps and hand the OPTIONS requests to the transport earlier: MiLk@c8012eb
I'm not convinced that's the best way to handle it.
Maybe handling the OPTIONS requests directly in the listener is a better idea, as proposed by other people in this thread.

I'm willing to spend more time on this issue.
What are your views on that?

I'll explore the other solution, which is trying to avoid the pre-flight by changing the content-type, and see where I go with that.

Edit:
I've added an option to let the application handle OPTIONS requests instead of engine.io.
#484
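
The ordering argued for in this comment could look roughly like the sketch below. `handle` and `clients` are hypothetical names and the request objects are plain stand-ins, so this is not engine.io's code; it only shows the pre-flight being answered before any session check.

```javascript
// Sketch with hypothetical names (`handle`, `clients`); not engine.io's code.
function handle(req, clients) {
  if (req.method === 'OPTIONS') {
    // Answer the pre-flight from headers alone; never touch session state.
    return {
      status: 200,
      headers: {
        'Access-Control-Allow-Origin': req.headers.origin || '*',
        'Access-Control-Allow-Headers': 'Content-Type'
      }
    };
  }
  if (req.sid && !clients[req.sid]) {
    // Only real requests are checked against the sessions this node knows.
    return { status: 400, body: 'Session ID unknown' };
  }
  return { status: 200, body: 'ok' };
}

var clients = {}; // this node has never seen the session
var preflight = handle(
  { method: 'OPTIONS', sid: 'abc', headers: { origin: 'http://example.com' } },
  clients
);
var post = handle({ method: 'POST', sid: 'abc', headers: {} }, clients);
console.log(preflight.status, post.status); // 200 400
```

The pre-flight now succeeds on any node, but the actual request still needs to reach the node that holds the session.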

@JohnCoding94

Hi @MiLk,

I am also trying to use socket.io behind an AWS ALB.
Could you please share a sample of code enabling us to make the connection on the server side?

I see a lot of people having the same issue on Stack Overflow, so it would be really helpful!

Thanks!

@MiLk

MiLk commented Jun 14, 2017

Hi @JohnCoding94,

We ended up disabling the polling transport and forcing all the clients to use websockets, because the OPTIONS requests do not send the cookies, and ALB uses rotating keys for its cookies.

What I did in #484 would allow the handshake to work, but the session would still not be right.

However, it might work if you are not using CORS, because the only issue I can think of is the OPTIONS request done during the handshake. https://github.com/socketio/socket.io-client/blob/master/docs/API.md#with-websocket-transport-only

We now have a few thousand WS connections that stay stable over 24-hour periods (we force connections to be closed in the middle of the night).
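
For reference, forcing the websocket transport on the client comes down to the `transports` option. This is only a sketch: the URL is hypothetical and the connection line is commented out because it needs the socket.io-client package.

```javascript
// Client options that skip HTTP long-polling entirely.
var wsOnlyOptions = {
  transports: ['websocket'] // never start with polling, so no XHR pre-flight
};

// Usage (assumes the socket.io-client package and a hypothetical URL):
// var socket = require('socket.io-client')('https://realtime.example.com', wsOnlyOptions);

console.log(JSON.stringify(wsOnlyOptions));
```

The trade-off is that clients which cannot open a WebSocket have no fallback.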

@darrachequesne
Member

What I did in #484 would allow the handshake to work, but the session would still not be right.

@MiLk could you please explain why that wouldn't work? Is there something we can fix?

@MiLk

MiLk commented Jun 15, 2017

The flow I currently see is:

  1. GET /socket.io/?EIO=3&transport=polling&t=...
  2. GET /socket.io/?EIO=3&transport=polling&t=...&sid=...
  3. OPTIONS /socket.io/?EIO=3&transport=polling&t=...&sid=...
  4. POST /socket.io/?EIO=3&transport=polling&t=...&sid=...

In 1., the session is created. socket.io assigns a sid and ALB creates a new cookie.
In 2., we send a new request with the sid provided by socket.io and the cookie sent by ALB.
In 3., we send an OPTIONS request before doing the POST request used in the ping process.
But it seems that OPTIONS requests do not send the cookies.
(From https://fetch.spec.whatwg.org/#cors-protocol-and-credentials: "Note that even so, a CORS-preflight request never includes credentials.")
Because of that, ALB sends a new cookie, which might associate the client with the same server or with a new one.
In 4., if the cookie links the user to a new server, the ping request will arrive at a server without the session, and we have to re-establish the connection from 1.

All of that is happening because:

The only way to make the polling transport work in the current state of ALB would be to avoid the preflight requests.

However, it should work fine with any load balancer that uses a consistent hashing algorithm, or that does not set a new cookie on preflight requests.

Basically, the issue didn't change much since the original post. #484 is just making it fail at the POST request instead of the OPTIONS request.
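
As an illustration of the consistent-hashing alternative mentioned above, here is a sketch using rendezvous (highest-random-weight) hashing, which is one consistent-hashing scheme. It assumes the balancer hashes on something present in every request, such as the client address; `fnv1a`, `pickBackend`, the node names and the address are all hypothetical.

```javascript
// Rendezvous (highest-random-weight) hashing: one consistent-hashing scheme.
// `fnv1a` and `pickBackend` are hypothetical helpers for illustration.
function fnv1a(str) {
  var hash = 0x811c9dc5;
  for (var i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

function pickBackend(backends, clientKey) {
  var best = null;
  var bestScore = -1;
  backends.forEach(function (backend) {
    // Score every backend for this client and keep the highest one.
    var score = fnv1a(backend + '|' + clientKey);
    if (score > bestScore) {
      bestScore = score;
      best = backend;
    }
  });
  return best;
}

var backends = ['node-a', 'node-b', 'node-c'];
var clientKey = '203.0.113.7'; // e.g. the client address; no cookie involved

var forGet = pickBackend(backends, clientKey);
var forOptions = pickBackend(backends, clientKey);
var forPost = pickBackend(backends, clientKey);
console.log(forGet === forOptions && forOptions === forPost); // true
```

Because the choice depends only on the client key and the backend list, a cookie-less OPTIONS request lands on the same node as the GET and POST around it. Clients whose address changes mid-session would still be rerouted.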

@JohnCoding94

@MiLk You may already be aware of this, but just for information: I ended up setting up a connection from a subdomain to my root domain using SSL, and no OPTIONS request is made. I didn't set any specific options on my socket.io server.

So it performs the classic long-polling handshake, then switches to the websocket protocol, and everything seems to work fine.

@darrachequesne
Member

darrachequesne commented Jun 16, 2017

@MiLk maybe an Access-Control-Allow-Methods: GET, POST is missing then?

@MiLk

MiLk commented Jun 17, 2017

No, it's because the OPTIONS request gets a new ALB session cookie, which makes the POST request go to a server where the socket.io session doesn't exist.
I would need to set something up to test again, but IIRC the error came from https://github.com/socketio/engine.io/blob/master/lib/server.js#L161

@darrachequesne
Member

What I meant is adding Access-Control-Allow-Methods: GET, POST would remove the OPTIONS request in 3/, right?

  1. OPTIONS /socket.io/?EIO=3&transport=polling&t=... (allow GET and POST, not only GET)
  2. GET /socket.io/?EIO=3&transport=polling&t=...
  3. GET /socket.io/?EIO=3&transport=polling&t=...&sid=...
  4. POST /socket.io/?EIO=3&transport=polling&t=...&sid=...

@MiLk

MiLk commented Jun 17, 2017

I'm trying to reproduce the issue, but so far the connection seems actually stable.
[Screenshots: 2017-06-17 at 18:02:06 and 18:01:33]

I'm using https://github.com/socketio/engine.io/tree/a63c7b787c54b3a47da7f355826bf2770139c62b.

var app = require('express')();
var cors = require('cors');

app.options('*', cors({
  origin: true,
  methods: 'POST',
  allowedHeaders: ['Content-Type'],
  credentials: true,
}));

var server = require('http').Server(app);
var io = require('socket.io')(server, {
  handlePreflightRequest: false
});

...

@MiLk

MiLk commented Jun 17, 2017

The issue is still happening on Safari
[Screenshot: 2017-06-17 at 18:19:35]

@scream314

What solved the same issue for me:
I noticed I had an old version of socket.io dropped into my code, and it was being used instead of the npm package. After switching to an up-to-date version of socket.io-client the issue disappeared; no OPTIONS request is sent to mess up the AWSALB cookie.
Afterwards I tried to reproduce the issue in Chrome, FF and Safari, too, with no success.

@MiLk How do you receive a 400 for the GETs above? Is this happening only in Safari? Could you check what Firefox's console says? (So far it is the best one I have used, at least when it comes to meaningful error messages.)

@MiLk

MiLk commented Jun 20, 2017

Chrome and FF seem to be both OK. Only Safari 10.1.1 has an issue.
But the behaviour is definitely different from what I observed a few months ago.

However, my socket.io client and server libraries are not up-to-date (1.1.0 and 1.8 respectively).
Since we forced our clients to use websockets, it's no longer an issue for us, but we will definitely update everything within a few months, and I will be able to give new feedback then.

@franza

franza commented Apr 30, 2018

Any updates on this issue? Is it really enough to use handlePreflightRequest: false and handle OPTIONS requests as @MiLk showed?

@MiLk

MiLk commented May 1, 2018

Using 2.0.3 on the server and the following on the client, we successfully serve WebSockets and long polling with a HA setup in production behind ALB (about 2k conn / server).

  "socket.io": {
    "version": "1.6.0",
    "resolved": "https://registry.npmjs.org/socket.io/-/socket.io-1.6.0.tgz",
    "integrity": "sha1-PkDZMmN+a9kjmBslyvfFPoO24uE=",
    "requires": {
      "debug": "2.3.3",
      "engine.io": "1.8.0",
      "has-binary": "0.1.7",
      "object-assign": "4.1.0",
      "socket.io-adapter": "0.5.0",
      "socket.io-client": "1.6.0",
      "socket.io-parser": "2.3.1"
    }
  }

We haven't made any changes on the server since last August (except for a Node version upgrade).

The setup is what I described earlier in #279 (comment)

@darrachequesne
Member

That should be fixed in engine.io v4 (see 61b9492). The new syntax:

new Server({
  cors: {
    origin: "https://example.com",
    methods: ["GET"],
    allowedHeaders: ["Authorization"],
    credentials: true
  }
})

Previously:

new Server({
  handlePreflightRequest: (req, res) => {
    res.writeHead(200, {
      "Access-Control-Allow-Origin": 'https://example.com',
      "Access-Control-Allow-Methods": 'GET',
      "Access-Control-Allow-Headers": 'Authorization',
      "Access-Control-Allow-Credentials": true
    });
    res.end();
  }
})
