Connections that cannot write to server (is there a way to kill them?) #141

Closed
carlosvillademor opened this Issue Jan 29, 2014 · 20 comments

4 participants
@carlosvillademor

Hi,
We are finding that we have client connections to Primus from which the server never receives any writes. Over time, Primus reports a much larger number of connections than the number we have received writes from.

  1. Is there a way to close the zombie connections from the server to free up resources? We can distinguish them; there just doesn't appear to be an API call for closing sparks.

  2. Why might this be happening? Could it be that Primus reconnects after a connection times out and doesn't clean up the old connection?

Thanks
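
As a rough sketch for question 1: the zombie sparks can be distinguished because they never send any data. That means the server can record when each spark connects and periodically end the ones that stay silent past a grace period. This relies on Primus's server-side `spark.end()` to close a spark. The `createSilentSparkSweeper` helper and the grace period are hypothetical, and the clock is injectable only so the logic can be tested.

```javascript
// Hypothetical helper: ends sparks that connected but never sent any data.
// A "spark" here is any object with an `id` and an `end()` method.
function createSilentSparkSweeper(graceMs, now = Date.now) {
  const pending = new Map(); // spark.id -> { spark, openedAt }

  return {
    // Call when a connection opens.
    track(spark) {
      pending.set(spark.id, { spark, openedAt: now() });
    },
    // Call when data arrives from the spark; it is no longer a suspect.
    markActive(spark) {
      pending.delete(spark.id);
    },
    // Call when the connection closes normally.
    forget(spark) {
      pending.delete(spark.id);
    },
    // End every tracked spark that has been silent for longer than graceMs.
    // Returns the ids that were ended.
    sweep() {
      const ended = [];
      const t = now();
      for (const [id, entry] of pending) {
        if (t - entry.openedAt > graceMs) {
          entry.spark.end();
          pending.delete(id); // deleting during Map iteration is safe
          ended.push(id);
        }
      }
      return ended;
    },
  };
}
```

In a Primus server this would presumably be wired up as follows: call `track` from `primus.on('connection', ...)`, `markActive` from the spark's `'data'` handler, `forget` from `primus.on('disconnection', ...)`, and `sweep` from a `setInterval`. Ending the sparks only frees resources. It does not explain why the clients never wrote, which is question 2.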

@lpinca

lpinca Jan 29, 2014

Member

This seems similar to #138.
@carlosvillademor can you try to use the master branch?

npm i git://github.com/primus/primus#master
@tombburnell

tombburnell Jan 29, 2014

@lpinca Could be - we'll have a go with master.
The clients connect, but then no writes are received from the client by the server. It's as though the handshake failed to complete but the connection remains open. We have struggled to replicate it ourselves, but we see a lot of evidence of it in traffic to the site.

It would be very useful to be able to retrieve the spark.id on the client, so that we can make a GET request to the server in the on('open') handler to confirm whether that handler fires in this scenario.
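
Here is one way to get the `spark.id` onto the client. The server writes a small announcement message as soon as the spark connects. The client picks it out of its normal `'data'` stream and uses it to build a diagnostic request. The `{ type: 'spark-id' }` message shape, the helper names and the `/debug/spark` endpoint are all hypothetical. The only Primus behaviour relied on is that `spark.write()` can send a plain object, which holds with the default JSON parser.

```javascript
// Hypothetical message type used to tell the client its server-side spark id.
const SPARK_ID_MESSAGE = 'spark-id';

// Server side: call from primus.on('connection', ...) with the new spark.
function announceSparkId(spark) {
  spark.write({ type: SPARK_ID_MESSAGE, id: spark.id });
}

// Client side: call from the client's 'data' handler. Returns the id if this
// message is the announcement, otherwise null so normal handling continues.
function extractSparkId(data) {
  if (data && data.type === SPARK_ID_MESSAGE && typeof data.id === 'string') {
    return data.id;
  }
  return null;
}

// Client side: URL for a diagnostic GET request carrying the spark id.
// '/debug/spark' is a hypothetical endpoint on your own server.
function sparkCheckUrl(base, id) {
  const url = new URL('/debug/spark', base);
  url.searchParams.set('id', id);
  return url.toString();
}
```

The id arrives as an ordinary message, so receiving it also confirms that server-to-client writes work on that connection.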

@lpinca

lpinca Jan 30, 2014

Member

Can you provide some additional info about the configuration?
What transformer are you using? Are the client and the server application on the same domain?

@tombburnell

tombburnell Jan 30, 2014

Sorry, yes, I should have said that before. We are using the sockjs transformer. The client runs on the same TLD as the server but on a different subdomain. The server runs on a different port, because this was recommended as a way to help avoid issues with mobiles going through proxies.

Which would you say is the most reliable transformer? We've read that there were problems with socket.io and engine.io; is that right?

@lpinca

lpinca Jan 30, 2014

Member

Hard to say, but I would give engine.io a go.
@3rd-Eden recently fixed some CORS issues in their client.
This will also help determine whether the issue is limited to a single transformer.

@3rd-Eden

3rd-Eden Jan 30, 2014

Owner

Before we can clean up any possible zombie connections, we must first find out which ones they are. We currently don't have any way to detect them. The only cause I can come up with is that SockJS does not close connections correctly when your user disconnects from the internet, because SockJS doesn't use its heartbeats as a server-side timeout. We've recently added support for this in Primus, as several transformers lacked it, Engine.IO and Socket.IO being the notable exceptions.

So as @lpinca suggested, using a different transformer would help track this down. If the issues go away with Engine.IO, we can safely assume the cause is SockJS or our implementation of SockJS. In addition, using master with SockJS allows you to rule out the missing heartbeats in SockJS.

Having a reproducible test case for this would make it easier for us to help you debug as well.
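
The heartbeat-as-server-side-timeout idea can be sketched generically. Each connection gets a timer that is re-armed whenever a heartbeat arrives. If the timer ever fires, the connection is treated as dead so the server can close it. This illustrates the pattern and is not Primus's actual implementation. `createHeartbeatTimeout` and its parameters are hypothetical, and the timer functions are injectable only so the behaviour can be tested without waiting.

```javascript
// Hypothetical per-connection heartbeat timeout. onTimeout runs if no beat()
// arrives within timeoutMs of the previous one (or of creation).
function createHeartbeatTimeout(timeoutMs, onTimeout, timers = {}) {
  // Default to the real timer functions; tests can pass fakes.
  const set = timers.set || setTimeout;
  const clear = timers.clear || clearTimeout;
  let handle = null;
  let stopped = false;

  function arm() {
    if (handle !== null) clear(handle); // discard the previous deadline
    handle = set(() => {
      handle = null;
      if (!stopped) onTimeout();
    }, timeoutMs);
  }

  arm(); // start counting from the moment the connection opens

  return {
    // Call for every heartbeat (ping) received from the client.
    beat() {
      if (!stopped) arm();
    },
    // Call when the connection closes for any other reason.
    stop() {
      stopped = true;
      if (handle !== null) clear(handle);
      handle = null;
    },
  };
}
```

In practice, wire the three hooks as follows:
- `onTimeout` closes the connection (for a Primus spark, `spark.end()`).
- `beat` runs for each ping received.
- `stop` runs from the close handler, so an already-closed connection does not time out later.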

@tombburnell

tombburnell Jan 30, 2014

At the moment the Primus build appears to be failing; is it safe to use master?

@lpinca

lpinca Jan 30, 2014

Member

I think I set a timeout too low in a test, and this caused the Travis CI build to fail.
Tests pass fine in my local environment.
Can you try running the tests against master and see if they pass?

Edit: the build fails because @3rd-Eden pushed some commits to the sparky branch.
You should be safe with master.

@3rd-Eden

3rd-Eden Jan 30, 2014

Owner

I think the badge doesn't point to the master branch but to all branches. My sparky branch is currently failing, so this could trigger the red badge.

@tombburnell

tombburnell Jan 30, 2014

We are trying engine.io and getting the following in Chrome.

XMLHttpRequest cannot load http://subdomain1.domain.com/primus/?EIO=2&transport=polling&sid=w79IMWh38A_ImyKjAAAP. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://subdomain2.domain.com' is therefore not allowed access.

WebSocket connection to 'ws://subdomain.domain.com/primus/?EIO=2&transport=websocket&sid=fLh0T9Uedf9HdfGlAAAF' failed: WebSocket is closed before the connection is established.

A couple of questions (feel free to send me to the engine.io mailing list):

  • Does engine.io require sticky sessions?
  • Should we be able to run engine.io on a separate subdomain? If so, what do we need to do?
  • What would be closing the connections in the second error?

Thanks, Tom
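
The first error means the polling response from subdomain1 lacked an `Access-Control-Allow-Origin` header matching the page's origin, so Chrome refused to hand it to the script. Below is a minimal sketch of what the server has to send, using only Node's built-in request/response objects. The allowed-origin list reuses the redacted hostname from the error and is a placeholder. Later in this thread CORS support was added to Primus master, so this is mainly useful for understanding the error rather than as a replacement for that support.

```javascript
// Origins allowed to read responses cross-subdomain. Placeholder value
// matching the redacted page origin in the Chrome error above.
const ALLOWED_ORIGINS = new Set(['http://subdomain2.domain.com']);

// CORS headers for a given request Origin; {} means "not allowed", in which
// case the browser blocks the response as in the error above.
function corsHeaders(origin) {
  if (!origin || !ALLOWED_ORIGINS.has(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin, // echo the exact allowed origin
    'Vary': 'Origin', // caches must not reuse this response for other origins
  };
}

// Minimal Node request handler showing where the headers go, including the
// preflight (OPTIONS) request browsers send before some cross-origin POSTs.
function handler(req, res) {
  const headers = corsHeaders(req.headers.origin);
  if (req.method === 'OPTIONS') {
    res.writeHead(204, {
      ...headers,
      'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
      'Access-Control-Allow-Headers': 'Content-Type',
    });
    res.end();
    return;
  }
  res.writeHead(200, { ...headers, 'Content-Type': 'text/plain' });
  res.end('ok');
}

// To try it: require('http').createServer(handler).listen(8080);
```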

@lpinca

lpinca Jan 30, 2014

Member

@TomHowe, here is a very simple test of CORS support with Primus and the engine.io transformer: https://gist.github.com/lpinca/8710546. It seems to work fine in Chrome.

@tombburnell

tombburnell Jan 30, 2014

Will take a look.
At the moment, we have gone back to sockjs and are trying to resolve an issue affecting Firefox, which gives us the following error:

The connection to ws://domain.com/app/primus/363/58344jcj/websocket was interrupted while the page was loading.

Any idea what this means?

@tombburnell

tombburnell Jan 30, 2014

We keep seeing this as well:
Error: INVALID_STATE_ERR primus.js:2600

@lpinca

lpinca Jan 31, 2014

Member

@TomHowe, the first issue should have no impact on your application.
I don't know about the second one, but you can post both of them on the engine.io and sockjs issue trackers.

@3rd-Eden

3rd-Eden Jan 31, 2014

Owner

@TomHowe Could you post a full stacktrace of that error as well?

@tombburnell

tombburnell Jan 31, 2014

@3rd-Eden If you mean the Error: INVALID_STATE_ERR primus.js:2600, it doesn't give a stack trace, only a string. It happens when we try to post.
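
An `INVALID_STATE_ERR` on write is what a SockJS client, or a raw WebSocket, raises when `send()` is called while the socket is still connecting. One thing worth ruling out is therefore writes issued before the `'open'` event. Below is a sketch of the usual remedy, queueing writes until the connection is open. `createWriteQueue` and the socket shape it expects are hypothetical.

```javascript
// Hypothetical wrapper: writes made before the connection is open are queued
// and flushed in order on open, instead of calling send() too early.
// `socket` is anything with a send(msg) method; the caller reports open/close.
function createWriteQueue(socket) {
  let open = false;
  const queue = [];

  return {
    write(msg) {
      if (open) socket.send(msg);
      else queue.push(msg);
    },
    // Call from the 'open' event handler.
    onOpen() {
      open = true;
      while (queue.length > 0) socket.send(queue.shift());
    },
    // Call from the close/end handler; later writes queue until next onOpen().
    onClose() {
      open = false;
    },
    // Number of messages still waiting to be sent.
    get pending() {
      return queue.length;
    },
  };
}
```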

@tombburnell

tombburnell Jan 31, 2014

Back to our original problem: we are getting connections to the server but not receiving any messages from the client. A few things about our setup:

  • These zombie connections are mostly coming from IE 8.5, IE 9.5 and Chrome 32
  • We are using port 8080 for Primus to cater for mobile
  • We have noticed a lot of timeouts on the client side. We report every on('timeout') event to the backend
  • We suspect the issue could be related to the downgrade to long polling when WebSockets are not working.

Q1: Is there a way to determine whether the connection is using WebSockets or long polling?

Q2: What's the recommended port configuration? Is using 8080 advisable? Should we prefer 80 and only use 8080 for mobile? Should we use 443 instead?

Thanks for your help,
Tom

@lpinca

lpinca Feb 28, 2014

Member

@TomHowe, I know that some ISPs block WebSocket connections on port 80.
If you can set up an HTTPS server and use a secure WebSocket connection, it could help.
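
A related point: a page served over HTTPS must also open its socket over TLS (`wss://`), because modern browsers block insecure WebSocket connections from secure pages. The sketch below derives the socket URL from the page's own protocol and host, so an HTTPS page connects on 443 by default. The `realtimeUrl` helper and the `/primus` default path are assumptions for illustration, not Primus API.

```javascript
// Hypothetical helper: socket URL on the same host as the page, using
// wss:// (TLS, default port 443) whenever the page itself is https://.
function realtimeUrl(pageHref, path = '/primus') {
  const page = new URL(pageHref);
  const scheme = page.protocol === 'https:' ? 'wss:' : 'ws:';
  // page.host keeps a non-default port (e.g. :8080) and omits default ones.
  return `${scheme}//${page.host}${path}`;
}
```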

@3rd-Eden

3rd-Eden Mar 25, 2014

Owner

CORS support has been added to master, which should fix the issues stated in comment: #141 (comment)

Q1: No, there currently isn't a way of figuring out which transport the underlying transformers are using.
Q2: Always use HTTPS (443).

@lpinca

lpinca Apr 6, 2014

Member

I'm closing this; reopen it if needed.

@lpinca lpinca closed this Apr 6, 2014
