How to detect loss of network connection client-side in Chrome #259

Closed
mbech opened this issue Aug 18, 2016 · 16 comments

Comments

@mbech

mbech commented Aug 18, 2016

I'm having difficulty identifying when the websocket connection is down in Chrome. I've looked through the docs and closed issues (some similarities to #230), but wasn't able to find a solution or insights into this particular issue. I hope to confirm that I'm not overlooking an existing Sente feature/option that could help.

Scenario:

Server deployed on Heroku, client logs in, websocket established/first-open, successfully sending messages back/forth. Manually disable wifi on client, cutting the connection.

Expected behavior (both Safari and Firefox):

Within a few seconds (~2-5) the chsk-state updates to reflect that the connection is no longer :open?. My add-watch on chsk-state picks up the change and renders a notification to the user. Sente begins trying to reconnect every few seconds (longer and longer pauses between attempts). Once the network connection is back up, Sente reestablishes a websocket on next retry attempt.
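
To illustrate the "longer and longer pauses" between reconnect attempts, here is a minimal sketch of a capped exponential backoff. The function name `backoffMs` and the base/cap values are made up for illustration; this is not Sente's actual backoff formula.

```typescript
// Hypothetical helper, not taken from Sente: delay before reconnect attempt N.
// The delay doubles with each failed attempt and is capped at maxMs.
function backoffMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const delay = baseMs * Math.pow(2, attempt); // attempt 0 -> baseMs
  return Math.min(delay, maxMs);
}

// Delays for the first few attempts, in milliseconds.
const delays: number[] = [0, 1, 2, 3, 4, 5].map((n) => backoffMs(n));
console.log(delays);
```

A real client would usually add some random jitter so that many clients don't all retry at the same moment.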

Unexpected actual behavior (Chrome):

No change to chsk-state occurs, at least in the 5 minutes I've let it sit after disabling wifi. No event-msgs or errors/warnings appear to be thrown. Once the network is back up, websocket is reestablished and works, but I haven't been able to find anything to watch/capture to inform the client when it's disconnected.

Using latest Chrome Version 52.0.2743.116 (64-bit) on OSX El Capitan 10.11.6

Notes:

After researching, it seems this may be due to Chrome not firing the WebSocket "close" event (onclose) when the connection dies, as opposed to Firefox/Safari, which do fire that event (which I think Sente captures and reflects in the chsk-state)?

To get around it, I've considered setting up a ping/pong loop from client to server that fires every few seconds, and assume the connection is down if the client misses a few pongs in a row. But I figured it was worth asking if I overlooked something before building out that workaround, especially since keeping an eye on chsk-state works so great in Firefox/Safari.
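
For reference, the workaround described above could be sketched roughly like this. All names (`PongMonitor`, `onPingTick`, `onPong`) are hypothetical and not part of Sente; the class only does the bookkeeping, and the app would still need to send the actual pings and route the pongs to it.

```typescript
// Hypothetical bookkeeping for an app-level ping/pong loop (not a Sente API).
class PongMonitor {
  private missed: number = 0;
  private awaitingPong: boolean = false;
  private readonly maxMissed: number;

  constructor(maxMissed: number) {
    this.maxMissed = maxMissed;
  }

  // Call on every ping tick, just before sending the next ping.
  // Returns true once enough consecutive pongs have been missed.
  onPingTick(): boolean {
    if (this.awaitingPong) {
      this.missed += 1; // the previous ping was never answered
    }
    this.awaitingPong = true;
    return this.missed >= this.maxMissed;
  }

  // Call whenever a pong arrives from the server.
  onPong(): void {
    this.awaitingPong = false;
    this.missed = 0;
  }
}
```

With a tick every few seconds and `maxMissed` set to 3, the client would flag the connection as down after three consecutive unanswered pings.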

@ptaoussanis
Member

ptaoussanis commented Aug 19, 2016

Hi there. Thanks for the detailed, clear report- that was very welcome.

Just have my hands full today so haven't confirmed, but think you've identified a bug here related to #230.

Basically: #230 addressed the problem on the server side, and looks like we'll need an equivalent strategy on the client end to reliably identify these kinds of sudden disconnects across all browsers/devices. Didn't realise Chrome behaved this way.

Since we already have the server broadcasting pings at the necessary interval, we just need to mod the client with a simple timer to check if it's received any comms from the server w/in a timeout window.
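
A rough sketch of that kind of client-side timer check, assuming the app can observe every incoming server message (pings included). The class and method names are hypothetical, and the timeout is whatever window suits the server's ping interval.

```typescript
// Hypothetical sketch (not Sente code): track when the server was last heard from.
class InactivityWatchdog {
  private lastHeardAt: number;
  private readonly timeoutMs: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastHeardAt = now;
  }

  // Call for every message received from the server, keep-alive pings included.
  onServerMessage(now: number): void {
    this.lastHeardAt = now;
  }

  // Call from a periodic timer; true means nothing arrived within the window.
  isStale(now: number): boolean {
    return now - this.lastHeardAt > this.timeoutMs;
  }
}
```

In a browser the `now` arguments would typically come from `Date.now()`, with `isStale` polled from a `setInterval` callback. The window should be somewhat longer than the server's ping interval so that a single delayed ping doesn't trigger a false alarm.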

Should be a simple fix, just fully occupied for the next week or two. Will try get to this asap.

> I've considered setting up a ping/pong loop from client to server that fires every few seconds, and assume the connection is down if the client misses a few pongs in a row.

That sounds reasonable as a workaround so long.

Cheers!

@ptaoussanis
Member

Quick note if you're not against running a fork so long- it might be easier for you to just use the built-in pings. You'll need to fork to make them available to your app:

https://github.com/ptaoussanis/sente/blob/ab45ef8839b0df17acd88cc2fcc992963e57b1c1/src/taoensso/sente.cljc#L1044

ptaoussanis added a commit that referenced this issue Aug 19, 2016
Quick sketch of a possible implementation.
Didn't test, or think about this much.

Situation before #230:
  Clients maintain keep-alive, sending pings to server.
  On sudden conn break (e.g. airplane mode), clients would know about
  the break but the server wouldn't

Situation after #230:
  Server maintains keep-alive, sending pings to clients.
  On sudden conn break (e.g. airplane mode), server would know about
  the break but clients wouldn't.

As of this commit:
  Server and clients each maintain a keep-alive[1]. I.e. each side will
  attempt to ping if it hasn't heard from the other side in a
  prescribed window.

  On sudden conn break, each side will be made aware of the break when
  its own scheduled window fires and fails to successfully send a ping.

[1] Timeouts should be different. In particular, the client may like to
choose a much more aggressive window (e.g. 5s) to provide a rapid
indicator to users.
@mbech
Author

mbech commented Aug 19, 2016

Sounds good. Thanks for confirming that I wasn't overlooking an existing config or other solution for this particular situation.

I'll read through the current Sente ws-ping keep-alive code more closely. If adding a "pong" server response there makes sense I'll create a fork; otherwise it should be straightforward to add to my existing app as a standard chsk-send!/event-msg-handler pair running on an interval of a few seconds.

Thank you again for the timely response. This is the first project I've used Sente in and it's been a great experience (especially with the help of the awesome readme and all the examples).

@ptaoussanis
Member

Please note that a pong response isn't what you want, since when the connection's down the client won't receive the server's reply (the pong) anyway.

Had a free moment yesterday so quickly sketched out one approach at 7ef5971

Entirely untested, but may give you a starting point if you want to do your own pings. Basically, just need to send a client->server ping every x seconds if there haven't been any other messages sent or received in that time. The server can ignore the ping, the point would be for the client to attempt a send that will help it identify a broken connection.
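
As a sketch of the idea above (ping only when idle, and treat a failed send as the signal), assuming a hypothetical `send` wrapper that reports whether the send attempt succeeded. None of these names come from Sente. In a browser, `WebSocket.send` does not return a success flag; a broken connection would typically surface via the socket's close event instead, so the boolean here is just a stand-in for however the transport reports failure.

```typescript
// Hypothetical transport wrapper: returns false if the send attempt failed.
type SendFn = (msg: string) => boolean;
type TickResult = "skipped" | "pinged" | "broken";

class IdlePinger {
  private lastActivityAt: number;
  private readonly idleMs: number;
  private readonly send: SendFn;

  constructor(idleMs: number, send: SendFn, now: number) {
    this.idleMs = idleMs;
    this.send = send;
    this.lastActivityAt = now;
  }

  // Call whenever any message is sent or received on the connection.
  noteActivity(now: number): void {
    this.lastActivityAt = now;
  }

  // Call from a periodic timer. Pings only if the connection has been idle.
  tick(now: number): TickResult {
    if (now - this.lastActivityAt < this.idleMs) {
      return "skipped"; // recent traffic already proves the connection works
    }
    if (!this.send("ping")) {
      return "broken"; // the send attempt itself exposed the broken connection
    }
    this.lastActivityAt = now;
    return "pinged";
  }
}
```

The server is free to ignore the ping in this design; the only purpose is to force a send attempt on an otherwise quiet connection.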

ptaoussanis added a commit that referenced this issue Aug 24, 2016
This reverts commit 7ef5971.
ptaoussanis added a commit that referenced this issue Aug 31, 2016
…ane mode)

Situation before #230:
  Clients maintain keep-alive, sending pings to server.
  On sudden conn break (e.g. airplane mode), clients would know about
  the break but the server wouldn't

Situation after #230:
  Server maintains keep-alive, sending pings to clients.
  On sudden conn break (e.g. airplane mode), server would know about
  the break but clients wouldn't.

As of this commit:
  Server and clients each maintain a keep-alive[1]. I.e. each side will
  attempt to ping if it hasn't heard from the other side in a
  prescribed window.

  On sudden conn break, each side will be made aware of the break when
  its own scheduled window fires and fails to successfully send a ping.

[1] Timeouts should be different. In particular, the client may like to
choose a much more aggressive window (e.g. 5s) to provide a rapid
indicator to users.
@ptaoussanis
Member

ptaoussanis commented Aug 31, 2016

Have pushed [com.taoensso/sente "1.11.0-alpha3"] to Clojars which adds a client-side :ws-kalive-ms option (defaults to 20s).

Haven't had an opportunity to test yet, please let me know if this addresses your issue? Thanks!

ptaoussanis added a commit that referenced this issue Sep 8, 2016
…ane mode)

@ptaoussanis
Member

Any update on this? May I close?

@ptaoussanis
Member

Will assume that this is resolved.

@rafd
Collaborator

rafd commented Feb 27, 2017

In response to the request for feedback (#259 (comment)), here is a report on some tests I've done:

Summary:

  • adding :ws-kalive-ms back to the client (d925b66) has made it possible to detect abnormal disconnects in Chrome, but the feature is currently limited: it takes ~50s to detect a disconnect (even with :ws-kalive-ms set to 5s)

  • :ws-kalive-ms has no effect on Firefox or Safari, which trigger a websocket disconnect event independent of messages sent [edit: ws-kalive-ms is still necessary for detecting server-side disconnects, see my comment further below]

  • Chrome seems to trigger a disconnect only on the first message sent that is ~45s after the network actually disconnects

If the intent of :ws-kalive-ms is to provide an upper-bound on "time to detect disconnect", then the implementation will need to be updated.

Given the behaviour of Chrome, one way to implement such a bound in sente would be for the client to ping every X seconds, and then trigger a disconnect if no pong response is detected within Y seconds.
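
A minimal sketch of that "ping every X, disconnect if no pong within Y" logic, written as a small state machine driven by an external timer. The names are hypothetical and this is not how Sente implements it; the caller is assumed to perform the returned action (send the ping, or close the socket).

```typescript
type Action = "send-ping" | "disconnect" | "wait";

// Hypothetical tracker: X = pingEveryMs, Y = pongTimeoutMs.
class PingTimeoutTracker {
  private pingSentAt: number | null = null; // non-null while a pong is pending
  private lastPingAt: number;
  private readonly pingEveryMs: number;
  private readonly pongTimeoutMs: number;

  constructor(pingEveryMs: number, pongTimeoutMs: number, now: number) {
    this.pingEveryMs = pingEveryMs;
    this.pongTimeoutMs = pongTimeoutMs;
    this.lastPingAt = now;
  }

  // Call from a periodic timer; returns what the caller should do at `now`.
  step(now: number): Action {
    if (this.pingSentAt !== null) {
      // A ping is outstanding: give up once Y ms have passed without a pong.
      return now - this.pingSentAt >= this.pongTimeoutMs ? "disconnect" : "wait";
    }
    if (now - this.lastPingAt >= this.pingEveryMs) {
      this.pingSentAt = now;
      this.lastPingAt = now;
      return "send-ping";
    }
    return "wait";
  }

  // Call when the server's pong arrives.
  onPong(): void {
    this.pingSentAt = null;
  }
}
```

With this scheme the time to detect a disconnect is bounded by roughly X + Y plus the timer granularity, e.g. about 7s for X = 5s and Y = 2s, independent of browser behaviour.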

[Edit: removed some preliminary data, added more rigorous data in a comment below]

@danielcompton
Collaborator

Thanks for the testing, where did you test it? On localhost, or over the internet? The reason I ask is that I could imagine Safari might have different behaviour over the internet than on localhost if it's listening to the OS network adapter.

@rafd
Collaborator

rafd commented Feb 27, 2017

@danielcompton I tested over the internet (both deployed to a VPS, and also running in dev on a different machine on the same network).

I will do some tests in a few hours where the server's network connection drops (vs. my previous tests, which were all for the client).

@ptaoussanis
Member

Hey Rafal, thanks a lot for the detailed info! Don't have an opportunity to look into this right away, but will try make it a priority next time I'm doing a batch of open-source work.

Any additional tests/details/conclusions you (or others) can put together in the meantime would of course be a huge help!

Cheers :-)

ptaoussanis reopened this Feb 27, 2017

@rafd
Collaborator

rafd commented Feb 28, 2017

I did some more testing on disconnect-detection with the network being disabled on the server-side and client-side. Here are my results:

Time for Client to Notice an Abnormal Client-side Network Disconnect

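| browser | w/ keep-alive | w/o keep-alive | after next message |
|---------|---------------|----------------|--------------------|
| Chrome | 45 sec | infinite? (> 4 min) | 0 sec |
| Firefox | 10 sec | 10 sec | n/a |
| Safari | 0 sec | 0 sec | n/a |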


Time for Client to Notice an Abnormal Server-side Network Disconnect

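| browser | w/ keep-alive | w/o keep-alive | after next message |
|---------|---------------|----------------|--------------------|
| Chrome | 45-120 sec | infinite? (> 4 min) | 1 - 3 min |
| Firefox | 45-60 sec | infinite? (> 4 min) | 2 - 3 min |
| Safari | 25 sec | infinite? (> 4 min) | 55 sec |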


Comments:

  • Firefox sends its own ping message immediately when the client network is disabled
  • about 50% of the time on Firefox, instead of gracefully disconnecting, a javascript error occurs:
Error: Invariant violation in `taoensso.timbre:?` [pred-form, val]:
 [(string? ?msg-fmt), [Exception... "Unexpected error"  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: http://192.168.255.24:5555/js/desktop/out/taoensso/sente.js :: taoensso.sente.ChWebSocket.prototype.taoensso$sente$IChSocket$_chsk_send_BANG_$arity$3 :: line 3823"  data: no]]
  • keep-alive is necessary to detect abnormal server-side network disconnects on all browsers

    • (it's possible that the browsers would eventually detect a disconnect w/o keep-alive, I only tested up to 4 minutes)
  • keep-alive is necessary to detect abnormal client-side network disconnects on Chrome

  • time-to-detect-disconnection varies greatly (both for the same browser, and between browsers)

  • the browsers are using combinations of different strategies to trigger websocket disconnection:

    • immediately close when the OS signals that the network is disabled (Safari)
    • issue a ping with a timeout when the OS signals that network is disabled (Firefox)
    • wait until the OS-level TCP timeout, and only then trigger the websocket disconnect on the next attempted outgoing websocket message (Chrome [1] [2] and Firefox; Safari seems to use its own timeout)

Raw data:
https://docs.google.com/spreadsheets/d/1OVXbNPN2-TNRBQvnmhKK8IXM_vQ0NiD7ptUEijpQWeo/edit?usp=sharing

[1] https://bugs.chromium.org/p/chromium/issues/detail?id=197841
[2] https://bugs.chromium.org/p/chromium/issues/detail?id=76358

@rafd
Collaborator

rafd commented Mar 2, 2017

I've updated my post above w/ new data.

With the client-side :ws-kalive-ms option, each browser will eventually notice a disconnect (which wasn't the case before).

However, the time to detect varies greatly between browsers (and even for the same browser), so it would be nice for Sente users to have a way to optionally set their own more aggressive, standardized threshold. Since this is a further extension, perhaps it deserves its own issue?

@ptaoussanis
Member

Sorry for the delay handling this, have been swamped lately. Will make this my first priority next time I've got some time to batch work on Sente!

@ptaoussanis
Member

This should hopefully be addressed by the forthcoming v1.18 release.

Will ping when the first public alpha is out, and keep this issue open until there's general consensus that the issue is adequately resolved.

Apologies for the long delay on this!

ptaoussanis added a commit that referenced this issue Mar 7, 2023
BEFORE THIS COMMIT

  Client sends regular ping to server on inactivity.
  When the connection is broken, this sometimes (but not always)
  triggers a connection close that will then trigger reconnect.

AFTER THIS COMMIT

  Client sends regular ping to server on inactivity AND EXPECTS
  REPLY FROM SERVER. If the server doesn't reply within a timeout,
  explicitly triggers a connection close that will then trigger
  reconnect.
@ptaoussanis
Member

Think it's safe to now close this.
