Irregular heartbeats with Safari #348
We are using Primus 2.4.12 with the engine.io transport and ran into a strange issue where some clients keep disconnecting and reconnecting about once a minute. The common factor between these clients seems to be OS X 10.10.2 and Safari 8.0.3.

While investigating, we increased the server heartbeat timeout and added debug logging on the 'incoming::ping' event, and noticed that the interval between heartbeats from the Safari clients is quite irregular (the ping interval is set to 25 000 ms):

46998 ms since last ping or connect
46999 ms since last ping or connect
30992 ms since last ping or connect

Meanwhile a Google Chrome client keeps sending heartbeats regularly, ~26 s apart. Any ideas what could be causing this and how to investigate it further? As a workaround we increased the server heartbeat timeout to 90 seconds, which seems to mitigate the issue for now.

Comments
I can consistently reproduce this. The ping interval respects the settings in my case, but the client disconnects after 4 or 5 pings.
The issue also happens when using this:

```js
(function () {
  var start = +new Date();

  function timeout() {
    var now = +new Date();
    console.log(now - start);
    start = now;
    setTimeout(timeout, 25000);
  }

  setTimeout(timeout, 25000);
})();
```

This is what I got:
We probably can't do much about this.
Fair enough, we'll just keep using the workaround of increasing the server heartbeat timeout then. FYI: I posted another similar issue about Android devices: #350.
Chrome also works fine, though I didn't test on mobile. I also tried Safari 8.0.5 and it behaves exactly like 8.0.3.
Web Workers are not affected by this, so a possible solution is to tweak https://github.com/unshiftio/tick-tock to make it use Web Workers when possible.
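For illustration, a worker-backed timer could look roughly like this (`workerTimeout` and its message format are made up here; this is not tick-tock's actual API):

```js
// Hypothetical worker-backed setTimeout; not tick-tock's actual API. The
// timer ticks inside a Web Worker instead of the throttled page context.
function workerTimeout(fn, delay) {
  var src = 'onmessage = function (e) { setTimeout(function () { postMessage(0); }, e.data); };';
  var blob = new Blob([src], { type: 'application/javascript' });
  var worker = new Worker(URL.createObjectURL(blob));

  worker.onmessage = function () {
    worker.terminate();
    fn();
  };
  worker.postMessage(delay);

  return worker; // call worker.terminate() on this to cancel the timeout
}

// Usage: behaves like setTimeout(fn, 25000) but ticks in the worker.
workerTimeout(function () { console.log('tick'); }, 25000);
```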
From my testing, Safari will execute timers more often if we simply have another timer running at a lower interval. I'm doing some investigation into its behavior. Web Workers would be great, but I'm not sure how long that behavior will last (until they are throttled properly too). I'm going to try forking tick-tock to give it a shot.
It appears, in Safari, that having other timers active at lower intervals caps the effective range within which Safari will throttle. If I have a 10000ms and a 25000ms timer set, for example, the longest time I see printed for the 25000ms timer is 30000ms. We could effectively cap the error by running a no-op timer at a shorter interval.
@STRML I tried adding a bogus interval to my previous test case:

```js
(function () {
  var start = +new Date()
    , count = 0;

  setInterval(function () {
    count++;
  }, 1000);

  function timeout() {
    var now = +new Date();
    console.log(now - start);
    start = now;
    setTimeout(timeout, 25000);
  }

  setTimeout(timeout, 25000);
})();
```

but I still get the same results in Safari 9.0.
Just another update, this time moving the timer into a Web Worker:

```js
// The worker source is assembled from these fragments by the Blob below.
var src = [
  'onmessage = function (e) {',
  '  var start = e.data;',
  '',
  '  setTimeout(function () {',
  '    var now = Date.now();',
  '    postMessage(now - start);',
  '  }, 25000);',
  '}'
];
var blob = new Blob(src, { type: 'application/javascript' });
var worker = new Worker(URL.createObjectURL(blob));

// Ping-pong with the worker: log the elapsed time, then restart the timer.
worker.postMessage(Date.now());
worker.onmessage = function (e) {
  console.log(e.data);
  worker.postMessage(Date.now());
};
```

This doesn't help.
Then I believe it's what I feared: Web Workers were only a temporary hack around this problem. In Safari 8.0, this is the output I get with the Web Worker:
IMO it's not worth putting together a Web Worker solution if it's going to break with new power-saving updates. We'll have to keep investigating. A server-sent heartbeat option would be a nice way around this, assuming those callbacks actually trigger on time.
I guess the only way to resolve this is to do a major refactor of our heartbeat system, making it more forgiving and keeping the timers only on the server, not the client.
Yeah, this is unfortunately going to cripple any non-SharedWorker implementation of tab connection sharing, as whichever tab has the connection could be throttled when inactive. I'm curious, what led to the decision to put heartbeating on the client in the first place?
@STRML no particular reason. Furthermore, I doubt that Primus is the only library affected by this; if I remember correctly, the heartbeat system in Engine.IO is implemented in the same way.
Ah, I thought there was a particular reason because, for instance, protocol-level WS pings are sent from the server to the client. I agree re: timer coalescing. Of course, if the browser developers provided a way to break the throttling, probably every ad network in the world would use it for their own reasons. One (very wasteful) workaround is to receive messages on a socket at intervals. It will require some thought to figure out a way to reverse the heartbeat that is non-breaking; I am thinking it must be a parameter sent by the client on connection. If the server responds that it supports this mode, the client will expect heartbeats; otherwise it will send them as usual. Or we could break it entirely and release a major.
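For illustration, the negotiation could look something like this on the client (the message names and the `serverPing` flag are invented for this sketch; they are not part of the real Primus protocol):

```js
// Hypothetical handshake; 'hello'/'hello-ack' and the serverPing flag are
// invented for this sketch and are not part of the real Primus protocol.
var socket = new WebSocket('wss://example.com/primus');
var clientPingTimer = null;

socket.onopen = function () {
  // The client announces that it understands server-initiated heartbeats.
  socket.send(JSON.stringify({ type: 'hello', serverPing: true }));
};

socket.onmessage = function (e) {
  var msg = JSON.parse(e.data);

  if (msg.type === 'hello-ack') {
    if (msg.serverPing) return; // server pings us; no client timer needed
    // Old behavior: the server didn't opt in, so keep sending our own pings.
    clientPingTimer = setInterval(function () {
      socket.send(JSON.stringify({ type: 'ping', time: Date.now() }));
    }, 25000);
  }

  if (msg.type === 'ping') {
    // Server-initiated heartbeat: just echo a pong.
    socket.send(JSON.stringify({ type: 'pong', time: msg.time }));
  }
};
```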
Not necessarily; I think either endpoint can send a ping frame in the WS protocol, to which the other end should respond with a pong frame, but I may be wrong. Yes, it will probably be a major change; also I'm not sure how to handle the …
There are a few reasons why client→server pings are a good idea. It is currently the only way to get a reliable indication of the message latency between client and server, so you can optimize the message traffic from the client, e.g. by queueing messages into larger, less frequent updates. Removing the client→server ping would eliminate this functionality, but I would rather lose this ability than have connections that break.
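For reference, this is roughly how that latency measurement falls out of a client-initiated ping (a generic WebSocket sketch with an invented JSON framing, not Primus internals; it assumes the server echoes the ping's timestamp back):

```js
// Generic sketch: measure round-trip latency with a client-initiated ping.
// Assumes the server echoes { type: 'pong', time: <original timestamp> }.
var socket = new WebSocket('wss://example.com/');

socket.onopen = function () {
  setInterval(function () {
    socket.send(JSON.stringify({ type: 'ping', time: Date.now() }));
  }, 25000);
};

socket.onmessage = function (e) {
  var msg = JSON.parse(e.data);
  if (msg.type === 'pong') {
    var latency = Date.now() - msg.time; // round-trip time in ms
    // Knowing the RTT lets the client batch messages into larger, less
    // frequent updates when the link is slow.
    console.log('latency:', latency, 'ms');
  }
};
```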
Also, ping packets in the WS protocol are initiated by the client, not by the server.
@3rd-Eden, it appears the spec doesn't specify, only saying that an endpoint (which I read as either client or server) can send a ping at any time and the other endpoint must send a pong. And at any time, either side can send pongs, which are meant to be unanswered "I'm still here" messages. Re: latency, we could do a three-step server ping / client pong / server pong heartbeat, as sketched below.
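Sketched out, the client side of that three-step exchange could look like this (again with invented framing; it assumes the server measures the RTT between steps 1 and 2 and reports it in step 3):

```js
// Hypothetical three-step heartbeat (server ping -> client pong -> server pong);
// the JSON framing here is invented for illustration.
var socket = new WebSocket('wss://example.com/');

socket.onmessage = function (e) {
  var msg = JSON.parse(e.data);

  // Step 2: answer the server's ping immediately. This runs in the message
  // handler rather than a client timer, so setTimeout throttling doesn't apply.
  if (msg.type === 'ping') {
    socket.send(JSON.stringify({ type: 'pong', time: msg.time }));
  }

  // Step 3: the server replies to our pong with the round-trip time it
  // measured, so the client keeps the latency info it used to compute itself.
  if (msg.type === 'pong') {
    console.log('latency reported by server:', msg.latency, 'ms');
  }
};
```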
Before refactoring the code, I wonder if we can find out exactly how the timers are throttled.
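One way to probe that (a plain snippet, nothing Primus-specific) is to run timers at several intervals in a background tab and log how far each drifts:

```js
// Run timers at several intervals in a background tab and log how far each
// one drifts from its nominal interval, to see where throttling kicks in.
[1000, 5000, 10000, 25000].forEach(function (interval) {
  var last = Date.now();
  setInterval(function () {
    var now = Date.now();
    console.log(interval + 'ms timer drifted by ' + (now - last - interval) + 'ms');
    last = now;
  }, interval);
});
```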
Something new in Chrome has made this much worse; I'm seeing timeouts as long as 80 s when running the timer code you posted above. We may have to disable heartbeating entirely for now, as this is killing connections really regularly.
https://bugs.chromium.org/p/chromium/issues/detail?id=650594 may be related; I just noticed this happening a lot more recently, and it lines up. It must be some kind of heuristic based on how much CPU time the tab is taking up. On our testnet sites I can't get it to delay timers by more than 1 s, but on our production site (where there is much more realtime data), I start seeing the following pattern after a while on a 1 s timer:
@STRML Chrome 56? I cannot repro on Chrome 55.
Yep, Version 56.0.2924.67 beta (64-bit). It appears to be restricted to sites with a lot of background data or CPU usage. I'm testing on my own site, https://www.bitmex.com/app/trade/XBTUSD.
From the link you posted above:
Well, that's utterly ridiculous without some way to define high- and low-priority timer tasks. What are they thinking?
I agree this kinda sucks.
Confirming this is how it is operating; I have triggered this even on localhost just by setting our market-mocking bots to overdrive so the timers get more expensive. Perhaps it is time we start doing heartbeats on the server.
Yes, maybe the heartbeat system can also be improved by running only one timer for all connections instead of one per connection.
That sounds worthwhile; a single timer could scan all open connections and close those that don't have a flag set, then unset the flag. Getting a response to the heartbeat sets the flag again.
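A minimal sketch of that flag-and-sweep pattern, assuming a generic `connections` set and connection objects with `send`/`close`; this is not Primus's actual spark bookkeeping:

```js
// Generic flag-and-sweep heartbeat on the server; the `connections` set and
// the `alive` flag are illustrative, not Primus's actual internals.
var connections = new Set();

// One timer sweeps all connections instead of one timer per connection.
setInterval(function () {
  connections.forEach(function (conn) {
    if (!conn.alive) {
      // No heartbeat response since the last sweep: consider it dead.
      connections.delete(conn);
      conn.close();
      return;
    }
    conn.alive = false; // unset the flag and ask the client to set it again
    conn.send('ping');
  });
}, 30000);

// Called whenever a heartbeat response arrives; sets the flag again.
function markAlive(conn) {
  conn.alive = true;
}
```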
Great news: the Chromium team is willing to bend on this. They have also suggested putting the socket in a ServiceWorker. That should at least solve the heartbeating issue for us, but it will require some development effort.
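For the record, one shape this idea could take is a shared worker that owns the single socket; a SharedWorker is sketched here because its API is simpler, though the actual suggestion was a ServiceWorker, whose lifecycle would complicate things:

```js
// shared-socket.js -- runs as a SharedWorker: one WebSocket serves every tab.
// Bare-bones illustration only; a real version needs reconnect handling.
var ports = [];
var socket = new WebSocket('wss://example.com/');

socket.onmessage = function (e) {
  // Fan incoming data out to every connected tab.
  ports.forEach(function (port) { port.postMessage(e.data); });
};

onconnect = function (e) {
  var port = e.ports[0];
  ports.push(port);
  // Forward outgoing messages from any tab onto the single socket.
  port.onmessage = function (msg) {
    if (socket.readyState === WebSocket.OPEN) socket.send(msg.data);
  };
};
```

Each tab would then talk to `new SharedWorker('shared-socket.js').port` instead of opening its own connection, and any heartbeat timer would live in the worker.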
Nice, I was also playing with server-sent pings. I will publish a PoC in a PR if everything works.
I think that's fine; I've never had any luck with …
Just an FYI: Chrome has committed to delaying this until M57 at least. The current implementation even throttles the response to WS messages, so reversing the flow will still break as of the 56 beta. Hoping we'll see more positive progress; we made a lot of developers aware via HN & Reddit!
Ouch.
After #534 this is no longer an issue, at least on Safari. Closing.