Excessively long timeout on stream reconnect #696

Closed
awwx opened this Issue Feb 18, 2013 · 12 comments

3 participants
@awwx
Contributor

awwx commented Feb 18, 2013

stream_client uses an exponential back-off algorithm when attempting reconnects, maxing out at an interval of 30 minutes between reconnect attempts (!). This is a miserable experience on mobile devices with an intermittent Internet connection, as the Internet connection returns but the app doesn't attempt to reconnect for long periods.

For reconnects unassisted by a "retry now" UI, it's best if the reconnect attempt interval not be longer than 30 seconds for a reasonable experience. 30 seconds is actually a long time to wait if one has gotten back on Internet and is waiting for the app to wake up again, but it's bearable... longer wait times get frustrating.

Of course, if the potential impact on the server cluster weren't an issue it would be an easy fix: just change RETRY_MAX_TIMEOUT on the client to 30000 :-)
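
For readers following along, here is a minimal sketch of a capped exponential back-off of the kind described above. It is not the actual stream_client code: only the name RETRY_MAX_TIMEOUT comes from this thread, while the base delay, the growth factor, and the `retryDelay` function are assumptions made for illustration.

```javascript
// Hypothetical constants; only RETRY_MAX_TIMEOUT is named in this thread.
const RETRY_BASE_TIMEOUT = 1000; // assumed first retry delay, in ms
const RETRY_EXPONENT = 2;        // assumed growth factor per failed attempt
const RETRY_MAX_TIMEOUT = 30000; // the 30-second cap proposed above, in ms

// Delay before retry number `attempt` (0-based), capped at `maxTimeout`.
function retryDelay(attempt, maxTimeout = RETRY_MAX_TIMEOUT) {
  const uncapped = RETRY_BASE_TIMEOUT * Math.pow(RETRY_EXPONENT, attempt);
  return Math.min(uncapped, maxTimeout);
}
```

Under these assumed values the delay doubles from 1 second and stops growing after the fifth failed attempt. Passing `30 * 60 * 1000` as `maxTimeout` instead shows how a 30-minute cap lets the wait keep growing for many more attempts.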

@gschmidt

Member

gschmidt commented Feb 19, 2013

Gosh, I remember the day (long ago) that I set it to 30 minutes, but I don't remember why I chose that value. I agree, it seems excessively long. I will circulate the idea of reducing it to 30 seconds and see if anyone objects.

I suppose ideally we'd use an even shorter retry time on platforms where battery life or data charges are known not to be an issue.

@gschmidt

Member

gschmidt commented Feb 19, 2013

@sixolet noticed that HTML5 has navigator.onLine which lets us detect good times to reconnect!
www.w3.org/TR/offline-webapps

Cool demo:
http://html5demos.com/nav-online

One direction here would be to keep the max timeout relatively long (still probably not 30 minutes) but force an immediate retry when we get the HTML5 event telling us that we've just reestablished connectivity.
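
A rough sketch of that direction, assuming a browser that fires the HTML5 `online` event on `window`. The helper name `retryWhenOnline` and its `reconnectNow` callback are hypothetical and not part of Meteor.

```javascript
// Hypothetical helper: `target` is `window` in a browser, and `reconnectNow`
// is whatever function triggers an immediate reconnect attempt.
function retryWhenOnline(target, reconnectNow) {
  const handler = () => reconnectNow();
  target.addEventListener("online", handler);
  // Return a function that removes the listener again.
  return () => target.removeEventListener("online", handler);
}

// Browser usage (assumption: some reconnect function is available):
// const stop = retryWhenOnline(window, () => Meteor.reconnect());
```

The regular back-off timer would keep running alongside this, so a missed or misleading `online` event only costs the normal wait.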

@awwx

Contributor

awwx commented Feb 19, 2013

Couple issues with navigator.onLine ... on mobile devices, it detects whether there's a mobile data or WiFi connection, but not the quality of the connection. I can be on the edge of the range of a WiFi router and appear to be online, but not actually be able to make connections.

On the desktop navigator.onLine is fairly useless... it detects if the user has manually put the browser into offline mode, but not whether the laptop has an active Internet connection at all.

What makes doing retries at the same interval as the longpoll interval worse than longpolls? Wouldn't they involve about the same amount of network activity?

avital added a commit that referenced this issue May 10, 2013

Faster reconnects. Fixes #696
- Lower reconnect timeout to 5m
- Respond to the 'online' event to reconnect immediately
  eg. in case you switched from 3G to Wi-Fi

@avital avital closed this May 10, 2013

@avital

Contributor

avital commented May 10, 2013

One thing I learned while doing this: navigator.onLine on desktop does detect whether you have a Wi-Fi connection open (at least on Chrome on Mac OS).

We need to make sure retries on reconnect are less frequent than long polling because we might reach a disconnected state specifically because of a server load issue, in which case we must connect less frequently to enable the server to recover.

@awwx

Contributor

awwx commented May 10, 2013

This is an improvement, but the issue is not resolved. Five minutes to reconnect (after the Internet connection has been poor enough that the browser was unable to connect to the server, without actually being fully offline and notified of that state) is still a miserable experience.

The fix for server overload is to throttle connections at the server and return a 503 from the server when it is busy. I understand that you wouldn't want to resolve this issue by means of reducing the client timeout until you have something like that implemented on the server, but that just means that addressing your server architecture is a prerequisite for fixing this issue.

A note for comparison is that in my informal testing with your competition the recovery time is usually around 30 seconds.
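
The server-side throttling suggested in this comment could look roughly like the sketch below. It is not how Meteor's server works: the connection limit, the `admitConnection` function, and the 30-second `Retry-After` value are all assumptions chosen for illustration. `Retry-After` itself is a standard HTTP header that may accompany a 503 response.

```javascript
// Hypothetical limit; the thread does not specify one.
const MAX_CONNECTIONS = 1000;

// Decide how to answer a new connection attempt given the current load.
function admitConnection(activeConnections, maxConnections = MAX_CONNECTIONS) {
  if (activeConnections >= maxConnections) {
    // 503 Service Unavailable, with a hint (in seconds) for when to retry.
    return { status: 503, headers: { "Retry-After": "30" } };
  }
  return { status: 200, headers: {} };
}
```

With the server shedding load this way, clients could in principle retry on a short interval without overwhelming a struggling server.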

@avital

Contributor

avital commented May 10, 2013

Gmail has a similar timeout

@awwx

Contributor

awwx commented May 10, 2013

> Gmail has a similar timeout

30 sec or 5 min?

@avital

Contributor

avital commented May 10, 2013

Much longer.

@awwx

Contributor

awwx commented May 10, 2013

I'm sorry, I'm not clear on what you're saying... it's OK to have a bad mobile experience on Meteor because gmail is also bad? :) Or that you intend to implement gmail's "Try Now" UI, which while crappy, at least makes it possible for the user to reconnect?

If you want to have a seamless "just works" experience without user intervention, 30 seconds is about as long as you can wait before the user starts to wonder why the application isn't working when they've walked back into range of the router and other sites are working. 30 seconds is also the recovery time I noticed when I was trying out some of the other real time services, though my testing was informal.

On the other hand, maybe it's not your intention to provide the same level of real-time service on *.meteor.com (for free :-) ? And perhaps you'll have a sample "reconnect now" UI, similar to the accounts-ui package? Which is also fine, though to close out this issue I'd suggest documenting it somewhere, whether in the wiki or on the roadmap or wherever.

@awwx

Contributor

awwx commented May 10, 2013

Oops, I think I may be guilty of projecting my own goals onto Meteor ^_^

@avital

Contributor

avital commented May 10, 2013

All I am saying is that if we were to support 30 second timeouts we would have to do something intelligent on the server to make sure we don't overload our servers in certain failure cases. We might end up doing that, but we don't have it at the moment. Gmail and Asana do similar things, for similar reasons as far as I can tell.

I think a package that shows an overlay when you're disconnected, with a timer and a "reconnect now" button is a great idea. Would you like to build that? Alternatively, you're welcome to add this to your list of issues on mobile. I don't think such a thing should be on our roadmap as the roadmap is intended to be at a much higher level.

@awwx

Contributor

awwx commented May 11, 2013

> I think a package that shows an overlay when you're disconnected, with a timer and a "reconnect now" button is a great idea. Would you like to build that?

That'd be easy to write with Meteor.status and Meteor.reconnect, and I think would make a great first project for someone wanting to get started writing smart packages.

When I said I was "guilty of projecting my own goals onto Meteor" what I meant was that personally for my own projects I would prefer something that was better than gmail along this particular dimension. But that's something I could buy if I wanted to: I could run my own server, or see if one of the commercial messaging services will do what I want. "As good as gmail" is a fine goal (awesome, actually) for a service which is a) free and b) an order of magnitude easier to implement with. Sorry for the rant! :)
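
As a starting point for the overlay package discussed in this comment, here is a minimal sketch of the message logic. It assumes a status object shaped like the one Meteor.status() returns, using only its `connected`, `status`, and `retryTime` fields. The `overlayMessage` function and its wording are hypothetical.

```javascript
// Build the overlay text from a status object shaped like the one
// Meteor.status() returns (fields used: connected, status, retryTime).
function overlayMessage(status, now = Date.now()) {
  if (status.connected) return null; // connected, so no overlay is needed
  if (status.status === "waiting" && typeof status.retryTime === "number") {
    // retryTime is treated as a timestamp in ms for the next retry attempt.
    const seconds = Math.max(0, Math.ceil((status.retryTime - now) / 1000));
    return `Disconnected. Retrying in ${seconds}s.`;
  }
  return "Disconnected. Trying to reconnect...";
}

// In a Meteor client (not runnable outside Meteor), the "reconnect now"
// button handler would simply call: Meteor.reconnect();
```

Keeping the message logic separate from the Meteor calls makes it easy to try out without a running app.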
