
Longpoll fallback is preserved after server restart and browser reload #5741

Closed
clarkware opened this issue Mar 4, 2024 · 25 comments

@clarkware
Contributor

clarkware commented Mar 4, 2024

Environment

  • Elixir version (elixir -v): 1.16.1
  • Phoenix version (mix deps): 1.7.11
  • Operating system: macOS Sonoma 14.3.1

Actual behavior

In development, when the browser is on a LiveView page and the server is restarted, the longpoll fallback kicks in. However, reloading the page doesn't attempt to reconnect over a websocket, since the phx:longpoll key has been set to true in sessionStorage.

Expected behavior

I'm not sure if the behavior above is expected or not. I found it confusing during development since you often restart the server for one reason or another, and expect things to continue working over websockets.
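A quick way to confirm and clear the memoized flag from the browser devtools console; a minimal sketch using the phx:longpoll sessionStorage key described under "Actual behavior":

// inspect the memoized fallback flag (expect "true" after a fallback)
console.log(sessionStorage.getItem("phx:longpoll"))

// clear it so the next reload tries the websocket transport again
sessionStorage.removeItem("phx:longpoll")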

@mainlymortal
Contributor

mainlymortal commented Mar 4, 2024

I have the exact same issue but assumed the default transport was mistakenly changed to longpoll instead of websocket.

It looks like a member of the Elixir forum is also hitting this; they can only reconnect via websocket by opening a new tab.

https://elixirforum.com/t/liveview-falls-back-to-longpoll-after-dev-server-restart/61736

@SteffenDE
Contributor

You could do something like this

longPollFallbackMs: location.host.startsWith("localhost") ? null : 2500,

to only configure longpoll when not in development.

@josevalim
Member

Perhaps we should change the built-in generators as well.

@mainlymortal
Contributor

Wouldn't this issue still happen in production, though, if the server a client is connected to goes down for any reason? The client would be stuck on long polling until they open a new browser tab/window.

@chrismccord
Member

We still fall back when the websocket drops and fails to connect within the fallback threshold, but we now only memoize the fallback for the session if the websocket transport never successfully established a connection. This strikes a nice balance between handling the user who moves to a network that permanently fails websockets (but whose tab previously had a good websocket) and unnecessarily memoizing the fallback for the duration of the session.

@clarkware
Contributor Author

That works nicely. It also seems to have resolved a spurious Presence leave event I was getting, which I'm guessing was related to memoizing the fallback.

Thanks for the quick fix!

@bcardarella
Contributor

bcardarella commented Apr 2, 2024

What exactly is the rationale for memoizing the transport to begin with? There could be any number of temporary reasons for the websocket to fail. Memoizing the transport now makes longpoll semi-permanent. I understand wanting to ensure faster reconnects by avoiding a WS connection that may not be working, but from what I've seen today there doesn't seem to be any way to restore the original transport once the WS issue is resolved, other than nuking client state entirely.

@bcardarella
Contributor

bcardarella commented Apr 2, 2024

To better articulate what I ran into today: I was working on a LiveView app and mistakenly introduced a change to the URL in app.js that the LiveSocket connects to (i.e. LiveSocket('foo')). This set the transport to LongPoll because the WS failed to connect, and I couldn't get out of it without clearing the app storage. The LongPoll transport was set despite LongPoll failing to connect as well.

This happened again later when I had a breakpoint set, which I'm guessing allowed the longPollFallbackMs to expire, so LongPoll was set again.

It would be better if the app's current version were respected. For example, if the app issues a LiveReload event to the client, the memoized transport should probably be ignored in favor of retrying the application's declared preferred transport. For production, if a new deploy happens, bust that cache and try WS again. Maybe this is already happening for prod; I'm not sure, as I was just working in dev.
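A rough sketch of the deploy-version idea, assuming a hypothetical app-version meta tag rendered by the layout and the phx:longpoll sessionStorage key discussed in this thread (not an existing Phoenix mechanism):

// Hypothetical: if the server-rendered version differs from the one we last saw
// (i.e. a new deploy happened), drop the memoized longpoll flag so WS is retried.
let versionTag = document.querySelector("meta[name='app-version']")
let currentVersion = versionTag && versionTag.content
let lastSeenVersion = sessionStorage.getItem("app:last-version")

if (currentVersion && currentVersion !== lastSeenVersion) {
  sessionStorage.removeItem("phx:longpoll")
  sessionStorage.setItem("app:last-version", currentVersion)
}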

@chrismccord
Member

What exactly is the rationale for memoizing the transport to begin with? There could be any number of temporary reasons for the websocket to fail.

Precisely :)

Some "corporate proxies" or who-knows-what networking infra will 101 the ws upgrade, then drop all the traffic on the floor. So there is no reliable way to know whether we have a good websocket other than trying to open one, waiting for the onopen, then attempting to push a message and get an ack back (socket.ping). If we don't get an ack within our expected time, we fall back to longpoll, and we memoize in sessionStorage because the fallback timeout (say 2.5-3s) would otherwise be incurred on every visit in one of these obscure ws-failing scenarios. So it makes sense to avoid probing for ws health every time. Note that we only memoize if we never previously had a good ws to begin with. For dev, you could conditionally pass the fallback based on localhost:

let liveSocket = new LiveSocket("/live", Socket, {
  longPollFallbackMs: location.host.startsWith("localhost") ? undefined : 2500,
  params: {_csrf_token: csrfToken}
})

@bcardarella
Contributor

@chrismccord thoughts on my proposal of a new version of the app opting you back into a retry on the WS? I would also argue that if a Phoenix app is behind a proxy that causes WS issues, that shouldn't penalize everybody else who may have temporary WS issues by switching the customer's transport over to longpoll.

@chrismccord
Member

No one gets penalized. As far as the user (and even the dev, in 99% of cases) is concerned, it is an impl detail. Again, we only fall back if ws never worked or took too long. The localhost snippet saves dev from falling back (such as during a long compile with the code reloader), and in prod you want it to fall back and stay there if ws was never healthy, for the best experience.

@bcardarella
Contributor

The default, as I understand it, is to fall back permanently (until the app cache clears) to LongPoll. Is that incorrect?

@chrismccord
Member

for the browser session, only if ws never worked for that page load

@bcardarella
Contributor

bcardarella commented Apr 3, 2024

But there's no way for the JS to determine whether the WS "never worked" condition is temporary or permanent. If the WS connection failure is a temporary problem due to a bad deploy or some network issue, wouldn't we want to favor the WS connection in the future when it is available?

@bcardarella
Contributor

bcardarella commented Apr 3, 2024

To better illustrate the concern: the LongPoll connection is a downgraded experience. I get that providing a working app is a better experience than a non-working app, but a slow app is arguably worse than a non-working app. This effectively puts customers into an unrecoverable performance-hit state with no way for the server to upgrade them out of it. Telling all of your customers to clear their application cache to make the website fast again seems like a silly ask.

@chrismccord
Member

Right, the logic is tricky, and it's hard to build a simple yet robust solution. If you want to attempt ws again in the near future, you either need to do so in the background and then replace the LP transport, but by doing so you bounce the connection/state, or you incur the latency of attempting to establish the ws connection up to the fallback time. The simplest solution for good UX we can provide is to fall back to LP for the browser session, which means we'll try again at some point in the future while giving the user a working app. Anything more complex is probably best left to the dev and their needs.

It's also worth mentioning that the LP transport is fully featured for LV, so for users it's not an issue. In fact, on https you avoid an extra roundtrip for the TLS handshake, so the initial connection can be faster in some cases.

@bcardarella
Contributor

the logic is tricky, and it's hard to build a simple yet robust solution

Let it fail? This is why I suggested that a new deploy should be the opportunity to retry the WS connection rather than use the memoized transport method.

@SteffenDE
Contributor

It may be worth mentioning again that the fallback is only stored in sessionStorage. If a user closes the LV tab, a WS connection will be tried on the next visit.

@bcardarella
Contributor

@SteffenDE oh I'm sorry I overlooked that detail. If that's the case then I withdraw from the debate 😄

@chrismccord
Member

Getting a "version" of the app is doable but non-trivial, and it doesn't really solve anything in this case for users. A new deploy is not a make-or-break event as far as the user's websocket working or not working is concerned. If you want a more aggressive retry, you can implement it yourself in the same way we do (we use public APIs), but tying it to deploys won't help users. For dev, avoiding the fallback on localhost is a sane choice, and the Phoenix team has previously discussed doing that for the generated app.js, but in general it simply doesn't matter which transport the user lands on :)
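As a loose sketch of what such a do-it-yourself retry could look like, using only public phoenix.js APIs; the "/live" path, the csrf param, and the phx:longpoll key are assumptions matching the snippets above, and this skips the ping/ack validation the framework itself performs:

import {Socket} from "phoenix"

// If the longpoll fallback was memoized, probe a websocket in the background;
// if it opens, forget the flag and reconnect the LiveSocket on its default transport.
function retryWebSocket(liveSocket, csrfToken) {
  if (sessionStorage.getItem("phx:longpoll") === null) { return }

  let probe = new Socket("/live", {params: {_csrf_token: csrfToken}})
  probe.onOpen(() => {
    probe.disconnect()
    sessionStorage.removeItem("phx:longpoll")
    liveSocket.disconnect(() => liveSocket.connect())
  })
  probe.connect()
}

One could call this from a timer or a visibilitychange handler; whether the resulting bounce in connection state is acceptable is exactly the trade-off described above.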

@bcardarella
Contributor

bcardarella commented Apr 3, 2024

FWIW, we're trying to mimic exactly what LiveView does in LiveView Native. That includes the channels connection and managing the connection through downgrades when necessary. IMO LVN shouldn't deviate and do its own thing that may be different and unexpected from what LiveView itself is doing just because I have a different opinion. Our design goal is consistency, so if something happens for a reason in LiveView it should happen for the same reason in LVN. But if the current behavior only limits the memoization to the current browser tab, that's good enough to justify us not memoizing the transport between native application instances. Otherwise we'd be in a far more problematic place, because clearing those caches in native is far more difficult.

@acrogenesis

acrogenesis commented Apr 4, 2024

When using websockets without a browser (GraphQL subscriptions), the socket fails with the error:
Cannot read properties of undefined (reading 'sessionStorage').

Setting longpollerTimeout: null still gives the same error.
This only happens in "phoenix": "1.7.11".

@chrismccord
Member

You can pass your own sessionStorage impl as an option

@acrogenesis

Yes, setting sessionStorage: InMemoryStorage, with

// minimal in-memory stand-in for window.sessionStorage in non-browser environments
class InMemoryStorage {
  storage: {};
  constructor() { this.storage = {} }
  getItem(keyName) { return this.storage[keyName] || null }
  removeItem(keyName) { delete this.storage[keyName] }
  setItem(keyName, keyValue) { this.storage[keyName] = keyValue }
}

worked.
The types package @types/phoenix is outdated, which was also giving me issues; removing it made things work.

@slashmili
Contributor

slashmili commented Apr 29, 2024

I had the same problem using Phoenix 1.7.11 in the dev env. I was scratching my head, trying to understand why it fell back to longpoll on my localhost.

I understand that it's a complex subject and maybe not worth changing just for the sake of the dev env. However, I'd suggest documenting it somewhere; I'm not sure where the right place is, though.

Just knowing why it happened in dev, and that I can reset it by deleting phx:longpoll, was enough for me.
