
LiveView + Caddy does not handle server restarts gracefully #2544

Closed
kevinlang opened this issue Mar 21, 2023 · 4 comments

kevinlang commented Mar 21, 2023

Environment

  • Elixir version (elixir -v): Erlang/OTP 25 [erts-13.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.14.3 (compiled with Erlang/OTP 25)

  • Phoenix version (mix deps): 1.7.1
  • Phoenix LiveView version (mix deps): 0.18.18
  • Operating system: Ubuntu 22.04 / macOS
  • Browsers you attempted to reproduce this bug on (the more the merrier): Firefox, Chrome
  • Does the problem persist after removing "assets/node_modules" and trying again? Yes

Actual behavior

This bug has the same root cause as #2421.

  1. Set up Caddy to reverse proxy Phoenix w/ LiveView
  2. Run caddy reload --force or similar
  3. Websocket connection is lost and not re-established by the web client

Here is a simple way to reproduce this:

  1. Load up fresh live view application in dev (e.g., bound on port 4000)
  2. Run caddy reverse-proxy --from :4001 --to :4000
  3. Connect to localhost:4001
  4. Close and restart the caddy reverse-proxy process
  5. Observe that the websocket connection is lost and not re-established.
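For completeness, the reverse-proxy one-liner above is roughly equivalent to this minimal Caddyfile (my assumption of an equivalent config, useful for testing caddy reload against a config file):

```
:4001 {
	reverse_proxy localhost:4000
}
```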

Expected behavior

The websocket connection should be automatically re-established.

Root cause

This is because when Caddy goes down or reloads/restarts, it closes all websocket connections with a status code of 1001.

https://github.com/caddyserver/caddy/blob/90798f3eea6b7a219a9bb55f680919c0504263e3/modules/caddyhttp/reverseproxy/streaming.go#L254

This causes the web client JS code to unload and stop retrying to re-establish the connection.

https://github.com/phoenixframework/phoenix_live_view/blob/main/assets/js/phoenix_live_view/live_socket.js#L504-L509
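To illustrate the behavior (a simplified, hypothetical sketch of the decision, not the actual live_socket.js code): the client treats close code 1001 as the page going away and unloads rather than scheduling reconnects.

```javascript
// Hypothetical sketch of the close handling described above; the function
// name is illustrative, not LiveView's actual API.
function shouldUnload(closeCode) {
  // RFC 6455 close code 1001 means the endpoint is "going away". The client
  // assumes this is the *browser* navigating away, so it tears down the
  // socket instead of entering its reconnect loop.
  return closeCode === 1001
}

console.log(shouldUnload(1001)) // true  -> unload, no reconnect attempts
console.log(shouldUnload(1006)) // false -> abnormal closure, keep retrying
```

Since Caddy also closes with 1001 on reload, the client ends up permanently unloaded even though the server comes right back.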

This issue was hit previously by Bandit in #2421, and the solution there was for Bandit to return 1000, matching Cowboy's behavior - but I don't think that is a correct solution to the root cause.

I don't view Caddy returning 1001 in this scenario to be incorrect. Per the RFC:

1001 indicates that an endpoint is "going away", such as a server
going down or a browser having navigated away from a page.

Caddy is indeed closing the connection to "go down" in this scenario, so 1001 seems like a reasonable code to give.

@kevinlang (Author)

The fix may be the code block proposed in this comment:

#2421 (comment)

@kevinlang (Author)

Some more digging today:

  1. Caddy uses websocket code 1001 when shutting down
  2. Firefox uses websocket code 1001 when navigating away
  3. Both conform to the vague language of the RFC
  4. LiveView currently assumes 1001 always means (2) and unloads in that scenario

I can't think of a good solution on the LiveView side for handling this ambiguity, unfortunately. We're stuck between a rock and a hard place.

For our own app, we will explore switching to nginx to work around this issue. Separate from this issue, we were considering doing this anyway, since Caddy does not support graceful config reloads for websockets: it closes all active websocket connections every time the config changes, which impacts our blue-green deployment strategy. See caddyserver/caddy#3143.

@carterbryden

I'm also running into this issue, which unfortunately happens whenever Caddy's config is updated/reloaded.

It results in the UI showing an "Internet Disconnected" error, but the LiveSocket debug output doesn't show a disconnected event; it only shows the LiveView being destroyed.

If it's the 1001 code that's the issue, is there something else we could compare against or check to see if we're actually navigating away?
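For instance (purely illustrative; none of these names are LiveView APIs), the client could track a page-lifecycle flag and only honor 1001 as a terminal unload when the page itself is actually leaving:

```javascript
// Illustrative sketch: disambiguating "server going away" from "browser
// navigating away", both of which arrive as close code 1001. All names
// here are hypothetical, not part of LiveView.
let pageIsNavigatingAway = false

// In a browser this would be wired up via
// window.addEventListener("pagehide", onPageHide); a proxy restart closes
// the socket without any page lifecycle event firing first.
function onPageHide() { pageIsNavigatingAway = true }

function shouldReconnect(closeCode) {
  if (closeCode === 1001 && pageIsNavigatingAway) {
    return false // the user really left the page: stay unloaded
  }
  return true // server/proxy went away (or abnormal close): keep retrying
}

console.log(shouldReconnect(1001)) // true: proxy restarted, page still open
```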

@SteffenDE (Collaborator)

This should be fixed by 953a862.
