
Cannot successfully connect presence on selfhosted when using a reverse proxy to handle SSL. #993

Open · 2 tasks done
Destreyf opened this issue May 18, 2024 · 12 comments
Labels: bug (Something isn't working)

Destreyf commented May 18, 2024

Bug report

  • I confirm this is a bug with Supabase, not with my own application.
  • I confirm I have searched the Docs, GitHub Discussions, and Discord.

Describe the bug

I am unable to successfully run the supabase/realtime server behind nginx-proxy or an AWS ALB, even though I have other websocket applications deployed and working behind both of these environments.

If I connect to the server directly via ip:port it works just fine; however, when I connect over HTTPS using the load balancer endpoint I get the following logs.

<domain>  | ** (UndefinedFunctionError) function RealtimeWeb.RealtimeChannel.handle_out/3 is undefined or private
<domain>  |     (realtime 2.28.40) RealtimeWeb.RealtimeChannel.handle_out("presence_diff", %{joins: %{"d6d2b22b-8472-4088-8e0e-4bc6793e2d94" => %{metas: [%{:phx_ref => "F9Cd_xKcHtJMugXS", "state" => "online"}]}}, leaves: %{}}, %Phoenix.Socket{assigns: %{access_token: "<redacted>", ack_broadcast: false, channel_name: "presence", claims: %{"exp" => 1716073957, "role" => "anon", "sid" => "d6d2b22b-8472-4088-8e0e-4bc6793e2d94"}, confirm_token_ref: #Reference<0.2234418692.2823290882.151704>, db_conn: #PID<0.3294.0>, headers: [{"x-forwarded-for", "<my-ip>"}, {"x-forwarded-proto", "https"}, {"x-forwarded-scheme", "https"}, {"x-real-ip", "<my-ip>"}], is_new_api: true, jwt_jwks: nil, jwt_secret: "<redacted>>", limits: %{max_bytes_per_second: 100000, max_channels_per_client: 100, max_concurrent_users: 1000, max_events_per_second: 100, max_joins_per_second: 100}, log_level: :error, pg_change_params: [], pg_sub_ref: nil, policies: nil, postgres_cdc_module: Extensions.PostgresCdcRls, postgres_extension: %{"db_host" => "<redacted>", "db_name" => "<redacted>", "db_password" => "<redacted>", "db_port" => "auevoqDKvsPBm+i/ssWgjw==", "db_user" => "7AAfEqbva4oVA0swrz+Qkg==", "poll_interval_ms" => 100, "poll_max_changes" => 100, "poll_max_record_bytes" => 1048576, "publication" => "supabase_realtime", "region" => "us-west-2", "slot_name" => "supabase_realtime_replication_slot", "ssl_enforced" => true}, presence_key: "d6d2b22b-8472-4088-8e0e-4bc6793e2d94", public?: true, rate_counter: %Realtime.RateCounter{id: {:channel, :events, "realtime"}, avg: 1.1818181818181819, bucket: [0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 0, 2, 0], max_bucket_len: 60, tick: 1000, tick_ref: #Reference<0.2234418692.2823290881.177652>, idle_shutdown: :infinity, idle_shutdown_ref: nil, telemetry: %{emit: true, event_name: [:realtime, :rate_counter, :channel, :events], measurements: %{limit: 100, sum: 0}, metadata: %{id: {:channel, :events, "realtime"}, tenant: "realtime"}}}, self_broadcast: false, tenant: "realtime", tenant_token: "<redacted>", tenant_topic: "realtime:presence", using_broadcast?: true}, channel: RealtimeWeb.RealtimeChannel, channel_pid: #PID<0.3454.0>, endpoint: RealtimeWeb.Endpoint, handler: RealtimeWeb.UserSocket, id: "user_socket:realtime", joined: true, join_ref: "40", private: %{log_handle_in: :info, log_join: :info}, pubsub_server: Realtime.PubSub, ref: nil, serializer: Phoenix.Socket.V1.JSONSerializer, topic: "realtime:presence", transport: :websocket, transport_pid: #PID<0.3327.0>})
<domain>  |     (phoenix 1.7.7) lib/phoenix/channel/server.ex:338: Phoenix.Channel.Server.handle_info/2
<domain>  |     (stdlib 4.3) gen_server.erl:1123: :gen_server.try_dispatch/4
<domain>  |     (stdlib 4.3) gen_server.erl:1200: :gen_server.handle_msg/6
<domain>  |     (stdlib 4.3) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
<domain>  | Last message: %Phoenix.Socket.Broadcast{topic: "realtime:presence", event: "presence_diff", payload: %{joins: %{"d6d2b22b-8472-4088-8e0e-4bc6793e2d94" => %{metas: [%{:phx_ref => "F9Cd_xKcHtJMugXS", "state" => "online"}]}}, leaves: %{}}}
<domain>  | State: %Phoenix.Socket{assigns: %{access_token: "<redacted>", ack_broadcast: false, channel_name: "presence", claims: %{"exp" => 1716073957, "role" => "anon", "sid" => "d6d2b22b-8472-4088-8e0e-4bc6793e2d94"}, confirm_token_ref: #Reference<0.2234418692.2823290882.151704>, db_conn: #PID<0.3294.0>, headers: [{"x-forwarded-for", "<my-ip>"}, {"x-forwarded-proto", "https"}, {"x-forwarded-scheme", "https"}, {"x-real-ip", "<my-ip>"}], is_new_api: true, jwt_jwks: nil, jwt_secret: "<redacted>>", limits: %{max_bytes_per_second: 100000, max_channels_per_client: 100, max_concurrent_users: 1000, max_events_per_second: 100, max_joins_per_second: 100}, log_level: :error, pg_change_params: [], pg_sub_ref: nil, policies: nil, postgres_cdc_module: Extensions.PostgresCdcRls, postgres_extension: %{"db_host" => "<redacted>", "db_name" => "<redacted>", "db_password" => "<redacted>", "db_port" => "auevoqDKvsPBm+i/ssWgjw==", "db_user" => "7AAfEqbva4oVA0swrz+Qkg==", "poll_interval_ms" => 100, "poll_max_changes" => 100, "poll_max_record_bytes" => 1048576, "publication" => "supabase_realtime", "region" => "us-west-2", "slot_name" => "supabase_realtime_replication_slot", "ssl_enforced" => true}, presence_key: "d6d2b22b-8472-4088-8e0e-4bc6793e2d94", public?: true, rate_counter: %Realtime.RateCounter{id: {:channel, :events, "realtime"}, avg: 1.1818181818181819, bucket: [0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 0, 2, 0], max_bucket_len: 60, tick: 1000, tick_ref: #Reference<0.2234418692.2823290881.177652>, idle_shutdown: :infinity, idle_shutdown_ref: nil, telemetry: %{emit: true, event_name: [:realtime, :rate_counter, :channel, :events], measurements: %{limit: 100, sum: 0}, metadata: %{id: {:channel, :events, "realtime"}, tenant: "realtime"}}}, self_broadcast: false, tenant: "realtime", tenant_token: "<redacted>", tenant_topic: "realtime:presence", using_broadcast?: true}, channel: RealtimeWeb.RealtimeChannel, channel_pid: #PID<0.3454.0>, endpoint: RealtimeWeb.Endpoint, handler: RealtimeWeb.UserSocket, id: "user_socket:realtime", joined: true, join_ref: "40", private: %{log_handle_in: :info, log_join: :info}, pubsub_server: Realtime.PubSub, ref: nil, serializer: Phoenix.Socket.V1.JSONSerializer, topic: "realtime:presence", transport: :websocket, transport_pid: #PID<0.3327.0>}

The subscribe call emits a SUBSCRIBED state followed by CHANNEL_ERROR in the subscribe handler:

subscribe state SUBSCRIBED
subscribe state CHANNEL_ERROR

This is effectively the boilerplate demo found in the realtime-js repo: https://github.com/supabase/realtime-js?tab=readme-ov-file#presence

My nginx setup uses nginx-proxy-manager for testing (ideally I would use the ALB in AWS); here's the generated config file:

# ------------------------------------------------------------
# realtime.<domain>
# ------------------------------------------------------------

map $scheme $hsts_header {
  https   "max-age=63072000; preload";
}

server {
  set $forward_scheme http;
  set $server         "localhost";
  set $port           4000;

  listen 80;
  listen [::]:80;

  listen 443 ssl;
  listen [::]:443 ssl;

  server_name realtime.<domain>;

  location ^~ /.well-known/acme-challenge/ {
    auth_basic off;
    auth_request off;
    allow all;

    default_type "text/plain";

    root /data/letsencrypt-acme-challenge;
  }

  location = /.well-known/acme-challenge/ {
    return 404;
  }
  
  ssl_session_timeout 5m;
  ssl_session_cache shared:SSL:50m;

  # intermediate configuration. tweak to your needs.
  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_ciphers 'ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384';
  ssl_prefer_server_ciphers off;

  ssl_certificate /etc/letsencrypt/live/npm-1/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/npm-1/privkey.pem;

  # Force SSL
  set $test "";
  if ($scheme = "http") {
    set $test "H";
  }
  if ($request_uri = /.well-known/acme-challenge/test-challenge) {
    set $test "${test}T";
  }
  if ($test = H) {
    return 301 https://$host$request_uri;
  }

  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection $http_connection;
  proxy_http_version 1.1;

  access_log /data/logs/proxy-host-1_access.log proxy;
  error_log /data/logs/proxy-host-1_error.log warn;

  location / {
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $http_connection;
    proxy_http_version 1.1;

    # Proxy!
    add_header       X-Served-By $host;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Scheme $scheme;
    proxy_set_header X-Forwarded-Proto  $scheme;
    proxy_set_header X-Forwarded-For    $proxy_add_x_forwarded_for;
    proxy_set_header X-Real-IP          $remote_addr;
    proxy_pass       $forward_scheme://$server:$port$request_uri;
  }
}


I duplicated my "realtime" config to "52" so I can test connecting via IP, which works but obviously isn't through the reverse proxy.

System information

  • OS: linux (aarch64)
  • Browser: chrome, safari, firefox
  • Version of realtime-js: 2.9.5
  • Version of realtime: v2.28.40
Destreyf added the bug label on May 18, 2024

Destreyf (Author) commented

This is what my "messages" tab shows in the network traffic for this connection:

{"topic":"realtime:presence","event":"phx_join","payload":{"config":{"broadcast":{"ack":false,"self":false},"presence":{"key":"d6d2b22b-8472-4088-8e0e-4bc6793e2d94"},"postgres_changes":[]},"access_token":"<token>"},"ref":"1","join_ref":"1"}

{"event":"phx_reply","payload":{"response":{"postgres_changes":[]},"status":"ok"},"ref":"1","topic":"realtime:presence"}

{"topic":"realtime:presence","event":"access_token","payload":{"access_token":"<token>"},"ref":"2","join_ref":"1"}

{"topic":"realtime:presence","event":"presence","payload":{"type":"presence","event":"track","payload":{"state":"online"}},"ref":"3","join_ref":"1"}

{"event":"presence_state","payload":{},"ref":null,"topic":"realtime:presence"}

{"event":"presence_diff","payload":{"joins":{"d6d2b22b-8472-4088-8e0e-4bc6793e2d94":{"metas":[{"phx_ref":"F9Cf3GfuHXxMuhIR","state":"online"}]}},"leaves":{}},"ref":null,"topic":"realtime:presence"}

{"event":"phx_reply","payload":{"response":{},"status":"ok"},"ref":"3","topic":"realtime:presence"}

{"event":"phx_error","payload":{},"ref":"1","topic":"realtime:presence"}

This set of messages repeats infinitely when connecting to the load balancer.

Destreyf (Author) commented Jun 3, 2024

Any chance I can get some help looking through this?

I even went through the process of terminating SSL directly on the server by passing in a custom runtime.exs and linking my certificates.

If I connect to the IP directly on port 4000 it works great, but if I connect to my SSL port 4433 I get the exact same errors, even though I am no longer behind a proxy.

filipecabaco (Contributor) commented

Hi @Destreyf, do you see the same error when you try to broadcast a message?

I suspect that something might be blocking the socket upgrade from HTTP to WS, namely in the location block, since we use wss://<host>/socket.
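
For illustration, a dedicated websocket location block of the kind being suggested could look roughly like the sketch below (placed inside the server block from the config above). This is only a hypothetical example: the /socket path, the localhost upstream, and port 4000 are assumptions for illustration, not values taken from the report.

# Illustrative sketch only -- assumes Realtime is reachable at localhost:4000
# and that the websocket client connects under the /socket path.
location /socket {
  proxy_pass http://localhost:4000;

  # Required for the HTTP -> WS upgrade handshake to pass through nginx.
  proxy_http_version 1.1;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";

  # Preserve client and scheme information for the upstream.
  proxy_set_header Host $host;
  proxy_set_header X-Forwarded-Proto $scheme;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

  # Websockets are long-lived; the default 60s read timeout would drop idle sockets.
  proxy_read_timeout 86400;
}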

Destreyf (Author) commented

I ended up dropping Supabase Realtime. I was simply trying to use it to sync presence between users; I wrote my own implementation and was able to deploy it behind the same load balancers with zero issues, using the exact same host/path combination.

The websocket was upgrading just fine and was sending/receiving messages; you can see the messages in my comment here: #993 (comment)

During my last attempt I terminated SSL directly on port 4433 using Supabase Realtime: no proxy, no firewalls, nothing. There was literally zero infrastructure/routing between the Supabase Realtime server and the clients.

filipecabaco (Contributor) commented

And were you able to see if there were any errors in the Realtime logs?

Destreyf (Author) commented Jun 17, 2024

The errors are included in my initial post here... the GenServer claims the function/handler does not exist, at which point the websocket receives the phx_error payload:

<domain>  | ** (UndefinedFunctionError) function RealtimeWeb.RealtimeChannel.handle_out/3 is undefined or private
<domain>  |     (realtime 2.28.40) RealtimeWeb.RealtimeChannel.handle_out("presence_diff", %{joins: %{"d6d2b22b-8472-4088-8e0e-4bc6793e2d94" => %{metas: [%{:phx_ref => "F9Cd_xKcHtJMugXS", "state" => "online"}]}}, leaves: %{}}, %Phoenix.Socket{assigns: %{access_token: "<redacted>", ack_broadcast: false, channel_name: "presence", claims: %{"exp" => 1716073957, "role" => "anon", "sid" => "d6d2b22b-8472-4088-8e0e-4bc6793e2d94"}, confirm_token_ref: #Reference<0.2234418692.2823290882.151704>, db_conn: #PID<0.3294.0>, headers: [{"x-forwarded-for", "<my-ip>"}, {"x-forwarded-proto", "https"}, {"x-forwarded-scheme", "https"}, {"x-real-ip", "<my-ip>"}], is_new_api: true, jwt_jwks: nil, jwt_secret: "<redacted>>", limits: %{max_bytes_per_second: 100000, max_channels_per_client: 100, max_concurrent_users: 1000, max_events_per_second: 100, max_joins_per_second: 100}, log_level: :error, pg_change_params: [], pg_sub_ref: nil, policies: nil, postgres_cdc_module: Extensions.PostgresCdcRls, postgres_extension: %{"db_host" => "<redacted>", "db_name" => "<redacted>", "db_password" => "<redacted>", "db_port" => "auevoqDKvsPBm+i/ssWgjw==", "db_user" => "7AAfEqbva4oVA0swrz+Qkg==", "poll_interval_ms" => 100, "poll_max_changes" => 100, "poll_max_record_bytes" => 1048576, "publication" => "supabase_realtime", "region" => "us-west-2", "slot_name" => "supabase_realtime_replication_slot", "ssl_enforced" => true}, presence_key: "d6d2b22b-8472-4088-8e0e-4bc6793e2d94", public?: true, rate_counter: %Realtime.RateCounter{id: {:channel, :events, "realtime"}, avg: 1.1818181818181819, bucket: [0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 0, 2, 0], max_bucket_len: 60, tick: 1000, tick_ref: #Reference<0.2234418692.2823290881.177652>, idle_shutdown: :infinity, idle_shutdown_ref: nil, telemetry: %{emit: true, event_name: [:realtime, :rate_counter, :channel, :events], measurements: %{limit: 100, sum: 0}, metadata: %{id: {:channel, :events, "realtime"}, tenant: "realtime"}}}, self_broadcast: false, tenant: "realtime", tenant_token: "<redacted>", tenant_topic: "realtime:presence", using_broadcast?: true}, channel: RealtimeWeb.RealtimeChannel, channel_pid: #PID<0.3454.0>, endpoint: RealtimeWeb.Endpoint, handler: RealtimeWeb.UserSocket, id: "user_socket:realtime", joined: true, join_ref: "40", private: %{log_handle_in: :info, log_join: :info}, pubsub_server: Realtime.PubSub, ref: nil, serializer: Phoenix.Socket.V1.JSONSerializer, topic: "realtime:presence", transport: :websocket, transport_pid: #PID<0.3327.0>})
<domain>  |     (phoenix 1.7.7) lib/phoenix/channel/server.ex:338: Phoenix.Channel.Server.handle_info/2
<domain>  |     (stdlib 4.3) gen_server.erl:1123: :gen_server.try_dispatch/4
<domain>  |     (stdlib 4.3) gen_server.erl:1200: :gen_server.handle_msg/6
<domain>  |     (stdlib 4.3) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
<domain>  | Last message: %Phoenix.Socket.Broadcast{topic: "realtime:presence", event: "presence_diff", payload: %{joins: %{"d6d2b22b-8472-4088-8e0e-4bc6793e2d94" => %{metas: [%{:phx_ref => "F9Cd_xKcHtJMugXS", "state" => "online"}]}}, leaves: %{}}}
<domain>  | State: %Phoenix.Socket{assigns: %{access_token: "<redacted>", ack_broadcast: false, channel_name: "presence", claims: %{"exp" => 1716073957, "role" => "anon", "sid" => "d6d2b22b-8472-4088-8e0e-4bc6793e2d94"}, confirm_token_ref: #Reference<0.2234418692.2823290882.151704>, db_conn: #PID<0.3294.0>, headers: [{"x-forwarded-for", "<my-ip>"}, {"x-forwarded-proto", "https"}, {"x-forwarded-scheme", "https"}, {"x-real-ip", "<my-ip>"}], is_new_api: true, jwt_jwks: nil, jwt_secret: "<redacted>>", limits: %{max_bytes_per_second: 100000, max_channels_per_client: 100, max_concurrent_users: 1000, max_events_per_second: 100, max_joins_per_second: 100}, log_level: :error, pg_change_params: [], pg_sub_ref: nil, policies: nil, postgres_cdc_module: Extensions.PostgresCdcRls, postgres_extension: %{"db_host" => "<redacted>", "db_name" => "<redacted>", "db_password" => "<redacted>", "db_port" => "auevoqDKvsPBm+i/ssWgjw==", "db_user" => "7AAfEqbva4oVA0swrz+Qkg==", "poll_interval_ms" => 100, "poll_max_changes" => 100, "poll_max_record_bytes" => 1048576, "publication" => "supabase_realtime", "region" => "us-west-2", "slot_name" => "supabase_realtime_replication_slot", "ssl_enforced" => true}, presence_key: "d6d2b22b-8472-4088-8e0e-4bc6793e2d94", public?: true, rate_counter: %Realtime.RateCounter{id: {:channel, :events, "realtime"}, avg: 1.1818181818181819, bucket: [0, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 0, 2, 0], max_bucket_len: 60, tick: 1000, tick_ref: #Reference<0.2234418692.2823290881.177652>, idle_shutdown: :infinity, idle_shutdown_ref: nil, telemetry: %{emit: true, event_name: [:realtime, :rate_counter, :channel, :events], measurements: %{limit: 100, sum: 0}, metadata: %{id: {:channel, :events, "realtime"}, tenant: "realtime"}}}, self_broadcast: false, tenant: "realtime", tenant_token: "<redacted>", tenant_topic: "realtime:presence", using_broadcast?: true}, channel: RealtimeWeb.RealtimeChannel, channel_pid: #PID<0.3454.0>, endpoint: RealtimeWeb.Endpoint, handler: RealtimeWeb.UserSocket, id: "user_socket:realtime", joined: true, join_ref: "40", private: %{log_handle_in: :info, log_join: :info}, pubsub_server: Realtime.PubSub, ref: nil, serializer: Phoenix.Socket.V1.JSONSerializer, topic: "realtime:presence", transport: :websocket, transport_pid: #PID<0.3327.0>}

Edit: I meant the handle_out/3 reference, not the process; it's been a long day.

filipecabaco (Contributor) commented

🤦 sorry, it's late in my timezone.

Got it, thank you for the detailed report; we will open a PR to fix it 🙏

Destreyf (Author) commented

I get it, it's been a long day for me as well.

My original deployment was running on aarch64 (linux/arm64), but I tested this on an x86-64 machine as well to rule that out.

filipecabaco (Contributor) commented

It's really interesting that it's hitting a handle_out pattern-matching error 🤔 I will investigate it.

Thank you again for the detailed information.

barrownicholas (Contributor) commented Jun 18, 2024

Here's the NGINX config we use for the reverse proxy (I'll discuss SSL offloading afterwards):

server {
    listen 80;
    listen [::]:80;
    server_name supabase.ourdomain.com; # change this
    client_max_body_size 20M;
    large_client_header_buffers 4 32k;

    location / {
        proxy_pass http://127.0.0.1:8000; # this is where supabase Kong is running
        proxy_buffering off;
        proxy_redirect off;
        proxy_read_timeout 86400; # necessary to avoid websocket timeout disconnect
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Authorization $http_authorization;
        proxy_buffer_size 32k;
        proxy_buffers 8 32k;
    }
}

You can manually add SSL, but honestly, Certbot and Let's Encrypt are the way to automate things in production environments.

sudo apt-get install -y certbot python3-certbot-nginx
sudo certbot --nginx --non-interactive --agree-tos -m email@example.com -d yourdomain.com -d supabase.yourdomain.com --expand

This alone should get you going (we use exactly this in production now).

Edit:

A bit more explanation: Certbot uses the NGINX config from above and automatically adds everything needed to offload the SSL encryption. After running the above commands, you can see how Certbot adds/modifies your sites-enabled .conf if you want extra insight.
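
As a rough illustration, the certbot --nginx installer typically adds something along these lines to the matching server block (illustrative only; the exact output depends on your certbot version and domain):

# Typical additions made by the certbot --nginx installer (illustrative, not verbatim output):
listen 443 ssl; # managed by Certbot
ssl_certificate /etc/letsencrypt/live/supabase.yourdomain.com/fullchain.pem; # managed by Certbot
ssl_certificate_key /etc/letsencrypt/live/supabase.yourdomain.com/privkey.pem; # managed by Certbot
include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

If you opt into the redirect, Certbot also adds a plain-HTTP server block that 301-redirects requests to HTTPS.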

Destreyf (Author) commented

@barrownicholas I had that exact config at one point as well; it did not work and threw the exact same error messages.

The error itself has nothing to do with the proxy layer either, as it happened when connecting directly to the Realtime service over HTTPS with my custom runtime.exs file.

Something to note is that I did not use Kong; I was only using Realtime by itself.

filipecabaco (Contributor) commented

I agree with @Destreyf. From the error it really seems that Realtime is receiving an unexpected payload and fails to pattern match, so I need to check the error message and understand what payload could be causing it.
