Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not works with Amazon Aurora #270

Closed
mats16 opened this issue Aug 2, 2022 · 16 comments
Closed

Not works with Amazon Aurora #270

mats16 opened this issue Aug 2, 2022 · 16 comments
Labels
bug Something isn't working p4 Priority 4

Comments

@mats16
Copy link
Contributor

mats16 commented Aug 2, 2022

Bug report

Describe the bug

The supabase-realtime server does not work with Amazon Aurora PostgreSQL.

To Reproduce

  1. Change parameters for replications
      parameters: {
        'rds.logical_replication': '1',
        'max_replication_slots': '5',
        'max_wal_senders': '10',
        'max_logical_replication_workers': '2',
        'max_slot_wal_keep_size': '1024',
      },
  1. Create PostgreSQL Cluster (v14.3)

  2. Launch supabase-realtime server and connect DB

  3. The following log is displayed

2022-08-02 00:41:28.315 [info] Running RealtimeWeb.Endpoint with cowboy 2.8.0 at :::4000 (http)

2022-08-02 00:41:28.316 [info] Access RealtimeWeb.Endpoint at http://localhost:4000

2022-08-02 00:42:28.341 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}

2022-08-02 00:42:28.377 [error] %Postgrex.Error{connection_id: 16419, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

Expected behavior

It works well, because it does not rely on extensions.

Screenshots

N/A

System information

  • Version of supabase-realtime: v0.22.7
  • Version of PostgreSQL: 14.3

Additional context

Add any other context about the problem here.

@mats16 mats16 added the bug Something isn't working label Aug 2, 2022
@w3b6x9
Copy link
Member

w3b6x9 commented Aug 2, 2022

@mats16 did you make sure you have everything configured according to instructions found here: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Replication.Logical.html#AuroraPostgreSQL.Replication.Logical.Configure

@mats16
Copy link
Contributor Author

mats16 commented Aug 2, 2022

Thanks! I already checked it...

I had created supabase_admin as rds_superuser, but it does not work.

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 3, 2022

@mats16 can you please share your Realtime server config, if possible?

@mats16
Copy link
Contributor Author

mats16 commented Aug 4, 2022

Thanks.
This is my settings.

ParameterGroup (for Amazon Aurora PostgreSQL 14):

    const parameterGroup = new rds.ParameterGroup(scope, 'ParameterGroup', {
      engine: rds.DatabaseClusterEngine.auroraPostgres({ version: rds.AuroraPostgresEngineVersion.of('14.3', '14') }),
      parameters: {
        'rds.logical_replication': '1',
        'max_replication_slots': '5', // Default Aurora:20, Supabase:5
        'max_wal_senders': '10', // Default Aurora:20, Supabase:10
        'max_logical_replication_workers': '2',
        'autovacuum_max_workers': 'GREATEST({DBInstanceClassMemory/64371566592},2)', // Default: GREATEST({DBInstanceClassMemory/64371566592},3)
        'max_parallel_workers': '4', // Default: GREATEST(${DBInstanceVCPU/2},8)

        'max_slot_wal_keep_size': '1024', // https://github.com/supabase/realtime
      },
    });

Realtime server config as docker-compose.yml:

version: '3'
services:
  realtime:
    image: supabase/realtime:v0.22.7
    ports:
      - 4000:4000
    environment:
      DB_HOST: "xxxxx.cluster-xxxxx.us-west-2.rds.amazonaws.com"
      DB_NAME: postgres
      DB_USER: supabase_admin # it is rds_superuser
      DB_PASSWORD: "xxxx"
      DB_PORT: 5432
      DB_SSL: "false"
      PORT: 4000
      JWT_SECRET: "xxx"
      REPLICATION_MODE: RLS
      REPLICATION_POLL_INTERVAL: 100
      SECURE_CHANNELS: "false"
      SLOT_NAME: realtime_rls
      TEMPORARY_SLOT: "true"
      MAX_REPLICATION_LAG_MB: 1000
    command: >
      bash -c "./prod/rel/realtime/bin/realtime eval Realtime.Release.migrate
      && ./prod/rel/realtime/bin/realtime start"

Test CDK project: https://github.com/mats16/supabase-on-aws

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 6, 2022

@mats16 can you take a look at https://github.com/supabase/realtime/blob/master/dev/postgres/mnt/00-setup.sql and make sure you have everything like a realtime schema, a publication called supabase_realtime, etc.?

@mats16
Copy link
Contributor Author

mats16 commented Aug 7, 2022

@w3b6x9 My referrence is https://github.com/supabase/supabase/blob/master/docker/volumes/db/init/00-initial-schema.sql, so I have already created schema and publication.

I noticed that logs published at intervals of the value of SUBSCRIPTION_SYNC_INTERVAL.

Logs about SUBSCRIPTION_SYNC_INTERVAL: 10000

environment-realtime-1  | 2022-08-07 07:25:39.301 [info] Migrations already up
environment-realtime-1  | 2022-08-07 07:25:46.679 [info] Running RealtimeWeb.Endpoint with cowboy 2.8.0 at :::4000 (http)
environment-realtime-1  | 2022-08-07 07:25:46.680 [info] Access RealtimeWeb.Endpoint at http://localhost:4000
environment-realtime-1  | 2022-08-07 07:25:49.829 [info] tzdata release in place is from a file last modified Wed, 21 Oct 2020 18:40:20 GMT. Release file on server was last modified Wed, 16 Mar 2022 13:36:02 GMT.
environment-realtime-1  | 2022-08-07 07:25:51.029 [info] Tzdata has updated the release from 2020d to 2022a
environment-realtime-1  | 2022-08-07 07:25:56.696 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:25:56.939 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:06.711 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:07.276 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:16.727 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:17.667 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:26.744 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:27.078 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:36.760 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:37.447 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:46.774 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:46.858 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:26:56.791 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:26:57.220 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}
environment-realtime-1  | 2022-08-07 07:27:06.806 [error] %RuntimeError{message: "operation :rollback is manually rolling back, which is not supported by Ecto.Multi"}
environment-realtime-1  | 2022-08-07 07:27:07.555 [error] %Postgrex.Error{connection_id: 8062, message: nil, postgres: %{code: :undefined_object, file: "slot.c", line: "418", message: "replication slot \"cloud9_realtime_rls\" does not exist", pg_code: "42704", routine: "ReplicationSlotAcquire", severity: "ERROR", unknown: "ERROR"}, query: nil}

Which query is to confirm connected subscribers and the tables?
I could not find it, also I am not good with Elixir.

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 7, 2022

@mats16 do you see a realtime.schema_migrations table in your database? If you do, what's the latest migration you see?

@mats16
Copy link
Contributor Author

mats16 commented Aug 7, 2022

@w3b6x9 My latest migration version is 20220712093339.

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 7, 2022

@mats16 This should be the culprit:

def sync_subscriptions(params_list) do
.

I do want to back up and mention that SECURE_CHANNELS is required to be true when REPLICATION_MODE is RLS. If you don't care about RLS and just want to get it working for the time being you should try it with REPLICATION_MODE set to STREAM.

Otherwise, make sure that you set SECURE_CHANNELS to true, sign a JWT with your JWT secret and make sure there's a role key in your claims. The role points to any existing db role (e.g. supabase_admin). Have you added any tables to the supabase_realtime publication yet? If not, you should do that as soon as possible.

@mats16
Copy link
Contributor Author

mats16 commented Aug 8, 2022

@w3b6x9 thanks.
I connected in stream mode and it worked.

I also noticed the following error about web socket connection and no record existed in realtime.subscription table.

{
  "event":"phx_reply",
  "payload":{
    "response":{
      "reason":"error occurred when joining realtime:public:messages with user token"
    },
    "status":"error"
  },
  "ref":"1",
  "topic":"realtime:public:messages"
}

It is probably error related pgjwt or permissions.

@mats16
Copy link
Contributor Author

mats16 commented Aug 8, 2022

We cannot use pgjwt on RDS, so we need to install it via SQL. If functions such as verify work in extensions schema, there is no problem, right?

(Also, sorry for creating the issue as a bug.)

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 8, 2022

@mats16 ah, that's my bad. We've had a few versions of Realtime and the latest one no longer respects the SECURE_CHANNELS flag. You will have to set it to true, generate a JWT (you can use an online tool for convenience) with your JWT_SECRET (whatever xxx actually is), pass that in, and you should be fine.

Realtime doesn't use the pgjwt extension. JWT verification and validation is done on the Elixir side.

Also, sorry for creating the issue as a bug.

No worries, just want to make sure you're unblocked as soon as possible.

@mats16
Copy link
Contributor Author

mats16 commented Aug 9, 2022

@w3b6x9 Thanks

I think I have found the root cause.

I confirmed that Aurora had received a query select rolname from pg_authid .
https://github.com/supabase/realtime/blob/master/server/lib/realtime/rls/subscriptions/subscriptions.ex#L52

But we can not have permission to pg_authid table on Amazon Aurora/RDS, because rds_superuser role is not real SUPERUSER role.

Is it possible to change using pg_authid to pg_roles (or pg_user)?
I feel it is appropriate for security purposes not only for Amazon Aurora. Because pg_authid contains a password. (although it is hashed)

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 9, 2022

Is it possible to change using pg_authid to pg_roles (or pg_user)?

@mats16 you're right, switching to pg_roles is better for security as it blanks the password. Would you like to do the honors and submit a PR? It should be as simple as replacing pg_authid for pg_roles.

@mats16
Copy link
Contributor Author

mats16 commented Aug 9, 2022

@w3b6x9 I tried it and confirmed it works with replacing pg_authid for pg_roles. I'll create PR.

@w3b6x9
Copy link
Member

w3b6x9 commented Aug 9, 2022

@mats16 thanks for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working p4 Priority 4
Projects
None yet
Development

No branches or pull requests

2 participants