Oban crashing with RDS Proxy #869

Closed
alvarezloaiciga opened this issue Mar 16, 2023 · 5 comments
Labels
area:oss Related to Oban OSS

@alvarezloaiciga

alvarezloaiciga commented Mar 16, 2023

Environment

  • Oban Version: 2.14.2
  • Oban pro: 0.12.5
  • Oban web: 2.9.4
  • PostgreSQL Version: 14
  • Elixir & Erlang/OTP Versions (elixir --version): Elixir 1.14.2 (compiled with Erlang/OTP 23)

Current Behavior

We have configured Amazon RDS Proxy in front of one of our databases, and as soon as we changed the DATABASE_URL in the project, Oban started failing with:

[error] GenServer {Oban.Registry, {Oban, Oban.Midwife}} terminating
** (stop) exited in: GenServer.call(#PID<0.1327.0>, {:listen, #PID<0.1328.0>, [:signal]}, 5000)
    ** (EXIT) time out
    (elixir 1.14.2) lib/gen_server.ex:1038: GenServer.call/3
    (postgrex 0.16.5) lib/postgrex/simple_connection.ex:212: Postgrex.SimpleConnection.call/3
    (oban 2.14.2) lib/oban/midwife.ex:31: Oban.Midwife.handle_continue/2
    (stdlib 4.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.1.1) gen_server.erl:865: :gen_server.loop/7
    (stdlib 4.1.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: {:continue, :start}
State: %Oban.Midwife.State{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Pro.Queue.SmartEngine, get_dynamic_repo: nil, log: false, name: Oban, node: "NodeName", notifier: Oban.Notifiers.Postgres, peer: Oban.Peers.Postgres, plugins: [{Oban.Plugins.Cron, [crontab: [{"* * * * *", Namespace.AWorker, [queue: :low]}]]}, {Oban.Pro.Plugins.DynamicLifeline, []}, {Oban.Web.Plugins.Stats, []}, {Oban.Plugins.Gossip, []}], prefix: "public", queues: [high: [limit: 10], mid: [limit: 10], low: [limit: 10]], repo: Namespace.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}}
[error] GenServer {Oban.Registry, {Oban, Oban.Peer}} terminating
** (stop) exited in: GenServer.call(#PID<0.1327.0>, {:listen, #PID<0.1329.0>, [:leader]}, 5000)
    ** (EXIT) time out
    (elixir 1.14.2) lib/gen_server.ex:1038: GenServer.call/3
    (postgrex 0.16.5) lib/postgrex/simple_connection.ex:212: Postgrex.SimpleConnection.call/3
    (oban 2.14.2) lib/oban/peers/postgres.ex:82: Oban.Peers.Postgres.handle_continue/2
    (stdlib 4.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.1.1) gen_server.erl:865: :gen_server.loop/7
    (stdlib 4.1.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: {:continue, :start}
State: %Oban.Peers.Postgres.State{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Pro.Queue.SmartEngine, get_dynamic_repo: nil, log: false, name: Oban, node: "NodeName", notifier: Oban.Notifiers.Postgres, peer: Oban.Peers.Postgres, plugins: [{Oban.Plugins.Cron, [crontab: [{"* * * * *", Namespace.AWorker, [queue: :low]}]]}, {Oban.Pro.Plugins.DynamicLifeline, []}, {Oban.Web.Plugins.Stats, []}, {Oban.Plugins.Gossip, []}], prefix: "public", queues: [high: [limit: 10], mid: [limit: 10], low: [limit: 10]], repo: Namespace.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, name: {:via, Registry, {Oban.Registry, {Oban, Oban.Peer}}}, timer: nil, interval: 30000, leader?: false, leader_boost: 2}
[error] GenServer {Oban.Registry, {Oban, Oban.Stager}} terminating
** (stop) exited in: GenServer.call(#PID<0.1327.0>, {:listen, #PID<0.1330.0>, [:stager]}, 5000)
    ** (EXIT) time out
    (elixir 1.14.2) lib/gen_server.ex:1038: GenServer.call/3
    (postgrex 0.16.5) lib/postgrex/simple_connection.ex:212: Postgrex.SimpleConnection.call/3
    (oban 2.14.2) lib/oban/stager.ex:64: Oban.Stager.handle_continue/2
    (stdlib 4.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.1.1) gen_server.erl:865: :gen_server.loop/7
    (stdlib 4.1.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: {:continue, :start}
State: %Oban.Stager.State{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Pro.Queue.SmartEngine, get_dynamic_repo: nil, log: false, name: Oban, node: "NodeName", notifier: Oban.Notifiers.Postgres, peer: Oban.Peers.Postgres, plugins: [{Oban.Plugins.Cron, [crontab: [{"* * * * *", Namespace.Worker, [queue: :low]}]]}, {Oban.Pro.Plugins.DynamicLifeline, []}, {Oban.Web.Plugins.Stats, []}, {Oban.Plugins.Gossip, []}], prefix: "public", queues: [high: [limit: 10], mid: [limit: 10], low: [limit: 10]], repo: Namespace.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, name: {:via, Registry, {Oban.Registry, {Oban, Oban.Stager}}}, timer: nil, interval: 1000, limit: 5000, mode: :global, ping_at_tick: 0, swap_at_tick: 5, tick: 0}
[notice] Application exited: Namespace.Application.start(:normal, []) returned an error: shutdown: failed to start child: Oban
    ** (EXIT) shutdown: failed to start child: {:plugin, Oban.Web.Plugins.Stats}
        ** (EXIT) exited in: GenServer.call(#PID<0.1327.0>, {:listen, #PID<0.1333.0>, [:gossip]}, 5000)
            ** (EXIT) time out

I found out that Oban depends on Postgres notifications (LISTEN/NOTIFY), which usually don't work when a connection pooler sits in between. I added notifier: Oban.Notifiers.PG to the configuration, but even after that I am still getting an error:

State: %Oban.Peers.Postgres.State{conf: %Oban.Config{dispatch_cooldown: 5, engine: Oban.Pro.Queue.SmartEngine, get_dynamic_repo: nil, log: false, name: Oban, node: "node-name", notifier: Oban.Notifiers.PG, peer: Oban.Peers.Postgres, plugins: [{Oban.Plugins.Cron, [crontab: [{"* * * * *", App.Worker, [queue: :low]}]]}, {Oban.Pro.Plugins.DynamicLifeline, []}, {Oban.Web.Plugins.Stats, []}, {Oban.Plugins.Gossip, []}, {Oban.Plugins.Repeater, []}], prefix: "public", queues: [high: [limit: 10], mid: [limit: 10], low: [limit: 10]], repo: Namespace.Repo, shutdown_grace_period: 15000, stage_interval: 1000, testing: :disabled}, name: {:via, Registry, {Oban.Registry, {Oban, Oban.Peer}}}, timer: nil, interval: 30000, leader?: false, leader_boost: 2}
[error] GenServer {Oban.Registry, {Oban, Oban.Stager}} terminating
** (stop) exited in: GenServer.call(#PID<0.1444.0>, :leader?, 5000)
    ** (EXIT) an exception was raised:
        ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 1965ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:

  1. Ensuring your database is available and that you can connect to it
  2. Tracking down slow queries and making sure they are running fast enough
  3. Increasing the pool_size (although this increases resource consumption)
  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval

See DBConnection.start_link/2 for more information

            (db_connection 2.4.3) lib/db_connection.ex:953: DBConnection.transaction/3
            (oban 2.14.2) lib/oban/peers/postgres.ex:94: anonymous fn/2 in Oban.Peers.Postgres.handle_info/2
            (telemetry 0.4.3) .../deps/telemetry/src/telemetry.erl:272: :telemetry.span/3
            (oban 2.14.2) lib/oban/peers/postgres.ex:92: Oban.Peers.Postgres.handle_info/2
            (stdlib 4.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
            (stdlib 4.1.1) gen_server.erl:865: :gen_server.loop/7
            (stdlib 4.1.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
    (elixir 1.14.2) lib/gen_server.ex:1038: GenServer.call/3
    (oban 2.14.2) lib/oban/peer.ex:91: Oban.Peer.leader?/2
    (oban 2.14.2) lib/oban/stager.ex:112: Oban.Stager.check_leadership_and_stage/1
    (oban 2.14.2) lib/oban/stager.ex:86: anonymous fn/2 in Oban.Stager.handle_info/2
    (telemetry 0.4.3) .../deps/telemetry/src/telemetry.erl:272: :telemetry.span/3
    (oban 2.14.2) lib/oban/stager.ex:85: Oban.Stager.handle_info/2
    (stdlib 4.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.1.1) gen_server.erl:1200: :gen_server.handle_msg/6
    (stdlib 4.1.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3

We have tested the database connection using the same DATABASE_URL, and it works properly.
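A minimal sketch of the notifier change described in this comment, as it might look in `config/runtime.exs`. The `:namespace` app name is a placeholder; `Namespace.Repo` and the queue names come from the state dumps in the logs, and the rest of the original configuration (engine, plugins) is omitted. `Oban.Notifiers.PG` relays notifications over Distributed Erlang using OTP's `:pg` process groups instead of Postgres LISTEN/NOTIFY.

```elixir
import Config

# Placeholder OTP app name; substitute your own application.
config :namespace, Oban,
  repo: Namespace.Repo,
  # Broadcast through Erlang's :pg process groups rather than Postgres
  # LISTEN/NOTIFY, which (per this thread) does not work through a proxy.
  notifier: Oban.Notifiers.PG,
  queues: [high: 10, mid: 10, low: 10]
```

Because `:pg` groups only span connected Erlang nodes, notifications reach other nodes only if they are clustered; an unclustered node only receives its own notifications.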

@sorentwo
Member

You're correct, the Postgres notifier (and any Postgres Pub/Sub functionality) doesn't work with a database proxy.

Do other queries work with Oban.Repo? It's a light wrapper around your app's Repo that merely standardizes some options; I wouldn't expect Oban queries to differ.

Try the following query to check functionality:

import Ecto.Query

Oban.Repo.one(Oban.config(), last(Oban.Job))

Your stacktrace implies that connections are working, but the pool is overloaded. Has your average query time gone up since adding the proxy? Is your database pool large enough?
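If the pool really is overloaded, the `DBConnection.ConnectionError` message in the log above suggests raising `pool_size` or letting requests wait longer via `:queue_target` and `:queue_interval`. A sketch of those repo options follows. The `:namespace` app name is a placeholder, and the numbers are illustrative rather than tuned recommendations.

```elixir
import Config

# Placeholder app name; values are illustrative, not tuned.
config :namespace, Namespace.Repo,
  url: System.fetch_env!("DATABASE_URL"),
  # More connections let Oban's stager, peer, and plugins query alongside
  # the rest of the app, at the cost of more database resources.
  pool_size: 20,
  # If checkouts keep waiting longer than queue_target (ms) across a
  # queue_interval (ms) window, the pool starts dropping requests;
  # raising both lets requests wait longer before being dropped.
  queue_target: 100,
  queue_interval: 2000
```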

@sorentwo sorentwo added the area:oss Related to Oban OSS label Mar 16, 2023
@sorentwo
Member

@alvarezloaiciga Any update? If this isn't related to Oban, I'd love to close this issue out.

@alvarezloaiciga
Author

@sorentwo Sorry, I got pulled into other things. I'll run this tomorrow. One issue is that the app was crashing on start after we added RDS Proxy.

@sorentwo
Member

In lieu of more information, I'm closing this issue. We can reopen it in the future if there are more details!

@sorentwo sorentwo closed this as not planned Mar 30, 2023
@alvarezloaiciga
Author

Hi @sorentwo, we were able to fix this by adding SSL configuration to our database connection. It turned out the underlying error was:

[error] Postgrex.Protocol (#PID<0.1350.0>) failed to connect: ** (Postgrex.Error) FATAL 28000 (invalid_authorization_specification) This RDS Proxy requires TLS connections

To fix it, we added this to the repo config:

  ssl: true,
  ssl_opts: [
    versions: [:"tlsv1.2"]
  ],
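For reference, those options in a complete repo configuration might look like the following sketch. It assumes the repo is configured from DATABASE_URL in `config/runtime.exs`, and the `:namespace` app name is a placeholder. `ssl` and `ssl_opts` are connection options accepted by Postgrex 0.16.x (the version in the stack traces), with `ssl_opts` handed to Erlang's `:ssl` application.

```elixir
import Config

# Placeholder app name; Namespace.Repo comes from the logs above.
config :namespace, Namespace.Repo,
  url: System.fetch_env!("DATABASE_URL"),
  # RDS Proxy rejected plaintext connections with
  # "FATAL 28000 ... This RDS Proxy requires TLS connections".
  ssl: true,
  # Passed through to Erlang's :ssl; restricts the handshake to TLS 1.2.
  ssl_opts: [
    versions: [:"tlsv1.2"]
  ]
```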
