docker-compose-example-pg.yml
docker-compose-example-redis.yml
Describe the bug
During testing on a Kubernetes cluster I noticed that when the broadcaster backend goes away (pod restart, lost network connection, etc.), opal-server does not fully re-establish the broadcaster setup. The behaviour differs slightly between Postgres and Redis, but both have issues. For Postgres, I see no reconnection attempts being made at all. For Redis, new connections to Redis are established, but the Pubsub "listener" is never re-created. The Redis issue only occurs if Redis is configured with a password/authentication. Both issues can be easily reproduced with a docker-compose setup.
When using the Postgres backend, once opal-server is in this "disconnected from broadcaster" state, all new attempts by an opal-client instance to establish a websocket connection with opal-server fail with a connection reset after the websocket upgrade is done. When this happens there is no indication in the opal-server logs that the issue is related to the broadcaster being disconnected, but that turned out to be the root cause. We saw opal-client pods in this state that could never connect to opal-server until opal-server was restarted (upon which it would reconnect to the broadcaster). opal-server neither shuts down gracefully nor reconnects to Postgres when this happens.
When using the Redis backend (configured with a password), once opal-server is in this "connected to Redis with no listener" state, opal-client instances are able to connect to opal-server without issues, but the broadcast system is silently broken because nothing is listening to the "EventNotifier" channel in Redis. The attempt to re-create the listener fails in the asyncio_redis module due to this bug: jonathanslenders/asyncio-redis#82
To Reproduce
In both cases the docker compose setup can be used. Once the services are up, restart the "broadcast_channel" service, then run the Postgres or Redis queries below to confirm that the listener has not been re-established.
..for postgres...
- bring up opal-server, pgsql and opal-client
docker compose -f docker-compose-example-pg.yml up
- list the listeners in postgres after things are up..
➜ docker compose -f docker-compose-example-pg.yml exec broadcast_channel psql -U postgres -d postgres -P pager=off -c "SELECT datname, pid, application_name, state, query FROM pg_stat_activity WHERE query ILIKE '%LISTEN%';"
datname | pid | application_name | state | query
----------+-----+------------------+--------+---------------------------------------------------------------------------------------------------------
postgres | 33 | | idle | LISTEN "EventNotifier"
postgres | 43 | | idle | LISTEN "EventNotifier"
postgres | 44 | | idle | LISTEN "EventNotifier"
postgres | 84 | psql | active | SELECT datname, pid, application_name, state, query FROM pg_stat_activity WHERE query ILIKE '%LISTEN%';
(4 rows)
- restart the broadcast_channel service (postgres)
docker compose -f docker-compose-example-pg.yml restart broadcast_channel
..from this point on you'll start seeing opal-client logs like this as client connections start failing
opal_client-1 | 2026-02-17T03:59:28.805526+0000 | 19 | opal_client.policy.updater | INFO | Disconnected from server
opal_client-1 | 2026-02-17T03:59:28.805727+0000 | 19 | fastapi_websocket_rpc.websocket_rpc_c...|ERROR | RPC Error
opal_client-1 | Traceback (most recent call last):
...etc...
opal_client-1 | websockets.exceptions.ConnectionClosedOK: received 1000 (OK); then sent 1000 (OK)
- check the current state of the listeners in postgres.. no listeners have been re-created
➜ docker compose -f docker-compose-example-pg.yml exec broadcast_channel psql -U postgres -d postgres -P pager=off -c "SELECT datname, pid, application_name, state, query FROM pg_stat_activity WHERE query ILIKE '%LISTEN%';"
datname | pid | application_name | state | query
----------+-----+------------------+--------+---------------------------------------------------------------------------------------------------------
postgres | 42 | psql | active | SELECT datname, pid, application_name, state, query FROM pg_stat_activity WHERE query ILIKE '%LISTEN%';
(1 row)
..at this point the listeners will not get re-created unless the opal-server is restarted.
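The manual check above could be automated along these lines. This is only a sketch: `count_listeners` is a hypothetical helper (not part of OPAL) that parses the text printed by the psql query shown above and counts sessions whose query is exactly LISTEN "EventNotifier". It assumes the default aligned psql output format with the five columns used in that query.

```python
def count_listeners(psql_output: str, channel: str = "EventNotifier") -> int:
    """Count pg_stat_activity rows whose query is LISTEN "<channel>".

    Assumes the five-column output of the query used in this report:
    datname | pid | application_name | state | query
    """
    wanted = f'LISTEN "{channel}"'
    count = 0
    for line in psql_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Header and data rows have five columns; the separator line and the
        # "(N rows)" footer contain no "|" and are skipped by this check.
        if len(cols) == 5 and cols[4] == wanted:
            count += 1
    return count


# Rows copied from the output shown in this report (after the restart).
AFTER_RESTART = """\
datname | pid | application_name | state | query
postgres | 42 | psql | active | SELECT datname, pid, application_name, state, query FROM pg_stat_activity WHERE query ILIKE '%LISTEN%';
(1 row)
"""

if count_listeners(AFTER_RESTART) == 0:
    print("no EventNotifier listeners registered")
```

The psql session running the query itself also matches the ILIKE filter, which is why the helper compares the query column exactly instead of counting rows.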
..for redis..
- bring up opal-server, redis and opal-client
docker compose -f docker-compose-example-redis.yml up
- list the pubsub channels in redis after things are up..
➜ docker compose -f docker-compose-example-redis.yml exec broadcast_channel redis-cli -a password1234 pubsub channels
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1) "EventNotifier"
- restart the broadcast_channel service (redis)
docker compose -f docker-compose-example-redis.yml restart broadcast_channel
- check the current state of the pubsub channels in redis.. there are none, the listener(s) have not been re-created
➜ docker compose -f docker-compose-example-redis.yml exec broadcast_channel redis-cli -a password1234 pubsub channels
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(empty array)
..at this point the listeners will not get re-created unless the opal-server is restarted.
Expected behavior
If the connection to the broadcaster backend is lost, connections should be re-established when possible and the broadcast listener(s) re-created.
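To illustrate the kind of behaviour I have in mind, here is a minimal sketch of a reconnect loop with exponential backoff. It is not OPAL's or the broadcaster library's actual code: `keep_listening` and `listen_once` are hypothetical names, and `listen_once` stands in for whatever coroutine connects to the backend, subscribes to the channel and consumes messages until the connection drops.

```python
import asyncio
from typing import Awaitable, Callable


async def keep_listening(
    listen_once: Callable[[], Awaitable[None]],
    *,
    initial_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Re-run listen_once() whenever the backend connection is lost.

    listen_once (hypothetical) is expected to connect, subscribe and consume
    messages. It returns normally only on deliberate shutdown and raises an
    OSError subclass (e.g. ConnectionResetError) when the connection drops.
    """
    delay = initial_delay
    while True:
        try:
            await listen_once()
            return  # the listener ended on purpose, stop supervising
        except OSError as exc:
            # ConnectionError and its subclasses are OSError subclasses.
            print(f"broadcaster connection lost ({exc!r}); retrying in {delay:.1f}s")
            await sleep(delay)
            # Double the wait after each failure, capped at max_delay.
            delay = min(delay * 2, max_delay)
```

The important part is that each retry re-runs the whole connect-and-subscribe step, so the listener is re-created and not just the connection. A real implementation would probably also reset the delay after a successful reconnect and surface the disconnected state in the logs or a health check. For Redis with a password, the asyncio_redis bug linked above would still need to be fixed or worked around separately.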
Screenshots
If applicable, add screenshots to help explain your problem.
OPAL version