Skip to content

fix(api): inherit REDIS_USE_CLUSTERS for event bus pubsub config (#35291) 🤖🤖🤖#35338

Open
jeanibarz wants to merge 3 commits intolanggenius:mainfrom
jeanibarz:fix/35291-pubsub-redis-clusters-inherit
Open

fix(api): inherit REDIS_USE_CLUSTERS for event bus pubsub config (#35291) 🤖🤖🤖#35338
jeanibarz wants to merge 3 commits intolanggenius:mainfrom
jeanibarz:fix/35291-pubsub-redis-clusters-inherit

Conversation

@jeanibarz
Copy link
Copy Markdown
Contributor

@jeanibarz jeanibarz commented Apr 16, 2026

Summary

Fixes #35291.

The event bus pub/sub config (PUBSUB_REDIS_* / EVENT_BUS_REDIS_*) already inherits connection details from the generic Redis config (REDIS_HOST, REDIS_PORT, REDIS_USERNAME, REDIS_PASSWORD, REDIS_DB, REDIS_USE_SSL) via the existing normalized_pubsub_redis_url property, but PUBSUB_REDIS_USE_CLUSTERS was not covered by that fallback. With the old False default, _create_pubsub_client at api/extensions/ext_redis.py:454 always invoked redis.Redis.from_url(...) — never RedisCluster.from_url(...) — even when the operator had set REDIS_USE_CLUSTERS=true for the main Redis. In cluster-only deployments this makes the event bus fail.

Three commits, all scoped to the fix:

  1. fix(api) — extend RedisConfigDefaults with REDIS_USE_CLUSTERS, change the PUBSUB_REDIS_USE_CLUSTERS default from False to None, add a computed normalized_pubsub_use_clusters property mirroring normalized_pubsub_redis_url, and route the single call site through it.
  2. fix(docker) — without this follow-up the Python-level fallback is shadowed by docker-compose.yaml injecting EVENT_BUS_REDIS_USE_CLUSTERS: ${EVENT_BUS_REDIS_USE_CLUSTERS:-false} into every container. That makes Pydantic see the literal string "false" before the new inheritance logic can fire, reproducing the bug for docker users (the reported deployment context). The .env.example and generated docker-compose.yaml defaults are now empty; a small field_validator coerces empty/whitespace-only env values to None so Pydantic does not reject "".
  3. test(ext_redis) — pin the call-site swap with three integration-style tests that patch the client factories and assert _create_pubsub_client receives the correct use_clusters kwarg. Without these, a future refactor that restores the raw PUBSUB_REDIS_USE_CLUSTERS read at init_app would silently reintroduce the bug with all existing property tests still passing.

Explicitly setting either PUBSUB_REDIS_USE_CLUSTERS or EVENT_BUS_REDIS_USE_CLUSTERS still wins over the inherited flag. Non-cluster deployments with both flags unset keep the current behavior (cluster off).

Verification

Unit tests — 8 new, all passing locally (config layer + runtime routing):

$ uv run pytest api/tests/unit_tests/configs/test_dify_config.py \
    api/tests/unit_tests/extensions/test_redis.py -q
46 passed in 4.65s

The 5 config-layer tests cover: inheritance, explicit override, pubsub-only cluster, empty-string env (docker-compose ${VAR:-} pattern), and the no-flag default. The 3 ext_redis tests assert the runtime-path swap: init_app calls _create_pubsub_client(url, True) when normalized_pubsub_use_clusters is True, ...(url, False) when explicitly False, and skips pubsub client construction when the URL is absent.

End-to-end verification against a live Redis clusterdocker run -p 7000-7005:7000-7005 grokzen/redis-cluster (3 masters + 3 replicas, all 16384 slots assigned, cluster_state:ok), then ran the same init_app code path with the operator's reported env:

Scenario upstream/main this PR
REDIS_USE_CLUSTERS=true, EVENT_BUS_REDIS_USE_CLUSTERS=""
(the post-fix docker-compose default)
ValidationError: Input should be a valid boolean, unable to interpret input [input_value=''] — API fails to start normalized_pubsub_use_clusters=True, pubsub client is redis.cluster.RedisCluster, live PUBLISH reaches the cluster
REDIS_USE_CLUSTERS=true, EVENT_BUS_REDIS_USE_CLUSTERS="false"
(the pre-fix docker-compose default)
Silent bug — use_clusters=False, pubsub client is standalone redis.Redis Explicit override still wins (standalone) — correct per design; this env is no longer injected by the updated compose template

Against a vanilla Redis Cluster, redis-py's standalone client transparently follows MOVED redirects for data commands, and plain PUBLISH gossip-propagates across cluster nodes — so the misrouting does not always produce a hard failure in vanilla tests. The consequential failures reported in #35291 happen on managed cluster-only endpoints (AWS ElastiCache cluster-mode, GCP Memorystore cluster-mode, Kubernetes deployments behind a single LB endpoint) that reject non-cluster-protocol connections outright. The fix is to always use the correct client type when the operator has opted into cluster mode for the main Redis.

Static checksruff format + check clean on all three files, basedpyright and mypy clean on configs/middleware/cache/redis_pubsub_config.py and extensions/ext_redis.py (Success: no issues found in 2 source files).

Screenshots

N/A — backend-only config, docker template, and test change.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

From Claude Code

The event bus pub/sub config (PUBSUB_REDIS_*/EVENT_BUS_REDIS_*) inherits
connection details from the generic Redis config (HOST, PORT, USERNAME,
PASSWORD, DB, USE_SSL) via normalized_pubsub_redis_url, but
PUBSUB_REDIS_USE_CLUSTERS did not fall back to REDIS_USE_CLUSTERS. With
the old False default, _create_pubsub_client called redis.Redis.from_url
instead of RedisCluster.from_url whenever operators set REDIS_USE_CLUSTERS
alone, causing the event bus to fail in cluster-only deployments.

Change PUBSUB_REDIS_USE_CLUSTERS default from False to None, add a
normalized_pubsub_use_clusters property that mirrors the existing
normalized_pubsub_redis_url fallback pattern, and route the single
ext_redis.py call site through it. Explicitly set values still win over
the inherited flag; non-cluster deployments remain unchanged.
The Python-level fallback added in the previous commit is bypassed by
the docker-compose template, which injects
`EVENT_BUS_REDIS_USE_CLUSTERS: ${EVENT_BUS_REDIS_USE_CLUSTERS:-false}`
into every container's environment. With a literal "false" string
reaching the API container, Pydantic coerces it to `False` before the
new inheritance logic can fire, so the issue reproducer (operator sets
`REDIS_USE_CLUSTERS=true` without touching the event-bus flag) still
fails for docker deployments — the primary reported context of langgenius#35291.

- Drop the hard-coded `false` default from .env.example and document
  that leaving the value empty inherits `REDIS_USE_CLUSTERS`.
- Regenerate docker-compose.yaml so the compose-level default becomes
  an empty string, matching the new .env.example.
- Add a `field_validator` on `PUBSUB_REDIS_USE_CLUSTERS` that coerces
  empty / whitespace-only env values to `None`, because Pydantic
  otherwise rejects `""` for `bool | None`.
- Add unit tests for the empty-string path and for the "pubsub-only
  cluster" configuration (main Redis standalone, event bus clustered).
@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Apr 16, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Pyrefly Diff

No changes detected.

The property tests on DifyConfig pin the config-layer semantics but don't
catch regressions in the one-line call-site swap at init_app — if someone
later restores `dify_config.PUBSUB_REDIS_USE_CLUSTERS` in place of
`normalized_pubsub_use_clusters`, all existing unit tests would still pass
while the langgenius#35291 bug silently returns.

Add three integration-style tests that patch the redis client factories and
assert `_create_pubsub_client` receives the correct `use_clusters` kwarg for:
  - inheritance path (normalized_pubsub_use_clusters=True)
  - explicit-false override (normalized_pubsub_use_clusters=False)
  - no pubsub URL configured (client never constructed)
@github-actions
Copy link
Copy Markdown
Contributor

Pyrefly Diff

base → PR
--- /tmp/pyrefly_base.txt	2026-04-16 20:51:04.258946249 +0000
+++ /tmp/pyrefly_pr.txt	2026-04-16 20:50:53.477839681 +0000
@@ -5777,7 +5777,7 @@
 ERROR Cannot index into `object` [bad-index]
    --> tests/unit_tests/extensions/otel/test_celery_sqlcommenter.py:140:20
 ERROR Object of class `Retry` has no attribute `_retries` [missing-attribute]
-  --> tests/unit_tests/extensions/test_redis.py:34:16
+  --> tests/unit_tests/extensions/test_redis.py:35:16
 ERROR Argument `dict[str, bytes | str]` is not assignable to parameter `headers` with type `Headers | Mapping[bytes, bytes] | Mapping[str, str] | Sequence[tuple[bytes, bytes]] | Sequence[tuple[str, str]] | None` in function `httpx._models.Response.__init__` [bad-argument-type]
   --> tests/unit_tests/factories/test_build_from_mapping.py:75:21
 ERROR Object of class `NoneType` has no attribute `storage_key` [missing-attribute]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-revision size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EVENT_BUS_REDIS_USE_CLUSTER can not handler redis cluster

1 participant