CLUSTER NODES ignores ClusterPreferredEndpointType=hostname and still returns worker bind addresses #1650
Description
When using --cluster-announce-hostname together with --cluster-preferred-endpoint-type hostname, Garnet only partially switches cluster metadata to the external client-facing endpoint.
CLUSTER SLOTS and redirection behavior can prefer the announced hostname, but CLUSTER NODES still returns the internal bind or Docker bridge address as the primary node address. From code inspection, CLUSTER SHARDS appears to use the same raw worker address path.
Environment
- Garnet image: ghcr.io/microsoft/garnet:1.1.1
- Two-node cluster in Docker bridge mode
- Nodes listen on the same ports that are published to the host
- Host-reachable client endpoints: 10.20.22.211:17000 and 10.20.22.211:17001
- Garnet started with: --cluster --aof --bind 0.0.0.0 --port <published-port> --cluster-announce-hostname 10.20.22.211 --cluster-preferred-endpoint-type hostname
Reproduction
- Start two Garnet nodes in Docker bridge mode.
- Publish their ports to the host and set --cluster-announce-hostname to a host-reachable address or hostname.
- Set --cluster-preferred-endpoint-type hostname.
- Form a cluster and connect from outside the Docker network using the published host endpoints.
- Run CLUSTER NODES.
Actual behavior
Observed CLUSTER NODES output:
0342309892aa1c7b5d0cdde9f8e92bf1af0f96c0 172.18.0.2:17000@27000,10.20.22.211 myself,master - 0 0 16 connected 0-16383
ad13485b48e14bc16dd3b0653e1401a5d3bb970f 172.18.0.3:17001@27001,10.20.22.211 slave 0342309892aa1c7b5d0cdde9f8e92bf1af0f96c0 639104637187557405 639104637187553617 17 connected
Observed related replication output from the same cluster:
slave0:ip=172.18.0.3,port=17001,state=online,offset=64,lag=0
So even though clients are expected to use 10.20.22.211:17000 and 10.20.22.211:17001, CLUSTER NODES still serializes 172.18.0.2 and 172.18.0.3 as the primary node addresses and only appends the announced host after the comma.
From source inspection, CLUSTER SHARDS also appears to build its address field from workers[workerId].Address, so this likely affects CLUSTER SHARDS as well.
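To check for the mismatch mechanically, the CLUSTER NODES lines above can be split into their leading host and the hostname appended after the comma. The following is a minimal Python sketch, not part of Garnet. The names NodeAddress, parse_node_line and mismatched are hypothetical, and the field layout (ip:port@cport,hostname) is taken only from the output observed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeAddress:
    node_id: str
    host: str                 # leading host in the address field
    port: int
    cport: Optional[int]      # cluster bus port after '@', if present
    hostname: Optional[str]   # announced hostname after the comma, if present


def parse_node_line(line: str) -> NodeAddress:
    """Parse the node id and address field of one CLUSTER NODES line."""
    fields = line.split()
    node_id, addr_field = fields[0], fields[1]
    # Assumed layout, based on the observed output: ip:port@cport,hostname
    endpoint, _, hostname = addr_field.partition(",")
    hostport, _, cport = endpoint.partition("@")
    host, _, port = hostport.rpartition(":")
    return NodeAddress(
        node_id=node_id,
        host=host,
        port=int(port),
        cport=int(cport) if cport else None,
        hostname=hostname or None,
    )


def mismatched(nodes_output: str) -> List[NodeAddress]:
    """Return nodes whose leading host differs from their announced hostname."""
    nodes = [parse_node_line(l) for l in nodes_output.splitlines() if l.strip()]
    return [n for n in nodes if n.hostname and n.host != n.hostname]
```

Run against the two lines above, mismatched(...) reports both nodes, because 172.18.0.2 and 172.18.0.3 differ from the announced 10.20.22.211.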
Expected behavior
One of these outcomes would solve the problem:
- CLUSTER NODES and CLUSTER SHARDS honor ClusterPreferredEndpointType and expose the announced hostname or preferred external endpoint as the primary address.
- Garnet provides an explicit configuration option that makes these commands emit the announced external endpoint instead of the internal worker bind address.
- If the raw bind address must remain for compatibility, Garnet provides a separate cluster metadata view that is consistent with CLUSTER SLOTS and redirection responses.
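To illustrate the first outcome, here is a rough Python sketch of the selection I have in mind for the leading address field. It is not Garnet's code. The function leading_endpoint and its parameters are hypothetical, and treating every preferred type other than "hostname" as "use the IP" is an assumption made only for this example.

```python
def leading_endpoint(ip: str, port: int, cport: int,
                     hostname: str, preferred: str) -> str:
    """Build the leading host:port@cport field for one node.

    Hypothetical helper: uses the announced hostname when the preferred
    endpoint type is "hostname" and a hostname is set, else the raw IP.
    """
    host = hostname if preferred == "hostname" and hostname else ip
    return f"{host}:{port}@{cport}"


# With the values from this report, the first node would lead with the
# announced endpoint instead of the Docker bridge address.
print(leading_endpoint("172.18.0.2", 17000, 27000, "10.20.22.211", "hostname"))
```

Falling back to the IP when no hostname is announced keeps the field populated for nodes that were started without --cluster-announce-hostname.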
Why this matters
- External diagnostics become misleading because the primary address reported by CLUSTER NODES is not reachable from the host.
- Cluster-aware clients or topology probes may still attempt side connections to the internal Docker address even when the configured seed endpoints are correct.
- Control planes and dashboards have to special-case internal Docker addresses instead of relying on Garnet's own cluster metadata.
Relevant code references
- GetEndpointByPreferredType(...) is where preferred endpoint selection is implemented for slot or redirect-style endpoint paths.
- GetNodeInfo(...) currently emits workers[workerId].Address as the leading ip:port@cport field in CLUSTER NODES, and only appends hostname metadata after the comma.
- GetSlotsInfo(...) routes through the preferred-endpoint path for CLUSTER SLOTS.
- GetShardsInfo(...) appears to serialize the address field from workers[workerId].Address as well.
Related issues checked
I checked these before filing, and they look related but not duplicates:
- #1446 added --cluster-announce-hostname and hostname-preferred endpoint support for client-facing behavior such as CLUSTER SLOTS and redirections.
- #640 fixed stale announce data after node movement or recovery.
- #1561 fixed hostname handling for MIGRATE and automatic slot migration.
- #931 fixed consistency issues between CLUSTER NODES and CLUSTER REPLICAS.
This issue appears to be a remaining metadata-selection gap specifically in CLUSTER NODES, and likely CLUSTER SHARDS, after the earlier hostname work.
Additional notes
I can provide a minimal two-container repro or sanitized capture files if that would help validate the behavior quickly.