fix: close old discovery proxy before rebinding in NlbSimulator #852

Open
dkropachev wants to merge 2 commits into scylladb:scylla-4.x from dkropachev:fix/nlb-simulator-rebind
dkropachev commented Mar 20, 2026

Summary

  • Fix NlbSimulator.rebuildDiscoveryProxy() creating a new RoundRobinProxy on the same port before closing the old one, causing BindException: Address already in use. Add addTarget()/removeTarget() to RoundRobinProxy so the discovery proxy is created once and updated in-place.
  • Fix ClientRoutesTopologyMonitor.savePort() saving the NLB proxy port (e.g. 29043) as the fallback port for broadcastRpcAddress. Because TOPOLOGY_CHANGE REMOVED_NODE events carry the real native transport port (9042), Metadata.findNode() could not match them, so decommissioned nodes were never removed from metadata. The fix adds a nativeTransportPort field to ClientRoutesConfig (default 9042, configurable via advanced.client-routes.native-transport-port) and sets it in the ClientRoutesTopologyMonitor constructor. broadcastRpcAddress is used only for event matching; connections go through ClientRoutesEndPoint, which resolves via NLB proxy addresses in the routes cache.

Test plan

  • ClientRoutesIT.should_survive_full_node_replacement_through_nlb passes (previously failed with BindException at addNode(2), then timeout after decommission)
  • ClientRoutesTopologyMonitorTest.savePort_should_use_native_transport_port_from_config passes
  • Unit tests pass
  • Full verify (formatting) passes

Fixes #851

dkropachev force-pushed the fix/nlb-simulator-rebind branch 6 times, most recently from eb87f70 to 484ad59 March 20, 2026 03:49
dkropachev marked this pull request as draft March 20, 2026 10:40
dkropachev force-pushed the fix/nlb-simulator-rebind branch 5 times, most recently from aa8e3d1 to 893a686 March 20, 2026 15:22
NlbSimulator.addNode() was tearing down and recreating the discovery
RoundRobinProxy on every call, which failed with "Address already in
use" because the new proxy tried to bind while the old one still held
the port.
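
The failure mode described above can be reproduced without any driver code. The following is a minimal sketch, not the NlbSimulator implementation: `RebindDemo` and its method names are hypothetical, and plain `java.net.ServerSocket` stands in for the proxy's listener. It tries to bind a replacement listener first while the old one is still open, and then after the old one has been closed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical demo class; it only illustrates bind ordering on a single port.
final class RebindDemo {

    /** Tries to bind a replacement listener while the old one still holds the port. */
    static boolean rebindWhileOldIsOpen() {
        try (ServerSocket old = new ServerSocket(0)) { // port 0: the OS picks a free port
            try (ServerSocket replacement = new ServerSocket(old.getLocalPort())) {
                return true; // both listeners bound at once (not expected)
            } catch (BindException e) {
                return false; // the port is still held by the old listener
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the old listener first, then binds the replacement on the same port. */
    static boolean rebindAfterClosingOld() {
        try {
            ServerSocket old = new ServerSocket(0);
            int port = old.getLocalPort();
            old.close(); // release the port before rebinding
            try (ServerSocket replacement = new ServerSocket(port)) {
                return replacement.getLocalPort() == port;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rebind while old is open succeeded: " + rebindWhileOldIsOpen());
        System.out.println("rebind after closing old succeeded: " + rebindAfterClosingOld());
    }
}
```

Closing before rebinding avoids the exception, but the approach taken in this PR sidesteps the ordering problem entirely by never rebinding.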

Add addTarget()/removeTarget() to RoundRobinProxy so the discovery
proxy is created once (on the first addNode) and updated in-place as
nodes join or leave.
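
As a rough illustration of the "create once, update in place" idea, here is a sketch of a round-robin target set whose membership can change without touching the listening socket. It is not the actual RoundRobinProxy code: the class name `RoundRobinTargets`, the use of `String` targets, and the `next()` method are assumptions made for the example; only the `addTarget()`/`removeTarget()` names come from the commit message.

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the target bookkeeping inside a round-robin proxy.
final class RoundRobinTargets {
    // Copy-on-write keeps reads cheap and safe while targets are added or removed.
    private final CopyOnWriteArrayList<String> targets = new CopyOnWriteArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    /** Registers a target; adding the same target twice has no effect. */
    void addTarget(String target) {
        targets.addIfAbsent(target);
    }

    /** Unregisters a target; returns false if it was not present. */
    boolean removeTarget(String target) {
        return targets.remove(target);
    }

    /** Picks the next target in rotation from a consistent snapshot. */
    String next() {
        Object[] snapshot = targets.toArray();
        if (snapshot.length == 0) {
            throw new IllegalStateException("no targets registered");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(cursor.getAndIncrement(), snapshot.length);
        return (String) snapshot[index];
    }

    public static void main(String[] args) {
        RoundRobinTargets rr = new RoundRobinTargets();
        rr.addTarget("node1");
        rr.addTarget("node2");
        System.out.println(rr.next());
        rr.removeTarget("node1"); // membership changes; no socket is rebound
        System.out.println(rr.next());
    }
}
```

Because the listener is never recreated, adding or removing a node cannot trigger a bind conflict on the discovery port.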

Fixes scylladb#851
dkropachev force-pushed the fix/nlb-simulator-rebind branch from 893a686 to 986581d March 20, 2026 15:24
ClientRoutesTopologyMonitor.savePort() was saving the NLB proxy port
(e.g. 29043) as the fallback for broadcastRpcAddress construction.
This caused Metadata.findNode() to fail matching TOPOLOGY_CHANGE
REMOVED_NODE events (which carry the real native transport port 9042),
so decommissioned nodes were never removed from metadata.

Add a nativeTransportPort field to ClientRoutesConfig (default 9042,
configurable via advanced.client-routes.native-transport-port) and set
it in the ClientRoutesTopologyMonitor constructor. The savePort()
override is now a no-op since the port is already initialized.

broadcastRpcAddress is used only for event matching — connections go
through ClientRoutesEndPoint which resolves via NLB proxy addresses
in the routes cache.
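
The port mismatch can be illustrated with plain socket-address equality. This sketch makes no claim about how Metadata.findNode() is implemented; it assumes only that a node is looked up by an address-plus-port key. `PortMatchDemo`, the map, and the IP 10.0.0.2 are hypothetical, while the ports 9042 and 29043 are the ones quoted in the commit message.

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: a node keyed by address and port is only found
// when the event carries the same port that was saved for the node.
final class PortMatchDemo {
    static final int NATIVE_TRANSPORT_PORT = 9042; // port carried by REMOVED_NODE events
    static final int NLB_PROXY_PORT = 29043;       // example proxy port from the commit message

    /** Returns true if an event on the native transport port finds the stored node. */
    static boolean eventMatches(int savedPort) {
        Map<InetSocketAddress, String> nodes = new HashMap<>();
        // Address stored for the node, built with whichever port was saved.
        nodes.put(InetSocketAddress.createUnresolved("10.0.0.2", savedPort), "node2");

        // Address as it would arrive in a topology event.
        InetSocketAddress fromEvent =
                InetSocketAddress.createUnresolved("10.0.0.2", NATIVE_TRANSPORT_PORT);
        return nodes.containsKey(fromEvent);
    }

    public static void main(String[] args) {
        System.out.println("saved proxy port matches event:  " + eventMatches(NLB_PROXY_PORT));
        System.out.println("saved native port matches event: " + eventMatches(NATIVE_TRANSPORT_PORT));
    }
}
```

With the proxy port saved, the lookup misses even though the host is identical, which mirrors why the removal event was never applied.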
dkropachev force-pushed the fix/nlb-simulator-rebind branch from 986581d to 2342a2b March 20, 2026 15:29
dkropachev requested a review from nikagra March 20, 2026 15:35
dkropachev marked this pull request as ready for review March 20, 2026 15:36
Development

Successfully merging this pull request may close these issues.

ClientRoutesIT.should_survive_full_node_replacement_through_nlb fails with Address already in use
