Skip to content

Resolve control node identity via system.local before initial metadata refresh#853

Merged
dkropachev merged 3 commits into
scylladb:scylla-4.xfrom
dkropachev:remove-contactpoints-from-initial-refresh
Mar 27, 2026
Merged

Resolve control node identity via system.local before initial metadata refresh#853
dkropachev merged 3 commits into
scylladb:scylla-4.xfrom
dkropachev:remove-contactpoints-from-initial-refresh

Conversation

@dkropachev
Copy link
Copy Markdown

@dkropachev dkropachev commented Mar 20, 2026

Summary

Refactors how contact points transition into metadata nodes:

  • ControlConnection resolves contact points eagerly: After opening a channel to a contact point, the control connection queries system.local via getChannelNodeInfo() and registers the node into metadata (MetadataManager.registerNode()) before InitialNodeListRefresh runs.
  • InitialNodeListRefresh simplified: No longer receives contact points. Creates new nodes for all discovered hosts, reusing any pre-registered node (matched by hostId) to preserve connection state.
  • New MetadataManager.registerNode(): Atomically creates and inserts a metadata node from NodeInfo on the admin executor, preventing races with concurrent metadata refreshes.
  • Control node tracking by hostId: New controlNode() accessor and isControlNode() method match by hostId instead of endpoint comparison, fixing false matches behind NLB/proxy where multiple nodes share the same endpoint.
  • Local DC inference from control connection: LBP helpers (OptionalLocalDcHelper, MandatoryLocalDcHelper, InferringLocalDcHelper) infer the local DC from the control connection node instead of contact points, removing wasImplicitContactPoint() and contact-point DC checks.
  • Contact points without metrics: DefaultNode.newContactPoint() creates nodes with NoopNodeMetricUpdater since contact points are temporary objects replaced by real metadata nodes.
  • TopologyMonitor API: getChannelEndpoint() renamed to getChannelNodeInfo(), returning full NodeInfo instead of just EndPoint.
  • Edge case handling: Channel closed during resolve, control node removed from metadata on reconnect, empty proxy targets in RoundRobinProxy, NlbSimulator cleanup when all nodes removed.

Test plan

  • Existing unit tests updated and pass: ControlConnectionTest, InitialNodeListRefreshTest, MetadataManagerTest, LBP init/distance tests
  • New unit tests: control node lifecycle (resolve, clear on close, restore after reconnect), registerNode idempotency, contact-point-to-metadata-node resolution
  • Integration tests updated: ConnectIT (DC inference from control connection, LBP init failure cleanup), NodeStateIT (event expectations for new node lifecycle)
  • CI/CD pipeline passes

@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch 27 times, most recently from 92752c0 to 4a40853 Compare March 27, 2026 09:15
@dkropachev dkropachev changed the title Remove contact points merging from InitialNodeListRefresh Resolve control node identity via system.local before initial metadata refresh Mar 27, 2026
@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch from 5209fd1 to a1b3ceb Compare March 27, 2026 09:56
@dkropachev dkropachev marked this pull request as ready for review March 27, 2026 13:36
@dkropachev dkropachev requested a review from nikagra March 27, 2026 13:59
@dkropachev dkropachev self-assigned this Mar 27, 2026
- Add port range validation in ClientRoutesConfig.withNativeTransportPort
- Handle null metricUpdater in DefaultNode.setEndPoint (transient field
  can be null on deserialized nodes) and clear old metrics on update
- Close NlbSimulator discovery proxy when all nodes are removed
- Handle empty targets list in RoundRobinProxy to avoid division by zero
- Rename savePort test to reflect actual behavior
Contact point nodes are ephemeral objects used only for the connection
query plan. They don't need metrics registration, which avoids
unnecessary metric churn during initialization and reconnection.

- Add DefaultNode.newContactPoint() static factory that creates nodes
  with NoopNodeMetricUpdater instead of registering real metrics
- Update MetadataManager.addContactPoints to use the factory
- Update LoadBalancingPolicyWrapper to use the factory
- Update TestNodeFactory to distinguish between full nodes (with
  metrics) and contact points (without)
@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch from a7ac61f to ad36409 Compare March 27, 2026 15:58
@scylladb scylladb deleted a comment from nikagra Mar 27, 2026
@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch from ad36409 to e62adde Compare March 27, 2026 19:42
Copy link
Copy Markdown

@nikagra nikagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One clarifying comment suggestion, 👍 otherwise

@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch from 4d41bd1 to bafa36e Compare March 27, 2026 22:35
…a refresh

Instead of carrying contact point nodes through to InitialNodeListRefresh
and matching them by endpoint, the control connection now resolves the
connected node's full identity (hostId, datacenter, rack, etc.) by
querying system.local immediately after establishing the channel. The
resolved node is registered into metadata via MetadataManager.registerNode
before the initial node list refresh runs, so the refresh can find and
reuse it by hostId — preserving connection state.

Key changes:
- TopologyMonitor: rename getChannelEndpoint to getChannelNodeInfo,
  returning full NodeInfo instead of just an EndPoint
- ControlConnection: add controlNode() accessor; resolve contact point
  nodes via system.local and register them into metadata; track
  currentControlNode/pendingControlNode for event matching
- MetadataManager: add registerNode() to atomically insert a node into
  metadata by hostId; remove contact points parameter from
  InitialNodeListRefresh
- InitialNodeListRefresh: match discovered nodes against existing
  metadata entries by hostId instead of matching contact points by
  endpoint
- Load balancing helpers: infer local DC from control connection
  (controlNode hostId or channel endpoint) instead of contact points;
  replace checkLocalDatacenterCompatibility with a warning when the
  configured DC doesn't match any node
@dkropachev dkropachev force-pushed the remove-contactpoints-from-initial-refresh branch from bafa36e to a6f116e Compare March 27, 2026 22:56
@dkropachev dkropachev merged commit 12e6acb into scylladb:scylla-4.x Mar 27, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants