
3.x: Fix LWT PRESERVE_REPLICA_ORDER routing to include non-replica nodes#834

Merged
dkropachev merged 1 commit into scylla-3.x from fix-lwt-routing-preserve-replica-order-833
Mar 6, 2026

Conversation

@dkropachev dkropachev commented Mar 6, 2026

Summary

  • Fixed PreserveReplicaOrderIterator in TokenAwarePolicy to include non-replica nodes after replicas in the query plan, preventing "No node was available" failures when replicas are insufficient
  • When replicas exist, non-replica nodes from child policy are appended after all replicas (local first, then remote)
  • When no replicas are available (all DOWN/IGNORED), the full child policy fallback is preserved
  • Updated all LWT routing tests to verify non-replica inclusion
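The behaviour in the bullets above can be sketched roughly as follows. This is a minimal model and not the driver's code: the class `LwtFallbackSketch`, the method `lwtPlan`, and the `isUsable` predicate (standing in for "not DOWN/IGNORED") are hypothetical names, nodes are plain strings, and the child plan is assumed to contain only usable nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; names and signatures are illustrative assumptions.
final class LwtFallbackSketch {

    /**
     * Returns usable replicas in their given order, followed by the
     * non-replica nodes of the child plan. If no replica is usable,
     * the whole child plan is returned unchanged.
     */
    static List<String> lwtPlan(List<String> replicas,
                                List<String> childPlan,
                                Predicate<String> isUsable) {
        List<String> plan = new ArrayList<>();
        for (String replica : replicas) {
            if (isUsable.test(replica)) {
                plan.add(replica);
            }
        }
        if (plan.isEmpty()) {
            // No usable replica (or no replicas at all): full child policy fallback.
            return new ArrayList<>(childPlan);
        }
        // Final pass: append nodes from the child plan that are not replicas.
        Set<String> replicaSet = new HashSet<>(replicas);
        for (String node : childPlan) {
            if (!replicaSet.contains(node)) {
                plan.add(node);
            }
        }
        return plan;
    }
}
```

With replicas `[n2, n3, n1]` all usable and a child plan of `[n1, n2, n3, n4, n5]`, this sketch yields `[n2, n3, n1, n4, n5]`. If every replica is unusable and the child plan is `[n4, n5]`, it yields `[n4, n5]`.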

Test plan

  • All existing unit tests updated and passing (579 tests, 0 failures)
  • Verified non-replica nodes appear after replicas in LWT query plans
  • Verified full child policy fallback when all replicas are DOWN
  • Verified full child policy fallback when replica list is empty
  • CI pipeline validation

Fixes #833
Port of #831 (4.x fix) to scylla-3.x

@dkropachev dkropachev changed the title from "Fix LWT PRESERVE_REPLICA_ORDER routing to include non-replica nodes" to "3.x: Fix LWT PRESERVE_REPLICA_ORDER routing to include non-replica nodes" Mar 6, 2026

The PreserveReplicaOrderIterator in TokenAwarePolicy previously returned
only replica nodes in the query plan. When replicas were unavailable
(e.g. prepared statements before parameter binding, or replicas going
down after plan construction), the query plan could be empty or
insufficient, causing "No node was available" errors.

The fix adds a third pass that appends non-replica nodes from the child
policy after all replicas have been returned. When no replicas are
available at all (all DOWN/IGNORED), the full child policy fallback is
preserved.

Fixes #833
@dkropachev dkropachev force-pushed the fix-lwt-routing-preserve-replica-order-833 branch from a57613d to e960106 Compare March 6, 2026 15:48
@dkropachev dkropachev requested a review from nikagra March 6, 2026 16:13
@dkropachev
Author

Query Plan Ordering Summary for TokenAwarePolicy

Regular statements (non-LWT)

TokenAwarePolicy always returns LOCAL replicas first, then delegates to the child policy for the rest. The child policy determines what "LOCAL" means:

1. No DC/rack awareness (RoundRobinPolicy)

All replicas (LOCAL) → all other nodes (round-robin)

Every node is LOCAL. Replicas come first (in TOPOLOGICAL/RANDOM/NEUTRAL order), then remaining nodes from the child's round-robin.

2. DC-aware (DCAwareRoundRobinPolicy)

Local DC replicas → local DC non-replicas (round-robin) → remote DC nodes (limited by usedHostsPerRemoteDc)

Only nodes in the configured local DC are LOCAL. Remote DC replicas are not prioritized — they appear in the child policy's tail alongside other remote nodes.

3. Rack-aware (RackAwareRoundRobinPolicy)

Local DC replicas → local rack non-replicas → remote rack (same DC) non-replicas → remote DC nodes

As with the DC-aware policy, LOCAL means "in the same DC". The child policy, however, orders its own part of the plan internally: local rack first, then other racks in the same DC, then remote DCs.
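As a rough illustration of the DC-aware case (2) for regular statements, the sketch below models the child plan and the token-aware wrapper with plain strings. Everything here is an assumption made for illustration: `RegularPlanSketch`, `dcAwareChildPlan`, `tokenAwarePlan`, and the `startIndex` rotation are hypothetical and do not reproduce the driver's classes. Only the `usedHostsPerRemoteDc` limit is taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and signatures are illustrative assumptions.
final class RegularPlanSketch {

    /**
     * Stand-in for a DC-aware child plan: local DC nodes rotated by a
     * non-negative startIndex, then at most usedHostsPerRemoteDc nodes
     * from each remote DC.
     */
    static List<String> dcAwareChildPlan(Map<String, String> dcByNode,
                                         String localDc,
                                         int startIndex,
                                         int usedHostsPerRemoteDc) {
        List<String> local = new ArrayList<>();
        Map<String, List<String>> remote = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : dcByNode.entrySet()) {
            if (e.getValue().equals(localDc)) {
                local.add(e.getKey());
            } else {
                remote.computeIfAbsent(e.getValue(), dc -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<String> plan = new ArrayList<>();
        for (int i = 0; i < local.size(); i++) {
            plan.add(local.get((startIndex + i) % local.size()));
        }
        for (List<String> nodes : remote.values()) {
            plan.addAll(nodes.subList(0, Math.min(usedHostsPerRemoteDc, nodes.size())));
        }
        return plan;
    }

    /** Local DC replicas first, then the child plan without duplicates. */
    static List<String> tokenAwarePlan(List<String> replicas,
                                       Map<String, String> dcByNode,
                                       String localDc,
                                       List<String> childPlan) {
        Set<String> plan = new LinkedHashSet<>();
        for (String replica : replicas) {
            if (localDc.equals(dcByNode.get(replica))) {
                plan.add(replica);
            }
        }
        plan.addAll(childPlan); // nodes already emitted keep their earlier position
        return new ArrayList<>(plan);
    }
}
```

In this model a remote DC replica gets no special treatment: it appears only if the child plan happens to include it within the per-DC limit.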


LWT statements (PRESERVE_REPLICA_ORDER)

The PreserveReplicaOrderIterator ignores the child policy for replica ordering and uses its own 3-pass strategy:

Local DC replicas (metadata order) → Remote DC replicas (metadata order) → non-replicas from child policy

Key differences from regular:

  • Remote replicas come before local non-replicas (for Paxos, any replica is preferred over any non-replica)
  • Replica order is deterministic from cluster metadata (token ring), not round-robin
  • Rack awareness is intentionally skipped — all local DC replicas are treated equally to avoid Paxos contention hotspots
  • The RackAwareRoundRobinPolicy also explicitly disables rack prioritization for LWT in its own newQueryPlan
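The LWT ordering described above might be modelled as below. This is a sketch under stated assumptions, not the actual `PreserveReplicaOrderIterator`: `LwtReplicaOrderSketch` and `lwtPlan` are hypothetical names, nodes are plain strings, availability checks are left out, and the replica list is assumed to arrive already in metadata (token ring) order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and signatures are illustrative assumptions.
final class LwtReplicaOrderSketch {

    static List<String> lwtPlan(List<String> replicasInMetadataOrder,
                                Map<String, String> dcByNode,
                                String localDc,
                                List<String> childPlan) {
        List<String> plan = new ArrayList<>();
        // Pass 1: local DC replicas, keeping metadata order (no rack preference).
        for (String replica : replicasInMetadataOrder) {
            if (localDc.equals(dcByNode.get(replica))) {
                plan.add(replica);
            }
        }
        // Pass 2: remote DC replicas, keeping metadata order.
        for (String replica : replicasInMetadataOrder) {
            if (!localDc.equals(dcByNode.get(replica))) {
                plan.add(replica);
            }
        }
        // Pass 3: non-replicas, in whatever order the child plan provides.
        Set<String> replicaSet = new HashSet<>(replicasInMetadataOrder);
        for (String node : childPlan) {
            if (!replicaSet.contains(node)) {
                plan.add(node);
            }
        }
        return plan;
    }
}
```

For replicas `[b2, a3, a1]` with `a*` nodes in the local DC, and a child plan of `[a2, a3, a1, b1]`, the sketch produces `[a3, a1, b2, a2, b1]`: the remote replica `b2` precedes the local non-replica `a2`, and repeated calls give the same replica order regardless of how the child plan rotates.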

@dkropachev dkropachev marked this pull request as ready for review March 6, 2026 16:20
@dkropachev dkropachev self-assigned this Mar 6, 2026
@dkropachev dkropachev merged commit 0c358d2 into scylla-3.x Mar 6, 2026
11 checks passed
@dkropachev dkropachev deleted the fix-lwt-routing-preserve-replica-order-833 branch March 6, 2026 22:03