(improvement): LWT routing plan cache#769
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves the hot-path performance of token-aware routing for LWT queries by introducing a zero-allocation LWT pick path and caching immutable per-statement routing metadata (routingPlan) for lock-free reads across queries. It also removes duplicated routing-key marshaling logic by unifying routing key creation into a single helper.
Changes:
- Add
pickLWTReplicasfor deterministic, allocation-minimized LWT host ordering (local-first, then remote tiers, then fallback). - Cache per-statement
routingPlaninSession.routingPlans(sync.Map) and expose it viaqueryRoutingInfo.plan(atomic.Pointer) for lock-free reads. - Unify routing key construction into
createRoutingKey(indexes, types, values)and update Query/Batch to use it.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
policies.go |
Adds LWT-specific pick iterator and avoids replica cloning for LWT queries. |
session.go |
Introduces routingPlan + session-level cache, atomically stored plan pointer, and unified routing key creation. |
conn.go |
Skips redundant routingInfo writes when plan exists; invalidates plan cache on UNPREPARED; uses qry.Keyspace()/Table() for tablet hint decoding. |
policies_test.go |
Adds unit tests + benchmarks covering LWT ordering, fallback behavior, rack-aware tiers, and high-RF behavior. |
session_unit_test.go |
Adds unit tests + benchmarks for routingPlan fast-path reads and unified createRoutingKey. |
There was a problem hiding this comment.
Pull request overview
This PR optimizes query routing for lightweight transactions (LWT) in the GoCQL driver by introducing a dedicated low-allocation LWT replica pick path and caching immutable per-statement routing metadata to reduce hot-path locking and repeated metadata work.
Changes:
- Add a dedicated LWT replica selection path in
tokenAwareHostPolicy.Pick()with deterministic local-before-remote ordering and reduced allocations. - Introduce a per-
Sessionsync.Maprouting plan cache and a per-queryatomic.Pointerto enable lock-free reads for routing metadata (isLWT, partitioner, keyspace/table). - Unify routing key construction into a single
createRoutingKey(indexes, types, values)implementation.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
policies.go |
Adds pickLWTReplicas and routes LWT queries through it to reduce allocations and enforce deterministic ordering. |
session.go |
Adds routingPlans cache + atomic plan pointer on queries/batches and unifies routing-key creation. |
conn.go |
Avoids redundant routingInfo writes when a plan exists; invalidates cached plan(s) on RequestErrUnprepared. |
policies_test.go |
Adds unit tests + benchmarks covering LWT pick ordering/dedup and plan-based LWT flag behavior. |
session_unit_test.go |
Adds unit tests + benchmarks for routing plan reads and unified createRoutingKey. |
Comments suppressed due to low confidence (1)
session.go:1464
- When a routing plan is present but its keyspace is the empty string, this code skips the existing routingInfo.keyspace fallback and goes straight to session.cfg.Keyspace. That can change behavior vs the pre-plan path (and differs from Batch.Keyspace(), which still checks routingInfo.keyspace when plan.keyspace is empty). Consider falling back to routingInfo.keyspace when plan.keyspace is empty to preserve prior semantics and keep Query/Batch consistent.
func (q *Query) Keyspace() string {
if q.getKeyspace != nil {
return q.getKeyspace()
}
if p := q.routingInfo.plan.Load(); p != nil {
if p.keyspace != "" {
return p.keyspace
}
} else if q.routingInfo.keyspace != "" {
return q.routingInfo.keyspace
}
if q.session == nil {
return ""
}
// TODO(chbannis): this should be parsed from the query or we should let
// this be set by users.
return q.session.cfg.Keyspace
}
25bf253 to
7c4f3a1
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes token-aware host selection and routing-key computation, focusing on LWT (lightweight transaction) queries and prepared-statement routing metadata reuse across queries in the GoCQL driver.
Changes:
- Add a dedicated LWT replica-pick path (
pickLWTReplicas) to reduce allocations and enforce deterministic local-before-remote ordering. - Introduce a bounded per-session
routingPlancache (routingPlanLRU) and per-queryatomic.Pointerto enable lock-free reads of LWT/partitioner/keyspace/table on hot paths. - Unify routing-key construction logic into a single
createRoutingKey(indexes, types, values)implementation with bounds validation, and invalidate cached routing metadata onUNPREPARED.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| topology.go | Documents immutability requirement for replica slices used by the LWT pick fast-path. |
| session.go | Adds routingPlan + LRU cache + atomic plan pointer usage; unifies routing-key creation and adds validation. |
| conn.go | Avoids redundant routingInfo writes when plan is present; invalidates routingPlan/routingKeyInfo caches on UNPREPARED; uses qry.Keyspace()/Table() for tablet hints. |
| policies.go | Adds LWT-specific pick path, avoids replica slice clone for LWT, and guards against nil host from empty token ring. |
| session_unit_test.go | Adds unit tests and benchmarks for routingPlan cache behavior, plan-based accessors, and createRoutingKey validation. |
| policies_test.go | Adds extensive tests/benchmarks for LWT pick ordering, fallback behavior, and the nil-host guard. |
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [9]*HostInfo array — sized for NTS with 3 DCs at RF=3 each (9 replicas). Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
There was a problem hiding this comment.
Pull request overview
This PR optimizes token-aware host selection for LWT queries and reduces routing-metadata overhead by introducing a cached, immutable per-statement routing plan and a zero-allocation LWT replica selection path, plus a small safety fix for empty token rings.
Changes:
- Add a dedicated
pickLWTReplicaspath for deterministic, allocation-reduced LWT host iteration (and guard nil host on empty token ring). - Cache immutable
routingPlanobjects per statement in a bounded LRU and expose lock-free reads viaatomic.PointerinqueryRoutingInfo. - Unify routing-key construction into a single
createRoutingKey(indexes, types, values)helper and update Query/Batch routing-key lookup to use the cached plan.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
topology.go |
Documents the immutability contract for hostTokens.hosts relied on by the new LWT pick path. |
policies.go |
Introduces pickLWTReplicas, avoids cloning replicas for LWT, and guards against nil host from an empty token ring. |
conn.go |
Avoids redundant routingInfo writes when a plan is present; invalidates routing caches on RequestErrUnprepared; uses qry.Keyspace()/Table() for tablet hint parsing. |
session.go |
Adds routingPlanLRU + Session.getRoutingPlan, atomic plan pointer in queryRoutingInfo, and unifies routing key creation with added bounds checks. |
session_unit_test.go |
Adds unit tests and benchmarks for routing-plan reads, cache behavior, and routing-key creation validation. |
policies_test.go |
Adds extensive unit tests and benchmarks for the LWT pick behavior, determinism, tier ordering, and nil-host guard. |
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
7c4f3a1 to
9c9482b
Compare
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
a8ca47b to
f1f4746
Compare
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR #769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR scylladb#769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
Replace the per-query heap-allocated map[*HostInfo]bool (used for deduplication in the Pick() closure) with an inline hostSet backed by a fixed [4]*HostInfo array — sized for RF=3-4, covering most production deployments. Overflow is handled gracefully (silently skipping tracking) with no correctness impact. Replace shuffleHosts() (which copies into a new slice and uses the global math/rand mutex) with shuffleHostsInPlace() using math/rand/v2 for lock-free operation. Replace the healthyReplicas/unhealthyReplicas make+append pattern with partitionHealthy(), which performs an in-place stable partition using a small stack buffer. Add conditional cloning so replicas are only copied when mutation (shuffle or slow-replica avoidance) is actually needed, otherwise the ring's slice is referenced directly. Combined savings: ~250-500 bytes per token-aware query. Note: This change targets the non-LWT path. PR #769 adds a dedicated pickLWTReplicas() function for LWT queries. Both changes are compatible and complementary — this PR handles the non-LWT hot path.
f1f4746 to
8088d6a
Compare
|
it is a bad idea to persist plan, it may change on node events or when driver get new tablets |
It may be sub-optimal. Node changes - such as failures? or additions? The latter is immaterial, the former I can look into. New tablets - already handled by the server. Note that this PR has multiple independent items. Should I extract them separately anyway? (need to resolve multiple conflicts anyway here!) |
|
Closing - will return to this some day. |
Summary
pickLWTReplicas) that avoids heap allocations for the used-host map and replica clone, with deterministic local-before-remote replica orderingroutingPlanstructs per prepared statement in a bounded LRU (routingPlanLRU), enabling lock-free reads forisLWT(),getPartitioner(),Keyspace(), andTable()viaatomic.PointercreateRoutingKey/createRoutingKeyFromPlaninto a single function, eliminating ~30 lines of duplicated marshaling logicGetHostForTokenon empty token ring (pre-existing bug, independent fix)Commits
91f2b1b1227ce93ccddad23b6806Design
Commit 1: pickLWTReplicas
[9]*HostInfoarray for dedup tracking; lazily allocates amap[*HostInfo]boolonly if replicas exceed 9isLWTcheck hoisted before replica lookuppolicyHostTierhelper deduplicates tier computation between LWT and non-LWT pathshostTokens.hosts(LWT path relies on this to skip cloning)Commit 2: routingPlan cache
routingPlanis an immutable struct holdingkeyspace,table,partitioner,lwt,indexes,typesSession.routingPlanCache(routingPlanLRU)routingPlanLRUwrapsinternal/lru.CachewithGet/Put/Removemethods;Putimplements get-or-store semantics under a single mutex lock to handle concurrent racesMaxRoutingKeyInfo(default 1000) to prevent unbounded memory growthqueryRoutingInfo.planis anatomic.Pointer[routingPlan]— when set, hot-path readers bypass theRWMutexentirelyconn.go*RequestErrUnpreparedhandlers invalidate bothroutingPlanCacheandroutingKeyInfoCacheentries, plus the per-query/batch plan pointer, so recursive re-execution does not see stale plan data after schema changesBatch.Keyspace()now consults the cachedroutingPlanbefore falling back to session default, enabling tablet routing for batches (previously a no-op sinceBatch.Table()returned""without a plan)Commit 3: createRoutingKey unification
createRoutingKey(indexes []int, types []TypeInfo, values []interface{})replaces both old functionsQuery.GetRoutingKey()andBatch.GetRoutingKey()extractindexes/typesfrom cached planCommit 4: nil host guard (pre-existing bug fix)
tokenRing.tokensis empty,GetHostForTokenreturns nil. The old code unconditionally wrapped this in a[]*HostInfo{nil}slice, which propagated topolicyHostTier(nil, ...)→fallback.IsLocal(nil)→nil.DataCenter()→ panicTests
34 new unit tests covering:
All tests pass with
-race.Benchmark Results
Benchmarks run on 12th Gen Intel Core i7-1270P,
go test -tags unit -bench=... -benchtime=1s -count=5. All values below are medians of 5 runs.Commit 1: LWT Pick Path — First Host Selection
The baseline for LWT queries is
NonLWT(the standard pick path), since before this PR all queries — including LWT — used the same code path.Commit 1: LWT Pick Path — Full Drain (all 12 hosts)
Commit 2: routingPlan Cache — isLWT() Hot Path
The baseline is
WithoutPlan(the existingRWMutex.RLockpath).WithPlanuses the newatomic.Pointerlock-free read.Commit 2: routingPlan Cache — getPartitioner() Hot Path
Commit 2: routingPlan Cache — LRU Lookup
Commit 3: createRoutingKey (Unified)
No regression from unification — identical performance to the previous two-function implementation:
Pre-existing Issues (not addressed)
TestInitialRetryPolicyfails on master due topeers_v2error message mismatch — unrelatedTestShardAwarePortMockedNoReconnectionsflaky — unrelatedKeyspace()andTable()reading mutex-protected fields without lock is pre-existing, not a regression