planner, copr: estimate TiCI fulltext row count | tidb-test=13ccf8de48e8db2290ff884598444d0508606bbf tiflash=feature-fts#68056
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds end-to-end TiCI (TiFlash Compute Index) count-estimation: new KV request/interface, planner hook to call estimator, copr store implementation with shard grouping/sampling and RPC, driver forwarding, unit tests, and dependency version bumps in go.mod. Changes
Sequence DiagramsequenceDiagram
participant Planner as Planner
participant Driver as TiKV Driver
participant Store as Copr Store
participant Cache as TiCI Shard Cache
participant TiFlash as TiFlash Endpoint
Planner->>Planner: validate TiCI prerequisites (FTS, index, ranges)
Planner->>Driver: EstimateTiCICount(ctx, req, 50ms)
Driver->>Store: EstimateTiCICount(ctx, req, 50ms)
Store->>Cache: Get shards for table/index
Cache-->>Store: shard list with local cache addrs
Store->>Store: group shards by local addr & dedupe IDs
Store->>Store: choose best shard group
Store->>Store: marshal FTS query into protobuf request
Store->>TiFlash: CmdGetEstimateTiCICount (with timeout)
TiFlash-->>Store: estimate response or error
Store->>Store: scale estimate by totalUnique/sampledUnique (ceil)
Store-->>Driver: scaled estimate
Driver-->>Planner: scaled estimate (planner caps by realtime count)
alt TiCI unavailable or error
Planner->>Planner: fallback to heuristic min(realtime/10,1000)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @wshwsh12. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
pkg/planner/core/stats.go (1)
252-260: Also gate onpath.Index.IsTiCIIndex()to avoid wasted 50ms RPCs on non-TiCI FTS paths.The current guard accepts any path with
FtsQueryInfo != nil, butTiCIIndexRangesToKVRangesand the downstreamCmdGetEstimateTiCICountare TiCI-shard-specific. If a non-TiCI FTS path ever reaches here (legacy TiFlash FTS, future FTS index types, or an unfilled TiCI shard layout), the planner will pay up to the 50ms timeout per such path before falling back to the heuristic, andgetTiCIShardTypemay return a misleading shard type for the KV-range conversion.Adding the index-kind check makes the contract with
getTiCIShardTypeexplicit and short-circuits cleanly for non-TiCI indexes:♻️ Suggested change
func deriveTiCISearchPathStats(ds *logicalop.DataSource, path *util.AccessPath) (float64, bool) { sctx, ok := ds.SCtx().(sessionctx.Context) if !ok || path == nil || path.Index == nil || path.FtsQueryInfo == nil || len(path.Ranges) == 0 { return 0, false } + if !path.Index.IsTiCIIndex() { + return 0, false + } provider, ok := sctx.GetStore().(kv.TiCIEstimateCountProvider)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/stats.go` around lines 252 - 260, In deriveTiCISearchPathStats add an early guard that checks path.Index.IsTiCIIndex() and return (0, false) when it is false so we only call TiCI-specific logic for TiCI indexes; specifically, before obtaining the TiCI estimate provider or invoking TiCIIndexRangesToKVRanges / CmdGetEstimateTiCICount (and before calling getTiCIShardType), ensure path != nil && path.Index != nil && path.Index.IsTiCIIndex() to short-circuit non-TiCI FTS paths and avoid the 50ms RPC overhead and misleading shard-type conversions.pkg/store/copr/tici_estimate_count.go (2)
97-106: Consider populatingkvrpcpb.Contextwith at least request-source/priority for observability and QoS.The request is sent with
Context: &kvrpcpb.Context{}, which means the TiFlash side won't see a request source tag, SQL digest, priority, or resource-control context for this RPC. Other coprocessor calls in this package set these fields so that estimation traffic can be observed, throttled, and attributed correctly. Since TiCI estimate calls run on the planner hot path, having them invisible to TiFlash-side metrics/throttling is a missed signal.If there is a session/statement context reachable here (or one that can be threaded via
req), populate at minimumRequestSourceandPriority; otherwise consider letting the TiFlash request-source SDK attach them via the existing wrappers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/store/copr/tici_estimate_count.go` around lines 97 - 106, The TiCIEstimateCountRequest is built with an empty kvrpcpb.Context, which omits RequestSource/Priority and prevents TiFlash from observing/throttling this RPC; update the request construction in the function that returns &coprocessor.TiCIEstimateCountRequest{} so that the Context is populated (kvrpcpb.Context) with at least RequestSource and Priority taken from the available session/statement context or from fields on req (or by calling the same helper/wrapper used by other coprocessor calls in this package), ensuring the request carries the same observability/QoS metadata as other coprocessor RPCs.
85-89: Minor: dead branch in scaling check.
sampledUniqueShards == 0cannot trigger here —group.shardInfosandgroup.uniqueShardIDare populated in lockstep inbuildTiCIEstimateShardGroups, and the earlier guardlen(group.shardInfos) == 0already errors out before this point. The expression simplifies tosampledUniqueShards >= totalUniqueShards. Dropping the== 0clause makes the invariant clearer and removes the suggestion that a zero-divisor case needs handling.♻️ Suggested simplification
- sampledUniqueShards := uint64(len(group.uniqueShardID)) - if sampledUniqueShards == 0 || sampledUniqueShards >= totalUniqueShards { + sampledUniqueShards := uint64(len(group.uniqueShardID)) + if sampledUniqueShards >= totalUniqueShards { return pbResp.EstCount, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/store/copr/tici_estimate_count.go` around lines 85 - 89, The guard that checks sampledUniqueShards currently includes a dead branch `sampledUniqueShards == 0`; since `group.shardInfos` and `group.uniqueShardID` are populated together in buildTiCIEstimateShardGroups and an earlier check already rejects empty groups, simplify the condition in the scaling check to only `sampledUniqueShards >= totalUniqueShards` (i.e., replace `if sampledUniqueShards == 0 || sampledUniqueShards >= totalUniqueShards` with `if sampledUniqueShards >= totalUniqueShards`) and update any nearby comment to reflect the invariant that sampledUniqueShards cannot be zero.pkg/store/copr/tici_estimate_count_test.go (1)
78-92: Optional: extend coverage to the validation/short-circuit branches ofEstimateTiCICount.The helper functions are well covered, but the public method
(*Store).EstimateTiCICountitself has untested guard branches (nil request, nilFTSQueryInfo, nil/emptyKeyRanges, missing TiCI shard cache). These can be exercised by passing minimally constructed inputs without needing a real RPC client and would lock in the documented short-circuit contract:
- nil request → error "TiCI estimate count request is nil"
- nil FTSQueryInfo → error "TiCI estimate count request missing FTS query info"
- nil/empty KeyRanges → returns
(0, nil)(no RPC)Not required for this PR, but cheap to add and protects the early-return semantics that callers in
pkg/planner/core/stats.gorely on.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/store/copr/tici_estimate_count_test.go` around lines 78 - 92, Add unit tests for the public method (*Store).EstimateTiCICount to cover its early-return validation branches: create tests that (1) pass a nil request and assert the error message equals "TiCI estimate count request is nil", (2) pass a request with nil FTSQueryInfo and assert the error message equals "TiCI estimate count request missing FTS query info", and (3) pass a request with nil or empty KeyRanges and assert it returns (0, nil) without attempting RPC. Put these tests alongside TestBuildTiCIEstimatePBRequest in pkg/store/copr/tici_estimate_count_test.go, instantiate a Store without a real RPC client (nil) and with minimal TiCI shard cache state so tests exercise only the short-circuit logic in EstimateTiCICount.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Around line 381-384: The go.mod currently contains temporary replace
directives pointing to a personal fork (github.com/wshwsh12/kvproto and
github.com/wshwsh12/client-go/v2); remove these replaces by upstreaming the
protobuf and client changes into pingcap/kvproto and tikv/client-go/v2, then
update the corresponding require versions (the current require entries around
the former Lines 104 and 127) to the new upstream tags/versions and delete the
two replace blocks, and finally run make bazel_prepare and include the generated
Bazel metadata changes in the PR so the workspace stays in sync.
In `@pkg/planner/core/stats.go`:
- Around line 276-284: The call to provider.EstimateTiCICount currently uses
context.Background(), which detaches cancellation/tracing from the planner flow;
instead propagate a parent context (e.g., the caller's ctx derived from sctx)
into provider.EstimateTiCICount so upstream cancellations and tracing are
honoured. Update deriveSearchPathStats (or its callers) to accept and pass a
context parameter down to the EstimateTiCICount invocation (replacing
context.Background()), so provider.EstimateTiCICount(...) receives the
propagated ctx while keeping the 50ms timeout/deadline applied.
- Around line 290-299: The Info-level log inside deriveTiCISearchPathStats (the
logutil.BgLogger().Info(...) call that logs "TiCI estimate count succeeded" with
fields like tableID, indexName, indexID, readTS, estimatedCount,
plannerCountAfterAccess, realtimeCount, rangeCount) should be demoted to Debug
to avoid flooding production logs on the planner hot path; locate that call in
pkg/planner/core/stats.go and replace the Info invocation with Debug (or gate it
behind a feature/log-level check) while keeping the same structured fields and
the existing return (plannerCount, true).
---
Nitpick comments:
In `@pkg/planner/core/stats.go`:
- Around line 252-260: In deriveTiCISearchPathStats add an early guard that
checks path.Index.IsTiCIIndex() and return (0, false) when it is false so we
only call TiCI-specific logic for TiCI indexes; specifically, before obtaining
the TiCI estimate provider or invoking TiCIIndexRangesToKVRanges /
CmdGetEstimateTiCICount (and before calling getTiCIShardType), ensure path !=
nil && path.Index != nil && path.Index.IsTiCIIndex() to short-circuit non-TiCI
FTS paths and avoid the 50ms RPC overhead and misleading shard-type conversions.
In `@pkg/store/copr/tici_estimate_count_test.go`:
- Around line 78-92: Add unit tests for the public method
(*Store).EstimateTiCICount to cover its early-return validation branches: create
tests that (1) pass a nil request and assert the error message equals "TiCI
estimate count request is nil", (2) pass a request with nil FTSQueryInfo and
assert the error message equals "TiCI estimate count request missing FTS query
info", and (3) pass a request with nil or empty KeyRanges and assert it returns
(0, nil) without attempting RPC. Put these tests alongside
TestBuildTiCIEstimatePBRequest in pkg/store/copr/tici_estimate_count_test.go,
instantiate a Store without a real RPC client (nil) and with minimal TiCI shard
cache state so tests exercise only the short-circuit logic in EstimateTiCICount.
In `@pkg/store/copr/tici_estimate_count.go`:
- Around line 97-106: The TiCIEstimateCountRequest is built with an empty
kvrpcpb.Context, which omits RequestSource/Priority and prevents TiFlash from
observing/throttling this RPC; update the request construction in the function
that returns &coprocessor.TiCIEstimateCountRequest{} so that the Context is
populated (kvrpcpb.Context) with at least RequestSource and Priority taken from
the available session/statement context or from fields on req (or by calling the
same helper/wrapper used by other coprocessor calls in this package), ensuring
the request carries the same observability/QoS metadata as other coprocessor
RPCs.
- Around line 85-89: The guard that checks sampledUniqueShards currently
includes a dead branch `sampledUniqueShards == 0`; since `group.shardInfos` and
`group.uniqueShardID` are populated together in buildTiCIEstimateShardGroups and
an earlier check already rejects empty groups, simplify the condition in the
scaling check to only `sampledUniqueShards >= totalUniqueShards` (i.e., replace
`if sampledUniqueShards == 0 || sampledUniqueShards >= totalUniqueShards` with
`if sampledUniqueShards >= totalUniqueShards`) and update any nearby comment to
reflect the invariant that sampledUniqueShards cannot be zero.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c3434051-b885-4ffb-a722-1395afc8eb4c
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (6)
go.modpkg/kv/tici_estimate.gopkg/planner/core/stats.gopkg/store/copr/tici_estimate_count.gopkg/store/copr/tici_estimate_count_test.gopkg/store/driver/tikv_driver.go
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## feature/fts #68056 +/- ##
====================================================
- Coverage 76.8610% 39.4385% -37.4226%
====================================================
Files 1960 1716 -244
Lines 555677 475548 -80129
====================================================
- Hits 427099 187549 -239550
- Misses 127116 271099 +143983
- Partials 1462 16900 +15438
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
cb36c32 to
ae87825
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
go.mod (2)
104-382:⚠️ Potential issue | 🟠 MajorPlease include Bazel metadata sync for this
go.modupdateThis PR changes
go.mod, but no Bazel metadata updates are shown in the provided diff. Please runmake bazel_prepareand include generated changes if any.#!/bin/bash set -euo pipefail # 1) Confirm go module files changed git diff --name-only | rg -n '^(go\.mod|go\.sum)$' || true # 2) Check whether Bazel metadata files are included in this diff git diff --name-only | rg -n '(^DEPS\.bzl$|(^|/)BUILD\.bazel$|\.bzl$)' || trueBased on learnings: "Applies to /*.bazel|/BUILD.bazel|**/*.bzl : MUST run
make bazel_prepareand include resulting Bazel metadata changes in the PR when ... changinggo.mod/go.sum."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` around lines 104 - 382, The PR updated go.mod but omitted Bazel metadata; run the workspace preparation step (make bazel_prepare) to regenerate Bazel metadata, then stage and commit the resulting Bazel files (e.g., DEPS.bzl, any changed BUILD.bazel or *.bzl files) alongside the go.mod/go.sum updates; verify with git diff that go.mod/go.sum changes and any generated Bazel files are included before pushing.
382-382:⚠️ Potential issue | 🔴 CriticalBlocker: personal-fork
client-goreplace must not shipLine 382 still redirects
github.com/tikv/client-go/v2to a personal fork. This is a supply-chain and availability risk for release artifacts. Please switch back to upstreamgithub.com/tikv/client-go/v2after the required TiCI RPC changes are available there, then remove thisreplace.#!/bin/bash set -euo pipefail # Verify whether any personal-fork replace directives remain. rg -n '^\s*replace\s+github\.com/tikv/client-go/v2\s*=>\s*github\.com/wshwsh12/client-go/v2' go.mod rg -n '^\s*replace\s+github\.com/pingcap/kvproto\s*=>\s*github\.com/wshwsh12/kvproto' go.mod || true # Show current upstream requirement pinned in go.mod. rg -n '^\s*github\.com/tikv/client-go/v2\s+v' go.mod🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@go.mod` at line 382, The go.mod currently contains a replace directive redirecting github.com/tikv/client-go/v2 to a personal fork (the replace line shown) which must be removed; edit go.mod to delete the replace entry for github.com/tikv/client-go/v2 => github.com/wshwsh12/client-go/v2 and ensure the module requires the upstream github.com/tikv/client-go/v2 (restore or pin the intended upstream version if needed), then remove any corresponding replace for github.com/pingcap/kvproto if present and re-run `go mod tidy` to verify there are no remaining personal-fork replaces.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@go.mod`:
- Around line 104-382: The PR updated go.mod but omitted Bazel metadata; run the
workspace preparation step (make bazel_prepare) to regenerate Bazel metadata,
then stage and commit the resulting Bazel files (e.g., DEPS.bzl, any changed
BUILD.bazel or *.bzl files) alongside the go.mod/go.sum updates; verify with git
diff that go.mod/go.sum changes and any generated Bazel files are included
before pushing.
- Line 382: The go.mod currently contains a replace directive redirecting
github.com/tikv/client-go/v2 to a personal fork (the replace line shown) which
must be removed; edit go.mod to delete the replace entry for
github.com/tikv/client-go/v2 => github.com/wshwsh12/client-go/v2 and ensure the
module requires the upstream github.com/tikv/client-go/v2 (restore or pin the
intended upstream version if needed), then remove any corresponding replace for
github.com/pingcap/kvproto if present and re-run `go mod tidy` to verify there
are no remaining personal-fork replaces.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 317dfe08-8a24-475e-862d-d5b1ad886860
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (1)
go.mod
Signed-off-by: wshwsh12 <793703860@qq.com>
ae87825 to
3b00f60
Compare
Signed-off-by: wshwsh12 <793703860@qq.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/planner/core/stats.go (2)
252-256: Type assertion silently disables TiCI estimation in non-sessionctx.Contextcases.
ds.SCtx()returnsbase.PlanContext; the assertion tosessionctx.Contextwill succeed in production but may quietly fail in tests/mock contexts that supply only the planner-facing subset, in which case TiCI estimation is skipped without any log. If that is intentional (TiCI estimation is best-effort and the heuristic fallback is acceptable), consider adding a one-line comment documenting the intent so future readers don’t mistake the earlyreturn 0, falsefor an oversight.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/stats.go` around lines 252 - 256, deriveTiCISearchPathStats currently type-asserts ds.SCtx() to sessionctx.Context in deriveTiCISearchPathStats which causes TiCI estimation to be silently skipped in test/mocked PlanContexts; add a one-line clarifying comment above the assertion in deriveTiCISearchPathStats (referencing ds.SCtx() and sessionctx.Context) that this is an intentional best-effort check and that non-sessionctx.PlanContext cases (e.g., mocks) will skip TiCI estimation, so future readers won’t treat the early return as a bug.
188-241: Deduplicate TiCI shard-type derivation.
getTiCIShardTypereproduces the same precedence (HybridInfo/Sharding → CommonHandle → IntHandle) thatfillIndexPathderives inline at lines 190-206. Having two copies invites drift if the rules ever change (e.g., a new shard variant is added in one place but not the other). Consider routingfillIndexPaththrough the new helper so there is a single source of truth.♻️ Suggested refactor
func fillIndexPath(ds *logicalop.DataSource, path *util.AccessPath, conds []expression.Expression, possiblePK *model.IndexInfo) error { ticiType := distsql.NotTiCIIndex if path.Index.IsTiCIIndex() { - if path.Index.HybridInfo != nil && path.Index.HybridInfo.Sharding != nil { - ticiType = distsql.TiCIShardExtraShardingKey + ticiType = getTiCIShardType(ds, path) + switch ticiType { + case distsql.TiCIShardExtraShardingKey: path.Ranges = ranger.FullRange() - } else if ds.TableInfo.IsCommonHandle { - ticiType = distsql.TiCIShardCommonHandle + case distsql.TiCIShardCommonHandle: path.Ranges = ranger.FullNotNullRange() - } else { - ticiType = distsql.TiCIShardIntHandle + default: // TiCIShardIntHandle // Int Handle's range is a special one. unsignedFlag := false - // We will not get the column for the _tidb_rowid case. if intHandle := ds.TableInfo.GetPkColInfo(); intHandle != nil { unsignedFlag = mysql.HasUnsignedFlag(intHandle.GetFlag()) } path.Ranges = ranger.FullIntRange(unsignedFlag) }Also applies to: 306-314
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/stats.go` around lines 188 - 241, The code in fillIndexPath duplicates TiCI shard-type logic (HybridInfo/Sharding → CommonHandle → IntHandle) that getTiCIShardType encapsulates; refactor fillIndexPath to call getTiCIShardType(ds, path.Index, possiblePK) (or the appropriate signature of the helper) to obtain ticiType and then handle only the side effects (setting path.Ranges for each ticiType branch) and the subsequent column conversions (expression.TiCIIndexInfo2ShardCols / IndexInfo2Cols), removing the inline precedence checks; ensure the IntHandle unsignedFlag logic (using ds.TableInfo.GetPkColInfo and mysql.HasUnsignedFlag) is preserved when applying the IntHandle case and keep the existing updates to path.IdxCols/IdxColLens/FullIdxCols/FullIdxColLens and the subsequent detachCondAndBuildRangeForPath call so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/planner/core/stats.go`:
- Around line 280-292: The synchronous call to provider.EstimateTiCICount in
stats.go can block planning up to the hard-coded 50ms timeout per FTS path; add
a failure short-lived negative cache or simple circuit-breaker around
EstimateTiCICount (cached failure state keyed by the TiCI provider or
table/index + backoff TTL and failure threshold) so consecutive failures quickly
skip the RPC and return the fallback, and make the timeout configurable (expose
it as a session variable or config value used instead of the hard-coded
50*time.Millisecond) so callers of EstimateTiCICount and the callsite in this
file can use the tunable deadline.
---
Nitpick comments:
In `@pkg/planner/core/stats.go`:
- Around line 252-256: deriveTiCISearchPathStats currently type-asserts
ds.SCtx() to sessionctx.Context in deriveTiCISearchPathStats which causes TiCI
estimation to be silently skipped in test/mocked PlanContexts; add a one-line
clarifying comment above the assertion in deriveTiCISearchPathStats (referencing
ds.SCtx() and sessionctx.Context) that this is an intentional best-effort check
and that non-sessionctx.PlanContext cases (e.g., mocks) will skip TiCI
estimation, so future readers won’t treat the early return as a bug.
- Around line 188-241: The code in fillIndexPath duplicates TiCI shard-type
logic (HybridInfo/Sharding → CommonHandle → IntHandle) that getTiCIShardType
encapsulates; refactor fillIndexPath to call getTiCIShardType(ds, path.Index,
possiblePK) (or the appropriate signature of the helper) to obtain ticiType and
then handle only the side effects (setting path.Ranges for each ticiType branch)
and the subsequent column conversions (expression.TiCIIndexInfo2ShardCols /
IndexInfo2Cols), removing the inline precedence checks; ensure the IntHandle
unsignedFlag logic (using ds.TableInfo.GetPkColInfo and mysql.HasUnsignedFlag)
is preserved when applying the IntHandle case and keep the existing updates to
path.IdxCols/IdxColLens/FullIdxCols/FullIdxColLens and the subsequent
detachCondAndBuildRangeForPath call so behavior is unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dba5b70b-61bb-4439-b3ba-8a9d3c170cab
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (2)
go.modpkg/planner/core/stats.go
✅ Files skipped from review due to trivial changes (1)
- go.mod
| count, err := provider.EstimateTiCICount(ctx, &kv.TiCIEstimateCountRequest{ | ||
| TableID: tableID, | ||
| IndexID: path.Index.ID, | ||
| StartTS: readTS, | ||
| FTSQueryInfo: path.FtsQueryInfo, | ||
| KeyRanges: keyRanges, | ||
| TimeZoneName: tzName, | ||
| TimeZoneOffset: tzOffset, | ||
| }, 50*time.Millisecond) | ||
| if err != nil { | ||
| logutil.BgLogger().Warn("failed to estimate TiCI search path row count", zap.Error(err), zap.Int64("tableID", tableID), zap.Int64("indexID", path.Index.ID)) | ||
| return 0, false | ||
| } |
There was a problem hiding this comment.
Synchronous TiCI RPC adds up to 50 ms per FTS path during planning.
EstimateTiCICount is a blocking call on the planner thread guarded only by a 50 ms timeout. When TiFlash/TiCI is unhealthy (network partition, restart, slow node) every FTS query that reaches this code will eat the full deadline before falling back, and a query with multiple FTS access paths will pay this cost more than once. The fallback at the call site is correct, but consider:
- A short-lived negative cache or circuit breaker on consecutive failures so a degraded TiCI does not regress every concurrent planner.
- Surfacing the timeout as a tunable (session var or config) — 50 ms may be too tight for cross-AZ deployments and too loose under heavy load.
Not a blocker for this PR, but worth tracking before the feature is enabled by default.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/planner/core/stats.go` around lines 280 - 292, The synchronous call to
provider.EstimateTiCICount in stats.go can block planning up to the hard-coded
50ms timeout per FTS path; add a failure short-lived negative cache or simple
circuit-breaker around EstimateTiCICount (cached failure state keyed by the TiCI
provider or table/index + backoff TTL and failure threshold) so consecutive
failures quickly skip the RPC and return the fallback, and make the timeout
configurable (expose it as a session variable or config value used instead of
the hard-coded 50*time.Millisecond) so callers of EstimateTiCICount and the
callsite in this file can use the tunable deadline.
9debcad to
289077e
Compare
winoros
left a comment
There was a problem hiding this comment.
planner part lgtm.
Enough for the 1st executable version.
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
|
/retest |
|
@wshwsh12: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Signed-off-by: wshwsh12 <793703860@qq.com>
Signed-off-by: wshwsh12 <793703860@qq.com>
| golang.org/x/text v0.33.0 | ||
| golang.org/x/sys v0.41.0 | ||
| golang.org/x/term v0.40.0 | ||
| golang.org/x/text v0.34.0 |
There was a problem hiding this comment.
why there is some other changes?
There was a problem hiding this comment.
kvproto has updated these dependencies.... It was not caused by my change, but by dependency updates introduced by other commits on the kvproto master branch.
| clientReq := tikvrpc.NewRequest(tikvrpc.CmdGetEstimateTiCICount, pbReq) | ||
| clientReq.StoreTp = getEndPointType(kv.TiFlash) |
There was a problem hiding this comment.
these codes looks too low level to me, i think tidb already have a unified code to handle these?
There was a problem hiding this comment.
I don't find a unified code to handle these.
| if !ok { | ||
| return 0, false | ||
| } | ||
| readTS, err := sessiontxn.GetTxnManager(sctx).GetStmtReadTS() |
There was a problem hiding this comment.
will this request a new start_ts from pd?
There was a problem hiding this comment.
No. But this would move the stage of obtaining read_ts from the execution phase to the optimizer phase, which is indeed a problem. Considering that getting the estimated row count from TiCI does not actually require read_ts, I decided to remove this.
Signed-off-by: wshwsh12 <793703860@qq.com>
|
@wshwsh12: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
@wshwsh12: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: windtalker, winoros The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
What problem does this PR solve?
Issue Number: None
Problem Summary:
TiDB currently does not ask TiCI/TiFlash for fulltext predicate cardinality, so the optimizer cannot use TiCI-side sampled estimate count for fulltext access paths.
What changed and how does it work?
Check List
Tests
Manual test:
tici-estimate-skew-20260426-2240.EXPLAIN SELECT COUNT(*) FROM estimate_skew WHERE MATCH(content) AGAINST ('heavyhit' IN BOOLEAN MODE);.4321; exact match count was4000; total table row count was9400.TiCI estimate count succeeded,GetEstimateTiCICount done, and TiCIEstimateCountResult { estimated_total_count: 4321, ... }.Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Chores
Tests