planner, statistics: rewrite dynamic partition analyze for legacy stats version #67178
Conversation
📝 Walkthrough

Refactors analyze-version resolution to remove the redundant `tableStatsGetter` helper interface.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Planner as Planner
    participant StatsAnalyze as StatsAnalyze
    participant StatsHandle as StatsHandle
    participant MySQL as MySQL
    Client->>Planner: RUN ANALYZE TABLE ... (with requestedVersion)
    Planner->>StatsAnalyze: ResolveAnalyzeVersion(tblInfo, requestedVersion)
    StatsAnalyze->>MySQL: Read global stats for tblInfo.ID
    alt partitioned table
        StatsAnalyze->>MySQL: Read stats for each partition.ID
    end
    StatsAnalyze-->>Planner: (resolvedVersion, versionMatches)
    alt versionMatches == false and dynamic partition pruning
        Planner->>Planner: recompute physicalIDs/partitionNames (all partitions)
        Planner-->>Client: append session warning about rewrite & analyzing all partitions
    end
    Planner->>StatsHandle: perform analyze tasks (scan, collect, persist)
    StatsHandle->>MySQL: Update mysql.stats_histograms (set stats_ver = 2)
    MySQL-->>Client: Warnings (include legacy-rewrite message)
```
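The resolve step in the diagram can be sketched with hypothetical simplified types (not the real `pkg/statistics` API; `tableStats` and `resolveAnalyzeVersion` are illustrative stand-ins):

```go
package main

import "fmt"

// tableStats is a hypothetical, simplified stand-in for TiDB's
// persisted per-table statistics metadata.
type tableStats struct {
	Pseudo   bool // no persisted stats loaded
	StatsVer int  // persisted analyze version (1 or 2)
}

// resolveAnalyzeVersion mirrors the described behavior: if persisted
// stats exist and their version differs from the requested one, report
// a mismatch so the planner can expand the analyze to all partitions.
func resolveAnalyzeVersion(global tableStats, requested int) (resolved int, matches bool) {
	if global.Pseudo {
		// No persisted stats yet; the requested version wins.
		return requested, true
	}
	return global.StatsVer, global.StatsVer == requested
}

func main() {
	_, ok := resolveAnalyzeVersion(tableStats{StatsVer: 1}, 2)
	fmt.Println(ok) // legacy v1 stats vs requested v2: mismatch, prints false
}
```

The pseudo-stats branch encodes one possible answer to the question raised later in this thread: treating pseudo stats as "no conflict" rather than as a mismatch.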
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
🧹 Nitpick comments (1)
pkg/statistics/handle/autoanalyze/exec/exec_test.go (1)
127-143: Decouple this warning assertion from earlier log history.
`recorded` still contains the warning from the non-partitioned half of the test, so `require.Len(t, warnLogs, 2)` and `warnLogs[1]` make this branch depend on earlier observer state. Snapshot the baseline before this `AutoAnalyze` call, or reset the observer, and assert only the new entry.

♻️ Suggested change

```diff
+ beforeWarns := len(recorded.FilterMessage("auto analyze rewrites legacy statistics version 1 to version 2").All())
  require.NotPanics(t, func() {
      ok := exec.AutoAnalyze(
          sctx, handle, dom.SysProcTracker(), statistics.Version2, true,
          "analyze table %n partition %n", "pt", "p0",
      )
      require.True(t, ok)
  })
  warnLogs = recorded.FilterMessage("auto analyze rewrites legacy statistics version 1 to version 2").All()
- require.Len(t, warnLogs, 2)
- require.Equal(t, "analyze table `pt` partition `p0`", warnLogs[1].ContextMap()["sql"])
+ require.Len(t, warnLogs, beforeWarns+1)
+ require.Equal(t, "analyze table `pt` partition `p0`", warnLogs[beforeWarns].ContextMap()["sql"])
```

Based on learnings — applies to `**/*_test.go`: keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/statistics/handle/autoanalyze/exec/exec_test.go` around lines 127-143: the test is brittle because `recorded` already contains an earlier warning, so adjust the assertion around `exec.AutoAnalyze` to only inspect logs produced by that call. Before calling `exec.AutoAnalyze(sctx, handle, dom.SysProcTracker(), statistics.Version2, true, "analyze table %n partition %n", "pt", "p0")`, capture a baseline (e.g., `len(recorded.All())`, or call `recorded.Clear()`/reset the observer), then after the call filter the new messages and assert a single new warning whose `ContextMap()["sql"]` equals "analyze table `pt` partition `p0`" — i.e., use the existing `recorded` variable and the `AutoAnalyze` invocation to isolate and assert only the new log entry.
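The baseline-snapshot pattern suggested above can be sketched generically. This is a hypothetical minimal recorder, not the real log observer used in the test; the point is asserting on the delta rather than on absolute positions in the log history:

```go
package main

import "fmt"

// logRecorder is a minimal stand-in for a test log observer.
type logRecorder struct{ messages []string }

func (r *logRecorder) Warn(msg string) { r.messages = append(r.messages, msg) }

// Filter returns all recorded messages equal to msg,
// analogous to FilterMessage(...).All() in the test.
func (r *logRecorder) Filter(msg string) []string {
	var out []string
	for _, m := range r.messages {
		if m == msg {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	rec := &logRecorder{}
	rec.Warn("rewrite warning") // earlier, unrelated half of the test

	// Snapshot the baseline before the call under test...
	before := len(rec.Filter("rewrite warning"))
	rec.Warn("rewrite warning") // the call under test emits one warning

	// ...then assert only on the new entries, not on absolute indexes.
	after := rec.Filter("rewrite warning")
	fmt.Println(len(after) - before) // exactly one new warning: prints 1
}
```

With this shape, inserting another warning earlier in the test no longer breaks the assertion, because the index `before` always points at the first entry produced by the call under test.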
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 719b5d89-c457-4df0-9acb-46fc4fb6339d
📒 Files selected for processing (6)
- pkg/executor/test/analyzetest/analyze_test.go
- pkg/planner/core/BUILD.bazel
- pkg/planner/core/planbuilder.go
- pkg/statistics/handle/autoanalyze/exec/exec_test.go
- pkg/statistics/handle/globalstats/global_stats.go
- pkg/statistics/handle/globalstats/global_stats_async.go
💤 Files with no reviewable changes (2)
- pkg/statistics/handle/globalstats/global_stats_async.go
- pkg/statistics/handle/globalstats/global_stats.go
Codecov Report

```diff
@@             Coverage Diff              @@
##             master     #67178    +/-   ##
================================================
+ Coverage   77.8013%   78.5338%   +0.7325%
================================================
  Files          2022       1952        -70
  Lines        555411     547443      -7968
================================================
- Hits         432117     429928      -2189
+ Misses       121550     117052      -4498
+ Partials       1744        463      -1281
================================================
```
🧹 Nitpick comments (1)
pkg/planner/core/planbuilder.go (1)
2966-2969: Optional: deduplicate analyze-version mismatch handling in index paths.
`buildAnalyzeIndex` and `buildAnalyzeAllIndex` now have the same resolve+warning block. Consider extracting a small helper to keep warning semantics in one place and reduce drift risk.

Also applies to: 2980-2983
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/planbuilder.go` around lines 2966 - 2969, Extract the repeated resolve+warning logic into a small helper (e.g., handleAnalyzeVersionMismatch or ensureAnalyzeVersionConsistency) that calls statsHandle.ResolveAnalyzeVersion(tblInfo, version) and, when versionIsSame is false, appends the same warning via b.ctx.GetSessionVars().StmtCtx.AppendWarning(errors.NewNoStackError(...)); then replace the duplicated blocks in buildAnalyzeIndex and buildAnalyzeAllIndex with a call to that helper so the ResolveAnalyzeVersion call and the exact warning message are managed in one place.
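The helper-extraction idea can be sketched with hypothetical simplified types (the real code would take the planner's statement context and table info; `stmtCtx` and the warning text here are illustrative):

```go
package main

import "fmt"

// stmtCtx is a hypothetical stand-in for the session's statement context.
type stmtCtx struct{ warnings []string }

func (s *stmtCtx) AppendWarning(w string) { s.warnings = append(s.warnings, w) }

// resolveAnalyzeVersion stands in for statsHandle.ResolveAnalyzeVersion:
// the persisted version wins when it differs from the requested one.
func resolveAnalyzeVersion(persistedVer, requestedVer int) (int, bool) {
	return persistedVer, persistedVer == requestedVer
}

// ensureAnalyzeVersionConsistency keeps the resolve call and the exact
// warning text in one place, so buildAnalyzeIndex and buildAnalyzeAllIndex
// cannot drift apart in how they report a mismatch.
func ensureAnalyzeVersionConsistency(sc *stmtCtx, persistedVer, requestedVer int) int {
	resolved, same := resolveAnalyzeVersion(persistedVer, requestedVer)
	if !same {
		sc.AppendWarning("analyze version mismatch; using persisted version")
	}
	return resolved
}

func main() {
	sc := &stmtCtx{}
	v := ensureAnalyzeVersionConsistency(sc, 1, 2)
	fmt.Println(v, len(sc.warnings)) // prints: 1 1
}
```

Both index-path builders would then call this single helper instead of duplicating the resolve+warn block.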
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 81feb920-5432-46c8-ad21-45840fb2da0f
📒 Files selected for processing (3)
- pkg/planner/core/planbuilder.go
- pkg/statistics/handle/autoanalyze/autoanalyze.go
- pkg/statistics/handle/types/interfaces.go
0xPoe
left a comment
🔢 Self-check (PR reviewed by myself and ready for feedback)
- Code compiles successfully
- Unit tests added
- No AI-generated elegant nonsense in PR.
- Bazel files updated
- Comments added where necessary
- PR title and description updated
- Documentation PR created (or confirmed not needed)
- PR size is reasonable
```go
tk.MustExec("analyze table t partition p0")
tk.MustQuery("show warnings").CheckContain(
	"The analyze version from the session is not compatible with the existing statistics of the table. TiDB will analyze all partitions to rewrite the table statistics with the session-selected version",
```
Is this something that would happen before this change? I mean that a single partition ANALYZE is expanded to full table ANALYZE and only given a warning afterwards?
Also is this documented?
What happens if you would do a single partition ANALYZE on a non-analyzed table?
> Is this something that would happen before this change? I mean that a single partition ANALYZE is expanded to full table ANALYZE and only given a warning afterwards?
No, because we didn't handle this case at all. We just blindly merged them together, I believe.
> Also is this documented?
This will be documented in my docs PR after I finish the entire removal project.
> What happens if you would do a single partition ANALYZE on a non-analyzed table?
Check https://docs.pingcap.com/tidb/stable/system-variables/#tidb_skip_missing_partition_stats-new-in-v730
OK, maybe we should follow tidb_skip_missing_partition_stats also when the tidb_analyze_version is not matching between partitions? I.e. if we are currently using v2, simply skip the partitions with v1 data if tidb_skip_missing_partition_stats is ON, otherwise simply fail with a descriptive error message?
If you consider this to be too much, I'm OK with simply document the behavior, since v1 is going away :)
> OK, maybe we should follow `tidb_skip_missing_partition_stats` also when the `tidb_analyze_version` is not matching between partitions? I.e. if we are currently using v2, simply skip the partitions with v1 data if `tidb_skip_missing_partition_stats` is ON, otherwise simply fail with a descriptive error message?
This doesn't seem suitable for `tidb_skip_missing_partition_stats`. That setting controls the behavior when statistics are missing; in our case the statistics exist but are mismatched. I believe a better approach is to clean up explicitly rather than quietly ignoring the issue until we happen to analyze this particular partition next time. When users check the partition stats, they may also be confused, because the stats exist but never get merged.
OK, when documenting this, please consider adding a note about the possible workaround to remove the v1 stats first, to avoid a full table re-analyze during upgrade when only issuing single partition analyze.
🧹 Nitpick comments (1)
pkg/planner/core/tests/analyze/analyze_test.go (1)
76-84: Consider guarding partition-definition indexing for test resilience.
`pi.Definitions[0]`/`[2]` is correct for current DDL, but a small guard makes failures clearer if someone edits the partition layout later.

♻️ Suggested small hardening

```diff
  pi := tbl.Meta().GetPartitionInfo()
  require.NotNil(t, pi)
+ require.GreaterOrEqual(t, len(pi.Definitions), 3)
  p0Stats := h.GetPhysicalTableStats(pi.Definitions[0].ID, tbl.Meta())
  require.False(t, p0Stats.Pseudo)
  require.True(t, p0Stats.IsAnalyzed())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/planner/core/tests/analyze/analyze_test.go` around lines 76 - 84, The test directly indexes pi.Definitions[0] and [2]; add a guard asserting pi.Definitions has at least 3 entries before those accesses (e.g. using require.GreaterOrEqual/require.Len) so the failure is explicit if partition layout changes; then proceed to call h.GetPhysicalTableStats with pi.Definitions[0].ID and pi.Definitions[2].ID and keep the existing assertions on p0Stats and p2Stats.
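The guard pattern suggested above can be shown generically. This is a hypothetical sketch (plain Go, not the real testify-based test): fail with an explicit message when the partition layout no longer has enough definitions, instead of panicking on an out-of-range index deeper in the test.

```go
package main

import "fmt"

// partitionDef is a hypothetical stand-in for a partition definition.
type partitionDef struct{ ID int64 }

// checkPartitions fails loudly if the test's partition layout changed,
// making the failure explicit rather than an index-out-of-range panic.
func checkPartitions(defs []partitionDef, need int) error {
	if len(defs) < need {
		return fmt.Errorf("expected at least %d partition definitions, got %d", need, len(defs))
	}
	return nil
}

func main() {
	defs := []partitionDef{{1}, {2}, {3}}
	if err := checkPartitions(defs, 3); err != nil {
		panic(err)
	}
	fmt.Println(defs[0].ID, defs[2].ID) // safe to index [0] and [2]: prints 1 3
}
```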
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8708b1ed-48f8-45db-9e4b-01c70a5ed5ef
📒 Files selected for processing (1)
pkg/planner/core/tests/analyze/analyze_test.go
/retest
```go
globalStats := sa.statsHandle.GetPhysicalTableStats(tblInfo.ID, tblInfo)
if _, versionMatches := statistics.ResolveAnalyzeVersionOnTable(globalStats, requestedVersion); !versionMatches {
```
Is globalStats for the given table guaranteed to never be pseudo stats?
I suppose I'm not sure what the right thing to do would be if they were: should pseudo stats always be considered a version mismatch? (Please correct me if I'm misunderstanding how pseudo stats work, though.)
> Is `globalStats` for the given table guaranteed to never be pseudo stats?
No, it's not 100% guaranteed, but in most cases it will be non-pseudo. We have a background worker that loads the basic modify_count and count into memory, so in practice it is usually non-pseudo. The same rule applies to partition statistics as well.
> I suppose I'm not sure what the right thing to do would be if they were: should pseudo stats always be considered a version mismatch? (Please correct me if I'm misunderstanding how pseudo stats work, though.)
I believe that in most cases, it is acceptable since the initial stats will load all metadata after the TiDB instance has started. So normally, it would be non-pseudo.
/retest
I was able to reproduce a blocker on the PR head with a minimal static-prune test.

Expected: re-running `analyze table t partition p0 columns b` leaves p0 with the same two column histograms.
Actual on this branch: a third column histogram appears, i.e. the requested column scope gets expanded.

Minimal regression test used locally:

```go
func TestAnalyzePartitionStaticVersionMismatchDoesNotExpandColumnScope(t *testing.T) {
	store, dom := testkit.CreateMockStoreAndDomain(t)
	tk := testkit.NewTestKit(t, store)
	testkit.WithPruneMode(tk, variable.Static, func() {
		tk.MustExec("use test")
		tk.MustExec("set @@session.tidb_analyze_version = 2")
		tk.MustExec(`create table t (a int, b int, c int, primary key(a))
			partition by range (a) (
				partition p0 values less than (10),
				partition p1 values less than (20)
			)`)
		tk.MustExec("insert into t values (1, 1, 1), (2, 2, 2), (11, 11, 11), (12, 12, 12)")
		tbl, err := dom.InfoSchema().TableByName(context.Background(), ast.NewCIStr("test"), ast.NewCIStr("t"))
		require.NoError(t, err)
		tblInfo := tbl.Meta()
		pi := tblInfo.GetPartitionInfo()
		require.NotNil(t, pi)
		h := dom.StatsHandle()
		tk.MustExec("analyze table t partition p0 columns b")
		tk.MustExec("analyze table t partition p1")
		tk.MustExec("update mysql.stats_histograms set stats_ver = 1 where table_id = ?", pi.Definitions[1].ID)
		h.Clear()
		require.NoError(t, h.Update(context.Background(), dom.InfoSchema(), pi.Definitions[0].ID, pi.Definitions[1].ID))
		p0HistSQL := fmt.Sprintf(
			"select hist_id, stats_ver from mysql.stats_histograms where table_id = %d and is_index = 0 order by hist_id",
			pi.Definitions[0].ID,
		)
		tk.MustQuery(p0HistSQL).Check(testkit.Rows(
			fmt.Sprintf("%d 2", tblInfo.Columns[0].ID),
			fmt.Sprintf("%d 2", tblInfo.Columns[1].ID),
		))
		tk.MustExec("analyze table t partition p0 columns b")
		tk.MustQuery(p0HistSQL).Check(testkit.Rows(
			fmt.Sprintf("%d 2", tblInfo.Columns[0].ID),
			fmt.Sprintf("%d 2", tblInfo.Columns[1].ID),
		))
	})
}
```

I verified it locally, and it fails on this branch with:

```
expected:
[1 2]
[2 2]

actual:
[1 2]
[2 2]
[3 2]
```

The root cause looks like the new version-mismatch handling. I think the fix should split the checks: expand the partition scope only under dynamic pruning, and leave the requested column scope untouched. A regression test along the lines of the one above would guard this.
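That split can be sketched with hypothetical simplified types (not the real planner structures; `analyzeScope` and `applyVersionMismatch` are illustrative names):

```go
package main

import "fmt"

// analyzeScope is a hypothetical stand-in for what an ANALYZE statement
// targets: a set of partitions and a set of columns.
type analyzeScope struct {
	Partitions []string
	Columns    []string
}

// applyVersionMismatch widens only the partition scope, and only under
// dynamic pruning; the requested column scope survives unchanged, so a
// static-prune "ANALYZE ... COLUMNS" is unaffected by the rewrite.
func applyVersionMismatch(s analyzeScope, allPartitions []string, dynamicPrune, versionMatches bool) analyzeScope {
	if versionMatches || !dynamicPrune {
		return s // no mismatch, or static mode: leave both scopes alone
	}
	s.Partitions = allPartitions // dynamic mode: rewrite all partitions
	return s                     // columns intentionally untouched
}

func main() {
	s := analyzeScope{Partitions: []string{"p0"}, Columns: []string{"b"}}
	out := applyVersionMismatch(s, []string{"p0", "p1"}, false, false)
	fmt.Println(out.Partitions, out.Columns) // static prune: scope unchanged, prints [p0] [b]
}
```

Under this shape, the regression above cannot happen: in static mode the column list is returned as requested even when the persisted versions disagree.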
/hold For #67178 (comment)
Force-pushed 6e4473a to 6b1a264
@lybcodes: The specified target(s) could not be found.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/retest
2 similar comments
/retest
/retest |
…ts version

- planner: remove redundant tableStatsGetter helper interface
- fix: update bazel
- fix: better API design
- fix: better comment
- fix: add space
- fix: add more checks
- fix: better code
- fix: update bazel
- fix: better code
- fix: update broken tests
- fix: update bazel
- fix: update broken tests
- fix: static mode mismatch
- chore: update bazel file
- fix: better code
- fix
- fix
- fix: add more comments

Signed-off-by: 0xPoe <techregister@pm.me>
Force-pushed ccba023 to 4edecda
mjonss
left a comment
LGTM, with some minor change requests.
/hold

Please feel free to unhold when you've addressed my review comments.
Relax the assertion to match the stable partial-stats prefix instead of depending on specific eviction detail strings. Signed-off-by: 0xPoe <techregister@pm.me>
mjonss
left a comment
LGTM, and thank you for also reflecting the changes in the new function names!
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: henrybw, mjonss.
/unhold
/retest
@0xPoe: The following test failed; say `/retest` to rerun all failed tests.
What problem does this PR solve?
Issue Number: ref #63579
Problem Summary:
Dynamic partition analyze could rebuild global stats with the session-selected analyze version while some persisted table or partition stats were still on legacy version 1.
What changed and how does it work?
When dynamic `ANALYZE TABLE ... PARTITION ...` sees a stats-version mismatch, it now expands to all partitions before the global-stats merge. This PR also adds manual-analyze and auto-analyze regressions for the rewrite path and removes the related stale FIXMEs.
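The expansion described above can be sketched with hypothetical simplified types (illustrative names, not the real planner code):

```go
package main

import "fmt"

// partition is a hypothetical, simplified partition descriptor.
type partition struct {
	Name string
	ID   int64
}

// expandToAllPartitions mirrors the described behavior: when the
// requested analyze version does not match the persisted stats version
// under dynamic pruning, the requested partition list is replaced with
// every partition, so the global-stats merge sees one consistent version.
func expandToAllPartitions(all []partition, requested []string, versionMatches bool) ([]string, bool) {
	if versionMatches {
		return requested, false
	}
	names := make([]string, 0, len(all))
	for _, p := range all {
		names = append(names, p.Name)
	}
	return names, true // caller should append the session warning
}

func main() {
	all := []partition{{"p0", 1}, {"p1", 2}, {"p2", 3}}
	names, expanded := expandToAllPartitions(all, []string{"p0"}, false)
	fmt.Println(names, expanded) // prints: [p0 p1 p2] true
}
```

The second return value is what drives the "TiDB will analyze all partitions to rewrite the table statistics" session warning asserted in the tests.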
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit

Bug Fixes
- Dynamic partition analyze now rewrites legacy version 1 statistics to the session-selected version by analyzing all partitions, with a session warning.

Refactor
- Analyze-version resolution consolidated into a single resolve path; redundant helper interface removed.

Tests
- Manual-analyze and auto-analyze regression tests added for the rewrite path.