*: bump tipb and refactor analyze sampling helpers #68414

Merged
ti-chi-bot[bot] merged 3 commits into pingcap:master from 0xPoe:tipb-bump-and-refactor
May 18, 2026

@0xPoe
Member

@0xPoe 0xPoe commented May 15, 2026

What problem does this PR solve?

Issue Number: ref #67449

Problem Summary:

Split from #68157

Splitting prep work off from the larger analyze NDV-rate change so review can focus on each piece. Three independent commits, no behavior change:

  • Bump github.com/pingcap/tipb to pick up the merged singleton-sketch fields (proto: add NDV rate to analyze request tipb#410).
  • Share FM sketch hashing helpers so future callers can reuse the same hash path.
  • Rename l to totalLen in analyze sampling for consistency with the row sampler, dropping a redundant redeclaration.

What changed and how does it work?

Pure dep bump + refactors.

Check List

Tests

  • Unit test

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • Chores

    • Updated internal dependencies to the latest revisions.
  • Refactor

    • Optimized internal code structures for sampling statistics and hashing operations to improve consistency and maintainability.
  • Tests

    • Adjusted test execution configuration.

0xPoe added 3 commits May 15, 2026 16:39
Match the name the row sampler already uses for the columns + column
groups slot count, and drop the redundant later redeclaration that
recomputed the same value from e.colsInfo/e.indexes.
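
A minimal sketch of the pattern this commit message describes: compute the columns + column groups slot count once, name it totalLen, and pass it to the merge worker instead of recomputing it later. Only the totalLen name and the "columns + column groups" definition come from the PR; `collector`, `mergeWorker`, and `buildStats` are hypothetical stand-ins, not the code in analyze_col_sampling.go.

```go
package main

import (
	"fmt"
	"sync"
)

// collector holds one counter per column or column-group slot.
// It is a hypothetical stand-in for a per-worker sample collector.
type collector struct {
	counts []int64
}

func newCollector(totalLen int) *collector {
	return &collector{counts: make([]int64, totalLen)}
}

// mergeWorker receives totalLen from the caller instead of recomputing it,
// so every collector in the pipeline is sized from the same value.
func mergeWorker(totalLen int, tasks <-chan []int64, out chan<- *collector, wg *sync.WaitGroup) {
	defer wg.Done()
	merged := newCollector(totalLen)
	for t := range tasks {
		for i := 0; i < totalLen && i < len(t); i++ {
			merged.counts[i] += t[i]
		}
	}
	out <- merged
}

// buildStats computes totalLen once, up front, and threads it through.
func buildStats(numCols, numColGroups, workers int, batches [][]int64) []int64 {
	totalLen := numCols + numColGroups

	tasks := make(chan []int64, len(batches))
	out := make(chan *collector, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go mergeWorker(totalLen, tasks, out, &wg)
	}
	for _, b := range batches {
		tasks <- b
	}
	close(tasks)
	wg.Wait()
	close(out)

	// The root collector is sized from the same totalLen as the workers.
	root := newCollector(totalLen)
	for c := range out {
		for i, v := range c.counts {
			root.counts[i] += v
		}
	}
	return root.counts
}

func main() {
	batches := [][]int64{{1, 2, 3}, {4, 5, 6}}
	fmt.Println(buildStats(2, 1, 2, batches))
}
```

Passing the value down keeps a single definition of the slot count, which is the consistency the rename is after.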
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 15, 2026
@pantheon-ai

pantheon-ai Bot commented May 15, 2026

@0xPoe I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. component/statistics sig/planner SIG: Planner release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/needs-linked-issue labels May 15, 2026
@coderabbitai

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR updates the tipb dependency revision and refactors statistics hashing and column sampling code. FMSketch hashing logic is extracted into reusable helper functions, and the column sampling pipeline is updated to compute and propagate a consistent totalLen parameter throughout.

Changes

Statistics Hashing and Sampling Updates

| Layer / File(s) | Summary |
| --- | --- |
| Dependency and build configuration updates (DEPS.bzl, go.mod, cmd/tidb-server/BUILD.bazel) | tipb dependency pinned to new revision in Bazel and Go module metadata; lumberjack marked as indirect; test shard count increased from 6 to 7 for parallel execution. |
| FMSketch hashing helper extraction (pkg/statistics/fmsketch.go) | InsertValue and InsertRowValue refactored to use new hashDatum and hashRow helper functions that return uint64 hash values; hashing and encoding logic centralized with consistent error handling via errors.Trace wrapping. |
| Column sampling parameter consistency (pkg/executor/analyze_col_sampling.go) | buildSamplingStats now computes totalLen early from ColumnsInfo and ColumnGroups and passes it throughout; subMergeWorker signature updated to accept totalLen instead of separate l parameter, used consistently to size all per-worker and per-task sample collectors. |
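
The helper extraction summarized for pkg/statistics/fmsketch.go can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the standard library's FNV-1a stands in for the murmur3 hasher, values are pre-encoded byte slices rather than datums, and `hashValue`, `toySketch`, and `newToySketch` are hypothetical names. The point is only the shape: two insert methods delegating to shared helpers that return a uint64.

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
	"sync"
)

// hasherPool reuses 64-bit hashers across calls. FNV-1a is an assumption
// standing in for the pooled murmur3 hasher mentioned in the review.
var hasherPool = sync.Pool{
	New: func() interface{} { return fnv.New64a() },
}

// hashValue hashes a single encoded value and returns the 64-bit digest.
func hashValue(encoded []byte) (uint64, error) {
	h := hasherPool.Get().(hash.Hash64)
	h.Reset()
	defer hasherPool.Put(h)
	if _, err := h.Write(encoded); err != nil {
		return 0, fmt.Errorf("hash value: %w", err)
	}
	return h.Sum64(), nil
}

// hashRow feeds every encoded column of a row into one hasher.
func hashRow(encodedCols [][]byte) (uint64, error) {
	h := hasherPool.Get().(hash.Hash64)
	h.Reset()
	defer hasherPool.Put(h)
	for _, col := range encodedCols {
		if _, err := h.Write(col); err != nil {
			return 0, fmt.Errorf("hash row: %w", err)
		}
	}
	return h.Sum64(), nil
}

// toySketch is a hypothetical stand-in for a distinct-value sketch:
// it only records which hashes it has seen.
type toySketch struct {
	seen map[uint64]struct{}
}

func newToySketch() *toySketch {
	return &toySketch{seen: make(map[uint64]struct{})}
}

// InsertValue delegates hashing to the shared single-value helper.
func (s *toySketch) InsertValue(encoded []byte) error {
	hv, err := hashValue(encoded)
	if err != nil {
		return err
	}
	s.seen[hv] = struct{}{}
	return nil
}

// InsertRowValue delegates hashing to the shared row helper.
func (s *toySketch) InsertRowValue(cols [][]byte) error {
	hv, err := hashRow(cols)
	if err != nil {
		return err
	}
	s.seen[hv] = struct{}{}
	return nil
}

func main() {
	s := newToySketch()
	_ = s.InsertValue([]byte("a"))
	_ = s.InsertValue([]byte("a")) // duplicate value, same hash
	_ = s.InsertRowValue([][]byte{[]byte("a"), []byte("b")})
	fmt.Println("distinct hashes:", len(s.seen))
}
```

With the hashing split out this way, a future caller can obtain the same hash without going through an insert method, which is the reuse the PR description mentions.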

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • pingcap/tidb#68158: Updates the same github.com/pingcap/tipb dependency in DEPS.bzl and go.mod to a different pinned revision, indicating coordinated dependency management work.

Suggested labels

component/statistics, size/M, release-note-none

Suggested reviewers

  • henrybw
  • terry1purcell
  • mjonss
  • time-and-fate

Poem

🐰 The stats hop in, with hashes so clean,
Extracting helpers like a dream,
Sampling flows with totalLen in sight,
Dependencies bump, and tests shard right!
A rabbit's gift: refactored delight. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "*: bump tipb and refactor analyze sampling helpers" clearly and concisely summarizes the main changes: a dependency bump and refactoring work across the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description includes all major required sections: problem statement with issue reference, summary of changes, test confirmation, side effects, and release note. |

@codecov

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 64.28571% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.0520%. Comparing base (4598d48) to head (9ac8f8b).
⚠️ Report is 44 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68414        +/-   ##
================================================
- Coverage   77.7123%   76.0520%   -1.6604%     
================================================
  Files          1991       2019        +28     
  Lines        552087     579691     +27604     
================================================
+ Hits         429040     440867     +11827     
- Misses       122127     136908     +14781     
- Partials        920       1916       +996     
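
The percentages in the report can be reproduced from the Hits and Lines rows with a short calculation. The sketch below assumes coverage is hits divided by total lines; `coveragePct` is a hypothetical helper. The 18-of-28 split for patch coverage is an inference from "64.28571% with 10 lines missing", not a figure stated in the report.

```go
package main

import "fmt"

// coveragePct returns hits as a percentage of total lines.
func coveragePct(hits, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(total)
}

func main() {
	// Project coverage: Hits / Lines for base (master) and head (#68414).
	base := coveragePct(429040, 552087)
	head := coveragePct(440867, 579691)
	fmt.Printf("base %.4f%%, head %.4f%%, delta %.4f\n", base, head, head-base)

	// Patch coverage: 10 uncovered lines at 64.28571% is consistent with
	// 28 changed lines, 18 of them covered (inferred, not reported).
	fmt.Printf("patch %.5f%%\n", coveragePct(18, 28))
}
```

The last digit can differ from the report depending on whether values are rounded or truncated.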
| Flag | Coverage Δ |
| --- | --- |
| integration | 43.7467% <64.2857%> (+3.9449%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 60.4888% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 49.3113% <ø> (-13.7821%) ⬇️ |


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
pkg/statistics/fmsketch.go (1)

142-178: 💤 Low value

Consider consistent error handling style between helpers.

Both hashDatum (line 144) and hashRow (line 168) handle encoding errors, but they use slightly different patterns:

  • hashDatum calls sc.HandleError(err) directly
  • hashRow calls errCtx.HandleError(err) after errCtx := sc.ErrCtx()

While functionally equivalent, using a consistent style across both helpers would improve readability.

♻️ Align error handling style

Option 1: Both use sc.HandleError directly (simpler):

 func hashRow(sc *stmtctx.StatementContext, values []types.Datum) (uint64, error) {
 	b := make([]byte, 0, 8)
 	hashFunc := murmur3Pool.Get().(hash.Hash64)
 	hashFunc.Reset()
 	defer murmur3Pool.Put(hashFunc)
 
-	errCtx := sc.ErrCtx()
 	for _, v := range values {
 		b = b[:0]
 		b, err := codec.EncodeValue(sc.TimeZone(), b, v)
-		err = errCtx.HandleError(err)
+		err = sc.HandleError(err)
 		if err != nil {
 			return 0, err
 		}

Option 2: Both use sc.ErrCtx() (explicit):

 func hashDatum(sc *stmtctx.StatementContext, value types.Datum) (uint64, error) {
+	errCtx := sc.ErrCtx()
 	bytes, err := codec.EncodeValue(sc.TimeZone(), nil, value)
-	err = sc.HandleError(err)
+	err = errCtx.HandleError(err)
 	if err != nil {
 		return 0, err
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/statistics/fmsketch.go` around lines 142 - 178, Align the error-handling
style by using StatementContext.ErrCtx() consistently: in hashDatum replace the
direct sc.HandleError(err) call with errCtx := sc.ErrCtx() and err =
errCtx.HandleError(err) (mirroring hashRow), ensuring you preserve the same
early-return behavior on error; leave hashRow as-is (it already uses
errCtx.HandleError). This makes hashDatum and hashRow consistent while keeping
their behavior unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cf7849dc-e1e7-4d62-8082-1e615f6965cd

📥 Commits

Reviewing files that changed from the base of the PR and between f4369ec and 9ac8f8b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (5)
  • DEPS.bzl
  • cmd/tidb-server/BUILD.bazel
  • go.mod
  • pkg/executor/analyze_col_sampling.go
  • pkg/statistics/fmsketch.go

Member Author

@0xPoe 0xPoe left a comment

🔢 Self-check (PR reviewed by myself and ready for feedback)

  • Code compiles successfully

  • Unit tests added

  • No AI-generated elegant nonsense in PR.

  • Comments added where necessary

  • PR title and description updated

  • Documentation PR created (or confirmed not needed)

  • PR size is reasonable

/cc @winoros @time-and-fate

@ti-chi-bot ti-chi-bot Bot requested review from time-and-fate and winoros May 15, 2026 15:22
@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 18, 2026
@0xPoe
Member Author

0xPoe commented May 18, 2026

/cc bb7133
/assign bb7133

@ti-chi-bot ti-chi-bot Bot requested a review from bb7133 May 18, 2026 09:27
Member

@bb7133 bb7133 left a comment

LGTM

@ti-chi-bot

ti-chi-bot Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added approved lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 18, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 18, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-18 06:37:31.558018353 +0000 UTC m=+159781.062149029: ☑️ agreed by time-and-fate.
  • 2026-05-18 18:19:45.570539778 +0000 UTC m=+201915.074670454: ☑️ agreed by bb7133.

@ti-chi-bot ti-chi-bot Bot merged commit 2b9527b into pingcap:master May 18, 2026
37 checks passed
@0xPoe
Member Author

0xPoe commented May 18, 2026

Thanks for your review! 💚 💙 💜 💛 ❤️

Labels

approved component/statistics lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

3 participants