*: bump tipb and refactor analyze sampling helpers by 0xPoe · Pull Request #68414 · pingcap/tidb

0xPoe · 2026-05-15T14:41:02Z

What problem does this PR solve?

Issue Number: ref #67449

Problem Summary:

Split from #68157

This splits the prep work off from the larger analyze NDV-rate change so that review can focus on each piece. It consists of three independent commits with no behavior change:

- Bump github.com/pingcap/tipb to pick up the merged singleton-sketch fields (proto: add NDV rate to analyze request tipb#410).
- Share FM sketch hashing helpers so future callers can reuse the same hash path.
- Rename `l` to `totalLen` in analyze sampling for consistency with the row sampler, dropping a redundant redeclaration.

What changed and how does it work?

Pure dep bump + refactors.

Check List

Tests

Unit test

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

None

Summary by CodeRabbit

Chores
- Updated internal dependencies to the latest revisions.
Refactor
- Optimized internal code structures for sampling statistics and hashing operations to improve consistency and maintainability.
Tests
- Adjusted test execution configuration.

pantheon-ai · 2026-05-15T14:41:08Z

@0xPoe I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.


coderabbitai · 2026-05-15T15:00:17Z

📝 Walkthrough

Walkthrough

This PR updates the tipb dependency revision and refactors statistics hashing and column sampling code. FMSketch hashing logic is extracted into reusable helper functions, and the column sampling pipeline is updated to compute and propagate a consistent totalLen parameter throughout.
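To make the helper extraction concrete, here is a minimal sketch of the pattern, not the code from `pkg/statistics/fmsketch.go`. It assumes a pooled 64-bit hasher and swaps in FNV-1a from the Go standard library for murmur3 so the example stays self-contained. `encodeValue`, `hashValue`, and the toy `sketch` type are hypothetical stand-ins for `codec.EncodeValue`, `hashDatum`, and `FMSketch`.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash"
	"hash/fnv"
	"sync"
)

// hasherPool reuses 64-bit hashers across calls. FNV-1a stands in for the
// murmur3 hasher mentioned in the review; this substitution is an assumption.
var hasherPool = sync.Pool{
	New: func() any { return fnv.New64a() },
}

// encodeValue is a hypothetical stand-in for codec.EncodeValue: it appends a
// fixed-width big-endian encoding of v to b.
func encodeValue(b []byte, v int64) []byte {
	return binary.BigEndian.AppendUint64(b, uint64(v))
}

// hashValue hashes a single value and returns the 64-bit digest.
func hashValue(v int64) uint64 {
	h := hasherPool.Get().(hash.Hash64)
	h.Reset()
	defer hasherPool.Put(h)
	h.Write(encodeValue(nil, v))
	return h.Sum64()
}

// hashRow feeds every value of a row through one hasher, reusing one buffer.
func hashRow(values []int64) uint64 {
	h := hasherPool.Get().(hash.Hash64)
	h.Reset()
	defer hasherPool.Put(h)
	buf := make([]byte, 0, 8)
	for _, v := range values {
		buf = encodeValue(buf[:0], v)
		h.Write(buf)
	}
	return h.Sum64()
}

// sketch is a toy distinct-hash set, not a real FM sketch.
type sketch struct{ seen map[uint64]struct{} }

func newSketch() *sketch { return &sketch{seen: map[uint64]struct{}{}} }

func (s *sketch) insertHash(h uint64) { s.seen[h] = struct{}{} }

// Both insert paths delegate hashing to the shared helpers, so any other
// caller that needs the raw hash can take the same path.
func (s *sketch) InsertValue(v int64)          { s.insertHash(hashValue(v)) }
func (s *sketch) InsertRowValue(row []int64)   { s.insertHash(hashRow(row)) }
func (s *sketch) DistinctHashes() int          { return len(s.seen) }

func main() {
	s := newSketch()
	s.InsertValue(1)
	s.InsertValue(2)
	s.InsertValue(2)
	fmt.Println("distinct hashes:", s.DistinctHashes())
	fmt.Println("single-value row matches value hash:", hashRow([]int64{7}) == hashValue(7))
}
```

Returning the `uint64` from the helpers, rather than inserting inside them, is what lets a future caller reuse the hash without touching the sketch.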

Changes

Statistics Hashing and Sampling Updates

| Layer / File(s) | Summary |
| --- | --- |
| Dependency and build configuration updates: `DEPS.bzl`, `go.mod`, `cmd/tidb-server/BUILD.bazel` | tipb dependency pinned to new revision in Bazel and Go module metadata; lumberjack marked as indirect; test shard count increased from 6 to 7 for parallel execution. |
| FMSketch hashing helper extraction: `pkg/statistics/fmsketch.go` | `InsertValue` and `InsertRowValue` refactored to use new `hashDatum` and `hashRow` helper functions that return uint64 hash values; hashing and encoding logic centralized with consistent error handling via `errors.Trace` wrapping. |
| Column sampling parameter consistency: `pkg/executor/analyze_col_sampling.go` | `buildSamplingStats` now computes `totalLen` early from `ColumnsInfo` and `ColumnGroups` and passes it throughout; `subMergeWorker` signature updated to accept `totalLen` instead of separate `l` parameter, used consistently to size all per-worker and per-task sample collectors. |
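
The `totalLen` change can be pictured with a small sketch. This is an illustration under assumptions, not the code in `pkg/executor/analyze_col_sampling.go`: `collector`, `mergeWorker`, and `buildStats` are hypothetical names, and the per-slot counters stand in for the real sample collectors. The point is that the slot count is computed once and the same value sizes every collector.

```go
package main

import "fmt"

// collector is a hypothetical per-task collector with one counter per slot
// (one slot per column plus one per column group).
type collector struct{ slots []int64 }

func newCollector(totalLen int) *collector {
	return &collector{slots: make([]int64, totalLen)}
}

// merge adds the other collector's counters slot by slot. It relies on both
// collectors having been sized with the same totalLen.
func (c *collector) merge(other *collector) {
	for i := range c.slots {
		c.slots[i] += other.slots[i]
	}
}

// mergeWorker receives totalLen as a parameter instead of recomputing it,
// so its root collector always matches the incoming ones.
func mergeWorker(in <-chan *collector, totalLen int) *collector {
	root := newCollector(totalLen)
	for c := range in {
		root.merge(c)
	}
	return root
}

// buildStats computes totalLen once, up front, and passes it everywhere.
func buildStats(numColumns, numColumnGroups int, partials [][]int64) *collector {
	totalLen := numColumns + numColumnGroups
	in := make(chan *collector, len(partials))
	for _, p := range partials {
		c := newCollector(totalLen)
		copy(c.slots, p)
		in <- c
	}
	close(in)
	return mergeWorker(in, totalLen)
}

func main() {
	merged := buildStats(2, 1, [][]int64{{1, 2, 3}, {10, 20, 30}})
	fmt.Println(merged.slots)
}
```

Computing the value in one place removes the chance that a later redeclaration derives it from different inputs and sizes a collector differently.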

Sequence Diagram(s)

(Diagrams included in the hidden review stack artifact above.)

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

pingcap/tidb#68158: Updates the same github.com/pingcap/tipb dependency in DEPS.bzl and go.mod to a different pinned revision, indicating coordinated dependency management work.

Suggested labels

component/statistics, size/M, release-note-none

Suggested reviewers

henrybw
terry1purcell
mjonss
time-and-fate

Poem

🐰 The stats hop in, with hashes so clean,
Extracting helpers like a dream,
Sampling flows with totalLen in sight,
Dependencies bump, and tests shard right!
A rabbit's gift: refactored delight. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "*: bump tipb and refactor analyze sampling helpers" clearly and concisely summarizes the main changes: a dependency bump and refactoring work across the codebase. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description includes all major required sections: problem statement with issue reference, summary of changes, test confirmation, side effects, and release note. |


codecov · 2026-05-15T15:00:31Z

Codecov Report

❌ Patch coverage is 64.28571% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.0520%. Comparing base (4598d48) to head (9ac8f8b).
⚠️ Report is 44 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #68414        +/-   ##
================================================
- Coverage   77.7123%   76.0520%   -1.6604%     
================================================
  Files          1991       2019        +28     
  Lines        552087     579691     +27604     
================================================
+ Hits         429040     440867     +11827     
- Misses       122127     136908     +14781     
- Partials        920       1916       +996
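
As a sanity check on the figures above, the sketch below recomputes them. It assumes coverage is hits divided by hits plus misses plus partials, which is consistent with the row totals in the diff. The 28-line patch size is only inferred from "10 lines missing" at 64.28571% and is not stated in the report. `coveragePercent` and `impliedPatchLines` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// coveragePercent returns hits / (hits + misses + partials) as a percentage.
func coveragePercent(hits, misses, partials int) float64 {
	total := hits + misses + partials
	if total == 0 {
		return 0
	}
	return 100 * float64(hits) / float64(total)
}

// impliedPatchLines infers the patch size from the number of uncovered lines
// and the reported patch coverage percentage.
func impliedPatchLines(missing int, coveredPct float64) int {
	return int(math.Round(float64(missing) / (1 - coveredPct/100)))
}

func main() {
	fmt.Printf("base:  %.4f%%\n", coveragePercent(429040, 122127, 920))
	fmt.Printf("head:  %.4f%%\n", coveragePercent(440867, 136908, 1916))
	fmt.Println("implied patch lines:", impliedPatchLines(10, 64.28571))
}
```

The computed percentages can differ from the reported ones in the last digit, depending on whether the report rounds or truncates.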

| Flag | Coverage Δ | |
| --- | --- | --- |
| integration | `43.7467% <64.2857%> (+3.9449%)` | ⬆️ |


| Components | Coverage Δ | |
| --- | --- | --- |
| dumpling | `60.4888% <ø> (ø)` | |
| parser | `∅ <ø> (∅)` | |
| br | `49.3113% <ø> (-13.7821%)` | ⬇️ |


coderabbitai

🧹 Nitpick comments (1)

pkg/statistics/fmsketch.go (1)

142-178: 💤 Low value

Consider consistent error handling style between helpers.

Both hashDatum (line 144) and hashRow (line 168) handle encoding errors, but they use slightly different patterns:

hashDatum calls sc.HandleError(err) directly
hashRow calls errCtx.HandleError(err) after errCtx := sc.ErrCtx()

While functionally equivalent, using a consistent style across both helpers would improve readability.

♻️ Align error handling style

Option 1: Both use sc.HandleError directly (simpler):

 func hashRow(sc *stmtctx.StatementContext, values []types.Datum) (uint64, error) {
 	b := make([]byte, 0, 8)
 	hashFunc := murmur3Pool.Get().(hash.Hash64)
 	hashFunc.Reset()
 	defer murmur3Pool.Put(hashFunc)
 
-	errCtx := sc.ErrCtx()
 	for _, v := range values {
 		b = b[:0]
 		b, err := codec.EncodeValue(sc.TimeZone(), b, v)
-		err = errCtx.HandleError(err)
+		err = sc.HandleError(err)
 		if err != nil {
 			return 0, err
 		}

Option 2: Both use sc.ErrCtx() (explicit):

 func hashDatum(sc *stmtctx.StatementContext, value types.Datum) (uint64, error) {
+	errCtx := sc.ErrCtx()
 	bytes, err := codec.EncodeValue(sc.TimeZone(), nil, value)
-	err = sc.HandleError(err)
+	err = errCtx.HandleError(err)
 	if err != nil {
 		return 0, err
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/statistics/fmsketch.go` around lines 142 - 178, Align the error-handling
style by using StatementContext.ErrCtx() consistently: in hashDatum replace the
direct sc.HandleError(err) call with errCtx := sc.ErrCtx() and err =
errCtx.HandleError(err) (mirroring hashRow), ensuring you preserve the same
early-return behavior on error; leave hashRow as-is (it already uses
errCtx.HandleError). This makes hashDatum and hashRow consistent while keeping
their behavior unchanged.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cf7849dc-e1e7-4d62-8082-1e615f6965cd

📥 Commits

Reviewing files that changed from the base of the PR and between f4369ec and 9ac8f8b.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (5)

DEPS.bzl
cmd/tidb-server/BUILD.bazel
go.mod
pkg/executor/analyze_col_sampling.go
pkg/statistics/fmsketch.go

0xPoe

🔢 Self-check (PR reviewed by myself and ready for feedback)

Code compiles successfully
Unit tests added
No AI-generated elegant nonsense in PR.
Comments added where necessary
PR title and description updated
Documentation PR created (or confirmed not needed)
PR size is reasonable

/cc @winoros @time-and-fate

0xPoe · 2026-05-18T09:27:05Z

/cc bb7133
/assign bb7133

bb7133

LGTM

ti-chi-bot · 2026-05-18T18:19:40Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [bb7133,time-and-fate]
~~pkg/executor/OWNERS~~ [time-and-fate]
~~pkg/statistics/OWNERS~~ [time-and-fate]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-05-18T18:19:47Z

[LGTM Timeline notifier]

Timeline:

2026-05-18 06:37:31.558018353 +0000 UTC m=+159781.062149029: ☑️ agreed by time-and-fate.
2026-05-18 18:19:45.570539778 +0000 UTC m=+201915.074670454: ☑️ agreed by bb7133.

0xPoe · 2026-05-18T19:10:24Z

Thanks for your review! 💚 💙 💜 💛 ❤️

0xPoe added 3 commits May 15, 2026 16:39

*: bump tipb for analyze NDV rate

fea3bd4

statistics: share FM sketch hashing helpers

448c01a

executor: rename l to totalLen in analyze sampling

9ac8f8b

Match the name the row sampler already uses for the columns + column groups slot count, and drop the redundant later redeclaration that recomputed the same value from e.colsInfo/e.indexes.

ti-chi-bot Bot added the do-not-merge/needs-linked-issue and do-not-merge/release-note-label-needed labels May 15, 2026

coderabbitai Bot reviewed May 15, 2026


0xPoe commented May 15, 2026


ti-chi-bot Bot requested review from time-and-fate and winoros May 15, 2026 15:22

time-and-fate approved these changes May 18, 2026


ti-chi-bot Bot added the needs-1-more-lgtm label May 18, 2026

ti-chi-bot Bot assigned bb7133 May 18, 2026

ti-chi-bot Bot requested a review from bb7133 May 18, 2026 09:27

bb7133 approved these changes May 18, 2026


ti-chi-bot Bot added the approved and lgtm labels and removed the needs-1-more-lgtm label May 18, 2026

ti-chi-bot Bot merged commit 2b9527b into pingcap:master May 18, 2026
37 checks passed

coderabbitai Bot mentioned this pull request May 19, 2026

statistics: collect singleton sketches in row sampler #68499

Open

13 tasks
