WIP: statistics, executor: collect singleton sketches for row sampling #68157
0xPoe wants to merge 4 commits into pingcap:master from
Conversation
Skipping CI for Draft Pull Request.

[APPROVAL NOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
📝 Walkthrough

Changes:
- Dependency Update
- Singleton FM-Sketch NDV Estimation
Sequence Diagram

sequenceDiagram
participant Exec as Executor
participant Sampler as Row Sampler
participant Builder as Sketch Builder
participant Merger as Sketch Merger
participant NDVEst as NDV Estimator
Exec->>Sampler: Collect rows with FMSketch
Sampler->>Builder: Sample & insert into singleton builders (warm-up throttled)
Builder->>Sampler: Accumulate hashed values per column/group
Sampler->>Merger: BuildSingletonSketches() after iteration
Merger->>Sampler: Populate SingletonSketches from builders
Exec->>Merger: MergeCollector() to combine node results
Merger->>Sampler: Merge singleton sketches & sample counts
Exec->>NDVEst: Compute estimateNDVsBySketch(root sketches, node data, special index map)
NDVEst->>Exec: Return per-column NDV estimates (skip special indexes)
Exec->>Exec: Override hist.NDV when sketch estimate > sample NDV
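The "Builder" step in the diagram can be illustrated with a toy, exact-counting version of a singleton builder (a hypothetical simplification; TiDB's real builder uses FMSketch hash sets, not a count map):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// singletonBuilder is a toy stand-in for the PR's sketch builder: it
// counts hashed values and keeps only those seen exactly once.
type singletonBuilder struct {
	counts map[uint64]int
}

func newSingletonBuilder() *singletonBuilder {
	return &singletonBuilder{counts: map[uint64]int{}}
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func (b *singletonBuilder) insert(v string) {
	b.counts[hash64(v)]++
}

// build returns the set of hashes that occurred exactly once,
// mirroring the BuildSingletonSketches() step after iteration.
func (b *singletonBuilder) build() map[uint64]bool {
	out := map[uint64]bool{}
	for h, c := range b.counts {
		if c == 1 {
			out[h] = true
		}
	}
	return out
}

func main() {
	b := newSingletonBuilder()
	for _, v := range []string{"a", "b", "b", "c"} {
		b.insert(v)
	}
	fmt.Println(len(b.build())) // "a" and "c" are singletons → 2
}
```

The real collector stores these as FM sketches per column/group so they can be serialized and merged across nodes.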
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
@0xPoe I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI.

8a78790 to 1d78d22 (force-pushed)

Review Complete. Findings: 0 issues. ℹ️ Learn more details on Pantheon AI.
@0xPoe: The following test failed, say
Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Actionable comments posted: 4
🧹 Nitpick comments (1)
pkg/statistics/sample_test.go (1)
275-316: ⚡ Quick win: Please cover the merge path too.
This subtest only exercises build + proto round-trip. The analyze path consumes singleton sketches after FromProto() and MergeCollector(), so a deterministic case where the same value is singleton in two children and must disappear after merge would protect the new behavior much better. As per coding guidelines, "Prefer extending existing test suites and fixtures over creating new scaffolding."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/statistics/sample_test.go` around lines 275 - 316, The test SubTestRowSampleSingletonSketches only checks Build + proto round-trip; add a merge/analysis path to cover the case where the same value is singleton in two child collectors and must be removed after MergeCollector/Analyze. Create two ReservoirRowSampleCollector instances via NewReservoirRowSampleCollector, feed them deterministic rows so a particular value is singleton in both, call BuildSingletonSketches on each, use ToProto/FromProto or directly call MergeCollector to combine them, then run the same analysis path that consumes singleton sketches and assert that the merged collector no longer treats that value as singleton (NDV decreases/removes it) and that downstream FromProto/MergeCollector round-trip preserves this behavior; reuse the existing rows/colGroups and assertions in SubTestRowSampleSingletonSketches rather than adding new scaffolding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DEPS.bzl`:
- Around line 6585-6591: The Bazel fetch is failing because the module zip for
com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f referenced in
DEPS.bzl (sha256 =
"68768a27ed6c35716fcb01a0b4a15ff13e5c1a5dc11acc7a3d44ba02a2742077", strip_prefix
= "github.com/pingcap/tipb@v0.0.0-20260414032333-da912b84de6f") is not available
on the configured mirrors (the listed URLs), so either upload the module zip for
that pseudo-version to all mirrors (http://bazel-cache.pingcap.net,
http://ats.apps.svc, https://cache.hawkingrei.com,
https://storage.googleapis.com/pingcapmirror) and confirm go.sum contains the
matching hash, or revert/remove this DEPS.bzl entry until the artifact is
mirrored; after uploading or reverting, re-run the CI pipeline to verify the
fetch succeeds.
In `@pkg/executor/analyze_col_sampling.go`:
- Around line 965-973: The code currently computes a single aggregated
sampleSize from nodeSketchSampleCounts/nodeSampleSizes and passes it to
EstimateNDVByGEE(), but sketches are populated per-slice (collectColumns() may
skip nulls), so you must track and pass a per-slice sample count: change the
loop that builds nodeSketchSampleCounts to also compute per-index
actualSampleCounts[i] (the number of rows that contributed to each sketch, using
the same logic collectColumns() uses to skip nulls), and replace uses of the
aggregated sampleSize with actualSampleCounts[i] when computing
sampleNDV/singletonItems and when calling EstimateNDVByGEE() (also apply the
same fix in the similar block around lines 992-999). Ensure variable names
referenced are nodeSketchSampleCounts, nodeSampleSizes, sampleSize,
collectColumns(), EstimateNDVByGEE(), sampleNDV, and singletonItems so the
correct per-slice counts are used.
In `@pkg/statistics/row_sampler.go`:
- Around line 435-450: The current mergeSingletonSketches uses MergeFMSketch
(union), which incorrectly treats "seen-once" items as still singletons after
merge; instead preserve singleton sketches at the original child-collector
granularity by changing the storage and merge behavior: stop calling
MergeFMSketch and instead append a copy of each incoming singletonSketch to a
per-child list (e.g., change baseCollector.SingletonSketches from []*FMSketch to
[][]*FMSketch or otherwise store sketches by child index) inside
mergeSingletonSketches (use singletonSketch.Copy()), and update
buildSamplingStats/estimateNDVsBySketch consumers to iterate the per-child
sketches rather than a single merged sketch so singletonItems aren’t
double-counted.
- Around line 485-487: When deserializing singleton sketches, FMSketchFromProto
leaves maxSize == 0 which later causes shrink logic/corruption when those
sketches are merged; after calling FMSketchFromProto(pbSketch) in the loop that
populates s.SingletonSketches, set the returned sketch's maxSize to the expected
non-zero capacity (e.g. the same maxSize used for new sketches in this package
or derived from s or pbCollector) so that MergeFMSketch and
mergeSingletonSketches operate on a properly initialized sketch; update the loop
that builds s.SingletonSketches to normalize maxSize on each sketch returned by
FMSketchFromProto before appending.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f100b334-ed63-4074-9cae-e68df8946d0b
⛔ Files ignored due to path filters (1)
go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
DEPS.bzl, go.mod, pkg/executor/analyze_col_sampling.go, pkg/executor/analyze_utils_test.go, pkg/statistics/fmsketch.go, pkg/statistics/row_sampler.go, pkg/statistics/sample_test.go
sha256 = "68768a27ed6c35716fcb01a0b4a15ff13e5c1a5dc11acc7a3d44ba02a2742077",
strip_prefix = "github.com/pingcap/tipb@v0.0.0-20260414032333-da912b84de6f",
urls = [
    "http://bazel-cache.pingcap.net:8080/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260210113932-1447c9d7e9fe.zip",
    "http://ats.apps.svc/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260210113932-1447c9d7e9fe.zip",
    "https://cache.hawkingrei.com/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260210113932-1447c9d7e9fe.zip",
    "https://storage.googleapis.com/pingcapmirror/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260210113932-1447c9d7e9fe.zip",
    "http://bazel-cache.pingcap.net:8080/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f.zip",
    "http://ats.apps.svc/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f.zip",
    "https://cache.hawkingrei.com/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f.zip",
    "https://storage.googleapis.com/pingcapmirror/gomod/github.com/pingcap/tipb/com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f.zip",
Bazel build is broken — mirror artifacts return 404 for the new tipb version.
The pipeline confirms that the zip for com_github_pingcap_tipb-v0.0.0-20260414032333-da912b84de6f does not exist at either the cache.hawkingrei.com or storage.googleapis.com mirror URLs. Bazel resolves URLs in order, so once all four fail the build is completely blocked.
This typically means the artifact still needs to be uploaded/mirrored before the DEPS.bzl entry can be merged. The steps are usually:
- Ensure
go.sumcontains the correct hash for the new pseudo-version. - Upload the module zip to all four mirror locations (internal cluster mirrors + the two public caches).
- Re-run the pipeline to confirm the fetch succeeds before removing the
do-not-mergelabel.
var sampleSize uint64
for _, size := range nodeSketchSampleCounts {
	sampleSize += uint64(size)
}
if sampleSize == 0 {
	for _, size := range nodeSampleSizes {
		sampleSize += uint64(size)
	}
}
Use per-slice sketch sample counts here.
sampleSize is aggregated once for the whole collector, but the sketches are not populated uniformly: collectColumns() skips null single-column values while multi-column groups still hash them. That means sampleNDV/singletonItems for a nullable slice can be computed from fewer sampled rows than the shared sampleSize passed to EstimateNDVByGEE(), which skews the estimate. Please track the number of rows that actually contributed to each sketch and use that per i.
Also applies to: 992-999
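The issue described above can be illustrated with a small, self-contained sketch of per-slice counting (hypothetical helper, not TiDB's actual code): when single-column slices skip NULLs, each sketch needs to be paired with the number of rows that actually fed it, rather than sharing one aggregate sampleSize.

```go
package main

import "fmt"

// perSliceCounts returns, for each column index, the number of sampled
// rows that actually contributed a value — NULL cells are skipped, the
// way collectColumns() is described as skipping nulls for single columns.
func perSliceCounts(rows [][]*string) []uint64 {
	if len(rows) == 0 {
		return nil
	}
	counts := make([]uint64, len(rows[0]))
	for _, row := range rows {
		for i, cell := range row {
			if cell != nil { // a NULL never reaches the sketch
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	s := func(v string) *string { return &v }
	rows := [][]*string{
		{s("x"), s("1")},
		{nil, s("2")}, // NULL in column 0: only column 1's sketch sees this row
		{s("y"), s("3")},
	}
	fmt.Println(perSliceCounts(rows)) // [2 3]
}
```

With counts like these, the estimator can be called with counts[i] for slice i instead of one shared total, which is the substance of the review suggestion.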
func (s *baseCollector) mergeSingletonSketches(singletonSketches []*FMSketch) {
	if len(singletonSketches) == 0 {
		return
	}
	if len(s.SingletonSketches) < len(singletonSketches) {
		s.SingletonSketches = append(s.SingletonSketches, make([]*FMSketch, len(singletonSketches)-len(s.SingletonSketches))...)
	}
	for i, singletonSketch := range singletonSketches {
		if singletonSketch == nil {
			continue
		}
		if s.SingletonSketches[i] == nil {
			s.SingletonSketches[i] = singletonSketch.Copy()
		} else {
			s.SingletonSketches[i].MergeFMSketch(singletonSketch)
		}
Don't merge singleton sketches with FM union.
singletonSketch means “seen exactly once in this partition”, but MergeFMSketch() only unions hash membership. If a value is singleton in two child collectors, it still survives in the merged sketch even though it is no longer a singleton for the merged partition. buildSamplingStats() later feeds these merged sketches into estimateNDVsBySketch(), so singletonItems is biased upward and NDV can be overstated. Please preserve singleton sketches at the original collector granularity or merge them with once/multiple state instead.
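The correct merge semantics can be shown with exact sets (a simplified model, not FM sketches, where intersections are only approximate): a value that is a singleton in both children must move to the "seen multiple times" side after merge.

```go
package main

import "fmt"

// state models one collector's knowledge with exact sets:
// once = values seen exactly once, multi = values seen more than once.
type state struct {
	once, multi map[string]bool
}

// merge combines two partitions' states. A value is in the merged multi
// set if it was multi on either side, OR once on both sides — this is
// exactly what a plain FM union loses.
func merge(a, b state) state {
	out := state{once: map[string]bool{}, multi: map[string]bool{}}
	for v := range a.multi {
		out.multi[v] = true
	}
	for v := range b.multi {
		out.multi[v] = true
	}
	for v := range a.once {
		if b.once[v] { // singleton in both children → not a singleton overall
			out.multi[v] = true
		}
	}
	for v := range a.once {
		if !out.multi[v] {
			out.once[v] = true
		}
	}
	for v := range b.once {
		if !out.multi[v] {
			out.once[v] = true
		}
	}
	return out
}

func main() {
	left := state{once: map[string]bool{"a": true, "b": true}, multi: map[string]bool{}}
	right := state{once: map[string]bool{"a": true, "c": true}, multi: map[string]bool{}}
	m := merge(left, right)
	fmt.Println(len(m.once), m.multi["a"]) // "a" is demoted: prints 2 true
}
```

FM sketches only support union, not the intersection step above, which is why the review suggests keeping the sketches at per-child granularity instead of collapsing them with MergeFMSketch.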
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/statistics/row_sampler.go` around lines 435 - 450, The current
mergeSingletonSketches uses MergeFMSketch (union), which incorrectly treats
"seen-once" items as still singletons after merge; instead preserve singleton
sketches at the original child-collector granularity by changing the storage and
merge behavior: stop calling MergeFMSketch and instead append a copy of each
incoming singletonSketch to a per-child list (e.g., change
baseCollector.SingletonSketches from []*FMSketch to [][]*FMSketch or otherwise
store sketches by child index) inside mergeSingletonSketches (use
singletonSketch.Copy()), and update buildSamplingStats/estimateNDVsBySketch
consumers to iterate the per-child sketches rather than a single merged sketch
so singletonItems aren’t double-counted.
s.SingletonSketches = make([]*FMSketch, 0, len(pbCollector.GetSingletonSketch()))
for _, pbSketch := range pbCollector.GetSingletonSketch() {
	s.SingletonSketches = append(s.SingletonSketches, FMSketchFromProto(pbSketch))
Initialize deserialized singleton sketches with a non-zero maxSize.
FMSketchFromProto() leaves maxSize == 0. Here that sketch can later be copied into s.SingletonSketches and become the destination of MergeFMSketch() in mergeSingletonSketches(), which makes every insert trigger shrink logic and corrupts the sketch. Please normalize maxSize during deserialization before reusing these sketches.
Suggested fix
for _, pbSketch := range pbCollector.GetSingletonSketch() {
- s.SingletonSketches = append(s.SingletonSketches, FMSketchFromProto(pbSketch))
+ sketch := FMSketchFromProto(pbSketch)
+ if sketch != nil && sketch.maxSize == 0 {
+ sketch.maxSize = MaxSketchSize
+ }
+ s.SingletonSketches = append(s.SingletonSketches, sketch)
}
@0xPoe: The following tests failed, say

Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@ Coverage Diff @@
## master #68157 +/- ##
================================================
- Coverage 77.7624% 77.2950% -0.4675%
================================================
Files 1990 1984 -6
Lines 551788 555565 +3777
================================================
+ Hits 429084 429424 +340
- Misses 121784 125397 +3613
+ Partials 920 744 -176
Flags with carried forward coverage won't be shown.
What problem does this PR solve?
Issue Number: ref #67449
Problem Summary:
Row-sampling analyze needs singleton sketches to improve NDV estimation from distributed samples.
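One way to see why singleton counts matter: GEE-style estimators scale the number of values sampled exactly once to extrapolate total NDV. The sketch below shows the textbook GEE formula under stated assumptions; TiDB's actual EstimateNDVByGEE may differ in details.

```go
package main

import (
	"fmt"
	"math"
)

// estimateNDVByGEE applies the classic GEE formula
//   NDV ≈ sqrt(N/n)·f1 + (d − f1)
// where N = total rows, n = sample size, d = distinct values seen in the
// sample, and f1 = values seen exactly once (the singletons the new
// sketches track). Non-singletons are assumed already fully observed.
func estimateNDVByGEE(totalRows, sampleRows, sampleNDV, singletons float64) float64 {
	if sampleRows == 0 {
		return 0
	}
	return math.Sqrt(totalRows/sampleRows)*singletons + (sampleNDV - singletons)
}

func main() {
	// 1M rows, 10k sampled, 500 distinct in the sample, 200 singletons.
	fmt.Printf("%.0f\n", estimateNDVByGEE(1_000_000, 10_000, 500, 200)) // 2300
}
```

This also shows why merged-sketch singleton counts must be accurate: inflating f1 (e.g. by union-merging per-node singleton sketches) directly inflates the extrapolated NDV.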
What changed and how does it work?
Check List
Tests
Manual test:
make bazel_prepare
make lint
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
Improvements
Dependencies
Updated the github.com/pingcap/tipb dependency.

Tests