
planner: Add memory overhead for HashAgg on TiKV#67953

Merged
ti-chi-bot[bot] merged 16 commits into pingcap:master from terry1purcell:hashaggcost
May 5, 2026

Conversation

@terry1purcell
Contributor

@terry1purcell terry1purcell commented Apr 22, 2026

What problem does this PR solve?

Issue Number: close #68083

Problem Summary:

What changed and how does it work?

planner: fix HashAgg cost model — place hash table memory cost outside concurrency division

Previously, getPlanCostVer24PhysicalHashAgg divided ALL costs (agg, group, hash-build, hash-probe) by the concurrency factor (default 5). This gave HashAgg an artificial
discount on hash table memory, even though each partial worker maintains its own hash table and total memory does not decrease with more workers.

Root cause

The cost model treated hash table memory the same as CPU work — dividing it by concurrency. But while CPU work (aggregation, grouping, hash key computation, probing) is
genuinely parallelized across partial workers that each process a disjoint subset of input rows, memory is not: each partial worker builds its own hash table, so total memory
scales with concurrency * outputRows * rowSize.
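The scaling above can be made concrete with a small sketch. The row count and row size below are illustrative numbers, not figures from the PR; the point is only that total memory grows linearly with the worker count instead of shrinking:

```go
package main

import "fmt"

// hashAggMemBytes models the memory claim from the description: every
// partial worker builds its own hash table over the aggregate output, so
// total memory is concurrency * outputRows * rowSize.
func hashAggMemBytes(concurrency, outputRows, rowSize int) int {
	return concurrency * outputRows * rowSize
}

func main() {
	// High-NDV case: 5 workers, 1M groups, ~100-byte rows (assumed values).
	total := hashAggMemBytes(5, 1_000_000, 100)
	fmt.Printf("high-NDV total hash table memory: %d MB\n", total/(1<<20))

	// Low-NDV case (e.g. a plain count(*)): outputRows is ~1, so the
	// memory term is negligible and HashAgg keeps its parallelism edge.
	fmt.Printf("low-NDV total hash table memory: %d bytes\n", hashAggMemBytes(5, 1, 100))
}
```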

Fix

For TiDB root and TiKV cop tasks, split the hash build cost into its CPU and memory components:

  • CPU work (agg, group, hash key computation, probe) stays inside /concurrency — partial workers genuinely parallelize this
  • Hash table memory (concurrency * outputRows * rowSize * memFactor) is placed outside the concurrency division

New formula (root/cop):

plan-cost = child-cost + concurrency * hashMemCost + (aggCost + groupCost + hashBuildCPUCost + hashProbeCost) / concurrency

MPP (unchanged):

plan-cost = child-cost + (aggCost + groupCost + hashBuildCost + hashProbeCost) / concurrency
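The two formulas above can be sketched with plain float64 stand-ins. These function and parameter names are illustrative, not TiDB's actual `costusage` identifiers, and the formulas are taken directly from this description (startCost and other refinements in the real code are omitted):

```go
package main

import "fmt"

// hashAggCostRootCop follows the new root/cop formula: the hash table
// memory term is multiplied by concurrency (one table per partial worker),
// while CPU work is still divided by it.
func hashAggCostRootCop(childCost, aggCost, groupCost, hashBuildCPUCost,
	hashProbeCost, hashMemCost, concurrency float64) float64 {
	return childCost + concurrency*hashMemCost +
		(aggCost+groupCost+hashBuildCPUCost+hashProbeCost)/concurrency
}

// hashAggCostMPP keeps the original formula: all costs divided by
// concurrency, since MPP data is genuinely partitioned across nodes.
func hashAggCostMPP(childCost, aggCost, groupCost, hashBuildCost,
	hashProbeCost, concurrency float64) float64 {
	return childCost + (aggCost+groupCost+hashBuildCost+hashProbeCost)/concurrency
}

func main() {
	// With a large memory component, the root/cop cost now dwarfs the old
	// everything-divided-by-5 style cost, which is what lets StreamAgg win.
	fmt.Println(hashAggCostRootCop(100, 50, 50, 20, 30, 200, 5)) // 1130
	fmt.Println(hashAggCostMPP(100, 50, 50, 250, 30, 5))         // 176
}
```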

Impact

  • High-NDV GROUP BY with wide output rows (outputRows ≈ inputRows): memory dominates, optimizer correctly prefers StreamAgg when an ordered index is available, reducing memory consumption from potentially 100MB–1GB (full hash table per worker) to ~1.5MB (streaming over ordered index output).
  • Low-NDV or no GROUP BY (e.g. count(*), sum(a), avg(a)): outputRows is small (~1), so the memory penalty is negligible. HashAgg retains its parallelism advantage and is still preferred. This is correct — these queries hash to a single value and don't suffer from large hash tables.
  • TiFlash MPP tasks: unchanged. Data is truly partitioned across nodes, so the original formula (divide everything by concurrency) is preserved.

Regression test

TestHashAggMemCostNotDividedByConcurrency verifies that StreamAgg is cheaper than HashAgg for a 100%-NDV GROUP BY with wide output rows and an available index.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes

    • Improved query cost estimation for aggregation operations, fixing memory cost calculation to prevent double-counting and better reflect execution efficiency across different task types.
  • Tests

    • Added regression test for aggregation cost model accuracy.
    • Updated query plan expectations across multiple test suites to reflect optimized execution strategies.

…ency division

Previously, getPlanCostVer24PhysicalHashAgg divided ALL costs (agg, group,
hash-build, hash-probe) by the concurrency factor (default 5). This gave
HashAgg a 5x artificial discount over StreamAgg even for single-threaded
scenarios where aggregation work is not reduced by parallelism — each input
row is processed exactly once regardless of worker count.

Root cause: threads share the same input stream so aggCost and groupCost
represent the total work done, not per-thread work. Only the hash table
operations (build + probe) genuinely benefit from parallel execution.

Fix: for TiDB root and TiKV cop tasks, move aggCost and groupCost outside
the concurrency division, matching the pattern already used by hash join
where build-side work sits outside /concurrency. The new formula is:

  plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency

For TiFlash MPP tasks, data is truly partitioned across nodes so each node
handles a disjoint subset of rows — the original formula (divide everything
by concurrency) is preserved to avoid disrupting MPP plan choices.

Impact: high-NDV GROUP BY on an indexed column now correctly prefers
StreamAgg over HashAgg, matching the optimizer's intent and reducing memory
consumption from potentially 100MB-1GB (full hash table) to ~1.5MB (streaming
over ordered index output). Low-NDV cases see minimal cost change.

Add regression test TestHashAggMemCostNotDividedByConcurrency that verifies
StreamAgg is cheaper than HashAgg for a 100%-NDV GROUP BY with an available
index.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pantheon-ai

pantheon-ai Bot commented Apr 22, 2026

@terry1purcell I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the sig/planner (SIG: Planner) and size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) labels Apr 22, 2026
@tiprow

tiprow Bot commented Apr 22, 2026

Hi @terry1purcell. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


This pull request modifies the hash aggregation cost calculation in the query planner to differentiate memory and CPU cost handling between MPP and non-MPP task types, causing plan optimization decisions to shift from hash aggregation to stream aggregation in certain scenarios. Test fixtures across multiple test suites are updated to reflect the new expected plan outputs.

Changes

Cohort / File(s) Summary
Binary Plan Cost Updates
pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json, binary_plan_suite_xut.json
Expected plan cost values updated for explain analyze format = 'binary' select sum(t.a) from t join t2 test case (cost increased from 2007109896.2117057 to 2007109934.6117058).
Enforce MPP Aggregation Changes
pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_out.json, enforce_mpp_suite_xut.json
Root aggregation operator changed from HashAgg to StreamAgg for select count(distinct b), json_objectagg(d,c) from t; in TestMPPSingleDistinct3Stage.
MPP Integration Plan Updates
pkg/planner/core/casetest/mpp/testdata/integration_suite_out.json, integration_suite_xut.json
Removed stream_count: 8 annotations from HashJoin and HashAgg nodes and their associated ExchangeReceiver/ExchangeSender nodes in TestMppFineGrainedJoinAndAgg test cases.
Physical Plan Test Updates
pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_out.json, plan_suite_xut.json
Restructured IndexHashJoin sub-tree plan for TestSemiJoinToInner, changing IndexReader-HashAgg nesting structure within the Apply pipeline.
Core Planner Test Fixtures
pkg/planner/core/casetest/testdata/integration_suite_out.json, integration_suite_xut.json, json_plan_suite_out.json, json_plan_suite_xut.json, plan_normalized_suite_out.json, plan_normalized_suite_xut.json
Multiple fixture updates: cost/operator ID renumbering, aggregation operator changes from HashAgg to StreamAgg, execution timing adjustments, and operator identifier synchronization across test cases.
Hash Aggregation Cost Calculation
pkg/planner/core/plan_cost_ver2.go
Modified getPlanCostVer24PhysicalHashAgg to apply different cost formulas for MPP vs non-MPP task types: for non-MPP tasks, memory cost (hashMemCost) is computed separately outside concurrency division, while CPU costs are divided by concurrency; MPP path retains original logic.
Cost Model Regression Test
pkg/planner/core/plan_cost_ver2_test.go
Added TestHashAggMemCostNotDividedByConcurrency to verify stream aggregation cost is lower than hash aggregation for high-NDV GROUP BY workloads.
Integration Test Result Updates
tests/integrationtest/r/access_path_selection.result, clustered_index.result, collation_agg_func_disabled.result, collation_agg_func_enabled.result, planner/cascades/integration.result
Updated explain format='plan_tree' output across multiple test suites to reflect aggregation operator changes from HashAgg to StreamAgg and plan restructuring in cascades optimization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • qw4990
  • guo-shaoge
  • AilinKid

Poem

🐰 A fuzzy quest through cost and care,
We counted coins both here and there,
Stream flows cheaper, hashes find their place,
New numbers dance at perfect pace! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. PR description comprehensively covers problem, root cause, solution, impact, and regression test with detailed explanations.
  • Title check: ✅ Passed. The title 'planner: Add memory overhead for HashAgg on TiKV' directly aligns with the main change: the cost model for HashAgg now treats memory costs differently for TiKV tasks.



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
pkg/planner/core/plan_cost_ver2_test.go (1)

815-860: LGTM — test directly validates the cost-model fix.

The test correctly targets the claim: with 100% NDV on an indexed column, HashAgg's hash-table memory cost (now not divided by concurrency) makes StreamAgg on the ordered index scan cheaper. Running under RunTestUnderCascadesWithDomain also confirms the behavior holds under both cascades off and on, and the per-iteration store/domain isolates session state — so no explicit cost-factor reset is needed.

Optional nit: the hot-loop buf.WriteString(fmt.Sprintf(...)) on lines 836–841 can be fmt.Fprintf(&buf, "(%d,%d)", i, i) to avoid the intermediate string per row. Trivial, purely stylistic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cost_ver2_test.go` around lines 815 - 860, Replace the
hot-loop string allocation inside TestHashAggMemCostNotDividedByConcurrency:
locate the loop that appends each row using
buf.WriteString(fmt.Sprintf("(%d,%d)", i, i)) and change it to use
fmt.Fprintf(&buf, "(%d,%d)", i, i) to avoid an intermediate fmt.Sprintf
allocation per iteration while preserving exact output.
pkg/planner/core/plan_cost_ver2.go (1)

622-637: Align the explanatory comments with the implemented formula.

Line 631 and Lines 673-674 say the hash-join/build-side work sits outside /concurrency, but Lines 675-676 still divide hashBuildCost. The formulas also omit startCost, which is added in both branches.

Proposed comment cleanup
-//	plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
+//	plan-cost = start-cost + child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
@@
-// Only the hash table operations (build + probe) benefit from parallel execution.
-// This matches the hash join pattern where build-side work is placed outside /concurrency.
+// Hash table build/probe costs remain concurrency-scaled by this model.
@@
-// For TiFlash MPP tasks, data is truly partitioned across nodes so each node handles a
-// disjoint subset of rows. All costs are divided by the MPP concurrency:
+// For TiFlash MPP tasks, data is partitioned across TiFlash nodes, so keep
+// concurrency scaling for the aggregation/grouping/hash portion:
@@
-//	plan-cost = child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / mppConcurrency
+//	plan-cost = start-cost + child-cost + (agg-cost + group-cost + hash-build-cost + hash-probe-cost) / mppConcurrency
@@
-		// Root (TiDB) and cop (TiKV) tasks: threads share the same input so aggregation
-		// work is not reduced by concurrency — each row is still processed once regardless
-		// of how many workers run in parallel. Only hash table operations benefit from
-		// parallel execution. This matches the hash join pattern where build-side work
-		// (including aggregation) sits outside /concurrency.
+		// Root (TiDB) and cop (TiKV) tasks: aggregation/grouping work is charged
+		// outside concurrency, while hash build/probe costs remain concurrency-scaled.

As per coding guidelines, “Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear.”

Also applies to: 663-677

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cost_ver2.go` around lines 622 - 637, The comment block
above getPlanCostVer24PhysicalHashAgg is inconsistent with the implemented
calculation: update the explanatory comments to accurately reflect the
implemented formulas (including that startCost is added in both TiDB/TiKV and
TiFlash/MPP branches) and correct the statement about hash-build/hash-probe
treatment — if hashBuildCost (and hashProbeCost) are divided by concurrency in
the code they must be described as benefiting from parallelism, otherwise note
they sit outside the /concurrency division; ensure the text around the two
branches and the earlier comment section (the area covering the rationale
currently spanning lines ~663-677) explicitly matches the code’s use of
hashBuildCost, hashProbeCost and startCost so the comment no longer restates
code but explains the intended concurrency semantics implemented by
getPlanCostVer24PhysicalHashAgg.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c741fa5e-db51-419f-9123-c7966eddc28c

📥 Commits

Reviewing files that changed from the base of the PR and between 874ff37 and ce99b69.

📒 Files selected for processing (30)
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
  • pkg/planner/core/casetest/cbotest/testdata/analyze_suite_out.json
  • pkg/planner/core/casetest/cbotest/testdata/analyze_suite_xut.json
  • pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_out.json
  • pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_xut.json
  • pkg/planner/core/casetest/hint/testdata/integration_suite_out.json
  • pkg/planner/core/casetest/hint/testdata/integration_suite_xut.json
  • pkg/planner/core/casetest/mpp/testdata/integration_suite_out.json
  • pkg/planner/core/casetest/mpp/testdata/integration_suite_xut.json
  • pkg/planner/core/casetest/partition/testdata/partition_pruner_out.json
  • pkg/planner/core/casetest/partition/testdata/partition_pruner_xut.json
  • pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_out.json
  • pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json
  • pkg/planner/core/casetest/rule/testdata/join_reorder_suite_out.json
  • pkg/planner/core/casetest/rule/testdata/join_reorder_suite_xut.json
  • pkg/planner/core/casetest/rule/testdata/outer2inner_out.json
  • pkg/planner/core/casetest/rule/testdata/outer2inner_xut.json
  • pkg/planner/core/casetest/scalarsubquery/testdata/plan_suite_out.json
  • pkg/planner/core/casetest/scalarsubquery/testdata/plan_suite_xut.json
  • pkg/planner/core/casetest/testdata/integration_suite_out.json
  • pkg/planner/core/casetest/testdata/integration_suite_xut.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_out.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_xut.json
  • pkg/planner/core/casetest/testdata/plan_normalized_suite_out.json
  • pkg/planner/core/casetest/testdata/plan_normalized_suite_xut.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
  • pkg/planner/core/plan_cost_ver2.go
  • pkg/planner/core/plan_cost_ver2_test.go

Comment thread pkg/planner/core/plan_cost_ver2.go Outdated
//
// For TiDB root and TiKV cop tasks:
//
// plan-cost = child-cost + agg-cost + group-cost + (hash-build-cost + hash-probe-cost) / concurrency
Contributor


clear comment and make sense

@terry1purcell terry1purcell changed the title planner: Add memory overhead for HashAgg planner: Add memory overhead for HashAgg on TiKV Apr 28, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/planner/core/plan_cost_ver2.go`:
- Around line 664-686: The code only checks the immediate child for
*physicalop.PhysicalTableReader when setting childIsTiFlash, so plans with unary
wrappers (e.g., Projection/Selection) hide a TiFlash TableReader; change the
detection to walk down unary single-child operators starting from
p.Children()[0] until you either find a *physicalop.PhysicalTableReader (set
childIsTiFlash = true) or hit a node with zero or multiple children (leave
false). Implement this by replacing the direct type assertion with a small loop
that examines p.Children()[0], repeatedly assigns child = child.Children()[0]
while len(child.Children())==1 and the child is one of the expected unary
wrapper types (e.g., physicalop.PhysicalProjection,
physicalop.PhysicalSelection, etc.), and then checks for
*physicalop.PhysicalTableReader to set childIsTiFlash.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f65c5efc-0223-4cf8-b4f0-f002e2ddcb15

📥 Commits

Reviewing files that changed from the base of the PR and between ee6105e and 3d43516.

📒 Files selected for processing (9)
  • pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_out.json
  • pkg/planner/core/casetest/enforcempp/testdata/enforce_mpp_suite_xut.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
  • pkg/planner/core/plan_cost_ver2.go
  • tests/integrationtest/r/globalindex/aggregate.result
  • tests/integrationtest/r/index_merge.result
  • tests/integrationtest/r/infoschema/infoschema.result
  • tests/integrationtest/r/planner/cascades/integration.result
✅ Files skipped from review due to trivial changes (1)
  • tests/integrationtest/r/infoschema/infoschema.result

Comment thread pkg/planner/core/plan_cost_ver2.go Outdated

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (3)
pkg/planner/core/casetest/testdata/json_plan_suite_out.json (1)

108-153: Consider excluding volatile runtime fields from recorded JSON fixtures.

executeInfo and related timing/memory values on lines 108-153 are runtime-volatile, but the current verifier (pkg/planner/core/casetest/plan_test.go:365-396) only asserts stable fields (ID, EstRows, ActRows, TaskType, AccessObject, OperatorInfo). Stripping or normalizing volatile fields during record would reduce testdata churn.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/casetest/testdata/json_plan_suite_out.json` around lines 108
- 153, The recorded JSON fixture json_plan_suite_out.json contains volatile
runtime fields (executeInfo, memoryInfo, diskInfo and similar timing/memory
strings) that cause churn; update the recorder to strip or normalize these
fields when writing fixtures so only stable fields (ID, EstRows, ActRows,
TaskType, AccessObject, OperatorInfo, etc.) are kept, i.e., remove or replace
executeInfo/memoryInfo/diskInfo (and nested equivalents under
subOperators/IndexFullScan_*/IndexReader_*) with a stable sentinel (empty string
or null) during serialization so the existing verifier that asserts stable
fields continues to pass without noisy runtime differences.
pkg/planner/core/plan_cost_ver2.go (1)

686-695: Extract the CPU-only hash-build term instead of rebuilding it inline.

This branch now carries a second copy of the hash-build formula, while the full hashBuildCost is still created earlier and ignored here. Pulling the CPU-only part into a helper would keep the traced expressions in one place and avoid branch drift the next time hashBuildCostVer2 changes.

Refactor sketch
-func hashBuildCostVer2(option *costusage.PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costusage.CostVer2Factor) costusage.CostVer2 {
-	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
+func hashBuildCPUCostVer2(option *costusage.PlanCostOption, buildRows, nKeys float64, cpuFactor costusage.CostVer2Factor) costusage.CostVer2 {
 	hashKeyCost := costusage.NewCostVer2(option, cpuFactor,
 		buildRows*nKeys*cpuFactor.Value,
 		func() string { return fmt.Sprintf("hashkey(%v*%v*%v)", buildRows, nKeys, cpuFactor) })
-	hashMemCost := costusage.NewCostVer2(option, memFactor,
-		buildRows*buildRowSize*memFactor.Value,
-		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
 	hashBuildCost := costusage.NewCostVer2(option, cpuFactor,
 		buildRows*cpuFactor.Value,
 		func() string { return fmt.Sprintf("hashbuild(%v*%v)", buildRows, cpuFactor) })
+	return costusage.SumCostVer2(hashKeyCost, hashBuildCost)
+}
+
+func hashBuildCostVer2(option *costusage.PlanCostOption, buildRows, buildRowSize, nKeys float64, cpuFactor, memFactor costusage.CostVer2Factor) costusage.CostVer2 {
+	// TODO: 1) consider types of keys, 2) dedicated factor for build-probe hash table
+	hashMemCost := costusage.NewCostVer2(option, memFactor,
+		buildRows*buildRowSize*memFactor.Value,
+		func() string { return fmt.Sprintf("hashmem(%v*%v*%v)", buildRows, buildRowSize, memFactor) })
-	return costusage.SumCostVer2(hashKeyCost, hashMemCost, hashBuildCost)
+	return costusage.SumCostVer2(hashBuildCPUCostVer2(option, buildRows, nKeys, cpuFactor), hashMemCost)
 }

As per coding guidelines, "Code SHOULD remain maintainable for future readers with basic TiDB familiarity" and "Code SHOULD be self-documenting through clear naming and structure."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cost_ver2.go` around lines 686 - 695, The branch
duplicates the hash-build CPU formula by recreating hashBuildCPUCost inline
while the full hashBuildCost created earlier is ignored; extract the CPU-only
portion into a helper (e.g., cpuOnlyHashBuildCost or a method on costusage like
hashBuildCostCPUOnly) that returns the CPU-only CostVer2 for the hash build and
replace the inline construction of hashBuildCPUCost in this file with a call to
that helper, and update any other places that should use the same CPU-only term
(and keep the original hashBuildCost creation intact so traced expressions
remain consistent).
pkg/planner/core/plan_cost_ver2_test.go (1)

832-845: Make the “wide row” setup less margin-sensitive.

This regression depends on outputRowSize being large enough to keep HashAgg above StreamAgg, but the inserted literals are still fairly short relative to the declared varchar(200) columns. Filling c/d closer to their declared width would give the assertion more headroom against future factor tuning.

Suggested adjustment
 		// Insert rows where every b value is unique (100% NDV).
+		wideC := strings.Repeat("c", 180)
+		wideD := strings.Repeat("d", 180)
 		var buf strings.Builder
 		buf.WriteString("insert into t_high_ndv values ")
 		for i := 0; i < 1000; i++ {
 			if i > 0 {
 				buf.WriteString(",")
 			}
-			buf.WriteString(fmt.Sprintf("(%d,%d,'%s','%s')", i, i, "padding-data-for-wide-rows", "more-padding-data-here"))
+			buf.WriteString(fmt.Sprintf("(%d,%d,'%s','%s')", i, i, wideC, wideD))
 		}

As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/planner/core/plan_cost_ver2_test.go` around lines 832 - 845, The test's
"wide row" insert into table t_high_ndv is too short to reliably bias cost
toward HashAgg; modify the value generation used in the insert builder inside
plan_cost_ver2_test.go so the c and d literals are filled close to their
declared varchar(200) length (e.g., use strings.Repeat or a precomputed longPad
variable and insert longPad[:200] or similar) when building the loop that writes
values into t_high_ndv; update the loop that appends
fmt.Sprintf("(%d,%d,'%s','%s')", ...) to use those padded strings for columns c
and d so outputRowSize is clearly large and the test becomes less
margin-sensitive.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ca39a838-8117-43ea-a3a1-e0cb05eb8a5a

📥 Commits

Reviewing files that changed from the base of the PR and between 3d43516 and 0960f58.

📒 Files selected for processing (11)
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
  • pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_out.json
  • pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_out.json
  • pkg/planner/core/casetest/testdata/json_plan_suite_xut.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_out.json
  • pkg/planner/core/casetest/tpch/testdata/tpch_suite_xut.json
  • pkg/planner/core/plan_cost_ver2.go
  • pkg/planner/core/plan_cost_ver2_test.go
  • tests/integrationtest/r/planner/cascades/integration.result
✅ Files skipped from review due to trivial changes (2)
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_xut.json
  • pkg/planner/core/casetest/binaryplan/testdata/binary_plan_suite_out.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/planner/core/casetest/physicalplantest/testdata/plan_suite_xut.json

@ti-chi-bot ti-chi-bot Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 28, 2026
terry1purcell and others added 2 commits April 28, 2026 08:59
Restore mpp integration_suite testdata to match master — the
stream_count removal was picked up as collateral during re-recording
and is unrelated to the HashAgg cost model change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 75.60976% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.1269%. Comparing base (5c19988) to head (a3ce481).
⚠️ Report is 12 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67953        +/-   ##
================================================
- Coverage   77.7565%   77.1269%   -0.6297%     
================================================
  Files          1990       1972        -18     
  Lines        551624     554928      +3304     
================================================
- Hits         428924     427999       -925     
- Misses       121780     126755      +4975     
+ Partials        920        174       -746     
Flag Coverage Δ
integration 41.7552% <75.6097%> (+1.9534%) ⬆️

Flags with carried forward coverage won't be shown.

Components Coverage Δ
dumpling 60.4888% <ø> (ø)
parser ∅ <ø> (∅)
br 50.0597% <ø> (-13.0338%) ⬇️

The hashMemCost term added by the HashAgg cost-model fix was applied
unconditionally for root tasks, which over-penalized HashAgg whenever a
StreamAgg alternative would itself need an explicit Sort to consume the
child rows in GROUP BY order. In those cases the planner switched to
Sort+StreamAgg even though paying for a sort over the join/agg output is
strictly more expensive than HashAgg's hash table.

Gate hashMemCost on a structural check of the HashAgg's child plan: only
charge the memory penalty when the child reaches a base-table reader
(IndexReader / IndexLookUpReader / IndexMergeReader / TableReader)
through pass-through operators. Joins, aggregations, and other
order-breaking operators skip the penalty since the StreamAgg
alternative there carries its own (correctly-costed) Sort.

Add a companion unit test that exercises the gate on a hash-join input
to ensure HashAgg stays cheaper than Sort+StreamAgg in that scenario,
alongside the existing test that keeps StreamAgg cheaper for the
indexed high-NDV path.

Re-record planner casetest fixtures (binary_plan, json_plan, tpch);
the previously-recorded TPCH MPP plan flip reverts to its master shape,
confirming the gate's effect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
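
The structural gate this commit describes could look roughly like the following (the reader operator names are taken from the message above; the node type, the set of pass-through operators, and the walk itself are an illustrative sketch, not the planner's real API):

```go
package main

import "fmt"

// planNode is a simplified stand-in for a physical plan operator.
type planNode struct {
	tp       string
	children []*planNode
}

// reachesBaseReader reports whether the HashAgg child reaches a base-table
// reader through order-preserving operators. Only then is the StreamAgg
// alternative free of an extra Sort, so only then is the memory penalty
// charged.
func reachesBaseReader(p *planNode) bool {
	switch p.tp {
	case "IndexReader", "IndexLookUpReader", "IndexMergeReader", "TableReader":
		return true // base-table access: ordering may come for free
	case "Selection", "Projection": // pass-through, order-preserving (assumed set)
		return len(p.children) == 1 && reachesBaseReader(p.children[0])
	default: // joins, aggregations, applies, etc. break ordering
		return false
	}
}

func main() {
	indexed := &planNode{tp: "Selection", children: []*planNode{{tp: "IndexReader"}}}
	joined := &planNode{tp: "HashJoin", children: []*planNode{
		{tp: "TableReader"}, {tp: "TableReader"},
	}}
	fmt.Println(reachesBaseReader(indexed), reachesBaseReader(joined)) // → true false
}
```

On the joined input the gate returns false, the memory penalty is skipped, and HashAgg stays cheaper than Sort+StreamAgg — exactly the regression the companion unit test guards against.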
@terry1purcell terry1purcell changed the title planner: Add memory overhead for HashAgg on TiKV planner: Add memory overhead for HashAgg on TiKV | tidb=pr/2737 Apr 29, 2026
@terry1purcell terry1purcell changed the title planner: Add memory overhead for HashAgg on TiKV | tidb=pr/2737 planner: Add memory overhead for HashAgg on TiKV | tidb-test=pr/2737 Apr 29, 2026
@terry1purcell terry1purcell changed the title planner: Add memory overhead for HashAgg on TiKV | tidb-test=pr/2737 planner: Add memory overhead for HashAgg on TiKV Apr 29, 2026
Append two cases to plan_cost_ver2.test that capture the plan-shape
outcomes of the HashAgg memory-penalty gating as visible result diffs,
making the optimizer behavior change easy to review.

Case 1 (intended win): high-NDV GROUP BY on a clustered-PK table with
wide payload columns. The TableFullScan delivers rows in PK prefix
order for free, so the planner picks StreamAgg over HashAgg.

Case 2 (gating prevents regression): high-NDV GROUP BY over a hash-join
output. No free ordering on the group key, so the gating skips the
HashAgg memory penalty and the planner picks HashAgg rather than
Sort+StreamAgg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@terry1purcell
Contributor Author

/retest-required

@tiprow

tiprow Bot commented Apr 29, 2026

@terry1purcell: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest-required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@terry1purcell
Contributor Author

/ok-to-test

@ti-chi-bot ti-chi-bot Bot added the ok-to-test Indicates a PR is ready to be tested. label May 1, 2026
Contributor

@henrybw henrybw left a comment

LGTM. Just had a few nitpicks about making the comments more concise and less redundant, but those are nitpicks and thus nonblocking.

Comment thread pkg/planner/core/plan_cost_ver2.go Outdated
Comment on lines +624 to +629
// requiring an explicit Sort. The check is structural: walk through
// order-preserving operators and accept only base-table access paths
// (IndexReader / IndexLookUpReader / IndexMergeReader / TableReader). Joins,
// aggregations, applies, and other order-breaking operators return false —
// for those, the StreamAgg alternative would need a Sort, and applying the
// HashAgg memory penalty would unfairly favour Sort+StreamAgg over HashAgg.
Contributor

Nit: I don't think comments that just paraphrase what the code does add much value. This particular point about the memory cost also seems to be repeated three times (once in the comment above getPlanCostVer24PhysicalHashAgg and another time inside the if taskType == property.RootTaskType { block).

Contributor Author

Done

Comment thread pkg/planner/core/plan_cost_ver2.go Outdated
Comment on lines +708 to +718
// Root (TiDB) tasks: partial workers each process a disjoint subset of input
// rows, so all CPU work (agg, group, hash key, probe) is genuinely parallelized
// and divided by concurrency. However, each partial worker maintains its own
// hash table, so total memory scales with the number of workers. We model this
// as concurrency * outputRows * rowSize * memFactor, placed outside the
// concurrency division. The penalty is gated on whether the child plan can
// provide ordering on the GROUP BY keys naturally — only then is the StreamAgg
// alternative free of additional Sort cost and able to benefit from HashAgg's
// memory penalty. When no free ordering is available, we skip the penalty so
// the optimizer doesn't get steered into a Sort+StreamAgg plan that costs more
// than HashAgg in practice.
Contributor

Nit: I'm not sure this comment is adding anything more than what the comment above this function has already said. At the very least, I think it could be made more concise.

Contributor Author

Done

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label May 1, 2026
@terry1purcell
Contributor Author

/retest-required

1 similar comment
@terry1purcell
Contributor Author

/retest-required

Member

@0xPoe 0xPoe left a comment

})
}

func TestHashAggMemCostGatedOnFreeOrdering(t *testing.T) {
Member

This test appears to pass without your fix. Perhaps we should redesign the case to ensure it can verify that the gate functions correctly. I typically verify regression tests by running them without the fix to check that they truly serve as regression tests.

Please correct me if I misunderstood this case.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 3, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 3, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-01 23:38:30.21309871 +0000 UTC m=+2986715.418458757: ☑️ agreed by henrybw.
  • 2026-05-03 17:40:45.632620369 +0000 UTC m=+30318.505970331: ☑️ agreed by 0xPoe.

@terry1purcell
Contributor Author

/retest-required

Member

@0xPoe 0xPoe left a comment

Thanks! :shipit:

Feel free to unhold. Thank you for addressing my comments.

@ti-chi-bot

ti-chi-bot Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 0xPoe, henrybw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@0xPoe
Member

0xPoe commented May 5, 2026

/unhold

@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2026
@0xPoe
Member

0xPoe commented May 5, 2026

/retest

@ti-chi-bot ti-chi-bot Bot merged commit bf7884d into pingcap:master May 5, 2026
35 checks passed

Labels

approved lgtm ok-to-test Indicates a PR is ready to be tested. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

planner: HashAgg consumes more memory than StreamAgg

4 participants