executor: support spill intermediate data for unparalleled hash agg #25714
ti-chi-bot merged 18 commits into pingcap:master
Simple Benchmark
cpu: AMD Ryzen 7 3700X 8-Core Processor
Memory usage test
Workload: tpch-sf=3g, 1 tidb, 1 tikv, 1 pd
Memory usage in Grafana monitor
return e.spillAction
}

const maxSpillTimes = 10
listInDisk *chunk.ListInDisk
lastChunkNum int
processIdx int
spillMode uint32
spillChunk *chunk.Chunk
spillAction *AggSpillDiskAction
childDrained bool
We need comments for these variables
return err
}
}
if e.spillAction != nil {
Useless. I've removed the code now.
}
if e.listInDisk != nil {
if err := e.listInDisk.Close(); err != nil {
return err
Don't we close the childrenExec? This may cause leaks.
e.executed, e.childDrained = false, false
e.listInDisk = chunk.NewListInDisk(retTypes(e.children[0]))
e.spillChunk = newFirstChunk(e.children[0])
if e.ctx.GetSessionVars().TrackAggregateMemoryUsage {
Why do we need to check this?
If TiDB doesn't track the aggregate executor's memory usage, should we still try to spill hashAgg when the limit is exceeded?
In addition, oom-use-tmp-storage should also be checked. I've added the check now. PTAL again.
groupKey := string(e.groupKeyBuffer[j]) // do memory copy here, because e.groupKeyBuffer may be reused.
if !e.groupSet.Exist(groupKey) {
if atomic.LoadUint32(&e.spillMode) == 1 && e.groupSet.Count() > 0 {
e.spillChunk.Append(e.childResult, j, j+1)
Can we use Chunk.sel to optimize this if-block?
- We can check e.groupSet.Exist(groupKey) and build the sel first, and then invoke e.spillChunk.Append based on the sel.
- Further, if len(sel) == len(e.childResult), we can invoke e.listInDisk.Add(e.childResult) directly.
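The suggestion above can be sketched with plain slices standing in for chunks. This is only an illustration of the idea, not the code in this PR: `buildSpillSel`, `spillRows`, and the slice-based "disk" and "buffer" are hypothetical names, and the real `chunk.Chunk` API is not used here.

```go
package main

// buildSpillSel returns a selection vector: the indices of rows whose
// group key is NOT yet present in the in-memory group set.
// (Hypothetical helper; plain slices stand in for a chunk.)
func buildSpillSel(keys []string, groups map[string]struct{}) []int {
	sel := make([]int, 0, len(keys))
	for i, k := range keys {
		if _, ok := groups[k]; !ok {
			sel = append(sel, i)
		}
	}
	return sel
}

// spillRows moves the selected rows towards "disk".
// If every row is selected, the whole batch is handed over in one step;
// otherwise the selected rows are copied one by one into a spill buffer.
func spillRows(rows []string, sel []int, disk *[][]string, buf *[]string) {
	if len(sel) == 0 {
		return // nothing to spill
	}
	if len(sel) == len(rows) {
		*disk = append(*disk, rows) // fast path: no per-row copy
		return
	}
	for _, i := range sel {
		*buf = append(*buf, rows[i]) // slow path: row-by-row append
	}
}
```

The fast path only pays off when most batches consist entirely of unseen keys; whether that holds depends on the workload.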
// spill unprocessed data when exceeded.
if len(sel) > 0 {
err = e.spillUnprocessedData(sel)
The input argument sel is useless?
e.childResult.SetSel(sel)
e.childResult.SetSel(sel) would make len(sel) == len(e.childResult) always true, so e.listInDisk.Add(e.childResult) would always be called directly. If there are only a few elements in sel, this may cause a performance issue.
I removed the e.listInDisk.Add(e.childResult) logic and now always append to tmpChkForSpill. PTAL.
listInDisk *chunk.ListInDisk // listInDisk is the chunks to store row values for spilling data.
lastChunkNum int // lastChunkNum indicates the num of spilling chunk.
processIdx int // processIdx indicates the num of processed chunk in disk.
spillMode uint32 // spillMode means that no new groups are added to hash table.
- isSpillModeSet?
- Add an explanation of what 0 and 1 mean.
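One way to address both points is to give the two values names and wrap the flag in small accessor methods. The sketch below is an assumption about how that could look, not the code merged in this PR: `spillModeOff`, `spillModeOn`, and `spillFlag` are hypothetical names. It relies only on the standard `sync/atomic` package.

```go
package main

import "sync/atomic"

const (
	// spillModeOff: new groups may still be added to the hash table.
	spillModeOff uint32 = 0
	// spillModeOn: no new groups are added; rows with unseen keys are spilled.
	spillModeOn uint32 = 1
)

// spillFlag is a hypothetical wrapper around the uint32 flag so that it can
// be switched on by a memory action running in another goroutine.
type spillFlag struct {
	mode uint32
}

// enable switches spill mode on and reports whether this call changed it.
func (f *spillFlag) enable() bool {
	return atomic.CompareAndSwapUint32(&f.mode, spillModeOff, spillModeOn)
}

// isSet reports whether spill mode is currently on.
func (f *spillFlag) isSet() bool {
	return atomic.LoadUint32(&f.mode) == spillModeOn
}

// reset switches spill mode off, e.g. before processing the next round.
func (f *spillFlag) reset() {
	atomic.StoreUint32(&f.mode, spillModeOff)
}
```

With named constants, a check such as `atomic.LoadUint32(&e.spillMode) == 1` reads as a comparison against `spillModeOn`, which documents the meaning at the call site.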
Co-authored-by: HuaiyuXu <xuhuaiyu@pingcap.com>
/merge |
This pull request has been accepted and is ready to merge.
Commit hash: ffbcf52
@wshwsh12: Your PR was out of date, so I have automatically updated it for you. I will also trigger all tests for you: /run-all-tests
If a CI test fails, re-trigger just the failed test and the bot will merge the PR for you after CI passes.
/run-unit-test |

What problem does this PR solve?
Issue Number: close #xxx
Problem Summary:
What is changed and how it works?
Proposal: Design
What's Changed:
Based on PR #25820, introduce a soft limit.
Support spilling intermediate data for unparalleled hashAgg.
How it Works:
a. If the key exists in the Map, aggregate the result.
b. If the key doesn't exist in the Map, spill the data to disk.
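The two rules above can be sketched as a toy single-threaded `COUNT(*) ... GROUP BY` aggregator. This is a minimal illustration under stated assumptions, not the executor code in this PR: `spillAgg`, `maxGroups`, and the in-memory `spilled` slice are hypothetical, a group-count threshold stands in for the memory tracker, and a slice stands in for the on-disk chunk list.

```go
package main

// spillAgg is a toy aggregator that counts rows per group key.
type spillAgg struct {
	counts    map[string]int // in-memory hash table: group key -> partial count
	spilled   []string       // stand-in for rows written to disk
	spillMode bool           // once true, no new groups are added to counts
	maxGroups int            // stand-in for the memory limit (assumption)
}

func newSpillAgg(maxGroups int) *spillAgg {
	return &spillAgg{maxGroups: maxGroups}
}

// consume applies rules (a) and (b) to one batch of group keys.
func (a *spillAgg) consume(keys []string) {
	for _, k := range keys {
		if _, ok := a.counts[k]; ok {
			a.counts[k]++ // (a) key exists: aggregate in place
			continue
		}
		if a.spillMode {
			a.spilled = append(a.spilled, k) // (b) unseen key: spill the row
			continue
		}
		a.counts[k] = 1
		if len(a.counts) >= a.maxGroups {
			a.spillMode = true // limit reached: stop adding new groups
		}
	}
}

// run aggregates the input, then re-processes spilled rows round by round
// until nothing is left on "disk".
func (a *spillAgg) run(input []string) map[string]int {
	result := make(map[string]int)
	pending := input
	for len(pending) > 0 {
		a.counts = make(map[string]int)
		a.spilled = nil
		a.spillMode = false
		a.consume(pending)
		for k, v := range a.counts {
			result[k] = v // each group is complete within its round
		}
		pending = a.spilled
	}
	return result
}
```

Within one round a key is either always aggregated in memory or always spilled, because new keys stop entering the hash table as soon as spill mode is on. That is why each round's groups can be emitted as final results before the spilled rows are read back.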
Related changes
pingcap/docs/pingcap/docs-cn:
Check List
Tests
Set AggregateConcurrency to 1 and run all correctness tests in the tidb repo.
Side effects
Release note