Skip to content

executor: support spill intermediate data for unparalleled hash agg#25714

Merged
ti-chi-bot merged 18 commits intopingcap:masterfrom
wshwsh12:unparalleled-hash-agg
Jul 15, 2021
Merged

executor: support spill intermediate data for unparalleled hash agg#25714
ti-chi-bot merged 18 commits intopingcap:masterfrom
wshwsh12:unparalleled-hash-agg

Conversation

@wshwsh12
Copy link
Copy Markdown
Contributor

@wshwsh12 wshwsh12 commented Jun 23, 2021

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What is changed and how it works?

Proposal: Design

What's Changed:
Based PR #25820, introduce soft limit.
Support spilling intermediate date for unparalleled hashAgg.

How it Works:

  1. When the memory usage is higher than the quota, switch the HashAgg executor to spill-mode.
  2. When HashAgg is in spill-mode, keep the tuple in the hashMap no longer growing.
    a. If the key exists in the Map, aggreagte the result.
    b. If the key doesn't exist in the Map, spill the data to disk.
  3. After all data have been processed, output the aggreagte result in the Map, clear the Map. Then read the spilling data from disk, repeat the Step1-Step3 until all data have been aggregated.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Set AggregateConcurrency to 1 and run all correctness test in tidb repo.

Side effects

  • Performance regression
    • Consumes more CPU
    • Consumes more MEM

Release note

  • Support spill intermediate data for unparalleled hash agg

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 23, 2021
@github-actions github-actions Bot added the sig/execution SIG execution label Jun 23, 2021
@wshwsh12 wshwsh12 marked this pull request as ready for review June 24, 2021 03:34
@wshwsh12 wshwsh12 requested a review from a team as a code owner June 24, 2021 03:34
@wshwsh12 wshwsh12 requested review from XuHuaiyu and removed request for a team June 24, 2021 03:34
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 24, 2021
@wshwsh12
Copy link
Copy Markdown
Contributor Author

wshwsh12 commented Jun 28, 2021

Simple Benchmark

cpu: AMD Ryzen 7 3700X 8-Core Processor

Old Implement:
BenchmarkHashAggRows
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)-16         	      79	  23354686 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)-16        	       7	 150919829 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)-16       	       1	1315785257 ns/op

New Implement:
BenchmarkHashAggRows
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:100000,_concurrency:1,_sorted:true)-16         	      52	  20963917 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:1000000,_concurrency:1,_sorted:true)-16        	       7	 145583208 ns/op
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)
BenchmarkHashAggRows/(execType:hash,_aggFunc:sum,_ndv:1000,_hasDistinct:false,_rows:10000000,_concurrency:1,_sorted:true)-16       	       1	1327683063 ns/op

Memory usage test

Workload: tpch-sf=3g 1tidb 1tikv 1pd
Config: oom-action = "log",mem-quota-query = 1 << 30, and close copr-cache
User-veriables: tidb_hashagg_final_concurrency = 1; tidb_hashagg_partial_concurrency = 1;

Old Implement
[tidb]> desc analyze select avg(L_EXTENDEDPRICE) from lineitem group by L_PARTKEY   ,L_SUPPKEY , L_QUANTITY;
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                        | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                             | operator info                                                                                                                                                                                   | memory   | disk |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| HashAgg_9                 | 14397287.20 | 16776228 | root      |                | time:4m23.7s, loops:16385                                                                                                                                                                                                                                                                  | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:avg(Column#19, Column#20)->Column#18                                                                 | 3.14 GB  | N/A  |
| └─TableReader_10          | 14397287.20 | 17963641 | root      |                | time:3m40.2s, loops:41, cop_task: {num: 40, max: 32.5s, min: 1.55s, avg: 19.7s, p95: 31.7s, max_proc_keys: 613883, p95_proc_keys: 462536, tot_proc: 12m46.5s, tot_wait: 9.29s, rpc_num: 40, rpc_time: 13m6.7s, copr_cache: disabled}                                                       | data:HashAgg_5                                                                                                                                                                                  | 320.5 MB | N/A  |
|   └─HashAgg_5             | 14397287.20 | 17963641 | cop[tikv] |                | tikv_task:{proc max:19.4s, min:873ms, p80:18.1s, p95:19.3s, iters:17591, tasks:40}, scan_detail: {total_process_keys: 17996609, total_keys: 17996649, rocksdb: {delete_skipped_count: 0, key_skipped_count: 17996609, block: {cache_hit_count: 57170, read_count: 0, read_byte: 0 Bytes}}} | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:count(test.lineitem.l_extendedprice)->Column#19, funcs:sum(test.lineitem.l_extendedprice)->Column#20 | N/A      | N/A  |
|     └─TableFullScan_8     | 17996609.00 | 17996609 | cop[tikv] | table:lineitem | tikv_task:{proc max:10.9s, min:511ms, p80:10.1s, p95:10.7s, iters:17591, tasks:40}                                                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                                                                                                  | N/A      | N/A  |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
4 rows in set (4 min 24.836 sec)


New Implement:
[tidb]> desc analyze select avg(L_EXTENDEDPRICE) from lineitem group by L_PARTKEY   ,L_SUPPKEY , L_QUANTITY;

+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                        | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                             | operator info                                                                                                                                                                                   | memory   | disk    |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| HashAgg_9                 | 14397287.20 | 16776228 | root      |                | time:4m8.2s, loops:16385                                                                                                                                                                                                                                                                   | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:avg(Column#19, Column#20)->Column#18                                                                 | 819.2 MB | 3.41 GB |
| └─TableReader_10          | 14397287.20 | 17963641 | root      |                | time:52.6s, loops:41, cop_task: {num: 40, max: 30.5s, min: 1.88s, avg: 22.7s, p95: 30.1s, max_proc_keys: 613883, p95_proc_keys: 462536, tot_proc: 14m39.4s, tot_wait: 18.7s, rpc_num: 40, rpc_time: 15m8.5s, copr_cache: disabled}                                                         | data:HashAgg_5                                                                                                                                                                                  | 137.3 MB | N/A     |
|   └─HashAgg_5             | 14397287.20 | 17963641 | cop[tikv] |                | tikv_task:{proc max:19.2s, min:1.27s, p80:17.5s, p95:18.9s, iters:17591, tasks:40}, scan_detail: {total_process_keys: 17996609, total_keys: 17996649, rocksdb: {delete_skipped_count: 0, key_skipped_count: 17996609, block: {cache_hit_count: 57170, read_count: 0, read_byte: 0 Bytes}}} | group by:test.lineitem.l_partkey, test.lineitem.l_quantity, test.lineitem.l_suppkey, funcs:count(test.lineitem.l_extendedprice)->Column#19, funcs:sum(test.lineitem.l_extendedprice)->Column#20 | N/A      | N/A     |
|     └─TableFullScan_8     | 17996609.00 | 17996609 | cop[tikv] | table:lineitem | tikv_task:{proc max:10.9s, min:731ms, p80:9.91s, p95:10.7s, iters:17591, tasks:40}                                                                                                                                                                                                         | keep order:false, stats:pseudo                                                                                                                                                                  | N/A      | N/A     |
+---------------------------+-------------+----------+-----------+----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
4 rows in set (4 min 8.565 sec)

Memory usage in Grafana montior

Left is the old impelement, right is the new implement.
metrics

@wshwsh12 wshwsh12 mentioned this pull request Jul 2, 2021
5 tasks
@XuHuaiyu XuHuaiyu added the type/enhancement The issue or PR belongs to an enhancement. label Jul 8, 2021
Comment thread executor/aggregate.go Outdated
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 8, 2021
Comment thread executor/aggregate.go
return e.spillAction
}

const maxSpillTimes = 10
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for this

Comment thread executor/aggregate.go Outdated
Comment on lines +200 to +206
listInDisk *chunk.ListInDisk
lastChunkNum int
processIdx int
spillMode uint32
spillChunk *chunk.Chunk
spillAction *AggSpillDiskAction
childDrained bool
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need comments for these variables

Comment thread executor/aggregate.go Outdated
return err
}
}
if e.spillAction != nil {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useless.. I remove the code now.

Comment thread executor/aggregate.go Outdated
}
if e.listInDisk != nil {
if err := e.listInDisk.Close(); err != nil {
return err
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we close the chilrenExec? This may cause leaks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

Comment thread executor/aggregate.go Outdated
e.executed, e.childDrained = false, false
e.listInDisk = chunk.NewListInDisk(retTypes(e.children[0]))
e.spillChunk = newFirstChunk(e.children[0])
if e.ctx.GetSessionVars().TrackAggregateMemoryUsage {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to check this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tidb doesn't track aggregate executor memory usgae, should we also try to spill hashAgg when exceeded?
In addition, oom-use-tmp-storage also should be check... I add the check now. PTAL again.

Comment thread executor/aggregate.go Outdated
groupKey := string(e.groupKeyBuffer[j]) // do memory copy here, because e.groupKeyBuffer may be reused.
if !e.groupSet.Exist(groupKey) {
if atomic.LoadUint32(&e.spillMode) == 1 && e.groupSet.Count() > 0 {
e.spillChunk.Append(e.childResult, j, j+1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Chunk.sel to optimize this if-block?

  1. We can check e.groupSet.Exist(groupKey) and build the sel firstly, and then invoke e.spillChunk.Append based on the sel.
  2. Further, if len(sel) == len(e.childResult), we can invoke e.listInDisk.Add(e.childResult) directly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 12, 2021
@ti-chi-bot ti-chi-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 13, 2021
@wshwsh12 wshwsh12 requested a review from XuHuaiyu July 13, 2021 07:31
Comment thread executor/aggregate.go Outdated

// spill unprocessed data when exceeded.
if len(sel) > 0 {
err = e.spillUnprocessedData(sel)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input argument sel is useless?

e.childResult.SetSel(sel)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.childResult.SetSel(sel) will let len(sel) == len(e.childResult) always true, and e.listInDisk.Add(e.childResult) directly. If there are only a few elements in sel, it maybe have performance issue.
I remove the logic e.listInDisk.Add(e.childResult) and always append to tmpChkForSpill, PTAL

Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
listInDisk *chunk.ListInDisk // listInDisk is the chunks to store row values for spilling data.
lastChunkNum int // lastChunkNum indicates the num of spilling chunk.
processIdx int // processIdx indicates the num of processed chunk in disk.
spillMode uint32 // spillMode means that no new groups are added to hash table.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. isSpillModeSet?
  2. Add an explanation for what does 0 and 1 mean

Comment thread executor/aggregate.go Outdated
@wshwsh12 wshwsh12 requested a review from XuHuaiyu July 15, 2021 06:21
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Comment thread executor/aggregate.go Outdated
Co-authored-by: HuaiyuXu <xuhuaiyu@pingcap.com>
Copy link
Copy Markdown
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot
Copy link
Copy Markdown
Member

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • XuHuaiyu
  • mmyj

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 15, 2021
@XuHuaiyu
Copy link
Copy Markdown
Contributor

/merge

@ti-chi-bot
Copy link
Copy Markdown
Member

This pull request has been accepted and is ready to merge.

DetailsCommit hash: ffbcf52

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 15, 2021
@ti-chi-bot
Copy link
Copy Markdown
Member

@wshwsh12: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@wshwsh12
Copy link
Copy Markdown
Contributor Author

/run-unit-test

@ti-chi-bot ti-chi-bot merged commit 2412437 into pingcap:master Jul 15, 2021
@wshwsh12 wshwsh12 deleted the unparalleled-hash-agg branch January 29, 2022 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sig/execution SIG execution size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants