*: Support to trigger GC when memory usage is large. #38179

Merged
merged 22 commits into from Oct 12, 2022

Conversation

wshwsh12
Contributor

@wshwsh12 wshwsh12 commented Sep 26, 2022

What problem does this PR solve?

Issue Number: ref #37816

Problem Summary:

What is changed and how it works?

Add the system variable tidb_server_memory_limit_gc_trigger, and trigger GC when memory usage exceeds tidb_server_memory_limit_gc_trigger * tidb_server_memory_limit.
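
To make the threshold concrete, here is a minimal sketch of the arithmetic described above. The function and parameter names are illustrative and not taken from the PR, and the ratios are example values only; `debug.SetMemoryLimit` is the standard Go runtime call that the reviewed code uses.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gcTriggerThreshold returns the memory usage (in bytes) above which GC should
// be triggered: serverMemoryLimit * gcTriggerRatio.
// Both parameter names are hypothetical; they mirror tidb_server_memory_limit
// and tidb_server_memory_limit_gc_trigger.
func gcTriggerThreshold(serverMemoryLimit uint64, gcTriggerRatio float64) int64 {
	return int64(float64(serverMemoryLimit) * gcTriggerRatio)
}

func main() {
	limit := uint64(1 << 30) // assume a 1 GiB server memory limit
	soft := gcTriggerThreshold(limit, 0.5)

	// SetMemoryLimit installs the soft limit and returns the previous one,
	// which is put back afterwards so the sketch has no lasting effect.
	prev := debug.SetMemoryLimit(soft)
	fmt.Println("soft limit in bytes:", soft)
	debug.SetMemoryLimit(prev)
}
```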

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

ti-chi-bot commented Sep 26, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • XuHuaiyu
  • hawkingrei

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 26, 2022
@wshwsh12 wshwsh12 mentioned this pull request Sep 28, 2022
9 tasks
@wshwsh12 wshwsh12 marked this pull request as ready for review September 28, 2022 05:44
@wshwsh12 wshwsh12 requested a review from a team as a code owner September 28, 2022 05:44
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 28, 2022
@ti-chi-bot ti-chi-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 28, 2022
// So we can change memory limit dynamically to avoid frequent GC when memory usage is greater than the soft limit.
type memoryLimitTuner struct {
    finalizer *finalizer
    isTuning  atomic.Bool

Why not use atomicutil.Bool?
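
For context on what such a flag is used for, here is a hedged sketch of a compare-and-swap guard. The thread does not show which package `atomic.Bool` is imported from; this sketch assumes the standard library's `sync/atomic` (Go 1.19+), and `resetGuard` with its methods is a hypothetical type, not code from the PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// resetGuard is a hypothetical type illustrating a CompareAndSwap guard.
type resetGuard struct {
	waitingReset atomic.Bool
}

// tryBegin flips the flag from false to true and reports whether this caller
// was the one that flipped it. Concurrent callers cannot both succeed.
func (g *resetGuard) tryBegin() bool {
	return g.waitingReset.CompareAndSwap(false, true)
}

// finish clears the flag so that a later reset can begin.
func (g *resetGuard) finish() {
	g.waitingReset.Store(false)
}

func main() {
	var g resetGuard
	fmt.Println(g.tryBegin()) // the first caller wins
	fmt.Println(g.tryBegin()) // the flag is already set
	g.finish()
	fmt.Println(g.tryBegin()) // available again after finish
}
```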

debug.SetMemoryLimit(softLimit)
}

func (t *memoryLimitTuner) calcSoftMemoryLimit() int64 {

Suggested change
- func (t *memoryLimitTuner) calcSoftMemoryLimit() int64 {
+ func (t *memoryLimitTuner) calcMemoryLimit() int64 {

ratio := float64(100+gogc) / 100
// If theoretical NextGC(Equivalent to HeapInUse * (100 + GOGC) / 100) is bigger than MemoryLimit twice in a row,
// the second GC is caused by MemoryLimit.
if r.HeapInuse > uint64(float64(debug.SetMemoryLimit(-1))/ratio) {

Is r.HeapInuse * ratio more readable according to the comment above?


We need to explain why "twice". It's really hard to understand.
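
As a sketch of the multiplication form suggested above: the helper below compares the theoretical NextGC, `HeapInUse * (100 + GOGC) / 100`, directly against the limit. The function name is hypothetical. Doing the multiplication in `float64` is one way to avoid integer overflow when the limit is very large; it is an assumption of this sketch, not something the PR states.

```go
package main

import "fmt"

// nextGCExceedsLimit reports whether the theoretical NextGC,
// heapInUse * (100 + gogc) / 100, is larger than memoryLimit.
// It is equivalent to heapInUse > memoryLimit / ratio, but reads in the
// same order as the comment in the reviewed code.
func nextGCExceedsLimit(heapInUse uint64, gogc int, memoryLimit int64) bool {
	ratio := float64(100+gogc) / 100
	return float64(heapInUse)*ratio > float64(memoryLimit)
}

func main() {
	const limit = 1 << 30 // 1 GiB
	fmt.Println(nextGCExceedsLimit(600<<20, 100, limit)) // 1200 MiB against 1024 MiB
	fmt.Println(nextGCExceedsLimit(400<<20, 100, limit)) // 800 MiB against 1024 MiB
}
```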


memory210mb := allocator.alloc(210 << 20)
require.True(t, gcNum < getNowGCNum())
// Test waiting for reset

Do we need to check whether waitingReset is true here?


allocator.free(memory210mb)
allocator.free(memory100mb)
// Can GC in 80% again

What does "Can GC" mean?

gcNum = getNowGCNum()
memory210mb = allocator.alloc(210 << 20)
time.Sleep(100 * time.Millisecond)
require.True(t, gcNum < getNowGCNum())

Is this GC triggered by memory_limit?
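
The `getNowGCNum` helper used in these assertions is not shown in the thread. A plausible stand-in, assuming it reads the runtime's completed-GC counter, is sketched below; `gcCount`, `gcRanDuring`, and `forceGC` are hypothetical names. Note that a counter like this only shows that some GC ran, not what triggered it, which is the point of the question above.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcCount returns the number of completed GC cycles.
// It is a hypothetical equivalent of the test's getNowGCNum helper.
func gcCount() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

// gcRanDuring reports whether at least one GC cycle completed while f ran.
func gcRanDuring(f func()) bool {
	before := gcCount()
	f()
	return gcCount() > before
}

// forceGC runs a full collection; runtime.GC blocks until it has completed.
func forceGC() {
	runtime.GC()
}

func main() {
	fmt.Println(gcRanDuring(forceGC))
}
```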

require.True(t, gcNum < getNowGCNum())
allocator.free(memory210mb)
allocator.free(memory600mb)
time.Sleep(1 * time.Second) // If test.count > 1, wait tuning finished.

Should we check isTuning, waitingReset, and times after this Sleep?

if t.times >= 2 && t.waitingReset.CompareAndSwap(false, true) {
    t.times = 0
    go func() {
        debug.SetMemoryLimit(math.MaxInt)

MaxInt or MaxInt64?
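
Some background on the distinction, as a sketch rather than a statement about the PR's final code: `debug.SetMemoryLimit` takes an `int64`, so `math.MaxInt64` matches its parameter on every platform, whereas `math.MaxInt` is only 2^31-1 on 32-bit platforms and would still act as a real limit of about 2 GiB there. The helper names below are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
)

// liftMemoryLimit removes the soft memory limit and returns the previous one.
// math.MaxInt64 is used because SetMemoryLimit takes an int64.
func liftMemoryLimit() int64 {
	return debug.SetMemoryLimit(math.MaxInt64)
}

// currentMemoryLimit reads the limit without changing it: a negative
// argument makes SetMemoryLimit a pure query.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

// restoreMemoryLimit puts back a previously saved limit.
func restoreMemoryLimit(limit int64) {
	debug.SetMemoryLimit(limit)
}

func main() {
	prev := liftMemoryLimit()
	fmt.Println(currentMemoryLimit() == math.MaxInt64)
	restoreMemoryLimit(prev)
}
```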

Comment on lines 52 to 58
// If the theoretical NextGC (equivalent to HeapInUse * (100 + GOGC) / 100) is bigger than MemoryLimit twice in a row,
// the second GC is caused by MemoryLimit.
// All GCs fall into one of the following three cases:
// 1. Normally, HeapInUse * (100 + GOGC) / 100 < MemoryLimit, so NextGC = HeapInUse * (100 + GOGC) / 100.
// 2. The first time HeapInUse * (100 + GOGC) / 100 >= MemoryLimit, NextGC = MemoryLimit. But this GC is triggered by GOGC.
// 3. The second time HeapInUse * (100 + GOGC) / 100 >= MemoryLimit. This GC is triggered by MemoryLimit.
// We set MemoryLimit to MaxInt, so the NextGC will be HeapInUse * (100 + GOGC) / 100 again.

Suggested change
- // If the theoretical NextGC (equivalent to HeapInUse * (100 + GOGC) / 100) is bigger than MemoryLimit twice in a row,
- // the second GC is caused by MemoryLimit.
- // All GCs fall into one of the following three cases:
- // 1. Normally, HeapInUse * (100 + GOGC) / 100 < MemoryLimit, so NextGC = HeapInUse * (100 + GOGC) / 100.
- // 2. The first time HeapInUse * (100 + GOGC) / 100 >= MemoryLimit, NextGC = MemoryLimit. But this GC is triggered by GOGC.
- // 3. The second time HeapInUse * (100 + GOGC) / 100 >= MemoryLimit. This GC is triggered by MemoryLimit.
- // We set MemoryLimit to MaxInt, so the NextGC will be HeapInUse * (100 + GOGC) / 100 again.
+ This `if` checks, as far as possible, whether the **last** GC was triggered by MemoryLimit.
+ If the **last** GC was triggered by MemoryLimit, we set MemoryLimit to MAXVALUE to hand control back to GOGC, which avoids frequent GC when memory usage fluctuates around MemoryLimit.
+ We judge whether the **last** GC was triggered by MemoryLimit as follows:
+ suppose `NextGC` = `HeapInUse * (100 + GOGC) / 100`,
+ - If NextGC < MemoryLimit, the **next** GC will **not** be triggered by MemoryLimit, so we do not care why the **last** GC was triggered, and MemoryLimit will not be reset this time.
+ - Only if NextGC >= MemoryLimit will the **next** GC be triggered by MemoryLimit. In that case we need to reset MemoryLimit after the next GC happens, if needed.
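
The "twice in a row" rule discussed in this thread can be sketched as a small counter. This is a simplified, hypothetical stand-in for `memoryLimitTuner`, not the PR's code: it assumes the counter goes back to zero whenever the theoretical NextGC falls below the limit, and it omits the `waitingReset` guard and the goroutine that later restores the limit.

```go
package main

import "fmt"

// tuner is a hypothetical, simplified stand-in for memoryLimitTuner.
type tuner struct {
	times int // consecutive GCs whose theoretical NextGC reached the limit
}

// observeGC is meant to be called after each GC cycle. It returns true when
// the theoretical NextGC has reached memoryLimit twice in a row, which the
// thread treats as evidence that the last GC was triggered by MemoryLimit.
func (t *tuner) observeGC(heapInUse uint64, gogc int, memoryLimit int64) bool {
	nextGC := float64(heapInUse) * float64(100+gogc) / 100
	if nextGC < float64(memoryLimit) {
		t.times = 0 // assumption: the streak is broken, so start counting again
		return false
	}
	t.times++
	if t.times >= 2 {
		t.times = 0
		return true // the caller would now lift MemoryLimit for a while
	}
	return false
}

func main() {
	tn := &tuner{}
	const limit = 1 << 30 // 1 GiB
	for _, heap := range []uint64{400 << 20, 600 << 20, 600 << 20} {
		fmt.Println(tn.observeGC(heap, 100, limit))
	}
}
```

With GOGC at 100 and a 1 GiB limit, the first observation (800 MiB theoretical NextGC) stays below the limit, the second (1200 MiB) starts the streak, and the third completes it.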

Comment on lines 104 to 107
memoryLimit := t.calcMemoryLimit()
if EnableGOGCTuner.Load() {
    memoryLimit = math.MaxInt64
}

Suggested change
- memoryLimit := t.calcMemoryLimit()
- if EnableGOGCTuner.Load() {
-     memoryLimit = math.MaxInt64
- }
+ memoryLimit := int64(math.MaxInt64) // explicit int64 so the assignment below type-checks
+ if !EnableGOGCTuner.Load() {
+     memoryLimit = t.calcMemoryLimit()
+ }

debug.SetMemoryLimit(math.MaxInt)
resetInterval := 1 * time.Minute // Wait 1 minute and set back, to avoid frequent GC
failpoint.Inject("testMemoryLimitTuner", func(val failpoint.Value) {
    if val, ok := val.(bool); val && ok {

Why check ok?

Contributor Author

For the linter: type assertions must be checked (forcetypeassert).
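
As an illustration of the checked form: the two-value type assertion never panics, while the single-value form `val.(bool)` panics when the dynamic type is not `bool`. The function below is a hypothetical example, not the failpoint callback from the PR.

```go
package main

import "fmt"

// asEnabled returns true only when val holds the bool value true.
// The comma-ok form yields ok == false, instead of panicking, when val
// holds some other type or is nil.
func asEnabled(val interface{}) bool {
	enabled, ok := val.(bool)
	return ok && enabled
}

func main() {
	fmt.Println(asEnabled(true))   // a bool holding true
	fmt.Println(asEnabled("true")) // a string, so the assertion fails safely
	fmt.Println(asEnabled(nil))    // nil, so the assertion fails safely
}
```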

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 11, 2022
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 11, 2022
@wshwsh12
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 0c39f03

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Oct 12, 2022
@ti-chi-bot ti-chi-bot merged commit 37d4357 into pingcap:master Oct 12, 2022
@sre-bot
Contributor

sre-bot commented Oct 12, 2022

TiDB MergeCI notify

🔴 Bad News! New failing [1] after this pr merged.
These new failed integration tests seem to be caused by the current PR, please try to fix these new failed integration tests, thanks!

| CI Name | Result | Duration | Compare with Parent commit |
| --- | --- | --- | --- |
| idc-jenkins-ci-tidb/integration-ddl-test | 🟥 failed 1, success 5, total 6 | 10 min | New failing |
| idc-jenkins-ci/integration-cdc-test | ✅ all 37 tests passed | 25 min | Fixed |
| idc-jenkins-ci-tidb/common-test | 🟢 all 11 tests passed | 16 min | Existing passed |
| idc-jenkins-ci-tidb/integration-common-test | 🟢 all 17 tests passed | 15 min | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-1 | 🟢 all 26 tests passed | 7 min 18 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-2 | 🟢 all 28 tests passed | 7 min 7 sec | Existing passed |
| idc-jenkins-ci-tidb/tics-test | 🟢 all 1 tests passed | 6 min 39 sec | Existing passed |
| idc-jenkins-ci-tidb/mybatis-test | 🟢 all 1 tests passed | 3 min 38 sec | Existing passed |
| idc-jenkins-ci-tidb/integration-compatibility-test | 🟢 all 1 tests passed | 2 min 58 sec | Existing passed |
| idc-jenkins-ci-tidb/plugin-test | 🟢 build success, plugin test success | 4 min | Existing passed |

Labels
release-note-none size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

5 participants