
br: Enable checkpoint advancer to pause tasks lagged too large #51441

Merged
14 commits into pingcap:master, Mar 13, 2024

Conversation

@RidRisR (Contributor) commented Mar 1, 2024

What problem does this PR solve?

Issue Number: close #50803

Problem Summary:

What changed and how does it work?

A new option, CheckPointLagLimit, is added to the advancer config. When it is set, the advancer checks on every tick whether the checkpoint is lagging too far behind. If it is, the advancer sends a warning to PD and pauses the task.
PS: In theory we also need a resume signal for when the task can be resumed, but there is currently no channel to send it. We may add one in the future.
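A minimal sketch of the per-tick check described above (every identifier here is an illustrative assumption, not the actual advancer API):

```go
// Sketch only: shows the shape of the per-tick check -- compare the
// checkpoint lag against the configured limit and pause the task when
// the limit is exceeded. Not the real br advancer implementation.
package sketch

import (
	"fmt"
	"time"
)

type advancer struct {
	checkPointLagLimit time.Duration // 0 means the check is disabled
	lastCheckpointTime time.Time     // physical time of the current checkpoint
	taskName           string
}

// checkLagOnTick would be called from the advancer's tick loop; pause stands
// in for whatever call actually pauses the log backup task (e.g. via PD).
func (a *advancer) checkLagOnTick(pause func(task string) error) error {
	if a.checkPointLagLimit <= 0 {
		return nil // feature not enabled
	}
	lag := time.Since(a.lastCheckpointTime)
	if lag <= a.checkPointLagLimit {
		return nil
	}
	fmt.Printf("checkpoint lag %v exceeds limit %v, pausing task %q\n",
		lag, a.checkPointLagLimit, a.taskName)
	return pause(a.taskName)
}
```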

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

A new option, `CheckPointLagLimit`, is added to the advancer config. When it is set, the advancer checks on every tick whether the checkpoint is lagging too far behind. If it is, the advancer sends a warning to PD and pauses the task.


tiprow bot commented Mar 1, 2024

Hi @RidRisR. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RidRisR RidRisR marked this pull request as draft March 1, 2024 03:31
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress and do-not-merge/needs-triage-completed labels and removed the do-not-merge/needs-linked-issue label Mar 1, 2024
@RidRisR RidRisR changed the title Advancer Enable advancer to pluge tasks lagged too large Mar 5, 2024
@RidRisR RidRisR changed the title Enable advancer to pluge tasks lagged too large Enable checkpoint advancer to pluge tasks lagged too large Mar 5, 2024
@RidRisR RidRisR changed the title Enable checkpoint advancer to pluge tasks lagged too large br: Enable checkpoint advancer to pluge tasks lagged too large Mar 5, 2024
@RidRisR RidRisR marked this pull request as ready for review March 5, 2024 06:28
        return
    case resp, ok := <-pauseCh:
        if !ok || !handleResponse(resp) {
            pauseCh = nil

Nit: I think we'd better bring the default branch back and remove the if. Even though in this context the ctx should be canceled and the channel must be closed, unbounded blocking in a cleanup path is somewhat dangerous -- it may lead to unexpected leakage or prevent the program from exiting.
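One possible reading of this suggestion, as a self-contained sketch: keep a `default` branch so the cleanup path drains whatever is immediately available and never blocks. The `pauseCh` and `handleResponse` names mirror the snippet above, but everything here is an assumption, not the actual PR code.

```go
// Sketch only: a non-blocking drain of pauseCh in a cleanup path.
// The response type and the handleResponse signature are placeholders.
package sketch

import "context"

type response struct{}

func drainPauseCh(ctx context.Context, pauseCh <-chan response, handleResponse func(response) bool) {
	for {
		select {
		case <-ctx.Done():
			return
		case resp, ok := <-pauseCh:
			if !ok || !handleResponse(resp) {
				return
			}
			// Handled one response; loop to see if more are pending.
		default:
			// Nothing pending right now: return instead of blocking forever.
			return
		}
	}
}
```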

@ti-chi-bot ti-chi-bot bot merged commit 7548df7 into pingcap:master Mar 13, 2024
30 of 35 checks passed
@3pointer (Contributor)

> LGTM. Note: for now it seems this function is disabled by default. Currently the TiDB configuration cannot be passed to the advancer, which means this function won't be usable in a TiDB binary. Should we enable it with a safe enough default value, say, 72h or even longer? cc @3pointer @BornChanger

I think we should enable it with a default value; 3 days is safe enough, and we can't let log backup block GC for too long.
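For illustration only, a 3-day default along the lines discussed above might be declared like this (the constant name and package are assumptions of this sketch, not the code that was actually merged):

```go
// Hypothetical sketch: a 72h (3-day) default for the checkpoint lag limit,
// matching the value discussed above. Name and placement are assumptions.
package sketch

import "time"

const defaultCheckPointLagLimit = 72 * time.Hour
```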

@BornChanger (Contributor)

/type cherry-pick-for-release-6,5


ti-chi-bot bot commented Mar 25, 2024

@BornChanger: The label(s) type/cherry-pick-for-release-6,5 cannot be applied, because the repository doesn't have them.

In response to this:

/type cherry-pick-for-release-6,5

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@BornChanger (Contributor)

/type cherry-pick-for-release-6.5

@BornChanger (Contributor)

/type cherry-pick-for-release-7.1

@BornChanger (Contributor)

/type cherry-pick-for-release-7.5

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 26, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-6.5: #52105.

@BornChanger (Contributor)

/label needs-cherry-pick-release-7.1

@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-7.1: #52554.

@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-7.5: #52555.

Successfully merging this pull request may close these issues.

Circuit breaker log backup task when checkpoint ts does not advance for a long time
6 participants