sink(cdc): fix internal retry algorithm #9530
Signed-off-by: qupeng <qupeng@pingcap.com>
/test verify
/test cdc-integration-mysql-test
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: asddongmen, CharlesCheung96
In response to a cherrypick label: new pull request created to branch |
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
/need-cherry-pick-release-6.5
close pingcap#9518 Signed-off-by: qupeng <qupeng@pingcap.com>
What problem does this PR solve?
Issue Number: close #9518
What is changed and how it works?
Term Definition
Prior to this PR, if the DDLSink or DMLSink returned the same error continuously, the SinkManager would retry it for 30 minutes. If a different error occurred, the SinkManager would reset the retry time counter. Once the 30 minutes were up, the SinkManager would report an UnretryableError to the owner and the changefeed would fail immediately. However, the SinkManager couldn't accurately determine whether the error returned by the DDLSink or DMLSink was the same error (it did not reset the retry time counter correctly), leading to incorrect UnretryableError reports and unexpected changefeed failures.
With this PR, if the DDLSink or DMLSink returns the same error continuously, the SinkManager will continue retrying indefinitely but will report a warning to the owner. Since the owner has a broader view of the changefeed, it can determine whether the error reported by the sinkManager is the same error. This PR takes the checkpoint into account to make the decision. The underlying logic is simple: if the checkpoint has advanced since the last error was reported, the subsequent error is not the same as the previous error. Thus, the retry time is reset to ensure that the subsequent error will also be retried for 30 minutes.
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note