puller(ticdc): fix wrong update splitting behavior after table scheduling #11269

Merged
merged 12 commits into from
Jun 7, 2024

Conversation

lidezhu
Collaborator

@lidezhu lidezhu commented Jun 6, 2024

What problem does this PR solve?

Issue Number: close #11219

What is changed and how it works?

#11030 introduced a mechanism that fetches the current timestamp, thresholdTS, from PD when a changefeed starts, and splits every update KV entry whose commitTS is smaller than thresholdTS.

This mechanism has the following problem:

  1. There are two CDC nodes, A and B, and B starts before A, so thresholdTSB < thresholdTSA.
  2. The sync task for table t runs on A first.
  3. Table t has an update event whose commitTS is smaller than thresholdTSA and larger than thresholdTSB, so node A splits the update event into a delete event and an insert event.
  4. The delete event and the insert event cannot be sent downstream atomically. If the table sync task is scheduled to node B after the delete event has been sent downstream but before the insert event is sent, node B receives the update event again.
  5. Node B does not split the update event because its commitTS is larger than thresholdTSB, so node B sends a plain update SQL statement downstream, which causes data inconsistency.
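
The five steps above can be replayed in a small simulation. This is only a sketch under simplifying assumptions: the downstream is modeled as an in-memory map, and `updateEvent`, `shouldSplit`, `applyUpdate`, and `simulate` are hypothetical names for illustration, not identifiers from the TiCDC code base. The timestamps are arbitrary values chosen so that thresholdTSB < commitTS < thresholdTSA.

```go
package main

import "fmt"

// updateEvent is a hypothetical, simplified row update keyed by primary key.
type updateEvent struct {
	commitTS uint64
	key      string
	newValue string
}

// shouldSplit mirrors the rule described above: an update is split into
// delete + insert when its commitTS is smaller than the node's thresholdTS.
func shouldSplit(commitTS, thresholdTS uint64) bool {
	return commitTS < thresholdTS
}

// applyUpdate models a plain UPDATE statement: it changes a row only if the
// row already exists downstream.
func applyUpdate(downstream map[string]string, ev updateEvent) {
	if _, ok := downstream[ev.key]; ok {
		downstream[ev.key] = ev.newValue
	}
}

// simulate replays the scenario and returns the final downstream state.
func simulate() map[string]string {
	const thresholdTSA, thresholdTSB = 200, 100 // B started before A
	downstream := map[string]string{"k": "old"}
	ev := updateEvent{commitTS: 150, key: "k", newValue: "new"}

	// Node A splits the update but only the delete reaches the downstream
	// before the table is scheduled to node B.
	if shouldSplit(ev.commitTS, thresholdTSA) {
		delete(downstream, ev.key)
	}

	// Node B receives the same update again. 150 >= thresholdTSB, so it is
	// not split and is sent as a plain UPDATE.
	if shouldSplit(ev.commitTS, thresholdTSB) {
		delete(downstream, ev.key)
		downstream[ev.key] = ev.newValue
	} else {
		applyUpdate(downstream, ev)
	}
	return downstream
}

func main() {
	_, ok := simulate()["k"]
	fmt.Println("row k present downstream:", ok)
}
```

In this model the plain UPDATE on node B matches no row, so the row stays missing downstream, which is the inconsistency described in step 5.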

Another point to note: after scheduling, node B sends downstream some events that node A has already sent, so node B must send these events idempotently.
Previously, the sink module handled this by obtaining a replicateTS when the sink started and splitting the events whose commitTS was smaller than replicateTS. That mechanism was also removed in #11030, so the puller needs to handle this case too.

In this PR, instead of maintaining a separate thresholdTS in sourcemanager, the puller gets replicateTS from the sink whenever it needs to decide whether to split an update event.
Because the puller module starts working before the sink module, replicateTS defaults to MaxUint64, which means all update events are split. Once the sink starts working, replicateTS is set to the correct value.
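
A minimal sketch of the default-value idea follows. It assumes the replicateTS is shared through an atomic value; `sinkState`, `newSinkState`, `start`, and `shouldSplitUpdate` are hypothetical names for illustration and are not taken from the actual change.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// sinkState is a hypothetical stand-in for the sink. It exposes the
// replicateTS that the puller consults when deciding whether to split.
type sinkState struct {
	replicateTS atomic.Uint64
}

// newSinkState initializes replicateTS to MaxUint64, so every update is
// split until the sink has actually started.
func newSinkState() *sinkState {
	s := &sinkState{}
	s.replicateTS.Store(math.MaxUint64)
	return s
}

// start records the real replicateTS once the sink begins working.
func (s *sinkState) start(ts uint64) {
	s.replicateTS.Store(ts)
}

// shouldSplitUpdate reports whether an update with the given commitTS
// should be split into a delete event and an insert event.
func (s *sinkState) shouldSplitUpdate(commitTS uint64) bool {
	return commitTS < s.replicateTS.Load()
}

func main() {
	s := newSinkState()
	fmt.Println(s.shouldSplitUpdate(150)) // sink not started yet: split
	s.start(100)
	fmt.Println(s.shouldSplitUpdate(150)) // newer than replicateTS: keep as update
	fmt.Println(s.shouldSplitUpdate(50))  // older than replicateTS: split
}
```

Defaulting to the maximum value errs on the side of splitting: an update that is split unnecessarily is still applied correctly, whereas an update that should have been split but was not can cause the inconsistency described earlier.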

One last point: when the sink restarts because of an error, it may send downstream some events that were already sent before the restart. These events also need to be sent idempotently. However, they are already in the sorter, so restarting only the sink cannot achieve this. This PR therefore forbids restarting the sink and restarts the changefeed when an error occurs.

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. Deploy a cluster with three CDC nodes.
  2. Kill all nodes occasionally while running a workload, and check whether the data is consistent.

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/cherry-pick-not-approved The current cherry-pick pull request has not been approved and cannot be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2024
@lidezhu lidezhu changed the title puller(ticdc): fix split update puller(ticdc): fix wrong update split behavior after scheduling Jun 6, 2024
lidezhu and others added 5 commits June 7, 2024 00:51
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
Signed-off-by: dongmen <414110582@qq.com>
@lidezhu lidezhu force-pushed the fix-split0606 branch 2 times, most recently from 26b99b2 to 6a88c7c Compare June 7, 2024 03:46
@lidezhu
Collaborator Author

lidezhu commented Jun 7, 2024

/test dm-integration-test

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jun 7, 2024
Contributor

ti-chi-bot bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen, CharlesCheung96

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CharlesCheung96,asddongmen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

ti-chi-bot bot commented Jun 7, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-06-07 05:22:56.995042025 +0000 UTC m=+96531.048353950: ☑️ agreed by CharlesCheung96.
  • 2024-06-07 05:24:46.429412549 +0000 UTC m=+96640.482724473: ☑️ agreed by asddongmen.

@lidezhu
Collaborator Author

lidezhu commented Jun 7, 2024

/test dm-integration-test

@lidezhu
Collaborator Author

lidezhu commented Jun 7, 2024

/test dm-integration-test

@lidezhu
Collaborator Author

lidezhu commented Jun 7, 2024

/test dm-integration-test

@lidezhu
Collaborator Author

lidezhu commented Jun 7, 2024

/test cdc-integration-pulsar-test

@ti-chi-bot ti-chi-bot added the cherry-pick-approved Cherry pick PR approved by release team. label Jun 7, 2024
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/cherry-pick-not-approved The current cherry-pick pull request has not been approved and cannot be merged. label Jun 7, 2024
@ti-chi-bot ti-chi-bot bot merged commit 7c968ee into pingcap:release-7.5 Jun 7, 2024
12 of 13 checks passed
@lidezhu lidezhu deleted the fix-split0606 branch June 7, 2024 13:32
@CharlesCheung96 CharlesCheung96 added needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. labels Jun 11, 2024
@ti-chi-bot
Member

In response to a cherrypick label: base branch (release-7.5) needs to differ from target branch (release-7.5).

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 11, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #11281.

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #11282.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 11, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #11283.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 11, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
lidezhu pushed a commit to lidezhu/tiflow that referenced this pull request Jun 11, 2024
hicqu added a commit to ti-chi-bot/tiflow that referenced this pull request Jun 12, 2024
commit c092599
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 12 00:26:59 2024 +0800

    pkg/config, sink(ticdc): support output raw change event for mq and cloud storage sink (pingcap#11226) (pingcap#11290)

    close pingcap#11211

commit 3426e46
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 19:40:29 2024 +0800

    puller(ticdc): fix wrong update splitting behavior after table scheduling (pingcap#11269) (pingcap#11282)

    close pingcap#11219

commit 2a28078
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:37 2024 +0800

    mysql(ticdc): remove error filter when check isTiDB in backend init (pingcap#11214) (pingcap#11261)

    close pingcap#11213

commit 2425d54
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 16:40:30 2024 +0800

    log(ticdc): Add more error query information to the returned error to facilitate users to know the cause of the failure (pingcap#10945) (pingcap#11257)

    close pingcap#11254

commit 053cdaf
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 15:34:30 2024 +0800

    cdc: log slow conflict detect every 60s (pingcap#11251) (pingcap#11287)

    close pingcap#11271

commit 327ba7b
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Tue Jun 11 11:42:00 2024 +0800

    redo(ticdc): return internal error in redo writer (pingcap#11011) (pingcap#11091)

    close pingcap#10124

commit d82ae89
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Mon Jun 10 22:28:29 2024 +0800

    ddl_puller (ticdc): handle dorp pk/uk ddl correctly (pingcap#10965) (pingcap#10981)

    close pingcap#10890

commit f15bec9
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Fri Jun 7 16:16:28 2024 +0800

    redo(ticdc): enable pprof and set memory limit for redo applier (pingcap#10904) (pingcap#10996)

    close pingcap#10900

commit ba50a0e
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 19:58:26 2024 +0800

    test(ticdc): enable sequence test (pingcap#11023) (pingcap#11037)

    close pingcap#11015

commit 94b9897
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 17:08:56 2024 +0800

    mounter(ticdc): timezone fill default value should also consider tz. (pingcap#10932) (pingcap#10946)

    close pingcap#10931

commit a912d33
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed Jun 5 10:49:25 2024 +0800

    mysql (ticdc): Improve the performance of the mysql sink by refining the transaction event batching logic (pingcap#10466) (pingcap#11242)

    close pingcap#11241

commit 6277d9a
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 20:13:22 2024 +0800

    kvClient (ticdc): revert e5999e3 to remove useless metrics (pingcap#11184)

    close pingcap#11073

commit 54e93ed
Author: dongmen <20351731+asddongmen@users.noreply.github.com>
Date:   Wed May 29 17:43:22 2024 +0800

    syncpoint (ticdc): make syncpoint support base64 encoded password (pingcap#11162)

    close pingcap#10516

commit 0ba9329
Author: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Date:   Wed May 29 09:07:21 2024 +0800

    (redo)ticdc: fix the event orderliness in redo log (pingcap#11117) (pingcap#11180)

    close pingcap#11096

Signed-off-by: qupeng <qupeng@pingcap.com>
Labels
approved cherry-pick-approved Cherry pick PR approved by release team. lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants