
CDC cloud: [CDC:ErrMetaListDatabases][tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2021-08-05 16:54:32.127 +0800 CST, GC safe point is 2021-08-05 17:25:38.127 +0800 CST"] #2470

Closed
Tammyxia opened this issue Aug 5, 2021 · 9 comments · Fixed by #2512
Assignees
Labels
area/ticdc Issues or PRs related to TiCDC. severity/major This is a major bug. type/bug This is a bug.

Comments

@Tammyxia

Tammyxia commented Aug 5, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error.
     • Create a new changefeed:

     (screenshot)

  2. What did you expect to see?
     • No error
  3. What did you see instead?
     • Too many logs:
       (screenshot)
  4. Versions of the cluster

     • Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

       4.0.14

     • TiCDC version (execute cdc version):

       4.0.14 cloud Aug/03
@Tammyxia Tammyxia added type/bug This is a bug. severity/major This is a major bug. labels Aug 5, 2021
@3AceShowHand 3AceShowHand added this to Need Triage in Question and Bug Reports via automation Aug 5, 2021
@amyangfei
Contributor

Could you provide the cdc log and pd log? We can check the service GC safepoint and the GC safepoint from these logs. @Tammyxia

@Tammyxia
Author

Tammyxia commented Aug 6, 2021

I cannot upload logs here; you know the reason. I'll send them to you in a private message...

@Tammyxia
Author

Tammyxia commented Aug 6, 2021

This issue caused the changefeed to stop syncing data; when the gc-ttl timed out, the changefeed status became failed.

@Tammyxia Tammyxia added severity/critical This is a critical bug. and removed severity/major This is a major bug. labels Aug 6, 2021
@amyangfei
Contributor

[2021/08/05 16:15:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426809947095564288]
[2021/08/05 16:25:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810104381964288]
[2021/08/05 16:35:38.177 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810261668364288]
[2021/08/05 16:45:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810418954764288]
[2021/08/05 16:54:06.378 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628240046] [safepoint=18446744073709551615]
[2021/08/05 16:54:06.382 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628240046] [safepoint=18446744073709551615]
[2021/08/05 16:54:06.387 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628240046] [safepoint=18446744073709551615]
[2021/08/05 16:54:06.449 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628240046] [safepoint=18446744073709551615]
[2021/08/05 16:54:32.151 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc-creating-cc-task-1] [expire-at=1628154272] [safepoint=426810716226060299]
[2021/08/05 16:54:48.856 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc-creating-cc-task-1] [expire-at=1628154288] [safepoint=426810716226060299]
[2021/08/05 16:55:38.172 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810576241164288]
[2021/08/05 17:05:38.230 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810733527564288]
[2021/08/05 17:15:38.169 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810890813964288]
[2021/08/05 17:25:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811048100364288]
[2021/08/05 17:35:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811205386764288]
[2021/08/05 17:45:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811362673164288]
[2021/08/05 17:55:38.166 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811519959564288]
[2021/08/05 18:05:38.165 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811677245964288]
[2021/08/05 18:15:38.165 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811834532364288]
[2021/08/05 18:25:38.165 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426811991818764288]
[2021/08/05 18:35:38.167 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426812149105164288]
[2021/08/05 18:45:38.172 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426812306391564288]
[2021/08/05 18:46:13.791 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628246773] [safepoint=18446744073709551615]
[2021/08/05 18:47:13.789 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=ticdc] [expire-at=1628246833] [safepoint=18446744073709551615]

Notice these logs:

[2021/08/05 16:55:38.172 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810576241164288]
[2021/08/05 17:05:38.230 +08:00] [INFO] [grpc_service.go:815] ["update service GC safe point"] [service-id=gc_worker] [expire-at=9223372036854775807] [safepoint=426810733527564288]

[2021/08/05 16:55:38.172 +08:00] gc_worker tso=426810576241164288, system: 2021-08-05 16:45:38.127 +0800 CST
[2021/08/05 17:05:38.230 +08:00] gc_worker tso=426810733527564288, system: 2021-08-05 16:55:38.127 +0800 CST
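The physical timestamps in the annotation above can be recovered from the TSOs directly: a TiDB TSO packs milliseconds since the Unix epoch in the high bits and an 18-bit logical counter in the low bits. A minimal sketch of the conversion (the function name is illustrative, not a TiCDC API):

```go
package main

import (
	"fmt"
	"time"
)

// physicalMs extracts the physical clock part of a TiDB TSO:
// the high bits are milliseconds since the Unix epoch, the low
// 18 bits are the logical counter.
func physicalMs(tso uint64) int64 {
	return int64(tso >> 18)
}

func main() {
	cst := time.FixedZone("CST", 8*3600)
	// The two gc_worker safepoints highlighted above.
	for _, tso := range []uint64{426810576241164288, 426810733527564288} {
		t := time.UnixMilli(physicalMs(tso)).In(cst)
		fmt.Printf("tso=%d physical=%s\n", tso, t.Format("2006-01-02 15:04:05.000 -0700"))
	}
	// Prints 2021-08-05 16:45:38.127 +0800 and 16:55:38.127 +0800,
	// matching the "system:" times in the comment.
}
```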

@amyangfei
Contributor

amyangfei commented Aug 6, 2021

Some TiCDC owner logs:

The "add a table" logs, first and last; 60010 tables in total:
[2021/08/05 16:54:58.836 +08:00] [INFO] [owner.go:268] ["add a table"] [tableId=167822] [boundaryTs=426810716226060299]
[2021/08/05 16:57:24.124 +08:00] [INFO] [owner.go:268] ["add a table"] [tableId=187326] [boundaryTs=426810716226060299]

Adding the tables in the owner took 146s.

[2021/08/05 16:57:24.129 +08:00] [INFO] [scheduler.go:346] ["Start rebalancing"] [changefeed=cc-task-1] [table-num=60010] [capture-num=9] [target-limit=6668]
[2021/08/05 17:00:33.137 +08:00] [INFO] [etcd_worker.go:571] ["Committing big txn successful"]
[2021/08/05 17:10:21.491 +08:00] [WARN] [gc_manager.go:101] ["update gc safe point failed, the gc safe point is larger than checkpointTs"] [actual=426810733527564288] [checkpointTs=426810716226060299]
[2021/08/05 17:10:21.492 +08:00] [INFO] [feed_state_manager.go:90] ["handle admin job"] [changefeedID=cc-task-1] [job="{\"CfID\":\"cc-task-1\",\"Type\":1,\"Opts\":{\"ForceRemove\":false},\"Error\":null}"]
[2021/08/05 17:10:21.567 +08:00] [ERROR] [changefeed.go:106] ["an error occurred in Owner"] [changefeedID=cc-task-1] [error="[CDC:ErrSnapshotLostByGC]fail to create or maintain changefeed due to snapshot loss caused by GC. checkpoint-ts 426810716226060299 is earlier than GC safepoint at 426810733527564288"] [errorVerbose="[CDC:ErrSnapshotLostByGC]fail to create or maintain changefeed due to snapshot loss caused by GC. checkpoint-ts 426810716226060299 is earlier than GC safepoint at 426810733527564288\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByArgs\n\tgithub.com/pingcap/errors@v0.11.5-0.20201126102027-b0a155152ca3/normalize.go:156\ngithub.com/pingcap/ticdc/cdc/owner.(*gcManager).CheckStaleCheckpointTs\n\tgithub.com/pingcap/ticdc/cdc/owner/gc_manager.go:136\ngithub.com/pingcap/ticdc/cdc/owner.(*changefeed).tick\n\tgithub.com/pingcap/ticdc/cdc/owner/changefeed.go:128\ngithub.com/pingcap/ticdc/cdc/owner.(*changefeed).Tick\n\tgithub.com/pingcap/ticdc/cdc/owner/changefeed.go:105\ngithub.com/pingcap/ticdc/cdc/owner.(*Owner).Tick\n\tgithub.com/pingcap/ticdc/cdc/owner/owner.go:134\ngithub.com/pingcap/ticdc/pkg/orchestrator.(*EtcdWorker).Run\n\tgithub.com/pingcap/ticdc/pkg/orchestrator/etcd_worker.go:217\ngithub.com/pingcap/ticdc/cdc/capture.(*Capture).runEtcdWorker\n\tgithub.com/pingcap/ticdc/cdc/capture/capture.go:264\ngithub.com/pingcap/ticdc/cdc/capture.(*Capture).campaignOwner\n\tgithub.com/pingcap/ticdc/cdc/capture/capture.go:243\ngithub.com/pingcap/ticdc/cdc/capture.(*Capture).run.func2\n\tgithub.com/pingcap/ticdc/cdc/capture/capture.go:172\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2021/08/05 17:10:21.567 +08:00] [INFO] [changefeed.go:239] ["close changefeed"] [changefeed=cc-task-1] [info="{\"sink-uri\":\"***\",\"opts\":{\"_changefeed_id\":\"cc-task-1\"},\"create-time\":\"2021-08-05T16:54:32.157934864+08:00\",\"start-ts\":426810716226060299,\"target-ts\":0,\"admin-job-type\":0,\"sort-engine\":\"unified\",\"sort-dir\":\"\",\"config\":{\"case-sensitive\":true,\"enable-old-value\":false,\"force-replicate\":false,\"check-gc-safe-point\":true,\"filter\":{\"rules\":[\"*.*\"],\"ignore-txn-start-ts\":null},\"mounter\":{\"worker-num\":16},\"sink\":{\"dispatchers\":null,\"protocol\":\"default\"},\"cyclic-replication\":{\"enable\":false,\"replica-id\":0,\"filter-replica-ids\":null,\"id-buckets\":0,\"sync-ddl\":false},\"scheduler\":{\"type\":\"table-number\",\"polling-time\":-1}},\"state\":\"normal\",\"history\":null,\"error\":null,\"sync-point-enabled\":false,\"sync-point-interval\":600000000000,\"creator-version\":\"v4.0.14\"}"]
[2021/08/05 17:10:21.624 +08:00] [INFO] [region_range_lock.go:370] ["unlocked range"] [lockID=33148] [regionID=2037989] [startKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006c00000000fb] [endKey=6d44444c4a6f6241ff64ff644964784c69ff7374ff0000000000ff000000f700000000ff0000006d00000000fb] [checkpointTs=426810964974501922]

From the log [2021/08/05 17:10:21.491 +08:00] [WARN] [gc_manager.go:101] ["update gc safe point failed, the gc safe point is larger than checkpointTs"] [actual=426810733527564288] [checkpointTs=426810716226060299], by the time the cdc owner first updated the service GC safepoint, the safepoint had already been moved forward.

It happens when the table count is large (60K in this scenario) and the initialization phase takes longer than cdcChangefeedCreatingServiceGCSafePointTTL, which is 10 minutes by default.

Suggestion: as a workaround, we can increase cdcChangefeedCreatingServiceGCSafePointTTL to a larger value until TiCDC can guarantee the initialization phase finishes within 10 minutes.
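For context, the check that fails here is CheckStaleCheckpointTs in cdc/owner/gc_manager.go (per the stack trace above). A simplified sketch of its core comparison, fed the values from the owner log; the function body is illustrative, not the actual implementation:

```go
package main

import "fmt"

// checkStaleCheckpointTs sketches the stale-checkpoint check: if the
// actual GC safepoint in PD has already passed the changefeed's
// checkpoint-ts, the snapshot the changefeed needs has been garbage
// collected and the changefeed must fail.
func checkStaleCheckpointTs(checkpointTs, gcSafepointTs uint64) error {
	if checkpointTs < gcSafepointTs {
		return fmt.Errorf(
			"[CDC:ErrSnapshotLostByGC] checkpoint-ts %d is earlier than GC safepoint at %d",
			checkpointTs, gcSafepointTs)
	}
	return nil
}

func main() {
	// Values from the 17:10:21 owner log: the changefeed never advanced
	// past its start-ts, while gc_worker had already pushed the safepoint.
	err := checkStaleCheckpointTs(426810716226060299, 426810733527564288)
	fmt.Println(err)
}
```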

@amyangfei
Contributor

Another problem is still unclear: what happened between [2021/08/05 17:00:33.137 +08:00] and [2021/08/05 17:10:21.491 +08:00]? It seems the TiCDC owner was blocked somewhere, but we can't find the root cause from the logs alone.

@overvenus
Member

Suggestion, we can increase the cdcChangefeedCreatingServiceGCSafePointTTL to a reasonable value as a workaround, before TiCDC can guarantee the initialization phase is less than 10 minutes.

As a workaround, let's increase the TTL to 1 hour to mitigate the issue. (We have tested this in a cluster capturing 60k tables, and it works.)

@overvenus
Member

Set severity to major, as this only occurs when PD is extremely slow (which means it's hard to reproduce in production).

@overvenus overvenus added severity/major This is a major bug. and removed severity/critical This is a critical bug. labels Sep 1, 2021
@overvenus
Member

FYI, I am working on #2512 this week.
