kafka-consumer may drop replayed DML events based on HighWatermark and cause downstream inconsistency #4051

@wlwilliamx

Description

What did you do?

  1. Run TiCDC with a Kafka sink (at-least-once delivery) and run cmd/kafka-consumer to apply events to a MySQL/TiDB downstream.
  2. Introduce Kafka network jitter/partition so TiCDC hits send timeouts and the changefeed restarts/retries.
  3. After recovery, run a downstream consistency check (or compare upstream/downstream data).
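
For step 3, a coarse check is to compare per-table row counts between upstream and downstream (a real verification run would use sync-diff-inspector). The sketch below is illustrative only: the DSNs and table name are placeholders for this reproduction setup, not part of the TiCDC tooling.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

// countRows returns the row count of one table. It is only a coarse proxy
// for a real consistency check such as sync-diff-inspector.
func countRows(dsn, table string) (int64, error) {
	db, err := sql.Open("mysql", dsn)
	if err != nil {
		return 0, err
	}
	defer db.Close()

	var n int64
	// The table name is hard-coded below; never interpolate untrusted input here.
	err = db.QueryRow(fmt.Sprintf("SELECT COUNT(*) FROM %s", table)).Scan(&n)
	return n, err
}

func main() {
	// Placeholder DSNs and table name for this reproduction setup.
	const (
		upstreamDSN   = "root:@tcp(127.0.0.1:4000)/test" // upstream TiDB
		downstreamDSN = "root:@tcp(127.0.0.1:3306)/test" // downstream MySQL/TiDB
		table         = "test.t1"
	)

	up, err := countRows(upstreamDSN, table)
	if err != nil {
		log.Fatal(err)
	}
	down, err := countRows(downstreamDSN, table)
	if err != nil {
		log.Fatal(err)
	}
	if up != down {
		log.Fatalf("inconsistent: upstream=%d rows, downstream=%d rows", up, down)
	}
	fmt.Printf("row counts match: %d\n", up)
}
```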

Note: Under restart/retry it is expected that older commitTs values appear at larger Kafka offsets: the topic is an append-only log, so a retried changefeed re-sends events that already exist earlier in the partition.
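
To make that concrete, here is a hypothetical partition layout after a send timeout and changefeed retry; all offsets and commitTs values below are made up.

```go
package main

import "fmt"

// event models one Kafka message carrying a TiCDC DML row change.
type event struct {
	offset   int64  // Kafka partition offset, strictly increasing
	commitTs uint64 // TiDB commit timestamp of the row change
}

func main() {
	// Hypothetical partition contents: the acks for commitTs 1001/1002 were
	// lost to network jitter, so the changefeed re-sends them and the same
	// commitTs values reappear at larger offsets.
	partition := []event{
		{offset: 100, commitTs: 1001},
		{offset: 101, commitTs: 1002}, // ack lost here -> retry
		{offset: 102, commitTs: 1001}, // replayed: larger offset, older commitTs
		{offset: 103, commitTs: 1002}, // replayed
		{offset: 104, commitTs: 1003}, // new data after recovery
	}
	for _, e := range partition {
		fmt.Printf("offset=%d commitTs=%d\n", e.offset, e.commitTs)
	}
}
```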

What did you expect to see?

Replay should eventually heal any missing window: kafka-consumer should only ignore events that are already flushed to the downstream.

What did you see instead?

kafka-consumer treats any row with commitTs < group.HighWatermark as a duplicate/fallback row and ignores it. Since that high watermark reflects what the consumer has received rather than what has been flushed to the downstream, this can drop replayed DML fragments that were never applied, leading to permanent downstream inconsistency.

Example log pattern:

  • "DML event fallback row, since less than the group high watermark, ignore it"

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

N/A

Upstream TiKV version (execute tikv-server --version):

N/A

TiCDC version (execute cdc version):

master @ 45d10ea0fa5879d4e50c775b95457214cb17717c
