maintainer can be overwhelmed by stale BlockStatusRequest resend loops and block for hours #4957

@asddongmen


Summary

Under workloads with syncpoint enabled and many DDLs, the maintainer can spend excessive time handling BlockStatusRequest messages and repeatedly logs "maintainer is too slow". The current suspicion is that stale block-status resend tasks belonging to obsolete dispatchers are never terminated after topology changes, so they keep resending WAITING barrier statuses for hours.

Evidence

  1. Slow maintainer handling:
[2026/04/30 09:45:38.278 +08:00] [INFO] [maintainer.go:318] ["maintainer is too slow"] [changefeedID=default/cdc-primary-to-secondary] [eventType=1] [duration=1h50m39.986192884s] [from=34dce5a8-64b8-4b20-b4ab-e57a4c0419e9] [to=] [type=BlockStatusRequest] [topic=]
  2. Extremely long-lived resend task for a syncpoint WAITING status:
[2026/04/30 10:16:20.980 +08:00] [INFO] [helper.go:293] ["resend task periodic resend"] [dispatcherID=127019155305713984904401939242792550840] [message="ID:<...> state:<IsBlocked:true BlockTs:465954228142080000 BlockTables:<> IsSyncPoint:true stage:WAITING > "] [executeCount=8160]

With a 5-second resend interval, executeCount=8160 means the task has been retrying for about 11 hours.

Suspected root cause

After dispatcher replacement, split/merge, or DDL-driven topology changes, old dispatchers may keep resending stale WAITING block statuses. The maintainer currently ignores some of these requests because they come from non-replicating or nonexistent dispatchers, but ignoring them does not terminate the sender's resend loop. Over time, these stale retries accumulate and create sustained BlockStatusRequest pressure on the maintainer event loop.

Impact

  • excessive maintainer event handling latency
  • noisy slow logs
  • risk of barrier backlog growth under syncpoint + heavy DDL workloads
