maintainer: fallback may unexpectedly reschedule removed dispatchers #4874

@wlwilliamx

What did you do?

On the release-8.5 branch, the maintainer fallback logic marks a span absent when it receives a dispatcher terminal status (Stopped or Removed) and there is no active operator for that dispatcher.
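The condition described above can be modeled roughly as follows. This is a minimal sketch for discussion, not the actual maintainer code: `DispatcherState`, `isTerminal`, and `shouldMarkAbsent` are hypothetical names, and the real implementation on release-8.5 may be structured differently.

```go
package main

import "fmt"

// DispatcherState is a hypothetical stand-in for the status a dispatcher reports.
type DispatcherState int

const (
	Working DispatcherState = iota
	Stopped
	Removed
)

// isTerminal reports whether the state is one of the terminal statuses
// named in this issue (Stopped or Removed).
func isTerminal(s DispatcherState) bool {
	return s == Stopped || s == Removed
}

// shouldMarkAbsent models the fallback as described in this issue: a terminal
// status with no active operator causes the span to be marked absent.
// Note that the original removal intent is not an input.
func shouldMarkAbsent(s DispatcherState, hasActiveOperator bool) bool {
	return isTerminal(s) && !hasActiveOperator
}

func main() {
	fmt.Println(shouldMarkAbsent(Removed, false)) // true: fallback fires
	fmt.Println(shouldMarkAbsent(Removed, true))  // false: an operator owns the dispatcher
	fmt.Println(shouldMarkAbsent(Working, false)) // false: not a terminal status
}
```

Because the predicate only sees the status and the operator presence, a failover that lost operator state and a normal cleanup look identical to it.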

This fallback was intended to recover from maintainer failover cases where operator state is lost after a dispatcher is removed as part of move/split scheduling. However, the same signal can also appear in normal cleanup paths where the dispatcher should stay removed.

What did you expect to see?

The maintainer should not recreate a dispatcher that was legitimately removed. Terminal dispatcher statuses without an operator should be classified by the original scheduling/removal intent before deciding whether to reschedule the span.
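One possible shape for the expected behavior is sketched below. It assumes the maintainer can recover the original removal intent from somewhere, which this issue does not specify. `RemovalIntent`, `Action`, and `decide` are hypothetical names, and treating an unknown intent as "keep removed" is an assumption chosen to be conservative, not a decision taken in this issue.

```go
package main

import "fmt"

// DispatcherState is a hypothetical stand-in for the reported dispatcher status.
type DispatcherState int

const (
	Working DispatcherState = iota
	Stopped
	Removed
)

// RemovalIntent is a hypothetical record of why a dispatcher was removed.
type RemovalIntent int

const (
	IntentUnknown RemovalIntent = iota // intent could not be recovered
	IntentMove                         // removed as part of move scheduling
	IntentSplit                        // removed as part of split scheduling
	IntentCleanup                      // removed by a normal cleanup path
)

// Action is what the maintainer does with the span afterwards.
type Action int

const (
	Ignore      Action = iota // not a fallback case at all
	KeepRemoved               // dispatcher stays removed
	Reschedule                // mark the span absent so the scheduler re-adds it
)

// decide classifies a terminal status without an operator by the original intent.
func decide(s DispatcherState, hasActiveOperator bool, intent RemovalIntent) Action {
	if (s != Stopped && s != Removed) || hasActiveOperator {
		return Ignore
	}
	switch intent {
	case IntentMove, IntentSplit:
		// Operator state was presumably lost mid-operation; recover the span.
		return Reschedule
	default:
		// Assumption: cleanup and unknown intent both leave the dispatcher removed.
		return KeepRemoved
	}
}

func main() {
	fmt.Println(decide(Removed, false, IntentMove) == Reschedule)     // true
	fmt.Println(decide(Removed, false, IntentCleanup) == KeepRemoved) // true
}
```

The difference from the current fallback is that the intent is an explicit input, so a legitimately removed dispatcher is never rescheduled only because its operator is gone.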

What did you see instead?

The fallback may mark the span absent, which lets the scheduler unexpectedly add the dispatcher back. On release-8.5, the fallback needs to be temporarily disabled as a stopgap while a safer recovery path is designed.

Versions of the cluster

Upstream TiDB cluster version:

Not version-specific.

Upstream TiKV version:

Not version-specific.

TiCDC version:

release-8.5 branch

Labels

affects-8.5: This bug affects the 8.5.x (LTS) versions.
severity/moderate
type/bug: The issue is confirmed as a bug.
