
throttle() blocks forever when context is cancelled during abort #1670

@ggilder

Summary

When abort() is called (e.g. due to heartbeat failures after a MySQL failover), it cancels the migration context, which causes initiateThrottlerChecks to exit via ctx.Done(). However, initiateThrottlerChecks exits without calling SetThrottled(false, ...), so if the migration was throttled at that moment, isThrottled stays true permanently.

The throttle() loop only checks IsThrottled() and has no awareness of context cancellation, so it keeps polling indefinitely. The migration goroutine never returns, and the process hangs.

Reproduction scenario

  1. Migration is running and throttled (e.g. due to replica lag)
  2. A MySQL failover occurs — the applier's connection starts hitting Error 1290: read-only
  3. injectHeartbeat fails MaxRetries() times → PanicAbort is sent → abort() is called
  4. Context is cancelled → initiateThrottlerChecks exits via ctx.Done()
  5. isThrottled remains true — the goroutine that would set it false has exited
  6. throttle() loops forever; the migration process never exits
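Steps 4 to 6 can be modelled in isolation. The following is a minimal sketch, not gh-ost code: `flag`, `runChecks`, `flagAfterCancel`, and the `clearOnExit` switch are hypothetical names standing in for isThrottled and initiateThrottlerChecks. It shows that the flag stays set after the checker exits on cancellation, and that clearing it on the way out (one possible remedy, assumed here for illustration) avoids that.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// flag is a hypothetical stand-in for isThrottled on the migration context.
type flag struct{ v atomic.Bool }

// runChecks models initiateThrottlerChecks: it refreshes the flag on every
// tick and returns when ctx is cancelled. With clearOnExit set, it resets
// the flag before signalling that it has exited.
func runChecks(ctx context.Context, f *flag, lagging func() bool, clearOnExit bool, done chan<- struct{}) {
	defer close(done) // runs last, after the optional reset below
	if clearOnExit {
		defer f.v.Store(false)
	}
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			f.v.Store(lagging())
		}
	}
}

// flagAfterCancel starts the checker in a throttled state, cancels the
// context, waits for the checker to exit, and reports the final flag value.
func flagAfterCancel(clearOnExit bool) bool {
	f := &flag{}
	f.v.Store(true) // step 1: migration is already throttled
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go runChecks(ctx, f, func() bool { return true }, clearOnExit, done)
	time.Sleep(20 * time.Millisecond)
	cancel() // step 4: abort() cancels the context
	<-done   // the checker goroutine has exited
	return f.v.Load()
}

func main() {
	fmt.Println(flagAfterCancel(false)) // prints "true": flag is stuck (step 5)
	fmt.Println(flagAfterCancel(true))  // prints "false": flag cleared on exit
}
```

Whether resetting the flag on exit is appropriate for the real throttler is a design question for the maintainers; the sketch only illustrates the state that step 5 describes.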

Observable symptoms

  • Status line keeps printing with Lag: and HeartbeatLag: counting up (other goroutines still alive)
  • State: throttled, lag=Xs is frozen at the lag value from when throttle first triggered
  • Process will not terminate without SIGKILL

Root cause

throttle() uses time.Sleep(250ms) with no ctx.Done() check:

func (thlr *Throttler) throttle(onThrottled func()) {
    for {
        if shouldThrottle, _, _ := thlr.migrationContext.IsThrottled(); !shouldThrottle {
            return
        }
        if onThrottled != nil {
            onThrottled()
        }
        time.Sleep(250 * time.Millisecond) // no ctx.Done() check
    }
}

When the goroutine responsible for calling SetThrottled(false, ...) exits due to context cancellation, nothing else can unblock this loop.
