Summary
When abort() is called (e.g. due to heartbeat failures after a MySQL failover), it cancels the migration context and causes initiateThrottlerChecks to exit via ctx.Done(). However, initiateThrottlerChecks exits without calling SetThrottled(false, ...), leaving isThrottled permanently true.
The throttle() loop only checks IsThrottled() and has no awareness of context cancellation, so it spins indefinitely — preventing the migration goroutine from ever returning and leaving the process deadlocked.
Reproduction scenario
- Migration is running and throttled (e.g. due to replica lag)
- A MySQL failover occurs; the applier's connection starts hitting Error 1290: read-only
- injectHeartbeat fails MaxRetries() times → PanicAbort is sent → abort() is called
- Context is cancelled → initiateThrottlerChecks exits via ctx.Done()
- isThrottled remains true, because the goroutine that would set it false has exited
- throttle() loops forever; the migration process never exits
Observable symptoms
- Status line keeps printing with Lag: and HeartbeatLag: counting up (other goroutines still alive)
- State: throttled, lag=Xs is frozen at the lag value from when throttle first triggered
- Process will not terminate without SIGKILL
Root cause
throttle() uses time.Sleep(250ms) with no ctx.Done() check:
func (thlr *Throttler) throttle(onThrottled func()) {
	for {
		if shouldThrottle, _, _ := thlr.migrationContext.IsThrottled(); !shouldThrottle {
			return
		}
		if onThrottled != nil {
			onThrottled()
		}
		time.Sleep(250 * time.Millisecond) // no ctx.Done() check
	}
}
When the goroutine responsible for calling SetThrottled(false, ...) exits due to context cancellation, nothing else can unblock this loop.