Fix deletion sequence to avoid a race condition #3031

Merged
vigoo merged 1 commit into main from flaky-fix-2
Mar 19, 2026

@vigoo vigoo commented Mar 18, 2026

This PR tries to fix the flakiness of the invoking_worker_while_its_getting_deleted_works test case.

Changes:

  • New FinalWorkerState enum — encodes whether a stop should end in Unloaded (restartable) or Deleting (terminal)
  • StoppingWorker now carries final_state — so during shutdown, the intended destination is locked in while the mutex is held
  • stop_internal / stop_internal_locked accept final_state — if a delete arrives while already stopping, it upgrades the final state to Deleting
  • handle_stop_result reads final_state from the current StoppingWorker instead of hardcoding Unloaded
  • start_deleting simplified — just calls stop_internal(..., FinalWorkerState::Deleting) directly
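
To make the list above concrete, here is a minimal, self-contained sketch of the state handling it describes. It is not the code from this PR: the WorkerInstance enum, its variants, and the simplified signatures are assumptions made for illustration, and holding the lock is modelled by simply taking a mutable reference. Only the names FinalWorkerState, stop_internal_locked and handle_stop_result come from the description.

```rust
/// Where a stopping worker should end up once shutdown completes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinalWorkerState {
    /// Restartable: a later invocation may load the worker again.
    Unloaded,
    /// Terminal: the worker is being deleted and must not restart.
    Deleting,
}

/// Hypothetical, simplified stand-in for the real worker instance state.
#[derive(Debug, PartialEq, Eq)]
enum WorkerInstance {
    Running,
    Stopping { final_state: FinalWorkerState },
    Unloaded,
    Deleting,
}

/// Requests a stop. The `&mut` borrow stands in for "the mutex is held".
fn stop_internal_locked(instance: &mut WorkerInstance, requested: FinalWorkerState) {
    match instance {
        WorkerInstance::Running => {
            // The destination is recorded before the lock is released.
            *instance = WorkerInstance::Stopping { final_state: requested };
        }
        WorkerInstance::Stopping { final_state } => {
            // A delete arriving mid-stop upgrades the destination;
            // a plain stop never downgrades Deleting back to Unloaded.
            if requested == FinalWorkerState::Deleting {
                *final_state = FinalWorkerState::Deleting;
            }
        }
        WorkerInstance::Unloaded => {
            // Assumption of this sketch: an unloaded worker can be
            // moved straight to Deleting.
            if requested == FinalWorkerState::Deleting {
                *instance = WorkerInstance::Deleting;
            }
        }
        WorkerInstance::Deleting => {}
    }
}

/// Finalizes a stop by reading the destination from the Stopping state
/// instead of hardcoding Unloaded.
fn handle_stop_result(instance: &mut WorkerInstance) {
    let final_state = match instance {
        WorkerInstance::Stopping { final_state } => *final_state,
        _ => return,
    };
    *instance = match final_state {
        FinalWorkerState::Unloaded => WorkerInstance::Unloaded,
        FinalWorkerState::Deleting => WorkerInstance::Deleting,
    };
}
```

In this model, a stop followed by a delete followed by another stop still ends in Deleting after handle_stop_result runs, which is the property the upgrade rule is meant to guarantee.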

The behavior that changes:

Before: Delete → stop worker → release lock → re-acquire → see Unloaded. The invoke loop can grab the lock in the gap between release and re-acquire and restart the worker from Unloaded.

After: Delete → Stopping { final_state: Deleting } set while holding the lock → lock_non_stopping_worker() blocks invokers during shutdown → loop exits → handle_stop_result finalizes to Deleting → invokers wake up, see Deleting, and fail.
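
The "After" sequence can be illustrated with a small model built on std threads. This is only a sketch under stated assumptions: the real code is likely async and structured differently, and Worker, State, invoke, begin_delete, finish_stop and delete_then_invoke are hypothetical names. The invoke method plays the role that lock_non_stopping_worker() has in the description: it waits while the worker is stopping and only then inspects the state.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical, simplified worker states for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Running,
    Stopping { delete: bool },
    Unloaded,
    Deleting,
}

struct Worker {
    state: Mutex<State>,
    changed: Condvar,
}

impl Worker {
    /// Blocks while the worker is stopping, then acts on the final state.
    fn invoke(&self) -> Result<(), &'static str> {
        let mut guard = self.state.lock().unwrap();
        while matches!(*guard, State::Stopping { .. }) {
            guard = self.changed.wait(guard).unwrap();
        }
        match *guard {
            State::Deleting => Err("worker is being deleted"),
            State::Unloaded => {
                // Restarting is only possible from the restartable state.
                *guard = State::Running;
                Ok(())
            }
            State::Running => Ok(()),
            State::Stopping { .. } => unreachable!("loop above waits this out"),
        }
    }

    /// Records the terminal destination while holding the lock.
    fn begin_delete(&self) {
        *self.state.lock().unwrap() = State::Stopping { delete: true };
    }

    /// Finalizes the stop and wakes any blocked invokers.
    fn finish_stop(&self) {
        let mut guard = self.state.lock().unwrap();
        if let State::Stopping { delete } = *guard {
            *guard = if delete { State::Deleting } else { State::Unloaded };
        }
        drop(guard);
        self.changed.notify_all();
    }
}

/// Starts a delete, races an invocation against it, and returns the
/// invoker's result.
fn delete_then_invoke() -> Result<(), &'static str> {
    let worker = Arc::new(Worker {
        state: Mutex::new(State::Running),
        changed: Condvar::new(),
    });
    worker.begin_delete();
    let invoker = {
        let w = Arc::clone(&worker);
        thread::spawn(move || w.invoke())
    };
    worker.finish_stop();
    invoker.join().unwrap()
}
```

Whichever thread wins the race, the invoker either waits through Stopping or arrives after finalization. In both cases it observes Deleting and fails, so there is no window in which it can see Unloaded and restart the worker.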

@vigoo vigoo enabled auto-merge (squash) March 18, 2026 19:19
@vigoo vigoo merged commit dcd2b38 into main Mar 19, 2026
29 checks passed
@vigoo vigoo deleted the flaky-fix-2 branch March 19, 2026 13:49
@github-actions github-actions bot locked and limited conversation to collaborators Mar 19, 2026