fix(menubar): run CLI exit-wait and timeout off the cooperative pool#426

Merged: iamtoruk merged 1 commit into main from fix/menubar-dataclient-deadlock on Jun 2, 2026

iamtoruk (Member) commented on Jun 2, 2026

Summary

The menubar wedged on "Loading Today…" for hours after an idle period. This moves the two blocking points in DataClient.runCLI off Swift's cooperative thread pool so a saturated pool can no longer deadlock the CLI calls or their timeout.

Root cause

DataClient.runCLI called the blocking process.waitUntilExit() from an async function running on the cooperative thread pool. On a 16-core machine, 16 concurrent slow codeburn subprocesses pinned all 16 cooperative threads inside waitUntilExit. The 45s timeout was itself a Task on that same pool, so it could never be scheduled to kill the subprocesses, and the deadlock was permanent. Confirmed via sample: 16/16 cooperative threads parked in waitUntilExit.

PR #412 (AppStore inFlightKeys bookkeeping) sat a layer above this OS-thread deadlock and could not fix it.
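
The blocking pattern described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the project's code: the function name `runCLIBlocking`, its parameters, and its return type are assumptions, and it assumes Swift 5 language mode (Swift 6 strict concurrency would object to capturing the non-Sendable `Process`).

```swift
import Foundation

// Hypothetical reconstruction of the anti-pattern; names and signature are assumed.
func runCLIBlocking(executable: URL, arguments: [String], timeout: TimeInterval) async throws -> Int32 {
    let process = Process()
    process.executableURL = executable
    process.arguments = arguments
    try process.run()

    // The timeout is an unstructured Task, so it needs a free cooperative
    // thread both to start and to resume after sleeping.
    let killer = Task {
        try await Task.sleep(nanoseconds: UInt64(timeout * 1_000_000_000))
        if process.isRunning { process.terminate() }
    }
    defer { killer.cancel() }

    // Synchronous wait: this parks the cooperative thread the async function
    // is running on until the subprocess exits.
    process.waitUntilExit()
    return process.terminationStatus
}
```

With one such call per cooperative thread, every thread sits in `waitUntilExit()` and no thread is left to run any `killer` task, which matches the 16/16 picture from the sample output.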

Fix

  • Bridge waitUntilExit through a global (overcommit) queue via a continuation.
  • Drive the timeout from a DispatchSource on a global queue so it fires even when the cooperative pool is saturated.
  • Extract runProcess for testability.
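
A minimal sketch of the two changes listed above, under stated assumptions. Only the name `runProcess` comes from this PR; its signature, the `ProcessResult` type, and the use of plain `DispatchQueue.global()` are assumptions made for illustration (the PR mentions an overcommit queue, but how it obtains one is not shown here). It assumes Swift 5 language mode.

```swift
import Foundation

/// Hypothetical result type; the real one is not shown in the PR.
struct ProcessResult {
    let exitCode: Int32
    let killedBySignal: Bool
    let output: Data
}

/// Sketch only: runs a subprocess without blocking a cooperative thread.
func runProcess(executable: URL, arguments: [String], timeout: TimeInterval) async throws -> ProcessResult {
    let process = Process()
    process.executableURL = executable
    process.arguments = arguments
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()

    // Timeout driven by a DispatchSource timer on a global queue, so it can
    // fire even if every cooperative thread is busy.
    let timer = DispatchSource.makeTimerSource(queue: DispatchQueue.global())
    timer.schedule(deadline: .now() + timeout)
    timer.setEventHandler {
        if process.isRunning { process.terminate() }
    }
    timer.resume()

    // Bridge the blocking wait through a Dispatch queue; the async caller
    // suspends on the continuation instead of parking a cooperative thread.
    return await withCheckedContinuation { continuation in
        DispatchQueue.global().async {
            // Drain stdout first so a chatty child cannot fill the pipe and stall.
            let output = pipe.fileHandleForReading.readDataToEndOfFile()
            process.waitUntilExit()
            timer.cancel()
            continuation.resume(returning: ProcessResult(
                exitCode: process.terminationStatus,
                killedBySignal: process.terminationReason == .uncaughtSignal,
                output: output))
        }
    }
}
```

Called as `try await runProcess(executable: URL(fileURLWithPath: "/bin/sleep"), arguments: ["30"], timeout: 0.5)`, this should return shortly after the timer fires, with `killedBySignal` set. The blocking work still occupies a thread, but it is a Dispatch thread rather than one of the fixed-width cooperative threads.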

Testing

  • New DataClientProcessTests: concurrency + timeout smoke test, output/exit-code test.
  • Running clean on a local dev build through a multi-hour idle soak — the exact scenario that previously wedged.

The menubar wedged on "Loading Today…" for hours after an idle period.
Root cause: DataClient.runCLI called the blocking process.waitUntilExit()
from an async function on Swift's cooperative thread pool. On a 16-core
machine, 16 concurrent slow `codeburn` subprocesses pinned all 16
cooperative threads inside waitUntilExit; the 45s timeout — itself a Task
on that same pool — could then never be scheduled to kill them, so the
deadlock was permanent. Confirmed via sample: 16/16 cooperative threads
parked in waitUntilExit. PR #412 (AppStore inFlightKeys bookkeeping) was a
layer above the OS-thread deadlock and could not fix it.

Move both blocking points off the cooperative pool: bridge waitUntilExit
through a global (overcommit) queue via a continuation, and drive the
timeout from a DispatchSource on a global queue so it fires even when the
pool is saturated. Extract runProcess for testability; add a concurrency +
timeout smoke test and an output/exit-code test.
@iamtoruk iamtoruk merged commit bec0491 into main Jun 2, 2026
3 checks passed