
fix(audio): move entire engine setup off main thread to prevent HAL hang #93

Open
skhe wants to merge 1 commit into missuo:main from skhe:fix/audio-capture-hal-hang




skhe commented Apr 22, 2026

Fixes #92 (follow-up to #77).

Background

#77 fixed one variant of "Koe freezes on hotkey hold, process alive, ~30% CPU, no recovery short of killing it" by moving -[AVAudioEngine startAndReturnError:] to a background queue guarded by a 3 s dispatch_semaphore_wait. Since then I have hit the exact same symptom again on macOS 26.3 with Koe 1.0.14 (15), but this time the sample shows the main thread stuck earlier, before it ever reaches the existing timeout gate:

-[SPHotkeyMonitor holdTimerFired]
 -> -[SPAppDelegate hotkeyMonitorDidDetectHoldStart]
   -> -[SPAudioCaptureManager startCaptureWithAudioCallback:] + 180
     -> -[AVAudioEngine inputNode]
       -> AVAudioEngineImpl::UpdateInputNode
         -> AVAudioIONodeImpl::GetInputFormat
           -> AVAudioIOUnit::GetHWFormat     (dispatch_sync into AVAudioIOUnit queue)
             -> AVAEHalUtil::GetSubDevices
               -> AudioObjectGetPropertyData_mac_imp
                 -> HALC_Object_GetPropertyData_DAI32
                   -> mach_msg2_trap    (100% of 2243 samples)

All of the CoreAudio-touching calls in startCaptureWithAudioCallback: (-[AVAudioEngine inputNode], AudioUnitSetProperty(..., CurrentDevice, ...), -[AVAudioNode outputFormatForBus:], -[AVAudioInputNode installTapOnBus:...], and -[AVAudioEngine prepare]) internally dispatch_sync onto the AVAudioIOUnit serial queue, which in turn talks to coreaudiod over mach_msg. After a Bluetooth route change, an aggregate-device reconfiguration, or another HAL glitch, that reply can stall indefinitely, and because the caller is the main thread, the entire app freezes.

#77 only moved startAndReturnError: off-main, so whenever the HAL stalls before that (as it does for me now), the code never even reaches the existing timeout gate.

Change

Move the full engine setup + start sequence into the background dispatch_async that was already used for startAndReturnError:, and gate the whole thing behind the same kEngineStartTimeoutSec semaphore. On timeout, the main thread returns NO; SPAppDelegate.startAudioCaptureWithRetry still retries once after 500 ms, matching the existing recovery path.
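For reference, the retry-once policy attributed to SPAppDelegate.startAudioCaptureWithRetry can be sketched in portable C as below. This is a hypothetical stand-in, not the app's Objective-C code: start_with_single_retry and capture_attempt_fn are invented names, and each attempt is assumed to return false both on ordinary setup failure and when the kEngineStartTimeoutSec gate fires.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* One capture attempt (hypothetical stand-in for
   -startCaptureWithAudioCallback:). Assumed to return false on setup
   failure AND on the engine-start timeout, i.e. the "return NO" path. */
typedef bool (*capture_attempt_fn)(void *ctx);

/* Blocking millisecond sleep via POSIX nanosleep. */
static void sleep_ms(int ms) {
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Retry-once policy: try; if that fails, wait retry_delay_ms and try
   exactly one more time. Returns the result of the last attempt made. */
static bool start_with_single_retry(capture_attempt_fn attempt, void *ctx,
                                    int retry_delay_ms) {
    if (attempt(ctx))
        return true;
    sleep_ms(retry_delay_ms);   /* 500 ms in the app, per the PR text */
    return attempt(ctx);
}
```

With the numbers in this PR, a HAL stall that persists across both attempts makes the capture start give up after roughly 3 s + 0.5 s + 3 s = 6.5 s, instead of never. The sketch blocks during the delay for simplicity; the real app may schedule the retry asynchronously.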

To avoid leaking a live capture session when the main thread has already given up but the background setup eventually succeeds, a mainAborted flag guarded by a shared lock tells the background block to [engine stop] the late-started engine instead of publishing it.
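The timeout gate plus the mainAborted handoff can be illustrated with a minimal sketch in portable C with pthreads. It is not the PR's Objective-C code: pthread_cond_timedwait stands in for dispatch_semaphore_wait, a detached thread for the dispatch_async block, a sleep for the CoreAudio calls that may stall, and two atomic counters for "publish the engine" and "[engine stop]". All names (setup_state, background_setup, start_capture_with_timeout) are hypothetical. The point it shows is that the caller's "give up" decision and the worker's "publish or stop" decision are made under the same lock, so exactly one of them wins, and the shared state is freed by whichever side finishes last.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Observable outcomes standing in for "publish the engine" and
   "[engine stop] on a late-started engine". */
static atomic_int g_published;
static atomic_int g_late_stops;

/* State shared by the waiting caller and the background setup thread.
   Heap-allocated; ownership passes to whichever side finishes last. */
typedef struct {
    pthread_mutex_t lock;   /* role of setupLock */
    pthread_cond_t  cond;   /* role of the dispatch semaphore */
    bool setup_done;        /* background setup has finished */
    bool main_aborted;      /* role of mainAborted */
    int  setup_ms;          /* simulated CoreAudio/HAL stall */
} setup_state;

static void sleep_ms(int ms) {
    struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void free_state(setup_state *s) {
    pthread_cond_destroy(&s->cond);
    pthread_mutex_destroy(&s->lock);
    free(s);
}

/* Background block: every call that may block on HAL IPC runs here. */
static void *background_setup(void *arg) {
    setup_state *s = arg;
    sleep_ms(s->setup_ms);   /* inputNode ... prepare, startAndReturnError: */

    pthread_mutex_lock(&s->lock);
    bool aborted = s->main_aborted;
    if (!aborted)
        atomic_fetch_add(&g_published, 1);   /* caller still waiting: publish */
    s->setup_done = true;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);

    if (aborted) {
        atomic_fetch_add(&g_late_stops, 1);  /* stop instead of publishing */
        free_state(s);                       /* caller is gone; we own s */
    }
    /* If not aborted, the caller owns s from here on; do not touch it. */
    return NULL;
}

/* Caller side: true if setup finished within timeout_ms, false on
   timeout (the "return NO" path). */
static bool start_capture_with_timeout(int setup_ms, int timeout_ms) {
    setup_state *s = calloc(1, sizeof *s);
    if (s == NULL)
        return false;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->setup_ms = setup_ms;

    struct timespec deadline;   /* absolute deadline for timedwait */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, background_setup, s) != 0) {
        free_state(s);
        return false;
    }
    pthread_detach(tid);

    pthread_mutex_lock(&s->lock);
    int rc = 0;   /* 0 covers spurious wakeups; any error ends the wait */
    while (!s->setup_done && rc == 0)
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    bool finished = s->setup_done;
    if (!finished)
        s->main_aborted = true;   /* hand ownership of s to the worker */
    pthread_mutex_unlock(&s->lock);

    if (finished)
        free_state(s);            /* worker no longer touches s */
    return finished;
}
```

Without the shared lock, the worker could read main_aborted as false, publish, and signal just after the caller's wait had already timed out, leaving a live engine behind a NO return; serializing both checks on one mutex closes that window in the sketch.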

Diff summary:

  • KoeApp/Koe/Audio/SPAudioCaptureManager.m
    • Wraps engine alloc, inputNode, AudioUnitSetProperty, outputFormatForBus:, converter init, installTapOnBus:, prepare, and startAndReturnError: in a single dispatch_async block on dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0).
    • Keeps the same 3 s dispatch_semaphore_wait timeout and the same error-return shape (return NO), so startAudioCaptureWithRetry works unchanged.
    • Adds mainAborted + setupLock to cleanly stop the engine if the background block finishes after the main thread has already bailed out.
    • Preserves all existing behavior: AudioUnitSetProperty failure still rebuilds a fresh engine; the invalid-format guard (channelCount == 0 || sampleRate <= 0) still aborts; the tap block is byte-for-byte the same.

No public API changes. No config or UX changes.

Testing

  • Hot reload via xcodebuild -scheme Koe-lite is broken on my machine (an unrelated Xcode plug-in issue), but xcrun clang -fsyntax-only on the modified file passes with no new warnings.
  • Hold-to-talk and tap-to-toggle both still go through startCaptureWithAudioCallback: and the startAudioCaptureWithRetry wrapper in SPAppDelegate, so the normal path is unchanged; only the timing/threading of the pre-existing steps moves off the main thread.
  • Reproducing the hang itself is racy: it needs a specific coreaudiod stall, which I hit a few times over several weeks during AirPods reconnects and aggregate-device churn, so I wasn't able to force it on demand in a clean run. What the change guarantees is that whenever AVAudioEngine's internal dispatch_sync into AVAudioIOUnit stalls on HAL IPC, the main thread unblocks after kEngineStartTimeoutSec (3 s) instead of hanging forever. That puts this hang on the same footing as the startAndReturnError: hang that #77 already handles.

Happy to iterate on the locking style or the timeout constant if you'd prefer.

missuo#77's fix only wrapped startAndReturnError: in a background queue with
a 3 s timeout, but the earlier CoreAudio-touching calls in
-startCaptureWithAudioCallback: were still running on the main thread:

  -[AVAudioEngine inputNode]   (triggers UpdateInputNode -> GetHWFormat)
  AudioUnitSetProperty(..., CurrentDevice, ...)
  -[AVAudioNode outputFormatForBus:]
  -[AVAudioInputNode installTapOnBus:...]
  -[AVAudioEngine prepare]

All of them synchronously dispatch onto the AVAudioIOUnit serial queue,
which in turn talks to coreaudiod via mach_msg. After a Bluetooth route
change or other HAL glitch the reply can stall indefinitely and freeze
the whole app (process alive, ~30% CPU, no recovery short of killing
the process).

Move the full setup+start sequence into the existing background dispatch
and gate it with the same kEngineStartTimeoutSec semaphore. On timeout
the main thread returns NO (SPAppDelegate.startAudioCaptureWithRetry
retries once after 500 ms). If the background setup finishes after the
main thread has already given up, a mainAborted flag tells it to stop
the late-started engine rather than leak a live capture session.

Development

Successfully merging this pull request may close these issues.

Koe still freezes when starting recording (#77 follow-up): inputNode / installTapOnBus / prepare blocked by the CoreAudio HAL on the main thread
