Make testRunawayProcess() deterministic#295
Merged
iCharlesHu merged 1 commit intoJun 3, 2026
Merged
Conversation
On cancellation, the test's teardown sent the child `SIGINT`, escalating to `SIGKILL` after 100ms; the `INT` trap was meant to kill the runaway grandchild and `exit 0`, and the child otherwise blocked in `wait "$child_pid"`. It failed intermittently in two ways, both from trapping an asynchronous signal around a blocking `wait`. First, `signaled(11)`: bash interrupted the `wait` for `SIGINT` via `siglongjmp`, and on Darwin's system bash (3.2, arm64e) that `longjmp` occasionally failed pointer authentication and crashed the child with `SIGSEGV`. Second, `signaled(9)`: if the signal landed after bash last checked pending traps, but before it blocked in `wait`, the trap was recorded but never run. A `wait` on a child that never exited offers no further safe point, so the child lived until teardown escalated to `SIGKILL`. Send `SIGTERM` instead. Its trap runs at a deferred safe point rather than `longjmp`-ing out of the handler, so the crash cannot occur. Poll with `kill -0`/`sleep` rather than blocking in `wait`, so a deferred trap runs within one short interval; also widen the grace period so it comfortably exceeds that interval and the trap is serviced before escalation.
jakepetroules
approved these changes
Jun 3, 2026
iCharlesHu
approved these changes
Jun 3, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
testRunawayProcess()fails about 0.05% of the time on macOS.On cancellation, the test's teardown sent the child
SIGINT, escalating toSIGKILLafter 100ms; theINTtrap was meant to kill the runaway grandchild andexit 0, and the child otherwise blocked inwait "$child_pid". It failed intermittently in two ways, both from trapping an asynchronous signal around a blockingwait.First,
signaled(11): bash interrupts thewaitforSIGINTviasiglongjmp, and on Darwin's system bash (3.2, arm64e) thatlongjmpoccasionally fails pointer authentication and crashes the child withSIGSEGV:Second,
signaled(9): If the signal lands after bash last checks pending traps, but before it blocks inwait, the trap is recorded but never run. Awaiton a child that never exits offers no further safe point, so the child lives until teardown escalates toSIGKILL.This PR sends
SIGTERMinstead. Its trap runs at a deferred safe point rather thanlongjmp-ing out of the handler, so the crash cannot occur. Poll withkill -0/sleeprather than blocking inwait, so a deferred trap runs within one short interval; also widen the grace period so it comfortably exceeds that interval and the trap is serviced before escalation. This test passes when run 10,000 times consecutively.I considered removing bash entirely, but that would require a purpose-built helper for various platforms and didn't seem worth it.