fix(detached): use next-exit wait in executeDetached#6
Merged
Conversation
- Mirror the existing fix in attach() (commit 89f0c70). The default
'not-running' wait condition is satisfied by the 'created' state,
so wait() could resolve immediately when armed in the brief window
after start() returned but before the daemon flipped the container
to 'running'.
- Symptom: execution.result resolved in milliseconds for a long-running
detached run, breaking downstream cancel() and reporting a false
succeeded status before the container had actually exited.
- One-line fix: pass { condition: 'next-exit' } to container.wait().
- Add test/e2e/detached-wait.test.ts: races result against a 2s
sentinel for a sleep 5 container, asserts the sentinel wins and
elapsed >= 4500ms once result eventually resolves.
build with cc
5 tasks
enixCode
added a commit
that referenced
this pull request
May 5, 2026
## Summary PR #6 fixed the wrong race. The actual root cause is that **both** `container.wait()` variants are unsafe for `executeDetached`: | variant | failure mode | |---|---| | default `'not-running'` | matches `'created'` state, so wait armed before daemon transitions out of `'created'` resolves immediately with `StatusCode 0` (false success). This is what #6 attempted to fix. | | `'next-exit'` (#6's fix) | requires the container to still be running at call time, hangs forever for fast-exit commands or post-cancel containers whose exit event already fired. | Locally, **5 of 6 detached e2e tests reproduced the #6 hang**, all timing out at 60s on `echo`-style commands and on the cancel scenario. The CI run on this branch's predecessor burned ~6 h before GitHub auto-cancelled it. ## Fix Replace `container.wait()` with a polling loop on `container.inspect()` (new `waitForDetachedExit` helper). Race-free because AutoRemove is disabled for detached runs, so `'exited'` state stays observable until `executeDetached`'s finally tears the container down. Detection latency is at most one 100 ms tick. ## CI guardrail (second commit) Add `timeout-minutes: 10` to the build/unit job and `15` to e2e. Prevents a future hang from burning another six hours of runner time. The default per-job ceiling is 360 min — these are well above normal completion time but cap any pathological case. ## Test plan - [x] `npm run test:build` clean - [x] `npm run test:unit` 30/30 pass - [x] `npm run test:e2e --test-timeout=30000` for `detached.test.js` + `detached-wait.test.js` → 7/7 pass in ~12 s (including the sleep-5 regression test from #6) - [x] Direct repro on `echo hello` detached → resolves in < 10 s with `exitCode 0`, no hang - [ ] CI green on this PR (the timeout-minutes cap protects against regression) ## Note on attach() `attach()` still uses `wait({condition: 'next-exit'})` — same fundamental risk if the state file claims `'running'` but the container has exited. The early-return guard `if (state.status !== 'running')` catches the common case but the residual race exists. Out of scope for this PR; tracking as follow-up.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
executeDetachedwas the last call-site still usingawait container.wait()withoutcondition: 'next-exit'. The default'not-running'is satisfied by the'created'state too, so the wait could resolve immediately if armed before the daemon flipped the container to'running'(race window afterawait start()returns).execution.resultresolved in milliseconds for a long-running detached run, downstreamcancel()arrived too late, and the run reportedsuccess: truewhile the container was still alive.attach()(commit89f0c70): pass{ condition: 'next-exit' }.Test plan
npm run build(clean compile, no TS errors)npm run test:unit— 30/30 passnpm run test:e2e(newdetached-waittest): pass in ~5.7s, sentinel wins the race andelapsed >= 4500msfor asleep 5detached run