fix(extract): finalize detached runs that extract artifacts #17

Merged: enixCode merged 1 commit into main from fix/detached-extract-hang on Jun 2, 2026
@enixCode enixCode commented Jun 2, 2026

Problem

A detached run with extract never finalized. extractFromVolume's seeder container used AutoRemove and waited for the hijacked attach stream to close, an event that a volume-bound container does not reliably emit on Docker Desktop / Windows. As a result the run stayed in running forever, its container and volume were never cleaned up, and any caller polling the run state (light-run, and through it light-process) looped indefinitely. In practice, every light-process node run hung.

Root cause

The extract seeder relied on the hijacked attach stream emitting end/close to know that tar had finished. Volume-bound AutoRemove containers do not reliably emit those events on some Docker builds. The write path already documents this and moved to putArchive, but the read/extract path still relied on stream close.
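
Since end/close cannot be trusted here, one way to decide that a stream has delivered everything is to watch for silence instead. The following is a minimal sketch of that idea, not the code from this PR: the helper is named drainIdle after the one mentioned in the fix, but its signature, the idleMs parameter, and the PassThrough demo are assumptions made for illustration.

```typescript
import { PassThrough, Readable } from "node:stream";

/**
 * Collect bytes from a stream and resolve once no chunk has arrived for
 * `idleMs`. Hypothetical helper: it never waits for 'end' or 'close'.
 */
function drainIdle(stream: Readable, idleMs: number): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    let timer: ReturnType<typeof setTimeout> | undefined;

    const cleanup = () => {
      clearTimeout(timer);
      stream.off("data", onData);
      stream.off("error", onError);
    };
    const finish = () => {
      cleanup();
      resolve(Buffer.concat(chunks));
    };
    const onData = (chunk: Buffer) => {
      chunks.push(chunk);
      // Every chunk restarts the idle window.
      clearTimeout(timer);
      timer = setTimeout(finish, idleMs);
    };
    const onError = (err: Error) => {
      cleanup();
      reject(err);
    };

    stream.on("data", onData);
    stream.on("error", onError);
    // Also resolves (with an empty buffer) if nothing ever arrives.
    timer = setTimeout(finish, idleMs);
  });
}

// Demo: the stream is written to but never ended, yet the promise resolves.
const demo = new PassThrough();
demo.write("tar bytes");
drainIdle(demo, 50).then((buf) => console.log(`drained ${buf.length} bytes`));
```

The trade-off is that an idle window is a heuristic: too short and a slow producer gets truncated, too long and every extract pays the delay. That is presumably why the fix pairs it with an explicit exit check rather than using it alone.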

Fix

  • seeder: AutoRemove off; wait for the seeder to exit via inspect() polling (awaitSeederExit, the same race-free approach the detached run path already uses), drain the final bytes on an idle timeout (drainIdle) instead of on stream close, and remove the container explicitly (removeSeeder).
  • extract: runAlpine and streamTarOut use the new lifecycle with a finally that always removes the seeder.
  • listStates: skip sibling files in the state dir that have no id (e.g. cache-usage.json). These files previously made listing runs throw from path.join(undefined), which surfaced as HTTP 500 on GET /runs.
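
The seeder lifecycle described in the first two bullets can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: ContainerLike is a hypothetical structural type modelled on the inspect()/remove() shape of a dockerode container handle, withSeeder and the collect callback are invented for the example, and the poll interval and timeout values are arbitrary.

```typescript
// Hypothetical minimal view of a container handle (assumed to mirror the
// inspect()/remove() methods of a dockerode Container).
interface ContainerLike {
  inspect(): Promise<{ State: { Running: boolean; ExitCode: number } }>;
  remove(opts?: { force?: boolean }): Promise<unknown>;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

/** Poll inspect() until the container stops running; return its exit code. */
async function awaitSeederExit(
  c: ContainerLike,
  pollMs = 100,
  timeoutMs = 30_000,
): Promise<number> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { State } = await c.inspect();
    if (!State.Running) return State.ExitCode;
    if (Date.now() >= deadline) throw new Error("seeder did not exit in time");
    await sleep(pollMs);
  }
}

/** Best-effort removal: a container that is already gone is not an error. */
async function removeSeeder(c: ContainerLike): Promise<void> {
  try {
    await c.remove({ force: true });
  } catch {
    /* already removed or unreachable; nothing more to do in this sketch */
  }
}

/**
 * Wait for the seeder to exit, collect its output, and always remove it.
 * `collect` stands in for whatever reads the tar bytes (placeholder).
 */
async function withSeeder<T>(
  c: ContainerLike,
  collect: () => Promise<T>,
  pollMs = 100,
): Promise<T> {
  try {
    const code = await awaitSeederExit(c, pollMs);
    if (code !== 0) throw new Error(`seeder exited with code ${code}`);
    return await collect();
  } finally {
    // Runs on success, on non-zero exit, and on timeout alike.
    await removeSeeder(c);
  }
}
```

Because AutoRemove is off in this model, the container still exists when inspect() is called after exit, so the poll cannot race with Docker deleting it. The finally block then takes over the cleanup that AutoRemove used to perform.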

Verification

  • 29 targeted tests pass: state unit, volume/extract e2e (incl. a 5 MB file), detached e2e.
  • With the fix applied, a real DockerRunner.run call in detached mode with extract returns { success: true, extracted: [{ status: 'ok', bytes: ... }] }.
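
For the state unit coverage, the listStates guard might be exercised along these lines. Everything here is an assumption made for illustration: the one-JSON-file-per-run layout, the RunState shape, and this listStates body are hypothetical stand-ins, not the project's actual implementation or tests.

```typescript
import { mkdtempSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical run-state shape; only `id` matters for the guard.
interface RunState {
  id: string;
  status: string;
}

/** List run states, skipping sibling files that carry no string `id`. */
function listStates(stateDir: string): RunState[] {
  const states: RunState[] = [];
  for (const name of readdirSync(stateDir)) {
    if (!name.endsWith(".json")) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(readFileSync(join(stateDir, name), "utf8"));
    } catch {
      continue; // unreadable or malformed file: not a run state
    }
    const id = (parsed as { id?: unknown } | null)?.id;
    // Without this check, later code would call join(dir, undefined),
    // which throws a TypeError.
    if (typeof id !== "string") continue;
    states.push(parsed as RunState);
  }
  return states;
}

// Demo: one real run state next to a sibling bookkeeping file.
const dir = mkdtempSync(join(tmpdir(), "states-"));
writeFileSync(join(dir, "run-1.json"), JSON.stringify({ id: "run-1", status: "running" }));
writeFileSync(join(dir, "cache-usage.json"), JSON.stringify({ bytes: 10 }));
console.log(listStates(dir).map((s) => s.id));
```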

build with cc

@enixCode enixCode merged commit 992d6d4 into main Jun 2, 2026
4 checks passed
enixCode added a commit that referenced this pull request Jun 2, 2026
Release 0.16.2. Ships the detached-extract finalize fix (#17).

- package.json + lock: 0.16.1 -> 0.16.2
- telemetry.ts: TRACER_VERSION 0.16.1 -> 0.16.2 (lock-step)

build with cc