Skip to content

fix(exec-server): retain output until streams close#18946

Merged
bolinfest merged 1 commit intomainfrom
pr18946
Apr 23, 2026
Merged

fix(exec-server): retain output until streams close#18946
bolinfest merged 1 commit intomainfrom
pr18946

Conversation

@bolinfest
Copy link
Copy Markdown
Collaborator

@bolinfest bolinfest commented Apr 22, 2026

Why

A Mac Bazel run hit a flake in server::handler::tests::output_and_exit_are_retained_after_notification_receiver_closes where the read path observed process exit but lost the expected buffered stdout (first\nsecond\n). See the GitHub Actions job and BuildBuddy invocation.

The underlying race is that process exit is not the same thing as stdout/stderr closure. If a child or grandchild inherits the pipe write end, or a process duplicates it with dup2, the watched process can exit while the stream is still open and more output can still arrive. The exec-server was starting exited-process retention cleanup from the exit event, so the process entry could be removed before the output streams had actually closed.

While stress-testing the exec-server unit suite, server::handler::tests::long_poll_read_fails_after_session_resume exposed a separate test race: it started a short-lived command that could exit and wake the pending long-poll read before the session-resume assertion observed the resumed-session error. That test is intended to cover resume eviction, not process-exit delivery, so this change keeps the process alive and quiet while the second connection resumes the session.

What changed

  • Keep exec-server process entries retained until stdout/stderr streams close, then start the post-exit retention timer from the closed event.
  • Wake long-poll readers when the closed event is emitted.
  • Add focused local_process unit coverage that proves late output is still retained after the short test retention interval has elapsed, and that closed process entries are eventually evicted.
  • Add a local and remote regression test where a parent exits while a child keeps inherited stdout open. The child waits on an explicit release file, so the test deterministically observes exit first, releases the child, then requires a nonzero-wait read from the exit sequence to receive the late output.
  • In codex-rs/exec-server/src/server/handler/tests.rs, make long_poll_read_fails_after_session_resume run a long-lived silent command instead of a short command that prints and exits. This isolates the test to session-resume behavior and prevents a normal process exit from satisfying the pending long-poll read first.

Testing

  • cargo test -p codex-exec-server exec_process_retains_output_after_exit_until_streams_close
  • cargo test -p codex-exec-server local_process::tests
  • cargo test -p codex-exec-server
  • just fix -p codex-exec-server
  • bazel test //codex-rs/exec-server:exec-server-unit-tests //codex-rs/exec-server:exec-server-exec_process-test //codex-rs/exec-server:exec-server-file_system-test //codex-rs/exec-server:exec-server-http_client-test //codex-rs/exec-server:exec-server-initialize-test //codex-rs/exec-server:exec-server-process-test //codex-rs/exec-server:exec-server-websocket-test
  • bazel test --runs_per_test=25 //codex-rs/exec-server:exec-server-unit-tests

Documentation

No docs update needed; this is an internal exec-server correctness fix.

@bolinfest bolinfest force-pushed the pr18946 branch 2 times, most recently from c81782e to b03f9f8 Compare April 23, 2026 19:13
@bolinfest bolinfest enabled auto-merge (squash) April 23, 2026 19:34
@bolinfest bolinfest merged commit 491a305 into main Apr 23, 2026
39 checks passed
@bolinfest bolinfest deleted the pr18946 branch April 23, 2026 19:50
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 23, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants