Skip to content

stabilize zsh-fork approvals and resume --last#16062

Open
bolinfest wants to merge 1 commit intomainfrom
pr16062
Open

stabilize zsh-fork approvals and resume --last#16062
bolinfest wants to merge 1 commit intomainfrom
pr16062

Conversation

@bolinfest
Copy link
Copy Markdown
Collaborator

@bolinfest bolinfest commented Mar 28, 2026

Why

PR #16047 exposed a real macOS zsh-fork bug, and fixing that runtime issue surfaced two follow-up stability problems on this stack:

  • the turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2 test only budgeted 5 seconds for a login-shell flow that can legitimately emit an extra startup approval such as /usr/libexec/path_helper -s on macOS before reaching the target rm subcommands
  • codex exec resume --last was willing to resume internal sub-agent rollouts, so a newer non-user-facing thread could win the updated_at sort and steal the resume target from the visible session

Taken together, those issues kept the PR red even after the macOS fd-handoff fix landed.

What Changed

  • keep the wrapper-side escalation stream alive across the macOS fd handoff in shell-escalation/src/unix/escalate_client.rs, while still dropping the local guard before waiting for the server reply so EOF is not masked
  • increase the zsh-fork app-server test's timeout_ms to 15 seconds so login-shell startup approvals do not consume the entire command budget on slower macOS CI shards
  • narrow codex-exec resume lookup to top-level resumable thread sources (cli, vscode, exec, and app-server) instead of considering sub-agent and unknown threads
  • add codex-exec regression coverage proving that resume --last ignores a newer internal memory-consolidation rollout
  • keep the shell-escalation regression coverage for both the delayed-handoff case and the server-closes-before-replying EOF case

How I Verified

  • cargo test -p codex-shell-escalation
  • cargo test -p codex-exec resume -- --nocapture
  • cargo test -p codex-app-server --test all turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2 -- --nocapture
  • repeated cargo test -q -p codex-app-server --test all turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2 3 additional times
  • just argument-comment-lint

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bb7732ada7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +74 to +77
.receive::<EscalateResponse>()
.await
.context("failed to receive EscalateResponse")?;
drop(server_stream_guard);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop handshake guard before waiting for first response

Holding server_stream_guard across client.receive::<EscalateResponse>().await can deadlock the wrapper if the escalation server exits before replying. In that failure mode the daemon closes its copy of the server endpoint, but this process still has one open, so the client side never observes EOF and the await can block indefinitely. Previously the local server endpoint was dropped immediately after handshake, so a server crash surfaced as an error instead of a hang.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in d9abab2: the wrapper now drops the handshake guard immediately after sending the first EscalateRequest, and there is a new regression test covering the server-closes-before-replying EOF path.

@bolinfest bolinfest force-pushed the pr16062 branch 3 times, most recently from d9abab2 to d2e7700 Compare March 28, 2026 02:58
@bolinfest bolinfest changed the title shell-escalation: keep macOS handshake stream alive stabilize zsh-fork approvals and resume --last Mar 28, 2026
@bolinfest bolinfest force-pushed the pr16062 branch 2 times, most recently from 7eced5c to 90e10bb Compare March 28, 2026 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant