fix(e2e): droid prompt submission and claude-code subagent flakes #1599
Droid ingests long pasted prompts over several seconds; the fixed 200ms delay before Enter meant the keypress arrived while the input handler was still processing and got swallowed, leaving the prompt unsubmitted (recurring TestFactory* failures in CI). Send now waits until the echoed input has fully rendered, snapshots the pre-submit pane for WaitFor's settle guard, then verifies the pane reacted to Enter and retries up to three times. The snapshot is taken before Enter so it can never include response output from a fast agent, which would deadlock WaitFor's change requirement. Entire-Checkpoint: 0da2e3e238ea
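The submit-and-verify step described above can be sketched as follows. This is a minimal illustration, not the harness code: the `pane` interface, `submitWithRetry`, and `fakePane` are hypothetical names, and the real `TmuxSession.Send` talks to tmux rather than to an in-memory fake. The snapshot is captured before the first Enter, and each attempt checks whether the pane differs from that snapshot within a short window.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pane is a hypothetical abstraction over a tmux pane; it is an assumption
// made for this sketch and does not mirror the harness's real types.
type pane interface {
	Capture() string // current visible pane content
	PressEnter()     // send a single Enter keypress
}

// submitWithRetry presses Enter and waits up to `window` for the pane to
// differ from `before` (the pre-Enter snapshot). If the pane never reacts,
// Enter is assumed swallowed and is pressed again, up to maxAttempts times.
// It returns the number of Enter presses that were needed.
func submitWithRetry(p pane, before string, maxAttempts int, window, poll time.Duration) (int, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		p.PressEnter()
		deadline := time.Now().Add(window)
		for {
			if p.Capture() != before {
				return attempt, nil
			}
			if !time.Now().Before(deadline) {
				break
			}
			time.Sleep(poll)
		}
	}
	return maxAttempts, errors.New("pane did not react to Enter")
}

// fakePane simulates a TUI that swallows the first `swallow` Enter presses.
type fakePane struct {
	content string
	swallow int
	presses int
}

func (f *fakePane) Capture() string { return f.content }

func (f *fakePane) PressEnter() {
	f.presses++
	if f.presses > f.swallow {
		f.content += "\n> working..."
	}
}

func main() {
	p := &fakePane{content: "long pasted prompt", swallow: 2}
	before := p.Capture() // snapshot taken before Enter, never after
	attempts, err := submitWithRetry(p, before, 3, 20*time.Millisecond, 5*time.Millisecond)
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```

Because `before` is captured ahead of the first keypress, a fast agent's response can only ever show up as a change relative to the snapshot, never inside it.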
New Claude Code releases can run Task subagents in the background: the tool call returns immediately, the foreground turn ends, and the subagent's file changes and commit land after turn-end with no active session — so no checkpoint trailer or condensation happens and the checkpoint-advance assertions time out (CI failures on Linux and Windows). Instruct the agent to run the subagent in the foreground and wait for it, keeping these tests on the synchronous path they are meant to cover. Tracking background-subagent work in the CLI itself (e.g. via the SubagentStop hook) is a separate product gap. Entire-Checkpoint: 91b00f9ec504
TmuxSession.Send may retry Enter when the pane does not visibly react within its 2s window. Vogon treated an empty stdin line as session termination, so on a slow runner a retried Enter could end the session mid-test. Exit only on explicit exit/quit and update the comments that described the pre-rewrite Send semantics. Entire-Checkpoint: ffcdd5a7efd3
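A rough sketch of the REPL behaviour this commit describes, assuming a line-oriented loop over stdin. The names `classify`, `runREPL`, and the `action` constants are hypothetical, as are the prompt string and the exact-match handling of `exit`/`quit`; Vogon's real loop may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

type action int

const (
	actionIgnore action = iota // empty line, e.g. a retried Enter
	actionExit                 // explicit exit/quit
	actionRun                  // anything else is treated as a prompt
)

// classify decides what the REPL does with one input line.
func classify(line string) action {
	switch strings.TrimSpace(line) {
	case "":
		return actionIgnore
	case "exit", "quit":
		return actionExit
	default:
		return actionRun
	}
}

// runREPL reads lines until EOF or an explicit exit command and returns
// how many prompts were handled. Empty lines never end the session.
func runREPL(in io.Reader, out io.Writer) int {
	handled := 0
	scanner := bufio.NewScanner(in)
	fmt.Fprint(out, "> ")
	for scanner.Scan() {
		switch classify(scanner.Text()) {
		case actionIgnore:
			fmt.Fprint(out, "> ") // re-print the prompt and keep going
		case actionExit:
			return handled
		case actionRun:
			handled++
			fmt.Fprintf(out, "handled: %s\n> ", strings.TrimSpace(scanner.Text()))
		}
	}
	return handled
}

func main() {
	input := "first prompt\n\n\nsecond prompt\nexit\nnever reached\n"
	var out strings.Builder
	fmt.Println(runREPL(strings.NewReader(input), &out)) // prints 2
}
```

The two blank lines in the sample input stand in for retried Enter presses; the session survives them and still handles the second prompt.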
waitForInputIngested already holds the settled pane content, so return it instead of re-capturing for the stableAtSend snapshot and the first Enter verification. Saves two tmux subprocess spawns per Send and makes the snapshot exactly the content the stability wait verified. Entire-Checkpoint: 6537ed5b37ba
Pull request overview
Hardens the E2E tmux-based harness to make prompt submission reliable for slower-ingesting TUIs (notably factoryai-droid), and reduces claude-code subagent flakes by updating prompts to request foreground subagent execution. Also adjusts the Vogon REPL test double to tolerate Enter retries without terminating the session.
Changes:
- Reworked `TmuxSession.Send` to wait for echoed input to settle, snapshot pre-Enter state, and retry Enter submission.
- Updated Vogon interactive mode to ignore empty lines (so Enter retries don’t end the session).
- Updated subagent E2E prompts to explicitly request foreground subagent execution.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| e2e/agents/tmux.go | Adds “wait for ingest”, pre-Enter snapshotting, and Enter verification/retry logic for tmux-driven agents. |
| e2e/vogon/main.go | Makes Vogon interactive mode ignore empty lines to support Enter retries from Send. |
| e2e/tests/subagent_commit_flow_test.go | Updates subagent prompt text to request foreground execution to avoid background-subagent flakes. |
| e2e/tests/single_session_test.go | Updates the same-turn subagent+commit prompt text to request foreground execution. |
Require two consecutive stable polls in waitForInputIngested so a single quiet interval mid-paste can't fake stability, and document why it compares raw captures instead of stableContent. Re-print the vogon prompt when ignoring an empty line so manual REPL sessions stay readable. Addresses PR #1599 review feedback. Entire-Checkpoint: 944b219ab908
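To illustrate why a single quiet interval is not enough, here is a small sketch of a stability wait that requires several consecutive unchanged polls. `waitForStable` and its parameters are hypothetical, and "two consecutive stable polls" is interpreted here as two consecutive polls that match the previous capture; the real `waitForInputIngested` may count differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForStable polls capture() until the raw content has been unchanged
// for `needed` consecutive polls, then returns that settled content so the
// caller can reuse it instead of capturing again.
func waitForStable(capture func() string, needed int, poll, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	prev := capture()
	stable := 0
	for time.Now().Before(deadline) {
		time.Sleep(poll)
		cur := capture()
		if cur == prev {
			stable++
			if stable >= needed {
				return cur, nil
			}
			continue
		}
		// Content moved: restart the stability count from this frame.
		stable = 0
		prev = cur
	}
	return prev, errors.New("pane did not settle before timeout")
}

func main() {
	// Simulated paste that pauses once mid-way ("hello" appears twice)
	// before the rest of the text arrives.
	frames := []string{"he", "hello", "hello", "hello wor",
		"hello world", "hello world", "hello world"}
	i := 0
	capture := func() string {
		f := frames[i]
		if i < len(frames)-1 {
			i++
		}
		return f
	}
	got, err := waitForStable(capture, 2, time.Millisecond, time.Second)
	fmt.Printf("%q %v\n", got, err) // prints "hello world" <nil>
}
```

With `needed` set to 1, the same frame sequence would return the partial `"hello"`, which is the false-stability case the commit guards against.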
Droid's prompt pattern matches the always-visible input box, so WaitFor can return mid-turn; the 10s file wait then expires while the Worker is still executing (60-120s turns on CI). Widen the file and rewind-point waits to absorb the Worker runtime. Entire-Checkpoint: 63e9c5af67f1
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a3db5ce. Configure here.
The adjacent comment documents Worker turns of 60-120s on CI but the file wait was capped at 90s, so a slow Worker could outlive the wait even when it succeeds. Droid's 2x timeout multiplier gives these tests a 6-minute budget, so the wider wait fits comfortably. Entire-Checkpoint: 132cc906436e

https://entire.io/gh/entireio/cli/trails/725
Why
E2E runs on main have been failing repeatedly (latest: run 28549683381):
- `TestFactoryTaskCheckpointExistsBeforeCommit` and `TestFactoryCommittedCheckpointExcludesPreExistingUntrackedFiles` fail because the harness sends Enter a fixed 200ms after pasting the prompt. Droid v0.162.x ingests long pasted prompts over several seconds, the Enter gets swallowed, and the prompt sits unsubmitted in the input box until the test times out. Recurring across at least three recent main runs.
- `TestSingleSessionSubagentCommitInTurn` and `TestSubagentCommitFlow` fail because newer Claude Code releases can run Task subagents in the background: the tool call returns in ~40ms, the foreground turn ends (`stop` hook sees no changes), and the subagent's file changes and commit land after turn-end with no active session. With no checkpoint trailer and no condensation, the checkpoint-advance assertions time out.

What changed
- `TmuxSession.Send` waits until the echoed input has fully rendered before submitting, then verifies the pane reacted to Enter and retries up to 3× if it was swallowed.
- The `stableAtSend` snapshot is now taken before Enter, so it can never include response output from a fast agent (a post-Enter snapshot deadlocks `WaitFor`'s change-detection guard; caught by the Vogon canary during development).
- Vogon now exits only on explicit commands (`exit`/`quit`), so a retried Enter cannot terminate the session mid-test.

Decisions made during development
- The fix lives in the shared `TmuxSession`, not a droid-specific override: the race is universal (the old code already worked around the same issue for Claude's TUI with a fixed sleep); droid merely widened the timing window past it.
- Tracking background-subagent work in the CLI itself (e.g. via the `SubagentStop` hook) is a separate product gap and deliberately not part of this PR. With a foreground subagent, hook logs confirm `pre-task`/`post-task` now span the actual subagent work (~22s) and the mid-turn commit gets its trailer.

Reviewer notes
- `FACTORY_API_KEY`); the droid-side fix is exercised by the full Vogon canary (which drives the same `TmuxSession.Send` path, 59/59 green) but needs the CI E2E workflow for final confirmation.
- `copilot-cli`'s `Send` override still carries the old fixed-delay + unverified-Enter pattern; unifying it requires parameterizing around Copilot's Ctrl+S/autocomplete submission semantics. Follow-up candidate, no regression here.

Note
Low Risk
Changes are confined to E2E harness, Vogon canary, and test timeouts/prompts; no production CLI or hook logic is modified.
Overview
Hardens shared `TmuxSession.Send` for slow TUIs (especially factoryai-droid): wait until pasted text is fully echoed (raw pane polling with consecutive stability), snapshot `stableAtSend` before Enter so fast agents cannot deadlock `WaitFor`, then submit Enter with pane-change verification and up to three retries when Enter is swallowed.
Updates the Vogon interactive REPL to ignore empty lines (only `exit`/`quit` end the session) so Enter retries from `Send` do not terminate mid-test.
Factory droid hook tests extend file and task-rewind waits (90s / 30s) because prompt-pattern `WaitFor` can return mid-turn while a Worker is still running.
Claude Code subagent tests add explicit instructions to run the subagent in the foreground and wait for completion, avoiding background Task runs that finish after hooks and break checkpoint assertions.