
fix: harden E2E tests against CI timing flakiness #787

Merged
alishakawaguchi merged 3 commits into main from flaky-e2e-tests
Mar 26, 2026

Conversation

@alishakawaguchi
Contributor

@alishakawaguchi alishakawaguchi commented Mar 26, 2026

Summary

  • Timing fixes: WaitForCheckpoint/WaitForCheckpointAdvanceFrom 15s → 30s, AssertNoShadowBranches → polling WaitForNoShadowBranches(10s), opencode startup timeout 15s → 30s, cursor per-prompt timeout 90s → 2m
  • Bug report: Documents 3 real CLI bugs in e2e/OPENCODE_BUGS.md exposed by opencode E2E tests (mid-turn checkpoint trailers, orphaned shadow branches, agent-internal files in files_touched)

What this does NOT do

These are timing-only fixes. The 3 real opencode bugs documented in OPENCODE_BUGS.md are not worked around — the tests will continue to fail for opencode until the CLI bugs are fixed.
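
For readers unfamiliar with the first documented bug, the sketch below illustrates what a "checkpoint trailer" check could look like. The trailer key `Entire-Checkpoint` is taken from the commit messages in this PR; the function name `checkpointTrailer` and the parsing approach are assumptions for illustration, not the CLI's or the test suite's actual logic.

```go
package main

import (
	"fmt"
	"strings"
)

// trailerKey is the key seen in this PR's commit messages.
const trailerKey = "Entire-Checkpoint"

// checkpointTrailer is a hypothetical helper: it scans the last paragraph of
// a commit message for the trailer and returns its value if present.
func checkpointTrailer(msg string) (string, bool) {
	paragraphs := strings.Split(strings.TrimSpace(msg), "\n\n")
	last := paragraphs[len(paragraphs)-1]
	for _, line := range strings.Split(last, "\n") {
		key, value, ok := strings.Cut(line, ":")
		if ok && strings.TrimSpace(key) == trailerKey {
			return strings.TrimSpace(value), true
		}
	}
	return "", false
}

func main() {
	msg := "fix: example\n\nBody text.\n\nCo-Authored-By: Someone <someone@example.com>\nEntire-Checkpoint: c75d6c04967c"
	id, ok := checkpointTrailer(msg)
	fmt.Println(id, ok)
}
```

A mid-turn commit affected by the bug would be one where a check of this kind finds no trailer at all.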

Test plan

  • mise run fmt && mise run lint — passes
  • mise run test:e2e:canary — passes
  • CI E2E all agents — cursor-cli 3/3 green, other agents pass except known opencode bugs

🤖 Generated with Claude Code


Note

Low Risk
Low risk: changes are confined to E2E harness/test timing and assertions, with no impact on production code paths. Main risk is masking real regressions by allowing slower completion or delayed cleanup in CI.

Overview
Hardens E2E tests against CI timing variance by increasing checkpoint-related waits (typically 15s → 30s) and extending OpenCode TUI startup readiness waiting.

Replaces the instant AssertNoShadowBranches check with a polling WaitForNoShadowBranches helper to tolerate asynchronous shadow-branch deletion after condensation, and adjusts a couple of prompts to use a longer per-prompt timeout.

Written by Cursor Bugbot for commit 601f139.

… bugs

Timing fixes (legitimate — CI is slower than local):
- WaitForCheckpoint/WaitForCheckpointAdvanceFrom: 15s → 30s everywhere
- AssertNoShadowBranches → WaitForNoShadowBranches(10s) with polling
  (shadow branch cleanup is async, needs time to complete)
- opencode StartSession: startup timeout 15s → 30s (TUI render + settle)
- TestPartialStaging: per-prompt timeout 90s → 2m (cursor agent slowness)

Real bugs documented (not worked around):
- Mid-turn commits don't get checkpoint trailers for headless agents
- Shadow branches left orphaned after carry-forward
- Agent-internal files (.opencode/) included in files_touched

See e2e/OPENCODE_BUGS.md for full details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c75d6c04967c
Copilot AI review requested due to automatic review settings March 26, 2026 21:18
Contributor

Copilot AI left a comment

Pull request overview

Harden the E2E suite against CI timing variability by increasing checkpoint/prompt timeouts and replacing brittle immediate assertions with polling, while documenting known OpenCode-specific CLI bugs uncovered by E2E runs.

Changes:

  • Increase checkpoint wait timeouts (15s → 30s) across E2E tests and add per-prompt timeout overrides where needed.
  • Replace AssertNoShadowBranches with polling-based WaitForNoShadowBranches(10s) to tolerate async cleanup lag.
  • Increase OpenCode TUI startup wait (15s → 30s) and add e2e/OPENCODE_BUGS.md documenting observed issues.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| e2e/testutil/assertions.go | Adds WaitForNoShadowBranches polling helper to reduce shadow-branch cleanup flakes. |
| e2e/tests/subagent_commit_flow_test.go | Uses longer checkpoint wait + polls for shadow-branch cleanup. |
| e2e/tests/stash_workflows_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup (with added rationale comment). |
| e2e/tests/split_commits_test.go | Uses longer checkpoint waits, polls for cleanup, and increases per-prompt timeout via agents.WithPromptTimeout. |
| e2e/tests/single_session_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/session_lifecycle_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/rewind_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/resume_test.go | Uses longer checkpoint waits for resume flows. |
| e2e/tests/resume_remote_test.go | Uses longer checkpoint waits for remote resume flows. |
| e2e/tests/multi_session_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/mid_turn_commit_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/interactive_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/external_agent_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/existing_files_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/edge_cases_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/deleted_files_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/checkpoint_metadata_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/tests/attribution_test.go | Uses longer checkpoint waits + polls for shadow-branch cleanup. |
| e2e/agents/opencode.go | Increases OpenCode TUI readiness wait to reduce CI startup flakiness. |
| e2e/OPENCODE_BUGS.md | Documents OpenCode-driven failures as real CLI bugs (no test workarounds). |

alishakawaguchi and others added 2 commits March 26, 2026 14:29
- Remove unused AssertNoShadowBranches (all callers migrated)
- WaitForNoShadowBranches: use require.Emptyf to fail fast, include
  timeout in error message for easier debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a4a48ee818e0
@alishakawaguchi
Contributor Author

bugbot run

@cursor cursor bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

@alishakawaguchi alishakawaguchi marked this pull request as ready for review March 26, 2026 22:11
@alishakawaguchi alishakawaguchi requested a review from a team as a code owner March 26, 2026 22:11
Contributor

@pfleidi pfleidi left a comment

Nice!

@alishakawaguchi alishakawaguchi merged commit 2e8d703 into main Mar 26, 2026
12 of 13 checks passed
@alishakawaguchi alishakawaguchi deleted the flaky-e2e-tests branch March 26, 2026 22:18