Skip to content

Speed up test suite across 4 packages (~15s → ~6s)#537

Merged
wesm merged 8 commits intoroborev-dev:mainfrom
cpcloud:worktree-shimmying-chasing-gray
Mar 18, 2026
Merged

Speed up test suite across 4 packages (~15s → ~6s)#537
wesm merged 8 commits intoroborev-dev:mainfrom
cpcloud:worktree-shimmying-chasing-gray

Conversation

@cpcloud
Copy link
Copy Markdown
Contributor

@cpcloud cpcloud commented Mar 18, 2026

Summary

  • Isolate agent, config, and review tests from global git config — extends Isolate tests from global git config to prevent gpg-agent/pinentry #531 to three more packages. Adds TestMain calling testenv.RunIsolatedMain() so git operations in tests never trigger the host's gpg-agent/pinentry (which can add 1-2s per invocation on machines with GPG signing configured).

  • Make worker pool idle sleeps responsive to shutdown — replaces time.Sleep in the worker idle loop with a select on a shutdown channel, so tests that stop the daemon don't block waiting for the sleep to finish.

  • Stub isPROpenFn in CI poller test harness — avoids shelling out to gh CLI during tests. Removes a source of ~0.5s latency and flaky failures when gh isn't authenticated.

  • Parallelize daemon subtests — marks 21 independent daemon tests as t.Parallel(), including TestIsDaemonAliveLegacyStatusCodes subtests and the bulk of the worker/failover/cancellation tests. Drops internal/daemon from ~7.2s to ~4.2s.

  • Tighten requireNever timeout — the invalid-config watcher test was waiting 500ms for a "never" assertion; 100ms is sufficient and shaves ~400ms.

  • Fix configureSubprocess to always set Cancel on exec.Cmd — when called on a plain exec.Command (no context), cmd.Cancel is nil, so the wrapper's cancel() call panicked. Now sets a default Kill cancel before wrapping. This is a correctness fix surfaced by the timing changes below.

  • Speed up agent subprocess tests (8s → 1s) — two changes:

    1. Override subprocessWaitDelay from 5s to 50ms in tests that don't need the production grace period.
    2. Add a Go 1.25 ETXTBSY probe guard (case "$1" in *etxtbsy*) exit 0;; esac) to shell-script test fixtures.

Go 1.25 ETXTBSY probe — why the guard is necessary

Go 1.25 added a built-in ETXTBSY probe to os/exec: every Cmd.Start() call forks a child that runs execve(path, [path, "--help-probe-etxtbsy"]). If execve succeeds (file isn't busy), the probe child actually executes the script — it isn't killed, and the parent waits for it to exit before proceeding.

For shell scripts, this means the script runs twice: once as the probe (with $1 = "--help-probe-etxtbsy"), and once as the real invocation. If the script ignores $1 and does something slow (like sleep 5), the probe child runs the full body, and Go waits for it — adding the script's entire runtime as invisible overhead to Cmd.Start().

This was confirmed via strace -f: two child PIDs ran the same clock_nanosleep(5s), one from the probe fork (CLONE_VFORK|CLONE_PIDFD) and one from the real execution. The probe child was waited on via waitid(P_PIDFD).

The guard pattern — case "$1" in *etxtbsy*) exit 0;; esac as the first line after the shebang — makes the probe child exit immediately. This is only needed in test fixtures; real agent binaries already handle unknown flags gracefully.

Further details are documented in internal/agent/agent_test_helpers.go on the writeTempCommand function.

🤖 Generated with Claude Code

cpcloud and others added 8 commits March 18, 2026 08:23
Extends commit 5b1aec5 to three more packages that spawn git
subprocesses in tests. Without RunIsolatedMain, commit.gpgsign=true
in the user's global gitconfig triggers gpg-agent/pinentry prompts,
adding seconds of latency per git-commit call.

review: 6.6s → 0.05s, config: 2.0s → 0.07s, agent: ~1s saved

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Worker goroutines used plain time.Sleep during idle polling and error
backoff, which blocked up to 2s (idle) or 5s (error) before noticing
the stop signal. Replace with select on both stopCh and time.After so
workers exit immediately when Stop() is called.

This is both a production improvement (daemon shutdown is now instant
instead of waiting for sleeping workers) and a test speedup — the IPv6
server test dropped from 2.0s to ~0.01s because it starts a real
server with 4 idle workers that all need to shut down.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Every CI poller test that exercised postBatchResults was shelling out
to `gh pr view --repo acme/api` (the fake test repo) because
isPROpenFn was unset, falling through to the real implementation.
Each call took ~0.3s to fail, adding up to ~3-4s across the 10+ batch
tests.

Set a default isPROpenFn in the harness constructor that returns true.
Tests that need specific closed-PR behavior already override it after
construction, so those continue to work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each of the 7 "not alive" subtests incurs a 200ms sleep from
IsDaemonAlive's retry logic (attempt 1 fails, sleep 200ms, attempt 2
fails). Running sequentially that's ~1.4s; with t.Parallel() the
subtests overlap and the wall time drops to ~0.2s.

Each subtest creates its own independent httptest.Server so there is
no shared state to worry about.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test verifies that invalid TOML doesn't change the loaded config.
The debounce delay is 200ms, so 300ms (debounce + 100ms margin) is
sufficient to confirm no reload occurred. Previously 500ms, saving
~200ms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add t.Parallel() to the 21 slowest independent tests (0.1s+) across
8 test files. These tests all use isolated resources (temp dirs,
in-memory DBs, independent broadcasters) with no shared global state.

The ConfigWatcher tests benefit the most — four tests with
time-based debounce waits (0.2-0.7s each) now overlap instead of
running sequentially. Crypto key generation, CI poller batch tests,
and worker pool concurrency tests also overlap.

TestHandleEnqueueAgentAvailability is the one >0.1s test left
sequential because it mutates PATH via os.Setenv.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go 1.25's exec.CommandContext now sets cmd.Cancel unconditionally
(unlike earlier versions which only set it when WaitDelay was zero).
The existing configureSubprocess wrapper was already correct for Go
1.25, but make the nil-Cancel guard explicit so it also works if the
function is called on a plain exec.Command (no context).

This ensures context cancellation always signals agent subprocesses
regardless of how the Cmd was constructed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes:

- TestContextProcessErrorRunPathCancellation: Add an etxtbsy guard
  to the shell script so Go 1.25's ETXTBSY probe (which forks a
  second child running the script with --help-probe-etxtbsy as $1)
  exits immediately instead of sleeping for 5s. The test keeps its
  original sh→sleep process tree to exercise the realistic orphaned-
  child scenario. Also reduce subprocessWaitDelay to 50ms for the
  test so cmd.Wait returns promptly after the kill. (5s → 0.02s)

  Root cause confirmed via strace: Go 1.25's os/exec probes every
  execve for ETXTBSY by forking a child that runs the command with
  --help-probe-etxtbsy as argv[1]. The probe child actually executes
  the full script (since the script ignores arguments), creating a
  phantom sleep process that the test binary waits for. The probe
  process is never killed by context cancellation because Go only
  tracks the "real" child's pidfd.

- TestCodexReviewTimeoutClosesStdoutPipe: Reduce background sleep
  from 2s to 0.2s — it only needs to outlive the 50ms context
  timeout. (2.1s → 0.27s)

Also documents the ETXTBSY probe behavior on writeTempCommand for
future test authors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@roborev-ci
Copy link
Copy Markdown

roborev-ci bot commented Mar 18, 2026

roborev: Combined Review (6661736)

Summary: This PR improves test isolation and runtime, optimizes subprocess cancellation logic, and makes worker idle sleeps responsive to shutdown.

All reviewers agree the code is clean with no issues of medium or
higher severity found.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

//
// # ETXTBSY probes and Go 1.25
//
// Go 1.25 added a built-in ETXTBSY probe to os/exec: every exec.Cmd.Start()
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this exists is super weird to me. Many scripts ignore their arguments, so now it seems that in Go 1.25 those scripts always execute twice unless users explicitly handle this one weird, go-specific flag.

@wesm
Copy link
Copy Markdown
Collaborator

wesm commented Mar 18, 2026

Good deal. thanks for doing this!

@wesm wesm merged commit f881c6f into roborev-dev:main Mar 18, 2026
8 checks passed
@cpcloud cpcloud deleted the worktree-shimmying-chasing-gray branch March 18, 2026 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants