Refactor test suite: reduce boilerplate, improve isolation and safety #170

wesm · 2026-01-29T22:30:58Z

Summary

Comprehensive refactoring of the test suite across 55 files, resulting in a net reduction of ~3,400 lines of test code while improving quality:

Shared test helpers: Extracted common patterns into reusable helpers (internal/testutil/testutil.go, per-package helpers_test.go and *_test_helpers.go files), eliminating repetitive setup code for databases, git repos, HTTP servers, and mock agents
Table-driven tests: Converted repetitive test cases (e.g. parseStreamJSON, normalize, config) to table-driven style
Test isolation: Fixed goroutine safety issues, eliminated shared mutable state, filtered inherited env vars, and ensured tests don't depend on the user's global git config
Correctness fixes: Fixed silent test passes on job failures, SQL injection risks in test helpers, shell escaping bugs in mock CLIs, missing error checks, and hardcoded values that could cause false positives
Cleanup: Removed dead code (unused parameters, redundant unsets), renamed conflicting helpers, and standardized patterns across packages

Test plan

go test ./... passes
No test regressions from the refactoring
Verify helpers are reused consistently across test files

🤖 Generated with Claude Code

Add helpers_test.go with reusable test utilities: - patchServerAddr: safely swap/restore global serverAddr - createTestRepo: set up git repos with files committed - writeTestFiles: create file trees without git - newTestCmd: create cobra.Command with captured output - newMockServer: configurable httptest.Server mimicking daemon API Refactor analyze_test.go to use these helpers, removing ~150 lines of duplicated mock server setup, file creation, and serverAddr patching. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Deduplicate repeated job-filtering logic and JSON encoding in client_test.go. The mockJobsHandler helper centralizes the 5 copy-pasted filter implementations, and writeJSON eliminates repetitive json.NewEncoder(w).Encode() calls. TestWaitForReview now uses http.ServeMux for cleaner route dispatch. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

setupMockDaemon already wraps handlers to auto-respond to /api/status, so these duplicate checks were unnecessary boilerplate. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace inline boilerplate with existing helpers: createTestRepo, patchServerAddr, newTestCmd, newMockServer, and writeJSON. This removes duplicated git repo setup, manual serverAddr patching, cobra output capture, and raw json.NewEncoder calls. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…pers Extract repeated git repo initialization, working directory management, hook file manipulation, and PATH mocking into reusable helpers in internal/testutil. This reduces duplication across 7 chdir sites and 3 git-init sites, and makes the helpers available for other test files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add mockServerModel, pressKey, pressKeys, and pressSpecial helpers to eliminate repeated httptest.NewServer/defer/newTuiModel setup and verbose tea.KeyMsg construction across ~94 key press simulations and 21 mock server setups. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace duplicated setupTestRepo with testutil.NewTestRepoWithCommit and introduce reviewHarness struct to eliminate repeated cobra.Command and bytes.Buffer setup, config file writing, and verbose runLocalReview calls. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…plate Replace ~200 lines of copy-pasted HTTP handler logic in TestRefineLoopStaysOnFailedFixChain and TestRefinePendingJobWaitDoesNotConsumeIteration with createConfigurableMockRefineHandler that wraps the base handler with hook overrides. Replace ad-hoc failingAgent/changingAgent structs with a generic functionalMockAgent. Add inDir helper to eliminate repeated os.Getwd/os.Chdir/defer boilerplate. Reuse setupRefineRepo where tests were duplicating git repo initialization. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add WithReview/WithJob builder methods to mockDaemonClient for chainable setup, extract createMockExecutable for platform-specific script creation, and add gitCommitFile helper to eliminate repeated write/add/commit sequences. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Extract setupGitRepo, chdir, and makeDir helpers to replace repeated patterns of evalSymlinks+initGitRepo, manual chdir+defer, and os.MkdirAll boilerplate across all test cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace manual HOME/ROBOREV_DATA_DIR overrides, inline runGit helpers, and daemon.json setup with setupMockDaemon, newTestGitRepo, and setupFastPolling across all review command tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Extract repeated patterns into helpers: createRepoWithConfig for temp repo setup, newMockReviewServer/newMockJobsServer for HTTP test servers, stubReview for review construction, patchPromptPollInterval for global variable patching. Replace mockCmd with existing newTestCmd, manual serverAddr patching with existing patchServerAddr, and inline JSON encoding with existing writeJSON. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add chdirRepo, mockReviewDaemon, and runShowCmd helpers to helpers_test.go and use them along with existing newTestGitRepo and CommitFile to eliminate ~185 lines of duplicated git setup, mock daemon wiring, and command execution code. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add streamFormatterFixture for shared buffer/formatter setup, event builder functions for constructing JSON (both Anthropic and Gemini styles), and assertion helpers (assertContains, assertNotContains, assertEmpty) to eliminate repetitive setup and checking patterns. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add updateModel, makeJob, makeReview, and boolPtr helpers along with functional option functions (withStatus, withRef, withAgent, etc.) to reduce repetitive struct construction and type assertion boilerplate throughout the 7600-line test file. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove duplicate containsArg (use existing containsString from helpers) - Extract expectedAgents list as single source of truth for registry tests - Add verifyAgentPassesFlag helper and consolidate OpenCode/Copilot model flag tests - Consolidate TestAvailableAgents to use expectedAgents instead of hardcoded map Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…tArgs helper Consolidate 6 individual TestParseStreamJSON_* functions into a single table-driven TestParseStreamJSON test, and introduce an assertArgs helper to reduce verbose flag-checking boilerplate in TestClaudeBuildArgs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Introduce withUnsafeAgents and mockAgentCLI helpers in agent_test_helpers.go to eliminate duplicated patterns across agent tests: manual SetAllowUnsafeAgents cleanup, shell script generation for capturing args/stdin, and temporary file setup via os.WriteFile. Applied across codex, claude, droid, and gemini tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add newMockDroidAgent, runReviewScenario, assertArgsOrder, assertContainsArg, and assertNotContainsArg helpers to agent_test_helpers.go. Refactor droid_test.go to use these helpers, reducing repetitive mock setup and improving readability of argument order verification in TestDroidBuildArgsPromptWithDash. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…-driven test Replaces 7 individual TestGeminiParseStreamJSON_* functions with a single table-driven TestGeminiParseStreamJSON, reducing boilerplate and making it easier to add new parser test cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ins) and refactor opencode_test.go Adds reusable helpers to agent_test_helpers.go: - makeToolCallJSON: generates tool-call JSON without hardcoded strings - ScriptBuilder: fluent API for constructing mock shell scripts - assertContains/assertNotContains: reduce strings.Contains boilerplate Refactors opencode_test.go to use these helpers, eliminating brittle hardcoded JSON and manual shell script construction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add newTempRepo() and writeRepoConfigStr() helpers to eliminate repetitive temp dir + config file setup. Replace manual env var save/restore with t.Setenv. Net reduction of ~180 lines. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Extract assertEventReceived, assertNoEventReceived, getSubscriberCount, and makeTestEvent helpers to eliminate repeated select/timeout blocks, manual mutex access, and duplicate event construction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…GitRepo helpers Add mockAPI() to standardize httptest.Server+HTTPClient creation and createTestGitRepo() to centralize git repo initialization with symlink resolution. Refactored all 9 test functions to use these helpers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…arness Extract repeated setup (temp dir, config file, broadcaster, watcher start/cleanup) into a configWatcherHarness struct with updateConfig and waitForReload helpers. Also add a shared writeFile helper to eliminate repeated os.WriteFile+fatal patterns. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…nd seedErrorLog helpers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…teHealthCheck, and decodeHealthStatus helpers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…and testRoborevExe constant Extract parseWmicOutput from getCommandLineWmic so TestWmicHeaderOnlyOutput tests the actual parsing logic instead of re-implementing it. Add testRoborevExe constant to eliminate repeated path strings across test cases. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…and runNormalizeTests helper Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… and assertLines helpers Extract repeated inline normalize closure into shared simpleNormalizer function and add assertLines helper to reduce verbose len-check + per-index assertion patterns across Writer tests. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

DataDir() treats an empty ROBOREV_DATA_DIR the same as unset, so t.Setenv("ROBOREV_DATA_DIR", "") is sufficient. The os.Unsetenv call was redundant and potentially misleading. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Accept *testing.T and call t.Fatal instead of silently returning -1, consistent with the other test helpers in broadcaster_test.go. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…conflicts The generic name writeFile could collide with helpers in other test files in the same package. Rename to writeTestFile for clarity and safety. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The helper refactor made TestNewErrorLog discard the returned *ErrorLog, so the test only checked file existence. Restore the nil check to keep the test aligned with its name. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add explicit wantType for Result ("text") and ToolResult ("tool") cases in TestNormalizeClaudeOutput to catch type regressions - Add notWantType field to normalizeTestCase and use it in PreservesJSONWithExtraKeys to assert type is not "tool" - Rewrite TestGetNormalizer to verify correct normalizer mapping by testing characteristic behavior, not just non-nil returns Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The helper mutates a global variable without synchronization, making it unsafe for use with t.Parallel(). Document this constraint to prevent future misuse. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace json.Unmarshal(w.Body.Bytes(), ...) calls in server_test.go with testutil.DecodeJSON, putting the previously unused helper to use and reducing boilerplate. Also make InitTestGitRepo use the -C flag consistently for all git commands instead of mixing cmd.Dir and -C. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

TestBroadcasterIntegrationWithWorker previously skipped all event assertions when the job failed, silently passing. Now it initializes a real git repo so the worker succeeds, and explicitly fails if the job does not reach JobStatusDone. Also fixes a misleading comment in TestStreamEventsWithRepoFilter about which event was filtered. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use cfg.MaxWorkers instead of max(workers, 1) in newWorkerTestContext so that passing 0 actually uses the config default as documented. Replace bare tc.DB.ClaimJob calls in two cancel tests with tc.createAndClaimJob to get proper error checking and job ID verification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Have NewTestRepo delegate to NewTestRepoWithAuthor with default "Test" - Use runGit in AddWorktree cleanup for consistent error handling - Give each TestGetHooksPath subtest its own repo to prevent config leaks Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace map[string]string with []ReviewComment to ensure deterministic comment insertion order and prevent potential test flakiness from Go's random map iteration. Also add safety documentation to CreateCompletedReview about its ClaimJob assumption requiring isolated DBs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…mptContains Use 100-char truncation limit consistently across all test error messages, matching the assertPromptContains helper. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace manual os.Setenv/cleanup in setupTestEnv with t.Setenv, which automatically restores env vars and panics if t.Parallel() is called, preventing env var races. Change getResultForAgent to accept *testing.T and call t.Fatalf on missing results instead of returning nil, giving clear failure messages and removing redundant nil checks at call sites. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace bare `_, _ = db.ClaimJob(...)` calls with the `claimJob` test helper so setup failures are caught via t.Fatalf instead of silently ignored. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add allowlist validation for table names in assertPgCount and assertPgCountWhere to prevent SQL injection via interpolated parameter - Extract tryCreateCompletedReview (error-returning) from createCompletedReview so goroutines in TestIntegration_MultiplayerRealistic Round 3 no longer call t.Fatalf from non-test goroutines (which causes panics) - Remove redundant DELETE statements in TestIntegration_SyncNowPushesAllBatches that were already handled by newIntegrationEnv's schema drop/recreate Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use context.Background() instead of t.Context() in cleanup functions (cleanupSchemaOnFinish, cleanupTablesOnFinish) since test context is cancelled by the time cleanup runs - Use pgx.Identifier.Sanitize() in cleanupTablesOnFinish instead of raw fmt.Sprintf interpolation to prevent SQL injection - Register t.Cleanup in openRawPgxPool for consistent pool lifecycle management, removing manual Close() calls and close-on-skip logic from skipIfTableInSchema Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Use commit.ID instead of hardcoded `1` in TestVerdictSuppressionForPromptJobs - Check CompleteJob/FailJob errors in TestGetRepoStats setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Previously setJobStatus silently returned for JobStatusQueued since it's the default state for new jobs. This meant it couldn't reset a job back to queued after a state change. Now it explicitly resets status, clears started_at/finished_at/error fields via UPDATE for robustness. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The refactoring that extracted assertUUIDFormat and checkUniqueness helpers dropped the inline uniqueness check from TestGenerateUUID. Restore it so the test validates both format and uniqueness, matching the original behavior. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move JobsCount increment before OnJobs callback to match OnEnqueue behavior, ensuring the count is updated even if the callback panics or calls t.Fatal. Also guard against negative JobIDStart values by using <= 0 instead of == 0. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Previously, os.Environ() was appended with override vars, but duplicate keys (e.g. HOME) could already exist in the inherited environment. On Unix the last value typically wins, but this is undocumented behavior. Now we explicitly filter out overridden keys before appending, ensuring deterministic isolation from the user's global git config. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace strings.Contains with split-on-comma + exact comparison in toolsArgValue assertions to prevent false positives from substring overlap (e.g. "Edit" matching a hypothetical "MultiEdit"). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

No caller passes a non-empty repoPath, so the parameter is dead code. Simplify the helper to always use t.TempDir() directly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ailure diagnostics The unbounded drain loop would spin until ReceiveWithTimeout timed out, producing a misleading "timed out waiting for event" message. Now the loop is capped at 20 iterations with an explicit failure message if review.completed is never received. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The cleanup function runs after the test finishes, where t.Fatalf can panic or produce confusing output. Revert to silent exec.Command approach since worktree removal failure during cleanup is not worth failing the test over. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The three dual-schema migration tests (DualSchemaWithDataErrors, EmptyPublicTableDropped, MigratesPublicTableWithData) were missing public.schema_version, which migrateLegacyTables checks to detect legacy schemas. Previously these tests were silently skipped because roborev.repos already existed from prior test runs; the refactored cleanup now properly drops the roborev schema, exposing the bug. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm · 2026-01-29T23:20:40Z

Burn!

wesm and others added 30 commits January 29, 2026 16:30

Remove redundant /api/status handlers from tests using setupMockDaemon

081e9fd

setupMockDaemon already wraps handlers to auto-respond to /api/status, so these duplicate checks were unnecessary boilerplate. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Reduce test boilerplate in errorlog_test.go with createTestErrorLog a…

01d6651

…nd seedErrorLog helpers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Reduce test boilerplate in health_test.go with setupTestServer, execu…

ff9b73f

…teHealthCheck, and decodeHealthStatus helpers Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Reduce test boilerplate in normalize_test.go with table-driven tests …

4058e0c

…and runNormalizeTests helper Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm and others added 27 commits January 29, 2026 16:30

Remove redundant os.Unsetenv in TestDataDir

a71d5e5

DataDir() treats an empty ROBOREV_DATA_DIR the same as unset, so t.Setenv("ROBOREV_DATA_DIR", "") is sufficient. The os.Unsetenv call was redundant and potentially misleading. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Make getSubscriberCount fail the test on type assertion failure

c3c4ee2

Accept *testing.T and call t.Fatal instead of silently returning -1, consistent with the other test helpers in broadcaster_test.go. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Rename writeFile to writeTestFile in config_watcher_test.go to avoid …

28435ca

…conflicts The generic name writeFile could collide with helpers in other test files in the same package. Rename to writeTestFile for clarity and safety. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Verify NewErrorLog return value in TestNewErrorLog

b626347

The helper refactor made TestNewErrorLog discard the returned *ErrorLog, so the test only checked file existence. Restore the nil check to keep the test aligned with its name. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add parallel-safety warning to mockIdentifyProcess helper

711086b

The helper mutates a global variable without synchronization, making it unsafe for use with t.Parallel(). Document this constraint to prevent future misuse. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Align truncation limit in TestNonGeminiRunReturnsEmpty with assertPro…

b7b3fe4

…mptContains Use 100-char truncation limit consistently across all test error messages, matching the assertPromptContains helper. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Use claimJob helper in TestRetryJob and TestRetryJobAtomic

810cac5

Replace bare `_, _ = db.ClaimJob(...)` calls with the `claimJob` test helper so setup failures are caught via t.Fatalf instead of silently ignored. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix fragile hardcoded commit ID and unchecked errors in repos_test.go

fabe42d

- Use commit.ID instead of hardcoded `1` in TestVerdictSuppressionForPromptJobs - Check CompleteJob/FailJob errors in TestGetRepoStats setup Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove unused repoPath parameter from runReviewScenario

65c6bbd

No caller passes a non-empty repoPath, so the parameter is dead code. Simplify the helper to always use t.TempDir() directly. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add missing bytes and cobra imports in fix_test.go

3f95e35

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wesm merged commit 54a495b into main Jan 29, 2026
7 checks passed

wesm deleted the analyze-refactor-tests branch January 29, 2026 23:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor test suite: reduce boilerplate, improve isolation and safety #170

Refactor test suite: reduce boilerplate, improve isolation and safety #170

Uh oh!

wesm commented Jan 29, 2026 •

edited

Loading

Uh oh!

wesm commented Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Refactor test suite: reduce boilerplate, improve isolation and safety #170

Refactor test suite: reduce boilerplate, improve isolation and safety #170

Uh oh!

Conversation

wesm commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

wesm commented Jan 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

wesm commented Jan 29, 2026 •

edited

Loading