Skip to content

Conversation

@wesm
Copy link
Collaborator

@wesm wesm commented Jan 29, 2026

Summary

Comprehensive refactoring of the test suite across 55 files, resulting in a net reduction of ~3,400 lines of test code while improving quality:

  • Shared test helpers: Extracted common patterns into reusable helpers (internal/testutil/testutil.go, per-package helpers_test.go and *_test_helpers.go files), eliminating repetitive setup code for databases, git repos, HTTP servers, and mock agents
  • Table-driven tests: Converted repetitive test cases (e.g. parseStreamJSON, normalize, config) to table-driven style
  • Test isolation: Fixed goroutine safety issues, eliminated shared mutable state, filtered inherited env vars, and ensured tests don't depend on the user's global git config
  • Correctness fixes: Fixed silent test passes on job failures, SQL injection risks in test helpers, shell escaping bugs in mock CLIs, missing error checks, and hardcoded values that could cause false positives
  • Cleanup: Removed dead code (unused parameters, redundant unsets), renamed conflicting helpers, and standardized patterns across packages

Test plan

  • go test ./... passes
  • No test regressions from the refactoring
  • Verify helpers are reused consistently across test files

🤖 Generated with Claude Code

wesm and others added 30 commits January 29, 2026 16:30
Add helpers_test.go with reusable test utilities:
- patchServerAddr: safely swap/restore global serverAddr
- createTestRepo: set up git repos with files committed
- writeTestFiles: create file trees without git
- newTestCmd: create cobra.Command with captured output
- newMockServer: configurable httptest.Server mimicking daemon API

Refactor analyze_test.go to use these helpers, removing ~150 lines of
duplicated mock server setup, file creation, and serverAddr patching.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deduplicate repeated job-filtering logic and JSON encoding in client_test.go.
The mockJobsHandler helper centralizes the 5 copy-pasted filter implementations,
and writeJSON eliminates repetitive json.NewEncoder(w).Encode() calls.
TestWaitForReview now uses http.ServeMux for cleaner route dispatch.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
setupMockDaemon already wraps handlers to auto-respond to /api/status,
so these duplicate checks were unnecessary boilerplate.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace inline boilerplate with existing helpers: createTestRepo,
patchServerAddr, newTestCmd, newMockServer, and writeJSON. This removes
duplicated git repo setup, manual serverAddr patching, cobra output
capture, and raw json.NewEncoder calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…pers

Extract repeated git repo initialization, working directory management,
hook file manipulation, and PATH mocking into reusable helpers in
internal/testutil. This reduces duplication across 7 chdir sites and
3 git-init sites, and makes the helpers available for other test files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add mockServerModel, pressKey, pressKeys, and pressSpecial helpers
to eliminate repeated httptest.NewServer/defer/newTuiModel setup and
verbose tea.KeyMsg construction across ~94 key press simulations
and 21 mock server setups.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace duplicated setupTestRepo with testutil.NewTestRepoWithCommit and
introduce reviewHarness struct to eliminate repeated cobra.Command and
bytes.Buffer setup, config file writing, and verbose runLocalReview calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…plate

Replace ~200 lines of copy-pasted HTTP handler logic in
TestRefineLoopStaysOnFailedFixChain and TestRefinePendingJobWaitDoesNotConsumeIteration
with createConfigurableMockRefineHandler that wraps the base handler with
hook overrides. Replace ad-hoc failingAgent/changingAgent structs with a
generic functionalMockAgent. Add inDir helper to eliminate repeated
os.Getwd/os.Chdir/defer boilerplate. Reuse setupRefineRepo where tests
were duplicating git repo initialization.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add WithReview/WithJob builder methods to mockDaemonClient for chainable
setup, extract createMockExecutable for platform-specific script creation,
and add gitCommitFile helper to eliminate repeated write/add/commit sequences.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract setupGitRepo, chdir, and makeDir helpers to replace repeated
patterns of evalSymlinks+initGitRepo, manual chdir+defer, and
os.MkdirAll boilerplate across all test cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace manual HOME/ROBOREV_DATA_DIR overrides, inline runGit helpers,
and daemon.json setup with setupMockDaemon, newTestGitRepo, and
setupFastPolling across all review command tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract repeated patterns into helpers: createRepoWithConfig for temp
repo setup, newMockReviewServer/newMockJobsServer for HTTP test servers,
stubReview for review construction, patchPromptPollInterval for global
variable patching. Replace mockCmd with existing newTestCmd, manual
serverAddr patching with existing patchServerAddr, and inline JSON
encoding with existing writeJSON.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add chdirRepo, mockReviewDaemon, and runShowCmd helpers to
helpers_test.go and use them along with existing newTestGitRepo
and CommitFile to eliminate ~185 lines of duplicated git setup,
mock daemon wiring, and command execution code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add streamFormatterFixture for shared buffer/formatter setup, event
builder functions for constructing JSON (both Anthropic and Gemini
styles), and assertion helpers (assertContains, assertNotContains,
assertEmpty) to eliminate repetitive setup and checking patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add updateModel, makeJob, makeReview, and boolPtr helpers along with
functional option functions (withStatus, withRef, withAgent, etc.) to
reduce repetitive struct construction and type assertion boilerplate
throughout the 7600-line test file.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate containsArg (use existing containsString from helpers)
- Extract expectedAgents list as single source of truth for registry tests
- Add verifyAgentPassesFlag helper and consolidate OpenCode/Copilot model flag tests
- Consolidate TestAvailableAgents to use expectedAgents instead of hardcoded map

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tArgs helper

Consolidate 6 individual TestParseStreamJSON_* functions into a single
table-driven TestParseStreamJSON test, and introduce an assertArgs helper
to reduce verbose flag-checking boilerplate in TestClaudeBuildArgs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce withUnsafeAgents and mockAgentCLI helpers in agent_test_helpers.go
to eliminate duplicated patterns across agent tests: manual SetAllowUnsafeAgents
cleanup, shell script generation for capturing args/stdin, and temporary file
setup via os.WriteFile. Applied across codex, claude, droid, and gemini tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add newMockDroidAgent, runReviewScenario, assertArgsOrder,
assertContainsArg, and assertNotContainsArg helpers to
agent_test_helpers.go. Refactor droid_test.go to use these helpers,
reducing repetitive mock setup and improving readability of argument
order verification in TestDroidBuildArgsPromptWithDash.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-driven test

Replaces 7 individual TestGeminiParseStreamJSON_* functions with a single
table-driven TestGeminiParseStreamJSON, reducing boilerplate and making it
easier to add new parser test cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ins) and refactor opencode_test.go

Adds reusable helpers to agent_test_helpers.go:
- makeToolCallJSON: generates tool-call JSON without hardcoded strings
- ScriptBuilder: fluent API for constructing mock shell scripts
- assertContains/assertNotContains: reduce strings.Contains boilerplate

Refactors opencode_test.go to use these helpers, eliminating brittle
hardcoded JSON and manual shell script construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add newTempRepo() and writeRepoConfigStr() helpers to eliminate
repetitive temp dir + config file setup. Replace manual env var
save/restore with t.Setenv. Net reduction of ~180 lines.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extract assertEventReceived, assertNoEventReceived, getSubscriberCount,
and makeTestEvent helpers to eliminate repeated select/timeout blocks,
manual mutex access, and duplicate event construction.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…GitRepo helpers

Add mockAPI() to standardize httptest.Server+HTTPClient creation and
createTestGitRepo() to centralize git repo initialization with symlink
resolution. Refactored all 9 test functions to use these helpers.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…arness

Extract repeated setup (temp dir, config file, broadcaster, watcher start/cleanup)
into a configWatcherHarness struct with updateConfig and waitForReload helpers.
Also add a shared writeFile helper to eliminate repeated os.WriteFile+fatal patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…nd seedErrorLog helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…teHealthCheck, and decodeHealthStatus helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…and testRoborevExe constant

Extract parseWmicOutput from getCommandLineWmic so TestWmicHeaderOnlyOutput
tests the actual parsing logic instead of re-implementing it. Add
testRoborevExe constant to eliminate repeated path strings across test cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…and runNormalizeTests helper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… and assertLines helpers

Extract repeated inline normalize closure into shared simpleNormalizer
function and add assertLines helper to reduce verbose len-check +
per-index assertion patterns across Writer tests.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wesm and others added 27 commits January 29, 2026 16:30
DataDir() treats an empty ROBOREV_DATA_DIR the same as unset, so
t.Setenv("ROBOREV_DATA_DIR", "") is sufficient. The os.Unsetenv call
was redundant and potentially misleading.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Accept *testing.T and call t.Fatal instead of silently returning -1,
consistent with the other test helpers in broadcaster_test.go.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…conflicts

The generic name writeFile could collide with helpers in other test files
in the same package. Rename to writeTestFile for clarity and safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The helper refactor made TestNewErrorLog discard the returned *ErrorLog,
so the test only checked file existence. Restore the nil check to keep
the test aligned with its name.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add explicit wantType for Result ("text") and ToolResult ("tool") cases
  in TestNormalizeClaudeOutput to catch type regressions
- Add notWantType field to normalizeTestCase and use it in
  PreservesJSONWithExtraKeys to assert type is not "tool"
- Rewrite TestGetNormalizer to verify correct normalizer mapping by
  testing characteristic behavior, not just non-nil returns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The helper mutates a global variable without synchronization, making it
unsafe for use with t.Parallel(). Document this constraint to prevent
future misuse.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace json.Unmarshal(w.Body.Bytes(), ...) calls in server_test.go
with testutil.DecodeJSON, putting the previously unused helper to use
and reducing boilerplate. Also make InitTestGitRepo use the -C flag
consistently for all git commands instead of mixing cmd.Dir and -C.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TestBroadcasterIntegrationWithWorker previously skipped all event
assertions when the job failed, silently passing. Now it initializes a
real git repo so the worker succeeds, and explicitly fails if the job
does not reach JobStatusDone. Also fixes a misleading comment in
TestStreamEventsWithRepoFilter about which event was filtered.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use cfg.MaxWorkers instead of max(workers, 1) in newWorkerTestContext so
that passing 0 actually uses the config default as documented. Replace
bare tc.DB.ClaimJob calls in two cancel tests with tc.createAndClaimJob
to get proper error checking and job ID verification.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Have NewTestRepo delegate to NewTestRepoWithAuthor with default "Test"
- Use runGit in AddWorktree cleanup for consistent error handling
- Give each TestGetHooksPath subtest its own repo to prevent config leaks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace map[string]string with []ReviewComment to ensure deterministic
comment insertion order and prevent potential test flakiness from Go's
random map iteration. Also add safety documentation to CreateCompletedReview
about its ClaimJob assumption requiring isolated DBs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…mptContains

Use 100-char truncation limit consistently across all test error
messages, matching the assertPromptContains helper.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace manual os.Setenv/cleanup in setupTestEnv with t.Setenv, which
automatically restores env vars and panics if t.Parallel() is called,
preventing env var races. Change getResultForAgent to accept *testing.T
and call t.Fatalf on missing results instead of returning nil, giving
clear failure messages and removing redundant nil checks at call sites.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace bare `_, _ = db.ClaimJob(...)` calls with the `claimJob` test
helper so setup failures are caught via t.Fatalf instead of silently
ignored.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add allowlist validation for table names in assertPgCount and
  assertPgCountWhere to prevent SQL injection via interpolated parameter
- Extract tryCreateCompletedReview (error-returning) from createCompletedReview
  so goroutines in TestIntegration_MultiplayerRealistic Round 3 no longer
  call t.Fatalf from non-test goroutines (which causes panics)
- Remove redundant DELETE statements in TestIntegration_SyncNowPushesAllBatches
  that were already handled by newIntegrationEnv's schema drop/recreate

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use context.Background() instead of t.Context() in cleanup functions
  (cleanupSchemaOnFinish, cleanupTablesOnFinish) since test context is
  cancelled by the time cleanup runs
- Use pgx.Identifier.Sanitize() in cleanupTablesOnFinish instead of
  raw fmt.Sprintf interpolation to prevent SQL injection
- Register t.Cleanup in openRawPgxPool for consistent pool lifecycle
  management, removing manual Close() calls and close-on-skip logic
  from skipIfTableInSchema

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use commit.ID instead of hardcoded `1` in TestVerdictSuppressionForPromptJobs
- Check CompleteJob/FailJob errors in TestGetRepoStats setup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously setJobStatus silently returned for JobStatusQueued since it's
the default state for new jobs. This meant it couldn't reset a job back
to queued after a state change. Now it explicitly resets status, clears
started_at/finished_at/error fields via UPDATE for robustness.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The refactoring that extracted assertUUIDFormat and checkUniqueness
helpers dropped the inline uniqueness check from TestGenerateUUID.
Restore it so the test validates both format and uniqueness, matching
the original behavior.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move JobsCount increment before OnJobs callback to match OnEnqueue
behavior, ensuring the count is updated even if the callback panics or
calls t.Fatal. Also guard against negative JobIDStart values by using
<= 0 instead of == 0.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, os.Environ() was appended with override vars, but duplicate
keys (e.g. HOME) could already exist in the inherited environment. On
Unix the last value typically wins, but this is undocumented behavior.
Now we explicitly filter out overridden keys before appending, ensuring
deterministic isolation from the user's global git config.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace strings.Contains with split-on-comma + exact comparison in
toolsArgValue assertions to prevent false positives from substring
overlap (e.g. "Edit" matching a hypothetical "MultiEdit").

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
No caller passes a non-empty repoPath, so the parameter is dead code.
Simplify the helper to always use t.TempDir() directly.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ailure diagnostics

The unbounded drain loop would spin until ReceiveWithTimeout timed out,
producing a misleading "timed out waiting for event" message. Now the loop
is capped at 20 iterations with an explicit failure message if
review.completed is never received.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The cleanup function runs after the test finishes, where t.Fatalf can
panic or produce confusing output. Revert to silent exec.Command
approach since worktree removal failure during cleanup is not worth
failing the test over.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The three dual-schema migration tests (DualSchemaWithDataErrors,
EmptyPublicTableDropped, MigratesPublicTableWithData) were missing
public.schema_version, which migrateLegacyTables checks to detect
legacy schemas. Previously these tests were silently skipped because
roborev.repos already existed from prior test runs; the refactored
cleanup now properly drops the roborev schema, exposing the bug.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm
Copy link
Collaborator Author

wesm commented Jan 29, 2026

Burn!

@wesm wesm merged commit 54a495b into main Jan 29, 2026
7 checks passed
@wesm wesm deleted the analyze-refactor-tests branch January 29, 2026 23:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants