Skip to content

refactor: extract shared process_runner.cjs from claude and copilot harnesses#29888

Merged
pelikhan merged 3 commits intomainfrom
copilot/refactor-agent-harness-runner
May 3, 2026
Merged

refactor: extract shared process_runner.cjs from claude and copilot harnesses#29888
pelikhan merged 3 commits intomainfrom
copilot/refactor-agent-harness-runner

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 3, 2026

claude_harness.cjs and copilot_harness.cjs each contained an independent runProcess implementation (~50 lines) plus duplicated formatDuration and sleep helpers — identical lifecycle logic that had already begun to drift in minor ways.

Changes

  • New process_runner.cjs — shared utility exporting runProcess({ command, args, attempt, log, logArgs }), formatDuration, and sleep

    • log is caller-supplied so each harness keeps its own prefix ([claude-harness] / [copilot-harness])
    • logArgs (optional) accepts a pre-redacted arg list for logging, keeping sensitive values out of logs without coupling redaction logic to the runner
  • claude_harness.cjs — removed ~95 lines of local duplicates; updated call site to new object-param API

  • copilot_harness.cjs — removed ~100 lines of local duplicates (including now-unused spawn import); moved --prompt/-p redaction from inside runProcess to the call site, passing it as logArgs

  • New process_runner.test.cjs — 19 tests covering formatDuration, sleep, and the full runProcess lifecycle: stdout/stderr forwarding, hasOutput tracking, exit code propagation, close-based resolution, spawn error handling, logArgs redaction, and log truncation

// Before — each harness owned its own copy
function runProcess(command, args, attempt) { /* ~50 lines */ }

// After — shared runner; callers supply log fn and redacted args
const result = await runProcess({ command, args: currentArgs, attempt, log, logArgs: safeArgs });

Copilot AI linked an issue May 3, 2026 that may be closed by this pull request
5 tasks
Copilot AI changed the title [WIP] Refactor duplicate code in agent harness process runner refactor: extract shared process_runner.cjs from claude and copilot harnesses May 3, 2026
Copilot AI requested a review from gh-aw-bot May 3, 2026 06:29
@pelikhan pelikhan marked this pull request as ready for review May 3, 2026 08:09
Copilot AI review requested due to automatic review settings May 3, 2026 08:09
@github-actions github-actions Bot mentioned this pull request May 3, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the Claude and Copilot harnesses to share a single runProcess implementation (plus formatDuration / sleep) and adds a dedicated test suite for the shared runner.

Changes:

  • Added actions/setup/js/process_runner.cjs exporting shared runProcess, formatDuration, and sleep.
  • Updated claude_harness.cjs and copilot_harness.cjs to use the new object-parameter runProcess API (with caller-provided logging and optional redacted logArgs).
  • Added process_runner.test.cjs covering duration formatting, sleep, and runProcess lifecycle behavior.
Show a summary per file
File Description
actions/setup/js/process_runner.cjs Introduces shared process spawning/forwarding/output-collection logic used by both harnesses.
actions/setup/js/process_runner.test.cjs Adds Vitest coverage for the shared runner utilities and lifecycle behavior.
actions/setup/js/claude_harness.cjs Removes duplicated helpers and switches to the shared runner with harness-specific logging/redaction.
actions/setup/js/copilot_harness.cjs Removes duplicated helpers and switches to the shared runner; moves --prompt/-p redaction to call site via logArgs.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (1)

actions/setup/js/process_runner.test.cjs:79

  • These tests spawn real Node processes that write to the parent process stdout/stderr via runProcess’s forwarding (process.stdout.write / process.stderr.write). This will make the Vitest output noisy and can interfere with CI logs; consider stubbing process.stdout.write/process.stderr.write (via vi.spyOn) during the runProcess test suite and restoring afterward.
    it("collects stdout output and sets hasOutput", async () => {
      const logs = [];
      const result = await runProcess({
        command: process.execPath,
        args: ["-e", 'process.stdout.write("hello stdout"); process.exit(0)'],
        attempt: 0,
        log: msg => logs.push(msg),
      });
      expect(result.hasOutput).toBe(true);
      expect(result.output).toContain("hello stdout");
    });
  • Files reviewed: 4/4 changed files
  • Comments generated: 3

Comment on lines +35 to +41
const start = Date.now();
await sleep(50);
expect(Date.now() - start).toBeGreaterThanOrEqual(40);
});

it("resolves immediately for 0ms", async () => {
await expect(sleep(0)).resolves.toBeUndefined();
Comment on lines +190 to +191
// Extract the args portion from the log line (everything after "spawning: <cmd> ")
const afterCommand = spawnLog?.replace(/.*spawning: \S+ /, "") ?? "";
Comment on lines +100 to +125
child.on("exit", (code, signal) => {
log(`attempt ${attempt + 1}: process exit event` + ` exitCode=${code ?? 1}` + (signal ? ` signal=${signal}` : ""));
});

// Resolve on 'close', not 'exit', to ensure stdio streams are fully drained.
child.on("close", (code, signal) => {
const durationMs = Date.now() - startTime;
const exitCode = code ?? 1;
log(`attempt ${attempt + 1}: process closed` + ` exitCode=${exitCode}` + (signal ? ` signal=${signal}` : "") + ` duration=${formatDuration(durationMs)}` + ` stdout=${stdoutBytes}B stderr=${stderrBytes}B hasOutput=${hasOutput}`);
resolve({ exitCode, output: collectedOutput, hasOutput, durationMs });
});

child.on("error", err => {
const durationMs = Date.now() - startTime;
// prettier-ignore
const errno = /** @type {NodeJS.ErrnoException} */ (err);
const errCode = errno.code ?? "unknown";
const errSyscall = errno.syscall ?? "unknown";
log(`attempt ${attempt + 1}: failed to start process '${command}': ${err.message}` + ` (code=${errCode} syscall=${errSyscall})`);
resolve({
exitCode: 1,
output: collectedOutput,
hasOutput,
durationMs,
});
});
@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented May 3, 2026

@copilot review all comments

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 94/100

Excellent test quality

Metric Value
New/modified tests analyzed 19
✅ Design tests (behavioral contracts) 19 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 15 (79%)
Duplicate test clusters 0
Test inflation detected No (1.49:1 ratio)
🚨 Coding-guideline violations None

Test Classification Details

View all 19 test classifications
Test File Classification Notes
formats zero milliseconds as 0s process_runner.test.cjs ✅ Design Boundary value (zero)
formats sub-minute durations as seconds only process_runner.test.cjs ✅ Design Multiple boundary values
formats exactly one minute process_runner.test.cjs ✅ Design Boundary at 60s
formats minutes and seconds process_runner.test.cjs ✅ Design Output contract
truncates sub-second precision process_runner.test.cjs ✅ Design Edge case: truncation behavior
resolves after the given delay process_runner.test.cjs ✅ Design Timing contract
resolves immediately for 0ms process_runner.test.cjs ✅ Design Edge case: zero delay
resolves with exitCode 0 for a successful command process_runner.test.cjs ✅ Design Happy path: exit code contract
resolves with the actual non-zero exit code on failure process_runner.test.cjs ✅ Design Error path: non-zero exit
collects stdout output and sets hasOutput process_runner.test.cjs ✅ Design Output capture contract
collects stderr output and sets hasOutput process_runner.test.cjs ✅ Design Stderr capture contract
sets hasOutput false when no output is produced process_runner.test.cjs ✅ Design Edge case: empty output
logs spawning with logArgs instead of args when provided process_runner.test.cjs ✅ Design Arg redaction contract
falls back to args for logging when logArgs is not provided process_runner.test.cjs ✅ Design Fallback log contract
uses the attempt number in log messages process_runner.test.cjs ✅ Design Attempt tracking contract
resolves with exitCode 1 and hasOutput false when command is not found process_runner.test.cjs ✅ Design Error path: missing binary
collects combined stdout and stderr in output process_runner.test.cjs ✅ Design Combined stream contract
resolves with durationMs as a non-negative number process_runner.test.cjs ✅ Design Return type contract
truncates logArgs to 200 chars in spawn log process_runner.test.cjs ✅ Design Security/truncation contract

Flagged Tests — Requires Review

No tests flagged. All tests verify observable behavioral contracts using real process execution.


Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 19 tests (vitest)

No Go test files were added or modified in this PR.


Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The test suite uses real child process execution instead of mocks, covering both happy paths and error scenarios (non-zero exit codes, missing binaries, empty output, boundary values, arg redaction). No coding-guideline violations detected.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25273924690

🧪 Test quality analysis by Test Quality Sentinel · ● 606.3K ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 94/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 19 tests verify behavioral contracts using real child process execution with solid error path and edge case coverage.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 3, 2026

@copilot review all comments

Addressed all three review comments in commit 9810f54:

  1. sleep test flakiness — replaced the wall-clock Date.now() assertion with vi.useFakeTimers() + vi.advanceTimersByTime(1000), so the test is instantaneous and deterministic.

  2. Fragile truncation regex — replaced the \S+-based parse (which breaks when execPath contains spaces) with a x+$ anchor that matches the trailing run of x characters directly (our longArg is all x's, so the truncated portion always ends the log line).

  3. Double-resolution guard — added a settled boolean + settle() wrapper in runProcess so only the first terminal event (error or close) logs and resolves the promise. Subsequent events are silently dropped.

Copilot AI requested a review from pelikhan May 3, 2026 08:20
@pelikhan pelikhan merged commit 388fbaa into main May 3, 2026
@pelikhan pelikhan deleted the copilot/refactor-agent-harness-runner branch May 3, 2026 13:08
Copilot AI added a commit that referenced this pull request May 3, 2026
…arnesses (#29888)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate Code Detected: Agent harness process runner

4 participants