Handle answer-only native compare runs (#65)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 Walkthrough

This PR adds support for plain-text answer-only output in `native_agent` compare mode. When the structured runner exits cleanly but produces no parseable usage metadata, the result is now recorded as an answer-only artifact.

Changes
- Native Agent Answer-Only Result Status
🎯 Review effort: 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/infrastructure/compare.ts (1)
1680-1689: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

**Validate `native_agent` exec templates before expanding them.**

`executeNativeAgentCompare()` bypasses `validateCompareExecTemplate()`, so `native_agent` still accepts `$(cat {prompt_file})` / backtick substitutions that the regular compare path rejects. That can push the full prompt into argv/process listings and hit shell command-length limits on larger prompts. Apply the same validation once before either `expandCompareExecTemplate()` call.

Suggested fix:
```diff
 export async function executeNativeAgentCompare(
   input: GenerateCompareArtifactsInput,
   dependencies: ExecuteNativeAgentCompareDependencies = {},
 ): Promise<NativeAgentCompareResult> {
   if (input.baselineMode !== 'native_agent') {
     throw new Error(`executeNativeAgentCompare requires baselineMode "native_agent", got "${input.baselineMode}"`)
   }
+  validateCompareExecTemplate(input.execTemplate)
   const graphPath = validateGraphPath(input.graphPath)
   const projectRoot = realpathSync(inferProjectRootFromGraphPath(graphPath))
   const questions = resolveCompareQuestions(input)
```

Also applies to: 1747-1756
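To illustrate the reviewer's concern, here is a minimal sketch of why validation must happen before expansion. The validator body, the `FORBIDDEN_PATTERNS` list, and the expansion logic below are hypothetical assumptions for illustration, not the repository's actual implementation of `validateCompareExecTemplate()` / `expandCompareExecTemplate()`:

```typescript
// Hypothetical sketch: reject shell substitution syntax in an exec
// template before it is ever expanded into argv.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /\$\(/, // command substitution, e.g. $(cat {prompt_file})
  /`/,    // backtick substitution
];

function validateCompareExecTemplate(template: string): void {
  for (const pattern of FORBIDDEN_PATTERNS) {
    if (pattern.test(template)) {
      throw new Error(
        `exec template must not use shell substitution: ${template}`,
      );
    }
  }
}

function expandCompareExecTemplate(
  template: string,
  promptFile: string,
): string[] {
  // Expansion substitutes only the file *path*, never the file contents,
  // so the prompt body stays out of argv and process listings.
  return template
    .split(/\s+/)
    .map((token) => token.replace('{prompt_file}', promptFile));
}

// Safe template: the prompt travels by file path.
validateCompareExecTemplate('runner --prompt {prompt_file}');
const args = expandCompareExecTemplate(
  'runner --prompt {prompt_file}',
  '/tmp/prompt.txt',
);

// Unsafe template: would inline the whole prompt into argv.
let rejected = false;
try {
  validateCompareExecTemplate('runner "$(cat {prompt_file})"');
} catch {
  rejected = true;
}
console.log(rejected); // true
```

Calling the validator once at the top of `executeNativeAgentCompare()`, as the suggested fix does, covers both the baseline and candidate expansion sites.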
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/infrastructure/compare.ts` around lines 1680 - 1689, The exec template for native_agent is not being validated before expansion in executeNativeAgentCompare; call validateCompareExecTemplate(input.execTemplate) (or equivalent) and handle validation errors before invoking expandCompareExecTemplate(...) for both the baseline and candidate paths in executeNativeAgentCompare; similarly add the same pre-expansion validation at the other occurrence around the code handling the candidate run (the second expandCompareExecTemplate call referenced in the review), so both baseline and candidate expansions validate input.execTemplate first.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b5216e25-b3a4-42bc-956e-6a6f0b145657
📒 Files selected for processing (7)
- README.md
- docs/proof-workflows.md
- src/cli/main.ts
- src/infrastructure/compare.ts
- tests/unit/cli.test.ts
- tests/unit/compare-native-agent.test.ts
- tests/unit/compare.test.ts
Summary by CodeRabbit
Release Notes
Documentation
- Documented `--baseline-mode native_agent` with structured Anthropic runners for token reduction metrics versus plain-text runners for answer-only artifacts

Bug Fixes
- The `compare` command now gracefully handles plain-text runner outputs lacking token usage metadata, storing them as answer-only artifacts instead of reporting errors
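The answer-only fallback described above can be sketched roughly as follows. This is a hypothetical shape only: the `CompareArtifact` interface, the JSON `usage` field names, and the function name `classifyRunnerOutput` are illustrative assumptions, not the project's real types:

```typescript
// Hypothetical sketch: classify a runner's stdout as a structured result
// (with token usage) or, on a clean exit without parseable usage
// metadata, as an answer-only artifact instead of an error.
interface CompareArtifact {
  kind: 'structured' | 'answer_only';
  answer: string;
  inputTokens?: number;
  outputTokens?: number;
}

function classifyRunnerOutput(
  stdout: string,
  exitCode: number,
): CompareArtifact {
  if (exitCode !== 0) {
    throw new Error(`runner failed with exit code ${exitCode}`);
  }
  try {
    // Structured runners are assumed to emit JSON with a usage block.
    const parsed = JSON.parse(stdout);
    if (
      parsed?.usage?.input_tokens != null &&
      parsed?.usage?.output_tokens != null
    ) {
      return {
        kind: 'structured',
        answer: parsed.answer ?? '',
        inputTokens: parsed.usage.input_tokens,
        outputTokens: parsed.usage.output_tokens,
      };
    }
  } catch {
    // Not JSON at all: fall through to answer-only handling.
  }
  // Clean exit but no usage metadata: keep the raw text as the answer.
  return { kind: 'answer_only', answer: stdout.trim() };
}
```

The key design point is that a zero exit code with unparseable output is treated as a legitimate plain-text answer, so token-reduction metrics are simply omitted rather than the run being reported as failed.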