Handle answer-only native compare runs #65

Merged: mohanagy merged 1 commit into main from fix/native-agent-answer-only on May 8, 2026

@mohanagy (Owner) commented May 8, 2026

Summary

  • treat native-agent compare exit-0 plain-text Claude runs as answer-only artifacts instead of runner failures
  • keep Anthropic JSON usage behavior unchanged while clarifying when provider-proof reductions are unavailable
  • add regression coverage and update compare/native-agent guidance in help and docs

Testing

  • npm run typecheck
  • npm run test:run
  • npm run build

Summary by CodeRabbit

Release Notes

  • Documentation

    • Clarified how to use --baseline-mode native_agent with structured Anthropic runners for token reduction metrics versus plain-text runners for answer-only artifacts
    • Updated CLI help text to reflect this behavior distinction
  • Bug Fixes

    • The compare command now gracefully handles plain-text runner outputs lacking token usage metadata, storing them as answer-only artifacts instead of reporting errors

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot commented May 8, 2026

📝 Walkthrough

This PR adds support for plain-text answer-only output in native_agent compare mode. When the structured runner exits cleanly but produces no parseable usage metadata, the result is now recorded as answer_only instead of runner_error, with dedicated formatting and documentation.
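
The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code from compare.ts: the `SketchRunStatus` type, the `parseUsage` and `classifyRun` helpers, and the assumed JSON field names (`result`, `usage.input_tokens`, `usage.output_tokens`) are all hypothetical stand-ins, and the contents of `evidence` are a guess.

```typescript
// Hypothetical shapes; the real NativeAgentRunStatus in compare.ts may differ.
type Usage = { inputTokens: number; outputTokens: number };

type SketchRunStatus =
  | { kind: 'ok'; usage: Usage; answer: string }
  | { kind: 'answer_only'; evidence: string; exitCode: 0; stderr: string; resultPath: string }
  | { kind: 'runner_error'; exitCode: number; stderr: string };

// Try to read token usage from structured JSON output; return null for plain text.
// The field names below are assumptions about the runner's JSON format.
function parseUsage(stdout: string): { usage: Usage; answer: string } | null {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed !== 'object' || parsed === null) return null;
    const record = parsed as {
      result?: unknown;
      usage?: { input_tokens?: unknown; output_tokens?: unknown };
    };
    const input = record.usage?.input_tokens;
    const output = record.usage?.output_tokens;
    if (typeof input !== 'number' || typeof output !== 'number') return null;
    return { usage: { inputTokens: input, outputTokens: output }, answer: String(record.result ?? '') };
  } catch {
    // Not JSON at all, for example a plain-text answer.
    return null;
  }
}

// Exit code 0 without parseable usage becomes answer_only; non-zero stays runner_error.
function classifyRun(exitCode: number, stdout: string, stderr: string, resultPath: string): SketchRunStatus {
  const parsed = parseUsage(stdout);
  if (exitCode === 0 && parsed !== null) return { kind: 'ok', ...parsed };
  if (exitCode === 0) return { kind: 'answer_only', evidence: stdout, exitCode: 0, stderr, resultPath };
  return { kind: 'runner_error', exitCode, stderr };
}
```

Under these assumptions, a clean exit with plain-text output is kept as a usable artifact, while any non-zero exit is still reported as a runner error.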

Changes

Native Agent Answer-Only Result Status

| Layer / File(s) | Summary |
| --- | --- |
| Type Definition (src/infrastructure/compare.ts) | NativeAgentRunStatus union expanded with new answer_only variant containing evidence, exit code, stderr, and result path. |
| Core Execution Logic (src/infrastructure/compare.ts) | Baseline and graphify result recording updated: when runner exits with code 0 but parsing fails, status is recorded as answer_only; otherwise remains runner_error. |
| Failure Classification & Helper (src/infrastructure/compare.ts) | Failure counting logic refined to treat only runner_error as failure; new isNativeAgentRunFailure helper introduced. |
| Result Formatting (src/infrastructure/compare.ts) | formatNativeAgentCompareSummary updated to emit dedicated "answer-only run saved" message and skip provider-proof reduction reporting for answer_only results. |
| CLI Help & Documentation (src/cli/main.ts, README.md, docs/proof-workflows.md) | Help text and user docs clarified to explain answer-only fallback when structured Anthropic JSON output is unavailable. |
| Test Coverage (tests/unit/cli.test.ts, tests/unit/compare-native-agent.test.ts, tests/unit/compare.test.ts) | CLI help test updated; native-agent execution test adjusted for answer_only classification; new integration test added for plain-text runner output without usage metadata. |
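
The failure-counting and formatting changes might look something like the sketch below. The status shape, the `isFailure` and `summarize` helpers, and the message wording other than "answer-only run saved" are hypothetical; the actual isNativeAgentRunFailure and formatNativeAgentCompareSummary implementations are not shown in this PR conversation.

```typescript
// Hypothetical, simplified status shape for illustration only.
type SketchStatus =
  | { kind: 'ok'; inputTokens: number }
  | { kind: 'answer_only'; resultPath: string }
  | { kind: 'runner_error'; exitCode: number };

// Only runner_error counts as a failure; answer_only is a saved, usable artifact.
function isFailure(status: SketchStatus): boolean {
  return status.kind === 'runner_error';
}

// One summary line per run; answer_only skips any token-reduction reporting.
function summarize(label: string, status: SketchStatus): string {
  switch (status.kind) {
    case 'ok':
      return `${label}: ${status.inputTokens} input tokens`;
    case 'answer_only':
      return `${label}: answer-only run saved to ${status.resultPath} (no usage metadata, reduction not reported)`;
    case 'runner_error':
      return `${label}: runner failed with exit code ${status.exitCode}`;
  }
}

const runs: SketchStatus[] = [
  { kind: 'ok', inputTokens: 1200 },
  { kind: 'answer_only', resultPath: 'baseline-answer.txt' },
  { kind: 'runner_error', exitCode: 2 },
];

// Counts only the runner_error entry.
const failureCount = runs.filter(isFailure).length;
```

Modeling the status as a discriminated union lets the compiler check that every variant, including the new one, is handled in the formatter.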

Poem

🐰 When structures fade to simple text,
The answer-only path's up next,
No usage gleams, but answers clear—
We save the result, without fear! ✨

Possibly related PRs

  • mohanagy/graphify-ts#19: Modifies structured-runner parsing for Gemini usage capture; related infrastructure for provider-specific usage fallback handling.
  • mohanagy/graphify-ts#28: Touches native_agent compare flow in compare.ts, CLI help, and tests; overlapping change surface.
  • mohanagy/graphify-ts#22: Centralizes structured JSON parsing via prompt-runner; foundational logic this PR builds upon for answer-only fallback.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: handling answer-only native compare runs as a distinct case instead of failures. |
| Description check | ✅ Passed | The description covers the key change, testing steps, and relevant updates, matching the template structure with summary and testing sections completed. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/infrastructure/compare.ts (1)

1680-1689: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate native_agent exec templates before expanding them.

executeNativeAgentCompare() bypasses validateCompareExecTemplate(), so native_agent still accepts $(cat {prompt_file}) / backtick substitutions that the regular compare path rejects. That can push the full prompt into argv/process listings and hit shell command-length limits on larger prompts. Apply the same validation once before either expandCompareExecTemplate() call.

Suggested fix
 export async function executeNativeAgentCompare(
   input: GenerateCompareArtifactsInput,
   dependencies: ExecuteNativeAgentCompareDependencies = {},
 ): Promise<NativeAgentCompareResult> {
   if (input.baselineMode !== 'native_agent') {
     throw new Error(`executeNativeAgentCompare requires baselineMode "native_agent", got "${input.baselineMode}"`)
   }
+  validateCompareExecTemplate(input.execTemplate)
 
   const graphPath = validateGraphPath(input.graphPath)
   const projectRoot = realpathSync(inferProjectRootFromGraphPath(graphPath))
   const questions = resolveCompareQuestions(input)

Also applies to: 1747-1756
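
For illustration, a pre-expansion check of the kind the review asks for could be as small as the sketch below. The `assertNoShellSubstitution` helper and the `my-runner` command are hypothetical; the real validateCompareExecTemplate may apply different or additional rules.

```typescript
// Hypothetical validator; the real validateCompareExecTemplate may behave differently.
function assertNoShellSubstitution(template: string): void {
  // Reject $(...) and backtick command substitution, which would inline
  // the full prompt into argv instead of passing a file path.
  if (/\$\(|`/.test(template)) {
    throw new Error('exec template must not use $(...) or backtick command substitution');
  }
}

// Passing the prompt file path as a placeholder is accepted.
assertNoShellSubstitution('my-runner --input {prompt_file}');

// A template that inlines the prompt via command substitution is rejected.
let rejected = false;
try {
  assertNoShellSubstitution('my-runner "$(cat {prompt_file})"');
} catch {
  rejected = true;
}
```

Running the check once at the top of the function, before either template expansion, keeps the baseline and candidate paths consistent with the regular compare path.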

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/infrastructure/compare.ts` around lines 1680 - 1689, The exec template
for native_agent is not being validated before expansion in
executeNativeAgentCompare; call validateCompareExecTemplate(input.execTemplate)
(or equivalent) and handle validation errors before invoking
expandCompareExecTemplate(...) for both the baseline and candidate paths in
executeNativeAgentCompare; similarly add the same pre-expansion validation at
the other occurrence around the code handling the candidate run (the second
expandCompareExecTemplate call referenced in the review), so both baseline and
candidate expansions validate input.execTemplate first.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b5216e25-b3a4-42bc-956e-6a6f0b145657

📥 Commits

Reviewing files that changed from the base of the PR and between 954886a and 851414e.

📒 Files selected for processing (7)
  • README.md
  • docs/proof-workflows.md
  • src/cli/main.ts
  • src/infrastructure/compare.ts
  • tests/unit/cli.test.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/compare.test.ts

@mohanagy mohanagy merged commit 6c28238 into main May 8, 2026
7 checks passed