Handle answer-only native compare runs (#65)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
📝 Walkthrough

This PR adds support for plain-text answer-only output in `native_agent` compare mode. When the structured runner exits cleanly but produces no parseable usage metadata, the result is now recorded as an answer-only artifact.

Changes
- Native Agent Answer-Only Result Status
🎯 Review effort: 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/infrastructure/compare.ts (1)
1680-1689: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

**Validate `native_agent` exec templates before expanding them.**

`executeNativeAgentCompare()` bypasses `validateCompareExecTemplate()`, so `native_agent` still accepts `$(cat {prompt_file})` / backtick substitutions that the regular compare path rejects. That can push the full prompt into argv/process listings and hit shell command-length limits on larger prompts. Apply the same validation once before either `expandCompareExecTemplate()` call.

Suggested fix:
```diff
 export async function executeNativeAgentCompare(
   input: GenerateCompareArtifactsInput,
   dependencies: ExecuteNativeAgentCompareDependencies = {},
 ): Promise<NativeAgentCompareResult> {
   if (input.baselineMode !== 'native_agent') {
     throw new Error(`executeNativeAgentCompare requires baselineMode "native_agent", got "${input.baselineMode}"`)
   }
+  validateCompareExecTemplate(input.execTemplate)
   const graphPath = validateGraphPath(input.graphPath)
   const projectRoot = realpathSync(inferProjectRootFromGraphPath(graphPath))
   const questions = resolveCompareQuestions(input)
```

Also applies to: 1747-1756
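To illustrate the reviewer's concern, here is a minimal sketch of why validation must happen before expansion. The validator body, the `FORBIDDEN_PATTERNS` list, and the expansion logic below are hypothetical assumptions for illustration, not the repository's actual implementation of `validateCompareExecTemplate()` / `expandCompareExecTemplate()`:

```typescript
// Hypothetical sketch: reject shell substitution syntax in an exec
// template before it is ever expanded into argv.
const FORBIDDEN_PATTERNS: RegExp[] = [
  /\$\(/, // command substitution, e.g. $(cat {prompt_file})
  /`/,    // backtick substitution
];

function validateCompareExecTemplate(template: string): void {
  for (const pattern of FORBIDDEN_PATTERNS) {
    if (pattern.test(template)) {
      throw new Error(
        `exec template must not use shell substitution: ${template}`,
      );
    }
  }
}

function expandCompareExecTemplate(
  template: string,
  promptFile: string,
): string[] {
  // Expansion substitutes only the file *path*, never the file contents,
  // so the prompt body stays out of argv and process listings.
  return template
    .split(/\s+/)
    .map((token) => token.replace('{prompt_file}', promptFile));
}

// Safe template: the prompt travels by file path.
validateCompareExecTemplate('runner --prompt {prompt_file}');
const args = expandCompareExecTemplate(
  'runner --prompt {prompt_file}',
  '/tmp/prompt.txt',
);

// Unsafe template: would inline the whole prompt into argv.
let rejected = false;
try {
  validateCompareExecTemplate('runner "$(cat {prompt_file})"');
} catch {
  rejected = true;
}
console.log(rejected); // true
```

Calling the validator once at the top of `executeNativeAgentCompare()`, as the suggested fix does, covers both the baseline and candidate expansion sites.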
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/infrastructure/compare.ts` around lines 1680 - 1689, The exec template for native_agent is not being validated before expansion in executeNativeAgentCompare; call validateCompareExecTemplate(input.execTemplate) (or equivalent) and handle validation errors before invoking expandCompareExecTemplate(...) for both the baseline and candidate paths in executeNativeAgentCompare; similarly add the same pre-expansion validation at the other occurrence around the code handling the candidate run (the second expandCompareExecTemplate call referenced in the review), so both baseline and candidate expansions validate input.execTemplate first.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b5216e25-b3a4-42bc-956e-6a6f0b145657
📒 Files selected for processing (7)
- README.md
- docs/proof-workflows.md
- src/cli/main.ts
- src/infrastructure/compare.ts
- tests/unit/cli.test.ts
- tests/unit/compare-native-agent.test.ts
- tests/unit/compare.test.ts
Summary by CodeRabbit
Release Notes
Documentation
- Documented `--baseline-mode native_agent` with structured Anthropic runners for token reduction metrics versus plain-text runners for answer-only artifacts

Bug Fixes
- The `compare` command now gracefully handles plain-text runner outputs lacking token usage metadata, storing them as answer-only artifacts instead of reporting errors
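The answer-only fallback described above can be sketched roughly as follows. This is a hypothetical shape only: the `CompareArtifact` interface, the JSON `usage` field names, and the function name `classifyRunnerOutput` are illustrative assumptions, not the project's real types:

```typescript
// Hypothetical sketch: classify a runner's stdout as a structured result
// (with token usage) or, on a clean exit without parseable usage
// metadata, as an answer-only artifact instead of an error.
interface CompareArtifact {
  kind: 'structured' | 'answer_only';
  answer: string;
  inputTokens?: number;
  outputTokens?: number;
}

function classifyRunnerOutput(
  stdout: string,
  exitCode: number,
): CompareArtifact {
  if (exitCode !== 0) {
    throw new Error(`runner failed with exit code ${exitCode}`);
  }
  try {
    // Structured runners are assumed to emit JSON with a usage block.
    const parsed = JSON.parse(stdout);
    if (
      parsed?.usage?.input_tokens != null &&
      parsed?.usage?.output_tokens != null
    ) {
      return {
        kind: 'structured',
        answer: parsed.answer ?? '',
        inputTokens: parsed.usage.input_tokens,
        outputTokens: parsed.usage.output_tokens,
      };
    }
  } catch {
    // Not JSON at all: fall through to answer-only handling.
  }
  // Clean exit but no usage metadata: keep the raw text as the answer.
  return { kind: 'answer_only', answer: stdout.trim() };
}
```

The key design point is that a zero exit code with unparseable output is treated as a legitimate plain-text answer, so token-reduction metrics are simply omitted rather than the run being reported as failed.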