refactor(prompt): consolidate template context and add golden snapshot tests #630
e6357ab to 1281801
35f7bf6 to 97ec9a7
97ec9a7 to 12a09a8
12a09a8 to ebdf660
- share common prompt build setup
ebdf660 to 9c44399
Fixes Windows CI where CRLF line endings in .gotmpl files caused string-contains assertions (expecting \n) to fail on prompt output. Templates are embedded via go:embed, so checkout line endings are baked into the binary.
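The failure mode above can be illustrated with a small sketch. The helper name normalizeNewlines is hypothetical and is not taken from this PR; it shows one generic way to make embedded text independent of checkout line endings, and makes no claim about how this commit resolves the problem.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNewlines is a hypothetical helper (not from the PR). It converts
// CRLF and lone CR line endings to LF so that assertions expecting "\n"
// behave the same regardless of how the source file was checked out.
func normalizeNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

func main() {
	// Simulates template text that was checked out with CRLF endings
	// and then baked into the binary.
	crlf := "## Review Findings\r\nline two\r\n"
	fmt.Printf("before: %q\n", crlf)
	fmt.Printf("after:  %q\n", normalizeNewlines(crlf))
}
```

Normalizing at load time and pinning line endings at checkout are alternative approaches; the sketch only demonstrates why a string-contains check for "\n" can fail on CRLF input.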
Three regressions were introduced by the template consolidation:
- Range review prompts lost the 'Per-Commit Reviews in This Range' section, allowing agents to re-raise issues already surfaced on individual commits. Restored InRangeReviews population via a lookupReviewContexts helper and re-added the in_range_reviews template block.
- BuildAddressPrompt lumped all responses into a single 'Previous Addressing Attempts' section, misclassifying developer comments as fix attempts. Restored the SplitResponses call and the separate address_tool_attempts / address_user_comments templates.
- The Codex review template dropped the explicit 'Do NOT search or read files outside the repository checkout' guardrail. Restored.
…ssions

Golden-file snapshot tests under internal/prompt/testdata/golden cover seven canonical scenarios (default/codex single review, in-range range review, dirty review, address prompt with split responses, security review, design review).

A pre-refactor diff against commit c27d4dc surfaced four more regressions from the template consolidation:
- Address prompt dropped the 4-line 'Previous Addressing Attempts' preamble explaining how to learn from prior fix attempts. Restored.
- Address prompt lost the blank line between the severity-filter instruction and '## Review Findings'. Restored.
- Templates for previous_reviews, in_range_reviews, and previous_review_attempts each emitted a stray '\n' when empty, accumulating into 3 extra blank lines before '## Current Commit'. Collapsed the trailing {{end}} onto the same line.
- In-range / previous-review entries emitted 2 blank lines between successive items instead of 1. Added {{- end}} trim on the range.
- review_comments template likewise emitted a stray '\n' when empty; collapsed its trailing {{end}}.

Regenerate goldens with: go test -update-golden ./internal/prompt/
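The stray-newline problem in empty sections can be reproduced with Go's text/template in a few lines. This is a minimal sketch: the template strings and the names render, loose, and tight are illustrative assumptions, not the project's actual .gotmpl files.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// loose leaves a newline after the closing {{end}}, so the newline is
// emitted even when the section is empty.
const loose = "{{if .}}\n## Previous Reviews\n{{range .}}- {{.}}\n{{end}}{{end}}\n"

// tight moves that newline inside the {{if}} block, so an empty section
// renders as nothing at all while non-empty output is unchanged.
const tight = "{{if .}}\n## Previous Reviews\n{{range .}}- {{.}}\n{{end}}\n{{end}}"

// render executes a template body against a list of items.
func render(body string, items []string) string {
	var sb strings.Builder
	t := template.Must(template.New("section").Parse(body))
	if err := t.Execute(&sb, items); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	fmt.Printf("loose, empty:     %q\n", render(loose, nil))
	fmt.Printf("tight, empty:     %q\n", render(tight, nil))
	fmt.Printf("loose, one item:  %q\n", render(loose, []string{"a"}))
	fmt.Printf("tight, one item:  %q\n", render(tight, []string{"a"}))
}
```

With several such sections concatenated, each loose variant contributes one extra blank line when empty, which matches the kind of accumulation the commit describes.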
…paths

Adds seven more snapshot scenarios:
- single-review with claude-code agent (agent-specific system prompt)
- single-review with gemini agent (agent-specific system prompt)
- single-review with previous-reviews context (contextCount=2, DB reviews)
- single-review with .roborev.toml review_guidelines
- single-review with additionalContext (PR discussion injection)
- single-review with severity filter (minSeverity=medium)
- single-review with truncated diff (cap=4000, generic commit fallback)

Brings golden coverage of the prompt matrix from ~40% to ~75%.
Closes the remaining coverage gap:
- single review truncated diff with codex agent (codex-specific fallback with git show / git diff commands)
- range review truncated diff (generic range fallback)
- dirty review truncated diff (partial snippet + truncated marker)
- address prompt without a severity filter (baseline rendering)

Golden coverage of the prompt matrix is now at roughly 100% of the documented paths (18 scenarios total).
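For readers unfamiliar with golden-file testing, the compare-or-update pattern behind these scenarios can be sketched as follows. The function checkGolden, its update parameter, and the file name used in main are assumptions for illustration; the PR's real test helpers and the wiring of the -update-golden flag are not shown in this conversation.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// checkGolden is a hypothetical helper. It compares got against the file
// at path byte-for-byte. When update is true it rewrites the file instead
// of comparing, which is what an "update golden" flag typically triggers.
func checkGolden(path string, got []byte, update bool) error {
	if update {
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		return os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if !bytes.Equal(want, got) {
		return fmt.Errorf("golden mismatch for %s: want %d bytes, got %d bytes",
			path, len(want), len(got))
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "golden")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "testdata", "single_default.golden")

	prompt := []byte("## Current Commit\n")
	fmt.Println("write:  ", checkGolden(path, prompt, true))
	fmt.Println("compare:", checkGolden(path, prompt, false))
	// A byte-for-byte comparison also catches line-ending drift.
	fmt.Println("crlf:   ", checkGolden(path, []byte("## Current Commit\r\n"), false))
}
```

Because the comparison is exact, whitespace-only changes in a template show up as a failing snapshot rather than passing silently.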
Two places reconstructed ReviewOptionalContext field-by-field and silently dropped the newly-restored InRangeReviews slice:
- buildRangePrompt rebuilt ctx.optional from the selected codex variant with a struct literal that omitted InRangeReviews. Replaced with ctx.optional = selectedCtx.Review.Optional.Clone() so any future field is carried through automatically.
- trimOptionalSections repopulated the view after TrimNext but left InRangeReviews stale, defeating a TrimNext call that cleared it. Replaced the field-by-field copy with *view = ctx.

Also adds InRangeReviews to measureOptionalSectionsLoss so fallback variants that drop per-commit reviews are scored correctly. Removes two now-unused conversion helpers.
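The bug class described here, a struct literal that silently omits a newly added field, can be shown with a reduced example. The type optionalContext and the function rebuildByField are hypothetical stand-ins; the real ReviewOptionalContext has fields and a Clone implementation not shown in this conversation.

```go
package main

import "fmt"

// optionalContext is a hypothetical, reduced stand-in for
// ReviewOptionalContext with only two slice fields.
type optionalContext struct {
	PreviousReviews []string
	InRangeReviews  []string
}

// rebuildByField mimics the buggy pattern: a struct literal written before
// InRangeReviews existed, which therefore drops it without any compile error.
func rebuildByField(o optionalContext) optionalContext {
	return optionalContext{PreviousReviews: o.PreviousReviews}
}

// Clone starts from a whole-value copy, so every field (including ones
// added later) is carried over, then detaches the slices from the original.
func (o optionalContext) Clone() optionalContext {
	c := o
	c.PreviousReviews = append([]string(nil), o.PreviousReviews...)
	c.InRangeReviews = append([]string(nil), o.InRangeReviews...)
	return c
}

func main() {
	src := optionalContext{
		PreviousReviews: []string{"prev"},
		InRangeReviews:  []string{"commit-1", "commit-2"},
	}
	fmt.Println("field-by-field:", len(rebuildByField(src).InRangeReviews))
	fmt.Println("clone:         ", len(src.Clone().InRangeReviews))
}
```

Starting from a whole-value copy means a forgotten field degrades to a shared slice rather than lost data, which is a milder failure than the one this commit fixes.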
…d regression tests

- renderShellCommand re-applies stripInlineCodeBreakers so backticks and control characters in git refs cannot escape the enclosing Markdown inline code span. The sanitizer pre-dated this branch (PR #558) but was dropped when renderShellCommand was refactored.
- review_comments template ends with an extra trailing blank line and drops its own leading blank, so comment-bearing entries in previous_reviews / in_range_reviews / previous_review_attempts keep a blank line separator before the next entry. Before this fix, the {{- end}} trim ate the only separator after the comment block and the following '--- Review ... ---' header butted against the last comment line.

Regression tests added:
- TestTrimOptionalSectionsPropagatesInRangeReviewsClear: locks in that trimOptionalSections (*view = ctx) propagates the cleared InRangeReviews slice back to the caller, which the field-by-field rebuild did not.
- TestMeasureOptionalSectionsLossCountsInRangeReviews: ensures the fallback selector treats dropped InRangeReviews as a loss.
- TestReviewOptionalContextTrimNextPreservesPriority: extended to cover the InRangeReviews priority slot.
- TestRenderShellCommandStripsInlineCodeBreakers: table-driven test that backticks and control bytes never survive rendering.
- TestGoldenPrompt_PreviousReviewsWithComments: golden-file snapshot proving the comment separator fix.
- TestGoldenPrompt_RangeTruncatedCodexPreservesInRangeReviews: golden-file snapshot for the truncated-codex-range path proving InRangeReviews survive and the codex fallback is selected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
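The sanitizing step for inline code spans can be sketched as below. The function stripBreakers is a hypothetical approximation written for illustration; the actual stripInlineCodeBreakers implementation is not shown in this conversation and may handle more or different characters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripBreakers is a hypothetical approximation of a sanitizer for text
// placed inside a Markdown inline code span. It drops backticks, which
// would close the span early, and control characters such as newlines,
// which would break the span across lines.
func stripBreakers(s string) string {
	return strings.Map(func(r rune) rune {
		if r == '`' || unicode.IsControl(r) {
			return -1 // a negative result tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// A git ref crafted to close the code span and start a new heading.
	ref := "main`\n## Injected heading"
	fmt.Printf("unsanitized: %q\n", "`git diff "+ref+"`")
	fmt.Printf("sanitized:   %q\n", "`git diff "+stripBreakers(ref)+"`")
}
```

Dropping the characters rather than escaping them keeps the rendered command on one line, at the cost of altering unusual ref names.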
Golden snapshot tests compare prompt output byte-for-byte. Without this rule, *.golden files are checked out as CRLF on Windows with core.autocrlf while the renderer always emits LF, so the tests would fail even when prompt rendering is correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Consolidates prompt template data under a single TemplateContext root; unifies prompt rendering, fitting, and range fallback selection through it.
- Restores the InRangeReviews section in range prompts, the tool-attempts / user-comments split in address prompts, and the codex "do not read files outside the repo checkout" guardrail.
- Adds golden snapshot tests under internal/prompt/testdata/golden/ covering 18 scenarios across agents (default, codex, claude-code, gemini), review types (review, range, dirty, address, security, design), context injection (previous reviews, guidelines, additional context, severity filter), and diff-truncation fallbacks. Regenerate with go test -update-golden ./internal/prompt/.
- Fixes whitespace regressions (stray '\n' in empty sections, extra blank lines between range entries) surfaced by byte-diffing against the pre-refactor output.
- Pins line endings for *.gotmpl templates via .gitattributes so the embedded FS produces identical output on Windows.