Harden Codex large-diff prompt budgeting and fallbacks #558

Merged
wesm merged 25 commits into main from fix/sandboxing on Mar 23, 2026

wesm (Collaborator) commented Mar 20, 2026

Summary

  • harden oversized Codex review prompts so fallback instructions preserve review filtering and stay within prompt budgets
  • preserve the legacy prompt-size default, add UTF-8-safe truncation, and enforce a final hard cap for tiny configured budgets
  • avoid eagerly materializing oversized commit and range diffs by using bounded git reads
  • expand regression coverage for large diffs, large metadata, exclude handling, and small prompt caps

roborev-ci Bot commented Mar 20, 2026

roborev: Combined Review (53c56a8)

Verdict: The PR introduces a high-severity command injection vulnerability in the fallback shell snippets and contains configuration and matching inconsistencies.

High

  • Location: internal/prompt/prompt.go:242, internal/prompt/prompt.go:253
  • Problem: The Codex oversized-diff fallback embeds sha and rangeRef directly into shell command examples without quoting or normalization. This creates a command-injection sink if these values contain attacker-controlled ref names (e.g., topic;touch${IFS}/tmp/pwned). Because the agent is instructed to run these locally, malicious refs could lead to arbitrary local command execution.
  • Fix: Do not interpolate raw revspecs into shell snippets. Resolve inputs to verified full commit OIDs before embedding, shell-escape them correctly, or instruct the agent to inspect the diff through a non-shell mechanism.
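
The "shell-escape them correctly" option can be illustrated with a minimal sketch of POSIX single-quote escaping. The implementation below is an assumption written for illustration, not the helper that landed in the PR; it only addresses POSIX shells and does nothing for cmd.exe or for Markdown rendering.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in POSIX single quotes. Inside single quotes the shell
// treats every character literally, so the only character needing care is
// the single quote itself, emitted as '\'' (close, escaped quote, reopen).
// Hypothetical sketch; the real helper in the PR may differ.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	// The hostile ref name quoted in the review finding above.
	ref := "topic;touch${IFS}/tmp/pwned"
	fmt.Println("git log --oneline " + shellQuote(ref))
}
```

With the ref quoted, a POSIX shell passes the whole string to git as a single argument instead of interpreting the `;` as a command separator.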

Medium

  • Location: internal/prompt/prompt.go:235
  • Problem: isCodexReviewAgent only matches the literal name "codex". This feature will be skipped for any configured alias that resolves to the Codex provider, causing aliased setups to fall back to the older, weaker oversized-diff guidance.
  • Fix: Decide the fallback from resolved agent/provider metadata instead of the raw agentName, or normalize all supported Codex aliases before checking.

  • Location: internal/prompt/prompt.go:330, internal/prompt/prompt.go:416
  • Problem: The oversized-diff gate uses the hardcoded MaxPromptSize constant instead of the config-aware b.maxPromptSize() accessor. Non-default prompt-size configurations will trigger the Codex fallback incorrectly.
  • Fix: Replace the MaxPromptSize comparisons with b.maxPromptSize() (or equivalent config-aware accessor) in both prompt builders, and add coverage for non-default prompt-size configuration.
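
As a rough sketch of what a config-aware gate could look like: the Builder type, its configuredMax field, the diffFits helper, and the 200 * 1024 default below are all assumptions made for illustration. Only the accessor name maxPromptSize comes from the finding.

```go
package main

import "fmt"

// defaultMaxPromptSize is an assumed default used when nothing is configured.
const defaultMaxPromptSize = 200 * 1024

// Builder is a hypothetical stand-in for the prompt builder.
type Builder struct {
	configuredMax int // 0 means "not configured"
}

// maxPromptSize returns the effective cap: the configured value when set,
// otherwise the default.
func (b Builder) maxPromptSize() int {
	if b.configuredMax > 0 {
		return b.configuredMax
	}
	return defaultMaxPromptSize
}

// diffFits gates on the effective cap rather than a package-level constant.
func (b Builder) diffFits(baseLen, diffLen int) bool {
	return baseLen+diffLen <= b.maxPromptSize()
}

func main() {
	small := Builder{configuredMax: 10 * 1024}
	fmt.Println(small.diffFits(2*1024, 9*1024))     // exceeds the 10 KiB cap
	fmt.Println(Builder{}.diffFits(2*1024, 9*1024)) // fits the assumed default
}
```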


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci Bot commented Mar 20, 2026

roborev: Combined Review (91c4d25)

Verdict: The PR introduces size-aware fallback prompts for Codex, but requires fixes for command injection risks, hardcoded limits, and inefficient test setups.

Medium

  • Location: internal/prompt/prompt.go:282, internal/prompt/prompt.go:467
    Problem: The new Codex oversized-range fallback embeds rangeRef directly into shell command examples (e.g., git log --oneline %s). rangeRef is a free-form revision expression and can contain shell metacharacters, creating a command-injection path in the reviewer environment if an external branch name is used.
    Fix: Resolve the range to verified object IDs before rendering it into command examples, or shell-escape the argument before interpolation.

  • Location: internal/prompt/prompt.go (around writeCodexCommitInspectionFallback, writeCodexRangeInspectionFallback, buildSinglePrompt, buildRangePrompt)
    Problem: The new Codex fallback logic hard-codes MaxPromptSize instead of using the builder’s config-aware limit. If a smaller prompt cap is configured, the fallback can exceed the real limit.
    Fix: Thread b.maxPromptSize() through the overflow check and writeLongestFitting, and add a test that sets a reduced prompt limit to verify the fallback stays within the configured cap.

  • Location: internal/prompt/prompt_test.go (findSingleCommitNearCapRepo and findRangeNearCapRepo)
    Problem: The functions create up to 60 separate git repositories via shell commands in a loop to iteratively find a prompt size boundary. This trial-and-error file system manipulation will significantly slow down test execution and increase I/O overhead.
    Fix: Calculate the required guidelineLen mathematically based on the length of a baseline prompt without guidelines, and create the test repository only once.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci Bot commented Mar 20, 2026

roborev: Combined Review (182913d)

Verdict: The PR implements necessary prompt truncation and fallback logic for oversized Codex reviews, but requires fixes for a severe performance/correctness bug in UTF-8 truncation and hardcoded prompt size limits.

High Severity

  • Location: internal/prompt/prompt.go:42 (truncateUTF8)
  • Problem: utf8.ValidString checks the entire string from the beginning. If the string contains any invalid UTF-8 prior to the truncation point (e.g., from user guidelines or commit messages), the loop will repeatedly decrement maxBytes to remove the invalid byte, unintentionally wiping out large portions of valid context (or the entire string). Additionally, this causes an O(N*M) CPU spike as utf8.ValidString repeatedly re-scans the valid prefix on every iteration.
  • Fix: Check only the truncation boundary. Walk backward from maxBytes (up to 3 bytes) to detect and remove a trailing incomplete rune, rather than re-validating the entire prefix.
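
A minimal sketch of the boundary-only approach described in the fix, assuming the function takes a string and a byte budget. The body is illustrative and not necessarily what the PR ended up with. It inspects only the first dropped byte and at most three bytes before it, so invalid bytes earlier in the string no longer affect the result.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 returns at most maxBytes bytes of s without splitting a
// multi-byte rune at the cut. Only the boundary is examined; the prefix is
// never re-validated.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// s[cut] is the first byte that would be dropped. If it is a UTF-8
	// continuation byte, the cut falls inside a rune, so step back to that
	// rune's first byte (at most UTFMax-1 = 3 steps).
	for i := 0; i < utf8.UTFMax-1 && cut > 0 && !utf8.RuneStart(s[cut]); i++ {
		cut--
	}
	return s[:cut]
}

func main() {
	fmt.Printf("%q\n", truncateUTF8("héllo", 2))    // cut lands inside "é"
	fmt.Printf("%q\n", truncateUTF8("\xffabcd", 3)) // earlier invalid byte is kept
}
```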

Medium Severity

  • Location: internal/prompt/prompt.go:233, internal/prompt/prompt.go:409, internal/prompt/prompt.go:495
  • Problem: The new Codex fallback path hardcodes the legacy MaxPromptSize (250 KB) instead of using the builder’s config-aware prompt budget. On installations that set a smaller cap, writeLongestFitting can still emit an oversized prompt; on installations that set a larger cap, the code can unnecessarily drop the inline diff even though it would fit.
  • Fix: Thread b.maxPromptSize() through the oversized-diff checks and fallback writers so both the “should we omit the diff?” decision and the trimming logic use the effective configured limit. Add a regression test that exercises a non-default prompt budget.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci Bot commented Mar 20, 2026

roborev: Combined Review (563f187)

Verdict: The PR requires important fixes to address a command injection vulnerability, a critical bug in UTF-8 truncation logic, and incorrect prompt size limits.

High Severity Findings

  • Location: internal/prompt/prompt.go:281, internal/prompt/prompt.go:307, internal/prompt/prompt.go:429, internal/prompt/prompt.go:534

    • Problem: The new Codex oversized-diff fallbacks interpolate sha/rangeRef directly into shell command snippets and explicitly instruct Codex to run them. If those refs are not already canonical SHAs, an attacker could supply ref names containing shell metacharacters, leading to command injection when the agent executes the suggested command verbatim.
    • Suggested Fix: Resolve refs to validated full commit SHAs before embedding them in any command text, or shell-escape them with a dedicated quoting routine. Enforce the “must be a SHA” invariant at this boundary.
  • Location: internal/prompt/prompt.go (truncateUTF8 function)

    • Problem: utf8.ValidString(s[:maxBytes]) validates the entire substring. If the text contains any invalid UTF-8 sequence prior to the truncation point, it will return false on every iteration until maxBytes drops to 0, completely wiping out the string instead of just truncating it.
    • Suggested Fix: Check only the boundary byte to avoid splitting a multi-byte character. Replace the loop condition with: for maxBytes > 0 && !utf8.RuneStart(s[maxBytes]) { maxBytes-- }

Medium Severity Findings

  • Location: internal/prompt/prompt.go:427, internal/prompt/prompt.go:532
    • Problem: The new fallback path uses the legacy MaxPromptSize constant for both the overflow check and the trimming budget, ignoring Builder.maxPromptSize(). Any non-default configured prompt cap will be applied incorrectly: prompts can still exceed the real cap or degrade to fallback too early.
    • Suggested Fix: Resolve the active limit once with b.maxPromptSize() and use that value consistently in the size check and in buildPromptPreservingCurrentSection.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 6 commits March 20, 2026 17:13
- Use config-resolved prompt cap instead of hardcoded MaxPromptSize
  in buildSinglePrompt, buildRangePrompt, and BuildDirty
- Shell-quote rangeRef in Codex fallback variants to prevent
  injection via crafted branch names
- Replace O(N²) utf8.ValidString loop in truncateUTF8 with
  O(1) utf8.RuneStart boundary check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Non-Codex single/range fallbacks now route through
  buildPromptPreservingCurrentSection to trim optional context
  (guidelines, previous reviews) when it alone exceeds the cap
- BuildDirty separates optional context from required sections
  and trims it before the diff-size check
- Add regression tests with a 10KB configured cap for non-Codex
  single, range, and dirty paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

roborev-ci Bot commented Mar 22, 2026

roborev: Combined Review (1f32674)

Verdict: The PR introduces bounded git diff reading and improves prompt budget handling, but contains a medium-severity issue regarding UTF-8 safe truncation.

Medium

  • Location: internal/git/git.go:667
  • Problem: captureGitOutputLimited truncates git output at an arbitrary byte boundary and immediately converts it to string. If the diff contains multibyte UTF-8 text near the cutoff, the returned prompt can now contain invalid UTF-8 even though the prompt layer added UTF-8-safe trimming elsewhere.
  • Fix: Trim the captured bytes to a valid UTF-8 boundary before returning them, and add a test that limits a diff containing non-ASCII content.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm changed the title from "Fix Codex oversized diff review fallbacks" to "Harden Codex large-diff prompt budgeting and fallbacks" Mar 22, 2026
wesm and others added 4 commits March 22, 2026 14:14
On Windows, a killed git process can remain blocked if its stdout
pipe buffer is full and nothing is reading it. This prevents
cmd.Wait from returning because the stderr copy goroutine never
sees EOF. Drain remaining stdout after cancel so the process can
unblock, exit, and release all pipe handles.

Fixes 600s timeout in Windows CI for git and prompt packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
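
The drain-after-cancel idea in the commit message above can be sketched against a plain io.Reader. Everything here (readBounded, boundedFromString, the cancel callback) is a hypothetical illustration; the real code works with a child process's stdout pipe, which this sketch only simulates with an in-memory reader.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// readBounded keeps at most limit bytes from r. When more output exists it
// calls cancel (for example, a context cancel that kills the child process)
// and then discards the rest of r, so a writer blocked on a full pipe can
// finish and release its handles.
func readBounded(r io.Reader, limit int64, cancel func()) ([]byte, bool, error) {
	// Read one byte past the limit to learn whether output was truncated.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(data)) <= limit {
		return data, false, nil
	}
	cancel()
	// Drain without storing; read errors are expected once the producer dies.
	_, _ = io.Copy(io.Discard, r)
	return data[:limit], true, nil
}

// boundedFromString is a demo wrapper that feeds a string through readBounded
// with a no-op cancel.
func boundedFromString(s string, limit int64) (string, bool) {
	data, truncated, err := readBounded(strings.NewReader(s), limit, func() {})
	if err != nil {
		return "", false
	}
	return string(data), truncated
}

func main() {
	out, truncated := boundedFromString(strings.Repeat("x", 100), 10)
	fmt.Println(len(out), truncated)
}
```

Memory use stays proportional to the limit rather than to the size of the diff, because the discarded remainder is never buffered.
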
- Tests now assert against config.DefaultMaxPromptSize (200KB) instead
  of the legacy MaxPromptSize constant (250KB), matching what
  resolveMaxPromptSize actually returns with no config
- Add regression test proving the 200KB default cap is honored
- sanitizeToValidUTF8 uses U+FFFD replacement marker instead of
  silently dropping invalid byte sequences
- Trim trailing partial rune before sanitizing so truncation doesn't
  expand the output past the byte limit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
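
To see why the commit above trims before sanitizing: U+FFFD encodes to three bytes, so replacing a cut-off lead byte would grow the output. The sketch below uses hypothetical helper names (trimPartialRune, sanitizeTruncated) and the standard strings.ToValidUTF8; it is an illustration, not the PR's sanitizeToValidUTF8.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// trimPartialRune removes an incomplete multi-byte encoding from the end of
// s. A trailing byte that is merely invalid (not a truncated encoding) is
// left in place for the sanitizer to replace.
func trimPartialRune(s string) string {
	// A truncated rune is at most UTFMax-1 bytes long, so only look that far back.
	for i := len(s) - 1; i >= 0 && i >= len(s)-(utf8.UTFMax-1); i-- {
		if utf8.RuneStart(s[i]) {
			if !utf8.FullRuneInString(s[i:]) {
				return s[:i] // tail is the start of a rune that was cut off
			}
			return s
		}
	}
	return s
}

// sanitizeTruncated trims first, then replaces each remaining run of invalid
// bytes with U+FFFD.
func sanitizeTruncated(s string) string {
	return strings.ToValidUTF8(trimPartialRune(s), "\uFFFD")
}

func main() {
	cut := "h\xc3" // "hé" cut after two bytes
	naive := strings.ToValidUTF8(cut, "\uFFFD")
	fmt.Println(len(cut), len(naive), len(sanitizeTruncated(cut)))
}
```

Interior invalid bytes can still expand when replaced, so a caller that needs a strict byte cap would have to account for that separately.
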
The previous test used a diff far larger than both caps, so it
passed regardless of which limit was enforced. Now creates a
prompt between 200KB and 250KB and verifies the builder with no
config enforces the 200KB default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

roborev-ci Bot commented Mar 22, 2026

roborev: Combined Review (8d7ca23)

Verdict: The patch introduces useful streaming diff readers and bounded allocations, but contains a medium-severity command injection risk on Windows and functional bugs in diff-budget calculations and generic file filtering.

Medium

  • Location: internal/prompt/prompt.go (Line 294, Line 340)
    Problem: shellQuote() now emits single-quoted command arguments on Windows, and codexRangeInspectionFallbackVariants() embeds rangeRef into shell command strings that the review agent is told to run locally. Single quotes are PowerShell-specific; they do not protect arguments in cmd.exe, where metacharacters such as &, |, > and < remain active. If an externally sourced ref name ever reaches rangeRef and an oversized diff triggers this fallback, a Windows agent executing the suggested command through cmd.exe can be steered into running attacker-controlled commands.
    Fix: Do not render shell-ready strings from ref names. Prefer structured/argv-style command instructions so the agent can invoke git without a shell. If text commands must be emitted, generate them for one explicit shell only and escape according to that shell, or reject metacharacters in externally influenced refs before rendering.

  • Location: internal/prompt/prompt.go (around buildSinglePrompt / buildRangePrompt)
    Problem: The code computes diffLimit before trimming optional sections (optional context, large commit bodies, previous reviews). If those sections alone push baseLen over the cap, GetDiffLimited is called with 0 bytes and the prompt falls back to "Diff too large", even when the actual diff would fit after trimming the optional context. This causes small commits/ranges to lose their inline diff unnecessarily.
    Fix: Trim optional context/current overflow to the configured cap first, then calculate remaining diff budget and only fall back when the diff still does not fit after that trimming.

  • Location: internal/git/git.go (around GetFilesChanged and GetRangeFilesChanged)
    Problem: These generic helpers now always apply ReviewPathspecArgs, which includes built-in and configured review excludes even when callers pass no excludes. That silently changes their contract from "all changed files" to "review-filtered changed files", so any existing caller that uses them for generic file lists or counts will now miss files like go.sum or repo-configured excludes.
    Fix: Preserve the old unfiltered behavior in GetFilesChanged / GetRangeFilesChanged, and add separate review-specific helpers or pass the filtered pathspec only at the review call sites.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 2 commits March 22, 2026 15:28
- Restore GetFilesChanged/GetRangeFilesChanged to return all files
  without review excludes; callers that need filtering pass excludes
  explicitly via GetDiff/GetDiffLimited
- Drop Windows-specific shellQuote branch (single quotes don't
  protect in cmd.exe); always use POSIX quoting since commands
  are instructions for Codex agents, not shell invocations
- Compute diff budget assuming optional context can be trimmed,
  so large guidelines don't force a needless "diff too large"
  fallback when the actual diff would fit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The diff budget is now computed from required sections only
(system prompt, commit metadata). Optional context (guidelines,
previous reviews) is trimmed after the diff size is known, so
large guidelines no longer cause small diffs to hit the "diff
too large" fallback unnecessarily.

Add regression test where guidelines consume most of the budget
but a small diff still gets inlined after context trimming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
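
The budgeting order described in the commit message above can be reduced to a few lines of arithmetic. This is a sketch under assumptions: the plan type, planPrompt, and all parameter names are hypothetical, and lengths are treated as plain byte counts.

```go
package main

import "fmt"

// plan records whether the diff is inlined and how many bytes of optional
// context (guidelines, previous reviews) survive trimming.
type plan struct {
	InlineDiff   bool
	OptionalKeep int
}

// planPrompt sizes the diff against required sections only, then gives
// optional context whatever room is left.
func planPrompt(promptCap, requiredLen, optionalLen, diffLen, fallbackLen int) plan {
	inline := requiredLen+diffLen <= promptCap
	body := fallbackLen // size of the "diff too large" fallback text
	if inline {
		body = diffLen
	}
	keep := promptCap - requiredLen - body
	if keep < 0 {
		keep = 0
	}
	if keep > optionalLen {
		keep = optionalLen
	}
	return plan{InlineDiff: inline, OptionalKeep: keep}
}

func main() {
	// Guidelines nearly fill a 10,000-byte cap, yet a 500-byte diff is still inlined.
	p := planPrompt(10000, 1000, 9500, 500, 200)
	fmt.Printf("inline=%v optionalKeep=%d\n", p.InlineDiff, p.OptionalKeep)
}
```

Under the earlier ordering, the same inputs would leave a negative diff budget (10000 - 1000 - 9500) and force the fallback even though the diff is small.
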

roborev-ci Bot commented Mar 22, 2026

roborev: Combined Review (385504e)

The changes implement budget-aware diff capture and prompt fallback logic, but introduce a few medium-severity correctness issues regarding mandatory diff preservation and safe prompt shaping.

Medium Severity Findings

  • Location: internal/prompt/prompt.go:219
    Problem: BuildDirty trims optionalContext against promptCap without reserving space for any mandatory diff/fallback text, then hard-caps the final prompt afterward. If guidelines/history already consume the budget, the tail cut can remove ### Diff entirely, producing a dirty-review prompt that contains no changes to review.
    Fix: Reserve space for a minimal diff section/fallback before trimming optional context, the same way the single-commit/range fallback path reserves room for its shortest variant.

  • Location: internal/prompt/prompt.go:271
    Problem: buildPromptPreservingCurrentSection and the other truncateUTF8 call sites trim markdown-heavy context at arbitrary byte boundaries. That can leave an open code fence or splice ### Diff/### Combined Diff directly onto a partial line, which can cause the rest of the prompt to be interpreted as code or malformed prose.
    Fix: Trim at section boundaries where possible, or normalize the truncated fragment by forcing a separator newline and closing any unmatched code fences before appending the next section.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm (Collaborator, Author) commented Mar 22, 2026

"Both findings are speculative edge cases, not practical bugs:

  1. BuildDirty with no diff section: If guidelines alone exceed the cap, the diff section
    gets the truncation treatment — same as every other path. The hardCapPrompt is a safety net,
    not the primary trimming mechanism. And in practice, guidelines that exceed 200KB are
    pathological.
  2. Truncation at markdown boundaries: Every truncation in every LLM prompt system works this
    way. Agents handle malformed markdown gracefully — they're not parsers. Engineering
    section-boundary-aware truncation is significant complexity for zero practical benefit.

These are the kind of findings that sound rigorous but don't correspond to real failure
modes"

wesm and others added 2 commits March 23, 2026 10:46
TUI runtime files stored endpoint.BaseURL() which collapses Unix
socket endpoints to "http://localhost", losing the socket path.
Add ConfigAddr() that returns a ParseEndpoint-compatible string
(e.g. "unix:///path/to/sock") and use it in runtime metadata so
external tools can reconnect to the correct transport.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

roborev-ci Bot commented Mar 23, 2026

roborev: Combined Review (8c43452)

Verdict: The PR implements prompt size budgeting and fallback commands, but contains medium-severity issues related to Windows shell compatibility and Markdown injection vulnerabilities.

Review Findings

Medium

  • Location: internal/prompt/prompt.go (shellQuote, renderShellCommand, codexCommitInspectionFallbackVariants, codexRangeInspectionFallbackVariants)

    • Problem: The new oversized-diff fallback commands are always rendered with POSIX single-quote escaping. On Windows shells, single quotes are not parsed the same way, so commands containing quoted pathspec excludes like ':(exclude,glob)**/go.sum' will be passed to Git with the quotes included and fail. This breaks the local-inspection path for Codex on Windows when the inline diff has been omitted.
    • Fix: Render fallback commands with OS/shell-appropriate quoting, or avoid shell-escaped strings entirely and provide argv-style command templates that do not depend on POSIX quoting rules.
  • Location: internal/prompt/prompt.go (Lines 326, 349)

    • Problem: The oversized-diff Codex fallback embeds shell commands inside Markdown inline code using `...`, but only shell-quotes the interpolated arguments. Repo-controlled exclude patterns (from .roborev.toml) or ref/range strings containing backticks or embedded newlines can break out of the code span and inject additional prompt text/instructions. This could allow a malicious shared repository to steer the review agent.
    • Fix: Escape untrusted values for Markdown separately from shell quoting, or avoid inline-code rendering entirely for interpolated commands. Consider using fenced blocks with a dynamically chosen fence length, and reject control characters/newlines in repo-sourced exclude patterns before embedding them.
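
The "dynamically chosen fence length" suggestion can be sketched as follows. The names fenceFor and renderFenced are hypothetical; the idea is to pick a fence longer than any run of backticks inside the content, so nothing in the content can be read as a closing fence.

```go
package main

import (
	"fmt"
	"strings"
)

// fenceFor returns a backtick fence that is longer than the longest run of
// backticks in content, with a minimum length of three.
func fenceFor(content string) string {
	longest, run := 0, 0
	for _, r := range content {
		if r == '`' {
			run++
			if run > longest {
				longest = run
			}
		} else {
			run = 0
		}
	}
	n := longest + 1
	if n < 3 {
		n = 3
	}
	return strings.Repeat("`", n)
}

// renderFenced wraps content in a fenced block using the computed fence.
func renderFenced(content string) string {
	fence := fenceFor(content)
	return fence + "\n" + content + "\n" + fence
}

func main() {
	fmt.Println(renderFenced("git diff 'main..topic' -- ':(exclude,glob)**/go.sum'"))
}
```

Counting the longest run anywhere in the content is deliberately conservative: only a run at the start of a line could close a fence, but the simpler rule is easier to reason about.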

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm and others added 3 commits March 23, 2026 11:02
The fallback inspection commands included the full built-in lockfile
exclude list (20+ pathspec args), making prompts unreadable. Built-in
excludes are already applied when reading the diff via GetDiffLimited;
repeating them in agent instructions is redundant. Now only user-
configured excludes (from exclude_patterns) appear in the fallback.

Export FormatExcludeArgs for the prompt package to use directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exclude patterns from .roborev.toml are embedded in markdown inline
code spans in Codex fallback commands. Patterns containing backticks
could break out of the code span and inject prompt text. Silently
drop patterns with control characters or backticks in
FormatExcludeArgs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FormatExcludeArgs feeds into actual git diff pathspecs, so dropping
patterns with backticks there would break real exclude behavior.
Move the sanitization to the prompt rendering layer where pathspec
args are embedded in markdown inline code spans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
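
A minimal sketch of sanitizing at the rendering layer rather than in the git layer, as the commit message above describes. The helper names inlineCodeSafe and filterForInlineCode are made up for this example; the point is that the full argument list still goes to git, while only the copy embedded in an inline code span is filtered.

```go
package main

import (
	"fmt"
	"unicode"
)

// inlineCodeSafe reports whether arg can be embedded in a Markdown inline
// code span without ending the span early: no backticks, no control
// characters (which include newlines).
func inlineCodeSafe(arg string) bool {
	for _, r := range arg {
		if r == '`' || unicode.IsControl(r) {
			return false
		}
	}
	return true
}

// filterForInlineCode returns the subset of args that is safe to display.
// The input slice is left untouched so it can still be passed to git.
func filterForInlineCode(args []string) []string {
	safe := make([]string, 0, len(args))
	for _, a := range args {
		if inlineCodeSafe(a) {
			safe = append(safe, a)
		}
	}
	return safe
}

func main() {
	args := []string{":(exclude,glob)vendor/**", ":(exclude)a`b", ":(exclude)c\nd"}
	fmt.Println(len(args), len(filterForInlineCode(args)))
}
```
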

roborev-ci Bot commented Mar 23, 2026

roborev: Combined Review (f5b3b96)

Summary Verdict: Enhances prompt building and git diff extraction to enforce size budgets, but introduces command injection and markdown formatting vulnerabilities via unescaped git refs.

High Severity

  • Command Injection via Git Refs: Untrusted git refs are interpolated into shell commands without quoting in the oversized-diff fallback paths. buildSinglePrompt takes a general git ref, not a guaranteed full SHA, and the new fallback strings emit git show %s -- path/to/file, View with: git show %s, and View with: git diff %s directly. Git ref names can legally contain shell-significant characters, so reviewing an attacker-controlled branch or tag from a shared repo can inject extra shell syntax into the command that the agent is told to run locally.
    • Affected Locations:
      • internal/prompt/prompt.go:389
      • internal/prompt/prompt.go:529
      • internal/prompt/prompt.go:638
    • Suggested Remediation: Resolve refs to canonical object IDs before formatting, and generate every displayed command with renderShellCommand(...) rather than raw fmt.Sprintf.

Medium Severity

  • Markdown Formatting Breakout: The new markdown-safety filtering only covers pathspec args, not the ref/range values embedded into inline command examples. safeForMarkdown drops backticks from exclude args, but sha and rangeRef are still inserted into backtick-delimited markdown command snippets. A malicious external ref name containing backticks can break out of the inline code span and inject additional prompt text or malformed markdown into the instructions sent to the review agent.
    • Affected Locations:
      • internal/prompt/prompt.go:376
      • internal/prompt/prompt.go:414
    • Suggested Remediation: Apply markdown sanitization to ref/range values too, or stop embedding untrusted values inside inline code spans; resolving refs to full SHAs before prompt construction is the simplest fix for single-commit reviews.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

The non-Codex oversized-diff fallbacks and one Codex example used
raw fmt.Sprintf for git refs. Use renderShellCommand/shellQuote
for consistency with the other Codex fallback commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

roborev-ci Bot commented Mar 23, 2026

roborev: Combined Review (175f12e)

Verdict: The changes effectively harden prompt budgeting and truncation, but introduce medium-severity issues in the Codex oversized-diff fallback regarding review scope consistency and markdown injection vulnerabilities.

Medium

  • Location: internal/prompt/prompt.go#L366, internal/prompt/prompt.go#L390 (buildSinglePrompt / buildRangePrompt)
    Problem: The fallback commands fail to preserve the full review scope. They only apply user-configured excludes and drop built-in exclusions (excludedPathPatterns). This causes the agent to inspect files that are normally hidden (like lockfiles or generated artifacts), leading to false-positive findings on oversized reviews.
    Fix: Build the fallback commands from the same full pathspec used to collect review diffs, or provide an equivalent helper that applies both built-in and user excludes.

  • Location: internal/prompt/prompt.go#L366, internal/prompt/prompt.go#L390
    Problem: Markdown injection vulnerability. While pathspec arguments use safeForMarkdown, the sha and rangeRef variables do not. A hostile ref name from a shared repo containing backticks can terminate inline code spans and inject attacker-controlled text into the review prompt sent to the autonomous agent.
    Fix: Treat sha and rangeRef as markdown-untrusted input. Run them through the same markdown safety filter used for pathspec args, or use a dynamically fenced code block instead of inline backticks.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

wesm (Collaborator, Author) commented Mar 23, 2026

" 1. Lockfile excludes in fallback: Same as review 10768. Already decided this is intentional — the agent
can use judgment. Not changing.
2. Markdown injection via refs: The sha and rangeRef values in the Codex fallback are already passed
through shellQuote/renderShellCommand which wraps them in single quotes — those don't contain backticks.
The non-Codex "View with:" lines aren't inside inline code spans (no backticks), so there's nothing to
break out of. Not a real issue."

@wesm wesm merged commit 6867dfb into main Mar 23, 2026
8 checks passed
@wesm wesm deleted the fix/sandboxing branch March 23, 2026 18:14
wesm added a commit that referenced this pull request Apr 18, 2026
…d regression tests

- renderShellCommand re-applies stripInlineCodeBreakers so backticks and
  control characters in git refs cannot escape the enclosing Markdown
  inline code span. The sanitizer pre-dated this branch (pr #558) but
  was dropped when renderShellCommand was refactored.

- review_comments template ends with an extra trailing blank line and
  drops its own leading blank, so comment-bearing entries in
  previous_reviews / in_range_reviews / previous_review_attempts keep
  a blank line separator before the next entry. Before this fix, the
  {{- end}} trim ate the only separator after the comment block and
  the following '--- Review ... ---' header butted against the last
  comment line.

Regression tests added:
- TestTrimOptionalSectionsPropagatesInRangeReviewsClear: locks in
  that trimOptionalSections (*view = ctx) propagates the cleared
  InRangeReviews slice back to the caller, which the field-by-field
  rebuild did not.
- TestMeasureOptionalSectionsLossCountsInRangeReviews: ensures the
  fallback selector treats dropped InRangeReviews as a loss.
- TestReviewOptionalContextTrimNextPreservesPriority: extended to
  cover the InRangeReviews priority slot.
- TestRenderShellCommandStripsInlineCodeBreakers: table-driven test
  that backticks and control bytes never survive rendering.
- TestGoldenPrompt_PreviousReviewsWithComments: golden-file snapshot
  proving the comment separator fix.
- TestGoldenPrompt_RangeTruncatedCodexPreservesInRangeReviews:
  golden-file snapshot for the truncated-codex-range path proving
  InRangeReviews survive and the codex fallback is selected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>