Harden Codex large-diff prompt budgeting and fallbacks#558
Conversation
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
roborev: Combined Review (
|
- Use config-resolved prompt cap instead of hardcoded MaxPromptSize in buildSinglePrompt, buildRangePrompt, and BuildDirty - Shell-quote rangeRef in Codex fallback variants to prevent injection via crafted branch names - Replace O(N²) utf8.ValidString loop in truncateUTF8 with O(1) utf8.RuneStart boundary check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Non-Codex single/range fallbacks now route through buildPromptPreservingCurrentSection to trim optional context (guidelines, previous reviews) when it alone exceeds the cap - BuildDirty separates optional context from required sections and trims it before the diff-size check - Add regression tests with a 10KB configured cap for non-Codex single, range, and dirty paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
On Windows, a killed git process can remain blocked if its stdout pipe buffer is full and nothing is reading it. This prevents cmd.Wait from returning because the stderr copy goroutine never sees EOF. Drain remaining stdout after cancel so the process can unblock, exit, and release all pipe handles. Fixes 600s timeout in Windows CI for git and prompt packages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Tests now assert against config.DefaultMaxPromptSize (200KB) instead of the legacy MaxPromptSize constant (250KB), matching what resolveMaxPromptSize actually returns with no config - Add regression test proving the 200KB default cap is honored - sanitizeToValidUTF8 uses U+FFFD replacement marker instead of silently dropping invalid byte sequences - Trim trailing partial rune before sanitizing so truncation doesn't expand the output past the byte limit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous test used a diff far larger than both caps, so it passed regardless of which limit was enforced. Now creates a prompt between 200KB and 250KB and verifies the builder with no config enforces the 200KB default. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
- Restore GetFilesChanged/GetRangeFilesChanged to return all files without review excludes; callers that need filtering pass excludes explicitly via GetDiff/GetDiffLimited - Drop Windows-specific shellQuote branch (single quotes don't protect in cmd.exe); always use POSIX quoting since commands are instructions for Codex agents, not shell invocations - Compute diff budget assuming optional context can be trimmed, so large guidelines don't force a needless "diff too large" fallback when the actual diff would fit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The diff budget is now computed from required sections only (system prompt, commit metadata). Optional context (guidelines, previous reviews) is trimmed after the diff size is known, so large guidelines no longer cause small diffs to hit the "diff too large" fallback unnecessarily. Add regression test where guidelines consume most of the budget but a small diff still gets inlined after context trimming. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
|
"Both findings are speculative edge cases, not practical bugs:
These are the kind of findings that sound rigorous but don't correspond to real failure |
TUI runtime files stored endpoint.BaseURL() which collapses Unix socket endpoints to "http://localhost", losing the socket path. Add ConfigAddr() that returns a ParseEndpoint-compatible string (e.g. "unix:///path/to/sock") and use it in runtime metadata so external tools can reconnect to the correct transport. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
The fallback inspection commands included the full built-in lockfile exclude list (20+ pathspec args), making prompts unreadable. Built-in excludes are already applied when reading the diff via GetDiffLimited; repeating them in agent instructions is redundant. Now only user- configured excludes (from exclude_patterns) appear in the fallback. Export FormatExcludeArgs for the prompt package to use directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Exclude patterns from .roborev.toml are embedded in markdown inline code spans in Codex fallback commands. Patterns containing backticks could break out of the code span and inject prompt text. Silently drop patterns with control characters or backticks in FormatExcludeArgs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FormatExcludeArgs feeds into actual git diff pathspecs, so dropping patterns with backticks there would break real exclude behavior. Move the sanitization to the prompt rendering layer where pathspec args are embedded in markdown inline code spans. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
The non-Codex oversized-diff fallbacks and one Codex example used raw fmt.Sprintf for git refs. Use renderShellCommand/shellQuote for consistency with the other Codex fallback commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
roborev: Combined Review (
|
|
" 1. Lockfile excludes in fallback: Same as review 10768. Already decided this is intentional — the agent |
…d regression tests - renderShellCommand re-applies stripInlineCodeBreakers so backticks and control characters in git refs cannot escape the enclosing Markdown inline code span. The sanitizer pre-dated this branch (pr #558) but was dropped when renderShellCommand was refactored. - review_comments template ends with an extra trailing blank line and drops its own leading blank, so comment-bearing entries in previous_reviews / in_range_reviews / previous_review_attempts keep a blank line separator before the next entry. Before this fix, the {{- end}} trim ate the only separator after the comment block and the following '--- Review ... ---' header butted against the last comment line. Regression tests added: - TestTrimOptionalSectionsPropagatesInRangeReviewsClear: locks in that trimOptionalSections (*view = ctx) propagates the cleared InRangeReviews slice back to the caller, which the field-by-field rebuild did not. - TestMeasureOptionalSectionsLossCountsInRangeReviews: ensures the fallback selector treats dropped InRangeReviews as a loss. - TestReviewOptionalContextTrimNextPreservesPriority: extended to cover the InRangeReviews priority slot. - TestRenderShellCommandStripsInlineCodeBreakers: table-driven test that backticks and control bytes never survive rendering. - TestGoldenPrompt_PreviousReviewsWithComments: golden-file snapshot proving the comment separator fix. - TestGoldenPrompt_RangeTruncatedCodexPreservesInRangeReviews: golden-file snapshot for the truncated-codex-range path proving InRangeReviews survive and the codex fallback is selected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary