Skip to content

Add configurable PR comment upsert and fix UTF-8 truncation#411

Merged
wesm merged 12 commits intoroborev-dev:mainfrom
dannysteenman:fix/upsert-pr-comments
Mar 3, 2026
Merged

Add configurable PR comment upsert and fix UTF-8 truncation#411
wesm merged 12 commits intoroborev-dev:mainfrom
dannysteenman:fix/upsert-pr-comments

Conversation

@dannysteenman
Copy link
Contributor

@dannysteenman dannysteenman commented Mar 2, 2026

Summary

CI review comments are always created as new comments. This adds an opt-in ci.upsert_comments setting that finds the existing roborev marker comment and patches it in-place instead.

  • Add ci.upsert_comments config (global and per-repo, default false)
  • Add FindExistingComment, UpsertPRComment, and CreatePRComment in internal/github
  • Fall back to creating a new comment when PATCH returns 403/404 (token mismatch)
  • Target the newest marker comment to reduce fallback churn
  • Fix UTF-8 truncation: replace utf8.ValidString full-string scan with boundary-only TrimPartialRune
  • Propagate caller context through postCIComment for Ctrl+C cancellation

Config

# Global (~/.roborev/config.toml)
[ci]
upsert_comments = true

# Per-repo (.roborev.toml) — overrides global
[ci]
upsert_comments = false

🤖 Generated with Claude Code

@roborev-ci
Copy link

roborev-ci bot commented Mar 2, 2026

roborev: Combined Review (07d1117)

Verdict: Refactors PR comment logic to upsert existing comments via a hidden marker, but introduces issues with comment targeting, security
, and GitHub Enterprise compatibility.

🔴 High

Incorrect jq Filter Matches Quoted Replies

  • File: internal/github/comment.go:28
  • Description: Using contains(%q) in the jq filter will match comments where users have quoted the
    bot's review in a reply (which typically adds > before the text). This will cause the bot to inadvertently overwrite a human user's comment.
  • Suggested Fix: Change .body | contains(%q) to .body | startswith(%q) since the marker is always reliably prepended
    to the very beginning of the bot's comment.

🟡 Medium

Security: Vulnerable Comment Selection via Static Marker

  • File: internal/github/comment.go:26, internal/github/comment.go:84
  • Description: Existing comment selection relies
    solely on a static marker (<!-- roborev-pr-comment -->). In shared repos, any external user who can comment can post that marker, causing UpsertPRComment to target an attacker-controlled/non-bot comment. This can lead to attempted overwrites of the wrong comment or persistent failures to post
    /update roborev output (effectively suppressing CI review visibility).
  • Suggested Fix: Match on both the marker and a trusted author identity (bot/app login or user ID). For example, fetch the authenticated actor and filter comments by comment.user.login == botLogin plus the marker. Consider
    adding fallback behavior to create a new bot comment instead of hard-failing if a patch fails.

GitHub Enterprise / Host-Qualified Repo Regression Risk

  • File: internal/github/comment.go:34, internal/github/comment.go:97

Description:** FindExistingComment/patchComment build API paths with repos/%s/... using ghRepo directly. If ghRepo is host-qualified (e.g. ghe.example.com/org/repo, which gh --repo supports), this becomes an
invalid REST path. createComment still uses --repo, making behavior inconsistent across create vs. find/patch.

  • Suggested Fix: Normalize ghRepo to owner/repo for REST path usage and pass the host via --hostname (or equivalent), or consistently use gh repo
    /host flags.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

dannysteenman and others added 6 commits March 3, 2026 09:13
Embed an invisible HTML marker (<!-- roborev-pr-comment -->) in every
roborev comment. On each post, search for an existing comment with the
marker using `gh api`. If found, update it via PATCH; otherwise create
a new one.

Extracts shared logic into internal/github/comment.go and updates both
the CLI path (cmd/roborev/ci.go) and the daemon path
(internal/daemon/ci_poller.go) to delegate to UpsertPRComment.
- Use encoding/json.Marshal for PATCH payload instead of strconv.Quote
  to produce strictly valid JSON for all inputs (fixes \v/\a escapes).
- Parse only the first non-empty line from gh api --paginate output to
  handle multi-page results that emit multiple comment IDs.
- Replace shallow marker/truncation tests with ones that exercise
  UpsertPRComment and capture subprocess stdin.
- Add tests for multi-line paginated output and PATCH payload validity.
- Add control characters (\v, \a) to PATCH payload test to directly
  protect against the strconv.Quote regression.
- Assert JSON round-trip preserves the original body content.
- Verify truncation boundary is exactly review.MaxCommentLen.
- Reserve suffix length before slicing so truncated body stays within
  MaxCommentLen (affects both github.UpsertPRComment and
  review.formatSingleResult)
- Back up to valid UTF-8 boundary when slicing to avoid splitting
  multibyte characters
- Fall back to creating a new comment when PATCH fails (e.g. comment
  owned by a different actor returns 403)
- Add tests for create failure, patch-failure fallback, and UTF-8 safe
  truncation in both packages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…daries

- PATCH fallback now checks for HTTP 403/404 in the error message;
  other failures (network, rate-limit) propagate as errors to avoid
  silent duplicate comments
- Fix UTF-8 truncation tests in both packages to place the multibyte
  character straddling the actual cut boundary (maxBody), not an
  arbitrary offset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wesm
Copy link
Collaborator

wesm commented Mar 3, 2026

Looking at this. I actually want to see all the review comments from my CI bot rather than updating them. I'm going to take a pass on this and make the upserting behavior configurable rather than the default

wesm and others added 6 commits March 3, 2026 09:31
PR comments now default to creating a new comment each run.
Set ci.upsert_comments = true to restore the previous behavior
of finding and patching an existing marker comment.

- Add UpsertComments bool to CIConfig
- Extract prepareBody helper, add exported CreatePRComment
- Branch on config in postCIComment (CLI) and postPRComment (daemon)
- Add tests for CreatePRComment (marker, truncation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address review findings:
- Add UpsertComments *bool to RepoCIConfig so per-repo .roborev.toml
  can override the global setting (fixes CI-only pipelines with no
  global config)
- Resolve with precedence: repo config > global config > false
- Add resolveUpsertComments to daemon with repo config lookup
- Add resolveCIUpsertComments to CLI with same precedence
- Add 4 daemon tests covering default, global, and repo override paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cover all precedence paths: nil/nil default, global true/false,
repo overriding global in both directions, and repo nil falling
through to global.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
FindExistingComment now selects the newest (last) marker comment
instead of the oldest, so upsert targets the comment most likely
writable by the current token.

Replace utf8.ValidString loop (O(N) full-string scan that could
strip the entire body if interior invalid bytes exist) with
TrimPartialRune, which only inspects the trailing rune boundary.
Shared across github/comment.go and review/synthesize.go.

- Add TrimPartialRune to review package with table-driven tests
- Add mixed-ownership multi-ID test for 403 fallback
- Update multi-line test to assert newest ID selected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a string ended with orphan continuation bytes (e.g., 0x80)
preceded by a valid ASCII byte, the backward scan would
incorrectly trim the valid byte too. Now checks whether the
rune-start byte found during the backward walk decodes to a
valid rune — if so, keeps it and only removes the orphan tail.

Add test cases for single and double orphan continuation bytes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
postCIComment was creating context.Background() instead of
accepting the caller's context, preventing Ctrl+C from canceling
in-flight GitHub API requests during roborev ci review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@wesm wesm force-pushed the fix/upsert-pr-comments branch from 07d1117 to 5ec108d Compare March 3, 2026 17:07
@wesm wesm changed the title fix: upsert PR comments instead of creating duplicates Make PR comment upsert opt-in, add CreatePRComment, fix UTF-8 truncation Mar 3, 2026
@wesm wesm changed the title Make PR comment upsert opt-in, add CreatePRComment, fix UTF-8 truncation Add configurable PR comment upsert and fix UTF-8 truncation Mar 3, 2026
@roborev-ci
Copy link

roborev-ci bot commented Mar 3, 2026

roborev: Combined Review (5ec108d)

Verdict: All agents agree the code is clean and no issues were found.

The changes successfully implement opt-in, marker-based PR comment upsert functionality, safe UTF-8 truncation, and relevant configuration/fallback logic without introducing any identified functional or security regressions
.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm wesm merged commit 278df12 into roborev-dev:main Mar 3, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants