Conversation
…edback

Comment bodies from bot reviewers (e.g. CodeRabbit) contain massive amounts of noise (`<details>` blocks, HTML comments, bot footers) that inflate feedback files to ~125K chars with only ~4.5K useful. The noise-stripping rules were LLM instructions in Step 4, meaning the data was already too large before stripping could happen. Move stripping into a clean_body jq function applied directly in both the issue-comments and review-comments pipelines. Add a 4000-char per-comment safety net with a [comment truncated] marker. Update Step 4 to reference the jq-level stripping instead of listing LLM rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
In `@plugins/beagle-core/commands/fetch-pr-feedback.md`:
- Around lines 55-61: The truncation marker logic in clean_body never triggers because the string is sliced with `.[:4000]` before its length is checked. Change the pipeline to bind the cleaned string first (e.g. `. as $original`), then emit `$original[:4000] + (if ($original | length) > 4000 then "\n\n[comment truncated]" else "" end)` so the condition sees the original length and can append the marker.
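The suggested fix can be sketched as a standalone jq call (illustrative only, not the actual command file; the surrounding pipeline is assumed):

```shell
# Bind the cleaned string before slicing so the length check sees the
# original, not the already-sliced copy.
jq -rn '
  def clean_body:
    . as $original
    | $original[:4000]
      + (if ($original | length) > 4000
         then "\n\n[comment truncated]"
         else "" end);
  ("x" * 5000) | clean_body | length
'
```

With a 5000-char input this prints 4021 (4000 kept chars plus the 21-char marker suffix), confirming the marker is actually appended once the length check runs against the unsliced string.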
- Line 56: The gsub call uses the greedy pattern `"(?s)<details>.*</details>"`, which consumes everything from the first `<details>` to the last `</details>`, including content between blocks. Change the regex to the non-greedy `"(?s)<details>.*?</details>"` so each `<details>...</details>` pair is removed independently; leave the gsub call and surrounding logic intact.
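The greedy/non-greedy difference is easy to demonstrate with a toy input (illustrative string, not a real comment body):

```shell
# Greedy: strips from the first <details> to the last </details>,
# swallowing the "B" between the two blocks.
jq -rn '"A <details>x</details> B <details>y</details> C"
        | gsub("(?s)<details>.*</details>"; "")'

# Non-greedy: each block is removed independently, so "B" survives.
jq -rn '"A <details>x</details> B <details>y</details> C"
        | gsub("(?s)<details>.*?</details>"; "")'
```

The first call should yield only "A  C", while the second preserves the useful text between the blocks ("A  B  C").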
The `<details>` regex used a greedy `.*`, which would consume content between multiple blocks. The truncation check ran after slicing, so the marker was never appended. Both clean_body definitions (issue + review comments) are fixed.

Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Summary
- `clean_body` jq functions applied directly in both comment-fetching pipelines (Step 3)
- Strips `<details>` blocks, HTML comments, and bot footer boilerplate at the jq level, before the data reaches the LLM
- 4000-char per-comment safety net with a `[comment truncated]` marker

Context
Real data from a CodeRabbit-reviewed PR showed 125K chars total with only ~4.5K useful (3.6% signal). A single walkthrough comment was 70K chars. The previous approach relied on LLM instructions to strip noise, but the data was already too large before the LLM could process it.
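Putting the pieces together, a jq-level clean_body along these lines would do the stripping before the LLM ever sees the data (a minimal sketch under assumptions; the real command file's patterns and footer rules may differ):

```shell
jq -rn '
  def clean_body:
    gsub("(?s)<details>.*?</details>"; "")   # collapsible noise blocks
    | gsub("(?s)<!--.*?-->"; "")             # HTML comments
    | . as $o                                 # bind before slicing
    | $o[:4000]
      + (if ($o | length) > 4000
         then "\n\n[comment truncated]"
         else "" end);
  "Real feedback.<details>huge walkthrough</details><!-- bot metadata -->"
  | clean_body
'
```

On this sample input, only "Real feedback." survives; the noise is gone and the comment is short enough that no truncation marker is added.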
Test plan
- `/beagle-core:fetch-pr-feedback` run against a PR with CodeRabbit reviews
- `[comment truncated]` markers appear only on genuinely oversized comments (if any)

🤖 Generated with Claude Code