
fix(commands): move noise stripping into jq pipelines for fetch-pr-feedback #55

Merged
anderskev merged 2 commits into main from fix/fetch-pr-feedback-noise-stripping
Feb 7, 2026

Conversation

@anderskev
Member

Summary

  • Move noise stripping from LLM instructions (Step 4) into clean_body jq functions applied directly in both comment-fetching pipelines (Step 3)
  • Strip <details> blocks, HTML comments, and bot footer boilerplate at the jq level before data reaches the LLM
  • Add a 4000-character per-comment safety net with a [comment truncated] marker
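
The cleaning steps listed in the summary can be sketched roughly as follows. This is a Python stand-in for the jq `clean_body` function, not the code from the PR: the footer pattern is a hypothetical example (the PR text does not show the real one), and the names `MAX_CHARS`, `TRUNCATION_MARKER`, and `FOOTER_RE` are invented for illustration. Only the 4000-character cap and the `[comment truncated]` marker come from the PR itself.

```python
import re

MAX_CHARS = 4000                              # per-comment cap named in the PR summary
TRUNCATION_MARKER = "\n\n[comment truncated]"  # marker text quoted in the review thread

# ASSUMPTION: a hypothetical footer pattern; the real jq pattern is not shown in the PR.
FOOTER_RE = re.compile(r"\s*Comment @coderabbitai help.*\Z", re.DOTALL)


def clean_body(body: str) -> str:
    """Strip bot noise from one comment body, then cap its length."""
    # Remove each <details>...</details> block independently (non-greedy).
    cleaned = re.sub(r"<details>.*?</details>", "", body, flags=re.DOTALL)
    # Remove HTML comments such as <!-- ... -->.
    cleaned = re.sub(r"<!--.*?-->", "", cleaned, flags=re.DOTALL)
    # Remove trailing bot footer boilerplate (hypothetical pattern, see above).
    cleaned = FOOTER_RE.sub("", cleaned).strip()
    # Check the length BEFORE slicing so the marker is appended when needed.
    if len(cleaned) > MAX_CHARS:
        return cleaned[:MAX_CHARS] + TRUNCATION_MARKER
    return cleaned


sample = (
    "Please rename this variable.\n"
    "<details>\n<summary>Walkthrough</summary>\nlong text\n</details>\n"
    "<!-- internal state -->\n"
    "Comment @coderabbitai help to get the list of available commands."
)
print(clean_body(sample))
```

Running the cleanup before the text reaches the LLM means the oversized payload never has to be read in order to be discarded, which is the motivation given in the Context section.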

Context

Real data from a CodeRabbit-reviewed PR showed 125K chars total with only ~4.5K useful (3.6% signal). A single walkthrough comment was 70K chars. The previous approach relied on LLM instructions to strip noise, but the data was already too large before the LLM could process it.

Test plan

  • Run /beagle-core:fetch-pr-feedback against a PR with CodeRabbit reviews
  • Verify output is reasonable size (~5-10K chars vs 125K before)
  • Confirm no actionable review content is lost
  • Check [comment truncated] markers appear only on genuinely oversized comments (if any)

🤖 Generated with Claude Code

…edback

Comment bodies from bot reviewers (e.g. CodeRabbit) contain massive amounts
of noise (<details> blocks, HTML comments, bot footers) that inflate feedback
files to ~125K chars with only ~4.5K useful. The noise stripping rules were
LLM instructions in Step 4, meaning the data was already too large before
stripping could happen.

Move stripping into a clean_body jq function applied directly in both the
issue comments and review comments pipelines. Add a 4000 char per-comment
safety net with [comment truncated] marker. Update Step 4 to reference the
jq-level stripping instead of listing LLM rules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 7, 2026

Walkthrough

A new clean_body jq function was added to sanitize comment text by removing <details> blocks, HTML comments, and trailing boilerplate, then truncating to 4000 characters and appending a [comment truncated] marker when needed. The transformation for both issue comments and review comments now maps body: (.body | clean_body) instead of using raw .body. Inline noise-stripping steps were removed from the narrative, and documentation text was updated to reflect that noise stripping and truncation are handled by clean_body.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: moving noise stripping logic from LLM instructions into jq pipelines for the fetch-pr-feedback command. |
| Description check | ✅ Passed | The description is directly related to the changeset, providing context about why the change was made, what was changed, and a test plan for verification. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


No actionable comments were generated in the recent review. 🎉


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@plugins/beagle-core/commands/fetch-pr-feedback.md`:
- Around line 55-61: The truncation marker logic in clean_body never triggers
because you slice with .[:4000] before checking length; change the pipeline to
preserve the original string length (e.g., save the cleaned string to a
temporary symbol/variable like original = .) then emit original[:4000] + (if
original|length > 4000 then "\n\n[comment truncated]" else "" end); update the
clean_body pipeline to reference that temporary before slicing so the condition
can detect >4000 and append the marker.
- Line 56: The current gsub call uses a greedy pattern
"(?s)<details>.*</details>" which will remove content across multiple <details>
blocks; update the regex used in the gsub invocation to a non-greedy match like
"(?s)<details>.*?</details>" so each <details>...</details> pair is removed
independently (leave the gsub call and surrounding logic intact, only change the
regex).
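
The greedy versus non-greedy difference flagged in the second item can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for jq's `gsub`, so engine details may differ; the sample string is invented for the demonstration.

```python
import re

# Invented sample with two separate <details> blocks and real content between them.
text = "keep A <details>noise 1</details> keep B <details>noise 2</details> keep C"

# Greedy: .* runs from the first <details> to the LAST </details>,
# so "keep B" is swallowed along with the noise.
greedy = re.sub(r"(?s)<details>.*</details>", "", text)

# Non-greedy: .*? stops at the nearest </details>, so each block is removed on its own.
lazy = re.sub(r"(?s)<details>.*?</details>", "", text)

print(repr(greedy))  # 'keep A  keep C'
print(repr(lazy))    # 'keep A  keep B  keep C'
```

The greedy form silently drops reviewer text that sits between two collapsible sections, which is exactly the kind of actionable content the test plan says must not be lost.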

@anderskev anderskev self-assigned this Feb 7, 2026
@anderskev anderskev added the bug Something isn't working label Feb 7, 2026
The <details> regex used greedy .* which would consume content between
multiple blocks. The truncation check ran after slicing, so the marker
was never appended. Both clean_body definitions (issue + review comments)
are fixed.

Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
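
The truncation fix described in the commit message above comes down to the order of two operations. The sketch below is a Python reconstruction based only on the review description, not the original jq; `truncate_buggy` and `truncate_fixed` are hypothetical names.

```python
LIMIT = 4000
MARKER = "\n\n[comment truncated]"


def truncate_buggy(cleaned: str) -> str:
    """Reconstruction of the reported bug: slice first, then test the length."""
    sliced = cleaned[:LIMIT]
    # len(sliced) can never exceed LIMIT, so the marker is never appended.
    return sliced + (MARKER if len(sliced) > LIMIT else "")


def truncate_fixed(cleaned: str) -> str:
    """Test the ORIGINAL length, then slice and append the marker if needed."""
    return cleaned[:LIMIT] + (MARKER if len(cleaned) > LIMIT else "")


oversized = "a" * 4500
print(truncate_buggy(oversized).endswith("[comment truncated]"))  # False
print(truncate_fixed(oversized).endswith("[comment truncated]"))  # True
```

Both versions cap the text at the same length; only the fixed one tells the reader that something was cut.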
@anderskev anderskev merged commit 79cee7f into main Feb 7, 2026
1 check passed
@anderskev anderskev deleted the fix/fetch-pr-feedback-noise-stripping branch February 7, 2026 20:00
@anderskev anderskev mentioned this pull request Feb 7, 2026
3 tasks
