feat(fetch): add content quality signals (word_count, redirect_chain, is_paywall) #82

Merged
chaliy merged 1 commit into main from claude/issue-76-quality-signals
Mar 27, 2026

Conversation

chaliy (Contributor) commented Mar 27, 2026

What

Add content quality signals to help agents decide whether fetched content is worth processing.

Why

Agents waste tokens processing low-quality or paywalled content. These signals let agents make informed decisions before committing to full processing.

How

  • word_count: Counted from final text content (after conversion)
  • redirect_chain: Tracks all intermediate URLs during redirect following. Empty if no redirects occurred.
  • is_paywall: Heuristic detection against 13 common paywall indicators (e.g., "subscribe to read", "paywall", "premium content", "unlock this article"). Only set to true when detected; omitted otherwise.
  • Paywall detection runs on the raw HTML before conversion, so it can catch indicators that conversion would discard (class names, hidden text)
  • Redirect chain integrated into send_request_following_redirects
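
The word count and paywall heuristic described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `word_count` and `detect_paywall` are hypothetical, the indicator list contains only the four phrases quoted in the description (the other nine are not shown there), and both whitespace-based counting and case-insensitive matching are assumptions.

```rust
/// Hypothetical subset of indicators: only the four phrases quoted in the
/// PR description. The remaining entries of the real list are not known here.
const PAYWALL_INDICATORS: [&str; 4] = [
    "subscribe to read",
    "paywall",
    "premium content",
    "unlock this article",
];

/// Count whitespace-separated words in the converted text (assumed definition
/// of a "word").
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Return `Some(true)` when any indicator appears in the raw HTML and `None`
/// otherwise, so that the field can be omitted when nothing is detected.
/// Case-insensitive matching is an assumption.
fn detect_paywall(raw_html: &str) -> Option<bool> {
    let lowered = raw_html.to_lowercase();
    PAYWALL_INDICATORS
        .iter()
        .any(|needle| lowered.contains(*needle))
        .then_some(true)
}
```

Because the search runs over the raw markup, a string such as `<div class="Paywall-overlay">` would match even though the class name never reaches the converted text.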

Risk

  • Low — additive fields, backward-compatible
  • is_paywall is a soft signal (false positives possible for pages that discuss paywalls)
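
Since `is_paywall` is a soft signal, a consumer might combine it with `word_count` instead of rejecting a page on the flag alone. The sketch below shows one possible policy; the function `worth_processing` and the 300-word cutoff are hypothetical and are not part of this PR.

```rust
/// Hypothetical cutoff: flagged pages with fewer words are treated as teasers.
const MIN_WORDS_IF_FLAGGED: usize = 300;

/// Decide whether a fetched page is worth full processing.
/// `is_paywall` is `None` when the field was omitted from the response.
fn worth_processing(word_count: usize, is_paywall: Option<bool>) -> bool {
    match is_paywall {
        // A flagged page is kept only if it still carries substantial text,
        // which suggests it discusses paywalls rather than sitting behind one.
        Some(true) => word_count >= MIN_WORDS_IF_FLAGGED,
        // An unflagged page is kept as long as it has any text at all.
        _ => word_count > 0,
    }
}
```

Under this policy a long article about subscription models would still be processed despite the flag, while a short flagged teaser would be skipped.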

Checklist

  • Unit tests (word count, paywall detection, redirect chain tracking, direct response)
  • Clippy clean
  • Docs build clean

Closes #76

… is_paywall)

Add word_count, redirect_chain, and is_paywall fields to FetchResponse.
Word count computed from final content. Redirect chain tracks all
intermediate URLs during redirect following. Paywall detection uses
heuristic matching against common paywall indicators in raw HTML.

Closes #76
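
The redirect tracking described in the commit message could look roughly like the sketch below. It is not the body of `send_request_following_redirects`: the function `follow_redirects` is hypothetical, a lookup table stands in for real HTTP requests, and the chain is assumed to hold every URL visited before the final one.

```rust
use std::collections::HashMap;

/// Follow redirects using a lookup table in place of real HTTP calls.
/// Returns the final URL and the chain of URLs visited before reaching it.
/// The chain is empty when the starting URL does not redirect.
fn follow_redirects(
    start: &str,
    redirects: &HashMap<&str, &str>,
    max_hops: usize,
) -> (String, Vec<String>) {
    let mut current = start.to_string();
    let mut chain = Vec::new();
    // Stop after `max_hops` redirects so that a redirect loop cannot run forever.
    while chain.len() < max_hops {
        match redirects.get(current.as_str()) {
            Some(next) => {
                // Record the URL that redirected, then move to its target.
                chain.push(current);
                current = next.to_string();
            }
            None => break,
        }
    }
    (current, chain)
}
```

With a table mapping `http://a.example` to `https://a.example` and that in turn to `https://a.example/home`, the function returns the last URL together with a two-element chain; a URL missing from the table yields an empty chain.
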
chaliy force-pushed the claude/issue-76-quality-signals branch from 0d13405 to 4c90252 on March 27, 2026 03:27
chaliy merged commit 9e4ea8c into main on Mar 27, 2026
10 checks passed
chaliy deleted the claude/issue-76-quality-signals branch on March 27, 2026 03:28
Development

Successfully merging this pull request may close these issues.

feat: content quality signals (word_count, redirect_chain, is_paywall)