What
Add boilerplate stripping to reduce token waste for agents. On most web pages, roughly 80% of the content is navigation, footers, sidebars, and ads.
Approach
- Strip <nav>, <footer>, <aside>, and elements with role="navigation", role="banner", or role="contentinfo"
- Prioritize <main>, <article>, and [role="main"] content when present
- Add content_focus option to FetchRequest: "main" (strip boilerplate) vs "full" (current behavior, default)
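
The strip and prioritize steps above could be sketched roughly as follows. This is only an illustration using Python's standard-library html.parser, not the project's actual conversion pipeline: SubtreeFilter, focus_main, and the constant names are hypothetical, and the depth counting assumes reasonably well-formed markup (every non-void element is explicitly closed). A real implementation would likely lean on whatever HTML parser the project already uses.

```python
from html.parser import HTMLParser

# Rule sets taken from the Approach list; the constant names are hypothetical.
BOILERPLATE_TAGS = {"nav", "footer", "aside"}
BOILERPLATE_ROLES = {"navigation", "banner", "contentinfo"}
MAIN_TAGS = {"main", "article"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SubtreeFilter(HTMLParser):
    """Re-emits HTML, either dropping or keeping only the matching subtrees."""

    def __init__(self, matches, keep):
        super().__init__(convert_charrefs=False)
        self.matches = matches  # predicate: (tag, role) -> bool
        self.keep = keep        # True: emit only matches; False: drop matches
        self.depth = 0          # > 0 while inside a matching subtree
        self.out = []

    def _emit(self, text):
        if (self.depth > 0) == self.keep:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        role = (dict(attrs).get("role") or "").lower()
        # Void elements never get an end tag, so they must not affect depth.
        if tag not in VOID_TAGS and (self.depth or self.matches(tag, role)):
            self.depth += 1
        self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit once, no depth change.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def _run(html, matches, keep):
    parser = SubtreeFilter(matches, keep)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def focus_main(html):
    """Strip boilerplate, then prefer <main>/<article>/[role="main"] if present."""
    stripped = _run(
        html,
        lambda tag, role: tag in BOILERPLATE_TAGS or role in BOILERPLATE_ROLES,
        keep=False,
    )
    main = _run(
        stripped,
        lambda tag, role: tag in MAIN_TAGS or role == "main",
        keep=True,
    )
    # Pages without semantic structure fall back to the stripped document.
    return main if main.strip() else stripped
```

Running the drop pass before the keep pass means an <aside> nested inside <main> is removed as well, and pages with no main landmark still lose their nav and footer.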
Why
Stripping boilerplate yields large token savings for agents: a typical news article page is 80%+ boilerplate that wastes LLM context.
Acceptance criteria
- New content_focus field on FetchRequest
- When "main": strip nav/footer/aside/boilerplate elements before conversion
- When "full" or omitted: current behavior unchanged
- Tests covering pages with and without semantic HTML structure
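
As a rough illustration of the first three criteria, the field and its default might look like the sketch below. Everything here is an assumption for illustration: the real FetchRequest may have a different shape or live in a different language, the url field is invented, and prepare_html and its strip argument are hypothetical stand-ins for the stripping step that runs before conversion.

```python
from dataclasses import dataclass
from typing import Callable, Literal, get_args

ContentFocus = Literal["main", "full"]


@dataclass
class FetchRequest:
    # Hypothetical shape: only content_focus comes from this proposal.
    url: str
    content_focus: ContentFocus = "full"  # omitted means current behavior

    def __post_init__(self):
        if self.content_focus not in get_args(ContentFocus):
            raise ValueError(f"unknown content_focus: {self.content_focus!r}")


def prepare_html(request: FetchRequest, html: str,
                 strip: Callable[[str], str]) -> str:
    """Apply the stripping step only when the caller opts in with "main"."""
    if request.content_focus == "main":
        return strip(html)
    return html  # "full" or omitted: pass the page through unchanged
```

Keeping "full" as the default means existing callers that never set the field see no change in output.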