feat(fetchers): add DocsSiteFetcher with llms.txt support #63

Merged
chaliy merged 1 commit into main from fix/issue-52-docs-site-fetcher on Mar 26, 2026

chaliy commented Mar 26, 2026

What

Adds a DocsSiteFetcher that detects documentation sites and the llms.txt standard, returning clean content optimized for LLM consumption.

Closes #52

Why

Agents reading documentation get noisy HTML with navbars, search boxes, and UI chrome. The llms.txt standard provides pre-optimized content for LLMs, and docs sites benefit from specialized handling.

How

  • Matches known docs site patterns (ReadTheDocs, docs.rs, GitBook, Docusaurus via netlify/vercel, docs.*/wiki.*/developer.* prefixes) and explicit llms.txt/llms-full.txt URLs
  • For matched sites: probes for llms-full.txt then llms.txt at the origin
  • If found: returns llms.txt content with format: "documentation"
  • If not found: fetches the page directly with HTML-to-markdown conversion
  • Direct llms.txt URL requests are handled natively
  • Registered before DefaultFetcher; non-docs URLs fall through to DefaultFetcher
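
The matching and probing steps above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `Plan`, `split_url`, `is_docs_host`, `is_llms_txt_path`, and `plan` are hypothetical names, the host suffixes (`.readthedocs.io`, `.gitbook.io`, `.netlify.app`, `.vercel.app`) are assumed stand-ins for the patterns listed, and no HTTP request is made. It only computes which URLs would be probed, in order.

```rust
/// What to do with a URL (hypothetical type, not taken from the PR).
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// The URL already points at llms.txt or llms-full.txt: fetch it as-is.
    Direct(String),
    /// Docs site: probe these URLs in order, then fall back to the page itself.
    Probe(Vec<String>),
    /// Not a docs URL: leave it to the next fetcher.
    NotDocs,
}

/// Splits "scheme://host/path" into (origin, host, path). Ports and userinfo
/// are left inside `host`; a real implementation would use a URL parser.
fn split_url(url: &str) -> Option<(&str, &str, &str)> {
    let scheme_end = url.find("://")?;
    let rest = &url[scheme_end + 3..];
    let host_end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    if host_end == 0 {
        return None;
    }
    let origin = &url[..scheme_end + 3 + host_end];
    Some((origin, &rest[..host_end], &rest[host_end..]))
}

/// Assumed host patterns; the actual list in the PR may differ.
fn is_docs_host(host: &str) -> bool {
    const PREFIXES: [&str; 3] = ["docs.", "wiki.", "developer."];
    const SUFFIXES: [&str; 4] =
        [".readthedocs.io", ".gitbook.io", ".netlify.app", ".vercel.app"];
    let host = host.to_ascii_lowercase();
    host == "docs.rs"
        || PREFIXES.iter().any(|p| host.starts_with(*p))
        || SUFFIXES.iter().any(|s| host.ends_with(*s))
}

/// True when the last path segment is llms.txt or llms-full.txt.
fn is_llms_txt_path(path: &str) -> bool {
    let path = path
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    matches!(
        path.rsplit('/').next(),
        Some("llms.txt") | Some("llms-full.txt")
    )
}

/// Decides how a URL would be handled: direct fetch, ordered probe, or skip.
pub fn plan(url: &str) -> Plan {
    match split_url(url) {
        Some((_, _, path)) if is_llms_txt_path(path) => Plan::Direct(url.to_string()),
        Some((origin, host, _)) if is_docs_host(host) => Plan::Probe(vec![
            // llms-full.txt is tried first, then llms.txt, both at the origin.
            format!("{}/llms-full.txt", origin),
            format!("{}/llms.txt", origin),
        ]),
        _ => Plan::NotDocs,
    }
}
```

Under these assumptions, `plan("https://docs.example.com/guide/intro")` yields a `Probe` for `https://docs.example.com/llms-full.txt` followed by `https://docs.example.com/llms.txt`, while `plan("https://example.com/blog")` yields `NotDocs`.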

Risk

  • Low
  • Only adds a new fetcher; DefaultFetcher still handles all non-docs URLs
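
To illustrate why the risk is low, here is a rough sketch of first-match dispatch with a catch-all at the end. Only the names `DocsSiteFetcher` and `DefaultFetcher` come from this PR; the `Fetcher` trait, its methods, `registry`, `select_name`, and the simplified `matches` check are assumptions made for the example and may not reflect the crate's actual API.

```rust
/// Hypothetical fetcher interface; the real trait in the crate may differ.
trait Fetcher {
    fn name(&self) -> &'static str;
    fn matches(&self, url: &str) -> bool;
}

struct DocsSiteFetcher;
struct DefaultFetcher;

impl Fetcher for DocsSiteFetcher {
    fn name(&self) -> &'static str {
        "docs_site"
    }

    // Simplified stand-in for the docs-site and llms.txt matching rules.
    fn matches(&self, url: &str) -> bool {
        url.starts_with("https://docs.")
            || url.ends_with("/llms.txt")
            || url.ends_with("/llms-full.txt")
    }
}

impl Fetcher for DefaultFetcher {
    fn name(&self) -> &'static str {
        "default"
    }

    // Catch-all: accepts every URL, so it must stay last in the list.
    fn matches(&self, _url: &str) -> bool {
        true
    }
}

/// Fetchers in priority order: DocsSiteFetcher before DefaultFetcher.
fn registry() -> Vec<Box<dyn Fetcher>> {
    vec![Box::new(DocsSiteFetcher), Box::new(DefaultFetcher)]
}

/// Returns the name of the first fetcher that accepts the URL.
fn select_name(fetchers: &[Box<dyn Fetcher>], url: &str) -> Option<&'static str> {
    fetchers.iter().find(|f| f.matches(url)).map(|f| f.name())
}
```

With this ordering, a URL that the docs fetcher does not match still reaches the default fetcher unchanged, which is the behavior the risk note relies on.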

Checklist

  • Unit tests passed
  • Clippy clean (-D warnings)
  • Docs build without warnings
  • Formatting applied

Closes #52 — Adds a fetcher that detects documentation sites and the
llms.txt standard (llmstxt.org), returning clean content optimized for
LLM consumption.

Matches known docs sites (ReadTheDocs, docs.rs, GitBook, Docusaurus via
netlify/vercel, docs.*/wiki.*/developer.* prefixes) and explicit
llms.txt/llms-full.txt URLs. For matched sites, probes for llms-full.txt
then llms.txt at the origin before returning content. Falls through to
DefaultFetcher for non-docs URLs.

chaliy merged commit 0276684 into main on Mar 26, 2026
10 checks passed
chaliy deleted the fix/issue-52-docs-site-fetcher branch on March 26, 2026 23:21
Linked issue:

feat(fetchers): DocsSiteFetcher — llms.txt and documentation framework support