Add sphinx-gp-llms: LLM-friendly documentation outputs #47
Merged
why: LLM agents need machine-readable entry points to docs sites; llms.txt, llms-full.txt, docs.json, and per-page .md twins are the emerging conventions (llmstxt.org, Cloudflare, Mintlify, Lakebed).
what:
- New workspace package sphinx-gp-llms with Sphinx 8.1+ idioms
- llms.txt: structured Markdown index (H1/blockquote/H2 sections) following the llmstxt.org spec (Jeremy Howard, Answer.AI)
- llms-full.txt: concatenated full-content Markdown of all pages (community convention, Anthropic/Cloudflare/Mintlify/GitBook)
- docs.json: agent-oriented manifest with agentEntrypoints and per-page headings (Lakebed/Ping convention)
- Per-page .md twins: source file copy alongside each HTML page (Cloudflare "Markdown for Agents", Mintlify, Stripe, Vercel)
- Hooks into build-finished (file generation) and html-page-context (footer link injection)
- Config: llms_generate_txt, llms_generate_full, llms_generate_json, llms_generate_md_twins, llms_excludes, llms_description_length
- Silent no-op when site_url is unset (same pattern as sitemap)
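For orientation, the llms.txt layout named above (H1 project name, blockquote summary, H2 sections of links) can be sketched in a few lines. This is a minimal illustration, not the extension's code: `render_llms_txt`, the `Link` tuple, and the example.org URL are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One entry in an llms.txt section (hypothetical helper type)."""

    title: str
    url: str
    description: str = ""


def render_llms_txt(
    project: str,
    summary: str,
    sections: dict[str, list[Link]],
) -> str:
    """Render an llms.txt-style index: H1, blockquote, then H2 link lists."""
    lines = [f"# {project}", "", f"> {summary}", ""]
    for caption, links in sections.items():
        lines.append(f"## {caption}")
        lines.append("")
        for link in links:
            entry = f"- [{link.title}]({link.url})"
            if link.description:
                # The description after the colon is optional.
                entry += f": {link.description}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = render_llms_txt(
    "Example Docs",
    "Placeholder summary for an example project.",
    {
        "Guides": [
            Link(
                "Configuration",
                "https://docs.example.org/configuration.md",
                "Config options",
            ),
        ],
    },
)
print(text)
```

In the extension itself the section names would come from toctree captions; here they are passed in directly to keep the sketch self-contained.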
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #47 +/- ##
==========================================
+ Coverage 91.82% 91.87% +0.05%
==========================================
Files 220 233 +13
Lines 17776 18186 +410
==========================================
+ Hits 16322 16709 +387
- Misses 1454 1477 +23
why: make the new extension available to all consumer projects and ensure CI smoke tests cover it.
what:
- Add sphinx-gp-llms to uv workspace sources and dev deps
- Add to DEFAULT_EXTENSIONS (after sphinx_gp_sitemap)
- Add to ruff known-first-party and pytest testpaths
- Add smoke_sphinx_gp_llms runner to CI package_tools
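A consumer project's conf.py might then look roughly like the fragment below. The option names are the ones listed in this PR; everything else is an assumption for illustration: the module name sphinx_gp_llms (inferred from the package name and the smoke_sphinx_gp_llms runner), the chosen values, and the types of llms_excludes and llms_description_length. Projects that already use DEFAULT_EXTENSIONS would presumably not need the explicit extensions entry.

```python
# conf.py (illustrative fragment; values are assumptions, not documented defaults)

extensions = [
    "sphinx_gp_sitemap",
    "sphinx_gp_llms",  # module name assumed from the package name
]

# Per the PR, nothing is generated unless site_url is set.
# How site_url is configured is not shown in this PR, so it is omitted here.

llms_generate_txt = True        # llms.txt index
llms_generate_full = True       # llms-full.txt concatenation
llms_generate_json = False      # skip docs.json in this example
llms_generate_md_twins = True   # per-page .md copies
llms_excludes = ["genindex", "search"]  # assumed: a list of pages to leave out
llms_description_length = 160           # assumed: an integer character limit
```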
why: the footer's Machine-readable line should link every format the new sphinx-gp-llms extension generates.
what:
- Add Markdown, docs.json, llms.txt, llms-full.txt links
- Links appear conditionally when sphinx-gp-llms injects context variables via html-page-context hook
- Existing raw source link preserved
why: workspace infrastructure tests require a docs page, redirect entry, and cluster classification for every package.
what:
- Add docs/packages/sphinx-gp-llms/index.md landing page
- Add extensions/sphinx-gp-llms redirect in redirects.txt
- Add build-seo cluster in package_reference.py
- Add to publishable packages set in test_package_reference.py
why: verify llms.txt format, llms-full.txt content, docs.json schema, and .md twin file generation.
what:
- conftest.py with module-scoped build fixture and shared scenario
- test_importable.py: smoke test for setup() callable
- test_llms_txt.py: H1, blockquote, sections, link format
- test_llms_full_txt.py: page content, separators, source URLs
- test_docs_json.py: manifest schema, page fields, headings
- test_md_twins.py: file existence and content verification
- All use NamedTuple fixtures with test_id, strict typing
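The "NamedTuple fixtures with test_id" pattern mentioned above can be sketched as follows. The fixture class, the sample text, and the helper are hypothetical and only show the shape of the pattern; the real tests run against a built Sphinx project rather than a literal string.

```python
from typing import NamedTuple


class LlmsTxtFixture(NamedTuple):
    """One expectation about llms.txt output (hypothetical fixture type)."""

    test_id: str
    expected_fragment: str


FIXTURES: list[LlmsTxtFixture] = [
    LlmsTxtFixture(test_id="h1_title", expected_fragment="# Example Docs"),
    LlmsTxtFixture(test_id="blockquote", expected_fragment="> Summary."),
    LlmsTxtFixture(test_id="h2_section", expected_fragment="## Guides"),
    LlmsTxtFixture(test_id="link_format", expected_fragment="- [Config]("),
]

# Stand-in for the file a real build would produce.
SAMPLE = (
    "# Example Docs\n\n"
    "> Summary.\n\n"
    "## Guides\n\n"
    "- [Config](https://docs.example.org/configuration.md)\n"
)


def failing_ids(text: str, fixtures: list[LlmsTxtFixture]) -> list[str]:
    """Return the test_id of every fixture whose fragment is absent."""
    return [f.test_id for f in fixtures if f.expected_fragment not in text]


# With pytest, the same fixtures would typically be wired up as:
#   @pytest.mark.parametrize("case", FIXTURES, ids=lambda c: c.test_id)
print(failing_ids(SAMPLE, FIXTURES))
```

Carrying the test_id inside the tuple keeps the readable name next to the data it describes, so a failing case is identifiable from the test report alone.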
Code review
No issues found. Checked for bugs and CLAUDE.md compliance. Notable observations below threshold (scored 75/100; edge cases, not blocking):
why: disabling an output (e.g. llms_generate_json = False) still rendered its footer link, producing a 404.
what:
- Check llms_generate_md_twins, llms_generate_txt, llms_generate_full, llms_generate_json before injecting each context variable in _inject_llms_context
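The gating described in this commit amounts to checking each flag before adding the matching template variable. The sketch below illustrates that idea under stated assumptions: the context key names, the `inject_links` helper, and the URL layout are invented for this example and are not taken from _inject_llms_context.

```python
from types import SimpleNamespace

# Hypothetical flag -> (context key, output file) mapping for site-wide outputs.
SITE_LINKS = {
    "llms_generate_txt": ("llms_txt_url", "llms.txt"),
    "llms_generate_full": ("llms_full_txt_url", "llms-full.txt"),
    "llms_generate_json": ("docs_json_url", "docs.json"),
}


def inject_links(
    config: SimpleNamespace,
    pagename: str,
    context: dict[str, str],
    site_url: str,
) -> None:
    """Add a link variable only for outputs that are actually generated."""
    base = site_url.rstrip("/")
    if getattr(config, "llms_generate_md_twins", False):
        # The per-page twin URL layout is an assumption.
        context["page_md_url"] = f"{base}/{pagename}.md"
    for flag, (key, filename) in SITE_LINKS.items():
        if getattr(config, flag, False):
            context[key] = f"{base}/{filename}"


config = SimpleNamespace(
    llms_generate_md_twins=True,
    llms_generate_txt=True,
    llms_generate_full=True,
    llms_generate_json=False,  # disabled: no docs.json link should appear
)
context: dict[str, str] = {}
inject_links(config, "configuration", context, "https://docs.example.org/")
print(sorted(context))
```

Because the template renders a link only when its variable is present, leaving the variable out is enough to avoid the dangling docs.json link.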
why: the outer template guard required theme_source_repository, hiding all LLM footer links for projects that set site_url but not source_repository.
what:
- Add elif branch for LLM links when source_repository is unset
- LLM links now render independently of source_repository
- Source path and raw-source link still require source_repository
why: pages without a section heading (stubs, pure-directive pages) may not have an entry in env.titles, causing a KeyError crash during build-finished.
what:
- Use env.titles.get(docname) in _llms_txt.py and _docs_json.py, skip titleless pages
- Use env.titles.get(docname) in _llms_full_txt.py, fall back to docname as title since content is still useful
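The two behaviors in this commit (skip titleless pages in the index outputs, fall back to the docname in the full-text output) can be illustrated with a plain dictionary. In Sphinx, env.titles maps docnames to title nodes; plain strings stand in for those nodes here, and both helper names are hypothetical.

```python
def index_entries(
    docnames: list[str],
    titles: dict[str, str],
) -> list[tuple[str, str]]:
    """Collect (docname, title) pairs, skipping pages that have no title."""
    entries = []
    for docname in docnames:
        title = titles.get(docname)  # None instead of KeyError
        if title is None:
            continue
        entries.append((docname, title))
    return entries


def full_text_title(docname: str, titles: dict[str, str]) -> str:
    """Title for the full-text output; the docname is the fallback."""
    return titles.get(docname, docname)


titles = {"index": "Example Docs", "configuration": "Configuration"}
docnames = ["index", "configuration", "stubs/empty"]

print(index_entries(docnames, titles))
print(full_text_title("stubs/empty", titles))
```

Indexing with titles[docname] would raise KeyError on the stub page, which is the crash this commit removes.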
Summary
- New workspace package sphinx-gp-llms that generates four LLM-friendly output formats during the standard HTML Sphinx build
- llms.txt: structured Markdown index following the llmstxt.org spec (Jeremy Howard, Answer.AI), with H1 project name, blockquote summary, and H2 sections grouped by toctree captions
- llms-full.txt: concatenated full-content Markdown of all documentation pages, following the community convention adopted by Anthropic, Cloudflare, Mintlify, and GitBook
- docs.json: agent-oriented manifest with agentEntrypoints, per-page markdownUrl, and heading outlines, following the Lakebed/Ping convention
- .md twins: source file copies alongside HTML output, following the Cloudflare "Markdown for Agents" convention
- Footer LLM links that render independently of source_repository
- Guarded env.titles access to skip titleless pages instead of crashing
Design decisions
- Hooks into build-finished to generate files alongside the HTML build, matching the sphinx-gp-sitemap pattern. No separate build invocation is needed; every sphinx-build automatically produces LLM outputs.
- The {toctree} :caption: option maps directly to the H2 sections in the llms.txt format, so existing toctree structure translates without new configuration.
- Silent no-op when site_url is unset: Projects without docs_url configured skip LLM output at INFO level, matching sitemap behavior. No broken builds.
- The html-page-context hook injects link URLs only when the extension is loaded and the corresponding llms_generate_* flag is True. The footer also renders LLM links independently of source_repository, falling back to an elif branch when the source-path section is unavailable.
- The generators use env.titles.get(docname) and skip pages without titles (_llms_txt.py, _docs_json.py) or fall back to the docname (_llms_full_txt.py).
Verification
Verify all output files are generated:
$ ls docs/_build/html/llms.txt docs/_build/html/llms-full.txt docs/_build/html/docs.json
Verify per-page .md twins exist:
$ ls docs/_build/html/index.md docs/_build/html/configuration.md
Verify footer links render:
$ grep -c "llms.txt" docs/_build/html/configuration/index.html
Test plan
- uv run ruff check . - no lint issues
- uv run mypy - no type errors
- uv run pytest tests/ packages/ --reruns 0 - all tests pass
- just build-docs - docs build successfully with all LLM outputs
- LLM footer links render with and without source_repository configured
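The file-existence checks from the Verification section could also be scripted. The helper below is a small sketch, not part of the PR: `missing_outputs` is a hypothetical name, and the default file list simply mirrors the three site-wide outputs named in the ls command.

```python
from pathlib import Path

# Site-wide outputs named in the verification command.
EXPECTED = ("llms.txt", "llms-full.txt", "docs.json")


def missing_outputs(
    build_dir: Path,
    expected: tuple[str, ...] = EXPECTED,
) -> list[str]:
    """Return the expected output files that are absent from build_dir."""
    return [name for name in expected if not (build_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs(Path("docs/_build/html"))
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means every listed file was written; per-page .md twins would need a separate check because their names depend on the project's pages.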