docs: publish llms.txt and markdown renditions of the docs #3024

Merged
maxisbey merged 4 commits into main from llms-txt
Jun 30, 2026

@maxisbey

Contributor

Publishes an llms.txt version of the documentation, generated at build time by a small MkDocs hook (no new dependencies):

  • /llms.txt — a markdown index of the prose pages, grouped by nav section
  • a .md rendition of every prose page next to its HTML (e.g. tutorial/tools/index.md), with --8<-- snippet includes resolved inline and relative links rewritten to absolute URLs — this is what the llms.txt links point at
  • /llms-full.txt — every prose page concatenated for single-fetch consumption

The API reference pages are mkdocstrings stubs with no markdown source, so they are linked as rendered HTML from an ## Optional section instead of being embedded.
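The index layout described above (an H1, a short summary, one section per nav group, and an Optional section for the API reference) can be sketched roughly as follows. This is an illustration of the llmstxt.org layout only, not the hook's code: `PageEntry`, `render_llms_txt`, and the example URLs are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    """One bullet in the index (hypothetical helper type, not from the hook)."""

    title: str
    url: str
    description: str = ""


def render_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[PageEntry]],
    optional: list[PageEntry],
) -> str:
    """Render an llms.txt-style markdown index grouped by section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    # The Optional section goes last; it holds links that can be skipped.
    for heading, entries in [*sections.items(), ("Optional", optional)]:
        if not entries:
            continue
        lines += [f"## {heading}", ""]
        for entry in entries:
            bullet = f"- [{entry.title}]({entry.url})"
            # Bullets stay title-only until a description is available.
            if entry.description:
                bullet += f": {entry.description}"
            lines.append(bullet)
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"
```

Keeping the description optional mirrors the note further down that index bullets are title-only until pages gain a frontmatter description.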

Motivation and Context

The spec site already serves https://modelcontextprotocol.io/llms.txt (a Mintlify platform feature); none of the SDK doc sites do. The output format matches the spec site and the Pydantic ecosystem's mkdocs-llmstxt output. We generate from the markdown sources directly instead of using the mkdocs-llmstxt plugin: our prose pages are plain markdown, so the plugin's HTML-to-markdown round-trip (and its dependency chain) isn't needed.

How Has This Been Tested?

  • mkdocs build --strict (what the docs CI job runs) exercises the hook on every build; artifacts inspected manually: 41 indexed pages, ~440 KB llms-full.txt, all 118 snippet includes resolved, links absolute.
  • Edge cases exercised by building with synthetic docs changes: indented includes inside admonitions/tabs, pymdownx section syntax (clean error naming the page), included files whose content contains a literal --8<--, titled links, and --dirty builds (rejected with an error before any artifact is written, since dirty builds skip unmodified pages and would silently truncate the output).

Breaking Changes

None. One behavioral note: pymdownx.snippets now sets base_path: !relative $config_dir — snippet resolution previously depended on the build's working directory (the extension default is the CWD, not the repo root as the old comment claimed). Builds run from the repo root behave identically.

Types of changes

  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The deploy workflow builds v1.x at the site root, so root /llms.txt needs a [v1.x] backport of this hook; this PR covers /v2/. Pages have no frontmatter description: yet, so the index bullets are title-only; the hook picks descriptions up automatically if we add them later.

AI Disclaimer

maxisbey added 2 commits June 29, 2026 18:34
A build hook generates the llmstxt.org artifacts into the site: llms.txt
(an index of the prose pages grouped by nav section), a .md rendition of
each prose page with snippet includes resolved and relative links made
absolute, and llms-full.txt with every page concatenated for single-fetch
consumption. The API reference pages are mkdocstrings stubs with no
markdown source, so they are linked as rendered HTML from an Optional
section instead of being embedded.
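A rough sketch of the link rewriting, reusing the `_MD_LINK` pattern quoted later in this review. Everything else is an assumption made for illustration: the `absolutize_links` and `rendition_path` helpers, the `BrokenLink` exception (a stand-in for a MkDocs build error), and the choice to point rewritten links at the `.md` rendition under directory-style URLs.

```python
import posixpath
import re

# Pattern as quoted from docs/hooks/llms_txt.py in the review thread.
_MD_LINK = re.compile(r'(\]\()([^)\s]+\.md)(#[^)\s]*)?( +"[^"]*")?(\))')


class BrokenLink(Exception):
    """Hypothetical stand-in for the build error raised on a bad link."""


def rendition_path(src_uri: str) -> str:
    """Map a source page to its rendition (assumes directory-style URLs)."""
    stem, _ext = posixpath.splitext(src_uri)
    if posixpath.basename(stem) == "index":
        return src_uri
    return f"{stem}/index.md"


def absolutize_links(markdown: str, src_uri: str, known_pages: set[str], site_url: str) -> str:
    """Rewrite relative .md links in one page to absolute URLs."""
    base_dir = posixpath.dirname(src_uri)

    def rewrite(match: re.Match) -> str:
        opener, target, anchor, title, closer = match.groups()
        if "://" in target:
            return match.group(0)  # already absolute, leave untouched
        resolved = posixpath.normpath(posixpath.join(base_dir, target))
        if resolved not in known_pages:
            raise BrokenLink(f"llms_txt: {src_uri} links to unknown page {target}")
        url = site_url.rstrip("/") + "/" + rendition_path(resolved)
        return f"{opener}{url}{anchor or ''}{title or ''}{closer}"

    return _MD_LINK.sub(rewrite, markdown)
```

Titled links and anchors pass through unchanged because the pattern captures them as separate groups.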

Incremental (--dirty) builds are rejected by the hook: they skip
unmodified pages, which would silently truncate the generated artifacts.
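One way such a rejection could look is sketched below. MkDocs passes `command` and `dirty` to the `on_startup` event; whether the real hook uses that event is an assumption, and `DirtyBuildError` is a hypothetical stand-in for `mkdocs.exceptions.PluginError` so the sketch runs without MkDocs installed.

```python
class DirtyBuildError(Exception):
    """Hypothetical stand-in for mkdocs.exceptions.PluginError."""


def on_startup(*, command: str, dirty: bool) -> None:
    """Refuse incremental builds before any artifact is written."""
    if dirty:
        # A dirty build skips unmodified pages, so the generated index and
        # the concatenated file would silently miss them.
        raise DirtyBuildError(
            f"llms_txt: '{command} --dirty' is not supported; run a full build"
        )
```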

Anchor pymdownx.snippets base_path to the config directory so the
extension and the hook resolve includes identically regardless of the
build's working directory (the extension's default base_path is the CWD,
not the repo root as the old comment claimed).

Mirrors the restrict_base_path default of pymdownx.snippets, so the hook
and the extension accept the same set of include paths.

@maxisbey maxisbey marked this pull request as ready for review June 29, 2026 18:44

Same shape as the v1.x backport of this hook, whose ruff config bans
global statements (PLW0603).

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found and verified against the latest diff

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/hooks/llms_txt.py">

<violation number="1" location="docs/hooks/llms_txt.py:44">
P2: Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.</violation>

<violation number="2" location="docs/hooks/llms_txt.py:104">
P2: The unresolved-include guard treats literal/escaped `--8<--` text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.</violation>
</file>

Comment thread docs/hooks/llms_txt.py
        return content

    resolved, substitutions = _SNIPPET_LINE.subn(include, markdown)
    if substitutions != sum("--8<--" in line for line in markdown.splitlines()):

@cubic-dev-ai cubic-dev-ai Bot Jun 29, 2026

P2: The unresolved-include guard treats literal/escaped --8<-- text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/hooks/llms_txt.py, line 99:

<comment>The unresolved-include guard treats literal/escaped `--8<--` text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.</comment>

<file context>
@@ -0,0 +1,172 @@
+        return content
+
+    resolved, substitutions = _SNIPPET_LINE.subn(include, markdown)
+    if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
+        raise PluginError(f"llms_txt: unresolved snippet include in {page.file.src_uri}")
+
</file context>
Suggested change
-    if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
+    if re.search(r'^[ \t]*(?!;)-{1,}8<-{1,}(?:$|[ \t]+)', resolved, flags=re.MULTILINE):

Contributor Author

This one's deliberate. The guard's job is to make any marker-bearing line the hook didn't consume fail the build rather than ship as junk in the renditions — including malformed directives that a "does it look like a directive" regex would miss. The cost is that a page mentioning --8<-- literally fails the build, but no page does today, and the error names the page, so whoever hits it first can adjust the hook with the actual case in front of them. A separate detection regex can drift from the consume regex, which is exactly the silent gap this avoids.
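The lockstep idea can be sketched in isolation. The regex and the count comparison come from the excerpts quoted in this thread; the `resolve_snippets` wrapper, the `read` callback, and the `UnresolvedInclude` exception (standing in for `PluginError`) are assumptions made so the example is self-contained.

```python
from __future__ import annotations

import re
from typing import Callable

# Pattern as quoted from docs/hooks/llms_txt.py in this review.
_SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)


class UnresolvedInclude(Exception):
    """Hypothetical stand-in for mkdocs.exceptions.PluginError."""


def resolve_snippets(markdown: str, read: Callable[[str], str], page: str) -> str:
    """Inline whole-file includes; fail on any marker line left unconsumed."""

    def include(match: re.Match[str]) -> str:
        indent = match["indent"]
        content = read(match["path"])
        # Re-indent the included file so it stays inside admonitions/tabs.
        return "\n".join(indent + line if line else line for line in content.splitlines())

    resolved, substitutions = _SNIPPET_LINE.subn(include, markdown)
    # Count marker-bearing lines in the ORIGINAL page, so included files may
    # themselves contain a literal marker without tripping the guard.
    if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
        raise UnresolvedInclude(f"llms_txt: unresolved snippet include in {page}")
    return resolved
```

Because the guard only compares counts, it needs no second regex that could drift from the consuming one; the trade-off is that a page mentioning the marker literally fails the build.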

AI Disclaimer

The parent comment was wrong here: the guard is intentionally checking for any marker-bearing line the hook didn’t consume, including malformed directives and literal --8<-- text. That keeps the detection in lockstep with the consume regex and avoids a silent drift gap.

Thanks for the feedback! I've saved this as a new learning to improve future reviews.

Comment thread docs/hooks/llms_txt.py
    ),
]

_SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)

@cubic-dev-ai cubic-dev-ai Bot Jun 29, 2026

P2: Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/hooks/llms_txt.py, line 43:

<comment>Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.</comment>

<file context>
@@ -0,0 +1,172 @@
+    ),
+]
+
+_SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)
+_MD_LINK = re.compile(r'(\]\()([^)\s]+\.md)(#[^)\s]*)?( +"[^"]*")?(\))')
+
</file context>

Contributor Author

Also deliberate — the docs only use whole-file includes of complete runnable examples from docs_src/ (the convention described in mkdocs.yml), so the hook supports exactly that pattern and fails the build with an error naming the page for anything else. Section syntax and the block form both die loudly rather than rendering wrong. If we ever adopt other variants, extending the hook then beats carrying an implementation of pymdownx's full semantics that nothing exercises.

AI Disclaimer

That whole-file-only boundary is intentional, so the parent comment was too broad here. This hook should keep failing loud on section/block variants rather than growing full pymdownx snippet semantics.

@claude claude Bot left a comment

Contributor


I didn't find any bugs in the hook logic, but this adds a ~180-line MkDocs build hook plus a snippet base_path change, so it's worth a maintainer's look at the design and the published artifacts.

Extended reasoning...

Overview

This PR adds a new MkDocs hook (docs/hooks/llms_txt.py, ~180 lines) that generates llms.txt, llms-full.txt, and per-page markdown renditions at build time, registers it in mkdocs.yml, sets an explicit base_path: !relative $config_dir for pymdownx.snippets, and adds a short pointer in docs/index.md. No runtime/library code is touched.

Security risks

Low. The hook runs only at docs build time and writes into the site output directory. It reads snippet files from disk but mirrors the snippets extension's base-path restriction (rejecting paths that resolve outside the repo root), and snippet paths come from the repo's own markdown sources, not user input. Generated links are absolute URLs to the project's own docs site.

Level of scrutiny

This is docs/build tooling rather than production library code, so the blast radius is limited to the documentation site and CI docs builds (the strict-build guards mean failures surface as build errors rather than silently wrong output). Still, it is a new ~180-line piece of logic with regex-based snippet/link rewriting, nav traversal, and a behavioral tweak to snippet resolution — not a mechanical change — so it falls outside what I'd auto-approve and merits a human pass over the approach and output format.

Other factors

The bug hunting system found no bugs. There are two outstanding P2 comments from another review bot (about the strictness of the snippet-include regex and the unresolved-include guard) that the author hasn't responded to yet; they read as robustness nits for future doc edits rather than current correctness problems, since the PR description indicates the current docs build cleanly with this hook. There is no automated test coverage for the hook itself beyond the strict docs build exercising it.

Unresolvable relative .md links and pages that do not start with an H1
now fail the build instead of producing broken or malformed renditions.
Embedded .py snippets gain a leading comment naming the source file
under docs_src/, so the rendition still points at the file on disk.
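The H1 requirement could be enforced with a check along these lines. This is a minimal sketch: `page_title` and `MissingTitleError` are hypothetical names, and it assumes "starts with an H1" means the first non-blank line is an ATX heading of the form `# Title`.

```python
class MissingTitleError(Exception):
    """Hypothetical stand-in for the build error the hook raises."""


def page_title(markdown: str, src_uri: str) -> str:
    """Return the page's H1 text, or fail with an error naming the page."""
    for line in markdown.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("# ") and line[2:].strip():
            return line[2:].strip()
        break  # first non-blank line is not an H1
    raise MissingTitleError(f"llms_txt: {src_uri} does not start with an H1")
```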

@maxisbey maxisbey merged commit 67d7593 into main Jun 30, 2026
34 checks passed
@maxisbey maxisbey deleted the llms-txt branch June 30, 2026 10:30

@claude claude Bot left a comment

Contributor


I didn't find any bugs, but this adds a new ~180-line MkDocs build hook plus a snippets base_path change, and whether to publish llms.txt renditions at all is a maintainer-level decision — worth a human look.

Extended reasoning...

Overview

This PR adds a new MkDocs hook (docs/hooks/llms_txt.py, ~184 lines) that generates llms.txt, llms-full.txt, and per-page markdown renditions at docs build time, registers the hook in mkdocs.yml, sets an explicit base_path for pymdownx.snippets, and adds a pointer to the new artifacts in docs/index.md. It is documentation/build-tooling only — no runtime SDK code is touched.

Security risks

Low. The hook runs only during the docs build, not in shipped code. It reads files referenced by snippet includes, but explicitly resolves them against the repo root and rejects paths that escape it (mirroring the snippets extension's restrict_base_path), and it fails the build loudly on unresolved includes, unreadable files, or skipped pages. The base_path: !relative $config_dir change makes snippet resolution independent of the build CWD, which is a behavior-preserving hardening for builds run from the repo root.
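The containment check described here might look roughly like the following. It is a sketch under assumptions, not the hook's code: `resolve_include` and `IncludePathError` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


class IncludePathError(Exception):
    """Hypothetical stand-in for mkdocs.exceptions.PluginError."""


def resolve_include(base: Path, include: str) -> Path:
    """Resolve an include against base and reject paths that escape it."""
    base = base.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the
    # containment check below sees the real location on disk.
    target = (base / include).resolve()
    if not target.is_relative_to(base):
        raise IncludePathError(f"llms_txt: include escapes base path: {include}")
    if not target.is_file():
        raise IncludePathError(f"llms_txt: include is not a readable file: {include}")
    return target
```

Resolving against a fixed base rather than the process's working directory is also what makes the result independent of where the build is started.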

Level of scrutiny

Moderate. Although nothing here is production-critical, this is not a mechanical change: it introduces a non-trivial new Python hook with its own parsing logic (snippet-include resolution, link rewriting, nav traversal), publishes new public artifacts on the docs site, and makes a deliberate scope decision (whole-file includes only, fail-loud otherwise) that the author defended in response to automated review comments. Whether the project wants to maintain this hook versus using the mkdocs-llmstxt plugin, and whether the published URLs/format are right, are judgment calls a maintainer should make.

Other factors

The automated bug-hunting pass found no bugs. The cubic-dev-ai review raised two P2 suggestions; the author responded explaining both behaviors are intentional fail-loud guards, and the bot accepted the rationale — there are no unresolved actionable comments. There is no test coverage for the hook itself; it is exercised only by mkdocs build --strict in docs CI. The PR description also notes a follow-up backport to v1.x will be needed for the root /llms.txt, which is a deployment consideration for maintainers.
