A build hook generates the llmstxt.org artifacts into the site: llms.txt (an index of the prose pages grouped by nav section), a .md rendition of each prose page with snippet includes resolved and relative links made absolute, and llms-full.txt with every page concatenated for single-fetch consumption. The API reference pages are mkdocstrings stubs with no markdown source, so they are linked as rendered HTML from an Optional section instead of being embedded. Incremental (--dirty) builds are rejected by the hook: they skip unmodified pages, which would silently truncate the generated artifacts. Anchor pymdownx.snippets base_path to the config directory so the extension and the hook resolve includes identically regardless of the build's working directory (the extension's default base_path is the CWD, not the repo root as the old comment claimed).
Mirrors the restrict_base_path default of pymdownx.snippets, so the hook and the extension accept the same set of include paths.
Same shape as the v1.x backport of this hook, whose ruff config bans global statements (PLW0603).
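For readers unfamiliar with how a hook can reject `--dirty` builds, here is a minimal sketch. It assumes the check lives in MkDocs' `on_startup` event, which receives a `dirty` flag; the PR text does not say which event the real hook uses, and the error wording below is made up for illustration.

```python
from __future__ import annotations

try:
    from mkdocs.exceptions import PluginError
except ImportError:  # stand-in so the sketch runs without MkDocs installed
    class PluginError(Exception):
        pass


def on_startup(*, command: str, dirty: bool) -> None:
    """Refuse incremental builds before any page is processed or artifact written."""
    if dirty:
        # Assumed wording; the real hook's message is not shown in this PR text.
        raise PluginError(
            "llms_txt: --dirty builds skip unmodified pages and would truncate llms.txt"
        )
```

Failing at startup rather than at write time means a dirty build never leaves a partial llms-full.txt behind.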
2 issues found and verified against the latest diff
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="docs/hooks/llms_txt.py">
<violation number="1" location="docs/hooks/llms_txt.py:44">
P2: Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.</violation>
<violation number="2" location="docs/hooks/llms_txt.py:104">
P2: The unresolved-include guard treats literal/escaped `--8<--` text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.</violation>
</file>
return content
resolved, substitutions = _SNIPPET_LINE.subn(include, markdown)
if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
P2: The unresolved-include guard treats literal/escaped --8<-- text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/hooks/llms_txt.py, line 99:
<comment>The unresolved-include guard treats literal/escaped `--8<--` text as failures. Check for remaining real, unescaped include directives instead of raw substring counts.</comment>
<file context>
@@ -0,0 +1,172 @@
+ return content
+
+ resolved, substitutions = _SNIPPET_LINE.subn(include, markdown)
+ if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
+ raise PluginError(f"llms_txt: unresolved snippet include in {page.file.src_uri}")
+
</file context>
Suggested change:
- if substitutions != sum("--8<--" in line for line in markdown.splitlines()):
+ if re.search(r'^[ \t]*(?!;)-{1,}8<-{1,}(?:$|[ \t]+)', resolved, flags=re.MULTILINE):
This one's deliberate. The guard's job is to make any marker-bearing line the hook didn't consume fail the build rather than ship as junk in the renditions — including malformed directives that a "does it look like a directive" regex would miss. The cost is that a page mentioning --8<-- literally fails the build, but no page does today, and the error names the page, so whoever hits it first can adjust the hook with the actual case in front of them. A separate detection regex can drift from the consume regex, which is exactly the silent gap this avoids.
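To make the lockstep argument concrete, here is a self-contained sketch of the guard. The regex and the count comparison are copied from the diff; the in-memory `files` dict and the `ValueError` are stand-ins for the hook's real file reads and `PluginError`, so treat them as assumptions.

```python
from __future__ import annotations

import re

# Same pattern as _SNIPPET_LINE in the diff: whole-file, double-quoted includes only.
SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)


def resolve_includes(markdown: str, files: dict[str, str]) -> str:
    """Inline whole-file includes; fail if any marker-bearing line was not consumed."""

    def include(match: re.Match[str]) -> str:
        indent = match.group("indent")
        body = files[match.group("path")]  # hypothetical stand-in for reading docs_src/
        return "\n".join(indent + line if line else line for line in body.splitlines())

    resolved, substitutions = SNIPPET_LINE.subn(include, markdown)
    # Count every line that mentions the marker, not just lines that look like directives.
    marker_lines = sum("--8<--" in line for line in markdown.splitlines())
    if substitutions != marker_lines:
        raise ValueError("unresolved snippet include")
    return resolved
```

Because the failure condition is derived from the consume regex's own substitution count, there is no second pattern that could fall out of sync with it.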
The parent comment was wrong here: the guard is intentionally checking for any marker-bearing line the hook didn’t consume, including malformed directives and literal --8<-- text. That keeps the detection in lockstep with the consume regex and avoids a silent drift gap.
Thanks for the feedback! I've saved this as a new learning to improve future reviews.
),
]
_SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)
P2: Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/hooks/llms_txt.py, line 43:
<comment>Snippet parsing is narrower than pymdownx snippets syntax, so valid include directives can fail the build. Expand the regex to accept documented inline marker variants.</comment>
<file context>
@@ -0,0 +1,172 @@
+ ),
+]
+
+_SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)
+_MD_LINK = re.compile(r'(\]\()([^)\s]+\.md)(#[^)\s]*)?( +"[^"]*")?(\))')
+
</file context>
Also deliberate — the docs only use whole-file includes of complete runnable examples from docs_src/ (the convention described in mkdocs.yml), so the hook supports exactly that pattern and fails the build with an error naming the page for anything else. Section syntax and the block form both die loudly rather than rendering wrong. If we ever adopt other variants, extending the hook then beats carrying an implementation of pymdownx's full semantics that nothing exercises.
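A quick way to see the boundary is to run the pattern from the diff against a few directive shapes. The sample paths are hypothetical, and the list of unsupported shapes is illustrative rather than a full inventory of what pymdownx accepts.

```python
import re

# Pattern copied from the diff under review.
SNIPPET_LINE = re.compile(r'^(?P<indent>[ \t]*)--8<-- "(?P<path>[^"\n]+)"$', flags=re.MULTILINE)

# Hypothetical directive lines; only the first two follow the docs' convention.
samples = {
    "whole-file include": '--8<-- "docs_src/tools/basic.py"',
    "indented include": '    --8<-- "docs_src/tools/basic.py"',
    "single-quoted path": "--8<-- 'docs_src/tools/basic.py'",
    "block form opener": "--8<--",
}

# True means the hook would consume the line; False means the guard fails the build.
consumed = {name: bool(SNIPPET_LINE.search(text)) for name, text in samples.items()}
```

Anything reported as not consumed still contains `--8<--`, so the unresolved-include guard turns it into a build error naming the page.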
That whole-file-only boundary is intentional, so the parent comment was too broad here. This hook should keep failing loud on section/block variants rather than growing full pymdownx snippet semantics.
I didn't find any bugs in the hook logic, but this adds a ~180-line MkDocs build hook plus a snippet base_path change, so it's worth a maintainer's look at the design and the published artifacts.
Extended reasoning...
Overview
This PR adds a new MkDocs hook (docs/hooks/llms_txt.py, ~180 lines) that generates llms.txt, llms-full.txt, and per-page markdown renditions at build time, registers it in mkdocs.yml, sets an explicit base_path: !relative $config_dir for pymdownx.snippets, and adds a short pointer in docs/index.md. No runtime/library code is touched.
Security risks
Low. The hook runs only at docs build time and writes into the site output directory. It reads snippet files from disk but mirrors the snippets extension's base-path restriction (rejecting paths that resolve outside the repo root), and snippet paths come from the repo's own markdown sources, not user input. Generated links are absolute URLs to the project's own docs site.
Level of scrutiny
This is docs/build tooling rather than production library code, so the blast radius is limited to the documentation site and CI docs builds (the strict-build guards mean failures surface as build errors rather than silently wrong output). Still, it is a new ~180-line piece of logic with regex-based snippet/link rewriting, nav traversal, and a behavioral tweak to snippet resolution — not a mechanical change — so it falls outside what I'd auto-approve and merits a human pass over the approach and output format.
Other factors
The bug hunting system found no bugs. There are two outstanding P2 comments from another review bot (about the strictness of the snippet-include regex and the unresolved-include guard) that the author hasn't responded to yet; they read as robustness nits for future doc edits rather than current correctness problems, since the PR description indicates the current docs build cleanly with this hook. There is no automated test coverage for the hook itself beyond the strict docs build exercising it.
Unresolvable relative .md links and pages that do not start with an H1 now fail the build instead of producing broken or malformed renditions. Embedded .py snippets gain a leading comment naming the source file under docs_src/, so the rendition still points at the file on disk.
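The two fail-loud checks in this commit can be sketched as follows. The link regex is the `_MD_LINK` pattern from the diff; the `urls` map from source path to absolute URL, the `example.test` host, and the `ValueError` are assumptions made so the sketch is self-contained, and the real hook's lookup may differ.

```python
from __future__ import annotations

import posixpath
import re

# Pattern copied from _MD_LINK in the diff.
MD_LINK = re.compile(r'(\]\()([^)\s]+\.md)(#[^)\s]*)?( +"[^"]*")?(\))')


def absolutize_links(markdown: str, page_src: str, urls: dict[str, str]) -> str:
    """Rewrite relative .md links via a src path -> absolute URL map; fail on unknown targets."""
    base_dir = posixpath.dirname(page_src)

    def rewrite(match: re.Match[str]) -> str:
        open_, target, anchor, title, close = match.groups(default="")
        if "://" in target:  # already absolute: leave untouched
            return match.group(0)
        src = posixpath.normpath(posixpath.join(base_dir, target))
        if src not in urls:
            raise ValueError(f"unresolvable link {target!r} in {page_src}")
        return f"{open_}{urls[src]}{anchor}{title}{close}"

    return MD_LINK.sub(rewrite, markdown)


def require_h1(markdown: str, page_src: str) -> None:
    """Fail if the page does not open with an H1."""
    if not markdown.startswith("# "):
        raise ValueError(f"{page_src} does not start with an H1")
```

Anchors and link titles are carried over unchanged, so only the path portion of each link is rewritten.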
I didn't find any bugs, but this adds a new ~180-line MkDocs build hook plus a snippets base_path change, and whether to publish llms.txt renditions at all is a maintainer-level decision — worth a human look.
Extended reasoning...
Overview
This PR adds a new MkDocs hook (docs/hooks/llms_txt.py, ~184 lines) that generates llms.txt, llms-full.txt, and per-page markdown renditions at docs build time, registers the hook in mkdocs.yml, sets an explicit base_path for pymdownx.snippets, and adds a pointer to the new artifacts in docs/index.md. It is documentation/build-tooling only — no runtime SDK code is touched.
Security risks
Low. The hook runs only during the docs build, not in shipped code. It reads files referenced by snippet includes, but explicitly resolves them against the repo root and rejects paths that escape it (mirroring the snippets extension's restrict_base_path), and it fails the build loudly on unresolved includes, unreadable files, or skipped pages. The base_path: !relative $config_dir change makes snippet resolution independent of the build CWD, which is a behavior-preserving hardening for builds run from the repo root.
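The containment check described above can be sketched in a few lines. This assumes the hook resolves each include against a single base directory and compares resolved paths; the function name and the `ValueError` are hypothetical, not taken from the hook.

```python
from pathlib import Path


def resolve_snippet(base: Path, relative: str) -> Path:
    """Resolve an include path and reject anything that lands outside the base directory."""
    root = base.resolve()
    candidate = (root / relative).resolve()
    # resolve() collapses ".." segments and symlinks, so the comparison sees the real target.
    if not candidate.is_relative_to(root):
        raise ValueError(f"snippet path escapes base directory: {relative}")
    return candidate
```

Resolving before comparing is what catches `../` traversal and absolute paths alike, since both end up outside `root` after normalization.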
Level of scrutiny
Moderate. Although nothing here is production-critical, this is not a mechanical change: it introduces a non-trivial new Python hook with its own parsing logic (snippet-include resolution, link rewriting, nav traversal), publishes new public artifacts on the docs site, and makes a deliberate scope decision (whole-file includes only, fail-loud otherwise) that the author defended in response to automated review comments. Whether the project wants to maintain this hook versus using the mkdocs-llmstxt plugin, and whether the published URLs/format are right, are judgment calls a maintainer should make.
Other factors
The automated bug-hunting pass found no bugs. The cubic-dev-ai review raised two P2 suggestions; the author responded explaining both behaviors are intentional fail-loud guards, and the bot accepted the rationale — there are no unresolved actionable comments. There is no test coverage for the hook itself; it is exercised only by mkdocs build --strict in docs CI. The PR description also notes a follow-up backport to v1.x will be needed for the root /llms.txt, which is a deployment consideration for maintainers.
Publishes an llms.txt version of the documentation, generated at build time by a small MkDocs hook (no new dependencies):
- `/llms.txt`: a markdown index of the prose pages, grouped by nav section
- A `.md` rendition of every prose page next to its HTML (e.g. `tutorial/tools/index.md`), with `--8<--` snippet includes resolved inline and relative links rewritten to absolute URLs; this is what the llms.txt links point at
- `/llms-full.txt`: every prose page concatenated for single-fetch consumption
The API reference pages are mkdocstrings stubs with no markdown source, so they are linked as rendered HTML from an `## Optional` section instead of being embedded.
Motivation and Context
The spec site already serves https://modelcontextprotocol.io/llms.txt (a Mintlify platform feature); none of the SDK doc sites do. The output format matches the spec site and the Pydantic ecosystem's mkdocs-llmstxt output. We generate from the markdown sources directly instead of using the mkdocs-llmstxt plugin: our prose pages are plain markdown, so the plugin's HTML-to-markdown round-trip (and its dependency chain) isn't needed.
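For reference, the index shape can be sketched as below: an H1, then one H2 per nav section with title-only link bullets, and API pages under `Optional`. The function name, the input structure, and the `example.test` URLs are illustrative assumptions, not the hook's actual code.

```python
from __future__ import annotations


def render_index(title: str, sections: dict[str, list[tuple[str, str]]]) -> str:
    """Render an llms.txt-style index: H1 title, then H2 sections of link bullets."""
    lines = [f"# {title}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        # Title-only bullets, since pages carry no frontmatter description yet.
        lines.extend(f"- [{name}]({url})" for name, url in pages)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


index = render_index(
    "Docs",
    {
        "Tutorial": [("Tools", "https://example.test/v2/tutorial/tools/index.md")],
        "Optional": [("API reference", "https://example.test/v2/api/")],
    },
)
```

Dict insertion order stands in for nav order here, which keeps `Optional` last as long as it is added last.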
How Has This Been Tested?
- `mkdocs build --strict` (what the docs CI job runs) exercises the hook on every build; artifacts inspected manually: 41 indexed pages, ~440 KB llms-full.txt, all 118 snippet includes resolved, links absolute.
- `--8<--`, titled links, and `--dirty` builds (rejected with an error before any artifact is written, since dirty builds skip unmodified pages and would silently truncate the output).
Breaking Changes
None. One behavioral note:
`pymdownx.snippets` now sets `base_path: !relative $config_dir`: snippet resolution previously depended on the build's working directory (the extension default is the CWD, not the repo root as the old comment claimed). Builds run from the repo root behave identically.
Types of changes
Checklist
Additional context
The deploy workflow builds v1.x at the site root, so root `/llms.txt` needs a `[v1.x]` backport of this hook; this PR covers `/v2/`. Pages have no frontmatter `description:` yet, so the index bullets are title-only; the hook picks descriptions up automatically if we add them later.
AI Disclaimer