feat: generate llms-full.txt for AI knowledge-base ingestion #883
Concatenates every docs page (325) into llms-full.txt (~1.7 MB) and every blog post (55) into llms-full-blog.txt (~0.5 MB). Each page is prefixed with a source URL delimiter so agents can cite back. Lets Claude Projects and Cursor users one-shot-ingest the Envio knowledge base without mid-conversation browsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
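
To make the description concrete, here is a minimal sketch of concatenating pages with a per-page source URL delimiter. The function names, the `--- Source: ... ---` delimiter format, and the regex-based front-matter stripping are all assumptions for illustration; the diff fragments in the review show the plugin reading files with `fs.readFileSync` and stripping front matter with `matter(raw).content`, and its actual delimiter is not shown on this page.

```javascript
// Hypothetical sketch: names and delimiter format are assumptions,
// not the plugin's actual implementation.

// Strip a leading YAML front-matter block ("---\n...\n---\n") if present.
function stripFrontMatter(raw) {
  return raw.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "").trimStart();
}

// Join pages into one document, prefixing each page with its source URL
// so a reader (or an LLM) can cite the original page.
function concatPages(header, pages) {
  const parts = [header.trim(), ""];
  for (const page of pages) {
    parts.push(`--- Source: ${page.url} ---`, "");
    parts.push(stripFrontMatter(page.raw).trimEnd(), "");
  }
  return parts.join("\n");
}

// Example input: one page with front matter, one without.
const output = concatPages("# Example: Full Documentation", [
  {
    url: "https://example.com/docs/a",
    raw: "---\ntitle: A\n---\n\n# A\n\nBody A.\n",
  },
  { url: "https://example.com/docs/b", raw: "# B\n\nBody B.\n" },
]);

console.log(output);
```

Keeping the delimiter on its own line makes it easy to split the file back into pages later, which is one reason a per-page marker is preferable to plain concatenation.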
📝 Walkthrough
The plugin-generate-llms.js file is enhanced to tag collected items with source metadata (…
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/plugin-generate-llms.js (2)
108-114: ⚠️ Potential issue | 🟡 Minor
Read full-content inputs from absolute paths.
Docs items store a cwd-relative `filePath`, but `renderLLMSFull()` now reopens it directly. Capture the already-resolved `fullPath` during collection so this build step does not depend on the process cwd.
🛠️ Proposed fix

     collectedDocs.push({
       filePath: path.join(config.path, file),
    +  contentPath: fullPath,
       title,
       description,
       pageUrl,
       source: "docs",
     });

     collectedDocs.push({
       filePath: fullPath,
    +  contentPath: fullPath,
       title,
       description,
       pageUrl,
       source: "blog",
     });

     const parts = [header.trim(), ""];
     for (const item of items) {
    -  const raw = fs.readFileSync(item.filePath, "utf-8");
    +  const raw = fs.readFileSync(item.contentPath, "utf-8");
       const body = matter(raw).content.trimStart();

Also applies to: 174-180, 234-238
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/plugin-generate-llms.js` around lines 108 - 114, The collectedDocs entries currently store a cwd-relative filePath which breaks renderLLMSFull when it reopens files; update the collection code around where collectedDocs is populated (the block adding objects with filePath, title, description, pageUrl, source) to include a resolved fullPath (e.g., resolve with path.join(config.path, file) or path.resolve) and store it as fullPath on each item; then ensure renderLLMSFull uses the new fullPath property when reopening files. Apply the same change to the other collection sites noted (the similar blocks at lines referenced around 174-180 and 234-238) so all doc entries include an absolute fullPath.
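
The cwd dependence behind this finding can be shown with Node's `path` module alone. This is a minimal sketch under assumed values: `siteDir`, `docsDir`, and `file` are hypothetical stand-ins for whatever the plugin actually receives from its configuration.

```javascript
const path = require("node:path");

// Hypothetical values; the real plugin reads these from its config.
const siteDir = path.resolve("/srv/site");
const docsDir = "docs";
const file = "guides/intro.md";

// Relative path: fs APIs resolve it against process.cwd() at read time,
// so the result depends on where the build was launched from.
const relativePath = path.join(docsDir, file);

// Absolute path: resolved once against a known base directory, so later
// reads do not depend on the working directory.
const absolutePath = path.resolve(siteDir, docsDir, file);

console.log(path.isAbsolute(relativePath)); // false
console.log(path.isAbsolute(absolutePath)); // true
```

Storing the absolute form at collection time, as the proposed fix does with `contentPath`, means the later `fs.readFileSync` call behaves the same regardless of the directory the build runs in.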
190-215: ⚠️ Potential issue | 🟡 Minor
Keep legacy `llms.txt` and markdown copies docs-only.
`collectedDocs` now contains docs and blog entries. `orderDocs()` still scans all entries, and `writeMarkdownCopies(collectedDocs)` will also emit blog markdown copies. Filter legacy outputs to `source === "docs"` so the existing navigational index remains unchanged.
🛠️ Proposed fix

     function orderDocs(includeOrder) {
       if (!includeOrder || includeOrder.length === 0) {
         return [];
       }
    +  const docsOnly = collectedDocs.filter(
    +    (doc) => doc.source === "docs"
    +  );
       const matched = new Set();
       const ordered = [];
       const duplicates = new Set();
       for (const pattern of includeOrder) {
    -    for (const doc of collectedDocs) {
    +    for (const doc of docsOnly) {
           const docPath = toPosix(doc.filePath);
           const pat = toPosix(pattern);

     if (main) {
    -  writeMarkdownCopies(collectedDocs);
    -
    -  // Generate llms-full variants: one for docs, one for blog.
    -  // Agents that cannot browse mid-conversation (Claude Projects,
    -  // Cursor) paste these into their context window for full recall.
       const docsItems = collectedDocs.filter(
         (d) => d.source === "docs"
       );
       const blogItems = collectedDocs.filter(
         (d) => d.source === "blog"
       );
    +
    +  writeMarkdownCopies(docsItems);
    +
    +  // Generate llms-full variants: one for docs, one for blog.
    +  // Agents that cannot browse mid-conversation (Claude Projects,
    +  // Cursor) paste these into their context window for full recall.

Also applies to: 299-310
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/plugin-generate-llms.js` around lines 190 - 215, orderDocs and the legacy-markdown/llms emitters are currently iterating over collectedDocs which includes blogs; restrict legacy outputs to docs-only by filtering for entries where doc.source === "docs". Update orderDocs (and the similar block around writeMarkdownCopies / llms generation at the other location) to either accept a filtered array (e.g., collectedDocs.filter(d => d.source === "docs")) or add a guard inside the loop (skip if doc.source !== "docs") so ordered, duplicates, and legacy markdown/llms.txt only reflect docs.
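
The filtering this finding asks for can be sketched on its own. The sample entries and the `bySource` helper below are hypothetical; only the `source` tag with the values `"docs"` and `"blog"` comes from the review.

```javascript
// Hypothetical sample data mirroring the `source` tag from the review.
const collected = [
  { filePath: "docs/intro.md", source: "docs" },
  { filePath: "blog/launch.md", source: "blog" },
  { filePath: "docs/api.md", source: "docs" },
];

// Hypothetical helper: select the entries that belong to one source.
function bySource(items, source) {
  return items.filter((item) => item.source === source);
}

// Legacy outputs (llms.txt, markdown copies) would consume docsItems only;
// the new llms-full variants consume docsItems and blogItems separately.
const docsItems = bySource(collected, "docs");
const blogItems = bySource(collected, "blog");

console.log(docsItems.map((d) => d.filePath));
console.log(blogItems.map((d) => d.filePath));
```

Filtering once and passing the filtered arrays to each emitter keeps the docs-only guarantee in one place instead of repeating a guard inside every loop.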
🧹 Nitpick comments (1)
plugins/plugin-generate-llms.js (1)
312-340: Derive header URLs from `siteConfig.url`.
The source delimiters already use `siteConfig.url`; doing the same in the generated headers keeps preview/staging/domain changes consistent.
♻️ Proposed refactor

     if (docsItems.length > 0) {
    +  const siteUrl = url.replace(/\/$/, "");
       const header =
         `# Envio: Full Documentation for LLMs\n\n` +
    -    `> Every page of docs.envio.dev concatenated as markdown, ` +
    +    `> Every page of ${siteUrl} concatenated as markdown, ` +
         `with per-page source URLs, for direct ingestion into ` +
    -    `LLM context windows. Pair with https://docs.envio.dev/llms.txt ` +
    +    `LLM context windows. Pair with ${siteUrl}/llms.txt ` +
         `for the navigational index.`;

     if (blogItems.length > 0) {
    +  const siteUrl = url.replace(/\/$/, "");
       const header =
         `# Envio: Full Blog and Case Studies for LLMs\n\n` +
    -    `> Every blog post and case study on docs.envio.dev ` +
    +    `> Every blog post and case study on ${siteUrl} ` +
         `concatenated as markdown, with per-page source URLs. ` +
    -    `Pair with https://docs.envio.dev/llms-full.txt for ` +
    +    `Pair with ${siteUrl}/llms-full.txt for ` +
         `technical documentation.`;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/plugin-generate-llms.js` around lines 312 - 340, The headers for llms-full.txt and llms-full-blog.txt hardcode docs.envio.dev; update them to derive the base URL from the siteConfig.url value instead. In the block that builds the header strings (around renderLLMSFull calls), reference siteConfig.url (normalize to remove trailing slash if needed) when composing the "Pair with ..." and any "source URLs" text so both headers use the configured site URL; keep the rest of the header text intact and continue writing files to context.outDir as before.
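
As a rough illustration of the refactor, the header can be built by a small function that takes the configured site URL. `buildDocsHeader` is a hypothetical name and the preview URL is made up; the header wording is taken from the proposed refactor, and the regex here strips any number of trailing slashes rather than just one.

```javascript
// Hypothetical helper: builds the docs header from a configured site URL.
function buildDocsHeader(siteUrl) {
  // Normalize so that "https://host/" and "https://host" behave the same.
  const base = siteUrl.replace(/\/+$/, "");
  return (
    `# Envio: Full Documentation for LLMs\n\n` +
    `> Every page of ${base} concatenated as markdown, ` +
    `with per-page source URLs, for direct ingestion into ` +
    `LLM context windows. Pair with ${base}/llms.txt ` +
    `for the navigational index.`
  );
}

// Example with a made-up preview deployment URL that has a trailing slash.
const header = buildDocsHeader("https://preview.example.com/");
console.log(header);
```

With this shape, a preview or staging build advertises its own `llms.txt` location instead of pointing at the production domain.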
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c12e8f89-b973-49b5-b1d7-69cef8fdf073
📒 Files selected for processing (1)
plugins/plugin-generate-llms.js
Summary
- Updates `plugins/plugin-generate-llms.js` to emit two additional files at build time
- `llms-full.txt` (~1.7 MB, 325 doc pages): every docs page concatenated with source URL delimiters
- `llms-full-blog.txt` (~0.5 MB, 55 posts): every blog post and case study concatenated
- The `llms.txt` navigational index is unchanged

Test plan
- `build/`
- `/llms-full.txt` and `/llms-full-blog.txt` with 200
- `llms-full.txt` into Claude Projects and run a few sample doc queries

🤖 Generated with Claude Code