feat: generate llms-full.txt for AI knowledge-base ingestion by Jordy-Baby · Pull Request #883 · enviodev/docs

Jordy-Baby · 2026-04-23T13:14:10Z

Summary

Extends plugins/plugin-generate-llms.js to emit two additional files at build time:
- llms-full.txt (~1.7 MB, 325 doc pages): every docs page concatenated with source URL delimiters
- llms-full-blog.txt (~0.5 MB, 55 posts): every blog post and case study concatenated

The existing llms.txt navigational index is unchanged.
Target use cases: Claude Projects and Cursor users who paste the file into context for full-recall Q&A without mid-conversation browsing.

Test plan

- Local build succeeds, both files written to build/
- Page count matches source tree (325 docs + 55 blog = 380)
- Source URL delimiter present for every page
- Verify preview deploy serves /llms-full.txt and /llms-full-blog.txt with 200
- Paste llms-full.txt into Claude Projects and run a few sample doc queries
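
The page-count and delimiter checks in the test plan could be scripted roughly as follows. This is a minimal sketch, not part of the PR: the `<!-- source: URL -->` delimiter format and the `checkFullFile` helper are assumptions made for illustration, since the exact delimiter text is not shown on this page.

```javascript
// Sketch only: assumes each page starts with a "<!-- source: URL -->" line.
// The real delimiter emitted by the plugin may differ.
function checkFullFile(text, expectedPages) {
  const delimiter = /^<!-- source: (\S+) -->$/gm;
  const urls = [...text.matchAll(delimiter)].map((m) => m[1]);
  // A URL that appears twice suggests a page was concatenated more than once.
  const duplicates = urls.filter((u, i) => urls.indexOf(u) !== i);
  return {
    pageCount: urls.length,
    countMatches: urls.length === expectedPages,
    duplicates,
  };
}

// Tiny in-memory sample standing in for build/llms-full.txt.
const sample = [
  "# Header", "",
  "<!-- source: https://example.com/docs/a -->", "# A", "", "Body A", "",
  "<!-- source: https://example.com/docs/b -->", "# B", "", "Body B", "",
].join("\n");

const report = checkFullFile(sample, 2);
console.log(report);
```

Against a real build, the same function would be fed the contents of build/llms-full.txt with an expected count of 325, and build/llms-full-blog.txt with 55.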

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Items are now tagged by source type (docs or blog)
- Generated output files consolidate all documentation and blog content with proper formatting, source URLs, and headers for unified content ingestion

Concatenates every docs page (325) into llms-full.txt (~1.7 MB) and every blog post (55) into llms-full-blog.txt (~0.5 MB). Each page is prefixed with a source URL delimiter so agents can cite back. Lets Claude Projects and Cursor users one-shot-ingest the Envio knowledge base without mid-conversation browsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-04-23T13:14:16Z


Project	Deployment	Actions	Updated (UTC)
envio-docs	Ready	Preview, Comment	Apr 23, 2026 1:15pm

coderabbitai · 2026-04-23T13:18:07Z

Walkthrough

The plugin-generate-llms.js file is enhanced to tag collected items with source metadata ("docs" or "blog") and introduce a rendering routine that concatenates markdown content into two dedicated build outputs (llms-full.txt and llms-full-blog.txt) with per-item source URL comments and formatted headers for LLM ingestion contexts.
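
The rendering routine described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the plugin's actual code: `renderFull`, `stripFrontMatter`, and the `<!-- source: ... -->` comment format are hypothetical. The diff excerpts in the review suggest the plugin strips front matter with `matter(raw).content`; a small regex stand-in is used here so the example runs without extra dependencies.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Stand-in for `matter(raw).content`: drops a leading YAML front-matter block.
function stripFrontMatter(raw) {
  const match = raw.match(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/);
  return match ? raw.slice(match[0].length) : raw;
}

// Concatenate items into one document: a source comment, a heading, an
// optional description, then the page body. The delimiter format is assumed.
function renderFull(header, items, siteUrl) {
  const base = siteUrl.replace(/\/$/, "");
  const parts = [header.trim(), ""];
  for (const item of items) {
    const raw = fs.readFileSync(item.filePath, "utf-8");
    const body = stripFrontMatter(raw).trimStart();
    parts.push(`<!-- source: ${base}${item.pageUrl} -->`);
    parts.push(`# ${item.title}`, "");
    if (item.description) parts.push(`> ${item.description}`, "");
    parts.push(body.trimEnd(), "");
  }
  return parts.join("\n");
}

// Usage with a throwaway markdown file.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "llms-full-"));
const demoPath = path.join(tmpDir, "intro.md");
fs.writeFileSync(demoPath, "---\ntitle: Intro\n---\n\nHello docs.\n");

const demoOutput = renderFull(
  "# Demo header",
  [{ filePath: demoPath, title: "Intro", description: "", pageUrl: "/docs/intro", source: "docs" }],
  "https://example.com/"
);
console.log(demoOutput);
```

Keeping the source comment on its own line makes it easy to count pages or split the file back into per-page chunks later.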

Changes

Cohort / File(s)	Summary
LLM Output Generation Enhancement `plugins/plugin-generate-llms.js`	Added source field tagging for collected items and new rendering routine that concatenates markdown content into two aggregated output files (`llms-full.txt` and `llms-full-blog.txt`) with per-item source URLs, headings, and descriptions for LLM ingestion.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

- llms.txt generation #706: Adds initial llms.txt generation and markdown copies to the same plugin, providing foundational LLM output infrastructure.
- fix: generate .md files for all docs so llms.txt links resolve #868: Modifies the same plugin to change how doc items are processed and output for LLM ingestion contexts.
- chore: llms.txt updates #876: Extends the plugin with blog scanning and per-post markdown emission, overlapping with blog collection logic.

Suggested reviewers

nikbhintade
moose-code
keenbeen32

Poem

🐰 ✨
A plugin now sorts with care so keen,
Tagging docs and blogs pristine,
Concatenating wisdom into files bright,
Source comments glow through LLM night!
Content streams flow, all organized right! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: generate llms-full.txt for AI knowledge-base ingestion' directly and clearly describes the primary change—generating a new llms-full.txt file for AI knowledge-base ingestion. It is concise, specific, and accurately represents the main objective of the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

plugins/plugin-generate-llms.js (2)

108-114: ⚠️ Potential issue | 🟡 Minor

Read full-content inputs from absolute paths.

Docs items store a cwd-relative filePath, but renderLLMSFull() now reopens it directly. Capture the already-resolved fullPath during collection so this build step does not depend on the process cwd.

🛠️ Proposed fix

                         collectedDocs.push({
                             filePath: path.join(config.path, file),
+                            contentPath: fullPath,
                             title,
                             description,
                             pageUrl,
                             source: "docs",
                         });

                         collectedDocs.push({
                             filePath: fullPath,
+                            contentPath: fullPath,
                             title,
                             description,
                             pageUrl,
                             source: "blog",
                         });

                 const parts = [header.trim(), ""];
                 for (const item of items) {
-                    const raw = fs.readFileSync(item.filePath, "utf-8");
+                    const raw = fs.readFileSync(item.contentPath, "utf-8");
                     const body = matter(raw).content.trimStart();

Also applies to: 174-180, 234-238
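
To illustrate why the review prefers an absolute path here, the sketch below compares the two forms. The `siteDir`, `config`, and `file` values are hypothetical placeholders, not values from the plugin.

```javascript
const path = require("node:path");

// Hypothetical values standing in for the site directory and plugin config.
const siteDir = path.resolve("/srv/site");
const config = { path: "docs" };
const file = "guides/intro.md";

// Relative: fs.readFileSync would resolve this against process.cwd(),
// so the result depends on where the build command was started.
const relativePath = path.join(config.path, file);

// Absolute: points at the same file no matter what the cwd is.
const absolutePath = path.resolve(siteDir, config.path, file);

console.log(relativePath, path.isAbsolute(relativePath));
console.log(absolutePath, path.isAbsolute(absolutePath));
```

Storing the absolute form once at collection time, as the proposed `contentPath` field does, avoids re-deriving it in every consumer.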

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@plugins/plugin-generate-llms.js` around lines 108 - 114, The collectedDocs
entries currently store a cwd-relative filePath, which makes renderLLMSFull
depend on the process cwd when it reopens files; update the collection code
where collectedDocs is populated (the block adding objects with filePath, title,
description, pageUrl, source) to also store the already-resolved absolute path
(the fullPath value, or the result of path.resolve) as contentPath on each
item; then ensure renderLLMSFull reads from contentPath when reopening
files. Apply the same change to the other collection sites noted (the similar
blocks around lines 174-180 and 234-238) so all entries include an absolute
contentPath.

190-215: ⚠️ Potential issue | 🟡 Minor

Keep legacy llms.txt and markdown copies docs-only.

collectedDocs now contains docs and blog entries. orderDocs() still scans all entries, and writeMarkdownCopies(collectedDocs) will also emit blog markdown copies. Filter legacy outputs to source === "docs" so the existing navigational index remains unchanged.

🛠️ Proposed fix

             function orderDocs(includeOrder) {
                 if (!includeOrder || includeOrder.length === 0) {
                     return [];
                 }
 
+                const docsOnly = collectedDocs.filter(
+                    (doc) => doc.source === "docs"
+                );
                 const matched = new Set();
                 const ordered = [];
                 const duplicates = new Set();
 
                 for (const pattern of includeOrder) {
-                    for (const doc of collectedDocs) {
+                    for (const doc of docsOnly) {
                         const docPath = toPosix(doc.filePath);
                         const pat = toPosix(pattern);

                 if (main) {
-                    writeMarkdownCopies(collectedDocs);
-
-                    // Generate llms-full variants: one for docs, one for blog.
-                    // Agents that cannot browse mid-conversation (Claude Projects,
-                    // Cursor) paste these into their context window for full recall.
                     const docsItems = collectedDocs.filter(
                         (d) => d.source === "docs"
                     );
                     const blogItems = collectedDocs.filter(
                         (d) => d.source === "blog"
                     );
+
+                    writeMarkdownCopies(docsItems);
+
+                    // Generate llms-full variants: one for docs, one for blog.
+                    // Agents that cannot browse mid-conversation (Claude Projects,
+                    // Cursor) paste these into their context window for full recall.

Also applies to: 299-310
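
The docs-only filtering suggested above can be sketched as a single partition step. The item data and the `partitionBySource` helper are illustrative assumptions; only the `source` field and its "docs" / "blog" values come from the PR.

```javascript
// Illustrative items; the real ones are collected while scanning docs and blog.
const collectedDocs = [
  { filePath: "docs/intro.md", pageUrl: "/docs/intro", source: "docs" },
  { filePath: "blog/launch.md", pageUrl: "/blog/launch", source: "blog" },
  { filePath: "docs/guide.md", pageUrl: "/docs/guide", source: "docs" },
];

// Split once so each consumer receives only the list it should see.
function partitionBySource(items) {
  return {
    docsItems: items.filter((d) => d.source === "docs"),
    blogItems: items.filter((d) => d.source === "blog"),
  };
}

const { docsItems, blogItems } = partitionBySource(collectedDocs);

// Legacy outputs (llms.txt ordering, markdown copies) would take docsItems;
// the llms-full variants would take docsItems and blogItems respectively.
console.log(docsItems.map((d) => d.pageUrl));
console.log(blogItems.map((d) => d.pageUrl));
```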

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@plugins/plugin-generate-llms.js` around lines 190 - 215, orderDocs and the
legacy-markdown/llms emitters are currently iterating over collectedDocs which
includes blogs; restrict legacy outputs to docs-only by filtering for entries
where doc.source === "docs". Update orderDocs (and the similar block around
writeMarkdownCopies / llms generation at the other location) to either accept a
filtered array (e.g., collectedDocs.filter(d => d.source === "docs")) or add a
guard inside the loop (skip if doc.source !== "docs") so ordered, duplicates,
and legacy markdown/llms.txt only reflect docs.

🧹 Nitpick comments (1)

plugins/plugin-generate-llms.js (1)

312-340: Derive header URLs from siteConfig.url.

The source delimiters already use siteConfig.url; doing the same in the generated headers keeps preview/staging/domain changes consistent.

♻️ Proposed refactor

                     if (docsItems.length > 0) {
+                        const siteUrl = url.replace(/\/$/, "");
                         const header =
                             `# Envio: Full Documentation for LLMs\n\n` +
-                            `> Every page of docs.envio.dev concatenated as markdown, ` +
+                            `> Every page of ${siteUrl} concatenated as markdown, ` +
                             `with per-page source URLs, for direct ingestion into ` +
-                            `LLM context windows. Pair with https://docs.envio.dev/llms.txt ` +
+                            `LLM context windows. Pair with ${siteUrl}/llms.txt ` +
                             `for the navigational index.`;

                     if (blogItems.length > 0) {
+                        const siteUrl = url.replace(/\/$/, "");
                         const header =
                             `# Envio: Full Blog and Case Studies for LLMs\n\n` +
-                            `> Every blog post and case study on docs.envio.dev ` +
+                            `> Every blog post and case study on ${siteUrl} ` +
                             `concatenated as markdown, with per-page source URLs. ` +
-                            `Pair with https://docs.envio.dev/llms-full.txt for ` +
+                            `Pair with ${siteUrl}/llms-full.txt for ` +
                             `technical documentation.`;
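
As a standalone illustration of the refactor, the sketch below builds the docs header from a configured site URL. `buildDocsHeader` is a hypothetical helper, and the preview URL is a placeholder; the header wording is copied from the diff above.

```javascript
// Build the llms-full.txt header from the configured site URL instead of a
// hardcoded domain. A trailing slash is stripped so joins stay clean.
function buildDocsHeader(siteConfig) {
  const siteUrl = siteConfig.url.replace(/\/$/, "");
  return (
    `# Envio: Full Documentation for LLMs\n\n` +
    `> Every page of ${siteUrl} concatenated as markdown, ` +
    `with per-page source URLs, for direct ingestion into ` +
    `LLM context windows. Pair with ${siteUrl}/llms.txt ` +
    `for the navigational index.`
  );
}

// A preview deployment then advertises its own llms.txt, not production's.
const previewHeader = buildDocsHeader({ url: "https://preview.example.com/" });
console.log(previewHeader);
```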

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@plugins/plugin-generate-llms.js` around lines 312 - 340, The headers for
llms-full.txt and llms-full-blog.txt hardcode docs.envio.dev; update them to
derive the base URL from the siteConfig.url value instead. In the block that
builds the header strings (around renderLLMSFull calls), reference
siteConfig.url (normalize to remove trailing slash if needed) when composing the
"Pair with ..." and any "source URLs" text so both headers use the configured
site URL; keep the rest of the header text intact and continue writing files to
context.outDir as before.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c12e8f89-b973-49b5-b1d7-69cef8fdf073

📥 Commits

Reviewing files that changed from the base of the PR and between d4b28da and e7504c3.

📒 Files selected for processing (1)

plugins/plugin-generate-llms.js

Jordy-Baby requested a review from nikbhintade as a code owner April 23, 2026 13:14

vercel Bot deployed to Preview April 23, 2026 13:15 View deployment

coderabbitai Bot reviewed Apr 23, 2026

Jordy-Baby enabled auto-merge (squash) April 24, 2026 10:12

nikbhintade approved these changes Apr 24, 2026

Jordy-Baby merged commit 6d3bd3e into main Apr 24, 2026
3 checks passed

Jordy-Baby deleted the chore/llms-full-generation branch April 24, 2026 11:47

DenhamPreen mentioned this pull request May 12, 2026

dp/hypersync remove tiers #912

Merged

coderabbitai Bot mentioned this pull request May 18, 2026

Improve agent discovery of llms.txt #925

Open

Jordy-Baby merged 1 commit into main from chore/llms-full-generation
