New PR for plugin implementation #307
Merged
docusaurus-plugin-llms-txt
A powerful Docusaurus plugin that generates Markdown versions of your HTML pages and creates an `llms.txt` index file for AI/LLM consumption. Perfect for making your documentation easily accessible to Large Language Models while maintaining human-readable markdown files.
Features
Installation
Quick Start
Basic Setup
Add the plugin to your `docusaurus.config.js`:
Basic Configuration
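As a minimal sketch of a basic configuration, using only option names that appear in the option tables of this README: the package name `docusaurus-plugin-llms-txt` is assumed from the title above, and the site title, URL, and option values are placeholders.

```javascript
// docusaurus.config.js
// Assumption: the plugin is installed under the name shown in this README's title.
const config = {
  title: 'My Site',            // placeholder site metadata
  url: 'https://mysite.com',
  baseUrl: '/',
  plugins: [
    [
      'docusaurus-plugin-llms-txt',
      {
        siteTitle: 'My Site Documentation',
        siteDescription: 'Guides and API reference for My Site',
        depth: 2,              // group routes by their first two path segments
        content: {
          includeDocs: true,
          includeBlog: false,
        },
      },
    ],
  ],
};

module.exports = config;
```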
After building your site (`npm run build`), you'll find:
- `llms.txt` in your build output directory
Configuration Options
Main Plugin Options
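| Option | Type | Default |
| --- | --- | --- |
| `siteTitle` | `string` | |
| `siteDescription` | `string` | `undefined` |
| `depth` | `1 \| 2 \| 3 \| 4 \| 5` | `1` |
| `enableDescriptions` | `boolean` | `true` |
| `optionalLinks` | `OptionalLink[]` | `[]` |
| `includeOrder` | `string[]` | `[]` |
| `runOnPostBuild` | `boolean` | `true` |
| `onRouteError` | `'ignore' \| 'log' \| 'warn' \| 'throw'` | `'warn'` |
| `logLevel` | `0 \| 1 \| 2 \| 3` | `1` |

Content Options (`content`)
Note: All content options are optional. If you don't specify a `content` object, all options use their defaults.

| Option | Type | Default |
| --- | --- | --- |
| `enableMarkdownFiles` | `boolean` | `true` |
| `relativePaths` | `boolean` | `true` |
| `includeBlog` | `boolean` | `false` |
| `includePages` | `boolean` | `false` |
| `includeDocs` | `boolean` | `true` |
| `excludeRoutes` | `string[]` | `[]` |
| `contentSelectors` | `string[]` | |
| `routeRules` | `RouteRule[]` | `[]` |
| `remarkStringify` | `object` | `{}` |
| `remarkGfm` | `boolean \| object` | `true` |
| `rehypeProcessTables` | `boolean` | `true` |

Detailed Configuration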
Depth Configuration
The `depth` option controls how deep the hierarchical organization goes in your document tree. This is crucial for determining how your URLs are categorized.
How it works:
- `depth: 1`: `/api/users` → `api` category
- `depth: 2`: `/api/users/create` → `api/users` category
- `depth: 3`: `/api/users/create/advanced` → `api/users/create` category
Use cases:
- `depth: 1` - Simple sites with few top-level sections
- `depth: 2` - Most documentation sites with clear section/subsection structure
- `depth: 3+` - Complex sites with deep hierarchies or very specific organization needs
Optional Links
Add external or additional links to your llms.txt in a separate "Optional" section.
Structure:
Required fields:
- `title` - Display text for the link
- `url` - The URL to link to
Example:
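One way such an entry might look, as a sketch that uses only the two required fields named above. The link titles and URLs are placeholders, and the variable name `llmsTxtOptions` is just a label for the options object passed to the plugin.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
// The titles and URLs below are placeholders, not real resources.
const llmsTxtOptions = {
  optionalLinks: [
    { title: 'Community Forum', url: 'https://example.com/forum' },
    { title: 'Status Page', url: 'https://example.com/status' },
  ],
};
```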
Output:
Include Order
Controls the order in which categories appear in your llms.txt using glob patterns. Categories matching earlier patterns appear first.
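The ordering rule can be sketched in plain JavaScript. This is an illustration of the described behavior, not the plugin's implementation: `globToRegExp` and `orderByIncludeOrder` are hypothetical helpers, the glob support is limited to `**` and `*`, and placing unmatched routes last is an assumption.

```javascript
// Hypothetical helper: turn a glob into a RegExp.
// `**` matches across path segments, `*` matches within one segment.
function globToRegExp(glob) {
  const source = glob
    .split(/(\*\*|\*)/)
    .map((part) => {
      if (part === '**') return '.*';
      if (part === '*') return '[^/]*';
      return part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape literals
    })
    .join('');
  return new RegExp(`^${source}$`);
}

// Hypothetical helper: routes matching earlier patterns come first.
// Assumption: routes that match no pattern keep their order and go last.
function orderByIncludeOrder(routes, includeOrder) {
  const matchers = includeOrder.map(globToRegExp);
  const rankOf = (route) => {
    const index = matchers.findIndex((re) => re.test(route));
    return index === -1 ? matchers.length : index;
  };
  return routes
    .map((route, position) => ({ route, position, rank: rankOf(route) }))
    .sort((a, b) => a.rank - b.rank || a.position - b.position)
    .map((entry) => entry.route);
}

// Example value for the `includeOrder` option (paths are placeholders).
const includeOrder = ['/getting-started/**', '/guides/**', '/api/**'];

const ordered = orderByIncludeOrder(
  ['/api/users', '/blog/hello', '/guides/install', '/getting-started/intro'],
  includeOrder,
);
console.log(ordered);
```

Here `/blog/hello` matches none of the patterns, so the sketch places it after every matched route.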
Pattern matching rules:
- `/**` for matching entire directory trees
- `*` for single-level wildcards
Error Handling
The `onRouteError` option controls what happens when individual pages fail to process. Valid values: `'ignore'`, `'log'`, `'warn'`, `'throw'`.
- `'ignore'`: Skip failed routes silently
- `'log'`: Log failures but continue (no console output in normal mode)
- `'warn'`: Show warnings for failures but continue (recommended)
- `'throw'`: Stop entire build on first failure
Logging Levels
The `logLevel` option controls verbosity of console output. Range: 0-3 (integer).
- `0` (Quiet): Only errors and final success/failure
- `1` (Normal): Errors, warnings, and completion messages (default)
- `2` (Verbose): Above + processing info and statistics
- `3` (Debug): Above + detailed debug information
Path Configuration
The `relativePaths` option controls link format in both `llms.txt` and markdown files:
- `true`: `./getting-started/index.md`, `../api/reference.md`
- `false`: `https://mysite.com/getting-started/`, `https://mysite.com/api/reference/`
When to use relative paths:
When to use absolute paths:
Route Exclusion
Use glob patterns to exclude specific routes or route patterns:
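A sketch of how the `excludeRoutes` option might be set. The patterns are illustrative, and `llmsTxtOptions` is just a label for the options object passed to the plugin.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
const llmsTxtOptions = {
  content: {
    excludeRoutes: [
      '**/_category_/**', // auto-generated category pages
      '/tags/**',         // tag listing pages
      '**/internal/**',   // anything under an "internal" directory
    ],
  },
};
```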
Common exclusion patterns:
- `**/_category_/**` - Docusaurus auto-generated category pages
- `/tags/**` - Blog tag pages
- `/archive/**` - Archived content
- `**/*.xml` - Sitemap and RSS files
- `**/internal/**` - Internal documentation
Content Selectors
CSS selectors used to extract main content from HTML pages. The plugin tries each selector in order until it finds content.
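The "first selector that finds content wins" behavior can be sketched as follows. This is not the plugin's code: `extractWithSelectors` is a hypothetical helper, the page is faked with a plain object instead of a real DOM, and the selectors are examples only.

```javascript
// Hypothetical helper: try each selector in order and return the first hit.
// `query` stands in for a DOM lookup such as document.querySelector.
function extractWithSelectors(selectors, query) {
  for (const selector of selectors) {
    const content = query(selector);
    if (content && content.trim() !== '') {
      return { selector, content };
    }
  }
  return null; // no selector produced content
}

// Fake page: only `main` has content.
const fakePage = { main: '<p>Hello</p>' };
const lookup = (selector) => fakePage[selector] ?? null;

const result = extractWithSelectors(['.theme-doc-markdown', 'article', 'main'], lookup);
console.log(result);
```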
Default selectors (used when `contentSelectors` is not specified):
How it works:
- `[]` will use the default selectors above
Custom selectors for different themes:
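As a sketch of a custom selector list: the selectors below are illustrative guesses for a themed site, not the plugin's defaults, and should be checked against your own rendered HTML.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
// Selectors are listed from most specific to most general.
const llmsTxtOptions = {
  content: {
    contentSelectors: [
      '.theme-doc-markdown', // main doc body in the classic Docusaurus theme
      'main article',
      'main',                // broad fallback
    ],
  },
};
```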
Important notes:
- `'main'` or `'article'`)
Debugging content extraction:
- Set `logLevel: 3` to see which selector is being used for each page
- Test a selector in the browser console with `document.querySelector('your-selector')`
Markdown Processing
`remarkStringify` Options
Controls how the HTML→Markdown conversion formats the output. These options are passed directly to the remark-stringify library.
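A sketch of what a `remarkStringify` object might contain. The keys are remark-stringify formatting options; the values are one possible choice, not the plugin's defaults.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
const llmsTxtOptions = {
  content: {
    remarkStringify: {
      bullet: '-',           // marker for unordered lists
      emphasis: '_',         // marker for emphasis
      fences: true,          // always use fenced code blocks
      listItemIndent: 'one', // one space after the list marker
    },
  },
};
```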
📖 For complete option reference, see: remark-stringify options
`remarkGfm` Options
Controls table processing, strikethrough text, task lists, and other GitHub-style markdown features. These options are passed directly to the remark-gfm library.
Default values when `remarkGfm: true`:
When you set `remarkGfm: true`, the plugin automatically applies these defaults:
You can override any of these by providing an object instead of `true`:
📖 For complete option reference, see: remark-gfm options
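As a sketch of the object form of `remarkGfm`: the keys are remark-gfm options, while the values are illustrative and do not represent the plugin's defaults.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
const llmsTxtOptions = {
  content: {
    remarkGfm: {
      singleTilde: false,     // require ~~double tildes~~ for strikethrough
      tableCellPadding: true, // pad table cells with spaces
      tablePipeAlign: false,  // do not align pipes across rows
    },
  },
};
```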
`rehypeProcessTables` (Plugin Option)
Type: `boolean` | Default: `true`
This is a plugin-specific option that controls whether HTML tables are processed for better markdown conversion. When enabled, the plugin converts HTML tables into clean markdown tables. When disabled, tables are left as raw HTML in the markdown output.
Route Rules
Route rules provide powerful per-route customization capabilities. They allow you to override any processing option for specific routes or route patterns.
Basic Route Rule Structure
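The shape of a rule can be sketched from the fields and constraints listed in this section. The JSDoc type is a reading of that list, and `validateRouteRule` is a hypothetical helper written for illustration, not a function the plugin exports.

```javascript
/**
 * @typedef {Object} RouteRule
 * @property {string} route                 Glob pattern to match routes (required)
 * @property {number} [depth]               Integer from 1 to 5
 * @property {string} [categoryName]        Any string
 * @property {string[]} [contentSelectors]  CSS selector strings
 * @property {string[]} [includeOrder]      Glob pattern strings
 */

// Hypothetical helper: returns a list of problems found in a rule.
function validateRouteRule(rule) {
  const problems = [];
  const isStringArray = (value) =>
    Array.isArray(value) && value.every((item) => typeof item === 'string');

  if (typeof rule.route !== 'string' || rule.route.length === 0) {
    problems.push('route must be a non-empty glob string');
  }
  if (
    rule.depth !== undefined &&
    !(Number.isInteger(rule.depth) && rule.depth >= 1 && rule.depth <= 5)
  ) {
    problems.push('depth must be an integer from 1 to 5');
  }
  if (rule.categoryName !== undefined && typeof rule.categoryName !== 'string') {
    problems.push('categoryName must be a string');
  }
  for (const key of ['contentSelectors', 'includeOrder']) {
    if (rule[key] !== undefined && !isStringArray(rule[key])) {
      problems.push(`${key} must be an array of strings`);
    }
  }
  return problems;
}

// A rule with placeholder values, and one that breaks two constraints.
const apiRule = { route: '/api/**', depth: 3, categoryName: 'API Reference' };
console.log(validateRouteRule(apiRule));      // []
console.log(validateRouteRule({ depth: 9 })); // two problems: route and depth
```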
Required fields:
- `route` - The glob pattern to match routes against
Optional fields (all have validation constraints):
- `depth` - Must be integer 1-5
- `categoryName` - Any string
- `contentSelectors` - Array of CSS selector strings
- `includeOrder` - Array of glob pattern strings
Route Pattern Matching
Route rules use glob patterns to match routes:
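A few illustrative `route` patterns, as a sketch. The paths are placeholders, and the comments describe conventional glob semantics rather than behavior verified against the plugin.

```javascript
// Example route rules showing different glob shapes.
const routeRules = [
  { route: '/api/**' },       // everything under /api, at any depth
  { route: '/docs/*' },       // direct children of /docs only
  { route: '/blog/2024/**' }, // one subtree of the blog
];
```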
Rule Priority
When multiple rules match the same route, the most specific rule wins:
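As a sketch of two overlapping rules: by the rule stated above, a route such as `/api/reference/users` would be expected to use the second, narrower rule. How the plugin measures specificity is not described here, so treat this as an illustration only; the paths and values are placeholders.

```javascript
// Both rules match /api/reference/users; the narrower pattern is expected to win.
const routeRules = [
  { route: '/api/**', depth: 2 },
  { route: '/api/reference/**', depth: 3, categoryName: 'API Reference' },
];
```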
Advanced Examples
API Documentation with Custom Structure:
Multi-Language Documentation:
CLI Commands
The plugin provides CLI commands for standalone operation and cleanup:
Generate Command
Generates `llms.txt` and markdown files using cached routes from a previous build.
Arguments:
- `siteDir` (optional) - Path to your Docusaurus site directory. Defaults to current working directory.
Prerequisites:
You must run `npm run build` first to create the route cache.
Examples:
How it works:
When to use:
- `runOnPostBuild: false` configured
Clean Command
Removes all generated markdown files and `llms.txt` using cached file information.
Arguments:
- `siteDir` (optional) - Path to your Docusaurus site directory. Defaults to current working directory.
Options:
- `--clear-cache` - Also clear the plugin cache directory
Examples:
What gets cleaned:
- `llms.txt` index file
- `--clear-cache`: The entire plugin cache directory
Safe operation:
When to use:
- `enableMarkdownFiles: true/false`
Advanced Configuration Examples
Multi-Language Support
API Documentation Focus
Blog-Heavy Site
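One possible configuration for a blog-heavy site, as a sketch. It uses only options documented above; the route patterns and values are assumptions about a typical site layout.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
const llmsTxtOptions = {
  depth: 2,
  includeOrder: ['/blog/**', '/docs/**'], // list blog content first
  content: {
    includeBlog: true,
    includeDocs: true,
    excludeRoutes: ['/tags/**', '/archive/**'], // skip tag and archive pages
  },
};
```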
Custom Content Extraction
Understanding the Output
llms.txt Structure
The generated `llms.txt` follows this structure:
Markdown Files
When `enableMarkdownFiles` is true, individual markdown files are created for each page:
- `relativePaths` setting
Troubleshooting
Common Issues
"No cached routes found"
- Run `npm run build` first to generate the cache
- `docusaurus.config.js`
Empty or minimal content
- Check your `contentSelectors` configuration
- Use `logLevel: 3` for debug output
Route processing failures
- Use `onRouteError: 'ignore'` to skip problematic routes
- Use `logLevel: 2` to see which routes are failing
- Use `excludeRoutes` to filter out problematic paths
Debug Configuration
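A sketch of a configuration aimed at debugging, combining the options mentioned in the troubleshooting notes above. The values are one reasonable choice, not a recommended default.

```javascript
// Options object for the plugin entry in docusaurus.config.js.
const llmsTxtOptions = {
  logLevel: 3,          // debug output, including which selector matched
  onRouteError: 'warn', // report failing routes but keep building
};
```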
Performance Optimization
For large sites:
Caching
The plugin uses intelligent caching to speed up subsequent builds:
- `.docusaurus/docusaurus-plugin-llms-txt/`
- `llms-txt-clean --clear-cache` to reset the cache
License
MIT