Add llms-full.txt for LLM accessibility of documentation #355

@Wenzel

Summary

The TSFFS documentation at https://intel.github.io/tsffs/ is not discoverable or easily consumable by LLMs (e.g. for RAG pipelines, context-window injection, or AI coding assistants).

Specifically:

  • No /llms.txt or /llms-full.txt (per the emerging llmstxt.org standard)
  • No /sitemap.xml or /robots.txt
  • Content is split across ~77 pages; an LLM agent has no way to enumerate them without recursively parsing HTML nav links
  • print.html (the full single-page build) exists but is not advertised anywhere

Proposed fix

Add a llms-full.txt generation step to the docs CI pipeline. The Markdown source in docs/src/ is already well-structured and SUMMARY.md provides the correct chapter order.

A single shell command, added after the mdbook build step in .github/workflows/docs.yml, is sufficient:

grep -oP '\([^)]+\.md\)' src/SUMMARY.md | tr -d '()' | while read -r f; do
  echo; cat "src/$f"
done > book/html/llms-full.txt

This:

  • Requires no new tools, dependencies, or files
  • Produces a ~213 KB, ~6000-line concatenated Markdown file at /llms-full.txt
  • Automatically stays up to date as docs change
  • Is small enough to ingest in a single context window on long-context LLMs
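
For anyone who prefers the step as a reusable function, here is a minimal sketch of the same concatenation. The name build_llms_full and its two arguments are hypothetical, chosen only for illustration. It uses grep -oE instead of grep -oP so it does not depend on PCRE support, and it warns on stderr when SUMMARY.md lists a chapter that does not exist rather than letting cat fail mid-stream.

```shell
# Sketch only: build_llms_full is a hypothetical helper, not part of mdBook or TSFFS.
# Usage: build_llms_full <src_dir> <out_file>
build_llms_full() {
  src_dir="$1"
  out_file="$2"
  # Pull "(path.md)" link targets out of SUMMARY.md in the order they appear,
  # strip the parentheses, and concatenate each chapter behind a blank line.
  grep -oE '\([^)]+\.md\)' "$src_dir/SUMMARY.md" | tr -d '()' | while read -r f; do
    if [ -f "$src_dir/$f" ]; then
      echo
      cat "$src_dir/$f"
    else
      # Listed in SUMMARY.md but not on disk: report it and keep going.
      echo "warning: $f is listed in SUMMARY.md but missing" >&2
    fi
  done > "$out_file"
}
```

With the paths used in the command above, the call would be build_llms_full src book/html/llms-full.txt, run from the docs directory after mdbook build.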

Verification

Tested locally: the output flows correctly from intro → setup → config → harnessing → fuzzing → tutorials → developer docs, with all code examples intact.
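
The local check could also be scripted. The following is a rough sketch, assuming the output was produced by the command in this issue (one blank line emitted before each chapter). The function name check_llms_full is hypothetical. It compares the line count of the generated file against the sum of the chapter line counts plus one separator line per chapter, and fails if a chapter listed in SUMMARY.md is missing.

```shell
# Sketch only: check_llms_full is a hypothetical helper, not an existing tool.
# Usage: check_llms_full <src_dir> <generated_file>
check_llms_full() {
  src_dir="$1"
  out_file="$2"
  expected=0
  # Chapter paths in SUMMARY.md order, one per line.
  chapters=$(grep -oE '\([^)]+\.md\)' "$src_dir/SUMMARY.md" | tr -d '()')
  while read -r f; do
    [ -n "$f" ] || continue
    if [ ! -f "$src_dir/$f" ]; then
      echo "missing chapter: $f" >&2
      return 1
    fi
    n=$(wc -l < "$src_dir/$f")
    # Each chapter contributes its own lines plus the blank separator line.
    expected=$((expected + n + 1))
  done <<EOF
$chapters
EOF
  actual=$(wc -l < "$out_file")
  if [ "$actual" -ne "$expected" ]; then
    echo "line count mismatch: got $actual, expected $expected" >&2
    return 1
  fi
  echo "ok: $actual lines, $(wc -c < "$out_file") bytes"
}
```

A line-count match does not prove the content is correct, but it would catch a truncated file or a chapter that was silently skipped.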
