This repository builds single-file targets from documentation, primarily so they can be fed into LLMs to chat with the docs.
The normal update target initializes and updates the public submodules:

    make update

After that, build the available single-file documentation targets:

    make single_file

Some documentation sources are optional private submodules.
They are marked with `update = none` in `.gitmodules`, so `make update` skips their initial clone.
This lets users without access update the public submodules and build the public documentation without Git trying to clone private repositories.
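
To see which submodules are affected, the `.gitmodules` file can be inspected directly. The following is a minimal sketch, not part of the repository: the helper name `list_skipped_submodules` is hypothetical, and it assumes the usual `.gitmodules` layout of one `[submodule "..."]` section per submodule with a `path` entry.

```shell
# List the paths of submodules whose .gitmodules section sets "update = none".
# $1: path to a .gitmodules file (hypothetical helper, for illustration only).
list_skipped_submodules() {
    awk '
        # Print the previous section if it was marked, then reset the state.
        function emit() { if (path != "" && skipped) print path; path = ""; skipped = 0 }
        /^\[submodule /                         { emit() }
        /^[ \t]*path[ \t]*=/                    { p = $0; sub(/^[^=]*=[ \t]*/, "", p); path = p }
        /^[ \t]*update[ \t]*=[ \t]*none[ \t]*$/ { skipped = 1 }
        END                                     { emit() }
    ' "$1"
}
```

Calling `list_skipped_submodules .gitmodules` from the repository root would print one path per optional submodule. `git config -f .gitmodules --get-regexp '\.update$'` gives similar information in key/value form.
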
For a fresh checkout where you also have access to the private submodules, run both update targets:

    make update update-private

`make update-private` only initializes and updates the optional private submodules.
It does not replace `make update`, because it does not initialize or update the public submodules.
The two targets are independent and complementary; the combined command above runs the public update first, then the private update.
After a private submodule has been initialized once, the regular `make update` target will also run that project's own `update` target.
Run `make update-private` again when you are setting up a fresh checkout, when a private submodule was not initialized before, or when you specifically want to refresh only the optional private submodules.
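
The recipe behind `update-private` is not shown here. As a rough sketch of one way such a target could initialize submodules marked `update = none`, the function below passes `--checkout` to `git submodule update`, which overrides the configured update mode. The function name, the `DRY_RUN` switch, and the example path are assumptions made for illustration.

```shell
# Initialize the given submodule paths even if .gitmodules sets "update = none".
# Set DRY_RUN=1 to print the git commands instead of running them.
# (Hypothetical helper; the repository's real recipe may differ.)
init_private_submodules() {
    for path in "$@"; do
        # --checkout on the command line overrides submodule.<name>.update.
        ${DRY_RUN:+echo} git submodule update --init --checkout -- "$path"
    done
}
```

For example, `DRY_RUN=1; init_private_submodules isambard-docs/git` only prints the command, which is a cheap way to check what would run before touching a checkout.
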
The root `build/` directory, when present, is aggregate output.
Every other first-level project directory is its own documentation project.
Each project must contain:

- `makefile` with targets `single_file`, `all`, `clean`, `Clean`, `update`.
    - `single_file` should make the best single-file target, and make a copy at `build/` with the name of that subdirectory and an appropriate extension.
    - `all` makes all single-file targets, including `single_file` and optionally other alternative single-file targets.
    - `clean` cleans up generated files.
    - `Clean` is a more thorough cleanup, including other things that are more expensive to generate, such as the environment.
    - `update` updates the doc to point to the latest version, and also updates the environment if present.
- `.gitignore`
- optionally `README.md`
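
The layout above can be checked mechanically. This is a minimal sketch under stated assumptions: `check_project` is a hypothetical helper, and it uses a simple heuristic that only recognizes rules written as `target:` at the start of a line.

```shell
# Check that a project directory follows the layout described above.
# Prints each missing item and returns non-zero if anything is missing.
# $1: project directory (hypothetical helper, for illustration only).
check_project() {
    dir=$1
    missing=0
    for file in makefile .gitignore; do
        if [ ! -f "$dir/$file" ]; then
            echo "missing file: $file"
            missing=1
        fi
    done
    for target in single_file all clean Clean update; do
        # Heuristic: look for a rule line that starts with "<target>:".
        if ! grep -Eq "^${target}[[:space:]]*:" "$dir/makefile" 2>/dev/null; then
            echo "missing target: $target"
            missing=1
        fi
    done
    return "$missing"
}
```

The match is case-sensitive, so `clean` and `Clean` are checked as two separate targets. Rules that list several targets on one line would be reported as missing by this heuristic.
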
Some patterns for creating single-file documentation are discussed below.
Usually we work from the source of a repository. In that case, add it as a submodule first to track it:

    git submodule add URL PROJECT/git

Usually the environment needed to build a doc is non-trivial. In that case, a reproducible environment should be included, for example via pixi.
In some cases, the tools involved are commonly available on UNIX, so no custom environment is created.
Current projects:
| Project | Pattern | Source | Environment | Primary single file |
|---|---|---|---|---|
| `conda-forge` | Build, extract & convert | public submodule, Docusaurus | pixi + npm | Markdown |
| `devbox` | Build, serve, crawl, extract & convert | public submodule, Mintlify | pixi + pnpm | Markdown |
| `flox` | Build, serve, crawl, extract & convert | public submodule, Mintlify | pixi + npm | Markdown |
| `isambard-docs` | Tweak & rebuild | optional private submodule, MkDocs | pixi | Markdown |
| `mamba` | Simply build | public submodule, Sphinx + Doxygen | pixi | Markdown |
| `nersc` | Tweak & rebuild | public submodule, MkDocs | pixi | Markdown |
| `pandoc` | Simply download | release artifact | system tools | Markdown |
| `pixi` | Tweak & rebuild | public submodule, MkDocs | pixi | Markdown |
| `Pkg.jl` | Simply download | release artifact | system tools | |
| `python-patterns` | Simply build | public submodule, Sphinx | pixi | Markdown |
| `spack` | Simply build | public submodule, Sphinx | pixi | plain text |
Examples of the "Simply download" pattern: `pandoc`, `Pkg.jl`
In some cases, the documentation framework used in a project has no option to produce single-file documentation. We then simply concatenate all relevant doc files and call it a day.
There are no current projects using this pattern.
Some projects do not provide the source of documentation, or the source is not available locally. We use this recipe instead:
- crawl with wget
- convert to Markdown with pandoc
- concatenate
Previous example: flox
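
The crawl, convert, and concatenate recipe can be sketched as three small shell functions. This is an illustration under assumptions, not the recipe the previous `flox` target used: the function names are hypothetical, the wget options are one reasonable choice for mirroring a documentation subtree, and GitHub-flavored Markdown (`gfm`) is an arbitrary choice of pandoc output format.

```shell
# 1. Crawl: mirror the pages under a URL into a local directory.
# $1: start URL, $2: download directory.
crawl_site() {
    wget --recursive --no-parent --adjust-extension \
         --directory-prefix="$2" "$1"
}

# 2. Convert: write a Markdown file next to every downloaded HTML page.
# $1: download directory.
convert_pages() {
    find "$1" -type f -name '*.html' | while IFS= read -r page; do
        pandoc --from=html --to=gfm "$page" --output="${page%.html}.md"
    done
}

# 3. Concatenate: join the Markdown files in a stable, sorted order,
#    separating pages with a blank line. $1: download directory.
concat_pages() {
    find "$1" -type f -name '*.md' | LC_ALL=C sort | while IFS= read -r page; do
        cat "$page"
        echo
    done
}
```

Sorting by path keeps the output reproducible between runs, but it does not follow the site's navigation order. That is one reason a crawl of the public site gives a rougher result than building from source.
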
We use the project's original build system to produce a single-file target that upstream does not provide.
Examples: `python-patterns`, `spack`, `mamba` with Sphinx
We dive into the doc build framework and tweak it so that a single-file target is produced.
Examples:

- `nersc`, `pixi`, `isambard-docs` with MkDocs and the additional plugin `print-site`
Some documentation sites can build a complete HTML site, but the built site is split across routes and contains navigation or generated pages we do not want verbatim. We build the upstream site, extract the documentation body into one HTML file, then convert that file to Markdown or plain text with pandoc.
Examples:

- `conda-forge` with Docusaurus
Some documentation frameworks do not expose a static build artifact or a single-file output, but they can serve a rendered local preview. In that case, we use the upstream source to start a local preview server, crawl the rendered routes from that server, extract the documentation body from each page, combine the extracted bodies into one HTML file, then convert that file to Markdown or plain text with pandoc.
This differs from crawling the public website. Because we build and serve the site ourselves, we can use the repository's navigation metadata as the site map, include local source changes that are not deployed publicly, pin and reproduce the toolchain, and control the extraction and cleanup before conversion. It also avoids depending on the public site's current deployment, redirects, analytics, robots rules, or unrelated page chrome.
Examples:

- `devbox` with Mintlify
- `flox` with Mintlify. The Flox target uses the new `flox/docs` repository, installs a local pinned Mintlify CLI with npm, serves the preview locally, crawls the routes listed in `docs.json`, extracts rendered documentation content, and converts it with pandoc.
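
The crawl step of this pattern can be sketched as follows. This is a simplified illustration, not the code used by the `devbox` or `flox` targets. It assumes that a preview server is already running, that its base URL is passed in (for example `http://localhost:3000`, which is an assumption about the local setup), and that the routes have already been written to a plain text file, one per line. How that list is derived from `docs.json` is not shown. Both function names are hypothetical.

```shell
# Map a route such as "/concepts/environments" to a flat file name.
route_to_filename() {
    printf '%s.html\n' "$(printf '%s\n' "$1" | sed 's|^/||; s|/|__|g')"
}

# Fetch each listed route from a local preview server.
# $1: base URL, $2: file with one route per line, $3: output directory.
crawl_routes() {
    mkdir -p "$3"
    n=0
    while IFS= read -r route; do
        [ -n "$route" ] || continue
        n=$((n + 1))
        # A zero-padded counter keeps the pages in the order of the route list.
        curl --fail --silent --show-error "$1/${route#/}" \
             --output "$3/$(printf '%04d' "$n")_$(route_to_filename "$route")"
    done < "$2"
}
```

Because the file names carry the position in the route list, a later step can extract and combine the pages in navigation order by sorting the file names.
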