Replace sphinx-multiversion with per-branch docs builds#5226
Replace sphinx-multiversion with per-branch docs builds#5226AntoineRichard wants to merge 7 commits intodevelopfrom
Conversation
Remove sphinx-multiversion (unmaintained since 2020) and replace with CI-native per-branch builds. Each deploy branch now builds only its own docs using plain sphinx-build, then deploys to a versioned subdirectory on gh-pages with keep_files: true. Changes: - Rewrite docs.yaml: single-version build + per-branch deploy - Add docs-migration.yaml: one-time workflow to rebuild all versions - Add version-switcher.js: injected into old versions for navigation - Add pydata-sphinx-theme version switcher config to conf.py - Remove sphinx_multiversion extension, smv_* config, versioning.html - Remove multi-docs Makefile target
Same changes as the develop PR (#5226), applied to main's diverged files. Removes sphinx-multiversion, adds per-branch CI builds with version switcher, and one-time migration workflow.
Greptile SummaryThis PR replaces
Confidence Score: 3/5Not safe to merge without addressing the versions.json race condition and the empty DOCS_VERSION_SLUG for PRs, which could break the version switcher and PR doc builds. Three P1 findings: a genuine race condition that can silently drop entries from the version switcher when concurrent branch deploys occur, an empty env var that bypasses the Python default and may cause a Sphinx -W error on every PR build, and a new unpinned third-party action that violates the repo's explicit supply-chain policy. .github/workflows/docs.yaml — race condition in versions.json upsert (lines 137-196) and empty DOCS_VERSION_SLUG for PRs (lines 59-69); also peaceiris/actions-gh-pages@v4 pinning in both workflow files. Important Files Changed
Sequence DiagramsequenceDiagram
participant Dev as Developer
participant CI as GitHub Actions
participant Pages as gh-pages
participant Switcher as versions.json
Dev->>CI: Push to main/develop/release/**
CI->>CI: build-docs (sphinx-build -W)
CI->>CI: Upload artifact docs-html
CI->>Pages: Fetch current versions.json
Pages-->>CI: existing entries
CI->>CI: Upsert this branch's entry
CI->>CI: Assemble site/slug + inject version-switcher.js
CI->>Pages: Deploy with keep_files:true
Pages->>Switcher: Overwrite versions.json
Note over CI,Pages: Race: concurrent branch deploys both fetch same versions.json, last writer wins
Dev->>CI: PR opened/updated
CI->>CI: build-docs only (no deploy)
Note over CI: DOCS_VERSION_SLUG empty for PRs
|
.github/workflows/docs.yaml
Outdated
| - name: Update versions.json | ||
| env: | ||
| SLUG: ${{ steps.slug.outputs.slug }} | ||
| run: | | ||
| # Fetch existing versions.json from gh-pages (may not exist yet) | ||
| PAGES_URL="https://isaac-sim.github.io/IsaacLab/versions.json" | ||
| if curl -sf "$PAGES_URL" -o existing.json; then | ||
| echo "Fetched existing versions.json" | ||
| else | ||
| echo "[]" > existing.json | ||
| echo "No existing versions.json, starting fresh" | ||
| fi | ||
|
|
||
| # Determine display name | ||
| REF="${{ github.ref }}" | ||
| if [[ "$REF" == refs/tags/* ]]; then | ||
| NAME="${SLUG}" | ||
| else | ||
| NAME="${SLUG}" | ||
| fi | ||
|
|
||
| # Upsert this version's entry | ||
| python3 -c " | ||
| import json, sys | ||
|
|
||
| slug = '${SLUG}' | ||
| name = '${NAME}' | ||
|
|
||
| with open('existing.json') as f: | ||
| versions = json.load(f) | ||
|
|
||
| # Remove existing entry for this slug | ||
| versions = [v for v in versions if v['version'] != slug] | ||
|
|
||
| # Add updated entry | ||
| entry = { | ||
| 'name': name, | ||
| 'version': slug, | ||
| 'url': '/IsaacLab/' + slug + '/', | ||
| } | ||
| versions.append(entry) | ||
|
|
||
| # Sort: branches first (main, develop), then releases, then tags by version | ||
| def sort_key(v): | ||
| s = v['version'] | ||
| if s == 'main': | ||
| return (0, s) | ||
| elif s == 'develop': | ||
| return (1, s) | ||
| elif s.startswith('release-'): | ||
| return (2, s) | ||
| else: | ||
| return (3, s) | ||
|
|
||
| versions.sort(key=sort_key) | ||
|
|
||
| with open('site/versions.json', 'w') as f: | ||
| json.dump(versions, f, indent=2) | ||
| f.write('\n') | ||
| " |
There was a problem hiding this comment.
Race condition on
versions.json
When two branches push at the same time (e.g. main and develop), both jobs concurrently fetch the same initial versions.json, each add only their own entry, and the last to deploy overwrites the file — silently dropping the other branch's entry from the switcher until that branch is next pushed.
The concurrency group scopes cancellation to ${{ github.ref }}, so same-ref runs cancel each other but different refs run freely in parallel. In practice, main and develop often advance together (e.g. after a version bump).
A safe fix: make the upsert atomic by pushing to a dedicated branch and using git merge semantics, or add a retry loop with optimistic locking on the JSON file. A simpler but less correct workaround is to use a separate repository_dispatch event from each deploy that serialises all versions.json writes into a single workflow.
.github/workflows/docs.yaml
Outdated
| cat site/versions.json | ||
|
|
||
| - name: Deploy to gh-pages | ||
| uses: peaceiris/actions-gh-pages@v4 |
There was a problem hiding this comment.
New third-party action not SHA-pinned
peaceiris/actions-gh-pages@v4 is a new third-party action not present in any other workflow in this repo. Per the project's CI policy (AGENTS.md), all actions must be pinned to a commit SHA (action@<sha> # vX.Y.Z) and new actions require repo admin approval before use.
This applies to docs-migration.yaml line 168 as well.
.github/workflows/docs.yaml
Outdated
| run: | | ||
| REF="${{ github.ref }}" | ||
| if [[ "$REF" == refs/tags/* ]]; then | ||
| SLUG="${REF#refs/tags/}" | ||
| elif [[ "$REF" == refs/heads/* ]]; then | ||
| SLUG="${REF#refs/heads/}" | ||
| SLUG="${SLUG//\//-}" | ||
| else | ||
| SLUG="" | ||
| fi | ||
| echo "slug=$SLUG" >> "$GITHUB_OUTPUT" |
There was a problem hiding this comment.
Empty
DOCS_VERSION_SLUG for PR builds
For pull-request events, github.ref is refs/pull/<n>/merge, which does not match either branch pattern. The else branch leaves SLUG="", so the make current-docs step gets DOCS_VERSION_SLUG="" in its environment.
In conf.py, os.getenv("DOCS_VERSION_SLUG", version) returns "" (not the fallback version) when the variable is set-but-empty. This makes version_match: "" in the switcher config, which will not match any entry in versions.json. Depending on how sphinx-book-theme/pydata-sphinx-theme handle an unmatched version, this could emit a Sphinx warning that's promoted to an error by the -W flag in the Makefile, breaking the PR build.
Or exclude the variable from the env: block when slug is empty so os.getenv sees it as unset and uses the default version.
| // Works for both /IsaacLab/main/... and custom domain setups. | ||
| var pathParts = window.location.pathname.split("/").filter(Boolean); | ||
| // Expect at least /<project>/<version>/... | ||
| var basePath = "/" + pathParts[0] + "/"; | ||
| var currentSlug = pathParts[1] || ""; | ||
| // The relative page path within the version (e.g. "api/index.html"). | ||
| var pagePath = pathParts.slice(2).join("/") || "index.html"; |
There was a problem hiding this comment.
Base-path detection fails for custom domain root deployments
The comment on line 24 claims the script "works for both /IsaacLab/main/… and custom domain setups," but the logic is only correct for the GitHub Pages sub-path case.
For a custom domain where docs live at the root (e.g. https://docs.isaaclab.io/main/page.html):
pathParts = ["main", "page.html"]basePath = "/main/"← wrong, should be"/"currentSlug = "page.html"← wrong, should be"main"JSON_URL = "/main/versions.json"← 404
This is a pre-wired assumption of a two-level URL prefix. If a custom domain is ever used, the switcher will silently fail to load. Consider removing the misleading comment or parameterising basePath via a data attribute set at injection time.
| with: | ||
| github_token: ${{ secrets.GITHUB_TOKEN }} | ||
| publish_dir: ./site | ||
| keep_files: false |
There was a problem hiding this comment.
keep_files: false all-or-nothing behaviour is undocumented
The migration deploy uses keep_files: false, which fully replaces gh-pages. If any matrix build job fails (even though fail-fast: false keeps other jobs running), the deploy-all job is skipped because needs: [build-version] requires all matrix jobs to succeed. In that scenario, the old gh-pages content remains, which is safe.
However, it is worth documenting this intentional "all-or-nothing" behaviour with a comment, since it's non-obvious that a single failed historical tag prevents the entire migration from deploying.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Review: Replace sphinx-multiversion with per-branch docs builds
Great initiative — sphinx-multiversion has been a well-known pain point and the per-branch build pattern (à la NumPy/CPython) is the right direction. The design spec is thorough and the overall architecture is sound. I have several issues to flag, ranging from a potential data-loss race condition to smaller improvements.
🔴 Critical: Race condition in versions.json updates (docs.yaml deploy-docs job)
The deploy-docs job fetches versions.json from the live GitHub Pages URL, upserts the current version, and writes it back. If two deploy branches push simultaneously (e.g. main and develop within the same minute), both will fetch the same versions.json, each add their entry, and whichever deploys last will overwrite the other's entry. This is a real risk since keep_files: true only preserves files not in the publish directory — both deploys write versions.json to the root.
The existing concurrency block uses ${{ github.workflow }}-${{ github.ref }} which is per-ref, so two different refs can still race.
Fix: Add a separate concurrency group on just the deploy-docs job to serialize all deploys:
deploy-docs:
concurrency:
group: docs-deploy-gh-pages
cancel-in-progress: false🟡 Medium: Shell variable injection in Python heredoc (docs.yaml lines ~119-120)
In the "Update versions.json" step, the Python script uses ${SLUG} and ${NAME} via shell interpolation directly inside Python string literals:
slug = '${SLUG}'
name = '${NAME}'If a branch name ever contains a single quote (unlikely but possible in git), this breaks the Python script — or worse, allows injection. The SLUG env var is already set for this step. Safer to use os.environ['SLUG'] inside the Python code instead of shell interpolation.
🟡 Medium: Migration workflow keep_files: false is destructive (docs-migration.yaml line 172)
The docs-migration.yaml deploy step uses keep_files: false, doing a full overwrite of gh-pages. This is correct for a one-time bootstrap, but if accidentally re-triggered after regular deploys have added new versions not in the matrix (e.g. v3.0.1), those versions get wiped. Consider:
- Adding a prominent warning comment in the workflow
- Adding a second confirmation input (e.g.
type_to_confirm) - Deleting/disabling this workflow after the migration is complete
🟢 Minor: Tag trigger pattern is broader than intended (docs.yaml line 14)
The tag pattern 'v[1-9]*' matches v1, v1-rc, v100anything, etc. The spec says v*.*.*. If you only want semver release tags, consider tightening to 'v[1-9]+.[0-9]+.[0-9]+*' or similar.
🟢 Minor: version-switcher.js uses deprecated XMLHttpRequest
The HEAD request for page-existence checking (line ~67) uses XMLHttpRequest, which is deprecated. Since docs pages target modern browsers, fetch() with { method: 'HEAD' } would be more future-proof.
🟢 Nit: Design spec committed to repo
The 281-line spec at docs/superpowers/specs/2026-04-10-multi-version-docs-design.md is thorough documentation, but consider if it belongs in the repo long-term or if the PR description is sufficient context.
Overall: Well-designed migration with a solid spec. The versions.json race condition is the main concern — adding a deploy-level concurrency group would make this production-safe. The rest is polish. 👍
- Add deploy concurrency group to serialize gh-pages writes (race fix) - Only set DOCS_VERSION_SLUG for deploy branches, unset for PRs - SHA-pin peaceiris/actions-gh-pages to v4.0.0 - Set DOCS_VERSION_SLUG in migration workflow build step - Use os.environ instead of shell interpolation in Python heredoc - Remove dead NAME conditional block - Suppress config.invalid_switcher_json warning (404 before first deploy) - Exclude superpowers/ directory from Sphinx build
- SHA-pin all GitHub Actions (checkout, setup-python, upload/download-artifact) - Add explicit permissions: (contents: read default, write for deploy) - Pass github.ref via env: block to avoid shell injection - Add empty-slug guard in deploy job (fail-fast instead of corrupt root) - Fetch versions.json from gh-pages branch via gh api (avoid CDN cache) - Add Python input validation: JSON decode errors, schema checks, empty slug - Use heredoc for inline Python (avoids shell interpolation issues) - Add sed injection verification count - Add min-count assertion in migration workflow - JS: replace empty catch with console.warn, add onerror handler, validate array
- Use 'or' fallback for DOCS_VERSION_SLUG (empty string != unset) - Add explicit GH_TOKEN to versions.json fetch step - Add sed injection verification to migration workflow - Emit ::warning:: when injection count mismatches in docs.yaml - Log non-200 HTTP responses in version-switcher.js
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Summary
This PR replaces the unmaintained sphinx-multiversion with a per-branch/tag docs build pipeline — the same pattern used by NumPy and CPython. The architecture is well-designed: each push builds only its own ref, deploys to a versioned subdirectory on gh-pages, and a JSON-driven version switcher ties everything together. The author has addressed the critical findings from the previous review (race condition fix via deploy concurrency group, shell injection fix via os.environ). A few remaining items are worth addressing before merge.
Design Assessment
Design is sound. The per-branch build model with keep_files: true is the industry standard for multi-version docs. The two-tier version switcher approach (pydata-sphinx-theme built-in for new versions, injected standalone JS for old versions) is pragmatic and handles the migration gracefully. The deploy-level concurrency group correctly serializes all deploys to prevent the versions.json race condition.
Findings
🟡 Warning: Migration workflow keep_files: false has no re-run guard — .github/workflows/docs-migration.yaml:194
The migration workflow does a full overwrite of gh-pages (keep_files: false). This is correct for the initial bootstrap, but if accidentally re-triggered after regular deploys have added versions not in the matrix (e.g., a new v3.1.0 tag deployed via docs.yaml), those versions get wiped. The dry_run input helps, but a distracted operator could still nuke live docs.
Consider adding a confirmation input or a prominent comment:
inputs:
confirm:
description: "Type 'OVERWRITE' to confirm full gh-pages replacement"
required: true
type: stringThen gate deploy-all on inputs.confirm == 'OVERWRITE'. Alternatively, add a comment noting this workflow should be deleted or disabled after the migration is complete.
🟡 Warning: Tag trigger pattern is broader than intended — .github/workflows/docs.yaml:14
The tag pattern 'v[1-9]*' matches any tag starting with v followed by a non-zero digit — including v1, v1-rc, v100anything, etc. The design spec calls for semver tags (v*.*.*). While unlikely to cause issues in practice (who creates a v1 tag?), tightening the pattern prevents accidental builds:
tags:
- 'v[1-9]+.[0-9]+.[0-9]+*'The trailing * allows pre-release suffixes like v3.0.0-beta (which is in the migration matrix).
🔵 Suggestion: Hardcoded /IsaacLab/ path in sed injection could use a variable — .github/workflows/docs.yaml:148 and .github/workflows/docs-migration.yaml:131
Both workflows hardcode /IsaacLab/ in the sed script tag injection and in versions.json URL entries. If the repository is ever renamed or forked, these would silently break. Consider deriving it from github.repository:
env:
SITE_PATH: /IsaacLab # or: /${{ github.event.repository.name }}Then use ${SITE_PATH} in the sed command and Python script. This is low-risk since renaming is unlikely, but it would make the workflows more portable.
🔵 Suggestion: version-switcher.js uses XMLHttpRequest — docs/_static/version-switcher.js:61,98
Both the versions.json fetch and the HEAD page-existence check use XMLHttpRequest, which is a legacy API. Since the docs target modern browsers, fetch() would be more concise and future-proof:
fetch(targetUrl, { method: "HEAD" })
.then(function (res) {
window.location.href = res.ok ? targetUrl : targetBase;
})
.catch(function () {
window.location.href = targetBase;
});Not blocking — XHR works fine — but worth modernizing since this is new code.
Test Coverage
This is a documentation/CI infrastructure PR with no runtime code changes. The manual test plan in the PR description is appropriate:
- Docs CI builds successfully on this PR
- After merge, trigger
docs-migration.yamlto rebuild all versions - Verify version switcher works on deployed docs
- Verify old version pages have the injected JS switcher
No automated unit tests are needed. The CI checks (pre-commit, broken links, docs build) provide adequate validation for this type of change.
One addition to the test plan: After running the migration, verify that versions.json contains all expected entries and that the sort order is correct (branches first, then releases, then tags).
CI Status
- ✅ labeler, pre-commit, Check Secrets, Check for Broken Links: passing
- ⏳ license-check, Build Docs: pending
Verdict
COMMENT
The critical issues from the prior review have been addressed. The remaining findings are all preventive improvements (migration guard, tighter tag pattern, portability, API modernization) — none are blocking. This is well-designed infrastructure work. Ship it once the pending CI checks pass and consider the migration-guard suggestion to protect against accidental re-runs.
| with: | ||
| github_token: ${{ secrets.GITHUB_TOKEN }} | ||
| publish_dir: ./site | ||
| keep_files: false |
There was a problem hiding this comment.
🟡 Warning: keep_files: false does a full overwrite of gh-pages. If this workflow is accidentally re-triggered after regular docs.yaml deploys have added new versions (e.g. v3.1.0), those versions get wiped.
Consider adding a confirmation input to gate deploy-all:
inputs:
confirm:
description: "Type 'OVERWRITE' to confirm full gh-pages replacement"
required: true
type: stringOr add a comment noting this workflow should be deleted/disabled after the migration is complete.
.github/workflows/docs.yaml
Outdated
| - 'release/**' | ||
| - 'feature/isaacsim-6-0' | ||
| tags: | ||
| - 'v[1-9]*' |
There was a problem hiding this comment.
🟡 Warning: Pattern 'v[1-9]*' matches any tag starting with v + a non-zero digit (e.g. v1, v1-rc, v100anything). The design spec calls for semver tags. Consider tightening:
tags:
- 'v[1-9]+.[0-9]+.[0-9]+*'The trailing * still allows pre-release suffixes like v3.0.0-beta.
.github/workflows/docs.yaml
Outdated
| if: needs.doc-build-type.outputs.trigger-deploy == 'true' | ||
| # Inject version-switcher.js into every HTML file | ||
| find "site/${SLUG}" -name "*.html" -exec sed -i \ | ||
| 's|</body>|<script src="/IsaacLab/version-switcher.js"></script></body>|' {} + |
There was a problem hiding this comment.
🔵 Suggestion: The /IsaacLab/ path is hardcoded here and in the Python versions.json generator. If the repository is ever renamed or forked, this silently breaks. Consider deriving from github.repository:
env:
SITE_PATH: /${{ github.event.repository.name }}Then use ${SITE_PATH} in the sed command. Low risk but improves portability.
| // Try to navigate to the same page under the new version. | ||
| var targetUrl = targetBase + pagePath; | ||
| // Use a HEAD request to check if the page exists; fall back to index. | ||
| var xhr = new XMLHttpRequest(); |
There was a problem hiding this comment.
🔵 Suggestion: XMLHttpRequest is a legacy API. Since this is new code targeting modern browsers, fetch() would be more concise:
fetch(targetUrl, { method: "HEAD" })
.then(function (res) {
window.location.href = res.ok ? targetUrl : targetBase;
})
.catch(function () {
window.location.href = targetBase;
});Not blocking — XHR works fine.
Split the monolithic docs workflow into a develop-specific workflow as part of the docs CI refactor. The new docs-dev.yml handles pushes to develop and PRs targeting develop, with a hardcoded "develop" version slug and a simplified deploy step that no longer manages versions.json or the redirect index.html (those move to docs-release.yml).
Split the unified docs workflow into release-specific handling. This workflow covers main/release branch pushes, tag deploys, and PRs against main/release branches. The develop branch is handled separately by docs-dev.yml (PR #5226). Key changes from the old unified workflow: - Remove develop branch triggers (moved to docs-dev.yml) - Add branch filter on pull_request to scope to main/release - Add /stable/ alias creation for tag deploys - Conditional root index.html only on tag deploys
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — Follow-up
New commits (b8f8fef→552f936): Split docs.yaml into docs-dev.yml for develop-only builds (fixes the empty DOCS_VERSION_SLUG issue), and added a /stable/ alias + semver-aware sorting to the migration workflow. Previous review concerns (race condition, empty slug) are addressed.
One new issue with the /stable/ alias below.
| else | ||
| echo "No tag directories found, skipping /stable/ creation" | ||
| fi | ||
|
|
There was a problem hiding this comment.
🟡 Warning: /stable/ directory gets included in versions.json as a duplicate entry
The "Generate versions.json" step iterates all directories under site/, which now includes stable alongside the tag it aliases (e.g. v3.0.0-beta). This means users see both stable and v3.0.0-beta in the version switcher, pointing to identical content.
Additionally, _version_tuple("stable") returns () (no digits), so it sorts before all versioned tags — appearing between the release-* entries and v1.0.0, which is an odd position for a "latest" alias.
Fix: Exclude stable from versions.json generation (it's an alias, not a real version):
SKIP_DIRS = {"stable"}
for name in sorted(os.listdir("site")):
path = os.path.join("site", name)
if not os.path.isdir(path) or name in SKIP_DIRS:
continue
...Alternatively, if you want stable in the switcher, handle it specially in sort_key (e.g. return (-1, ()) to put it first) and consider adding a display name like "name": "stable (v3.0.0-beta)" so users understand the alias.
Summary
sphinx-multiversion(unmaintained since 2020) from the docs buildsphinx-buildkeep_files: truepydata-sphinx-themeversion switcher (viasphinx-book-theme)version-switcher.jsinjected into old version pagesdocs-migration.yamlworkflow to rebuild all historical versionsContext
sphinx-multiversioninstalls one Sphinx version and builds all branches/tagsagainst it — no dependency isolation. When Sphinx 9.0 changed
Config.read(),every version broke. This replaces it with the industry-standard pattern used
by NumPy and CPython.
See spec:
docs/superpowers/specs/2026-04-10-multi-version-docs-design.mdTest plan
docs-migration.yamlto rebuild all versions