[CI] Restore nightly changelog cron: pyproject staging + per-package failure tolerance#5859
hujc7 wants to merge 3 commits into
🤖 Isaac Lab Review Bot
Summary
This PR fixes a CI workflow issue in the nightly changelog auto-commit step by adding source/*/pyproject.toml to the staged files.
Analysis
What changed:
- Added `source/*/pyproject.toml` to the `git add` glob pattern in `.github/workflows/nightly-changelog.yml`
- Added a 5-line comment explaining the rationale
Root cause (well-documented in PR):
- `Package.write_version` in `tools/changelog/cli.py` bumps both `config/extension.toml` AND `pyproject.toml` when a `[project]` block exists
- The workflow only staged `extension.toml`, leaving `pyproject.toml` changes unstaged
- The subsequent `git pull --rebase` fails with "You have unstaged changes"
- This was a latent issue exposed by #5785, which added `[project] version` fields to all managed packages
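For illustration, here is a minimal Python sketch of the behavior described above. The version is always bumped in `config/extension.toml`, but `pyproject.toml` is only rewritten when it has a `[project]` table, which is why the missing glob stayed latent until #5785. The function names, the assumption that `extension.toml` keeps its version under a `[package]` table, and the line-based parsing are all hypothetical; this is not the code in `tools/changelog/cli.py`.

```python
import re
from pathlib import Path

# Matches a line such as: version = "1.2.3"
_VERSION_RE = re.compile(r'^(\s*version\s*=\s*")[^"]*(".*)$')


def _bump_in_table(text: str, table: str, new_version: str) -> tuple[str, bool]:
    """Rewrite the first `version = "..."` line inside `[table]`; report whether one was found."""
    lines = text.splitlines(keepends=True)
    in_table = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # A new table header: track whether it is the table we want to edit.
            in_table = stripped == f"[{table}]"
            continue
        match = _VERSION_RE.match(line.rstrip("\n"))
        if in_table and match:
            newline = "\n" if line.endswith("\n") else ""
            lines[i] = f"{match.group(1)}{new_version}{match.group(2)}{newline}"
            return "".join(lines), True
    return text, False


def write_version(pkg_dir: Path, new_version: str) -> list[Path]:
    """Hypothetical sketch: bump extension.toml, and pyproject.toml only if it has [project]."""
    touched = []
    targets = [
        (pkg_dir / "config" / "extension.toml", "package"),  # assumed table name
        (pkg_dir / "pyproject.toml", "project"),
    ]
    for path, table in targets:
        if not path.exists():
            continue
        new_text, changed = _bump_in_table(path.read_text(), table, new_version)
        if changed:
            path.write_text(new_text)
            touched.append(path)
    return touched
```

Before #5785 a package's `pyproject.toml` without `[project]` came back untouched, so only `extension.toml` ended up dirty; once every package carried `[project] version`, both files were rewritten on every bump.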
Assessment:
✅ Correctness: The glob pattern source/*/pyproject.toml correctly matches the same package structure as the existing source/*/config/extension.toml pattern
✅ Side effects: Minimal risk - staging additional files that are already being modified by the compile step is safe. The PR notes that the downstream "Bumped packages" loop filters explicitly to extension.toml, so commit body composition is unaffected
✅ Documentation: Excellent inline comment explaining why both files need to be staged
✅ Consistency: Follows the existing pattern of the workflow
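The staging gap can be reproduced with a small Python model. It assumes the glob in the workflow's `run:` script is unquoted and expanded by the shell, so `*` matches exactly one directory level (mirrored here with `pathlib.Path.glob`). It only tracks the two version files; the other paths the workflow stages (CHANGELOG.rst, deleted fragments) are left out. `expand` and `left_unstaged` are hypothetical helpers.

```python
from pathlib import Path

# Version-file globs before and after this PR (other staged paths omitted).
OLD_ADD_GLOBS = ["source/*/config/extension.toml"]
NEW_ADD_GLOBS = OLD_ADD_GLOBS + ["source/*/pyproject.toml"]


def expand(repo: Path, patterns: list[str]) -> set[str]:
    """Expand globs like an unquoted shell argument: `*` stays within one directory."""
    found: set[str] = set()
    for pattern in patterns:
        found.update(p.relative_to(repo).as_posix() for p in repo.glob(pattern))
    return found


def left_unstaged(repo: Path, modified: set[str], patterns: list[str]) -> set[str]:
    """Modified files that `git add <patterns>` would not pick up."""
    return modified - expand(repo, patterns)
```

With both version files modified in every package, `left_unstaged(repo, modified, OLD_ADD_GLOBS)` returns exactly the `pyproject.toml` paths, which is the dirty tree that `git pull --rebase` refused; with `NEW_ADD_GLOBS` it returns an empty set.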
Verdict
LGTM — This is a clean, well-documented one-line fix (plus helpful comments) for a real CI failure. The change is minimal and correctly addresses the root cause.
Update (2026-05-29)
Reviewed new commits pushed since initial review:
New change: Tolerate per-package compile failures
- `continue-on-error: true` added to the compile step allows commit/push to proceed even if some packages fail to compile
- `Re-propagate compile failure` step ensures the job still shows as failed after successful packages ship
- The `if: always()` condition on the commit step ensures partial results are committed
✅ Sound design: This prevents a single malformed changelog fragment from blocking all other packages. Failed packages still get highlighted in the job output.
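The per-package tolerance can be sketched in Python as a loop that records a failing package and moves on, then reports a non-zero exit at the end. This assumes the compile script (not shown in this PR) is structured this way; `compile_all`, `compile_one`, and `exit_code` are hypothetical names. The `::error::` prefix is the GitHub Actions workflow command for an error annotation.

```python
from typing import Callable, Iterable


def compile_all(packages: Iterable[str], compile_one: Callable[[str], None]) -> list[str]:
    """Compile every package; collect failures instead of stopping at the first one."""
    failed: list[str] = []
    for name in packages:
        try:
            # Hypothetical per-package step: write CHANGELOG.rst, bump versions, delete fragments.
            compile_one(name)
        except Exception as exc:  # one malformed fragment must not block the other packages
            print(f"::error::{name}: {exc}")  # GitHub Actions error annotation
            failed.append(name)
    return failed


def exit_code(failed: list[str]) -> int:
    """A non-zero exit marks the step outcome as 'failure' so the job can be turned red later."""
    return 1 if failed else 0
```

Collecting failures instead of raising keeps the healthy packages' writes on disk for the commit step to ship, while the non-zero exit is what `steps.compile.outcome` later reports to the re-propagate step.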
LGTM — Both the original fix and the new resilience improvement are well-designed.
Cherry-picks the workflow tolerance piece from isaac-sim#5831 onto main so the scheduled cron (which only reads main's copy of nightly-changelog.yml) actually gets the partial-success behavior, instead of leaving it dead code on develop and release. Compile step gains ``id: compile`` + ``continue-on-error: true`` so a single package's malformed CHANGELOG.rst no longer short-circuits the entire matrix entry. Commit/push gate becomes ``always() && !inputs.dry_run`` so successful packages still ship their CHANGELOG.rst / extension.toml / deleted-fragment writes even when one package raised during compile. Trailing Re-propagate failure step re-asserts the non-zero exit so the job tile stays red and the maintainer notices.
The nightly changelog compile bumps both ``config/extension.toml`` and ``pyproject.toml`` (when a ``[project]`` version line is present). The workflow's commit step only staged ``extension.toml``, leaving the pyproject bumps as unstaged changes. The subsequent ``git pull --rebase`` refuses with ``cannot pull with rebase: You have unstaged changes`` and the matrix entry exits 128 before push. isaac-sim#5785 made every managed package carry a ``pyproject.toml`` with a ``[project] version`` field, so the pyproject write went from no-op to active and the missing glob became reachable. Add ``source/*/pyproject.toml`` to the git-add list so the rebase/push sequence runs on a clean working tree.
Greptile Summary
This PR restores the nightly changelog cron after a morning failure caused by unstaged
Confidence Score: 4/5
Safe to merge; fixes a real regression that was breaking the nightly cron, with one minor edge case worth tracking.
The pyproject.toml staging fix directly addresses the documented failure mode and is straightforward. The compile-failure tolerance logic is correct: `steps.compile.outcome` (pre-continue-on-error) is the right field to inspect, and the re-propagation step reliably keeps the job red. The one rough edge is that `always()` on the commit/push step makes it fire even on workflow cancellation, which could silently push partial changelog state to the branch; a more targeted condition would close that window without affecting the intended partial-success behavior.
Files worth a closer look: `.github/workflows/nightly-changelog.yml`, specifically the `always()` condition on the commit/push step and its interaction with workflow cancellation.
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Compile fragments\ncontinue-on-error: true] -->|outcome=success| B[Commit and push\nif: always && !dry_run]
    A -->|outcome=failure\npartial writes on disk| B
    A -->|outcome=cancelled| B
    B --> C{git diff --staged --quiet?}
    C -->|No staged changes| D[Echo: nothing to commit\nExit 0]
    C -->|Has staged changes| E[git commit\ngit pull --rebase\ngit push]
    E --> F[Re-propagate compile failure\nif: steps.compile.outcome == 'failure']
    D --> F
    F -->|compile failed| G[exit 1\nJob tile stays red]
    F -->|compile succeeded| H[Step skipped\nJob passes]
```
Reviews (1): Last reviewed commit: "Stage pyproject.toml in nightly changelo..."
```yaml
# rest still wrote their CHANGELOG.rst / deleted fragments / bumped
# extension.toml). The staged-diff check below short-circuits cleanly
# if the failed package was the only one with pending work.
if: ${{ always() && !inputs.dry_run }}
```
always() runs commit/push on job cancellation
always() is evaluated as true even when the job is cancelled mid-run. If a maintainer cancels the workflow during the compile step, steps.compile.outcome becomes 'cancelled' (not 'failure'), the commit step fires, whatever partial on-disk state the compile script left behind gets staged and pushed, and the "Re-propagate compile failure" step is skipped (it only checks for 'failure'). The branch ends up with a partial-state commit while the job tile shows "Cancelled" — easy to miss.
Using if: ${{ (success() || steps.compile.outcome == 'failure') && !inputs.dry_run }} would limit the commit step to runs where compile either fully succeeded or cleanly failed, matching the intent of the design while closing the cancellation window.
``always() && !inputs.dry_run`` evaluates as true even when the job is cancelled mid-compile (10-minute timeout, maintainer cancel). At that point ``steps.compile.outcome`` is ``'cancelled'`` (not ``'failure'``), so the commit step would run against a partial on-disk state AND the ``Re-propagate compile failure`` step would skip (it checks for ``'failure'`` specifically) — pushing a half-finished commit while the job tile shows "Cancelled". Tighten the guard to ``(success() || steps.compile.outcome == 'failure')``: fires on clean success, fires on caught failure (so partial-success still ships per the design), does NOT fire on cancellation or earlier-step failures. Comment updated to name the explicit outcomes covered.
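The difference between the two guards can be checked with a small Python model of the Actions status functions. It is a simplified model, not GitHub's evaluator: it only tracks the compile step, treats `continue-on-error: true` as turning a `'failure'` outcome into a `'success'` conclusion (the `outcome` vs `conclusion` distinction in the `steps` context), and adds the implicit `success() &&` that Actions applies to an `if:` without a status function. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    compile_outcome: str               # steps.compile.outcome: 'success', 'failure', 'cancelled' or 'skipped'
    cancelled: bool = False            # job cancelled (timeout or maintainer cancel)
    earlier_step_failed: bool = False  # e.g. checkout failed before compile ran
    dry_run: bool = False              # inputs.dry_run


def compile_conclusion(run: Run) -> str:
    # continue-on-error: true turns a 'failure' outcome into a 'success' conclusion.
    return "success" if run.compile_outcome == "failure" else run.compile_outcome


def success(run: Run) -> bool:
    # Simplified success(): nothing earlier concluded badly and the job was not cancelled.
    return (not run.cancelled and not run.earlier_step_failed
            and compile_conclusion(run) == "success")


def always(run: Run) -> bool:
    return True


def old_commit_gate(run: Run) -> bool:
    # if: ${{ always() && !inputs.dry_run }}
    return always(run) and not run.dry_run


def new_commit_gate(run: Run) -> bool:
    # if: ${{ (success() || steps.compile.outcome == 'failure') && !inputs.dry_run }}
    return (success(run) or run.compile_outcome == "failure") and not run.dry_run


def repropagate_runs(run: Run) -> bool:
    # if: steps.compile.outcome == 'failure'  (Actions prepends an implicit success() &&)
    return success(run) and run.compile_outcome == "failure"
```

Running four scenarios (clean success, caught compile failure, cancellation mid-compile, earlier-step failure) through both gates shows they agree on the first two and differ only on the last two, where the old `always()` gate would still commit while the re-propagate step stays skipped.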
There seems to be a regression in main such that the test is not passing, and that test is required before "Build Docs" can run?
1. Summary
`.github/workflows/nightly-changelog.yml` on main (the cron-host copy is the only one the schedule reads).
2. Background
The cron at 06:27 UTC on 2026-05-29 short-circuited both matrix entries:
- develop matrix exited 128 in the commit/push step with `error: cannot pull with rebase: You have unstaged changes.` `tools/changelog/cli.py`'s `Package.write_version` bumps both `config/extension.toml` AND `pyproject.toml` when a `[project]` block exists. "Clean up sub module packaging and remove setup.py" (#5785) added `[project] version` lines to every managed package's pyproject.toml, activating a write path in cli.py that the workflow's commit step never staged. The unstaged pyproject changes made `git pull --rebase` refuse.
- release/3.0.0-beta2 matrix exited 1 with `remote: error: GH013: Repository rule violations.` The isaaclab-bot App isn't on ruleset 16056339's bypass-actor list for `refs/heads/release/*`. This is orthogonal to this PR; the repo-admin bypass-list edit is handled separately.
This PR addresses the first failure (develop). It also bundles the workflow tolerance piece from #5831, which was previously merged onto develop's copy of the file but never reached main, so the cron has been running without the partial-success guarantees #5831 was meant to provide.
3. Commits
3.1 Tolerate per-package compile failures (cherry-pick of #5831 workflow piece)
- `id: compile` + `continue-on-error: true` on the Compile step.
- Commit/push gate becomes `if: ${{ always() && !inputs.dry_run }}` so successful packages still ship when one package raised.
- Trailing `Re-propagate compile failure` step re-asserts the non-zero exit so the job tile stays red.
#5831 landed this on develop's copy of the workflow file, but the cron only reads main's copy, so the partial-success behavior was dead code until this commit.
3.2 Stage pyproject.toml in the auto-commit
- Add `source/*/pyproject.toml` to the existing `git add` glob.
The downstream `Bumped packages` loop explicitly filters `source/*/config/extension.toml`, so the new staged paths don't affect commit-body composition.
4. Why both on main, not develop / release
The scheduled cron only registers on the default branch and only reads main's copy of `nightly-changelog.yml`. Develop and release/3.0.0-beta2 each carry their own copies of the workflow, but those copies are consumed only via manual `workflow_dispatch` (and even then, only if the maintainer picks that branch in the "Use workflow from" dropdown). Bug fixes to the cron behavior have to land on main to actually take effect.
5. Verification
- `./isaaclab.sh -f` clean.
- `git diff --staged --quiet` correctly reports work pending; commit composes; the existing `git pull --rebase` runs against a clean working tree.
- `workflow_dispatch` against a sacrificial branch (see Test plan).
6. Out of scope
7. Test plan
- `./isaaclab.sh -f` clean.
- `workflow_dispatch` against a throwaway target branch (cut from develop, single trivial fragment) to confirm end-to-end compile → stage → commit → rebase → push.