Skip to content

[CI] Restore nightly changelog cron: pyproject staging + per-package failure tolerance#5859

Open
hujc7 wants to merge 3 commits into
isaac-sim:mainfrom
hujc7:jichuanh/nightly-stage-pyproject
Open

[CI] Restore nightly changelog cron: pyproject staging + per-package failure tolerance#5859
hujc7 wants to merge 3 commits into
isaac-sim:mainfrom
hujc7:jichuanh/nightly-stage-pyproject

Conversation

@hujc7
Copy link
Copy Markdown
Collaborator

@hujc7 hujc7 commented May 29, 2026

1. Summary

  • Restores the scheduled nightly changelog cron after two failures on the develop and release/3.0.0-beta2 matrix entries this morning (https://github.com/isaac-sim/IsaacLab/actions/runs/26621920244).
  • Two commits, both edit .github/workflows/nightly-changelog.yml on main (the cron-host copy is the only one the schedule reads).
  • 1 file changed, 30 insertions, 2 deletions.

2. Background

The cron at 06:27 UTC on 2026-05-29 short-circuited both matrix entries:

  1. develop matrix exited 128 in the commit/push step with error: cannot pull with rebase: You have unstaged changes. tools/changelog/cli.py's Package.write_version bumps both config/extension.toml AND pyproject.toml when a [project] block exists. Clean up sub module packaging and remove setup.py #5785 added [project] version lines to every managed package's pyproject.toml — activating a write path in cli.py that the workflow's commit step never staged. The unstaged pyproject changes made git pull --rebase refuse.

  2. release/3.0.0-beta2 matrix exited 1 with remote: error: GH013: Repository rule violations. The isaaclab-bot App isn't on ruleset 16056339's bypass-actor list for refs/heads/release/*. Orthogonal to this PR — repo-admin bypass-list edit handled separately.

This PR addresses failure 1. It also bundles the workflow tolerance piece from #5831, which was previously merged onto develop's copy of the file but never reached main — so the cron has been running without the partial-success guarantees #5831 was meant to provide.

3. Commits

3.1 Tolerate per-package compile failures (cherry-pick of #5831 workflow piece)

  • id: compile + continue-on-error: true on the Compile step.
  • Commit/push gate becomes if: ${{ always() && !inputs.dry_run }} so successful packages still ship when one package raised.
  • Trailing Re-propagate compile failure step re-asserts the non-zero exit so the job tile stays red.

#5831 wedged this on develop's copy of the workflow file, but the cron only reads main's copy, so the partial-success behavior was dead code until this commit.

3.2 Stage pyproject.toml in the auto-commit

  • Adds source/*/pyproject.toml to the existing git add glob.
  • Five-line comment naming the failure mode for future maintainers.

The downstream Bumped packages loop explicitly filters source/*/config/extension.toml, so the new staged paths don't affect commit-body composition.

4. Why both on main, not develop / release

The scheduled cron only registers on the default branch and only reads main's copy of nightly-changelog.yml. Develop and release/3.0.0-beta2 each carry their own copies of the workflow, but those copies are consumed only via manual workflow_dispatch (and even then, only if the maintainer picks that branch in the "Use workflow from" dropdown). Bug fixes to the cron behavior have to land on main to actually take effect.

5. Verification

  • ./isaaclab.sh -f clean.
  • Logic check: with the glob fix, compile's write to pyproject.toml gets staged; the staged-diff check git diff --staged --quiet correctly reports work pending; commit composes; the existing git pull --rebase runs against a clean working tree.
  • Runtime confirmation deferred to the next scheduled run (2026-05-30 06:27 UTC) or a manual workflow_dispatch against a sacrificial branch (see Test plan).

6. Out of scope

  • release/3.0.0-beta2 bypass-actor on ruleset 16056339 — repo-admin UI edit, not a code change.
  • The auto-bump / GitRepo / AutoBumpRun refactor that would move the entire commit/push/rebase lifecycle into cli.py and make the workflow's git-add glob obsolete — sequel PR, design discussed in the iteration thread.

7. Test plan

  • ./isaaclab.sh -f clean.
  • Authorship audit: both commits authored by jichuanh.
  • Optional pre-merge: workflow_dispatch against a throwaway target branch (cut from develop, single trivial fragment) to confirm end-to-end compile → stage → commit → rebase → push.
  • Post-merge: next scheduled cron green for the develop matrix entry.

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR fixes a CI workflow issue in the nightly changelog auto-commit step by adding source/*/pyproject.toml to the staged files.

Analysis

What changed:

  • Added source/*/pyproject.toml to the git add glob pattern in .github/workflows/nightly-changelog.yml
  • Added a 5-line comment explaining the rationale

Root cause (well-documented in PR):

  • Package.write_version in tools/changelog/cli.py bumps both config/extension.toml AND pyproject.toml when a [project] block exists
  • The workflow only staged extension.toml, leaving pyproject.toml changes unstaged
  • The subsequent git pull --rebase fails with "You have unstaged changes"
  • This was a latent issue exposed by #5785 which added [project] version fields to all managed packages

Assessment:
Correctness: The glob pattern source/*/pyproject.toml correctly matches the same package structure as the existing source/*/config/extension.toml pattern

Side effects: Minimal risk - staging additional files that are already being modified by the compile step is safe. The PR notes that the downstream "Bumped packages" loop filters explicitly to extension.toml, so commit body composition is unaffected

Documentation: Excellent inline comment explaining why both files need to be staged

Consistency: Follows the existing pattern of the workflow

Verdict

LGTM — This is a clean, well-documented one-line fix (plus helpful comments) for a real CI failure. The change is minimal and correctly addresses the root cause.


Update (2026-05-29)

Reviewed new commits pushed since initial review:

New change: Tolerate per-package compile failures

  • continue-on-error: true added to compile step allows commit/push to proceed even if some packages fail to compile
  • Re-propagate compile failure step ensures the job still shows as failed after successful packages ship
  • The if: always() condition on commit step ensures partial results are committed

Sound design: This prevents a single malformed changelog fragment from blocking all other packages. Failed packages still get highlighted in the job output.

LGTM — Both the original fix and the new resilience improvement are well-designed.

hujc7 added 2 commits May 29, 2026 09:23
Cherry-picks the workflow tolerance piece from isaac-sim#5831 onto main so the
scheduled cron (which only reads main's copy of nightly-changelog.yml)
actually gets the partial-success behavior, instead of leaving it dead
code on develop and release.

Compile step gains ``id: compile`` + ``continue-on-error: true`` so a
single package's malformed CHANGELOG.rst no longer short-circuits the
entire matrix entry. Commit/push gate becomes ``always() &&
!inputs.dry_run`` so successful packages still ship their CHANGELOG.rst
/ extension.toml / deleted-fragment writes even when one package raised
during compile. Trailing Re-propagate failure step re-asserts the
non-zero exit so the job tile stays red and the maintainer notices.
The nightly changelog compile bumps both ``config/extension.toml`` and
``pyproject.toml`` (when a ``[project]`` version line is present). The
workflow's commit step only staged ``extension.toml``, leaving the
pyproject bumps as unstaged changes. The subsequent ``git pull --rebase``
refuses with ``cannot pull with rebase: You have unstaged changes`` and
the matrix entry exits 128 before push.

isaac-sim#5785 made every managed package carry a ``pyproject.toml`` with a
``[project] version`` field, so the pyproject write went from no-op
to active and the missing glob became reachable.

Add ``source/*/pyproject.toml`` to the git-add list so the rebase/push
sequence runs on a clean working tree.
@hujc7 hujc7 force-pushed the jichuanh/nightly-stage-pyproject branch from 822c6f4 to 616eea1 Compare May 29, 2026 09:23
@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels May 29, 2026
@hujc7 hujc7 changed the title [CI] Stage pyproject.toml in nightly changelog auto-commit [CI] Restore nightly changelog cron: pyproject staging + per-package failure tolerance May 29, 2026
@hujc7 hujc7 changed the base branch from develop to main May 29, 2026 09:24
@hujc7 hujc7 marked this pull request as ready for review May 29, 2026 16:43
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR restores the nightly changelog cron after a morning failure caused by unstaged pyproject.toml writes blocking git pull --rebase. It also cherry-picks the per-package compile-failure tolerance from #5831 (which had only reached the develop copy of the workflow, not main, so the cron never benefited from it).

  • Root-cause fix: adds source/*/pyproject.toml to the git add glob alongside extension.toml, with a comment explaining the Package.write_version dual-write that triggered the regression introduced by Clean up sub module packaging and remove setup.py #5785.
  • Compile failure tolerance: id: compile + continue-on-error: true lets the commit/push step run for packages that did succeed; a trailing "Re-propagate compile failure" step re-asserts the non-zero exit so the job tile stays red when any package failed.
  • Commit gate updated: if: ${{ always() && !inputs.dry_run }} replaces the bare !inputs.dry_run so the commit step is reachable after a partial compile failure.

Confidence Score: 4/5

Safe to merge; fixes a real regression that was breaking the nightly cron, with one minor edge case worth tracking.

The pyproject.toml staging fix directly addresses the documented failure mode and is straightforward. The compile-failure tolerance logic is correct — steps.compile.outcome (pre-continue-on-error) is the right field to inspect, and the re-propagation step reliably keeps the job red. The one rough edge is that always() on the commit/push step makes it fire even on workflow cancellation, which could silently push partial changelog state to the branch; a more targeted condition would close that window without affecting the intended partial-success behavior.

.github/workflows/nightly-changelog.yml — specifically the always() condition on the commit/push step and its interaction with workflow cancellation.

Important Files Changed

Filename Overview
.github/workflows/nightly-changelog.yml Adds compile failure tolerance (continue-on-error, always() commit gate, re-propagate step) and fixes the root-cause git-add regression by staging source/*/pyproject.toml. Logic is sound; one edge case with always() on workflow cancellation could commit partial state silently.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Compile fragments\ncontinue-on-error: true] -->|outcome=success| B[Commit and push\nif: always && !dry_run]
    A -->|outcome=failure\npartial writes on disk| B
    A -->|outcome=cancelled| B
    B --> C{git diff --staged --quiet?}
    C -->|No staged changes| D[Echo: nothing to commit\nExit 0]
    C -->|Has staged changes| E[git commit\ngit pull --rebase\ngit push]
    E --> F[Re-propagate compile failure\nif: steps.compile.outcome == 'failure']
    D --> F
    F -->|compile failed| G[exit 1\nJob tile stays red]
    F -->|compile succeeded| H[Step skipped\nJob passes]
Loading

Reviews (1): Last reviewed commit: "Stage pyproject.toml in nightly changelo..." | Re-trigger Greptile

Comment thread .github/workflows/nightly-changelog.yml Outdated
# rest still wrote their CHANGELOG.rst / deleted fragments / bumped
# extension.toml). The staged-diff check below short-circuits cleanly
# if the failed package was the only one with pending work.
if: ${{ always() && !inputs.dry_run }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 always() runs commit/push on job cancellation

always() is evaluated as true even when the job is cancelled mid-run. If a maintainer cancels the workflow during the compile step, steps.compile.outcome becomes 'cancelled' (not 'failure'), the commit step fires, whatever partial on-disk state the compile script left behind gets staged and pushed, and the "Re-propagate compile failure" step is skipped (it only checks for 'failure'). The branch ends up with a partial-state commit while the job tile shows "Cancelled" — easy to miss.

Using if: ${{ (success() || steps.compile.outcome == 'failure') && !inputs.dry_run }} would limit the commit step to runs where compile either fully succeeded or cleanly failed, matching the intent of the design while closing the cancellation window.

``always() && !inputs.dry_run`` evaluates as true even when the job is
cancelled mid-compile (10-minute timeout, maintainer cancel). At that
point ``steps.compile.outcome`` is ``'cancelled'`` (not ``'failure'``),
so the commit step would run against a partial on-disk state AND the
``Re-propagate compile failure`` step would skip (it checks for
``'failure'`` specifically) — pushing a half-finished commit while the
job tile shows "Cancelled".

Tighten the guard to ``(success() || steps.compile.outcome == 'failure')``:
fires on clean success, fires on caught failure (so partial-success still
ships per the design), does NOT fire on cancellation or earlier-step
failures. Comment updated to name the explicit outcomes covered.
@hujc7
Copy link
Copy Markdown
Collaborator Author

hujc7 commented May 29, 2026

There seems regression in main such that the test is not passing, which is required before "Build Docs" can run?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants