Conversation
…dition
The release workflow had two related cancellation bugs:
1. Three downstream jobs (build-debian-pkg, publish-docker-image,
publish-release) used `if: always() && ...`, which kept them
dispatching onto runners even after the operator cancelled the
workflow. The intermediate `contains(needs.X.result, 'failure')`
checks did not match `'cancelled'`, and `always()` explicitly
bypasses workflow-level cancellation.
2. The In-case-of-failure rollback job enumerated specific failure
modes (`contains(..., 'failure')` for each upstream) and required
`build-release.result == 'success'`. Cancellations and
skipped-due-to-upstream cases did not satisfy any branch, so the
git tag was left on the remote whenever the operator cancelled the
run. The rollback step itself also assumed the tag was always
present, so calling it on a workflow that never reached the
tag-push step would emit a confusing "tag does not exist" failure.
Changes per job
---------------
build-release (no `if:` change)
Always runs. The tag-push step (line 119) is still gated by
`inputs.perform_release && inputs.release_version != ''`, so a
dry-run never creates a tag.
test-release (no `if:` change)
Job-level `if:` remains `! inputs.skip_tests`. Skipped tests yield
result `'skipped'`, which downstream jobs continue to treat as a
pass — same as before.
build-debian-pkg
`if: always() && contains(... 'success') && !contains(... 'failure')`
->
`if: !cancelled() && contains(... 'success') && !contains(... 'failure')`
- Full success: runs.
- test-release failed: skipped (`!contains('failure','failure')`
is false).
- test-release skipped (skip_tests=true): runs.
- build-release failed/cancelled: skipped (no 'success' substring).
- Workflow cancelled: skipped — was the bug; previously ran because
`'cancelled'` does not contain `'failure'`.
publish-docker-image
Same change as build-debian-pkg, same behaviour matrix. The
internal docker push step is additionally gated by
`inputs.perform_release` for dry-run support.
publish-release
`if: always() && contains(... 'success') x3`
->
`if: !cancelled() && contains(... 'success') x3`
- Full success: runs.
- Any upstream failed/cancelled/skipped: skipped (any
`contains('success')` clause becomes false).
- Workflow cancelled in the window between all upstreams finishing
as success and this job starting: skipped — was the bug.
- The internal `gh release create` step keeps its current `set +e`
soft-failure behaviour intentionally: a half-published draft must
be completable manually using the printed instructions and the
already-built artefacts, so this step is by design not allowed to
fail the job. publish-release.result therefore stays 'success'
even when the GitHub release object itself was not created, and
rollback does not fire — preserving the tag and artefacts for the
operator's manual recovery.
In-case-of-failure
Job-level condition simplified from a four-branch failure-mode
enumeration to a single positive invariant: rollback whenever the
release did not fully publish.
`if: always() && perform_release && contains(build-release, 'success')
&& (contains(test-release, 'failure') || ...four-way OR... )`
->
`if: always() && perform_release && release_version != ''
&& needs.publish-release.result != 'success'`
`!= 'success'` matches `'failure'`, `'cancelled'`, and `'skipped'`
uniformly. The `build-release.result == 'success'` precondition is
removed so rollback also runs when build-release was cancelled
mid-flight (the tag may already have been pushed at line 119 before
the cancel signal arrived). The rollback step is now idempotent via
`git ls-remote --exit-code --quiet --tags`: deletes the tag if it
exists, otherwise prints "nothing to rollback" and exits 0.
Behaviour matrix (assume perform_release=true unless noted)
-----------------------------------------------------------
| Scenario | publish-release | rollback |
|---------------------------------------------------|-----------------|----------|
| Happy path | success | no |
| Dry run (perform_release=false) | success | no |
| skip_tests=true, otherwise success | success | no |
| build-release fails (tag may or may not exist) | skipped | yes |
| test-release fails | skipped | yes |
| build-debian-pkg fails | skipped | yes |
| publish-docker-image fails | skipped | yes |
| Cancel during build-release | skipped | yes |
| Cancel during test-release | skipped | yes |
| Cancel during build-debian-pkg | skipped | yes |
| Cancel during publish-docker-image | skipped | yes |
| Cancel during publish-release | cancelled | yes |
| Cancel after publish-release succeeded (too late) | success | no |
| gh release create fails (by design, soft-fail) | success | no |
| Operator cancels rollback itself | - | cancelled (tag may persist; manual cleanup) |
Step-level conditions left unchanged
------------------------------------
- `if: always()` on the diagnostic torrent-client-status upload
(test-release line 420). Step-level `always()` is contained to
the job; it lets us capture diagnostics even when the QA test
step itself failed.
- `if: ${{ inputs.perform_release }}` on the publish-side steps
(lines 78, 113, 586, 627, 696). These gate API calls and tag
creation for dry-run support.
- `if: ${{ ! inputs.disable_version_check }}` on the version
validation step (line 164).
Co-Authored-By: Claude
Contributor
There was a problem hiding this comment.
Pull request overview
Updates the GitHub Actions release workflow to better respect operator cancellations and to make rollback behavior consistent when the release is not fully published.
Changes:
- Adjusts downstream job conditions to avoid dispatching jobs after a workflow cancellation.
- Simplifies rollback job condition to trigger whenever publishing did not complete successfully, and makes tag deletion idempotent.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
JkLondon
approved these changes
Apr 30, 2026
Member
Author
|
I tested it -- https://github.com/erigontech/erigon/actions/runs/25304335979/job/74177149288 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix two related cancellation bugs in
.github/workflows/release.yml:build-debian-pkg,publish-docker-image, andpublish-releaseusedif: always() && ..., which kept dispatching them onto runners even after the operator cancelled the workflow. The intermediatecontains(needs.X.result, 'failure')checks did not match'cancelled', andalways()explicitly bypasses workflow-level cancellation.In-case-of-failureenumerated specific failure modes (contains(..., 'failure')on each upstream) and requiredbuild-release.result == 'success'. Cancellations and skipped-due-to-upstream cases didn't satisfy any branch, so the git tag was left on the remote whenever the operator cancelled. The rollback step itself also assumed the tag was always present, so calling it on a workflow that never reached the tag-push step would emit a confusing "tag does not exist" failure.Changes per job
build-release — no
if:changeAlways runs. Tag-push step (line 119) is still gated by
inputs.perform_release && inputs.release_version != '', so a dry-run never creates a tag.test-release — no
if:changeJob-level
if:remains! inputs.skip_tests. Skipped tests yield result'skipped', which downstream jobs continue to treat as a pass — same as before.build-debian-pkg
test-releasefailed: skipped.test-releaseskipped (skip_tests=true): runs.build-releasefailed/cancelled: skipped.'cancelled'does not contain'failure'.publish-docker-image
Same change as
build-debian-pkg, same behaviour matrix. Internal docker push step is additionally gated byinputs.perform_releasefor dry-run support.publish-release
gh release createstep keeps its currentset +esoft-failure behaviour intentionally: a half-published draft must be completable manually using the printed instructions and the already-built artefacts. The job is by design not allowed to fail.publish-release.resulttherefore stays'success'even when the GitHub release object itself was not created, and rollback does not fire — preserving the tag and artefacts for the operator's manual recovery.In-case-of-failure
Condition simplified from a four-branch failure-mode enumeration to a single positive invariant: roll back whenever the release did not fully publish.
!= 'success'matches'failure','cancelled', and'skipped'uniformly. Thebuild-release.result == 'success'precondition is removed so rollback also runs whenbuild-releasewas cancelled mid-flight (the tag may already have been pushed at line 119 before the cancel signal arrived).The rollback step is now idempotent via
git ls-remote --exit-code --quiet --tags:Behaviour matrix (assume
perform_release=trueunless noted)perform_release=false)skip_tests=true, otherwise successgh release createfails (by design, soft-fail)Step-level conditions left unchanged
if: always()on the diagnostic torrent-client-status upload (test-release line 420). Step-levelalways()is contained to the job; it lets us capture diagnostics even when the QA test step itself failed.if: ${{ inputs.perform_release }}on the publish-side steps (lines 78, 113, 586, 627, 696). These gate API calls and tag creation for dry-run support.if: ${{ ! inputs.disable_version_check }}on the version validation step (line 164).Test plan
perform_release: false(dry run) — confirm all jobs run as before, no tag created, rollback not triggered.perform_release: trueand let it complete normally — confirm release published, rollback not triggered.perform_release: true, cancel duringtest-release— confirmbuild-debian-pkgandpublish-docker-imageare skipped (not dispatched), rollback runs and removes tag.perform_release: true, cancel duringbuild-debian-pkg— confirmpublish-releaseis skipped, rollback runs and removes tag.perform_release: trueagainst a non-existent commit/branch to force abuild-releasefailure — confirm rollback runs and reports "nothing to rollback" (idempotent).Cherry-pick to
release/3.4will follow once this is merged tomain.