Skip to content

.github/workflows/release: fix cancellation handling and rollback condition#20872

Merged
lystopad merged 1 commit intomainfrom
feature/lystopad/release-workflow-token-and-cancel-fixes
May 4, 2026
Merged

.github/workflows/release: fix cancellation handling and rollback condition#20872
lystopad merged 1 commit intomainfrom
feature/lystopad/release-workflow-token-and-cancel-fixes

Conversation

@lystopad
Copy link
Copy Markdown
Member

Summary

Fix two related cancellation bugs in .github/workflows/release.yml:

  1. Downstream jobs ignored cancellation. build-debian-pkg, publish-docker-image, and publish-release used if: always() && ..., which kept dispatching them onto runners even after the operator cancelled the workflow. The intermediate contains(needs.X.result, 'failure') checks did not match 'cancelled', and always() explicitly bypasses workflow-level cancellation.
  2. Rollback skipped cancellations. In-case-of-failure enumerated specific failure modes (contains(..., 'failure') on each upstream) and required build-release.result == 'success'. Cancellations and skipped-due-to-upstream cases didn't satisfy any branch, so the git tag was left on the remote whenever the operator cancelled. The rollback step itself also assumed the tag was always present, so calling it on a workflow that never reached the tag-push step would emit a confusing "tag does not exist" failure.

Changes per job

build-release — no if: change

Always runs. Tag-push step (line 119) is still gated by inputs.perform_release && inputs.release_version != '', so a dry-run never creates a tag.

test-release — no if: change

Job-level if: remains ! inputs.skip_tests. Skipped tests yield result 'skipped', which downstream jobs continue to treat as a pass — same as before.

build-debian-pkg

-if: always() && contains(needs.build-release.result, 'success') && !contains(needs.test-release.result, 'failure')
+if: !cancelled() && contains(needs.build-release.result, 'success') && !contains(needs.test-release.result, 'failure')
  • Full success: runs.
  • test-release failed: skipped.
  • test-release skipped (skip_tests=true): runs.
  • build-release failed/cancelled: skipped.
  • Workflow cancelled: skipped — was the bug; previously ran because 'cancelled' does not contain 'failure'.

publish-docker-image

Same change as build-debian-pkg, same behaviour matrix. Internal docker push step is additionally gated by inputs.perform_release for dry-run support.

publish-release

-if: always() && contains(... 'success') && contains(... 'success') && contains(... 'success')
+if: !cancelled() && contains(... 'success') && contains(... 'success') && contains(... 'success')
  • Full success: runs.
  • Any upstream failed/cancelled/skipped: skipped.
  • Workflow cancelled in the window between all upstreams finishing as success and this job starting: skipped — was the bug.
  • The internal gh release create step keeps its current set +e soft-failure behaviour intentionally: a half-published draft must be completable manually using the printed instructions and the already-built artefacts. The job is by design not allowed to fail. publish-release.result therefore stays 'success' even when the GitHub release object itself was not created, and rollback does not fire — preserving the tag and artefacts for the operator's manual recovery.

In-case-of-failure

Condition simplified from a four-branch failure-mode enumeration to a single positive invariant: roll back whenever the release did not fully publish.

 if: >-
   always() &&
   inputs.perform_release &&
-  contains(needs.build-release.result, 'success') &&
-  (
-    contains(needs.test-release.result, 'failure') ||
-    contains(needs.build-debian-pkg.result, 'failure') ||
-    contains(needs.publish-docker-image.result, 'failure') ||
-    contains(needs.publish-release.result, 'failure')
-  )
+  inputs.release_version != '' &&
+  needs.publish-release.result != 'success'

!= 'success' matches 'failure', 'cancelled', and 'skipped' uniformly. The build-release.result == 'success' precondition is removed so rollback also runs when build-release was cancelled mid-flight (the tag may already have been pushed at line 119 before the cancel signal arrived).

The rollback step is now idempotent via git ls-remote --exit-code --quiet --tags:

- git push -d origin ${RELEASE_VERSION}
+ if git ls-remote --exit-code --quiet --tags origin "${RELEASE_VERSION}"; then
+   git push -d origin "${RELEASE_VERSION}"
+   echo "Tag ${RELEASE_VERSION} removed."
+ else
+   echo "Tag ${RELEASE_VERSION} not present on remote — nothing to rollback."
+ fi

Behaviour matrix (assume perform_release=true unless noted)

Scenario publish-release rollback runs
Happy path success no
Dry run (perform_release=false) success no
skip_tests=true, otherwise success success no
build-release fails (tag may or may not exist) skipped yes
test-release fails skipped yes
build-debian-pkg fails skipped yes
publish-docker-image fails skipped yes
Cancel during build-release skipped yes
Cancel during test-release skipped yes
Cancel during build-debian-pkg skipped yes
Cancel during publish-docker-image skipped yes
Cancel during publish-release cancelled yes
Cancel after publish-release succeeded (too late) success no
gh release create fails (by design, soft-fail) success no
Operator cancels rollback itself - cancelled — tag may persist; manual cleanup

Step-level conditions left unchanged

  • if: always() on the diagnostic torrent-client-status upload (test-release line 420). Step-level always() is contained to the job; it lets us capture diagnostics even when the QA test step itself failed.
  • if: ${{ inputs.perform_release }} on the publish-side steps (lines 78, 113, 586, 627, 696). These gate API calls and tag creation for dry-run support.
  • if: ${{ ! inputs.disable_version_check }} on the version validation step (line 164).

Test plan

  • Dispatch with perform_release: false (dry run) — confirm all jobs run as before, no tag created, rollback not triggered.
  • Dispatch with perform_release: true and let it complete normally — confirm release published, rollback not triggered.
  • Dispatch with perform_release: true, cancel during test-release — confirm build-debian-pkg and publish-docker-image are skipped (not dispatched), rollback runs and removes tag.
  • Dispatch with perform_release: true, cancel during build-debian-pkg — confirm publish-release is skipped, rollback runs and removes tag.
  • Dispatch with perform_release: true against a non-existent commit/branch to force a build-release failure — confirm rollback runs and reports "nothing to rollback" (idempotent).

Cherry-pick to release/3.4 will follow once this is merged to main.

…dition

The release workflow had two related cancellation bugs:

1. Three downstream jobs (build-debian-pkg, publish-docker-image,
   publish-release) used `if: always() && ...`, which kept them
   dispatching onto runners even after the operator cancelled the
   workflow. The intermediate `contains(needs.X.result, 'failure')`
   checks did not match `'cancelled'`, and `always()` explicitly
   bypasses workflow-level cancellation.

2. The In-case-of-failure rollback job enumerated specific failure
   modes (`contains(..., 'failure')` for each upstream) and required
   `build-release.result == 'success'`. Cancellations and
   skipped-due-to-upstream cases did not satisfy any branch, so the
   git tag was left on the remote whenever the operator cancelled the
   run. The rollback step itself also assumed the tag was always
   present, so calling it on a workflow that never reached the
   tag-push step would emit a confusing "tag does not exist" failure.

Changes per job
---------------

build-release (no `if:` change)
  Always runs. The tag-push step (line 119) is still gated by
  `inputs.perform_release && inputs.release_version != ''`, so a
  dry-run never creates a tag.

test-release (no `if:` change)
  Job-level `if:` remains `! inputs.skip_tests`. Skipped tests yield
  result `'skipped'`, which downstream jobs continue to treat as a
  pass — same as before.

build-debian-pkg
  `if: always() && contains(... 'success') && !contains(... 'failure')`
  ->
  `if: !cancelled() && contains(... 'success') && !contains(... 'failure')`

  - Full success: runs.
  - test-release failed: skipped (`!contains('failure','failure')`
    is false).
  - test-release skipped (skip_tests=true): runs.
  - build-release failed/cancelled: skipped (no 'success' substring).
  - Workflow cancelled: skipped — was the bug; previously ran because
    `'cancelled'` does not contain `'failure'`.

publish-docker-image
  Same change as build-debian-pkg, same behaviour matrix. The
  internal docker push step is additionally gated by
  `inputs.perform_release` for dry-run support.

publish-release
  `if: always() && contains(... 'success') x3`
  ->
  `if: !cancelled() && contains(... 'success') x3`

  - Full success: runs.
  - Any upstream failed/cancelled/skipped: skipped (any
    `contains('success')` clause becomes false).
  - Workflow cancelled in the window between all upstreams finishing
    as success and this job starting: skipped — was the bug.
  - The internal `gh release create` step keeps its current `set +e`
    soft-failure behaviour intentionally: a half-published draft must
    be completable manually using the printed instructions and the
    already-built artefacts, so this step is by design not allowed to
    fail the job. publish-release.result therefore stays 'success'
    even when the GitHub release object itself was not created, and
    rollback does not fire — preserving the tag and artefacts for the
    operator's manual recovery.

In-case-of-failure
  Job-level condition simplified from a four-branch failure-mode
  enumeration to a single positive invariant: rollback whenever the
  release did not fully publish.

  `if: always() && perform_release && contains(build-release, 'success')
       && (contains(test-release, 'failure') || ...four-way OR... )`
  ->
  `if: always() && perform_release && release_version != ''
       && needs.publish-release.result != 'success'`

  `!= 'success'` matches `'failure'`, `'cancelled'`, and `'skipped'`
  uniformly. The `build-release.result == 'success'` precondition is
  removed so rollback also runs when build-release was cancelled
  mid-flight (the tag may already have been pushed at line 119 before
  the cancel signal arrived). The rollback step is now idempotent via
  `git ls-remote --exit-code --quiet --tags`: deletes the tag if it
  exists, otherwise prints "nothing to rollback" and exits 0.

Behaviour matrix (assume perform_release=true unless noted)
-----------------------------------------------------------

| Scenario                                          | publish-release | rollback |
|---------------------------------------------------|-----------------|----------|
| Happy path                                        | success         | no       |
| Dry run (perform_release=false)                   | success         | no       |
| skip_tests=true, otherwise success                | success         | no       |
| build-release fails (tag may or may not exist)    | skipped         | yes      |
| test-release fails                                | skipped         | yes      |
| build-debian-pkg fails                            | skipped         | yes      |
| publish-docker-image fails                        | skipped         | yes      |
| Cancel during build-release                       | skipped         | yes      |
| Cancel during test-release                        | skipped         | yes      |
| Cancel during build-debian-pkg                    | skipped         | yes      |
| Cancel during publish-docker-image                | skipped         | yes      |
| Cancel during publish-release                     | cancelled       | yes      |
| Cancel after publish-release succeeded (too late) | success         | no       |
| gh release create fails (by design, soft-fail)    | success         | no       |
| Operator cancels rollback itself                  | -               | cancelled (tag may persist; manual cleanup) |

Step-level conditions left unchanged
------------------------------------

- `if: always()` on the diagnostic torrent-client-status upload
  (test-release line 420). Step-level `always()` is contained to
  the job; it lets us capture diagnostics even when the QA test
  step itself failed.
- `if: ${{ inputs.perform_release }}` on the publish-side steps
  (lines 78, 113, 586, 627, 696). These gate API calls and tag
  creation for dry-run support.
- `if: ${{ ! inputs.disable_version_check }}` on the version
  validation step (line 164).

Co-Authored-By: Claude
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the GitHub Actions release workflow to better respect operator cancellations and to make rollback behavior consistent when the release is not fully published.

Changes:

  • Adjusts downstream job conditions to avoid dispatching jobs after a workflow cancellation.
  • Simplifies rollback job condition to trigger whenever publishing did not complete successfully, and makes tag deletion idempotent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .github/workflows/release.yml
Comment thread .github/workflows/release.yml
@AskAlexSharov AskAlexSharov requested a review from JkLondon April 30, 2026 04:13
@lystopad
Copy link
Copy Markdown
Member Author

lystopad commented May 4, 2026

@lystopad lystopad added this pull request to the merge queue May 4, 2026
@lystopad lystopad added the DevOPS label May 4, 2026
Merged via the queue into main with commit b8a7235 May 4, 2026
48 of 53 checks passed
@lystopad lystopad deleted the feature/lystopad/release-workflow-token-and-cancel-fixes branch May 4, 2026 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants