
neuron-ci: best_effort stack cleanup to prevent false job failures #77032

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from ybrodsky-rh:neuron-ci-best-effort-cleanup
Mar 30, 2026

Conversation

@ybrodsky-rh
Contributor

Summary

  • Creates a custom aws-neuron-operator-deprovision-stacks step that is identical to the shared aws-deprovision-stacks but marked best_effort: true, so CloudFormation stack cleanup failures no longer mark the entire job as failed when all tests passed.
  • Inlines the rosa-aws-sts-hcp-deprovision chain in both aws-neuron-operator-e2e-rosa and aws-neuron-operator-conditional-e2e-rosa workflows to substitute the shared step with the best_effort version.
  • Stale stacks are already handled by the aws-neuron-operator-cleanup-vpc pre step on the next run, so transient cleanup failures are harmless.
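
The effect described in the bullets above can be sketched as a small model. This is only an illustration under stated assumptions, not ci-operator's implementation: the `Step` type, the `substitute` and `job_passed` helpers, and the `other-deprovision-step` placeholder are hypothetical names, and the model assumes that a failing post step fails the job unless that step is marked `best_effort`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for a registry step; not ci-operator's real type."""
    name: str
    run: Callable[[], bool]  # returns True on success
    best_effort: bool = False


def substitute(chain: List[Step], old_name: str, new_step: Step) -> List[Step]:
    """Return a copy of an inlined chain with one step swapped out by name."""
    return [new_step if step.name == old_name else step for step in chain]


def job_passed(test_steps: List[Step], post_steps: List[Step]) -> bool:
    """Report the job result under the assumed best_effort semantics."""
    passed = all([step.run() for step in test_steps])
    # Post steps always run so that cleanup is attempted after any test result.
    for step in post_steps:
        if not step.run() and not step.best_effort:
            passed = False
    return passed


def stack_delete_times_out() -> bool:
    """Simulates a CloudFormation stack ending in DELETE_FAILED."""
    return False


# The shared chain, reduced to a placeholder step plus the step named in the PR.
shared_chain = [
    Step("other-deprovision-step", lambda: True),  # hypothetical placeholder
    Step("aws-deprovision-stacks", stack_delete_times_out),
]

# The inlined chain swaps in the custom step, which runs the same cleanup
# but is marked best_effort.
inlined_chain = substitute(
    shared_chain,
    "aws-deprovision-stacks",
    Step(
        "aws-neuron-operator-deprovision-stacks",
        stack_delete_times_out,
        best_effort=True,
    ),
)

passing_tests = [Step("e2e", lambda: True)]
print(job_passed(passing_tests, shared_chain))   # False: cleanup failure fails the job
print(job_passed(passing_tests, inlined_chain))  # True: cleanup failure is ignored
```

In this model a genuine test failure still fails the job with either chain; only the outcome of the best_effort cleanup step is ignored.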

Test plan

  • CI validation passes (checkconfig, registry metadata)
  • Next periodic job run completes without being marked failed due to stack cleanup

Made with Cursor

The aws-deprovision-stacks step occasionally fails with DELETE_FAILED
when CloudFormation stack cleanup times out, marking the entire job as
failed even when all tests passed. This creates a custom best_effort
version of the step so cleanup failures no longer affect job status.
Stale stacks are cleaned up by the cleanup-vpc pre step on the next run.

Made-with: Cursor
@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Mar 29, 2026
@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@ybrodsky-rh: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-rh-ecosystem-edge-neuron-ci-main-4.20-stable-aws-neuron-operator-e2e | rh-ecosystem-edge/neuron-ci | presubmit | Registry content changed |
| pull-ci-rh-ecosystem-edge-neuron-ci-main-4.19-stable-aws-neuron-operator-e2e | rh-ecosystem-edge/neuron-ci | presubmit | Registry content changed |
| periodic-ci-rh-ecosystem-edge-neuron-ci-main-4.20-stable-aws-neuron-operator-e2e-weekly | N/A | periodic | Registry content changed |
| periodic-ci-rh-ecosystem-edge-neuron-ci-main-4.19-stable-aws-neuron-operator-e2e-weekly | N/A | periodic | Registry content changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@yevgeny-shnaidman
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Mar 30, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ybrodsky-rh, yevgeny-shnaidman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ybrodsky-rh
Contributor Author

/pj-rehearse ack

@openshift-ci-robot
Contributor

@ybrodsky-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack label (Signifies that rehearsal jobs have been acknowledged) on Mar 30, 2026
@ybrodsky-rh
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2026

@ybrodsky-rh: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 51a79c0 into openshift:main Mar 30, 2026
10 checks passed
mgencur pushed a commit to mgencur/release that referenced this pull request Mar 30, 2026
…penshift#77032)

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
rehearsals-ack: Signifies that rehearsal jobs have been acknowledged
