neuron-ci: best_effort stack cleanup to prevent false job failures #77032
The aws-deprovision-stacks step occasionally fails with DELETE_FAILED when CloudFormation stack cleanup times out, marking the entire job as failed even when all tests passed. This creates a custom best_effort version of the step so cleanup failures no longer affect job status. Stale stacks are cleaned up by the cleanup-vpc pre step on the next run. Made-with: Cursor
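A rough sketch of what such a best_effort step could look like in the step registry, for illustration only and not the PR's actual file. The step name comes from the PR summary and `best_effort: true` is the flag the description refers to. Everything else is an assumption: the commands file name simply follows the usual `<step>-commands.sh` convention, and the image and resource values are hypothetical placeholders that would be copied from the shared aws-deprovision-stacks step.

```yaml
ref:
  as: aws-neuron-operator-deprovision-stacks
  # Hypothetical placeholder: use the same image as the shared aws-deprovision-stacks step.
  from: REPLACE_WITH_SHARED_STEP_IMAGE
  # Assumed file name, following the <step>-commands.sh registry convention.
  commands: aws-neuron-operator-deprovision-stacks-commands.sh
  # The only intended behavioral difference from the shared step:
  # a failure here should not fail the job.
  best_effort: true
  resources:
    requests:
      cpu: 100m      # hypothetical value
      memory: 100Mi  # hypothetical value
  documentation: |-
    Copy of aws-deprovision-stacks marked best_effort, so that a
    CloudFormation DELETE_FAILED during cleanup does not fail the job.
```

The script body would stay identical to the shared step, which keeps the copy easy to diff against upstream if the shared step changes later.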
[REHEARSALNOTIFIER]
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ybrodsky-rh, yevgeny-shnaidman
/pj-rehearse ack
@ybrodsky-rh: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
/retest
@ybrodsky-rh: all tests passed!
Summary
- Adds an aws-neuron-operator-deprovision-stacks step that is identical to the shared aws-deprovision-stacks but marked best_effort: true, so CloudFormation stack cleanup failures no longer mark the entire job as failed when all tests passed.
- The rosa-aws-sts-hcp-deprovision chain in both the aws-neuron-operator-e2e-rosa and aws-neuron-operator-conditional-e2e-rosa workflows now substitutes the shared step with the best_effort version.
- Stale stacks are cleaned up by the aws-neuron-operator-cleanup-vpc pre step on the next run, so transient cleanup failures are harmless.
Test plan
Made with Cursor
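For readers unfamiliar with the registry layout, the workflow side of the change might look roughly like the sketch below. This is an assumption-laden illustration, not the PR's actual diff: the workflow and step names are taken from the summary, but the individual steps of the rosa-aws-sts-hcp-deprovision chain are not listed in this conversation and are left elided. The `allow_best_effort_post_steps` setting is the ci-operator option that, to the best of my understanding, must be enabled for `best_effort` post steps to be honored; it is worth verifying against the ci-operator documentation.

```yaml
workflow:
  as: aws-neuron-operator-e2e-rosa
  steps:
    # Assumed to be required so that best_effort post steps are honored.
    allow_best_effort_post_steps: true
    pre:
    # Removes stale stacks left over from a previous run.
    - ref: aws-neuron-operator-cleanup-vpc
    # ... remaining pre steps elided
    post:
    # ... other steps of the rosa-aws-sts-hcp-deprovision chain elided
    # Replaces the shared aws-deprovision-stacks step.
    - ref: aws-neuron-operator-deprovision-stacks
```

The same substitution would apply to aws-neuron-operator-conditional-e2e-rosa.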