NO-JIRA: test(e2e): extend AWS guest resource cleanup timeout #8124

Merged
sjenning merged 1 commit into openshift:main from sjenning:extend-cloud-cleanup-timeout
Mar 31, 2026

Conversation

@sjenning
Contributor

@sjenning sjenning commented Mar 30, 2026

Increase the timeout for validating AWS guest resources deletion from 15 minutes to 25 minutes to allow more time for cloud resources to be properly cleaned up during test teardown.

This is difficult to debug because the HCP dump occurs before the HC deletion. However, destroy.log shows a repeatable ~21m gap between these two log entries:

{"level":"info","ts":1774889627.4369519,"msg":"Deleting hosted cluster","namespace":"e2e-clusters-j98vq","name":"scale-from-zero-wmfpn"}
{"level":"info","ts":1774890884.4558256,"msg":"Destroying infrastructure","infraID":"scale-from-zero-wmfpn"}

I think the flaky nature of this issue is due to transient AWS throttling/load.

Summary by CodeRabbit

  • Tests
    • Increased the timeout for AWS resource cleanup validation in end-to-end tests from 15 to 25 minutes, providing more time for cloud resources to be removed while keeping the same polling cadence and existing success/failure behavior.

Note

Low Risk
Low risk test-only change that only increases teardown wait time; main impact is potentially longer e2e runtime when AWS resources are slow to delete.

Overview
In e2e AWS teardown, increases the validateAWSGuestResourcesDeletedFunc polling timeout for detecting remaining tagged guest resources from 15 minutes to 25 minutes, keeping the same 20s poll interval and validation logic to reduce cleanup-related flakes.

Written by Cursor Bugbot for commit ed56576. This will update automatically on new commits. Configure here.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 836515a4-d271-45ac-a8eb-e2b8411e0d99

📥 Commits

Reviewing files that changed from the base of the PR and between be14454 and ed56576.

📒 Files selected for processing (1)
  • test/e2e/util/fixture.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/util/fixture.go

📝 Walkthrough

The validateAWSGuestResourcesDeletedFunc function in the E2E test fixture has been updated to extend its timeout threshold from 15 minutes to 25 minutes. The polling interval of 20 seconds and the termination condition logic remain unchanged. This adjustment affects how long the AWS cleanup validation loop will wait before timing out when verifying that guest resources have been deleted.

@openshift-ci openshift-ci bot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) and area/testing (Indicates the PR includes changes for e2e testing) labels and removed the do-not-merge/needs-area label Mar 30, 2026
@openshift-ci openshift-ci bot requested review from devguyio and muraee March 30, 2026 20:05
@bryan-cox
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Mar 30, 2026
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-21
/test e2e-aws-4-21
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@sjenning
Contributor Author

/override ci/prow/e2e-aks
/override ci/prow/e2e-aks-4-21
/override ci/prow/e2e-aws-4-21
/override ci/prow/e2e-aws-upgrade-hypershift-operator
/override ci/prow/e2e-azure-self-managed
/override ci/prow/e2e-kubevirt-aws-ovn-reduced
/override ci/prow/e2e-v2-aws

@sjenning sjenning changed the title from "test(e2e): extend AWS guest resource cleanup timeout" to "No-JIRA: test(e2e): extend AWS guest resource cleanup timeout" Mar 30, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Mar 30, 2026
@openshift-ci-robot

@sjenning: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2026

@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aks, ci/prow/e2e-aks-4-21, ci/prow/e2e-aws-4-21, ci/prow/e2e-aws-upgrade-hypershift-operator, ci/prow/e2e-azure-self-managed, ci/prow/e2e-kubevirt-aws-ovn-reduced, ci/prow/e2e-v2-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sjenning sjenning changed the title from "No-JIRA: test(e2e): extend AWS guest resource cleanup timeout" to "NO-JIRA: test(e2e): extend AWS guest resource cleanup timeout" Mar 30, 2026
@sjenning
Contributor Author

/retest-required

Increase the timeout for validating AWS guest resources deletion from
15 minutes to 25 minutes to allow more time for cloud resources to be
properly cleaned up during test teardown.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 30, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 26.83%. Comparing base (6e8849b) to head (ed56576).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
test/e2e/util/fixture.go | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8124   +/-   ##
=======================================
  Coverage   26.83%   26.83%           
=======================================
  Files        1090     1090           
  Lines      105229   105229           
=======================================
  Hits        28242    28242           
  Misses      74559    74559           
  Partials     2428     2428           
Files with missing lines | Coverage Δ
test/e2e/util/fixture.go | 0.00% <0.00%> (ø)

@cwbotbot

cwbotbot commented Mar 31, 2026

Test Results

e2e-aws

Failed Tests

Total failed tests: 2

  • TestUpgradeControlPlane
  • TestUpgradeControlPlane/Teardown

e2e-aks

@sjenning sjenning force-pushed the extend-cloud-cleanup-timeout branch from be14454 to ed56576 Compare March 31, 2026 00:59
@openshift-ci openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) Mar 31, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

New changes are detected. LGTM label has been removed.

@sjenning sjenning added the lgtm label (Indicates that a PR is ready to be merged.) Mar 31, 2026
@openshift-ci-robot

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-21
/test e2e-aws-4-21
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@openshift-ci-robot

@sjenning: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sjenning
Contributor Author

/override ci/prow/e2e-aks
/override ci/prow/e2e-aks-4-21
/override ci/prow/e2e-aws
/override ci/prow/e2e-aws-4-21
/override ci/prow/e2e-aws-upgrade-hypershift-operator
/override ci/prow/e2e-azure-self-managed
/override ci/prow/e2e-kubevirt-aws-ovn-reduced
/override ci/prow/e2e-v2-aws

@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

@sjenning: Overrode contexts on behalf of sjenning: ci/prow/e2e-aks, ci/prow/e2e-aks-4-21, ci/prow/e2e-aws, ci/prow/e2e-aws-4-21, ci/prow/e2e-aws-upgrade-hypershift-operator, ci/prow/e2e-azure-self-managed, ci/prow/e2e-kubevirt-aws-ovn-reduced, ci/prow/e2e-v2-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sjenning
Contributor Author

/verified by @sjenning

@openshift-ci-robot openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) Mar 31, 2026
@openshift-ci-robot

@sjenning: This PR has been marked as verified by @sjenning.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sjenning sjenning merged commit 6f289ab into openshift:main Mar 31, 2026
43 of 46 checks passed
@openshift-ci
Contributor

openshift-ci bot commented Mar 31, 2026

@sjenning: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/testing: Indicates the PR includes changes for e2e testing
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria

4 participants