
[SREP-4255] Add --security-group flag to cleanup command #3204

Merged
openshift-merge-bot[bot] merged 3 commits into openshift:main from smarthall:SREP-4255/cleanup-security-groups
Apr 23, 2026

Conversation

@smarthall
Member

Add a --security-group flag to the osde2e cleanup command that removes leftover security groups from orphaned VPCs whose CloudFormation stacks are in DELETE_FAILED state.

Leftover security groups (e.g. from OCPBUGS-74960) block CloudFormation stack deletion, causing all VPC cleanup to fail silently. This flag is designed to run before --vpc so that the subsequent stack deletion can succeed.

The implementation:

  • Finds osde2e VPCs with DELETE_FAILED CloudFormation stacks
  • Skips VPCs belonging to active clusters
  • Revokes all ingress/egress rules to clear cross-SG dependencies
  • Deletes all non-default security groups in those VPCs
  • Reports results to Slack summary
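The steps above can be sketched in Go as a per-VPC, two-pass loop: revoke every rule on every candidate group first, then delete, so a group that another group's rule references is no longer a dependency when deletion runs. This is a minimal sketch under stated assumptions, not the PR's code: `VPC`, `SecurityGroup`, `ec2Ops`, and `cleanupSecurityGroups` are hypothetical stand-ins for the AWS SDK calls and the real method, and the AWS calls are modelled as function fields (rather than the interfaces the PR later introduces) purely to keep the sketch short.

```go
package main

import (
	"fmt"
	"strings"
)

// VPC is a hypothetical, trimmed-down view of an osde2e VPC.
type VPC struct {
	ID          string
	ClusterName string // parsed from the osde2e-* Name tag
	StackStatus string // status of the VPC's CloudFormation stack
}

// SecurityGroup is a hypothetical, trimmed-down view of an EC2 security group.
type SecurityGroup struct {
	ID   string
	Name string
}

// ec2Ops holds the AWS calls as function fields (a hypothetical seam) so the
// flow can be exercised with closures instead of a real session.
type ec2Ops struct {
	ListGroups     func(vpcID string) ([]SecurityGroup, error)
	RevokeAllRules func(sgID string) error
	DeleteGroup    func(sgID string) error
}

// cleanupSecurityGroups returns deleted/failed counts and a text error report.
func cleanupSecurityGroups(ops ec2Ops, vpcs []VPC, active map[string]bool, dryRun bool) (deleted, failed int, report string) {
	var errs strings.Builder
	for _, vpc := range vpcs {
		// Only touch VPCs whose stack deletion already failed, never an active cluster's.
		if vpc.StackStatus != "DELETE_FAILED" || active[vpc.ClusterName] {
			continue
		}
		groups, err := ops.ListGroups(vpc.ID)
		if err != nil {
			failed++
			fmt.Fprintf(&errs, "list groups in %s: %v\n", vpc.ID, err)
			continue
		}
		var targets []SecurityGroup
		for _, sg := range groups {
			if sg.Name != "default" { // AWS does not allow deleting a VPC's default group
				targets = append(targets, sg)
			}
		}
		if dryRun {
			for _, sg := range targets {
				fmt.Printf("[dry-run] would delete %s in %s\n", sg.ID, vpc.ID)
			}
			continue
		}
		// Pass 1: strip every rule so no target group is still referenced by another.
		// A failed revoke is reported; the delete attempt below decides the count.
		for _, sg := range targets {
			if err := ops.RevokeAllRules(sg.ID); err != nil {
				fmt.Fprintf(&errs, "revoke rules on %s: %v\n", sg.ID, err)
			}
		}
		// Pass 2: delete the now-unreferenced groups.
		for _, sg := range targets {
			if err := ops.DeleteGroup(sg.ID); err != nil {
				failed++
				fmt.Fprintf(&errs, "delete %s: %v\n", sg.ID, err)
				continue
			}
			deleted++
		}
	}
	return deleted, failed, errs.String()
}
```

Revoking across all candidates before deleting any of them is what clears cross-group references; deleting group by group would fail whenever a not-yet-processed group still points at the current one.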

@openshift-ci-robot

There are test jobs defined for this repository which are not configured to run automatically. Comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger jobs most likely to be impacted by the proposed changes. Comment /pipeline required to trigger all required & necessary jobs.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 3720d027-fdd8-4d43-be9e-dea4bd053f8d


Walkthrough

Added AWS security group cleanup functionality to the osde2e cleanup command via a new --security-groups flag. Implements a CleanupSecurityGroups method to remove orphaned security groups from VPCs tagged with osde2e-*, while preserving those tied to active clusters. Optimized VPC name parsing with a precompiled regular expression.

Changes

  • Cleanup Command Integration (cmd/osde2e/cleanup/cmd.go): Added --security-groups flag to args, SGErrors field to Message type with JSON tag, and invoked CleanupSecurityGroups() in run() with error accumulation and summary reporting.
  • Security Group Cleanup Implementation (pkg/common/aws/securitygroups.go): New CleanupSecurityGroups method that discovers VPCs tagged osde2e-*, verifies CloudFormation stack status (DELETE_FAILED only), skips active clusters, revokes all permissions, and deletes orphaned security groups with dry-run and error-reporting support.
  • VPC Name Parsing Optimization (pkg/common/aws/vpc.go): Precompiled regex vpcNameRegexp to eliminate per-call compilation in getClusterNameFromVPCName, improving performance with identical matching logic.
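A minimal sketch of the precompiled-regex change, assuming a package-level variable: `vpcNameRegexp` and `getClusterNameFromVPCName` are the names the walkthrough gives, but the pattern `^osde2e-(.+)-vpc$` and the function's signature are hypothetical, since the actual regex is not shown here. `regexp.MustCompile` runs once at package initialization, and a compiled `*regexp.Regexp` is safe for concurrent use, so every call can share it.

```go
package main

import "regexp"

// vpcNameRegexp is compiled once at package init instead of on every call.
// HYPOTHETICAL pattern: assumes VPC Name tags like "osde2e-<cluster>-vpc"
// and captures <cluster>.
var vpcNameRegexp = regexp.MustCompile(`^osde2e-(.+)-vpc$`)

// getClusterNameFromVPCName returns the captured cluster name, or "" when the
// VPC name does not match (signature assumed for this sketch).
func getClusterNameFromVPCName(name string) string {
	m := vpcNameRegexp.FindStringSubmatch(name)
	if m == nil {
		return ""
	}
	return m[1]
}
```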

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


@smarthall smarthall marked this pull request as ready for review April 8, 2026 05:58
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
@openshift-ci openshift-ci Bot requested review from YiqinZhang and ritmun April 8, 2026 05:58
@smarthall smarthall force-pushed the SREP-4255/cleanup-security-groups branch 2 times, most recently from f7db25b to bc5bf90 on April 8, 2026 06:23
smarthall added a commit to smarthall/release that referenced this pull request Apr 8, 2026
Add --security-group flag immediately before --vpc in all 7 AWS cleanup
jobs (cleanup-tekton-aws, cleanup-selfservice-aws, cleanup-dev-aws,
cleanup-hypershift-aws, cleanup-trt-aws, cleanup-informing-aws,
cleanup-rosa-nightly). This ensures leftover security groups are removed
before VPC CloudFormation stack deletion is attempted.

Workaround for OCPBUGS-74960, implemented in openshift/osde2e#3204.

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
cmd/osde2e/cleanup/cmd.go (1)

51-52: Minor: Naming inconsistency.

Variable securityGroups (plural) vs flag --security-group (singular). Consider aligning for clarity, though not blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/osde2e/cleanup/cmd.go` around lines 51 - 52, The variable name
securityGroups is inconsistent with the CLI flag --security-group; rename one
for clarity—either change the variable securityGroups to securityGroup to match
the singular flag, or change the flag to --security-groups to match the plural
variable—update all references to the chosen identifier (e.g., securityGroups,
securityGroup, and the flag definition/usage in cmd package) so tests and flag
parsing code continue to work.
pkg/common/aws/securitygroups.go (1)

78-83: Consider distinguishing stack-not-found from other errors.

All DescribeStacks errors are silently skipped. Transient failures (throttling, auth) would cause VPCs to be skipped without indication. Logging errors other than stack-not-found would improve debuggability.

♻️ Proposed improvement
 		stackResp, err := cfnClient.DescribeStacks(&cloudformation.DescribeStacksInput{
 			StackName: aws.String(vpcStackName),
 		})
 		if err != nil {
+			// Stack not found is expected; log other errors for debugging
+			if !strings.Contains(err.Error(), "does not exist") {
+				log.Printf("Warning: DescribeStacks failed for %s: %v\n", vpcStackName, err)
+			}
 			continue
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/common/aws/securitygroups.go` around lines 78 - 83, The DescribeStacks
call in cfnClient.DescribeStacks when iterating vpcStackName currently swallows
all errors (err -> continue); update the error handling in that block to
distinguish the CloudFormation "stack does not exist" case from other errors:
check the error type/message returned by cfnClient.DescribeStacks (use the AWS
SDK error code or awsErr.Code() / errors.As to detect
"ValidationError"/"StackNotFound" semantics), continue silently only for the
not-found case, and for any other error log a warning or error including
vpcStackName and the full err (e.g., via processLogger/your logger) before
continuing; reference cfnClient.DescribeStacks, stackResp, and vpcStackName to
locate and change the code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 25f594b0-9e7f-487c-b31e-f1b4cd4c7d98

📥 Commits

Reviewing files that changed from the base of the PR and between bd7ed9e and bc5bf90.

📒 Files selected for processing (3)
  • cmd/osde2e/cleanup/cmd.go
  • pkg/common/aws/securitygroups.go
  • pkg/common/aws/vpc.go

@smarthall smarthall force-pushed the SREP-4255/cleanup-security-groups branch from bc5bf90 to ec9a3a9 on April 8, 2026 23:02
@smarthall
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/common/aws/securitygroups.go`:
- Around line 82-89: The describe-stack error branch currently only logs and
continues, which hides discovery failures; update both occurrences (the branch
handling err in the DescribeStacks call around the vpcStackName check and the
similar branch later) to also increment failedCounter and append the error to
errorBuilder (e.g., failedCounter++ and errorBuilder.WriteString or
fmt.Fprintf(&errorBuilder, "describe stack %s: %v\n", vpcStackName, err)) before
continuing, while preserving the existing special-case for awserr.Error with
Code() == "ValidationError" so that missing stacks still skip without counting
as a failure.
- Around line 28-35: The DescribeVpcs and DescribeSecurityGroups calls on
CcsAwsSession.ec2 are non-paginated and can miss results; replace DescribeVpcs
with DescribeVpcsPages and DescribeSecurityGroups with
DescribeSecurityGroupsPages, iterating the pages and appending items into a
results slice while propagating any error returned by the paginator callback.
Update the logic around the current results variable(s) used in this file
(references to DescribeVpcs result and DescribeSecurityGroups result) to collect
from each page, check/return any pagination error, and preserve the existing
Filters and input structs when calling the Pages functions.
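The pagination fix amounts to following `NextToken` until it comes back empty, which is the loop aws-sdk-go v1's `DescribeVpcsPages` and `DescribeSecurityGroupsPages` run internally around a per-page callback. The generic helper below is a hypothetical, dependency-free sketch of that loop (`collectAllPages` and its `fetch` adapter are illustrative names, not the PR's code, and it needs Go 1.18+ for generics). It returns the error instead of a partial list, in line with the request to propagate pagination failures.

```go
package main

// collectAllPages calls fetch with the token from the previous page until a
// page returns an empty next token. fetch is a hypothetical adapter around a
// Describe* call: token "" requests the first page.
func collectAllPages[T any](fetch func(token string) (items []T, next string, err error)) ([]T, error) {
	var all []T
	token := ""
	for {
		items, next, err := fetch(token)
		if err != nil {
			return nil, err // surface the failure rather than a silently partial list
		}
		all = append(all, items...)
		if next == "" {
			return all, nil
		}
		token = next
	}
}
```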

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 5ea4385e-a9e6-4010-a7d6-523577012206

📥 Commits

Reviewing files that changed from the base of the PR and between bd7ed9e and ec9a3a9.

📒 Files selected for processing (3)
  • cmd/osde2e/cleanup/cmd.go
  • pkg/common/aws/securitygroups.go
  • pkg/common/aws/vpc.go

Comment thread pkg/common/aws/securitygroups.go Outdated
Comment thread pkg/common/aws/securitygroups.go
@ritmun
Contributor

ritmun commented Apr 13, 2026

/override ci/prow/hypershift-pr-check
/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 13, 2026

@ritmun: Overrode contexts on behalf of ritmun: ci/prow/hypershift-pr-check


In response to this:

/override ci/prow/hypershift-pr-check
/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread cmd/osde2e/cleanup/cmd.go Outdated
@smarthall smarthall force-pushed the SREP-4255/cleanup-security-groups branch from ec9a3a9 to 922feaf on April 21, 2026 00:31
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 21, 2026
Add a --security-group flag to the osde2e cleanup command that
removes leftover security groups from orphaned VPCs whose
CloudFormation stacks are in DELETE_FAILED state.

Leftover security groups (e.g. from OCPBUGS-74960) block
CloudFormation stack deletion, causing all VPC cleanup to fail
silently. This flag is designed to run before --vpc so that the
subsequent stack deletion can succeed.

The implementation:
- Finds osde2e VPCs with DELETE_FAILED CloudFormation stacks
- Skips VPCs belonging to active clusters
- Revokes all ingress/egress rules to clear cross-SG dependencies
- Deletes all non-default security groups in those VPCs
- Reports results to Slack summary

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@smarthall smarthall force-pushed the SREP-4255/cleanup-security-groups branch from 922feaf to 8a5043b on April 21, 2026 00:33
Comment thread cmd/osde2e/cleanup/cmd.go
smarthall and others added 2 commits April 23, 2026 14:16
Signed-off-by: Daniel Hall <danhall@redhat.com>
Extract core logic behind interfaces to enable testing with mocks,
covering all code paths: skip conditions, dry-run, rule revocation,
deletion success/failure, counter tracking, and error reporting.

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
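The commit above describes moving the core logic behind interfaces so it can be tested with mocks. A minimal sketch of that pattern, assuming a single-method interface: `sgDeleter`, `deleteGroups`, `fakeEC2`, and `errInjected` are hypothetical names, not the PR's types. The fake records every call and fails on chosen IDs, which is enough to check the dry-run, success/failure, and counter paths without AWS credentials.

```go
package main

import "errors"

// sgDeleter is a hypothetical seam over the EC2 DeleteSecurityGroup call.
type sgDeleter interface {
	DeleteSecurityGroup(id string) error
}

// deleteGroups deletes each group unless dryRun is set, counting outcomes.
func deleteGroups(api sgDeleter, ids []string, dryRun bool) (deleted, failed int) {
	for _, id := range ids {
		if dryRun {
			continue // dry-run must not call the API at all
		}
		if err := api.DeleteSecurityGroup(id); err != nil {
			failed++
			continue
		}
		deleted++
	}
	return deleted, failed
}

// errInjected is the failure the fake returns for selected IDs.
var errInjected = errors.New("injected failure")

// fakeEC2 records every call and fails for IDs listed in failOn.
type fakeEC2 struct {
	calls  []string
	failOn map[string]bool
}

func (f *fakeEC2) DeleteSecurityGroup(id string) error {
	f.calls = append(f.calls, id)
	if f.failOn[id] {
		return errInjected
	}
	return nil
}
```

In the repository this would normally sit in a `_test.go` file driven by `testing.T`; the fake is the part that replaces the real EC2 client.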
@smarthall smarthall force-pushed the SREP-4255/cleanup-security-groups branch from c0ac45c to fa0976c on April 23, 2026 05:35
@ritmun
Contributor

ritmun commented Apr 23, 2026

/lgtm
/approve
/override ci/prow/hypershift-pr-check

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ritmun, smarthall

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

@ritmun: Overrode contexts on behalf of ritmun: ci/prow/hypershift-pr-check


In response to this:

/lgtm
/approve
/override ci/prow/hypershift-pr-check


@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

@smarthall: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-merge-bot openshift-merge-bot Bot merged commit b25ce52 into openshift:main Apr 23, 2026
6 checks passed
smarthall added a commit to smarthall/release that referenced this pull request Apr 24, 2026
Adds the new --security-group flag before --vpc in all 7 AWS cleanup
periodic jobs. This cleans up leftover security groups in orphaned VPCs
with DELETE_FAILED CloudFormation stacks, unblocking subsequent VPC
stack deletion.

Depends on: openshift/osde2e#3204

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
openshift-merge-bot Bot pushed a commit to openshift/release that referenced this pull request Apr 24, 2026
Adds the new --security-group flag before --vpc in all 7 AWS cleanup
periodic jobs. This cleans up leftover security groups in orphaned VPCs
with DELETE_FAILED CloudFormation stacks, unblocking subsequent VPC
stack deletion.

Depends on: openshift/osde2e#3204

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Prucek pushed a commit to Prucek/release that referenced this pull request Apr 29, 2026
…t#78297)

Adds the new --security-group flag before --vpc in all 7 AWS cleanup
periodic jobs. This cleans up leftover security groups in orphaned VPCs
with DELETE_FAILED CloudFormation stacks, unblocking subsequent VPC
stack deletion.

Depends on: openshift/osde2e#3204

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
BATMAN-JD pushed a commit to BATMAN-JD/release that referenced this pull request May 1, 2026
…t#78297)

Adds the new --security-group flag before --vpc in all 7 AWS cleanup
periodic jobs. This cleans up leftover security groups in orphaned VPCs
with DELETE_FAILED CloudFormation stacks, unblocking subsequent VPC
stack deletion.

Depends on: openshift/osde2e#3204

Signed-off-by: Daniel Hall <danhall@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants