OCPBUGS-84971: Gate AWSDefaultSecurityGroupDeleted condition to AWS platform#8415
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@bryan-cox: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughThe bubbling of the Sequence Diagram(s)sequenceDiagram
participant Reconciler as HostedCluster Controller
participant API as Kubernetes API Server
participant HCP as HostedControlPlane
participant HCluster as HostedCluster Status
Reconciler->>API: Get HostedCluster
Reconciler->>API: Get HostedControlPlane (HCP)
Reconciler->>Reconciler: computeAWSDefaultSGDeletedCondition(hcluster, hcp)
alt platform != AWS or HCP nil or HCP not deleting
Reconciler-->>Reconciler: no condition computed / no change
else platform == AWS and HCP deleting
Reconciler->>HCP: read AWSDefaultSecurityGroupDeleted condition (if present)
Reconciler-->>Reconciler: compute condition (Unknown/True/False)
alt message differs from HCluster condition
Reconciler->>API: Update HostedCluster status (set condition)
API-->>HCluster: persist status
else message same
Reconciler-->>Reconciler: skip status update
end
end
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-84971, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-84971, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
1675693 to
bdeba67
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8415 +/- ##
==========================================
+ Coverage 37.23% 37.25% +0.01%
==========================================
Files 752 752
Lines 91829 91833 +4
==========================================
+ Hits 34195 34214 +19
+ Misses 54993 54978 -15
Partials 2641 2641
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
@bryan-cox: This pull request references Jira Issue OCPBUGS-84971, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
The AWSDefaultSecurityGroupDeleted condition was being set on all HostedClusters during deletion regardless of platform, causing AWS-specific conditions to appear on Azure/KubeVirt/etc clusters. Extract the condition computation into a testable function and gate it behind a platform check matching the existing pattern used for AWSDefaultSecurityGroupCreated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
bdeba67 to
273acd0
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go (1)
3728-3758: ⚡ Quick winAdd a same-status/different-message case to lock in message propagation behavior.
You already test the no-op path when the message matches. Please add the inverse case (same status/reason, different message) to ensure
changedis stilltrueand newer HCP context is propagated.Proposed test case addition
{ name: "When HC already has the same condition message, it should not report a change", hcluster: &hyperv1.HostedCluster{ Spec: hyperv1.HostedClusterSpec{ Platform: hyperv1.PlatformSpec{Type: hyperv1.AWSPlatform}, }, Status: hyperv1.HostedClusterStatus{ Conditions: []metav1.Condition{ { Type: string(hyperv1.AWSDefaultSecurityGroupDeleted), Status: metav1.ConditionTrue, Reason: "Deleted", Message: "Security group deleted", }, }, }, }, hcp: &hyperv1.HostedControlPlane{ ObjectMeta: metav1.ObjectMeta{DeletionTimestamp: &deletionTime}, Status: hyperv1.HostedControlPlaneStatus{ Conditions: []metav1.Condition{ { Type: string(hyperv1.AWSDefaultSecurityGroupDeleted), Status: metav1.ConditionTrue, Reason: "Deleted", Message: "Security group deleted", }, }, }, }, wantChanged: false, }, + { + name: "When HC has same status but different message, it should report a change", + hcluster: &hyperv1.HostedCluster{ + Spec: hyperv1.HostedClusterSpec{ + Platform: hyperv1.PlatformSpec{Type: hyperv1.AWSPlatform}, + }, + Status: hyperv1.HostedClusterStatus{ + Conditions: []metav1.Condition{ + { + Type: string(hyperv1.AWSDefaultSecurityGroupDeleted), + Status: metav1.ConditionTrue, + Reason: "Deleted", + Message: "old message", + }, + }, + }, + }, + hcp: &hyperv1.HostedControlPlane{ + ObjectMeta: metav1.ObjectMeta{DeletionTimestamp: &deletionTime}, + Status: hyperv1.HostedControlPlaneStatus{ + Conditions: []metav1.Condition{ + { + Type: string(hyperv1.AWSDefaultSecurityGroupDeleted), + Status: metav1.ConditionTrue, + Reason: "Deleted", + Message: "new message", + }, + }, + }, + }, + wantChanged: true, + wantStatus: metav1.ConditionTrue, + },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go` around lines 3728 - 3758, Add a new table-driven test case alongside the existing one that uses the same condition Type (string(hyperv1.AWSDefaultSecurityGroupDeleted)), Status (metav1.ConditionTrue) and Reason ("Deleted") but a different Message between hcluster.Status.Conditions and hcp.Status.Conditions; set hcluster to have the old message, hcp to have the new message, and assert wantChanged is true and that the controller logic updates/propagates the message from hcp into the HostedCluster condition (reference the hcluster, hcp, AWSDefaultSecurityGroupDeleted and wantChanged identifiers to locate and implement the case).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@hypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go`:
- Around line 3728-3758: Add a new table-driven test case alongside the existing
one that uses the same condition Type
(string(hyperv1.AWSDefaultSecurityGroupDeleted)), Status (metav1.ConditionTrue)
and Reason ("Deleted") but a different Message between
hcluster.Status.Conditions and hcp.Status.Conditions; set hcluster to have the
old message, hcp to have the new message, and assert wantChanged is true and
that the controller logic updates/propagates the message from hcp into the
HostedCluster condition (reference the hcluster, hcp,
AWSDefaultSecurityGroupDeleted and wantChanged identifiers to locate and
implement the case).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5318179b-f0e6-489c-9f4f-9b64ed9cef7e
📒 Files selected for processing (2)
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller_test.go
|
/pipeline required |
|
Scheduling tests matching the |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
Test Resultse2e-aws
e2e-aks
|
|
/retest one more time |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest |
AI Test Failure AnalysisJob: Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6 |
|
/retest infra issue |
|
/test e2e-aws Previous failure was infra issue |
jparrill
left a comment
There was a problem hiding this comment.
LGTM — dropped a comment. Thanks!
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
jparrill
left a comment
There was a problem hiding this comment.
Dropped a couple of comments. Thanks!
Please add a test case covering the "condition update" path — where the HC already has the SG condition but the HCP has progressed to a different message (e.g., from "security group still in use" to "Security group deleted"). This exercises the oldCondition.Message != freshCondition.Message branch explicitly and is the most common real-world scenario during active deletion.
Also consider setting hcluster.Generation to a non-zero value in at least one propagation test case and asserting condition.ObservedGeneration matches.
|
The PR only modifies the hosted cluster controller Go files — it has nothing to do with CI infrastructure, release image imports, or the multiarch scheduling gate. This is purely a CI infrastructure flake. Now I have all the evidence I need. Let me produce the final report. Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThis is a CI infrastructure flake unrelated to PR #8415. The job failed during the release payload import phase — specifically while importing the OCP 4.21 release image ( Root CauseThe root cause is a race condition between the multiarch scheduling gate webhook and ci-operator's pod status polling on the Detailed sequence of events:
Why only This is not related to PR #8415. The PR modifies only Recommendations
Evidence
|
|
/retest |
E2E Verification ResultsAll Prow e2e jobs passed. Artifact inspection confirms the fix is working — no AWS conditions leak onto AKS HostedClusters. AKS (
|
|
/verified by e2e See #8415 (comment) |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@bryan-cox: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@bryan-cox: Jira Issue Verification Checks: Jira Issue OCPBUGS-84971 Jira Issue OCPBUGS-84971 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Summary
AWSDefaultSecurityGroupDeletedcondition behind an AWS platform check during HostedCluster deletion, matching the existing pattern used forAWSDefaultSecurityGroupCreatedcomputeAWSDefaultSGDeletedConditionfor testabilityRoot Cause
The
AWSDefaultSecurityGroupCreatedcondition (line 873) is correctly gated behindhcluster.Spec.Platform.Type == hyperv1.AWSPlatform, but theAWSDefaultSecurityGroupDeletedblock had no platform check — it ran for every platform during deletion.While the HCP-level controller (
destroyAWSDefaultSecurityGroup) correctly short-circuits for non-AWS platforms, the HostedCluster controller still created a fresh condition withStatus: Unknownand set it on the HostedCluster regardless.Test plan
TestComputeAWSDefaultSGDeletedCondition— 8 cases)AWSDefaultSecurityGroupDeletedcondition🤖 Generated with Claude Code
Summary by CodeRabbit