CNTRLPLANE-507: Add HCP finalizer to AWSEndpointService reconciler#8499
hypershift-jira-solve-ci[bot] wants to merge 3 commits into …
The AWSEndpointService reconciler now adds a finalizer to the HostedControlPlane resource to ensure HCP credentials remain available during AWS PrivateLink resource cleanup.

Previously, when the CPO restarted during deletion of a SharedVPC cluster, the clientBuilder was uninitialized and the HCP (with its cross-account role ARNs) could already be deleted. This caused the reconciler to fail creating AWS clients, and after a 10-minute grace period the hypershift-operator would force-remove the CPO finalizer, orphaning VPC endpoints, security groups, and DNS records in the shared VPC account.

The new HCP finalizer follows the same pattern used by the Azure PLS controller: it blocks HCP deletion until all AWSEndpointService resources are cleaned up. When HCP deletion is detected, the reconciler initializes AWS clients from the still-available HCP, cleans up each AWSEndpointService's AWS resources, removes the CR finalizers, and finally removes the HCP finalizer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
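The add-if-missing and remove steps described in this commit message can be sketched with plain Go types. This is a minimal illustration, not the reconciler's code: the `object` type and the three helper names are hypothetical, and only the finalizer string is taken from the PR description.

```go
package main

// hcpFinalizer is the finalizer string quoted in the PR description.
const hcpFinalizer = "hypershift.openshift.io/aws-private-link-endpoint-cleanup"

// object is a hypothetical stand-in for the metadata of a HostedControlPlane.
type object struct {
	Finalizers []string
}

// hasFinalizer reports whether name is present on the object.
func hasFinalizer(o *object, name string) bool {
	for _, f := range o.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// ensureFinalizer adds name if it is missing and reports whether the object
// changed, so a caller can skip the API patch when nothing changed.
func ensureFinalizer(o *object, name string) bool {
	if hasFinalizer(o, name) {
		return false
	}
	o.Finalizers = append(o.Finalizers, name)
	return true
}

// removeFinalizer drops name and reports whether the object changed.
func removeFinalizer(o *object, name string) bool {
	kept := o.Finalizers[:0]
	changed := false
	for _, f := range o.Finalizers {
		if f == name {
			changed = true
			continue
		}
		kept = append(kept, f)
	}
	o.Finalizers = kept
	return changed
}
```

Returning a "changed" flag keeps both operations idempotent: a second call to `ensureFinalizer` on the same object reports `false`, which mirrors the no-op case the unit tests in the next commit cover.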
Add unit tests for the new HCP finalizer lifecycle:
- TestEnsureHCPFinalizer: verifies the finalizer is added to the HCP
  when not present, and is a no-op when already present.
- TestReconcileHCPDeletion: verifies four scenarios:
  - HCP deletion with our finalizer cleans up AWS resources and
    removes both the CR and HCP finalizers.
  - HCP without our finalizer returns early without cleanup.
  - SharedVPC HCP deletion initializes clients with role ARNs from
    the still-available HCP.
  - Multiple AWSEndpointService CRs: keeps the HCP finalizer until
    all CRs are cleaned up.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Trim verbose comment blocks on hcpAWSPrivateLinkFinalizerName constant and reconcileHCPDeletion function to keep only the "why" per project conventions
- Fix context.Background() usage in enqueueOnHCPChange to use the passed-in ctx for proper cancellation propagation
- Add error path tests for ensureHCPFinalizer (Patch conflict and non-conflict errors)
- Add error path tests for reconcileHCPDeletion (AWS client failure, cleanup failure, incomplete deletion, CR Update failure, List failure, HCP Patch conflict)
- Add direct unit test for enqueueOnHCPChange covering HCP deletion with finalizer, EndpointAccess change, and no-op scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-507 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
📝 Walkthrough
This change introduces an HCP-level finalizer (…)
Sequence Diagram
sequenceDiagram
participant K8s as Kubernetes API
participant Rec as AWSEndpointService<br/>Reconciler
participant HCP as HostedControlPlane
participant AWS as AWS Services<br/>(EC2/Route53)
rect rgba(100, 150, 200, 0.5)
Note over Rec,HCP: Normal Reconciliation Path
Rec->>K8s: Get HCP
Rec->>Rec: ensureHCPFinalizer<br/>(add if missing)
Rec->>K8s: Patch HCP with finalizer
Rec->>Rec: Continue AWSEndpointService<br/>reconciliation
end
rect rgba(200, 100, 100, 0.5)
Note over Rec,AWS: HCP Deletion Path
K8s->>Rec: Detect HCP deletion<br/>(DeletionTimestamp set)
Rec->>Rec: reconcileHCPDeletion
Rec->>AWS: Initialize AWS client<br/>from HCP credentials
Rec->>AWS: Delete endpoint service,<br/>security group, Route53 records
Rec->>K8s: List all AWSEndpointService<br/>CRs in namespace
alt All CR finalizers cleared
Rec->>K8s: Remove HCP finalizer
Rec->>K8s: HCP deletion proceeds
else CR finalizers still present
Rec->>Rec: Requeue reconciliation
end
end
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: hypershift-jira-solve-ci[bot]
Needs approval from an approver in each of these files: …
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## main #8499 +/- ##
==========================================
+ Coverage 40.00% 40.05% +0.05%
==========================================
Files 751 751
Lines 92838 92915 +77
==========================================
+ Hits 37137 37216 +79
+ Misses 53014 53009 -5
- Partials 2687 2690 +3
What this PR does / why we need it:
Adds a finalizer on the HostedControlPlane resource from the AWSEndpointService reconciler to prevent HCP deletion before AWS PrivateLink resources are cleaned up.
Problem: When the CPO restarts during deletion of a SharedVPC cluster, the clientBuilder is uninitialized and the HCP (with its cross-account role ARNs) may already be deleted. This causes the reconciler to fail creating AWS clients, and after a 10-minute grace period the hypershift-operator force-removes the CPO finalizer, orphaning VPC endpoints, security groups, and DNS records in the shared VPC account.
Solution: The new HCP finalizer (hypershift.openshift.io/aws-private-link-endpoint-cleanup) follows the same pattern used by the Azure PLS controller:
- … (enqueueOnHCPChange) to also trigger reconciliation when an HCP is being deleted with the finalizer present
Which issue(s) this PR fixes:
Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-507
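The trigger condition described for the enqueueOnHCPChange handler, reconcile on an EndpointAccess change or on deletion of an HCP that still carries the finalizer, reduces to a small predicate. The sketch below assumes a hypothetical `hcpSnapshot` type and `shouldEnqueue` function; the string values for EndpointAccess are illustrative only.

```go
package main

// hcpSnapshot is a hypothetical, reduced view of a HostedControlPlane.
type hcpSnapshot struct {
	EndpointAccess string // illustrative values, e.g. "Public" or "Private"
	Deleting       bool   // true once a deletion timestamp is set
	HasFinalizer   bool   // true while the AWS PrivateLink cleanup finalizer is present
}

// shouldEnqueue reports whether a change from old to cur should trigger
// reconciliation of the AWSEndpointService resources.
func shouldEnqueue(old, cur hcpSnapshot) bool {
	// Deletion with our finalizer still present: cleanup must run.
	if cur.Deleting && cur.HasFinalizer {
		return true
	}
	// Otherwise only an EndpointAccess change is relevant.
	return old.EndpointAccess != cur.EndpointAccess
}
```

Checking the deletion case first means an HCP that is being deleted keeps producing reconcile requests until the finalizer is removed, regardless of whether EndpointAccess changed.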
Special notes for your reviewer:
- The enqueueOnHCPChange handler (renamed from enqueueOnAccessChange) now triggers on both EndpointAccess changes and HCP deletions with the finalizer
- … getAWSClient helper, sourcing credentials from the still-available HCP spec
Checklist:
Always review AI generated responses prior to use.
Generated with Claude Code via /jira:solve [CNTRLPLANE-507](https://redhat.atlassian.net/browse/CNTRLPLANE-507)
Summary by CodeRabbit
Bug Fixes
Tests