OCPBUGS-77311: handle HostedCluster deletion in secretJanitor to prevent stuck cluster deletions#7779
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review infoConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to data retention organization setting 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughReconcile now routes secrets for cleanup when the referenced HostedCluster is missing or has a DeletionTimestamp, via a new helper Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler as Reconciler
participant API as K8s API Server
participant Hosted as HostedCluster
participant Secret as Secret
Reconciler->>API: Get NodePool, Secret
API-->>Reconciler: NodePool, Secret
Reconciler->>API: Get HostedCluster by name
API-->>Reconciler: NotFound / DeletionTimestamp set
alt HostedCluster missing or deleting
Reconciler->>Secret: cleanupSecretForDeletion(secret)
alt Secret name has TokenSecretPrefix
Secret->>API: Patch annotate IgnitionServerTokenExpirationTimestampAnnotation
API-->>Secret: OK
else userdata secret
Secret->>API: Delete secret
API-->>Secret: Deleted / NotFound
end
else HostedCluster present and not deleting
Reconciler->>API: proceed with normal secret cleanup flow
API-->>Reconciler: OK
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
hypershift-operator/controllers/nodepool/secret_janitor_test.go (1)
344-384: LGTM! Good test coverage for HostedCluster being deleted scenario.The test correctly sets up the deletion scenario with
DeletionTimestampand a finalizer, and verifies that userdata secrets are deleted directly.Consider adding complementary test cases for full coverage.
The two new tests cover token cleanup when HostedCluster is missing and userdata cleanup when HostedCluster is being deleted. For completeness, you could add:
- Userdata secret cleanup when HostedCluster is not found
- Token secret cleanup when HostedCluster is being deleted
Since both paths use the same
cleanupSecretForDeletionmethod, this is optional.,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/nodepool/secret_janitor_test.go` around lines 344 - 384, Add complementary unit tests to cover the missing paths: 1) a test that ensures userdata secrets are deleted when the HostedCluster is not found (simulate no HostedCluster in the fake client and call secretJanitor.Reconcile or directly cleanupSecretForDeletion with a userdata secret annotated with nodePoolAnnotation), and 2) a test that ensures token secrets are deleted when the HostedCluster is being deleted (create a deleting HostedCluster with DeletionTimestamp and finalizer, create a token secret rather than userdata, then call secretJanitor.Reconcile or cleanupSecretForDeletion and assert the token secret is removed). Reference the secretJanitor type, NodePoolReconciler.Client, cleanupSecretForDeletion and nodePoolAnnotation to locate relevant logic and mirror the existing test setup pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@hypershift-operator/controllers/nodepool/secret_janitor_test.go`:
- Around line 344-384: Add complementary unit tests to cover the missing paths:
1) a test that ensures userdata secrets are deleted when the HostedCluster is
not found (simulate no HostedCluster in the fake client and call
secretJanitor.Reconcile or directly cleanupSecretForDeletion with a userdata
secret annotated with nodePoolAnnotation), and 2) a test that ensures token
secrets are deleted when the HostedCluster is being deleted (create a deleting
HostedCluster with DeletionTimestamp and finalizer, create a token secret rather
than userdata, then call secretJanitor.Reconcile or cleanupSecretForDeletion and
assert the token secret is removed). Reference the secretJanitor type,
NodePoolReconciler.Client, cleanupSecretForDeletion and nodePoolAnnotation to
locate relevant logic and mirror the existing test setup pattern.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/secret_janitor.gohypershift-operator/controllers/nodepool/secret_janitor_test.go
|
@sdminonne: This pull request references OCM-22350 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sdminonne: This pull request references OCM-22350 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sdminonne: This pull request references OCM-22350 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77311, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77311, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: csrwng, sdminonne The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@coderabbitai PTAL |
|
✅ Actions performedReview triggered.
|
|
Committed on top for clarity, Need to squash |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77311, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/nodepool/secret_janitor.go`:
- Around line 211-216: The cleanupSecretForDeletion function should be made
idempotent by ignoring NotFound errors: when calling
setExpirationTimestampOnToken(ctx, r.Client, secret, r.now) or
r.Client.Delete(ctx, secret) detect apierrors.IsNotFound(err) and treat it as
success (return nil) instead of propagating the error; update imports to use
k8s.io/apimachinery/pkg/api/errors (apierrors) and ensure both branches handle
and swallow NotFound while still returning other errors.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
hypershift-operator/controllers/nodepool/secret_janitor.gohypershift-operator/controllers/nodepool/secret_janitor_test.go
|
@coderabbitai implemented #7779 (review) same pattern at line 77 PTAL |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 4221 🏁 Script executed: Length of output: 3290 🏁 Script executed: Length of output: 609
The
And line 77 follows the same pattern: This makes both cleanup operations idempotent and prevents unnecessary requeues if another process removes the secret between the |
|
/test e2e-aks |
…vent stuck cluster deletions During ROSA HCP cluster deletion, the secretJanitor controller could get stuck in an error loop when the pull secret was already deleted by OCM before the janitor finished cleaning up nodepool secrets. This blocked nodepool and cluster deletion indefinitely. Add early return paths in secretJanitor.Reconcile() when the HostedCluster is not found or is being deleted. In both cases, clean up the secret directly (expiration annotation for token secrets, delete for userdata secrets) without attempting to compute the expected secret name hash, which requires the pull secret and release image lookups. fix(OCM-22350): make secretJanitor cleanup idempotent by ignoring NotFound errors Wrap Delete and setExpirationTimestampOnToken calls in client.IgnoreNotFound to prevent unnecessary requeues when a secret is removed by another controller between the initial Get and the cleanup action. test(OCM-22350): add complementary secretJanitor cleanup tests for full coverage Add tests for userdata secret cleanup when HostedCluster is not found and token secret cleanup when HostedCluster is being deleted, completing the test matrix for cleanupSecretForDeletion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Scheduling tests matching the |
Test Resultse2e-aws
e2e-aks
|
|
/verified by e2e and unit tests |
|
@sdminonne: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sdminonne: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@sdminonne: Jira Issue Verification Checks: Jira Issue OCPBUGS-77311 Jira Issue OCPBUGS-77311 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
thanks @jparrill ! |
Summary
fixes https://issues.redhat.com/browse/OCPBUGS-77311
Test plan
make test)make verifypasses🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests