Conversation
Walkthrough
Adds a ResyncInterval field to BIOS/BMC Settings/Version reconcilers, updates patch*FromTemplate functions to return (bool, error) indicating pending patching, propagates the pending state into reconcile so that it returns RequeueAfter using ResyncInterval, and adds/updates tests exercising wait-and-reconcile behavior.
Sequence Diagram(s)
sequenceDiagram
participant Reconciler
participant K8sAPI as "Kubernetes API"
participant ChildCR as "Child CRs (BIOS/BMC)"
participant Timer as "Resync Timer"
Reconciler->>K8sAPI: Get Set resource & templates
Reconciler->>K8sAPI: List/Fetch Child CRs
Reconciler->>ChildCR: Create or Patch per template
ChildCR-->>Reconciler: Report state (Available | InProgress | Failed)
alt any InProgress or labeling incomplete
Reconciler->>Timer: RequeueAfter(ResyncInterval)
else all Available & labeled
Reconciler->>K8sAPI: Update Set status -> Completed
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/controller/suite_test.go (1)
250-297: ⚠️ Potential issue | 🟠 Major
Missing `ResyncInterval` for `BMCSettingsSetReconciler`, `BMCVersionSetReconciler`, and `BIOSVersionSetReconciler` in test setup.
`ResyncInterval` is set only for `BIOSSettingsSetReconciler` (Line 296) but not for the other three Set reconcilers (Lines 250–253, 283–286, 288–291). Since all four controllers now use `RequeueAfter: r.ResyncInterval`, a zero-value `time.Duration` will cause immediate requeue in tests, effectively a hot loop. This is especially relevant for `BMCSettingsSetReconciler`, which has a new test exercising the pending-patching requeue path.
Proposed fix
 Expect((&BIOSVersionSetReconciler{
-	Client: k8sManager.GetClient(),
-	Scheme: k8sManager.GetScheme(),
+	Client:         k8sManager.GetClient(),
+	Scheme:         k8sManager.GetScheme(),
+	ResyncInterval: 10 * time.Millisecond,
 }).SetupWithManager(k8sManager)).To(Succeed())
 // ...
 Expect((&BMCSettingsSetReconciler{
-	Client: k8sManager.GetClient(),
-	Scheme: k8sManager.GetScheme(),
+	Client:         k8sManager.GetClient(),
+	Scheme:         k8sManager.GetScheme(),
+	ResyncInterval: 10 * time.Millisecond,
 }).SetupWithManager(k8sManager)).To(Succeed())
 Expect((&BMCVersionSetReconciler{
-	Client: k8sManager.GetClient(),
-	Scheme: k8sManager.GetScheme(),
+	Client:         k8sManager.GetClient(),
+	Scheme:         k8sManager.GetScheme(),
+	ResyncInterval: 10 * time.Millisecond,
 }).SetupWithManager(k8sManager)).To(Succeed())
Based on learnings: "Prevent hot loops caused by status-only changes triggering reconciliation."
🤖 Fix all issues with AI agents
In `@internal/controller/biossettingsset_controller.go`:
- Around line 222-231: The current r.Get call in the BIOSSettings reconcile
block (checking server.Spec.BIOSSettingsRef) conflates three cases; change the
logic to explicitly handle them: if err == nil then the referenced BIOSSettings
exists and you should log/continue (keep the existing log.V(1).Info and
continue), if apierrors.IsNotFound(err) then proceed to create a new
BIOSSettings for this Server (as intended), and for any other non-nil err (e.g.,
network/RBAC) log the error via log.Error including context (Server and
BIOSSettings name) and return/requeue the error so reconciliation doesn’t
silently skip creation; use the same symbols r.Get, server.Spec.BIOSSettingsRef,
apierrors.IsNotFound, and log.Error to locate and update the code.
In `@internal/controller/bmcsettingsset_controller.go`:
- Around line 252-259: The Get call for BMCSettings currently conflates err==nil
and non-NotFound errors; update the r.Get(...) handling in the reconcile loop to
check three cases explicitly: if apierrors.IsNotFound(err) then proceed to
create a new metalv1alpha1.BMCSettings; else if err != nil then log the error
with context (include bmc.Name and bmc.Spec.BMCSettingRef.Name) and
return/requeue the reconcile (do not silently continue); else (err == nil) log
that the referenced BMCSettings exists and continue. Also fix the comment that
mentions "BIOSSettings" to correctly reference "BMCSettings".
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/controller/biossettingsset_controller.go (1)
300-306: Missing log message when skipping an InProgress `BIOSSettings`.
`bmcsettingsset_controller.go` logs "Skipping BMCSettings spec patching as it is in InProgress state" in the equivalent path; `biossettingsset_controller.go` is silent. This makes it harder to trace why certain `BIOSSettings` objects are not receiving spec updates during a reconcile.
💬 Suggested addition
 if settings.Status.State == metalv1alpha1.BIOSSettingsStateInProgress {
+	log.V(1).Info("Skipping BIOSSettings spec patching as it is in InProgress state", "BIOSSettings", settings.Name)
 	pendingPatchingSettings = true
 	continue
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettingsset_controller.go` around lines 300 - 306, The loop over settingsList.Items skips items whose Status.State == metalv1alpha1.BIOSSettingsStateInProgress but currently emits no log; add a log call (matching the BMCSettings message) inside the conditional before continue so the controller records skipping of that BIOSSettings object (include identifying info like settings.Name/Namespace and a message such as "Skipping BIOSSettings spec patching as it is in InProgress state"); update references in this block that use pendingPatchingSettings and settingsList.Items to ensure the logging is executed for each skipped item.
internal/controller/biosversionset_controller.go (1)
261-267: Missing log message when skipping an InProgress `BIOSVersion`.
Consistent with `bmcversionset_controller.go` and `bmcsettingsset_controller.go`, add a `log.V(1).Info` to make it visible in controller logs when a `BIOSVersion` is skipped due to its InProgress state.
💬 Suggested addition
 if version.Status.State == metalv1alpha1.BIOSVersionStateInProgress {
+	log.V(1).Info("Skipping BIOSVersion spec patching as it is in InProgress state", "BIOSVersion", version.Name)
 	pendingPatchingVersion = true
 	continue
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biosversionset_controller.go` around lines 261 - 267, The loop over versions.Items skips items whose status equals metalv1alpha1.BIOSVersionStateInProgress but currently emits no log; add a log.V(1).Info call just before setting pendingPatchingVersion=true and continue (use the controller logger variable used in this file, e.g., log or r.Log) to record a message like "skipping BIOSVersion in progress" including identifying keys such as version.Name and version.Namespace and the state (metalv1alpha1.BIOSVersionStateInProgress) so skipped BIOSVersion objects are visible in controller logs.
internal/controller/bmcversionset_controller.go (1)
286-289: Document the intentional `UpgradeTask != nil` guard to prevent future confusion.
The skip condition here is stricter than in the other three controllers (`BIOSVersionSet`, `BMCSettingsSet`, `BIOSSettingsSet`), which skip patching for any `InProgress` state. The extra `bmcVersion.Status.UpgradeTask != nil` check means a BMCVersion that is `InProgress` but has no active task will be patched. If this is intentional (e.g., UpgradeTask being nil means the actual upgrade workload hasn't started yet), a brief comment would prevent future contributors from removing the guard as "dead code".
💬 Suggested documentation comment
- if bmcVersion.Status.State == metalv1alpha1.BMCVersionStateInProgress && bmcVersion.Status.UpgradeTask != nil {
+ // Only skip patching when an actual upgrade task is running (UpgradeTask != nil).
+ // A BMCVersion that is InProgress but has no UpgradeTask yet is safe to patch.
+ if bmcVersion.Status.State == metalv1alpha1.BMCVersionStateInProgress && bmcVersion.Status.UpgradeTask != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcversionset_controller.go` around lines 286 - 289, The if that skips patching only when bmcVersion.Status.State == BMCVersionStateInProgress AND bmcVersion.Status.UpgradeTask != nil is intentionally stricter than other controllers; add a concise comment above this conditional explaining that an InProgress BMCVersion with a nil UpgradeTask should still be patched because a nil UpgradeTask indicates the upgrade workload hasn't started (so patching is safe), and note that the extra guard prevents skipping when there is an active UpgradeTask; reference bmcVersion.Status.UpgradeTask, BMCVersionStateInProgress, and pendingPatchingVersion in the comment so future contributors understand why the nil check must remain.
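The difference between the two skip rules discussed in this comment can be made concrete with a small sketch. The types below are simplified stand-ins, not the operator's API: `state`, `upgradeTask`, and `shouldSkipPatch` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// state and upgradeTask are simplified stand-ins for the status fields
// discussed above.
type state string

const (
	stateInProgress state = "InProgress"
	stateCompleted  state = "Completed"
)

type upgradeTask struct{ URI string }

// shouldSkipPatch applies the stricter BMCVersion rule: skip only when the
// object is InProgress and an upgrade task has actually been started.
// An InProgress object without a task is still treated as safe to patch.
func shouldSkipPatch(s state, task *upgradeTask) bool {
	return s == stateInProgress && task != nil
}

func main() {
	running := &upgradeTask{URI: "/tasks/1"}
	fmt.Println(shouldSkipPatch(stateInProgress, running)) // task running: skip
	fmt.Println(shouldSkipPatch(stateInProgress, nil))     // no task yet: patch
	fmt.Println(shouldSkipPatch(stateCompleted, running))  // not in progress: patch
}
```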
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/bmcsettingsset_controller.go`:
- Around line 251-264: Change the NotFound logging and fix the stale comment in
the BMCSettings lookup block: when r.Get returns an error and
apierrors.IsNotFound(err) is true, replace the log.Error call with log.V(1).Info
to reflect the expected recovery path (include context keys "BMC", bmc.Name and
"BMCSettings", bmc.Spec.BMCSettingRef.Name); for the branch where err == nil
(the else after r.Get), update the comment to accurately state that the
referenced BMCSettings exists (remove "or unable to determining" language) and
keep the existing log.V(1).Info call that notes the BMC already has a
BMCSettings ref.
---
Duplicate comments:
In `@internal/controller/biossettingsset_controller.go`:
- Around line 231-234: The non-NotFound error handling in the reconciliation
loop (where log.Error(err, "error when trying to get BIOSSettings referenced by
Server", "Server", server.Name, "BIOSSettings",
server.Spec.BIOSSettingsRef.Name) is called) currently just logs and continues;
instead append that error to the errs slice (the same error-collection used by
this controller) so callers receive a retry signal—i.e., when the Get of the
BIOSSettings reference fails with a non-NotFound error, add errs = append(errs,
err) (or wrap with context) before continuing. Ensure you update the same errs
used elsewhere in this reconciler so rate-limited requeueing occurs.
- Around line 227-240: Change the NotFound path in the BIOSSettings lookup to
stop logging a real error: when r.Get(...) returns apierrors.IsNotFound(err)
replace the log.Error call with log.V(1).Info (preserving the message/context
and the fact that you'll create a new BIOSSettings), and update the comment on
the `else` branch (the branch where err == nil after `r.Get`) to remove the
stale phrase "or unable to determining" and instead state clearly that the
referenced BIOSSettings exists (e.g., "the referenced BIOSSettings exists, so
skip creating a new one"); use the existing symbols r.Get,
client.ObjectKey{Name: server.Spec.BIOSSettingsRef.Name},
metalv1alpha1.BIOSSettings, log.Error -> log.V(1).Info, and
server.Name/server.Spec.BIOSSettingsRef.Name to locate and change the code and
comment.
In `@internal/controller/biosversionset_controller.go`:
- Around line 184-187: The requeue uses r.ResyncInterval which can be zero;
update the conditional branch in the reconcile path that returns
ctrl.Result{RequeueAfter: r.ResyncInterval} (the block comparing
status.FullyLabeledServers, status.AvailableBIOSVersion and
pendingPatchingVersion) to guard against a 0 interval by substituting a safe
default (e.g., a small duration constant) when r.ResyncInterval == 0 before
returning the ctrl.Result so the controller actually requeues as intended.
In `@internal/controller/bmcsettingsset_controller.go`:
- Around line 255-258: The code currently logs non-NotFound Get errors
(log.Error(err, ...)) and then uses continue, which swallows transient API
errors instead of propagating them; update the block handling the Get of the
referenced BMCSettings (the section using log.Error, "Server", bmc.Name,
"BMCSettings", bmc.Spec.BMCSettingRef.Name and the continue) to append the error
to the errs slice (errs = append(errs, err)) before continuing so the caller
receives the error and controller-runtime can trigger a rate-limited retry;
ensure you only append for actual Get errors (not NotFound) and keep the
existing log statement.
Set types until all patching/creation is completed
@coderabbitai instead of using a resync interval, wouldn't it be better to have a proper watch setup on the
Current Architecture
The Set reconcilers already have watches in place:
Why ResyncInterval is Currently Used
Looking at the patching logic, when a Set spec is updated while child CRDs are InProgress:
The Key Question
Shouldn't the existing Yes, theoretically it should! When a child resource's status changes from
Potential Issues
The
Recommendation
I'd suggest investigating whether the
The watch-based approach you're suggesting would be cleaner, more event-driven, and more resource-efficient than polling. Would you like me to help investigate whether the existing watches are sufficient, or if specific watch predicates need to be added?
@coderabbitai the polling to wait until all expected tasks are completed is not only about the template patching. It is also used for creation and deletion. Later, when the (wrong) BIOSSettings object is deleted or its reference is removed from the Server, the BIOSSettingsSet (ABC) does not reconcile, because the object is not its own and the Server label did not change. Which is better: reconciling on every Server CR change (which can be constant and unnecessary, since the Server object is updated constantly as it reconciles every 2 min), or reconciling until the BIOSSettingsSet is completed?
The Creation/Deletion Gap You Described
Your scenario is spot-on:
Trade-offs Analysis
Option A: Watch all Server changes
Option B: ResyncInterval polling (your approach)
Option C: Smarter watch predicates
This would give you event-driven behavior without the noise.
Recommendation
Given your constraints (Servers update every 2 min), I believe your ResyncInterval approach is reasonable because:
One Enhancement Suggestion
You could consider combining both approaches:
Watches(
	&metalv1alpha1.Server{},
	handler.EnqueueRequestsFromMapFunc(r.findSetsForServer),
	builder.WithPredicates(predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			oldServer, okOld := e.ObjectOld.(*metalv1alpha1.Server)
			newServer, okNew := e.ObjectNew.(*metalv1alpha1.Server)
			if !okOld || !okNew {
				// Not a Server object; ignore the event instead of panicking.
				return false
			}
			// Only trigger if BIOSSettingsRef changed. Compare the referenced
			// values (needs the standard "reflect" import): if the field is a
			// pointer, != would compare addresses and fire on nearly every update.
			return !reflect.DeepEqual(oldServer.Spec.BIOSSettingsRef, newServer.Spec.BIOSSettingsRef)
		},
	}),
)
This would give you fast event-driven updates when refs change while avoiding the constant reconciliation noise. But if you prefer to keep it simple with just ResyncInterval, that's a valid choice given your requirements. Does this align with your thinking?
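One detail worth spelling out for a predicate that filters on a reference field: if that field is a pointer (an assumption here, suggested by the nil checks on the refs elsewhere in this review), the comparison has to look at the referenced names rather than the pointer addresses. The sketch below uses a hypothetical `localRef` type and `refChanged` helper to show a nil-safe comparison.

```go
package main

import "fmt"

// localRef is a hypothetical stand-in for a reference type that carries
// only the name of the referenced object.
type localRef struct{ Name string }

// refChanged reports whether a reference was added, removed, or repointed.
// Two distinct pointers to equal names count as unchanged.
func refChanged(oldRef, newRef *localRef) bool {
	if oldRef == nil || newRef == nil {
		// Changed only if exactly one side is nil.
		return oldRef != newRef
	}
	return oldRef.Name != newRef.Name
}

func main() {
	a1 := &localRef{Name: "settings-a"}
	a2 := &localRef{Name: "settings-a"} // same name, different address
	b := &localRef{Name: "settings-b"}

	fmt.Println(a1 != a2)            // raw pointer comparison
	fmt.Println(refChanged(a1, a2))  // value comparison
	fmt.Println(refChanged(a1, b))   // repointed
	fmt.Println(refChanged(a1, nil)) // reference removed
}
```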
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/controller/bmcsettingsset_controller.go (1)
252-260: ⚠️ Potential issue | 🟠 Major
Do not swallow non-NotFound lookup failures in `createMissingBMCSettings`.
The non-NotFound branch logs and `continue`s, so reconciliation can return success while read failures are masked. This weakens retry/backoff semantics and observability. Also, Line 257 logs "Server" for a BMC object.
Suggested fix
 if bmc.Spec.BMCSettingRef != nil {
 	if err := r.Get(ctx, client.ObjectKey{Name: bmc.Spec.BMCSettingRef.Name}, &metalv1alpha1.BMCSettings{}); err != nil {
 		if apierrors.IsNotFound(err) {
 			log.V(1).Info("BMCSettings referenced by BMC not found, will create a new one", "BMC", bmc.Name, "BMCSettings", bmc.Spec.BMCSettingRef.Name)
 			// proceed to create a new BMCSettings; the ref will be updated when it is created
 		} else {
-			log.Error(err, "error when trying to get BMCSettings referenced by Server", "Server", bmc.Name, "BMCSettings", bmc.Spec.BMCSettingRef.Name)
-			// we will try this again in next reconciliation loop
+			log.Error(err, "Failed to get BMCSettings referenced by BMC", "BMC", bmc.Name, "BMCSettings", bmc.Spec.BMCSettingRef.Name)
+			errs = append(errs, fmt.Errorf("failed to get BMCSettings %s referenced by BMC %s: %w", bmc.Spec.BMCSettingRef.Name, bmc.Name, err))
 			continue
 		}
 	} else {
As per coding guidelines: "Follow Kubernetes logging message style guidelines - start with capital letter, use active voice, use past tense for errors, and specify object types."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/bmcsettingsset_controller.go` around lines 252 - 260, The code in createMissingBMCSettings swallows non-NotFound errors from r.Get by logging and doing continue which masks transient read failures and mislabels the object as "Server"; change the non-NotFound branch to return the error (or requeue) so the controller retries with backoff instead of silently proceeding, and update the log message in the error branch to use the correct object type "BMC" and follow Kubernetes logging style (capitalize, active voice, past tense) when calling log.Error for the r.Get failure; locate the failing call via r.Get(...) and the log site that currently logs "Server" and replace the continue with a propagated error return.
internal/controller/biossettingsset_controller.go (1)
228-236: ⚠️ Potential issue | 🟠 Major
Propagate non-`NotFound` lookup failures instead of silently skipping creation.
Transient/API/RBAC errors are currently logged and skipped, which hides reconciliation failure and delays recovery behind polling.
✅ Proposed fix
 if err := r.Get(ctx, client.ObjectKey{Name: server.Spec.BIOSSettingsRef.Name}, &metalv1alpha1.BIOSSettings{}); err != nil {
 	if apierrors.IsNotFound(err) {
 		log.Error(err, "failed to get BIOSSettings referenced by Server", "Server", server.Name, "BIOSSettings", server.Spec.BIOSSettingsRef.Name)
 		// we will go ahead and create a new BIOSSettings for this server. the ref will be updated when the new BIOSSettings is created
 	} else {
-		log.Error(err, "error when trying to get BIOSSettings referenced by Server", "Server", server.Name, "BIOSSettings", server.Spec.BIOSSettingsRef.Name)
-		// we will try this again in next reconciliation loop
+		errs = append(errs, fmt.Errorf("failed to get BIOSSettings %s referenced by Server %s: %w", server.Spec.BIOSSettingsRef.Name, server.Name, err))
 		continue
 	}
 } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/controller/biossettingsset_controller.go` around lines 228 - 236, The reconciliation currently swallows non-NotFound errors from r.Get when looking up the BIOSSettingsRef (inside the loop around r.Get and handling Server and metalv1alpha1.BIOSSettings), logging them and using continue; instead return or requeue the error so controller machinery retries immediately. Change the else branch that handles errors from r.Get(...) (where apierrors.IsNotFound(err) is false) to return ctrl.Result{}, err (or wrap with apierrors.NewInternalError) so the Reconcile function propagates the failure instead of silently skipping creation for Server.Spec.BIOSSettingsRef.Name.
🧹 Nitpick comments (1)
internal/controller/biossettingsset_controller.go (1)
35-36: Guard the requeue fallback against an unset interval.
If `ResyncInterval` is `0`, the fallback requeue path is effectively disabled. Adding a default/validation here would make the safety-net behavior robust.
🔁 Suggested safeguard
 if status.FullyLabeledServers != status.AvailableBIOSSettings || pendingPatchingSettings {
+	requeueAfter := r.ResyncInterval
+	if requeueAfter <= 0 {
+		requeueAfter = 30 * time.Second
+	}
 	log.V(1).Info("Waiting for all BIOSSettings to be created/Patched for the labeled Servers", "Status", status)
-	return ctrl.Result{RequeueAfter: r.ResyncInterval}, nil
+	return ctrl.Result{RequeueAfter: requeueAfter}, nil
 }
Also applies to: 209-212
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/controller/biossettingsset_controller_test.go`:
- Around line 524-530: The delete assertions for biosSettingsSet1,
biosSettings02, and biosSettings03 can flake if resources were already removed;
update the cleanup to treat NotFound as success by capturing the error from
k8sClient.Delete and asserting that err == nil || apierrors.IsNotFound(err)
(instead of Expect(...).To(Succeed())), and keep the subsequent
Eventually(Get(...)).Should(Satisfy(apierrors.IsNotFound)) checks as-is; apply
the same change for the similar block referencing the same resources around the
602-607 area.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- internal/controller/biossettingsset_controller.go
- internal/controller/biossettingsset_controller_test.go
- internal/controller/bmcsettingsset_controller.go
- internal/controller/bmcsettingsset_controller_test.go
- internal/controller/helper.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/controller/bmcsettingsset_controller_test.go
By("Deleting the 1st BIOSSettingsSet")
Expect(k8sClient.Delete(ctx, biosSettingsSet1)).To(Succeed())
Eventually(Get(biosSettingsSet1)).Should(Satisfy(apierrors.IsNotFound))
Expect(k8sClient.Delete(ctx, biosSettings02)).To(Succeed())
Eventually(Get(biosSettings02)).Should(Satisfy(apierrors.IsNotFound))
Expect(k8sClient.Delete(ctx, biosSettings03)).To(Succeed())
Eventually(Get(biosSettings03)).Should(Satisfy(apierrors.IsNotFound))
Make cleanup deletions tolerant to already-removed resources.
These Delete assertions can flake when dependents are removed before the explicit delete call. Consider ignoring NotFound in cleanup paths.
🔧 Suggested hardening
+import "sigs.k8s.io/controller-runtime/pkg/client"
- Expect(k8sClient.Delete(ctx, biosSettings02)).To(Succeed())
+ Expect(client.IgnoreNotFound(k8sClient.Delete(ctx, biosSettings02))).To(Succeed())
Eventually(Get(biosSettings02)).Should(Satisfy(apierrors.IsNotFound))
- Expect(k8sClient.Delete(ctx, biosSettings03)).To(Succeed())
+ Expect(client.IgnoreNotFound(k8sClient.Delete(ctx, biosSettings03))).To(Succeed())
Eventually(Get(biosSettings03)).Should(Satisfy(apierrors.IsNotFound))
...
- Expect(k8sClient.Delete(ctx, biosSettings02_02)).To(Succeed())
+ Expect(client.IgnoreNotFound(k8sClient.Delete(ctx, biosSettings02_02))).To(Succeed())
Eventually(Get(biosSettings02_02)).Should(Satisfy(apierrors.IsNotFound))
- Expect(k8sClient.Delete(ctx, biosSettings03_02)).To(Succeed())
+ Expect(client.IgnoreNotFound(k8sClient.Delete(ctx, biosSettings03_02))).To(Succeed())
Eventually(Get(biosSettings03_02)).Should(Satisfy(apierrors.IsNotFound))
Also applies to: 602-607
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/controller/biossettingsset_controller_test.go` around lines 524 -
530, The delete assertions for biosSettingsSet1, biosSettings02, and
biosSettings03 can flake if resources were already removed; update the cleanup
to treat NotFound as success by capturing the error from k8sClient.Delete and
asserting that err == nil || apierrors.IsNotFound(err) (instead of
Expect(...).To(Succeed())), and keep the subsequent
Eventually(Get(...)).Should(Satisfy(apierrors.IsNotFound)) checks as-is; apply
the same change for the similar block referencing the same resources around the
602-607 area.
@coderabbitai the unit test setup does not automatically delete the child resources, so these objects cannot be "already removed"?
🧩 Analysis chain
🌐 Web query:
envtest garbage collection owner references enabled
💡 Result:
In controller-runtime/pkg/envtest, garbage collection is effectively not enabled because envtest only starts etcd + kube-apiserver, not kube-controller-manager (garbage-collector-controller). So even if you set metadata.ownerReferences, dependents generally won’t be deleted in envtest the way they would be in a real cluster. [1]
How to deal with this:
- Recommended for envtest: assert that the child objects have the expected ownerReferences (ownership), rather than expecting Kubernetes to actually delete them. [1]
- If you need real GC behavior: run tests against a cluster that includes the controller manager (e.g., Kind/minikube/real cluster), or otherwise run kube-controller-manager yourself with the garbage-collector-controller enabled (it's part of the --controllers set). [2]
Sources:
[1] Kubebuilder book – envtest “Testing considerations” (no built-in controllers; GC example)
[2] Kubernetes docs – kube-controller-manager controllers list (includes garbage-collector-controller)
@nagadeesh-nagaraja You're absolutely right! I've confirmed that envtest does not enable garbage collection by default because it only runs etcd and kube-apiserver, not kube-controller-manager. Without the garbage collector controller running, owner references won't trigger automatic deletion of child resources.
In your test setup, the child BIOSSettings objects will remain until explicitly deleted, so they cannot be "already removed" before your delete calls. The current implementation without client.IgnoreNotFound() is appropriate for this environment.
My suggestion would only apply in scenarios where:
- Tests run against a full cluster (Kind/minikube) with garbage collection enabled
- There's potential for race conditions between parent and child deletion
- Tests need to be more defensive against future environment changes
For your current envtest-based setup, the explicit deletes as written are correct and won't flake. Thank you for the clarification!
🧠 Learnings used
Learnt from: CR
Repo: ironcore-dev/metal-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-02-17T13:12:50.124Z
Learning: Applies to internal/controller/**/*_controller.go : Implement finalizers to clean up external resources (buckets, VMs, DNS entries) before resource deletion
Fixes #620