AUTOSCALE-558: Expose KubeletConfig on OpenShiftEC2Nodeclass as structured fields + preserveunknown/overflow#8192
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Skipping CI for Draft Pull Request. |
|
@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughIntroduces structured kubelet configuration: a new KubeletConfiguration API type with custom JSON marshal/unmarshal, IsZero semantics, and an optional Sequence Diagram(s)sequenceDiagram
actor User
participant OSE2NC as OpenshiftEC2NodeClass
participant KarpIgnition as KarpenterIgnitionController
participant KubeletCM as Kubelet ConfigMap\n(Management Cluster)
participant EC2NC as EC2NodeClass\n(Karpenter)
participant NP as NodePool\n(Karpenter)
participant Node as Provisioned Node
User->>OSE2NC: Create/Update with spec.kubelet
OSE2NC->>KarpIgnition: Notify controller of change
alt spec.kubelet is set
KarpIgnition->>KarpIgnition: Add kubeletConfigFinalizer
KarpIgnition->>KarpIgnition: Merge user kubelet config + base taints
KarpIgnition->>KubeletCM: Create/Update per-NodeClass ConfigMap (data["config"]=KubeletConfig YAML)
else spec.kubelet is unset
KarpIgnition->>KubeletCM: Delete per-NodeClass ConfigMap
KarpIgnition->>KarpIgnition: Remove kubeletConfigFinalizer if present
end
User->>NP: Create NodePool referencing OpenshiftEC2NodeClass
NP->>EC2NC: Populate EC2NodeClass.Kubelet via KarpenterKubeletConfiguration()
NP->>Node: Provision node referencing Kubelet ConfigMap
Node->>KubeletCM: Read and apply kubelet config
sequenceDiagram
actor Operator
participant KarController as KarpenterController
participant MgrCM as Management ConfigMap\n(karpenter taint)
participant Support as support/karpenter helpers
Operator->>KarController: Reconcile loop
KarController->>Support: KarpenterTaintConfigManifest()
Support-->>KarController: YAML manifest
KarController->>MgrCM: Create/Update global taint ConfigMap (data["config"]=manifest)
MgrCM-->>KarController: Created/Updated or Error
🚥 Pre-merge checks | ✅ 7 | ❌ 5❌ Failed checks (5 warnings)
✅ Passed checks (7 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jkyros The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8192 +/- ##
==========================================
+ Coverage 36.30% 36.38% +0.07%
==========================================
Files 764 764
Lines 93015 93150 +135
==========================================
+ Hits 33772 33892 +120
- Misses 56530 56533 +3
- Partials 2713 2725 +12
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
/test e2e-aws-autonode |
|
Heyyy that is super cool, I don't have to have my claude watch and root cause test failures anymore |
|
KubeletConfig passed, teardown failure. One more time |
| // kubelet configures kubelet settings for nodes provisioned by this NodeClass. | ||
| // These settings are injected into the node's ignition configuration via MachineConfig. | ||
| // +optional | ||
| Kubelet *KubeletConfiguration `json:"kubelet,omitempty"` |
There was a problem hiding this comment.
remove pointer and use omitzero
There was a problem hiding this comment.
pointer removed, added omitzero
|
|
||
| - linters: | ||
| - kubeapilinter | ||
| path: karpenter/v1beta1/karpenter_types.go |
There was a problem hiding this comment.
why are we not requiring minimum properties 1?
There was a problem hiding this comment.
Minimum properties 1 doesn't play nice with the custom JSON marshaller we need to capture the preserved field "overflow", it's how the controller-gen code is written. Our "API hackery" apparently has a cost 😞
| // +kubebuilder:object:generate=false | ||
| // +kubebuilder:pruning:PreserveUnknownFields | ||
| // +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent" | ||
| type KubeletConfiguration struct { |
There was a problem hiding this comment.
I can either come back afterwards and add it, or we can wait for that to merge and then do this one, and I can add it here. I'll set a test up based on your branch in the mean time.
|
/test e2e-aws-autonode |
|
tests are just taking too long i think 😂 We can see TestKarpenter hitting the 2 hour mark which is when it stops. |
| if openshiftEC2NodeClass.Spec.Kubelet == nil { | ||
| if err := r.ManagementClient.Delete(ctx, cm); err != nil && !apierrors.IsNotFound(err) { | ||
| return fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err) | ||
| } |
There was a problem hiding this comment.
nit: but I think we can use supportutil.DeleteIfNeeded() so we don't have to check for IsNotFound error manually.
There was a problem hiding this comment.
switched to supportutil.DeleteIfNeeded(), thanks!
|
|
||
| // When Spec.Kubelet is set, only the per-nodeclass kubelet config ref is included. | ||
| // set-karpenter-taint is omitted because the taint is merged into the per-nodeclass | ||
| // manifest via ToKubeletConfigManifestWithTaints to avoid two KubeletConfigs targeting |
There was a problem hiding this comment.
Is this function name still accurate?
There was a problem hiding this comment.
nope, it wasn't. I changed it when I refactored this from the old way. Updated. Thanks!
| }) | ||
| } | ||
|
|
||
| func TestReconcileKubeletConfigMap(t *testing.T) { |
There was a problem hiding this comment.
Can we also check that overflow fields would still show up here too?
There was a problem hiding this comment.
Yep, added a test case to check that
c11822a to
4d6300e
Compare
|
@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (2)
503-510: Note: Shallow merge will fully replaceregisterWithTaintsif user-settable in future.The current implementation correctly ensures the Karpenter base taints always win. However,
mergeKubeletConfigMapsperforms a shallow merge—ifregisterWithTaintsever becomes user-settable (as the comment at line 504 acknowledges it currently isn't), user-provided taints would be fully replaced rather than merged with the base taints.If this is intentional for security/correctness reasons (enforcing only Karpenter's taints), consider documenting this in the API or adding a more explicit comment. If future user-defined taints should coexist with the base taints, the merge logic would need to be enhanced.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` around lines 503 - 510, The current shallow merge in mergeKubeletConfigMaps will entirely replace registerWithTaints if users set it; decide and implement one of two fixes: (A) make the behavior explicit by updating the comment near the call and the API docs to state that karpenterutil.KarpenterBaseTaintMap() always overrides user registerWithTaints, or (B) change mergeKubeletConfigMaps so that for the "registerWithTaints" key it performs a deep/semantic merge — parse both nodeClassKubeletMap["registerWithTaints"] and karpenterutil.KarpenterBaseTaintMap() as taint lists, concatenate them ensuring base taints are appended last and deduplicate by taint identity — then pass that merged value into kubeletConfigManifest; update tests accordingly.
237-240: Minor: Log message may be misleading if ConfigMap didn't exist.
supportutil.DeleteIfNeededreturns successfully even when the ConfigMap doesn't exist (was already deleted or never created). The log message "Deleted kubelet config ConfigMap" could be slightly inaccurate in that case. Consider logging based on the return value.🔧 Optional improvement
- if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil { + deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm) + if err != nil { return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err) } - log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + if deleted { + log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` around lines 237 - 240, supportutil.DeleteIfNeeded currently returns a boolean indicating whether a delete actually happened; change the call to capture that value (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)), keep the error handling the same, and log conditionally: if deleted log "Deleted kubelet config ConfigMap" with configMapName via log.Info, otherwise log "Kubelet config ConfigMap not present, nothing to delete" (or similar) so the message reflects whether the resource existed; reference symbols: supportutil.DeleteIfNeeded, r.ManagementClient, cm, configMapName, log.Info.api/.golangci.yml (1)
332-347: Consider anchoring exclusion patterns to prevent unintended matches.These new exclusion rules follow the existing convention in the file (unanchored patterns), but anchoring
textpatterns with^...$will reduce the risk of unintentionally suppressing future kubeapilinter findings. For example, the unanchorednomapspattern could match unintended messages if field names are extended (e.g.,SystemReservedExtrawould match the current regex).This is consistent with all other rules in the file; if anchoring is adopted, apply it uniformly across the entire exclusions list rather than just these additions.
Example: anchored text pattern
- text: 'nomaps: field KubeletConfiguration\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)' + text: '^nomaps: field KubeletConfiguration\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$'🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/.golangci.yml` around lines 332 - 347, Anchor the new kubeapilinter exclusion text patterns to avoid accidental matches: update the three added entries so their text fields use anchored regexes (e.g., '^minlength: field OpenshiftEC2NodeClassSpec.Kubelet type KubeletConfiguration must have a minimum properties, add kubebuilder:validation:MinProperties marker$' and similarly anchor the 'optionalfields' message for OpenshiftEC2NodeClassSpec.Kubelet and the 'nodurations' message for KubeletConfiguration.EvictionSoftGracePeriod), and change the 'nomaps' pattern that targets KubeletConfiguration.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod) to an anchored form '^nomaps: field KubeletConfiguration\\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$'; apply the same anchoring convention consistently across the exclusions list so future kubeapilinter messages won't be accidentally suppressed.test/e2e/karpenter_kubelet_checker_pod.yaml (1)
6-8: Disable service account token automount for this privileged host-mounted podThis pod does not call the Kubernetes API. Setting
automountServiceAccountToken: falsereduces credential exposure in a privileged context.Proposed hardening
spec: + automountServiceAccountToken: false restartPolicy: NeverAlso applies to: 67-76
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/karpenter_kubelet_checker_pod.yaml` around lines 6 - 8, Add automountServiceAccountToken: false to the Pod spec to prevent automatic mounting of service account tokens for this privileged host-mounted pod; update the spec block (the top-level pod spec where restartPolicy and containers are defined) and also apply the same change to the other Pod manifest instance referenced (the block covering lines 67-76) so both pod specs explicitly set automountServiceAccountToken: false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Around line 20-23: The check() helper currently uses unscoped grep -q with
"$@" against "$FILE" which can match identical values in systemReserved and
produce false positives for kubeReserved; change the matching to literal string
matching (use grep -F -q) or implement section-aware parsing to restrict
searches to the kubeReserved block within FILE when verifying values, and update
references to the check() invocation accordingly (e.g., where check() is used to
validate kubeReserved entries). Also harden the diagnostic pod spec by setting
automountServiceAccountToken: false on the pod spec to avoid unnecessary service
account token exposure.
---
Nitpick comments:
In `@api/.golangci.yml`:
- Around line 332-347: Anchor the new kubeapilinter exclusion text patterns to
avoid accidental matches: update the three added entries so their text fields
use anchored regexes (e.g., '^minlength: field OpenshiftEC2NodeClassSpec.Kubelet
type KubeletConfiguration must have a minimum properties, add
kubebuilder:validation:MinProperties marker$' and similarly anchor the
'optionalfields' message for OpenshiftEC2NodeClassSpec.Kubelet and the
'nodurations' message for KubeletConfiguration.EvictionSoftGracePeriod), and
change the 'nomaps' pattern that targets
KubeletConfiguration.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)
to an anchored form '^nomaps: field
KubeletConfiguration\\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$';
apply the same anchoring convention consistently across the exclusions list so
future kubeapilinter messages won't be accidentally suppressed.
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 503-510: The current shallow merge in mergeKubeletConfigMaps will
entirely replace registerWithTaints if users set it; decide and implement one of
two fixes: (A) make the behavior explicit by updating the comment near the call
and the API docs to state that karpenterutil.KarpenterBaseTaintMap() always
overrides user registerWithTaints, or (B) change mergeKubeletConfigMaps so that
for the "registerWithTaints" key it performs a deep/semantic merge — parse both
nodeClassKubeletMap["registerWithTaints"] and
karpenterutil.KarpenterBaseTaintMap() as taint lists, concatenate them ensuring
base taints are appended last and deduplicate by taint identity — then pass that
merged value into kubeletConfigManifest; update tests accordingly.
- Around line 237-240: supportutil.DeleteIfNeeded currently returns a boolean
indicating whether a delete actually happened; change the call to capture that
value (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient,
cm)), keep the error handling the same, and log conditionally: if deleted log
"Deleted kubelet config ConfigMap" with configMapName via log.Info, otherwise
log "Kubelet config ConfigMap not present, nothing to delete" (or similar) so
the message reflects whether the resource existed; reference symbols:
supportutil.DeleteIfNeeded, r.ManagementClient, cm, configMapName, log.Info.
In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Around line 6-8: Add automountServiceAccountToken: false to the Pod spec to
prevent automatic mounting of service account tokens for this privileged
host-mounted pod; update the spec block (the top-level pod spec where
restartPolicy and containers are defined) and also apply the same change to the
other Pod manifest instance referenced (the block covering lines 67-76) so both
pod specs explicitly set automountServiceAccountToken: false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: c242fdcc-7af5-4987-ba5a-e303736aa7e4
⛔ Files ignored due to path filters (9)
api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.gois excluded by!client/**client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yamlis excluded by!karpenter-operator/controllers/karpenter/assets/*.yamlvendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (16)
api/.golangci.ymlapi/karpenter/v1beta1/karpenter_types.goapi/karpenter/v1beta1/kubelet_config.goapi/karpenter/v1beta1/kubelet_config_karpenter.goapi/karpenter/v1beta1/kubelet_config_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/karpenter.gokarpenter-operator/controllers/karpenter/karpenter_controller.gokarpenter-operator/controllers/karpenter/karpenter_controller_test.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gosupport/karpenter/karpenter.gosupport/karpenter/karpenter_test.gotest/e2e/karpenter_kubelet_checker_pod.yamltest/e2e/karpenter_test.go
✅ Files skipped from review due to trivial changes (4)
- api/karpenter/v1beta1/kubelet_config_test.go
- karpenter-operator/controllers/karpenter/karpenter_controller_test.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
- api/karpenter/v1beta1/kubelet_config.go
🚧 Files skipped from review as they are similar to previous changes (6)
- api/karpenter/v1beta1/kubelet_config_karpenter.go
- support/karpenter/karpenter_test.go
- api/karpenter/v1beta1/karpenter_types.go
- support/karpenter/karpenter.go
- test/e2e/karpenter_test.go
- hypershift-operator/controllers/hostedcluster/karpenter.go
4043705 to
5c18545
Compare
|
@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
92-120:⚠️ Potential issue | 🟠 MajorHandle deleting NodeClasses before requiring the HCP.
Line 92 returns early when the HostedControlPlane is already gone, so Lines 117-120 never reach
reconcileDeletedNodeClass. That can leavekubeletConfigFinalizerstuck on a terminatingOpenshiftEC2NodeClasseven though the management-side ConfigMap is already gone with the HCP namespace. The deletion path should still run when the HCP lookup returns not found.
♻️ Duplicate comments (1)
test/e2e/karpenter_kubelet_checker_pod.yaml (1)
20-23:⚠️ Potential issue | 🟡 MinorUse literal matching in grep helpers to reduce false positives.
grep -qtreats patterns as regex; these checks are exact-string assertions and should be literal.🔧 Suggested fix
check() { local label="$1"; shift - if grep -q "$@" "$FILE"; then + if grep -Fq -- "$*" "$FILE"; then echo "PASS: $label" else echo "FAIL: $label (pattern: $*)" FAIL=1 fi } @@ check_env() { local label="$1"; shift - if grep -q "$@" "$ENV_FILE"; then + if grep -Fq -- "$*" "$ENV_FILE"; then echo "PASS: $label" else echo "FAIL: $label (pattern: $*)" FAIL=1 fi }Also applies to: 30-33
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/karpenter_kubelet_checker_pod.yaml` around lines 20 - 23, The helper check() currently uses grep -q "$@" "$FILE" which treats the patterns as regexes; change those calls to use literal matching (e.g. grep -F -q "$@" "$FILE") — or grep -F -x -q if you need whole-line exact matches — so the assertions are performed as exact strings; update the occurrence inside check() and the duplicate occurrence around lines 30-33 to use grep -F (and -x if appropriate).
🧹 Nitpick comments (1)
support/karpenter/karpenter_test.go (1)
189-210: Avoid single-element assumption in taint manifest assertion.At Line 205, the test only checks
taints[0]. This both misses validation for additional taints and can panic if the base taint list is ever empty. Prefer iterating and asserting all entries (as done inTestKarpenterBaseTaintMap).Diff suggestion
- taint, ok := taints[0].(map[string]interface{}) - g.Expect(ok).To(BeTrue()) - g.Expect(taint["key"]).To(Equal(KarpenterBaseTaints[0].Key)) - g.Expect(taint["value"]).To(Equal(KarpenterBaseTaints[0].Value)) - g.Expect(taint["effect"]).To(Equal(string(KarpenterBaseTaints[0].Effect))) + for i, entry := range taints { + taint, ok := entry.(map[string]interface{}) + g.Expect(ok).To(BeTrue()) + g.Expect(taint["key"]).To(Equal(KarpenterBaseTaints[i].Key)) + g.Expect(taint["value"]).To(Equal(KarpenterBaseTaints[i].Value)) + g.Expect(taint["effect"]).To(Equal(string(KarpenterBaseTaints[i].Effect))) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@support/karpenter/karpenter_test.go` around lines 189 - 210, The test assumes at least one taint and only asserts taints[0], which can panic or miss mismatches; update the test in the KarpenterTaintConfigManifest case to iterate over the parsed taints slice (`taints`) and for each index i assert that the parsed taint map's "key", "value", and "effect" equal KarpenterBaseTaints[i].Key, .Value and string(.Effect) respectively (use the existing `taints`, `kubeletConfig`, `spec`, and `KarpenterBaseTaints` identifiers), so all entries are validated and no single-element assumption is made.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/karpenter/v1beta1/kubelet_config_karpenter.go`:
- Around line 11-27: The current KarpenterKubeletConfiguration method uses
spec.Kubelet.IsZero() which is too broad and returns an empty
KubeletConfiguration when only overflow fields are set; instead construct the
awskarpenterv1.KubeletConfiguration value from the mapped fields
(ImageGCHighThresholdPercent, ImageGCLowThresholdPercent, MaxPods, CPUCFSQuota,
EvictionHard, EvictionSoft, EvictionSoftGracePeriod, EvictionMaxPodGracePeriod,
PodsPerCore, SystemReserved, KubeReserved) into a local variable and then return
nil if every one of those mapped fields is empty/zero (otherwise return
&thatLocalValue), removing the reliance on spec.Kubelet.IsZero().
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 158-185: After successfully patching finalizers on
openshiftEC2NodeClass (the r.GuestClient.Patch calls that add/remove
kubeletConfigFinalizer), re-fetch the latest object or requeue immediately to
avoid using a stale in-memory resourceVersion (which causes optimistic lock 409s
in updateConfigVersionAnnotation/reconcileKubeletConfigMap). Concretely: after
each successful Patch call in this block, either call r.GuestClient.Get(ctx,
client.ObjectKeyFromObject(openshiftEC2NodeClass), openshiftEC2NodeClass) to
reload the resource before continuing, or return ctrl.Result{Requeue:true} so
the next reconcile runs with the up-to-date object.
In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Line 10: The Pod spec currently uses an unpinned image reference "image:
alpine"; replace this with a pinned image digest to ensure deterministic e2e
runs (use the canonical format like alpine@sha256:<digest> or
alpine:<tag>@sha256:<digest>). Update the image field in the YAML (the container
image entry in karpenter_kubelet_checker_pod.yaml) to a specific verified digest
for the desired Alpine version, and commit the new digest string so the tests
pull an immutable, reproducible image.
---
Duplicate comments:
In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Around line 20-23: The helper check() currently uses grep -q "$@" "$FILE"
which treats the patterns as regexes; change those calls to use literal matching
(e.g. grep -F -q "$@" "$FILE") — or grep -F -x -q if you need whole-line exact
matches — so the assertions are performed as exact strings; update the
occurrence inside check() and the duplicate occurrence around lines 30-33 to use
grep -F (and -x if appropriate).
---
Nitpick comments:
In `@support/karpenter/karpenter_test.go`:
- Around line 189-210: The test assumes at least one taint and only asserts
taints[0], which can panic or miss mismatches; update the test in the
KarpenterTaintConfigManifest case to iterate over the parsed taints slice
(`taints`) and for each index i assert that the parsed taint map's "key",
"value", and "effect" equal KarpenterBaseTaints[i].Key, .Value and
string(.Effect) respectively (use the existing `taints`, `kubeletConfig`,
`spec`, and `KarpenterBaseTaints` identifiers), so all entries are validated and
no single-element assumption is made.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 7ced5847-9203-4a20-ad1a-01f97180ccd4
⛔ Files ignored due to path filters (9)
api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.gois excluded by!client/**client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yamlis excluded by!karpenter-operator/controllers/karpenter/assets/*.yamlvendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (16)
api/.golangci.ymlapi/karpenter/v1beta1/karpenter_types.goapi/karpenter/v1beta1/kubelet_config.goapi/karpenter/v1beta1/kubelet_config_karpenter.goapi/karpenter/v1beta1/kubelet_config_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/karpenter.gokarpenter-operator/controllers/karpenter/karpenter_controller.gokarpenter-operator/controllers/karpenter/karpenter_controller_test.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gosupport/karpenter/karpenter.gosupport/karpenter/karpenter_test.gotest/e2e/karpenter_kubelet_checker_pod.yamltest/e2e/karpenter_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
- karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
- karpenter-operator/controllers/karpenter/karpenter_controller.go
- karpenter-operator/controllers/karpenter/karpenter_controller_test.go
- api/karpenter/v1beta1/karpenter_types.go
- support/karpenter/karpenter.go
- test/e2e/karpenter_test.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
| // If spec.kubelet is configured, add a finalizer to clean up the configmap. We can't just use owner | ||
| // references because this is cross cluster (the configmap lives in the control plane) | ||
| if !openshiftEC2NodeClass.Spec.Kubelet.IsZero() { | ||
|
|
||
| if !controllerutil.ContainsFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) { | ||
| original := openshiftEC2NodeClass.DeepCopy() | ||
| controllerutil.AddFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) | ||
| if err := r.GuestClient.Patch(ctx, openshiftEC2NodeClass, | ||
| client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil { | ||
| return ctrl.Result{}, fmt.Errorf("failed to add kubelet config finalizer: %w", err) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if err := r.reconcileKubeletConfigMap(ctx, hcp, openshiftEC2NodeClass); err != nil { | ||
| return ctrl.Result{}, fmt.Errorf("failed to reconcile kubelet config configmap: %w", err) | ||
| } | ||
|
|
||
| // The reconcile will have deleted the configmap if we make it here, so we can | ||
| // remove the finalizer | ||
| if openshiftEC2NodeClass.Spec.Kubelet.IsZero() && controllerutil.ContainsFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) { | ||
| original := openshiftEC2NodeClass.DeepCopy() | ||
| controllerutil.RemoveFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) | ||
| if err := r.GuestClient.Patch(ctx, openshiftEC2NodeClass, | ||
| client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil { | ||
| return ctrl.Result{}, fmt.Errorf("failed to remove kubelet config finalizer: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Re-fetch after finalizer patches.
The metadata patches at Lines 162-168 and 179-183 advance resourceVersion, but reconcile continues with the stale in-memory object. updateConfigVersionAnnotation() later uses optimistic locking, so the first reconcile after adding/removing this finalizer can 409 on itself. Re-get the NodeClass, or return/requeue, after these patches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 158 - 185, After successfully patching finalizers on
openshiftEC2NodeClass (the r.GuestClient.Patch calls that add/remove
kubeletConfigFinalizer), re-fetch the latest object or requeue immediately to
avoid using a stale in-memory resourceVersion (which causes optimistic lock 409s
in updateConfigVersionAnnotation/reconcileKubeletConfigMap). Concretely: after
each successful Patch call in this block, either call r.GuestClient.Get(ctx,
client.ObjectKeyFromObject(openshiftEC2NodeClass), openshiftEC2NodeClass) to
reload the resource before continuing, or return ctrl.Result{Requeue:true} so
the next reconcile runs with the up-to-date object.
There was a problem hiding this comment.
a.) wouldn't the 409 result in a requeue anyway, which I assume would be fine and/or b.) does patch not update the "local" object too when it patches?
There was a problem hiding this comment.
@jkyros — you're correct on both counts, and my concern was overstated:
b) Yes, Patch updates the local object in-place. controller-runtime's client.Patch() decodes the API server's response body back into the object pointer you pass. So after a successful r.GuestClient.Patch(ctx, openshiftEC2NodeClass, ...), the in-memory openshiftEC2NodeClass already has the updated resourceVersion from the server. The subsequent updateConfigVersionAnnotation (which does its own original := openshiftEC2NodeClass.DeepCopy() + optimistic-lock patch) would be working with the correct, up-to-date version — no stale-lock issue.
a) And even if something did go wrong, the patch errors are properly propagated (return ctrl.Result{}, fmt.Errorf(...)), so any 409 would surface as an error, get requeued with backoff, and the next reconcile would start with a fresh Get. No silent corruption.
My original comment was incorrect — there's no real risk here. No changes needed.
📝 Learnings were identified and not saved (knowledge base disabled). Enable
| t.Logf("Make sure OpenshiftEC2NodeClass %q is Ready before nodepool creation", nc.Name) | ||
| e2eutil.EventuallyObject(t, ctx, fmt.Sprintf("OpenshiftEC2NodeClass %q to be Ready", nc.Name), |
There was a problem hiding this comment.
nit: Could we just put this ready check in the polling above on line 1173?
There was a problem hiding this comment.
We could, but those blocks are checking different things with different timings from different controllers. Feels like we'd be trading clarity for...I don't know what, saving a Get call? Unless I'm missing something expensive here. I think we should leave it.
985e242 to
4697b7f
Compare
|
Rebased, slightly re-arranged IsZero so |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
162-170:⚠️ Potential issue | 🟡 MinorStale resourceVersion after finalizer patch.
After the finalizer patch at lines 165-168, the in-memory
openshiftEC2NodeClassretains its pre-patchresourceVersion. WhenupdateConfigVersionAnnotationis later called with optimistic locking, it may 409 on the first reconcile after adding this finalizer.Consider re-fetching the object after successful finalizer patch, or returning early to requeue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` around lines 162 - 170, The finalizer Add+Patch on openshiftEC2NodeClass (controllerutil.AddFinalizer + r.GuestClient.Patch) leaves the in-memory object with a stale resourceVersion which can cause optimistic-lock 409s in subsequent updateConfigVersionAnnotation calls; after a successful Patch either re-fetch the resource into openshiftEC2NodeClass via r.GuestClient.Get (using the same ctx and namespaced name) so the object has the updated resourceVersion, or immediately return ctrl.Result{Requeue: true} to let a fresh reconcile read the updated resource; apply this change where the finalizer is added (the block referencing kubeletConfigFinalizer and r.GuestClient.Patch) and ensure updateConfigVersionAnnotation uses the refreshed object or runs on the next reconcile.
🧹 Nitpick comments (3)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go (2)
1174-1259: These finalizer subtests are patching state instead of exercising the controller path.Both branches mutate
nodeClass.Finalizersdirectly and then assert the patched object, so they still pass ifReconcile()stops adding or removingkubeletConfigFinalizer. Please drive these expectations through the reconcile/helper code that actually owns the mutation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go` around lines 1174 - 1259, Tests are mutating nodeClass.Finalizers directly instead of invoking the reconciler logic; replace those direct Patch manipulations with calls into the reconciler helper that owns finalizer changes (use KarpenterIgnitionReconciler.reconcileKubeletConfigMap or Reconcile() as appropriate). Specifically, in the "finalizer present" removal test set nodeClass.Spec.Kubelet = nil (if not already), create the reconciler r with fakeManagementClient/fakeGuestClient, call r.reconcileKubeletConfigMap(ctx, hcp, nodeClass) (or r.Reconcile(...) if that drives the same path), then Get the nodeClass from fakeGuestClient and assert kubeletConfigFinalizer is gone; similarly, for the addition test remove the manual fakeGuestClient.Patch that appends the finalizer and instead let r.reconcileKubeletConfigMap drive finalizer addition and then verify via Get. Ensure you reference KarpenterIgnitionReconciler, reconcileKubeletConfigMap, Reconcile(), and kubeletConfigFinalizer when locating code to change.
1480-1509: Add test coverage for overflow-only kubelet configuration.Current test cases only verify behavior when typed fields like
MaxPodsare set. Add a test case where only overflow fields (e.g.,{"podPidsLimit":4096}) are configured to ensure the per-nodeclass kubelet config ref is included. This improves test coverage and helps prevent regressions in theIsZero()check logic.Suggested test shape
+ t.Run("When kubelet config has only overflow fields it should still use the per-nodeclass config", func(t *testing.T) { + g := NewWithT(t) + var kubeletCfg hyperkarpenterv1.KubeletConfiguration + g.Expect(json.Unmarshal([]byte(`{"podPidsLimit":4096}`), &kubeletCfg)).To(Succeed()) + + nodeClass := &hyperkarpenterv1.OpenshiftEC2NodeClass{ + ObjectMeta: metav1.ObjectMeta{Name: testNodeClassName}, + Spec: hyperkarpenterv1.OpenshiftEC2NodeClassSpec{Kubelet: kubeletCfg}, + } + + np := r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage) + g.Expect(np.Spec.Config).To(HaveLen(1)) + g.Expect(np.Spec.Config[0].Name).To(Equal(karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName))) + })Also applies to: 1619-1678
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go` around lines 1480 - 1509, Add a new subtest alongside the existing "When kubelet config is set..." case that constructs an OpenshiftEC2NodeClass whose Spec.Kubelet only sets overflow fields (e.g., raw map/json with podPidsLimit:4096 or equivalent overflow-only field), call r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage), and assert the NodePool still contains exactly one Config entry whose Name equals karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and other existing expectations like np.Name/Namespace/Labels/ClusterName/Replicas/Release.Image/Arch). This ensures the IsZero/emptiness logic treats overflow-only kubelet configuration as non-empty; reference the test variables and helpers used in the file (NewWithT, testNodeClassName, createInMemoryNodePool, karpenterutil.KarpenterNodeClassKubeletConfigName).karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
237-240: Log message may be misleading when ConfigMap was already absent.
supportutil.DeleteIfNeededreturns(deleted bool, error). The log at line 240 unconditionally states "Deleted kubelet config ConfigMap" even when the ConfigMap didn't exist. Consider using the return value for accurate logging.🔧 Suggested fix
- if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil { + deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm) + if err != nil { return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err) } - log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + if deleted { + log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` around lines 237 - 240, supportutil.DeleteIfNeeded returns (deleted bool, error) but the code ignores the deleted bool and always logs "Deleted kubelet config ConfigMap"; change the call in karpenterignition_controller.go to capture the boolean (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)), handle err as before, and use the deleted flag to log a precise message: log.Info("Deleted kubelet config ConfigMap", "name", configMapName) only when deleted is true and log.Info("Kubelet config ConfigMap already absent", "name", configMapName) when deleted is false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/karpenter/v1beta1/kubelet_config.go`:
- Around line 79-82: Add a non-negative validation constraint to the
EvictionMaxPodGracePeriod field so negative values are rejected at admission
time: add the kubebuilder validation marker (e.g. //
+kubebuilder:validation:Minimum=0) immediately above the
EvictionMaxPodGracePeriod declaration and keep the field as *int32
`json:"evictionMaxPodGracePeriod,omitempty"` so the CRD generator will emit a
Minimum=0 schema constraint; after adding the marker regenerate CRDs/clients as
needed.
- Around line 47-72: Update the kubebuilder XValidation annotations for
SystemReserved, KubeReserved, EvictionHard, and EvictionSoft to reject empty
string values: add an additional XValidation rule that asserts each map value is
non-empty (e.g. rule="self.all(x, self[x] != '')" with an appropriate message
like "value for <field> cannot be empty") so admission rejects entries with ""
(in addition to existing key and negative-value checks); locate the annotations
on the SystemReserved, KubeReserved, EvictionHard, and EvictionSoft fields and
append the new validation clause for each.
---
Duplicate comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 162-170: The finalizer Add+Patch on openshiftEC2NodeClass
(controllerutil.AddFinalizer + r.GuestClient.Patch) leaves the in-memory object
with a stale resourceVersion which can cause optimistic-lock 409s in subsequent
updateConfigVersionAnnotation calls; after a successful Patch either re-fetch
the resource into openshiftEC2NodeClass via r.GuestClient.Get (using the same
ctx and namespaced name) so the object has the updated resourceVersion, or
immediately return ctrl.Result{Requeue: true} to let a fresh reconcile read the
updated resource; apply this change where the finalizer is added (the block
referencing kubeletConfigFinalizer and r.GuestClient.Patch) and ensure
updateConfigVersionAnnotation uses the refreshed object or runs on the next
reconcile.
---
Nitpick comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 1174-1259: Tests are mutating nodeClass.Finalizers directly
instead of invoking the reconciler logic; replace those direct Patch
manipulations with calls into the reconciler helper that owns finalizer changes
(use KarpenterIgnitionReconciler.reconcileKubeletConfigMap or Reconcile() as
appropriate). Specifically, in the "finalizer present" removal test set
nodeClass.Spec.Kubelet = nil (if not already), create the reconciler r with
fakeManagementClient/fakeGuestClient, call r.reconcileKubeletConfigMap(ctx, hcp,
nodeClass) (or r.Reconcile(...) if that drives the same path), then Get the
nodeClass from fakeGuestClient and assert kubeletConfigFinalizer is gone;
similarly, for the addition test remove the manual fakeGuestClient.Patch that
appends the finalizer and instead let r.reconcileKubeletConfigMap drive
finalizer addition and then verify via Get. Ensure you reference
KarpenterIgnitionReconciler, reconcileKubeletConfigMap, Reconcile(), and
kubeletConfigFinalizer when locating code to change.
- Around line 1480-1509: Add a new subtest alongside the existing "When kubelet
config is set..." case that constructs an OpenshiftEC2NodeClass whose
Spec.Kubelet only sets overflow fields (e.g., raw map/json with
podPidsLimit:4096 or equivalent overflow-only field), call
r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage), and assert the
NodePool still contains exactly one Config entry whose Name equals
karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and other
existing expectations like
np.Name/Namespace/Labels/ClusterName/Replicas/Release.Image/Arch). This ensures
the IsZero/emptiness logic treats overflow-only kubelet configuration as
non-empty; reference the test variables and helpers used in the file (NewWithT,
testNodeClassName, createInMemoryNodePool,
karpenterutil.KarpenterNodeClassKubeletConfigName).
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 237-240: supportutil.DeleteIfNeeded returns (deleted bool, error)
but the code ignores the deleted bool and always logs "Deleted kubelet config
ConfigMap"; change the call in karpenterignition_controller.go to capture the
boolean (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx,
r.ManagementClient, cm)), handle err as before, and use the deleted flag to log
a precise message: log.Info("Deleted kubelet config ConfigMap", "name",
configMapName) only when deleted is true and log.Info("Kubelet config ConfigMap
already absent", "name", configMapName) when deleted is false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: 0dfe082a-358a-4bf0-ae81-d51fc3503ca4
⛔ Files ignored due to path filters (9)
api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.gois excluded by!client/**client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yamlis excluded by!karpenter-operator/controllers/karpenter/assets/*.yamlvendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (16)
api/.golangci.ymlapi/karpenter/v1beta1/karpenter_types.goapi/karpenter/v1beta1/kubelet_config.goapi/karpenter/v1beta1/kubelet_config_karpenter.goapi/karpenter/v1beta1/kubelet_config_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/karpenter.gokarpenter-operator/controllers/karpenter/karpenter_controller.gokarpenter-operator/controllers/karpenter/karpenter_controller_test.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gosupport/karpenter/karpenter.gosupport/karpenter/karpenter_test.gotest/e2e/karpenter_kubelet_checker_pod.yamltest/e2e/karpenter_test.go
✅ Files skipped from review due to trivial changes (1)
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
🚧 Files skipped from review as they are similar to previous changes (5)
- api/.golangci.yml
- karpenter-operator/controllers/karpenter/karpenter_controller_test.go
- api/karpenter/v1beta1/kubelet_config_karpenter.go
- api/karpenter/v1beta1/kubelet_config_test.go
- hypershift-operator/controllers/hostedcluster/karpenter.go
| // systemReserved is a set of ResourceName=ResourceQuantity pairs that describe | ||
| // resources reserved for non-kubernetes components. | ||
| // Currently only cpu, memory, ephemeral-storage, and pid are supported. | ||
| // +kubebuilder:validation:XValidation:message="valid keys for systemReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')" | ||
| // +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))" | ||
| // +kubebuilder:validation:MinProperties=1 | ||
| // +optional | ||
| SystemReserved map[string]string `json:"systemReserved,omitempty"` | ||
| // kubeReserved is a set of ResourceName=ResourceQuantity pairs that describe | ||
| // resources reserved for kubernetes system components. | ||
| // Currently only cpu, memory, ephemeral-storage, and pid are supported. | ||
| // +kubebuilder:validation:XValidation:message="valid keys for kubeReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')" | ||
| // +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))" | ||
| // +kubebuilder:validation:MinProperties=1 | ||
| // +optional | ||
| KubeReserved map[string]string `json:"kubeReserved,omitempty"` | ||
| // evictionHard is a map of signal names to quantities that defines hard eviction thresholds. | ||
| // +kubebuilder:validation:XValidation:message="valid keys for evictionHard are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])" | ||
| // +kubebuilder:validation:MinProperties=1 | ||
| // +optional | ||
| EvictionHard map[string]string `json:"evictionHard,omitempty"` | ||
| // evictionSoft is a map of signal names to quantities that defines soft eviction thresholds. | ||
| // +kubebuilder:validation:XValidation:message="valid keys for evictionSoft are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])" | ||
| // +kubebuilder:validation:MinProperties=1 | ||
| // +optional | ||
| EvictionSoft map[string]string `json:"evictionSoft,omitempty"` |
There was a problem hiding this comment.
Reject empty values in the string-backed kubelet maps.
systemReserved, kubeReserved, evictionHard, and evictionSoft currently accept "" values. That still serializes into kubelet config, so obviously invalid input is deferred until ignition/MCO instead of being rejected at admission.
Suggested validation tightening
- // +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
+ // +kubebuilder:validation:XValidation:message="systemReserved value cannot be empty or a negative resource quantity",rule="self.all(x, self[x] != '' && !self[x].startsWith('-'))"
// +kubebuilder:validation:MinProperties=1
// +optional
SystemReserved map[string]string `json:"systemReserved,omitempty"`
@@
- // +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
+ // +kubebuilder:validation:XValidation:message="kubeReserved value cannot be empty or a negative resource quantity",rule="self.all(x, self[x] != '' && !self[x].startsWith('-'))"
// +kubebuilder:validation:MinProperties=1
// +optional
KubeReserved map[string]string `json:"kubeReserved,omitempty"`
@@
+ // +kubebuilder:validation:XValidation:message="evictionHard values cannot be empty",rule="self.all(x, self[x] != '')"
// +kubebuilder:validation:MinProperties=1
// +optional
EvictionHard map[string]string `json:"evictionHard,omitempty"`
@@
+ // +kubebuilder:validation:XValidation:message="evictionSoft values cannot be empty",rule="self.all(x, self[x] != '')"
// +kubebuilder:validation:MinProperties=1
// +optional
EvictionSoft map[string]string `json:"evictionSoft,omitempty"`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/karpenter/v1beta1/kubelet_config.go` around lines 47 - 72, Update the
kubebuilder XValidation annotations for SystemReserved, KubeReserved,
EvictionHard, and EvictionSoft to reject empty string values: add an additional
XValidation rule that asserts each map value is non-empty (e.g.
rule="self.all(x, self[x] != '')" with an appropriate message like "value for
<field> cannot be empty") so admission rejects entries with "" (in addition to
existing key and negative-value checks); locate the annotations on the
SystemReserved, KubeReserved, EvictionHard, and EvictionSoft fields and append
the new validation clause for each.
There was a problem hiding this comment.
Would this not be deviating from the upstream validation?
There was a problem hiding this comment.
@jkyros, that's a fair point. Looking at the upstream karpenter-provider-aws v1 API source, the validation rules in this PR's KubeletConfiguration already mirror upstream exactly:
SystemReservedupstream has only a key-allowlist rule and"systemReserved value cannot be a negative resource quantity"withself.all(x, !self[x].startsWith('-'))— no empty-string check.KubeReservedfollows the same pattern — also no empty-string check.EvictionHardandEvictionSoftupstream have only a key-allowlist rule with no value validation at all.
So yes, adding self[x] != '' would be a deliberate tightening beyond what upstream Karpenter enforces. That's a reasonable concern to raise — it could mean that a KubeletConfiguration payload accepted by the upstream EC2NodeClass webhook could be rejected by the OpenshiftEC2NodeClass wrapper.
That said, there's a valid argument for the extra strictness: empty string values will silently pass through to ignition/MCO and fail later, whereas admission rejection gives a clearer error. Whether to deviate from upstream for that defensive reason is a judgment call for your team. I'm happy to withdraw the suggestion if you'd prefer to maintain strict parity with upstream.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
237-240: Log message may be misleading when ConfigMap doesn't exist.
supportutil.DeleteIfNeededreturns whether the object was actually deleted. The log at line 240 unconditionally says "Deleted kubelet config ConfigMap" even if the ConfigMap didn't exist. Consider logging conditionally based on the return value.📝 Proposed fix
- if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil { + deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm) + if err != nil { return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err) } - log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + if deleted { + log.Info("Deleted kubelet config ConfigMap", "name", configMapName) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go` around lines 237 - 240, The log is unconditional; change the call to supportutil.DeleteIfNeeded in karpenterignition_controller.go to capture the boolean result (e.g., deleted, err := supportutil.DeleteIfNeeded(...)) and only call log.Info("Deleted kubelet config ConfigMap", "name", configMapName) when deleted is true; if deleted is false consider emitting a different debug/info message like "kubelet config ConfigMap not present, nothing to delete" or omit logging—keep error handling the same by returning the wrapped fmt.Errorf on err.api/karpenter/v1beta1/kubelet_config.go (1)
170-173: Typo in comment: "uzed" should be "used".Line 171 contains a typo.
📝 Proposed fix
// hasTypedFields reports whether any explicitly defined struct fields are set. -// This is uzed by IsZero, but it separate so we can differentiate the Zero case vs +// This is used by IsZero, but it's separate so we can differentiate the Zero case vs // the "only overflow fields set" case. This must be kept in sync with the typed fields // in KubeletConfiguration.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/karpenter/v1beta1/kubelet_config.go` around lines 170 - 173, Fix the typo in the comment for the hasTypedFields function: change "uzed" to "used" in the comment block that describes hasTypedFields (which is referenced by IsZero and kept in sync with KubeletConfiguration), ensuring the surrounding comment text remains unchanged aside from the corrected word.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 1174-1221: The test currently patches nodeClass.Finalizers
directly so assertions never verify controller behavior; instead invoke the
reconciler logic that adds the finalizer (do not mutate Finalizers manually).
Update the subtest to remove the manual Patch(...) that appends
kubeletConfigFinalizer and, after creating the ConfigMap, call the controller
path that is responsible for adding the finalizer (either invoke
r.reconcileKubeletConfigMap(ctx, hcp, nodeClass) again if that function adds the
finalizer, or call the public Reconcile method on KarpenterIgnitionReconciler
with a reconcile.Request for the nodeClass) and then fetch nodeClass from
fakeGuestClient and assert that kubeletConfigFinalizer is present; reference
symbols: kubeletConfigFinalizer, KarpenterIgnitionReconciler,
reconcileKubeletConfigMap, and Reconcile.
- Around line 1458-1509: Add a new subtest to both TestCreateInMemoryNodePool
and TestReconcileKubeletConfigMap that sets nodeClass.Spec.Kubelet to contain
only an "overflow-only" kubelet field (e.g., PodPidsLimit/PodPidsLimit: 4096) so
KarpenterKubeletConfiguration() will return nil but the controller should still
create the per-NodeClass kubelet ConfigMap and include its ref in the NodePool;
locate createInMemoryNodePool and assert np.Spec.Config has length 1 and
np.Spec.Config[0].Name equals
karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and
analogous expectations used in the existing tests), and in
TestReconcileKubeletConfigMap assert the ConfigMap exists and contains the
overflow field so the controller path where IsZero() is false but
KarpenterKubeletConfiguration() returns nil is covered.
---
Nitpick comments:
In `@api/karpenter/v1beta1/kubelet_config.go`:
- Around line 170-173: Fix the typo in the comment for the hasTypedFields
function: change "uzed" to "used" in the comment block that describes
hasTypedFields (which is referenced by IsZero and kept in sync with
KubeletConfiguration), ensuring the surrounding comment text remains unchanged
aside from the corrected word.
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 237-240: The log is unconditional; change the call to
supportutil.DeleteIfNeeded in karpenterignition_controller.go to capture the
boolean result (e.g., deleted, err := supportutil.DeleteIfNeeded(...)) and only
call log.Info("Deleted kubelet config ConfigMap", "name", configMapName) when
deleted is true; if deleted is false consider emitting a different debug/info
message like "kubelet config ConfigMap not present, nothing to delete" or omit
logging—keep error handling the same by returning the wrapped fmt.Errorf on err.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: deddf304-46f1-45c4-ae5b-7e60baf42362
⛔ Files ignored due to path filters (9)
api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!**/zz_generated*.go,!**/zz_generated*client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.gois excluded by!client/**client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.gois excluded by!client/**client/applyconfiguration/utils.gois excluded by!client/**karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yamlis excluded by!karpenter-operator/controllers/karpenter/assets/*.yamlvendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.gois excluded by!vendor/**,!**/vendor/**vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.gois excluded by!vendor/**,!**/vendor/**,!**/zz_generated*.go,!**/zz_generated*
📒 Files selected for processing (16)
api/.golangci.ymlapi/karpenter/v1beta1/karpenter_types.goapi/karpenter/v1beta1/kubelet_config.goapi/karpenter/v1beta1/kubelet_config_karpenter.goapi/karpenter/v1beta1/kubelet_config_test.gohypershift-operator/controllers/hostedcluster/hostedcluster_controller.gohypershift-operator/controllers/hostedcluster/karpenter.gokarpenter-operator/controllers/karpenter/karpenter_controller.gokarpenter-operator/controllers/karpenter/karpenter_controller_test.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller.gokarpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.gokarpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.gosupport/karpenter/karpenter.gosupport/karpenter/karpenter_test.gotest/e2e/karpenter_kubelet_checker_pod.yamltest/e2e/karpenter_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
- karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
- hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
- api/.golangci.yml
- hypershift-operator/controllers/hostedcluster/karpenter.go
4697b7f to
0b483c5
Compare
|
Added adjusted/test cases for CodeRabbt's satisfaction. |
0b483c5 to
47cad9f
Compare
|
/test e2e-aws-autonode |
47cad9f to
908f27c
Compare
…eClass Add KubeletConfiguration type with custom JSON marshal/unmarshal to support both explicitly typed fields (maxPods, systemReserved, etc.) and arbitrary overflow fields that pass through to the node's kubelet config via MachineConfig. What we wanted to do was rawExtension with some CEL validation on some of the fields, but the CEL validation can't see inside the rawExtension, so instead this was the best we could do without a webhook. The kubelet config is propagated through the karpenter-operator's ignition controller and injected into node ignition via MachineConfig. The taint ConfigMap creation is moved from the hypershift-operator to the karpenter-operator for centralized management. Signed-off-by: John Kyros <jkyros@redhat.com>
908f27c to
af5b611
Compare
|
@jkyros: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| "testing" | ||
| "time" | ||
|
|
||
| awskarpenterv1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1" |
There was a problem hiding this comment.
can we please move the whole util.go outside the api/ so we don't couple api importing with additional deps?
I also would like to point llms to this API as an example of good patterns and this is not one of them.
There was a problem hiding this comment.
done here #8355
Let's don't reintroduce external deps into the API package
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| // the upstream Karpenter KubeletConfiguration, mapping only fields that exist | ||
| // in both structs. Fields in our struct that are not present in Karpenter's | ||
| // struct (including overflow fields) are silently dropped. | ||
| func (spec OpenshiftEC2NodeClassSpec) KarpenterKubeletConfiguration() *awskarpenterv1.KubeletConfiguration { |
There was a problem hiding this comment.
Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryNo CI test jobs have failed — all presubmit jobs that ran (images, verify-deps, verify-workflows, security, okd-scos-images) passed successfully, and all E2E jobs remain pending because tide will not trigger them while the PR has a merge conflict. The sole error is from the tide merge controller, which reports the PR is unmergeable due to a git merge conflict introduced by PR #8355 ("NO-JIRA: Do not import external dependencies into API packages"), merged to Root CausePR #8192 adds a new file On April 28, 2026, PR #8355 (by @JoelSpeed) was merged to
PR #8192's new Recommendations
Evidence
|
JoelSpeed
left a comment
There was a problem hiding this comment.
Is there a generated CRD schema for the karpenter types? I'm concerned that this might be creating a non-structural schema
| // kubelet configures kubelet settings for nodes provisioned by this NodeClass. | ||
| // These settings are injected into the node's ignition configuration via MachineConfig. | ||
| // +optional | ||
| Kubelet KubeletConfiguration `json:"kubelet,omitempty,omitzero"` |
There was a problem hiding this comment.
Omitempty doesn't do anything for a non pointer struct
| Kubelet KubeletConfiguration `json:"kubelet,omitempty,omitzero"` | |
| Kubelet KubeletConfiguration `json:"kubelet,omitzero"` |
| // The fields listed below are validated at admission time. Additional kubelet configuration | ||
| // fields beyond those listed here are also accepted and will be passed through to the node's | ||
| // kubelet configuration without validation. | ||
| // +kubebuilder:pruning:PreserveUnknownFields |
There was a problem hiding this comment.
Really? This surprises me. What's the motivation?
| // +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent" | ||
| type KubeletConfiguration struct { | ||
| // maxPods is the maximum number of pods that can run on a node. | ||
| // +kubebuilder:validation:Minimum=0 |
There was a problem hiding this comment.
I would normally request a maximum here... What does openshift currently support?
| // +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent" | ||
| type KubeletConfiguration struct { | ||
| // maxPods is the maximum number of pods that can run on a node. | ||
| // +kubebuilder:validation:Minimum=0 |
There was a problem hiding this comment.
Why would you ever set this value to 0 as an end user?
| // +kubebuilder:validation:Minimum=0 | ||
| // +optional | ||
| MaxPods *int32 `json:"maxPods,omitempty"` | ||
| // podsPerCore is the maximum number of pods per core. Cannot exceed maxPods. |
There was a problem hiding this comment.
Validate this relationship with CEL?
(I see the coderabbit conversation, I don't see a reason why we shouldn't be stricter here if we know it's going to be bad config)
| // +optional | ||
| MaxPods *int32 `json:"maxPods,omitempty"` | ||
| // podsPerCore is the maximum number of pods per core. Cannot exceed maxPods. | ||
| // +kubebuilder:validation:Minimum=0 |
There was a problem hiding this comment.
Why would I set this as 0 as an end user?
| // +kubebuilder:validation:XValidation:message="valid keys for systemReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')" | ||
| // +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))" |
There was a problem hiding this comment.
Well these look fun
One of your rules appears to think x is a key, one thinks it's a value? I don't think you can validate the map keys in the way you have done here
I think you'd need
[has(self.cpu),has(self.memory),has(self.ephemeral-storage),has(self.pid)].filter(x, x == true).size() == self.size()
| // +kubebuilder:validation:XValidation:message="valid keys for kubeReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')" | ||
| // +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))" |
There was a problem hiding this comment.
Same problem with the CEL rules here, these seem confused
What this PR does / why we need it:
OpenShiftEC2NodeClassas a set of structured fields (the ones Karpenter needs for scheduling/bin packing) + preserves unknownec2nodeclassso it can use themWhich issue(s) this PR fixes:
Fixes
AUTOSCALE-558
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit
New Features
Tests