AUTOSCALE-558: Expose KubeletConfig on OpenShiftEC2Nodeclass as structured fields + preserveunknown/overflow #8192

Open
jkyros wants to merge 1 commit into openshift:main from jkyros:autoscale-558-kubeletconfig-overflow

Conversation

@jkyros
Member

@jkyros jkyros commented Apr 9, 2026

What this PR does / why we need it:

  • Exposes spec.Kubelet on OpenShiftEC2NodeClass as a set of structured fields (the ones Karpenter needs for scheduling/bin packing) and preserves unknown fields
  • Reconciles the structured fields to Karpenter's ec2nodeclass so it can use them
  • Preserves the unstructured fields and sends them on to ignition so they make it to the node

Which issue(s) this PR fixes:

Fixes
AUTOSCALE-558

Special notes for your reviewer:

  • CEL expressions can't see inside the unstructured 😞
  • This tries to give us the approximate behavior we wanted from our sync discussion
  • The API Guidelines for OpenShift APIs want the bools to be enums, but that's going to be a weird corner if the karpenter-specific bools are enums and the rest aren't. I left them as bools and excluded them from the linter; I will adjust it however you want

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

    • NodeClass now accepts detailed kubelet configuration and maps selected fields to provisioned nodes; per-NodeClass kubelet ConfigMaps, a cluster taint ConfigMap, finalizers, and NodePool config reference selection are reconciled. Added helpers to generate Karpenter taint manifests and label/name helpers. Added privileged checker Pod manifest for node kubelet validation.
  • Tests

    • New unit and e2e tests covering JSON marshal/unmarshal, unknown-field preservation, mapping to upstream config, ConfigMap lifecycle, finalizers, manifest validation, and runtime kubelet checks.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 9, 2026

@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces structured kubelet configuration: a new KubeletConfiguration API type with custom JSON marshal/unmarshal, IsZero semantics, and an optional kubelet field on OpenshiftEC2NodeClassSpec. Adds helpers/constants to generate a Karpenter taint KubeletConfig manifest. Controllers now reconcile a global Karpenter taint ConfigMap and per-NodeClass kubelet ConfigMaps (create/update/delete and finalizer lifecycle). Reconciliation copies selected kubelet fields into upstream EC2NodeClass Kubelet. Extensive unit and e2e tests validate JSON overflow preservation, ConfigMap lifecycle, manifest contents, and node-level kubelet propagation. Lint exclusions updated for kubelet fields.
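
The "structured fields plus overflow" idea described in the walkthrough can be sketched in plain Go. This is a minimal illustration, not the PR's implementation: the type name `kubeletSketch`, the `Overflow` field, and the choice of two typed fields are assumptions made for the example, and `cpuManagerPolicy` merely stands in for any key that is not modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletSketch is a hypothetical stand-in for KubeletConfiguration:
// two typed fields plus a catch-all for every other key.
type kubeletSketch struct {
	ImageGCHighThresholdPercent *int32
	ImageGCLowThresholdPercent  *int32
	// Overflow keeps unmodelled keys as raw JSON so they survive a round trip.
	Overflow map[string]json.RawMessage
}

// UnmarshalJSON decodes the known keys into typed fields and keeps the rest.
func (k *kubeletSketch) UnmarshalJSON(data []byte) error {
	raw := map[string]json.RawMessage{}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	*k = kubeletSketch{}
	known := map[string]any{
		"imageGCHighThresholdPercent": &k.ImageGCHighThresholdPercent,
		"imageGCLowThresholdPercent":  &k.ImageGCLowThresholdPercent,
	}
	for name, dst := range known {
		if v, ok := raw[name]; ok {
			if err := json.Unmarshal(v, dst); err != nil {
				return fmt.Errorf("field %s: %w", name, err)
			}
			delete(raw, name)
		}
	}
	if len(raw) > 0 {
		k.Overflow = raw
	}
	return nil
}

// MarshalJSON writes the overflow keys first, then lets the typed fields win.
func (k kubeletSketch) MarshalJSON() ([]byte, error) {
	out := map[string]any{}
	for name, v := range k.Overflow {
		out[name] = v
	}
	if k.ImageGCHighThresholdPercent != nil {
		out["imageGCHighThresholdPercent"] = *k.ImageGCHighThresholdPercent
	}
	if k.ImageGCLowThresholdPercent != nil {
		out["imageGCLowThresholdPercent"] = *k.ImageGCLowThresholdPercent
	}
	return json.Marshal(out)
}

func main() {
	var k kubeletSketch
	in := []byte(`{"imageGCHighThresholdPercent":85,"cpuManagerPolicy":"static"}`)
	if err := json.Unmarshal(in, &k); err != nil {
		panic(err)
	}
	out, err := json.Marshal(k)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"cpuManagerPolicy":"static","imageGCHighThresholdPercent":85}
}
```

In this sketch the typed fields overwrite any overflow key of the same name on output, which is one possible precedence; the PR may resolve such collisions differently.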

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant OSE2NC as OpenshiftEC2NodeClass
    participant KarpIgnition as KarpenterIgnitionController
    participant KubeletCM as Kubelet ConfigMap\n(Management Cluster)
    participant EC2NC as EC2NodeClass\n(Karpenter)
    participant NP as NodePool\n(Karpenter)
    participant Node as Provisioned Node

    User->>OSE2NC: Create/Update with spec.kubelet
    OSE2NC->>KarpIgnition: Notify controller of change

    alt spec.kubelet is set
        KarpIgnition->>KarpIgnition: Add kubeletConfigFinalizer
        KarpIgnition->>KarpIgnition: Merge user kubelet config + base taints
        KarpIgnition->>KubeletCM: Create/Update per-NodeClass ConfigMap (data["config"]=KubeletConfig YAML)
    else spec.kubelet is unset
        KarpIgnition->>KubeletCM: Delete per-NodeClass ConfigMap
        KarpIgnition->>KarpIgnition: Remove kubeletConfigFinalizer if present
    end

    User->>NP: Create NodePool referencing OpenshiftEC2NodeClass
    NP->>EC2NC: Populate EC2NodeClass.Kubelet via KarpenterKubeletConfiguration()
    NP->>Node: Provision node referencing Kubelet ConfigMap
    Node->>KubeletCM: Read and apply kubelet config
sequenceDiagram
    actor Operator
    participant KarController as KarpenterController
    participant MgrCM as Management ConfigMap\n(karpenter taint)
    participant Support as support/karpenter helpers

    Operator->>KarController: Reconcile loop
    KarController->>Support: KarpenterTaintConfigManifest()
    Support-->>KarController: YAML manifest
    KarController->>MgrCM: Create/Update global taint ConfigMap (data["config"]=manifest)
    MgrCM-->>KarController: Created/Updated or Error
🚥 Pre-merge checks | ✅ 7 | ❌ 5

❌ Failed checks (5 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 23.08% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Structure And Quality | ⚠️ Warning | Test code manually patches finalizers instead of calling Reconcile(), verifying state mutation but not controller logic execution. | Refactor finalizer tests to call r.Reconcile() directly instead of manually manipulating finalizers to verify the controller's own logic is exercised. |
| Microshift Test Compatibility | ⚠️ Warning | The e2e test TestKarpenter uses MicroShift-unavailable APIs (Karpenter NodePool, AWS EC2NodeClass) and features with no protection mechanisms to prevent execution on MicroShift clusters. | Add [Skipped:MicroShift] label to test name or wrap with exutil.IsMicroShiftCluster() check that calls g.Skip() to prevent execution on MicroShift. |
| Single Node Openshift (Sno) Test Compatibility | ⚠️ Warning | The testKubeletPropagation e2e test lacks SNO protection and will fail on Single Node OpenShift clusters. | Add [Skipped:SingleReplicaTopology] label to test or use exutil.IsSingleNode() check with g.Skip(). |
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | Pod manifest specifies 'image: alpine' without registry prefix, requiring external pull from Docker Hub in disconnected environments; test lacks IP family detection for IPv6-only clusters. | Use image from internal registry, add IP family detection with GetIPAddressFamily(), or add [Skipped:Disconnected] tag for disconnected cluster scenarios. |
✅ Passed checks (7 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly summarizes the main change: exposing KubeletConfig on OpenShiftEC2NodeClass as structured fields with overflow preservation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test function names referenced in the PR (TestReconcileTaintConfigMap, TestCreateInMemoryNodePool, TestReconcileKubeletConfigMap, TestReconcileDeletedNodeClass) are stable, deterministic, and contain no dynamic values like timestamps, UUIDs, or generated identifiers. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces karpenter integration without topology-aware scheduling constraints. Changes include kubelet config API types, controller reconciliation logic, and e2e tests with no affinity rules or control-plane node assumptions. |
| Ote Binary Stdout Contract | ✅ Passed | Module-level variable initializer contains panic(err) in IIFE, but panic writes to stderr, not stdout. Embedded YAML is valid, so panic condition should never execute. Code does not emit non-JSON content to stdout. |


@openshift-ci
Contributor

openshift-ci Bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jkyros
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 9, 2026
@codecov

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 73.98844% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.38%. Comparing base (222a19f) to head (af5b611).
⚠️ Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../karpenterignition/karpenterignition_controller.go | 73.60% | 22 Missing and 11 partials ⚠️ |
| ...ator/controllers/karpenter/karpenter_controller.go | 66.66% | 5 Missing and 1 partial ⚠️ |
| support/karpenter/karpenter.go | 80.76% | 4 Missing and 1 partial ⚠️ |
| ...trollers/hostedcluster/hostedcluster_controller.go | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8192      +/-   ##
==========================================
+ Coverage   36.30%   36.38%   +0.07%     
==========================================
  Files         764      764              
  Lines       93015    93150     +135     
==========================================
+ Hits        33772    33892     +120     
- Misses      56530    56533       +3     
- Partials     2713     2725      +12     

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ft-operator/controllers/hostedcluster/karpenter.go | 75.49% <100.00%> (+12.16%) ⬆️ |
| .../controllers/nodeclass/ec2_nodeclass_controller.go | 50.72% <100.00%> (+0.08%) ⬆️ |
| ...trollers/hostedcluster/hostedcluster_controller.go | 43.23% <0.00%> (ø) |
| support/karpenter/karpenter.go | 71.69% <80.76%> (+8.73%) ⬆️ |
| ...ator/controllers/karpenter/karpenter_controller.go | 28.44% <66.66%> (+2.22%) ⬆️ |
| .../karpenterignition/karpenterignition_controller.go | 64.94% <73.60%> (+2.27%) ⬆️ |

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 30.06% <80.76%> (+0.04%) ⬆️ |
| cpo-hostedcontrolplane | 37.05% <ø> (ø) |
| cpo-other | 35.69% <ø> (ø) |
| hypershift-operator | 47.94% <66.66%> (+0.05%) ⬆️ |
| other | 28.33% <72.91%> (+0.64%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@jkyros
Member Author

jkyros commented Apr 9, 2026

/test e2e-aws-autonode

@jkyros
Member Author

jkyros commented Apr 9, 2026

Heyyy that is super cool, I don't have to have my claude watch and root cause test failures anymore
/test e2e-aws-autonode

@jkyros
Member Author

jkyros commented Apr 10, 2026

KubeletConfig passed, teardown failure. One more time
/test e2e-aws-autonode

// kubelet configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
// +optional
Kubelet *KubeletConfiguration `json:"kubelet,omitempty"`
Member

remove pointer and use omitzero

Member Author

pointer removed, added omitzero

Comment thread api/.golangci.yml

- linters:
    - kubeapilinter
  path: karpenter/v1beta1/karpenter_types.go
Member

why are we not requiring minimum properties 1?

Member Author

Minimum properties 1 doesn't play nice with the custom JSON marshaller we need to capture the preserved field "overflow"; that's how the controller-gen code is written. Our "API hackery" apparently has a cost 😞

// +kubebuilder:object:generate=false
// +kubebuilder:pruning:PreserveUnknownFields
// +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
type KubeletConfiguration struct {
Member

this will need test coverage once d448ab4 merges

Member Author

I can either come back afterwards and add it, or we can wait for that to merge and then do this one, and I can add it here. I'll set a test up based on your branch in the meantime.
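
For readers less familiar with CEL, the XValidation rule quoted in this thread can be restated as an ordinary Go predicate, which is also a convenient shape for enumerating the cases a future test would need to cover. This is only a sketch: the function name `gcThresholdsValid` and the helper `ptr` are made up for illustration and are not part of the PR.

```go
package main

import "fmt"

// gcThresholdsValid mirrors the quoted CEL rule: the comparison only applies
// when both thresholds are set, and then high must be strictly greater than low.
func gcThresholdsValid(high, low *int32) bool {
	if high == nil || low == nil {
		return true
	}
	return *high > *low
}

// ptr is a small helper for building optional values in the examples below.
func ptr(v int32) *int32 { return &v }

func main() {
	fmt.Println(gcThresholdsValid(ptr(85), ptr(80))) // true
	fmt.Println(gcThresholdsValid(ptr(80), ptr(80))) // false
	fmt.Println(gcThresholdsValid(nil, ptr(80)))     // true
}
```

The equal-thresholds case is the one most worth pinning down, since the rule uses a strict comparison.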

@jkyros
Member Author

jkyros commented Apr 10, 2026

/test e2e-aws-autonode

@maxcao13
Member

tests are just taking too long i think 😂

We can see TestKarpenter hitting the 2 hour mark which is when it stops.

if openshiftEC2NodeClass.Spec.Kubelet == nil {
	if err := r.ManagementClient.Delete(ctx, cm); err != nil && !apierrors.IsNotFound(err) {
		return fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err)
	}
Member

nit: but I think we can use supportutil.DeleteIfNeeded() so we don't have to check for IsNotFound error manually.

Member Author

switched to supportutil.DeleteIfNeeded(), thanks!
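
The pattern behind this suggestion, treating "not found" as success and reporting whether anything was actually deleted, can be sketched without any Kubernetes dependencies. This is not a copy of supportutil.DeleteIfNeeded, whose implementation is not shown in this thread; the `deleter` interface, `errNotFound`, and `fakeStore` are hypothetical stand-ins for the client, the API not-found error, and the cluster.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for an API "not found" error.
var errNotFound = errors.New("not found")

// deleter is a hypothetical stand-in for the management cluster client.
type deleter interface {
	Delete(name string) error
}

// deleteIfNeeded treats "not found" as success and reports whether a delete
// actually happened, so callers need no IsNotFound check of their own.
func deleteIfNeeded(c deleter, name string) (bool, error) {
	err := c.Delete(name)
	switch {
	case err == nil:
		return true, nil
	case errors.Is(err, errNotFound):
		return false, nil
	default:
		return false, fmt.Errorf("failed to delete %s: %w", name, err)
	}
}

// fakeStore is an in-memory object store used only to exercise the helper.
type fakeStore map[string]bool

func (s fakeStore) Delete(name string) error {
	if !s[name] {
		return errNotFound
	}
	delete(s, name)
	return nil
}

func main() {
	store := fakeStore{"kubelet-config": true}
	fmt.Println(deleteIfNeeded(store, "kubelet-config")) // true <nil>
	fmt.Println(deleteIfNeeded(store, "kubelet-config")) // false <nil>
}
```

Returning the boolean lets a caller log "deleted" only when something was removed.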


// When Spec.Kubelet is set, only the per-nodeclass kubelet config ref is included.
// set-karpenter-taint is omitted because the taint is merged into the per-nodeclass
// manifest via ToKubeletConfigManifestWithTaints to avoid two KubeletConfigs targeting
Member

Is this function name still accurate?

Member Author

nope, it wasn't. I changed it when I refactored this from the old way. Updated. Thanks!

})
}

func TestReconcileKubeletConfigMap(t *testing.T) {
Member

Can we also check that overflow fields would still show up here too?

Member Author

Yep, added a test case to check that

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 14, 2026
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from c11822a to 4d6300e Compare April 15, 2026 07:03
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 15, 2026

@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Summary by CodeRabbit

Release Notes

  • New Features

  • Added kubelet configuration field to NodeClass specifications with support for image garbage collection thresholds, eviction policies, and resource reservation settings that are applied to provisioned nodes.

  • Tests

  • Added comprehensive tests for kubelet configuration lifecycle management, YAML serialization, and end-to-end validation of kubelet settings on nodes.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (2)

503-510: Note: Shallow merge will fully replace registerWithTaints if user-settable in future.

The current implementation correctly ensures the Karpenter base taints always win. However, mergeKubeletConfigMaps performs a shallow merge—if registerWithTaints ever becomes user-settable (as the comment at line 504 acknowledges it currently isn't), user-provided taints would be fully replaced rather than merged with the base taints.

If this is intentional for security/correctness reasons (enforcing only Karpenter's taints), consider documenting this in the API or adding a more explicit comment. If future user-defined taints should coexist with the base taints, the merge logic would need to be enhanced.
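
To make the difference concrete, here is a rough sketch of what a list-aware merge could look like if user taints were ever allowed to coexist with the base taints. Everything in it is an assumption for illustration: the simplified `taint` struct, the `mergeTaints` name, the rule that taint identity is key plus effect, and the placeholder key `example.com/base`. It does not reflect how mergeKubeletConfigMaps behaves today.

```go
package main

import "fmt"

// taint is a simplified, hypothetical taint shape.
type taint struct {
	Key, Value, Effect string
}

// mergeTaints keeps user taints and appends the base taints last. When a
// user taint has the same key and effect as a base taint, the base entry wins.
func mergeTaints(user, base []taint) []taint {
	type id struct{ key, effect string }
	baseIDs := make(map[id]bool, len(base))
	for _, t := range base {
		baseIDs[id{t.Key, t.Effect}] = true
	}
	merged := make([]taint, 0, len(user)+len(base))
	for _, t := range user {
		if !baseIDs[id{t.Key, t.Effect}] {
			merged = append(merged, t)
		}
	}
	return append(merged, base...)
}

func main() {
	user := []taint{
		{"team", "a", "NoSchedule"},
		{"example.com/base", "user-value", "NoExecute"}, // collides with the base taint
	}
	base := []taint{{"example.com/base", "true", "NoExecute"}}
	fmt.Println(mergeTaints(user, base)) // [{team a NoSchedule} {example.com/base true NoExecute}]
}
```

A shallow map merge, by contrast, would replace the whole list under the key, so the `team` taint in this example would be dropped.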

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 503 - 510, The current shallow merge in mergeKubeletConfigMaps will
entirely replace registerWithTaints if users set it; decide and implement one of
two fixes: (A) make the behavior explicit by updating the comment near the call
and the API docs to state that karpenterutil.KarpenterBaseTaintMap() always
overrides user registerWithTaints, or (B) change mergeKubeletConfigMaps so that
for the "registerWithTaints" key it performs a deep/semantic merge — parse both
nodeClassKubeletMap["registerWithTaints"] and
karpenterutil.KarpenterBaseTaintMap() as taint lists, concatenate them ensuring
base taints are appended last and deduplicate by taint identity — then pass that
merged value into kubeletConfigManifest; update tests accordingly.

237-240: Minor: Log message may be misleading if ConfigMap didn't exist.

supportutil.DeleteIfNeeded returns successfully even when the ConfigMap doesn't exist (was already deleted or never created). The log message "Deleted kubelet config ConfigMap" could be slightly inaccurate in that case. Consider logging based on the return value.

🔧 Optional improvement
-	if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil {
+	deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)
+	if err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err)
 	}
-	log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	if deleted {
+		log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 237 - 240, supportutil.DeleteIfNeeded currently returns a boolean
indicating whether a delete actually happened; change the call to capture that
value (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient,
cm)), keep the error handling the same, and log conditionally: if deleted log
"Deleted kubelet config ConfigMap" with configMapName via log.Info, otherwise
log "Kubelet config ConfigMap not present, nothing to delete" (or similar) so
the message reflects whether the resource existed; reference symbols:
supportutil.DeleteIfNeeded, r.ManagementClient, cm, configMapName, log.Info.
api/.golangci.yml (1)

332-347: Consider anchoring exclusion patterns to prevent unintended matches.

These new exclusion rules follow the existing convention in the file (unanchored patterns), but anchoring text patterns with ^...$ will reduce the risk of unintentionally suppressing future kubeapilinter findings. For example, the unanchored nomaps pattern could match unintended messages if field names are extended (e.g., SystemReservedExtra would match the current regex).

This is consistent with all other rules in the file; if anchoring is adopted, apply it uniformly across the entire exclusions list rather than just these additions.

Example: anchored text pattern
-        text: 'nomaps: field KubeletConfiguration\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)'
+        text: '^nomaps: field KubeletConfiguration\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$'
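
The effect of anchoring can be checked with Go's regexp package. The sketch below assumes, purely for illustration, that a linter message consists of exactly the text the pattern describes; the `SystemReservedExtra` message is hypothetical, and real kubeapilinter output is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

const fields = `(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)`

var (
	// unanchored matches anywhere inside the message text.
	unanchored = regexp.MustCompile(`nomaps: field KubeletConfiguration\.` + fields)
	// anchored must match the message from its first to its last character.
	anchored = regexp.MustCompile(`^nomaps: field KubeletConfiguration\.` + fields + `$`)
)

func main() {
	// Hypothetical message for a field that was never meant to be excluded.
	msg := "nomaps: field KubeletConfiguration.SystemReservedExtra"
	fmt.Println(unanchored.MatchString(msg)) // true
	fmt.Println(anchored.MatchString(msg))   // false
}
```

One caveat: if the real linter message continues after the field name, a trailing `$` would stop the rule from matching at all, so the anchored form is worth testing against actual linter output before adopting it.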
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/.golangci.yml` around lines 332 - 347, Anchor the new kubeapilinter
exclusion text patterns to avoid accidental matches: update the three added
entries so their text fields use anchored regexes (e.g., '^minlength: field
OpenshiftEC2NodeClassSpec.Kubelet type KubeletConfiguration must have a minimum
properties, add kubebuilder:validation:MinProperties marker$' and similarly
anchor the 'optionalfields' message for OpenshiftEC2NodeClassSpec.Kubelet and
the 'nodurations' message for KubeletConfiguration.EvictionSoftGracePeriod), and
change the 'nomaps' pattern that targets
KubeletConfiguration.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)
to an anchored form '^nomaps: field
KubeletConfiguration\\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$';
apply the same anchoring convention consistently across the exclusions list so
future kubeapilinter messages won't be accidentally suppressed.
test/e2e/karpenter_kubelet_checker_pod.yaml (1)

6-8: Disable service account token automount for this privileged host-mounted pod

This pod does not call the Kubernetes API. Setting automountServiceAccountToken: false reduces credential exposure in a privileged context.

Proposed hardening
 spec:
+  automountServiceAccountToken: false
   restartPolicy: Never

Also applies to: 67-76

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/karpenter_kubelet_checker_pod.yaml` around lines 6 - 8, Add
automountServiceAccountToken: false to the Pod spec to prevent automatic
mounting of service account tokens for this privileged host-mounted pod; update
the spec block (the top-level pod spec where restartPolicy and containers are
defined) and also apply the same change to the other Pod manifest instance
referenced (the block covering lines 67-76) so both pod specs explicitly set
automountServiceAccountToken: false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Around line 20-23: The check() helper currently uses unscoped grep -q with
"$@" against "$FILE" which can match identical values in systemReserved and
produce false positives for kubeReserved; change the matching to literal string
matching (use grep -F -q) or implement section-aware parsing to restrict
searches to the kubeReserved block within FILE when verifying values, and update
references to the check() invocation accordingly (e.g., where check() is used to
validate kubeReserved entries). Also harden the diagnostic pod spec by setting
automountServiceAccountToken: false on the pod spec to avoid unnecessary service
account token exposure.
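
The false positive described above is easy to reproduce. The checker itself is a shell script inside the pod manifest; the Go sketch below only illustrates the idea of section-aware matching, and it assumes the kubelet config is laid out as YAML-like text with top-level keys at column zero and space-indented children. The function names and the sample values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wholeFileContains is the unscoped check: it matches want anywhere in doc.
func wholeFileContains(doc, want string) bool {
	return strings.Contains(doc, want)
}

// sectionContains reports whether want appears as an indented line under the
// top-level key named section, ignoring identical values in other sections.
func sectionContains(doc, section, want string) bool {
	inSection := false
	for _, line := range strings.Split(doc, "\n") {
		if line == "" {
			continue
		}
		if !strings.HasPrefix(line, " ") {
			// A new top-level key starts; we are inside only if it is ours.
			inSection = strings.TrimSpace(line) == section+":"
			continue
		}
		if inSection && strings.TrimSpace(line) == want {
			return true
		}
	}
	return false
}

func main() {
	doc := "systemReserved:\n  cpu: 500m\nkubeReserved:\n  cpu: 100m\n"
	fmt.Println(wholeFileContains(doc, "cpu: 500m"))               // true
	fmt.Println(sectionContains(doc, "kubeReserved", "cpu: 500m")) // false
	fmt.Println(sectionContains(doc, "kubeReserved", "cpu: 100m")) // true
}
```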

---

Nitpick comments:
In `@api/.golangci.yml`:
- Around line 332-347: Anchor the new kubeapilinter exclusion text patterns to
avoid accidental matches: update the three added entries so their text fields
use anchored regexes (e.g., '^minlength: field OpenshiftEC2NodeClassSpec.Kubelet
type KubeletConfiguration must have a minimum properties, add
kubebuilder:validation:MinProperties marker$' and similarly anchor the
'optionalfields' message for OpenshiftEC2NodeClassSpec.Kubelet and the
'nodurations' message for KubeletConfiguration.EvictionSoftGracePeriod), and
change the 'nomaps' pattern that targets
KubeletConfiguration.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)
to an anchored form '^nomaps: field
KubeletConfiguration\\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)$';
apply the same anchoring convention consistently across the exclusions list so
future kubeapilinter messages won't be accidentally suppressed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 503-510: The current shallow merge in mergeKubeletConfigMaps will
entirely replace registerWithTaints if users set it; decide and implement one of
two fixes: (A) make the behavior explicit by updating the comment near the call
and the API docs to state that karpenterutil.KarpenterBaseTaintMap() always
overrides user registerWithTaints, or (B) change mergeKubeletConfigMaps so that
for the "registerWithTaints" key it performs a deep/semantic merge — parse both
nodeClassKubeletMap["registerWithTaints"] and
karpenterutil.KarpenterBaseTaintMap() as taint lists, concatenate them ensuring
base taints are appended last and deduplicate by taint identity — then pass that
merged value into kubeletConfigManifest; update tests accordingly.
- Around line 237-240: supportutil.DeleteIfNeeded currently returns a boolean
indicating whether a delete actually happened; change the call to capture that
value (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient,
cm)), keep the error handling the same, and log conditionally: if deleted log
"Deleted kubelet config ConfigMap" with configMapName via log.Info, otherwise
log "Kubelet config ConfigMap not present, nothing to delete" (or similar) so
the message reflects whether the resource existed; reference symbols:
supportutil.DeleteIfNeeded, r.ManagementClient, cm, configMapName, log.Info.

In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Around line 6-8: Add automountServiceAccountToken: false to the Pod spec to
prevent automatic mounting of service account tokens for this privileged
host-mounted pod; update the spec block (the top-level pod spec where
restartPolicy and containers are defined) and also apply the same change to the
other Pod manifest instance referenced (the block covering lines 67-76) so both
pod specs explicitly set automountServiceAccountToken: false.
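For option (B) in the registerWithTaints comment above, a minimal sketch of the semantic merge might look like the following. The `taint` type, the `mergeTaints` name, and the `example.com/base` key are hypothetical stand-ins, not identifiers from this PR, and taint identity is assumed here to be the (key, effect) pair.

```go
package main

import "fmt"

// taint mirrors the key/value/effect shape of a registerWithTaints entry.
// It is a stand-in for this sketch, not the type used in the PR.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// mergeTaints concatenates user and base taints with the base taints last and
// drops duplicates. Identity is assumed to be (key, effect); when both lists
// carry the same identity, the base entry is the one that is kept.
func mergeTaints(user, base []taint) []taint {
	type identity struct{ key, effect string }

	inBase := make(map[identity]bool, len(base))
	for _, t := range base {
		inBase[identity{t.Key, t.Effect}] = true
	}

	seen := make(map[identity]bool, len(user)+len(base))
	merged := make([]taint, 0, len(user)+len(base))

	for _, t := range user {
		id := identity{t.Key, t.Effect}
		if inBase[id] || seen[id] {
			continue // the base copy is appended below, or a user copy was already kept
		}
		seen[id] = true
		merged = append(merged, t)
	}
	for _, t := range base {
		id := identity{t.Key, t.Effect}
		if seen[id] {
			continue
		}
		seen[id] = true
		merged = append(merged, t)
	}
	return merged
}

func main() {
	user := []taint{
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
		{Key: "example.com/base", Value: "user-value", Effect: "NoExecute"}, // hypothetical key
	}
	base := []taint{{Key: "example.com/base", Value: "true", Effect: "NoExecute"}}

	for _, t := range mergeTaints(user, base) {
		fmt.Printf("%s=%s:%s\n", t.Key, t.Value, t.Effect)
	}
}
```

Letting the base entry win on a collision keeps the "base taints always apply" guarantee of option (A) while still preserving unrelated user taints.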
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: c242fdcc-7af5-4987-ba5a-e303736aa7e4

📥 Commits

Reviewing files that changed from the base of the PR and between 4d6300e and 4043705.

⛔ Files ignored due to path filters (9)
  • api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
  • client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (16)
  • api/.golangci.yml
  • api/karpenter/v1beta1/karpenter_types.go
  • api/karpenter/v1beta1/kubelet_config.go
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • api/karpenter/v1beta1/kubelet_config_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go
  • karpenter-operator/controllers/karpenter/karpenter_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • support/karpenter/karpenter.go
  • support/karpenter/karpenter_test.go
  • test/e2e/karpenter_kubelet_checker_pod.yaml
  • test/e2e/karpenter_test.go
✅ Files skipped from review due to trivial changes (4)
  • api/karpenter/v1beta1/kubelet_config_test.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
  • api/karpenter/v1beta1/kubelet_config.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • support/karpenter/karpenter_test.go
  • api/karpenter/v1beta1/karpenter_types.go
  • support/karpenter/karpenter.go
  • test/e2e/karpenter_test.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go

Comment thread test/e2e/karpenter_kubelet_checker_pod.yaml
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 4043705 to 5c18545 Compare April 17, 2026 05:23
@openshift-ci-robot

openshift-ci-robot commented Apr 17, 2026

@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

What this PR does / why we need it:

  • Exposes spec.Kubelet on OpenShiftEC2NodeClass as a set of structured fields (the ones Karpenter needs for scheduling/bin packing) + preserves unknown
  • Reconciles the structured fields to Karpenter's ec2nodeclass so it can use them
  • Preserves the unstructured fields and sends them on to ignition so they make it to the node

Which issue(s) this PR fixes:

Fixes
AUTOSCALE-558

Special notes for your reviewer:

  • CEL expressions can't see inside the unstructured 😞
  • This tries to give us the approximate behavior we wanted from our sync discussion
  • The API Guidelines for OpenShift APIs want the bools to be enums, but that's going to be a weird corner if the karpenter-specific bools are enums and the rest arent. I left them as bools and marked them out of the linter, I will adjust it however you want

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • New Features

  • NodeClass now accepts kubelet configuration to propagate kubelet settings (image GC, eviction policies, reservations, MaxPods, pods-per-core, CPU CFS quota, etc.) to provisioned nodes.

  • Per-NodeClass kubelet ConfigMap lifecycle, a cluster taint ConfigMap, and finalizer handling are reconciled to apply and merge kubelet and taint settings.

  • Tests

  • Added unit and e2e tests for kubelet config JSON round-trips, unknown-field preservation, mapping to provisioners, ConfigMap lifecycle, finalizers, and runtime node validation (checker pod).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)

92-120: ⚠️ Potential issue | 🟠 Major

Handle deleting NodeClasses before requiring the HCP.

Line 92 returns early when the HostedControlPlane is already gone, so Lines 117-120 never reach reconcileDeletedNodeClass. That can leave kubeletConfigFinalizer stuck on a terminating OpenshiftEC2NodeClass even though the management-side ConfigMap is already gone with the HCP namespace. The deletion path should still run when the HCP lookup returns not found.
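The ordering issue described above can be sketched with plain Go types. This is only an illustration of the control flow, assuming a "not found" lookup result should not block the deletion path; `nodeClass`, `errNotFound`, `reconcile`, and the finalizer string are hypothetical stand-ins rather than the controller's real code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for an API "not found" error in this sketch.
var errNotFound = errors.New("hosted control plane not found")

// kubeletConfigFinalizer uses a made-up value; the real string is defined in the PR.
const kubeletConfigFinalizer = "example.com/kubelet-config"

// nodeClass is a minimal stand-in for the NodeClass object.
type nodeClass struct {
	Deleting   bool
	Finalizers []string
}

// removeFinalizer returns the object's finalizers without the named one.
func removeFinalizer(nc *nodeClass, name string) {
	kept := make([]string, 0, len(nc.Finalizers))
	for _, f := range nc.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	nc.Finalizers = kept
}

// reconcile tolerates a missing control plane long enough to run the deletion
// path, and only then returns early for the non-deleting case.
func reconcile(nc *nodeClass, lookupHCP func() error) (string, error) {
	err := lookupHCP()
	hcpMissing := errors.Is(err, errNotFound)
	if err != nil && !hcpMissing {
		return "", fmt.Errorf("failed to get hosted control plane: %w", err)
	}

	// Deletion is handled first so the finalizer cannot get stuck.
	if nc.Deleting {
		removeFinalizer(nc, kubeletConfigFinalizer)
		return "finalizer removed", nil
	}
	if hcpMissing {
		return "skipped: hosted control plane not found", nil
	}
	return "reconciled", nil
}

func main() {
	nc := &nodeClass{Deleting: true, Finalizers: []string{kubeletConfigFinalizer}}
	result, err := reconcile(nc, func() error { return errNotFound })
	fmt.Println(result, err, len(nc.Finalizers))
}
```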

♻️ Duplicate comments (1)
test/e2e/karpenter_kubelet_checker_pod.yaml (1)

20-23: ⚠️ Potential issue | 🟡 Minor

Use literal matching in grep helpers to reduce false positives.

grep -q treats patterns as regex; these checks are exact-string assertions and should be literal.

🔧 Suggested fix
       check() {
         local label="$1"; shift
-        if grep -q "$@" "$FILE"; then
+        if grep -Fq -- "$*" "$FILE"; then
           echo "PASS: $label"
         else
           echo "FAIL: $label  (pattern: $*)"
           FAIL=1
         fi
       }
@@
       check_env() {
         local label="$1"; shift
-        if grep -q "$@" "$ENV_FILE"; then
+        if grep -Fq -- "$*" "$ENV_FILE"; then
           echo "PASS: $label"
         else
           echo "FAIL: $label  (pattern: $*)"
           FAIL=1
         fi
       }

Also applies to: 30-33
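To illustrate why regex matching can produce a false positive here, the sketch below compares regex and literal matching in Go. It is an analogy only: Go's `regexp` uses RE2 syntax rather than grep's basic regular expressions, but the `.` metacharacter matches any single character in both. The helper names and sample line are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesRegex treats pattern as a regular expression, like plain grep.
func matchesRegex(pattern, line string) bool {
	return regexp.MustCompile(pattern).MatchString(line)
}

// matchesLiteral treats pattern as a fixed string, like grep -F.
func matchesLiteral(pattern, line string) bool {
	return strings.Contains(line, pattern)
}

func main() {
	pattern := "memory.available"       // the "." is meant literally
	line := `"memory_available": "5%"` // not the key we are asserting on

	fmt.Println("regex match:  ", matchesRegex(pattern, line))
	fmt.Println("literal match:", matchesLiteral(pattern, line))
}
```

The regex form accepts `memory_available` because `.` matches the underscore, while the literal form rejects it.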

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/karpenter_kubelet_checker_pod.yaml` around lines 20 - 23, The helper
check() currently uses grep -q "$@" "$FILE" which treats the patterns as
regexes; change those calls to use literal matching (e.g. grep -F -q "$@"
"$FILE") — or grep -F -x -q if you need whole-line exact matches — so the
assertions are performed as exact strings; update the occurrence inside check()
and the duplicate occurrence around lines 30-33 to use grep -F (and -x if
appropriate).
🧹 Nitpick comments (1)
support/karpenter/karpenter_test.go (1)

189-210: Avoid single-element assumption in taint manifest assertion.

At Line 205, the test only checks taints[0]. This both misses validation for additional taints and can panic if the base taint list is ever empty. Prefer iterating and asserting all entries (as done in TestKarpenterBaseTaintMap).

Diff suggestion
-		taint, ok := taints[0].(map[string]interface{})
-		g.Expect(ok).To(BeTrue())
-		g.Expect(taint["key"]).To(Equal(KarpenterBaseTaints[0].Key))
-		g.Expect(taint["value"]).To(Equal(KarpenterBaseTaints[0].Value))
-		g.Expect(taint["effect"]).To(Equal(string(KarpenterBaseTaints[0].Effect)))
+		for i, entry := range taints {
+			taint, ok := entry.(map[string]interface{})
+			g.Expect(ok).To(BeTrue())
+			g.Expect(taint["key"]).To(Equal(KarpenterBaseTaints[i].Key))
+			g.Expect(taint["value"]).To(Equal(KarpenterBaseTaints[i].Value))
+			g.Expect(taint["effect"]).To(Equal(string(KarpenterBaseTaints[i].Effect)))
+		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/karpenter/karpenter_test.go` around lines 189 - 210, The test assumes
at least one taint and only asserts taints[0], which can panic or miss
mismatches; update the test in the KarpenterTaintConfigManifest case to iterate
over the parsed taints slice (`taints`) and for each index i assert that the
parsed taint map's "key", "value", and "effect" equal
KarpenterBaseTaints[i].Key, .Value and string(.Effect) respectively (use the
existing `taints`, `kubeletConfig`, `spec`, and `KarpenterBaseTaints`
identifiers), so all entries are validated and no single-element assumption is
made.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/karpenter/v1beta1/kubelet_config_karpenter.go`:
- Around line 11-27: The current KarpenterKubeletConfiguration method uses
spec.Kubelet.IsZero() which is too broad and returns an empty
KubeletConfiguration when only overflow fields are set; instead construct the
awskarpenterv1.KubeletConfiguration value from the mapped fields
(ImageGCHighThresholdPercent, ImageGCLowThresholdPercent, MaxPods, CPUCFSQuota,
EvictionHard, EvictionSoft, EvictionSoftGracePeriod, EvictionMaxPodGracePeriod,
PodsPerCore, SystemReserved, KubeReserved) into a local variable and then return
nil if every one of those mapped fields is empty/zero (otherwise return
&thatLocalValue), removing the reliance on spec.Kubelet.IsZero().

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 158-185: After successfully patching finalizers on
openshiftEC2NodeClass (the r.GuestClient.Patch calls that add/remove
kubeletConfigFinalizer), re-fetch the latest object or requeue immediately to
avoid using a stale in-memory resourceVersion (which causes optimistic lock 409s
in updateConfigVersionAnnotation/reconcileKubeletConfigMap). Concretely: after
each successful Patch call in this block, either call r.GuestClient.Get(ctx,
client.ObjectKeyFromObject(openshiftEC2NodeClass), openshiftEC2NodeClass) to
reload the resource before continuing, or return ctrl.Result{Requeue:true} so
the next reconcile runs with the up-to-date object.

In `@test/e2e/karpenter_kubelet_checker_pod.yaml`:
- Line 10: The Pod spec currently uses an unpinned image reference "image:
alpine"; replace this with a pinned image digest to ensure deterministic e2e
runs (use the canonical format like alpine@sha256:<digest> or
alpine:<tag>@sha256:<digest>). Update the image field in the YAML (the container
image entry in karpenter_kubelet_checker_pod.yaml) to a specific verified digest
for the desired Alpine version, and commit the new digest string so the tests
pull an immutable, reproducible image.
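For the `KarpenterKubeletConfiguration` comment above, one way to return nil when only overflow fields are set is to build the mapped value first and compare it against its zero value. The sketch below assumes hypothetical stand-in types (`nodeClassKubelet`, `mappedKubelet`) and shows only three of the mapped fields; it is not the PR's implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// mappedKubelet is a stand-in for the Karpenter-facing kubelet configuration.
type mappedKubelet struct {
	MaxPods        *int32
	PodsPerCore    *int32
	SystemReserved map[string]string
}

// nodeClassKubelet is a stand-in for spec.kubelet: a few typed fields plus an
// overflow bag holding everything Karpenter does not need to see.
type nodeClassKubelet struct {
	MaxPods        *int32
	PodsPerCore    *int32
	SystemReserved map[string]string
	Overflow       map[string]interface{}
}

// toKarpenter copies only the mapped fields and returns nil when none are set,
// regardless of what the overflow bag contains.
func toKarpenter(k nodeClassKubelet) *mappedKubelet {
	out := mappedKubelet{MaxPods: k.MaxPods, PodsPerCore: k.PodsPerCore}
	if len(k.SystemReserved) > 0 {
		// Empty maps are left nil so they compare equal to the zero value.
		out.SystemReserved = k.SystemReserved
	}
	if reflect.DeepEqual(out, mappedKubelet{}) {
		return nil
	}
	return &out
}

func main() {
	overflowOnly := nodeClassKubelet{Overflow: map[string]interface{}{"podPidsLimit": 4096}}
	fmt.Println(toKarpenter(overflowOnly) == nil)

	maxPods := int32(110)
	typed := nodeClassKubelet{MaxPods: &maxPods}
	fmt.Println(toKarpenter(typed) != nil)
}
```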

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 7ced5847-9203-4a20-ad1a-01f97180ccd4

📥 Commits

Reviewing files that changed from the base of the PR and between 4043705 and 5c18545.

⛔ Files ignored due to path filters (9)
  • api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
  • client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (16)
  • api/.golangci.yml
  • api/karpenter/v1beta1/karpenter_types.go
  • api/karpenter/v1beta1/kubelet_config.go
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • api/karpenter/v1beta1/kubelet_config_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go
  • karpenter-operator/controllers/karpenter/karpenter_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • support/karpenter/karpenter.go
  • support/karpenter/karpenter_test.go
  • test/e2e/karpenter_kubelet_checker_pod.yaml
  • test/e2e/karpenter_test.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • api/karpenter/v1beta1/karpenter_types.go
  • support/karpenter/karpenter.go
  • test/e2e/karpenter_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go

Comment thread api/karpenter/v1beta1/kubelet_config_karpenter.go
Comment on lines +158 to +185
// If spec.kubelet is configured, add a finalizer to clean up the configmap. We can't just use owner
// references because this is cross cluster (the configmap lives in the control plane)
if !openshiftEC2NodeClass.Spec.Kubelet.IsZero() {

	if !controllerutil.ContainsFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) {
		original := openshiftEC2NodeClass.DeepCopy()
		controllerutil.AddFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer)
		if err := r.GuestClient.Patch(ctx, openshiftEC2NodeClass,
			client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
			return ctrl.Result{}, fmt.Errorf("failed to add kubelet config finalizer: %w", err)
		}
	}
}

if err := r.reconcileKubeletConfigMap(ctx, hcp, openshiftEC2NodeClass); err != nil {
	return ctrl.Result{}, fmt.Errorf("failed to reconcile kubelet config configmap: %w", err)
}

// The reconcile will have deleted the configmap if we make it here, so we can
// remove the finalizer
if openshiftEC2NodeClass.Spec.Kubelet.IsZero() && controllerutil.ContainsFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer) {
	original := openshiftEC2NodeClass.DeepCopy()
	controllerutil.RemoveFinalizer(openshiftEC2NodeClass, kubeletConfigFinalizer)
	if err := r.GuestClient.Patch(ctx, openshiftEC2NodeClass,
		client.MergeFromWithOptions(original, client.MergeFromWithOptimisticLock{})); err != nil {
		return ctrl.Result{}, fmt.Errorf("failed to remove kubelet config finalizer: %w", err)
	}
}
Contributor

@coderabbitai coderabbitai Bot Apr 17, 2026

⚠️ Potential issue | 🟡 Minor

Re-fetch after finalizer patches.

The metadata patches at Lines 162-168 and 179-183 advance resourceVersion, but reconcile continues with the stale in-memory object. updateConfigVersionAnnotation() later uses optimistic locking, so the first reconcile after adding/removing this finalizer can 409 on itself. Re-get the NodeClass, or return/requeue, after these patches.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 158 - 185, After successfully patching finalizers on
openshiftEC2NodeClass (the r.GuestClient.Patch calls that add/remove
kubeletConfigFinalizer), re-fetch the latest object or requeue immediately to
avoid using a stale in-memory resourceVersion (which causes optimistic lock 409s
in updateConfigVersionAnnotation/reconcileKubeletConfigMap). Concretely: after
each successful Patch call in this block, either call r.GuestClient.Get(ctx,
client.ObjectKeyFromObject(openshiftEC2NodeClass), openshiftEC2NodeClass) to
reload the resource before continuing, or return ctrl.Result{Requeue:true} so
the next reconcile runs with the up-to-date object.

Member Author

a.) wouldn't the 409 result in a requeue anyway, which I assume would be fine and/or b.) does patch not update the "local" object too when it patches?

Contributor

@jkyros — you're correct on both counts, and my concern was overstated:

b) Yes, Patch updates the local object in-place. controller-runtime's client.Patch() decodes the API server's response body back into the object pointer you pass. So after a successful r.GuestClient.Patch(ctx, openshiftEC2NodeClass, ...), the in-memory openshiftEC2NodeClass already has the updated resourceVersion from the server. The subsequent updateConfigVersionAnnotation (which does its own original := openshiftEC2NodeClass.DeepCopy() + optimistic-lock patch) would be working with the correct, up-to-date version — no stale-lock issue.

a) And even if something did go wrong, the patch errors are properly propagated (return ctrl.Result{}, fmt.Errorf(...)), so any 409 would surface as an error, get requeued with backoff, and the next reconcile would start with a fresh Get. No silent corruption.

My original comment was incorrect — there's no real risk here. No changes needed.


Comment thread test/e2e/karpenter_kubelet_checker_pod.yaml
Comment on lines +1197 to +1198
t.Logf("Make sure OpenshiftEC2NodeClass %q is Ready before nodepool creation", nc.Name)
e2eutil.EventuallyObject(t, ctx, fmt.Sprintf("OpenshiftEC2NodeClass %q to be Ready", nc.Name),
Member

nit: Could we just put this ready check in the polling above on line 1173?

Member Author

We could, but those blocks are checking different things with different timings from different controllers. Feels like we'd be trading clarity for...I don't know what, saving a Get call? Unless I'm missing something expensive here. I think we should leave it.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2026
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch 2 times, most recently from 985e242 to 4697b7f Compare April 23, 2026 02:16
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 23, 2026
@jkyros
Member Author

jkyros commented Apr 23, 2026

Rebased, slightly re-arranged IsZero so KarpenterKubeletConfiguration() can return a nil if only overflow fields are set.
/test e2e-aws-autonode

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)

162-170: ⚠️ Potential issue | 🟡 Minor

Stale resourceVersion after finalizer patch.

After the finalizer patch at lines 165-168, the in-memory openshiftEC2NodeClass retains its pre-patch resourceVersion. When updateConfigVersionAnnotation is later called with optimistic locking, it may 409 on the first reconcile after adding this finalizer.

Consider re-fetching the object after successful finalizer patch, or returning early to requeue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 162 - 170, The finalizer Add+Patch on openshiftEC2NodeClass
(controllerutil.AddFinalizer + r.GuestClient.Patch) leaves the in-memory object
with a stale resourceVersion which can cause optimistic-lock 409s in subsequent
updateConfigVersionAnnotation calls; after a successful Patch either re-fetch
the resource into openshiftEC2NodeClass via r.GuestClient.Get (using the same
ctx and namespaced name) so the object has the updated resourceVersion, or
immediately return ctrl.Result{Requeue: true} to let a fresh reconcile read the
updated resource; apply this change where the finalizer is added (the block
referencing kubeletConfigFinalizer and r.GuestClient.Patch) and ensure
updateConfigVersionAnnotation uses the refreshed object or runs on the next
reconcile.
🧹 Nitpick comments (3)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go (2)

1174-1259: These finalizer subtests are patching state instead of exercising the controller path.

Both branches mutate nodeClass.Finalizers directly and then assert the patched object, so they still pass if Reconcile() stops adding or removing kubeletConfigFinalizer. Please drive these expectations through the reconcile/helper code that actually owns the mutation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`
around lines 1174 - 1259, Tests are mutating nodeClass.Finalizers directly
instead of invoking the reconciler logic; replace those direct Patch
manipulations with calls into the reconciler helper that owns finalizer changes
(use KarpenterIgnitionReconciler.reconcileKubeletConfigMap or Reconcile() as
appropriate). Specifically, in the "finalizer present" removal test set
nodeClass.Spec.Kubelet = nil (if not already), create the reconciler r with
fakeManagementClient/fakeGuestClient, call r.reconcileKubeletConfigMap(ctx, hcp,
nodeClass) (or r.Reconcile(...) if that drives the same path), then Get the
nodeClass from fakeGuestClient and assert kubeletConfigFinalizer is gone;
similarly, for the addition test remove the manual fakeGuestClient.Patch that
appends the finalizer and instead let r.reconcileKubeletConfigMap drive
finalizer addition and then verify via Get. Ensure you reference
KarpenterIgnitionReconciler, reconcileKubeletConfigMap, Reconcile(), and
kubeletConfigFinalizer when locating code to change.

1480-1509: Add test coverage for overflow-only kubelet configuration.

Current test cases only verify behavior when typed fields like MaxPods are set. Add a test case where only overflow fields (e.g., {"podPidsLimit":4096}) are configured to ensure the per-nodeclass kubelet config ref is included. This improves test coverage and helps prevent regressions in the IsZero() check logic.

Suggested test shape
+	t.Run("When kubelet config has only overflow fields it should still use the per-nodeclass config", func(t *testing.T) {
+		g := NewWithT(t)
+		var kubeletCfg hyperkarpenterv1.KubeletConfiguration
+		g.Expect(json.Unmarshal([]byte(`{"podPidsLimit":4096}`), &kubeletCfg)).To(Succeed())
+
+		nodeClass := &hyperkarpenterv1.OpenshiftEC2NodeClass{
+			ObjectMeta: metav1.ObjectMeta{Name: testNodeClassName},
+			Spec: hyperkarpenterv1.OpenshiftEC2NodeClassSpec{Kubelet: kubeletCfg},
+		}
+
+		np := r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage)
+		g.Expect(np.Spec.Config).To(HaveLen(1))
+		g.Expect(np.Spec.Config[0].Name).To(Equal(karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName)))
+	})

Also applies to: 1619-1678

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`
around lines 1480 - 1509, Add a new subtest alongside the existing "When kubelet
config is set..." case that constructs an OpenshiftEC2NodeClass whose
Spec.Kubelet only sets overflow fields (e.g., raw map/json with
podPidsLimit:4096 or equivalent overflow-only field), call
r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage), and assert the
NodePool still contains exactly one Config entry whose Name equals
karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and other
existing expectations like
np.Name/Namespace/Labels/ClusterName/Replicas/Release.Image/Arch). This ensures
the IsZero/emptiness logic treats overflow-only kubelet configuration as
non-empty; reference the test variables and helpers used in the file (NewWithT,
testNodeClassName, createInMemoryNodePool,
karpenterutil.KarpenterNodeClassKubeletConfigName).
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)

237-240: Log message may be misleading when ConfigMap was already absent.

supportutil.DeleteIfNeeded returns (deleted bool, error). The log at line 240 unconditionally states "Deleted kubelet config ConfigMap" even when the ConfigMap didn't exist. Consider using the return value for accurate logging.

🔧 Suggested fix
-	if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil {
+	deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)
+	if err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err)
 	}
-	log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	if deleted {
+		log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 237 - 240, supportutil.DeleteIfNeeded returns (deleted bool, error)
but the code ignores the deleted bool and always logs "Deleted kubelet config
ConfigMap"; change the call in karpenterignition_controller.go to capture the
boolean (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx,
r.ManagementClient, cm)), handle err as before, and use the deleted flag to log
a precise message: log.Info("Deleted kubelet config ConfigMap", "name",
configMapName) only when deleted is true and log.Info("Kubelet config ConfigMap
already absent", "name", configMapName) when deleted is false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/karpenter/v1beta1/kubelet_config.go`:
- Around line 79-82: Add a non-negative validation constraint to the
EvictionMaxPodGracePeriod field so negative values are rejected at admission
time: add the kubebuilder validation marker (e.g. //
+kubebuilder:validation:Minimum=0) immediately above the
EvictionMaxPodGracePeriod declaration and keep the field as *int32
`json:"evictionMaxPodGracePeriod,omitempty"` so the CRD generator will emit a
Minimum=0 schema constraint; after adding the marker regenerate CRDs/clients as
needed.
- Around line 47-72: Update the kubebuilder XValidation annotations for
SystemReserved, KubeReserved, EvictionHard, and EvictionSoft to reject empty
string values: add an additional XValidation rule that asserts each map value is
non-empty (e.g. rule="self.all(x, self[x] != '')" with an appropriate message
like "value for <field> cannot be empty") so admission rejects entries with ""
(in addition to existing key and negative-value checks); locate the annotations
on the SystemReserved, KubeReserved, EvictionHard, and EvictionSoft fields and
append the new validation clause for each.
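The two marker suggestions above could look roughly like this on a Go API type. The `kubeletFields` struct and `encode` helper are hypothetical excerpts for illustration; the PR's real fields carry additional rules that are not reproduced here, and only one of the four map fields is shown. The markers are ordinary comments to the Go compiler and only take effect once the CRDs are regenerated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletFields is a hypothetical excerpt of a kubelet configuration type.
type kubeletFields struct {
	// systemReserved values must be non-empty strings.
	// +kubebuilder:validation:XValidation:rule="self.all(x, self[x] != '')",message="value for systemReserved cannot be empty"
	// +optional
	SystemReserved map[string]string `json:"systemReserved,omitempty"`

	// evictionMaxPodGracePeriod must not be negative.
	// +kubebuilder:validation:Minimum=0
	// +optional
	EvictionMaxPodGracePeriod *int32 `json:"evictionMaxPodGracePeriod,omitempty"`
}

// encode marshals the struct so the effect of omitempty is easy to inspect.
func encode(k kubeletFields) string {
	out, err := json.Marshal(k)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	grace := int32(30)
	fmt.Println(encode(kubeletFields{}))
	fmt.Println(encode(kubeletFields{EvictionMaxPodGracePeriod: &grace}))
}
```

Keeping the field as `*int32` with `omitempty` means an unset value is still distinguishable from an explicit 0, which the `Minimum=0` constraint allows.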

---

Duplicate comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 162-170: The finalizer Add+Patch on openshiftEC2NodeClass
(controllerutil.AddFinalizer + r.GuestClient.Patch) leaves the in-memory object
with a stale resourceVersion which can cause optimistic-lock 409s in subsequent
updateConfigVersionAnnotation calls; after a successful Patch either re-fetch
the resource into openshiftEC2NodeClass via r.GuestClient.Get (using the same
ctx and namespaced name) so the object has the updated resourceVersion, or
immediately return ctrl.Result{Requeue: true} to let a fresh reconcile read the
updated resource; apply this change where the finalizer is added (the block
referencing kubeletConfigFinalizer and r.GuestClient.Patch) and ensure
updateConfigVersionAnnotation uses the refreshed object or runs on the next
reconcile.

---

Nitpick comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 1174-1259: Tests are mutating nodeClass.Finalizers directly
instead of invoking the reconciler logic; replace those direct Patch
manipulations with calls into the reconciler helper that owns finalizer changes
(use KarpenterIgnitionReconciler.reconcileKubeletConfigMap or Reconcile() as
appropriate). Specifically, in the "finalizer present" removal test set
nodeClass.Spec.Kubelet = nil (if not already), create the reconciler r with
fakeManagementClient/fakeGuestClient, call r.reconcileKubeletConfigMap(ctx, hcp,
nodeClass) (or r.Reconcile(...) if that drives the same path), then Get the
nodeClass from fakeGuestClient and assert kubeletConfigFinalizer is gone;
similarly, for the addition test remove the manual fakeGuestClient.Patch that
appends the finalizer and instead let r.reconcileKubeletConfigMap drive
finalizer addition and then verify via Get. Ensure you reference
KarpenterIgnitionReconciler, reconcileKubeletConfigMap, Reconcile(), and
kubeletConfigFinalizer when locating code to change.
- Around line 1480-1509: Add a new subtest alongside the existing "When kubelet
config is set..." case that constructs an OpenshiftEC2NodeClass whose
Spec.Kubelet only sets overflow fields (e.g., raw map/json with
podPidsLimit:4096 or equivalent overflow-only field), call
r.createInMemoryNodePool(hcp, nodeClass, hcp.Spec.ReleaseImage), and assert the
NodePool still contains exactly one Config entry whose Name equals
karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and other
existing expectations like
np.Name/Namespace/Labels/ClusterName/Replicas/Release.Image/Arch). This ensures
the IsZero/emptiness logic treats overflow-only kubelet configuration as
non-empty; reference the test variables and helpers used in the file (NewWithT,
testNodeClassName, createInMemoryNodePool,
karpenterutil.KarpenterNodeClassKubeletConfigName).

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 237-240: supportutil.DeleteIfNeeded returns (deleted bool, error)
but the code ignores the deleted bool and always logs "Deleted kubelet config
ConfigMap"; change the call in karpenterignition_controller.go to capture the
boolean (e.g., deleted, err := supportutil.DeleteIfNeeded(ctx,
r.ManagementClient, cm)), handle err as before, and use the deleted flag to log
a precise message: log.Info("Deleted kubelet config ConfigMap", "name",
configMapName) only when deleted is true and log.Info("Kubelet config ConfigMap
already absent", "name", configMapName) when deleted is false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 0dfe082a-358a-4bf0-ae81-d51fc3503ca4

📥 Commits

Reviewing files that changed from the base of the PR and between 5c18545 and 985e242.

⛔ Files ignored due to path filters (9)
  • api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
  • client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (16)
  • api/.golangci.yml
  • api/karpenter/v1beta1/karpenter_types.go
  • api/karpenter/v1beta1/kubelet_config.go
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • api/karpenter/v1beta1/kubelet_config_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go
  • karpenter-operator/controllers/karpenter/karpenter_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • support/karpenter/karpenter.go
  • support/karpenter/karpenter_test.go
  • test/e2e/karpenter_kubelet_checker_pod.yaml
  • test/e2e/karpenter_test.go
✅ Files skipped from review due to trivial changes (1)
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • api/.golangci.yml
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • api/karpenter/v1beta1/kubelet_config_test.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go

Comment on lines +47 to +72
// systemReserved is a set of ResourceName=ResourceQuantity pairs that describe
// resources reserved for non-kubernetes components.
// Currently only cpu, memory, ephemeral-storage, and pid are supported.
// +kubebuilder:validation:XValidation:message="valid keys for systemReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
// +kubebuilder:validation:MinProperties=1
// +optional
SystemReserved map[string]string `json:"systemReserved,omitempty"`
// kubeReserved is a set of ResourceName=ResourceQuantity pairs that describe
// resources reserved for kubernetes system components.
// Currently only cpu, memory, ephemeral-storage, and pid are supported.
// +kubebuilder:validation:XValidation:message="valid keys for kubeReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
// +kubebuilder:validation:MinProperties=1
// +optional
KubeReserved map[string]string `json:"kubeReserved,omitempty"`
// evictionHard is a map of signal names to quantities that defines hard eviction thresholds.
// +kubebuilder:validation:XValidation:message="valid keys for evictionHard are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])"
// +kubebuilder:validation:MinProperties=1
// +optional
EvictionHard map[string]string `json:"evictionHard,omitempty"`
// evictionSoft is a map of signal names to quantities that defines soft eviction thresholds.
// +kubebuilder:validation:XValidation:message="valid keys for evictionSoft are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])"
// +kubebuilder:validation:MinProperties=1
// +optional
EvictionSoft map[string]string `json:"evictionSoft,omitempty"`
Contributor

@coderabbitai coderabbitai Bot Apr 23, 2026

⚠️ Potential issue | 🟠 Major

Reject empty values in the string-backed kubelet maps.

systemReserved, kubeReserved, evictionHard, and evictionSoft currently accept "" values. That still serializes into kubelet config, so obviously invalid input is deferred until ignition/MCO instead of being rejected at admission.

Suggested validation tightening
-	// +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
+	// +kubebuilder:validation:XValidation:message="systemReserved value cannot be empty or a negative resource quantity",rule="self.all(x, self[x] != '' && !self[x].startsWith('-'))"
 	// +kubebuilder:validation:MinProperties=1
 	// +optional
 	SystemReserved map[string]string `json:"systemReserved,omitempty"`
@@
-	// +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
+	// +kubebuilder:validation:XValidation:message="kubeReserved value cannot be empty or a negative resource quantity",rule="self.all(x, self[x] != '' && !self[x].startsWith('-'))"
 	// +kubebuilder:validation:MinProperties=1
 	// +optional
 	KubeReserved map[string]string `json:"kubeReserved,omitempty"`
@@
+	// +kubebuilder:validation:XValidation:message="evictionHard values cannot be empty",rule="self.all(x, self[x] != '')"
 	// +kubebuilder:validation:MinProperties=1
 	// +optional
 	EvictionHard map[string]string `json:"evictionHard,omitempty"`
@@
+	// +kubebuilder:validation:XValidation:message="evictionSoft values cannot be empty",rule="self.all(x, self[x] != '')"
 	// +kubebuilder:validation:MinProperties=1
 	// +optional
 	EvictionSoft map[string]string `json:"evictionSoft,omitempty"`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/karpenter/v1beta1/kubelet_config.go` around lines 47 - 72, Update the
kubebuilder XValidation annotations for SystemReserved, KubeReserved,
EvictionHard, and EvictionSoft to reject empty string values: add an additional
XValidation rule that asserts each map value is non-empty (e.g.
rule="self.all(x, self[x] != '')" with an appropriate message like "value for
<field> cannot be empty") so admission rejects entries with "" (in addition to
existing key and negative-value checks); locate the annotations on the
SystemReserved, KubeReserved, EvictionHard, and EvictionSoft fields and append
the new validation clause for each.

Member Author

Would this not be deviating from the upstream validation?

Contributor

@jkyros, that's a fair point. Looking at the upstream karpenter-provider-aws v1 API source, the validation rules in this PR's KubeletConfiguration already mirror upstream exactly:

  • SystemReserved upstream has only a key-allowlist rule and "systemReserved value cannot be a negative resource quantity" with self.all(x, !self[x].startsWith('-')) — no empty-string check.
  • KubeReserved follows the same pattern — also no empty-string check.
  • EvictionHard and EvictionSoft upstream have only a key-allowlist rule with no value validation at all.

So yes, adding self[x] != '' would be a deliberate tightening beyond what upstream Karpenter enforces. That's a reasonable concern to raise — it could mean that a KubeletConfiguration payload accepted by the upstream EC2NodeClass webhook could be rejected by the OpenshiftEC2NodeClass wrapper.

That said, there's a valid argument for the extra strictness: empty string values will silently pass through to ignition/MCO and fail later, whereas admission rejection gives a clearer error. Whether to deviate from upstream for that defensive reason is a judgment call for your team. I'm happy to withdraw the suggestion if you'd prefer to maintain strict parity with upstream.
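To make the trade-off concrete, here is a rough Go equivalent of the rules under discussion. It is a sketch, not code from the PR: `validateReserved` and `allowedReservedKeys` are hypothetical names, and the `rejectEmpty` flag stands for the proposed tightening, so `false` approximates upstream parity.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allowedReservedKeys mirrors the key allowlist in the XValidation rule.
var allowedReservedKeys = map[string]bool{
	"cpu": true, "memory": true, "ephemeral-storage": true, "pid": true,
}

// validateReserved returns one message per offending entry.
// rejectEmpty toggles the stricter, non-upstream empty-value check.
func validateReserved(field string, values map[string]string, rejectEmpty bool) []string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic message order

	var problems []string
	for _, k := range keys {
		v := values[k]
		switch {
		case !allowedReservedKeys[k]:
			problems = append(problems, fmt.Sprintf("%s: key %q is not allowed", field, k))
		case strings.HasPrefix(v, "-"):
			problems = append(problems, fmt.Sprintf("%s[%s]: value cannot be a negative resource quantity", field, k))
		case rejectEmpty && v == "":
			problems = append(problems, fmt.Sprintf("%s[%s]: value cannot be empty", field, k))
		}
	}
	return problems
}

func main() {
	in := map[string]string{"cpu": "", "memory": "-1Gi"}
	fmt.Println(validateReserved("systemReserved", in, false)) // upstream-style rules
	fmt.Println(validateReserved("systemReserved", in, true))  // with the empty-value check
}
```

With `rejectEmpty` off, the empty `cpu` entry passes and only the negative `memory` value is reported, which is the behaviour the parity argument would preserve.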

Comment thread api/karpenter/v1beta1/kubelet_config.go
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)

237-240: Log message may be misleading when ConfigMap doesn't exist.

supportutil.DeleteIfNeeded returns whether the object was actually deleted. The log at line 240 unconditionally says "Deleted kubelet config ConfigMap" even if the ConfigMap didn't exist. Consider logging conditionally based on the return value.

📝 Proposed fix
-	if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil {
+	deleted, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)
+	if err != nil {
 		return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err)
 	}
-	log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	if deleted {
+		log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`
around lines 237 - 240, The log is unconditional; change the call to
supportutil.DeleteIfNeeded in karpenterignition_controller.go to capture the
boolean result (e.g., deleted, err := supportutil.DeleteIfNeeded(...)) and only
call log.Info("Deleted kubelet config ConfigMap", "name", configMapName) when
deleted is true; if deleted is false consider emitting a different debug/info
message like "kubelet config ConfigMap not present, nothing to delete" or omit
logging—keep error handling the same by returning the wrapped fmt.Errorf on err.
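The conditional-logging pattern can be tried in isolation. The sketch below is an assumption-laden stand-in: `deleteIfNeeded` here is a hypothetical function over a map that only imitates the `(deleted bool, err error)` return shape described above, and `cleanup` returns the message instead of calling a real logger.

```go
package main

import "fmt"

// deleteIfNeeded is a hypothetical stand-in that imitates the
// (deleted bool, err error) shape of supportutil.DeleteIfNeeded.
func deleteIfNeeded(store map[string]string, name string) (bool, error) {
	if _, ok := store[name]; !ok {
		return false, nil // nothing to delete
	}
	delete(store, name)
	return true, nil
}

// cleanup returns the log message that matches what actually happened.
func cleanup(store map[string]string, name string) (string, error) {
	deleted, err := deleteIfNeeded(store, name)
	if err != nil {
		return "", fmt.Errorf("failed to delete kubelet config configmap %s: %w", name, err)
	}
	if deleted {
		return "Deleted kubelet config ConfigMap", nil
	}
	return "Kubelet config ConfigMap already absent", nil
}

func main() {
	store := map[string]string{"kubelet-config": "data"}
	for i := 0; i < 2; i++ {
		msg, err := cleanup(store, "kubelet-config")
		fmt.Println(msg, err)
	}
}
```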
api/karpenter/v1beta1/kubelet_config.go (1)

170-173: Typo in comment: "uzed" should be "used".

Line 171 contains a typo.

📝 Proposed fix
 // hasTypedFields reports whether any explicitly defined struct fields are set.
-// This is uzed by IsZero, but it separate so we can differentiate the Zero case vs
+// This is used by IsZero, but it's separate so we can differentiate the Zero case vs
 // the "only overflow fields set" case. This must be kept in sync with the typed fields
 // in KubeletConfiguration.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/karpenter/v1beta1/kubelet_config.go` around lines 170 - 173, Fix the typo
in the comment for the hasTypedFields function: change "uzed" to "used" in the
comment block that describes hasTypedFields (which is referenced by IsZero and
kept in sync with KubeletConfiguration), ensuring the surrounding comment text
remains unchanged aside from the corrected word.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 1174-1221: The test currently patches nodeClass.Finalizers
directly so assertions never verify controller behavior; instead invoke the
reconciler logic that adds the finalizer (do not mutate Finalizers manually).
Update the subtest to remove the manual Patch(...) that appends
kubeletConfigFinalizer and, after creating the ConfigMap, call the controller
path that is responsible for adding the finalizer (either invoke
r.reconcileKubeletConfigMap(ctx, hcp, nodeClass) again if that function adds the
finalizer, or call the public Reconcile method on KarpenterIgnitionReconciler
with a reconcile.Request for the nodeClass) and then fetch nodeClass from
fakeGuestClient and assert that kubeletConfigFinalizer is present; reference
symbols: kubeletConfigFinalizer, KarpenterIgnitionReconciler,
reconcileKubeletConfigMap, and Reconcile.
- Around line 1458-1509: Add a new subtest to both TestCreateInMemoryNodePool
and TestReconcileKubeletConfigMap that sets nodeClass.Spec.Kubelet to contain
only an "overflow-only" kubelet field (e.g., PodPidsLimit/PodPidsLimit: 4096) so
KarpenterKubeletConfiguration() will return nil but the controller should still
create the per-NodeClass kubelet ConfigMap and include its ref in the NodePool;
locate createInMemoryNodePool and assert np.Spec.Config has length 1 and
np.Spec.Config[0].Name equals
karpenterutil.KarpenterNodeClassKubeletConfigName(testNodeClassName) (and
analogous expectations used in the existing tests), and in
TestReconcileKubeletConfigMap assert the ConfigMap exists and contains the
overflow field so the controller path where IsZero() is false but
KarpenterKubeletConfiguration() returns nil is covered.

---

Nitpick comments:
In `@api/karpenter/v1beta1/kubelet_config.go`:
- Around line 170-173: Fix the typo in the comment for the hasTypedFields
function: change "uzed" to "used" in the comment block that describes
hasTypedFields (which is referenced by IsZero and kept in sync with
KubeletConfiguration), ensuring the surrounding comment text remains unchanged
aside from the corrected word.

In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 237-240: The log is unconditional; change the call to
supportutil.DeleteIfNeeded in karpenterignition_controller.go to capture the
boolean result (e.g., deleted, err := supportutil.DeleteIfNeeded(...)) and only
call log.Info("Deleted kubelet config ConfigMap", "name", configMapName) when
deleted is true; if deleted is false consider emitting a different debug/info
message like "kubelet config ConfigMap not present, nothing to delete" or omit
logging—keep error handling the same by returning the wrapped fmt.Errorf on err.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: deddf304-46f1-45c4-ae5b-7e60baf42362

📥 Commits

Reviewing files that changed from the base of the PR and between 985e242 and 4697b7f.

⛔ Files ignored due to path filters (9)
  • api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
  • client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
  • client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
  • client/applyconfiguration/utils.go is excluded by !client/**
  • karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubelet_config_karpenter.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (16)
  • api/.golangci.yml
  • api/karpenter/v1beta1/karpenter_types.go
  • api/karpenter/v1beta1/kubelet_config.go
  • api/karpenter/v1beta1/kubelet_config_karpenter.go
  • api/karpenter/v1beta1/kubelet_config_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • hypershift-operator/controllers/hostedcluster/karpenter.go
  • karpenter-operator/controllers/karpenter/karpenter_controller.go
  • karpenter-operator/controllers/karpenter/karpenter_controller_test.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
  • karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • support/karpenter/karpenter.go
  • support/karpenter/karpenter_test.go
  • test/e2e/karpenter_kubelet_checker_pod.yaml
  • test/e2e/karpenter_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • api/.golangci.yml
  • hypershift-operator/controllers/hostedcluster/karpenter.go

@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 4697b7f to 0b483c5 Compare April 24, 2026 23:45
@jkyros
Member Author

jkyros commented Apr 24, 2026

Added/adjusted test cases for CodeRabbit's satisfaction.
/test e2e-aws-autonode

@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 0b483c5 to 47cad9f Compare April 27, 2026 07:06
@jkyros
Member Author

jkyros commented Apr 27, 2026

/test e2e-aws-autonode

@jkyros jkyros marked this pull request as ready for review April 27, 2026 15:36
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 27, 2026
@openshift-ci openshift-ci Bot requested review from clebs and enxebre April 27, 2026 15:37
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 47cad9f to 908f27c Compare April 27, 2026 16:20
…eClass

Add KubeletConfiguration type with custom JSON marshal/unmarshal to support
both explicitly typed fields (maxPods, systemReserved, etc.) and arbitrary
overflow fields that pass through to the node's kubelet config via MachineConfig.

What we wanted to do was rawExtension with some CEL validation on some
of the fields, but the CEL validation can't see inside the rawExtension,
so instead this was the best we could do without a webhook.

The kubelet config is propagated through the karpenter-operator's ignition
controller and injected into node ignition via MachineConfig. The taint
ConfigMap creation is moved from the hypershift-operator to the
karpenter-operator for centralized management.

Signed-off-by: John Kyros <jkyros@redhat.com>
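
The typed-plus-overflow approach described in the commit message can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: `kubeletConfig`, its single typed field, and the `Overflow` map are hypothetical, and the real type has many more fields and its own handling.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletConfig is a hypothetical, trimmed-down type with one typed field.
type kubeletConfig struct {
	MaxPods *int32 `json:"maxPods,omitempty"`
	// Overflow holds every key that has no typed field.
	Overflow map[string]json.RawMessage `json:"-"`
}

// UnmarshalJSON decodes typed fields normally and keeps the rest as overflow.
func (k *kubeletConfig) UnmarshalJSON(data []byte) error {
	type typedFields kubeletConfig // same fields, no methods, so no recursion
	var typed typedFields
	if err := json.Unmarshal(data, &typed); err != nil {
		return err
	}
	var all map[string]json.RawMessage
	if err := json.Unmarshal(data, &all); err != nil {
		return err
	}
	delete(all, "maxPods") // must list every typed key
	*k = kubeletConfig(typed)
	k.Overflow = nil
	if len(all) > 0 {
		k.Overflow = all
	}
	return nil
}

// MarshalJSON merges overflow keys with typed fields; typed fields win.
func (k kubeletConfig) MarshalJSON() ([]byte, error) {
	type typedFields kubeletConfig
	typedBytes, err := json.Marshal(typedFields(k))
	if err != nil {
		return nil, err
	}
	merged := map[string]json.RawMessage{}
	for key, v := range k.Overflow {
		merged[key] = v
	}
	var known map[string]json.RawMessage
	if err := json.Unmarshal(typedBytes, &known); err != nil {
		return nil, err
	}
	for key, v := range known {
		merged[key] = v
	}
	return json.Marshal(merged)
}

// decode parses a JSON document into kubeletConfig.
func decode(in string) (kubeletConfig, error) {
	var c kubeletConfig
	err := json.Unmarshal([]byte(in), &c)
	return c, err
}

// roundTrip decodes and re-encodes a document.
func roundTrip(in string) (string, error) {
	c, err := decode(in)
	if err != nil {
		return "", err
	}
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"podPidsLimit":4096,"maxPods":110}`)
	fmt.Println(out, err)
}
```

In this sketch an "overflow-only" input leaves `MaxPods` nil while `Overflow` is non-empty, which is the case the IsZero discussion earlier in the thread is about.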
@jkyros jkyros force-pushed the autoscale-558-kubeletconfig-overflow branch from 908f27c to af5b611 Compare April 27, 2026 21:31
@openshift-ci
Contributor

openshift-ci Bot commented Apr 27, 2026

@jkyros: all tests passed!

"testing"
"time"

awskarpenterv1 "github.com/aws/karpenter-provider-aws/pkg/apis/v1"
Member

Can we please move the whole util.go out of api/ so that importing the API doesn't pull in additional deps?
I would also like to point LLMs to this API as an example of good patterns, and this is not one of them.

Member

done here #8355
Let's not reintroduce external deps into the API package.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Apr 29, 2026

PR needs rebase.

// the upstream Karpenter KubeletConfiguration, mapping only fields that exist
// in both structs. Fields in our struct that are not present in Karpenter's
// struct (including overflow fields) are silently dropped.
func (spec OpenshiftEC2NodeClassSpec) KarpenterKubeletConfiguration() *awskarpenterv1.KubeletConfiguration {
Member

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Apr 29, 2026

Test Failure Analysis Complete

Job Information

  • Prow Job: tide (merge gate)
  • Build ID: N/A (not a CI test job — this is the Prow merge controller)
  • PR: #8192 AUTOSCALE-558: Expose KubeletConfig on OpenShiftEC2Nodeclass as structured fields + preserveunknown/overflow
  • Branch: autoscale-558-kubeletconfig-overflow
  • State: ERROR — merge conflict

Test Failure Analysis

Error

tide: Not mergeable. PR has a merge conflict.

Summary

No CI test jobs have failed — all presubmit jobs that ran (images, verify-deps, verify-workflows, security, okd-scos-images) passed successfully, and all E2E jobs remain pending because tide will not trigger them while the PR has a merge conflict. The sole error is from the tide merge controller, which reports the PR is unmergeable due to a git merge conflict introduced by PR #8355 ("NO-JIRA: Do not import external dependencies into API packages"), merged to main on April 28, 2026.

Root Cause

PR #8192 adds a new file api/karpenter/v1beta1/kubelet_config_karpenter.go that imports github.com/aws/karpenter-provider-aws/pkg/apis/v1 (aliased as awskarpenterv1). This file lives in the api/karpenter/v1beta1 package and provides a KarpenterKubeletConfiguration() conversion method.

On April 28, 2026, PR #8355 (by @JoelSpeed) was merged to main. That PR made three sweeping changes that directly conflict with PR #8192:

  1. Removed github.com/aws/karpenter-provider-aws from api/go.mod — The entire karpenter dependency was stripped from the API module because external dependencies (like Karpenter and AWS SDK) should not be in the API module, which is imported by downstream consumers.

  2. Deleted all api/vendor/github.com/aws/karpenter-provider-aws/ files — 13 vendored karpenter files were removed from the api vendor directory.

  3. Moved api/karpenter/v1beta1/util.go → karpenter-operator/controllers/nodeclass/karpenter_util.go: Karpenter utility functions that depend on the awskarpenterv1 package were relocated from the API module to the karpenter-operator module, which is allowed to have external dependencies.

PR #8192's new kubelet_config_karpenter.go file in api/karpenter/v1beta1/ directly violates the design principle enforced by PR #8355: the API module must not import external dependencies like Karpenter. The go.mod and vendor tree conflicts make the branch unmergeable.

Recommendations
  1. Rebase onto current main and resolve the merge conflicts.

  2. Move kubelet_config_karpenter.go out of the API module: the KarpenterKubeletConfiguration() method imports awskarpenterv1, which is no longer allowed in the api/ module (per PR #8355, "NO-JIRA: Do not import external dependencies into API packages"). Move this file to karpenter-operator/controllers/nodeclass/ (alongside the existing karpenter_util.go that was relocated for the same reason).

  3. Update the conversion call site — The call in karpenter-operator/controllers/nodeclass/ec2_nodeclass_controller.go should reference the relocated function from its new package.

  4. Ensure api/go.mod does not re-add karpenter dependencies — After the move, run go mod tidy in the api/ module to verify no external deps are reintroduced.

  5. Coordinate with @JoelSpeed if architectural questions arise about what belongs in api/ vs. karpenter-operator/.

Evidence

| Evidence | Detail |
| --- | --- |
| Tide error message | "Not mergeable. PR has a merge conflict." |
| PR mergeable state | mergeable: CONFLICTING, mergeStateStatus: DIRTY, rebaseable: false |
| Conflicting PR | #8355 "NO-JIRA: Do not import external dependencies into API packages" (merged 2026-04-28, 3933 files changed) |
| Conflict cause #1 | PR #8355 removed github.com/aws/karpenter-provider-aws from api/go.mod; PR #8192 adds api/karpenter/v1beta1/kubelet_config_karpenter.go which imports it |
| Conflict cause #2 | PR #8355 deleted all api/vendor/github.com/aws/karpenter-provider-aws/ files (13 files removed) |
| Conflict cause #3 | Commit e46f3d3ef9 (part of PR #8355) moved api/karpenter/v1beta1/util.go → karpenter-operator/controllers/nodeclass/karpenter_util.go |
| CI test status | All completed presubmit jobs passed (images, verify-deps, verify-workflows, security, okd-scos-images); E2E jobs remain pending (blocked by tide) |
| PR #8192 created | 2026-04-09 |
| PR #8355 merged | 2026-04-28 (19 days after PR #8192 was opened) |

Contributor

@JoelSpeed JoelSpeed left a comment

Is there a generated CRD schema for the karpenter types? I'm concerned that this might be creating a non-structural schema

// kubelet configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
// +optional
Kubelet KubeletConfiguration `json:"kubelet,omitempty,omitzero"`
Contributor

Omitempty doesn't do anything for a non pointer struct

Suggested change
-	Kubelet KubeletConfiguration `json:"kubelet,omitempty,omitzero"`
+	Kubelet KubeletConfiguration `json:"kubelet,omitzero"`
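
The point about `omitempty` is easy to confirm with `encoding/json`. The types below are hypothetical miniatures, not the PR's structs; the sketch only exercises `omitempty`, since `omitzero` needs Go 1.24 or newer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubelet is a hypothetical miniature of the real configuration struct.
type kubelet struct {
	MaxPods *int32 `json:"maxPods,omitempty"`
}

// specValue embeds the struct by value, as in the field under review.
type specValue struct {
	Kubelet kubelet `json:"kubelet,omitempty"`
}

// specPointer uses a pointer, where omitempty does take effect.
type specPointer struct {
	Kubelet *kubelet `json:"kubelet,omitempty"`
}

// encode marshals v and returns the JSON text (or the error text).
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encode(specValue{}))   // {"kubelet":{}}
	fmt.Println(encode(specPointer{})) // {}
}
```

A struct value is never "empty" in the `omitempty` sense, so the zero `kubelet` is still emitted; only the nil pointer is dropped.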

// The fields listed below are validated at admission time. Additional kubelet configuration
// fields beyond those listed here are also accepted and will be passed through to the node's
// kubelet configuration without validation.
// +kubebuilder:pruning:PreserveUnknownFields
Contributor

Really? This surprises me. What's the motivation?

// +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
type KubeletConfiguration struct {
// maxPods is the maximum number of pods that can run on a node.
// +kubebuilder:validation:Minimum=0
Contributor

I would normally request a maximum here... What does openshift currently support?

// +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
type KubeletConfiguration struct {
// maxPods is the maximum number of pods that can run on a node.
// +kubebuilder:validation:Minimum=0
Contributor

Why would you ever set this value to 0 as an end user?

// +kubebuilder:validation:Minimum=0
// +optional
MaxPods *int32 `json:"maxPods,omitempty"`
// podsPerCore is the maximum number of pods per core. Cannot exceed maxPods.
Contributor

Validate this relationship with CEL?

(I see the CodeRabbit conversation, but I don't see a reason why we shouldn't be stricter here if we know it's going to be a bad config.)
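A minimal sketch of what such a rule could look like, assuming both fields stay optional pointers. The struct here is a trimmed-down stand-in for the one under review, the `XValidation` rule is an assumption rather than something taken from the PR, and `podsPerCoreWithinMaxPods` is a hypothetical helper that only mirrors the CEL rule in plain Go so the logic can be exercised:

```go
package main

import "fmt"

// KubeletConfiguration is a trimmed-down, hypothetical copy of the struct under review.
// The XValidation marker below is an assumed rule, not one that exists in the PR.
//
// +kubebuilder:validation:XValidation:rule="!has(self.maxPods) || !has(self.podsPerCore) || self.podsPerCore <= self.maxPods",message="podsPerCore must not exceed maxPods"
type KubeletConfiguration struct {
	// +optional
	MaxPods *int32 `json:"maxPods,omitempty"`
	// +optional
	PodsPerCore *int32 `json:"podsPerCore,omitempty"`
}

// podsPerCoreWithinMaxPods mirrors the CEL rule above: the check only applies
// when both fields are set, and then podsPerCore must not exceed maxPods.
func podsPerCoreWithinMaxPods(c KubeletConfiguration) bool {
	if c.MaxPods == nil || c.PodsPerCore == nil {
		return true
	}
	return *c.PodsPerCore <= *c.MaxPods
}

// int32Ptr is a small helper for building pointer fields in examples.
func int32Ptr(v int32) *int32 { return &v }

func main() {
	ok := KubeletConfiguration{MaxPods: int32Ptr(250), PodsPerCore: int32Ptr(10)}
	bad := KubeletConfiguration{MaxPods: int32Ptr(10), PodsPerCore: int32Ptr(250)}
	unset := KubeletConfiguration{PodsPerCore: int32Ptr(10)}

	fmt.Println("ok accepted:", podsPerCoreWithinMaxPods(ok))
	fmt.Println("bad accepted:", podsPerCoreWithinMaxPods(bad))
	fmt.Println("unset accepted:", podsPerCoreWithinMaxPods(unset))
}
```

The `!has(...)` guards keep the rule from rejecting objects that set only one of the two fields, which matches the shape of the existing imageGC threshold rule on this struct.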

// +optional
MaxPods *int32 `json:"maxPods,omitempty"`
// podsPerCore is the maximum number of pods per core. Cannot exceed maxPods.
// +kubebuilder:validation:Minimum=0

Why would I set this to 0 as an end user?

Comment on lines +50 to +51
// +kubebuilder:validation:XValidation:message="valid keys for systemReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"

Well these look fun

One of your rules appears to think x is a key, one thinks it's a value? I don't think you can validate the map keys in the way you have done here

I think you'd need something like:

['cpu' in self, 'memory' in self, 'ephemeral-storage' in self, 'pid' in self].filter(x, x == true).size() == self.size()
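For reference, the CEL language definition describes macros such as `all` as ranging over the keys when applied to a map, so it is worth double-checking what `x` binds to in each rule. The intent of the two rules (only the four allowed keys, and no value starting with `-`) can be sketched in plain Go. This is an illustration of the intended semantics only; `allowedReservedKeys` and `validateReserved` are hypothetical names and not part of the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// allowedReservedKeys lists the keys the quoted validation message names as valid.
var allowedReservedKeys = map[string]bool{
	"cpu":               true,
	"memory":            true,
	"ephemeral-storage": true,
	"pid":               true,
}

// validateReserved is a hypothetical helper that checks a systemReserved or
// kubeReserved style map: every key must be allowed, and no value may be
// written as a negative quantity (detected here by a leading '-').
func validateReserved(reserved map[string]string) error {
	for key, value := range reserved {
		if !allowedReservedKeys[key] {
			return fmt.Errorf("invalid key %q", key)
		}
		if strings.HasPrefix(value, "-") {
			return fmt.Errorf("value for %q cannot be negative: %q", key, value)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateReserved(map[string]string{"cpu": "100m", "memory": "1Gi"}))
	fmt.Println(validateReserved(map[string]string{"gpu": "1"}))
	fmt.Println(validateReserved(map[string]string{"cpu": "-100m"}))
}
```

Unit tests with a table of valid and invalid maps against the generated CRD would confirm whether the CEL rules behave the same way as this sketch.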

Comment on lines +58 to +59
// +kubebuilder:validation:XValidation:message="valid keys for kubeReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"

Same problem with the CEL rules here; these seem confused.

Labels

- area/api: Indicates the PR includes changes for the API
- area/control-plane-operator: Indicates the PR includes changes for the control plane operator - in an OCP release
- area/hypershift-operator: Indicates the PR includes changes for the hypershift operator and API - outside an OCP release
- area/karpenter-operator: Indicates the PR includes changes related to the Karpenter operator
- area/testing: Indicates the PR includes changes for e2e testing
- jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
- needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.

5 participants