
AUTOSCALE-692: remove default debug log level from karpenter container#8561

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from maxcao13:AUTOSCALE-692/karpenter-log-level-info on May 22, 2026

@maxcao13 (Member) commented May 20, 2026

What this PR does / why we need it:

The debug log level was a leftover from early development. It floods CloudWatch with unnecessary log volume and increases customer costs.

Assisted-by: Claude Opus 4 (via Cursor)

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-692

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Chores

    • Reduced Karpenter controller logging verbosity from debug to info to lower log noise while retaining important operational messages.
  • Tests

    • Simplified end-to-end Karpenter test flow by removing log-based version checks; test now focuses on CRD validation and core plumbing checks.

@openshift-merge-bot (Contributor)

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 20, 2026
@openshift-ci-robot commented May 20, 2026

@maxcao13: This pull request references AUTOSCALE-692 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.


@coderabbitai (bot, Contributor) commented May 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8140a720-cdd5-426c-8288-9aa860c162f3

📥 Commits

Reviewing files that changed from the base of the PR and between d62f07d and 56123a8.

📒 Files selected for processing (1)
  • test/e2e/karpenter_test.go
💤 Files with no reviewable changes (1)
  • test/e2e/karpenter_test.go

📝 Walkthrough

This PR lowers Karpenter controller logging by changing the Deployment container arg --log-level from debug to info, and it removes Karpenter pod log-scanning from the E2E test: the test no longer creates a kube client to list pods or scan logs for a version string, and the checkKarpenterVersionInLogs helper and its related imports were deleted.
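A minimal sketch of a regression guard for this change, assuming the Karpenter container args are available as a plain []string. logLevelFromArgs is a hypothetical helper, not part of test/e2e/karpenter_test.go. It reports whether the args pin --log-level, which a unit test could assert once the flag is dropped and Karpenter falls back to its default:

```go
package main

import (
	"fmt"
	"strings"
)

// logLevelFromArgs returns the value passed via --log-level, accepting both
// "--log-level=value" and "--log-level value", and whether the flag was set.
// Hypothetical helper for illustration only.
func logLevelFromArgs(args []string) (string, bool) {
	for i, arg := range args {
		if strings.HasPrefix(arg, "--log-level=") {
			return strings.TrimPrefix(arg, "--log-level="), true
		}
		if arg == "--log-level" && i+1 < len(args) {
			return args[i+1], true
		}
	}
	return "", false
}

func main() {
	oldArgs := []string{"--log-level=debug"} // original deployment asset
	newArgs := []string{}                    // flag removed in this PR

	for _, args := range [][]string{oldArgs, newArgs} {
		if level, ok := logLevelFromArgs(args); ok {
			fmt.Printf("%q pins --log-level=%s\n", args, level)
		} else {
			fmt.Printf("%q leaves the level to Karpenter's default\n", args)
		}
	}
}
```

A test asserting that the debug level is never pinned would then fail if --log-level=debug were reintroduced into the asset.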

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Test Structure And Quality | ⚠️ Warning | testKarpenterPlumbing violates single responsibility by testing 8+ unrelated behaviors (metrics, vCPUs, CRDs, NodeClass validation, deletion/modification protection, conditions) in one test block. | Split testKarpenterPlumbing into focused single-responsibility tests, e.g., testKarpenterMetrics, testKarpenterCRDs, testNodeClassProtection, testAutoNodeCondition, following patterns established in testARM64Provisioning.
✅ Passed checks (11 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately captures the main change: switching Karpenter's log level from debug to info, which aligns with both the deployment YAML modification and the test update to remove version log verification.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | PR modifies test/e2e/karpenter_test.go which uses Go stdlib testing (t.Run), not Ginkgo. The check for stable Ginkgo test names is not applicable here.
Microshift Test Compatibility | ✅ Passed | PR removes existing tests rather than adding new ones. Revert deletes checkKarpenterVersionInLogs function and 67 lines of test code. Check only applies when new tests are added.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR does not add new Ginkgo e2e tests. test/e2e/karpenter_test.go uses standard Go testing with t.Run(), not Ginkgo patterns. Custom check applies only to new Ginkgo tests.
Topology-Aware Scheduling Compatibility | ✅ Passed | No topology-aware scheduling constraints introduced. Changes only adjust Karpenter log levels and update tests. Deployment has no affinity, toleration, nodeSelector, or topology constraints.
Ote Binary Stdout Contract | ✅ Passed | Removed code was pod log scanning inside test cases, not process-level code. Test suite uses zap logging configured to stderr with no problematic stdout writes in TestMain.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR removes test code and changes a YAML deployment file; no new Ginkgo e2e tests are added. Check only applies when new Ginkgo tests (It/Describe/Context/When) are added.


@openshift-ci openshift-ci Bot requested review from Nirshal and jparrill May 20, 2026 23:17
@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels May 20, 2026
@codecov (bot) commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.40%. Comparing base (d86f3d4) to head (56123a8).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8561      +/-   ##
==========================================
+ Coverage   40.34%   40.40%   +0.06%     
==========================================
  Files         755      755              
  Lines       93167    93235      +68     
==========================================
+ Hits        37587    37675      +88     
+ Misses      52877    52858      -19     
+ Partials     2703     2702       -1     

see 3 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.44% <ø> (+0.13%) ⬆️
cpo-hostedcontrolplane 41.76% <ø> (ø)
cpo-other 40.31% <ø> (+0.17%) ⬆️
hypershift-operator 50.72% <ø> (ø)
other 31.54% <ø> (ø)


Diff under review (Karpenter deployment asset, first commit):
  image: aws-karpenter-provider-aws # replaced by payload
  args:
- - "--log-level=debug"
+ - "--log-level=info"
Review comment (Contributor):

Should we just omit the flag and let Karpenter use its default?

Reply (Member Author):

Fine with me, 👍

The deployment asset hardcoded --log-level=debug, a leftover from
early development. This floods CloudWatch and increases customer
costs. Remove the flag entirely and let Karpenter use its default
(info).

Signed-off-by: Max Cao <macao@redhat.com>
Assisted-by: Claude Opus 4 (via Cursor)
Co-authored-by: Cursor <cursoragent@cursor.com>
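Omitting the flag works because a CLI flag falls back to its declared default when it is not passed. A minimal sketch with Go's standard flag package, assuming (per the commit message above) that Karpenter's default level is info; Karpenter's real option parsing is not shown here, and parseLogLevel is a hypothetical name:

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseLogLevel parses container args and returns the effective --log-level.
// The "info" default mirrors the commit message; this illustrates flag
// defaults and is not Karpenter's actual flag wiring.
func parseLogLevel(args []string) (string, error) {
	fs := flag.NewFlagSet("karpenter", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage/error text out of stderr
	level := fs.String("log-level", "info", "log verbosity")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *level, nil
}

func main() {
	for _, args := range [][]string{
		{"--log-level=debug"}, // old deployment asset
		{},                    // flag removed
	} {
		level, err := parseLogLevel(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("args=%q -> level=%s\n", args, level)
	}
}
```

Go's flag package accepts both one and two leading dashes, so "--log-level=debug" parses the same way a container arg would be written.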
@maxcao13 maxcao13 force-pushed the AUTOSCALE-692/karpenter-log-level-info branch from bcd2b12 to d62f07d May 21, 2026
@maxcao13 maxcao13 changed the title AUTOSCALE-692: change default log level from debug to info AUTOSCALE-692: remove default log level from karpenter container May 21, 2026
@maxcao13 maxcao13 changed the title AUTOSCALE-692: remove default log level from karpenter container AUTOSCALE-692: remove default debug log level from karpenter container May 21, 2026
@joshbranham (Contributor)

Can we check that it looks good in a prow run (i.e., logs flow at info level)?

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2026
@openshift-merge-bot (Contributor)

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@cwbotbot commented May 21, 2026

Test Results

e2e-aws

e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2057259763279859712 | Cost: $2.60012325 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@maxcao13 (Member Author) commented May 21, 2026

Seems like it's working: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/8561/pull-ci-openshift-hypershift-main-e2e-aws/2057259763279859712/artifacts/e2e-aws/hypershift-aws-run-e2e-nested/artifacts/TestKarpenter/namespaces/e2e-clusters-rcqtn-karpenter-gjg6q/core/pods/logs/karpenter-7d57544986-2gt8q-karpenter.log

But it also seems like it removes the version log. Not sure how we feel about that. Is that okay from an SRE standpoint?

So either we ignore it, and remove the test, or we need to change our karpenter fork code and backport.

@joshbranham (Contributor)

I'm not sure why we would care about the karpenter version in the logs, or why we have a test for it, so all good from my perspective.

@enxebre (Member) commented May 21, 2026

this might be legit https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_hypershift/8561/pull-ci-openshift-hypershift-main-e2e-aws/2057259763279859712
fwiw I'm fine dropping that e2e entirely

/approve

@openshift-ci (bot, Contributor) commented May 21, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, maxcao13


Reverts openshift@a58ab5e.
We need to do this because we switched Karpenter's default log level to INFO, which drops the version log line.
We will move this verification to openshift/autoscale-tests.

Signed-off-by: Max Cao <macao@redhat.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 21, 2026
@maxcao13 (Member Author)

/test e2e-aws-autonode

@openshift-ci openshift-ci Bot added the area/testing Indicates the PR includes changes for e2e testing label May 21, 2026
@joshbranham (Contributor)

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 21, 2026
@openshift-merge-bot (Contributor)

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2057488157519122432 | Cost: $3.0523737500000006 | Failed step: hypershift-aws-run-e2e-nested

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@maxcao13 (Member Author)

/retest-required

@hypershift-jira-solve-ci

That background download also completed successfully (the e2e nested step build log for the e2e-aws-4-22 job). I had already gathered all the necessary evidence from the build logs and JUnit XML files to complete the analysis. The full report above covers both job failures comprehensively.


@maxcao13 (Member Author)

/test e2e-aws

@maxcao13 (Member Author)

/verified by e2e-aws

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 22, 2026
@openshift-ci-robot

@maxcao13: This PR has been marked as verified by e2e-aws.


@openshift-ci (bot, Contributor) commented May 22, 2026

@maxcao13: all tests passed!


@openshift-merge-bot openshift-merge-bot Bot merged commit d24af10 into openshift:main May 22, 2026
44 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


5 participants