
AUTOSCALE-681: remove TechPreviewNoUpgrade gate from karpenter upgrade test#8498

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
maxcao13:remove-techpreview-gate-karpenter-upgrade
May 13, 2026

@maxcao13
Member

@maxcao13 maxcao13 commented May 12, 2026

What this PR does / why we need it:

AutoNode/Karpenter is no longer behind TechPreviewNoUpgrade in 5.0, so the
TECH_PREVIEW_NO_UPGRADE=true env check in TestKarpenterUpgradeControlPlane
is stale. The test is unconditionally skipped in any CI job that doesn't set
this env var, meaning the upgrade test never runs in the standard periodic jobs.

This removes the gate so the test can run in non-techpreview CI configurations.

Also bumps the version requirement to 4.22 to match TestKarpenter.
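The two gate changes can be modeled in a minimal Go sketch. It is a simplified model, not the e2eutil API: `minorVersion`, `atLeast`, `skipReasonBefore` and `skipReasonAfter` are hypothetical names, and the test's AWS platform check is left out. The model shows why the old gate skipped the test in any job that did not export TECH_PREVIEW_NO_UPGRADE=true, even against a 5.0 cluster.

```go
package main

import (
	"fmt"
	"os"
)

// minorVersion is a hypothetical stand-in for an OCP "major.minor" release.
type minorVersion struct{ Major, Minor int }

// atLeast reports whether v is the same as or newer than floor.
func (v minorVersion) atLeast(floor minorVersion) bool {
	if v.Major != floor.Major {
		return v.Major > floor.Major
	}
	return v.Minor >= floor.Minor
}

// skipReasonBefore models the old gate: a 4.19 floor plus a mandatory
// TECH_PREVIEW_NO_UPGRADE=true environment variable. "" means "run".
func skipReasonBefore(getenv func(string) string, current minorVersion) string {
	if !current.atLeast(minorVersion{4, 19}) {
		return "requires 4.19 or newer"
	}
	if getenv("TECH_PREVIEW_NO_UPGRADE") != "true" {
		return "TECH_PREVIEW_NO_UPGRADE is not true"
	}
	return ""
}

// skipReasonAfter models the new gate: no env check, and a 4.22 floor.
func skipReasonAfter(current minorVersion) string {
	if !current.atLeast(minorVersion{4, 22}) {
		return "requires 4.22 or newer"
	}
	return ""
}

func main() {
	current := minorVersion{5, 0}
	fmt.Printf("before: %q\n", skipReasonBefore(os.Getenv, current))
	fmt.Printf("after:  %q\n", skipReasonAfter(current))
}
```

In a CI job without the variable, the "before" line reports a skip reason and the "after" line is empty, so the test now runs.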

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Assisted-by: Cursor Agent

Made with Cursor

Summary by CodeRabbit

  • Tests
    • Raised the minimum version requirement for the Karpenter control plane upgrade test from 4.19 to 4.22.
    • Removed a conditional skip, simplifying test execution flow.
    • Tests now explicitly target the AWS platform for upgrade validation.
    • Subtests use isolated copies of cluster fixtures to reduce shared-state flakiness.
    • Certain checks now re-fetch cluster state before asserting to improve stability.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 12, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 12, 2026
@openshift-ci-robot

@maxcao13: This pull request explicitly references no jira issue.

Details

In response to this:

What this PR does / why we need it:

AutoNode/Karpenter is no longer behind TechPreviewNoUpgrade in 5.0, so the
TECH_PREVIEW_NO_UPGRADE=true env check in TestKarpenterUpgradeControlPlane
is stale. The test is unconditionally skipped in any CI job that doesn't set
this env var, meaning the upgrade test never runs in the standard periodic jobs.

This removes the gate so the test can run in non-techpreview CI configurations.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

One-line removal. The os import is still used by os.ReadFile later in the
test, so no import cleanup needed.

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Assisted-by: Cursor Agent

Made with Cursor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Contributor

openshift-ci Bot commented May 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

Walkthrough

The upgrade test TestKarpenterUpgradeControlPlane now requires e2eutil.Version422 (was Version419) and the prior TECH_PREVIEW_NO_UPGRADE skip was removed; the test then checks globalOpts.Platform == hyperv1.AWSPlatform. In test/e2e/karpenter_test.go, multiple AWS Karpenter subtests were changed to operate on a local deep-copy hc := hostedCluster.DeepCopy() (and in some cases re-fetched via mgtClient.Get()) for reading/updating fields, building HCP namespaces, deriving selector tags (e.g., InfraID), and filtering node readiness/disappearance waits by hc.Spec.Platform.Type.
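A small Go sketch shows why the per-subtest deep copy matters. It assumes a trimmed-down hypothetical `hostedCluster` type with a map field. Its `DeepCopy` method imitates the generated deep-copy methods on Kubernetes API types rather than reproducing hyperv1's. A plain struct assignment still shares the map, so a subtest writing through it mutates the shared fixture. `DeepCopy()` gives each subtest its own state.

```go
package main

import "fmt"

// hostedCluster is a hypothetical stand-in for hyperv1.HostedCluster with a
// map field, the kind of field that aliases under a shallow copy.
type hostedCluster struct {
	Name    string
	InfraID string
	Labels  map[string]string
}

// DeepCopy imitates generated Kubernetes deep-copy methods: it allocates a
// new map so the copy shares no mutable state with the receiver.
func (in *hostedCluster) DeepCopy() *hostedCluster {
	if in == nil {
		return nil
	}
	out := *in
	if in.Labels != nil {
		out.Labels = make(map[string]string, len(in.Labels))
		for k, v := range in.Labels {
			out.Labels[k] = v
		}
	}
	return &out
}

func main() {
	shared := &hostedCluster{Name: "karpenter", InfraID: "abc123", Labels: map[string]string{"env": "e2e"}}

	shallow := *shared // copies the struct, but Labels still points at the same map
	shallow.Labels["subtest"] = "a"

	hc := shared.DeepCopy() // independent copy, as each subtest now takes
	hc.Labels["subtest"] = "b"

	fmt.Println(shared.Labels["subtest"], hc.Labels["subtest"]) // a b
}
```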

Suggested reviewers

  • jparrill
  • enxebre
  • devguyio
🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Structure And Quality: ⚠️ Warning. Assertion messages are inconsistent: the assertions on lines 55 and 58 in karpenter_control_plane_upgrade_test.go lack meaningful failure messages, violating assertion quality requirement #4. Resolution: add the descriptive messages "failed to read pull secret file" and "failed to lookup release image", respectively, to match the pattern used elsewhere in the test.
✅ Passed checks (10 passed)
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names: ✅ Passed. All test names use static string literals with no dynamic content. No pod names, timestamps, UUIDs, IPs, or namespace names are embedded in test titles, so test names remain consistent across runs.
  • Microshift Test Compatibility: ✅ Passed. Custom check not applicable: the PR modifies existing tests only and does not add new Ginkgo e2e tests. The check applies only when new tests are added.
  • Single Node Openshift (Sno) Test Compatibility: ✅ Passed. The PR modifies existing tests without adding new Ginkgo test declarations. The check applies only to new test additions (It, Describe, etc.), not to modifications of existing tests.
  • Topology-Aware Scheduling Compatibility: ✅ Passed. Changes only modify E2E test files (test/e2e/), not deployment manifests, operator code, or controllers. The custom check's scope requires production code changes.
  • Ote Binary Stdout Contract: ✅ Passed. No OTE stdout contract violations found: no fmt.Print, klog, or stdout writes at process level. All I/O is in test functions.
  • Ipv6 And Disconnected Network Test Compatibility: ✅ Passed. The custom check applies only to NEW Ginkgo e2e tests. This PR modifies existing tests that use standard Go testing.T with t.Run() subtests, not Ginkgo BDD tests, so the check is not applicable.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly identifies the main change (removing the TechPreviewNoUpgrade gate from the karpenter upgrade test), which the changeset supports by removing the conditional skip logic.


@openshift-ci openshift-ci Bot added area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels May 12, 2026
AutoNode/Karpenter is no longer behind TechPreviewNoUpgrade in 5.0,
so the TECH_PREVIEW_NO_UPGRADE env check is stale and causes the
test to be skipped unconditionally in non-techpreview CI jobs.

Also bump the minimum version gate from 4.19 to 4.22 since this is
a 5.0 feature.

Co-authored-by: Cursor <cursoragent@cursor.com>
@maxcao13 maxcao13 force-pushed the remove-techpreview-gate-karpenter-upgrade branch from 944ed0d to 756fbc5 May 12, 2026 22:10
@codecov

codecov Bot commented May 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.00%. Comparing base (4341d0c) to head (d3cb204).
⚠️ Report is 7 commits behind head on main.

⚠️ Current head d3cb204 differs from pull request most recent head 756fbc5

Please upload reports for the commit 756fbc5 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8498      +/-   ##
==========================================
+ Coverage   39.95%   40.00%   +0.05%     
==========================================
  Files         751      751              
  Lines       92733    92838     +105     
==========================================
+ Hits        37048    37137      +89     
- Misses      52998    53014      +16     
  Partials     2687     2687              

see 21 files with indirect coverage changes

Flag                     Coverage Δ
cmd-support              34.09% <ø> (+0.01%) ⬆️
cpo-hostedcontrolplane   40.56% <ø> (+0.04%) ⬆️
cpo-other                40.14% <ø> (+0.05%) ⬆️
hypershift-operator      50.53% <ø> (+0.09%) ⬆️
other                    31.54% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@maxcao13 maxcao13 force-pushed the remove-techpreview-gate-karpenter-upgrade branch 2 times, most recently from d3cb204 to 756fbc5 May 12, 2026 23:35
@maxcao13
Member Author

/test e2e-aws

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2054344967605719040 | Cost: $3.95 | Failed step: hypershift-aws-run-e2e-nested


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@cwbotbot

cwbotbot commented May 13, 2026

Test Results

e2e-aws

e2e-aks

@maxcao13
Member Author

/test e2e-aws e2e-aws-4-22

@hypershift-jira-solve-ci

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-aws
  • Build ID: 2054344967605719040
  • Target: e2e-aws
  • PR: #8498 — "NO-JIRA: fix(e2e): remove TechPreviewNoUpgrade gate from karpenter upgrade test"
  • Test Results: 556 total, 527 passed, 4 failed (2 tests × 2 subtests), 25 skipped

Test Failure Analysis

Error

AWSEndpointAvailable=False: AWSError(cannot list security groups: operation error EC2:
DescribeSecurityGroups, get identity: get credentials: failed to refresh cached credentials,
failed to retrieve jwt from provide source, unable to read file at
/var/run/secrets/openshift/serviceaccount/token: open
/var/run/secrets/openshift/serviceaccount/token: no such file or directory)

Summary

Both TestUpgradeControlPlane and TestKarpenterUpgradeControlPlane failed because their HostedClusters never became available within the 10-minute timeout. The root cause is a missing service account token file (/var/run/secrets/openshift/serviceaccount/token) in the hosted control plane pods, which prevented the AWS endpoint controller from authenticating to EC2. This caused a cascading failure: no AWS VPC endpoint → etcd peer DNS resolution failure → no quorum → kube-apiserver never created → capi-provider stuck in init → HostedCluster never available. This is a pre-existing infrastructure/platform issue unrelated to the PR, which only removes a test-gate (TECH_PREVIEW_NO_UPGRADE env check) and updates a minimum version constant. All 13 other tests (TestCreateCluster, TestKarpenter, TestNodePool, TestAutoscaling, etc.) passed successfully.

Root Cause

Primary cause: Missing service account token in hosted control plane namespace prevents AWS authentication.

The control-plane-operator pods in the HCP namespaces for both upgrade test clusters could not read the projected service account token at /var/run/secrets/openshift/serviceaccount/token. Without this token, the AWS SDK credential chain fails, and the AWS endpoint controller cannot call EC2:DescribeSecurityGroups to set up VPC networking.

Cascading failure chain:

  1. Missing SA token → AWSEndpointAvailable=False (cannot authenticate to AWS)
  2. No AWS endpoint → VPC networking not established for the hosted control plane
  3. No network → etcd peers cannot resolve each other via DNS (no such host for etcd-2.etcd-discovery...)
  4. No etcd quorum → EtcdAvailable=False: EtcdWaitingForQuorum
  5. No etcd → kube-apiserver deployment never created (KubeAPIServerAvailable=False: NotFound)
  6. No kube-apiserver → capi-provider init container (availability-prober) stuck in Pending
  7. No control plane → Available=False: KubeconfigWaitingForCreate → test times out at 600s
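The cascading chain can be modeled as a dependency graph. In that model, a root cause is an unhealthy condition whose own dependencies are all healthy. The following is a minimal Go sketch with hypothetical `condition` and `rootCauses` names. "ServiceAccountToken" is a made-up node standing in for the missing token file, not a real HostedCluster condition, and the capi-provider step is omitted for brevity.

```go
package main

import "fmt"

// condition is a hypothetical node in the failure chain: a name, whether it
// is healthy, and the conditions it directly depends on.
type condition struct {
	name      string
	healthy   bool
	dependsOn []string
}

// rootCauses returns unhealthy conditions not explained by an unhealthy
// dependency. Unknown dependencies are treated as unhealthy.
func rootCauses(conds []condition) []string {
	healthy := make(map[string]bool, len(conds))
	for _, c := range conds {
		healthy[c.name] = c.healthy
	}
	var roots []string
	for _, c := range conds {
		if c.healthy {
			continue
		}
		explained := false
		for _, dep := range c.dependsOn {
			if !healthy[dep] {
				explained = true
				break
			}
		}
		if !explained {
			roots = append(roots, c.name)
		}
	}
	return roots
}

func main() {
	chain := []condition{
		{"ServiceAccountToken", false, nil},
		{"AWSEndpointAvailable", false, []string{"ServiceAccountToken"}},
		{"EtcdAvailable", false, []string{"AWSEndpointAvailable"}},
		{"KubeAPIServerAvailable", false, []string{"EtcdAvailable"}},
		{"Available", false, []string{"KubeAPIServerAvailable"}},
		{"InfrastructureReady", true, nil},
	}
	fmt.Println(rootCauses(chain)) // [ServiceAccountToken]
}
```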

This PR did NOT cause the failure. The diff modifies only test/e2e/karpenter_control_plane_upgrade_test.go (+1/-4 lines): it removes the TECH_PREVIEW_NO_UPGRADE env gate and updates the minimum version from Version419 to Version422. No operator, infrastructure, or cluster provisioning code is changed. TestUpgradeControlPlane (the other failing test) has no gate and was already running — its failure is entirely independent of this PR. TestKarpenterUpgradeControlPlane was previously skipped; this PR un-skips it, exposing it to the same pre-existing bootstrapping issue.

The fact that 13 other tests passed (including TestKarpenter which also creates HyperShift clusters on AWS) indicates this is likely a transient infrastructure condition specific to timing or resource pressure during the upgrade test cluster bootstrapping.

Recommendations
  1. Re-run the CI job — This is likely a transient infrastructure issue. The 13 passing tests confirm the management cluster and AWS integration are fundamentally healthy.
  2. This PR is safe to merge — The code change (removing a test gate) has no bearing on the failure. TestUpgradeControlPlane fails identically and was not modified by this PR.
  3. Investigate SA token projection for upgrade test clusters — The missing /var/run/secrets/openshift/serviceaccount/token may indicate a race condition or misconfiguration in how upgrade test HostedClusters configure projected service account volumes versus non-upgrade clusters (which all passed).
  4. Check if TestUpgradeControlPlane is a known flake — Since it fails independently of this PR with the same infrastructure error, it may warrant a separate bug or flake investigation.

Evidence
  • Failed tests: TestUpgradeControlPlane/ValidateHostedCluster (600.00s timeout), TestKarpenterUpgradeControlPlane/ValidateHostedCluster (600.00s timeout)
  • Root error: open /var/run/secrets/openshift/serviceaccount/token: no such file or directory
  • AWS condition: AWSEndpointAvailable=False: AWSError(cannot list security groups: ...failed to retrieve jwt from provide source)
  • Etcd condition: EtcdAvailable=False: EtcdWaitingForQuorum(Waiting for etcd to reach quorum)
  • KubeAPI condition: KubeAPIServerAvailable=False: NotFound(Kube APIServer deployment not found)
  • Degraded condition: Degraded=True: UnavailableReplicas(capi-provider deployment has 1 unavailable replicas)
  • Affected HostedCluster 1: e2e-clusters-k2tjp/control-plane-upgrade-dkw4s (TestUpgradeControlPlane)
  • Affected HostedCluster 2: e2e-clusters-c5fgm/karpenter-upgrade-control-plane-dwmqq (TestKarpenterUpgradeControlPlane)
  • PR change scope: 1 file (test/e2e/karpenter_control_plane_upgrade_test.go), +1/-4 lines; removes the env gate only
  • Passing tests: 13 tests passed, including TestCreateCluster, TestKarpenter, TestNodePool, TestAutoscaling, TestHAEtcdChaos, TestCreateClusterPrivate, TestCreateClusterProxy
  • Infrastructure status: InfrastructureReady=True: AsExpected(All is well); the AWS infrastructure itself was provisioned correctly
  • Test phase: step e2e-aws-hypershift-aws-run-e2e-nested failed after 1h23m41s

@maxcao13 maxcao13 marked this pull request as ready for review May 13, 2026 15:41
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2026
@openshift-ci openshift-ci Bot requested review from clebs and enxebre May 13, 2026 15:41
@enxebre
Member

enxebre commented May 13, 2026

/approve

@enxebre
Member

enxebre commented May 13, 2026

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 13, 2026
@openshift-merge-bot
Contributor

Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage.

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, maxcao13

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
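Here is a simplified Go sketch of the rule this bot enforces. It assumes Prow-style OWNERS semantics, in which an approver listed in a file's nearest OWNERS file, or in any ancestor OWNERS file, can approve that file. The directory-to-approvers map and the user names are hypothetical. Options such as no_parent_owners are ignored.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// unapprovedOwners returns, for each changed file lacking approval, the
// directory of its nearest OWNERS file. owners maps a directory ("." is the
// repo root) to the approvers listed in that directory's OWNERS file.
func unapprovedOwners(files []string, owners map[string][]string, approved map[string]bool) []string {
	need := map[string]bool{}
	for _, f := range files {
		nearest := ""
		satisfied := false
		// Walk from the file's directory up to the root, checking every OWNERS level.
		for dir := path.Dir(f); ; dir = path.Dir(dir) {
			if approvers, ok := owners[dir]; ok {
				if nearest == "" {
					nearest = dir
				}
				for _, a := range approvers {
					if approved[a] {
						satisfied = true
					}
				}
			}
			if satisfied || dir == "." || dir == "/" {
				break
			}
		}
		if !satisfied && nearest != "" {
			need[nearest] = true
		}
	}
	out := []string{}
	for d := range need {
		out = append(out, d)
	}
	sort.Strings(out)
	return out
}

func main() {
	owners := map[string][]string{
		".":        {"alice"}, // hypothetical root approver
		"test/e2e": {"bob"},   // hypothetical e2e approver
	}
	files := []string{"test/e2e/karpenter_control_plane_upgrade_test.go"}
	fmt.Println(unapprovedOwners(files, owners, map[string]bool{}))              // [test/e2e]
	fmt.Println(unapprovedOwners(files, owners, map[string]bool{"alice": true})) // []
}
```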

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 13, 2026
@maxcao13
Member Author

/test
/verified by @maxcao13

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 13, 2026
@openshift-ci-robot

@maxcao13: This PR has been marked as verified by @maxcao13.


@maxcao13
Member Author

/pipeline required

@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws
/test e2e-v2-gke

@maxcao13
Member Author

/verified by e2e-aws

@openshift-ci-robot

@maxcao13: This PR has been marked as verified by e2e-aws.

Details

In response to this:

/verified by e2e-aws

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@maxcao13 maxcao13 changed the title NO-JIRA: fix(e2e): remove TechPreviewNoUpgrade gate from karpenter upgrade test AUTOSCALE-681: remove TechPreviewNoUpgrade gate from karpenter upgrade test May 13, 2026
@openshift-ci-robot

openshift-ci-robot commented May 13, 2026

@maxcao13: This pull request references AUTOSCALE-681 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.


@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@maxcao13: all tests passed!


Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot Bot merged commit 60c460f into openshift:main May 13, 2026
59 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


4 participants