
CNTRLPLANE-3308: deps: bump k8s.io 0.34 → 0.35 and openshift/api #8286

Open
muraee wants to merge 7 commits into openshift:main from muraee:bump-openshift-api

@muraee
Contributor

@muraee muraee commented Apr 20, 2026

Summary

  • Bump k8s.io/* from v0.34.3 to v0.35.1
  • Bump github.com/openshift/api to 3c6b218b (openshift/api#2786) to pick up the ObservedRevisionGeneration field on ClusterAPIStatus
  • Bump github.com/openshift/client-go to a19e917 (compatible with new API)
  • Bump karpenter forks to versions built against k8s 0.35

Code fixes for API changes

  • MustBaseEnvSet: removed bool param in k8s 0.35 (support/validations/authentication.go, control-plane-operator/.../auth.go)
  • ClusterImagePolicy moved from config/v1alpha1 to config/v1 (hypershift-operator/controllers/nodepool/config.go)
  • NodeSelectorRequirementWithMinValues no longer embeds corev1.NodeSelectorRequirement (test/e2e/karpenter_test.go)
  • Removed etcd/tests/v3 dependency to eliminate olekukonko/tablewriter v0.x/v1.x conflict (etcdctl uses v0.x API, karpenter requires v1.x)
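Why the etcd/tests/v3 dependency had to go: Go modules give v0 and v1 of a module the same import path (only v2 and later need a /vN suffix). Minimal version selection therefore picks a single tablewriter version for the whole build, the highest one any dependency requires. The etcdctl code that uses the v0.x API is then compiled against whatever v1.x version karpenter requires. The sketch below models only that selection step. The requirer labels and version strings are illustrative assumptions, not values taken from this PR, and the comparison ignores pre-release tags.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse turns "vMAJOR.MINOR.PATCH" into three integers.
// Simplification: pre-release/build suffixes are not supported.
func parse(v string) [3]int {
	var out [3]int
	for i, p := range strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3) {
		n, err := strconv.Atoi(p)
		if err != nil {
			panic(fmt.Sprintf("unsupported version %q", v))
		}
		out[i] = n
	}
	return out
}

// less reports whether version a sorts before version b (numeric, per field).
func less(a, b string) bool {
	pa, pb := parse(a), parse(b)
	for i := range pa {
		if pa[i] != pb[i] {
			return pa[i] < pb[i]
		}
	}
	return false
}

// selectVersion mirrors the MVS rule for a single module path: the build
// uses the highest version required by any module in the graph.
func selectVersion(required map[string]string) string {
	selected := ""
	for _, v := range required {
		if selected == "" || less(selected, v) {
			selected = v
		}
	}
	return selected
}

func main() {
	// Hypothetical requirements on github.com/olekukonko/tablewriter.
	reqs := map[string]string{
		"etcd/tests/v3 (etcdctl)": "v0.0.5",
		"karpenter fork":          "v1.0.0",
	}
	fmt.Println(selectVersion(reqs)) // v1.0.0: one version serves both importers
}
```

Because v0 and v1 cannot coexist under one path, the conflict can only be resolved by dropping one of the importers. That is what removing etcd/tests/v3 does.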

Why

The k8s 0.35 bump is required by the latest openshift/api which adds the ObservedRevisionGeneration field. This field is needed by PR #7996 to properly wait for the Cluster CAPI Operator to acknowledge unmanaged CRDs during hypershift install.

Test plan

  • make build passes
  • make test passes (all unit tests)
  • go vet passes
  • make update succeeds

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Updated Go toolchain to a newer patch release (build images and tooling aligned).
    • Refreshed core and indirect dependencies across the Kubernetes and related ecosystems.
    • Adjusted code verification tooling configuration to refine files excluded from checks.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Contributor

coderabbitai Bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

The Makefile verify-codespell target’s codespell --skip list was adjusted to add ./api/go.sum while retaining existing skips such as ./go.sum, ./hack/workspace/go.work.sum, and other prior patterns. api/go.mod updates the go directive from go 1.25.3 to go 1.25.7, upgrades multiple direct and indirect dependencies (notably Kubernetes/OpenShift-related modules, go-openapi-related modules, and various indirects), and updates replace directives to newer OpenShift pseudo-versions. Dockerfile.github-actions-runner updates the GO_VERSION build ARG from 1.25.3 to 1.25.7.


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

  • Ote Binary Stdout Contract: ❌ Error
    Explanation: PR introduces fmt.Println() and fmt.Printf() calls in TestE2EV2() function that write directly to stdout before RunSpecs() is called, violating OTE Binary Stdout Contract.
    Resolution: Replace fmt.Print*/Printf calls with fmt.Fprintf(os.Stderr, ...) or GinkgoWriter to redirect output from stdout to stderr.
✅ Passed checks (9 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main changes: bumping Kubernetes and OpenShift API dependencies.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Stable And Deterministic Test Names: ✅ Passed. PR modifies Makefile, Dockerfile, and api/go.mod without introducing Ginkgo tests with dynamic test names.
  • Test Structure And Quality: ✅ Passed. The PR does not introduce Ginkgo-style tests using Describe/Context/It blocks; it uses standard Go testing with t.Run() subtests and Gomega matchers. Since no Ginkgo test code is present, this check is not applicable.
  • Microshift Test Compatibility: ✅ Passed. This PR is exclusively a dependency update bump with associated code changes to accommodate API compatibility changes. No new Ginkgo e2e tests are being added to the codebase.
  • Single Node Openshift (Sno) Test Compatibility: ✅ Passed. PR contains only dependency updates and modifications to existing tests, not new Ginkgo e2e test additions.
  • Topology-Aware Scheduling Compatibility: ✅ Passed. PR contains only dependency upgrades and build configuration changes; no pod scheduling constraints incompatible with SNO/Two-Node/HyperShift topologies detected.
  • Ipv6 And Disconnected Network Test Compatibility: ✅ Passed. No new Ginkgo e2e tests are added in this PR; only existing test code is modified to handle API changes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from devguyio and jparrill April 20, 2026 13:12
@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/testing Indicates the PR includes changes for e2e testing and removed do-not-merge/needs-area labels Apr 20, 2026
@muraee muraee changed the title deps: bump openshift/api for ClusterAPI ObservedRevisionGeneration deps: bump k8s.io 0.34 → 0.35 and openshift/api for ClusterAPI ObservedRevisionGeneration Apr 20, 2026
@muraee muraee changed the title deps: bump k8s.io 0.34 → 0.35 and openshift/api for ClusterAPI ObservedRevisionGeneration deps: bump k8s.io 0.34 → 0.35 and openshift/api Apr 20, 2026
@muraee muraee force-pushed the bump-openshift-api branch 3 times, most recently from 7c58b5f to ce5d649 on April 20, 2026 14:14
@muraee muraee changed the title deps: bump k8s.io 0.34 → 0.35 and openshift/api CNTRLPLANE-3308: deps: bump k8s.io 0.34 → 0.35 and openshift/api Apr 20, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 20, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 20, 2026

@muraee: This pull request references CNTRLPLANE-3308 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@muraee muraee force-pushed the bump-openshift-api branch from ce5d649 to e9aafae on April 20, 2026 14:26
@muraee
Contributor Author

muraee commented Apr 20, 2026

/test "ci/prow/security"

@muraee
Contributor Author

muraee commented Apr 20, 2026

/retest

@muraee muraee force-pushed the bump-openshift-api branch 3 times, most recently from 06ac34a to 6fa1d7d on April 20, 2026 15:57
@muraee muraee force-pushed the bump-openshift-api branch from b6ec5d4 to d30af86 on April 27, 2026 17:38
@muraee
Contributor Author

muraee commented Apr 27, 2026

/test e2e-aws
/test e2e-aks

@openshift-ci openshift-ci Bot added the area/control-plane-pki-operator Indicates the PR includes changes for the control plane PKI operator - in an OCP release label Apr 27, 2026
@muraee muraee force-pushed the bump-openshift-api branch from d30af86 to d1642ae on April 28, 2026 07:54
@muraee
Contributor Author

muraee commented Apr 28, 2026

/test e2e-aws
/test e2e-aks

1 similar comment
@muraee
Contributor Author

muraee commented Apr 28, 2026

/test e2e-aws
/test e2e-aks

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-aws | Build: 2049034916259172352 | Cost: $2.15918875 | Failed step: hypershift-azure-run-e2e

View full analysis report


Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@muraee muraee force-pushed the bump-openshift-api branch from c4e6318 to bfb357f on April 28, 2026 12:10
@muraee
Contributor Author

muraee commented Apr 28, 2026

/retest

@muraee
Contributor Author

muraee commented Apr 28, 2026

/test e2e-aws
/test e2e-aks

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2026
muraee and others added 6 commits April 29, 2026 11:16
Bump github.com/openshift/api to 3c6b218b (openshift/api#2786) which
adds the ObservedRevisionGeneration field to ClusterAPIStatus.

The openshift/api bump requires k8s.io/* v0.35.1, which cascades into:
- Bump openshift/client-go to a19e917 (compatible with new API)
- Bump karpenter forks to versions built against k8s 0.35

Update vendor and generated files after k8s.io 0.34 → 0.35 and openshift/api bump
- Fix MustBaseEnvSet call signature change (removed bool param)
- Fix ClusterImagePolicy moved from config/v1alpha1 to config/v1
- Fix NodeSelectorRequirementWithMinValues struct change in karpenter
- Remove etcd/tests/v3 dependency to eliminate tablewriter v0.x/v1.x
  conflict (etcdctl uses v0.x API, karpenter requires v1.x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Bumps cli-runtime, kube-aggregator, kube-scheduler, kubectl, and
pod-security-admission from v0.34.2 to v0.35.1 to align with the
core k8s.io modules already bumped in this branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

client-go 0.35 enables WatchListClient by default, causing informers
to use sendInitialEvents=true in watch requests. The hosted cluster's
API server may not support this feature, causing the reflector to
retry indefinitely without falling back to LIST.

Disable the feature by default for all components:
- HyperShift operator: guard in main(), covers all subcommands
- CPO binary: guard in main(), covers all subcommands (ignition-server,
  etcd-defrag, konnectivity, token-minter, kas-bootstrap, etc.)
- HCCO: env var set in deployment manifest (always false)
- karpenter-operator: env var set in deployment manifest
- control-plane-pki-operator: guard in main()

The HO propagates its KUBE_FEATURE_WatchListClient env var to the CPO
deployment dynamically, so the value can be overridden at the HO level.
Components with the env var set in their deployment manifest (HCCO,
karpenter-operator) are not affected by the code guard, as it only
sets the value when the env var is not already present.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The ARC runners share a persistent Go build cache. When go.mod bumps
the Go version (e.g. 1.25.3 → 1.25.7), stale cached objects compiled
with the old version cause "does not match go tool version" errors.

Include hashFiles('go.mod') in the cache key so the cache is
invalidated when the Go version or dependencies change. Also add
actions/setup-go to the test job to ensure the correct Go version
from go.mod is used instead of the runner's pre-installed version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
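The cache-key change works because any edit to go.mod produces a new key, which forces a cache miss. The Go sketch below shows that idea. It is only an analogy: hashFiles() is a GitHub Actions expression function that computes its own SHA-256-based digest, so the key below will not equal the workflow's real key. buildCacheKey and the go-build prefix are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"runtime"
)

// buildCacheKey derives a build-cache key from go.mod's bytes. Any edit to
// go.mod, including its go directive (e.g. 1.25.3 -> 1.25.7), changes the key,
// so objects compiled by an older Go version are not restored.
func buildCacheKey(prefix string, goMod []byte) string {
	sum := sha256.Sum256(goMod)
	return fmt.Sprintf("%s-%s-%s-%s", prefix, runtime.GOOS, runtime.GOARCH, hex.EncodeToString(sum[:])[:16])
}

func main() {
	before := []byte("module example.com/demo\n\ngo 1.25.3\n")
	after := []byte("module example.com/demo\n\ngo 1.25.7\n")
	fmt.Println(buildCacheKey("go-build", before))
	fmt.Println(buildCacheKey("go-build", after)) // different key, so a cache miss
}
```

Truncating the digest to 16 hex characters is a cosmetic choice in this sketch.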
@muraee muraee force-pushed the bump-openshift-api branch from bfb357f to ad4f544 on April 29, 2026 10:06
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the bump-openshift-api branch from 1b444bd to 08add78 on April 29, 2026 12:56
@bryan-cox
Member

/lgtm

@bryan-cox
Member

/verified by e2e & ut

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 29, 2026
@openshift-ci-robot

@bryan-cox: This PR has been marked as verified by e2e & ut.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@openshift-ci
Contributor

openshift-ci Bot commented Apr 29, 2026

@muraee: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • ci/prow/e2e-aks-4-22 (commit 375a8c9, required): /test e2e-aks-4-22
  • ci/prow/e2e-azure-self-managed (commit 375a8c9, required): /test e2e-azure-self-managed
  • ci/prow/e2e-kubevirt-aws-ovn-reduced (commit 375a8c9, required): /test e2e-kubevirt-aws-ovn-reduced
  • ci/prow/e2e-aws-upgrade-hypershift-operator (commit 375a8c9, required): /test e2e-aws-upgrade-hypershift-operator
  • ci/prow/e2e-aws-4-22 (commit 375a8c9, required): /test e2e-aws-4-22
  • ci/prow/e2e-v2-aws (commit 375a8c9, required): /test e2e-v2-aws
  • ci/prow/okd-scos-images (commit 08add78, required): /test okd-scos-images
  • ci/prow/images (commit 08add78, required): /test images

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented Apr 29, 2026


Test Failure Analysis Complete

Job Information

  • Prow Job: Envtest OCP API Validation (GitHub Actions)
  • Build ID: 25110153446
  • PR: #8286 (CNTRLPLANE-3308: deps: bump k8s.io 0.34 → 0.35 and openshift/api)
  • Failed Jobs:
    • Envtest OCP (K8s 1.33.2) — job 73581779899
    • Envtest OCP (K8s 1.35.1) — job 73581779890
    • Conclusion — job 73583306249 (aggregate gate)

Test Failure Analysis

Error

[FAIL] CRD Installation [It] should install all CRDs for feature set "Default"
  test/envtest/generator.go:262

[FAILED] Timed out after 30.001s.
  CRD clusters.cluster.x-k8s.io should be fully removed          (K8s 1.35.1)
  CRD awsclustercontrolleridentities.infrastructure.cluster.x-k8s.io should be fully removed  (K8s 1.33.2)

588 Passed | 1 Failed | 0 Pending | 0 Skipped

Summary

The GenerateCRDInstallTest("Default") test installs all 69 HyperShift/CAPI CRDs, then uninstalls them and polls each CRD with a 30-second timeout to confirm deletion. The CRD deletion request succeeds (no error from envtest.UninstallCRDs), but the Kubernetes API server takes longer than 30 seconds to fully remove certain CAPI CRDs (the specific CRD varies non-deterministically between runs). This is a pre-existing flaky test that also fails on the main branch (2/30 = 6.7% flake rate), but the k8s.io v0.34→v0.35 dependency bump in this PR increases the flake rate to approximately 26% (6/23 runs fail). K8s 1.35.1 is the most affected version (fails in 5 of 6 failing runs), with K8s 1.33.2 and 1.34.1 also occasionally affected.
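The flake-rate figures in this summary, and the ~4x increase cited under Root Cause, follow directly from the reported run counts:

```latex
r_{\text{PR}} = \frac{6}{23} \approx 0.261, \qquad
r_{\text{main}} = \frac{2}{30} \approx 0.067, \qquad
\frac{r_{\text{PR}}}{r_{\text{main}}} = \frac{6 \cdot 30}{23 \cdot 2} = \frac{180}{46} \approx 3.9
```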

Root Cause

The test at test/envtest/generator.go:262 uses a 30-second timeout for CRD deletion verification:

```go
// Poll once per second, for at most 30s, until Get reports the CRD as NotFound.
Eventually(func() bool {
    err := k8sClient.Get(ctx, key, &apiextensionsv1.CustomResourceDefinition{})
    return apierrors.IsNotFound(err)
}, "30s", "1s").Should(BeTrue(), fmt.Sprintf("CRD %s should be fully removed", crd.Name))
```

After envtest.UninstallCRDs() sends deletion requests for all 69 CRDs, the envtest API server processes these deletions asynchronously. The CRD finalizer (customresourcecleanup.apiextensions.k8s.io) must complete before each CRD is fully removed. Under resource pressure (69 simultaneous CRD deletions on a single-node envtest server), some CRDs take 30-35 seconds to be garbage-collected, exceeding the timeout.

Why the PR worsens the flake rate (~4x increase):

  1. k8s.io/apiextensions-apiserver v0.34→v0.35 type changes: The client-side CRD type definitions in v0.35 include structural changes. The ObservedRevisionGeneration field is a separate change: openshift/api#2786 (OCPCLOUD-3359: Add component names, manifestSubstitutions, and observedGeneration to CAPI revisions) adds it to ClusterAPIStatus in openshift/api, not to the apiextensions CRD types. When k8sClient.Get() deserializes the CRD from the envtest API server, the v0.35 client may take marginally longer to process the response, tightening the already-thin timing margin.

  2. CRD manifest changes: The Default feature set CRDs (hostedclusters, hostedcontrolplanes) grew slightly (about +800 bytes each) from the openshift/api bump, and new TLSAdherence feature gate manifests were added (about 14K lines total). Although TLSAdherence is not in the Default feature set, generating and loading these manifests adds I/O overhead to the test environment.

  3. Version skew: controller-runtime is pinned to v0.19.7 (designed for k8s.io v0.31) via a replace directive, but is now operating against k8s.io v0.35 client types — a 4-version skew (vs 3-version on main). This increases the likelihood of subtle serialization/deserialization timing differences.

Why it's non-deterministic: The CRD that times out varies across runs (clusters.cluster.x-k8s.io, clusterclasses.cluster.x-k8s.io, machinedrainrules.cluster.x-k8s.io, awsclustercontrolleridentities.infrastructure.cluster.x-k8s.io, clusterresourcesets.addons.cluster.x-k8s.io) because CRD finalization order depends on API server scheduling, which varies with GitHub Actions runner load.

Recommendations
  1. Increase CRD deletion timeout from 30s to 60s in test/envtest/generator.go:262. This is the simplest fix and addresses both the PR-introduced regression and the pre-existing main-branch flake. The deletion consistently completes within ~35 seconds, so 60s provides adequate headroom.

  2. Alternative: delete CRDs in smaller batches — instead of uninstalling all 69 CRDs simultaneously, batch them in groups of 10-15 to reduce API server contention during finalization.

  3. File a separate issue to track the underlying envtest CRD deletion flakiness on main (affects K8s 1.34.1 and 1.35.1 envtest versions).

  4. Re-run the workflow — since this is a flaky test (passes 3 out of 4 times on this PR), a retry will likely pass. However, the increased flake rate should be addressed.

Evidence
  • Test file: test/envtest/generator.go:262, 30s timeout on Eventually() CRD deletion check
  • K8s 1.35.1 failing CRD: clusters.cluster.x-k8s.io (this run); varies across runs
  • K8s 1.33.2 failing CRD: awsclustercontrolleridentities.infrastructure.cluster.x-k8s.io
  • Test duration at failure: 35.477s (K8s 1.33.2); 30.001s timeout hit (K8s 1.35.1)
  • PR branch flake rate: 6/23 runs fail (26.1%), mostly K8s 1.35.1, occasionally 1.33.2 and 1.34.1
  • Main branch flake rate: 2/30 runs fail (6.7%), same generator.go:262 timeout, same CRD type (CAPI)
  • Main branch failing CRDs: clusterresourcesets.addons.cluster.x-k8s.io (both main failures)
  • Passing PR runs: 25108305374, 25102825490, 25051981101 (all 6 K8s versions pass)
  • Dependency change: k8s.io/apiextensions-apiserver v0.34.3 → v0.35.1; controller-runtime remains v0.19.7 (replace)
  • CRD count: 69 CRDs installed/uninstalled simultaneously in the "Default" feature set test
  • New CRD manifests: TLSAdherence feature gate, +7091 lines (hostedclusters) + 6935 lines (hostedcontrolplanes)
  • Conclusion job: pure aggregate gate; fails because the envtest-ocp matrix has failures


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/control-plane-pki-operator Indicates the PR includes changes for the control plane PKI operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. verified Signifies that the PR passed pre-merge verification criteria


5 participants