OCPBUGS-83604: fix(kubevirt): filter link-local addresses from EndpointSlice endpoints #8264

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from orenc1:fix_kubevirt_eps_lla on Apr 17, 2026

Conversation

@orenc1
Contributor

@orenc1 orenc1 commented Apr 16, 2026

Summary

The HCCO machine controller fails to create EndpointSlices for KubeVirt guest LoadBalancer shadow services on Kubernetes 1.33+ clusters. This is because link-local addresses (169.254.0.0/16, fe80::/10) reported in Machine.Status.Addresses are now rejected by EndpointSlice validation.

Problem

KubeVirt VMs can report link-local addresses (e.g. 169.254.0.2, fe80::1) as InternalIP entries in Machine.Status.Addresses. The reconcileKubevirtPassthroughService function collected all InternalIP addresses without filtering and placed them into EndpointSlice endpoints.

Kubernetes 1.33 introduced validation that rejects link-local addresses in EndpointSlices, causing errors like:

EndpointSlice "default-ingress-passthrough-service-...-ipv4" is invalid:
  endpoints[0].addresses[1]: Invalid value: "169.254.0.2":
    may not be in the link-local range (169.254.0.0/16, fe80::/10)

Because the reconciler returns early on error, the failure on the ingress passthrough service also prevented processing of any subsequent KCCM shadow services.

Fix

Filter out link-local addresses using netip.Addr.IsLinkLocalUnicast() (which covers both IPv4 169.254.0.0/16 and IPv6 fe80::/10) before populating EndpointSlice endpoints. Filtered addresses are logged for observability.
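A minimal sketch of the filtering step, assuming addresses arrive as plain strings. The helper name filterLinkLocal and its signature are hypothetical and are not the code in machine.go; only netip.ParseAddr and netip.Addr.IsLinkLocalUnicast are real standard-library calls.

```go
package main

import (
	"fmt"
	"net/netip"
)

// filterLinkLocal returns the entries that parse as IP addresses and are
// not link-local unicast (169.254.0.0/16 or fe80::/10).
// Hypothetical helper for illustration; not the controller's actual code.
func filterLinkLocal(addresses []string) []string {
	routable := make([]string, 0, len(addresses))
	for _, raw := range addresses {
		addr, err := netip.ParseAddr(raw)
		if err != nil {
			// Assumption of this sketch: unparsable entries are dropped.
			continue
		}
		if addr.IsLinkLocalUnicast() {
			// A real controller would likely log the skipped address here.
			continue
		}
		routable = append(routable, raw)
	}
	return routable
}

func main() {
	in := []string{"192.168.1.10", "169.254.0.2", "2001:db8::1", "fe80::1"}
	fmt.Println(filterLinkLocal(in)) // prints [192.168.1.10 2001:db8::1]
}
```

A single IsLinkLocalUnicast call handles both address families, so the sketch needs no separate IPv4 and IPv6 range checks.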

Test plan

  • Added unit test: "When machines have link-local addresses it should filter them from EndpointSlice endpoints"
    • Machines carry a mix of routable (192.168.1.x, 2001:db8::x) and link-local (169.254.0.2, 169.254.169.254, fe80::1, fe80::dead:beef) addresses
    • Verifies only routable addresses appear in the resulting EndpointSlices
    • Covers both ingress passthrough and KCCM service EndpointSlices
  • All existing tests continue to pass

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/OCPBUGS-83604

Special notes for your reviewer:

Checklist:

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed KubeVirt passthrough to properly filter out link-local IP addresses when constructing EndpointSlice endpoints, ensuring only routable addresses are exposed.
  • Tests
    • Added test coverage validating correct endpoint filtering for machines with mixed link-local and standard IP addresses.

KubeVirt VMs can report link-local addresses (169.254.0.0/16,
fe80::/10) in Machine.Status.Addresses. Kubernetes 1.33 added
validation that rejects link-local addresses in EndpointSlices,
causing the HCCO machine reconciler to fail when creating
EndpointSlices for guest LoadBalancer shadow services. The early
return on error also prevented processing of subsequent KCCM
shadow services.

Filter out link-local addresses using netip.Addr.IsLinkLocalUnicast()
before populating EndpointSlice endpoints.

Signed-off-by: Oren Cohen <ocohen@redhat.com>
Assisted-by: Claude
@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller automatically detects which contexts are required and uses /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Contributor

coderabbitai bot commented Apr 16, 2026

📝 Walkthrough

Walkthrough

The changes modify the KubeVirt passthrough EndpointSlice reconciliation process in the machine controller to filter out link-local unicast IP addresses. When processing MachineInternalIP addresses during endpoint construction, the code now detects and skips link-local unicast addresses via IsLinkLocalUnicast() checks before adding them to address lists. A new test fixture and test case have been added to verify that machines containing both standard and link-local addresses produce endpoint slices containing only the non-link-local IPv4 and IPv6 addresses.

🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (9 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Stable And Deterministic Test Names | ✅ Passed | Test file uses Go's standard testing framework rather than Ginkgo test declarations, so check is not applicable. |
| Test Structure And Quality | ✅ Passed | The test code demonstrates good Ginkgo testing practices with comprehensive test data, proper table-driven pattern integration, multiple assertions verifying correct EndpointSlices with only routable addresses, and seamless integration with existing test infrastructure. |
| Microshift Test Compatibility | ✅ Passed | The test changes added in this PR use Go's standard testing package, not Ginkgo e2e tests. No Ginkgo DSL functions are present. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The pull request does not add any new Ginkgo e2e tests; it contains only standard Go unit tests in machine_test.go using the testing package. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR introduces no scheduling constraints or topology-related assumptions. Changes are purely IP address filtering logic in the HCCO machine controller to exclude link-local addresses from EndpointSlice endpoints, addressing a Kubernetes 1.33+ validation requirement. |
| Ote Binary Stdout Contract | ✅ Passed | This pull request modifies a Kubernetes controller reconciler, not process-level code or test suite setup. The logging statement occurs during normal controller operation, outside the scope of the OTE Binary Stdout Contract check. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This pull request does not add Ginkgo e2e tests. The test added is a standard Go unit test using table-driven patterns with fake Kubernetes clients and Gomega assertions, not Ginkgo-style tests. |
| Title check | ✅ Passed | The title clearly and accurately describes the main change: filtering link-local addresses from EndpointSlice endpoints in the KubeVirt context, directly matching the code modifications in machine.go and test additions. |

@openshift-ci openshift-ci bot requested review from Nirshal and sjenning April 16, 2026 15:36
@openshift-ci openshift-ci bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release and removed do-not-merge/needs-area labels Apr 16, 2026
@orenc1
Contributor Author

orenc1 commented Apr 16, 2026

/cc @qinqon

@openshift-ci openshift-ci bot requested a review from qinqon April 16, 2026 15:37
@orenc1 orenc1 changed the title fix(kubevirt): filter link-local addresses from EndpointSlice endpoints OCPBUGS-83604: fix(kubevirt): filter link-local addresses from EndpointSlice endpoints Apr 16, 2026
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 16, 2026
@openshift-ci-robot

@orenc1: This pull request references Jira Issue OCPBUGS-83604, which is invalid:

  • expected the bug to target the "5.0.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@orenc1
Contributor Author

orenc1 commented Apr 16, 2026

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Apr 16, 2026
@openshift-ci-robot

@orenc1: This pull request references Jira Issue OCPBUGS-83604, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go (1)

130-130: Consider debug verbosity for per-address skip logs.

Line 130 logs every filtered address at Info; log.V(1).Info(...) would preserve observability with less default log noise in larger clusters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go`
at line 130, The per-address skip log currently uses log.Info which is too
noisy; change the call to use debug verbosity by replacing log.Info("Skipping
link-local address for EndpointSlice", "address", machineAddress.Address,
"machine", machine.Name) with log.V(1).Info(...) so the message is emitted at
verbosity level 1; update the single invocation located in machine.go where the
code logs the skipped link-local address (references: machineAddress.Address and
machine.Name) so it retains the same keys and values but uses log.V(1).Info for
lower default log noise.
control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go (1)

480-497: Optional: add an all-link-local-only machine scenario.

A case where machines have only link-local internal IPs would lock in expected behavior for empty endpoint lists and guard against future regressions.
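The suggested scenario can be sketched as a small table-driven check. This is a standalone illustration under stated assumptions: splitRoutable and the case table are hypothetical, and the sketch does not use the fixtures or harness in machine_test.go. It shows only that a machine with exclusively link-local addresses yields empty IPv4 and IPv6 lists; it makes no claim about what the controller then does with an empty list.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitRoutable groups non-link-local addresses by IP family.
// Hypothetical helper for illustration; not part of the PR.
func splitRoutable(addresses []string) (ipv4, ipv6 []string) {
	for _, raw := range addresses {
		addr, err := netip.ParseAddr(raw)
		if err != nil || addr.IsLinkLocalUnicast() {
			continue
		}
		if addr.Is4() {
			ipv4 = append(ipv4, raw)
		} else {
			ipv6 = append(ipv6, raw)
		}
	}
	return ipv4, ipv6
}

func main() {
	cases := []struct {
		name      string
		addresses []string
		wantV4    int
		wantV6    int
	}{
		{"mixed routable and link-local", []string{"192.168.1.10", "169.254.0.2", "2001:db8::1", "fe80::1"}, 1, 1},
		{"only link-local", []string{"169.254.0.2", "169.254.169.254", "fe80::1", "fe80::dead:beef"}, 0, 0},
	}
	for _, tc := range cases {
		v4, v6 := splitRoutable(tc.addresses)
		ok := len(v4) == tc.wantV4 && len(v6) == tc.wantV6
		fmt.Printf("%s: ipv4=%v ipv6=%v ok=%t\n", tc.name, v4, v6, ok)
	}
}
```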

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go`
around lines 480 - 497, Add a new test case in machine_test.go that covers
machines with only link-local internal IPs: create a machinesAllLinkLocal (or
similar) fixture with machines whose only InternalIPs are link-local, use the
existing pairOfVirtualMachines and services (defaultIngressService,
kccmService), and add a case named like "When machines have only link-local
addresses it should produce no EndpointSlice endpoints" that sets machines:
machinesAllLinkLocal, virtualMachines: pairOfVirtualMachines, services:
[]corev1.Service{defaultIngressService, kccmService}, expectedServices: the same
services, expectedIngressEndpointSlices: an empty []discoveryv1.EndpointSlice{},
and hcp: kubevirtHCP; ensure the new case follows the same structure as the
existing link-local test so the test harness exercises the empty-endpoints
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ed44e3cb-a064-4406-a97d-1708545b5cd3

📥 Commits

Reviewing files that changed from the base of the PR and between 846f2e9 and 272d8a3.

📒 Files selected for processing (2)
  • control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/machine/machine_test.go

@codecov

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 35.62%. Comparing base (846f2e9) to head (272d8a3).
⚠️ Report is 18 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8264   +/-   ##
=======================================
  Coverage   35.61%   35.62%           
=======================================
  Files         767      767           
  Lines       93333    93336    +3     
=======================================
+ Hits        33245    33248    +3     
  Misses      57399    57399           
  Partials     2689     2689           
Files with missing lines Coverage Δ
...usterconfigoperator/controllers/machine/machine.go 67.16% <100.00%> (+0.49%) ⬆️

@qinqon
Contributor

qinqon commented Apr 17, 2026

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2026
@openshift-merge-bot
Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

@qinqon
Contributor

qinqon commented Apr 17, 2026

/cc @bryan-cox
Can you approve?

@openshift-ci openshift-ci bot requested a review from bryan-cox April 17, 2026 09:47
Contributor

@jparrill jparrill left a comment

/approve

@qinqon
Contributor

qinqon commented Apr 17, 2026

/verified by e2e

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2026
@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 17, 2026
@openshift-ci-robot

@qinqon: This PR has been marked as verified by e2e.

@qinqon
Contributor

qinqon commented Apr 17, 2026

/jira cherry-pick release-4.21

@qinqon
Contributor

qinqon commented Apr 17, 2026

/jira help

@hypershift-jira-solve-ci

AI Test Failure Analysis

Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed | Build: 2045076546158333952 | Cost: $1.9520265000000006 | Failed step: hypershift-aws-run-e2e-nested

Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6

@hypershift-jira-solve-ci

All 7 failures cascade from just 2 leaf failures (EnsureNoCrashingPods in HC0 and HC2).

Test Failure Analysis Complete

Job Information

  • Prow Job: pull-ci-openshift-hypershift-main-e2e-azure-self-managed
  • Build ID: 2045076546158333952
  • Target: e2e-azure-self-managed
  • PR: openshift/hypershift#8264 — "OCPBUGS-83604: fix(kubevirt): filter link-local addresses from EndpointSlice endpoints"
  • Result: 252 tests — 232 passed, 13 skipped, 7 failures (2 leaf failures + 5 parent propagations)

Test Failure Analysis

Error

TestNodePool/HostedCluster0/ValidateHostedCluster/EnsureNoCrashingPods:
  util.go:817: Container packageserver in pod packageserver-7d9c59974d-97627 has a restartCount > 0 (1)

TestNodePool/HostedCluster2/ValidateHostedCluster/EnsureNoCrashingPods:
  util.go:817: Container packageserver in pod packageserver-59598cd884-8cpc2 has a restartCount > 0 (1)

Summary

The packageserver container in 2 of 8 hosted clusters (HostedCluster0 and HostedCluster2) experienced a single restart during initial bootstrapping due to a context deadline exceeded timeout while waiting for the kube-apiserver to become ready. The EnsureNoCrashingPods test detected restartCount > 0 (tolerance is 0 for non-KubeVirt Azure platforms) and failed. All 5 parent test nodes (ValidateHostedCluster, HostedCluster0/2, TestNodePool) propagated the failure. The remaining 6 hosted clusters in the same run passed EnsureNoCrashingPods with zero restarts. This is a transient bootstrapping race condition unrelated to the PR under test.

Root Cause

Transient packageserver startup timeout during hosted cluster bootstrapping (flaky, unrelated to PR)

The packageserver container timed out with context deadline exceeded during initial hosted cluster bootstrapping. The pod lifecycle timeline for HostedCluster0 shows:

  1. 10:32:06Z - Pod created and scheduled
  2. 10:32:25Z - Network interface added (eth0, 10.126.0.163/23)
  3. 10:32:27Z - Container images pulled
  4. 10:32:55Z - availability-prober init container started (probes kube-apiserver readiness)
  5. 10:39:40Z - availability-prober completed (~7 minutes of waiting for kube-apiserver)
  6. 10:40:05Z - packageserver container started (first attempt)
  7. ~10:42:07Z - packageserver container FAILED: context deadline exceeded; the packageserver timed out waiting for the kube-apiserver to be fully functional
  8. 10:43:32Z - kas-readiness-check sidecar restarted
  9. 10:43:44Z - konnectivity-proxy-socks5 sidecar restarted
  10. 10:44:58Z - packageserver restarted successfully (restartCount=1, now Running/Ready)

The ~2 minute window between the packageserver starting (10:40:05Z) and failing (10:42:07Z) indicates the kube-apiserver was not fully ready despite the availability-prober having completed. After the restart, the kube-apiserver was fully available and the packageserver connected successfully.

Why this is NOT related to PR #8264:

  • The PR modifies only machine.go and machine_test.go in the KubeVirt passthrough service controller
  • These changes add filtering of link-local addresses (both IPv4 169.254.0.0/16 and IPv6 fe80::/10) from EndpointSlice endpoints
  • This code path is KubeVirt-platform-specific — this is an Azure test
  • The packageserver bootstrapping lifecycle has zero interaction with EndpointSlice address filtering
  • 6 of 8 hosted clusters in this same job passed EnsureNoCrashingPods with zero restarts, confirming non-deterministic timing

Recommendations

  1. Retry the job: this is a transient flaky failure that should not block the PR. The packageserver recovered after a single restart and the cluster was fully functional.
  2. Consider adding packageserver to podCrashTolerations in test/e2e/util/util.go with a tolerance of 1, similar to other OLM components (olm-operator already has tolerance of 3). The packageserver is known to occasionally time out during bootstrapping when the kube-apiserver takes longer than expected to become ready.
  3. No code changes needed in PR #8264: the PR's KubeVirt EndpointSlice filtering changes have no bearing on this Azure test failure.

Evidence

| Evidence | Detail |
| --- | --- |
| Failing test | EnsureNoCrashingPods in HostedCluster0 and HostedCluster2 only |
| Crash reason | context deadline exceeded; packageserver timed out waiting for kube-apiserver |
| Restart count | 1 (tolerance is 0 on Azure platform) |
| Pod state at test time | Running, Ready: true; fully recovered after restart |
| lastState | Empty ({}); previous container state garbage-collected, confirming transient failure |
| Affected clusters | 2 of 8 hosted clusters (25%); non-deterministic timing issue |
| Passing clusters | TestCreateCluster, TestUpgradeControlPlane, TestAutoscaling, TestAzureOAuthLoadBalancer, TestAzurePrivateTopology, TestHAEtcdChaos; all passed EnsureNoCrashingPods |
| All other subtests | EnsureNodeCountMatchesNodePoolReplicas, EnsureOAPIMountsTrustBundle, EnsureGuestWebhooksValidated, ValidateConfigurationStatus; all PASSED for the failing clusters |
| PR changed files | machine.go (+4 lines), machine_test.go (+57 lines); KubeVirt-only code path |
| Platform mismatch | PR changes are KubeVirt-specific; this test runs on Azure |
| In-job analysis | CI's own hypershift-analyze-e2e-failure step independently concluded: "Transient packageserver startup timeout (flaky test, unrelated to PR)" |

@cwbotbot

Test Results

e2e-aws

e2e-aks

@qinqon
Contributor

qinqon commented Apr 17, 2026

/retest-required

Contributor

@Nirshal Nirshal left a comment

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Apr 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jparrill, Nirshal, orenc1, qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@qinqon
Contributor

qinqon commented Apr 17, 2026

/jira backport release-4.22,release-4.21,release-4.20

@openshift-ci-robot

@qinqon: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.22
/cherrypick release-4.21
/cherrypick release-4.20

@openshift-cherrypick-robot

@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of release-4.20, release-4.21, release-4.22 in new PRs and assign them to you.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Apr 17, 2026

@orenc1: all tests passed!

Full PR test history. Your PR dashboard.

@openshift-merge-bot openshift-merge-bot bot merged commit 1d9c83c into openshift:main Apr 17, 2026
36 checks passed
@openshift-ci-robot

@orenc1: Jira Issue Verification Checks: Jira Issue OCPBUGS-83604
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-83604 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

@openshift-cherrypick-robot

@openshift-ci-robot: new pull request created: #8270

@openshift-cherrypick-robot

@openshift-ci-robot: new pull request created: #8271

@openshift-cherrypick-robot

@openshift-ci-robot: new pull request created: #8272

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria
