Skip to content

OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning#8663

Open
sdminonne wants to merge 3 commits into
openshift:mainfrom
sdminonne:OCPBUGS-87028/cpo-kas-lb-hostname-check
Open

OCPBUGS-87028: Fix KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning#8663
sdminonne wants to merge 3 commits into
openshift:mainfrom
sdminonne:OCPBUGS-87028/cpo-kas-lb-hostname-check

Conversation

@sdminonne
Copy link
Copy Markdown
Contributor

@sdminonne sdminonne commented Jun 4, 2026

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved hostname resolution for API server, Konnectivity, and OAuth across publishing strategies (prefer explicit route/spec host, fallback to route ingress host; ignore router canonical hostname)
    • Ensured preconfigured LoadBalancer hostnames take immediate precedence and avoid incorrect shared-ingress routing when KAS uses LoadBalancer
  • Improvements

    • Infrastructure status logging now includes richer status details
  • Tests

    • Expanded test coverage for reconciliation and service hostname/port behaviors

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 4, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is invalid:

  • expected the bug to target only the "5.0.0" version, but multiple target versions were set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a3bb936f-a552-430e-9b53-3e29c44e0704

📥 Commits

Reviewing files that changed from the base of the PR and between efc01ec and a164ff6.

📒 Files selected for processing (3)
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go

📝 Walkthrough

Walkthrough

Adds netutil.RouteHost(route) to compute an effective Route hostname (Spec.Host then Status.Ingress[0].Host), updates KAS and OAuth service reconciliation to prefer configured LoadBalancer hostnames and to use RouteHost for Route-based host resolution, tightens a shared-ingress special-case to Route strategies, expands unit tests for these behaviors, and adds two small structured-log improvements in the hosted control plane controller.

Sequence Diagram(s)

sequenceDiagram
  participant HCPCtrl as HostedControlPlaneController
  participant ServiceRecon as ReconcileServiceStatus (KAS/OAuth)
  participant Route as routev1.Route / Service.Status.LoadBalancer
  participant util as netutil.RouteHost
  participant Infra as InfrastructureStatus

  HCPCtrl->>ServiceRecon: trigger reconciliation
  ServiceRecon->>Route: read Route or Service status
  ServiceRecon->>util: RouteHost(route)
  util-->>ServiceRecon: hostname or ""
  ServiceRecon->>Infra: assemble host/port/message into InfrastructureStatus
  HCPCtrl->>Infra: log infraStatus (structured)
Loading

Suggested reviewers

  • cblecker
  • muraee
🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Test Structure And Quality ⚠️ Warning New test assertions lack meaningful failure messages. Unlike existing infra_test.go, new tests use plain Expect() without diagnostic messages to help diagnose failures. Add meaningful failure messages to Expect() calls in new tests, e.g., g.Expect(host).To(Equal(expected), "host mismatch").
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the primary change: fixing KAS ReconcileServiceStatus to check LoadBalancer.Hostname before LB provisioning, which is the main focus of the PR and directly supported by the changes across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in the PR are stable and deterministic. No tests contain dynamic content like UUIDs, timestamps, IPs, or generated identifiers. Test names use clear, static descriptions.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies service/infrastructure status reconciliation and route hostname resolution logic only; no scheduling constraints, affinity rules, or topology spreads introduced.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed This PR adds only standard Go unit tests (func TestXxx(t *testing.T)), not Ginkgo e2e tests. The custom check applies only to Ginkgo e2e tests, so it is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptography (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto, or non-constant-time secret comparisons detected. Comparisons involve only non-sensitive infrastructure data.
Container-Privileges ✅ Passed PR contains only Go source code changes to control plane operator logic; no Kubernetes manifests or container privilege settings are present or modified.
No-Sensitive-Data-In-Logs ✅ Passed The PR logs infrastructure status (hostnames, ports, and status messages) which are non-sensitive operational data. No passwords, tokens, API keys, PII, session IDs, or customer data are exposed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release label Jun 4, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jun 4, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 78.78788% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.51%. Comparing base (6a6d8c1) to head (a164ff6).
⚠️ Report is 159 commits behind head on main.

Files with missing lines Patch % Lines
...ator/controllers/hostedcontrolplane/infra/infra.go 50.00% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8663      +/-   ##
==========================================
+ Coverage   40.68%   41.51%   +0.82%     
==========================================
  Files         755      756       +1     
  Lines       93368    93673     +305     
==========================================
+ Hits        37985    38884     +899     
+ Misses      52649    52067     -582     
+ Partials     2734     2722      -12     
Files with missing lines Coverage Δ
...ostedcontrolplane/hostedcontrolplane_controller.go 45.70% <100.00%> (+0.67%) ⬆️
...ator/controllers/hostedcontrolplane/kas/service.go 52.43% <100.00%> (+12.07%) ⬆️
...or/controllers/hostedcontrolplane/oauth/service.go 82.00% <100.00%> (+7.25%) ⬆️
support/netutil/route.go 71.07% <100.00%> (+2.89%) ⬆️
...ator/controllers/hostedcontrolplane/infra/infra.go 53.43% <50.00%> (+2.89%) ⬆️

... and 58 files with indirect coverage changes

Flag Coverage Δ
cmd-support 34.90% <100.00%> (+0.19%) ⬆️
cpo-hostedcontrolplane 43.89% <68.18%> (+2.08%) ⬆️
cpo-other 42.74% <ø> (+1.35%) ⬆️
hypershift-operator 51.57% <ø> (+0.75%) ⬆️
other 31.64% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sdminonne and others added 2 commits June 4, 2026 11:29
…ostname resolution

Add a RouteHost helper in support/netutil that resolves route hostnames
consistently across service types. The helper checks
RoutePublishingStrategy.Hostname first, then falls back to
status.Conditions with the route.openshift.io/host key, and finally
the legacy BaseDomain-based default.

Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ovisioning in KAS service status

- Update kas.ReconcileServiceStatus and oauth.ReconcileServiceStatus to
  check LoadBalancer.Hostname before calling
  CollectLBMessageIfNotProvisioned, so clusters with a pre-configured LB
  hostname skip the cloud LB provisioning wait entirely.
- Update infra.ReconcileInfrastructure to use the RouteHost helper for
  OAuth and Ignition route hostname resolution, ensuring consistent
  behavior across Route and LoadBalancer service types.

Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sdminonne sdminonne force-pushed the OCPBUGS-87028/cpo-kas-lb-hostname-check branch from b2f9af1 to efc01ec Compare June 4, 2026 09:40
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go`:
- Around line 55-62: InfrastructureStatus.String() currently prints raw
hostnames and message (APIHost, KonnectivityHost, InternalHCPRouterHost,
ExternalHCPRouterHost, Message, etc.), which can leak sensitive topology/runtime
details; change the formatter to avoid emitting raw values by replacing host
fields with presence flags or redacted placeholders (e.g., "set" / "<redacted>")
and truncate or redact Message (or replace with a generic summary), keeping
boolean flags like NeedInternalRouter/NeedExternalRouter and ports if
non-sensitive; update the InfrastructureStatus.String() implementation to
reference those fields (APIHost, APIPort, KonnectivityHost, KonnectivityPort,
OAuthEnabled, OAuthHost, OAuthPort, NeedInternalRouter, InternalHCPRouterHost,
NeedExternalRouter, ExternalHCPRouterHost, Message) but only emit non-sensitive
indicators instead of the verbatim hostnames/messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5e35afe5-9ecf-4d62-9913-1b0a35a473e5

📥 Commits

Reviewing files that changed from the base of the PR and between b2f9af1 and efc01ec.

⛔ Files ignored due to path filters (2)
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_When_ARO_shared_ingress_is_enabled_with_KAS_LoadBalancer_strategy__it_should_use_LoadBalancer_host_not_shared_ingress.yaml is excluded by !**/testdata/**
  • control-plane-operator/controllers/hostedcontrolplane/infra/testdata/zz_fixture_TestReconcileInfrastructure_When_NonePlatform_cluster_has_KAS_LoadBalancer_with_configured_hostname__it_should_use_hostname_without_waiting_for_LB.yaml is excluded by !**/testdata/**
📒 Files selected for processing (9)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service_test.go
  • support/netutil/route.go
  • support/netutil/route_test.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
🚧 Files skipped from review as they are similar to previous changes (7)
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service_test.go
  • support/netutil/route.go
  • control-plane-operator/controllers/hostedcontrolplane/oauth/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service.go
  • control-plane-operator/controllers/hostedcontrolplane/kas/service_test.go
  • support/netutil/route_test.go
  • control-plane-operator/controllers/hostedcontrolplane/infra/infra_test.go

Comment on lines +55 to +62
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
s.NeedInternalRouter, s.InternalHCPRouterHost,
s.NeedExternalRouter, s.ExternalHCPRouterHost,
s.Message)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid serializing hostnames/message verbatim in InfrastructureStatus.String().

Line 56 currently includes internal/external hostnames and raw message, which can leak sensitive topology/runtime details when this struct is logged. Prefer presence flags (or redacted values) instead of raw host/message content.

Proposed safer formatter
 func (s InfrastructureStatus) String() string {
-	return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
-		s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
-		s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
-		s.NeedInternalRouter, s.InternalHCPRouterHost,
-		s.NeedExternalRouter, s.ExternalHCPRouterHost,
-		s.Message)
+	return fmt.Sprintf("apiPort=%d konnectivityPort=%d oauthEnabled=%t oauthPort=%d needInternalRouter=%t needExternalRouter=%t apiHostSet=%t konnectivityHostSet=%t oauthHostSet=%t internalRouterHostSet=%t externalRouterHostSet=%t hasMessage=%t",
+		s.APIPort, s.KonnectivityPort, s.OAuthEnabled, s.OAuthPort,
+		s.NeedInternalRouter, s.NeedExternalRouter,
+		s.APIHost != "", s.KonnectivityHost != "", s.OAuthHost != "",
+		s.InternalHCPRouterHost != "", s.ExternalHCPRouterHost != "",
+		s.Message != "")
 }

As per coding guidelines: "Flag logging that may expose passwords, tokens, API keys, PII ... internal hostnames, or customer data."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiHost=%s apiPort=%d konnectivityHost=%s konnectivityPort=%d oauthEnabled=%t oauthHost=%s oauthPort=%d needInternalRouter=%t internalRouterHost=%s needExternalRouter=%t externalRouterHost=%s message=%s",
s.APIHost, s.APIPort, s.KonnectivityHost, s.KonnectivityPort,
s.OAuthEnabled, s.OAuthHost, s.OAuthPort,
s.NeedInternalRouter, s.InternalHCPRouterHost,
s.NeedExternalRouter, s.ExternalHCPRouterHost,
s.Message)
}
func (s InfrastructureStatus) String() string {
return fmt.Sprintf("apiPort=%d konnectivityPort=%d oauthEnabled=%t oauthPort=%d needInternalRouter=%t needExternalRouter=%t apiHostSet=%t konnectivityHostSet=%t oauthHostSet=%t internalRouterHostSet=%t externalRouterHostSet=%t hasMessage=%t",
s.APIPort, s.KonnectivityPort, s.OAuthEnabled, s.OAuthPort,
s.NeedInternalRouter, s.NeedExternalRouter,
s.APIHost != "", s.KonnectivityHost != "", s.OAuthHost != "",
s.InternalHCPRouterHost != "", s.ExternalHCPRouterHost != "",
s.Message != "")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@control-plane-operator/controllers/hostedcontrolplane/infra/infra.go` around
lines 55 - 62, InfrastructureStatus.String() currently prints raw hostnames and
message (APIHost, KonnectivityHost, InternalHCPRouterHost,
ExternalHCPRouterHost, Message, etc.), which can leak sensitive topology/runtime
details; change the formatter to avoid emitting raw values by replacing host
fields with presence flags or redacted placeholders (e.g., "set" / "<redacted>")
and truncate or redact Message (or replace with a generic summary), keeping
boolean flags like NeedInternalRouter/NeedExternalRouter and ports if
non-sensitive; update the InfrastructureStatus.String() implementation to
reference those fields (APIHost, APIPort, KonnectivityHost, KonnectivityPort,
OAuthEnabled, OAuthHost, OAuthPort, NeedInternalRouter, InternalHCPRouterHost,
NeedExternalRouter, ExternalHCPRouterHost, Message) but only emit non-sensitive
indicators instead of the verbatim hostnames/messages.

@sdminonne
Copy link
Copy Markdown
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 4, 2026
@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sdminonne sdminonne marked this pull request as ready for review June 4, 2026 10:13
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 4, 2026
@openshift-ci openshift-ci Bot requested review from enxebre and sjenning June 4, 2026 10:14
@sdminonne
Copy link
Copy Markdown
Contributor Author

/pipeline-required

func ReconcileServiceStatus(svc *corev1.Service, strategy *hyperv1.ServicePublishingStrategy, apiServerPort int, messageCollector events.MessageCollector) (host string, port int32, message string, err error) {
switch strategy.Type {
case hyperv1.LoadBalancer:
if strategy.LoadBalancer != nil && strategy.LoadBalancer.Hostname != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think more carefully about the implications of this. Currently we block provisioning if the LoadBalancer has not provisioned. By returning the configured hostname, we're skipping that and proceeding with the control plane rollout. It's not a bad change, but it's definitely a change in behavior and we need to check whether it might negatively impact existing deployments.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it make sense. I've moved the LB-bypass out of ReconcileServiceStatus (now reverted to original logic) and only for NonePlatform in reconcileAPIServerServiceStatus. I think this should preserve all the other platform behaviors. At the end of the day this is only to test NonePlatform avoiding different behaviours between AWS and AKS (at least for the moment).

… only

Revert ReconcileServiceStatus to the original behavior where the LB
provisioning check always runs before returning a host. The LB-bypass
for a configured LoadBalancer.Hostname is now handled at the call site
in reconcileAPIServerServiceStatus, scoped to NonePlatform only.

On NonePlatform there is no cloud provider to provision a real load
balancer, so the configured hostname is authoritative and waiting for
LB ingress status would block indefinitely. For all other platforms
the existing behavior is preserved: the LB must be provisioned before
the control plane rollout proceeds.

This follows the same pattern used for shared ingress and IBMCloud
platform-specific early returns in the same function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sdminonne
Copy link
Copy Markdown
Contributor Author

@csrwng PTAL
Commited on top for clarity. Happy to squash in case the new approach looks more viable.
TY!

@openshift-ci-robot
Copy link
Copy Markdown

@sdminonne: This pull request references Jira Issue OCPBUGS-87028, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

Summary

  • Add a RouteHost helper in support/netutil/route.go that resolves route hostnames consistently, checking RoutePublishingStrategy.Hostname first, then falling back to status.Conditions with the route.openshift.io/host key, and finally the legacy BaseDomain-based default.
  • Fix kas.ReconcileServiceStatus (and oauth.ReconcileServiceStatus) to check LoadBalancer.Hostname before calling CollectLBMessageIfNotProvisioned, so that clusters with a pre-configured LB hostname skip the cloud LB provisioning wait entirely.
  • Update infra.ReconcileInfrastructure to use the RouteHost helper for OAuth and Ignition route hostname resolution, ensuring consistent behavior across Route and LoadBalancer service types.

Test plan

  • Unit tests: kas/service_test.go, oauth/service_test.go, infra/infra_test.go, netutil/route_test.go
  • Verify make build && make test pass
  • E2e: existing break-glass credential tests should continue to pass on 4.23+ payloads

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

  • Improved hostname resolution for API server, Konnectivity, and OAuth across publishing strategies (prefer explicit route/spec host, fallback to route ingress host; ignore router canonical hostname)

  • Ensured preconfigured LoadBalancer hostnames take immediate precedence and avoid incorrect shared-ingress routing when KAS uses LoadBalancer

  • Improvements

  • Infrastructure status logging now includes richer status details

  • Tests

  • Expanded test coverage for reconciliation and service hostname/port behaviors

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

codecov/project: 40.40% (-0.28%) compared to 6a6d8c1

Summary

The codecov/project check failed because the aggregate project coverage dropped by 0.28% (40.68% → 40.40%). This is not caused by the PR's code changes, which actually have 68.18% patch coverage — well above the project baseline. The real cause is that two Codecov coverage flags (cmd-support and hypershift-operator) have expired carryforward data, causing 309 files and ~54,861 lines to be excluded from the aggregate coverage calculation. The Codecov report itself notes it is "159 commits behind head on main," confirming stale base data.

Root Cause

The codecov/project check compares the PR's aggregate coverage against the base branch and fails if coverage decreases. The -0.28% drop is a false positive caused by missing carryforward flag data, not by this PR's changes:

  1. Expired carryforward flags: The codecov.yml configures five coverage flags (cmd-support, cpo-hostedcontrolplane, cpo-other, hypershift-operator, other) with carryforward: true. Two of these flags — cmd-support and hypershift-operator — show ? in the PR report, meaning their coverage data was not uploaded in this CI run and the previously carried-forward data has expired.

  2. Massive file/line count drop: The base has 755 files (93,368 lines) but the PR report only counts 446 files (38,507 lines) — a loss of 309 files and 54,861 lines. This magnitude of change is structurally impossible from a PR that touches 4 files. It reflects the missing flag data removing entire file sets from the aggregate.

  3. Stale base report: Codecov notes the report is "159 commits behind head on main." This means the base commit (6a6d8c1) coverage snapshot is outdated. When combined with missing carryforward data on the PR side, the comparison becomes unreliable.

  4. The PR itself improves coverage: The 7 uncovered lines in infra.go (the String() method on InfrastructureStatus) represent only the infra.go patch at 50% coverage. All other changed files (kas/service.go, oauth/service.go, hostedcontrolplane_controller.go) have 100% patch coverage. The overall patch coverage of 68.18% exceeds the project's 40.68% baseline, meaning this PR, line-for-line, raises coverage.

Recommendations
  1. This check failure can be safely ignored — it is a Codecov infrastructure issue (expired carryforward data), not a code quality problem. The PR's patch coverage (68.18%) exceeds the project target (40.68%).

  2. Rebase onto latest main and re-push — this will refresh the Codecov base report (currently 159 commits behind) and may restore the carryforward flag data, resolving the false-positive drop.

  3. Optional: add a unit test for InfrastructureStatus.String() — the 7 uncovered lines are all in the new String() method on InfrastructureStatus. While this is a trivial debug/logging helper and doesn't affect the project-level failure, adding a single test would bring patch coverage to 100% and eliminate the minor infra.go coverage gap.

  4. Long-term: the openshift/hypershift repo should configure an explicit coverage.project.default.threshold in codecov.yml (e.g., threshold: 1%) to tolerate small fluctuations caused by carryforward expiry. Currently no threshold is set, so any drop triggers a failure.

Evidence
Evidence Detail
Check result codecov/project: FAILURE — "40.40% (-0.28%) compared to 6a6d8c1"
Patch coverage 68.18% (15 of 22 changed lines hit) — exceeds 40.68% project baseline
codecov/patch result SUCCESS — "68.18% of diff hit (target 40.68%)"
File count anomaly Base: 755 files → PR: 446 files (-309 files missing from aggregate)
Line count anomaly Base: 93,368 lines → PR: 38,507 lines (-54,861 lines dropped)
Missing flags cmd-support: ?, hypershift-operator: ? (carryforward expired)
Stale base Codecov: "Report is 159 commits behind head on main"
Uncovered lines 7 lines in infra.go — the InfrastructureStatus.String() method body
Files with 100% patch kas/service.go, oauth/service.go, hostedcontrolplane_controller.go

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

@sdminonne: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants