OCPBUGS-77856: fix: use NodePort for HCP router Service on non-cloud platforms#8439
OCPBUGS-77856: fix: use NodePort for HCP router Service on non-cloud platforms#8439vsolanki12 wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@vsolanki12: This pull request references Jira Issue OCPBUGS-77856, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
📝 WalkthroughWalkthroughThe pull request introduces platform-aware service type selection for the router service in the hosted control plane. When reconciling the router service, the service type is now conditionally assigned: NodePort is used for non-cloud platforms (AgentPlatform, KubevirtPlatform, OpenStackPlatform, NonePlatform), while LoadBalancer is used for cloud platforms. Corresponding changes were made to handle NodePort services in status reconciliation, where the ClusterIP is returned as the host instead of collecting load balancer information. Additional test coverage was added to verify service type behavior across platform categories. 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vsolanki12 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
On non-cloud platforms (Agent, KubeVirt, OpenStack, None), the HCP router Service was unconditionally created as LoadBalancer. Without a cloud LB controller, the Service stays pending forever, blocking InfrastructureReady. - ReconcileRouterService: use NodePort for non-cloud platforms - reconcileRouterServiceStatus: guard against NodePort before calling CollectLBMessageIfNotProvisioned - kas.ReconcileServiceStatus: add NodePort guard in Route case Signed-off-by: Vimal Solanki <vsolanki@redhat.com> Closes: OCPBUGS-77856
610f561 to
ccc8bcb
Compare
|
@vsolanki12: This pull request references Jira Issue OCPBUGS-77856, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8439 +/- ##
=======================================
Coverage 37.44% 37.44%
=======================================
Files 751 751
Lines 91969 91984 +15
=======================================
+ Hits 34435 34443 +8
- Misses 54894 54900 +6
- Partials 2640 2641 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go`:
- Around line 170-173: The NodePort branch dereferences strategy.Route without
checking for nil, which can panic; update the NodePort handling in service.go so
you first check if strategy.Route is non-nil (and optionally that
strategy.Route.Hostname is non-empty) before reading strategy.Route.Hostname and
assigning host/port—e.g. wrap the host = strategy.Route.Hostname and port = 443
assignments inside an if strategy.Route != nil { ... } block (keeping the
existing svc.Spec.Ports and NodePort checks intact) so no nil dereference
occurs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 34176f5e-a42c-4fab-9813-0c0e14dc16ec
📒 Files selected for processing (4)
control-plane-operator/controllers/hostedcontrolplane/infra/infra.gocontrol-plane-operator/controllers/hostedcontrolplane/ingress/router.gocontrol-plane-operator/controllers/hostedcontrolplane/ingress/router_test.gocontrol-plane-operator/controllers/hostedcontrolplane/kas/service.go
| if svc.Spec.Type == corev1.ServiceTypeNodePort { | ||
| if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 { | ||
| host = strategy.Route.Hostname | ||
| port = 443 |
There was a problem hiding this comment.
Guard against nil strategy.Route before dereference.
Line 172 can panic when strategy.Route is nil (strategy.Route.Hostname). Please add a nil check before reading hostname in this new NodePort Route branch.
Suggested fix
case hyperv1.Route:
if svc.Spec.Type == corev1.ServiceTypeNodePort {
if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 {
- host = strategy.Route.Hostname
- port = 443
+ if strategy.Route != nil && strategy.Route.Hostname != "" {
+ host = strategy.Route.Hostname
+ port = 443
+ }
}
return
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if svc.Spec.Type == corev1.ServiceTypeNodePort { | |
| if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 { | |
| host = strategy.Route.Hostname | |
| port = 443 | |
| if svc.Spec.Type == corev1.ServiceTypeNodePort { | |
| if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 { | |
| if strategy.Route != nil && strategy.Route.Hostname != "" { | |
| host = strategy.Route.Hostname | |
| port = 443 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@control-plane-operator/controllers/hostedcontrolplane/kas/service.go` around
lines 170 - 173, The NodePort branch dereferences strategy.Route without
checking for nil, which can panic; update the NodePort handling in service.go so
you first check if strategy.Route is non-nil (and optionally that
strategy.Route.Hostname is non-empty) before reading strategy.Route.Hostname and
assigning host/port—e.g. wrap the host = strategy.Route.Hostname and port = 443
assignments inside an if strategy.Route != nil { ... } block (keeping the
existing svc.Spec.Ports and NodePort checks intact) so no nil dereference
occurs.
There was a problem hiding this comment.
How about the other factors, example Line 180 (host = strategy.Route.Hostname) is the existing code — it also accesses strategy.Route.Hostname without a nil check. If we add a nil guard on line 172 but the existing line 180 doesn't have one, we'd be inconsistent, and the existing code would still panic in that scenario.
In practice, when strategy.Type == hyperv1.Route, strategy.Route should always be populated — that's an API contract. Adding a nil check only in our new block while the original code doesn't have one doesn't add real safety.
what do you think in this prospective as well.
There was a problem hiding this comment.
@vsolanki12, that's a fair and thoughtful point. You're right that adding a guard at line 172 while line 180 remains unchecked creates internal inconsistency within ReconcileServiceStatus, and the API contract argument has real merit — callers are generally expected to populate strategy.Route when strategy.Type == Route.
That said, I'd note there are two clean ways to resolve this:
Option A — Accept the API contract, add no guard (your argument):
Keep both lines 172 and 180 without nil checks, relying on the contract that strategy.Route is always non-nil when strategy.Type == Route. This is internally consistent and the simpler approach.
Option B — Harden both, aligning with the pattern used elsewhere:
Looking at ReconcileKonnectivityServerServiceStatus in the same file, that function does include a nil check:
if strategy.Route != nil && strategy.Route.Hostname != "" {
host = strategy.Route.Hostname
port = 443
return
}So the codebase already has precedent for defensive nil checks on strategy.Route in service status functions. The most consistent approach across both functions would be to harden line 180 as well, not just line 172:
case hyperv1.Route:
if svc.Spec.Type == corev1.ServiceTypeNodePort {
if len(svc.Spec.Ports) > 0 && svc.Spec.Ports[0].NodePort != 0 {
if strategy.Route != nil && strategy.Route.Hostname != "" {
host = strategy.Route.Hostname
port = 443
}
}
return
}
if message, err := k8sutil.CollectLBMessageIfNotProvisioned(svc, messageCollector); err != nil || message != "" {
return host, port, message, err
}
if strategy.Route != nil && strategy.Route.Hostname != "" { // also guard existing line 180
host = strategy.Route.Hostname
}
port = 443Either option is defensible — Option A is fine if you're confident in caller discipline, but Option B brings ReconcileServiceStatus in line with how ReconcileKonnectivityServerServiceStatus already handles this. The choice is yours.
|
/auto-cc |
|
Now I have the complete picture. Here is the analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe PR introduces three logically parallel code changes — all adding NodePort handling for non-cloud platforms — but only writes tests for one of them:
The overall patch coverage (6 covered / 17 total = 35.29%) falls below the repository's Codecov target (37.44%), triggering the check failure. Recommendations
Evidence
|
What this PR does / why we need it:
On non-cloud platforms (Agent, KubeVirt, OpenStack, None), the HCP router Service was unconditionally created as LoadBalancer. Since these platforms lack a cloud LB controller, the Service stays pending forever, blocking InfrastructureReady and preventing the hosted control plane from becoming available.
Three code paths are fixed:
ReconcileRouterService: use NodePort for non-cloud platforms instead of unconditional LoadBalancerreconcileRouterServiceStatus: add NodePort guard before callingCollectLBMessageIfNotProvisionedkas.ReconcileServiceStatus: add NodePort guard in the Route caseWhich issue(s) this PR fixes:
Fixes OCPBUGS-77856
Special notes for your reviewer:
LoadBalancerSourceRangeson NodePort services is harmless — Kubernetes ignores this field for non-LoadBalancer types.etcd-upload/s3_uploader_mock.gobuild failure in CI is a pre-existing issue on main, unrelated to this PR.Checklist:
Summary by CodeRabbit
Bug Fixes
Tests