-
Notifications
You must be signed in to change notification settings - Fork 298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-33428: Reconcile KAS endpoints and endpoint slice #3942
OCPBUGS-33428: Reconcile KAS endpoints and endpoint slice #3942
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code seems straightforward - is this associated with a Jira ticket?
control-plane-operator/hostedclusterconfigoperator/controllers/resources/kas/reconcile.go
Outdated
Show resolved
Hide resolved
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go
Outdated
Show resolved
Hide resolved
@stevekuznetsov Thanks for the review. I will address the comments and will be opening a Jira with the details. I wanted to pull together some test results with background information as to why I think we need this change. |
b3d7e03
to
bcab756
Compare
10062ed
to
fcc7e45
Compare
/retitle OCPBUGS-33428: Reconcile KAS endpoints and endpoint slice |
@rtheis: This pull request references Jira Issue OCPBUGS-33428, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
fcc7e45
to
225b286
Compare
✅ Deploy Preview for hypershift-docs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
/test e2e-aws |
/test e2e-azure |
/jira refresh |
@jeffnowicki: This pull request references Jira Issue OCPBUGS-33428, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@@ -1163,6 +1157,39 @@ func (r *reconciler) reconcileOpenshiftOAuthAPIServerAPIServices(ctx context.Con | |||
return errors.NewAggregate(errs) | |||
} | |||
|
|||
func (r *reconciler) reconcileKASEndpoints(ctx context.Context, hcp *hyperv1.HostedControlPlane) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really wish we had some way to test the whole reconciler together so we could capture that it now does this. @sjenning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Why is this issue only hit when having haproxy to intercept traffic to the endpoint? why is it not hit with a traditional svc->endpoint->pod flow? I'm concern we opt out to other goodies by setting none for --endpoint-reconciler-type Other than the concerns above, code changes lgtm |
@enxebre My understanding is that setting the KAS advertise-address results in a single address for the KAS endpoint in the data plane regardless of there being multiple KAS pods/endpoints in the control plane. As a result, when any KAS pod is terminated, the pod will now remove its address from the endpoint. In previous releases, the KAS would see a single address and refuse to remove its address from the endpoint. That failsafe was removed by Kubernetes and thus we see single endpoint setups, such as used by HyperShift, encounter this KAS in cluster availability problem during routine maintenance activities. Based on our discussions with the Kubernetes community via kubernetes/kubernetes#118777, I think we unfortunately need to set |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre, rtheis, stevekuznetsov The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/retest-required |
5 similar comments
/retest-required |
/retest-required |
/retest-required |
/retest-required |
/retest-required |
@rtheis: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@rtheis: Jira Issue OCPBUGS-33428: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-33428 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cherry-pick release-4.16 |
/cherry-pick release-4.15 |
@rtheis: new pull request created: #4096 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@rtheis: new pull request created: #4097 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[ART PR BUILD NOTIFIER] This PR has been included in build ose-hypershift-container-v4.17.0-202405271841.p0.g4c2f51c.assembly.stream.el9 for distgit hypershift. |
What this PR does / why we need it:
See issue for details and test results.
Which issue(s) this PR fixes (optional, use
fixes #<issue_number>(, fixes #<issue_number>, ...)
format, where issue_number might be a GitHub issue, or a Jira story:Fixes # https://issues.redhat.com/browse/OCPBUGS-33428
Checklist