OCPBUGS-10318: [release-4.12] node: add node healthz server for cloud load balancers #1570
openflowManager code is only marginally health-check related, while the service code provides healthz responses for local traffic policy services on the node. No code changes, just moving stuff around.

Signed-off-by: Dan Williams <dcbw@redhat.com>
(cherry picked from commit c8489e3)
For Cluster traffic policy services every node should accept traffic and balance to a node with an endpoint for that service. The cloud LB periodically health checks each node to know what nodes it can send traffic to.

For Local traffic policy the cloud LB health-checks the specific service's port on every node to determine whether that node has any endpoints for the service, so no node-level health checks are needed.

GCE's legacy cloud provider hardcodes port 10256 (the default kube-proxy port) for its node-level health checks. kube-proxy starts a healthz server on port 10256 on every node for Cluster traffic policy services. ovnkube didn't do that, so in some cases the cloud LB wouldn't consider nodes healthy.

Fixes: https://issues.redhat.com/browse/OCPBUGS-7158
Signed-off-by: Dan Williams <dcbw@redhat.com>
(cherry picked from commit 9a836e3)
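To make the two kinds of health check described in the commit message concrete, here is a minimal Go sketch. It is not the code from this PR. The handler names, the `/healthz` path (chosen to mirror kube-proxy), the JSON body, and the `probe` helper are all assumptions made for illustration. The node-level handler here always answers 200, whereas a real implementation may gate on more state. The per-service handler answers 200 only when the node has at least one local endpoint for the service and 503 otherwise.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// nodeHealthzPort is the port the commit message says GCE's legacy cloud
// provider hardcodes for node-level health checks.
const nodeHealthzPort = 10256

// newNodeHealthzHandler returns a hypothetical node-level handler for
// Cluster traffic policy: any node that is up reports healthy, because
// every node can forward traffic to an endpoint elsewhere.
func newNodeHealthzHandler() http.Handler {
	mux := http.NewServeMux()
	// The "/healthz" path is an assumption that mirrors kube-proxy.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
	return mux
}

// newServiceHealthHandler returns a hypothetical per-service handler for
// Local traffic policy: healthy only if this node hosts an endpoint.
// localEndpoints is an illustrative callback, not an ovn-kubernetes API.
func newServiceHealthHandler(localEndpoints func() int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := localEndpoints()
		status := http.StatusOK
		if n == 0 {
			status = http.StatusServiceUnavailable
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		// The body format is an assumption for this sketch.
		fmt.Fprintf(w, `{"localEndpoints": %d}`, n)
	})
}

// probe sends one in-memory GET to the handler and returns the status
// code and body, so the sketch runs without opening a real socket.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, _ := probe(newNodeHealthzHandler(), "/healthz")
	fmt.Println("node-level check:", code)

	code, body := probe(newServiceHealthHandler(func() int { return 0 }), "/")
	fmt.Println("service check, no local endpoints:", code, body)

	code, body = probe(newServiceHealthHandler(func() int { return 2 }), "/")
	fmt.Println("service check, two local endpoints:", code, body)

	// To serve the node-level handler for real, one would pass it to
	// http.ListenAndServe with the address below.
	fmt.Printf("node-level listen address would be :%d\n", nodeHealthzPort)
}
```

The handlers are exercised in memory through `httptest` so the example terminates. Binding the node-level handler to port 10256 is what would let a cloud LB that hardcodes that port see the node as healthy.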
@ricky-rav: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest-required
@ricky-rav: The following tests failed, say
@ricky-rav do we have a Jira for this backport yet?
@ricky-rav: This pull request references Jira Issue OCPBUGS-10318, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/jira refresh
@ricky-rav: This pull request references Jira Issue OCPBUGS-10318, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/retest-required
@ricky-rav: This pull request references Jira Issue OCPBUGS-10318, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
@ricky-rav: No Bugzilla bug is referenced in the title of this pull request. Retaining the bugzilla/valid-bug label as it was manually added. In response to this:
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: ricky-rav The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved
/test e2e-gcp-ovn
Merged commit da42356 into openshift:release-4.12
@ricky-rav: Jira Issue OCPBUGS-10318: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-10318 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.11
@ricky-rav: #1570 failed to apply on top of branch "release-4.11":
In response to this:
/label qe-approved
4.12 backport of c8489e3 and 9a836e3
Not a clean backport, because 4.13 changes were against secondary network controller code in ovnk node, which we don't have in 4.12.
Also, in the backport of c8489e3, I moved the relevant bits of https://github.com/openshift/ovn-kubernetes/blob/release-4.12/go-controller/pkg/node/healthcheck.go to the new source file pkg/node/openflow_manager.go, instead of what we had in 4.13, where a few functions (checkForStaleOVSInternalPorts, checkForStaleOVSRepresentorInterfaces, checkForStaleOVSInterfaces) had been removed in previous commits.
The node healthz server is then enabled by CNO with this PR: openshift/cluster-network-operator#1731
closes #OCPBUGS-10318