-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-5077: [release-4.10] Fix load balancer health check locking and update #1463
OCPBUGS-5077: [release-4.10] Fix load balancer health check locking and update #1463
Conversation
Maps for endpoints and services may be accessed at the same time because the services and endpoints handlers are attached to two different informer goroutines. This can cause: fatal error: concurrent map read and map write goroutine 322 [running]: runtime.throw(0x1b2bac6, 0x21) /usr/lib/golang/src/runtime/panic.go:1117 +0x72 fp=0xc000811b60 sp=0xc000811b30 pc=0x43e612 runtime.mapaccess2(0x18e4000, 0xc0018c5e90, 0xc000811bf0, 0x2921980, 0x0) /usr/lib/golang/src/runtime/map.go:469 +0x255 fp=0xc000811ba0 sp=0xc000811b60 pc=0x415c55 github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*loadBalancerHealthChecker).AddEndpoints(0xc0018c5ef0, 0xc0004d9180) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/healthcheck.go: github.com/ovn-org/ovn-kubernetes/go-controller/pkg/node.(*gateway).AddEndpoints(...) /go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/node/gateway.go:107 Signed-off-by: Tim Rozet <trozet@redhat.com> (cherry picked from commit 915c779)
We were not creating healthchecks for service updates when the etp changed from local to cluster or back. This PR fixes this. It also fixes some endpoint slices code that earlier assumed svc:endpointslice was 1:1 mapping. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 4c7add6) (cherry picked from commit 5464286)
@trozet: No Bugzilla bug is referenced in the title of this pull request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@trozet: This pull request references Jira Issue OCPBUGS-5077, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @tssurya |
/override ci/prow/unit |
@abhat: Overrode contexts on behalf of abhat: ci/prow/unit In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
/retest-required |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dcbw, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved |
/retest-required |
depends on #1479 for unit test fix |
/retest-required |
/override ci/prow/unit |
@trozet: Overrode contexts on behalf of trozet: ci/prow/unit In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@trozet: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@trozet: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-5077 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Clean backport of fixes that fix locking around healthchecks and handle updates correctly.