[release 4.14] OCPBUGS-16267: Fix controller reboot bug #155
Make the ConditionStatus function private, as it is only used internally by IsNetworkUnavailable. Also switch the order of the functions, putting the exported one at the top.

Signed-off-by: Lior Noy <lnoy@redhat.com>
* Add a test case that creates 4 services and asserts that, when the controller restarts, it keeps assigning the first two services the same IPs and does not remove or change them.
* Move the functions "validateDesiredLB" and "getIngressIPs" to a different package so that they can be reused in the l2tests.

Signed-off-by: Lior Noy <lnoy@redhat.com>
This commit changes the behavior of the service reconciler to fix a bug where the controller de-assigns an IP from a service after a reboot.

Make the service reconciler initially ignore the services until the first reprocessAll event finishes, where we sort the services and handle all of those with an assigned IP first. By doing so, we make the controller aware of the LB services with existing external IPs and sync the internal state. Only after we have reprocessed all services once, and know which services are allocated and which IPs are in use, do we return to working as normal.

Add unit tests for the service controller. Add unit test cases to cover the FirstConfiguration flag.
Testcase 1: Test the service reconcile with the flag set to true.
Testcase 2: Test reprocessAll with the flag set to true: validate that the controller changes the value to false.

Signed-off-by: liornoy <lnoy@redhat.com>
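The ordering step described in this commit message can be illustrated with a minimal Go sketch. It is not the code from this PR: the `Service` type, its fields, and the `sortAssignedFirst` helper are hypothetical names chosen for illustration, and the addresses come from the documentation range 192.0.2.0/24.

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a hypothetical, simplified stand-in for a LoadBalancer service.
type Service struct {
	Name       string
	IngressIPs []string // empty while the service is still pending
}

// sortAssignedFirst reorders svcs so that services that already hold an
// ingress IP come before pending ones. The sort is stable, so the relative
// order inside each group is preserved.
func sortAssignedFirst(svcs []Service) {
	sort.SliceStable(svcs, func(i, j int) bool {
		return len(svcs[i].IngressIPs) > 0 && len(svcs[j].IngressIPs) == 0
	})
}

func main() {
	svcs := []Service{
		{Name: "pending-a"},
		{Name: "assigned-a", IngressIPs: []string{"192.0.2.10"}},
		{Name: "pending-b"},
		{Name: "assigned-b", IngressIPs: []string{"192.0.2.11"}},
	}
	sortAssignedFirst(svcs)
	for _, s := range svcs {
		fmt.Println(s.Name, s.IngressIPs)
	}
}
```

Processing the slice in this order means that every address already in use is recorded before any pending service asks for one.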
@liornoy: This pull request references Jira Issue OCPBUGS-16267, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh
@liornoy: This pull request references Jira Issue OCPBUGS-16267, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
@liornoy: all tests passed!
/approve
/label backport-risk-assessed
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fedepaol, liornoy The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved
a09f95c into openshift:release-4.14
@liornoy: Jira Issue OCPBUGS-16267: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-16267 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.13
@liornoy: #155 failed to apply on top of branch "release-4.13":
In response to this:
This PR changes the behavior of the service reconciler
to fix the following bug:
- There is an LB service with a specific (annotated) IP assigned to it.
- There are also other LB services in the cluster in the "pending" state.
- MetalLB's controller restarts, and when it comes back up
  it loops over the services, sees the "pending" LB service first,
  and assigns it the IP that was assigned to the annotated service.
Here we make the reconciler ignore the services until the first
reprocessAll event, where we handle only the services that already
have an IP assigned to them.
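
The gating described above can be sketched as a small, self-contained Go model. This is an illustration under stated assumptions, not the controller code from this PR: `Reconciler`, `initialLoadDone`, `allocate`, and the simplified `Service` fields are hypothetical names, and the address pool uses the documentation range 192.0.2.0/24.

```go
package main

import "fmt"

// Service is a hypothetical, simplified LoadBalancer service.
type Service struct {
	Name        string
	RequestedIP string // IP requested via annotation; empty means "any"
	AssignedIP  string // IP already in the status; empty means pending
}

// Reconciler is a hypothetical model of the controller's allocation state.
type Reconciler struct {
	initialLoadDone bool              // false until the first ReprocessAll
	inUse           map[string]string // IP -> name of the owning service
	pool            []string
}

func NewReconciler(pool []string) *Reconciler {
	return &Reconciler{inUse: map[string]string{}, pool: pool}
}

// allocate gives svc its requested IP, or the first free IP from the pool.
func (r *Reconciler) allocate(svc *Service) bool {
	candidates := r.pool
	if svc.RequestedIP != "" {
		candidates = []string{svc.RequestedIP}
	}
	for _, ip := range candidates {
		if owner, taken := r.inUse[ip]; !taken || owner == svc.Name {
			r.inUse[ip] = svc.Name
			svc.AssignedIP = ip
			return true
		}
	}
	return false
}

// ReconcileService ignores every event until the first ReprocessAll has run.
func (r *Reconciler) ReconcileService(svc *Service) bool {
	if !r.initialLoadDone {
		return false
	}
	return r.allocate(svc)
}

// ReprocessAll records the services that already hold an IP and then opens
// the gate, so later events are handled normally.
func (r *Reconciler) ReprocessAll(svcs []*Service) {
	for _, s := range svcs {
		if s.AssignedIP != "" {
			r.inUse[s.AssignedIP] = s.Name
		}
	}
	r.initialLoadDone = true
}

func main() {
	r := NewReconciler([]string{"192.0.2.10", "192.0.2.11"})
	annotated := &Service{Name: "annotated", RequestedIP: "192.0.2.10", AssignedIP: "192.0.2.10"}
	pending := &Service{Name: "pending"}

	// After a restart the pending service happens to be seen first.
	fmt.Println("handled before reprocessAll:", r.ReconcileService(pending))
	r.ReprocessAll([]*Service{pending, annotated})
	r.ReconcileService(pending)
	fmt.Println("annotated:", annotated.AssignedIP, "pending:", pending.AssignedIP)
}
```

In this model the pending service cannot take 192.0.2.10 even though it is seen first, because nothing is allocated until the existing assignment has been recorded.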