[release-4.10] Bug 2113861: reconcile-node-lbs #1227
Conversation
This commit adds a new unit test showing that node load balancers are not removed upon node removal. If the node is re-added (as part of any remediation procedure), the newly created node logical switches will *not* point to the existing load balancers, since the ovn-k master ensures the load balancers are not re-created. The existing node logical switches are only mutated to point at these load balancers upon load balancer creation.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
(cherry picked from commit 12d2b26)
(cherry picked from commit 92a1fa0)
A node load balancer is a load balancer associated with any of the following services:
- service with NodePort set
- service with host-network endpoints
- service with ExternalTrafficPolicy=Local
- service with InternalTrafficPolicy=Local

This commit forces the reconciliation of the load balancers upon node deletion. Since the node was deleted, the newly generated list of load balancers will *not* feature the deleted node's load balancers, which will lead to their deletion.

Leaking these load balancers causes an issue when the deleted node is re-added (as part of a remediation procedure, for instance), since the newly created node logical switch does not point to these node load balancers, thus breaking connectivity of pods on the node to the services described in the list above.

This commit fixes bug [0].

[0] - https://bugzilla.redhat.com/show_bug.cgi?id=2068910

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
(cherry picked from commit bbe7133)
(cherry picked from commit 85291b5)
It is next to impossible to see the differences between expected and actual results without the full structures being printed. This commit adds a function that temporarily disables the maximum length of the diff, and returns a function that, when invoked, restores the original value.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
(cherry picked from commit 7309fc4)
(cherry picked from commit f881bfa)
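The helper this commit describes can be sketched as follows. This is an illustration, not the actual ovn-kubernetes code: `maxDiffLength` here is a hypothetical stand-in for the real diff-length knob (in gomega that would be the `format.MaxLength` package variable), kept local so the sketch is self-contained.

```go
package main

import "fmt"

// maxDiffLength is a hypothetical stand-in for gomega's format.MaxLength,
// which truncates long diffs between expected and actual values.
var maxDiffLength = 4000

// disableDiffLengthLimit removes the truncation limit and returns a
// function that restores the previous value when invoked.
func disableDiffLengthLimit() func() {
	old := maxDiffLength
	maxDiffLength = 0 // 0 disables truncation
	return func() { maxDiffLength = old }
}

func main() {
	restore := disableDiffLengthLimit()
	defer restore() // put the original limit back when the test finishes

	fmt.Println(maxDiffLength) // prints 0 while the limit is disabled
}
```

The restore function pairs naturally with `defer` in a test, so the global limit cannot leak into other tests even if the assertion panics.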
The test that asserts the expected NBDB state previously matched while ignoring UUIDs, because the ovsdb test framework does not provide any sort of weak-reference cleanup; as such, it cannot remove the references to load balancers that were deleted from logical switches / routers.

In order to make the test assert the NBDB state *with* UUIDs, we need to patch up the UUIDs in the expected state, since the test framework is unable to correlate the name of a load balancer with its expected UUID once the load balancer is deleted. This occurs because the production code only dissociates load balancers from the node's logical switches / GW routers for load balancers that are **not** going to be deleted.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
(cherry picked from commit d3107e4)
(cherry picked from commit ba01630)
@maiqueb: This pull request references Bugzilla bug 2113861, which is invalid:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test ci/prow/images
@maiqueb: The specified target(s) for
The following commands are available to trigger optional jobs:
/test images
/bugzilla refresh
@maiqueb: This pull request references Bugzilla bug 2113861, which is invalid:
/bugzilla refresh
@maiqueb: An error was encountered querying GitHub for users with public email (anusaxen@redhat.com) for bug 2113861 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected; please see the full error message for details.
Post "http://ghproxy/graphql": dial tcp 172.30.229.2:80: i/o timeout
Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.
@maiqueb: No Jira issue with key OCPBUGS-2113861 exists in the tracker at https://issues.redhat.com/.
@maiqueb: No Bugzilla bug is referenced in the title of this pull request.
@maiqueb: No Jira issue with key OCPBUGS-46843 exists in the tracker at https://issues.redhat.com/.
@maiqueb: No Bugzilla bug is referenced in the title of this pull request.
/rename [release-4.10] Bug 2113861: reconcile-node-lbs
@maiqueb: This pull request references Bugzilla bug 2113861, which is valid. 6 validation(s) were run on this bug.
Requesting review from QA contact:
/retest-required
/test e2e-azure-ovn |
/test e2e-openstack-ovn |
@maiqueb: The following tests failed, say
Full PR test history. Your PR dashboard.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jcaamano, maiqueb. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/label backport-risk-assessed
/label cherry-pick-approved
@maiqueb: All pull requests linked via external trackers have merged: Bugzilla bug 2113861 has been moved to the MODIFIED state.
- What this PR does and why it is needed
services controller: reconcile on node deletion
A node load balancer is a load balancer associated with any of the following
services:
- service with NodePort set
- service with host-network endpoints
- service with ExternalTrafficPolicy=Local
- service with InternalTrafficPolicy=Local
This commit forces the reconciliation of the load balancers upon node
deletion. Since the node was deleted, the newly generated list of load
balancers will not feature the deleted node's load balancers, which will
lead to their deletion.
Leaking these load balancers causes an issue when the deleted node is re-added
(as part of a remediation procedure, for instance) since the newly created node
logical switch does not point to these node load balancers, thus breaking
connectivity of pods on the node to the services described in the list above.
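The behaviour described above can be sketched as a miniature reconciliation step: rebuild the desired set of node load balancers from the current node list, then delete any existing load balancer that is no longer desired. All names here (`loadBalancersFor`, `reconcile`, the `lb-<node>` naming) are hypothetical illustrations, not the actual ovn-kubernetes services controller API.

```go
package main

import "fmt"

// loadBalancersFor builds the desired set of node load balancers from the
// current node list (hypothetical stand-in for the real builder).
func loadBalancersFor(nodes []string) map[string]bool {
	desired := map[string]bool{}
	for _, n := range nodes {
		desired["lb-"+n] = true
	}
	return desired
}

// reconcile removes every existing load balancer that is absent from the
// newly generated desired set, mirroring what forcing reconciliation on
// node deletion achieves: the deleted node's LBs are no longer leaked.
func reconcile(existing map[string]bool, nodes []string) []string {
	desired := loadBalancersFor(nodes)
	var deleted []string
	for lb := range existing {
		if !desired[lb] {
			deleted = append(deleted, lb)
			delete(existing, lb)
		}
	}
	return deleted
}

func main() {
	existing := map[string]bool{"lb-node-a": true, "lb-node-b": true}
	// node-b was deleted from the cluster; re-running reconciliation
	// removes its load balancer instead of leaking it.
	deleted := reconcile(existing, []string{"node-a"})
	fmt.Println(deleted) // prints [lb-node-b]
}
```

Because reconciliation is driven purely by the current node list, a re-added node gets freshly created load balancers, and its new logical switch is wired to them at creation time rather than pointing at stale, leaked entries.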
- Special notes for reviewers
Clean pick of the commits included in the ab7f0e636 merge commit to the master branch.
- How to verify it
- Description for the changelog
Force service reconciliation whenever a node is deleted.