New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release-4.10] Bug 2105657: Fix egressIP object deletion if the node is deleted first #1228
[release-4.10] Bug 2105657: Fix egressIP object deletion if the node is deleted first #1228
Conversation
@tssurya: This pull request references Bugzilla bug 2105657, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 8 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
sorry to have you had to deal with all the conflicts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing some changes from commit Fix egressIP object deletion if the node is deleted first
damn what's with the rebase... yikes.. |
Currently gatewayCleanup doesn't delete rtoj-* ports which is fine since it deletes the router and probably on the OVN side the ports on the router are also removed as they are dependent on the router. However from a testing perspective its nice to have the gatewayCleanup remove these ports explictly just like multiJoinSwitchGatewayCleanup does. Corresponding gatewayCleanup unit tests are also updated. We also add a unit test to egressip testing where we test reassignment upon egress node deletion, specifically where the node has already gotten deleted by the node watcher. This test uncovers a bug we have in egressIP code where the reassignment will not be successful and will fail with the error E0608 09:42:53.781497 971577 obj_retry.go:1513] Failed to delete resource object node1 of type *factory.egressNode, error: Re-assignment for EgressIP: egressip failed, unable to update object, err: unable to retrieve gateway IP for node: node1, protocol is IPv6: false, err: attempt at finding node gateway router network information failed, err: unable to find router port rtoj-GR_node1: object not found Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 9d0aebe) (cherry picked from commit 2100d49) Conflicts: go-controller/pkg/ovn/gateway_cleanup.go Conflict because libovsdbops.DeleteLogicalRouterPorts(oc.nbClient, &logicalRouter, &logicalRouterPort) doesn't exist in 4.10 since we are missing openshift@8b14898#diff-5d62dde13a2856ea3d4b324968dbaeed261fcb74cb3b81909edf40a6634fa1f4R161 {libovsdb-refactor-cleanup-4.11}
…dicate Partial pick of openshift@4e3c26a#diff-5d62dde13a2856ea3d4b324968dbaeed261fcb74cb3b81909edf40a6634fa1f4 from 4e3c26a. Had to bring in the code required to do DeleteNextHopFromLogicalRouterPoliciesWithPredicate Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
/assign @huiran0826 |
/lgtm |
There are a few instances in egressIP code where we shouldn't fail hard if objects are not found, example if a node gets deleted before egressIP watcher takes action. In this case, we can log a warning saying objects couldn't be deleted and move on since its technically not an error if the object was deleted by some other code path. In oder to ensure we don't leave behind stale reroute policy entries, we also add the logic of removing policies that have nexthop == joinIP of the node that's getting deleted into the gateway cleanup code. Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com> (cherry picked from commit 95ab759) (cherry picked from commit 95486d9) Conflicts: go-controller/pkg/ovn/egressip.go go-controller/pkg/util/net.go Conflicts because EIP retry code is absent in 4.10 Note: Brought in the chnages done in retry code for getGatewayRouterJoinIP since they made more sense that way.
/retest |
@tssurya: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: flavio-fernandes, jcaamano, tssurya The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved |
@tssurya: All pull requests linked via external trackers have merged: Bugzilla bug 2105657 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Brings in #1143 from 4.11 into 4.10. Conflicts are outlined in each PR. Extra commit (partially, needed parts) picked up from the libovsdb cleanup code to facilitate the backport: 4e3c26a
/assign @jcaamano