New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set mac binding age threshold in gateway routers #3678
Conversation
0e7bd02
to
c8f1c6b
Compare
/retest-required |
85b6ef4
to
9225e4d
Compare
Saving this flake for later
https://github.com/ovn-org/ovn-kubernetes/actions/runs/5834359113/job/15825042191?pr=3678 |
So not a flake after all, consistently failing. The test that fails checks that two pods on different zones can talk to each other, then brings one pod to the same zone as the other and does the same check and then takes the former pod to its original zone and checks again. The last check fails. It looks like the pod's node never becomes ready after that last change. ovnkube-node fails finding the output flow for the node's management port. Checking the NBDB, that port is present but not up. It's row is missing from the SBDB Port_Binding table (!!). The port is there in OVS, with ovn-installed=true. Compare ovn-controller logs before the change
to after
One of the things I am thinking about is if the way we were previously updating the mac bindings in the SBDB directly (instead of NBDB that this PR does) forces a different kind of recompute in ovn-controller. |
@almusil FYI |
@numansiddique helped me here and advised me to delete the static mac bindings on gateway cleanup, something that we were not doing before. On one side, OVN-K should be doing this. On the other side it looks like OVN should be handling this better so perhaps file a bug against OVN as well. |
9225e4d
to
6ec6a5a
Compare
@JacobTanenbaum I added some logic here to clean the static mac bindings that we do for HO on the NBDB cleanup that was already in place. However, this cleanup only seems to happen on node deletion but not on a node update when the node is missing the appropriate labels or when HO has been disabled. PTAL. |
78ecf96
to
f161c2d
Compare
testing downstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall i think this change makes sense - it would make more sense if we can wait for ales's fixes to land so that we don't start to suddenly remove bindings after 5mins even if they are being used? I think 23.09 is only a few weeks away right?
return item.IP == types.V4NodeLocalNATSubnetNextHop || item.IP == types.V6NodeLocalNATSubnetNextHop | ||
} | ||
err := libovsdbops.DeleteMacBindingWithPredicate(oc.sbClient, p) | ||
err := libovsdbutil.DeleteSbdbMacBindingsWithIPs(oc.sbClient, types.V4NodeLocalNATSubnetNextHop, types.V6NodeLocalNATSubnetNextHop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too bad we don't have versioning upstream, We stopped doing cleanupDGP
a long time ago, if we can just get rid of this block of code, we probably don't even want to maintain this utility DeleteSbdbMacBindingsWithIPs
specially if this is the only spot we call it from.
Having said that, I guess it would just be a no-op almost always since topology version won't match it since its a one-time-only operation.
@dcbw @trozet : do you think we still have users using the old gateway mode code? we should probably just get rid of this dead code at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we should just remove this code now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that the cleanup is version dependent (not done unconditionally)
// If version is less than Host -> Service with OpenFlow, we need to remove and cleanup DGP
if err == nil && ((ver < types.OvnHostToSvcOFTopoVersion && config.Gateway.Mode == config.GatewayModeShared) ||
(ver < types.OvnRoutingViaHostTopoVersion)) {
err = oc.cleanupDGP(existingNodes)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was due to upgrade constraints at the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can remove in another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?
Also does this mean moving forward OVN handles the mac binding logic totally? -> So as of today, we go with the timestamps and forcefully prune all bindings older than 300s whether they are active or not, and once ales's fixes land, in 23.09, we stop flushing stupidly and instead use the idle age logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?
The aging mechanism will get rid of the old mac bindings. Before that, the new mac bindings will overwrite the old mac bindings which are, in any case, the same bindings anyway.
Also does this mean moving forward OVN handles the mac binding logic totally? -> So as of today, we go with the timestamps and forcefully prune all bindings older than 300s whether they are active or not, and once ales's fixes land, in 23.09, we stop flushing stupidly and instead use the idle age logic?
yes, but I don't know if that's available in 23.09 already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?
The aging mechanism will get rid of the old mac bindings. Before that, the new mac bindings will overwrite the old mac bindings which are, in any case, the same bindings anyway.
ah I see so:
when you create the new stuff in nbdb northd will update the stuff a.k.a overwrite it in SBDB.
and after 5mins they will get flushed anyways and that means SBDB will also have the same flush. ok gotcha makes sense. (resolve)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you had written this information in the commit message it would make it more obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add it to the commit message.
|
||
// DeleteDummyGWMacBindings removes mac bindings (ipv4 and ipv6) for a fake next hops | ||
// used by host->service traffic | ||
func DeleteDummyGWMacBindings(nbClient libovsdbclient.Client, nodeName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this is only needed for multiple nodes per zone case right? usually if the node is gone, then GR is also gone and we would think this logicalPort is gone as well along with the binding but maybe not because we (OVNK) manually create the dummy binding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
northd is not cleaning up the mac binding for us when the logical port is gone, this might be a northd bug though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dceara @numansiddique is this a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The schema for NB. (static) mac bindings is very relaxed:
"Static_MAC_Binding": {
"columns": {
"logical_port": {"type": "string"},
"ip": {"type": "string"},
"mac": {"type": "string"},
"override_dynamic_mac": {"type": "boolean"}},
"indexes": [["logical_port", "ip"]],
"isRoot": true},
The schema for the SB mac binding table is very similar. The only difference is that ovn-northd doesn't expect anyone else to write to the SB directly (ovn-kube did but that's not really the right way to interact with OVN) so ovn-northd is in charge of creating/deleting SB.MAC_Bindings.
For the NB, it's a bit different, ovn-northd didn't create the NB.Static_MAC_Binding record, it's not clear whether it should clean it or not.
There's always the possibility that the CMS/user/admin/etc creates the Static_MAC_Binding before creating the LRP. The schema allows that. If northd were to automatically delete all "stale" Static_MAC_Bindings, this use case would break.
TL/DR: I don't really think this is a bug. NB records are usually not owned by ovn-northd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the NB records are usually not owned by ovn-northd, we would have a different behavior with the referential integrity that could have happened if logical_port
was a strong reference and Static_MAC_Binding
was a non root table.
I am not sure if there is a real use case behind the current schema design or if this has been perhaps an oversight when copying the SBDB table definition to NBDB which might have more implications.
One of those implications, and probably the real bug, is that northd fails to reconcile if logical_port
does not exist which probably should be handled more gracefully with the current schema design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not OK to add the strong reference now. We'd potentially break users. I suspect there are users. This feature was developed upstream and on a community member request: ovn-org/ovn@b22684d
One of those implications, and probably the real bug, is that northd fails to reconcile if
logical_port
does not exist which probably should be handled more gracefully with the current schema design.
We cannot assume that the NB entry should be removed. With the current schema the user is allowed to insert NB.Static_MAC_Binding and NB.Logical_Router_Port in any order they wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not OK to add the strong reference now. We'd potentially break users. I suspect there are users. This feature was developed upstream and on a community member request: ovn-org/ovn@b22684d
Sure, I was not asking for it to be changed, just comparing the functionality we would have if that had been designed in that way form the beginning.
We cannot assume that the NB entry should be removed. With the current schema the user is allowed to insert NB.Static_MAC_Binding and NB.Logical_Router_Port in any order they wish.
Perhaps not remove it but ignore it? Let me explain better.
Let's say we have a static mac binding for a port on a router and everything is ok. Let's say then I delete the router but I don't delete the static mac binding. This is a perfectly valid NBDB state as per schema but northd fails to reconcile because the resulting SBDB has a mac binding pointing to the router SB datapath
{"details":"cannot delete Datapath_Binding row f60f2cf6-2ee7-4962-9579-c463817c18fc because of 1 remaining reference(s)","error":"referential integrity violation"}
You have more information than I do. How do you know this is already available in 23.09? In regards to waiting, we already discussed renewing the entries every 5 mins shouldn't be a major problem and wouldn't be a functional difference. If there is an unexpected problem, this can give us an early signal. But I don't have a problem with waiting if we think that is better. |
I see that the mailing list mentions 23.09 but I read that as a plan not as a fact. |
@almusil said its in 23.09
I think the early signal makes sense :) won't stop this PR from merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Not in yet, but planned to be there https://patchwork.ozlabs.org/project/ovn/list/?series=366554 I asked and it doesn't require any specific configuration other than what is being already set in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see the commit messages updated with proper commit body and explanation of what the commit is doing.
return item.IP == types.V4NodeLocalNATSubnetNextHop || item.IP == types.V6NodeLocalNATSubnetNextHop | ||
} | ||
err := libovsdbops.DeleteMacBindingWithPredicate(oc.sbClient, p) | ||
err := libovsdbutil.DeleteSbdbMacBindingsWithIPs(oc.sbClient, types.V4NodeLocalNATSubnetNextHop, types.V6NodeLocalNATSubnetNextHop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes we should just remove this code now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you had written this information in the commit message it would make it more obvious.
|
||
// DeleteDummyGWMacBindings removes mac bindings (ipv4 and ipv6) for a fake next hops | ||
// used by host->service traffic | ||
func DeleteDummyGWMacBindings(nbClient libovsdbclient.Client, nodeName string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dceara @numansiddique is this a bug?
go-controller/pkg/types/const.go
Outdated
// GRMacBindingAgeThreshold is the lifetime in seconds of each MAC binding | ||
// entry for the gateway routers. After this time, the entry is removed and | ||
// may be refreshed with a new ARP request. | ||
GRMacBindingAgeThreshold = "300" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be MAC
Delete using the native client API instead of model client API to avoid having to monitor & cache the SBDB mac binding table. As opposed to the model client API, which would result in noop if there is no cache hit, the native client API will successfully build a delete OP regardless of whether there is a cache hit or not. The operation is idempotent and there will be no error if there is nothing to delete. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Create static bindings in the NBDB Static_Mac_Binding table instead of the SBDB Mac_Binding table. northd will then add those static entries to the SBDB Mac_Binding table, overwriting the ones added there previously by us pre-upgrade. Clean up all code related to handling those entries in the SBDB Mac_Binding table. The only remaining operation is old DGP entries clean up for which the native client API is being used since the previous commit, so we don't need to monitor & cache the table. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Not just becasue we should, but also because northd fails to reconcile if there are leftovers in the NBDB Static_Mac_Binding table referring to logical ports that don't existr, something that did not happen when we were using the SBDB Mac_Binding table. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Not just becasue we should, but also because northd fails to reconcile if there are leftovers in the NBDB Static_Mac_Binding table referring to logical ports that don't existr, something that did not happen when we were using the SBDB Mac_Binding table. Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
On scenarios where the cluster is shut down and started up after some time later, the dynamic mac binding entries in the SBDB might be stale. If any endpoint changed MAC address and sent a GARP, OVN has no chance of processing it if it is shut down. Set mac_binding_age_threshold to 300 seconds. Most common values used in other implementations range from 60s to 1200s being 300s the most common after a google search. Unfortunately the mac binding age of an entry is not refreshed on any network event, so entries will be removed from the mac binding table after 300 seconds unconditionaly. There is an initiative to improve this behavior in OVN [1] [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-May/052475.html Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
f161c2d
to
14e1830
Compare
Added commit msgs and reworded some comments to hopefully make it easier to understand |
Fix included in accepted release 4.15.0-0.nightly-2023-10-24-230302 |
On scenarios where the cluster is shut down and started up after some time later, the dynamic mac binding entries in the SBDB might be stale. If any endpoint changed MAC address and sent a GARP, OVN has no chance of processing it if it is shut down.
Set mac_binding_age_threshold to 300 seconds. Most common values used in other implementations range from 60s to 1200s being 300s the most common after a google search.
Unfortunately the mac binding age of an entry is not refreshed on any network event, so entries will be removed from the mac binding table after 300 seconds unconditionaly. There is an initiative to improve this behavior in OVN [1]
This requires also to set mac bindings int the Static_Mac_Binding table instead of the Mac_Binding table so that they are not wiped by the aging mechanism. We also need to clean them up appropriately, otherwise northd may fail to reconcile if they point to a LRP that no longer exists.
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-May/052475.html