Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set mac binding age threshold in gateway routers #3678

Merged
merged 5 commits into from Oct 17, 2023

Conversation

jcaamano
Copy link
Contributor

@jcaamano jcaamano commented Jun 16, 2023

On scenarios where the cluster is shut down and started up after some time later, the dynamic mac binding entries in the SBDB might be stale. If any endpoint changed MAC address and sent a GARP, OVN has no chance of processing it if it is shut down.

Set mac_binding_age_threshold to 300 seconds. Most common values used in other implementations range from 60s to 1200s being 300s the most common after a google search.

Unfortunately the mac binding age of an entry is not refreshed on any network event, so entries will be removed from the mac binding table after 300 seconds unconditionaly. There is an initiative to improve this behavior in OVN [1]

This requires also to set mac bindings int the Static_Mac_Binding table instead of the Mac_Binding table so that they are not wiped by the aging mechanism. We also need to clean them up appropriately, otherwise northd may fail to reconcile if they point to a LRP that no longer exists.

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-May/052475.html

@jcaamano
Copy link
Contributor Author

@dcbw @trozet PTAL

@jcaamano jcaamano force-pushed the OCPBUGS-11710 branch 2 times, most recently from 0e7bd02 to c8f1c6b Compare June 21, 2023 10:24
@jcaamano
Copy link
Contributor Author

/retest-required

@jcaamano
Copy link
Contributor Author

Saving this flake for later

[Fail] Multi node zones interconnect [It] Pod interconnectivity 
/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/multi_node_zones_interconnect.go:83

[Fail] Multi node zones interconnect [It] Pod interconnectivity 
/home/runner/go/pkg/mod/k8s.io/kubernetes@v1.24.0/test/e2e/framework/pods.go:107

https://github.com/ovn-org/ovn-kubernetes/actions/runs/5834359113/job/15825042191?pr=3678

@jcaamano
Copy link
Contributor Author

jcaamano commented Aug 14, 2023

Saving this flake for later

[Fail] Multi node zones interconnect [It] Pod interconnectivity 
/home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/multi_node_zones_interconnect.go:83

[Fail] Multi node zones interconnect [It] Pod interconnectivity 
/home/runner/go/pkg/mod/k8s.io/kubernetes@v1.24.0/test/e2e/framework/pods.go:107

https://github.com/ovn-org/ovn-kubernetes/actions/runs/5834359113/job/15825042191?pr=3678

So not a flake after all, consistently failing.

The test that fails checks that two pods on different zones can talk to each other, then brings one pod to the same zone as the other and does the same check and then takes the former pod to its original zone and checks again. The last check fails.

It looks like the pod's node never becomes ready after that last change. ovnkube-node fails finding the output flow for the node's management port. Checking the NBDB, that port is present but not up. It's row is missing from the SBDB Port_Binding table (!!). The port is there in OVS, with ovn-installed=true.

Compare ovn-controller logs before the change

2023-08-14T14:18:15.620Z|00018|main|INFO|OVS feature set changed, force recompute.
2023-08-14T14:18:16.784Z|00019|reconnect|INFO|tcp:172.18.0.4:6642: connecting...
2023-08-14T14:18:16.793Z|00020|main|INFO|OVNSB commit failed, force recompute next time.
2023-08-14T14:18:16.793Z|00021|reconnect|INFO|tcp:172.18.0.4:6642: connected
2023-08-14T14:18:17.454Z|00022|binding|INFO|Claiming lport cr-rtos-ovn-worker3 for this chassis.
2023-08-14T14:18:17.454Z|00023|binding|INFO|cr-rtos-ovn-worker3: Claiming 0a:58:0a:f4:02:01 10.244.2.1/24
2023-08-14T14:18:17.514Z|00024|binding|INFO|Claiming lport k8s-ovn-worker3 for this chassis.
2023-08-14T14:18:17.514Z|00025|binding|INFO|k8s-ovn-worker3: Claiming 72:ec:6a:44:e0:e6 10.244.2.2
2023-08-14T14:18:17.515Z|00026|binding|INFO|Claiming lport jtor-GR_ovn-worker3 for this chassis.
2023-08-14T14:18:17.515Z|00027|binding|INFO|jtor-GR_ovn-worker3: Claiming router
2023-08-14T14:18:17.515Z|00028|binding|INFO|Claiming lport rtoe-GR_ovn-worker3 for this chassis.
2023-08-14T14:18:17.515Z|00029|binding|INFO|rtoe-GR_ovn-worker3: Claiming 02:42:ac:12:00:05 172.18.0.5/16
2023-08-14T14:18:17.515Z|00030|binding|INFO|Claiming lport rtoj-GR_ovn-worker3 for this chassis.
2023-08-14T14:18:17.515Z|00031|binding|INFO|rtoj-GR_ovn-worker3: Claiming 0a:58:64:40:00:04 100.64.0.4/16
2023-08-14T14:18:17.515Z|00032|binding|INFO|Claiming lport etor-GR_ovn-worker3 for this chassis.
2023-08-14T14:18:17.515Z|00033|binding|INFO|etor-GR_ovn-worker3: Claiming 02:42:ac:12:00:05
2023-08-14T14:18:17.540Z|00034|binding|INFO|Removing lport k8s-ovn-worker3 ovn-installed in OVS
2023-08-14T14:18:17.700Z|00035|binding|INFO|Claiming lport multi-node-zones-5452_client-pod for this chassis.
2023-08-14T14:18:17.701Z|00036|binding|INFO|multi-node-zones-5452_client-pod: Claiming 0a:58:0a:f4:02:03 10.244.2.3
2023-08-14T14:18:17.707Z|00037|binding|INFO|Removing lport multi-node-zones-5452_client-pod ovn-installed in OVS
2023-08-14T14:18:17.709Z|00038|binding|INFO|Setting lport k8s-ovn-worker3 up in Southbound
2023-08-14T14:18:17.710Z|00039|binding|INFO|Setting lport k8s-ovn-worker3 ovn-installed in OVS
2023-08-14T14:18:17.710Z|00040|binding|INFO|Setting lport multi-node-zones-5452_client-pod ovn-installed in OVS
2023-08-14T14:18:17.710Z|00041|binding|INFO|Setting lport multi-node-zones-5452_client-pod up in Southbound
2023-08-14T14:18:25.632Z|00042|memory|INFO|14736 kB peak resident set size after 10.0 seconds
2023-08-14T14:18:25.632Z|00043|memory|INFO|idl-cells-OVN_Southbound:7207 idl-cells-Open_vSwitch:635 if_status_mgr_ifaces_state_usage-KB:1 if_status_mgr_ifaces_usage-KB:1 lflow-cache-entries-cache-expr:383 lflow-cache-entries-cache-matches:577 lflow-cache-size-KB:1452 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:291 ofctrl_installed_flow_usage-KB:216 ofctrl_sb_flow_ref_usage-KB:116
2023-08-14T14:18:47.708Z|00044|memory_trim|INFO|Detected inactivity (last active 30002 ms ago): trimming memory

to after

2023-08-14T12:55:27.831Z|00018|main|INFO|OVS feature set changed, force recompute.
2023-08-14T12:55:28.960Z|00019|reconnect|INFO|tcp:172.18.0.2:6642: connecting...
2023-08-14T12:55:28.968Z|00020|main|INFO|OVNSB commit failed, force recompute next time.
2023-08-14T12:55:28.968Z|00021|reconnect|INFO|tcp:172.18.0.2:6642: connected
2023-08-14T12:55:28.988Z|00022|binding|INFO|Claiming lport etor-GR_ovn-worker3 for this chassis.
2023-08-14T12:55:28.988Z|00023|binding|INFO|etor-GR_ovn-worker3: Claiming 02:42:ac:12:00:03
2023-08-14T12:55:28.988Z|00024|binding|INFO|Claiming lport rtoe-GR_ovn-worker3 for this chassis.
2023-08-14T12:55:28.988Z|00025|binding|INFO|rtoe-GR_ovn-worker3: Claiming 02:42:ac:12:00:03 172.18.0.3/16
2023-08-14T12:55:37.838Z|00026|memory|INFO|14192 kB peak resident set size after 10.0 seconds
2023-08-14T12:55:37.838Z|00027|memory|INFO|idl-cells-OVN_Southbound:6206 idl-cells-Open_vSwitch:635 lflow-cache-entries-cache-expr:237 lflow-cache-entries-cache-matches:438 lflow-cache-size-KB:889 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:68 ofctrl_installed_flow_usage-KB:50 ofctrl_sb_flow_ref_usage-KB:33
2023-08-14T12:55:58.993Z|00028|memory_trim|INFO|Detected inactivity (last active 30000 ms ago): trimming memory

One of the things I am thinking about is if the way we were previously updating the mac bindings in the SBDB directly (instead of NBDB that this PR does) forces a different kind of recompute in ovn-controller.

@numansiddique
Copy link
Contributor

@almusil FYI

@jcaamano
Copy link
Contributor Author

@numansiddique helped me here and advised me to delete the static mac bindings on gateway cleanup, something that we were not doing before. On one side, OVN-K should be doing this. On the other side it looks like OVN should be handling this better so perhaps file a bug against OVN as well.

@jcaamano
Copy link
Contributor Author

@JacobTanenbaum I added some logic here to clean the static mac bindings that we do for HO on the NBDB cleanup that was already in place. However, this cleanup only seems to happen on node deletion but not on a node update when the node is missing the appropriate labels or when HO has been disabled. PTAL.

@jcaamano jcaamano force-pushed the OCPBUGS-11710 branch 2 times, most recently from 78ecf96 to f161c2d Compare August 16, 2023 20:18
@jcaamano
Copy link
Contributor Author

testing downstream
openshift/ovn-kubernetes#1823

Copy link
Member

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall i think this change makes sense - it would make more sense if we can wait for ales's fixes to land so that we don't start to suddenly remove bindings after 5mins even if they are being used? I think 23.09 is only a few weeks away right?

return item.IP == types.V4NodeLocalNATSubnetNextHop || item.IP == types.V6NodeLocalNATSubnetNextHop
}
err := libovsdbops.DeleteMacBindingWithPredicate(oc.sbClient, p)
err := libovsdbutil.DeleteSbdbMacBindingsWithIPs(oc.sbClient, types.V4NodeLocalNATSubnetNextHop, types.V6NodeLocalNATSubnetNextHop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too bad we don't have versioning upstream, We stopped doing cleanupDGP a long time ago, if we can just get rid of this block of code, we probably don't even want to maintain this utility DeleteSbdbMacBindingsWithIPs specially if this is the only spot we call it from.
Having said that, I guess it would just be a no-op almost always since topology version won't match it since its a one-time-only operation.

@dcbw @trozet : do you think we still have users using the old gateway mode code? we should probably just get rid of this dead code at some point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we should just remove this code now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do this in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that the cleanup is version dependent (not done unconditionally)

	// If version is less than Host -> Service with OpenFlow, we need to remove and cleanup DGP
	if err == nil && ((ver < types.OvnHostToSvcOFTopoVersion && config.Gateway.Mode == config.GatewayModeShared) ||
		(ver < types.OvnRoutingViaHostTopoVersion)) {
		err = oc.cleanupDGP(existingNodes)
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was due to upgrade constraints at the time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can remove in another PR

go-controller/pkg/libovsdb/util/mac_binding.go Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?

Also does this mean moving forward OVN handles the mac binding logic totally? -> So as of today, we go with the timestamps and forcefully prune all bindings older than 300s whether they are active or not, and once ales's fixes land, in 23.09, we stop flushing stupidly and instead use the idle age logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?

The aging mechanism will get rid of the old mac bindings. Before that, the new mac bindings will overwrite the old mac bindings which are, in any case, the same bindings anyway.

Also does this mean moving forward OVN handles the mac binding logic totally? -> So as of today, we go with the timestamps and forcefully prune all bindings older than 300s whether they are active or not, and once ales's fixes land, in 23.09, we stop flushing stupidly and instead use the idle age logic?

yes, but I don't know if that's available in 23.09 already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so from this commit onwards we move all bindings to NBDB's static binding table and stop directly creating stuff in SBDB? Is there a migration path for existing bindings to be wiped off during update from the old table? to me it seems like old bindings will continue to be there in sbdb's mac binding table and then new ones will start to be created in nbdb? Shouldn't we take care of upgrades?

The aging mechanism will get rid of the old mac bindings. Before that, the new mac bindings will overwrite the old mac bindings which are, in any case, the same bindings anyway.

ah I see so:
when you create the new stuff in nbdb northd will update the stuff a.k.a overwrite it in SBDB.
and after 5mins they will get flushed anyways and that means SBDB will also have the same flush. ok gotcha makes sense. (resolve)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you had written this information in the commit message it would make it more obvious.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it to the commit message.


// DeleteDummyGWMacBindings removes mac bindings (ipv4 and ipv6) for a fake next hops
// used by host->service traffic
func DeleteDummyGWMacBindings(nbClient libovsdbclient.Client, nodeName string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is only needed for multiple nodes per zone case right? usually if the node is gone, then GR is also gone and we would think this logicalPort is gone as well along with the binding but maybe not because we (OVNK) manually create the dummy binding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

northd is not cleaning up the mac binding for us when the logical port is gone, this might be a northd bug though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dceara @numansiddique is this a bug?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema for NB. (static) mac bindings is very relaxed:

        "Static_MAC_Binding": {
            "columns": {
                "logical_port": {"type": "string"},
                "ip": {"type": "string"},
                "mac": {"type": "string"},
                "override_dynamic_mac": {"type": "boolean"}},
            "indexes": [["logical_port", "ip"]],
             "isRoot": true},

The schema for the SB mac binding table is very similar. The only difference is that ovn-northd doesn't expect anyone else to write to the SB directly (ovn-kube did but that's not really the right way to interact with OVN) so ovn-northd is in charge of creating/deleting SB.MAC_Bindings.

For the NB, it's a bit different, ovn-northd didn't create the NB.Static_MAC_Binding record, it's not clear whether it should clean it or not.

There's always the possibility that the CMS/user/admin/etc creates the Static_MAC_Binding before creating the LRP. The schema allows that. If northd were to automatically delete all "stale" Static_MAC_Bindings, this use case would break.

TL/DR: I don't really think this is a bug. NB records are usually not owned by ovn-northd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dceara

While the NB records are usually not owned by ovn-northd, we would have a different behavior with the referential integrity that could have happened if logical_port was a strong reference and Static_MAC_Binding was a non root table.

I am not sure if there is a real use case behind the current schema design or if this has been perhaps an oversight when copying the SBDB table definition to NBDB which might have more implications.

One of those implications, and probably the real bug, is that northd fails to reconcile if logical_port does not exist which probably should be handled more gracefully with the current schema design.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not OK to add the strong reference now. We'd potentially break users. I suspect there are users. This feature was developed upstream and on a community member request: ovn-org/ovn@b22684d

One of those implications, and probably the real bug, is that northd fails to reconcile if logical_port does not exist which probably should be handled more gracefully with the current schema design.

We cannot assume that the NB entry should be removed. With the current schema the user is allowed to insert NB.Static_MAC_Binding and NB.Logical_Router_Port in any order they wish.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not OK to add the strong reference now. We'd potentially break users. I suspect there are users. This feature was developed upstream and on a community member request: ovn-org/ovn@b22684d

Sure, I was not asking for it to be changed, just comparing the functionality we would have if that had been designed in that way form the beginning.

We cannot assume that the NB entry should be removed. With the current schema the user is allowed to insert NB.Static_MAC_Binding and NB.Logical_Router_Port in any order they wish.

Perhaps not remove it but ignore it? Let me explain better.

Let's say we have a static mac binding for a port on a router and everything is ok. Let's say then I delete the router but I don't delete the static mac binding. This is a perfectly valid NBDB state as per schema but northd fails to reconcile because the resulting SBDB has a mac binding pointing to the router SB datapath

{"details":"cannot delete Datapath_Binding row f60f2cf6-2ee7-4962-9579-c463817c18fc because of 1 remaining reference(s)","error":"referential integrity violation"}

go-controller/pkg/ovn/master_test.go Outdated Show resolved Hide resolved
go-controller/pkg/ovn/hybrid.go Show resolved Hide resolved
@jcaamano
Copy link
Contributor Author

overall i think this change makes sense - it would make more sense if we can wait for ales's fixes to land so that we don't start to suddenly remove bindings after 5mins even if they are being used? I think 23.09 is only a few weeks away right?

You have more information than I do. How do you know this is already available in 23.09? In regards to waiting, we already discussed renewing the entries every 5 mins shouldn't be a major problem and wouldn't be a functional difference. If there is an unexpected problem, this can give us an early signal. But I don't have a problem with waiting if we think that is better.

@jcaamano
Copy link
Contributor Author

overall i think this change makes sense - it would make more sense if we can wait for ales's fixes to land so that we don't start to suddenly remove bindings after 5mins even if they are being used? I think 23.09 is only a few weeks away right?

You have more information than I do. How do you know this is already available in 23.09? In regards to waiting, we already discussed renewing the entries every 5 mins shouldn't be a major problem and wouldn't be a functional difference. If there is an unexpected problem, this can give us an early signal. But I don't have a problem with waiting if we think that is better.

I see that the mailing list mentions 23.09 but I read that as a plan not as a fact.

@tssurya
Copy link
Member

tssurya commented Aug 22, 2023

You have more information than I do. How do you know this is already available in 23.09?

@almusil said its in 23.09

In regards to waiting, we already discussed renewing the entries every 5 mins shouldn't be a major problem and wouldn't be a functional difference. If there is an unexpected problem, this can give us an early signal. But I don't have a problem with waiting if we think that is better.

I think the early signal makes sense :) won't stop this PR from merging.

Copy link
Member

@tssurya tssurya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jcaamano
Copy link
Contributor Author

@almusil said its in 23.09

Not in yet, but planned to be there

https://patchwork.ozlabs.org/project/ovn/list/?series=366554

I asked and it doesn't require any specific configuration other than what is being already set in this PR.

Copy link
Contributor

@trozet trozet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the commit messages updated with proper commit body and explanation of what the commit is doing.

go-controller/pkg/libovsdb/util/mac_binding.go Outdated Show resolved Hide resolved
return item.IP == types.V4NodeLocalNATSubnetNextHop || item.IP == types.V6NodeLocalNATSubnetNextHop
}
err := libovsdbops.DeleteMacBindingWithPredicate(oc.sbClient, p)
err := libovsdbutil.DeleteSbdbMacBindingsWithIPs(oc.sbClient, types.V4NodeLocalNATSubnetNextHop, types.V6NodeLocalNATSubnetNextHop)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes we should just remove this code now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you had written this information in the commit message it would make it more obvious.


// DeleteDummyGWMacBindings removes mac bindings (ipv4 and ipv6) for a fake next hops
// used by host->service traffic
func DeleteDummyGWMacBindings(nbClient libovsdbclient.Client, nodeName string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dceara @numansiddique is this a bug?

// GRMacBindingAgeThreshold is the lifetime in seconds of each MAC binding
// entry for the gateway routers. After this time, the entry is removed and
// may be refreshed with a new ARP request.
GRMacBindingAgeThreshold = "300"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be MAC

Delete using the native client API instead of model client API to avoid
having to monitor & cache the SBDB mac binding table. As opposed to the
model client API, which would result in noop if there is no cache hit,
the native client API will successfully build a delete OP regardless of
whether there is a cache hit or not. The operation is idempotent and
there will be no error if there is nothing to delete.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Create static bindings in the NBDB Static_Mac_Binding table instead of
the SBDB Mac_Binding table.

northd will then add those static entries to the SBDB Mac_Binding
table, overwriting the ones added there previously by us pre-upgrade.

Clean up all code related to handling those entries in the SBDB
Mac_Binding table. The only remaining operation is old DGP entries
clean up for which the native client API is being used since the
previous commit, so we don't need to monitor & cache the table.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Not just becasue we should, but also because northd fails to reconcile
if there are leftovers in the NBDB Static_Mac_Binding table referring to
logical ports that don't existr, something that did not happen when we
were using the SBDB Mac_Binding table.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
Not just becasue we should, but also because northd fails to reconcile
if there are leftovers in the NBDB Static_Mac_Binding table referring to
logical ports that don't existr, something that did not happen when we
were using the SBDB Mac_Binding table.

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
On scenarios where the cluster is shut down and started up after some
time later, the dynamic mac binding entries in the SBDB might be stale.
If any endpoint changed MAC address and sent a GARP, OVN has no chance
of processing it if it is shut down.

Set mac_binding_age_threshold to 300 seconds. Most common values used in
other implementations range from 60s to 1200s being 300s the most common
after a google search.

Unfortunately the mac binding age of an entry is not refreshed on any
network event, so entries will be removed from the mac binding table
after 300 seconds unconditionaly. There is an initiative to improve this
behavior in OVN [1]

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-May/052475.html

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
@jcaamano
Copy link
Contributor Author

Added commit msgs and reworded some comments to hopefully make it easier to understand

@coveralls
Copy link

Coverage Status

coverage: 50.322% (+0.02%) from 50.302% when pulling 14e1830 on jcaamano:OCPBUGS-11710 into 01f61af on ovn-org:master.

@trozet trozet merged commit 13ce1eb into ovn-org:master Oct 17, 2023
28 of 29 checks passed
@openshift-merge-robot
Copy link

Fix included in accepted release 4.15.0-0.nightly-2023-10-24-230302

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants