New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds port security for external OVS bridge for OVN MAC address #3984
Conversation
@wizhaoredhat not sure I did the DPU stuff right...I need to go double check it. But this affects DPU so tagging you. |
b145a7d
to
030d2d3
Compare
var err error | ||
if portsOutput, stderr, err = util.RunOVSVsctl("--no-heading", "--data=bare", "--format=csv", "--columns", | ||
"name", "list", "interface"); err != nil { | ||
// bridge exists, but could not list ports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove this comment
} | ||
|
||
if len(bridge) == 0 { | ||
// bridge exists but no patch port was found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for tagging me Tim, I think I reviewed the DPU specific part of your changes. Are your changes ready to test? Or is this still WIP?
} | ||
|
||
var bridgeMACAddress net.HardwareAddr | ||
if config.OvnKubeNode.Mode == types.NodeModeDPU { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to gateway.go:bridgeForInterface() this looks correct.
@wizhaoredhat yeah it is ready to test. thanks in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all the flows that match on in_port being the patch port, match on the source mac, just some of them.
Is there a specific reason for this? I understand that some of them might not strictly need it, but would following that simple rule work anyways for all the flows?
I ask this because I don't see an obvious reason and some other person in the future might not see it either and perhaps ends up copy pasting from the wrong flow and missing the mac address match.
Perhaps defining it like
// getOFPortPatchMatch returns a patch port match that also matches on mac address for port security
func getOFPortPatchMatch(ofPortPatch, bridgeMacAddress string) string {
ofPortPatchMatch := fmt.Sprintf("in_port=%s, dl_src=%s", ofPortPatch, bridgeMacAddress)
}
and then using it in all the flows like
dftFlows = append(dftFlows,
fmt.Sprintf("cookie=%s, priority=10, table=0, %s, actions=output:NORMAL",
defaultOpenFlowCookie, getOFPortPatchMatch(ofPortPatch, bridgeMacAddress)))
is a better way to document what is being done.
If I am missing something here, perhaps it is worthwhile to document it in some other way. Or maybe I am missing something obvious here?
LGTM otherwise.
@jcaamano my patch port here is 2. Here are the flows without the dl_src:
Flows with priority 500 are the host -> service flows. That traffic doesnt leave the host so I wasn't worried about it, and there is some funky behavior there with how MACs work so I didn't want to risk a regression. The other 2 flows are drop flows to ensure traffic with source ip of the service cidr or source ip of pod subnets not on this node are dropped. I figured it is better to leave those generic since we want to drop those there. Maybe I should add some large comment at the beginning of the flows function that states if you are adding flows that allow traffic to egress the node, ensure you match on mac address in table 0? |
What about these others: https://github.com/trozet/ovn-kubernetes/blob/030d2d336d54ca242e9efc3d777d6731d14036eb/go-controller/pkg/node/gateway_shared_intf.go#L213
OK. I have the feeling that it would be good to have some logic around the bridge flow tables that might make it less error prone. Like for example, table 0 is just port security, table 1 whatever and so on. Having basic checks distributed through different flows doesn't look good. |
Oops I missed these, they should also include the mac. I guess I dont like the idea of hiding the openflow behind a function because all of the other openflow is explicitly written. So we dont use other generator functions to fill out other parts of the flows. I thought at first about making a port security table, and shifting all the tables by 1, but I wanted to avoid it since folks are used to debugging the current pipeline. I think if i put a large warning comment in the openflow manager code for future developers to see it should be good enough. Is that OK with you? |
OK. We have gone through some options and if that's what you prefer to do I am fine with it. |
030d2d3
to
b629b17
Compare
Do you mean that the bond mac address can change while ovnk is running? Isn't that just when |
@jcaamano I think the short answer is yes, this is what I'm suggesting. I'm not sure if this fail_over_mac is the only time the bond could change macs. I was thinking if it was taken down then brought back up and linux picked the other slave to use as the bond mac...similar to the reboot scenario. Either way we can solve this by just using netlink similar to how we do for IP events, and just detect link down/up events for the bond and check if the mac address changed. I'll work on this. |
@jcaamano I added a commit that will handle detecting if the MAC changes on the bridge. I tested it in kind and it works. PTAL |
a4d21f1
to
2926983
Compare
There can be a case where the MAC address might change of the physical interface attached to OVS. One common way this can happen is with a bond. The MAC address of the bond may change depending on which slave comes up first. When this scenario happens, the host will now be using a new MAC, while the OVN GR has the old MAC. This can cause hosts outside of the cluster to constantly update their IP neighbor table with the wrong MAC for the node, and cause traffic outage. In order to prevent such a scenario, ensure that OVNK does not allow traffic sent by OVN GR that does not match the current MAC address. Signed-off-by: Tim Rozet <trozet@redhat.com>
In the scenario where an OVN GR may have the wrong mac, and traffic disruption is present, the OVNK node process may stall at trying to start the watch factory and wait for informer caches to sync. At this point in time, ovnkube node is not able to start its OpenFlow manager and program the needed flows to block the OVN GR from poisoning external ip neighbor entries. This commit adds a boostrap function which attempts to install basic flows before anything else is done on the node during boot time. This is a temporary flow installation that is only done when there is just a single NORMAL flow in the external bridge and is overriden once OF Manager starts. Note this intentionally does not write the bootstrap flows if OVNK has been killed, but the OVS flows from the previous run remain. It is only for cases where OVS has no previously programmed flows, like on node boot up. Signed-off-by: Tim Rozet <trozet@redhat.com>
While OVNK is running a user could modify system settings so that the MAC of the shared gw bridge may change. If this happens, OVNK needs to detect it and update the node annotation so that the GR can be reconfigured with the right MAC. This commit takes the link manager that was added for egress IP and abstracts it out of the egress IP controller. It also introduces passing a handler function to link manager which gets executed during normal runtime. This function will check to see if the MAC has changed on the shared gw bridge and if so, update the the l3 gateway annotation. Note, the link manager currently does not subscribe to netlink events for links, but instead runs periodically and lists all interfaces. Since changing a MAC address on the bridge is a disruptive action anyway, I figured it is OK to leave our detection on a 30 second interval. MAC changing on the bridge should be an extremely rare occurrence. However, in the future we could add netlink subscriber and receive events when the link changes. Signed-off-by: Tim Rozet <trozet@redhat.com>
2926983
to
1e53aea
Compare
@dcbw PTAL |
Changes-Include: - Move all ofm receiver functions to the proper file. - Ensure proper locking of the bridges. - Use proper encapsulation and expose functions in ofm to get bridge info. Signed-off-by: Tim Rozet <trozet@redhat.com>
Now link manager will react to events for links, rather than just running every so often. Signed-off-by: Tim Rozet <trozet@redhat.com>
4de98f9
to
5f2fef3
Compare
@dcbw @jcaamano fixed up the enapsulation, locking, and added a netlink subscribe for watching links. I tested it locally in kind by issuing 'ovs-vsctl set bridge breth0 other-config:hwaddr="00:15:17:a0:29:88"' on a node, then checking if it immediately updates the annotation, NBDB, and the openflow flows. Maybe I should add an E2E test for this as well in the node ip migration lane? wdyt? |
// Bootstrap flows in OVS if just normal flow is present | ||
if err := bootstrapOVSFlows(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that we do this before ovn-controller adds the patch port to br-ex/breth0. Perhaps worth a comment somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually im confused by your comment here. We would need ovn-controller to have the patch port created so that this could run properly. What am I missing?
case <-linkSyncTimer.C: | ||
if subscribed { | ||
klog.V(5).Info("Link manager calling sync() explicitly") | ||
c.sync() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious as why would we run sync
only if we are subscribed. It would make sense to run it if we can't subscribe first and fore most, wouldn't it?
Something like
if !subscribed {
if subscribed, addrChan, err = subscribe(); err == nil {
continue
}
klog.Errorf("Error during netlink re-subscribe for Link Manager: %v", err)
}
c.sync()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from address manager. I think the logic is like this because subscribe() also will call sync().
nc.gateway.SetDefaultGatewayBridgeMAC(link.Attrs().HardwareAddr) | ||
if err := nc.gateway.Reconcile(); err != nil { | ||
return fmt.Errorf("failed to reconcile gateway for MAC address update: %w", err) | ||
} | ||
nodeAnnotator := kube.NewNodeAnnotator(nc.Kube, node.Name) | ||
l3gwConf.MACAddress = link.Attrs().HardwareAddr | ||
if err := util.SetL3GatewayConfig(nodeAnnotator, l3gwConf); err != nil { | ||
return fmt.Errorf("failed to update L3 gateway config annotation for node: %s, error: %w", node.Name, err) | ||
} | ||
if err := nodeAnnotator.Run(); err != nil { | ||
return fmt.Errorf("failed to set node %s annotations: %w", nc.name, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do the node port watcher flows get updated? is that through the annotations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! I missed this. I'll try to figure out the easiest way to trigger this update.
I think that if
The test might miss enabling optional functionality that uses flows that need to be checked, but at least we would have this test as a starting point. |
Signed-off-by: Tim Rozet <trozet@redhat.com>
Added an additional test that will check to make sure the openflow flows for services are updated correctly when the IP address changes. Signed-off-by: Tim Rozet <trozet@redhat.com>
going to add another e2e test to check flows for mac changes tmrw. |
Signed-off-by: Tim Rozet <trozet@redhat.com>
// Services create OpenFlow flows as well, need to update them all | ||
if gw.servicesRetryFramework != nil { | ||
if errs := gw.addAllServices(); errs != nil { | ||
err := kerrors2.NewAggregate(errs) | ||
klog.Errorf("Failed to sync all services after node IP change: %v", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I need to be excused from being confused :P
This was other problem (not updating service flows because of IP changes) that we also had? Do we also need a test for this?
Would it make sense to simplify things further and just call gw.Reconcile()
and have everything needed there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right this is a pre-existing problem we had where service flows were not updated when node ip migration happend. @pliurh @zshi-redhat FYI
I updated the node ip migration test to check for this in this commit:
32490b3
I think it makes sense to consolidate everything to Reconcile. Could I do that in a follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the node ip migration test to check for this in this commit: 32490b3
oh I missed that one
Fix included in accepted release 4.16.0-0.nightly-2024-01-09-085011 |
If the MAC changes on the interface used in OVS and the OVS bridge, OVN may still retain the old MAC on its GR. This can cause OVN and the host to both respond with different MACs for the same IP address, leading to network disruption. A common scenario for this is a server with a bond. When the node is rebooted, the bond MAC may change depending on which slave comes up first. When this happens, ovnkube-node starts up and is unable to sync its informers and proceed to full start up, due to disruption with kapi.
Note, this can be avoided by setting the MAC specifically for the bond to always be the mac of the first slave.
This PR adds port security in a sense to the breth0/br-ex bridge. It drops any traffic sent from the OVN patch port that has a MAC other than the one we expect by modifying the flows that OFManager installs.
Additionally, at start up there can be only a normal flow on the switch. This is common on node reboot. When this happens in the scenario above, ovnkube-node cannot get to the part of the code that starts the OFManager, because it is blocked waiting for the informers to sync. For this purpose, a new bootstrap function is added which will install these basic security flows if only a single flow exists in the bridge at start up.
I looked at moving OFManager to start earlier, but it is tightly coupled with gateway initialization and kapi objects, so rather than introducing a new regression there I created a simple bootstrap function. The bootstrap flows are wiped when OFManager starts and syncs its new flows (which include the security flows anyway).
How to test: