Adds port security for external OVS bridge for OVN MAC address #3984

Merged
trozet merged 8 commits into ovn-org:master from trozet:fix_mac_changing on Dec 21, 2023

Conversation

@trozet (Contributor) commented Oct 27, 2023

If the MAC changes on the interface used by OVS, and therefore on the OVS bridge, OVN may still retain the old MAC on its GR. This can cause OVN and the host to both respond with different MACs for the same IP address, leading to network disruption. A common scenario for this is a server with a bond: when the node is rebooted, the bond MAC may change depending on which slave comes up first. When this happens, ovnkube-node starts up but is unable to sync its informers and proceed to full startup, due to the disruption of connectivity to the Kubernetes API (kapi).

Note: this can be avoided by explicitly setting the bond's MAC to always be the MAC of the first slave.

This PR adds a form of port security to the breth0/br-ex bridge: by modifying the flows that OFManager installs, it drops any traffic sent from the OVN patch port with a source MAC other than the one we expect.

Additionally, at startup the bridge may contain only the NORMAL flow; this is common on node reboot. When this happens in the scenario above, ovnkube-node cannot reach the part of the code that starts OFManager, because it is blocked waiting for the informers to sync. For this purpose, a new bootstrap function is added which installs these basic security flows if only a single flow exists in the bridge at startup.

I looked at moving OFManager to start earlier, but it is tightly coupled with gateway initialization and kapi objects, so rather than risk introducing a regression there, I created a simple bootstrap function. The bootstrap flows are wiped when OFManager starts and syncs its new flows (which include the security flows anyway).
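
For reference, the security flows amount to something like the following. This is only a sketch written in the same flow-string style the OpenFlow manager uses; the helper name and variables are illustrative, not the exact code in this PR:

// appendPortSecurityFlows is a sketch: traffic from the OVN patch port that
// carries the expected source MAC is forwarded normally, anything else from
// that port is dropped.
func appendPortSecurityFlows(dftFlows []string, defaultOpenFlowCookie, ofPortPatch, bridgeMacAddress string) []string {
	dftFlows = append(dftFlows,
		fmt.Sprintf("cookie=%s, priority=10, table=0, in_port=%s, dl_src=%s, actions=NORMAL",
			defaultOpenFlowCookie, ofPortPatch, bridgeMacAddress))
	dftFlows = append(dftFlows,
		fmt.Sprintf("cookie=%s, priority=9, table=0, in_port=%s, actions=drop",
			defaultOpenFlowCookie, ofPortPatch))
	return dftFlows
}

These correspond to the priority 10 and priority 9 flows visible in the test output below.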

How to test:

  1. Launch kind
  2. docker exec -it ovn-worker /bin/bash
  3. crictl exec -it <ovs container ID> /bin/bash
  4. Inside the OVS container, create flows.txt with the content: priority=0, table=0, actions=output:NORMAL
  5. ovs-ofctl -O openflow13 replace-flows breth0 flows.txt
  6. quickly crictl stop the ovnkube-node pod
  7. ovs-ofctl -O openflow13 replace-flows breth0 flows.txt
  8. while true; do echo "##########" >> output.txt && ovs-ofctl -O openflow13 dump-flows breth0 >> output.txt && sleep .5; done
  9. check output.txt to see that the bootstrap detected the normal flow correctly and configured the right flows:
##########
OFPST_FLOW reply (OF1.3) (xid=0x2): 
 cookie=0x0, duration=686.264s, table=0, n_packets=2142, n_bytes=941544, priority=0 actions=NORMAL
##########
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0xdeff105, duration=0.203s, table=0, n_packets=0, n_bytes=0, priority=10,in_port=2,dl_src=02:42:ac:12:00:03 actions=NORMAL
 cookie=0xdeff105, duration=0.203s, table=0, n_packets=0, n_bytes=0, priority=9,in_port=2 actions=drop
 cookie=0x0, duration=686.768s, table=0, n_packets=2167, n_bytes=944776, priority=0 actions=NORMAL
##########
OFPST_FLOW reply (OF1.3) (xid=0x2): 
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=500,ip,in_port=2,nw_src=172.18.0.3,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=172.18.0.3))
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=500,ip,in_port=LOCAL,nw_dst=169.254.169.1 actions=ct(table=5,zone=64002,nat)
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=500,ip,in_port=LOCAL,nw_dst=10.96.0.0/16 actions=ct(commit,table=2,zone=64001,nat(src=169.254.169.2))
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=105,ip,in_port=2,nw_dst=10.96.0.0/16 actions=drop
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=500,ip,in_port=2,nw_src=10.96.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=205,udp,in_port=1,dl_dst=02:42:ac:12:00:03,tp_dst=6081 actions=LOCAL
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=200,udp,in_port=1,tp_dst=6081 actions=NORMAL
 cookie=0xdeff105, duration=0.429s, table=0, n_packets=0, n_bytes=0, priority=200,udp,in_port=LOCAL,tp_dst=6081 actions=output:1

@trozet (Contributor, Author) commented Oct 27, 2023

@wizhaoredhat I'm not sure I did the DPU stuff right; I need to go double-check it. But this affects DPU, so I'm tagging you.

@coveralls commented Oct 27, 2023

Coverage Status: 50.678% (-0.1%) from 50.803% when pulling 6ed5d48 on trozet:fix_mac_changing into a970b0c on ovn-org:master.

var err error
if portsOutput, stderr, err = util.RunOVSVsctl("--no-heading", "--data=bare", "--format=csv", "--columns",
"name", "list", "interface"); err != nil {
// bridge exists, but could not list ports
@trozet (Contributor, Author) commented:

need to remove this comment

}

if len(bridge) == 0 {
// bridge exists but no patch port was found
@trozet (Contributor, Author) commented:

need to remove this comment

@wizhaoredhat (Contributor) left a comment:

Thanks for tagging me Tim, I think I reviewed the DPU specific part of your changes. Are your changes ready to test? Or is this still WIP?

}

var bridgeMACAddress net.HardwareAddr
if config.OvnKubeNode.Mode == types.NodeModeDPU {
Reviewer (Contributor) commented:

According to gateway.go:bridgeForInterface() this looks correct.

@trozet (Contributor, Author) commented Oct 31, 2023

Thanks for tagging me Tim, I think I reviewed the DPU specific part of your changes. Are your changes ready to test? Or is this still WIP?

@wizhaoredhat yeah, it is ready to test. Thanks in advance!

@jcaamano (Contributor) left a comment:

Not all of the flows that match on in_port being the patch port also match on the source MAC; just some of them do.

Is there a specific reason for this? I understand that some of them might not strictly need it, but would following that simple rule work anyway for all the flows?

I ask because I don't see an obvious reason, and someone in the future might not see it either and end up copy-pasting from the wrong flow, missing the MAC address match.

Perhaps defining it like

// getOFPortPatchMatch returns a patch port match that also matches on mac address for port security
func getOFPortPatchMatch(ofPortPatch, bridgeMacAddress string) string {
   return fmt.Sprintf("in_port=%s, dl_src=%s", ofPortPatch, bridgeMacAddress)
}

and then using it in all the flows like

dftFlows = append(dftFlows,
		fmt.Sprintf("cookie=%s, priority=10, table=0, %s, actions=output:NORMAL",
			defaultOpenFlowCookie, getOFPortPatchMatch(ofPortPatch, bridgeMacAddress)))

is a better way to document what is being done.

If there is a reason I am missing, perhaps it is worthwhile to document it in some other way. Or maybe I am just missing something obvious here?

LGTM otherwise.

@trozet (Contributor, Author) commented Nov 2, 2023

@jcaamano my patch port here is 2. Here are the flows without the dl_src:

[root@ovn-worker2 ~]# ovs-ofctl dump-flows breth0 | egrep 'in_port=2' | grep -v dl_src
 cookie=0xdeff105, duration=532.083s, table=0, n_packets=0, n_bytes=0, idle_age=532, priority=500,ip,in_port=2,nw_src=172.18.0.3,nw_dst=169.254.169.2 actions=ct(commit,table=4,zone=64001,nat(dst=172.18.0.3))
 cookie=0xdeff105, duration=532.083s, table=0, n_packets=0, n_bytes=0, idle_age=532, priority=105,ip,in_port=2,nw_dst=10.96.0.0/16 actions=drop
 cookie=0xdeff105, duration=532.083s, table=0, n_packets=0, n_bytes=0, idle_age=532, priority=500,ip,in_port=2,nw_src=10.96.0.0/16,nw_dst=169.254.169.2 actions=ct(table=3,zone=64001,nat)
 cookie=0xdeff105, duration=532.083s, table=0, n_packets=0, n_bytes=0, idle_age=532, priority=104,ip,in_port=2,nw_src=10.244.0.0/16 actions=drop

Flows with priority 500 are the host -> service flows. That traffic doesn't leave the host, so I wasn't worried about it, and there is some funky behavior there with how MACs work, so I didn't want to risk a regression. The other two flows are drop flows to ensure traffic with a source IP of the service CIDR or a source IP of pod subnets not on this node is dropped. I figured it is better to leave those generic, since we want to drop those there.

Maybe I should add some large comment at the beginning of the flows function that states: if you are adding flows that allow traffic to egress the node, ensure you match on MAC address in table 0?
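
For example, something along these lines at the top of the flow generation code (the wording is only a suggestion, not the actual comment added by this PR):

// WARNING: if you add a table 0 flow that matches in_port of the OVN patch port
// and allows traffic to egress the node, make sure it also matches on
// dl_src=<bridge MAC>. Otherwise a stale MAC on the OVN GR can leak out of the
// node and poison external neighbor tables.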

@jcaamano (Contributor) commented Nov 2, 2023

Flows with priority 500 are the host -> service flows. That traffic doesn't leave the host, so I wasn't worried about it, and there is some funky behavior there with how MACs work, so I didn't want to risk a regression. The other two flows are drop flows to ensure traffic with a source IP of the service CIDR or a source IP of pod subnets not on this node is dropped. I figured it is better to leave those generic, since we want to drop those there.

What about these others:

https://github.com/trozet/ovn-kubernetes/blob/030d2d336d54ca242e9efc3d777d6731d14036eb/go-controller/pkg/node/gateway_shared_intf.go#L213
https://github.com/trozet/ovn-kubernetes/blob/030d2d336d54ca242e9efc3d777d6731d14036eb/go-controller/pkg/node/gateway_shared_intf.go#L329

Maybe I should add some large comment at the beginning of the flows function that states: if you are adding flows that allow traffic to egress the node, ensure you match on MAC address in table 0?

OK. I have the feeling that it would be good to have some logic around the bridge flow tables that might make it less error-prone. For example, table 0 is just port security, table 1 something else, and so on. Having basic checks distributed across different flows doesn't look good.

@trozet (Contributor, Author) commented Nov 3, 2023

What about these others:

https://github.com/trozet/ovn-kubernetes/blob/030d2d336d54ca242e9efc3d777d6731d14036eb/go-controller/pkg/node/gateway_shared_intf.go#L213 https://github.com/trozet/ovn-kubernetes/blob/030d2d336d54ca242e9efc3d777d6731d14036eb/go-controller/pkg/node/gateway_shared_intf.go#L329

Oops, I missed these; they should also include the MAC. I guess I don't like the idea of hiding the OpenFlow behind a function, because all of the other OpenFlow is explicitly written, and we don't use generator functions to fill out other parts of the flows. I thought at first about making a port security table and shifting all the tables by 1, but I wanted to avoid it since folks are used to debugging the current pipeline. I think if I put a large warning comment in the OpenFlow manager code for future developers to see, it should be good enough. Is that OK with you?

@jcaamano (Contributor) commented Nov 6, 2023

OK. We have gone through some options and if that's what you prefer to do I am fine with it.

@trozet (Contributor, Author) commented Nov 13, 2023

@dcbw pointed out that the bond may change while ovnk is running. I need to add some additional logic to detect this case @jcaamano

@jcaamano (Contributor) commented Nov 13, 2023

@dcbw pointed out that the bond may change while ovnk is running. I need to add some additional logic to detect this case @jcaamano

Do you mean that the bond MAC address can change while ovnk is running? Isn't that just when fail_over_mac is active, something that we haven't been supporting? So do you want to support that now, or are we talking about something else? I am not aware of other situations where the bond MAC address can change at runtime.

@trozet (Contributor, Author) commented Nov 28, 2023

@jcaamano I think the short answer is yes, this is what I'm suggesting. I'm not sure if fail_over_mac is the only time the bond could change MACs. I was thinking of the case where it is taken down then brought back up and Linux picks the other slave to use for the bond MAC, similar to the reboot scenario. Either way, we can solve this by using netlink, similar to how we do for IP events: detect link down/up events for the bond and check if the MAC address changed. I'll work on this.
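
As a rough illustration of that approach, here is a sketch assuming the vishvananda/netlink package already used elsewhere for IP events; the package, function, and variable names are hypothetical:

package linkmonitor

import (
	"bytes"
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

// WatchLinkMAC is a sketch: subscribe to netlink link updates and invoke
// onChange whenever the MAC address of the named link (e.g. the bond or the
// shared gateway bridge) differs from the last one we saw.
func WatchLinkMAC(linkName string, onChange func(newMAC net.HardwareAddr)) error {
	link, err := netlink.LinkByName(linkName)
	if err != nil {
		return err
	}
	lastMAC := link.Attrs().HardwareAddr

	updates := make(chan netlink.LinkUpdate)
	done := make(chan struct{})
	if err := netlink.LinkSubscribe(updates, done); err != nil {
		return err
	}
	go func() {
		for update := range updates {
			if update.Attrs().Name != linkName {
				continue
			}
			newMAC := update.Attrs().HardwareAddr
			if len(newMAC) > 0 && !bytes.Equal(newMAC, lastMAC) {
				log.Printf("MAC on %s changed from %s to %s", linkName, lastMAC, newMAC)
				lastMAC = newMAC
				onChange(newMAC)
			}
		}
	}()
	return nil
}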

@trozet (Contributor, Author) commented Dec 1, 2023

@jcaamano I added a commit that will handle detecting if the MAC changes on the bridge. I tested it in kind and it works. PTAL

There can be a case where the MAC address of the physical interface
attached to OVS changes. One common way this can happen is with a
bond: the MAC address of the bond may change depending on which slave
comes up first.

When this scenario happens, the host will be using a new MAC while the
OVN GR still has the old one. This can cause hosts outside of the
cluster to constantly update their IP neighbor table with the wrong
MAC for the node, causing a traffic outage.

In order to prevent such a scenario, ensure that OVNK drops traffic
sent by the OVN GR that does not carry the current MAC address.

Signed-off-by: Tim Rozet <trozet@redhat.com>

In the scenario where an OVN GR may have the wrong MAC and traffic
disruption is present, the OVNK node process may stall trying to start
the watch factory while waiting for informer caches to sync. At that
point, ovnkube-node is not able to start its OpenFlow manager and
program the needed flows to block the OVN GR from poisoning external
IP neighbor entries.

This commit adds a bootstrap function which attempts to install basic
flows before anything else is done on the node at boot time. This is a
temporary flow installation that is only done when there is just a
single NORMAL flow in the external bridge, and it is overridden once
the OpenFlow manager starts.

Note, this intentionally does not write the bootstrap flows if OVNK has
been killed but the OVS flows from the previous run remain. It is only
for cases where OVS has no previously programmed flows, such as on node
boot.

Signed-off-by: Tim Rozet <trozet@redhat.com>
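
The bootstrap check itself is conceptually simple. Here is a minimal sketch of the idea, shelling out to ovs-ofctl directly rather than using the real helper code; names are illustrative:

package bootstrap

import (
	"fmt"
	"os/exec"
	"strings"
)

// bootstrapExternalBridgeFlows is a sketch: if the external bridge contains
// only the single default NORMAL flow (typical right after node boot), install
// temporary port-security flows so a stale OVN GR MAC cannot leak out before
// the OpenFlow manager takes over and replaces them.
func bootstrapExternalBridgeFlows(bridge, patchPort, bridgeMAC string) error {
	out, err := exec.Command("ovs-ofctl", "-O", "OpenFlow13", "dump-flows", bridge).CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to dump flows on %s: %v", bridge, err)
	}
	var flows []string
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if strings.Contains(line, "actions=") {
			flows = append(flows, line)
		}
	}
	// Only bootstrap when nothing but the default NORMAL flow is present.
	if len(flows) != 1 || !strings.Contains(flows[0], "actions=NORMAL") {
		return nil
	}
	securityFlows := []string{
		fmt.Sprintf("priority=10,table=0,in_port=%s,dl_src=%s,actions=NORMAL", patchPort, bridgeMAC),
		fmt.Sprintf("priority=9,table=0,in_port=%s,actions=drop", patchPort),
	}
	for _, flow := range securityFlows {
		if err := exec.Command("ovs-ofctl", "-O", "OpenFlow13", "add-flow", bridge, flow).Run(); err != nil {
			return fmt.Errorf("failed to add bootstrap flow %q: %v", flow, err)
		}
	}
	return nil
}
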
While OVNK is running, a user could modify system settings so that the
MAC of the shared gateway bridge changes. If this happens, OVNK needs
to detect it and update the node annotation so that the GR can be
reconfigured with the right MAC.

This commit takes the link manager that was added for egress IP and
abstracts it out of the egress IP controller. It also introduces
passing a handler function to the link manager which gets executed
during normal runtime. This function checks whether the MAC has changed
on the shared gateway bridge and, if so, updates the L3 gateway
annotation.

Note, the link manager currently does not subscribe to netlink events
for links, but instead runs periodically and lists all interfaces.
Since changing a MAC address on the bridge is a disruptive action
anyway, I figured it is OK to leave our detection on a 30-second
interval. MAC changes on the bridge should be an extremely rare
occurrence. However, in the future we could add a netlink subscriber
and receive events when the link changes.

Signed-off-by: Tim Rozet <trozet@redhat.com>
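
Conceptually, the handler wiring in the abstracted link manager looks roughly like this; this is a sketch with illustrative names, not the actual controller code:

package linkmanager

import (
	"time"

	"github.com/vishvananda/netlink"
)

// Controller is a sketch of a link manager that periodically lists all links
// and hands each one to a caller-supplied handler; the gateway's handler checks
// whether the shared gateway bridge MAC changed and updates the L3 gateway
// annotation if so.
type Controller struct {
	handler func(link netlink.Link)
}

func NewController(handler func(link netlink.Link)) *Controller {
	return &Controller{handler: handler}
}

// Run invokes the handler for every link on each sync interval until stopped.
func (c *Controller) Run(stop <-chan struct{}, syncPeriod time.Duration) {
	ticker := time.NewTicker(syncPeriod)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			links, err := netlink.LinkList()
			if err != nil {
				continue
			}
			for _, link := range links {
				c.handler(link)
			}
		}
	}
}
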
@trozet (Contributor, Author) commented Dec 5, 2023

@dcbw PTAL

Changes include:
 - Move all ofm receiver functions to the proper file.
 - Ensure proper locking of the bridges.
 - Use proper encapsulation and expose functions in ofm to get
   bridge info.

Signed-off-by: Tim Rozet <trozet@redhat.com>

Now the link manager reacts to link events, rather than just running
periodically.

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet (Contributor, Author) commented Dec 18, 2023

@dcbw @jcaamano fixed up the encapsulation, locking, and added a netlink subscription for watching links. I tested it locally in kind by issuing 'ovs-vsctl set bridge breth0 other-config:hwaddr="00:15:17:a0:29:88"' on a node, then checking that it immediately updates the annotation, NBDB, and the OpenFlow flows. Maybe I should add an E2E test for this as well in the node IP migration lane? wdyt?

Comment on lines +702 to +703
// Bootstrap flows in OVS if just normal flow is present
if err := bootstrapOVSFlows(); err != nil {
Reviewer (Contributor) commented:

I checked that we do this before ovn-controller adds the patch port to br-ex/breth0. Perhaps worth a comment somewhere.

@trozet (Contributor, Author) commented Dec 19, 2023:

Actually, I'm confused by your comment here. We would need ovn-controller to have created the patch port so that this could run properly. What am I missing?

case <-linkSyncTimer.C:
	if subscribed {
		klog.V(5).Info("Link manager calling sync() explicitly")
		c.sync()
Reviewer (Contributor) commented:

Curious why we would run sync only if we are subscribed. It would make sense to run it first and foremost if we can't subscribe, wouldn't it?
Something like

if !subscribed {
    if subscribed, addrChan, err = subscribe(); err == nil {
        continue
    }
    klog.Errorf("Error during netlink re-subscribe for Link Manager: %v", err)
}
c.sync()

@trozet (Contributor, Author) commented:

I copied this from the address manager. I think the logic is like this because subscribe() will also call sync().

Comment on lines +557 to +568
nc.gateway.SetDefaultGatewayBridgeMAC(link.Attrs().HardwareAddr)
if err := nc.gateway.Reconcile(); err != nil {
	return fmt.Errorf("failed to reconcile gateway for MAC address update: %w", err)
}
nodeAnnotator := kube.NewNodeAnnotator(nc.Kube, node.Name)
l3gwConf.MACAddress = link.Attrs().HardwareAddr
if err := util.SetL3GatewayConfig(nodeAnnotator, l3gwConf); err != nil {
	return fmt.Errorf("failed to update L3 gateway config annotation for node: %s, error: %w", node.Name, err)
}
if err := nodeAnnotator.Run(); err != nil {
	return fmt.Errorf("failed to set node %s annotations: %w", nc.name, err)
}
Reviewer (Contributor) commented:

How do the node port watcher flows get updated? Is that through the annotations?

@trozet (Contributor, Author) commented:

Nice catch! I missed this. I'll try to figure out the easiest way to trigger this update.

@jcaamano (Contributor) commented Dec 19, 2023

@dcbw @jcaamano fixed up the encapsulation, locking, and added a netlink subscription for watching links. I tested it locally in kind by issuing 'ovs-vsctl set bridge breth0 other-config:hwaddr="00:15:17:a0:29:88"' on a node, then checking that it immediately updates the annotation, NBDB, and the OpenFlow flows. Maybe I should add an E2E test for this as well in the node IP migration lane? wdyt?

I think that if

  • we always match the source MAC address when we match the patch port in_port (unsure if there is a specific reason not to do this for all flows)
  • then we can have a reasonable e2e test that triggers the MAC address change and checks that there are no flows with the incorrect MAC address or no MAC address at all

The test might miss enabling optional functionality that uses flows that need to be checked, but at least we would have this test as a starting point.
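
The core of that check could be as small as the following sketch (names are illustrative; the real e2e test would obtain the flow dump from inside the node container):

package e2e

import "strings"

// hasInsecurePatchPortFlow is a sketch of the check described above: report
// whether any flow matching the patch port's in_port lacks a dl_src match on
// the expected bridge MAC.
func hasInsecurePatchPortFlow(flowDump, patchPort, expectedMAC string) bool {
	for _, line := range strings.Split(flowDump, "\n") {
		if !strings.Contains(line, "in_port="+patchPort) {
			continue
		}
		if !strings.Contains(line, "dl_src="+expectedMAC) {
			return true
		}
	}
	return false
}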

Signed-off-by: Tim Rozet <trozet@redhat.com>

Added an additional test that checks that the OpenFlow flows for
services are updated correctly when the IP address changes.

Signed-off-by: Tim Rozet <trozet@redhat.com>
@trozet (Contributor, Author) commented Dec 20, 2023

Going to add another e2e test tomorrow to check flows for MAC changes.

Signed-off-by: Tim Rozet <trozet@redhat.com>
Comment on lines +113 to +119
// Services create OpenFlow flows as well, need to update them all
if gw.servicesRetryFramework != nil {
	if errs := gw.addAllServices(); errs != nil {
		err := kerrors2.NewAggregate(errs)
		klog.Errorf("Failed to sync all services after node IP change: %v", err)
	}
}
Reviewer (Contributor) commented:

So I need to be excused from being confused :P

This was another problem (not updating service flows on IP changes) that we also had? Do we also need a test for this?

Would it make sense to simplify things further and just call gw.Reconcile() and have everything needed there?

@trozet (Contributor, Author) commented:

Right, this is a pre-existing problem we had where service flows were not updated when node IP migration happened. @pliurh @zshi-redhat FYI

I updated the node IP migration test to check for this in this commit:
32490b3

I think it makes sense to consolidate everything into Reconcile. Could I do that in a follow-up PR?

Reviewer (Contributor) commented:

I updated the node IP migration test to check for this in this commit: 32490b3

Oh, I missed that one.

@trozet trozet merged commit 0bfb6bd into ovn-org:master Dec 21, 2023
29 checks passed
@openshift-merge-robot commented:

Fix included in accepted release 4.16.0-0.nightly-2024-01-09-085011
