From 8c341b9d704cdf002126699527308203319954f0 Mon Sep 17 00:00:00 2001 From: Han Zhou Date: Mon, 19 Dec 2022 11:41:58 -0800 Subject: [PATCH] northd: Drop packets destined to router owned NAT IP for DGP. When a packet enters LR pipeline from a distributed gateway port with destination IP being a SNAT IP, it goes through the unSNAT stage and it is possible that the unSNAT fails to convert the dst IP when no conntrack entries are accociated with the packet. In this case, the packet is rerouted to the same DGP, and results in recirc loop in datapath. The packet would finally be dropped either due to ttl or the recirc limit, but it would have created unnecessary cost. To reproduce the problem, simply configure SNAT on a LR with the SNAT IP being the DGP's IP, and then send a packet from external (DGP's LS) to the SNAT IP. Kernel logs like below will be seen: openvswitch: ovs-system: deferred action limit reached, drop recirc action DP flow dump would also show plenty of flows related to this packet, each with a different ttl match, indicating the packet has been looped many times. Commit 802f9275ac (ovn-northd: Drop IP packets destined to router owned IPs (after NAT)) already added flows to drop packets failed unSNAT for Gateway Routers. It added flows with a low priority (2) to drop the packets that fail ARP resolve, to avoid triggering ARP request for the SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve flows are added for thoses NAT IPs so that the packets can continue the pipeline and possibly redirect to redirect chassis. So, because of these ARP resolve flows, even the packets failed unSNAT would continue the pipeline and won't hit the low priority (2) flows, thus not get dropped. To fix the problem, for each of the ARP resolve flow added for the DGP NAT IPs, a higher priority (150) flow is added to check if the packet's inport is the DGP (same as the outport), then drop the packet directly. Test cases are updated to cover both Gateway Router and DGP scenarios, with packets from both directions (uplink and downlink). Reported-by: Krzysztof Klimonda Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/20210816085206.69170-1-kklimonda@syntaxhighlighted.com/ Reported-by: Frode Nordahl Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718 Reported-by: Roberto Bartzen Acosta Reported-at: https://github.com/ovn-org/ovn/issues/134 Reported-by: Syed Ammad Ali Reported-at: https://github.com/ovn-org/ovn/issues/153 Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html Signed-off-by: Han Zhou Acked-by: Dumitru Ceara --- northd/northd.c | 17 ++++++++ northd/ovn-northd.8.xml | 29 +++++++++---- tests/ovn.at | 92 ++++++++++++++++++++++++++++++++++------- 3 files changed, 116 insertions(+), 22 deletions(-) diff --git a/northd/northd.c b/northd/northd.c index bc5eebbab8..40a302579a 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -14361,6 +14361,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows, sset_add(&nat_entries, nat->external_ip); } else { if (!sset_contains(&nat_entries, nat->external_ip)) { + /* Drop packets coming in from external that still has + * destination IP equals to the NAT external IP, to avoid loop. + * The packets must have gone through DNAT/unSNAT stage but + * failed to convert the destination. */ + ds_clear(match); + ds_put_format( + match, "inport == %s && outport == %s && ip%s.dst == %s", + l3dgw_port->json_key, l3dgw_port->json_key, + is_v6 ? "6" : "4", nat->external_ip); + ovn_lflow_add_with_hint(lflows, od, + S_ROUTER_IN_ARP_RESOLVE, + 150, ds_cstr(match), + debug_drop_action(), + &nat->header_); + /* Now for packets coming from other (downlink) LRPs, allow ARP + * resolve for the NAT IP, so that such packets can be + * forwarded for E/W NAT. */ ds_clear(match); ds_put_format( match, "outport == %s && %s == %s", diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index d8e57b7f07..c34486057b 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -4262,13 +4262,17 @@ outport = P For each row in the NAT table with IPv4 address A in the column of - table, a priority-100 - flow with the match outport === P && - reg0 == A has actions eth.dst = E; - next;, where P is the distributed logical router - port, E is the Ethernet address if set in the - column - of table for of type + table, below two flows are + programmed: +

+ +

+ A priority-100 flow with the match outport == P + && reg0 == A has actions eth.dst = + E; next;, where P is the distributed + logical router port, E is the Ethernet address if set in + the + column of table for of type dnat_and_snat, otherwise the Ethernet address of the distributed logical router port. Note that if the is not @@ -4278,9 +4282,18 @@ outport = P will be added.

+

+ Corresponding to the above flow, a priority-150 flow with the match + inport == P && outport == P + && ip4.dst == A has actions + drop; to exclude packets that have gone through + DNAT/unSNAT stage but failed to convert the destination, to avoid + loop. +

+

For IPv6 NAT entries, same flows are added, but using the register - xxreg0 for the match. + xxreg0 and field ip6 for the match.

diff --git a/tests/ovn.at b/tests/ovn.at index 2e15656537..e9b8bc677a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -28435,24 +28435,39 @@ wait_row_count Port_Binding 1 logical_port=lsp-cont1 chassis=$ch OVN_CLEANUP([hv1]) AT_CLEANUP +# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ] # Test dropping traffic destined to router owned IPs. -OVN_FOR_EACH_NORTHD([ -AT_SETUP([gateway router drop traffic for own IPs]) +m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [ ovn_start -ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1 -ovn-nbctl ls-add s1 +ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side +ovn-nbctl ls-add ext # simulate external LS +ovn-nbctl ls-add s2 # simulate internal LS -# Connnect r1 to s1. -ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24 -ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \ - options:router-port=lrp-r1-s1 addresses=router +# Connnect r1 to ext. +ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24 +if test X"$1" = X"DGP"; then + ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1 +else + ovn-nbctl set logical_router r1 options:chassis=hv1 +fi +ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port lsp-ext-r1 type=router \ + options:router-port=lrp-r1-ext addresses=router -# Create logical port p1 in s1 -ovn-nbctl lsp-add s1 p1 \ +# Connnect r1 to s2. +ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24 +ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port lsp-s2-r1 type=router \ + options:router-port=lrp-r1-s2 addresses=router + +# Create logical port p1 in ext +ovn-nbctl lsp-add ext p1 \ -- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \ -- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2" +# Create logical port p2 in s2 +ovn-nbctl lsp-add s2 p2 \ +-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2" + # Create two hypervisor and create OVS ports corresponding to logical ports. net_add n1 @@ -28466,6 +28481,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \ options:rxq_pcap=hv1/vif1-rx.pcap \ ofport-request=1 +ovs-vsctl -- add-port br-int hv1-vif2 -- \ + set interface hv1-vif2 external-ids:iface-id=p2 \ + options:tx_pcap=hv1/vif2-tx.pcap \ + options:rxq_pcap=hv1/vif2-rx.pcap \ + ofport-request=2 + # Pre-populate the hypervisors' ARP tables so that we don't lose any # packets for ARP resolution (native tunneling doesn't queue packets # for ARP resolution). @@ -28476,9 +28497,10 @@ ovn-nbctl --wait=hv sync sw_key=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding r1) +echo sw_key: $sw_key AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep 10.0.1.1], [1], []) -# Send ip packets from p1 to lrp-r1-s1 +# Send ip packets from p1 to lrp-r1-ext src_mac="f00000000102" dst_mac="000000000101" src_ip=`ip_to_hex 10 0 1 2` @@ -28497,10 +28519,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=11, n_packets=1,.* ]) # Use the router IP as SNAT IP. -ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1 +ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24 ovn-nbctl --wait=hv sync -# Send ip packets from p1 to lrp-r1-s1 +# Send ip packets from p1 to lrp-r1-ext src_mac="f00000000102" dst_mac="000000000101" src_ip=`ip_to_hex 10 0 1 2` @@ -28515,11 +28537,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep ]) # The packet should've been dropped in the lr_in_arp_resolve stage. -AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl +if test X"$1" = X"DGP"; then + prio=150 + inport=reg14 + outport=reg15 +else + prio=2 +fi +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl 1 ]) +# Send ip packets from p2 to lrp-r1-ext +src_mac="f00000000202" +dst_mac="000000000201" +src_ip=`ip_to_hex 10 0 2 2` +dst_ip=`ip_to_hex 10 0 1 1` +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000 +as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet + +# Still no packet-ins should reach ovn-controller. +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c], [1], [dnl +0 +]) + +if test X"$1" = X"DGP"; then + # The packet dst should be resolved once for E/W centralized NAT purpose. + AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key} actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl +1 +]) +fi + +# The packet should've been finally dropped in the lr_in_arp_resolve stage. +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=2,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl +1 +]) OVN_CLEANUP([hv1]) +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([gateway router drop traffic for own IPs]) +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR) +AT_CLEANUP +]) + +OVN_FOR_EACH_NORTHD([ +AT_SETUP([distributed gateway port drop traffic for own IPs]) +TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP) AT_CLEANUP ])