Skip to content

Commit

Permalink
northd: Drop packets destined to router owned NAT IP for DGP.
Browse files Browse the repository at this point in the history
When a packet enters LR pipeline from a distributed gateway port with
destination IP being a SNAT IP, it goes through the unSNAT stage and
it is possible that the unSNAT fails to convert the dst IP when no
conntrack entries are accociated with the packet. In this case, the
packet is rerouted to the same DGP, and results in recirc loop in
datapath. The packet would finally be dropped either due to ttl or
the recirc limit, but it would have created unnecessary cost.

To reproduce the problem, simply configure SNAT on a LR with the SNAT IP
being the DGP's IP, and then send a packet from external (DGP's LS) to
the SNAT IP. Kernel logs like below will be seen:

openvswitch: ovs-system: deferred action limit reached, drop recirc action

DP flow dump would also show plenty of flows related to this packet,
each with a different ttl match, indicating the packet has been looped
many times.

Commit 802f927 (ovn-northd: Drop IP packets destined to router owned
IPs (after NAT)) already added flows to drop packets failed unSNAT for
Gateway Routers. It added flows with a low priority (2) to drop the
packets that fail ARP resolve, to avoid triggering ARP request for the
SNAT IPs. However, for the DGP case, to support E/W NAT, ARP resolve
flows are added for thoses NAT IPs so that the packets can continue the
pipeline and possibly redirect to redirect chassis. So, because of these
ARP resolve flows, even the packets failed unSNAT would continue the
pipeline and won't hit the low priority (2) flows, thus not get dropped.

To fix the problem, for each of the ARP resolve flow added for the DGP
NAT IPs, a higher priority (150) flow is added to check if the packet's
inport is the DGP (same as the outport), then drop the packet directly.

Test cases are updated to cover both Gateway Router and DGP scenarios,
with packets from both directions (uplink and downlink).

Reported-by: Krzysztof Klimonda <kklimonda@syntaxhighlighted.com>
Reported-at: https://patchwork.ozlabs.org/project/ovn/patch/20210816085206.69170-1-kklimonda@syntaxhighlighted.com/
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Reported-at: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967718
Reported-by: Roberto Bartzen Acosta <rbartzen@gmail.com>
Reported-at: #134
Reported-by: Syed Ammad Ali <syedammad83@gmail.com>
Reported-at: #153
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2021-August/051340.html
Signed-off-by: Han Zhou <hzhou@ovn.org>
Acked-by: Dumitru Ceara <dceara@redhat.com>
  • Loading branch information
hzhou8 committed Jan 19, 2023
1 parent eee79e5 commit 8c341b9
Show file tree
Hide file tree
Showing 3 changed files with 116 additions and 22 deletions.
17 changes: 17 additions & 0 deletions northd/northd.c
Expand Up @@ -14361,6 +14361,23 @@ build_lrouter_nat_defrag_and_lb(struct ovn_datapath *od, struct hmap *lflows,
sset_add(&nat_entries, nat->external_ip);
} else {
if (!sset_contains(&nat_entries, nat->external_ip)) {
/* Drop packets coming in from external that still has
* destination IP equals to the NAT external IP, to avoid loop.
* The packets must have gone through DNAT/unSNAT stage but
* failed to convert the destination. */
ds_clear(match);
ds_put_format(
match, "inport == %s && outport == %s && ip%s.dst == %s",
l3dgw_port->json_key, l3dgw_port->json_key,
is_v6 ? "6" : "4", nat->external_ip);
ovn_lflow_add_with_hint(lflows, od,
S_ROUTER_IN_ARP_RESOLVE,
150, ds_cstr(match),
debug_drop_action(),
&nat->header_);
/* Now for packets coming from other (downlink) LRPs, allow ARP
* resolve for the NAT IP, so that such packets can be
* forwarded for E/W NAT. */
ds_clear(match);
ds_put_format(
match, "outport == %s && %s == %s",
Expand Down
29 changes: 21 additions & 8 deletions northd/ovn-northd.8.xml
Expand Up @@ -4262,13 +4262,17 @@ outport = <var>P</var>
For each row in the <code>NAT</code> table with IPv4 address
<var>A</var> in the <ref column="external_ip"
table="NAT" db="OVN_Northbound"/> column of
<ref table="NAT" db="OVN_Northbound"/> table, a priority-100
flow with the match <code>outport === <var>P</var> &amp;&amp;
reg0 == <var>A</var></code> has actions <code>eth.dst = <var>E</var>;
next;</code>, where <code>P</code> is the distributed logical router
port, <var>E</var> is the Ethernet address if set in the
<ref column="external_mac" table="NAT" db="OVN_Northbound"/> column
of <ref table="NAT" db="OVN_Northbound"/> table for of type
<ref table="NAT" db="OVN_Northbound"/> table, below two flows are
programmed:
</p>

<p>
A priority-100 flow with the match <code>outport == <var>P</var>
&amp;&amp; reg0 == <var>A</var></code> has actions <code>eth.dst =
<var>E</var>; next;</code>, where <code>P</code> is the distributed
logical router port, <var>E</var> is the Ethernet address if set in
the <ref column="external_mac" table="NAT" db="OVN_Northbound"/>
column of <ref table="NAT" db="OVN_Northbound"/> table for of type
<code>dnat_and_snat</code>, otherwise the Ethernet address of the
distributed logical router port. Note that if the
<ref column="external_ip" table="NAT" db="OVN_Northbound"/> is not
Expand All @@ -4278,9 +4282,18 @@ outport = <var>P</var>
will be added.
</p>

<p>
Corresponding to the above flow, a priority-150 flow with the match
<code>inport == <var>P</var> &amp;&amp; outport == <var>P</var>
&amp;&amp; ip4.dst == <var>A</var></code> has actions
<code>drop;</code> to exclude packets that have gone through
DNAT/unSNAT stage but failed to convert the destination, to avoid
loop.
</p>

<p>
For IPv6 NAT entries, same flows are added, but using the register
<code>xxreg0</code> for the match.
<code>xxreg0</code> and field <code>ip6</code> for the match.
</p>
</li>

Expand Down
92 changes: 78 additions & 14 deletions tests/ovn.at
Expand Up @@ -28435,24 +28435,39 @@ wait_row_count Port_Binding 1 logical_port=lsp-cont1 chassis=$ch
OVN_CLEANUP([hv1])
AT_CLEANUP

# TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS [ DGP | GR ]
# Test dropping traffic destined to router owned IPs.
OVN_FOR_EACH_NORTHD([
AT_SETUP([gateway router drop traffic for own IPs])
m4_define([TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS], [
ovn_start

ovn-nbctl lr-add r1 -- set logical_router r1 options:chassis=hv1
ovn-nbctl ls-add s1
ovn-nbctl lr-add r1 # Gateway router or LR with DGP on the ext side
ovn-nbctl ls-add ext # simulate external LS
ovn-nbctl ls-add s2 # simulate internal LS

# Connnect r1 to s1.
ovn-nbctl lrp-add r1 lrp-r1-s1 00:00:00:00:01:01 10.0.1.1/24
ovn-nbctl lsp-add s1 lsp-s1-r1 -- set Logical_Switch_Port lsp-s1-r1 type=router \
options:router-port=lrp-r1-s1 addresses=router
# Connnect r1 to ext.
ovn-nbctl lrp-add r1 lrp-r1-ext 00:00:00:00:01:01 10.0.1.1/24
if test X"$1" = X"DGP"; then
ovn-nbctl lrp-set-gateway-chassis lrp-r1-ext hv1 1
else
ovn-nbctl set logical_router r1 options:chassis=hv1
fi
ovn-nbctl lsp-add ext lsp-ext-r1 -- set Logical_Switch_Port lsp-ext-r1 type=router \
options:router-port=lrp-r1-ext addresses=router

# Create logical port p1 in s1
ovn-nbctl lsp-add s1 p1 \
# Connnect r1 to s2.
ovn-nbctl lrp-add r1 lrp-r1-s2 00:00:00:00:02:01 10.0.2.1/24
ovn-nbctl lsp-add s2 lsp-s2-r1 -- set Logical_Switch_Port lsp-s2-r1 type=router \
options:router-port=lrp-r1-s2 addresses=router

# Create logical port p1 in ext
ovn-nbctl lsp-add ext p1 \
-- lsp-set-addresses p1 "f0:00:00:00:01:02 10.0.1.2" \
-- lsp-set-port-security p1 "f0:00:00:00:01:02 10.0.1.2"

# Create logical port p2 in s2
ovn-nbctl lsp-add s2 p2 \
-- lsp-set-addresses p2 "f0:00:00:00:02:02 10.0.2.2"

# Create two hypervisor and create OVS ports corresponding to logical ports.
net_add n1

Expand All @@ -28466,6 +28481,12 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
options:rxq_pcap=hv1/vif1-rx.pcap \
ofport-request=1

ovs-vsctl -- add-port br-int hv1-vif2 -- \
set interface hv1-vif2 external-ids:iface-id=p2 \
options:tx_pcap=hv1/vif2-tx.pcap \
options:rxq_pcap=hv1/vif2-rx.pcap \
ofport-request=2

# Pre-populate the hypervisors' ARP tables so that we don't lose any
# packets for ARP resolution (native tunneling doesn't queue packets
# for ARP resolution).
Expand All @@ -28476,9 +28497,10 @@ ovn-nbctl --wait=hv sync

sw_key=$(ovn-sbctl --bare --columns tunnel_key list datapath_binding r1)

echo sw_key: $sw_key
AT_CHECK([ovn-sbctl lflow-list | grep lr_in_arp_resolve | grep 10.0.1.1], [1], [])

# Send ip packets from p1 to lrp-r1-s1
# Send ip packets from p1 to lrp-r1-ext
src_mac="f00000000102"
dst_mac="000000000101"
src_ip=`ip_to_hex 10 0 1 2`
Expand All @@ -28497,10 +28519,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=11, n_packets=1,.*
])

# Use the router IP as SNAT IP.
ovn-nbctl set logical_router r1 options:lb_force_snat_ip=10.0.1.1
ovn-nbctl lr-nat-add r1 snat 10.0.1.1 10.8.8.0/24
ovn-nbctl --wait=hv sync

# Send ip packets from p1 to lrp-r1-s1
# Send ip packets from p1 to lrp-r1-ext
src_mac="f00000000102"
dst_mac="000000000101"
src_ip=`ip_to_hex 10 0 1 2`
Expand All @@ -28515,11 +28537,53 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep
])

# The packet should've been dropped in the lr_in_arp_resolve stage.
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=2,ip,metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
if test X"$1" = X"DGP"; then
prio=150
inport=reg14
outport=reg15
else
prio=2
fi
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
1
])

# Send ip packets from p2 to lrp-r1-ext
src_mac="f00000000202"
dst_mac="000000000201"
src_ip=`ip_to_hex 10 0 2 2`
dst_ip=`ip_to_hex 10 0 1 1`
packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
as hv1 ovs-appctl netdev-dummy/receive hv1-vif2 $packet

# Still no packet-ins should reach ovn-controller.
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep "actions=controller" | grep -v n_packets=0 -c], [1], [dnl
0
])

if test X"$1" = X"DGP"; then
# The packet dst should be resolved once for E/W centralized NAT purpose.
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=1,.* priority=100,reg0=0xa000101,reg15=.*metadata=0x${sw_key} actions=mod_dl_dst:00:00:00:00:01:01,resubmit" -c], [0], [dnl
1
])
fi

# The packet should've been finally dropped in the lr_in_arp_resolve stage.
AT_CHECK([as hv1 ovs-ofctl dump-flows br-int | grep -E "table=25, n_packets=2,.* priority=$prio,ip,$inport.*$outport.*metadata=0x${sw_key},nw_dst=10.0.1.1 actions=drop" -c], [0], [dnl
1
])
OVN_CLEANUP([hv1])
])

OVN_FOR_EACH_NORTHD([
AT_SETUP([gateway router drop traffic for own IPs])
TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(GR)
AT_CLEANUP
])

OVN_FOR_EACH_NORTHD([
AT_SETUP([distributed gateway port drop traffic for own IPs])
TEST_LR_DROP_TRAFFIC_FOR_OWN_IPS(DGP)
AT_CLEANUP
])

Expand Down

0 comments on commit 8c341b9

Please sign in to comment.