Skip to content

Commit

Permalink
Fix conntrack entry leaks because of TCP RST packets not sent to conn…
Browse files Browse the repository at this point in the history
…track.

The commit [1] - 28097d5("Fix tcp_reset action handling") fixed an issue
with tcp_reset OVN action. In order to fix that issue, this commit added
logical flows to skip all the TCP RST packets from conntrack.
Ideally it should have skipped only the TCP RST packets generated by
ovn-controller from conntrack. Since all the TCP RST packets are
skipped from conntrack, the connections in conntrack remain in
ESTABLISHED state even if the client/server sends TCP RST to close the
connection.  And these entries live for a long time and this is
causing performance issues as reported in the BZ.

This patch reverts the logical flows added in [1] and modifies the inner
actions of tcp_reset in the ingress logical switch pipeline
from - "tcp_reset { outport <-> inport; output; }"
to "tcp_reset { output <-> inport; next(pipeline=egress,table=5); }".
This causes the packet to resubmit to the egress table ls_out_qos_mark
skipping the egress ACL stage. Prior to this packet, next action was
not allowing a resubmit from ingress to egress pipeline. This patch
relaxes this limitation.

For the tcp_reset action in the egress logical switch pipeline, this patch
modifies the inner action
from - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=0); }"
to - "tcp_reset { outport <-> inport; next(pipeline=ingress,table=19); }".
This causes the packet to enter the ingress table ls_in_l2_lkup.

We don't see similar conntrack leaks with UDP. Although there is an issue
with the acl reject action for UDP packets. When ovn-controller generates icmp
destination unreachable packet, it doesn't get delivered. And the IP checksum is
incorrect in this packet. A follow up patch will fix these issues.

[1] - 28097d5("Fix tcp_reset action handling")

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1819785
Co-Authored-by: Tim Rozet <trozet@redhat.com>
Signed-off-by: Tim Rozet <trozet@redhat.com>
Acked-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
  • Loading branch information
numansiddique and trozet committed Apr 27, 2020
1 parent feb5d6e commit b4b6817
Show file tree
Hide file tree
Showing 9 changed files with 217 additions and 38 deletions.
1 change: 1 addition & 0 deletions AUTHORS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,7 @@ Thomas F. Herbert thomasfherbert@gmail.com
Thomas Goirand zigo@debian.org
Thomas Graf tgraf@noironetworks.com
Thomas Lacroix thomas.lacroix@citrix.com
Tim Rozet trozet@redhat.com
Timo Puha timox.puha@intel.com
Timothy Redaelli tredaelli@redhat.com
Todd Deshane deshantm@gmail.com
Expand Down
6 changes: 1 addition & 5 deletions lib/actions.c
Original file line number Diff line number Diff line change
Expand Up @@ -319,11 +319,7 @@ parse_NEXT(struct action_context *ctx)
}
}

if (pipeline == OVNACT_P_EGRESS && ctx->pp->pipeline == OVNACT_P_INGRESS) {
lexer_error(ctx->lexer,
"\"next\" action cannot advance from ingress to egress "
"pipeline (use \"output\" action instead)");
} else if (table >= ctx->pp->n_tables) {
if (table >= ctx->pp->n_tables) {
lexer_error(ctx->lexer,
"\"next\" action cannot advance beyond table %d.",
ctx->pp->n_tables - 1);
Expand Down
8 changes: 8 additions & 0 deletions northd/ovn-northd.8.xml
Original file line number Diff line number Diff line change
Expand Up @@ -373,6 +373,14 @@
for new connections and <code>reg0[1] = 1; next;</code> for existing
connections.
</li>
<li>
<code>reject</code> ACLs translate into logical
flows with the
<code>tcp_reset { output &lt;-&gt; inport;
next(pipeline=egress,table=5);}</code>
action for TCP connections and <code>icmp4/icmp6</code> action
for UDP connections.
</li>
<li>
Other ACLs translate to <code>drop;</code> for new or untracked
connections and <code>ct_commit(ct_label=1/1);</code> for known
Expand Down
14 changes: 8 additions & 6 deletions northd/ovn-northd.c
Original file line number Diff line number Diff line change
Expand Up @@ -4725,11 +4725,11 @@ build_pre_acls(struct ovn_datapath *od, struct hmap *lflows)
* unreachable packets. */
ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110,
"nd || nd_rs || nd_ra || icmp4.type == 3 || "
"icmp6.type == 1 || (tcp && tcp.flags == 20) || "
"icmp6.type == 1 || "
"(udp && udp.src == 546 && udp.dst == 547)", "next;");
ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110,
"nd || nd_rs || nd_ra || icmp4.type == 3 || "
"icmp6.type == 1 || (tcp && tcp.flags == 20) ||"
"icmp6.type == 1 || "
"(udp && udp.src == 546 && udp.dst == 547)", "next;");

/* Ingress and Egress Pre-ACL Table (Priority 100).
Expand Down Expand Up @@ -4842,11 +4842,11 @@ build_pre_lb(struct ovn_datapath *od, struct hmap *lflows,
/* Do not send ND packets to conntrack */
ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_LB, 110,
"nd || nd_rs || nd_ra || icmp4.type == 3 ||"
"icmp6.type == 1 || (tcp && tcp.flags == 20)",
"icmp6.type == 1",
"next;");
ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_LB, 110,
"nd || nd_rs || nd_ra || icmp4.type == 3 ||"
"icmp6.type == 1 || (tcp && tcp.flags == 20)",
"icmp6.type == 1",
"next;");

/* Do not send service monitor packets to conntrack. */
Expand Down Expand Up @@ -4992,7 +4992,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
ds_put_format(&actions, "reg0 = 0; "
"eth.dst <-> eth.src; ip4.dst <-> ip4.src; "
"tcp_reset { outport <-> inport; %s };",
ingress ? "output;" : "next(pipeline=ingress,table=0);");
ingress ? "next(pipeline=egress,table=5);"
: "next(pipeline=ingress,table=19);");
ovn_lflow_add_with_hint(lflows, od, stage,
acl->priority + OVN_ACL_PRI_OFFSET + 10,
ds_cstr(&match), ds_cstr(&actions), stage_hint);
Expand All @@ -5006,7 +5007,8 @@ build_reject_acl_rules(struct ovn_datapath *od, struct hmap *lflows,
ds_put_format(&actions, "reg0 = 0; "
"eth.dst <-> eth.src; ip6.dst <-> ip6.src; "
"tcp_reset { outport <-> inport; %s };",
ingress ? "output;" : "next(pipeline=ingress,table=0);");
ingress ? "next(pipeline=egress,table=5);"
: "next(pipeline=ingress,table=19);");
ovn_lflow_add_with_hint(lflows, od, stage,
acl->priority + OVN_ACL_PRI_OFFSET + 10,
ds_cstr(&match), ds_cstr(&actions), stage_hint);
Expand Down
10 changes: 6 additions & 4 deletions ovn-sb.xml
Original file line number Diff line number Diff line change
Expand Up @@ -1112,10 +1112,12 @@
<var>pipeline</var> as a subroutine. The default <var>table</var> is
just after the current one. If <var>pipeline</var> is specified, it
may be <code>ingress</code> or <code>egress</code>; the default
<var>pipeline</var> is the one currently executing. Actions in the
ingress pipeline may not use <code>next</code> to jump into the
egress pipeline (use the <code>output</code> instead), but
transitions in the opposite direction are allowed.
<var>pipeline</var> is the one currently executing. Actions in the
both ingress and egress pipeline can use <code>next</code> to jump
across the other pipeline. Actions in the ingress pipeline should
use <code>next</code> to jump into the specific table of egress
pipeline only if it is certain that the packets are local and not
tunnelled and wants to skip certain stages in the packet processing.
</dd>

<dt><code><var>field</var> = <var>constant</var>;</code></dt>
Expand Down
3 changes: 2 additions & 1 deletion tests/automake.mk
Original file line number Diff line number Diff line change
Expand Up @@ -205,7 +205,8 @@ tests_ovstest_LDADD = $(OVS_LIBDIR)/libopenvswitch.la lib/libovn.la
# Python tests.
CHECK_PYFILES = \
tests/test-l7.py \
tests/uuidfilt.py
tests/uuidfilt.py \
tests/test-tcp-rst.py

EXTRA_DIST += $(CHECK_PYFILES)
PYCOV_CLEAN_FILES += $(CHECK_PYFILES:.py=.py,cover) .coverage
Expand Down
6 changes: 5 additions & 1 deletion tests/ovn.at
Original file line number Diff line number Diff line change
Expand Up @@ -852,7 +852,11 @@ next(pipeline=ingress, table=11);
encodes as resubmit(,19)

next(pipeline=egress);
"next" action cannot advance from ingress to egress pipeline (use "output" action instead)
formats as next(pipeline=egress, table=11);
encodes as resubmit(,51)

next(pipeline=egress, table=5);
encodes as resubmit(,45)

next(table=10);
formats as next(10);
Expand Down
170 changes: 149 additions & 21 deletions tests/system-ovn.at
Original file line number Diff line number Diff line change
Expand Up @@ -3719,60 +3719,86 @@ start_daemon ovn-controller

ovn-nbctl ls-add sw0

ovn-nbctl lsp-add sw0 sw0-p1
ovn-nbctl lsp-set-addresses sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
ovn-nbctl lsp-set-port-security sw0-p1 "50:54:00:00:00:03 10.0.0.3 aef0::3"
ovn-nbctl lsp-add sw0 sw0-p1-rej
ovn-nbctl lsp-set-addresses sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"
ovn-nbctl lsp-set-port-security sw0-p1-rej "50:54:00:00:00:03 10.0.0.3 aef0::3"

ovn-nbctl lsp-add sw0 sw0-p2
ovn-nbctl lsp-set-addresses sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
ovn-nbctl lsp-set-port-security sw0-p2 "50:54:00:00:00:04 10.0.0.4 aef0::4"
ovn-nbctl lsp-add sw0 sw0-p2-rej
ovn-nbctl lsp-set-addresses sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"
ovn-nbctl lsp-set-port-security sw0-p2-rej "50:54:00:00:00:04 10.0.0.4 aef0::4"

#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
#ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject

# Create port group and ACLs for sw0 ports.
ovn-nbctl pg-add pg0_drop sw0-p1-rej sw0-p2-rej
ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop

ovn-nbctl pg-add pg0 sw0-p1-rej sw0-p2-rej
ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
ovn-nbctl --log acl-add pg0 from-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 80" reject

ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p1\" && tcp && tcp.dst == 80" reject
ovn-nbctl --log acl-add sw0 from-lport 1000 "inport == \"sw0-p2\" && ip6 && tcp && tcp.dst == 80" reject
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 82" allow-related
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 82" allow-related
ovn-nbctl --log acl-add pg0 to-lport 1004 "inport == @pg0 && ip && tcp && tcp.dst == 84" reject

OVN_POPULATE_ARP
ovn-nbctl --wait=hv sync

ADD_NAMESPACES(sw0-p1)
ADD_VETH(sw0-p1, sw0-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
ADD_NAMESPACES(sw0-p1-rej)
ADD_VETH(sw0-p1-rej, sw0-p1-rej, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
"10.0.0.1")

ADD_NAMESPACES(sw0-p2)
ADD_VETH(sw0-p2, sw0-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
ADD_NAMESPACES(sw0-p2-rej)
ADD_VETH(sw0-p2-rej, sw0-p2-rej, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
"10.0.0.1")

NS_CHECK_EXEC([sw0-p1], [ip a a aef0::3/64 dev sw0-p1], [0])
NS_CHECK_EXEC([sw0-p2], [ip a a aef0::4/64 dev sw0-p2], [0])
NS_CHECK_EXEC([sw0-p1-rej], [ip a a aef0::3/64 dev sw0-p1-rej], [0])
NS_CHECK_EXEC([sw0-p2-rej], [ip a a aef0::4/64 dev sw0-p2-rej], [0])

# Capture packets in sw0-p1.
NS_CHECK_EXEC([sw0-p1], [tcpdump -n -c 2 -i sw0-p1 tcp port 80 > sw0-p1-ip4.pcap &], [0])
# Capture packets in sw0-p1-rej.
NS_CHECK_EXEC([sw0-p1-rej], [tcpdump -n -c 2 -i sw0-p1-rej tcp port 80 > sw0-p1-rej-ip4.pcap &], [0])
sleep 1

NS_CHECK_EXEC([sw0-p1], [nc 10.0.0.4 80], [1], [],
NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 80], [1], [],
[dnl
Ncat: Connection refused.
])

OVS_WAIT_UNTIL([
total=`cat sw0-p1-ip4.pcap | wc -l`
total=`cat sw0-p1-rej-ip4.pcap | wc -l`
echo "total = $total"
test "${total}" = "2"
])

# Now send traffic to port 84
NS_CHECK_EXEC([sw0-p1-rej], [nc 10.0.0.4 84], [1], [],
[dnl
Ncat: Connection refused.
])

AT_CHECK([
n_pkt=$(ovs-ofctl dump-flows br-int table=44 | grep -v n_packets=0 | \
grep controller | grep tp_dst=84 -c)
test $n_pkt -eq 1
])

# Without this sleep, test case fails intermittently.
sleep 3

NS_CHECK_EXEC([sw0-p2], [tcpdump -n -c 2 -i sw0-p2 tcp port 80 > sw0-p2-ip6.pcap &], [0])
NS_CHECK_EXEC([sw0-p2-rej], [tcpdump -n -c 2 -i sw0-p2-rej tcp port 80 > sw0-p2-rej-ip6.pcap &], [0])

sleep 1

NS_CHECK_EXEC([sw0-p2], [nc -6 aef0::3 80], [1], [],
NS_CHECK_EXEC([sw0-p2-rej], [nc -6 aef0::3 80], [1], [],
[dnl
Ncat: Connection refused.
])

OVS_WAIT_UNTIL([
total=`cat sw0-p2-ip6.pcap | wc -l`
total=`cat sw0-p2-rej-ip6.pcap | wc -l`
echo "total = $total"
test "${total}" = "2"
])
Expand Down Expand Up @@ -3943,3 +3969,105 @@ as
OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
/.*terminating with signal 15.*/d"])
AT_CLEANUP

# Tests that when an established connection sends TCP reset,
# the conntrack entry is not in established state.
AT_SETUP([ovn -- conntrack TCP reset])
AT_KEYWORDS([conntrack])
ovn_start

OVS_TRAFFIC_VSWITCHD_START()
ADD_BR([br-int])

# Set external-ids in br-int needed for ovn-controller
ovs-vsctl \
-- set Open_vSwitch . external-ids:system-id=hv1 \
-- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
-- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
-- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
-- set bridge br-int fail-mode=secure other-config:disable-in-band=true

# Start ovn-controller
start_daemon ovn-controller

ovn-nbctl ls-add sw0

ovn-nbctl lsp-add sw0 rst-p1
ovn-nbctl lsp-set-addresses rst-p1 "50:54:00:00:00:03"
ovn-nbctl lsp-set-port-security rst-p1 "50:54:00:00:00:03"

ovn-nbctl lsp-add sw0 rst-p2
ovn-nbctl lsp-set-addresses rst-p2 "50:54:00:00:00:04 10.0.0.4"
ovn-nbctl lsp-set-port-security rst-p2 "50:54:00:00:00:04 10.0.0.4"

# Create port group and ACLs for sw0 ports.
ovn-nbctl pg-add pg0_drop rst-p1 rst-p2
ovn-nbctl acl-add pg0_drop from-lport 1001 "inport == @pg0_drop && ip" drop
ovn-nbctl acl-add pg0_drop to-lport 1001 "outport == @pg0_drop && ip" drop

ovn-nbctl pg-add pg0 rst-p1 rst-p2
ovn-nbctl acl-add pg0 from-lport 1002 "inport == @pg0 && ip4" allow-related
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && icmp4" allow-related
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && tcp && tcp.dst == 80" allow-related
ovn-nbctl acl-add pg0 to-lport 1002 "outport == @pg0 && ip4 && ip4.src == 0.0.0.0/0 && udp && udp.dst == 80" allow-related

# Create a logical router and attach to logical switch.
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:ff:01 10.0.0.1/24
ovn-nbctl lsp-add sw0 sw0-lr0
ovn-nbctl lsp-set-type sw0-lr0 router
ovn-nbctl lsp-set-addresses sw0-lr0 router
ovn-nbctl lsp-set-options sw0-lr0 router-port=lr0-sw0

ovn-nbctl lb-add lb1 10.0.0.10:80 10.0.0.3:80
ovn-nbctl --wait=sb ls-lb-add sw0 lb1
ovn-nbctl --wait=sb lr-lb-add lr0 lb1

OVN_POPULATE_ARP
ovn-nbctl --wait=hv sync

ADD_NAMESPACES(rst-p1)
ADD_VETH(rst-p1, rst-p1, br-int, "10.0.0.3/24", "50:54:00:00:00:03", \
"10.0.0.1")

ADD_NAMESPACES(rst-p2)
ADD_VETH(rst-p2, rst-p2, br-int, "10.0.0.4/24", "50:54:00:00:00:04", \
"10.0.0.1")

OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p1) = xup])
OVS_WAIT_UNTIL([test x$(ovn-nbctl lsp-get-up rst-p2) = xup])

# Start webservers in 'rst-p1'.
OVS_START_L7([rst-p1], [http])

NS_CHECK_EXEC([rst-p2], [$PYTHON $srcdir/test-tcp-rst.py --dst-port 80 --dst-ip 10.0.0.10])

# When tcp reset is sent, conntrack entry should be in the state - CLOSED or CLOSING.
# But there is a bug where tcp reset packet was not sent to the conntrack.
# This test case checks that the tcp reset packet is sent to conntrack
# and the state is not in established state.
AT_CHECK([
ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=ESTABLISHED -c)
test $ct_est_count -eq 0

ct_est_count=$(ovs-appctl dpctl/dump-conntrack | grep 10.0.0.10 | grep state=CLOS -c)
test $ct_est_count -eq 1
])

OVS_APP_EXIT_AND_WAIT([ovn-controller])

as ovn-sb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as ovn-nb
OVS_APP_EXIT_AND_WAIT([ovsdb-server])

as northd
OVS_APP_EXIT_AND_WAIT([ovn-northd])

as
OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
/connection dropped.*/d
/Service monitor not found.*/d"])

AT_CLEANUP
37 changes: 37 additions & 0 deletions tests/test-tcp-rst.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
#!/usr/bin/env python3
# Copyright (c) 2020 Red Hat, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at:
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Simple python script which connects to tcp server and then
# resets the connection.
import argparse
import socket
import sys
import struct
import time

parser = argparse.ArgumentParser(description='')
parser.add_argument("--src-port", type=int, default=11337, help="source port to use")
parser.add_argument("--dst-port", type=int, help="dst port to use")
parser.add_argument("--dst-ip", help="server ip to use")
args = parser.parse_args()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = (args.dst_ip, args.dst_port)
sock.bind(('0.0.0.0', args.src_port))
sock.connect(server_address)
l_onoff = 1
l_linger = 0
time.sleep(1)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger))
sock.close()

0 comments on commit b4b6817

Please sign in to comment.