Skip to content

Commit

Permalink
Add ECMP symmetric replies.
Browse files Browse the repository at this point in the history
When traffic arrives over an ECMP route, there is no guarantee that the
reply traffic will egress over the same route. Sometimes, the nature of
the traffic (or the intervening equipment) means that it is important
for reply traffic to go out the same route it came in.

This commit introduces optional ECMP symmetric reply behavior. If
configured, then traffic to or from the ECMP route will be sent to
conntrack. New incoming traffic over the route will have the source MAC
address and incoming port saved in the ct_label. Reply traffic then uses
this saved information to send the packet back out the same way it came
in.

To facilitate this, a new table was added to the ingress logical router
pipeline. The ECMP_STATEFUL table is responsible for committing to
conntrack and setting the ct_label when it detects new incoming traffic
from the route.

Since ingress pipeline logic on the logical router depends on ct state
of a particular hypervisor, this feature is only usable on gateway
routers.

Signed-off-by: Mark Michelson <mmichels@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1849683
Acked-by: Numan Siddique <numans@ovn.org>
  • Loading branch information
putnopvut committed Jul 29, 2020
1 parent 6cfb44a commit 4fdca65
Show file tree
Hide file tree
Showing 10 changed files with 496 additions and 54 deletions.
4 changes: 4 additions & 0 deletions lib/logical-fields.c
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,10 @@ ovn_init_symtab(struct shash *symtab)
WR_CT_COMMIT);
expr_symtab_add_subfield_scoped(symtab, "ct_label.blocked", NULL,
"ct_label[0]", WR_CT_COMMIT);
expr_symtab_add_subfield_scoped(symtab, "ct_label.ecmp_reply_eth", NULL,
"ct_label[32..79]", WR_CT_COMMIT);
expr_symtab_add_subfield_scoped(symtab, "ct_label.ecmp_reply_port", NULL,
"ct_label[80..95]", WR_CT_COMMIT);

expr_symtab_add_field(symtab, "ct_state", MFF_CT_STATE, NULL, false);

Expand Down
49 changes: 39 additions & 10 deletions northd/ovn-northd.8.xml
Original file line number Diff line number Diff line change
Expand Up @@ -2120,15 +2120,31 @@ icmp6 {
<p>
This is to send packets to connection tracker for tracking and
defragmentation. It contains a priority-0 flow that simply moves traffic
to the next table. If load balancing rules with virtual IP addresses
(and ports) are configured in <code>OVN_Northbound</code> database for a
Gateway router, a priority-100 flow is added for each configured virtual
IP address <var>VIP</var>. For IPv4 <var>VIPs</var> the flow matches
<code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
<var>VIPs</var>, the flow matches <code>ip &amp;&amp; ip6.dst ==
<var>VIP</var></code>. The flow uses the action <code>ct_next;</code>
to send IP packets to the connection tracker for packet de-fragmentation
and tracking before sending it to the next table.
to the next table.
</p>

<p>
If load balancing rules with virtual IP addresses (and ports) are
configured in <code>OVN_Northbound</code> database for a Gateway router,
a priority-100 flow is added for each configured virtual IP address
<var>VIP</var>. For IPv4 <var>VIPs</var> the flow matches <code>ip
&amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6 <var>VIPs</var>,
the flow matches <code>ip &amp;&amp; ip6.dst == <var>VIP</var></code>.
The flow uses the action <code>ct_next;</code> to send IP packets to the
connection tracker for packet de-fragmentation and tracking before
sending it to the next table.
</p>

<p>
If ECMP routes with symmetric reply are configured in the
<code>OVN_Northbound</code> database for a gateway router, a priority-100
flow is added for each router port on which symmetric replies are
configured. The matching logic for these ports essentially reverses the
configured logic of the ECMP route. So for instance, a route with a
destination routing policy will instead match if the source IP address
matches the static route's prefix. The flow uses the action
<code>ct_next</code> to send IP packets to the connection tracker for
packet de-fragmentation and tracking before sending it to the next table.
</p>

<h3>Ingress Table 5: UNSNAT</h3>
Expand Down Expand Up @@ -2489,7 +2505,15 @@ output;
table. This table, instead, is responsible for determine the ECMP
group id and select a member id within the group based on 5-tuple
hashing. It stores group id in <code>reg8[0..15]</code> and member id in
<code>reg8[16..31]</code>.
<code>reg8[16..31]</code>. This step is skipped if the traffic going
out the ECMP route is reply traffic, and the ECMP route was configured
to use symmetric replies. Instead, the stored <code>ct_label</code> value
is used to choose the destination. The least significant 48 bits of the
<code>ct_label</code> tell the destination MAC address to which the
packet should be sent. The next 16 bits tell the logical router port on
which the packet should be sent. These values in the
<code>ct_label</code> are set when the initial ingress traffic is
received over the ECMP route.
</p>

<p>
Expand Down Expand Up @@ -2639,6 +2663,11 @@ select(reg8[16..31], <var>MID1</var>, <var>MID2</var>, ...);
address and <code>reg1</code> as the source protocol address).
</p>

<p>
This processing is skipped for reply traffic being sent out of an ECMP
route if the route was configured to use symmetric replies.
</p>

<p>
This table contains the following logical flows:
</p>
Expand Down
123 changes: 108 additions & 15 deletions northd/ovn-northd.c
Original file line number Diff line number Diff line change
Expand Up @@ -172,16 +172,17 @@ enum ovn_stage {
PIPELINE_STAGE(ROUTER, IN, DEFRAG, 4, "lr_in_defrag") \
PIPELINE_STAGE(ROUTER, IN, UNSNAT, 5, "lr_in_unsnat") \
PIPELINE_STAGE(ROUTER, IN, DNAT, 6, "lr_in_dnat") \
PIPELINE_STAGE(ROUTER, IN, ND_RA_OPTIONS, 7, "lr_in_nd_ra_options") \
PIPELINE_STAGE(ROUTER, IN, ND_RA_RESPONSE, 8, "lr_in_nd_ra_response") \
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 9, "lr_in_ip_routing") \
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 10, "lr_in_ip_routing_ecmp") \
PIPELINE_STAGE(ROUTER, IN, POLICY, 11, "lr_in_policy") \
PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 12, "lr_in_arp_resolve") \
PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 13, "lr_in_chk_pkt_len") \
PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 14,"lr_in_larger_pkts") \
PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 15, "lr_in_gw_redirect") \
PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 16, "lr_in_arp_request") \
PIPELINE_STAGE(ROUTER, IN, ECMP_STATEFUL, 7, "lr_in_ecmp_stateful") \
PIPELINE_STAGE(ROUTER, IN, ND_RA_OPTIONS, 8, "lr_in_nd_ra_options") \
PIPELINE_STAGE(ROUTER, IN, ND_RA_RESPONSE, 9, "lr_in_nd_ra_response") \
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 10, "lr_in_ip_routing") \
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 11, "lr_in_ip_routing_ecmp") \
PIPELINE_STAGE(ROUTER, IN, POLICY, 12, "lr_in_policy") \
PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 13, "lr_in_arp_resolve") \
PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 14, "lr_in_chk_pkt_len") \
PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 15,"lr_in_larger_pkts") \
PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 16, "lr_in_gw_redirect") \
PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 17, "lr_in_arp_request") \
\
/* Logical router egress stages. */ \
PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
Expand Down Expand Up @@ -7418,6 +7419,7 @@ struct parsed_route {
bool is_src_route;
uint32_t hash;
const struct nbrec_logical_router_static_route *route;
bool ecmp_symmetric_reply;
};

static uint32_t
Expand Down Expand Up @@ -7479,6 +7481,8 @@ parsed_routes_add(struct ovs_list *routes,
"src-ip"));
pr->hash = route_hash(pr);
pr->route = route;
pr->ecmp_symmetric_reply = smap_get_bool(&route->options,
"ecmp_symmetric_reply", false);
ovs_list_insert(routes, &pr->list_node);
return pr;
}
Expand Down Expand Up @@ -7727,26 +7731,102 @@ find_static_route_outport(struct ovn_datapath *od, struct hmap *ports,
return true;
}

static void
add_ecmp_symmetric_reply_flows(struct hmap *lflows,
struct ovn_datapath *od,
const char *port_ip,
struct ovn_port *out_port,
const struct parsed_route *route,
struct ds *route_match)
{
const struct nbrec_logical_router_static_route *st_route = route->route;
struct ds match = DS_EMPTY_INITIALIZER;
struct ds actions = DS_EMPTY_INITIALIZER;
struct ds ecmp_reply = DS_EMPTY_INITIALIZER;
char *cidr = normalize_v46_prefix(&route->prefix, route->plen);

/* If symmetric ECMP replies are enabled, then packets that arrive over
* an ECMP route need to go through conntrack.
*/
ds_put_format(&match, "inport == %s && ip%s.%s == %s",
out_port->json_key,
route->prefix.family == AF_INET ? "4" : "6",
route->is_src_route ? "dst" : "src",
cidr);
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
ds_cstr(&match), "ct_next;",
&st_route->header_);

/* And packets that go out over an ECMP route need conntrack */
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
ds_cstr(route_match), "ct_next;",
&st_route->header_);

/* Save src eth and inport in ct_label for packets that arrive over
* an ECMP route.
*
* NOTE: we purposely are not clearing match before this
* ds_put_cstr() call. The previous contents are needed.
*/
ds_put_cstr(&match, " && (ct.new && !ct.est)");

ds_put_format(&actions, "ct_commit { ct_label.ecmp_reply_eth = eth.src;"
" ct_label.ecmp_reply_port = %" PRId64 ";}; next;",
out_port->sb->tunnel_key);
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 100,
ds_cstr(&match), ds_cstr(&actions),
&st_route->header_);

/* Bypass ECMP selection if we already have ct_label information
* for where to route the packet.
*/
ds_put_format(&ecmp_reply, "ct.rpl && ct_label.ecmp_reply_port == %"
PRId64, out_port->sb->tunnel_key);
ds_clear(&match);
ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
ds_cstr(route_match));
ds_clear(&actions);
ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; "
"eth.src = %s; %sreg1 = %s; outport = %s; next;",
out_port->lrp_networks.ea_s,
route->prefix.family == AF_INET ? "" : "xx",
port_ip, out_port->json_key);
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 100,
ds_cstr(&match), ds_cstr(&actions),
&st_route->header_);

/* Egress reply traffic for symmetric ECMP routes skips router policies. */
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_POLICY, 65535,
ds_cstr(&ecmp_reply), "next;",
&st_route->header_);

ds_clear(&actions);
ds_put_cstr(&actions, "eth.dst = ct_label.ecmp_reply_eth; next;");
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ARP_RESOLVE,
200, ds_cstr(&ecmp_reply),
ds_cstr(&actions), &st_route->header_);
}

static void
build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
struct hmap *ports, struct ecmp_groups_node *eg)

{
bool is_ipv4 = (eg->prefix.family == AF_INET);
struct ds match = DS_EMPTY_INITIALIZER;
uint16_t priority;
struct ecmp_route_list_node *er;
struct ds route_match = DS_EMPTY_INITIALIZER;

char *prefix_s = build_route_prefix_s(&eg->prefix, eg->plen);
build_route_match(NULL, prefix_s, eg->plen, eg->is_src_route, is_ipv4,
&match, &priority);
&route_match, &priority);
free(prefix_s);

struct ds actions = DS_EMPTY_INITIALIZER;
ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
"; %s = select(", REG_ECMP_GROUP_ID, eg->id,
REG_ECMP_MEMBER_ID);

struct ecmp_route_list_node *er;
bool is_first = true;
LIST_FOR_EACH (er, list_node, &eg->route_list) {
if (is_first) {
Expand All @@ -7760,11 +7840,12 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
ds_put_cstr(&actions, ");");

ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
ds_cstr(&match), ds_cstr(&actions));
ds_cstr(&route_match), ds_cstr(&actions));

/* Add per member flow */
struct ds match = DS_EMPTY_INITIALIZER;
struct sset visited_ports = SSET_INITIALIZER(&visited_ports);
LIST_FOR_EACH (er, list_node, &eg->route_list) {

const struct parsed_route *route_ = er->route;
const struct nbrec_logical_router_static_route *route = route_->route;
/* Find the outgoing port. */
Expand All @@ -7774,6 +7855,15 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
&out_port)) {
continue;
}
/* Symmetric ECMP reply is only usable on gateway routers.
* It is NOT usable on distributed routers with a gateway port.
*/
if (smap_get(&od->nbr->options, "chassis") &&
route_->ecmp_symmetric_reply && sset_add(&visited_ports,
out_port->key)) {
add_ecmp_symmetric_reply_flows(lflows, od, lrp_addr_s, out_port,
route_, &route_match);
}
ds_clear(&match);
ds_put_format(&match, REG_ECMP_GROUP_ID" == %"PRIu16" && "
REG_ECMP_MEMBER_ID" == %"PRIu16,
Expand All @@ -7794,7 +7884,9 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
ds_cstr(&match), ds_cstr(&actions),
&route->header_);
}
sset_destroy(&visited_ports);
ds_destroy(&match);
ds_destroy(&route_match);
ds_destroy(&actions);
}

Expand Down Expand Up @@ -9078,6 +9170,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;");
ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;");
ovn_lflow_add(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 0, "1", "next;");

/* Send the IPv6 NS packets to next table. When ovn-controller
* generates IPv6 NS (for the action - nd_ns{}), the injected
Expand Down
7 changes: 4 additions & 3 deletions ovn-architecture.7.xml
Original file line number Diff line number Diff line change
Expand Up @@ -1210,11 +1210,12 @@
<dd>
Fields that denote the connection tracking zones for routers. These
values only have local significance and are not meaningful between
chassis. OVN stores the zone information for DNATting in Open vSwitch
chassis. OVN stores the zone information for north to south traffic
(for DNATting or ECMP symmetric replies) in Open vSwitch
<!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
extension register number 11 and zone information for SNATing in
Open vSwitch extension register number 12.
extension register number 11 and zone information for south to north
traffic (for SNATing) in Open vSwitch extension register number 12.
</dd>

<dt>logical flow flags</dt>
Expand Down
7 changes: 5 additions & 2 deletions ovn-nb.ovsschema
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
{
"name": "OVN_Northbound",
"version": "5.24.0",
"cksum": "1092394564 25961",
"version": "5.25.0",
"cksum": "1354137211 26116",
"tables": {
"NB_Global": {
"columns": {
Expand Down Expand Up @@ -365,6 +365,9 @@
"min": 0, "max": 1}},
"nexthop": {"type": "string"},
"output_port": {"type": {"key": "string", "min": 0, "max": 1}},
"options": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}},
"external_ids": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}}},
Expand Down
16 changes: 16 additions & 0 deletions ovn-nb.xml
Original file line number Diff line number Diff line change
Expand Up @@ -2481,6 +2481,22 @@
</column>
</group>

<group title="Common options">
<column name="options">
This column provides general key/value settings. The supported
options are described individually below.
</column>

<column name="options" key="ecmp_symmetric_reply">
It true, then new traffic that arrives over this route will have
its reply traffic bypass ECMP route selection and will be sent out
this route instead. Note that this option overrides any rules set
in the <ref table="Logical_Router_policy" /> table. This option
only works on gateway routers (routers that have
<ref column="options" key="chassis" table="Logical_Router" /> set).
</column>
</group>

</table>

<table name="Logical_Router_Policy" title="Logical router policies">
Expand Down

0 comments on commit 4fdca65

Please sign in to comment.