northd: Add ECMP support to router policies.
A user can add a policy now like:

ovn-nbctl lr-policy-add <LR> 100 "ip4.src == 10.0.0.4" reroute 172.0.0.5,172.0.0.6

We do have ECMP support for logical router static routes, but since
policies are applied after the routing stage, ECMP support for
policies is desired by ovn-kubernetes.

A new column 'nexthops' is added to the Logical_Router_Policy table
instead of modifying the existing column 'nexthop' to preserve
backward compatibility and avoid any upgrade issues.

Requested-by: Alexander Constantinescu <aconstan@redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1881826
Acked-by: Mark Michelson <mmichels@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>
numansiddique committed Dec 15, 2020
1 parent 9af90b5 commit 35b00c7
Showing 9 changed files with 430 additions and 49 deletions.
2 changes: 1 addition & 1 deletion NEWS
@@ -1,6 +1,6 @@
Post-v20.12.0
-------------------------

- Support ECMP multiple nexthops for reroute router policies.

OVN v20.12.0 - xx xxx xxxx
--------------------------
80 changes: 73 additions & 7 deletions northd/ovn-northd.8.xml
@@ -3041,14 +3041,36 @@ outport = <var>P</var>;

<li>
<p>
If the policy action is <code>reroute</code>, then the logical
flow is added with the following actions:
If the policy action is <code>reroute</code> with 2 or more nexthops
defined, then the logical flow is added with the following actions:
</p>

<pre>
reg8[0..15] = <var>GID</var>;
reg8[16..31] = select(1,..n);
</pre>

<p>
where <var>GID</var> is the ECMP group id generated by
<code>ovn-northd</code> for this policy and <var>n</var>
is the number of nexthops. The <code>select</code> action
selects one of the nexthop member ids, stores it in the register
<code>reg8[16..31]</code> and advances the packet to the
next stage.
</p>
</li>

<li>
<p>
If the policy action is <code>reroute</code> with just one nexthop,
then the logical flow is added with the following actions:
</p>

<pre>
[xx]reg0 = <var>H</var>;
eth.src = <var>E</var>;
outport = <var>P</var>;
reg8[0..15] = 0;
flags.loopback = 1;
next;
</pre>
@@ -3072,7 +3094,51 @@ next;
</li>
</ul>

<h3>Ingress Table 13: ARP/ND Resolution</h3>
<h3>Ingress Table 13: ECMP handling for router policies</h3>
<p>
This table handles ECMP for the router policies configured
with multiple nexthops.
</p>

<ul>
<li>
<p>
A priority-150 flow is added to advance the packet to the next stage
if the ECMP group id register <code>reg8[0..15]</code> is 0.
</p>
</li>

<li>
<p>
For each ECMP reroute router policy with multiple nexthops,
a priority-100 flow is added for each nexthop <var>H</var>
with the match <code>reg8[0..15] == <var>GID</var> &amp;&amp;
reg8[16..31] == <var>M</var></code> where <var>GID</var>
is the router policy group id generated by <code>ovn-northd</code>
and <var>M</var> is the member id of the nexthop <var>H</var>
generated by <code>ovn-northd</code>. The following actions are added
to the flow:
</p>

<pre>
[xx]reg0 = <var>H</var>;
eth.src = <var>E</var>;
outport = <var>P</var>;
flags.loopback = 1;
next;
</pre>

<p>
where <var>H</var> is the <code>nexthop</code> defined in the
router policy, <var>E</var> is the Ethernet address of the
logical router port from which the <code>nexthop</code> is
reachable, and <var>P</var> is that logical router port.
</p>
</li>
</ul>

<h3>Ingress Table 14: ARP/ND Resolution</h3>

<p>
Any packet that reaches this table is an IP packet whose next-hop
@@ -3258,7 +3324,7 @@ next;

</ul>

<h3>Ingress Table 14: Check packet length</h3>
<h3>Ingress Table 15: Check packet length</h3>

<p>
For distributed logical routers with distributed gateway port configured
@@ -3288,7 +3354,7 @@ REGBIT_PKT_LARGER = check_pkt_larger(<var>L</var>); next;
and advances to the next table.
</p>

<h3>Ingress Table 15: Handle larger packets</h3>
<h3>Ingress Table 16: Handle larger packets</h3>

<p>
For distributed logical routers with distributed gateway port configured
@@ -3349,7 +3415,7 @@ icmp6 {
and advances to the next table.
</p>

<h3>Ingress Table 16: Gateway Redirect</h3>
<h3>Ingress Table 17: Gateway Redirect</h3>

<p>
For distributed logical routers where one of the logical router
@@ -3389,7 +3455,7 @@ icmp6 {
</li>
</ul>

<h3>Ingress Table 17: ARP Request</h3>
<h3>Ingress Table 18: ARP Request</h3>

<p>
In the common case where the Ethernet destination has been resolved, this
148 changes: 135 additions & 13 deletions northd/ovn-northd.c
@@ -188,11 +188,12 @@ enum ovn_stage {
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 10, "lr_in_ip_routing") \
PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 11, "lr_in_ip_routing_ecmp") \
PIPELINE_STAGE(ROUTER, IN, POLICY, 12, "lr_in_policy") \
PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 13, "lr_in_arp_resolve") \
PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 14, "lr_in_chk_pkt_len") \
PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 15,"lr_in_larger_pkts") \
PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 16, "lr_in_gw_redirect") \
PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 17, "lr_in_arp_request") \
PIPELINE_STAGE(ROUTER, IN, POLICY_ECMP, 13, "lr_in_policy_ecmp") \
PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 14, "lr_in_arp_resolve") \
PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 15, "lr_in_chk_pkt_len") \
PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 16, "lr_in_larger_pkts") \
PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 17, "lr_in_gw_redirect") \
PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 18, "lr_in_arp_request") \
\
/* Logical router egress stages. */ \
PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
@@ -7562,33 +7563,39 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
struct ds actions = DS_EMPTY_INITIALIZER;

if (!strcmp(rule->action, "reroute")) {
ovs_assert(rule->n_nexthops <= 1);

char *nexthop =
(rule->n_nexthops == 1 ? rule->nexthops[0] : rule->nexthop);
struct ovn_port *out_port = get_outport_for_routing_policy_nexthop(
od, ports, rule->priority, rule->nexthop);
od, ports, rule->priority, nexthop);
if (!out_port) {
return;
}

const char *lrp_addr_s = find_lrp_member_ip(out_port, rule->nexthop);
const char *lrp_addr_s = find_lrp_member_ip(out_port, nexthop);
if (!lrp_addr_s) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
VLOG_WARN_RL(&rl, "lrp_addr not found for routing policy "
" priority %"PRId64" nexthop %s",
rule->priority, rule->nexthop);
rule->priority, nexthop);
return;
}
uint32_t pkt_mark = ovn_smap_get_uint(&rule->options, "pkt_mark", 0);
if (pkt_mark) {
ds_put_format(&actions, "pkt.mark = %u; ", pkt_mark);
}
bool is_ipv4 = strchr(rule->nexthop, '.') ? true : false;

bool is_ipv4 = strchr(nexthop, '.') ? true : false;
ds_put_format(&actions, "%s = %s; "
"%s = %s; "
"eth.src = %s; "
"outport = %s; "
"flags.loopback = 1; "
REG_ECMP_GROUP_ID" = 0; "
"next;",
is_ipv4 ? REG_NEXT_HOP_IPV4 : REG_NEXT_HOP_IPV6,
rule->nexthop,
nexthop,
is_ipv4 ? REG_SRC_IPV4 : REG_SRC_IPV6,
lrp_addr_s,
out_port->lrp_networks.ea_s,
@@ -7601,7 +7608,7 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
if (pkt_mark) {
ds_put_format(&actions, "pkt.mark = %u; ", pkt_mark);
}
ds_put_cstr(&actions, "next;");
ds_put_cstr(&actions, REG_ECMP_GROUP_ID" = 0; next;");
}
ds_put_format(&match, "%s", rule->match);

@@ -7611,6 +7618,107 @@ build_routing_policy_flow(struct hmap *lflows, struct ovn_datapath *od,
ds_destroy(&actions);
}

static void
build_ecmp_routing_policy_flows(struct hmap *lflows, struct ovn_datapath *od,
struct hmap *ports,
const struct nbrec_logical_router_policy *rule,
uint16_t ecmp_group_id)
{
ovs_assert(rule->n_nexthops > 1);

bool nexthops_is_ipv4 = true;

/* Check that all the nexthops belong to the same addr family before
* adding logical flows. */
for (uint16_t i = 0; i < rule->n_nexthops; i++) {
bool is_ipv4 = strchr(rule->nexthops[i], '.') ? true : false;

if (i == 0) {
nexthops_is_ipv4 = is_ipv4;
}

if (is_ipv4 != nexthops_is_ipv4) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
VLOG_WARN_RL(&rl, "nexthop [%s] of the router policy with "
"the match [%s] do not belong to the same address "
"family as other next hops",
rule->nexthops[i], rule->match);
return;
}
}

struct ds match = DS_EMPTY_INITIALIZER;
struct ds actions = DS_EMPTY_INITIALIZER;

for (uint16_t i = 0; i < rule->n_nexthops; i++) {
struct ovn_port *out_port = get_outport_for_routing_policy_nexthop(
od, ports, rule->priority, rule->nexthops[i]);
if (!out_port) {
goto cleanup;
}

const char *lrp_addr_s =
find_lrp_member_ip(out_port, rule->nexthops[i]);
if (!lrp_addr_s) {
static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
VLOG_WARN_RL(&rl, "lrp_addr not found for routing policy "
" priority %"PRId64" nexthop %s",
rule->priority, rule->nexthops[i]);
goto cleanup;
}

ds_clear(&actions);
uint32_t pkt_mark = ovn_smap_get_uint(&rule->options, "pkt_mark", 0);
if (pkt_mark) {
ds_put_format(&actions, "pkt.mark = %u; ", pkt_mark);
}

bool is_ipv4 = strchr(rule->nexthops[i], '.') ? true : false;

ds_put_format(&actions, "%s = %s; "
"%s = %s; "
"eth.src = %s; "
"outport = %s; "
"flags.loopback = 1; "
"next;",
is_ipv4 ? REG_NEXT_HOP_IPV4 : REG_NEXT_HOP_IPV6,
rule->nexthops[i],
is_ipv4 ? REG_SRC_IPV4 : REG_SRC_IPV6,
lrp_addr_s,
out_port->lrp_networks.ea_s,
out_port->json_key);

ds_clear(&match);
ds_put_format(&match, REG_ECMP_GROUP_ID" == %"PRIu16" && "
REG_ECMP_MEMBER_ID" == %"PRIu16,
ecmp_group_id, i + 1);
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_POLICY_ECMP,
100, ds_cstr(&match),
ds_cstr(&actions), &rule->header_);
}

ds_clear(&actions);
ds_put_format(&actions, "%s = %"PRIu16
"; %s = select(", REG_ECMP_GROUP_ID, ecmp_group_id,
REG_ECMP_MEMBER_ID);

for (uint16_t i = 0; i < rule->n_nexthops; i++) {
if (i > 0) {
ds_put_cstr(&actions, ", ");
}

ds_put_format(&actions, "%"PRIu16, i + 1);
}
ds_put_cstr(&actions, ");");
ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_POLICY,
rule->priority, rule->match,
ds_cstr(&actions), &rule->header_);

cleanup:
ds_destroy(&match);
ds_destroy(&actions);
}

struct parsed_route {
struct ovs_list list_node;
struct in6_addr prefix;
@@ -10300,13 +10408,27 @@ build_ingress_policy_flows_for_lrouter(
if (od->nbr) {
/* This is a catch-all rule. It has the lowest priority (0)
* does a match-all("1") and pass-through (next) */
ovn_lflow_add(lflows, od, S_ROUTER_IN_POLICY, 0, "1", "next;");
ovn_lflow_add(lflows, od, S_ROUTER_IN_POLICY, 0, "1",
REG_ECMP_GROUP_ID" = 0; next;");
ovn_lflow_add(lflows, od, S_ROUTER_IN_POLICY_ECMP, 150,
REG_ECMP_GROUP_ID" == 0", "next;");

/* Convert routing policies to flows. */
uint16_t ecmp_group_id = 1;
for (int i = 0; i < od->nbr->n_policies; i++) {
const struct nbrec_logical_router_policy *rule
= od->nbr->policies[i];
build_routing_policy_flow(lflows, od, ports, rule, &rule->header_);
bool is_ecmp_reroute =
(!strcmp(rule->action, "reroute") && rule->n_nexthops > 1);

if (is_ecmp_reroute) {
build_ecmp_routing_policy_flows(lflows, od, ports, rule,
ecmp_group_id);
ecmp_group_id++;
} else {
build_routing_policy_flow(lflows, od, ports, rule,
&rule->header_);
}
}
}
}
6 changes: 4 additions & 2 deletions ovn-nb.ovsschema
@@ -1,7 +1,7 @@
{
"name": "OVN_Northbound",
"version": "5.29.0",
"cksum": "328602112 27064",
"version": "5.30.0",
"cksum": "3273824429 27172",
"tables": {
"NB_Global": {
"columns": {
@@ -391,6 +391,8 @@
"key": {"type": "string",
"enum": ["set", ["allow", "drop", "reroute"]]}}},
"nexthop": {"type": {"key": "string", "min": 0, "max": 1}},
"nexthops": {"type": {
"key": "string", "min": 0, "max": "unlimited"}},
"options": {
"type": {"key": "string", "value": "string",
"min": 0, "max": "unlimited"}},
