Skip to content

Commit

Permalink
ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd c…
Browse files Browse the repository at this point in the history
…rashes

On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't
get updated in the ovs db. ovn-controller will be reading the old BFD status
even though ovs-vswitchd is crashed. This results in the chassiredirect port
claim flapping between the master chassis and the chasiss with the next higher
priority if ovs-vswitchd crashes in the master chassis.

All the other chassis notices the BFD status down with the master chassis
and hence the next higher priority claims the port. But according to
the master chassis, the BFD status is fine and it again claims back the
chassisredirect port. And this results in flapping. The issue gets resolved
when ovs-vswitchd comes back but until then it leads to lot of SB DB
transactions and high CPU usage in ovn-controller's.

This patch fixes the issue by checking the OF connection status of the
ovn-controller with ovs-vswitchd and calculates the active bfd tunnels
only if it's connected.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
  • Loading branch information
numansiddique authored and ovsrobot committed Feb 4, 2019
1 parent 8411b6c commit 1299480
Show file tree
Hide file tree
Showing 4 changed files with 50 additions and 2 deletions.
6 changes: 6 additions & 0 deletions ovn/controller/ofctrl.c
Expand Up @@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, const char *flow_s,

return NULL;
}

bool
ofctrl_is_connected(void)
{
return rconn_is_connected(swconn);
}
3 changes: 3 additions & 0 deletions ovn/controller/ofctrl.h
Expand Up @@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, uint8_t table_id,
const struct match *,
const struct ofpbuf *ofpacts,
bool log_duplicate_flow);

bool ofctrl_is_connected(void);

#endif /* ovn/ofctrl.h */
13 changes: 12 additions & 1 deletion ovn/controller/ovn-controller.c
Expand Up @@ -689,7 +689,18 @@ main(int argc, char *argv[])
ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int,
sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id,
sbrec_sb_global_first(ovnsb_idl_loop.idl));
bfd_calculate_active_tunnels(br_int, &active_tunnels);

if (ofctrl_is_connected()) {
/* Calculate the active tunnels only if have an an active
* OpenFlow connection to br-int.
* If we don't have a connection to br-int, it could mean
* ovs-vswitchd is down for some reason and the BFD status
* in the Interface rows could be stale. So its better to
* consider 'active_tunnels' set to be empty.
* */
bfd_calculate_active_tunnels(br_int, &active_tunnels);
}

binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name,
sbrec_datapath_binding_by_key,
sbrec_port_binding_by_datapath,
Expand Down
30 changes: 29 additions & 1 deletion tests/ovn.at
Expand Up @@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \

test_ip_packet gw2 gw1

OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
# Both gw1 and gw2 at this point should have claimed the cr-alice
# once each.
gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l`
gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l`

# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port.
# So gw1 should claim twice and gw1 only once.

kill `cat gw2/ovs-vswitchd.pid`

# gw1 should claim the cr-alice
gw1_claim_ct=$((gw1_claim_ct+1))

OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \
| grep -c "cr-alice: Claiming"`])

AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \
grep -c "cr-alice: Claiming"`])

ovn-sbctl show
test_ip_packet gw1 gw2

as gw2
ovs-appctl -t ovn-controller exit
ovs-appctl -t ovs-vswitchd exit
ps -aef | grep ovn-c

OVN_CLEANUP([hv1],[gw1],[ext1])

AT_CLEANUP

AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port])
Expand Down

0 comments on commit 1299480

Please sign in to comment.