ovsdb raft: Avoid unnecessary reconnecting during leader election.

If a server claims itself as "disconnected", all clients connected to that server will try to reconnect to a new server in the cluster. However, currently a server would claim itself as disconnected even when itself is the candidate and try to become the new leader (most likely it will be), and all its clients will reconnect to another node. During a leader fail-over (e.g. due to a leader failure), it is expected that all clients of the old leader will have to reconnect to other nodes in the cluster, but it is unnecessary for all the clients of a healthy node to reconnect, which could cause more disturbance in a large scale environment. This patch fixes the problem by slightly change the condition that a server regards itself as disconnected: if its role is candidate, it is regarded as disconnected only if the election didn't succeed at the first attempt. Related failure test cases are also unskipped and all passed with this patch. Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Ben Pfaff <blp@ovn.org>
openvswitch · Apr 22, 2019 · 4d9b28c · 4d9b28c
1 parent 9f39622
commit 4d9b28c
Show file tree

Hide file tree

Showing 2 changed files with 11 additions and 8 deletions.
diff --git a/ovsdb/raft.c b/ovsdb/raft.c
@@ -286,6 +286,8 @@ struct raft {
 
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
+    bool candidate_retrying;    /* The first round of election timed-out and it
+                                   is now retrying. */
 };
 
 /* All Raft structures. */
@@ -985,11 +987,17 @@ raft_get_sid(const struct raft *raft)
 /* Returns true if 'raft' has completed joining its cluster, has not left or
  * initiated leaving the cluster, does not have failed disk storage, and is
  * apparently connected to the leader in a healthy way (or is itself the
- * leader).*/
+ * leader).
+ *
+ * If 'raft' is candidate:
+ * a) if it is the first round of election, consider it as connected, hoping
+ *    it will successfully elect a new leader soon.
+ * b) if it is already retrying, consider it as disconnected (so that clients
+ *    may decide to reconnect to other members). */
 bool
 raft_is_connected(const struct raft *raft)
 {
-    return (raft->role != RAFT_CANDIDATE
+    return (!(raft->role == RAFT_CANDIDATE && raft->candidate_retrying)
             && !raft->joining
             && !raft->leaving
             && !raft->left
@@ -1608,6 +1616,7 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
     }
 
     ovs_assert(raft->role != RAFT_LEADER);
+    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
 
     raft->n_votes = 0;

diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
@@ -148,8 +148,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash before sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-before-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -160,8 +158,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash after sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-after-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -172,8 +168,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on leader, leader crash before sending appendReq, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 1 2 1 crash-before-sending-append-request 2
 AT_CLEANUP