
Leader reelections triggered by broken follower in healthy cluster #9563

Closed
wojtek-t opened this issue Apr 11, 2018 · 9 comments

@wojtek-t
Contributor

wojtek-t commented Apr 11, 2018

Etcd version: 3.1.11 or 3.1.13 (both behave the same)

Scenario 1

I've done the following experiment:

  1. setup 3-node etcd cluster (peer port: 2380, client port: 2379)
  2. choose one of the followers
  3. block traffic to and from the peer port with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
  4. wait a couple seconds
  5. unblock traffic

This triggered a leader re-election (even though the cluster was otherwise healthy the whole time: the leader and the second follower stayed connected).
Is that expected behavior?
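For completeness, the steps above can be scripted roughly as below. This is a sketch, not a tested script: it assumes the peer interface is eth0, the peer port is 2380, and that it runs as root on the chosen follower.

```shell
#!/bin/sh
# Sketch of the Scenario 1 reproduction, run on the chosen follower.
# Assumptions: peer traffic on eth0, peer port 2380, root privileges.

# Guard: these rules need root and a live cluster; bail out cleanly otherwise.
[ "$(id -u)" -eq 0 ] || { echo "must run as root on the follower"; exit 0; }

# 3. Block traffic to and from the peer port.
iptables -A INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP

# 4. Wait a couple of seconds (longer than the election timeout,
#    which defaults to 1000 ms in etcd).
sleep 5

# 5. Unblock traffic by deleting the same rules.
iptables -D INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -D OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
```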

Scenario 2

The second experiment I've done was similar:

  1. setup 3-node etcd cluster (peer port: 2380, client port: 2379)
  2. choose one of the followers
  3. block traffic to and from the peer port with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
  4. wait a couple seconds
  5. unblock only 50% of outgoing traffic (keep incoming blocked) with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -m statistic --mode random --probability 0.5 -j DROP

This one triggers a stream of leader re-elections: a re-election happens every few seconds.
[This may be a somewhat artificial experiment, but it looks like a bug to me.]
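Scenario 2 can be sketched the same way (again untested, and assuming eth0, peer port 2380, and root on the chosen follower); the only difference is that step 5 swaps the full outgoing DROP for a probabilistic one via the iptables statistic match:

```shell
#!/bin/sh
# Sketch of the Scenario 2 reproduction, run on the chosen follower.
# Assumptions: peer traffic on eth0, peer port 2380, root privileges.

# Guard: these rules need root and a live cluster; bail out cleanly otherwise.
[ "$(id -u)" -eq 0 ] || { echo "must run as root on the follower"; exit 0; }

# 3. Fully block peer traffic in both directions.
iptables -A INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP

# 4. Wait a couple of seconds.
sleep 5

# 5. Replace the full outgoing DROP with a 50% random drop;
#    incoming stays fully blocked.
iptables -D OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 \
         -m statistic --mode random --probability 0.5 -j DROP
```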

Question

Can you please clarify what (from your perspective) is the "by-design" outcome in these situations?
[In an ideal world, I would expect no leader re-elections to happen while the cluster is healthy, but maybe that's not the case in Raft.]

@gyuho @xiang90 @jpbetz @mborsz

@gyuho
Contributor

gyuho commented Apr 11, 2018

In an ideal world, I would expect no leader re-elections to happen while the cluster is healthy

The isolated follower may hit its election timeout and become a candidate, triggering an election. But the vote request from the isolated follower never reaches the other peers, so the active leader won't step down (which it does, e.g., on receiving a message from a node with a higher term) while the packet dropping is active.

For Scenario 1, do you observe leader election after step 5 (unblock traffic), and do you have logs around that event?

For Scenario 2, do you observe leader election at step 5 (server logs from the current leader and the new leader would also help to confirm)?

Note: Similar scenarios are already being tested in our functional testing, but the exact patterns above should be added to make sure.

@wojtek-t
Contributor Author

For Scenario 1, do you observe leader election after step 5 (unblock traffic), and do you have logs around that event?

Yes - there is a single election after step 5.

For Scenario 2, do you observe leader election at step 5 (server logs from the current leader and the new leader would also help to confirm)?

After unblocking, I start observing re-elections.

I can reproduce this 100% of the time. I can't do it right now, but I can send you some logs tomorrow morning.

@wojtek-t
Contributor Author

So, to provide some logs: these are the logs from scenario 1. As I wrote, I can reproduce this 100% of the time with the instructions above.

etcd1.log: the follower on which I blocked traffic and then unblocked outgoing traffic
etcd2.log: the initial leader
etcd3.log: the second follower
[I started the experiment around 10:59.]
etcd1.log
etcd2.log
etcd3.log

Let me know if that's useful and whether you need any other information.

@wojtek-t
Contributor Author

@gyuho - did you have a chance to look into this?

@gyuho
Contributor

gyuho commented Apr 18, 2018

@wojtek-t Sorry for the delay. I will look into it this week.

@gyuho
Contributor

gyuho commented Apr 19, 2018

@wojtek-t I looked at your server logs and was also able to reproduce:

unblock traffic
This triggered a leader re-election (even though the cluster was otherwise healthy the whole time: the leader and the second follower stayed connected).

Here's what happens in the etcd server and Raft:

  1. the leader is A
  2. follower C gets isolated
  3. no writes, so no progress is made in the rest of the cluster during the isolation
  4. follower C's election timer fires and it starts an election (now C has a higher term)
  5. the isolation is removed
  6. follower C now responds to the current leader's heartbeat with a MsgAppResp carrying its higher term
  7. leader A receives the MsgAppResp with the higher term from follower C and steps down to follower

Can you please clarify what (from your perspective) is the "by-design" outcome in these situations?

Steps 6 and 7 are necessary to prevent an isolated node from getting stuck, so this disruptive election is inevitable (even in 3.3 and on the master branch, even with pre-vote enabled).
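The step-down in steps 6 and 7 comes down to a term comparison. A toy shell sketch (a model of Raft terms, not etcd code; the concrete term numbers and the number of timer firings are made up for illustration):

```shell
#!/bin/sh
# Toy model of Raft terms, not etcd code.
leader_term=3
follower_term=3

# During isolation, each election timeout makes the follower a candidate
# and increments its term; say its timer fired twice.
follower_term=$((follower_term + 1))
follower_term=$((follower_term + 1))

# Once reconnected, the follower answers the leader's heartbeat with its
# higher term; a Raft leader that observes a higher term must step down.
if [ "$follower_term" -gt "$leader_term" ]; then
  echo "leader steps down: term $follower_term > $leader_term"
fi
```

This is why the disruption happens even with no writes: the isolated node's term keeps climbing on its own, and on rejoin the higher term alone is enough to dethrone the leader.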

@gyuho
Contributor

gyuho commented Apr 19, 2018

Just to clarify

  3. no writes, so no progress is made in the rest of the cluster during the isolation

Whether there were writes or not, the election would still happen as soon as the isolated follower regains connectivity.

@wojtek-t
Contributor Author

OK - so what you're saying is that this is "by design" and there are no plans to change this behavior, right?

@gyuho
Contributor

gyuho commented Apr 23, 2018

@wojtek-t Yes, it is working as designed. The original change can be found in #5468, if you are interested.

I will make sure we test this scenario through our functional testing.

@gyuho gyuho closed this as completed Apr 25, 2018