
Leader reelections triggered by broken follower in healthy cluster #9563

Closed
wojtek-t opened this issue Apr 11, 2018 · 9 comments

@wojtek-t
Contributor

wojtek-t commented Apr 11, 2018

Etcd version: 3.1.11 or 3.1.13 (both behave the same)

Scenario 1

I've done the following experiment:

  1. setup 3-node etcd cluster (peer port: 2380, client port: 2379)
  2. choose one of the followers
  3. block traffic to and from the peer port with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
  4. wait a couple seconds
  5. unblock traffic

This triggered a leader re-election (even though the cluster was otherwise healthy the whole time: the leader and the second follower stayed connected).
Is that expected behavior?
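For completeness, the steps above can be scripted roughly as below. This is a sketch, not a tested script: it assumes the peer interface is eth0, the peer port is 2380, and that it runs as root on the chosen follower.

```shell
#!/bin/sh
# Sketch of the Scenario 1 reproduction, run on the chosen follower.
# Assumptions: peer traffic on eth0, peer port 2380, root privileges.

# Guard: these rules need root and a live cluster; bail out cleanly otherwise.
[ "$(id -u)" -eq 0 ] || { echo "must run as root on the follower"; exit 0; }

# 3. Block traffic to and from the peer port.
iptables -A INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP

# 4. Wait a couple of seconds (longer than the election timeout,
#    which defaults to 1000 ms in etcd).
sleep 5

# 5. Unblock traffic by deleting the same rules.
iptables -D INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -D OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
```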

Scenario 2

The second experiment I've done was similar:

  1. setup 3-node etcd cluster (peer port: 2380, client port: 2379)
  2. choose one of the followers
  3. block traffic to and from the peer port with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
  4. wait a couple seconds
  5. unblock only 50% of outgoing traffic (keep incoming blocked) with the following iptables rules:
    -A INPUT -i eth0 -p tcp -m tcp --dport 2380 -j DROP
    -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -m statistic --mode random --probability 0.5 -j DROP

This one triggers a stream of leader re-elections: a re-election happens every few seconds.
[This may be a somewhat artificial experiment, but it looks like a bug to me.]
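Scenario 2 can be sketched the same way (again untested, and assuming eth0, peer port 2380, and root on the chosen follower); the only difference is that step 5 swaps the full outgoing DROP for a probabilistic one via the iptables statistic match:

```shell
#!/bin/sh
# Sketch of the Scenario 2 reproduction, run on the chosen follower.
# Assumptions: peer traffic on eth0, peer port 2380, root privileges.

# Guard: these rules need root and a live cluster; bail out cleanly otherwise.
[ "$(id -u)" -eq 0 ] || { echo "must run as root on the follower"; exit 0; }

# 3. Fully block peer traffic in both directions.
iptables -A INPUT  -i eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP

# 4. Wait a couple of seconds.
sleep 5

# 5. Replace the full outgoing DROP with a 50% random drop;
#    incoming stays fully blocked.
iptables -D OUTPUT -o eth0 -p tcp -m tcp --dport 2380 -j DROP
iptables -A OUTPUT -o eth0 -p tcp -m tcp --dport 2380 \
         -m statistic --mode random --probability 0.5 -j DROP
```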

Question

Can you please clarify what (from your perspective) is the "by-design" outcome in these situations?
[In an ideal world, I would expect no leader re-elections to happen while the cluster is healthy, but maybe that's not the case in Raft.]

@gyuho @xiang90 @jpbetz @mborsz

@gyuho
Contributor

gyuho commented Apr 11, 2018

In an ideal world, I would expect no leader re-elections to happen while the cluster is healthy

The isolated follower may hit its election timeout and become a candidate, triggering an election. But the vote request from the isolated follower never reaches the other peers, so the active leader won't step down (which it does, e.g., on receiving a message from a node with a higher term) while the packet dropping is active.

For Scenario 1, do you observe leader election after step 5 (unblock traffic), and do you have logs around that event?

For Scenario 2, do you observe leader election at step 5 (server logs from the current leader and the new leader would also help to confirm)?

Note: Similar scenarios are already being tested in our functional testing, but the exact patterns above should be added to make sure.

@wojtek-t
Contributor Author

For Scenario 1, do you observe leader election after step 5 (unblock traffic), and do you have logs around that event?

Yes - there is a single election after step 5.

For Scenario 2, do you observe leader election at step 5 (server logs from the current leader and the new leader would also help to confirm)?

After unblocking, I start observing re-elections.

I can reproduce this 100% of the time. I can't do it right now, but I can send you some logs tomorrow morning.

@wojtek-t
Contributor Author

So, to provide some logs: these are the logs from scenario 1. As I wrote, I can reproduce this 100% of the time with the instructions above.

etcd1.log: the follower on which I blocked traffic and then unblocked outgoing traffic
etcd2.log: the initial leader
etcd3.log: the second follower
[I started the experiment around 10:59.]
etcd1.log
etcd2.log
etcd3.log

Let me know if that's useful and whether you need any other information.

@wojtek-t
Contributor Author

@gyuho - did you have a chance to look into this?

@gyuho
Contributor

gyuho commented Apr 18, 2018

@wojtek-t Sorry for the delay. I will look into it this week.

@gyuho
Contributor

gyuho commented Apr 19, 2018

@wojtek-t I looked at your server logs and was also able to reproduce:

unblock traffic
This triggered a leader re-election (even though the cluster was otherwise healthy the whole time: the leader and the second follower stayed connected).

Here's what happens in the etcd server and Raft:

  1. the leader is A
  2. follower C gets isolated
  3. no writes, so no progress is made in the rest of the cluster during the isolation
  4. follower C's election timer fires and it starts an election (now C has a higher term)
  5. the isolation is removed
  6. follower C now responds to the current leader's heartbeat with a MsgAppResp carrying its higher term
  7. leader A receives the MsgAppResp with the higher term from follower C and steps down to follower

Can you please clarify what (from your perspective) is the "by-design" outcome in these situations?

Steps 6 and 7 are necessary to prevent an isolated node from getting stuck, so this disruptive election is inevitable (even in 3.3 and on the master branch, even with pre-vote enabled).
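The step-down in steps 6 and 7 comes down to a term comparison. A toy shell sketch (a model of Raft terms, not etcd code; the concrete term numbers and the number of timer firings are made up for illustration):

```shell
#!/bin/sh
# Toy model of Raft terms, not etcd code.
leader_term=3
follower_term=3

# During isolation, each election timeout makes the follower a candidate
# and increments its term; say its timer fired twice.
follower_term=$((follower_term + 1))
follower_term=$((follower_term + 1))

# Once reconnected, the follower answers the leader's heartbeat with its
# higher term; a Raft leader that observes a higher term must step down.
if [ "$follower_term" -gt "$leader_term" ]; then
  echo "leader steps down: term $follower_term > $leader_term"
fi
```

This is why the disruption happens even with no writes: the isolated node's term keeps climbing on its own, and on rejoin the higher term alone is enough to dethrone the leader.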

@gyuho
Contributor

gyuho commented Apr 19, 2018

Just to clarify

  3. no writes, so no progress is made in the rest of the cluster during the isolation

Whether there were writes or not, the election would still happen as soon as the isolated follower regains connectivity.

@wojtek-t
Contributor Author

OK - so what you're saying is that this is "by design" and there are no plans to change this behavior, right?

@gyuho
Contributor

gyuho commented Apr 23, 2018

@wojtek-t Yes, it is working as designed. The original change can be found in #5468, if you are interested.

I will make sure we test this scenario through our functional testing.

@gyuho gyuho closed this as completed Apr 25, 2018