Non voting servers get elected as leader in a mixed OSS/Enterprise version cluster #6979
Comments
There are two places in raft where a node enters
Does that seem likely to you?
hashicorp/raft#398 makes
I checked the first issue again:
And I am not sure anymore if this is the problem. There is a check before that makes contains
I saw the same issue today: a non-voter showed up as leader for a few seconds, then another server took over the role (Consul 1.6.5).
We have had the issue several times recently, e.g. with 5 voter servers + 6 non-voters.
So I suspect that the other Consul servers see machine22[nodeID=22], dead and without the tag non_voter=1, decide that it is the same machine as consul22[nodeID=22], alive with the tag non_voter=true, and mix up the tags. So, until
Real-world example (caused by a misconfiguration at startup):
Of course, we can see that 48-df-37-42-78-40.central.criteo.prod does not have the non_voter flag, and I suspect that this is the root cause. Until I restart all Consul voters (in order to remove consul28-par and 48-df-37-42-78-40.central.criteo.prod), I get these messages:
Hi all! I brought this issue up the other day with Consul Engineering and we determined that this is basically working as intended: OSS servers aren't aware of Ent-specific functionality, so the two should never be mixed. Given that, I'll close this one out. But do reach out if you have any other questions!
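One way to picture that explanation is the toy model below. It is only a sketch and not Consul or raft code: the function name `eligible_for_leadership` and the `understands_nonvoter` switch are made up for illustration, and the only detail taken from this issue is the `nonvoter=1` tag that the Enterprise server advertises. The point is that a build that does not know the tag cannot exclude the tagged server from the set of servers it treats as electable.

```python
# Toy model only: this is NOT how Consul or hashicorp/raft is implemented.
# Server names and the "nonvoter" tag are taken from the issue body;
# everything else is a hypothetical illustration.
MEMBER_TAGS = {
    "consul-server000-dc0": {"role": "consul"},
    "consul-server001-dc0": {"role": "consul"},
    "consul-server002-dc0": {"role": "consul", "nonvoter": "1"},
}


def eligible_for_leadership(member_tags, understands_nonvoter):
    """Return the servers a given build would treat as electable.

    understands_nonvoter=False models a build that has no knowledge of
    the tag and therefore ignores it (the assumed OSS behaviour).
    """
    eligible = []
    for name, tags in member_tags.items():
        if understands_nonvoter and tags.get("nonvoter") == "1":
            continue  # a tag-aware build skips non-voting servers
        eligible.append(name)
    return eligible


if __name__ == "__main__":
    print("tag-aware build:  ", eligible_for_leadership(MEMBER_TAGS, True))
    print("tag-unaware build:", eligible_for_leadership(MEMBER_TAGS, False))
```

In this model the tag-unaware build still lists consul-server002-dc0 as electable, which matches the symptom reported here, although the real mechanism may differ in detail.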
Overview of the Issue
In a cluster running the open source version of Consul, I added an Enterprise server with the option non_voting_server. After a few elections, this server became leader of the cluster, which was obviously not expected. It happened in production while we were progressively introducing Enterprise servers.
The non-voting option was set to avoid any impact at the beginning of the migration.
Note: the logs and info below come from a test platform on which we reproduced the issue.
Reproduction Steps
Steps to reproduce this issue, eg:
Consul info for both OSS and ENT Server
OSS Server info
Enterprise Server info
Operating system and Environment details
In this example: 3 VMs running Ubuntu, with 2 Consul 1.5.1 OSS servers and 1 Consul 1.5.1 ENT server.
OSS servers are:
consul-server000-dc0 build=1.5.1criteo8 role=consul
consul-server001-dc0 build=1.5.1criteo8 role=consul
ENT server is:
consul-server002-dc0 build=1.5.1+entcriteo9 nonvoter=1 role=consul
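As a rough aid for checking a list like the one above, the following sketch splits each line into a node name and its key=value tags and reports which servers carry the nonvoter tag. It assumes the abbreviated "name key=value ..." layout shown here, not the raw output of consul members, and the helper names (`parse_member`, `non_voters`) are hypothetical.

```python
# Sketch under assumptions: input lines look like the abbreviated
# "name key=value ..." lines in this issue, not raw `consul members` output.
MEMBER_LINES = [
    "consul-server000-dc0 build=1.5.1criteo8 role=consul",
    "consul-server001-dc0 build=1.5.1criteo8 role=consul",
    "consul-server002-dc0 build=1.5.1+entcriteo9 nonvoter=1 role=consul",
]


def parse_member(line):
    """Split one line into (node name, dict of tags)."""
    name, *pairs = line.split()
    tags = dict(pair.split("=", 1) for pair in pairs)
    return name, tags


def non_voters(lines):
    """Return the names of servers tagged nonvoter=1."""
    result = []
    for line in lines:
        name, tags = parse_member(line)
        if tags.get("nonvoter") == "1":
            result.append(name)
    return result


if __name__ == "__main__":
    print(non_voters(MEMBER_LINES))
```

Comparing this list against the server currently reported as leader gives a quick way to spot the situation described in this issue.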
consul members + info fragment:
consul members detailed for these servers
Non voting server is leader!
Log Fragments
Election log
Non voting server gets elected