Servers can't agree on cluster leader after restart when gossiping on WAN #454

Closed
jippi opened this Issue Nov 5, 2014 · 42 comments

@jippi
Contributor

jippi commented Nov 5, 2014

Hi,

I'm running Consul in an all-"WAN" environment with one DC. All my boxes are in the same rack, but they do not have a private LAN to gossip over.

The first time they join each other, with an empty /opt/consul directory, they manage to join and agree on a leader.

If I restart the cluster, they still connect and find each other, but they never seem to agree on a leader:

Node            Address              Status  Type    Build  Protocol
consul01        195.1xx.35.xx1:8301  alive   server  0.4.1  2
consul02        195.1xx.35.xx2:8301  alive   server  0.4.1  2
consul03        195.1xx.35.xx3:8301  alive   server  0.4.1  2

In the consul monitor output, they just keep repeating: 2014/11/05 13:09:41 [ERR] agent: failed to sync remote state: No cluster leader

All nodes are started with /usr/local/bin/consul agent -config-dir /etc/consul

server 1

{
  "advertise_addr": "195.1xx.35.xx1",
  "bind_addr": "195.1xx.35.xx1",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "data_dir": "/opt/consul",
  "datacenter": "online",
  "domain": "consul",
  "log_level": "INFO",
  "ports": {
    "dns": 53
  },
  "recursor": "8.8.8.8",
  "rejoin_after_leave": true,
  "retry_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "server": true,
  "start_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "ui_dir": "/opt/consul/ui"
}

server 2

{
  "advertise_addr": "195.1xx.35.xx2",
  "bind_addr": "195.1xx.35.xx2",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "data_dir": "/opt/consul",
  "datacenter": "online",
  "domain": "consul",
  "log_level": "INFO",
  "ports": {
    "dns": 53
  },
  "recursor": "8.8.8.8",
  "rejoin_after_leave": true,
  "retry_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "server": true,
  "start_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "ui_dir": "/opt/consul/ui"
}

server 3

{
  "advertise_addr": "195.1xx.35.xx3",
  "bind_addr": "195.1xx.35.xx3",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "data_dir": "/opt/consul",
  "datacenter": "online",
  "domain": "consul",
  "log_level": "INFO",
  "ports": {
    "dns": 53
  },
  "recursor": "8.8.8.8",
  "rejoin_after_leave": true,
  "retry_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "server": true,
  "start_join": [
    "195.1xx.35.xx1",
    "195.1xx.35.xx2",
    "195.1xx.35.xx3"
  ],
  "ui_dir": "/opt/consul/ui"
}

@jippi changed the title from "Servers can't agree on cluster leader after restart" to "Servers can't agree on cluster leader after restart when gossiping on WAN" on Nov 5, 2014

@armon

Member

armon commented Nov 5, 2014

Can you provide more log output from the servers after a restart? Specifically, those prefixed with "raft:" are of interest. Also you don't usually need retry_join with start_join since they serve an overlapping purpose with different semantics. retry_join will continuously retry until it succeeds, while start_join exits if the join fails.
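
For anyone collecting those lines, here is a minimal sketch of filtering agent output down to the raft entries. It assumes the log format seen in consul monitor output in this thread (timestamp, bracketed level, then the subsystem prefix); the function name raft_lines is hypothetical, and the sample lines are copied from logs posted in this issue.

```python
import re

# Matches lines such as "2015/05/26 13:16:09 [WARN] raft: ..."
# (format assumed from the consul monitor output quoted in this thread).
RAFT_LINE = re.compile(r"^\s*\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[\w+\] raft: ")


def raft_lines(log_text):
    """Return only the log lines emitted by the raft subsystem."""
    return [line.strip() for line in log_text.splitlines() if RAFT_LINE.match(line)]


sample = """\
2015/05/26 13:16:08 [INFO] raft: Node at 10.97.13.132:8300 [Follower] entering Follower state
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithftz01.comp.com 10.97.13.132
2015/05/26 13:16:08 [ERR] agent: failed to sync remote state: No cluster leader
2015/05/26 13:16:09 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
"""

for line in raft_lines(sample):
    print(line)
```

Piping the monitor output through grep 'raft:' gives much the same result; the Python version is only convenient when the logs are already in a file or a gist.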

@adrienbrault


adrienbrault commented Nov 21, 2014

I am having the same issue.

Note that this only happens with bootstrap_expect. If I have a single bootstrap: true node, the cluster is able to elect a new leader after stopping all the nodes and then starting them all up.

Here are my steps and the logs: https://gist.github.com/adrienbrault/ad8d13802913b095415a

@armon

Member

armon commented Nov 21, 2014

It looks like both of you are forcing the cluster into an outage state. The bootstrap_expect is only used for an initial bootstrap process. Once the cluster has established quorum, it is not expected to ever lose it again. A loss of quorum requires manual intervention (See: https://www.consul.io/docs/guides/outage.html).

This is what is happening:

  • Initial cluster start, with bootstrap, leader elected (3 raft peers, quorum size 2)
  • Server 1 leaves (2 raft peers, quorum size 2)
  • Server 2 leaves (1 raft peer, quorum size 1) <= Outage! Without -bootstrap a single node cluster is not allowed for safety reasons (avoid a split-brain in the cluster)
  • Server 3 leaves (No servers)
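
The quorum sizes in the list above follow the usual Raft majority rule. A minimal sketch of that arithmetic (the function name quorum is just for illustration, not something from Consul's code):

```python
def quorum(peers):
    """Majority of a Raft peer set: floor(n / 2) + 1, or 0 for an empty set."""
    return peers // 2 + 1 if peers > 0 else 0


# Walk through the sequence of graceful leaves described above.
for peers in (3, 2, 1, 0):
    tolerated = max(peers - quorum(peers), 0)
    print(f"{peers} raft peers -> quorum size {quorum(peers)}, tolerated failures {tolerated}")
```

With 3 peers one failure is tolerated; with 2 peers or 1 peer, none is, which is why each additional leave in the sequence pushes the cluster closer to an outage.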

When you start all the servers again, you have 3 servers again. But this time there is no leader and no quorum. At this point, it is unsafe for any of the servers to gain leadership (split-brain risks), so they will sit there until an operator intervenes.

The best way is to avoid causing an outage. If you cause quorum to be lost, manual intervention is required. Any other approach on the part of Consul would introduce safety issues.

@i0rek

Member

i0rek commented Nov 21, 2014

Very interesting @armon! Thanks for the explanation.

@hesamg


hesamg commented Nov 21, 2014

So, there is no automatic outage recovery even within the same LAN?

@adrienbrault


adrienbrault commented Nov 23, 2014

@armon What about being able to specify the expected quorum size? It would be up to the user to use a correct value, as it is for bootstrap_expect.

@armon

Member

armon commented Nov 24, 2014

@adrienbrault Not currently. There are issues around changing the quorum size once it is specified. I think the current approach makes the very reasonable trade-off of being a zero-touch bootstrap and zero-touch scale up and down, as long as quorum isn't lost. With a sensible amount of redundancy, it should be incredibly unlikely that an operator needs to intervene.

@chrismiller


chrismiller commented Dec 3, 2014

I can understand why the cluster can't currently recover from losing its quorum, but I too would like to see a way to allow automatic recovery by compromising elsewhere, e.g. having a fixed quorum size (or a fixed maximum number of Consul servers?) and losing the ability to zero-touch scale up and down. Here's our use case:

We have a bunch of servers running in AWS. I'd like to have Consul servers running on three of them, and Consul clients on the rest. So far so good. The catch is that we shut down all but one of the servers each night (to save money while they are idle). It seems that even if I put a Consul server on the lone surviving machine, I'd have to jump through some extra hoops to have the cluster recover each morning? I don't see us wanting to dynamically add additional Consul servers anytime soon but automatic recovery even after a complete shutdown (intentional or otherwise) would be very beneficial to us. Thoughts?

@francois


francois commented Dec 6, 2014

Reported the exact same issue when I interrupted the servers in #476. I expected bootstrap_expect to handle this for me.

@armon

Member

armon commented Dec 8, 2014

Basically we can safely provide one of two things:

  • Scale up/down without operator intervention (change quorum size)
  • Fixed quorum size, bootstrap without operator intervention

I don't see a way for us to safely provide both. Currently bootstrap_expect gives a bit of both in the very special case of an "empty" cluster (no previous leader), because there are no safety concerns there.

So we could have a flag like -expect=3 and automatically bootstrap / recover when 3 servers are available. In that world, scaling down to 1 will cause an outage.

@nathanhruby

Member

nathanhruby commented May 11, 2015

Hi,

I'm doing some testing of Consul and just ran into this myself, since restarting the whole cluster at once is trivial with config management tools. I would expect this type of user error to be frequent, especially with folks used to tools that regain quorum when possible (mongo, heartbeat, etc.). I think "never lose quorum" is not tenable in the long run.

It seems like the 80% use case is the "Nothing bad happened, I just restarted my cluster and would like it to return to operation." In that case it appears that consul already has most of the information required to do so without having to sacrifice the zero-touch scale or bootstrap? It knows the previous server list from raft/peers.json and could remember who the last leader was as a replicated fact. Quorum re-convergence then could simply be "all the previous peer members are gossiping again, and I was the previous leader, let's try an election" ?

-n
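
To make the re-convergence idea above concrete, here is a rough sketch of the rule as proposed in this comment. It is purely illustrative and is not how Consul behaves; the function name should_try_election and its parameters are hypothetical, and the node names are borrowed from the original report.

```python
def should_try_election(me, previous_leader, previous_peers, gossiping_now):
    """Sketch of the proposed rule, not Consul's actual behavior.

    Try an election only if this node was the last known leader and every
    server from the previous peer set is visible in gossip again.
    """
    everyone_back = set(previous_peers) <= set(gossiping_now)
    return me == previous_leader and everyone_back


peers = ["consul01", "consul02", "consul03"]
print(should_try_election("consul01", "consul01", peers, peers))      # all back, was leader
print(should_try_election("consul01", "consul01", peers, peers[:2]))  # one peer still missing
print(should_try_election("consul02", "consul01", peers, peers))      # was not the leader
```

Whether a rule like this is actually safe is the open question of the thread; the sketch only pins down what is being asked for.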

@mohitarora


mohitarora commented May 12, 2015

Can we at least have steps documented somewhere for restoring the cluster in case we lose the quorum?

@ryanbreen

Contributor

ryanbreen commented May 12, 2015

@mohitarora


mohitarora commented May 12, 2015

@ryanbreen That didn't help. Here is what I did:

  • Started the first node in bootstrap mode (this node self-elects as leader, creating a basis for forming the cluster).
  • Started the second and third nodes in non-bootstrap mode, which I call a normal server.
  • I wanted each server on equal footing, so I shut down the bootstrapped Consul instance and had it re-enter the cluster as a normal server.

Everything looked good at this point.

I then forced the cluster to lose quorum by restarting 2 of the 3 nodes at the same time. The nodes came back online, but no leader was elected automatically.

I want to know what my next step should be here.

Should I start node 1 in bootstrap mode again and re-execute the steps mentioned above?

@ryanbreen

Contributor

ryanbreen commented May 12, 2015

I would suggest bootstrapping with -bootstrap-expect=3 instead of -bootstrap, per this guide: http://www.consul.io/docs/guides/bootstrapping.html

@nathanhruby

Member

nathanhruby commented May 12, 2015

bootstrap-expect doesn't seem to work after a node has entered and left a cluster and has vestigial raft data left in the datadir.

@mohitarora


mohitarora commented May 12, 2015

Thanks @ryanbreen .

-bootstrap-expect is better than -bootstrap, and I will start using it, but both of these are only used when the cluster is initialized.

I still need the steps for recovery once quorum is lost. In my case no leader was selected once all nodes came back to life after quorum loss.

@saulshanabrook


saulshanabrook commented May 12, 2015

I also had a similar question to @mohitarora. If we do lose quorum, and there is no leader, what do we do?

@highlyunavailable

Contributor

highlyunavailable commented May 12, 2015

@saulshanabrook Then you're in an outage scenario (since you've lost quorum) and need to decide which server is authoritative, then follow the outage recovery guide.

One thing I've also found: if your leaders actually leave the cluster when shut down cleanly (rather than killed kill -9 style), which is the default behavior, then they can't rejoin after a restart because they have notified the cluster that they are leaving. @mohitarora try setting leave_on_terminate to false in your server config and then repeating your test of starting 3 servers, stopping 2, and bringing them back up.

@saulshanabrook


saulshanabrook commented May 13, 2015

@highlyunavailable What if all three go down at once? How do I restart then?

@highlyunavailable

Contributor

highlyunavailable commented May 13, 2015

If they went down hard, or if you had leave_on_terminate set to false, you should be able to just start them back up in any order, assuming their IPs didn't change. If they did, you need to do outage recovery.

@stuart-warren


stuart-warren commented May 26, 2015

I'm getting the same problem with bootstrap_expect: 3

I'm not following why it doesn't use this value to know when it's safe to elect a leader, whether there was ever a leader before or not...

Here are some logs as requested from the OP after a restart of a node
6 nodes, 3 are servers

$ consul monitor
2015/05/26 13:16:08 [INFO] raft: Restored from snapshot 62-41051-1432252097671
2015/05/26 13:16:08 [INFO] raft: Node at 10.97.13.132:8300 [Follower] entering Follower state
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithftz01.comp.com 10.97.13.132
2015/05/26 13:16:08 [INFO] consul: adding server ithftz01.comp.com (Addr: 10.97.13.132:8300) (DC: multitest)
2015/05/26 13:16:08 [INFO] serf: Attempting re-join to previously known node: ithftz03.comp.com: 10.97.13.134:8301
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithftz01.comp.com.multitest 10.97.13.132
2015/05/26 13:16:08 [WARN] serf: Failed to re-join any previously known node
2015/05/26 13:16:08 [INFO] consul: adding server ithftz01.comp.com.multitest (Addr: 10.97.13.132:8300) (DC: multitest)
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithftz02.comp.com 10.97.13.133
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithftz03.comp.com 10.97.13.134
2015/05/26 13:16:08 [INFO] consul: adding server ithftz02.comp.com (Addr: 10.97.13.133:8300) (DC: multitest)
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithfpz01.comp.com 10.97.13.137
2015/05/26 13:16:08 [WARN] memberlist: Refuting an alive message
2015/05/26 13:16:08 [INFO] consul: adding server ithftz03.comp.com (Addr: 10.97.13.134:8300) (DC: multitest)
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithfpz03.comp.com 10.97.13.135
2015/05/26 13:16:08 [INFO] serf: EventMemberJoin: ithfpz02.comp.com 10.97.13.136
2015/05/26 13:16:08 [INFO] serf: Re-joined to previously known node: ithftz03.comp.com: 10.97.13.134:8301
2015/05/26 13:16:08 [INFO] agent: Joining cluster...
2015/05/26 13:16:08 [INFO] agent: (LAN) joining: [10.97.13.134 10.97.13.133 10.97.13.132]
2015/05/26 13:16:08 [ERR] agent: failed to sync remote state: No cluster leader
2015/05/26 13:16:08 [INFO] agent: (LAN) joined: 3 Err: <nil>
2015/05/26 13:16:08 [INFO] agent: Join completed. Synced with 3 initial agents
2015/05/26 13:16:09 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
2015/05/26 13:16:13 [INFO] agent.rpc: Accepted client: 127.0.0.1:58283
2015/05/26 13:16:37 [ERR] agent: failed to sync remote state: No cluster leader
2015/05/26 13:16:52 [ERR] agent: failed to sync remote state: No cluster leader
2015/05/26 13:17:20 [ERR] agent: failed to sync remote state: No cluster leader
@jwestboston


jwestboston commented Jul 28, 2015

This is becoming a problem for me in production and I may have to move away from Consul. If I stop all 3 consul nodes (in my 3 node cluster), I cannot start the Consul cluster back up without a major headache.

Are there any thoughts as to how to properly handle this?

@armon

Member

armon commented Jul 28, 2015

Most of these problems are caused by our default behavior of attempting a graceful leave. Our mental model is that servers are long-lived and don't shut down for any reason other than unexpected power loss or graceful maintenance, in which case you need to leave the cluster. In retrospect that was a bad default. Almost all of this can be avoided by just running kill -9 on the Consul server, in effect simulating power loss.

There is clearly a UX issue here with Consul that we need to address, but this behavior is not a bug. It is a manifestation of bad UX leading to operator error that is causing a quorum loss in a way that is predictable and expected. You can either tune the settings to be non-default behavior and force a non-graceful exit, or just "pull the plug" with kill.

This is a classic "damned if you do, damned if you don't", since if we change the defaults to the inverse, we will have a new corresponding ticket where anybody who was expecting the leave to be graceful has now caused quorum loss by operator error in the reverse sense. I'm not sure what the best answer is here.

@jwestboston


jwestboston commented Jul 28, 2015

Hey @armon

I think there may actually be a bug here though, no? In order to fix the issue I have to:

  1. Stop consul on all 3 server nodes.
  2. Rewrite /var/lib/consul/raft/peers.json with the correct contents.
  3. Start consul on all 3 server nodes.

The issue with peers.json is that its contents seem to be written as "null" (i.e. the literal string "null").
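
As a rough illustration of step 2, here is a minimal sketch that renders peers.json contents from a list of server IPs. It assumes the format described in the outage guide for Consul versions of this era, a JSON array of "ip:port" strings using the server RPC port 8300; newer Raft protocol versions use a different format, so check the guide for your version. The function name peers_json is hypothetical, and the IPs are taken from logs posted earlier in this thread.

```python
import json


def peers_json(server_ips, port=8300):
    """Render peers.json contents as a JSON array of "ip:port" strings.

    Assumption: the older array-of-strings format; verify against the
    outage recovery guide for the Consul version you run.
    """
    return json.dumps([f"{ip}:{port}" for ip in server_ips])


print(peers_json(["10.97.13.132", "10.97.13.133", "10.97.13.134"]))
# prints ["10.97.13.132:8300", "10.97.13.133:8300", "10.97.13.134:8300"]
```

The same contents would go on every server while consul is stopped, before starting them all again.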

@armon

Member

armon commented Jul 28, 2015

@jwestboston From the perspective of Consul, all three servers have left that cluster. They should not rejoin any gossip peers or take part in any future replication. If they had not left the cluster, they would still have those peers and would rejoin the cluster on start.

Because all the servers or a majority of them have done this, it is an outage that now requires manual intervention. Does this make sense?

@jwestboston


jwestboston commented Jul 29, 2015

@armon Ahh .. yes .. that does make sense! :-) Thanks for the clarification. The peers.json file is actually a list of the peers expected to be in the cluster at the current time. Stopping consul == gracefully exiting the cluster == removal from that list across the cluster.

So really, indeed, things are operating as designed. And all we need (maybe? perhaps?) is an easier experience for cold restarting an existing Consul cluster.

@stuart-warren


stuart-warren commented Jul 30, 2015

So we should potentially set skip_leave_on_interrupt to true?

@rmullinnix


rmullinnix commented Jul 31, 2015

I have an Ansible playbook that will bounce the cluster in this situation. It pushes out a valid peers.json based on the hosts file (I had to do a kludge with sed to get the double quotes right), then restarts the consul service (I run Consul as an installed service on Linux).

https://gist.github.com/rmullinnix/ebb5ef2bb877309aebcd

@tkyang99


tkyang99 commented Feb 9, 2016

I'm running into similar problems by just Ctrl-C shutting down agents and then starting them up again. So has this issue ever been solved?

@slackpad

Contributor

slackpad commented Feb 9, 2016

@tkyang99 do you have skip_leave_on_interrupt set to true? You'll want that for server configurations, because otherwise the servers will leave the cluster as you Ctrl-C them, which will cause an outage.
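
A minimal sketch of what that looks like when applied to a server config, using the two config keys mentioned in this thread (leave_on_terminate and skip_leave_on_interrupt). The helper name keep_servers_in_cluster and the sample config values are only for illustration.

```python
import json


def keep_servers_in_cluster(config):
    """Return a copy of a server config that does not gracefully leave on signals."""
    patched = dict(config)
    patched["leave_on_terminate"] = False      # do not leave the cluster on SIGTERM
    patched["skip_leave_on_interrupt"] = True  # do not leave the cluster on SIGINT (Ctrl-C)
    return patched


server_config = {"server": True, "bootstrap_expect": 3, "data_dir": "/opt/consul"}
print(json.dumps(keep_servers_in_cluster(server_config), indent=2, sort_keys=True))
```

With both set this way, stopping or interrupting a server should look like a failure to the rest of the cluster rather than a leave, so the Raft peer set stays intact across a restart.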

@mwiora


mwiora commented Feb 25, 2016

@armon
I have a cluster of 3 nodes in an environment that is stopped and restarted in an automated manner.
Since I had trouble getting the Consul cluster up and running again, I was looking for the reason why no leader could be elected.

According to your post, this seems to be expected behaviour. Thanks for the clarification!

Is there any way to prepare the Consul cluster for such restarts, such as shutting down 2 of 3 instances before stopping the instances?
I mean something besides telling the nodes not to leave gracefully, since otherwise the leader has to be selected manually in a complicated way (stopping the running service, forcing bootstrapping on this node, starting the service, stopping the service, removing the enforced bootstrap, starting the service again).

Is there a feature planned to be implemented in future releases?

Cheers,
µatthias

@slackpad

Contributor

slackpad commented Mar 10, 2016

Hi @mwiora, it should be possible to kill -9 the servers, or to stop them if the leave_on_terminate settings are set correctly, and they will renegotiate leadership when they come back, since none of the servers have left the quorum. We are working on some issues related to this, however: #1534.

@chrisrana


chrisrana commented May 9, 2016

Hi, I am still seeing num_peers = 0, and now I am getting the following errors:

2016/05/09 09:56:50 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:56:50 [ERR] consul: 'cf-vaultdemo-vault_consul-2' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:57:50 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:57:50 [ERR] consul: 'cf-vaultdemo-vault_consul-2' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:58:50 [ERR] consul: 'cf-vaultdemo-vault_consul-2' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:58:50 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:59:53 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:59:53 [ERR] consul: 'cf-vaultdemo-vault_consul-2' and 'cf-vaultdemo-vault_consul-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.

mode, not adding Raft peer.
2016/05/09 09:57:55 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:58:55 [ERR] consul: 'cf-vaultdemo-vault_consul-1' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:58:55 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:59:55 [ERR] consul: 'cf-vaultdemo-vault_consul-1' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 09:59:55 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 10:00:55 [ERR] consul: 'cf-vaultdemo-vault_consul-1' and 'cf-vaultdemo-vault_consul-0' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.
2016/05/09 10:00:55 [ERR] consul: 'cf-vaultdemo-vault_consul-0' and 'cf-vaultdemo-vault_consul-2' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.

Node 1:

{
  "data_dir": "/var/vcap/store/consul",
  "ui_dir": "/var/vcap/packages/consul-ui",
  "node_name": "cf-vaultdemo-vault_consul-0",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "advertise_addr": "10.20.0.252",
  "leave_on_terminate": false,
  "log_level": "INFO",
  "domain": "consul",
  "server": true,
  "rejoin_after_leave": true,
  "ports": {
    "dns": 53
  },
  "disable_update_check": true,
  "recursor": "10.20.0.40",
  "start_join": [],
  "retry_join": [],
  "bootstrap_expect": 3
}

Node 2:

{
  "data_dir": "/var/vcap/store/consul",
  "ui_dir": "/var/vcap/packages/consul-ui",
  "node_name": "cf-vaultdemo-vault_consul-1",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "advertise_addr": "10.20.0.254",
  "leave_on_terminate": false,
  "log_level": "INFO",
  "domain": "consul",
  "server": true,
  "rejoin_after_leave": true,
  "ports": {
    "dns": 53
  },
  "disable_update_check": true,
  "recursor": "10.20.0.40",
  "start_join": [],
  "retry_join": [],
  "bootstrap_expect": 3
}

Node 3:

{
  "data_dir": "/var/vcap/store/consul",
  "ui_dir": "/var/vcap/packages/consul-ui",
  "node_name": "cf-vaultdemo-vault_consul-2",
  "bind_addr": "0.0.0.0",
  "client_addr": "0.0.0.0",
  "advertise_addr": "10.20.0.32",
  "leave_on_terminate": false,
  "log_level": "INFO",
  "domain": "consul",
  "server": true,
  "rejoin_after_leave": true,
  "ports": {
    "dns": 53
  },
  "disable_update_check": true,
  "recursor": "10.20.0.40",
  "start_join": [],
  "retry_join": [],
  "bootstrap_expect": 3
}
@angelosanramon


angelosanramon commented Sep 6, 2016

I am having the same issue with Consul 0.6.4. After playing with it for a couple of days, I found that the easiest way to fix this is:

  1. Log in to one of the server nodes.
  2. Edit consul.conf.
  3. Remove the bootstrap_expect key.
  4. Add bootstrap: true.
  5. Restart consul.
  6. This should elect a leader in the cluster. Once a leader is elected, remove the bootstrap key and add the bootstrap_expect key again.

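In my Ansible playbook, I have a shell task that deals with this problem:
# Ask the local agent who the leader is; an empty body or "" means there is none.
result=$(curl -s "http://localhost:8500/v1/status/leader?token={TOKEN}")
if [ -z "$result" ] || [ "$result" = '""' ]; then
  # Temporarily swap bootstrap_expect for bootstrap: true, keeping a backup.
  jq 'del(.bootstrap_expect) | .bootstrap=true' /etc/consul.conf > /tmp/consul.conf
  mv -f /etc/consul.conf /etc/consul.conf.bak
  mv -f /tmp/consul.conf /etc/consul.conf
  service consul restart
  # Restore the original config so the next restart uses bootstrap_expect again.
  mv -f /etc/consul.conf.bak /etc/consul.conf
fi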
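
For readers who would rather not shell out to jq, the two decisions in the task above can be sketched in Python. This is only an illustration of the same logic: has_leader and force_bootstrap are hypothetical helper names, and the sketch assumes the /v1/status/leader endpoint returns a JSON string that is empty when there is no leader, as the shell check above implies.

```python
import json


def has_leader(status_body):
    """True if the /v1/status/leader response body names a leader."""
    text = status_body.strip()
    if not text:
        return False
    try:
        return bool(json.loads(text))
    except json.JSONDecodeError:
        # Anything that is not valid JSON is treated as "no leader".
        return False


def force_bootstrap(config):
    """Same transformation as: jq 'del(.bootstrap_expect) | .bootstrap=true'"""
    patched = {key: value for key, value in config.items() if key != "bootstrap_expect"}
    patched["bootstrap"] = True
    return patched


config = {"server": True, "bootstrap_expect": 3, "data_dir": "/opt/consul"}
if not has_leader('""'):
    print(json.dumps(force_bootstrap(config), indent=2))
```

As with the shell version, the bootstrap config is only meant to be in place for a single restart on one server; the original config goes back afterwards.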

Hope this helps.

@ljsommer


ljsommer commented Sep 12, 2016

I was having an issue with a 3 node cluster so I figured I'd restart them to see if that addressed the issue.
"What could go wrong? They'll just re-elect a new leader."
I cannot get a new leader elected.

First I created a /data/raft/peers.json file populated as described in this guide:
https://www.consul.io/docs/guides/outage.html

I tried @angelosanramon's suggestion, which got me slightly further but not far enough.
==> WARNING: Bootstrap mode enabled! Do not enable unless necessary
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
Node name: 'consul0.lx.pri'
Datacenter: 'dc1'
Server: true (bootstrap: true)
Client Addr: 0.0.0.0 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
Cluster Addr: 10.5.100.223 (LAN: 8301, WAN: 8302)
Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
Atlas:

==> Log data will now stream in as it occurs:

2016/09/12 21:20:46 [INFO] raft: Restored from snapshot 108-1158239-1473679882905
2016/09/12 21:20:46 [INFO] serf: EventMemberJoin: consul0.lx.pri 10.5.100.223
2016/09/12 21:20:46 [INFO] serf: EventMemberJoin: consul0.lx.pri.dc1 10.5.100.223
2016/09/12 21:20:46 [INFO] raft: Node at 10.5.100.223:8300 [Follower] entering Follower state
2016/09/12 21:20:46 [INFO] consul: adding LAN server consul0.lx.pri (Addr: 10.5.100.223:8300) (DC: dc1)
2016/09/12 21:20:46 [INFO] consul: adding WAN server consul0.lx.pri.dc1 (Addr: 10.5.100.223:8300) (DC: dc1)
2016/09/12 21:20:46 [ERR] agent: failed to sync remote state: No cluster leader
2016/09/12 21:20:48 [WARN] raft: Heartbeat timeout reached, starting election
2016/09/12 21:20:48 [INFO] raft: Node at 10.5.100.223:8300 [Candidate] entering Candidate state
2016/09/12 21:20:48 [INFO] raft: Election won. Tally: 1
2016/09/12 21:20:48 [INFO] raft: Node at 10.5.100.223:8300 [Leader] entering Leader state
2016/09/12 21:20:48 [INFO] consul: cluster leadership acquired
2016/09/12 21:20:48 [INFO] consul: New leader elected: consul0.lx.pri
2016/09/12 21:20:48 [INFO] raft: Disabling EnableSingleNode (bootstrap)
2016/09/12 21:20:49 [INFO] raft: Removed ourself, transitioning to follower
2016/09/12 21:20:49 [INFO] raft: Node at 10.5.100.223:8300 [Follower] entering Follower state
2016/09/12 21:20:49 [ERR] consul.catalog: Register failed: node is not the leader
2016/09/12 21:20:49 [ERR] agent: failed to sync changes: node is not the leader
2016/09/12 21:20:49 [INFO] consul: cluster leadership lost
2016/09/12 21:20:49 [ERR] consul: failed to wait for barrier: node is not the leader
2016/09/12 21:20:51 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.

At this point I am actually going to tear down my entire Consul cluster and start from scratch. This is definitely an issue.

@slackpad

Contributor

slackpad commented Sep 12, 2016

Hi @ljsommer sorry about that.

2016/09/12 21:20:49 [INFO] raft: Removed ourself, transitioning to follower

It looks like you have the leave entry in your Raft log, which is undoing your peers.json change. This should be fixed in master, as the peers.json contents are applied last. Any chance you can try this with the 0.7.0-rc2 build?

@ljsommer


ljsommer commented Sep 12, 2016

@slackpad

I have the good fortune to be able to actually rebuild from scratch without losing any critical data, and yes I'll definitely be using the latest version of Consul to do it. When I get fully rebuilt I will be simulating this same scenario and documenting the results. I'll make sure and update this thread when I do with a step by step guide.

@SukantGujar


SukantGujar commented Jul 18, 2017

Any updates on your test, @ljsommer? We are also facing this issue. We used the Consul on Kubernetes recipe from https://github.com/kelseyhightower/consul-on-kubernetes, hosted on GKE. It's a typical three-peer cluster. GKE crashed all the nodes when scaling up the cluster, and since then the nodes have stopped electing a leader. Finally I had to remove and re-deploy them.

@jacohend


jacohend commented Sep 28, 2017

Facing this issue as well with Consul on Kubernetes

@cmorent


cmorent commented Oct 3, 2017

Same here!

@shantanugadgil

Contributor

shantanugadgil commented Sep 12, 2018

I too am using Nomad + Consul in multi-region mode (three AWS regions so far) with Cloud AutoJoin settings.
Each region has only one server (as this is an experimental setup).

The option bootstrap_expect = 1 causes servers on either side to start printing the error message:

[ERR] nomad: 'us-west-2-server-10-xx-xx-xx.us-west-2' and 'us-east-1-server-xx-xx-xx-xx.us-east-1' are both in bootstrap mode. Only one node should be in bootstrap mode, not adding Raft peer.

Not setting bootstrap_expect in the second region causes

[ERR] worker: failed to dequeue evaluation: No cluster leader
[ERR] http: Request /v1/agent/health?type=server, error: {"server":{"ok":false,"message":"No cluster leader"}}

At least setting up bootstrap_expect makes things work.
