Electing a 0.8.x leader during an upgrade can cause a panic in older servers #2889

Closed
slackpad opened this Issue Apr 11, 2017 · 3 comments

slackpad commented Apr 11, 2017

Haven't seen this in the wild, but this code could possibly cause older Consul servers to panic if a 0.8.x server gains leadership during an upgrade:

https://github.com/hashicorp/consul/blob/v0.8.0/consul/leader.go#L156-L162

The panic would occur when an older server gets a Raft log entry for the autopilot config, which it won't understand. To avoid this until fixed, make sure to upgrade the followers before updating the current leader.
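The workaround above (followers first, current leader last) can be sketched as a small ordering helper. This is only an illustration: the `Server` type and `UpgradeOrder` function are hypothetical names, not part of Consul, and the code assumes you already know which server currently holds leadership.

```go
package main

import "sort"

// Server is a hypothetical record describing one Consul server.
type Server struct {
	Name     string
	IsLeader bool
}

// UpgradeOrder returns the server names in the order they should be
// upgraded: every follower first, the current leader last.
// The relative order of the followers is preserved.
func UpgradeOrder(servers []Server) []string {
	ordered := make([]Server, len(servers))
	copy(ordered, servers) // do not reorder the caller's slice

	// A follower sorts before the leader; all other pairs keep their order.
	sort.SliceStable(ordered, func(i, j int) bool {
		return !ordered[i].IsLeader && ordered[j].IsLeader
	})

	names := make([]string, 0, len(ordered))
	for _, s := range ordered {
		names = append(names, s.Name)
	}
	return names
}
```

Leadership can change while an upgrade is in progress, so in practice the leader would need to be re-checked before each step rather than computed once up front.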

This is probably pretty rare since most folks upgrade the leader last to avoid unnecessary elections, but the consequences are high enough to make it worth avoiding. We could have the autopilot loop skip out while any server is below the required version, and have the loop itself create the config. That way the config is created quickly once all the servers are upgraded, even if there's no leader transition.
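A minimal sketch of the version gate described above, assuming the server versions are available as plain `major.minor.patch` strings. This is not the actual fix (see #2897 for that) and it does not use Consul's internal APIs; `parseVersion`, `atLeast`, and `AllServersAtLeast` are hypothetical helpers, and the `0.8.0` threshold mentioned below is an assumption based on the description in this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "0.8.1" into [0 8 1]. Hypothetical helper: it accepts
// only plain major.minor.patch and returns an error for anything else,
// including pre-release suffixes such as "-dev".
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.SplitN(v, ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("malformed version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("malformed version %q: %v", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= minimum, comparing major, minor, then patch.
func atLeast(v, minimum [3]int) bool {
	for i := range v {
		if v[i] != minimum[i] {
			return v[i] > minimum[i]
		}
	}
	return true
}

// AllServersAtLeast reports whether every server version meets minVersion.
// A periodic loop on the leader could call this and skip writing the new
// config until it returns true.
func AllServersAtLeast(serverVersions []string, minVersion string) (bool, error) {
	minimum, err := parseVersion(minVersion)
	if err != nil {
		return false, err
	}
	for _, sv := range serverVersions {
		v, err := parseVersion(sv)
		if err != nil {
			return false, err
		}
		if !atLeast(v, minimum) {
			return false, nil
		}
	}
	return true, nil
}
```

Because the check would run on every pass of the loop rather than only when leadership is gained, the config would get written shortly after the last old server is replaced, which matches the "even if there's no leader transition" goal. For example, `AllServersAtLeast([]string{"0.7.5", "0.8.0"}, "0.8.0")` reports false, so the write would be skipped for that pass.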

@slackpad slackpad added this to the 0.8.1 milestone Apr 11, 2017

fieldju commented Apr 13, 2017

@slackpad, I am not sure how rare it is for people in the AWS / CloudFormation world who use immutable AMIs and autoscaling group update policies. If I use CloudFormation to push out a new AMI that has the new version of Consul then the order in which the existing EC2 instances are replaced is non-deterministic.

slackpad commented Apr 13, 2017

@fieldju yeah, we have a fix in the works under #2897 and are planning on a quick release (ideally next Monday) to get this patched up.

fieldju commented Apr 13, 2017

I am glad I saw this issue before upgrading. I was just thinking of ways to force CloudFormation to do what I wanted, but I can wait till #2897 is done.

@kyhavlov kyhavlov closed this Apr 13, 2017

gws added a commit to democracyworks/consul-coreos that referenced this issue Jun 14, 2017

Update to Consul 0.8.4
This is the latest stable version, and also works around a nasty bug in
0.8.0[1].

[1] hashicorp/consul#2889

gws added a commit to democracyworks/consul-coreos that referenced this issue Jun 14, 2017

Update to Consul 0.8.4
This is the latest stable version, and also fixes a nasty bug in
0.8.0[1].

[1] hashicorp/consul#2889