state/leadership: leadership.Manager wakes/checks at least every config.MaxSleep #4212

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
Contributor

davecheney commented Jan 27, 2016

Fixes LP 1511659

Backport for 1.25

From @fwereade

Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.

(Review request: http://reviews.vapour.ws/r/3645/)

Contributor

davecheney commented Jan 27, 2016

$$merge$$

Contributor

jujubot commented Jan 27, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 27, 2016

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/6104

Contributor

davecheney commented Jan 27, 2016

$$merge$$

Contributor

jujubot commented Jan 27, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 27, 2016

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/6105

Contributor

davecheney commented Jan 27, 2016

$$merge$$

Contributor

jujubot commented Jan 27, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 27, 2016

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/6106

state/leadership: leadership.Manager wakes/checks at least every conf…
…ig.MaxSleep

Fixes LP 1511659

Backport for 1.25

From @fwereade
> Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.
Contributor

davecheney commented Jan 27, 2016

build failed because of significant whitespace, well isn't that just dandy.

Contributor

davecheney commented Jan 27, 2016

$$merge$$

Contributor

jujubot commented Jan 27, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Jan 27, 2016

Build failed: Generating tarball failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/6107

@davecheney davecheney closed this Jan 28, 2016

davecheney added a commit to davecheney/juju that referenced this pull request Jan 28, 2016

leadership.Manager wakes/checks at least every config.MaxSleep
Fixes LP 1511659

Backport for 1.25

This is a replacment for the original backport PR, juju#4212, which had to be abandoned.

This replacement makes only minimal adjustements to the juju/juju/testing package to accomodate the test for the fix to LP 1511659.

---

Original PR description

From @fwereade

> Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.

jujubot added a commit that referenced this pull request Jan 28, 2016

Merge pull request #4219 from davecheney/backport/1511659
state/leadership: leadership.Manager wakes/checks at least every config.MaxSleep

Fixes LP 1511659

Backport for 1.25

This is a replacment for the original backport PR, #4212, which had to be abandoned.

This replacement makes only minimal adjustements to the juju/juju/testing package to accomodate the test for the fix to LP 1511659.

---

Original PR description

From @fwereade

> Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.

(Review request: http://reviews.vapour.ws/r/3657/)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment