state/leadership: leadership.Manager wakes/checks at least every config.MaxSleep #4219

Merged
merged 1 commit into from Jan 28, 2016

Conversation

Projects
None yet
2 participants
Contributor

davecheney commented Jan 28, 2016

Fixes LP 1511659

Backport for 1.25

This is a replacment for the original backport PR, #4212, which had to be abandoned.

This replacement makes only minimal adjustements to the juju/juju/testing package to accomodate the test for the fix to LP 1511659.


Original PR description

From @fwereade

Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.

(Review request: http://reviews.vapour.ws/r/3657/)

leadership.Manager wakes/checks at least every config.MaxSleep
Fixes LP 1511659

Backport for 1.25

This is a replacment for the original backport PR, #4212, which had to be abandoned.

This replacement makes only minimal adjustements to the juju/juju/testing package to accomodate the test for the fix to LP 1511659.

---

Original PR description

From @fwereade

> Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.
Contributor

davecheney commented Jan 28, 2016

$$merge$$

Contributor

jujubot commented Jan 28, 2016

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

jujubot added a commit that referenced this pull request Jan 28, 2016

Merge pull request #4219 from davecheney/backport/1511659
state/leadership: leadership.Manager wakes/checks at least every config.MaxSleep

Fixes LP 1511659

Backport for 1.25

This is a replacment for the original backport PR, #4212, which had to be abandoned.

This replacement makes only minimal adjustements to the juju/juju/testing package to accomodate the test for the fix to LP 1511659.

---

Original PR description

From @fwereade

> Eliminate sleep-forever case in leadership manager worker; config struct now expects a positive MaxSleep duration, which is (in effect) the longest time the manager will go without explicitly resyncing its cache.

Prior to this change, as soon as the leadership.Manager started it would check to see if any of the leases it managed were about to expire and if so it would go through the expiration cycle. However if there were no leases currently under management, then it would return nil, thus removing itself from the selectable set -- the manager would sleep until either a claim or check operation happened, or the tomb was killed.

Electing a new leader requires that the old leadership lease is expired; not just out of date. It's the act of expiration which triggers a new election. Hence expiring the lease is required to trigger a re-election which then triggers the claim or check operation -- if there is no expiration then the manager sleeps because nobody will issues claims or checks.

To rectify this, the manager now wakes periodically and refreshes it's cache from the database, this causes the next pass of the loop to find leases to expire, which it then does, triggering re-election and unblocking the whole shebang.

(Review request: http://reviews.vapour.ws/r/3657/)

@jujubot jujubot merged commit 4ecc898 into juju:1.25 Jan 28, 2016

@davecheney davecheney deleted the davecheney:backport/1511659 branch Jan 28, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment