Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Use lease management for singular controller workers #7984
Conversation
axw
changed the title from
Jujud singular controller
to
Use lease management for singular controller workers
Oct 29, 2017
| } | ||
| -// Claim attempts to claim responsibility for model administration for the | ||
| -// supplied duration. If the claim is denied, it will return | ||
| +// ClaimModel attempts to claim responsibility for administration of the entity |
wallyworld
approved these changes
Oct 30, 2017
Looks good, love the deleted mongo code.
Might pay to get a second opinion just in case.
| + modelTag names.ModelTag | ||
| +} | ||
| + | ||
| +func (b *stateBackend) ModelTag() names.ModelTag { |
wallyworld
Oct 30, 2017
Owner
// ModelTag tells the Facade what models it should consider requests for.
| - ModelTag string `json:"model-tag"` | ||
| - ControllerTag string `json:"controller-tag"` | ||
| - Duration time.Duration `json:"duration"` | ||
| + Tag string `json:"tag"` |
wallyworld
Oct 30, 2017
Owner
EntityTag perhaps?
Then we have qualified names for both EntityTag and ClaimantTag
| + | ||
| + // SingularFlagDuration defines for how long this agent will ask | ||
| + // for controller administration rights. | ||
| + SingularFlagDuration time.Duration |
wallyworld
Oct 30, 2017
Owner
The var name here is not very descriptive - when I saw it in the code above, I had to scroll to its definition to have any clue what it was for. Perhaps ControllerAdminAttemptDuration? Or something.
axw
Oct 30, 2017
Member
renamed to ControllerLeaseDuration, hopefully that's more meaningful? ControllerAdminAttemptDuration says nothing to me
| +// State, and returns a worker implementing engine.Flag, whose | ||
| +// Check method always returns true. This is used for flagging | ||
| +// that the machine is a controller/model manager. | ||
| +func stateFlagManifold() dependency.Manifold { |
| + } | ||
| + | ||
| + go func() { | ||
| + worker.Wait() |
axw
Oct 30, 2017
Member
We could but it would be inappropriate. The purpose of this goroutine is just to decref the State when the worker is done.
jameinel
approved these changes
Oct 30, 2017
some small issues, I'm not sure that many of them need changes, but some comments nonetheless.
| + entity names.Tag, | ||
| +) (*API, error) { | ||
| + if !names.IsValidMachine(claimant.Id()) { | ||
| + return nil, errors.NotValidf("claimant tag") |
jameinel
Oct 30, 2017
Owner
I'm a bit surprised we need this, given claimant is a names.MachineTag how do you create a MachineTag that isn't a valid machine?
| @@ -224,7 +224,7 @@ func AllFacades() *facade.Registry { | ||
| reg("Resumer", 2, resumer.NewResumerAPI) | ||
| reg("RetryStrategy", 1, retrystrategy.NewRetryStrategyAPI) | ||
| - reg("Singular", 1, singular.NewExternalFacade) | ||
| + reg("Singular", 2, singular.NewExternalFacade) |
jameinel
Oct 30, 2017
Owner
are we not registering version 1 somewhere else? Are we just not worried about older versions because this runs in-process?
axw
Oct 30, 2017
Member
Correct. I rationalised in a commit message, but forgot to carry over to the PR description:
The facade version has been bumped, but the old
version has been dropped entirely. Controllers
are always in sync with both their API client
and API server code, and this is a controller-only
API; there's no point in carrying the old version.
| - claimer lease.Claimer | ||
| + auth facade.Authorizer | ||
| + controller names.ControllerTag | ||
| + model names.ModelTag |
jameinel
Oct 30, 2017
Owner
would it be clearer if the member was 'modeltag' instead of 'model' which sounds like it is the actual Model object not a ModelTag object?
| } | ||
| + err = facade.claimer.WaitUntilExpired(leaseId) |
jameinel
Oct 30, 2017
Owner
This actually seems to have odd semantics if we ever actually passed in more than 1 entity. Namely, we don't wait for all of them in parallel, we wait for them sequentially, which actually means that we might expire the first request by the time we manage to grab the last one, given nothing in here seems to be refreshing any claimed leases during the wait time.
Not your code, so I'm happy enough with a caveat/comment/bug, but this does look like if you had explicitly staggered leases, lease-1 expires in 10s, lease-2 in 25s, and lease-3 actually refreshes before lease-2 is grabbed, so it actually isn't until 55s, then lease-1 has actually expired before we finish grabbing #3.
| @@ -1921,45 +1908,6 @@ func (a *MachineAgent) uninstallAgent() error { | ||
| return errors.Errorf("uninstall failed: %v", errs) | ||
| } | ||
| -type MongoSessioner interface { | ||
| - MongoSession() *mgo.Session |
| + Flags: []string{ | ||
| + isControllerFlagName, | ||
| + }, | ||
| +}.Decorate |
jameinel
Oct 30, 2017
Owner
While these seem cute and functional, and make for terse descriptions of the components to run, it feels a little like the contribute to the 'logging' spew when trying to figure out what is/should be/isn't running right now. Not running because "flag not set" is often not very understandable. Is it supposed to be set? is something currently running and going to be setting it soon?
axw
Oct 30, 2017
Member
Yes, they are a bit noisy, and not completely clear when debugging. I don't have immediate ideas for improvement, will have to ponder.
| + isResponsibleControllerFlagName = "is-responsible-controller-flag" | ||
| + isControllerFlagName = "is-controller-flag" | ||
| + logPrunerName = "log-pruner" | ||
| + txnPrunerName = "transaction-pruner" |
jameinel
Oct 30, 2017
Owner
this makes me feel like its reasonable to break this into sections so we don't have to churn the entire table when we add a flag that is slightly longer.
| @@ -53,6 +54,9 @@ func (*ManifoldsSuite) TestManifoldNames(c *gc.C) { | ||
| "fan-configurer", | ||
| "global-clock-updater", | ||
| "host-key-reporter", | ||
| + "is-controller-flag", | ||
| + "is-responsible-controller-flag", |
jameinel
Oct 30, 2017
Owner
'is-responsible-controller-flag' still feels really hard for me to get my head around. responsible for what?
is-primary-controller
is master controller
is singleton controller
I don't know that I have anything better.
axw
Oct 30, 2017
Member
Yeah, I just copied from the model worker, figured it was better to be consistent. I like "is-primary-controller-flag", will go with that.
| + if pruneTimer == nil { | ||
| + pruneTimer = w.config.Clock.NewTimer(w.config.PruneInterval) | ||
| + pruneCh = pruneTimer.Chan() | ||
| + defer pruneTimer.Stop() |
jameinel
Oct 30, 2017
Owner
is there a reason to defer creating the pruneTimer until now?
Could we just set up the pruneTimer outside of the loop?
Also, shouldn't we be calling pruneTimer.Reset(w.config.PruneInterval) when the config changes?
axw
Oct 30, 2017
Member
is there a reason to defer creating the pruneTimer until now?
Yes, the reason is that we don't have any configuration until we get the first watcher notification. I'll comment.
Also, shouldn't we be calling pruneTimer.Reset(w.config.PruneInterval) when the config changes?
Nope, pruning should be every , from when we get the first config notification. The interval is statically configured via the manifold config.
axw
added some commits
Oct 29, 2017
|
$$merge$$ |
|
Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju |
axw commentedOct 29, 2017
Description of change
The aim of this PR is to get rid of Mongo mastership as a means of running singular controller workers, and while doing so, move several more state workers into the machine manifold (txnpruner, dblogpruner).
The "singular" lease manager is updated to manage leases for either the controller or model. The machine manifolds contain a singular worker that runs for all controller agents, and attempts to claim a lease on the controller. The txnpruner, dblogpruner, and externalcontrollerupdater manifolds/workers depend on that to start up.
We introduce two new flags to the machine manifolds: is-controller-flag, and is-responsible-controller-flag. The former is used to indicate that the agent is a controller, and the latter indicates that the agent has claimed responsibility for the controller.
QA steps
(wait for all)
Documentation changes
None.
Bug reference
https://bugs.launchpad.net/juju/+bug/1726680