Introduce and use global clock in lease manager #7957
Conversation
axw
referenced this pull request
Oct 24, 2017
Closed
WIP/RFC: introduce global clock, use in lease manager #7954
Note to self: need to write an upgrade step that clears out existing lease documents, creating new ones in the new schema.
jameinel
reviewed
Oct 24, 2017
Overall I really like switching to a global clock rather than tracking Skews.
I'm not sure if dropping 'lease#' from the keys is a win.
I think most of my comments are superficial, though I do think 'what happens on upgrade' is a rather significant issue.
+// worker. We should update worker/singular and API facade to
+// support claiming for the entire controller, rather
+// than a specific model, and use that to run controller-wide
+// singular workers.
jameinel
Oct 24, 2017
Owner
Could they be 'model specific' workers, but just be the controller model? Did we ever split the UUIDs to tell the difference?
axw
Oct 24, 2017
Member
Yes, there is a unique controller and controller model UUID, so we can tell the difference.
Do you mean use the controller model UUID in the API? We could do that, but I'd rather not treat the controller model specially when we have another option. Updating the API to handle either a controller or model tag is easy enough. I mostly just didn't want to do it in this PR, as it's big enough already.
axw
Oct 25, 2017
Member
Ah but I've remembered now that leases is a model-specific collection. So we'd need a global leases collection too, or rework things to make the existing collection global. So maybe best just to use the controller model after all.
jameinel
Oct 25, 2017
Owner
Isn't it just a special key? If the controller UUID is unique, then the _id is just ControllerUUID:blah while all other _ids are ModelUUID:blah.
axw
Oct 25, 2017
Member
Model-specific docs always have a model UUID prefix, but that'd be OK. We could use the singular lease manager for the controller model, but store independent leases for the controller UUID and model UUID. I'll take a look later on, see if we can get rid of the Mongo master runner once and for all.
+	ClockName:      clockName,
+	StateName:      stateName,
+	NewWorker:      globalclockupdater.NewWorker,
+	UpdateInterval: 1 * time.Second,
jameinel
Oct 24, 2017
Owner
Should these be constants somewhere other than inline?
I don't think we need to make 1s configurable. Have we poked at anything around BackoffDelay? Is not updating the clock by an agent for 30s ever going to cause a problem?
It feels like something shorter (10s) might be better; I'm not 100% sure what we're trying to prevent.
axw
Oct 24, 2017
Member
I haven't done heavy testing yet, but with my tests 30s backoff was fine. I can drop it to 10s, should be fine either way. Will make them constants in this file.
+	// to non-monotonic time, since there is no way of knowing in
+	// general whether or not the database was updated.
+	Advance(d time.Duration) error
+}
jameinel
Oct 24, 2017
Owner
Will this need changes to the migration format? It seems that ideally leases would all just get refreshed on Migration.
axw
Oct 24, 2017
Member
That's what was happening before, and will continue to happen. No migration changes.
+}
+
+// Config contains the common resources and information required to
+// create an Updater or Watcher.
+	//
+	// For the Updater, the session should not be copied, as we
+	// rely on the session being closed when mastership changes,
+	// to guarantee at most one Updater is running at any time.
+	coll := u.config.Session.DB(u.config.Database).C(u.config.Collection)
+	new := u.time.Add(d)
+	if err := coll.Update(matchTimeDoc(u.time), setTimeDoc(new)); err != nil {
+		if err == mgo.ErrNotFound {
jameinel
Oct 24, 2017
Owner
Given that matchTimeDoc doesn't take convergence into account, this means that of the 3 updaters, 2 are likely to be in delay mode; was that the specific desire? I guess since we aren't trying to round the time, the likelihood of ever getting convergence is low.
Is there a reason to have sub-millisecond precision (or even sub-second)?
It would need something like:
matchTimeDoc(u.time, new)
I suppose we know that we should have 1 updater of the database, with a 30s failover period, though that also then causes all existing leases to be extended by 30s.
Maybe that's ok.
axw
Oct 24, 2017
Member
The delay is purely there to avoid unnecessary database reads and writes. Maybe it's overkill. The worker/lease code would work just as well without it, and we'd have less of a delay in taking over leadership. Note though that it only takes effect when the leader fails, so a short delay shouldn't be unexpected.
We don't need sub-second precision, but it keeps the code simple and moves the global time along as far as we safely/correctly can. Is there a concern with this?
I don't understand what you're saying about convergence.
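For readers following along, here is a minimal sketch of the compare-and-swap shape the hunk above relies on. The helper bodies, document ID, and field name are illustrative assumptions, not the PR's actual code; the point is that the update matches on the previously read time, so a concurrent Advance makes the match fail with mgo.ErrNotFound, signalling contention rather than a missing document.

```go
package globalclock

import (
	"time"

	"gopkg.in/mgo.v2/bson"
)

// clockDocID and the "time" field are illustrative assumptions.
const clockDocID = "g"

// matchTimeDoc selects the clock document only if it still holds the
// time we last read; a concurrent updater invalidates the match.
func matchTimeDoc(t time.Time) bson.D {
	return bson.D{
		{"_id", clockDocID},
		{"time", t.UnixNano()},
	}
}

// setTimeDoc writes the advanced time.
func setTimeDoc(t time.Time) bson.D {
	return bson.D{{"$set", bson.D{{"time", t.UnixNano()}}}}
}
```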
+	err := coll.Insert(newClockDoc(globalEpoch))
+	if err != nil {
+		if mgo.IsDup(err) {
+			// Document already exists, return the existing value.
jameinel
Oct 24, 2017
Owner
Did you check if Query.Apply was something you might be able to use? I only know it exists; offhand I'm not sure it supports the "if it doesn't exist, set it, else use the existing one" semantics, which is different from how Upsert would work.
It does feel a little odd that we start with the Insert rather than defaulting to assuming the object exists.
But it probably isn't a huge deal, as it's only 1 extra query on startup.
axw
Oct 24, 2017
Member
Query.Apply looks appropriate, thanks. Will look to switch over to that. Will also consider whether we can drop the additional insert.
axw
Oct 25, 2017
Member
Feels slightly awkward, but it works well enough: upsert with {$inc: {"time": 0}}.
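For concreteness, a sketch of that trick via mgo's Query.Apply; the doc type, field, and ID are illustrative assumptions. With Upsert set, the no-op $inc creates the document with time 0 when it is missing (assuming the epoch is represented as zero) and leaves an existing value untouched, and ReturnNew yields whichever value is now stored.

```go
package globalclock

import (
	"time"

	"github.com/juju/errors"
	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// clockDoc and clockDocID are assumptions for illustration.
type clockDoc struct {
	Time int64 `bson:"time"`
}

const clockDocID = "g"

// readOrCreateTime returns the stored global time, creating the
// clock document at the epoch if it does not yet exist, in a single
// findAndModify round trip.
func readOrCreateTime(coll *mgo.Collection) (time.Time, error) {
	var doc clockDoc
	_, err := coll.FindId(clockDocID).Apply(mgo.Change{
		Update:    bson.M{"$inc": bson.M{"time": int64(0)}},
		Upsert:    true,
		ReturnNew: true,
	}, &doc)
	if err != nil {
		return time.Time{}, errors.Trace(err)
	}
	return time.Unix(0, doc.Time), nil
}
```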
+	c.Assert(s.readTime(c), gc.Equals, globalEpoch.Add(time.Second))
+
+	err = u1.Advance(time.Second)
+	c.Assert(err, jc.ErrorIsNil)
jameinel
Oct 24, 2017
Owner
Do we have any tests that check what happens if you're advancing by something other than exactly 1 second?
@@ -105,7 +103,7 @@ func (client *client) request(name string, request lease.Request, getOps opsFunc
 	// On the first attempt, assume cache is good.
 	if attempt > 0 {
-		if err := client.Refresh(); err != nil {
+		if err := client.refresh(false); err != nil {
jameinel
Oct 24, 2017
Owner
If the op failed, why don't we want to re-read the globalTime? Can we assume that the lease will be held long enough?
@@ -257,6 +209,7 @@ func (client *client) readEntries(collection mongo.Collection) (map[string]entry
 	for iter.Next(&leaseDoc) {
 		name, entry, err := leaseDoc.entry()
 		if err != nil {
+			iter.Close()
-	// The "extended" lease will certainly expire before the
-	// existing lease could. Done.
-	return nil, lastEntry, errNoExtension
+	globalTime, err := client.refreshGlobalTime()
jameinel
Oct 24, 2017
Owner
I'm a little surprised that we would refresh the global time on every lease extension. I guess that is how we would ensure that any given lease always lasts at least long enough.
But ultimately, it feels like it would be better to have an in-memory object that represents the global time, which is updated regularly (every second), and then we issue and expire and extend leases based on that.
axw
Oct 25, 2017
Member
Originally that was my plan, but it would be unsound. If we were to do that, then claims could last less than the specified duration, because the reader might lag arbitrarily far behind the updater.
-	fieldLeaseExpiry: toInt64(expiry),
-	fieldLeaseWriter: client.config.Id,
+	fieldStart:    toInt64(globalTime),
+	fieldDuration: nextEntry.duration,
jameinel
Oct 24, 2017
Owner
why is globalTime + duration better than just tracking the Expiry time?
It does tell us more information about the request, but anyone trying to figure out how much time is left still just adds them together and compares it to the current globalTime, right?
My quick grep through the diff says that every read of '.duration' is always a ".start.add(.duration)"
The one way that I could see it being "better" is if the database op was "copy the global time during inserting the doc", which would mean that you have the most up-to-date value.
But since we don't use TXN for globalTime, I'm not sure it can be used to create the lease.
axw
Oct 25, 2017
Member
I did this for a potential optimisation.
If you see a lease starts at time T_m, and you've seen that the current global time is T_n where T_n >= T_m, you could potentially use the local clock to assert that the requested duration in real time has passed. If you've only got the expiry time, you can't do that safely, because global time can pass slower than real time (i.e. you can work it forwards from the start time, but not backwards from the expiry time).
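A sketch of that reasoning, with hypothetical names; nothing below is in the PR. The safety hinges on global time advancing no faster than real time: once you have observed the global clock at or past the lease's start, the lease must have started (in real time) at or before the moment of that observation, so the local monotonic clock bounds the elapsed real time from below.

```go
package lease

import "time"

// realDurationPassed reports whether the lease's requested duration
// of real time has certainly elapsed. observedGlobal is a global
// clock reading taken at local instant observedAt; now is the
// current local time (monotonic in Go 1.9+).
//
// With only an expiry time this inference would be unsafe, because
// global time may run slower than real time: you can work forwards
// from the start, but not backwards from the expiry.
func realDurationPassed(start, observedGlobal, observedAt, now time.Time, duration time.Duration) bool {
	if observedGlobal.Before(start) {
		return false // haven't yet seen global time reach the start
	}
	return now.Sub(observedAt) >= duration
}
```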
-func (client *client) clockDocId() string {
-	return clockDocId(client.config.Namespace)
+func (client *client) refreshGlobalTime() (time.Time, error) {
+	globalTime, err := client.config.GlobalClock.Now()
jameinel
Oct 24, 2017
Owner
Can you add a logger.Trace("Refreshing lease globalTime:"), possibly with the value logged?
I think it would be good to run a controller with a lot of applications and see how often we trigger a refresh of the globalTime.
If it's reasonable, no problem, but I can see it being many times a second.
I've seen "waking to refresh leases" (or whatever that message is) surprisingly often.
-	// by a second, taking 100ms to read the clock doc; sees same lease with
-	// suitable uncertainty.
+	// Remote client, whose local clock is offset significantly from the
+	// local client's, but has the same global clock value.
@@ -112,6 +63,10 @@ Client usage considerations
   if you fail, you'll get ErrInvalid and a fresh cache to inspect to find out
   recent state.
+ * The expiry time returned via lease.Info is relative to the local system
jameinel
Oct 24, 2017
Owner
So if we are going to do this, would it make sense to thread it all the way through to the caller?
So the farthest-away point says:
"please give me a lease for 30s, and right now is time XXXX"
and we do all the work and compute the global clock and add 30s, but the final return is:
"the lease should be good until XXXX+30s"?
Otherwise you're only getting the API server's time, and not the agent's local time.
axw
Oct 25, 2017
Member
worker/lease runs in the same process as the state/lease code, so it's OK.
The machine and unit agents use worker/singular and worker/leadership respectively, and those deal with the API server only in time.Durations.
+	Collection:  or(params.Collection, "default-collection"),
+	Mongo:       mongo,
+	LocalClock:  localClock,
+	GlobalClock: globalClock,
jameinel
Oct 24, 2017
Owner
I've been staring at these long enough that Clock has stopped looking correct... :)
-	// without incurring extra DB hits.
-	// Apart from checking validity on load, though, there's little reason
-	// to use Id elsewhere; Namespace and Name are the sources of truth.
+	// Id is always "<Namespace>#<Name>#", so that we can extract useful
jameinel
Oct 24, 2017
Owner
so is the point of using "lease#" to have it be a globalKey() like we use elsewhere in State?
I realize it doesn't have to be, because it is confined to a table. But sometimes it is nice to have a string that is guaranteed to be unique and obvious.
axw
Oct 25, 2017
Member
We used to have lease# to distinguish lease and clock-skew documents. We don't need or have the latter anymore, so lease# is redundant.
+	// The global clock is not migrated; each controller has its own
+	// independent global clock.
+	globalClockC,
+	// Leases are not migrated either. When an application is migrated,
+func (st *State) globalClockReader() (*globalclock.Reader, error) {
+	return globalclock.NewReader(globalclock.ReaderConfig{
+		Config: globalclock.Config{
+			Session: st.MongoSession(),
jameinel
Oct 24, 2017
Owner
should these all be on the shared session? I believe that requests are serialized for a single session.
+		return errors.Annotate(err, "updating global clock")
+	}
+	logger.Tracef("incremented global time by %s", interval)
+	last = w.config.LocalClock.Now()
jameinel
Oct 24, 2017
Owner
Why reset last? You already have the 'now' that you used for the update.
This means that the clock always drifts by some amount per update (whatever the time is to update the database).
Do we have an idea how much that will be? It would mean that every lease grows by a significant period of time (50ms on 1s is a 5% drift).
I suppose it does ensure that 1s is always at least 1s. I'm just concerned it may be too much.
Can we run the clock for a significant period of time under load and see what the drift is?
axw
Oct 25, 2017
Member
As you pointed out, this is to ensure that 1s of global time always corresponds to at least 1s of wall time.
While the local time will drift from the starting point, it should not matter: the global time will still be advanced at a roughly constant rate of 1s per (1s + db-update) of wall-clock time. Anyway, is there a safe alternative?
jameinel
Oct 25, 2017
Owner
So if you're using "now := time.Now()" at the beginning and then "delta := time.Since(now)", don't you guarantee that over time you don't lose any values?
It does mean that any given clock tick may not be exactly 1s, but if update 1 is slightly delayed so that update 2 is actually slightly fast, then update 3 will be back on track, won't it?
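To make the trade-off concrete, here is a minimal sketch of the loop shape under discussion (not the PR's actual worker; names and wiring are assumed). Re-reading the clock after the write is the choice axw defends: the DB round-trip is never credited to the global clock, so a global second costs at least a wall-clock second, at the price of the per-update drift jameinel measures. Keeping the pre-write reading, as jameinel suggests, would make the long-run totals exact instead.

```go
package globalclockupdater

import (
	"time"

	"github.com/juju/errors"
)

// Updater matches the Advance method shown in the diff above.
type Updater interface {
	Advance(d time.Duration) error
}

// loop advances the global clock by the locally observed elapsed
// time on every tick, until stop is closed.
func loop(updater Updater, interval time.Duration, stop <-chan struct{}) error {
	last := time.Now()
	for {
		select {
		case <-stop:
			return nil
		case <-time.After(interval):
		}
		if err := updater.Advance(time.Since(last)); err != nil {
			return errors.Annotate(err, "updating global clock")
		}
		// Deliberately re-read the clock after the write: the write
		// latency is excluded, so global time can never outrun wall
		// time. Reusing the pre-write reading would instead credit
		// that latency back on the next tick.
		last = time.Now()
	}
}
```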
axw added some commits Oct 24, 2017
LGTM
!!bang!!
$$merge$$
Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju
axw commented Oct 24, 2017 (edited 1 time, by axw on Oct 25, 2017)
Description of change
This PR introduces a global clock document to the database. Each controller machine agent runs a worker that will periodically (every 1s) attempt to advance the global time by that same interval. Concurrent updates are prevented, and an unsuccessful worker backs off for a short (30s) delay. We guarantee that the global time is monotonically increasing, and increases at a rate no faster than wall-clock time.
We pass a global clock reader into the state/lease client, so that it can compute expiry times based on the global time. Lease documents store the start time as a global time, and the request duration; we then make the expiry time relative to the local time for use by clients. The local time is expected to contain a monotonic component (i.e. Go 1.9+), so it can be compared to time.Now().
We refresh the global clock time when a claim or extension is made, to guarantee that the lease is held for at least the specified duration. When a lease is to be expired, we first attempt to use the most recent global clock time, and then refresh if the lease expires after the cached time, only failing if the lease expires after the refreshed time.
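As a rough illustration of that expiry flow, here is a sketch; apart from refreshGlobalTime, which appears in the diff, the type and field names are assumptions made for self-containment, not the PR's code.

```go
package lease

import (
	"time"

	"github.com/juju/errors"
)

// client is a pared-down stand-in for the state/lease client; only
// the pieces this sketch needs are included.
type client struct {
	cachedGlobalTime  time.Time
	refreshGlobalTime func() (time.Time, error)
}

// expired reports whether a lease that started at the given global
// time and runs for duration has expired, reading the database only
// when the cached global time is not enough to decide.
func (c *client) expired(start time.Time, duration time.Duration) (bool, error) {
	expiry := start.Add(duration)
	if !expiry.After(c.cachedGlobalTime) {
		return true, nil // expired even against the cached time
	}
	globalTime, err := c.refreshGlobalTime()
	if err != nil {
		return false, errors.Trace(err)
	}
	return !expiry.After(globalTime), nil
}
```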
QA steps
Identify the lease holders for "singular-controller" and "ubuntu", and stop those agents (controller machine agent and unit agent respectively). Another agent should eventually claim the leases.
Also:
(wait for postgresql units to become idle, take note of the leader)
(wait for additional controllers to be voting)
Verify that:
Documentation changes
None.
Bug reference
Fixes https://bugs.launchpad.net/juju/+bug/1706340