Introduce and use global clock in lease manager #7957

Merged
merged 5 commits into juju:develop on Oct 26, 2017

Conversation

3 participants
Member

axw commented Oct 24, 2017

Description of change

This PR introduces a global clock document to the database. Each controller machine agent runs a worker that periodically (every 1s) attempts to advance the global time by that same interval. Concurrent updates are prevented, and a worker that loses the race backs off for a short (30s) delay. We guarantee that the global time is monotonically increasing, and that it increases at a rate no faster than wall clock time.
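A minimal sketch of that updater loop, using illustrative names (Updater, errConcurrentUpdate, and updateLoop are mine, not necessarily the PR's identifiers):

```go
package globalclockupdater

import (
	"errors"
	"time"
)

// Updater advances the global clock document by the given duration.
type Updater interface {
	Advance(d time.Duration) error
}

// errConcurrentUpdate stands in for whatever error the real updater
// returns when another controller has already advanced the clock.
var errConcurrentUpdate = errors.New("global clock was updated concurrently")

// updateLoop attempts to advance the global clock every interval,
// backing off for backoffDelay after losing a concurrent-update race.
func updateLoop(u Updater, interval, backoffDelay time.Duration, stop <-chan struct{}) error {
	for {
		select {
		case <-stop:
			return nil
		case <-time.After(interval):
		}
		if err := u.Advance(interval); err != nil {
			if err == errConcurrentUpdate {
				// Another controller won the write; wait before retrying.
				select {
				case <-stop:
					return nil
				case <-time.After(backoffDelay):
				}
				continue
			}
			return err
		}
	}
}
```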

We pass a global clock reader into the state/lease client, so that it can compute expiry times based on the global time. Lease documents store the start time as a global time, and the request duration; we then make the expiry time relative to the local time for use by clients. The local time is expected to contain a monotonic component (i.e. Go 1.9+), so it can be compared to time.Now().
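A hedged sketch of that translation (function and parameter names are illustrative): the remaining lifetime is computed on the global timeline and then applied to the local clock reading, which carries a monotonic component on Go 1.9+.

```go
import "time"

// localExpiry converts a lease expiry expressed in global time into a
// local-clock time: remaining = (start + duration) - globalNow, and the
// lease expires that far after the local "now". The result can later be
// compared against time.Now().
func localExpiry(start time.Time, duration time.Duration, globalNow, localNow time.Time) time.Time {
	remaining := start.Add(duration).Sub(globalNow)
	return localNow.Add(remaining)
}
```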

We refresh the global clock time when a claim or extension is made, to guarantee that the lease is held for at least the specified duration. When a lease is to be expired, we first attempt to use the most recent global clock time, and then refresh if the lease expires after the cached time, only failing if the lease expires after the refreshed time.
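And a minimal sketch of the expiry decision described above (leaseEntry and the refresh callback are assumed names):

```go
import "time"

type leaseEntry struct {
	start    time.Time     // lease start, in global time
	duration time.Duration // requested duration
}

// canExpire reports whether a lease may be expired: trust the cached
// global time when it already proves expiry; otherwise refresh the
// global clock and check once more before declining.
func canExpire(e leaseEntry, cachedGlobal time.Time, refresh func() (time.Time, error)) (bool, error) {
	expiry := e.start.Add(e.duration)
	if !expiry.After(cachedGlobal) {
		return true, nil
	}
	refreshed, err := refresh()
	if err != nil {
		return false, err
	}
	return !expiry.After(refreshed), nil
}
```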

QA steps

  1. juju bootstrap localhost
  2. juju enable-ha
  3. juju deploy ubuntu -n 3

Identify the lease holders for "singular-controller" and "ubuntu", and stop those agents (controller machine agent and unit agent respectively). Another agent should eventually claim the leases.

Also:

  1. juju bootstrap localhost (with juju 2.2.5)
  2. juju deploy postgresql -n 3
    (wait for postgresql units to become idle, take note of the leader)
  3. juju enable-ha
    (wait for additional controllers to be voting)
  4. juju upgrade-juju -m controller (with this branch)

Verify that:

  • there are no old docs in the leases collection
  • the leases are retained by their previous holders across the upgrade

Documentation changes

None.

Bug reference

Fixes https://bugs.launchpad.net/juju/+bug/1706340

Member

axw commented Oct 24, 2017

Note to self: need to write an upgrade step that clears out existing lease documents, creating new ones in the new schema.

Overall I really like switching to global clock rather than tracking Skews.
I'm not sure if dropping 'lease#' from the keys is a win.

I think most of my comments are superficial, though I do think 'what happens on upgrade' is a rather significant issue.

+// worker. We should update worker/singular and API facade to
+// support claiming for the entire controller, rather
+// than a specific model, and use that to run controller-wide
+// singular workers.
@jameinel

jameinel Oct 24, 2017

Owner

Could they be 'model specific' workers, but just be the controller model? Did we ever split the UUIDs to tell the difference?

@axw

axw Oct 24, 2017

Member

Yes, there is a unique controller and controller model UUID, so we can tell the difference.

Do you mean use the controller model UUID in the API? We could do that, but I'd rather not treat the controller model specially when we have another option. Updating the API to handle either a controller or model tag is easy enough. I mostly just didn't want to do it in this PR, as it's big enough already.

@axw

axw Oct 25, 2017

Member

Ah but I've remembered now that leases is a model-specific collection. So we'd need a global leases collection too, or rework things to make the existing collection global. So maybe best just to use the controller model after all.

@jameinel

jameinel Oct 25, 2017

Owner

Isn't it just a special key? If the controller UUID is unique, then the _id is just ControllerUUID:blah while all other _ids are ModelUUID:blah.

@axw

axw Oct 25, 2017

Member

Model-specific docs always have a model UUID prefix, but that'd be OK. We could use the singular lease manager for the controller model, but store independent leases for the controller UUID and model UUID. I'll take a look later on, see if we can get rid of the Mongo master runner once and for all.

cmd/jujud/agent/machine/manifolds.go
+ ClockName: clockName,
+ StateName: stateName,
+ NewWorker: globalclockupdater.NewWorker,
+ UpdateInterval: 1 * time.Second,
@jameinel

jameinel Oct 24, 2017

Owner

Should these be constants somewhere other than inline?
I don't think we need to make 1s configurable. Have we poked at anything around BackoffDelay? Is an agent not updating the clock for 30s ever going to cause a problem?
It feels like something shorter (10s) might be better; I'm not 100% sure what we're trying to prevent.

@axw

axw Oct 24, 2017

Member

I haven't done heavy testing yet, but with my tests 30s backoff was fine. I can drop it to 10s, should be fine either way. Will make them constants in this file.
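Something along these lines in manifolds.go (names illustrative; per the discussion above the backoff may end up at 10s instead):

```go
const (
	// globalClockUpdateInterval is how often each controller machine
	// agent attempts to advance the global clock.
	globalClockUpdateInterval = 1 * time.Second

	// globalClockBackoffDelay is how long an agent waits after losing
	// a concurrent-update race before trying again.
	globalClockBackoffDelay = 30 * time.Second
)
```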

+ // to non-monotonic time, since there is no way of knowing in
+ // general whether or not the database was updated.
+ Advance(d time.Duration) error
+}
@jameinel

jameinel Oct 24, 2017

Owner

Will this need changes to the migration format? It seems that ideally leases would all just get refreshed on Migration.

@axw

axw Oct 24, 2017

Member

That's what was happening before, and will continue to happen. No migration changes.

state/globalclock/config.go
+}
+
+// Config contains the common resources and information required to
+// create an Updater or Watcher.
@jameinel

jameinel Oct 24, 2017

Owner

Updater or Reader?

@axw

axw Oct 24, 2017

Member

thanks

state/globalclock/config.go
+ //
+ // For the Updater, the session should not be copied, as we
+ // rely on the session being closed when mastership changes,
+ // to guarantee at most one Updater is running at any time.
@jameinel

jameinel Oct 24, 2017

Owner

We no longer have the constraint about session not being copied, do we?

@axw

axw Oct 24, 2017

Member

correct, removed

state/globalclock/updater.go
+ coll := u.config.Session.DB(u.config.Database).C(u.config.Collection)
+ new := u.time.Add(d)
+ if err := coll.Update(matchTimeDoc(u.time), setTimeDoc(new)); err != nil {
+ if err == mgo.ErrNotFound {
@jameinel

jameinel Oct 24, 2017

Owner

Given that matchTimeDoc doesn't take convergence into account, this means that of the 3 updaters, 2 are likely to be in delay mode; was that the specific desire? I guess since we aren't trying to round the time, the likelihood of ever getting convergence is low anyway.
Is there a reason to have sub-millisecond precision (or even sub-second)?
It would need something like:
matchTimeDoc(u.time, new)

I suppose we know that we should have 1 updater of the database, with a 30s failover period, though that also then causes all existing leases to be extended by 30s.
Maybe that's ok.

@axw

axw Oct 24, 2017

Member

The delay is purely there to avoid unnecessary database reads and writes. Maybe it's overkill. The worker/lease code would work just as well without it, and we'd have less of a delay in taking over leadership. Note though that it only takes effect when the leader fails, so a short delay shouldn't be unexpected.

We don't need sub-second precision, but it keeps the code simple and moves the global time along as far as we safely/correctly can. Is there a concern with this?

I don't understand what you're saying about convergence.
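For readers following along, a hedged sketch of the conditional update in the quoted hunk (the _id and field names are illustrative): the write matches only the time we last read, so a concurrent writer surfaces as mgo.ErrNotFound rather than a lost update.

```go
import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// advance attempts to move the stored global time from lastRead to
// lastRead+d. A successful Update proves we were the only writer;
// mgo.ErrNotFound means someone else advanced the clock first (a write
// conflict), and the caller should re-read and back off.
func advance(coll *mgo.Collection, lastRead, d int64) error {
	return coll.Update(
		bson.D{{"_id", "clock"}, {"time", lastRead}},     // match only the value we last read
		bson.D{{"$set", bson.D{{"time", lastRead + d}}}}, // advance by d
	)
}
```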

state/globalclock/updater.go
+ err := coll.Insert(newClockDoc(globalEpoch))
+ if err != nil {
+ if mgo.IsDup(err) {
+ // Document already exists, return the existing value.
@jameinel

jameinel Oct 24, 2017

Owner

Did you check if Query.Apply was something you might be able to use? I only know it exists; offhand I'm not sure it supports "if it doesn't exist, set it, else use the existing one", which is different from how Upsert would work.
It does feel a little odd that we start with the Insert rather than defaulting to assuming the object exists.
But it probably isn't a huge deal, as it's only one extra query on startup.

@axw

axw Oct 24, 2017

Member

Query.Apply looks appropriate, thanks. Will look to switch over to that. Will also consider whether we can drop the additional insert.

@axw

axw Oct 25, 2017

Member

feels slightly awkward, but it works well enough: upsert with {$inc: {"time": 0}}
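A hedged sketch of that read-or-create using mgo's Query.Apply (the doc _id and field name are illustrative, and it assumes the epoch is stored as zero): the $inc-by-zero is a no-op on an existing document, but on upsert it seeds the field.

```go
import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

// readOrCreateClock returns the stored global time, inserting the epoch
// value if the clock document does not exist yet.
func readOrCreateClock(coll *mgo.Collection) (int64, error) {
	var doc struct {
		Time int64 `bson:"time"`
	}
	_, err := coll.FindId("clock").Apply(mgo.Change{
		Update:    bson.M{"$inc": bson.M{"time": 0}},
		Upsert:    true,
		ReturnNew: true,
	}, &doc)
	return doc.Time, err
}
```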

+ c.Assert(s.readTime(c), gc.Equals, globalEpoch.Add(time.Second))
+
+ err = u1.Advance(time.Second)
+ c.Assert(err, jc.ErrorIsNil)
@jameinel

jameinel Oct 24, 2017

Owner

Do we have any tests that check what happens if you're advancing by something other than exactly 1 second?

@axw

axw Oct 25, 2017

Member

we do now

@@ -105,7 +103,7 @@ func (client *client) request(name string, request lease.Request, getOps opsFunc
// On the first attempt, assume cache is good.
if attempt > 0 {
- if err := client.Refresh(); err != nil {
+ if err := client.refresh(false); err != nil {
@jameinel

jameinel Oct 24, 2017

Owner

If the op failed, why don't we want to re-read the globalTime? Can we assume that the lease will be held long enough?

@axw

axw Oct 25, 2017

Member

we do it in the claim/extend ops function

state/lease/client.go
@@ -257,6 +209,7 @@ func (client *client) readEntries(collection mongo.Collection) (map[string]entry
for iter.Next(&leaseDoc) {
name, entry, err := leaseDoc.entry()
if err != nil {
+ iter.Close()
@jameinel

jameinel Oct 24, 2017

Owner

iter.Close can return an error (I believe).
Can we get it logged?
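Something like this, as a sketch of what's being asked for:

```go
if err := iter.Close(); err != nil {
	logger.Errorf("closing lease iterator: %v", err)
}
```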

- // The "extended" lease will certainly expire before the
- // existing lease could. Done.
- return nil, lastEntry, errNoExtension
+ globalTime, err := client.refreshGlobalTime()
@jameinel

jameinel Oct 24, 2017

Owner

I'm a little surprised that we would refresh the global time on every lease extension. I guess that is how we would ensure that any given lease always lasts at least long enough.
But ultimately, it feels like it would be better to have an in-memory object that represents the global time, which is updated regularly (every second), and then we issue, extend, and expire leases based on that.

@axw

axw Oct 25, 2017

Member

Originally that was my plan, but that would be unsound. If we were to do that, then claims could last less than the specified duration, because the reader might lag arbitrarily long behind the updater.

- fieldLeaseExpiry: toInt64(expiry),
- fieldLeaseWriter: client.config.Id,
+ fieldStart: toInt64(globalTime),
+ fieldDuration: nextEntry.duration,
@jameinel

jameinel Oct 24, 2017

Owner

why is globalTime + duration better than just tracking the Expiry time?
It does give us more information about the request, but anyone trying to figure out how much time is left still just adds them together and compares the result to the current globalTime, right?

My quick grep through the diff says that every read of '.duration' is always a ".start.add(.duration)"

The one way that I could see it being "better" is if the database op was "copy the global time during inserting the doc", which would mean that you have the most up-to-date value.
But since we don't use TXN for globalTime, I'm not sure it can be used to create the lease.

@axw

axw Oct 25, 2017

Member

I did this for a potential optimisation.

If you see that a lease starts at time T_m, and you've seen that the current global time is T_n where T_n >= T_m, you could potentially use the local clock to assert that the requested duration in real time has passed. If you've only got the expiry time, you can't do that safely, because global time can pass more slowly than real time (i.e. you can work it forwards from the start time, but not backwards from the expiry time).
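A hedged sketch of that optimisation (illustrative only, not code in this PR): because global time advances no faster than real time, observing global time T_n >= start proves at least T_n - start of real time has elapsed since the lease began, and the local monotonic clock can account for the rest.

```go
import "time"

// durationElapsed reports whether at least `duration` of real time has
// passed since the lease started, given that the global clock read
// globalNow (>= start) at local time localObserved.
func durationElapsed(start, globalNow time.Time, duration time.Duration, localObserved time.Time) bool {
	if globalNow.Before(start) {
		return false
	}
	// Real time elapsed is at least the global time elapsed since the
	// lease started, plus whatever the local (monotonic) clock has seen
	// since we read the global clock.
	elapsed := globalNow.Sub(start) + time.Since(localObserved)
	return elapsed >= duration
}
```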

state/lease/client.go
-func (client *client) clockDocId() string {
- return clockDocId(client.config.Namespace)
+func (client *client) refreshGlobalTime() (time.Time, error) {
+ globalTime, err := client.config.GlobalClock.Now()
@jameinel

jameinel Oct 24, 2017

Owner

Can you add a logger.Trace("Refreshing lease globalTime:") possibly with a value logged?
I think it would be good to run a controller with a lot of applications and see how often we trigger a refresh of the globalTime.
If it is reasonable, np, but I can see it being many times a second.
I've seen "waking to refresh leases" or whatever that message is surprisingly often.

@axw

axw Oct 25, 2017

Member

done

state/lease/client_remote_test.go
- // by a second, taking 100ms to read the clock doc; sees same lease with
- // suitable uncertainty.
+ // Remote client, whose local clock is offset significantly from the
+ // local client's, but has the same global clock value.
@jameinel

jameinel Oct 24, 2017

Owner

'has a slightly delayed global clock value' ?

@axw

axw Oct 25, 2017

Member

heh indeed, thanks

@@ -112,6 +63,10 @@ Client usage considerations
if you fail, you'll get ErrInvalid and a fresh cache to inspect to find out
recent state.
+ * The expiry time returned via lease.Info is relative to the local system
@jameinel

jameinel Oct 24, 2017

Owner

So if we are going to do this, would it make sense to thread it all the way through to the caller?
The farthest-away point would say:
"please give me a lease for 30s, and right now is time XXXX"
and we do all the work, compute the global clock and add 30s, but the final return is:
"the lease should be good until XXXX+30s"?
Otherwise you're only getting the API server's time, and not the agent's local time.

@axw

axw Oct 25, 2017

Member

worker/lease runs in the same process as the state/lease code, so it's OK.
The machine and unit agents use worker/singular and worker/leadership respectively, and those deal with the API server only in time.Durations.

+ Collection: or(params.Collection, "default-collection"),
+ Mongo: mongo,
+ LocalClock: localClock,
+ GlobalClock: globalClock,
@jameinel

jameinel Oct 24, 2017

Owner

I've been staring at these long enough that Clock has stopped looking correct... :)

- // without incurring extra DB hits.
- // Apart from checking validity on load, though, there's little reason
- // to use Id elsewhere; Namespace and Name are the sources of truth.
+ // Id is always "<Namespace>#<Name>#", so that we can extract useful
@jameinel

jameinel Oct 24, 2017

Owner

so is the point of using "lease#" to have it be a globalKey() like we use elsewhere in State?
I realize it doesn't have to be, because it is confined to a table. But sometimes it is nice to have a string that is guaranteed to be unique and obvious.

@axw

axw Oct 25, 2017

Member

We used to have lease# to distinguish lease documents from clock-skew documents. We don't need or have the latter anymore, so lease# is redundant.

state/migration_internal_test.go
+ // The global clock is not migrated; each controller has its own
+ // independent global clock.
+ globalClockC,
+ // Leases are not migrated either. When an application is migrated,
@jameinel

jameinel Oct 24, 2017

Owner

extra blank space?

@axw

axw Oct 25, 2017

Member

done

state/state.go
+func (st *State) globalClockReader() (*globalclock.Reader, error) {
+ return globalclock.NewReader(globalclock.ReaderConfig{
+ Config: globalclock.Config{
+ Session: st.MongoSession(),
@jameinel

jameinel Oct 24, 2017

Owner

should these all be on the shared session? I believe that requests are serialized for a single session.

@axw

axw Oct 25, 2017

Member

we should be able to copy it now

+ return errors.Annotate(err, "updating global clock")
+ }
+ logger.Tracef("incremented global time by %s", interval)
+ last = w.config.LocalClock.Now()
@jameinel

jameinel Oct 24, 2017

Owner

Why reset last? You already have the 'now' that you used for the update.
This means that the clock always drifts by some amount per update (however long it takes to update the database).
Do we have an idea how much that will be? It would mean that every lease grows by a significant period of time (50ms on 1s is a 5% drift).
I suppose it does ensure that 1s is always at least 1s; I'm just concerned it might be too much.
Can we run the clock for a significant period of time under load and see what the drift is?

@axw

axw Oct 25, 2017

Member

As you pointed out, this is to ensure that wall time 1s >= global time 1s.

While the local time will drift from the starting point, it should not matter. The global time will still be advanced at a roughly constant rate of 1s per 1s+db-update wall clock time. Anyway, is there a safe alternative?

@jameinel

jameinel Oct 25, 2017

Owner

So if you're using "now := time.Now()" at the beginning and then "delta := time.Since(now)", don't you guarantee that over time you don't lose any values?
It does mean that any given clock tick may not be exactly 1s, but if update 1 is slightly delayed then update 2 is actually slightly fast, and update 3 will be back on track, won't it?
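A hedged sketch of that alternative (illustrative, reusing the Updater interface from the earlier sketch): advance by the measured local elapsed time rather than the fixed interval, so database latency makes an individual tick longer but does not accumulate as drift.

```go
import "time"

// run advances the global clock by however much local time actually
// elapsed since the last successful update. A slow database write makes
// one tick long and leaves the next on schedule, instead of stretching
// every tick by the write latency.
func run(u Updater, interval time.Duration, stop <-chan struct{}) error {
	last := time.Now()
	for {
		select {
		case <-stop:
			return nil
		case <-time.After(interval):
		}
		now := time.Now()
		if err := u.Advance(now.Sub(last)); err != nil {
			return err
		}
		last = now // measured before the write, so write time is not dropped
	}
}
```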

axw added some commits Oct 24, 2017

  • {core,state}/globalclock: introduce global clock
    We introduce a new juju.globalclock MongoDB collection, which contains a coordinated global time that will serve as the basis of our lease management. Global time is monotonically increasing, and increases at a rate no faster than wall-clock time.
  • worker/globalclockupdater: introduce worker
    Introducing a new global clock updater worker, which is responsible for updating the global clock on a configured schedule. All controller machine agents run this worker; the worker will detect and back off on write conflicts.
  • cmd/jujud/agent/machine: run globalclockupdater
    Run the global clock updater in the controller machine agents.
  • state/lease: update to use global clock
    Update the state/lease client to use the global clock for lease expiry. This makes leases impervious to wall clock jumps. The existing clock documents that were written to the leases collection are no longer required.
  • upgrades: replace old lease documents
    On upgrade to 2.3, remove old lease documents and replace them with new ones. The new documents will have a 60 second expiry time, relative to the global time epoch.
Owner

jameinel commented on 54d403c Oct 25, 2017

LGTM

Member

axw commented Oct 25, 2017

!!bang!!

Member

axw commented Oct 26, 2017

$$merge$$

Contributor

jujubot commented Oct 26, 2017

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

@jujubot jujubot merged commit a5b1704 into juju:develop Oct 26, 2017

1 check passed

continuous-integration/jenkins/pr-merge This commit looks good