Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Model config to control the update-status hook interval #7519
Conversation
wallyworld
added some commits
Jun 20, 2017
jameinel
reviewed
Jun 20, 2017
I have a comment about how we compute the window, and I'd like to understand how we find out about changes to the interval so that we can have users changing the value and seeing that we are actually doing something.
I'm happy with the 20% +/- as a range. Its a bit arbitrary, that seems like a good starting point.
| + return errors.Annotatef(err, "update status hook frequency %v cannot be less than 1m", f) | ||
| + } | ||
| + if f > 60*time.Minute { | ||
| + return errors.Annotatef(err, "update status hook frequency %v cannot be less than 60m", f) |
| +// update-status hook. | ||
| +func (c *Config) UpdateStatusHookInterval() time.Duration { | ||
| + // Value has already been validated. | ||
| + val, _ := time.ParseDuration(c.mustString(UpdateStatusHookInterval)) |
jameinel
Jun 20, 2017
Owner
Given previous issues, does this handle when the value doesn't exist ahead of time. I also wonder if someone puts just '30' would that be treated as a string, or will this fail and we should we just interpret it as seconds (minutes?)
I definitely want to handle 'if its "" or nil, treat it as default' rather than having failures.
I would be ok not supporting floats/ints in the field, but I'd want it to be a clear error of "value 30 does not have any units" or something like that, rather than "int is not a valid string".
wallyworld
Jun 20, 2017
Owner
There's an upgrade step to add the value to an existing model.
There will not be a case where the value is "" if the upgrade is done properly.
wallyworld
Jun 20, 2017
Owner
The value is a time duration - it's a string eg "30m". There's no ambiguity with units.
jameinel
Jun 20, 2017
Owner
My point is that when a user types a value they may just type 30 and the error message is going to be "not a string", which is a terrible user experience.
Given what we saw in the last round of "what happens when we introduce mandatory config", and the fact that IMO it is not ambiguous. An empty value can just be treated as "use the default", I'd really rather we layer defense in depth here so that we don't get bugs around it.
wallyworld
Jun 20, 2017
Owner
I can't repro a scenario where this is an issue, but none the less I've added a second lot of default handling in the getter method. We should remove it once 2.1.x becomes EOL. Previously it was controller config so that could explain any corner case there may have been.
| @@ -300,6 +299,12 @@ func (w *RemoteStateWatcher) loop(unitTag names.UnitTag) (err error) { | ||
| observedEvent(&seenLeadershipChange) | ||
| } | ||
| + // TODO(wallyworld) - listen for changes to this value |
jameinel
Jun 20, 2017
Owner
What is the plan around this? update-status interval is currently fixed and you can change the setting and restart Jujud?
wallyworld
Jun 20, 2017
Owner
No, we need to properly add apis for listening to well defined model config changes. All we have now is something to listen to any model config change. That would kill the system if lots of uniters had to be notified about any model change.
| + lower := 0.8 * float64(wait) | ||
| + window := 0.4 * float64(wait) | ||
| + offset := float64(r.Int63n(int64(window))) | ||
| + wait = time.Duration(lower + offset) |
jameinel
Jun 20, 2017
Owner
This seems to randomize the time that any given unit will wait, but not randomize the time that any given wait will trigger. Is that correct?
As in, with this in play, eg mysql/0 will wake up every 3m and trigger, while mysql/1 will wake up every 6m and trigger.
I'd rather see it where on every sleep cycle we recompute a variable wait.
So that mysql/0 will wait 3, then 6, then 4, then 5, etc. That means that all units still average 5min wait times (or whatever was nominally set), but it should still push out when the explicit ticks are happening.
Or maybe that's what you're doing and I'm just confused by the double-nested func returning a func.
It feels like we just want to move the recalculation of specific wait time into the second func rather than this first one.
wallyworld
Jun 20, 2017
Owner
No, the wait is calculated new each time. Each time the channel fires, the function is called again and a new random time is used for the next wait period.
wallyworld
Jun 20, 2017
Owner
I refactored a little to add a test for the generation of the random duration
| -func NewUpdateStatusTimer() func() <-chan time.Time { | ||
| - return updateStatusSignal | ||
| + return func() <-chan time.Time { | ||
| + // TODO(fwereade): 2016-03-17 lp:1558657 |
jameinel
Jun 20, 2017
Owner
Any chance that since you're touching this code now you could have it interact with Clocks instead of 'time.After' ?
wallyworld
Jun 20, 2017
Owner
Thought about it but a lot to try and get done for 2.2.1. I can see if I get time.
jameinel
Jun 20, 2017
Owner
This doesn't seem very high priority, to me. It seemed like it might have been low fruit, but if it isn't, don't worry about it. We have a way to inject something different for testing (give it a different func).
wallyworld
added some commits
Jun 20, 2017
| + wait := timer(nominal) | ||
| + c.Assert(min, jc.LessThan, wait) | ||
| + c.Assert(wait, jc.LessThan, max) | ||
| + } |
jameinel
Jun 20, 2017
Owner
Only problem with this test is that it passes if NewUpdateStatusTimer just returns nominal on every request. However, since we're using rand() we can't do a lot. I wanted to do something similar in my 'batch pings' code. This is what I came up with:
pingBatcher := NewPingBatcher(nil, interval)
var lastTime time.Duration
var measuredMinTime time.Duration
var measuredMaxTime time.Duration
for i := 0; i < 1000; i++ {
next := pingBatcher.nextSleep()
// We use Assert rather than Check because we don't want 100s of failures
c.Assert(next, jc.GreaterThan, minTime)
c.Assert(next, jc.LessThan, maxTime)
if lastTime == 0 {
measuredMinTime = next
measuredMaxTime = next
} else {
// We are using a range in 100s of milliseconds at a
// resolution of nanoseconds. The chance of getting the
// same random value 2x in a row is sufficiently low that
// we can just assert the value is changing.
// (Chance of collision is roughly 1 in 100 Million)
c.Assert(next, gc.Not(gc.Equals), lastTime)
if next < measuredMinTime {
measuredMinTime = next
}
if next > measuredMaxTime {
measuredMaxTime = next
}
}
lastTime = next
}
// Check that we're actually using the full range that was requested.
// Assert that after 1000 tries we've used a good portion of the range
// If we sampled perfectly, then we would have fully sampled the range,
// spread very 1/1000 of the range.
// If we set the required range to 1/100, then a given sample would fail 99%
// of the time, 1000 samples would fail 0.99^1000=4e-5 or ~1-in-20,000 times.
// (actual measurements showed 18 in 20,000, probably due to double ended range vs single ended)
// However, at 1/10 its 0.9^1000=1.7e-46, or 10^41 times less likely to fail.
// In 100,000 runs, a range of 1/10 never failed
expectedCloseness := (maxTime - minTime) / 10
c.Check(measuredMinTime, jc.LessThan, minTime + expectedCloseness)
c.Check(measuredMaxTime, jc.GreaterThan, maxTime - expectedCloseness)
Running the loop 1000 times still is 4ms, so it isn't particularly expensive (I
put an outer loop of 100,000 times to check that in never failed
intermittently).
It is still probablistic, but I've given some really good odds. My
probabilities may be off slightly, but I still expect it to be vanishingly
unlikely that the test will ever intermittently fail.
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
wallyworld commentedJun 20, 2017
•
Edited 1 time
-
wallyworld
Jun 20, 2017
Description of change
Provide a model config attribute "update-status-hook-interval" which controls how often to run the update-status hook. Whatever value is provided has a random +/- 20% offset applied to avoid all hooks for all units firing art once.
This implementation doesn't yet listen for changes in the model config value - whatever value is provided as bootstrap via --config is what's used.
QA steps
juju bootstrap --debug
juju deploy mysql
juju debug-log
watch for update-status hooks firing at staggered but regular intervals
Documentation changes
model config doc needs updating
Bug reference
https://bugs.launchpad.net/juju/+bug/1697841
https://bugs.launchpad.net/juju/+bug/1698139