many: allow core refresh.schedule setting #2833

Merged
merged 101 commits into from Apr 26, 2017

Conversation

Projects
None yet
4 participants
Collaborator

mvo5 commented Feb 10, 2017

This is a new iteration of the new core.refresh.schedule support - it is limited currently to a schedule within a single day.

This allows to setup a schedule for refreshes like:

$ snap set core refresh.schedule=9:00-11:00/22:00-23:40

The snapd system will only auto-refresh within these schedule windows unless the window got missed and in such cases it will refresh immediately.

Open questions/issues:

  • some of the syntactical sugar of the schedule is missing (there is a FIXME about this)
  • no easy way to validate a setting currently, i.e. ideally snap set core refresh.schedule would reject invalid settings right away but there is no easy for this right now - except for adding the validation in the configure hook which duplicates a bit of code as the hook is shell(or python) and our code is go.
  • some validation missing, i.e. we should probably not allow update intervals less than ~30min.

mvo5 added some commits Jan 31, 2017

I think we need one extra test and there's one trivial error at the end of the diff. The "time in interval" thing is somewhat confusing and I don't quite understand one of the functions. Perhaps some comment on that logic could help (at least me :-).

overlord/snapstate/snapmgr.go
+ logger.Noticef("cannot use refresh.schedule: %s", err)
+ refreshSchedule, err = timeutil.ParseSchedule(defaultRefreshSchedule)
+ if err != nil {
+ panic(fmt.Sprintf("defaultRefreshSchedule cannot be parsed: %s", err))
@zyga

zyga Feb 13, 2017

Contributor

We should have a test that ensures we can parse defaultRefreshSchedule

@mvo5

mvo5 Feb 13, 2017

Collaborator

Excellent idea.

timeutil/schedule.go
+ return false
+ }
+
+ if sched.Weekday != "" {
@zyga

zyga Feb 13, 2017

Contributor

I don't quite understand what is going on here.

@mvo5

mvo5 Feb 13, 2017

Collaborator

I added a comment, I hope this makes it somewhat clearer.

@zyga

zyga Feb 14, 2017

Contributor

The condition above (!sched.Matches(t1) || !sched.Matches(t2)) confused me as without it the Schedule object didn't matter at all.

timeutil/schedule.go
+ "fri": 5, "friday": 5,
+ "sat": 6, "saturday": 6,
+}
+
@zyga

zyga Feb 13, 2017

Contributor

Question, any reason those are not func (schedule *Schedule) parseWeekday(s string) err?

@mvo5

mvo5 Feb 13, 2017

Collaborator

No good reason. In fact the current mix is strange. After looking at this a bit I made them real functions that do not know yet about *Schedule.

@zyga

zyga Feb 14, 2017

Contributor

I just read them, nice! Thanks for doing that

timeutil/schedule.go
+
+ return rest, nil
+}
+
@zyga

zyga Feb 13, 2017

Contributor

Personal nitpick. It would be awesome if each of the parseXXX function had a one-line comment that shows roughly what they parse.

@mvo5

mvo5 Feb 13, 2017

Collaborator

Thanks, I did that now.

timeutil/schedule.go
+
+func parseSingleSchedule(s string) (*Schedule, error) {
+ var cur Schedule
+
@zyga

zyga Feb 13, 2017

Contributor

Nice design pattern with the car, cdr like parsing.

timeutil/schedule_test.go
+ // 2017-02-07 is a Tuesday
+ {"mon@9:00-11:00", "2017-02-07 9:00", false},
+ } {
+ fakeNow, err := time.Parse(shortForm, t.fakeNow)
@zyga

zyga Feb 13, 2017

Contributor

The err here is never checked.

@mvo5

mvo5 Feb 13, 2017

Collaborator

Thanks, added this now.

mvo5 added some commits Feb 13, 2017

zyga approved these changes Feb 14, 2017

LGTM

timeutil/schedule.go
+ return false
+ }
+
+ if sched.Weekday != "" {
@zyga

zyga Feb 13, 2017

Contributor

I don't quite understand what is going on here.

@mvo5

mvo5 Feb 13, 2017

Collaborator

I added a comment, I hope this makes it somewhat clearer.

@zyga

zyga Feb 14, 2017

Contributor

The condition above (!sched.Matches(t1) || !sched.Matches(t2)) confused me as without it the Schedule object didn't matter at all.

timeutil/schedule.go
+ "fri": 5, "friday": 5,
+ "sat": 6, "saturday": 6,
+}
+
@zyga

zyga Feb 13, 2017

Contributor

Question, any reason those are not func (schedule *Schedule) parseWeekday(s string) err?

@mvo5

mvo5 Feb 13, 2017

Collaborator

No good reason. In fact the current mix is strange. After looking at this a bit I made them real functions that do not know yet about *Schedule.

@zyga

zyga Feb 14, 2017

Contributor

I just read them, nice! Thanks for doing that

@mvo5 mvo5 added this to the 2.23 milestone Feb 14, 2017

mvo5 added some commits Feb 14, 2017

reomove refresh.disabled as discussed during the standup, the tests w…
…ill use refresh.schedule in 2 days to disable the auto-refresh during the tests

mvo5 added some commits Feb 15, 2017

@niemeyer niemeyer changed the title from many: allow core.refresh.schedule setting to many: allow core refresh.schedule setting Feb 15, 2017

mvo5 added some commits Feb 15, 2017

Looks good. A few questions/comments:

overlord/snapstate/snapmgr.go
- return nil
- }
+ // already have a refresh timer
+ if m.nextRefresh != nil {
@niemeyer

niemeyer Feb 16, 2017

Contributor

I couldn't find logic that reload the next refresh when the value changes, and stops the old timer while doing that to ensure it doesn't get called out of turn. Does that exist and I missed, or do we still need it?

@mvo5

mvo5 Feb 16, 2017

Collaborator

That is indeed missing!

@mvo5

mvo5 Feb 16, 2017

Collaborator

I added that now.

overlord/snapstate/snapmgr.go
@@ -484,6 +483,18 @@ func (m *SnapManager) ensureRefreshes() error {
return err
}
+ when := timeutil.Next(refreshSchedule, lastRefresh)
@niemeyer

niemeyer Feb 16, 2017

Contributor

s/when/delta/ (when feels like a timestamp)

overlord/snapstate/snapmgr.go
+ when := timeutil.Next(refreshSchedule, lastRefresh)
+
+ m.nextRefresh = time.AfterFunc(when, func() { m.doAutoRefresh(updated, tasksets) })
+ logger.Debugf("schedule next refresh in %s", when)
@niemeyer

niemeyer Feb 16, 2017

Contributor

Phrasing is a bit curious. Perhaps ("Next refresh scheduled for %s.", time.Now().Add(delta))

@mvo5

mvo5 Feb 16, 2017

Collaborator

Much nicer indeed!

overlord/snapstate/snapmgr.go
@@ -510,6 +521,8 @@ func (m *SnapManager) ensureRefreshes() error {
}
chg.Set("snap-names", updated)
chg.Set("api-data", map[string]interface{}{"snap-names": updated})
+
+ m.nextRefresh = nil
@niemeyer

niemeyer Feb 16, 2017

Contributor

Shouldn't we ensure this is stopped before we unset it? Otherwise the function is still called.

@mvo5

mvo5 Feb 16, 2017

Collaborator

AIUI a go timer is only called once (c.f. https://play.golang.org/p/OuAh1Ua6db). A ticker would be callled more often. So I think we don't need to cancel.

@niemeyer

niemeyer Feb 16, 2017

Contributor

Sorry, I assumed this was also called on global refreshes ("snap refresh" alone), but looking at the terminology above, it makes sense that it's only called on "auto".

overlord/snapstate/snapmgr.go
- tr := config.NewTransaction(st)
- tr.Set("core", "refresh.last", time.Now())
- tr.Commit()
+ st.Set("last-refresh", time.Now())
@niemeyer

niemeyer Feb 16, 2017

Contributor

Thanks!

tests/main/auto-refresh/task.yaml
@@ -10,29 +10,32 @@ execute: |
echo "Install a snap from stable"
snap install test-snapd-tools
snap list | MATCH 'test-snapd-tools +[0-9]+\.[0-9]+'
+ snap set core refresh.schedule="0:00-$(date +%H:%M:%S --date=30seconds)"
@niemeyer

niemeyer Feb 16, 2017

Contributor

We don't take seconds anymore, I think? Also not clear what the goal is here. This probably shouldn't be working as is, so perhaps unnecessary?

@mvo5

mvo5 Feb 16, 2017

Collaborator

We still take seconds currently, they are used in the unit tests still. But I fixed the test, now that we have the 14day max interval the test does not need to set this schedule, it can use a different oen like 00:00-23:59.

@mvo5 mvo5 modified the milestones: 2.24, 2.23 Feb 17, 2017

@niemeyer niemeyer removed this from the 2.24 milestone Feb 23, 2017

@niemeyer niemeyer added the Blocked label Feb 23, 2017

Blocked pending high-level agreements on exact UX desired.

Collaborator

mvo5 commented Feb 23, 2017

Closing for now until the high level discussion has happened.

@mvo5 mvo5 closed this Feb 23, 2017

@mvo5 mvo5 removed the Blocked label Mar 30, 2017

@mvo5 mvo5 reopened this Mar 30, 2017

@mvo5 mvo5 added this to the 2.24 milestone Apr 4, 2017

overlord/snapstate/snapmgr.go
+ m.lastRefreshAttempt = time.Now()
+ m.state.Lock()
+ updated, tasksets, err := AutoRefresh(m.state)
+ m.state.Unlock()
@pedronis

pedronis Apr 4, 2017

Contributor

the unlocking looks wrong here, we need to make the Change based on tasksets while we still hold the same lock

overlord/snapstate/snapmgr.go
+ return nil
+}
+
+func (m *SnapManager) doAutoRefresh(updated []string, tasksets []*state.TaskSet) error {
@pedronis

pedronis Apr 4, 2017

Contributor

this is a strange name given that it's not a task handler, also a bit unclear why the code is split this way between the the function passed to AfterFunc and this one

@@ -466,6 +488,8 @@ func (m *SnapManager) ensureRefreshes() error {
}
chg.Set("snap-names", updated)
chg.Set("api-data", map[string]interface{}{"snap-names": updated})
+
@pedronis

pedronis Apr 4, 2017

Contributor

we need a EnsureBefore(0) around here ?

Contributor

zyga commented Apr 4, 2017

Unit tests failed on:

FAIL: snapstate_test.go:4264: snapmgrTestSuite.TestEnsureRefreshesAlreadyRanInThisInterval
snapstate_test.go:4289:
    c.Check(refreshLast.Equal(fakeLastRefresh), Equals, true)
... obtained bool = false
... expected bool = true

A few comments below.

For the validation question, I suggest just adding the parsing in the snapd set config pipeline, quick and dirty.

In the near future we'll have configuration schemas for everybody, and can leverage that for core itself too.

For the rest, let's just make sure we have a LGTM from @pedronis.

overlord/snapstate/snapmgr.go
- // random interval on top of the minmum time between refreshes
- defaultRefreshRandomness = 4 * time.Hour
+var (
+ defaultRefreshSchedule = "00:00-04:59/5:00-10:59/11:00-16:59/17:00-23:59"
@niemeyer

niemeyer Apr 5, 2017

Contributor

This can be a single line.

overlord/snapstate/snapmgr.go
if err != nil && !config.IsNoOption(err) {
return err
}
+ refreshSchedule, err := timeutil.ParseSchedule(refreshScheduleStr)
@niemeyer

niemeyer Apr 5, 2017

Contributor

Don't we have to validate/discard the weekday somewhere around here?

overlord/snapstate/snapmgr.go
@@ -434,11 +441,26 @@ func (m *SnapManager) ensureRefreshes() error {
}
// store attempts in memory so that we can backoff a
@niemeyer

niemeyer Apr 5, 2017

Contributor

Comment is (was) cut off at the end.

overlord/snapstate/snapstate_test.go
// Ensure() also runs ensureRefreshes()
s.state.Unlock()
s.snapmgr.Ensure()
+ // give the timer a bit of time
+ time.Sleep(100 * time.Millisecond)
@niemeyer

niemeyer Apr 5, 2017

Contributor

This will likely create issues in slow systems. If last-refresh is supposed to get updated, can we test this inside a time-bounded loop with shorter sleeps to decide when we're done waiting?

overlord/snapstate/snapstate_test.go
s.state.Unlock()
s.snapmgr.Ensure()
+ // give the timer a bit of time
+ time.Sleep(300 * time.Millisecond)
@niemeyer

niemeyer Apr 5, 2017

Contributor

Here it looks like it already created problems even on a high end system?

Perhaps same approach here and elsewhere if we have the same pattern.

@mvo5 mvo5 modified the milestones: 2.25, 2.24 Apr 6, 2017

mvo5 added some commits Apr 7, 2017

Merge branch 'feature/internal-refresh-schedule' of github.com:mvo5/s…
…nappy into feature/internal-refresh-schedule
overlord/snapstate/snapmgr.go
-var (
- defaultRefreshSchedule = "00:00-04:59/5:00-10:59/11:00-16:59/17:00-23:59"
-)
+var defaultRefreshSchedule = "00:00-04:59/5:00-10:59/11:00-16:59/17:00-23:59"
@zyga

zyga Apr 7, 2017

Contributor

naive question, can this be a const?

Collaborator

mvo5 commented Apr 10, 2017

Fails with Apr 07 09:52:22 ubuntu /usr/lib/snapd/snapd[11304]: task.go:303: DEBUG: 2017-04-07T09:52:22Z ERROR sha3-384 mismatch after patching "core": got 3ca758bb05ff3a422fffa2c02018aaf63909417180b3e902eaddcf0cbc4d4c317bfe26ac5c39567bc69abb94ece0d1e8 but expected cbdff81206101ef7063f2f91a4c3f36e1ad946fe8a809eca4c21fa5335968dfc0c696e5f364cbc9a4c83590aceb3fc15

Collaborator

mvo5 commented Apr 10, 2017

Fails with Apr 07 16:21:21 ubuntu /usr/lib/snapd/snapd[7801]: task.go:303: DEBUG: 2017-04-07T16:21:21Z ERROR sha3-384 mismatch after patching "core": got 3ca758bb05ff3a422fffa2c02018aaf63909417180b3e902eaddcf0cbc4d4c317bfe26ac5c39567bc69abb94ece0d1e8 but expected cbdff81206101ef7063f2f91a4c3f36e1ad946fe8a809eca4c21fa5335968dfc0c696e5f364cbc9a4c83590aceb3fc15

mvo5 added some commits Apr 11, 2017

a couple more comments

overlord/snapstate/snapmgr.go
+ logger.Debugf("Option refresh-schedule changed, reloading.")
+ m.nextRefresh.Stop()
+ m.nextRefresh = nil
+ m.lastRefreshAttempt = time.Time{}
@pedronis

pedronis Apr 24, 2017

Contributor

why this resetting?

@pedronis

pedronis Apr 24, 2017

Contributor

also not sure:

// Check that we have reasonable delays between unsuccessful attempts.
// If the store is under stress we need to make sure we do not
// hammer it too often
if !m.lastRefreshAttempt.IsZero() && m.lastRefreshAttempt.Add(10*time.Minute).After(time.Now()) {
	return nil
}

makes sense where it is now, time.Now() is not when we will run the refresh attempt, it's just the time we are trying to schedule it, should it be moved inside the AfterFunc callback? or the logic tweaked some other way?

@pedronis

pedronis Apr 24, 2017

Contributor

I'm also not sure 100% why we need the timer? wouldn't checking if the nextRefresh time is in the past be enough and try to do the change creation job here directly?

mvo5 added some commits Apr 25, 2017

Nitpicks only. LGTM!

overlord/snapstate/snapmgr.go
- return nil
+ refreshSchedule, err := timeutil.ParseSchedule(refreshScheduleStr)
+ if err == nil {
+ err = refreshScheduleUsesWeekdays(refreshSchedule)
@niemeyer

niemeyer Apr 25, 2017

Contributor

s/Uses/No/ would make this line more readable I think, as right now the error is happening when the function gets what it asks for (with weekdays), which is a bit misleading.

@mvo5

mvo5 Apr 26, 2017

Collaborator

Thanks, fixed.

overlord/snapstate/snapmgr.go
- // change already in motion
- return nil
+ if err != nil {
+ logger.Noticef("cannot use refresh.schedule: %s", err)
@niemeyer

niemeyer Apr 25, 2017

Contributor

s/:/configuration:/ maybe, to give a hint that this is a configuration option?

@mvo5

mvo5 Apr 26, 2017

Collaborator

👍

overlord/snapstate/snapmgr.go
+ if !m.nextRefresh.IsZero() {
+ if m.currentRefreshSchedule != refreshScheduleStr {
+ // the refresh schedule has changed
+ logger.Debugf("Option refresh-schedule changed, reloading.")
@niemeyer

niemeyer Apr 25, 2017

Contributor

s/-/./, and it's not clear what this is reloading? Perhaps end at "changed."?

@mvo5

mvo5 Apr 26, 2017

Collaborator

Yeah, good point. There will be a debug output in the next line in this case anyway that prints the next scheduled refresh time so ending at "changed." is a good idea.

overlord/snapstate/snapmgr.go
m.lastRefreshAttempt = time.Now()
updated, tasksets, err := AutoRefresh(m.state)
if err != nil {
+ logger.Noticef("AutoRefresh failed with: %s", err)
@niemeyer

niemeyer Apr 25, 2017

Contributor

This should probably be a bit more oriented towards the user, since AutoRefresh is an internal method which the user will have no idea about. Perhaps something along the lines of

``"Cannot prepare auto-refresh change: %v"`

@mvo5

mvo5 Apr 26, 2017

Collaborator

\o/ Updated. Thanks !

overlord/snapstate/snapmgr.go
m.lastRefreshAttempt = time.Now()
updated, tasksets, err := AutoRefresh(m.state)
if err != nil {
+ logger.Noticef("AutoRefresh failed with: %s", err)
return err
}
// Do setLastRefresh() only if the store (in AutoRefresh) gave
@niemeyer

niemeyer Apr 25, 2017

Contributor

Comment is now out-of-date relative to the code. Perhaps s/Do setLastRefresh()/Set last refresh time/

lgtm, but not sure I understand the time.Sleep after Ensure left in the tests given that we don't have a timer anymore

overlord/snapstate/snapmgr.go
- // store attempts in memory so that we can backoff a
+func (m *SnapManager) doAutoRefresh() error {
@pedronis

pedronis Apr 25, 2017

Contributor

I remarked this before, but it seems an odd name for something that is not a handler.

I would call it something like: just autoRefresh or initiateAutoRefresh

@mvo5

mvo5 Apr 26, 2017

Collaborator

Good point, sorry that I failed to address this earlier. How do you feel about runAutoRefresh() ?

overlord/snapstate/snapmgr.go
+ // do refresh attempt (if needed)
+ if m.nextRefresh.Before(time.Now()) {
+ err = m.doAutoRefresh()
+ m.nextRefresh = time.Time{}
@pedronis

pedronis Apr 25, 2017

Contributor

should we do this only if err == nil?

@mvo5

mvo5 Apr 26, 2017

Collaborator

Yes, good point. Added and also added a comment.

overlord/snapstate/snapstate_test.go
+ if canAutoRefreshCalled {
+ break
+ }
+ time.Sleep(10 * time.Millisecond)
@pedronis

pedronis Apr 25, 2017

Contributor

do we need these sleeps, now that we don't have the timer?

@mvo5

mvo5 Apr 26, 2017

Collaborator

Good catch, they can all be killed. This is done now. Thank you!

overlord/snapstate/snapstate_test.go
@@ -4470,6 +4538,8 @@ func (s *snapmgrTestSuite) TestEnsureRefreshesWithUpdateError(c *C) {
// update for the "some-snap" in our fake store
s.state.Unlock()
s.snapmgr.Ensure()
+ // give the timer a bit of time
+ time.Sleep(100 * time.Millisecond)
@pedronis

pedronis Apr 25, 2017

Contributor

we don't have the timer anymore, do we need this?

@mvo5

mvo5 Apr 26, 2017

Collaborator

No, all removed. Thank you again.

mvo5 added some commits Apr 26, 2017

lgtm, couple last small comments

overlord/snapstate/snapmgr.go
- // store attempts in memory so that we can backoff a
+func (m *SnapManager) runAutoRefresh() error {
@pedronis

pedronis Apr 26, 2017

Contributor

we don't have a lot of other examples like this, usually the other managers create the change in ensure* directly,

I can live with run, the other options a bit shorter that my previous proposal would be "launchAutoRefresh"

overlord/snapstate/snapmgr.go
@@ -457,9 +435,86 @@ func (m *SnapManager) ensureRefreshes() error {
}
chg.Set("snap-names", updated)
chg.Set("api-data", map[string]interface{}{"snap-names": updated})
+
+ m.state.EnsureBefore(0)
@pedronis

pedronis Apr 26, 2017

Contributor

now that this is called from Ensure directly without the timer indirection I think we don't need this anymore

mvo5 added some commits Apr 26, 2017

mvo5 added some commits Apr 26, 2017

timeutil/schedule.go
@@ -68,7 +68,7 @@ type Schedule struct {
func (sched *Schedule) String() string {
if sched.Weekday == "" {
- return fmt.Sprintf("%s:%s", sched.Start, sched.End)
+ return fmt.Sprintf("%s-%s", sched.Start, sched.End)
@niemeyer

niemeyer Apr 26, 2017

Contributor

Isn't there a test missing?

@mvo5

mvo5 Apr 26, 2017

Collaborator

Yes indeed, I added one now.

@niemeyer niemeyer merged commit 7039d78 into snapcore:master Apr 26, 2017

6 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
xenial-amd64 autopkgtest finished (success)
Details
xenial-i386 autopkgtest finished (success)
Details
xenial-ppc64el autopkgtest finished (success)
Details
yakkety-amd64 autopkgtest finished (success)
Details
zesty-amd64 autopkgtest finished (success)
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment