Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon, overlord/state: warnings pipeline #5514

Merged
merged 12 commits into from Sep 10, 2018

Conversation

chipaca
Copy link
Contributor

@chipaca chipaca commented Jul 16, 2018

This is the backend work for the warnings pipeline. It has some
(possibly ephemeral) debug verbs added as well, to let you play with
it via http if that's your thing.

Missing is:

the frontend work: namely 'snap warnings', 'snap okay', and the
messages after any command.

deleting warnings when they expire.


This is actually the third iteration, but the overlord/state code
remains unchanged: the change has been to simplify the way the
warnings summary is returned, thanks to pedronis's insightful insight.

This is the backend work for the warnings pipeline. It has some
(possibly ephemeral) debug verbs added as well, to let you play with
it via http if that's your thing.

Missing is:

the frontend work: namely 'snap warnings', 'snap okay', and the
messages after any command.

deleting warnings when they expire.

---

This is actually the third iteration, but the overlord/state code
remains unchanged: the change has been to simplify the way the
warnings summary is returned, thanks to pedronis's insightful insight.
@codecov-io
Copy link

codecov-io commented Jul 16, 2018

Codecov Report

Merging #5514 into master will increase coverage by 0.05%.
The diff coverage is 90.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5514      +/-   ##
==========================================
+ Coverage   78.94%   78.99%   +0.05%     
==========================================
  Files         518      519       +1     
  Lines       39441    39619     +178     
==========================================
+ Hits        31137    31298     +161     
- Misses       5773     5786      +13     
- Partials     2531     2535       +4
Impacted Files Coverage Δ
overlord/state/state.go 94.49% <100%> (+0.07%) ⬆️
daemon/daemon.go 69.4% <100%> (+0.46%) ⬆️
daemon/api.go 77.04% <68.18%> (-0.2%) ⬇️
daemon/response.go 89.24% <80%> (-0.39%) ⬇️
overlord/state/warning.go 98.3% <98.3%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60e22b1...9700719. Read the comment docs.

for k, w := range s.warnings {
if w.IsDeletable(now) {
delete(s.warnings, k)
s.modified = true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do this in other places, Prune for example calls writing in a similar situation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will change

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed the state bits, bunch of comments/questions

// the warning text itself. Only one of these in the system at a time.
message string
// the first time one of these messages was created
firstSeen time.Time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seen here is a bit ambiguous with Shown, Emitted instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function is AddWarning, so maybe s/seen/added/ ?

lastSeen time.Time
// the last time one of these was shown to the user
lastShown time.Time
// how much time since one of these was last seen should we drop the message
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again seen is unclear here, also we probably want it in the name deleteAfterLastEmitted ?

Copy link
Collaborator

@pedronis pedronis Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we tend to use deadlines and not duration, not sure how if that's relevant here though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these durations are relative to the deadlines in the same struct so I think they are ok

// how much time since one of these was last seen should we drop the message
deleteAfter time.Duration
// how much time since one of these was last shown should we repeat it
repeatAfter time.Duration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeatAfterLastShown ?

return nil
}

func (w *Warning) IsDeletable(now time.Time) bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should /delete/expire/ and call this Expired() ?

type byLastSeen []*Warning

func (a byLastSeen) Len() int { return len(a) }
func (a byLastSeen) Swap(i, j int) { a[i], a[j] = a[j], a[i] }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage says this is not tested (or not always)

n := 0
for _, w := range s.warnings {
if w.IsShowable(t) {
w.lastShown = t
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this always moves lastShown forward because of the def of IsShowable right? not sure it's obvious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always? no... a warning with firstAdded > t, or with lastShown + repeatAfter > t, would not have been returned by WarningsToShow(t) and will not have its lastShown bumped by OkayWarnings(t)

func (s *State) UnshowAllWarnings() {
s.writing()
for _, w := range s.warnings {
w.lastShown = time.Time{}
Copy link
Collaborator

@pedronis pedronis Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem to be tested directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it is

}

// WarningsSummary returns the number of warnings that are ready to be
// shown to the user, and the current timestamp (useful for ACKing the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it say Okaying ?


n := 0
for _, w := range s.warnings {
if w.IsShowable(t) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsShowable uses/checks firstSeen, but here at Okaying should we use lastSeen perhaps? should we okay or not warnings that were emitting again since t?

Copy link
Contributor Author

@chipaca chipaca Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

translating to the new names (just so I don't get confused),

IsShowable uses/checks firstAdded, but here at Okaying should we use lastAdded perhaps? should we okay or not warnings that were emitting again since t?

A warning is showable at time t iff

  • it was known at t (that is, firstAdded is not after t), and
  • it had not been shown at t (that is, lastShown is zero), or it had been shown long before t (repeatAfter before).

if a warning that would be showable at t is added again at tʹ > t, then IsShowable(t) will be true, and its lastShown will be updated, but it will now have lastAdded == tʹ so it will be again showable after repeatAfter has elapsed.

This is, as I understand it, part of the design: if the system is throwing up a warning periodically, snap okay should shut the warning up for a while (until repeatAfter).

Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the implementation, thank you!

A few easy comments:

daemon/api.go Outdated
st.AddWarning(a.Message)
return SyncResponse(true, nil)
case "unshow-warnings":
st.UnshowAllWarnings()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per comments below, we can probably make the terminology a bit more consistent across frontend and backend around the "ack" and "pending" terminology. This would become "ack-warnings", and perhaps the method could be the same AckWarnings, if it can accept a 0 timestamp for "all" maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note this is debug, and it's not "ack"ing the warnings; it's unsetting the last shown timestamp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.. unshow is awkward, but sounds right for this.

daemon/api.go Outdated
var (
stateOkayWarnings = (*state.State).OkayWarnings
stateAllWarnings = (*state.State).AllWarnings
stateWarningsToShow = (*state.State).WarningsToShow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ack verb also used here feels a bit more clear than "Okay":

How about:

  • AckWarnings
  • AllWarnings
  • PendingWarnings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the user command would still be okay?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after talking on IRC, we're keeping "okay" everywhere -- having it be Ack internally but snap okay would cause confusion, even more so because snap ack exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unresolving because of the ToShow -> Pending rename

daemon/api.go Show resolved Hide resolved
daemon/api.go Outdated
switch sel {
case "all":
all = true
case "to-show", "":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/to-show/pending/ in an attempt to have a common term there as well.

daemon/daemon.go Outdated
}
if rsp.Type != ResponseTypeError {
st.Lock()
rsp.addWarningsToMeta(st.WarningsSummary)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious.. I'd have expected PendingWarnings (WarningsToShow) here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, the discussion on that as reflected on the whiteboard was a bit confusing. Initially we thought of printing the actual warnings after each command (“Warnings show up at the bottom of other commands, but, If more than 3, say WARNING: see snap warnings”), but then as we thought of it more it got cut back to “On commands, if last warning is more recent than last seen show WARNING: There are new messages. See snap warnings”. This sending the warning summary supports the latter directly, and the client can implement the former by hitting the warnings endpoint if the summary indicates it needs to.

}

func (w *Warning) IsShowable(t time.Time) bool {
return (w.lastShown.IsZero() || w.lastShown.Add(w.repeatAfter).Before(t)) && !w.firstAdded.After(t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could it have been last shown before it was first added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when lastShown.IsZero(), then firstAdded can be after t.
So I could rewrite this as

if w.lastShown.IsZero() {
    return !w.firstAdded.After(t)
}
return w.lastShown.Add(w.repeatAfter).Before(t)

is that clearer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was never shown, shouldn't we just say "yes, show it"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsShowable(t) tells you whether it is or was showable at the given time. You can then decide things based on that. If a warning was not known at time t, it would not have been showable at that time.

the t argument is always in the past; very recent past, for when getting a list of everything we want to show to the user, or not so recent for when the user asks us to okay everything that was shown to them before.

From that point of view perhaps your suggestion of ShowAfter is even more confusing than IsShowable (perhaps WasShowable would be more accurate? dunno)

}

// flattenWarning loops over the warnings map, and returns all
// warnings therein as a flat list, for serialising.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of a map for the messages feels a bit unusual. The slice feels slightly more natural for the in memory representation, and while it would be more expensive to find strings, it would avoid the cost of flattening it.

With that said, this is not a blocker as it's just the in-memory representation and easy to go one way or the other later as we learn more. So let's go with your choice.

// message it'll be added (with its firstAdded and lastAdded set to the
// current time), otherwise the existing one will have its lastAdded
// updated.
func (s *State) AddWarning(message string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest "Warning" as the function name, and something a bit more comfortable:

func (s *State) Warning(format string, args ...interface{}) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want warnings to have arguments like this. That is, we don't want "You only have 200MB of disk space left"; we want "Your disk space is running low". Otherwise the "only hold one warning of a given type" idea goes down the drain.

If, as an iteration, we move to use the format as the key, and have extra data in warnings where we store the args, then this would work. But I'd like to leave it as an iteration of the api (and call it Warningf? I'm bad with names)

Warning on its own seems strange to me; as it's a noun and not a verb I'd expect it to return a warning, not add it (like state.Task(id) returns a task). But we've established that I'm bad with names 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this should really be Warn instead of Warning (or Warnf if you want). Can we go with that? There's not much reason to wait.. it's just a Sprintf inside the function.

I agree with your point about being careful with arguments so we're not bothering people and losing the benefits of the system, but we do want arguments anyway so we can talk to people about specific things instead of just general messages. For example, colloquially:

  • Your snap "foo" is broken
  • Partition /foo is running low on space
  • Snapshot file at /.../foo is corrupted

etc.

w.firstAdded = t
if err := w.validate(); err != nil {
// programming error!
panic(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please just log this? I'm concerned about us finding warnings in the field that we don't have tests for, and breaking down a system permanently as every time it starts it warns again and breaks down again.

}

// DeleteExpired deletes warnings that have expired.
func (s *State) DeleteExpired() int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function name lacks context. I also suggest making this a private function and baking it into Prune instead. It's goal is already to garbage collect old stuff, and it's already called opportunistically.

Thanks to @niemeyer and @pedronis for their feedback.

This changes AddWarning to Warnf (and has it take args like Sprintf),
changes WarningsToShow to PendingWarnings (including as an action on
the daemon), renames Expired to ExpiredBefore and IsShowable to
ShowAfter, and lastly drops DeleteExpired and adds expired warning
pruning to flatten/unflatten and State.Prune itself.
Copy link
Contributor

@niemeyer niemeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, bunch of small comments/wonderings/nitpicks

}

func (w *Warning) ShowAfter(t time.Time) bool {
return (w.lastShown.IsZero() || w.lastShown.Add(w.repeatAfter).Before(t)) && !w.firstAdded.After(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an hard time spotting the parenthesis initially, maybe swap the two side of the &&

}, time.Now().UTC())
}

func (s *State) addWarningFull(w Warning, t time.Time) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this still have the Full in the name given we don't have AddWarning itself anymore ?

}

for _, t := range []T1{
// snaity check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

for _, t := range []T1{
// snaity check
{`{"message": "x", "first-added": "2006-01-02T15:04:05Z", "expire-after": "1h", "repeat-after": "1h"}`, nil},
// remove one at a time:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

DefaultRepeatAfter = time.Hour * 24
DefaultExpireAfter = time.Hour * 24 * 28

errNoMessage = errors.New("warning has no message")
Copy link
Collaborator

@pedronis pedronis Sep 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit strange that the error variables' names don't mention warning

flat := make([]*Warning, 0, len(s.warnings))
for _, w := range s.warnings {
if w.ExpiredBefore(now) {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this bit tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep :-) one of the AllWarnings check happens after a flatten pass

s.warnings = make(map[string]*Warning, len(flat))
for _, w := range flat {
if w.ExpiredBefore(now) {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this bit tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no; I'd have to play with times a little too much (it'd have to not have expired during serialise, and then expire before unserialise; or I'd have to do it by hand? i guess i could do that).

@@ -72,14 +73,31 @@ func (r *resp) transmitMaintenance(kind errorKind, message string) {
}
}

func (r *resp) addWarningsToMeta(ws func() (int, time.Time)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand why does it take a function and not count, stamp ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it to do this. Whatever the reason was it seems silly.

After confusing multiple reviewers I refactored ShowAfter in an
attempt to make it easier to follow.
added comments; renamed addWarningFull to just addWarning; added
Warning to the error names.
@chipaca chipaca merged commit b814491 into snapcore:master Sep 10, 2018
@chipaca chipaca deleted the warnings-take-two branch September 10, 2018 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants