New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] overlord: show what manager throws a state ensure error #8687
Conversation
When we install UC20 we get a lot of state.ErrNoState errors. In general the state engine error reporting can be unhelpful if the errors generic. To improve the situation a bit this commit adds information about what manager is throwing the error. Eventually we should ensure we don't return "naked" ErrNoState or similar generic errors all the way up without decorating them with more information.
@@ -147,7 +147,7 @@ func (se *StateEngine) Ensure() error { | |||
for _, m := range se.managers { | |||
err := m.Ensure() | |||
if err != nil { | |||
logger.Noticef("state ensure error: %v", err) | |||
logger.Noticef("state ensure error from %T: %v", m, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devicemgr has it's own ensureError which adds devicemgr: ..
prefix to the error string. Maybe we could do something like this:
diff --git a/overlord/stateengine.go b/overlord/stateengine.go
index 80b2755c4d..f4da008bdc 100644
--- a/overlord/stateengine.go
+++ b/overlord/stateengine.go
@@ -121,6 +121,14 @@ func (se *StateEngine) StartUp() error {
type ensureError struct {
errs []error
}
+type managerEnsureError struct {
+ manager string
+ err error
+}
+
+func (m *managerEnsureError) Error() string {
+ return fmt.Sprintf("%s: %v", m.manager, m.err)
+}
func (e *ensureError) Error() string {
return fmt.Sprintf("state ensure errors: %v", e.errs)
@@ -147,8 +155,12 @@ func (se *StateEngine) Ensure() error {
for _, m := range se.managers {
err := m.Ensure()
if err != nil {
- logger.Noticef("state ensure error: %v", err)
- errs = append(errs, err)
+ merr := &managerEnsureError{
+ manager: fmt.Sprintf("%T", m),
+ err: err,
+ }
+ logger.Noticef("state ensure error: %v", merr)
+ errs = append(errs, merr)
}
}
if len(errs) != 0 {
Note that the error strings are not too nice:
state ensure errors: [*overlord_test.fakeManager: boom1 *overlord_test.fakeManager: boom2]
the ongoing assumption was that ensure errors would rare or not very interesting that's why we never spent a lot of time on their logging, it seems the assumption is wrong. |
I think they are kind of rare, it's not totally wrong. After PR #8690 got merged it even rarer. But for the cases when it does happen every bit of context helps |
@mvo5 spread failures look related to the changes here. |
Closing for now, it's not important enough if it requires this level of discussion. |
When we install UC20 we get a lot of state.ErrNoState errors. In
general the state engine error reporting can be unhelpful if the
errors generic. To improve the situation a bit this commit adds
information about what manager is throwing the error. Eventually
we should ensure we don't return "naked" ErrNoState or similar
generic errors all the way up without decorating them with more
information.
I tested this on a UC20 install and the output is (slightly) more useful
this way. More to come.