Define error aware interface #1275
Codecov Report
@@ Coverage Diff @@
## master #1275 +/- ##
==========================================
+ Coverage 56.25% 56.29% +0.04%
==========================================
Files 500 504 +4
Lines 31162 31267 +105
==========================================
+ Hits 17531 17603 +72
- Misses 11262 11287 +25
- Partials 2369 2377 +8
It's a stab in the right direction, because it constrains the implementer to have a place to send irrecoverable errors (the error channel).
Here's a list of things that seem important to me:
- make it clear this is for irrecoverable errors from the PoV of the ErrorAware module, i.e. that this isn't meant to encompass all error management therein,
- force the implementer to have a way to react to those irrecoverable errors, i.e. require a func OnError(err error),
- and then possibly link the two with default processing, perhaps by doing the following (?):
  - having an ErrorBase that listens to the error channel and passes the errors read there to OnError
  - and then having implementing components embed the ErrorBase
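To make the ErrorBase suggestion above concrete, here is a minimal sketch of how it could look. It is not code from this PR: the names ErrorHandler, NewErrorBase, ThrowError, Listen, Close and Engine are all hypothetical, and the single-slot buffered channel is an assumption made to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrorHandler is a hypothetical interface carrying the OnError hook from the list above.
type ErrorHandler interface {
	OnError(err error)
}

// ErrorBase (hypothetical) owns the error channel and forwards what it reads to a handler.
type ErrorBase struct {
	errs chan error
}

// NewErrorBase creates an ErrorBase with a one-slot buffer (an assumption of this sketch).
func NewErrorBase() ErrorBase {
	return ErrorBase{errs: make(chan error, 1)}
}

// ThrowError is what the component calls when it hits an irrecoverable error.
func (b *ErrorBase) ThrowError(err error) { b.errs <- err }

// Listen forwards every error to h until the channel is closed,
// then closes the returned channel to signal that forwarding has stopped.
func (b *ErrorBase) Listen(h ErrorHandler) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for err := range b.errs {
			h.OnError(err)
		}
	}()
	return done
}

// Close stops the listener. ThrowError must not be called afterwards.
func (b *ErrorBase) Close() { close(b.errs) }

// Engine is an example component that embeds ErrorBase and supplies OnError.
type Engine struct {
	ErrorBase
	seen []error
}

// OnError records the error; a real component would shut down or escalate here.
func (e *Engine) OnError(err error) { e.seen = append(e.seen, err) }

// runEngineDemo throws one error and returns what OnError received.
func runEngineDemo() []error {
	e := &Engine{ErrorBase: NewErrorBase()}
	done := e.Listen(e)
	e.ThrowError(errors.New("database corrupted"))
	e.Close()
	<-done // wait until the listener has drained the channel
	return e.seen
}

func main() {
	for _, err := range runEngineDemo() {
		fmt.Println("handled:", err)
	}
}
```

In this sketch the embedding component only has to provide OnError; the channel plumbing lives in one place.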
@huitseeker Actually, the way I think about this is slightly different. The way this works is: once the owner of component A starts it, it then spawns a new routine which polls from the error channel of component A. If it receives an error, then it can shut down component A, or restart it, or propagate the error on to its own parent, etc.
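The owner-side flow described in this comment could be sketched roughly as follows. The ErrorAware interface shape, the Supervise helper and componentA are assumptions made for illustration, not the definitions from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrorAware is an assumed shape for the interface under discussion.
type ErrorAware interface {
	Errors() <-chan error
}

// componentA is a stand-in component that exposes its error channel.
type componentA struct{ errs chan error }

func (c *componentA) Errors() <-chan error { return c.errs }

// Supervise (hypothetical) is run by the owner after starting the child.
// It waits for the first fatal error and hands it to onFatal, or stops when
// ctx is cancelled. The returned channel is closed when the goroutine exits.
func Supervise(ctx context.Context, child ErrorAware, onFatal func(error)) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		select {
		case err := <-child.Errors():
			onFatal(err) // shut down, restart, or propagate: the owner decides
		case <-ctx.Done():
		}
	}()
	return done
}

// superviseDemo simulates a fatal error and returns the action the owner took.
func superviseDemo() string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	a := &componentA{errs: make(chan error, 1)}
	var action string
	done := Supervise(ctx, a, func(err error) {
		action = "shutdown after: " + err.Error()
	})

	a.errs <- errors.New("state diverged")
	<-done
	return action
}

func main() {
	fmt.Println(superviseDemo())
}
```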
I like the idea of having the caller / parent (not the component itself) handle the child component's error. We could have a mechanism to keep a running log of the errors that could be further analyzed for things like flakiness (is the same component throwing an error at frequent intervals?). How would we test this, i.e. under what conditions would components throw an error? What would be the expected error handling behavior, and can we standardize this (i.e. restart component, log the error somewhere permanent for later analysis, notify someone)?
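One possible shape for the "running log" idea, purely as an illustration: the ErrorLog type and its methods are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ErrorLog (hypothetical) remembers when each component reported a fatal error.
type ErrorLog struct {
	mu      sync.Mutex
	entries map[string][]time.Time
}

func NewErrorLog() *ErrorLog {
	return &ErrorLog{entries: make(map[string][]time.Time)}
}

// Record notes that the named component failed at the given time.
func (l *ErrorLog) Record(component string, at time.Time) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries[component] = append(l.entries[component], at)
}

// CountSince returns how many failures the component reported at or after since.
func (l *ErrorLog) CountSince(component string, since time.Time) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	n := 0
	for _, t := range l.entries[component] {
		if !t.Before(since) {
			n++
		}
	}
	return n
}

// flakyDemo records three failures and counts those in the last 30 minutes of an hour.
func flakyDemo() int {
	base := time.Date(2021, 9, 1, 0, 0, 0, 0, time.UTC)
	log := NewErrorLog()
	log.Record("sync-engine", base)
	log.Record("sync-engine", base.Add(50*time.Minute))
	log.Record("sync-engine", base.Add(55*time.Minute))
	return log.CountSince("sync-engine", base.Add(30*time.Minute))
}

func main() {
	fmt.Println("recent failures:", flakyDemo())
}
```

A parent could consult such a count before deciding whether another restart is worthwhile.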
@smnzhu I think we're on the same page, I wasn't clear about two interfaces:
@gomisha Components will throw an error anytime they encounter a fatal / irrecoverable error. The expected error handling behavior will be different for each component, and may also depend on the specific error that was thrown. The implementation of the error handling behavior should be left up to the parent component, and I don't think there is a standard behavior that would apply to all cases.
@huitseeker I see what you're saying. For now, I don't see a need to force the parent to have a way to react to errors or implement default processing, because if we implement a single
In the future if components are standardized with a
    type ErrorHandler func(context.Context, <-chan error, func())
    type ComponentFactory func() module.RunnableReadyDoneErrorAware
    StartComponent(context.Context, ComponentFactory, ErrorHandler)
which could be used as follows:
    ctx, cancel := context.WithCancel(context.Background())
    componentFactory := func() module.RunnableReadyDoneErrorAware {
        return NewComponentX(arg1, arg2)
    }
    StartComponent(ctx, componentFactory, func(errCtx context.Context, errs <-chan error, restart func()) {
        select {
        case err := <-errs:
            // handle the error here. In this example, we restart the component
            restart()
        case <-errCtx.Done():
            return
        }
    })
and implemented (roughly) as follows:
    func StartComponent(ctx context.Context, componentFactory ComponentFactory, errorHandler ErrorHandler) {
        restartChan := make(chan struct{})
        start := func() (context.CancelFunc, <-chan struct{}) {
            // context used to restart the component
            runCtx, cancel := context.WithCancel(ctx)
            component := componentFactory()
            component.Run(runCtx)
            select {
            case <-ctx.Done():
                runtime.Goexit()
            case <-component.Ready():
            }
            go errorHandler(runCtx, component.Errors(), func() {
                restartChan <- struct{}{}
                runtime.Goexit()
            })
            return cancel, component.Done()
        }
        go func() {
            for {
                cancel, done := start()
                select {
                case <-ctx.Done():
                    return
                case <-restartChan:
                    cancel()
                }
                select {
                case <-ctx.Done():
                    return
                case <-done:
                }
            }
        }()
    }
However, I think it's too early for stuff like this rn. For now, I added a new helper struct
One potential issue I currently see with this approach is that it may not behave in a desirable way if a fatal error is caused as a result of some external party calling a public method on the component (e.g
In general, thanks a lot for getting this started, there's a ton of great ideas in there, such as:
- the separation between in and out-of component,
- the use of a closure for start,
- the IrrecoverableError signaling through a channel
I've left a few alternate suggestions in #1308, here are the main points:
- the word Error is overly broad for situations encountered inside the component: I've tried to suggest Irrecoverable for errors that relinquish the control flow,
- the signaling of the irrecoverable error does not have a lot of variability, it pretty much needs to be what's in your ThrowError, hence I'm not sure this should be an interface. Maybe I missed something: do you have suggestion for vastly different behaviors for the emitter-side of an irrecoverable error?
- the receiving side of the irrecoverable error seems to me that it has a lot of variability. In particular, there may be a significant task of getting the system in a "known good state" before restart,
- I would like RunComponent to be hierarchical. For instance, if the Start of a Startable component returns an error, there's no use doing a restart. Hence this is an irrecoverable error from the PoV of the RunComponent, hence I'd love to be able to send that error to an irrecoverable handler in an enclosing context.
- To represent this hierarchy, I've tried to use the component we use to pass things around hierarchically: the Context
- the most complex function in both our approaches is RunComponent, and it seems valuable to me to make it generic (so nobody else has to rewrite one), and, hopefully, small
Possibly superfluous:
- type assertions to be able to use a function on Context instances that are actually IrrecoverableSignalerContext,
Possibly missing:
- do we need to demo more in the component. E.g. RunComponent may demo cleanup, etc?
I get the following failure, has this merged master?
GO111MODULE=on go test -coverprofile=coverage.txt -covermode=atomic --tags relic ./...
? github.com/onflow/flow-go/access [no test files]
? github.com/onflow/flow-go/access/legacy [no test files]
? github.com/onflow/flow-go/access/legacy/convert [no test files]
<nil> INF admin server starting up admin=command_runner
<nil> INF process loop shutting down admin=command_runner
<nil> INF admin server shutting down admin=command_runner
panic: send on closed channel
goroutine 11 [running]:
github.com/onflow/flow-go/admin.(*adminServer).RunCommand(0xc00040ce58, {0x183af20, 0xc0000c85a0}, 0xc0001aeec0)
/Users/huitseeker/tmp/flow-go/admin/server.go:32 +0x1d2
github.com/onflow/flow-go/admin/admin._Admin_RunCommand_Handler({0x169efa0, 0xc00040ce58}, {0x183af20, 0xc0000c85a0}, 0xc00049a8a0, 0x0)
/Users/huitseeker/tmp/flow-go/admin/admin/admin_grpc.pb.go:77 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000436700, {0x1846e48, 0xc00017e180}, 0xc0007ac480, 0xc00049ee40, 0x1cd0bd0, 0x0)
/Users/huitseeker/golang/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1297 +0xccf
google.golang.org/grpc.(*Server).handleStream(0xc000436700, {0x1846e48, 0xc00017e180}, 0xc0007ac480, 0x0)
/Users/huitseeker/golang/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1626 +0xa2a
google.golang.org/grpc.(*Server).serveStreams.func1.2()
/Users/huitseeker/golang/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:941 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/Users/huitseeker/golang/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:939 +0x294
FAIL github.com/onflow/flow-go/admin 0.557s
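For context on the panic above: in Go, sending on a channel after it has been closed always panics. A common way to avoid this is to never close the data channel and to signal shutdown through a separate channel instead. The sketch below illustrates that pattern with hypothetical names (server, commands, quit); it is not the code in admin/server.go and not necessarily how the failure was fixed.

```go
package main

import (
	"errors"
	"fmt"
)

var errShutdown = errors.New("server is shutting down")

// server is a stand-in type; only quit is ever closed, never commands.
type server struct {
	commands chan string
	quit     chan struct{}
}

// RunCommand enqueues cmd unless shutdown has begun.
func (s *server) RunCommand(cmd string) error {
	// Check for shutdown first so that a closed quit always wins.
	select {
	case <-s.quit:
		return errShutdown
	default:
	}
	select {
	case s.commands <- cmd:
		return nil
	case <-s.quit:
		return errShutdown
	}
}

// Shutdown signals all senders to stop. It must be called at most once.
func (s *server) Shutdown() { close(s.quit) }

// shutdownDemo reports whether a send before shutdown succeeded and whether
// a send after shutdown was rejected instead of panicking.
func shutdownDemo() (bool, bool) {
	s := &server{commands: make(chan string, 1), quit: make(chan struct{})}
	before := s.RunCommand("ping")
	s.Shutdown()
	after := s.RunCommand("ping")
	return before == nil, errors.Is(after, errShutdown)
}

func main() {
	ok, rejected := shutdownDemo()
	fmt.Println("accepted before shutdown:", ok, "rejected after shutdown:", rejected)
}
```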
Awesome! Thanks for the tests!
looks great!
bors merge
Defines a new Error Aware interface which an owner of a component can use to be notified when a fatal error occurs in the component. The owner can then decide what to do with this error (restart the component, log the error, propagate the error up to its own parent component, etc).
Eventually, every component could implement this interface. After a component is started, the owner of the component would also start up a thread that reads from the error channel. When an error occurs, that thread can parse the error and decide how to respond to it.
makes progress on https://github.com/dapperlabs/flow-go/issues/5829
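As an illustration of "parse the error and decide how to respond" from the description above, an owner could classify errors with errors.Is and errors.As from the standard library. The Action values, ErrCorruptState, TimeoutError and the policy itself are hypothetical; the PR leaves the actual policy to each parent component.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is a hypothetical set of responses an owner might choose from.
type Action int

const (
	Restart Action = iota
	Propagate
	LogOnly
)

func (a Action) String() string {
	return [...]string{"restart", "propagate", "log-only"}[a]
}

// ErrCorruptState is an example sentinel error a component might throw.
var ErrCorruptState = errors.New("corrupt state")

// TimeoutError is an example typed error a component might throw.
type TimeoutError struct{ Op string }

func (e *TimeoutError) Error() string { return e.Op + ": timed out" }

// Decide maps an error read from the component's error channel to an action.
func Decide(err error) Action {
	var te *TimeoutError
	switch {
	case errors.As(err, &te):
		return Restart // transient: starting over may help
	case errors.Is(err, ErrCorruptState):
		return Propagate // the parent cannot repair this itself
	default:
		return LogOnly
	}
}

func main() {
	wrapped := fmt.Errorf("sync engine: %w", ErrCorruptState)
	fmt.Println(Decide(&TimeoutError{Op: "dial"}), Decide(wrapped), Decide(errors.New("other")))
}
```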
TODO
start() inside RunComponent
to block forever, and error thrown during shutdown will be ignored. Ready
)