Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Protocol async errors refactor #2062

Merged
merged 70 commits into from
Jan 11, 2020
Merged

Conversation

pradovic
Copy link
Contributor

@pradovic pradovic commented Dec 23, 2019

Refactor all async handlers that use p2p.Protocol.Run() to not log errors in handlers, and not to use peer.Drop directly.

todo:

  • Add special error to allow handlers to return error to be logged but not break the execution. Identified places are commented with todo: handle nil error
  • There are a few more "double async" parts in a few already async handlers that still use peer.Drop(). Investigate and remove if possible (will be done in separate PR)

I have link this PR to #2058 at the moment, but I will merge them separately on master later. I did this as they are chronologically related.

@pradovic
Copy link
Contributor Author

pradovic commented Dec 26, 2019

@janos , @zelig , I have updated PR with the idea how to handle different kind of errors. I have removed the old errors in p2p.Protocol as I did not really see them used anywhere logically, just for error messages.
Basically the idea is that special errors inside the handlers and p2p.Protocol define how they can be handled logically (should they break the loop, etc...), while handling them and logging additional context in the specific protocols present the information to the user. This whey we can add more specific errors, and log differently several use-cases, for ex. graceful shutdown, fatal error, blacklisting, etc...
Let me know do you think that approach makes sense, and if you like, we can brainstorm a bit about the log messages format on a different levels of protocol.

@pradovic pradovic requested a review from janos December 26, 2019 13:53
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach with BreakError (even thou I have comments on its implementation).

My comments are mostly on two topics. One is the logging and returning errors at the same time, which should not be done, as the error may be logged twice. The second is related to error messages. They can be simplified or shortened, buy removing words error, erroring, err:, reason: as they are errors implicitly. Also errors should not start with a capital letter, or have a form of full sentences. I have left comments on that, but not on all occurrences, as they would be mostly repetitive.

bzzeth/bzzeth.go Outdated
@@ -135,8 +143,7 @@ func (b *BzzEth) handleNewBlockHeaders(ctx context.Context, p *Peer, msg *NewBlo
}
yes, err := b.netStore.Store.HasMulti(ctx, addresses...)
if err != nil {
log.Error("Error checking hashesh in store", "Reason", err)
return nil
return fmt.Errorf("error checking hashesh in store. Reason: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that error and Reason is needed in message, Maybe something like check hashes: %w?

bzzeth/bzzeth.go Outdated
@@ -79,6 +79,7 @@ func (b *BzzEth) Run(p *p2p.Peer, rw p2p.MsgReadWriter) error {
// perform handshake and register if peer serves headers
handshake, err := bp.Handshake(context.TODO(), Handshake{ServeHeaders: true}, nil)
if err != nil {
log.Error("Shutting down bzzeth protocol.", "peer", bp, "reason", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error should not be logged and passed, just one of those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the next comment) were the things I wanted to discuss. I see 3 scenarios:

  1. Log some context and message here and return nil
  2. Log some context here and return a new specific error that will be seen in events in tests
  3. Add more context too the error here and return an error. I am not sure if this error will be logged anywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scenario 3. is the one that I think is the correct one. We now have BreakError which is to make the peer drop, and any other error will be at least logged. If the error is logged and returned, it is easy to make the mistake to log the error twice. If the error is just passed, it is up to the caller to decide what to do, and the function that returns the error does not have to worry about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, from the architecture point of view. I am just worried about the BreakErrors, which won't be logged in this case, right? This is probably the place that has enough information about what will happen to the specific protocol (ex. bzzeth, stream, etc...).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a BreakError, then it should be logged as error before the peer is dropped.

Copy link
Contributor Author

@pradovic pradovic Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peer is actually dropped when the parent run function returns. This is why I was logging it here, as the drop is actually happening when it's propagated with end of the run function, in this case when BzzEth.Run() returns. This is why I was logging it here. The error returned by BzzEth.Run() is only used for events and testDisconnect. So, I see these options if we don't won't to log and return err message here:

  1. Use peer.Drop() on error in p2p.Protocol and log BreakErrors there before drop. No need to return specific errors in p2p.Protocol.Run() then at all maybe?
  2. Log the specific info here, and return different error to be shown in events and drop

I think the main question is, weather we need to return specific errors with p2p.Protocol.Run() and weather we want to rely on finishing Run() functions to drop a peer or do we want to drop a peer with peer.Drop().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, as we discussed, option 1. would be a better compromise.

Copy link
Contributor Author

@pradovic pradovic Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have switched back to option 1. I don't like the fact that we lose a bit of control over the async errors. It's hard to check what caused protocol to stop, except that there is a log message. Of course, we could add the functionality in p2p.protocol to remember the drop error, and expose the API to check it. But maybe we can think about all this more in the future, and leave this as is for now, as we tried both approaches and it should not be to hard to switch back and forth between them.

bzzeth/bzzeth.go Outdated
return peer.Run(b.handleMsg(bp))
err = peer.Run(b.handleMsg(bp))
if err != nil {
log.Error("Shutting down bzzeth protocol.", "peer", bp, "reason", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error should not be logged and passed, just one of those.

bzzeth/bzzeth.go Outdated
@@ -90,10 +91,17 @@ func (b *BzzEth) Run(p *p2p.Peer, rw p2p.MsgReadWriter) error {
// This protocol is all about interaction between an Eth node and a Swarm Node.
// If another swarm node tries to connect then the protocol goes into idle
if isSwarmNodeFunc(bp) {
return peer.Run(b.handleMsgFromSwarmNode(bp))
err = peer.Run(b.handleMsgFromSwarmNode(bp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error is shadowed later.

bzzeth/bzzeth.go Outdated
@@ -118,7 +126,7 @@ func (b *BzzEth) handleMsg(p *Peer) func(context.Context, interface{}) error {
func (b *BzzEth) handleMsgFromSwarmNode(p *Peer) func(context.Context, interface{}) error {
return func(ctx context.Context, msg interface{}) error {
p.logger.Warn("bzzeth.handleMsgFromSwarmNode")
return errRcvdMsgFromSwarmNode
return protocols.BreakError(errRcvdMsgFromSwarmNode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning should be removed, even more as it does not add any context to the error.

@@ -494,7 +437,7 @@ func (p *Peer) Handshake(ctx context.Context, hs interface{}, verify func(interf
err = ctx.Err()
}
if err != nil {
return nil, errorf(ErrHandshake, err.Error())
return nil, fmt.Errorf("%w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to wrap error if there is no additional context.

rw.eof = true
done := make(chan error)
go func() {
done <- eg.Wait()
}()
select {
case err := <-done:
if err != io.EOF {
t.Fatal("run returned error")
fmt.Println("in done")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove fmt.Println.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug line 🙈

pss/pss.go Outdated
return pp.Run(handle)

if err := pp.Run(handle); err != nil {
log.Error("Shutting down pss protocol.", "peer", pp, "reason", err)
Copy link
Member

@janos janos Dec 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error logged and returned.

pss/pss.go Outdated
return fmt.Errorf("invalid message type: %s", msg)
}
if err := p.handlePssMsg(ctx, pssmsg); err != nil {
return fmt.Errorf("handler error: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for word error in error message.

return sp.Peer.Run(r.HandleMsg(sp))
// todo: log peer info appropriately
if err := sp.Peer.Run(r.HandleMsg(sp)); err != nil {
log.Error("Shutting down stream protocol.", "peer:", sp, "reason", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error, both log.Error and log.Info bellow will be printed.

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful work on this PR @pradovic! 👏 🍾
I have just a few minor comments

@@ -418,11 +417,11 @@ DELIVERY:
if err == nil && deliveredCnt < total {
err := p.Send(ctx, &BlockHeaders{Rid: uint32(msg.Rid)})
if err != nil {
p.logger.Error("could not send empty BlockHeader", "err", err)
return fmt.Errorf("could not send empty BlockHeader: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am assuming that here peer is the ethereum node that expects the block headers. the send fails here, don't we want to break too?

@@ -326,6 +252,10 @@ func (p *Peer) Drop(reason string) {
// Returns nil if the active jobs are finished within the timeout duration, or error otherwise.
func (p *Peer) Stop(timeout time.Duration) error {
p.mtx.Lock()
if p.running != true {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if !p.running

@@ -359,7 +286,7 @@ func (p *Peer) Send(ctx context.Context, msg interface{}) error {

code, found := p.spec.GetCode(msg)
if !found {
return errorf(ErrInvalidMsgType, "%v", code)
return fmt.Errorf("invalid message type %v ", code)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to break here. if the message code is not found - it means the nodes are incompatible and we should disconnect from that node

Copy link
Contributor Author

@pradovic pradovic Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sync call, as it is not using an event loop, so here we don't need to use a special error. This goes for Handshake() calls as well. Only run needs a special BreakError in order to break. Does this makes sense?

@@ -459,7 +385,7 @@ func (p *Peer) handleMsg(msg p2p.Msg, handle func(ctx context.Context, msg inter
// returns the remote handshake and an error
func (p *Peer) Handshake(ctx context.Context, hs interface{}, verify func(interface{}) error) (interface{}, error) {
if _, ok := p.spec.GetCode(hs); !ok {
return nil, errorf(ErrHandshake, "unknown handshake message type: %T", hs)
return nil, fmt.Errorf("unknown handshake message type: %T", hs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we break as a result?

return nil
pssmsg, ok := msg.(*message.Message)
if !ok {
return fmt.Errorf("invalid message type %s", msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we should break or not. @nolash ?

@pradovic pradovic changed the base branch from protocol-msg-pauser to master January 9, 2020 11:31
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM.

@pradovic pradovic merged commit 07d6bd5 into master Jan 11, 2020
@pradovic pradovic deleted the protocol-async-errors-refactor branch January 11, 2020 13:14
@acud acud added this to the 0.5.5 milestone Jan 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants