p2p: fix discovery shutdown (#8725)#8735
Merged
AskAlexSharov merged 1 commit intodevelfrom Nov 17, 2023
Merged
Conversation
Problem: Some goroutines are blocked on shutdown: 1. table close <-tab.closed // because table loop pending 1. table loop <-refreshDone // because lookup shutdown blocks doRefresh 1. lookup shutdown <-it.replyCh // because it.queryfunc (findnode - ensureBond) is blocked, and not returning errClosed (if it returns and pushes to it.replyCh, then shutdown() will unblock) 1. findnode - ensureBond <-rm.errc // because the related replyMatcher was added after loop() exited, so there's nothing to push errClosed and unlock it If addReplyMatcher channel is buffered, it is possible that UDPv4.pending() adds a new reply matcher after closeCtx.Done(). Such reply matcher's errc result channel will never be updated, because the UDPv4.loop() has exited at this point. Subsequent discovery operations will deadlock. Solution: Revert to an unbuffered channel.
mh0lt
reviewed
Nov 16, 2023
Contributor
There was a problem hiding this comment.
This channel needs a buffer, or certainly it did because some call sequences deadlock otherwise. Is it causing shutdown to hang ?
I think that we need to add an additional check rather than just not buffering the channel. I'll take a look at the logging it seems to me that we should add a status check.
Looking at the code think this fix is what is needed:
func (t *UDPv4) pending(id enode.ID, ip net.IP, port int, ptype byte, callback replyMatchFunc) *replyMatcher {
ch := make(chan error, 1)
p := &replyMatcher{from: id, ip: ip, port: port, ptype: ptype, callback: callback, errc: ch}
if t.closeCtx.Err() != nil { <-- we need to check if the contexts is previously closed before starting to wait
ch <- errClosed
} else {
select {
case t.addReplyMatcher <- p:
// loop will handle it
case <-t.closeCtx.Done():
ch <- errClosed
}
}
return p
}
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem:
Some goroutines are blocked on shutdown:
If addReplyMatcher channel is buffered, it is possible that UDPv4.pending() adds a new reply matcher after closeCtx.Done().
Such reply matcher's errc result channel will never be updated, because the UDPv4.loop() has exited at this point. Subsequent discovery operations will deadlock.
Solution:
Revert to an unbuffered channel.