peer: launch persistent peer pruning in background goroutine #8041
This PR is a follow-up to a follow-up of an initial concurrency issue fixed in the peer goroutine.
In #7938, we noticed that the introduction of `p.startReady` can cause `Disconnect` to block. This happens because `Disconnect` cannot proceed until `p.startReady` has been closed. `Disconnect` is also called from `InboundPeerConnected` (the case of concurrent connections to the same peer, where we need to remove one of them) while the main server mutex is held. If `p.Start` blocks for any reason, this leads to a deadlock: we can't disconnect until we've finished starting, and we can't finish starting until the disconnect caller, which holds the mutex, exits.

In this commit, we now make the call to
`prunePersistentPeerConnection` async. The call to `prunePersistentPeerConnection` eventually needs to acquire the server mutex, which triggers the circular wait described above.

The main learning here is that nothing called from `p.Start` may block on the main server mutex.

This is more or less a stopgap to resolve the issue initially introduced in v0.16.4. Assuming we want to move forward with this fix, we should reexamine `p.startReady` altogether, and also revisit the attempt to refactor this section of the code to eliminate the mega mutex in the server in favor of a dedicated event loop.

Fixes #8039