-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
consensus: prevent synchronization stalls #1116
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
fyrchik
approved these changes
Jun 26, 2020
It breaks tests at the moment, so I'll fix that first. |
When CN is not up to date with the network is synchonizes blocks first and only then starts consensus process. But while synchronizing it receives consensus payloads and tries to process them even though messages reader routine is not started yet. This leads to lots of goroutines waiting to send their messages: Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639919 [chan send, 4 minutes]: Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc005bd7680) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc00507d170, 0xc005bdd560, 0x0, 0x0) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc00507d170) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294 Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639181 [chan send, 10 minutes]: Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc013bb6600) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc01361ee10, 0xc01342c780, 0x0, 0x0) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc01361ee10) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294 Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 39454 [chan send, 32 minutes]: Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc014fea680) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc0140b2ea0, 0xc014fe0ed0, 0x0, 0x0) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58 Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc0140b2ea0) Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294 Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad Luckily it doesn't break synchronization completely as eventually connection timers fire, the node breaks all connections, create new ones and these new ones request blocks successfully until another consensus payload stalls them too. In the end the node reaches synchronization, message processing loop starts and releases all of these waiting goroutines, but it's better for us to avoid this happening at all. This also makes double-starting a no-op which is a nice property.
roman-khimov
force-pushed
the
fix-consensus-synchronization-issue
branch
from
June 26, 2020 09:32
e239467
to
cefad68
Compare
random.Bytes() might be not random enough.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When CN is not up to date with the network is synchonizes blocks first and
only then starts consensus process. But while synchronizing it receives
consensus payloads and tries to process them even though messages reader
routine is not started yet. This leads to lots of goroutines waiting to send
their messages:
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639919 [chan send, 4 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc005bd7680)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc00507d170, 0xc005bdd560, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc00507d170)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639181 [chan send, 10 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc013bb6600)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc01361ee10, 0xc01342c780, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc01361ee10)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 39454 [chan send, 32 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc014fea680)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc0140b2ea0, 0xc014fe0ed0, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc0140b2ea0)
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #11/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Luckily it doesn't break synchronization completely as eventually connection
timers fire, the node breaks all connections, create new ones and these new
ones request blocks successfully until another consensus payload stalls them
too. In the end the node reaches synchronization, message processing loop
starts and releases all of these waiting goroutines, but it's better for us to
avoid this happening at all.
This also makes double-starting a no-op which is a nice property.