V2 Routing: Refresh distance vector nonces #9651
Merged
wacban approved these changes on Oct 9, 2023
LGTM
How damaging was the issue? How did you find it out?
The only impact I am aware of is unnecessary termination of connections (not too damaging). It was noticeable in localnet, which tends to have a very stable topology.
nikurt pushed a commit that referenced this pull request on Oct 10, 2023
nikurt pushed a commit that referenced this pull request on Oct 11, 2023
marcelo-gonzalez pushed a commit to marcelo-gonzalez/nearcore that referenced this pull request on Oct 18, 2023
marcelo-gonzalez added a commit to marcelo-gonzalez/nearcore that referenced this pull request on Apr 8, 2024
The test pytest/tools/mirror/offline_test.py often fails with far fewer transactions observed than expected, and the logs reveal that many transactions are invalid because the access keys they use do not exist in the target chain. This happens because some early transactions that should have added those keys never make it on chain. These transactions are sent successfully from the perspective of the ClientActor, but the logs show that they're dropped by the peer manager:

```
DEBUG handle{handler="PeerManagerMessageRequest" actor="PeerManagerActor" msg_type="NetworkRequests"}: network: Failed sending message: peer not connected to=ed25519:Fz7d1xkkt3XsvTPiwk4JRhMuPru4Ss7cLS8fdhshDRj3 num_connected_peers=1 msg=Routed(RoutedMessageV2 { msg: RoutedMessage { ... body: tx GFW8HgTndXVKdcLHdsCXxURjHxDnnEqHadrbxvsLKVQb ...
```

So the peer manager drops the transaction instead of routing it, and the test fails because many subsequent transactions depend on that one. A git bisect shows that this behavior starts after near#9651: since that PR, the failure to route messages persists for somewhat longer after startup.

The proper fix might be a mechanism whereby these messages are not silently dropped, so that the ClientActor is notified of the failure and can retry later. For now, the workaround is to wait a little before sending transactions: a 15-second timer is set for the first batch of transactions, and the remaining batches proceed normally.
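The workaround in this commit message amounts to gating only the first batch of transactions behind a 15-second timer. A minimal Rust sketch of that gating logic follows. `SendGate`, `ready`, and `mark_sent` are hypothetical names, not the mirror tool's actual code, and the caller is assumed to check `ready` before each send and call `mark_sent` once the first batch has gone out.

```rust
use std::time::{Duration, Instant};

/// Delay applied to the first batch only (15 seconds, per the commit message).
const FIRST_BATCH_DELAY: Duration = Duration::from_secs(15);

/// Hypothetical gate deciding whether the next transaction batch may be sent.
struct SendGate {
    /// When the sender started (e.g. node startup).
    started: Instant,
    /// Becomes true after the first batch has been sent.
    first_batch_sent: bool,
}

impl SendGate {
    fn new(started: Instant) -> Self {
        Self { started, first_batch_sent: false }
    }

    /// The first batch waits until FIRST_BATCH_DELAY has elapsed since `started`;
    /// every later batch is allowed immediately.
    fn ready(&self, now: Instant) -> bool {
        self.first_batch_sent
            || now.saturating_duration_since(self.started) >= FIRST_BATCH_DELAY
    }

    /// Record that the first batch went out, lifting the delay for the rest.
    fn mark_sent(&mut self) {
        self.first_batch_sent = true;
    }
}
```

As the commit message notes, this only gives routing time to settle after startup; it does not make dropped messages visible to the sender.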
github-merge-queue bot pushed a commit that referenced this pull request on Apr 10, 2024
DistanceVector messages contain timestamped edges, which by default expire after 30 minutes. However, transmission of DistanceVectors is triggered only by changes to the network topology. As a result, nodes in a stable network will needlessly expire their DistanceVectors after 30 minutes. When a DistanceVector expires, the connection with the associated peer is terminated, resulting in unnecessary churn of network connections.
RoutingTableUpdate is an existing message type used by the V1 protocol to periodically flood refreshed edge nonces to the whole network. This PR passes the refreshed nonces from RoutingTableUpdate messages into the local storage for the V2 protocol so that the timestamps for stable DistanceVectors can be kept up to date.
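The expiry problem and the fix described above can be sketched in Rust (the language of nearcore) under stated assumptions. Everything below is hypothetical: `EdgeStore`, `refresh_from_update`, the `u64` peer ids, and treating an edge nonce as a Unix timestamp in seconds are illustrative choices, not nearcore's actual types or APIs. The sketch shows that an edge nobody re-sends expires after 30 minutes, and that folding in refreshed nonces (as carried by V1 RoutingTableUpdate messages) keeps it alive.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical edge key: an unordered pair of peer ids, stored as (min, max).
type PeerPair = (u64, u64);

/// Default edge lifetime from the PR description: 30 minutes.
const EDGE_TTL: Duration = Duration::from_secs(30 * 60);

/// Normalize a peer pair so (a, b) and (b, a) map to the same edge.
fn key(a: u64, b: u64) -> PeerPair {
    if a <= b { (a, b) } else { (b, a) }
}

/// Hypothetical local edge storage for the V2 protocol.
/// Each nonce is assumed to be a Unix timestamp in seconds.
#[derive(Default)]
struct EdgeStore {
    nonces: HashMap<PeerPair, u64>,
}

impl EdgeStore {
    /// Record an edge (e.g. from a DistanceVector), keeping the newest nonce.
    fn insert(&mut self, a: u64, b: u64, nonce: u64) {
        let entry = self.nonces.entry(key(a, b)).or_insert(nonce);
        if nonce > *entry {
            *entry = nonce;
        }
    }

    /// Apply refreshed nonces from a RoutingTableUpdate-like list of (a, b, nonce).
    /// Assumption: only edges already known here are refreshed, and an older
    /// nonce never overwrites a newer one. Returns how many edges were refreshed.
    fn refresh_from_update(&mut self, edges: &[(u64, u64, u64)]) -> usize {
        let mut refreshed = 0;
        for &(a, b, nonce) in edges {
            if let Some(current) = self.nonces.get_mut(&key(a, b)) {
                if nonce > *current {
                    *current = nonce;
                    refreshed += 1;
                }
            }
        }
        refreshed
    }

    /// An edge is expired if its nonce is older than EDGE_TTL at `now_secs`;
    /// unknown edges are treated as expired.
    fn is_expired(&self, a: u64, b: u64, now_secs: u64) -> bool {
        match self.nonces.get(&key(a, b)) {
            Some(&nonce) => now_secs.saturating_sub(nonce) > EDGE_TTL.as_secs(),
            None => true,
        }
    }
}
```

Without a refresh, an edge inserted at time `t` reports expired at `t + 1801`; applying a newer nonce through `refresh_from_update` moves its expiry forward without any change in topology, which is the effect this PR aims for.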