Debug why gossip push messages are not propagated effectively enough #28642
Comments
Each node maintains an On the receiving end, we are maintaining One issue arises when the In the simplest case:
So |
As described here: solana-labs#28642 (comment), the current gossip pruning code fails to maintain spanning trees across the cluster. This commit instead implements pruning based on the timeliness of delivered messages. If a message is delivered in a timely manner (measured by the number of duplicates already observed for that value), it counts towards the respective node's score. Once there are sufficiently many CRDS upserts from a specific origin, redundant nodes are pruned based on the tracked scores. Since the pruning leaves some configurable redundancy and the scores are reset frequently, it should better tolerate active-set rotations.
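The scoring scheme in that commit message can be illustrated with a minimal sketch. This is not the code from the commit: the names `OriginScores`, `record`, and `prune`, the `NodeId` type, and the parameters `dup_threshold`, `min_upserts`, and `redundancy` are all hypothetical, and stake weighting is ignored entirely. It only shows the idea of scoring relaying nodes by timeliness and pruning the lowest-scoring ones once enough upserts from an origin have been seen.

```rust
use std::collections::HashMap;

/// Hypothetical node identifier; a real implementation would use a public key.
type NodeId = u32;

/// Per-origin bookkeeping (hypothetical): how often each relaying node
/// delivered that origin's values in a timely manner.
#[derive(Default)]
struct OriginScores {
    /// Number of timely deliveries per relaying node.
    scores: HashMap<NodeId, usize>,
    /// Number of first-time deliveries (upserts) since the last reset.
    num_upserts: usize,
}

impl OriginScores {
    /// Records that `node` delivered a value that had already been received
    /// `num_dups` times. The delivery counts towards the node's score only
    /// if fewer than `dup_threshold` duplicates were seen before it.
    fn record(&mut self, node: NodeId, num_dups: usize, dup_threshold: usize) {
        if num_dups == 0 {
            self.num_upserts += 1;
        }
        // Insert the node even when it earns no point, so that it can
        // still be pruned later.
        let score = self.scores.entry(node).or_default();
        if num_dups < dup_threshold {
            *score += 1;
        }
    }

    /// Once at least `min_upserts` upserts have been recorded, resets the
    /// bookkeeping and returns every node except the `redundancy`
    /// highest-scoring ones. Returns an empty list otherwise.
    fn prune(&mut self, min_upserts: usize, redundancy: usize) -> Vec<NodeId> {
        if self.num_upserts < min_upserts {
            return Vec::new();
        }
        self.num_upserts = 0;
        let mut ranked: Vec<(NodeId, usize)> =
            std::mem::take(&mut self.scores).into_iter().collect();
        // Highest score first; ties broken by node id for determinism.
        ranked.sort_unstable_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked
            .into_iter()
            .skip(redundancy)
            .map(|(node, _)| node)
            .collect()
    }
}
```

Because the scores are cleared on every prune, a node that was slow under one active-set configuration starts from zero after the next rotation, which is the tolerance property the commit message describes.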
Problem
Metrics indicate that we still rely more than desired on pull requests to propagate CRDS values. Pull requests are slow and incur significant bandwidth overhead.
Proposed Solution
Investigate why messages are not propagated effectively through push.
See also #11698.