-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implements an unbiased weighted shuffle using binary indexed tree #18343
implements an unbiased weighted shuffle using binary indexed tree #18343
Conversation
f866ecd
to
856bfca
Compare
Codecov Report
@@ Coverage Diff @@
## master #18343 +/- ##
========================================
Coverage 82.5% 82.6%
========================================
Files 436 436
Lines 121801 121903 +102
========================================
+ Hits 100602 100693 +91
- Misses 21199 21210 +11 |
Current implementation of weighted_shuffle: https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37 uses a heuristic which results in biased samples. For example, if the weights are [1, 10, 100], then the 3rd index should come first 100 times more often than the 1st index. However, weighted_shuffle is picking the 3rd index 200+ times more often than the 1st index, showing a disproportional bias in favor of higher weights. This commit implements weighted shuffle using binary indexed tree to maintain cumulative sum of weights while sampling. The resulting samples are demonstrably unbiased and precisely proportional to the weights. Additionally the iterator interface allows to skip computations when not all indices are processed. Of the use cases of weighted_shuffle, changing turbine code requires feature-gating to keep the cluster in sync. That is not updated in this commit, but can be done together with future updates to turbine.
856bfca
to
d912b34
Compare
let since = (now.saturating_sub(req_time).min(3600 * 1000) / 1024) as u32; | ||
let stake = get_stake(&item.id, stakes); | ||
let weight = get_weight(max_weight, since, stake); | ||
(weight, item) | ||
// Weights are bounded by max_weight defined above. | ||
// So this type-cast should be safe. | ||
((weight * 100.0) as u64, item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a function for this calculation to use here and in crds_gossip_push.rs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think might be good to refactor push_options
and pull_options
to share the common parts. But I would rather not to mix that refactoring with this pull-request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
…8343) Current implementation of weighted_shuffle: https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37 uses a heuristic which results in biased samples. For example, if the weights are [1, 10, 100], then the 3rd index should come first 100 times more often than the 1st index. However, weighted_shuffle is picking the 3rd index 200+ times more often than the 1st index, showing a disproportional bias in favor of higher weights. This commit implements weighted shuffle using binary indexed tree to maintain cumulative sum of weights while sampling. The resulting samples are demonstrably unbiased and precisely proportional to the weights. Additionally the iterator interface allows to skip computations when not all indices are processed. Of the use cases of weighted_shuffle, changing turbine code requires feature-gating to keep the cluster in sync. That is not updated in this commit, but can be done together with future updates to turbine. (cherry picked from commit dba42c5)
…8343) (#18485) Current implementation of weighted_shuffle: https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37 uses a heuristic which results in biased samples. For example, if the weights are [1, 10, 100], then the 3rd index should come first 100 times more often than the 1st index. However, weighted_shuffle is picking the 3rd index 200+ times more often than the 1st index, showing a disproportional bias in favor of higher weights. This commit implements weighted shuffle using binary indexed tree to maintain cumulative sum of weights while sampling. The resulting samples are demonstrably unbiased and precisely proportional to the weights. Additionally the iterator interface allows to skip computations when not all indices are processed. Of the use cases of weighted_shuffle, changing turbine code requires feature-gating to keep the cluster in sync. That is not updated in this commit, but can be done together with future updates to turbine. (cherry picked from commit dba42c5) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Older weighted_shuffle is based on a heuristic which results in biased samples as shown in: solana-labs#18343 and can be replaced with WeightedShuffle. Also, as described in: solana-labs#13919 weighted_best can be replaced with rand::distributions::WeightedIndex, or WeightdShuffle::first.
Older weighted_shuffle is based on a heuristic which results in biased samples as shown in: solana-labs#18343 and can be replaced with WeightedShuffle. Also, as described in: solana-labs#13919 weighted_best can be replaced with rand::distributions::WeightedIndex, or WeightdShuffle::first.
Older weighted_shuffle is based on a heuristic which results in biased samples as shown in: #18343 and can be replaced with WeightedShuffle. Also, as described in: #13919 weighted_best can be replaced with rand::distributions::WeightedIndex, or WeightdShuffle::first. (cherry picked from commit db23295)
…24139) Older weighted_shuffle is based on a heuristic which results in biased samples as shown in: #18343 and can be replaced with WeightedShuffle. Also, as described in: #13919 weighted_best can be replaced with rand::distributions::WeightedIndex, or WeightdShuffle::first. (cherry picked from commit db23295) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Problem
Current implementation of weighted_shuffle:
https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37
uses a heuristic which results in biased samples.
For example, if the weights are [1, 10, 100], then the 3rd index should
come first 100 times more often than the 1st index. However,
weighted_shuffle is picking the 3rd index 200+ times more often than the
1st index, showing a disproportional bias in favor of higher weights.
Summary of Changes
This commit implements weighted shuffle using binary indexed tree to
maintain cumulative sum of weights while sampling. The resulting samples
are demonstrably unbiased and precisely proportional to the weights.
Additionally the iterator interface allows to skip computations when
not all indices are processed.
Of the use cases of weighted_shuffle, changing turbine code requires
feature-gating to keep the cluster in sync. That is not updated in
this commit, but can be done together with future updates to turbine.