-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bumps up min number of bloom items in gossip pull requests #17236
bumps up min number of bloom items in gossip pull requests #17236
Conversation
f236230
to
c358753
Compare
Codecov Report
@@ Coverage Diff @@
## master #17236 +/- ##
=========================================
- Coverage 82.7% 82.7% -0.1%
=========================================
Files 423 423
Lines 118112 118114 +2
=========================================
+ Hits 97687 97688 +1
- Misses 20425 20426 +1 |
c358753
to
8ffc1eb
Compare
2f47b91
to
da90fc5
Compare
Hmmm, so if there are no entries in the
My question is, when the entrypoint gets this message, do they not respond with everything in their table because this mask with |
So, the The bloom filter target 10% false rate given the number of items. So, if the The validator always have a couple of items in its crds table (its own contact-info, version, node-instance). So the bloom filter always have some bits set causing the false collisions. I realize the description of this pull-request is not good; I will update the description as well. |
When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets 10% false rate given the number of items. So, if the `num_items` is very wrong, it makes a very small bloom filter with a very high false rate: https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80 https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48 As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on: > Waiting to adopt entrypoint shred version https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412 This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, one pull-request for each, a larger bloom filter for each shard, and increases the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start.
da90fc5
to
2345372
Compare
@behzadnouri Ah I see, the bloom is built for a 10% false rate given very little expected default capacity, so here we are bumping the expected default capacity, thanks for the explanation! |
crds.len() + self.purged_values.len() + self.failed_inserts.len(), | ||
); | ||
let filters = CrdsFilterSet::new(num, bloom_size); | ||
const MIN_NUM_BLOOM_ITEMS: usize = 65_536; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be a moving target as the number of validators on the network increases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will see, but probably with #17261 we should be fine
let bloom = AtomicBloom::from(Bloom::random( | ||
num_bloom_items, | ||
BLOOM_FALSE_RATE, | ||
BLOOM_MAX_BITS, | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for cleaning this up!
When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets 10% false rate given the number of items. So, if the `num_items` is very wrong, it makes a very small bloom filter with a very high false rate: https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80 https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48 As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on: > Waiting to adopt entrypoint shred version https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412 This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, one pull-request for each, a larger bloom filter for each shard, and increases the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start. (cherry picked from commit e8b35a4)
…17383) When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets 10% false rate given the number of items. So, if the `num_items` is very wrong, it makes a very small bloom filter with a very high false rate: https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80 https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48 As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on: > Waiting to adopt entrypoint shred version https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412 This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, one pull-request for each, a larger bloom filter for each shard, and increases the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start. (cherry picked from commit e8b35a4) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
coverage ci builds are failing presumably because of the overhead added in: solana-labs#17236 for very small test cluster. This commit uses smaller min number of bloom items condition on that if debug assertions are enabled or not.
coverage ci builds are failing presumably because of the overhead added in: solana-labs#17236 for very small test cluster. Previous attempt at fixing the flakiness: solana-labs#17408 This commit uses smaller min number of bloom items condition on that if debug assertions are enabled or not.
coverage ci builds are have become flaky presumably because of the overhead added in solana-labs#17236 for very small test clusters. This commit uses a smaller min number of bloom items condition on that if debug assertions are enabled or not. Previous attempt at fixing the flakiness: solana-labs#17408
coverage ci builds are have become flaky presumably because of the overhead added in #17236 for very small test clusters. This commit uses a smaller min number of bloom items condition on that if debug assertions are enabled or not. Previous attempt at fixing the flakiness: #17408 (cherry picked from commit 5567305)
coverage ci builds are have become flaky presumably because of the overhead added in #17236 for very small test clusters. This commit uses a smaller min number of bloom items condition on that if debug assertions are enabled or not. Previous attempt at fixing the flakiness: #17408 (cherry picked from commit 5567305) Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Problem
When a validator starts, it has an (almost) empty crds table and it only sends one pull-request to the entrypoint. The bloom filter in the pull-request targets 10% false rate given the number of items. So, if the
num_items
is very wrong, it makes a very small bloom filter with a very high false rate:https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80
https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48
As a result, it is very unlikely that the validator obtains entrypoint's contact-info in response. This exacerbates how long the validator will loop on:
https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412
Summary of Changes
This commit increases the min number of bloom items when making gossip pull requests. Effectively this will break the entrypoint crds table into 64 shards, one pull-request for each, a larger bloom filter for each shard, and increases the chances that the response will include entrypoint's contact-info, which is needed for adopting shred version and validator start.