builds crds filters in parallel #12360
Conversation
Codecov Report

@@            Coverage Diff            @@
##           master   #12360   +/-   ##
=======================================
  Coverage    82.0%    82.0%
=======================================
  Files         356      356
  Lines       83106    83164    +58
=======================================
+ Hits        68181    68247    +66
+ Misses      14925    14917     -8
=======================================
Looks pretty good to me, just a couple of nits. I could think of some ideas to optimize around the lock's critical section, and maybe coalescing the work going into the lock, but this seems quite a bit better than what we have.
Pull request has been modified.
crds.table
    .par_values()
    .with_min_len(PAR_MIN_LENGTH)
    .for_each(|v| filters.add(v.value_hash))
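The diff above walks the crds table's values in parallel and adds each value hash into a shared filter set. A minimal std-only sketch of that pattern (a Mutex-protected bitset standing in for CrdsFilterSet, scoped threads standing in for rayon's par_values; all names and the toy hashing are illustrative, not the crate's actual API):

```rust
use std::sync::Mutex;
use std::thread;

// Toy stand-in for the CrdsFilterSet in the diff: a Mutex-protected bitset.
struct FilterSet {
    bits: Mutex<Vec<bool>>,
}

impl FilterSet {
    fn new(num_bits: usize) -> Self {
        Self { bits: Mutex::new(vec![false; num_bits]) }
    }
    // Each add takes the lock briefly, mirroring filters.add(v.value_hash).
    fn add(&self, hash: u64) {
        let mut bits = self.bits.lock().unwrap();
        let n = bits.len() as u64;
        bits[(hash % n) as usize] = true;
    }
    fn contains(&self, hash: u64) -> bool {
        let bits = self.bits.lock().unwrap();
        bits[(hash % bits.len() as u64) as usize]
    }
}

fn main() {
    let filters = FilterSet::new(1024);
    let hashes: Vec<u64> = (0..10_000u64)
        .map(|i| i.wrapping_mul(0x9e37_79b9_7f4a_7c15))
        .collect();
    // Parallel for_each over chunks, like crds.table.par_values().for_each(...).
    thread::scope(|s| {
        let filters = &filters;
        for chunk in hashes.chunks(2_500) {
            s.spawn(move || chunk.iter().for_each(|&h| filters.add(h)));
        }
    });
    assert!(hashes.iter().all(|&h| filters.contains(h)));
    println!("all {} hashes inserted", hashes.len());
}
```

Every add contends on one Mutex, which is what the review comments below are probing at.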
Does aggregating the values for each filter outside of the lock, before calling add to bulk-insert those values, improve the bench at all?
Initially I tried something similar to what I think you are suggesting, and it was slightly slower. I have some ideas about a lock-free implementation of the bloom filter construction, that I am going to test as well and compare.
Yea, I was thinking of something like that: batch-calculate a set of positions to set outside the lock, then go into the lock and bulk-set the positions. It might be more beneficial on a larger machine with more cores, which would show more lock contention.
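The batching idea described above can be sketched as: compute each value's bit positions with no lock held (pure hashing), then take the lock once per batch to set them all. This is a hypothetical shape, not the crate's bloom filter; the struct, field names, and the multiplicative "hash functions" are made up for illustration:

```rust
use std::sync::Mutex;

// Hypothetical bloom-filter shape: a Mutex-protected vector of u64 bit words.
struct Bloom {
    bits: Mutex<Vec<u64>>,
    num_bits: u64,
}

impl Bloom {
    fn new(num_bits: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: Mutex::new(vec![0; words]), num_bits }
    }
    // Pure computation: derive this value's bit positions with NO lock held.
    fn positions(&self, hash: u64) -> [u64; 3] {
        let h2 = hash.wrapping_mul(0x9e37_79b9_7f4a_7c15);
        let h3 = h2.wrapping_mul(0x9e37_79b9_7f4a_7c15);
        [hash % self.num_bits, h2 % self.num_bits, h3 % self.num_bits]
    }
    // One short critical section bulk-sets a whole batch of positions.
    fn set_batch(&self, positions: &[u64]) {
        let mut bits = self.bits.lock().unwrap();
        for &p in positions {
            bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }
    fn contains(&self, hash: u64) -> bool {
        let bits = self.bits.lock().unwrap();
        self.positions(hash)
            .iter()
            .all(|&p| bits[(p / 64) as usize] >> (p % 64) & 1 == 1)
    }
}

fn main() {
    let bloom = Bloom::new(4096);
    let hashes: Vec<u64> = (1..=100u64).map(|i| i.wrapping_mul(0x51ed_270b)).collect();
    // Compute all positions lock-free, then take the lock once for the batch.
    let batch: Vec<u64> = hashes.iter().flat_map(|&h| bloom.positions(h)).collect();
    bloom.set_batch(&batch);
    assert!(hashes.iter().all(|&h| bloom.contains(h)));
    println!("batch of {} positions, one lock acquisition", batch.len());
}
```

The trade-off is one lock acquisition per batch instead of one per value, at the cost of buffering the positions in memory first.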
sent out #12422 which adds an atomic variant of the bloom filter, and will use it here once that is merged. so wouldn't need locking and mutex here anymore.
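An atomic bloom filter sidesteps the mutex entirely by storing the bits in AtomicU64 words and setting them with fetch_or. The sketch below only illustrates that idea; it is not the actual #12422 implementation, and its names and single toy hash function are made up:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Sketch of a lock-free bloom filter: bits live in AtomicU64 words and are
// set with fetch_or, so no mutex is needed.
struct AtomicBloom {
    bits: Vec<AtomicU64>,
    num_bits: u64,
}

impl AtomicBloom {
    fn new(num_bits: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self {
            bits: (0..words).map(|_| AtomicU64::new(0)).collect(),
            num_bits,
        }
    }
    fn add(&self, hash: u64) {
        let pos = hash % self.num_bits;
        // fetch_or sets the bit atomically; concurrent adds never lose updates.
        self.bits[(pos / 64) as usize].fetch_or(1 << (pos % 64), Ordering::Relaxed);
    }
    fn contains(&self, hash: u64) -> bool {
        let pos = hash % self.num_bits;
        self.bits[(pos / 64) as usize].load(Ordering::Relaxed) >> (pos % 64) & 1 == 1
    }
}

fn main() {
    let bloom = AtomicBloom::new(1 << 16);
    thread::scope(|s| {
        let bloom = &bloom;
        for t in 0..4u64 {
            // Four threads insert hashes concurrently with no lock held.
            s.spawn(move || (0..1000).for_each(|i| bloom.add(t * 1000 + i)));
        }
    });
    assert!((0..4000u64).all(|h| bloom.contains(h)));
    println!("4000 concurrent inserts, zero locks");
}
```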
filters.0
let filters = CrdsFilterSet::new(num, bloom_size);
rayon::join(
    || {
I think using the gossip-specific threadpool here might be a good idea. The thread limit will keep it from using the whole machine's CPU and we've found that not sharing work between other instances of rayon generally performs better.
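The "dedicated, size-limited pool" idea suggested above can be illustrated with std alone: a fixed number of worker threads drain a private queue, so this work can never occupy more than that many cores regardless of the machine. The function and its workload below are made up for illustration; the real code would use rayon's ThreadPoolBuilder with num_threads set:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A private pool with a fixed number of worker threads draining a shared
// queue. This only illustrates the bounded, dedicated-pool idea behind a
// gossip-specific rayon ThreadPool; all names here are hypothetical.
fn pooled_sum_of_squares(num_threads: usize, jobs: Vec<u64>) -> u64 {
    let queue = Arc::new(Mutex::new(jobs));
    let (tx, rx) = mpsc::channel();
    for _ in 0..num_threads {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        // Each worker pops jobs until the queue is empty.
        thread::spawn(move || {
            while let Some(n) = { queue.lock().unwrap().pop() } {
                tx.send(n * n).unwrap();
            }
        });
    }
    drop(tx); // channel closes once every worker's clone is dropped
    rx.iter().sum()
}

fn main() {
    // Two threads cap the CPU used, regardless of the machine's core count.
    let total = pooled_sum_of_squares(2, (1..=100).collect());
    assert_eq!(total, 338_350); // sum of squares 1..=100
    println!("{total}");
}
```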
Agreed, but I tried adding a ThreadPool to struct CrdsGossipPull, and it was not compiling because of the #[derive(Clone)]; struct CrdsGossip also has #[derive(Clone)]. Would it make sense to add the ThreadPool to struct ClusterInfo and pass it down the call-stack? Or do you have a different suggestion?
I am not sure those #[derive(Clone)] are used anywhere, so we may be able to drop them, or alternatively implement Clone manually.
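Implementing Clone manually works because the impl gets to decide what to do with the non-Clone pool field: copy the plain fields and build a fresh pool. The structs and field names below are hypothetical stand-ins (the real CrdsGossipPull and rayon's ThreadPool look different); only the manual-Clone pattern is the point:

```rust
// Stand-in for rayon::ThreadPool: deliberately NOT Clone, like the real one.
struct ThreadPool {
    num_threads: usize,
}

// Hypothetical gossip struct; the real CrdsGossipPull has different fields.
struct CrdsGossipPull {
    crds_timeout: u64,
    thread_pool: ThreadPool,
}

// Manual Clone instead of #[derive(Clone)]: copy the plain fields and build
// a FRESH pool, since the pool itself cannot be cloned.
impl Clone for CrdsGossipPull {
    fn clone(&self) -> Self {
        Self {
            crds_timeout: self.crds_timeout,
            thread_pool: ThreadPool {
                num_threads: self.thread_pool.num_threads,
            },
        }
    }
}

fn main() {
    let pull = CrdsGossipPull {
        crds_timeout: 15_000,
        thread_pool: ThreadPool { num_threads: 4 },
    };
    let copy = pull.clone();
    assert_eq!(copy.crds_timeout, 15_000);
    assert_eq!(copy.thread_pool.num_threads, 4);
    println!("cloned with a fresh {}-thread pool", copy.thread_pool.num_threads);
}
```

Note that each clone spinning up a fresh pool is exactly what caused the memory pressure described later in this thread, which is why the thread counts were then limited when cloning.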
yea. I was thinking passing it down the call stack. Removing #[derive(Clone)] or implementing Clone manually doesn't seem like it would be too bad.
added a ThreadPool to struct ClusterInfo. please take a look.
After adding a dedicated ThreadPool to struct ClusterInfo, a number of tests started seg-faulting and crashing, apparently because of running out of memory. So I had to:
- add #[serial] to two of the tests in core/tests/crds_gossip.rs.
- limit the number of threads in clone_with_id when cloning cluster-info.
Should we also limit the number of threads for the thread-pool in cluster-info?
An alternative would be to have two separate thread-pools, one for the ClusterInfo::listen thread and the other for the ClusterInfo::gossip thread. It has the advantage that listen and gossip may not block each other, at the cost of higher memory requirements at max load.
Yea. I hesitate to add another thread pool because we already have too many. The change for the pool getting created in gossip service doesn't look too bad. The work is not really blocking like an IO-heavy workload might be, so I think it might be fine to share the thread pool.
sounds good. I sent out #12402
Actually, I think the method in this PR is fine, where thread_pool is a member of the ClusterInfo struct. Just move the thread-pool creation of gossip-work-{i} to ClusterInfo::new and then use self.thread_pool in ClusterInfo::listen. For test configuration, just create a 1- or 2-thread pool.
updated #12402 accordingly
solana-labs#12402 moved gossip-work threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/cluster_info.rs#L2330-L2334) to ClusterInfo::new as a new field in the ClusterInfo struct (https://github.com/solana-labs/solana/blob/35208c5ee/core/src/cluster_info.rs#L249), so that they can be shared between the listen and gossip threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67).

However, in testing solana-labs#12360 it turned out this causes breakage:
https://buildkite.com/solana-labs/solana/builds/31646
https://buildkite.com/solana-labs/solana/builds/31651
https://buildkite.com/solana-labs/solana/builds/31655
whereas with separate thread pools all is good. It might be the case that one thread is slowing down the other by exhausting the shared thread-pool, whereas with separate thread-pools we get fair scheduling guarantees from the OS.

This commit reverts solana-labs#12402 and instead adds separate thread-pools for the listen and gossip threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67).
Based on run-time profiles, the majority of the time in new_pull_requests is spent building bloom filters, in hashing and bit-vec ops. This commit builds crds filters in parallel using rayon constructs. The added benchmark shows a ~5x speedup (4-core machine, 8 threads).
lgtm
Problem
Based on run-time profiles, the majority of the time in new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.
Summary of Changes
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
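The rough shape of such a benchmark is: run the same filter-building workload sequentially and in parallel, check both produce the same result, and compare wall-clock times. The std-only sketch below is a hypothetical stand-in (made-up hashing workload, scoped threads instead of rayon); the real benchmark lives in the crate's bench suite:

```rust
use std::thread;
use std::time::Instant;

// Cheap stand-in for the per-value hashing that filter building does.
fn mix(mut h: u64) -> u64 {
    for _ in 0..64 {
        h = h.wrapping_mul(0x9e37_79b9_7f4a_7c15).rotate_left(31);
    }
    h
}

// XOR-combine the hashes of all values, sequentially.
fn xor_digest_seq(values: &[u64]) -> u64 {
    values.iter().map(|&v| mix(v)).fold(0, |a, b| a ^ b)
}

// Same digest computed on one thread per chunk.
fn xor_digest_par(values: &[u64], chunk: usize) -> u64 {
    thread::scope(|s| {
        values
            .chunks(chunk)
            .map(|c| s.spawn(move || xor_digest_seq(c)))
            .collect::<Vec<_>>() // spawn all threads before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(0, |a, b| a ^ b)
    })
}

fn main() {
    let values: Vec<u64> = (0..200_000).collect();

    let t = Instant::now();
    let seq = xor_digest_seq(&values);
    let seq_time = t.elapsed();

    let t = Instant::now();
    let par = xor_digest_par(&values, 50_000);
    let par_time = t.elapsed();

    // XOR is associative and commutative, so both orders agree.
    assert_eq!(seq, par);
    println!("seq {:?} vs par {:?} (4 threads)", seq_time, par_time);
}
```

Timing is deliberately printed rather than asserted: the actual speedup depends on core count, which is why the PR reports its number against a specific machine (4 cores, 8 threads).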