builds crds filters in parallel #12360
Conversation
Codecov Report

@@            Coverage Diff            @@
##           master   #12360   +/-   ##
=======================================
  Coverage    82.0%    82.0%
=======================================
  Files         356      356
  Lines       83106    83164    +58
=======================================
+ Hits        68181    68247    +66
+ Misses      14925    14917     -8
=======================================
Looks pretty good to me, just a couple of nits. I could think of some ideas to optimize around the lock's critical section, and maybe coalescing the work going into the lock, but this seems quite a bit better than what we have.
Pull request has been modified.
crds.table
    .par_values()
    .with_min_len(PAR_MIN_LENGTH)
    .for_each(|v| filters.add(v.value_hash))
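The diff above walks the crds table's values in parallel and adds each value hash into a shared filter set. A minimal std-only sketch of that pattern (a Mutex-protected bitset standing in for CrdsFilterSet, scoped threads standing in for rayon's par_values; all names and the toy hashing are illustrative, not the crate's actual API):

```rust
use std::sync::Mutex;
use std::thread;

// Toy stand-in for the CrdsFilterSet in the diff: a Mutex-protected bitset.
struct FilterSet {
    bits: Mutex<Vec<bool>>,
}

impl FilterSet {
    fn new(num_bits: usize) -> Self {
        Self { bits: Mutex::new(vec![false; num_bits]) }
    }
    // Each add takes the lock briefly, mirroring filters.add(v.value_hash).
    fn add(&self, hash: u64) {
        let mut bits = self.bits.lock().unwrap();
        let n = bits.len() as u64;
        bits[(hash % n) as usize] = true;
    }
    fn contains(&self, hash: u64) -> bool {
        let bits = self.bits.lock().unwrap();
        bits[(hash % bits.len() as u64) as usize]
    }
}

fn main() {
    let filters = FilterSet::new(1024);
    let hashes: Vec<u64> = (0..10_000u64)
        .map(|i| i.wrapping_mul(0x9e37_79b9_7f4a_7c15))
        .collect();
    // Parallel for_each over chunks, like crds.table.par_values().for_each(...).
    thread::scope(|s| {
        let filters = &filters;
        for chunk in hashes.chunks(2_500) {
            s.spawn(move || chunk.iter().for_each(|&h| filters.add(h)));
        }
    });
    assert!(hashes.iter().all(|&h| filters.contains(h)));
    println!("all {} hashes inserted", hashes.len());
}
```

Every add contends on one Mutex, which is what the review comments below are probing at.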
Does aggregating the values for each filter outside of the lock, before calling add to bulk-insert those values, improve the bench at all?
Initially I tried something similar to what I think you are suggesting, and it was slightly slower. I have some ideas about a lock-free implementation of the bloom filter construction, that I am going to test as well and compare.
Yea, I was thinking of something like that: batch-calculate a set of positions to set outside the lock, then go into the lock and bulk-set the positions. It might be more beneficial on a larger machine with more cores, which would show more lock contention.
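The batching idea described above can be sketched as: compute each value's bit positions with no lock held (pure hashing), then take the lock once per batch to set them all. This is a hypothetical shape, not the crate's bloom filter; the struct, field names, and the multiplicative "hash functions" are made up for illustration:

```rust
use std::sync::Mutex;

// Hypothetical bloom-filter shape: a Mutex-protected vector of u64 bit words.
struct Bloom {
    bits: Mutex<Vec<u64>>,
    num_bits: u64,
}

impl Bloom {
    fn new(num_bits: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: Mutex::new(vec![0; words]), num_bits }
    }
    // Pure computation: derive this value's bit positions with NO lock held.
    fn positions(&self, hash: u64) -> [u64; 3] {
        let h2 = hash.wrapping_mul(0x9e37_79b9_7f4a_7c15);
        let h3 = h2.wrapping_mul(0x9e37_79b9_7f4a_7c15);
        [hash % self.num_bits, h2 % self.num_bits, h3 % self.num_bits]
    }
    // One short critical section bulk-sets a whole batch of positions.
    fn set_batch(&self, positions: &[u64]) {
        let mut bits = self.bits.lock().unwrap();
        for &p in positions {
            bits[(p / 64) as usize] |= 1u64 << (p % 64);
        }
    }
    fn contains(&self, hash: u64) -> bool {
        let bits = self.bits.lock().unwrap();
        self.positions(hash)
            .iter()
            .all(|&p| bits[(p / 64) as usize] >> (p % 64) & 1 == 1)
    }
}

fn main() {
    let bloom = Bloom::new(4096);
    let hashes: Vec<u64> = (1..=100u64).map(|i| i.wrapping_mul(0x51ed_270b)).collect();
    // Compute all positions lock-free, then take the lock once for the batch.
    let batch: Vec<u64> = hashes.iter().flat_map(|&h| bloom.positions(h)).collect();
    bloom.set_batch(&batch);
    assert!(hashes.iter().all(|&h| bloom.contains(h)));
    println!("batch of {} positions, one lock acquisition", batch.len());
}
```

The trade-off is one lock acquisition per batch instead of one per value, at the cost of buffering the positions in memory first.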
sent out #12422 which adds an atomic variant of the bloom filter, and will use it here once that is merged. so wouldn't need locking and mutex here anymore.
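An atomic bloom filter sidesteps the mutex entirely by storing the bits in AtomicU64 words and setting them with fetch_or. The sketch below only illustrates that idea; it is not the actual #12422 implementation, and its names and single toy hash function are made up:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Sketch of a lock-free bloom filter: bits live in AtomicU64 words and are
// set with fetch_or, so no mutex is needed.
struct AtomicBloom {
    bits: Vec<AtomicU64>,
    num_bits: u64,
}

impl AtomicBloom {
    fn new(num_bits: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self {
            bits: (0..words).map(|_| AtomicU64::new(0)).collect(),
            num_bits,
        }
    }
    fn add(&self, hash: u64) {
        let pos = hash % self.num_bits;
        // fetch_or sets the bit atomically; concurrent adds never lose updates.
        self.bits[(pos / 64) as usize].fetch_or(1 << (pos % 64), Ordering::Relaxed);
    }
    fn contains(&self, hash: u64) -> bool {
        let pos = hash % self.num_bits;
        self.bits[(pos / 64) as usize].load(Ordering::Relaxed) >> (pos % 64) & 1 == 1
    }
}

fn main() {
    let bloom = AtomicBloom::new(1 << 16);
    thread::scope(|s| {
        let bloom = &bloom;
        for t in 0..4u64 {
            // Four threads insert hashes concurrently with no lock held.
            s.spawn(move || (0..1000).for_each(|i| bloom.add(t * 1000 + i)));
        }
    });
    assert!((0..4000u64).all(|h| bloom.contains(h)));
    println!("4000 concurrent inserts, zero locks");
}
```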
filters.0
let filters = CrdsFilterSet::new(num, bloom_size);
rayon::join(
    || {
I think using the gossip-specific threadpool here might be a good idea. The thread limit will keep it from using the whole machine's CPU and we've found that not sharing work between other instances of rayon generally performs better.
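The "dedicated, size-limited pool" idea suggested above can be illustrated with std alone: a fixed number of worker threads drain a private queue, so this work can never occupy more than that many cores regardless of the machine. The function and its workload below are made up for illustration; the real code would use rayon's ThreadPoolBuilder with num_threads set:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A private pool with a fixed number of worker threads draining a shared
// queue. This only illustrates the bounded, dedicated-pool idea behind a
// gossip-specific rayon ThreadPool; all names here are hypothetical.
fn pooled_sum_of_squares(num_threads: usize, jobs: Vec<u64>) -> u64 {
    let queue = Arc::new(Mutex::new(jobs));
    let (tx, rx) = mpsc::channel();
    for _ in 0..num_threads {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        // Each worker pops jobs until the queue is empty.
        thread::spawn(move || {
            while let Some(n) = { queue.lock().unwrap().pop() } {
                tx.send(n * n).unwrap();
            }
        });
    }
    drop(tx); // channel closes once every worker's clone is dropped
    rx.iter().sum()
}

fn main() {
    // Two threads cap the CPU used, regardless of the machine's core count.
    let total = pooled_sum_of_squares(2, (1..=100).collect());
    assert_eq!(total, 338_350); // sum of squares 1..=100
    println!("{total}");
}
```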
Agreed, but I tried adding a ThreadPool to struct CrdsGossipPull, and it was not compiling because of the #[derive(Clone)]; struct CrdsGossip also has #[derive(Clone)]. Would it make sense to add the ThreadPool to struct ClusterInfo and pass it down the call-stack? Or do you have a different suggestion?
I am not sure those #[derive(Clone)] are used anywhere, so we may be able to drop them, or alternatively implement Clone manually.
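Implementing Clone manually works because the impl gets to decide what to do with the non-Clone pool field: copy the plain fields and build a fresh pool. The structs and field names below are hypothetical stand-ins (the real CrdsGossipPull and rayon's ThreadPool look different); only the manual-Clone pattern is the point:

```rust
// Stand-in for rayon::ThreadPool: deliberately NOT Clone, like the real one.
struct ThreadPool {
    num_threads: usize,
}

// Hypothetical gossip struct; the real CrdsGossipPull has different fields.
struct CrdsGossipPull {
    crds_timeout: u64,
    thread_pool: ThreadPool,
}

// Manual Clone instead of #[derive(Clone)]: copy the plain fields and build
// a FRESH pool, since the pool itself cannot be cloned.
impl Clone for CrdsGossipPull {
    fn clone(&self) -> Self {
        Self {
            crds_timeout: self.crds_timeout,
            thread_pool: ThreadPool {
                num_threads: self.thread_pool.num_threads,
            },
        }
    }
}

fn main() {
    let pull = CrdsGossipPull {
        crds_timeout: 15_000,
        thread_pool: ThreadPool { num_threads: 4 },
    };
    let copy = pull.clone();
    assert_eq!(copy.crds_timeout, 15_000);
    assert_eq!(copy.thread_pool.num_threads, 4);
    println!("cloned with a fresh {}-thread pool", copy.thread_pool.num_threads);
}
```

Note that each clone spinning up a fresh pool is exactly what caused the memory pressure described later in this thread, which is why the thread counts were then limited when cloning.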
yea. I was thinking passing it down the call stack. Removing #[derive(Clone)] or implementing Clone manually doesn't seem like it would be too bad.
added a ThreadPool to struct ClusterInfo. please take a look.
After adding a dedicated ThreadPool to struct ClusterInfo, a number of tests started seg-faulting and crashing, apparently because of running out of memory. So I had to:
- add #[serial] to two of the tests in core/tests/crds_gossip.rs.
- limit the number of threads in clone_with_id when cloning cluster-info.
Should we also limit the number of threads for the thread-pool in cluster-info?
An alternative would be to have two separate thread-pools, one for the ClusterInfo::listen thread and the other for the ClusterInfo::gossip thread. It has the advantage that listen and gossip may not block each other, at the cost of higher memory requirements at max load.
Yea. I hesitate to add another thread pool because we already have too many. The change for the pool getting created in gossip service doesn't look too bad. The work is not really blocking like an IO-heavy workload might be, so I think it might be fine to share the thread pool.
sounds good. I sent out #12402
Actually, I think the method in this PR is fine, where thread_pool is a member of the ClusterInfo struct. Just move the thread-pool creation of gossip-work-{i} to ClusterInfo::new and then use self.thread_pool in ClusterInfo::listen. For test configuration, just create a 1- or 2-thread pool.
updated #12402 accordingly
solana-labs#12402 moved gossip-work threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/cluster_info.rs#L2330-L2334) to ClusterInfo::new as a new field in the ClusterInfo struct (https://github.com/solana-labs/solana/blob/35208c5ee/core/src/cluster_info.rs#L249), so that they can be shared between the listen and gossip threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67).

However, in testing solana-labs#12360 it turned out this causes breakage:
https://buildkite.com/solana-labs/solana/builds/31646
https://buildkite.com/solana-labs/solana/builds/31651
https://buildkite.com/solana-labs/solana/builds/31655
whereas with separate thread pools all is good. It might be the case that one thread is slowing down the other by exhausting the shared thread-pool, whereas with separate thread-pools we get fair scheduling guarantees from the OS.

This commit reverts solana-labs#12402 and instead adds separate thread-pools for the listen and gossip threads (https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67).
Based on run-time profiles, the majority of the time in new_pull_requests is spent building bloom filters, in hashing and bit-vec ops. This commit builds crds filters in parallel using rayon constructs. The added benchmark shows a ~5x speedup (4-core machine, 8 threads).
lgtm
Problem
Based on run-time profiles, the majority of the time in new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.
Summary of Changes
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
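The rough shape of such a benchmark is: run the same filter-building workload sequentially and in parallel, check both produce the same result, and compare wall-clock times. The std-only sketch below is a hypothetical stand-in (made-up hashing workload, scoped threads instead of rayon); the real benchmark lives in the crate's bench suite:

```rust
use std::thread;
use std::time::Instant;

// Cheap stand-in for the per-value hashing that filter building does.
fn mix(mut h: u64) -> u64 {
    for _ in 0..64 {
        h = h.wrapping_mul(0x9e37_79b9_7f4a_7c15).rotate_left(31);
    }
    h
}

// XOR-combine the hashes of all values, sequentially.
fn xor_digest_seq(values: &[u64]) -> u64 {
    values.iter().map(|&v| mix(v)).fold(0, |a, b| a ^ b)
}

// Same digest computed on one thread per chunk.
fn xor_digest_par(values: &[u64], chunk: usize) -> u64 {
    thread::scope(|s| {
        values
            .chunks(chunk)
            .map(|c| s.spawn(move || xor_digest_seq(c)))
            .collect::<Vec<_>>() // spawn all threads before joining any
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(0, |a, b| a ^ b)
    })
}

fn main() {
    let values: Vec<u64> = (0..200_000).collect();

    let t = Instant::now();
    let seq = xor_digest_seq(&values);
    let seq_time = t.elapsed();

    let t = Instant::now();
    let par = xor_digest_par(&values, 50_000);
    let par_time = t.elapsed();

    // XOR is associative and commutative, so both orders agree.
    assert_eq!(seq, par);
    println!("seq {:?} vs par {:?} (4 threads)", seq_time, par_time);
}
```

Timing is deliberately printed rather than asserted: the actual speedup depends on core count, which is why the PR reports its number against a specific machine (4 cores, 8 threads).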