
builds crds filters in parallel #12360

Merged

Conversation

@behzadnouri (Contributor) commented Sep 20, 2020

Problem

Based on run-time profiles, the majority of the time in new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

Summary of Changes

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
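
For readers skimming the change, here is a minimal, self-contained sketch of the approach; the type names, the HashSet stand-in for a bloom filter, and the constants are illustrative, not the actual crate types. Value hashes are inserted into mutex-guarded filters from rayon worker threads, with `with_min_len` keeping the per-task chunks large enough to amortize scheduling overhead.

```rust
use rayon::prelude::*;
use std::collections::HashSet;
use std::sync::Mutex;

// Stand-in for the set of crds filters; each slot has its own mutex so
// rayon workers only contend when they hit the same slot.
struct FilterSet {
    filters: Vec<Mutex<HashSet<u64>>>, // HashSet stands in for a bloom filter
}

impl FilterSet {
    fn new(num_filters: usize) -> Self {
        let filters = (0..num_filters).map(|_| Mutex::new(HashSet::new())).collect();
        Self { filters }
    }

    // Route the hash to the filter owning its partition of the key space.
    fn add(&self, value_hash: u64) {
        let index = (value_hash as usize) % self.filters.len();
        self.filters[index].lock().unwrap().insert(value_hash);
    }
}

const PAR_MIN_LENGTH: usize = 512; // illustrative chunking threshold

fn build_filters(value_hashes: &[u64]) -> FilterSet {
    let filters = FilterSet::new(32);
    // Insertions run on the rayon pool; with_min_len avoids splitting the
    // input into chunks too small to be worth scheduling.
    value_hashes
        .par_iter()
        .with_min_len(PAR_MIN_LENGTH)
        .for_each(|&hash| filters.add(hash));
    filters
}
```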

@behzadnouri behzadnouri force-pushed the par-build-crds-filter-mtx branch 3 times, most recently from bb7cd54 to 5981f6c on September 21, 2020 00:29
codecov bot commented Sep 21, 2020

Codecov Report

Merging #12360 into master will increase coverage by 0.0%.
The diff coverage is 97.3%.

@@           Coverage Diff           @@
##           master   #12360   +/-   ##
=======================================
  Coverage    82.0%    82.0%           
=======================================
  Files         356      356           
  Lines       83106    83164   +58     
=======================================
+ Hits        68181    68247   +66     
+ Misses      14925    14917    -8     

@behzadnouri behzadnouri marked this pull request as ready for review September 21, 2020 01:40
sakridge previously approved these changes Sep 21, 2020

@sakridge (Member) left a comment


Looks pretty good to me, just a couple of nits. I can think of some ideas to optimize around the lock critical section, and maybe coalescing work going into the lock, but this seems quite a bit better than what we have.

@mergify mergify bot dismissed sakridge’s stale review September 21, 2020 18:18

Pull request has been modified.

crds.table
    .par_values()
    .with_min_len(PAR_MIN_LENGTH)
    .for_each(|v| filters.add(v.value_hash))
Contributor

Does aggregating the values for each filter outside of the lock before calling add to bulk insert those values improve the bench at all?

Contributor Author

Initially I tried something similar to what I think you are suggesting, and it was slightly slower. I have some ideas about a lock-free implementation of the bloom filter construction, which I am going to test and compare as well.

Member

Yeah, I was thinking of doing something like a batch: calculate a set of positions to set outside the lock, then go into the lock and bulk-set the positions. It might be more beneficial on a larger machine with more cores, which would show more lock contention.
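
A rough sketch of that two-phase idea, assuming a simple mutex-guarded bit vector; the struct, fields, and hash mixing below are placeholders, not the actual bloom implementation. The whole batch is hashed into bit positions outside the lock, then the lock is taken once to set them.

```rust
use std::sync::Mutex;

// Illustrative bloom filter with a mutex-guarded packed bit vector.
struct Bloom {
    bits: Mutex<Vec<u64>>,
    num_bits: u64,
    keys: Vec<u64>, // hash seeds
}

impl Bloom {
    fn new(num_bits: u64, keys: Vec<u64>) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: Mutex::new(vec![0u64; words]), num_bits, keys }
    }

    // Bit positions for one item; the hash mixing here is a placeholder.
    fn positions(&self, item: u64) -> Vec<u64> {
        self.keys
            .iter()
            .map(|&k| item.wrapping_mul(k).wrapping_add(k) % self.num_bits)
            .collect()
    }

    fn add_batch(&self, items: &[u64]) {
        // Phase 1 (no lock held): hash the whole batch into bit positions.
        let positions: Vec<u64> = items.iter().flat_map(|&i| self.positions(i)).collect();
        // Phase 2 (short critical section): flip all the bits at once.
        let mut bits = self.bits.lock().unwrap();
        for pos in positions {
            bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
        }
    }
}
```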

Contributor Author

Sent out #12422, which adds an atomic variant of the bloom filter; I will use it here once that is merged, so we wouldn't need the locking and mutex here anymore.
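
For context, a sketch of what an atomic bloom-filter insert can look like; the field names and hashing are illustrative and not the actual #12422 API. The bit vector is a slice of AtomicU64 words, so concurrent writers only need a relaxed fetch_or and no mutex.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative lock-free bloom filter: bits live in AtomicU64 words.
struct AtomicBloom {
    bits: Vec<AtomicU64>,
    num_bits: u64,
    keys: Vec<u64>, // hash seeds
}

impl AtomicBloom {
    fn new(num_bits: u64, keys: Vec<u64>) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        let bits = (0..words).map(|_| AtomicU64::new(0)).collect();
        Self { bits, num_bits, keys }
    }

    fn add(&self, item: u64) {
        for &k in &self.keys {
            // Placeholder hash mixing; the real filter hashes with its keys.
            let pos = item.wrapping_mul(k).wrapping_add(k) % self.num_bits;
            let (word, bit) = ((pos / 64) as usize, pos % 64);
            self.bits[word].fetch_or(1u64 << bit, Ordering::Relaxed);
        }
    }

    fn contains(&self, item: u64) -> bool {
        self.keys.iter().all(|&k| {
            let pos = item.wrapping_mul(k).wrapping_add(k) % self.num_bits;
            let (word, bit) = ((pos / 64) as usize, pos % 64);
            self.bits[word].load(Ordering::Relaxed) & (1u64 << bit) != 0
        })
    }
}
```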

filters.0
let filters = CrdsFilterSet::new(num, bloom_size);
rayon::join(
|| {
Member

I think using the gossip-specific threadpool here might be a good idea. The thread limit will keep it from using the whole machine's CPU and we've found that not sharing work between other instances of rayon generally performs better.

Contributor Author

Agreed, but I tried adding a ThreadPool to struct CrdsGossipPull, and it did not compile because of the #[derive(Clone)]; struct CrdsGossip also has #[derive(Clone)]. Would it make sense to add the ThreadPool to struct ClusterInfo and pass it down the call stack? Or do you have a different suggestion?

I am not sure those #[derive(Clone)] are used anywhere, so we may be able to drop them, or alternatively implement Clone manually.
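
For context, the compile error comes from rayon's ThreadPool not implementing Clone, so #[derive(Clone)] on a struct holding one fails. A hypothetical manual Clone, shown purely for illustration and not what this PR ends up doing, could rebuild a fresh pool of the same size:

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};

// Hypothetical struct standing in for CrdsGossipPull with an embedded pool.
struct PullWithPool {
    thread_pool: ThreadPool,
    crds_timeout: u64,
}

impl Clone for PullWithPool {
    fn clone(&self) -> Self {
        Self {
            // ThreadPool is not Clone, so build a new pool of the same size.
            thread_pool: ThreadPoolBuilder::new()
                .num_threads(self.thread_pool.current_num_threads())
                .build()
                .unwrap(),
            crds_timeout: self.crds_timeout,
        }
    }
}
```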

Member

Yeah, I was thinking of passing it down the call stack. Removing #[derive(Clone)] or implementing Clone manually doesn't seem like it would be too bad.

Contributor Author

Added a ThreadPool to struct ClusterInfo. Please take a look.

Contributor Author

After adding a dedicated ThreadPool to struct ClusterInfo, a number of tests started seg-faulting and crashing, apparently because they were running out of memory, so I had to:

Should we limit the number of threads for the thread pool in cluster-info as well?

Contributor Author

An alternative would be to have two separate thread-pools, one for the ClusterInfo::listen thread and the other for the ClusterInfo::gossip thread.

That has the advantage that listen and gossip do not block each other, at the cost of higher memory requirements at max load.
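
A minimal sketch of that two-pool layout, with illustrative thread names and sizing; this is roughly the shape the follow-up work described in the commit notes below ends up taking.

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};

// Build one pool for ClusterInfo::listen and one for ClusterInfo::gossip so
// heavy work on one path cannot exhaust the threads of the other.
fn gossip_thread_pools() -> (ThreadPool, ThreadPool) {
    // Illustrative sizing: half the available cores, but at least two threads.
    let num_threads = std::thread::available_parallelism()
        .map_or(4, |n| n.get() / 2)
        .max(2);
    let listen_pool = ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .thread_name(|i| format!("gossip-listen-{}", i))
        .build()
        .unwrap();
    let gossip_pool = ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .thread_name(|i| format!("gossip-work-{}", i))
        .build()
        .unwrap();
    (listen_pool, gossip_pool)
}
```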

@sakridge (Member) Sep 22, 2020

Yeah. I hesitate to add another thread pool because we already have too many. The change for the pool getting created in gossip service doesn't look too bad. The work is not really blocking like an IO-heavy workload would be, so I think it might be fine to share the thread pool.

Contributor Author

Sounds good. I sent out #12402.

Member

Actually, I think the approach in this PR is fine, where thread_pool is a member of the ClusterInfo struct. Just move the creation of the gossip-work-{i} thread pool to ClusterInfo::new and then use self.thread_pool in ClusterInfo::listen. For the test configuration, just create a pool with 1 or 2 threads.
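
A sketch of that suggestion with hypothetical names (the real ClusterInfo has many more fields): build the gossip-work-{i} pool once in the constructor, reuse it from listen, and keep the test pool tiny.

```rust
use rayon::{ThreadPool, ThreadPoolBuilder};

// Hypothetical, trimmed-down ClusterInfo holding its own rayon pool.
struct ClusterInfoLike {
    thread_pool: ThreadPool,
}

impl ClusterInfoLike {
    fn new(num_threads: usize) -> Self {
        let thread_pool = ThreadPoolBuilder::new()
            .num_threads(num_threads)
            .thread_name(|i| format!("gossip-work-{}", i))
            .build()
            .unwrap();
        Self { thread_pool }
    }

    // Test configurations keep the pool small so many ClusterInfo instances
    // spun up at once do not exhaust memory.
    fn new_for_tests() -> Self {
        Self::new(2)
    }

    fn listen(&self, value_hashes: &[u64]) -> usize {
        // Run the listen-path work on the shared pool instead of a fresh one.
        self.thread_pool.install(|| value_hashes.len())
    }
}
```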

@behzadnouri (Contributor Author) Sep 23, 2020

Updated #12402 accordingly.

@behzadnouri behzadnouri force-pushed the par-build-crds-filter-mtx branch 4 times, most recently from 5e559f5 to 0e72a12 on September 22, 2020 15:23
@behzadnouri behzadnouri force-pushed the par-build-crds-filter-mtx branch 8 times, most recently from 9a25b4f to 5ba16fa on September 26, 2020 22:17
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Sep 29, 2020
solana-labs#12402
moved gossip-work threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/cluster_info.rs#L2330-L2334
to ClusterInfo::new as a new field in the ClusterInfo struct:
https://github.com/solana-labs/solana/blob/35208c5ee/core/src/cluster_info.rs#L249
So that they can be shared between listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67

However, in testing solana-labs#12360
it turned out this will cause breakage:
https://buildkite.com/solana-labs/solana/builds/31646
https://buildkite.com/solana-labs/solana/builds/31651
https://buildkite.com/solana-labs/solana/builds/31655
Whereas with separate thread pools all is good. It might be the case
that one thread is slowing down the other by exhausting the shared
thread-pool, whereas with separate thread-pools we get fair scheduling
guarantees from the OS.

This commit reverts solana-labs#12402
and instead adds separate thread-pools for listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67
behzadnouri added a commit to behzadnouri/solana that referenced this pull request Sep 29, 2020
behzadnouri added a commit that referenced this pull request Sep 29, 2020
mergify bot pushed a commit that referenced this pull request Sep 29, 2020

(cherry picked from commit 0d5258b)
jackcmay pushed a commit that referenced this pull request Sep 29, 2020

(cherry picked from commit 0d5258b)
mergify bot added a commit that referenced this pull request Sep 29, 2020
@sakridge (Member) left a comment

lgtm

@behzadnouri behzadnouri merged commit 537bbde into solana-labs:master Sep 29, 2020
mergify bot pushed a commit that referenced this pull request Sep 29, 2020

(cherry picked from commit 537bbde)

# Conflicts:
#	core/Cargo.toml
mergify bot added a commit that referenced this pull request Sep 30, 2020
* builds crds filters in parallel (#12360)

(cherry picked from commit 537bbde)

# Conflicts:
#	core/Cargo.toml

* resolves mergify merge conflict

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
@behzadnouri behzadnouri deleted the par-build-crds-filter-mtx branch October 6, 2020 14:33
3 participants