New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache the results of isAQuorum
#2771
Cache the results of isAQuorum
#2771
Conversation
@@ -461,10 +462,31 @@ QuorumIntersectionCheckerImpl::containsQuorumSliceForNode(BitSet const& bs, | |||
return containsQuorumSlice(bs, mGraph.at(node)); | |||
} | |||
|
|||
size_t | |||
QuorumIntersectionCheckerImpl::BitSetHashFunction:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you should add the hash function to the class found in util/BitSet.h
- this implementation is bad, and performs a lot of copies (as per your heap track results). As you move the hash function into the
BitSet
class, you can write a much more efficient hash function by hashing directly the buffer contained inmPtr
(I think it's just a continuous buffer of bits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MonsieurNicolas
One thing that I can't seem to figure out is how to hash the buffer. mPtr
points to a struct bitset_s
which contains uint64_t* array
and size_t arraysize
. The two ways that I can think of after reading various posts on Stack Overflow are:
- Put them in a vector and hash it using its hash function, which would put me back to square 1,
- Implement a custom hash function such based on a hash function that I can find online like https://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html. But I feel that I've been told never to write any hashing or encryption functions by some people, so I'm not sure.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good instinct :)
In other places, we use this : stellar::shortHash::computeHash(stellar::ByteSlice(x.data(), x.size()))
- this uses sip hash which may be a bit slow for this application, but you can try it.
You can still roll your own hash function, for Bitset
, it doesn't need to be that good.
That said, maybe using a map
instead of unordered_map
would just be faster? I think you mentioned it before, but I am not sure what you tried: all you need is to define an implementation of std::less
for BitSet
, and you can do something similar to what we're doing for the hash function, and simply use std::memcmp
internally (you can short circuit using size to avoid some comparisons and define any buffer smaller than another to be "less").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I tried
shortHash
, but it was slower than the Boost implementation, so I decided to go with the custom hash function. This reduced the memory usage by a lot! - I'm not sure, but I'm guessing
std::less
and<
are the same thing? TheBitSet
already has<=
defined, but they are defined as the subset relation. When I tried it,map
didn't work appropriately with this. We already have at least one place where we use<=
to mean the subset relation such as https://github.com/stellar/stellar-core/blob/master/src/herder/QuorumIntersectionCheckerImpl.cpp#L275, so it might be a bit confusing if we have both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For std::map
to work you need a comparison function and <=
does not meet the requirements for <
.
I didn't realize that BitSet
was already using those operator overloads so yeah we should not use std::less
or operator<
in this case. When instantiating std::map
you can provide a custom comparator there, so you can just do that (and give it a good name like BitSet::StorageComparator
as I think it needs to be a comparison class).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about this more, we should also remove the operator<=
overload for BitSet
: as not all sets are comparable, this makes it error prone/confusing. It's just better to replace it with an explicit call to isSubsetEq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that <=
is confusing, and I'll replace it with an explicit call to isSubsetEq
.
One thing I just realized is, do you recommend that I use std::map
over RandomEvictionCache
? RandomEvictionCache
is implemented in terms of unordered_map
, so I believe that I'll need to use a hashing function instead of a comparator. I'm wondering if, in cases with a lot of nodes (>= 30 nodes?), the cache might get a bit too big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more thinking, I'm wondering if it is better to leave the <=
operator as is for now.
RandomEvictionCache
seems more suitable for our case thanstd::map
so we can avoid the state explosion as the number of subsets can get huge really quickly.- Since
RandomEvictionCache
is implemented in terms ofunordered_map
, it may be a bit confusing if this PR modified the<=
operator of theBitSet
class as it's not directly related. - From what I see in the tests for
BitSet
, it was pretty intentional to implement<=
as a subset operator.
I opened an issue for this #2800. I figured that we can keep track of that there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool yeah that's fine
932fe3a
to
adc16ca
Compare
adc16ca
to
eaaf72e
Compare
src/util/BitSet.h
Outdated
@@ -351,6 +351,25 @@ class BitSet | |||
{ | |||
streamWith(out, [](std::ostream& out, size_t i) { out << i; }); | |||
} | |||
class HashFunction | |||
{ | |||
std::hash<uint64_t> hasher; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming convention is mHasher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
@@ -461,10 +462,31 @@ QuorumIntersectionCheckerImpl::containsQuorumSliceForNode(BitSet const& bs, | |||
return containsQuorumSlice(bs, mGraph.at(node)); | |||
} | |||
|
|||
size_t | |||
QuorumIntersectionCheckerImpl::BitSetHashFunction:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool yeah that's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit, and we can merge!
eaaf72e
to
fbb4f2d
Compare
r+ fbb4f2d |
Description
Improve the performance of the quorum intersection checker by caching the results of
isAQuorum
.After caching results of different functions in the
QuorumIntersectionCheckerImpl.cpp
(#2140),isAQuorum
seems to be the best function to cache the results for because:contractToMaximalQuorum
and is bigger than caching results of other functions.contractToMaximalQuorum
consumes more memory because both the input and output areBitSet
. On the other hand,isAQuorum
's return value is boolean, so it consumes less memory.On top of that, I have also included a small change to
contractToMaximalQuorum
which also improves the execution time.These two changes make the code pass
quorum intersection scaling test
roughly 10-30% faster than compared to the master branch.heaptrack src/stellar-core test -a 'quorum intersection scaling test'
)Master
This change
perf
shows that the functioncontainsQuorumSlice
takes 70% of the execution time in the master branch, but it will only take 54% with this change, which, I believe, comes from caching the results ofisAQuorum
.Checklist
clang-format
v5.0.0 (viamake format
or the Visual Studio extension)