Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the results of isAQuorum #2771

Merged
merged 3 commits into from Nov 16, 2020

Conversation

hidenori-shinohara
Copy link
Contributor

@hidenori-shinohara hidenori-shinohara commented Oct 26, 2020

Description

Improve the performance of the quorum intersection checker by caching the results of isAQuorum.

After caching results of different functions in the QuorumIntersectionCheckerImpl.cpp (#2140), isAQuorum seems to be the best function to cache the results for because:

  • The improvement is comparable to contractToMaximalQuorum and is bigger than caching results of other functions.
  • Caching the results of contractToMaximalQuorum consumes more memory because both the input and output are BitSet. On the other hand, isAQuorum's return value is boolean, so it consumes less memory.

On top of that, I have also included a small change to contractToMaximalQuorum which also improves the execution time.
These two changes make the code pass quorum intersection scaling test roughly 10-30% faster than compared to the master branch.

  • Memory consumption (heaptrack src/stellar-core test -a 'quorum intersection scaling test')

Master

bytes allocated in total (ignoring deallocations): 3.83MB (1.18MB/s)
calls to allocation functions: 30384 (9334/s)
temporary memory allocations: 1736 (533/s)
peak heap memory consumption: 1.87MB
peak RSS (including heaptrack overhead): 71.31MB
total memory leaked: 1.28MB

This change

bytes allocated in total (ignoring deallocations): 37.92MB (8.78MB/s)
calls to allocation functions: 1663627 (385009/s)
temporary memory allocations: 1385472 (320636/s)
peak heap memory consumption: 8.70MB
peak RSS (including heaptrack overhead): 404.98MB
total memory leaked: 1.25MB
  • perf shows that the function containsQuorumSlice takes 70% of the execution time in the master branch, but it will only take 54% with this change, which, I believe, comes from caching the results of isAQuorum.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v5.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@@ -461,10 +462,31 @@ QuorumIntersectionCheckerImpl::containsQuorumSliceForNode(BitSet const& bs,
return containsQuorumSlice(bs, mGraph.at(node));
}

size_t
QuorumIntersectionCheckerImpl::BitSetHashFunction::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. you should add the hash function to the class found in util/BitSet.h
  2. this implementation is bad, and performs a lot of copies (as per your heap track results). As you move the hash function into the BitSet class, you can write a much more efficient hash function by hashing directly the buffer contained in mPtr (I think it's just a continuous buffer of bits)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MonsieurNicolas
One thing that I can't seem to figure out is how to hash the buffer. mPtr points to a struct bitset_s which contains uint64_t* array and size_t arraysize. The two ways that I can think of after reading various posts on Stack Overflow are:

  1. Put them in a vector and hash it using its hash function, which would put me back to square 1,
  2. Implement a custom hash function such based on a hash function that I can find online like https://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html. But I feel that I've been told never to write any hashing or encryption functions by some people, so I'm not sure.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good instinct :)

In other places, we use this : stellar::shortHash::computeHash(stellar::ByteSlice(x.data(), x.size())) - this uses sip hash which may be a bit slow for this application, but you can try it.

You can still roll your own hash function, for Bitset, it doesn't need to be that good.

That said, maybe using a map instead of unordered_map would just be faster? I think you mentioned it before, but I am not sure what you tried: all you need is to define an implementation of std::less for BitSet, and you can do something similar to what we're doing for the hash function, and simply use std::memcmp internally (you can short circuit using size to avoid some comparisons and define any buffer smaller than another to be "less").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I tried shortHash, but it was slower than the Boost implementation, so I decided to go with the custom hash function. This reduced the memory usage by a lot!
  • I'm not sure, but I'm guessing std::less and < are the same thing? The BitSet already has <= defined, but they are defined as the subset relation. When I tried it, map didn't work appropriately with this. We already have at least one place where we use <= to mean the subset relation such as https://github.com/stellar/stellar-core/blob/master/src/herder/QuorumIntersectionCheckerImpl.cpp#L275, so it might be a bit confusing if we have both?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For std::map to work you need a comparison function and <= does not meet the requirements for <.
I didn't realize that BitSet was already using those operator overloads so yeah we should not use std::less or operator< in this case. When instantiating std::map you can provide a custom comparator there, so you can just do that (and give it a good name like BitSet::StorageComparator as I think it needs to be a comparison class).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about this more, we should also remove the operator<= overload for BitSet : as not all sets are comparable, this makes it error prone/confusing. It's just better to replace it with an explicit call to isSubsetEq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that <= is confusing, and I'll replace it with an explicit call to isSubsetEq.

One thing I just realized is, do you recommend that I use std::map over RandomEvictionCache? RandomEvictionCache is implemented in terms of unordered_map, so I believe that I'll need to use a hashing function instead of a comparator. I'm wondering if, in cases with a lot of nodes (>= 30 nodes?), the cache might get a bit too big.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more thinking, I'm wondering if it is better to leave the <= operator as is for now.

  • RandomEvictionCache seems more suitable for our case than std::map so we can avoid the state explosion as the number of subsets can get huge really quickly.
  • Since RandomEvictionCache is implemented in terms of unordered_map, it may be a bit confusing if this PR modified the <= operator of the BitSet class as it's not directly related.
  • From what I see in the tests for BitSet, it was pretty intentional to implement <= as a subset operator.

I opened an issue for this #2800. I figured that we can keep track of that there?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool yeah that's fine

src/herder/QuorumIntersectionCheckerImpl.h Outdated Show resolved Hide resolved
src/herder/QuorumIntersectionCheckerImpl.cpp Outdated Show resolved Hide resolved
src/herder/QuorumIntersectionCheckerImpl.cpp Outdated Show resolved Hide resolved
@@ -351,6 +351,25 @@ class BitSet
{
streamWith(out, [](std::ostream& out, size_t i) { out << i; });
}
class HashFunction
{
std::hash<uint64_t> hasher;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming convention is mHasher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@@ -461,10 +462,31 @@ QuorumIntersectionCheckerImpl::containsQuorumSliceForNode(BitSet const& bs,
return containsQuorumSlice(bs, mGraph.at(node));
}

size_t
QuorumIntersectionCheckerImpl::BitSetHashFunction::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool yeah that's fine

Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, and we can merge!

@MonsieurNicolas MonsieurNicolas added this to In progress in v15.1.0 via automation Nov 13, 2020
@MonsieurNicolas
Copy link
Contributor

r+ fbb4f2d

@latobarita latobarita merged commit 1d9c8bc into stellar:master Nov 16, 2020
v15.1.0 automation moved this from In progress to Done Nov 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants