
Improve requesting frontier performance #1503

Merged: 2 commits into nanocurrency:master from wezrule:improve_bootstrapping_performance, Dec 28, 2018
@wezrule (Collaborator) commented Dec 27, 2018

On an MSVC Release build I profiled bootstrapping a fresh node for 30 seconds (chosen arbitrarily) to see if there was anything obvious which could easily be improved. I found that 7% of the time was spent inside rai::uint256_union::number () when requesting frontiers.

There is a Boost function, boost::multiprecision::import_bits (), which takes a sequence of bytes and constructs the corresponding multiprecision number (and conversely export_bits () does the opposite). I tried using this instead and there was a small performance improvement. Next I looked at the callers; the majority were from the less-than operator rai::uint256_union::operator<, which calls number () on each operand and compares the results. I think this step is unnecessary: the bytes can be compared directly with std::memcmp. Results are shown below (time spent inside operator<):

Original: 7%
Using boost::multiprecision::import_bits () inside number (): 3.18%
Replacing number () with memcmp: 0.3%

The numbers are even greater on a debug build (sometimes as high as 20% in the current build). While the import_bits () change no longer affects this specific result, I have left it in as it is still beneficial. I also modified the *_union constructors to call export_bits () to fill the bytes array from the passed-in number. Some tests have been added to cover the changes to operator<.
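For context, here is a minimal sketch of the two changes, using a simplified stand-in for rai::uint256_union with its 32-byte big-endian bytes array (the real type is a union with further members; names and layout here are assumptions for illustration):

```cpp
#include <boost/multiprecision/cpp_int.hpp>

#include <array>
#include <cstdint>
#include <cstring>

using uint256_t = boost::multiprecision::uint256_t;

struct uint256_union
{
	std::array<uint8_t, 32> bytes;

	uint256_union () = default;

	// export_bits () only writes as many bytes as the value needs, so
	// zero-fill first; writing least-significant-first through rbegin ()
	// leaves the array in big-endian order.
	explicit uint256_union (uint256_t const & number_a)
	{
		bytes.fill (0);
		boost::multiprecision::export_bits (number_a, bytes.rbegin (), 8, false);
	}

	// Rebuild the number from the big-endian byte array; import_bits ()
	// consumes the range most-significant-first by default.
	uint256_t number () const
	{
		uint256_t result;
		boost::multiprecision::import_bits (result, bytes.begin (), bytes.end ());
		return result;
	}

	// Compare the raw bytes instead of materialising two multiprecision
	// numbers per comparison. With a big-endian layout, lexicographic
	// byte order equals numeric order.
	bool operator< (uint256_union const & other_a) const
	{
		return std::memcmp (bytes.data (), other_a.bytes.data (), 32) < 0;
	}
};
```

The memcmp path avoids constructing two temporary uint256_t values on every comparison, which is where the profiled time was going; why byte-wise comparison preserves numeric order is discussed further down the thread.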

@clemahieu (Collaborator) commented Dec 27, 2018

Excellent find.

@SergiySW (Collaborator) left a comment

According to my tests with a GCC SSE4 build, it can save up to 75-80% of the time required to calculate number () or to sort a 100k uint256_union vector (operator<).
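A hypothetical micro-benchmark along those lines (the 100k size comes from the comment above; it assumes the simplified uint256_union sketch from the opening comment is in scope, and the harness is illustrative rather than the one actually used):

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

int main ()
{
	// Fill 100k unions with random bytes.
	std::mt19937_64 rng (42);
	std::vector<uint256_union> values (100000);
	for (auto & value : values)
	{
		for (auto & byte : value.bytes)
		{
			byte = static_cast<uint8_t> (rng ());
		}
	}
	// Sorting exercises operator< heavily; compare wall time with the
	// number ()-based operator against the memcmp-based one.
	auto start (std::chrono::steady_clock::now ());
	std::sort (values.begin (), values.end ());
	auto elapsed (std::chrono::duration_cast<std::chrono::milliseconds> (std::chrono::steady_clock::now () - start));
	std::cout << "sorted 100k unions in " << elapsed.count () << " ms\n";
}
```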

@SergiySW requested a review from cryptocode on Dec 27, 2018

@wezrule (Collaborator, Author) commented Dec 27, 2018

Thanks guys. I haven't tried this on a big-endian system though; I wonder if there could be some problems?

@cryptocode (Collaborator) commented Dec 27, 2018

The protocol implementation (at least partially) assumes little-endian hosts as-is, so that should be fine. Doing LE-to-native conversion in those places would be a nice improvement.
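As a sketch of what that LE-to-native conversion could look like (read_le_u64 is a hypothetical helper, not an existing function in the codebase; boost::endian::little_to_native is a real Boost.Endian utility):

```cpp
#include <boost/endian/conversion.hpp>

#include <cstdint>
#include <cstring>

// Copy little-endian wire bytes into an integer, then normalise to the
// host's byte order: a no-op on little-endian hosts, a byte swap on
// big-endian ones, instead of assuming the host is little-endian.
uint64_t read_le_u64 (uint8_t const * wire_a)
{
	uint64_t value;
	std::memcpy (&value, wire_a, sizeof (value));
	return boost::endian::little_to_native (value);
}
```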

@wezrule (Collaborator, Author) commented Dec 27, 2018

@cryptocode ok, good to know, thanks!

@clemahieu (Collaborator) commented Dec 27, 2018

The byte format of uint256_union is big-endian, and this is tested in uint256_union.big_endian_union_constructor. The memcmp comparison here will sort them in the same order, so this change is good.
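A tiny self-contained check of that argument, with two-byte values chosen purely for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

int main ()
{
	// 511 < 512 numerically; their big-endian encodings compare the same
	// way because memcmp inspects the most significant byte first.
	uint8_t a_be[2] = { 0x01, 0xFF }; // 511, big-endian
	uint8_t b_be[2] = { 0x02, 0x00 }; // 512, big-endian
	assert (std::memcmp (a_be, b_be, 2) < 0);

	// With a little-endian layout the byte-wise result would invert:
	// { 0xFF, 0x01 } compares greater than { 0x00, 0x02 } despite 511 < 512.
	uint8_t a_le[2] = { 0xFF, 0x01 };
	uint8_t b_le[2] = { 0x00, 0x02 };
	assert (std::memcmp (a_le, b_le, 2) > 0);
}
```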

@cryptocode (Collaborator) commented Dec 27, 2018

Ah yeah, that's the real reason why this is fine.

@zhyatt added this to the V18.0 milestone Dec 28, 2018

@zhyatt added this to Unscheduled in V18 Dec 28, 2018

@rkeene merged commit 3383607 into nanocurrency:master Dec 28, 2018

2 checks passed:

- continuous-integration/appveyor/pr: AppVeyor build succeeded
- continuous-integration/travis-ci/pr: The Travis CI build passed

@SergiySW moved this from Unscheduled to CP 0 in V18 Dec 28, 2018

@wezrule deleted the wezrule:improve_bootstrapping_performance branch Jan 8, 2019
