Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-thread the signature checker #1651

Merged
merged 13 commits into from Jan 29, 2019

Conversation

@wezrule
Copy link
Collaborator

commented Jan 27, 2019

This solves #1606 and provides a finished implementation of #1614 which which was created from the work of @termoose @guilhermelawless so thanks a lot!

Currently the vote and block processors add checks to the signature checker which runs on a separate thread and sets the value of a promise which the calling threads are blocked on until it is set. Having it in a separate thread doesn't really do much because the calling threads are blocked straight away so the caller might as well do the work themselves. I have removed that thread and added a thread pool which can be user assigned on the config.json file and is called if the batch size of the signatures to check are > 512. This value was heuristically chosen as the largest size seemed to be 2048 (hardcoded) and 256 size chunks were found to be optimal in terms of memory size and speed benchmarked by Colin in 57d819b which is what each thread on a thread pool works on.

A new test was added which runs over multiple threads and the promises are checked after all signature checks are added to the thread pool rather than waiting for each one to finish (which checks overloading the boost thread pool). Using this test and changing the cores I got the following results:
1 core - 5.9s
2 cores - 5.4s
4 cores - 3.4s
8 cores - 2.5s

This is by no means a benchmark test as it needs to run quickly on any number of cores. A more biased test which has more messages of a larger size greatly favours the higher amount of cores used.

I ran TSAN for a few hours on the live network and it showed no new errors.

@wezrule wezrule self-assigned this Jan 27, 2019

@wezrule wezrule added this to the V18.0 milestone Jan 27, 2019

@wezrule wezrule added the enhancement label Jan 27, 2019

Show resolved Hide resolved nano/node/node.cpp Outdated
Show resolved Hide resolved nano/node/node.cpp Outdated
@wezrule

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 28, 2019

I've changed it so that the calling thread does work verifying signatures instead of just blocking and made the new test (and an existing one) check for an incorrect signature as well.

Show resolved Hide resolved nano/node/node.cpp Outdated
@SergiySW

This comment has been minimized.

Copy link
Collaborator

commented Jan 28, 2019

I suppose we can increase hardcoded values as
verify_state_blocks (transaction, lock_a, 2048 * signature_checker_threads);
or something like that for more efficency

@wezrule

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 28, 2019

@SergiySW this was noted in the other PR but I wasn't sure if there was a specific reason 2048 was chosen so I was hesitant to change it. It does make sense though as this change would help greatly for large message sizes. I have made the change and done it for forced state blocks as well. I tested it on mac and worked ok (can process 8192 messages on 4 core cpu fine :)). It may be a problem if they choose a thread size which is a lot greater than their actual number of CPU cores (as it will try and process a lot of messages! with no gain as the threads will interleave), but then they deserve what they get tbh :).

@SergiySW

This comment has been minimized.

Copy link
Collaborator

commented Jan 28, 2019

@wezrule are there downsides for 1 core with 4 default checker threads?

@wezrule

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 28, 2019

@SergiySW yes it will be less efficient as the threads will just interleave on same core when working on large sizes. At least in theory it should! I don't have a 1 core machine to benchmark, but the ed25519_sign_open_batch function operates on a max of 64 signatures and doesn't do any blocking (that I can see), so I see no reason why doing it with multiple threads is faster than 1 on a 1 core machine. I did originally just copy what the other thread pools did but I don't think that's correct as they work a bit differently. I will change it, thanks

@wezrule

This comment has been minimized.

Copy link
Collaborator Author

commented Jan 28, 2019

@SergiySW good spot! Changed to account for this.

@zhyatt zhyatt added this to CP 3 (2018-01-27) in V18 Jan 28, 2019

@zhyatt zhyatt removed the request for review from argakiig Jan 28, 2019

@wezrule wezrule force-pushed the wezrule:multithread_signature_checker branch from 5dee9e1 to 413bc9f Jan 29, 2019

Forgot that config.signature_checker_threads is 0 based so add 1 to n…
…um threads passed into verify_state_blocks
Show resolved Hide resolved nano/node/node.cpp
@rkeene

rkeene approved these changes Jan 29, 2019

@wezrule wezrule merged commit 28a1082 into nanocurrency:master Jan 29, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@wezrule wezrule deleted the wezrule:multithread_signature_checker branch Jan 29, 2019

@zhyatt zhyatt removed the request for review from clemahieu Jan 30, 2019

@renesq

This comment has been minimized.

Copy link

commented Mar 21, 2019

The config.signature_checker_threads variable being set to 0 by default made me a bit anxious, until I found this code line
verify_state_blocks (transaction, lock_a, 2048 * (node.config.signature_checker_threads + 1));
Which means that the config documentation in the wiki apparently is not very helpful. It sais "signature_checker_threads": "1", // Number of threads to use for verifying signatures. It should probably say "additional threads"? But I didn't quite grasp the numeral concept that kicks in here. Does the setting even matter?

Edit: nodeconfig.cpp sais
signature_checker_threads ((boost::thread::hardware_concurrency () != 0) ? boost::thread::hardware_concurrency () - 1 : 0), /* The calling thread does checks as well so remove it from the number of threads used */

@clemahieu

This comment has been minimized.

Copy link
Collaborator

commented Mar 21, 2019

The intention for the thread classes we use are where: N = hardware_concurrency:
N signature verification threads at normal priority for signature verification
N work-generation threads at idle priority
N IO threads at normal priority

IO threads are supposed to be very short-lived operations and thus, we don’t need more than the number of cores or there’s an issue where long operations are happening in IO threads. This is currently the case and we want to remove all synchronous disk operations from IO threads since that’s a bottleneck.

Work generation threads are at idle priority since if the node or other processes are using priority, we don’t need to fight for it, generating work for new transactions isn’t essential to node operation.

Signature processing threads are at normal priority and one per CPU to make sure we’re using as much of the computational resources as are needed to process vote and block traffic.

@guilhermelawless

This comment has been minimized.

Copy link
Contributor

commented Mar 21, 2019

@renesq If it says "signature_checker_threads": "1" in the config wiki and that's the default, it's because it was generated from a dual core machine. When creating the config it defaults to number of cores minus one.

When verifying signatures, N-1 threads are created to do batches of [by default] 256 signatures, and the remainder goes to the calling thread (thus in total, N threads doing work).

If config.signature_checker_threads = 0 that's the default for a single core machine.

@wezrule

This comment has been minimized.

Copy link
Collaborator Author

commented Mar 22, 2019

The vote/block processor threads currently just wait for the result from the signature checker thread pool, so instead of waiting they do work as well. In the case of a small number of verifications (less than 512) all the work is done on the calling thread, so no scheduling is necessary. So there is an "implicit" signature checker thread if you will, but I can see the confusion from a users point of view though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.