
Thread pool for checking verifications #1614

Closed · wants to merge 12 commits

Conversation

termoose
Contributor

Added a thread_pool class for submitting work to one of hardware_concurrency threads (the default thread count). The class is fairly general and can be reused for other threading use cases.
Added a test case doing 100k verifications multi-threaded.
Verifications are batched 256 to a thread, with a hard cut-off of 1000 below which the single-threaded version runs.
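The batching scheme described above can be sketched roughly like this (a sketch only, not the PR's code: verify () is a hypothetical stand-in for the real signature check, and std::async stands in for the custom thread_pool):

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

bool verify (int item) // placeholder for a real signature verification
{
	return item >= 0;
}

bool verify_batched (std::vector<int> const & items)
{
	size_t const batch_size = 256;
	size_t const single_thread_cutoff = 1000;
	if (items.size () < single_thread_cutoff)
	{
		// Small workloads: threading overhead outweighs the gain
		return std::all_of (items.begin (), items.end (), verify);
	}
	std::vector<std::future<bool>> results;
	for (size_t start = 0; start < items.size (); start += batch_size)
	{
		size_t end = std::min (start + batch_size, items.size ());
		// One asynchronous task per batch of up to 256 verifications
		results.push_back (std::async (std::launch::async, [&items, start, end] {
			return std::all_of (items.begin () + start, items.begin () + end, verify);
		}));
	}
	// Collect the per-batch true/false results
	return std::all_of (results.begin (), results.end (), [] (std::future<bool> & f) { return f.get (); });
}
```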

@wezrule wezrule added the functionality quality improvements This item indicates the need for or supplies a better way to implement existing functionality label Jan 22, 2019
@cryptocode
Contributor

Could boost::asio::thread_pool work here? Since we're already using boost, it's good to use that one if possible.

@termoose
Contributor Author

I actually didn't know there was a thread pool in Boost; I'd always heard they wouldn't add one. Guess they did! It looks pretty basic, though: there's no support for futures, so I'd have to roll my own way of getting results from the async operations in the thread pool (the true/false from all those async std::all_of calls).

I'll see if I can get away with a smaller code footprint by using it.
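For reference, one common workaround for a pool that exposes no futures (a sketch, not code from this PR) is to wrap the work in a std::packaged_task and keep its future; the task can then be handed to any executor, e.g. boost::asio::post. A plain std::thread stands in for the pool here so the snippet stays self-contained:

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Run a callable on a worker thread and recover its result through a future.
// With boost::asio one would instead do: boost::asio::post (pool, std::move (task));
bool run_verification (bool (*check) ())
{
	std::packaged_task<bool ()> task (check);
	std::future<bool> result = task.get_future ();
	std::thread worker (std::move (task)); // stand-in for posting into a pool
	worker.join ();
	return result.get ();
}
```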

@wezrule
Contributor

wezrule commented Jan 25, 2019

I will take a look at it today @termoose, as it is high priority and needed ASAP. Thanks!

@termoose
Contributor Author

Thanks, it seems to work as well as the custom thread pool. I changed the cut-off to 10k based on the feedback from @guilhermelawless. I considered adding this to node_config as a setting, but decided not to since it was a bit awkward getting hold of that object from nano::signature_checker.

@guilhermelawless
Contributor

Getting the same results with the new thread pool as well. Also a good thing is that results don't change much with higher batch sizes, so that's one less thing to parametrize.

Not sure I agree with the cut-off change. Setting to 512 (2x batch_size) already gets me a 1.7x speedup, so why not enjoy that?

@wezrule
Contributor

wezrule commented Jan 25, 2019

Have you checked on live whether it goes into your verify_threaded function? For me the maximum number of signatures I got in one chunk was 2048; 256 was common, but so were very low numbers (many are 1). I think a different approach may be needed to get good results on the live network.

I think our benchmark tests should call nano::signature_checker::add () many times with different sizes, up to a total of something like 100k, and then compare how long that takes to complete vs the master branch. I'm going to start working on this, feel free to do the same.

@guilhermelawless
Contributor

@wezrule @termoose
Did not catch this in boost documentation (didn't go through the .cpp, only headers), but this sanity check is failing:

TEST (sanity, thread_pool)
{
	boost::asio::thread_pool thread_pool (4);
	std::vector<bool> v (4, false);
	for (int i = 0; i <= 1; ++i)
	{
		boost::asio::post (thread_pool, [i, &v] { v[i] = true; });
	}
	thread_pool.join ();

	for (int i = 2; i <= 3; ++i)
	{
		boost::asio::post (thread_pool, [i, &v] { v[i] = true; });
	}
	thread_pool.join ();

	ASSERT_EQ (4, v.size ());
	ASSERT_EQ (true, v[0]);
	ASSERT_EQ (true, v[1]);
	ASSERT_EQ (true, v[2]);
	ASSERT_EQ (true, v[3]);
}

This test fails:

Value of: v[2]
  Actual: 0
Expected: true

It seems that once a thread_pool has been joined, it cannot be reused. To solve this we only need to instantiate it inside verify_threaded, not class-wide.

@guilhermelawless
Contributor

guilhermelawless commented Jan 25, 2019

See guilhermelawless#1 for a more realistic case with the following adds (repeated until 100k): 1, 256, 257, 1, 1, 512, 1, 2048, 1024, 1, 64, 1, 1, 1, 1, 1, 1, 1, 1. The relevant commit is guilhermelawless@9676296

It gets me a 2x speedup in total. Also fixed the thread_pool re-use, and the number of batches sometimes being 1 extra due to overflow = 0. Cut-off lowered to 512.

@termoose feel free to take code from there and modify this PR; I won't be PRing it myself.

@termoose
Contributor Author

Good catch on not being able to reuse the thread pool! Based on this part of the documentation I assumed it could be reused as long as stop() was not called prior to join(), but I guess I misread it. It's not very clear without looking at the code.

If stop() is not called prior to join(), the join() call will wait
until the pool has no more outstanding work.

Won't there be overhead in initialising the thread pool every time? I could try to implement something with a std::promise being set_value'd inside each work function, and then get()ing all the corresponding std::futures later. I experimented a bit with this but couldn't make it work correctly, and it might not be needed if initialising the thread pool is actually cheap. I cherry-picked your changes though, thanks!
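The promise/future idea sketched in words above would look roughly like this (illustrative only: plain std::threads stand in for posting into a long-lived pool, and value > 0 stands in for a real batch verification):

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Each work item fulfills a promise; the caller get()s all the futures
// afterwards, so the pool never needs a join()/restart cycle between calls.
std::vector<bool> run_batches (std::vector<int> const & batches)
{
	std::vector<std::future<bool>> futures;
	std::vector<std::thread> workers; // stand-in for posting into a long-lived pool
	for (int value : batches)
	{
		auto promise = std::make_shared<std::promise<bool>> ();
		futures.push_back (promise->get_future ());
		workers.emplace_back ([promise, value] {
			promise->set_value (value > 0); // placeholder verification result
		});
	}
	std::vector<bool> results;
	for (auto & f : futures)
	{
		results.push_back (f.get ()); // blocks only until that batch is done
	}
	for (auto & w : workers)
	{
		w.join ();
	}
	return results;
}
```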

@wezrule I agree it would be cool to accumulate verifications in add(), but testing and implementing that sounds like a new issue as it's a bit out of scope here. If this threading change is also needed ASAP you could get it out and we can work on improving add() later. A week back I started something that I might be able to extend, termoose#1, where each verify() can run in its own signature_checker_thread and doesn't have to run sequentially.

@termoose
Contributor Author

Made it so that the thread pool does not have to be instantiated inside verify_threaded every time; instead we're get()ing the futures as explained above. Not sure why the syntax checker is failing though, that line looks correct to me.

@guilhermelawless
Contributor

guilhermelawless commented Jan 26, 2019

About 3% improvement over instantiating every time 👍 Ran test 50 times and no failures.

@wezrule
Contributor

wezrule commented Jan 26, 2019

Hi @termoose @guilhermelawless thanks for your contributions so far! Now I have a few comments:

  • There was mention of using std::vector for concurrent modification of different elements from different threads. Normally this would be OK, but std::vector<bool> is an anomaly: it has a specialization in which some implementations squeeze the bool values into bits, so distinct elements can share a byte.

Notwithstanding (17.6.5.9), implementations are required to avoid data races when the contents of the contained object in different elements in the same container, excepting vector<bool>, are modified concurrently.

  • validate_message_batch () had a bug where it didn't check the correct range after the first batch.
  • I wasn't entirely sure why we had another thread for signature checking, since the calling thread just blocked until the future finished, so I've removed it.
  • If threads is set to 1 I just use the calling thread to do the work.
  • I have added the config option so that the user can specify the number of cores (we do similar things with work, io and network threads)
  • The newly added test is now run over multiple threads and promises are checked after all signature checks are added to the thread pool rather than waiting for each one to finish. Using this test and changing the cores I got the following results:
    1 core - 5.9s
    2 cores - 5.4s
    4 cores - 3.4s
    8 cores - 2.5s

When more messages of a greater size are used it greatly favours the higher cores, however we want tests to run fairly quickly irrespective of the number of cores so this test is not meant to be representative of a benchmark, it just checks functionality is working correctly for a wide range of sizes.

  • There were some other things, but I'm far too sleepy to remember :)

Can you guys have a look here and give any comments: https://github.com/wezrule/raiblocks/tree/multithread_signature_checker @termoose feel free to modify your PR with any of these changes, otherwise I will mine as a PR some point tomorrow.
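On the std::vector<bool> point above: the quoted wording is why writing distinct elements of std::vector<bool> from different threads is a data race, while a std::vector<char> is fine. A minimal illustration (not code from the branch):

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Distinct elements of a std::vector<char> may be written concurrently without
// synchronisation; std::vector<bool> gives no such guarantee, because several
// of its elements can share one byte in the packed representation.
std::vector<char> fill_concurrently (size_t count)
{
	std::vector<char> flags (count, 0);
	std::vector<std::thread> threads;
	for (size_t i = 0; i < count; ++i)
	{
		// Each thread touches exactly one element; safe for char, UB for bool
		threads.emplace_back ([&flags, i] { flags[i] = 1; });
	}
	for (auto & t : threads)
	{
		t.join ();
	}
	return flags;
}
```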

@termoose
Contributor Author

  • I'm aware of this specialisation; we don't use std::vector<bool> anywhere, but instead have futures and promises in our vectors.
  • Thanks for finding that off-by-one in the batch sizes!
  • Modified my PR to include the ability to set the number of threads from the config.

There are a number of changes here that I find confusing, but I don't have time to figure out all the details before you merge it in tomorrow.

  • You use a shared_ptr to the vector as well as on your Task, which causes a heap allocation every time verify_threaded is called. Is this overhead needed?
  • The tasks_remaining and task->pending counters look like roll-your-own std::futures for checking when the threads are finished. I myself would prefer a set of std::promise<bool>s that communicate these results as return values from future.get ().
  • Is the casting to and from char there to keep the std::vector thread safe? That overhead would also not be needed with our original std::vector<std::future<bool>>. It also makes things look strange, for example initialising a vector of chars to false.
  • If pending_mutex is only used for the counter then it could be made a std::atomic<int> and you could do if (--task->pending == 0), which decrements and loads atomically. But again, I don't see why these counters are needed if threading primitives are being used.

Waiting for each signature check to finish before posting new ones should not incur any great overhead as the cores are all fully exhausted anyway. Would be interesting to see some benchmarks on the different implementations if someone has time to run them.
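The atomic decrement suggested above looks like this in isolation (a sketch; the task struct and its members here are illustrative, not the branch's actual types):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// --pending on a std::atomic<int> decrements and returns the new value as one
// atomic operation, so exactly one worker observes 0 and runs the completion
// step; no mutex is needed around the counter.
struct task
{
	std::atomic<int> pending{ 0 };
	std::atomic<bool> completed{ false };
};

void worker_finished (task & t)
{
	if (--t.pending == 0)
	{
		t.completed = true; // completion logic runs exactly once
	}
}
```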

@wezrule
Contributor

wezrule commented Jan 27, 2019

std::vector<bool> was mentioned in an earlier version (std::vector<bool> results (batches + 1, false);), so I was just informing you guys in case you were unaware.

I've changed how it works quite a bit which is why you may be finding it confusing. I got rid of the queue (there's no add function anymore), the tasks are posted directly to the thread pool. I don't want to block the calling thread so the task is shared so that it remains in scope for the thread pool after the function finishes. Although the calling thread currently just waits on a future, it could be proactive and respond after each thread in the thread pool is finished, but I am leaving this for a later task.

Yes, the std::vector<char> was for thread safety; there is no overhead in converting char -> bool. I don't know about benchmarks between this and using a vector of promises, but I no longer use it anyway, and just assert the result after each thread.

Good point about the if (--task->pending == 0) thanks! I am now using std::atomic instead of a mutex.

I am going to run TSAN on these changes on the live network for a while to see if it picks up anything.

Ok we appreciate all the time and effort you have put into this and look forward to any future contributions. I will create a PR after some more testing. Thanks again.

@guilhermelawless
Contributor

guilhermelawless commented Jan 27, 2019

It looks good. add was confusing me as well, since there weren't any background tasks happening; every verify was blocking.

All I would add is a release_assert (pending == 0); in the Task destructor.
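That suggestion amounts to something like the following (a sketch; release_assert is nano's own macro, so a plain assert stands in here, and the task struct is illustrative):

```cpp
#include <atomic>
#include <cassert>

// A task must not be destroyed while work is still outstanding; the destructor
// check catches lifetime bugs where workers could touch a dead task.
struct task
{
	std::atomic<int> pending{ 0 };
	~task ()
	{
		assert (pending == 0); // release_assert (pending == 0); in the PR
	}
};
```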

@wezrule
Contributor

wezrule commented Jan 27, 2019

@guilhermelawless done. I will close this off and we can use: #1651
