Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding thread_group for managing async tasks #1291

Merged
merged 1 commit into from
Nov 8, 2016

Conversation

vtnerd
Copy link
Contributor

@vtnerd vtnerd commented Nov 2, 2016

Looking for general feedback (tests need to be written, etc., before merging)...

This was from a discussion in a prior pull request. This does not use asio::io_service because it is a bit clunky to wait for all threads to complete io::service::run() (to guarantee that all handlers have completed), then restart the io::service::run() to re-use the threads - busy spin loops or condition variables would be needed which makes the implementation look similar to this. I added the thread re-use "feature" because one of the functions in rctSigs.cpp created two different thread_group objects, mainly so that the dispatched functions were guaranteed to complete before moving to the next section. Arguably, std::future should be used instead - retrieving values would be sequenced specifically with the associated tasks, which would require some additional allocations for the shared_state inside of std::future (and thus be some amount slower). Or maybe the thread_group should be a "one shot" like it is currently (i.e. cannot re-use threads after synchronization). So the obvious choices are: (1) no change, (2) this, (3) this style with asio::io_service and thread joining after each group of tasks, or (4) this style with asio::io_service and std::future for blocking after each group of tasks.

This can be used by blockchain.cpp and wallet2.cpp which requires similar changes. Also:

  • No valgrind errors or leaks
  • The class automatically ensures that threads join on destruction
  • Its easy to block until all functions complete without introducing a bug
  • The current thread can block until all work has completed without joining the thread, allowing for thread-reuse (see rctSigs.cpp)
  • The execution times of the tests are really close before and after the patch on my VM.

Disadvantages:

  • Using asio::io_service would allow for running I/O events on the same threads (I didn't see any obvious place where a refactor would allow this).
  • asio::io_service would have less hand-rolled stuff

@ghost
Copy link

ghost commented Nov 3, 2016

Hi @vtnerd, do you think this would have positive changes on the speed of shutdown as well for nodes with low CPU resources?

Please see #1176

@moneromooo-monero
Copy link
Collaborator

I'm afraid I can't review that, I don't know enough about C++.

@fluffypony
Copy link
Contributor

lol @moneromooo-monero:) This looks like a job for... @hyc !

@hyc
Copy link
Collaborator

hyc commented Nov 3, 2016

I've done 1 quick pass thru this commit and the previous PR. My general impression is this looks like a good way to go. In general I would prefer that we have 1 persistent thread pool instead of spawning and killing threads everywhere we use them. Will look in more detail later.

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 3, 2016

@NanoAkron It sounds like this would alleviate that problem, but I would have to investigate it some more. A stop function could be added which would abort processing on each thread after it completed the current dispatched function, but I think a throw thread_pool::stop_error() should be added in the sync function. Otherwise the call could return without all dispatched functions completing.

@hyc I agree completely. Allocating threads inside of a function is not ideal. Ideally there would be one group of threads handling background tasks and I/O. I've come up with a slightly different design that uses asio::io_service internally, while maintaining the ability to block on a group of tasks (thread_pool::task_group). I'm thinking initially its just a one-to-one replacement inside of the functions, and then another PR to see how much work needs to be done to create the thread_pool once in the core somewhere. Ideally the p2p code used the same thread_pool/io_service too, because I don't see an advantage to having more io_service::run calls than available CPU cores. Updating the usage of anything in the epee namespace is going to take some work though.

Edit: Some botched grammar in first portion.

@moneromooo-monero
Copy link
Collaborator

It seems to make sense.
It could also be used in wallet2 and blockchain later, which is where I got this code :)

@moneromooo-monero
Copy link
Collaborator

NanoAkron: the delay in stopping is when syncing, and isn't due to that, but to the block loop not breaking when a shutdown is requested (it doesn't have access to the class which holds that flag, though a pointer to the flag (or some callback checking it) could be passed in ctor IIRC).

Copy link
Contributor

@fluffypony fluffypony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed

@fluffypony fluffypony merged commit 64094e5 into monero-project:master Nov 8, 2016
fluffypony added a commit that referenced this pull request Nov 8, 2016
64094e5 adding thread_group for managing async tasks (Lee Clagett)
@moneromooo-monero
Copy link
Collaborator

[ 19%] Building CXX object src/common/CMakeFiles/obj_common.dir/thread_group.cpp.o
/home/user/src/bitmonero/src/common/thread_group.cpp: In member function ‘void tools::thread_group::data::dispatch(std::function<void()>)’:
/home/user/src/bitmonero/src/common/thread_group.cpp:154:13: error: ‘struct tools::thread_group::data::node’ has no member named ‘next’
assert(last->next == nullptr);
^
src/common/CMakeFiles/obj_common.dir/build.make:206: recipe for target 'src/common/CMakeFiles/obj_common.dir/thread_group.cpp.o' failed
make[3]: *** [src/common/CMakeFiles/obj_common.dir/thread_group.cpp.o] Error 1

@moneromooo-monero
Copy link
Collaborator

moneromooo-monero commented Nov 8, 2016

Possibly meant this, but I wouldn't bet my life on it :)

diff --git a/src/common/thread_group.cpp b/src/common/thread_group.cpp
index ece268a..aa1b64f 100644
--- a/src/common/thread_group.cpp
+++ b/src/common/thread_group.cpp
@@ -151,7 +151,7 @@ void thread_group::data::dispatch(std::function<void()> f) {
{
const std::unique_lockstd::mutex lock(mutex);
assert(last != nullptr);
- assert(last->next == nullptr);
+ assert(last->ptr == nullptr);
if (pending == std::numeric_limitsstd::size_t::max()) {
throw std::overflow_error("thread_group exceeded max queue depth");
}

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 9, 2016

Correct, that is supposed to be assert(last->ptr ==nullptr). And I apologize for the slowness, but I was going to show a variation where boost::asio::io_service was used instead. Perhaps this is for the best right now, its lots of refactoring work before I/O and background tasks in the same event loop is even possible.

moneromooo-monero added a commit to moneromooo-monero/bitmonero that referenced this pull request Nov 9, 2016
@moneromooo-monero
Copy link
Collaborator

Thanks for the sanity check.
The patch already a lot of boilerplate as it is, so that's a good win already :)

@hyc
Copy link
Collaborator

hyc commented Nov 17, 2016

Sorry guys, I should have caught this sooner - this patch uses std::mutex and std::unique_lock, which are unusable on Windows. We have to use boost:: for everything.

@hyc
Copy link
Collaborator

hyc commented Nov 17, 2016

The attached lets it compile under windows. Works OK on Win32.

dif.txt

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 17, 2016

And mingw continues to frustrate me ... I suppose compiling with a closed source program (MSVC) is undesirable?

I have some tests sitting around that I should've pushed a while ago, looks like I will add this change to that request. What happens to the builds that fired for this? I thought they passed, but obviously they couldn't have.

@hyc
Copy link
Collaborator

hyc commented Nov 17, 2016

Fyi, this is the beginning of the compile errors I hit - there was quite a long stream of errors after these:

[ 92%] Building CXX object src/common/CMakeFiles/obj_common.dir/thread_group.cpp.obj
In file included from C:/msys64/home/hyc/bitmonero/src/common/thread_group.cpp:28:0:
C:/msys64/home/hyc/bitmonero/src/common/thread_group.h:119:17: error: 'thread' is not a member of 'std'
     std::vector<std::thread> threads;
                 ^
C:/msys64/home/hyc/bitmonero/src/common/thread_group.h:119:17: error: 'thread' is not a member of 'std'
C:/msys64/home/hyc/bitmonero/src/common/thread_group.h:119:28: error: template argument 1 is invalid
     std::vector<std::thread> threads;
                            ^
C:/msys64/home/hyc/bitmonero/src/common/thread_group.h:119:28: error: template argument 2 is invalid

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 17, 2016

I just replaced std::thread, std::mutex and std::condition_variable which are all unsupported by mingw.

@anonimal
Copy link
Contributor

??? kovri uses std::thread, std::mutex, std::unique_lock, std::condition_variable nearly everywhere and builds/runs fine with mingw (kovri aims for c++11/14 compliance though).

Did I miss something (within the conversation or on IRC)?

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 17, 2016

This topic is definitely confusing.

There is MinGW, and MinGW-w64. Yes a fork with a confusing name! The latter can provide std::thread, etc., while the former has not had an official release in 3 years. And according to a page on the Qt-wiki the std::thread mode of MinGW-w64 uses a posix wrapper which is slower (but no details are provided). A wrapper is needed because the C++ implementation which ships with MinGW (libstdc++) is written specifically for pthreads (POSIX) / gthreads (GNU). Or so I think, its difficult to get information about this.

I looked through the source for boost::thread, and I think it uses native windows threads instead of a posix wrapper. Guidance here? There is a 64-bit Windows build which means MinGW-w64 is already being used for official binaries, but boost::thread is easier (and unconfirmed faster) because it works without hassle on both mingw variants.

@hyc
Copy link
Collaborator

hyc commented Nov 18, 2016

Mingw provides support for these things, but at least in gcc 4.x it's buggy on win32. We deliberately do not use these in the Monero code base because of that.

@hyc
Copy link
Collaborator

hyc commented Nov 18, 2016

@anonimal
Copy link
Contributor

@vtnerd JFTR, as you can tell by my carelessness for grammar and pronouns, and in an effort to save time typing (of which I've now doubled-down over by responding), I say mingw as a catch-all for mingw-w64 (I know the differences 😄). As for the wrapper, that's good to know.

@hyc how applicable are the 5.3.0 optimization bugs referenced in that link (8 months later)?

@hyc
Copy link
Collaborator

hyc commented Nov 21, 2016

I really don't know. Have not moved any of my builds off gcc 4.9.

@vtnerd
Copy link
Contributor Author

vtnerd commented Nov 21, 2016

This pull request switches everything to boost::thread, while simultaneously adding a fork-join implementation for groups of tasks.

@vtnerd vtnerd deleted the improvement/thread_group branch November 26, 2016 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants