Deadlock between std::thread, TBB TaskGroup, and OpenMP #353
Comments
Thank you for the call stack. It seems we are deadlocked under the loader lock. Do you use OpenMP from MSVC? If possible, can you load symbols for …
Ahh, got it! Sorry, I should have figured out how to get that before. Updated above. I don't think I have source for OpenMP to look at exactly what it's doing inside PartialBarrier1::Block(), though. (As far as I know this is from MSVC.)
The OpenMP sources are not required; I just wanted to be sure that it is a barrier. My initial supposition on SO about oversubscription is incorrect, and I now think it is a loader lock issue.
Therefore, the thread ID is not reused; it really is the same thread. From the TBB perspective, it is not good practice to run tasks and not wait for them before the thread completes:
In oneTBB we resolved the first issue, but we still have the second issue. So you can either try oneTBB or use enqueued tasks (this is not really convenient with task_group because of a possible race), e.g., wrap …
We need …
I see! I was unaware that the task is "tied" to the current thread, and the authors of this code probably were too. So that's why the Sleep() on the IO thread fixes the deadlock: in that case the task always completes before the IO thread exits and cleans up while holding the loader lock. I don't entirely understand your second bullet ("In rear cases"), but are you proposing I fix this by using the arena, or by having the IO thread wait for the task to complete? I think the latter. I just tried adding a taskGroup.wait() on the IO thread before it exits, and that removes the deadlock. Hooray! Thank you for the help.
You are welcome!
I did not think that it was possible (e.g., it is a driver thread for callbacks), but it is also a valid solution.
I've added an edit above showing where I needed to put the taskGroup.wait() to get rid of the deadlock, for future reference.
UPDATE: Added a code comment with the solution (where to do taskGroup.wait()) for future reference. See the comments below for the explanation.
Moving from https://stackoverflow.com/questions/66346225/ to here.
I'm working in some preexisting code that uses a number of multi-threading techniques: std::thread, plus Intel TBB TaskGroup, plus OpenMP. 🙄 It looks like I've hit a race condition involving thread reuse that causes both thread::join and OpenMP to fail to return.
The scenario is that the main thread kicks off a bunch of I/O worker std::threads, which themselves initiate some tasks, and the tasks have some segments of code that use OpenMP for parallelism. The main thread does std::thread::join() to wait for the std::threads, then tbb::TaskGroup::wait() to wait for the tasks to complete.
And the CPU-intensive work includes usage of OpenMP.
This code hangs; the main thread waits on thread->join() forever, even on a test case that has only a single IO job / CPU-intensive task. I added the printf calls you see above, and the output showed that the IO job finished quickly, that thread exited, and then the CPU-intensive task spun up with the same thread ID before the main thread even reached the join() call.
The problem reproduces whenever an IO thread's ID is reused for a TBB task after the IO thread finishes. In my code here, that's every time; in my actual, more complicated project it doesn't happen 100% of the time. The thread->join() call always keeps waiting, even though the IO work is all done and the main thread should be able to move on to waiting for the tasks. When I looked in the debugger for the case listed above, thread->join() was indeed still waiting on thread ID 24452, and a thread with that ID did exist, but that thread was completely done executing. Most of the time the main thread ends up waiting on the same thread that gets reused, but apparently that isn't a requirement to reproduce the problem; there just has to be reuse of some thread.
This seems like a bug: thread::join() should return even if TBB reuses a thread, because the work the std::thread was created with has completed. Somehow TBB is breaking the semantics of thread::join().
The second odd thing: the task on the reused thread never completes. When I look in the debugger at the thread that's executing the task, it's sitting at the end of the OpenMP parallel region, and I don't see any other threads executing parallel work. There are a number of threads from vcomp140[d].dll sitting in ntdll.dll code, for which I don't have symbols; I presume these are just waiting for new work, not doing my task. The CPU is idle, and I'm pretty confident nobody is looping. So the TBB task is hung somewhere in the OpenMP multi-threading implementation.
This seems like another bug: OpenMP breaks when used in a TBB task, though only when the task happens to run on a reused std::thread.
Here's the callstack of the hung task:
So, somewhere between std::thread and TBB tasks and OpenMP parallelism there's a race condition triggered by thread reuse.
I have found two workarounds that make the hang go away:
I'm reporting this more as a possible bug report than as a cry for help. I've worked around it for now by removing OpenMP. The above is my full code, though you'll need to link with TBB to build.
This is a copy of TBB obtained within the last month via vcpkg, built for Windows via the standard vcpkg commands.