Immediately scheduling task on a different thread #1354
Hi @yonik, I'm not sure I fully understood your example, but as far as I can see you want to run a subset of tasks from a thread that is not going to participate in this work.
Yeah, I think it really comes down to the time for default work-stealing to occur. The fact that I launched multiple tasks in my example code isn't actually very relevant. I'm looking for a way to tell TBB "I just created a task, but this thread won't be executing it, so please move/steal this task ASAP". It's almost like work-requesting rather than work-stealing.
I already tried a bunch of variants that I didn't include here. Flow graphs, task groups, and arenas all showed about the same amount of latency for work to be stolen and started on a different thread.
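For readers following along, the latency being discussed (time from submitting a task to the moment it starts on another thread) could be measured roughly as below. This is a minimal sketch, not the test code from this issue: `measure_start_latency_ns` and its `submit` parameter are hypothetical names, and the helper deliberately uses only the standard library so that any submission mechanism can be plugged in.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical helper (not part of TBB): returns the number of nanoseconds
// between handing a task to `submit` and the task starting to run.
// `submit` is any callable that takes a std::function<void()> and arranges
// for it to run on some other thread.
template <typename Submit>
std::int64_t measure_start_latency_ns(Submit&& submit) {
    using clock = std::chrono::steady_clock;
    std::atomic<std::int64_t> started_ns{-1};  // -1 means "not started yet"

    const auto t0 = clock::now();
    submit(std::function<void()>([&started_ns, t0] {
        // First thing the task does: record how long it waited.
        const auto waited = clock::now() - t0;
        started_ns.store(
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count(),
            std::memory_order_release);
    }));

    // The submitting thread does not execute the task; it only waits for
    // the timestamp to appear.
    while (started_ns.load(std::memory_order_acquire) < 0) {
        std::this_thread::yield();
    }
    const std::int64_t ns = started_ns.load(std::memory_order_acquire);
    assert(ns >= 0);
    return ns;
}
```

Assuming an existing `tbb::task_arena arena`, one way to use it would be to pass `[&](std::function<void()> f) { arena.enqueue(std::move(f)); }` as `submit`. The measured value includes the cost of the submit call itself, and a single sample is noisy, so it makes sense to repeat the measurement many times.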
I believe the latency in this scenario depends directly on whether the internal arena is already populated with threads.
Yes, that looks to be the case.
Results of task_group.run (I also started recording what thread did the processing):
Results using arena.enqueue: (it does look faster than task_group now)
From an implementation POV, what makes arena.enqueue() that much faster than task_group.run()?
Is there anything else I can help you with, or can we close this issue?
I've been experimenting a little more and noticed some interesting things about task_groups. Once tasks in a task_group exist, they are very quick to start running (presumably because of the local queue you mentioned). But some combination of the task_group creation and the first task submit is slower (not horrible, but around 10us). So the question is: assuming one has a root task launched via task_arena.enqueue() and that task runs multiple tasks via a task_group, is there a best practice or way to lower the task_group creation time? Assuming the same arena will always be used, would caching task_group objects speed things up? I'll attach some test code once I clean it up.
Test code:
Timings for using arena.enqueue for sub-tasks:
Timing for using task_group.run() for sub-tasks:
Nested task groups:
OK, things are now looking great! I changed the warm-up code to use a barrier to ensure that all of the threads are running, and lowered the concurrency to the number of physical cores. Then I noticed that the longer task_group time seems to occur only on an arena thread that hasn't had a task_group used on it before. Once a thread is re-used, it gets fast. It seems to take less than 200ns to create a task_group, add 4 sub-tasks, and have one of the sub-tasks run (including book-keeping overhead). Here is some sample output. Note that the latency of the root task is high, but that's fine since we launched a bunch and only have so many physical cores. The latency of the sub-tasks, once the root task is started, settles into fast numbers. These seem like great results now, so closing this issue. Thank you for all the help!
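The barrier-based warm-up mentioned above might look something like the following. This is only a sketch of the idea under stated assumptions, not the code that produced these results: `warm_up_threads` and `submit` are hypothetical names, and the helper uses only the standard library so the submission mechanism stays pluggable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical helper (not part of TBB): submits `n` tasks that each spin
// until all `n` have started. No task can finish before the others have
// begun, so `n` distinct threads must be running tasks at the same time.
// Returns once every task has reached the barrier.
// Caution: if fewer than `n` threads can ever pick up the tasks, this never
// returns, so `n` must not exceed the available concurrency.
template <typename Submit>
void warm_up_threads(std::size_t n, Submit&& submit) {
    assert(n > 0);
    // Shared ownership keeps the counter alive until the last task exits.
    auto arrived = std::make_shared<std::atomic<std::size_t>>(0);

    for (std::size_t i = 0; i < n; ++i) {
        submit(std::function<void()>([arrived, n] {
            arrived->fetch_add(1, std::memory_order_acq_rel);
            while (arrived->load(std::memory_order_acquire) < n) {
                std::this_thread::yield();
            }
        }));
    }

    // The submitting thread waits too, but never runs a task itself.
    while (arrived->load(std::memory_order_acquire) < n) {
        std::this_thread::yield();
    }
}
```

Assuming an existing `tbb::task_arena arena`, `submit` could be `[&](std::function<void()> f) { arena.enqueue(std::move(f)); }`. Because the submitting thread here is not an arena thread, `n` would need to be no larger than the number of worker threads the arena can actually obtain, which is in line with lowering the concurrency to the number of physical cores.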
I need to launch tasks from a thread that won't be participating in executing them (it's participating in a different event loop and will be blocking). It seems to take on the order of 50 usec for TBB to run the task on a different thread. Is there a way to get this work scheduled faster?
Below is some test code and results.
Results from calling task_group.run() from a non-arena thread:
Results from calling task_group.run() from within a task:
The test code: