[DRAFT][NATIVECPU] using tbb::parallel_for when oneTBB is enabled #20064
base: sycl
```cpp
auto thread_id = getTBBThreadID();
task(thread_id);
```
Suggested change:
```diff
- auto thread_id = getTBBThreadID();
- task(thread_id);
+ task(getTBBThreadID());
```
The `thread_id` variable no longer seems to be needed. The same applies in the other file.
```cpp
using tbb_nd_executor = nativecpu_tbb_executor;

template <template <class> class RangeTpl, class... T>
static inline void invoke_tbb_parallel_for(const tbb_nd_executor &tbb_ex,
```
I think it would be clearer to remove this function and call `tbb::parallel_for` directly in the other `invoke_tbb_parallel_for` overload.
```cpp
IndexT groupsPerThread;
size_t dim = 0;
for (size_t t = 0; t < 3; t++)
  groupsPerThread[t] = numWG[t] / numParallelThreads;
```
This is confusing: `groupsPerThread` is an array, but after initialization only a single element of it is used.
More to the point, in #19550 I simplified the non-oneTBB path so that work is split across threads over the linear range rather than along any specific dimension, and that would probably be better with oneTBB as well.
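A minimal sketch of the linear-range split suggested above, assuming dimension 0 varies fastest when work-group IDs are linearized. `GroupCount`, `LinearChunk`, `splitLinear`, and `delinearize` are hypothetical names for illustration, not the actual NativeCPU code from #19550. Splitting the linear range sidesteps per-dimension division: with `numWG = {5, 3, 1}` and 4 threads, `numWG[t] / numParallelThreads` gives 1, 0, 0, yet the 15 groups split cleanly into chunks of 4, 4, 4 and 3.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: number of work-groups in each of the 3 dimensions.
using GroupCount = std::array<std::size_t, 3>;

// Half-open range [begin, end) of linear work-group indices for one thread.
struct LinearChunk {
  std::size_t begin;
  std::size_t end;
};

// Split the linearized range [0, total) into numThreads contiguous chunks
// whose sizes differ by at most one (the first `rem` chunks get one extra).
std::vector<LinearChunk> splitLinear(const GroupCount &numWG,
                                     std::size_t numThreads) {
  const std::size_t total = numWG[0] * numWG[1] * numWG[2];
  std::vector<LinearChunk> chunks;
  if (numThreads == 0)
    return chunks;
  const std::size_t base = total / numThreads;
  const std::size_t rem = total % numThreads;
  std::size_t begin = 0;
  for (std::size_t t = 0; t < numThreads; ++t) {
    const std::size_t size = base + (t < rem ? 1 : 0);
    chunks.push_back({begin, begin + size});
    begin += size;
  }
  return chunks;
}

// Map a linear index back to a 3D work-group ID.
// Assumption: dimension 0 varies fastest, then dimension 1, then dimension 2.
GroupCount delinearize(std::size_t linear, const GroupCount &numWG) {
  assert(linear < numWG[0] * numWG[1] * numWG[2]);
  GroupCount id{};
  id[0] = linear % numWG[0];
  linear /= numWG[0];
  id[1] = linear % numWG[1];
  id[2] = linear / numWG[1];
  return id;
}
```

Each thread would then iterate over its `[begin, end)` chunk and call `delinearize` to recover the 3D group ID passed to the kernel. With oneTBB, the same linear range could instead be handed to `tbb::parallel_for` as a `tbb::blocked_range<size_t>`, leaving chunk sizing to the scheduler.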
Uses the `tbb::parallel_for` API to enqueue NativeCPU kernel invocations when oneTBB is enabled.