[NativeCPU] Simplify enqueue. #19550
We were creating an excessive number of threads. When we know how many threads we want, we can simply divide the number of workgroups by the number of threads and have each thread process that many workgroups.
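As a rough illustration of the chunking described above, here is a minimal sketch using plain `std::thread`. It is not the adapter's actual code: `enqueueChunked`, `workGroup`, and the way the remainder is spread over the first few threads are all assumptions made for the example, and the real NativeCPU adapter presumably dispatches through its own thread pool rather than spawning threads per launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not taken from the PR): runs workGroup(g) for every
// g in [0, numGroups) using at most numThreads threads, one contiguous
// chunk of workgroups per thread.
void enqueueChunked(std::size_t numGroups, std::size_t numThreads,
                    const std::function<void(std::size_t)> &workGroup) {
  if (numGroups == 0)
    return;
  // Never start more threads than there are workgroups, and at least one.
  numThreads = std::max<std::size_t>(1, std::min(numThreads, numGroups));
  const std::size_t perThread = numGroups / numThreads;
  // Assumption: leftover workgroups go one each to the first threads.
  const std::size_t remainder = numGroups % numThreads;

  std::vector<std::thread> threads;
  threads.reserve(numThreads);
  std::size_t begin = 0;
  for (std::size_t t = 0; t < numThreads; ++t) {
    const std::size_t end = begin + perThread + (t < remainder ? 1 : 0);
    // Each thread walks its own range sequentially.
    threads.emplace_back([begin, end, &workGroup] {
      for (std::size_t g = begin; g < end; ++g)
        workGroup(g);
    });
    begin = end;
  }
  assert(begin == numGroups); // every workgroup was assigned exactly once
  for (auto &th : threads)
    th.join();
}
```

With this shape the number of threads is fixed up front, independent of the number of workgroups, and no workgroup has to be resized to make the division come out even.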
Testing on Pointnet showed that the nesting we originally had is faster.
It seems this PR also removes the unsafe enqueuing optimization for nd_range kernels. If so, it's probably worth mentioning this in the description as well.
Done, thanks.
@intel/llvm-gatekeepers This can be merged, thanks.
The new implementation also means we no longer need to resize workgroups, which was not generally safe.