Implement subgroup reduction for SYCL RangePolicy parallel_reduce #3940
Force-pushed from d7d2603 to 3aa7e39.
Force-pushed from 7b26e90 to 4f726b3.
Force-pushed from 4f7a84e to 154eff5.
With
on a V100 so we are faster than the native implementation (at least for this simple test case).
Force-pushed from 1ea2154 to 87284d4.
I am happy with the current status and am looking for reviews.
Besides the failing test, would it be practical to lift the reduction code for a group and subgroup out of here into its own file and call it from these places? It looks like it's the exact same code for the three use cases (minus the copy to global for the team-internal reduction). Otherwise this looks pretty clean, and I didn't see a mistake that would explain the failing tests.
Force-pushed from e7a2026 to 53b2884.
Force-pushed from 1b9c030 to 149361d.
Retest this please.
Force-pushed from 0f64a3c to 56fc59d.
@crtrott I fixed the failing tests and factored the common code out.
I'm going to assume that we will merge this more or less as is so I can base other improvements on top of this.
Using subgroups for the reductions should improve the performance for `parallel_reduce` using `SYCL`.
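For readers unfamiliar with the pattern, here is a rough host-side sketch of a two-level reduction: each subgroup first combines its lanes in a shift-down tree, and the per-subgroup partial results are combined afterwards. This is plain C++ that only emulates the data movement; it is not the code in this PR and does not use the SYCL API. The names `emulated_subgroup_reduce` and `emulated_range_reduce` and the `subgroup_size` parameter are made up for illustration, and the operation is assumed to be associative and commutative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: emulates one subgroup in which lane i holds lanes[i].
// In each step, lane i combines its value with the value of lane i + offset
// (the "shift down" pattern); the offset is halved until lane 0 holds the result.
template <class T, class BinaryOp>
T emulated_subgroup_reduce(std::vector<T> lanes, BinaryOp op) {
  assert(!lanes.empty());
  const std::size_t n = lanes.size();
  // Round up to a power of two so the tree also covers partially filled subgroups.
  std::size_t width = 1;
  while (width < n) width *= 2;
  for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
    for (std::size_t lane = 0; lane < offset; ++lane) {
      // Lanes without a partner (past the end of the subgroup) keep their value.
      if (lane + offset < n) lanes[lane] = op(lanes[lane], lanes[lane + offset]);
    }
  }
  return lanes[0];
}

// Hypothetical helper: splits the range into chunks of subgroup_size, reduces
// each chunk with the subgroup tree, then combines the partial results.
template <class T, class BinaryOp>
T emulated_range_reduce(const std::vector<T>& data, std::size_t subgroup_size,
                        T identity, BinaryOp op) {
  assert(subgroup_size > 0);
  T result = identity;
  for (std::size_t begin = 0; begin < data.size(); begin += subgroup_size) {
    const std::size_t end = std::min(begin + subgroup_size, data.size());
    // Copy the chunk so each emulated subgroup works on its own "registers".
    std::vector<T> lanes(data.begin() + static_cast<std::ptrdiff_t>(begin),
                         data.begin() + static_cast<std::ptrdiff_t>(end));
    result = op(result, emulated_subgroup_reduce(lanes, op));
  }
  return result;
}
```

Calling `emulated_range_reduce(values, 32, 0, [](int a, int b) { return a + b; })` sums `values` as if they were spread over subgroups of 32 lanes. On a device the inner loop over lanes would run in parallel and the exchange would go through a subgroup shuffle rather than memory, which is where the expected speedup comes from.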