
Implement subgroup reduction for SYCL RangePolicy parallel_reduce #3940

Merged: 12 commits, Jun 10, 2021

Conversation

masterleinad
Contributor

@masterleinad masterleinad commented Apr 14, 2021

Using subgroups for the reductions should improve the performance for parallel_reduce using SYCL.

@masterleinad
Contributor Author

With 154eff5 (#3940) I'm seeing

n = 128
AXPBY KK: 4.422993e-02 s 6.468522e-02 GB/s
AXPBY SYCL: 2.562794e-02 s 1.116369e-01 GB/s
DOT KK: 6.557975e-02 s 2.908442e-02 GB/s
DOT SYCL: 8.905854e-02 s 2.141680e-02 GB/s
n = 1024
AXPBY KK: 4.248038e-02 s 5.387943e-01 GB/s
AXPBY SYCL: 2.707055e-02 s 8.455012e-01 GB/s
DOT KK: 9.947761e-02 s 1.533892e-01 GB/s
DOT SYCL: 1.804862e-01 s 8.454271e-02 GB/s
n = 8192
AXPBY KK: 5.261190e-02 s 3.480305e+00 GB/s
AXPBY SYCL: 2.776235e-02 s 6.595460e+00 GB/s
DOT KK: 1.428609e-01 s 8.544697e-01 GB/s
DOT SYCL: 2.057009e-01 s 5.934358e-01 GB/s
n = 65536
AXPBY KK: 1.451227e-01 s 1.009383e+01 GB/s
AXPBY SYCL: 2.890844e-02 s 5.067184e+01 GB/s
DOT KK: 2.128002e-01 s 4.589105e+00 GB/s
DOT SYCL: 2.522376e-01 s 3.871598e+00 GB/s
n = 524288
AXPBY KK: 1.962914e-01 s 5.970078e+01 GB/s
AXPBY SYCL: 5.099575e-02 s 2.297986e+02 GB/s
DOT KK: 2.496850e-01 s 3.128943e+01 GB/s
DOT SYCL: 4.468903e-01 s 1.748192e+01 GB/s
n = 4194304
AXPBY KK: 2.972564e-01 s 3.153843e+02 GB/s
AXPBY SYCL: 1.551411e-01 s 6.042888e+02 GB/s
DOT KK: 3.195238e-01 s 1.956036e+02 GB/s
DOT SYCL: 5.425608e-01 s 1.151945e+02 GB/s
n = 33554432
AXPBY KK: 1.244910e+00 s 6.024532e+02 GB/s
AXPBY SYCL: 1.025914e+00 s 7.310552e+02 GB/s
DOT KK: 9.931421e-01 s 5.034526e+02 GB/s
DOT SYCL: 1.252469e+00 s 3.992115e+02 GB/s
n = 268435456
AXPBY KK: 8.666444e+00 s 6.923255e+02 GB/s
AXPBY SYCL: 8.056914e+00 s 7.447020e+02 GB/s
DOT KK: 5.673008e+00 s 7.050933e+02 GB/s
DOT SYCL: 6.063481e+00 s 6.596871e+02 GB/s

on a V100, so we are faster than the native implementation (at least for this simple test case).

@masterleinad masterleinad removed the [WIP] label Jun 3, 2021
@masterleinad masterleinad changed the title [WIP] Implement subgroup reduction for SYCL RangePolicy parallel_reduce Implement subgroup reduction for SYCL RangePolicy parallel_reduce Jun 3, 2021
@masterleinad masterleinad requested a review from nliber June 3, 2021 22:03
@masterleinad
Contributor Author

I am happy with the current status and am looking for reviews.

crtrott
crtrott previously requested changes Jun 4, 2021
Member

@crtrott crtrott left a comment


Besides the failing test, would it be practical to lift the reduction code for a group and subgroup out of here into its own file and call it from these places? It looks like it's the exact same code for the three use cases (minus the copy to global for the team-internal reduction). Otherwise this looks pretty clean, and I didn't see a mistake which would explain the failing tests.

@masterleinad
Contributor Author

Retest this please.

@masterleinad
Contributor Author

@crtrott I fixed the failing tests and factored common code out.

@masterleinad masterleinad removed the [WIP] label Jun 8, 2021
@masterleinad
Contributor Author

I'm going to assume that we will merge this more or less as is, so I can base other improvements on top of this.

@dalg24 dalg24 dismissed crtrott’s stale review June 10, 2021 20:49

Change requested has been addressed

@dalg24 dalg24 merged commit 1041f12 into kokkos:develop Jun 10, 2021
@masterleinad masterleinad deleted the sycl_sungroup_reduction branch December 2, 2021 19:29

4 participants