
Improve SYCL reduction performance: RangePolicy #6264

Merged

Conversation

@masterleinad (Contributor) commented Jul 6, 2023

Part of #6035.
This limits the number of workgroups, effectively processing multiple work items per thread, and increases the maximum workgroup size for Intel GPUs. Technically, sycl::info::kernel_device_specific::work_group_size should be the maximum usable workgroup size, but it turns out I could still use sycl::info::device::max_work_group_size and got better performance.
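
For illustration, a minimal sketch of the idea under stated assumptions (the helper name and the workgroup cap of 4096 are hypothetical, not taken from this PR; only the two SYCL info queries are as described above):

```c++
#include <sycl/sycl.hpp>

#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper (not part of Kokkos): choose a workgroup size and a
// capped number of workgroups so that each work item loops over several
// elements of the iteration range instead of launching one workgroup per chunk.
inline std::pair<std::size_t, std::size_t> choose_reduction_launch(
    const sycl::queue& q, std::size_t range_size) {
  const sycl::device dev = q.get_device();

  // Device-wide upper bound on the workgroup size.  The per-kernel query
  // sycl::info::kernel_device_specific::work_group_size is nominally the safe
  // maximum, but the PR reports better performance with the device-wide value.
  const std::size_t wgroup_size =
      dev.get_info<sycl::info::device::max_work_group_size>();

  // Cap the number of workgroups (4096 is an illustrative constant, not the
  // value used in Kokkos); remaining iterations are covered by each work item
  // striding over multiple entries.
  const std::size_t needed = (range_size + wgroup_size - 1) / wgroup_size;
  const std::size_t n_wgroups = std::min<std::size_t>(needed, 4096);

  return {wgroup_size, n_wgroups};
}
```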

Also, fix use_shuffle_based_algorithm (whether the exposed reference type is a reference rather than a pointer) so that it matches ReducerType::static_value_size() (which is 0 for array reductions).
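
A minimal sketch of the intended consistency (illustrative only; the actual trait in Kokkos may be defined differently):

```c++
#include <type_traits>

// Simplified illustration, not the actual Kokkos code.  Scalar reductions
// expose the result as `value_type&` (a reference) and have a nonzero
// ReducerType::static_value_size(); array reductions expose `value_type*`
// (a pointer) and report a static_value_size() of 0.  The shuffle-based
// algorithm is only valid in the first case, so keying the flag on "is the
// exposed type really a reference?" keeps it consistent with
// static_value_size().
template <class ReferenceType>
inline constexpr bool use_shuffle_based_algorithm =
    std::is_reference_v<ReferenceType>;
```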

@masterleinad marked this pull request as ready for review on July 6, 2023 18:35
@masterleinad (Contributor, Author) commented Jul 6, 2023

Only HIP-ROCm-5.2-C++20 is timing out. Everything else is passing.

@masterleinad force-pushed the improve_reduction_performance_sycl_1 branch from 9ac49e9 to c9573a6 on July 13, 2023 19:31
Review comment on core/src/SYCL/Kokkos_SYCL_Parallel_Reduce.hpp (outdated, resolved)
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
@masterleinad (Contributor, Author) commented

SYCL CI is passing.

@dalg24 (Member) commented Jul 17, 2023

Unrelated failure to launch one CUDA build

@dalg24 merged commit 933d23b into kokkos:develop on Jul 17, 2023
27 of 28 checks passed
@crtrott mentioned this pull request on Aug 25, 2023