SYCL: Use host-pinned memory to copy reduction/scan result #6500

masterleinad · 2023-10-11T15:10:50Z

For dot-type kernels, I see a runtime improvement between 10% and 50% on Intel GPUs when using host-pinned memory with memcpy instead of device memory with sycl::queue::memcpy. This is analogous to what we are doing for CUDA and HIP.

Noticed while looking at #5334.

masterleinad · 2023-10-17T14:06:18Z

CUDA-12.2-NVHPC and OPENACC-NVHPC-CUDA-12.2 timed out, and OPENMPTARGET-Clang failed openmptarget.partitioning_by_vector, which we see regularly. Everything else, in particular the SYCL build, passes.

core/src/SYCL/Kokkos_SYCL_ParallelReduce_MDRange.hpp

masterleinad · 2023-10-25T13:31:36Z

Retest this please.

masterleinad · 2023-10-26T12:22:20Z

All CI is passing.

masterleinad added 2 commits October 11, 2023 11:05

SYCL: Use host-pinned memory to copy reduction/scan result

60cfc94

Remove unused variable

a99013e

masterleinad force-pushed the sycl_reduce_host_memory_copy branch from fb18a1c to a99013e October 17, 2023 02:26

masterleinad marked this pull request as ready for review October 17, 2023 14:06

Rombur approved these changes Oct 18, 2023

dalg24 reviewed Oct 23, 2023

core/src/SYCL/Kokkos_SYCL_ParallelReduce_MDRange.hpp (two review threads, resolved)

m_shared_memory_lock -> m_host_scratch_lock; improve comments

2ea1f0e

dalg24 reviewed Oct 24, 2023

core/src/SYCL/Kokkos_SYCL_ParallelReduce_MDRange.hpp (outdated, resolved)

masterleinad added 2 commits October 24, 2023 14:42

Add comment for choosing memcpy over fence+deep_copy

e66f21a

m_[host_]scratch_lock->m_scratch_buffers_lock

b914bcf

dalg24 approved these changes Oct 24, 2023

dalg24 merged commit 0975671 into kokkos:develop Oct 26, 2023
28 checks passed

masterleinad deleted the sycl_reduce_host_memory_copy branch October 26, 2023 15:26

masterleinad mentioned this pull request Feb 8, 2024

CHANGELOG: 4.3.0 #6519

Closed
