Add checks for shmem usage in parallel_reduce #4548
Conversation
Can we make the test a general test for all backends? I.e., does this work with HIP/SYCL (have they actually implemented array reductions?)
Array reductions are implemented for
I only looked into the existing array reduction size limits in
I am not sure that there is a completely generic way of figuring out the max shmem size. We would probably need a function to which you hand the execution space instance, and then overloads for it in the test where you in fact use semi-magic numbers, or low-level backend-specific functions to figure it out (i.e., call raw CUDA/HIP/SYCL functionality).
Please fix:
The check will throw if the expected size of the reduced view exceeds the internal shmem limit.
@crtrott, as we discussed before, I will open another issue to convert this CUDA unit test to a general test for other backends once this is merged.
This is to resolve #4461
For the CUDA build, the problem comes from ParallelReduce, where it determines the necessary scratch space size and the block size for the reduced view. Starting at a reduced view of 181 doubles, the calculated block size drops from 32 to 16, which seems to cause a CUDA illegal memory access. Interestingly, the ParallelReduce that takes a TeamPolicy already has a similar check that uses the team size instead of the block size to verify the same condition. So, to keep the conditions of the throws as consistent as possible across the policies, this commit adds a simple function that checks whether the calculated block size would drop below 32 because of the internal maximum shared memory size per block.
For the HIP build, the calculated block size drops to 0 starting at 125 doubles. HIP::ParallelReduce already has a check that throws if the calculated block size becomes 0, which is what was observed in the original issue post.