Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks for shmem usage in parallel_reduce #4548

Merged
merged 1 commit into from
Jan 28, 2022
Merged

Conversation

ldh4
Copy link
Contributor

@ldh4 ldh4 commented Nov 22, 2021

This is to resolve #4461

For CUDA build, the problem comes from ParallelReduce where it determines the necessary scratch space size and the block size for the reduced view. Starting from the size of 181 doubles for the reduced view, the calculated block size drops from 32 to 16, which seems to cause cuda illegal memory access. Interestingly, for ParallelReduce that takes in a teampolicy, there already is a similar check that uses the teamsize instead of the block size to verify the similar condition. So, to keep the conditions of the throws as consistently as possible across the policies, this commit puts in a simple function that checks if the calculated block size would be set below 32 because of the internal max shared memory size per block.

For HIP build, starting from 125 doubles, the calculated block size drops to 0. And there already is a check in HIP::ParallelReduce that throws if the calculated block size becomes 0, which is what was observed in the original issue post.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the test a general test for all backends? I.e. does this work with HIP/SYCL (do they have implemented array reductions actually?)

@masterleinad
Copy link
Contributor

Can we make the test a general test for all backends? I.e. does this work with HIP/SYCL (do they have implemented array reductions actually?)

Array reductions are implemented for HIP and SYCL.

@ldh4
Copy link
Contributor Author

ldh4 commented Nov 30, 2021

Can we make the test a general test for all backends? I.e. does this work with HIP/SYCL (do they have implemented array reductions actually?)

I only looked into the existing array reduction size limits in CUDA and HIP, but if there's a similar shared memory limits for array reduction in SYCL as well, the test can be converted to a general test. Each backend will just need to be checking against different shared memory limit in that case.
I'll also add in a bit more expressive way of calculating the shared memory limit in the test so that the general test won't just be checking against some magical number hard-coded per backend.

@crtrott
Copy link
Member

crtrott commented Dec 2, 2021

I am not sure that there is a completely generic way of figuring out max shmem size. We would probably need to do a function you hand the execution space instance, and then overloads for that in the test where you in fact use semi magic numbers, or low level backend specific functions to figure it out (i.e. call raw CUDA/HIP/SYCL functionality)

@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Jan 19, 2022
@ldh4 ldh4 added this to In progress in Kokkos Release 3.6 Jan 19, 2022
@ldh4 ldh4 force-pushed the issue4461 branch 3 times, most recently from 6e54d63 to bcc14ad Compare January 24, 2022 09:30
@ajpowelsnl
Copy link
Contributor

Please fix:

/var/jenkins/workspace/Kokkos/core/unit_test/cuda/TestCuda_ReducerViewSizeLimit.cpp:59:3: error: use 'using' instead of 'typedef' [modernize-use-using,-warnings-as-errors]

  typedef ValueType value_type[];

  ^

/var/jenkins/workspace/Kokkos/core/unit_test/cuda/TestCuda_ReducerViewSizeLimit.cpp:100:3: error: use 'using' instead of 'typedef' [modernize-use-using,-warnings-as-errors]

  typedef ValueType value_type[];

  ^

@ldh4 ldh4 force-pushed the issue4461 branch 4 times, most recently from 100cf44 to d1d6ecd Compare January 26, 2022 22:15
The check will throw if the expected size of the reduced view exceeds the internal shmem limit
@ldh4
Copy link
Contributor Author

ldh4 commented Jan 27, 2022

Can we make the test a general test for all backends? I.e. does this work with HIP/SYCL (do they have implemented array reductions actually?)

@crtrott, as we discussed before, I will open another issue to convert this Cuda unit-test to a general test for other backends once this is merged.

@crtrott crtrott merged commit d495b50 into kokkos:develop Jan 28, 2022
Kokkos Release 3.6 automation moved this from In progress to Done Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
Development

Successfully merging this pull request may close these issues.

Array reducer cudaErrorIllegalAddress at value_count >= 181
4 participants