Fix max scratch size calculation for level 0 of CUDA and HIP #5718
Conversation
This maybe leaves the question of whether we should provide a version that takes the functor for the case where it matters, and make sure the documentation points users to that.
This will still not give the actual available scratch if the functor were to do certain types of reductions, in which case some static shared memory is used which depends on the value_type size, and thus can't be accounted for without knowing the functor.
Force-pushed from a4d28d4 to 6d88ba8.
Force-pushed from 6d88ba8 to 752c6db.
(Your HIP commit message got truncated)
This may also address #3498
I am basically fine with this PR, but I would like to get Bruno's feedback.
This better accounts for the internal shared memory that teams use by default. It does not account for static shared memory which teams may use for certain types of team-level reductions; the amount of shared memory there depends on the
value_type
and thus can't be taken into account without handing over the functor. It does otherwise fix #1811