
Fix max scratch size calculation for level 0 of CUDA and HIP #5718

Merged: 4 commits, Dec 22, 2022

Conversation

@crtrott (Member) commented on Dec 21, 2022:

This accounts a bit better for the internal shared memory that teams use by default. It does not take into account the static shared memory that teams may use for certain types of team-level reductions; the amount of shared memory there depends on the value_type and thus can't be taken into account without handing over the functor.

It otherwise fixes #1811
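
To make the effect of the change concrete, here is a minimal sketch (not part of this PR) of how a user can query the level-0 limit and size a per-team scratch View from it, which is essentially the use case of #1811. `TeamPolicy::scratch_size_max`, `set_scratch_size`, and `View::shmem_size` are existing Kokkos APIs; the element type, league size, and kernel body are illustrative assumptions.

```c++
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using ExecSpace   = Kokkos::DefaultExecutionSpace;
    using Policy      = Kokkos::TeamPolicy<ExecSpace>;
    using ScratchView = Kokkos::View<double*, ExecSpace::scratch_memory_space,
                                     Kokkos::MemoryUnmanaged>;

    // Maximum level-0 scratch per team, in bytes. With this PR, the value
    // accounts for the shared memory teams already use internally by default.
    size_t const max_bytes = Policy::scratch_size_max(0);

    // Largest scratch View of doubles that fits; shmem_size() may add a few
    // bytes of alignment padding, so back off until the request fits.
    size_t max_elems = max_bytes / sizeof(double);
    while (max_elems > 0 && ScratchView::shmem_size(max_elems) > max_bytes)
      --max_elems;

    Policy policy =
        Policy(/*league_size=*/100, Kokkos::AUTO)
            .set_scratch_size(0, Kokkos::PerTeam(ScratchView::shmem_size(max_elems)));

    Kokkos::parallel_for(
        "fill_scratch", policy,
        KOKKOS_LAMBDA(Policy::member_type const& team) {
          // Allocate the per-team scratch View from level-0 scratch memory.
          ScratchView buf(team.team_scratch(0), max_elems);
          Kokkos::parallel_for(Kokkos::TeamThreadRange(team, max_elems),
                               [&](int i) { buf(i) = static_cast<double>(i); });
        });
    Kokkos::fence();
  }
  Kokkos::finalize();
}
```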

@PhilMiller (Contributor) commented:

This perhaps leaves open the question of whether we should provide a version that takes the functor for the cases where it matters, and make sure the documentation points users to that.

This will still not give the actual available scratch if the functor
were to do certain types of reductions, in which case some static
shared memory is used which depends on the value_type size, and thus
can't be accounted for without knowing the functor.
@crtrott force-pushed the fix-cuda-max-scratch-size-calc branch from a4d28d4 to 6d88ba8 on December 21, 2022 at 23:04
@crtrott force-pushed the fix-cuda-max-scratch-size-calc branch from 6d88ba8 to 752c6db on December 21, 2022 at 23:10
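
For context on the caveat in the commit message, below is a minimal sketch (not from this PR) of a team-level reduction over a user-defined value_type. As the commit message notes, such reductions may use additional static shared memory whose size depends on the value_type, which `scratch_size_max(0)` cannot account for without seeing the functor. The `Accum` struct, its size, and the loop bounds are illustrative assumptions.

```c++
#include <Kokkos_Core.hpp>

// Illustrative 64-byte reduction value; the larger the value_type, the more
// shared memory a team-level reduction may consume for its staging.
struct Accum {
  double sum[8];
  KOKKOS_INLINE_FUNCTION Accum() {
    for (int i = 0; i < 8; ++i) sum[i] = 0.0;
  }
  KOKKOS_INLINE_FUNCTION Accum& operator+=(Accum const& rhs) {
    for (int i = 0; i < 8; ++i) sum[i] += rhs.sum[i];
    return *this;
  }
};

// Identity needed by the default Sum reduction of the custom type.
namespace Kokkos {
template <>
struct reduction_identity<Accum> {
  KOKKOS_INLINE_FUNCTION static Accum sum() { return Accum(); }
};
}  // namespace Kokkos

void team_reduce_example() {
  using Policy = Kokkos::TeamPolicy<>;
  Kokkos::parallel_for(
      "team_reduce", Policy(10, Kokkos::AUTO),
      KOKKOS_LAMBDA(Policy::member_type const& team) {
        Accum result;
        // This per-team reduction may stage partial results in shared memory
        // sized by sizeof(Accum), on top of the team's default internal usage.
        Kokkos::parallel_reduce(
            Kokkos::TeamThreadRange(team, 1000),
            [&](int i, Accum& update) { update.sum[i % 8] += 1.0; }, result);
      });
}
```
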
@PhilMiller (Contributor) commented:

(Your HIP commit message got truncated)

Review threads on core/src/Cuda/Kokkos_Cuda_Parallel_Team.hpp (resolved).
@PhilMiller (Contributor) commented:

This may also address #3498

@dalg24 requested a review from Rombur on December 22, 2022 at 03:02

@dalg24 (Member) left a comment:

I am basically fine with that PR but I would like to get Bruno's feedback


Successfully merging this pull request may close these issues:

Figure out the maximal number of elements in a shared memory view