Fix max scratch size calculation for level 0 of CUDA and HIP #5718
Conversation
This maybe leaves the question of whether we should provide a version that takes the functor for the case where it matters, and make sure the documentation points users to that.
This will still not give the actual available scratch if the functor were to do certain types of reductions, in which case some static shared memory is used which depends on the value_type size, and thus can't be accounted for without knowing the functor.
Force-pushed from a4d28d4 to 6d88ba8.
Force-pushed from 6d88ba8 to 752c6db.
(Your HIP commit message got truncated)
This may also address #3498
I am basically fine with this PR, but I would like to get Bruno's feedback.
This better accounts for the internal shared memory that teams use by default. It does not account for static shared memory which teams may use for certain types of team-level reductions; the amount of shared memory there depends on the
value_type
and thus can't be taken into account without handing over the functor. It does otherwise fix #1811