#5635: HIP: Add Overloads for parallel_scan with return value for ThreadVectorRange #6242
c0e6aa2 to 6ad0843
Please avoid copying the implementation from the overloads not returning the total.
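One way to read this request is to let the overload without a return value forward to the one that returns the total. The following is a minimal serial sketch of that idea, not the Kokkos implementation: `scan_range` and `example_total` are hypothetical names, the value type is passed explicitly instead of being deduced from the closure, and the scan is assumed to be a plain sum.

```cpp
#include <cassert>
#include <vector>

// Hypothetical serial stand-in for the overload that returns the total.
// closure(i, val, final) adds element i's contribution to val; when final
// is true it may first read val as the exclusive prefix for index i.
template <class Value, class Closure>
void scan_range(int begin, int end, const Closure& closure, Value& total) {
  Value accum{};  // assumption: Value{} is the identity of the scan (a sum)
  for (int i = begin; i < end; ++i) closure(i, accum, true);
  total = accum;
}

// Overload without a return value: it forwards to the overload above and
// discards the total, so the scan logic exists in one place only.
template <class Value, class Closure>
void scan_range(int begin, int end, const Closure& closure) {
  Value unused_total{};
  scan_range(begin, end, closure, unused_total);
}

// Usage: fills `prefix` with the exclusive prefix sums of `data` and
// returns the sum of all elements.
inline long example_total(const std::vector<long>& data,
                          std::vector<long>& prefix) {
  auto closure = [&](int i, long& val, bool final) {
    if (final) prefix[i] = val;  // exclusive prefix for index i
    val += data[i];              // add this element's contribution
  };
  long total = 0;
  scan_range(0, static_cast<int>(data.size()), closure, total);
  return total;
}
```

With this layout a fix to the scan body only has to be made once, and both overloads pick it up.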
6ad0843 to 264acc6
Why does this pull request also touch Cuda?
I created the PRs as a chain that should be merged in a particular order. This one is the last one created so far, so it contains the changes from the other ones. There will be at least two more PRs. Edit: I added numbering to the PR names to make that clearer.
264acc6 to e933957
e933957 to cdb0e1f
Marking as draft since dependencies have not been merged.
b4feed9 to f8ecd2f
f8ecd2f to 6c6a26a
if (i - 1 < loop_boundaries.end && threadIdx.x > 0)
  closure(i - 1, val, false);
See the discussion in #6235 (comment) for a similar fix in the CUDA backend.
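To illustrate why a guard on `i - 1` rather than on `i` can matter for the returned total, here is a serial emulation of a lane-chunked exclusive scan. It is only a sketch under assumptions: `emulated_vector_scan` and `ScanClosure` are hypothetical names, the lanes of one thread are modeled as a plain array, the scan is a sum, and the stricter guard `i < end` is included purely as a contrast (the conversation here does not state what the guard looked like before the fix).

```cpp
#include <cassert>
#include <functional>
#include <vector>

// closure(i, val, final): adds element i's contribution to val; when final
// is true it may first read val as the exclusive prefix for index i.
using ScanClosure = std::function<void(int, long&, bool)>;

// Hypothetical serial emulation: scans [0, end) in chunks of `lanes`
// elements and returns the total. `relaxed_guard` selects the guard
// `i - 1 < end`; otherwise the stricter `i < end` is used.
long emulated_vector_scan(int end, int lanes, bool relaxed_guard,
                          const ScanClosure& closure) {
  long accum = 0;  // total carried from the previous chunks
  const int padded_end = ((end + lanes - 1) / lanes) * lanes;
  for (int base = 0; base < padded_end; base += lanes) {
    std::vector<long> val(lanes, 0);
    // Step 1: lane t (t > 0) fetches the contribution of element i - 1.
    for (int t = 1; t < lanes; ++t) {
      const int i = base + t;
      const bool fetch = relaxed_guard ? (i - 1 < end) : (i < end);
      if (fetch) closure(i - 1, val[t], false);
    }
    // Step 2: a prefix sum over the shifted contributions gives each lane
    // its exclusive prefix within the chunk.
    for (int t = 1; t < lanes; ++t) val[t] += val[t - 1];
    // Step 3: add the carry, then make the final call for in-range lanes.
    for (int t = 0; t < lanes; ++t) {
      val[t] += accum;
      const int i = base + t;
      if (i < end) closure(i, val[t], true);
    }
    // The last lane's value becomes the carry for the next chunk.
    accum = val[lanes - 1];
  }
  return accum;
}
```

In this model the per-index prefixes are the same with either guard. The difference shows up in the total when `end` is not a multiple of `lanes`: the lane sitting at `i == end` is the only one that can carry the last element's contribution to the last lane, and the stricter guard skips it.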
The gcc build timed out; the rest passes.
@@ -1016,7 +1016,7 @@ struct checkScan {
 };
 }  // namespace VectorScanReducer

-#if !(defined(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND) || defined(KOKKOS_ENABLE_HIP))
+#if !defined(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND)
If it works for HIP and CUDA nvcc, there is a high probability that it also works for CUDA clang. Not for this PR, but I think it's worth checking whether we still need this condition.
Related to #5635, #6453
Depends on #6235 (merged)