Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#5635: HIP: Add Overloads for parallel_scan with return value for ThreadVectorRange #6242

Merged

Conversation

thearusable
Copy link
Contributor

@thearusable thearusable commented Jun 27, 2023

Related to #5635 #6453

Depends on #6235 (merged)

@thearusable thearusable self-assigned this Jun 27, 2023
@thearusable thearusable force-pushed the 5635-hip-parallel-scan-with-value branch from c0e6aa2 to 6ad0843 Compare July 7, 2023 10:29
@thearusable thearusable changed the title [WIP] #5635: HIP: Add Overloads for parallel_scan with return value #5635: HIP: Add Overloads for parallel_scan with return value Jul 7, 2023
@thearusable thearusable marked this pull request as ready for review July 7, 2023 10:31
Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid copying the implementation from the overloads not returning the total.

@thearusable thearusable force-pushed the 5635-hip-parallel-scan-with-value branch from 6ad0843 to 264acc6 Compare July 10, 2023 13:30
@thearusable thearusable changed the title #5635: HIP: Add Overloads for parallel_scan with return value #5635: [3/X] HIP: Add Overloads for parallel_scan with return value Jul 10, 2023
Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this pull request also touch Cuda?

@thearusable
Copy link
Contributor Author

thearusable commented Jul 12, 2023

Why does this pull request also touch Cuda?

I created the PRs as a chain which should be merged in particular order. This one is the last one currently created so it contains the changes from other ones. There will be at least two more PRs created.

Edit: I added numeration in PR names to make that more clear.

@thearusable thearusable force-pushed the 5635-hip-parallel-scan-with-value branch from 264acc6 to e933957 Compare July 13, 2023 12:20
@thearusable thearusable force-pushed the 5635-hip-parallel-scan-with-value branch from e933957 to cdb0e1f Compare July 24, 2023 14:30
@thearusable thearusable changed the title #5635: [3/X] HIP: Add Overloads for parallel_scan with return value #5635: HIP: Add Overloads for parallel_scan with return value for ThreadVectorRange Jul 24, 2023
@masterleinad
Copy link
Contributor

Marking as draft since dependencies have not been merged.

@masterleinad masterleinad marked this pull request as draft August 9, 2023 21:19
@masterleinad masterleinad mentioned this pull request Sep 14, 2023
@masterleinad masterleinad force-pushed the 5635-hip-parallel-scan-with-value branch 2 times, most recently from b4feed9 to f8ecd2f Compare September 20, 2023 18:38
@cz4rs cz4rs force-pushed the 5635-hip-parallel-scan-with-value branch from f8ecd2f to 6c6a26a Compare September 27, 2023 09:58
@cz4rs cz4rs marked this pull request as ready for review September 27, 2023 10:06
Comment on lines +800 to +801
if (i - 1 < loop_boundaries.end && threadIdx.x > 0)
closure(i - 1, val, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the discussion #6235 (comment) for a similar fix in CUDA backend.

@fnrizzi
Copy link
Contributor

fnrizzi commented Sep 28, 2023

gcc timed out , rest passes

@@ -1016,7 +1016,7 @@ struct checkScan {
};
} // namespace VectorScanReducer

#if !(defined(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND) || defined(KOKKOS_ENABLE_HIP))
#if !defined(KOKKOS_IMPL_CUDA_CLANG_WORKAROUND)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works for HIP and CUDA nvcc, there is a high probability that it also works for CUDA clang. Not for this PR but I think it's worth to check if the still need this condition.

@crtrott crtrott merged commit 567524c into kokkos:develop Oct 4, 2023
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants