Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.7.01] fix incorrect offset in cuda parallel scan for < 4 byte types #5607

Conversation

dalg24
Copy link
Member

@dalg24 dalg24 commented Oct 31, 2022

Cherry-picking #5555 into RC 3.7.01

WARNING I had to do non-trivial conflict resolution in Kokkos_Cuda_Parallel_Range.hpp, Kokkos_HIP_Parallel_Range.hpp, and Kokkos_HIP_ReduceScan.hpp because of #5146 and #5383

Please review carefully.

…#5555)

* kokkos#5545: tests: add reproducer test for kokkos#5545

* kokkos#5545: cuda: use word_size_type for ParallelScan

* kokkos#5545: tests: add reproducer for offset issue with ParallelScanWithTotal

* kokkos#5545: cuda: add word_size_type for ParallelScanWithTotal

* fix formatting

* cuda: fix comments for word_size_type

* tests: fix mismatched types in Scan test assert

* Fix HIP parallel_scan when using types < 4 bytes

* Fix indentation

* Replace class with typename

* tests: re-use old scan test for small types; add extra parameter to avoid overflow

* tests: remove fence and change ImbalanceSz to value_type

* Fix format

* Re-introduce size_type member type in ParallelScan[WithTotal] CUDA specializations

Applied review suggestion to reduce the diff

* Nitpick in HIP too prefere size_type alias to HIP::size_type in definition of word_size_type

* Apply Dongs suggestion and revert one more change I missed

Co-authored-by: Bruno Turcksin <bruno.turcksin@gmail.com>
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
Co-authored-by: Damien L-G <dalg24@gmail.com>
@dalg24 dalg24 requested review from Rombur and nmm0 October 31, 2022 21:19
This was referenced Oct 31, 2022
Copy link
Contributor

@nmm0 nmm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line Kokkos_Cuda_ParallelRange.hpp:1060 (I can't comment directly on the diff since the line wasn't changed) I think we need:

m_scratch_space + (grid_x - 1) * size / sizeof(word_size_type),
              size);

size there is the value size so without this we may run into the same problem (offset of 0).

nmm0
nmm0 previously requested changes Nov 1, 2022
core/src/Cuda/Kokkos_Cuda_Parallel_Range.hpp Show resolved Hide resolved
@nmm0 nmm0 dismissed their stale review November 1, 2022 22:04

Pushed commits fixing issue

Copy link
Contributor

@nmm0 nmm0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good to go

@dalg24 dalg24 merged commit 3af6d0b into kokkos:release-candidate-3.7.01 Nov 2, 2022
@dalg24 dalg24 deleted the rc371-cherrypick5555-fixincorrectoffsetparallelscansmalltypescudaandhip branch May 17, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants