
Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy #6433

Conversation

masterleinad (Contributor):

Fixes #6398. Because the initialization is missing, we run into undefined behavior, and the issue is difficult to reproduce reliably in the test suite (although a stand-alone reproducer failed consistently). Without the fix, disabling other tests around it or executing them in a different order made the test pass or fail for me.

@masterleinad masterleinad marked this pull request as ready for review September 12, 2023 20:46
@dalg24 dalg24 added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Backend - CUDA labels Sep 13, 2023
// Requesting per team scratch memory for a largish number of teams, resulted
// in problems computing the correct scratch pointer due to missed
// initialization of the maximum number of scratch pad indices in the Cuda
// baackend.
Review comment (Member):

Suggested change
- // baackend.
+ // backend.

masterleinad (Contributor, Author):

I'll fix it if there are any other changes required/requested.

@dalg24 dalg24 merged commit 7e35f10 into kokkos:develop Sep 13, 2023
28 checks passed
dalg24 commented Sep 13, 2023

@masterleinad did you identify when this bug was introduced?

masterleinad (Contributor, Author):
> @masterleinad did you identify when this bug was introduced?

Looks like 5d93865 (#5814), where we introduced m_num_scratch_locks but only initialized it for ParallelReduce. That pull request first appeared in 4.1.00.

dalg24 commented Sep 13, 2023

Please open an issue listing the fixes that would go into a 4.0.01 patch release, if we do one, and add this fix to it.
