
Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy #6433

Conversation

masterleinad (Contributor):

Fixes #6398. Because the initialization is missing, we run into undefined behavior, and the issue is difficult to reproduce reliably in the test suite (although a stand-alone reproducer failed consistently). Without the fix, disabling other tests around it or executing them in a different order made the test pass or fail for me.

@masterleinad masterleinad marked this pull request as ready for review September 12, 2023 20:46
@dalg24 dalg24 added Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) Backend - CUDA labels Sep 13, 2023
// Requesting per team scratch memory for a largish number of teams, resulted
// in problems computing the correct scratch pointer due to missed
// initialization of the maximum number of scratch pad indices in the Cuda
// baackend.
Review comment (Member):

Suggested change
- // baackend.
+ // backend.

masterleinad (Contributor, Author):

I'll fix it if there are any other changes required/requested.

@dalg24 dalg24 merged commit 7e35f10 into kokkos:develop Sep 13, 2023
28 checks passed
dalg24 commented Sep 13, 2023

@masterleinad did you identify when this bug was introduced?

masterleinad (Contributor, Author):
> @masterleinad did you identify when this bug was introduced?

Looks like 5d93865 (#5814), where we introduced m_num_scratch_locks but only initialized it for ParallelReduce. That pull request first appeared in 4.1.00.

dalg24 commented Sep 13, 2023

Please open an issue listing the fixes that would go into a 4.0.01 patch release, if we do one, and add this fix to it.
