OpenMPTarget: scratch implementation for parallel_reduce #3776
Conversation
…s avoided in the OpenMPTarget backend.
Essentially copy/paste from the code in the ParallelFor specialization. The code duplication is unfortunate, but I don't necessarily have a better idea.
I am unsure how the lock_array would play with streams, but I guess we have the same issue with parallel_for.
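For context on the scheme being copied: each team claims one slot in the shared `lock_array` before it touches its scratch block and releases it afterwards. Below is a minimal host-side sketch of that claim/release idea, using `std::atomic` in place of the device atomics; the function names are illustrative and not the backend's API.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Host-side model of the lock_array protocol; device code would use the
// backend's atomics instead of std::atomic.
int claim_scratch_block(std::vector<std::atomic<int>>& lock_array, int team_id) {
  const int max_active_teams = static_cast<int>(lock_array.size());
  int iter = team_id % max_active_teams;  // start probing at "our" slot
  while (true) {
    int expected = 0;
    // A successful 0 -> 1 transition means this team now owns block `iter`.
    if (lock_array[iter].compare_exchange_strong(expected, 1)) return iter;
    iter = (iter + 1) % max_active_teams;  // slot busy, probe the next one
  }
}

void release_scratch_block(std::vector<std::atomic<int>>& lock_array, int block_index) {
  // Reset to 0 so the slot can be reused by the next team/kernel.
  lock_array[block_index].store(0);
}

int main() {
  std::vector<std::atomic<int>> lock_array(4);
  for (auto& slot : lock_array) slot.store(0);  // all slots start free
  const int idx = claim_scratch_block(lock_array, /*team_id=*/7);
  assert(idx >= 0 && idx < 4);
  release_scratch_block(lock_array, idx);
  return 0;
}
```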
Add FIXMEs in the unit tests
Make sure you add a comment when it is not obvious
while (lock_team != 1) {
  // Avoid tripping over other team's block index.
  // Find a free block index.
  iter = (omp_get_team_num() % max_active_teams);
Shouldn't this happen outside the while loop? Is that also wrong in the ParallelFor?
Also, why do we need two while loops? Isn't the only way to escape the inner while loop for lock_team == 1 to hold, in which case shmem_block_index != -1?
Yes, the additional loop is not needed. Deleted it.
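Following up on that resolution, here is a hedged sketch of what the collapsed acquisition loop could look like. The variable names follow the snippet above, and `Kokkos::atomic_compare_exchange` returns the old value, so 0 signals success; this is illustrative only, not the exact patch.

```cpp
#include <Kokkos_Core.hpp>
#include <omp.h>

// Illustrative only: acquire a scratch-block index with a single loop; the
// only exit is a successful 0 -> 1 compare-exchange on some slot.
inline int acquire_scratch_block_index(int* lock_array, int max_active_teams) {
  int shmem_block_index = -1;
  int team_num          = omp_get_team_num();

  while (shmem_block_index == -1) {
    // Probe a slot derived from the team id.
    const int iter = team_num % max_active_teams;

    // atomic_compare_exchange returns the previous value, so 0 means this
    // call just acquired the lock on slot `iter`.
    const int lock_team = Kokkos::atomic_compare_exchange(&lock_array[iter], 0, 1);

    if (lock_team == 0)
      shmem_block_index = iter;
    else
      ++team_num;  // slot busy, try the next slot on the next iteration
  }
  return shmem_block_index;
}
```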
#pragma omp target teams distribute parallel for is_device_ptr(lock_array) | ||
for (int i = 0; i < max_active_teams; ++i) { | ||
lock_array[i] = 0; | ||
} |
Why do this again and again? (I probably overlooked that in the ParallelFor too.) We only need to do it when reallocating (so get_lock_array could do that), since otherwise this should already be back to zero: a kernel will never leave it non-zero unless it crashes.
Yes, I was initializing it in case a previous kernel crashed. But I guess we don't need that, since if a kernel crashes the application would quit anyway.
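A hedged sketch of the alternative floated here: the helper name `get_lock_array` comes from the comment above, but the (re)allocation logic below is illustrative and not the backend's actual implementation.

```cpp
#include <omp.h>
#include <cstddef>

// Illustrative only: zero the lock array just once, when it is (re)allocated,
// instead of before every kernel launch. Kernels are expected to release every
// slot they acquired (set it back to 0) before they finish.
static int* g_lock_array     = nullptr;
static int g_lock_array_size = 0;

int* get_lock_array(int max_active_teams) {
  if (max_active_teams > g_lock_array_size) {
    const int device = omp_get_default_device();
    if (g_lock_array != nullptr) omp_target_free(g_lock_array, device);

    g_lock_array = static_cast<int*>(
        omp_target_alloc(max_active_teams * sizeof(int), device));
    g_lock_array_size = max_active_teams;

    // Initialize only on (re)allocation; afterwards the slots stay at 0
    // between kernels unless a kernel crashes.
    int* lock_array = g_lock_array;
    const int n     = g_lock_array_size;
#pragma omp target teams distribute parallel for is_device_ptr(lock_array)
    for (int i = 0; i < n; ++i) lock_array[i] = 0;
  }
  return g_lock_array;
}
```

The trade-off is exactly the one discussed above: this relies on every kernel restoring its slots to 0, which holds unless a kernel aborts mid-flight.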
// Loop as long as a shmem_block_index is not found.
while (shmem_block_index == -1) {
  // Loop until a lock can be acquired on a team
  while (lock_team != 1) {
same as above
…llelReduce` when calculating the shared block index.
Updates in this PR: