[SYCL][CUDA] Implement root group barrier #14828
Conversation
sycl::group_barrier(root);

if (it.get_group(0) % 2 == 0) {
    X += sycl::sin(X);
I don't see that we ever check either X or Y. Should we? Otherwise, why do we need this? How do we make sure it ran?
I wanted to explicitly delay some of the work-groups by giving them more work to do, because I've seen this test pass when an insufficient barrier was used. For instance, on the CUDA backend a work-group-wide barrier would be enough for it to pass, and that is not correct. I think this test should introduce some work-group divergence to check that we actually perform a GPU-wide barrier.

> How do we make sure it ran?

X and Y are declared as volatile, and my understanding is that this prevents the compiler from optimizing them away.
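For reference, a minimal sketch of the kernel structure this describes; the accumulator names, the extra-work loop, and the way `root` is obtained are assumptions, not the exact test code:

```c++
// Sketch only: assumes `it` is the nd_item and `root` is the root-group
// handle obtained from the root-group extension used by this test.
volatile float X = 1.0f;
volatile float Y = 1.0f;

// Delay every other work-group with extra work, so that a barrier weaker
// than a device-wide one (e.g. a plain work-group barrier) would let the
// fast groups run ahead and the test would fail.
if (it.get_group(0) % 2 == 0) {
  for (int i = 0; i < 1000; ++i)
    X += sycl::sin(X);
}

// The device-wide barrier under test.
sycl::group_barrier(root);

// Divergent work after the barrier as well; volatile keeps the compiler
// from optimizing the dummy accumulators away.
if (it.get_group(0) % 2 != 0) {
  for (int i = 0; i < 1000; ++i)
    Y += sycl::cos(Y);
}
```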
Right, can we have a comment explaining it? Otherwise we risk that this code will just be removed in the future by someone thinking it's not required.
LGTM.
LGTM
@intel/llvm-gatekeepers can you merge this, please?
This PR adds an algorithm for doing a GPU-wide barrier in the CUDA backend.
Rough outline of the algorithm:
- The 0th thread from each work-group performs `atomic.add(1)` on a global counter.
- That thread then does an `ld.acquire` in a loop until the loaded value is equal to the total amount of work-groups.
- All threads of the work-group then meet at a `barrier.sync`.

One caveat to this is that there is no initialization of the atomic start value. So if we call this barrier several times in a kernel, on the second iteration the start value will already contain the result from the previous barrier. That's why we actually spin the while loop while `current value % totalWgroups != 0`.
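A minimal CUDA C++ sketch of that scheme is below. It is not the code the backend emits: the PTX-level `atom.add`/`ld.acquire` sequence is approximated with `atomicAdd`, volatile loads and `__threadfence()`, and the counter and function names are made up for illustration:

```c++
// Hypothetical illustration of the barrier scheme described above.
__device__ unsigned arrival_counter; // zero-initialized once, never reset between barriers

__device__ void root_group_barrier(unsigned total_wgroups) {
  if (threadIdx.x == 0) {
    __threadfence();                  // release side: make prior writes visible device-wide
    atomicAdd(&arrival_counter, 1u);  // step 1: announce this work-group's arrival
    // Step 2: spin until every work-group has arrived. Since the counter is
    // never re-initialized, after k barriers it holds k * total_wgroups, so
    // the exit condition is "% total_wgroups == 0" rather than "== total_wgroups".
    while (*(volatile unsigned *)&arrival_counter % total_wgroups != 0)
      ;
    __threadfence();                  // acquire side: order the spin loads before later accesses
  }
  __syncthreads();                    // step 3: rest of the work-group waits for thread 0
}
```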