[SYCL][E2E] Add more tests for virtual functions #15067

AlexeySachkov · 2024-08-14T11:38:14Z

This commit doesn't bring exhaustive coverage for the feature, but it improves the situation by checking the following scenarios:

- using math built-ins from virtual functions
- using group barriers from virtual functions
- using virtual functions in nd-range kernels where every work-item calls a different virtual function
- using virtual functions when the code is scattered across several translation units

Some tests are disabled, because we do not support those scenarios yet and more changes are required to make them work.

aelovikov-intel · 2024-08-14T18:54:05Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+  SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(oneapi::indirectly_callable)
+  virtual int apply(int *LocalData, sycl::nd_item<1> It) {
+    LocalData[It.get_local_id()] += It.get_local_id();
+    sycl::group_barrier(It.get_group());


I'm afraid that people might copy-paste this example thoughtlessly in divergent control flow resulting in UB. I'm not sure if adding a comment here would be enough or if "convergent" functions should be prohibited under indirectly_callable by default and require explicit buy-in from the programmer (e.g. indirectly_callable_in_uniform_control_flow attribute).

I don't exactly share the concern. I.e. apply could have been a regular function which can also be blindly copy-pasted and called from a non-convergent/non-uniform context resulting in the very same UB.

Maybe using apply(int *, sycl::group) would be a better pattern? group in arguments is what the spec uses for such interfaces.

Thinking about it more: I can't just pass group only, because I need local IDs which aren't available in group. And nd_item already includes group, so passing them both together would be a bit weird.

I suppose that we should assume that if nd_item is passed, then some group operations can be performed.

There is group::get_local_id in core SYCL.

> There is group::get_local_id in core SYCL.

Didn't know that! Switched to use group instead of nd_item in 0ea83a8

The host reference calculation has also been fixed by that commit: I've verified it on CPU (the test passes there with a newer internal version of the OCL CPU RT).

aelovikov-intel · 2024-08-14T19:02:24Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+    // We can't call group_barrier on host and therefore here we have a
+    // reference function instead of calling the same methods on host.
+    for (size_t GID = 0; GID < G.size() / L.size(); ++GID) {
+      for (size_t LID = 0; LID < L.size(); ++LID)
+        HostData[GID * L.size() + LID] += LID;
+
+      int Res = (TestCase == 0) ? 0 : 1;
+      for (size_t LID = 0; LID < L.size(); ++LID) {
+        if (TestCase == 0)
+          Res += HostData[GID * L.size() + LID];
+        else
+          Res *= HostData[GID * L.size() + LID];
+      }
+
+      for (size_t LID = 0; LID < L.size(); ++LID)
+        HostData[GID * L.size() + LID] = Res;
+    }
+
+    sycl::host_accessor HostAcc(DataStorage);
+    for (size_t I = 0; I < HostData.size(); ++I)
+      assert(HostAcc[I] == HostData[I]);


To be honest, this requires some focus to understand... Can we use #ifdef __SYCL_DEVICE_ONLY__ to unify the paths instead?

I don't think that we can use #ifdefs here, because nd_item is not user-constructible, i.e. the difference between the host and device versions of the function would be too large.

But I will try to add some comments here which should help map this function to the apply functions that we have above.

The code was rewritten (11db515) to be closer to the apply functions that we have.

aelovikov-intel · 2024-08-14T19:03:12Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+  }
+};
+
+int main() try {


Wow, C++ never ceases to surprise me with something I didn't know before...
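The construct being remarked on appears to be the function-try-block in `int main() try {`, where the `try` keyword follows the parameter list and the handler covers the whole function body. A minimal host-only sketch of the same form on an ordinary function follows; `checked_double` and `run` are hypothetical names introduced for illustration and are not part of the test.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>

// Hypothetical helper that throws for a negative input; it stands in for
// any code in the function body that may raise an exception.
int checked_double(int Value) {
  if (Value < 0)
    throw std::runtime_error("negative input");
  return 2 * Value;
}

// Function-try-block: there is no separate try statement inside the body.
// The handler below catches anything thrown while the body executes.
int run(int Value) try {
  return checked_double(Value);
} catch (const std::exception &E) {
  std::cout << "Unexpected exception was thrown: " << E.what() << std::endl;
  return -1;
}
```

Calling `run(3)` returns the doubled value, while `run(-1)` reaches the handler and returns -1. The test uses the same shape on `main`, with `sycl::exception` as the caught type.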

aelovikov-intel · 2024-08-14T19:16:38Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+    q.submit([&](sycl::handler &CGH) {
+       CGH.single_task([=]() {
+         DeviceStorage->construct</* ret type = */ BaseOp>(TestCase);
+       });
+     }).wait_and_throw();


Why can't we just create a derived subclass normally and then pass it into the next kernel through its base class pointer? That would eliminate the dependency on "helpers.hpp" in these "uniform" tests.

The main reason the obj_storage_t helper was introduced is to make sure that the storage we allocated is large enough and has correct alignment.

As noted in #14209 (comment), attempting to construct an object in misaligned memory is UB.

Here we have two different classes whose instances we may construct: SumOp and MultipleOp. Even though they have the same layout, I would still prefer not to hardcode their size and alignment, but instead use this generic helper, which allows us to change them as we wish without worrying about alignment and allocation size.

Is it easier to write, or easier to debug when it fails because of a mistake made in some future PR? I won't insist on the change here, but IMO, over-complicating simple tests usually leads to manually simplifying them in the future whenever they catch regressions.
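To make the size and alignment argument in this thread concrete, here is a host-only sketch of a storage helper that derives both values from the types it may hold. It is an assumption about the general idea only: `ObjStorage`, its `construct` signature, `apply_selected`, and the class bodies are hypothetical and are not the `obj_storage_t` implementation from "helpers.hpp".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>

struct BaseOp {
  virtual ~BaseOp() = default;
  virtual int apply(int A, int B) const = 0;
};
struct SumOp : BaseOp {
  int apply(int A, int B) const override { return A + B; }
};
struct MultipleOp : BaseOp {
  int apply(int A, int B) const override { return A * B; }
};

// Hypothetical stand-in for a generic storage helper: the buffer is as large
// and as strictly aligned as the most demanding of the listed types.
template <typename... Ts> class ObjStorage {
  static constexpr std::size_t Size = std::max({sizeof(Ts)...});
  static constexpr std::size_t Align = std::max({alignof(Ts)...});
  alignas(Align) unsigned char Buffer[Size];

public:
  // Constructs the Index-th type of Ts in the buffer and returns it as Ret*.
  // Returns nullptr if Index does not name one of the types.
  template <typename Ret> Ret *construct(std::size_t Index) {
    Ret *Result = nullptr;
    std::size_t Current = 0;
    // Fold over the comma operator: only the matching type is constructed.
    ((Current++ == Index
          ? (Result = ::new (static_cast<void *>(Buffer)) Ts())
          : nullptr),
     ...);
    return Result;
  }
};

// Usage: pick the operation by test case and call it through the base class.
int apply_selected(std::size_t TestCase, int A, int B) {
  ObjStorage<SumOp, MultipleOp> Storage;
  BaseOp *Op = Storage.construct<BaseOp>(TestCase);
  assert(Op && "TestCase must name one of the stored types");
  int Result = Op->apply(A, B);
  Op->~BaseOp(); // objects created with placement new are destroyed manually
  return Result;
}
```

Adding a third operation with a larger layout would only require extending the template argument list, which is the flexibility the author argues for. The reviewer's alternative of constructing one concrete derived object directly is shorter, at the cost of hardcoding the type.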

wenju-he · 2024-09-19T03:22:48Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+} catch (sycl::exception &e) {
+  std::cout << "Unexpected exception was thrown: " << e.what() << std::endl;
+  return 1;
+}


This test generates two device images. One contains definitions of indirectly-callable functions and the other one contains kernel functions. In AOT mode they are not linked together before calling opencl-aot. When will this be fixed?

Yeah, AOT support for virtual functions is still incomplete; that will be addressed in separate PRs. For now it's a second priority, because there are plenty of bugs even on the JIT path.

aelovikov-intel · 2024-10-04T21:27:19Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+public:
+  SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(oneapi::indirectly_callable)
+  virtual int apply(int *LocalData, sycl::group<1> WG) {
+    LocalData[WG.get_local_id()] += WG.get_local_id();


This is read/write, but I'm not sure the "read" part is really important for this test. Can we change it to write-only (e.g. g.get_group_linear_id() + g.get_local_id())? Then we'd be able to create /* virtual ? */ int calc_ref_value(auto global_size, auto local_size) { return /* formula */ }.

That would simplify lines 109-157 a lot, and would also move the reference value compute close to the device code so that they'd fit in a single screen.

Good idea, thanks. I've applied that approach in f92ac85
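The suggestion in this thread can be modelled on the host. The sketch below assumes that each work-item writes `group id + local id` into its slot and that the sum variant then adds up the slots of its work-group; under that assumption the reference value has a closed form. The function names and the formula are illustrative assumptions and are not taken from f92ac85.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of one work-group. Phase 1: every work-item writes its own
// slot. The barrier is modelled by finishing phase 1 before phase 2 starts.
// Phase 2: the slots are summed.
std::size_t simulate_group_sum(std::size_t GroupId, std::size_t LocalSize) {
  std::vector<std::size_t> LocalData(LocalSize);
  for (std::size_t LID = 0; LID < LocalSize; ++LID)
    LocalData[LID] = GroupId + LID; // write-only, no dependence on old data

  std::size_t Res = 0;
  for (std::size_t LID = 0; LID < LocalSize; ++LID)
    Res += LocalData[LID];
  return Res;
}

// Closed-form reference for the sum variant:
// sum over LID in [0, L) of (G + LID) = L * G + L * (L - 1) / 2.
std::size_t calc_ref_sum(std::size_t GroupId, std::size_t LocalSize) {
  return LocalSize * GroupId + LocalSize * (LocalSize - 1) / 2;
}

// Checks the formula against the model for every work-group of a range.
bool reference_matches(std::size_t GlobalSize, std::size_t LocalSize) {
  for (std::size_t GID = 0; GID < GlobalSize / LocalSize; ++GID)
    if (simulate_group_sum(GID, LocalSize) != calc_ref_sum(GID, LocalSize))
      return false;
  return true;
}
```

Because the written value no longer depends on the previous buffer contents, the expected result can sit next to the device code as a single expression, which is the simplification the reviewer asks for.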

aelovikov-intel · 2024-10-04T21:32:18Z

sycl/test-e2e/VirtualFunctions/misc/group-barrier.cpp

+      }
+    }
+
+    return sycl::group_broadcast(WG, Res);


This likely contains another group_barrier inside. Would it make sense to change the code to store the leader's value in line 50, then have a barrier and then read leader's value in each of the WIs before returning?

Right, group_broadcast implies group_barrier. Considering that the test is named group-barrier, I've replaced group_broadcast with "manual broadcast" in b948b36

aelovikov-intel

I think the benefits of having these tests in-tree outweigh perfecting them in review, so formally LGTM.

aelovikov-intel · 2024-10-04T21:35:22Z

sycl/test-e2e/VirtualFunctions/misc/range-non-uniform-vf.cpp

+    q.submit([&](sycl::handler &CGH) {
+      sycl::accessor DataAcc(DataStorage, CGH, sycl::read_write);
+      CGH.parallel_for(R, props, [=](auto it) {
+        // Select VF that corresponds to this work-item


Suggested change

// Select VF that corresponds to this work-item

// Select virtual function that corresponds to this work-item

although I'm biased here as VF usually means "vector factor" to me.

We actually select an object and not a virtual function here, fixed in 21adc25
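The distinction made here, that a work-item selects an object and the virtual call then dispatches on that object's dynamic type, can be shown with a host-only sketch. The loop stands in for the work-items of a range; `Base`, `AddOne`, `Twice`, `Negate`, and `run_non_uniform` are hypothetical names and not the classes used in range-non-uniform-vf.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Base {
  virtual ~Base() = default;
  virtual int apply(int X) const = 0;
};
struct AddOne : Base {
  int apply(int X) const override { return X + 1; }
};
struct Twice : Base {
  int apply(int X) const override { return 2 * X; }
};
struct Negate : Base {
  int apply(int X) const override { return -X; }
};

// Each loop iteration models one work-item of a one-dimensional range.
std::vector<int> run_non_uniform(std::size_t Range) {
  AddOne A;
  Twice T;
  Negate N;
  const std::array<const Base *, 3> Objects = {&A, &T, &N};

  std::vector<int> Out(Range);
  for (std::size_t Id = 0; Id < Range; ++Id) {
    // Select the object that corresponds to this work-item; which virtual
    // function runs is then decided by the object's dynamic type.
    const Base *Obj = Objects[Id % Objects.size()];
    Out[Id] = Obj->apply(static_cast<int>(Id));
  }
  return Out;
}
```

Neighbouring iterations end up in different overriders, which is the non-uniform situation the test exercises on the device.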

AlexeySachkov · 2024-10-11T08:52:29Z

Considering that there is a formal approval and all of the most recent comments have been addressed, I've merged the PR. If there is any further feedback, I will address it in a follow-up PR.

Apply clang-format, add more descriptive comments and issue links

b2256e6

AlexeySachkov marked this pull request as ready for review August 14, 2024 17:45

AlexeySachkov requested a review from a team as a code owner August 14, 2024 17:45

AlexeySachkov requested a review from aelovikov-intel August 14, 2024 17:45

aelovikov-intel reviewed Aug 14, 2024

wenju-he reviewed Sep 10, 2024

AlexeySachkov added 2 commits September 16, 2024 09:02

Merge remote-tracking branch 'origin/sycl' into private/asachkov/more-vf-tests

bf7793d

Add missing broadcast

d4d0829

wenju-he reviewed Sep 19, 2024

AlexeySachkov added 5 commits September 26, 2024 02:31

Merge remote-tracking branch 'origin/sycl' into private/asachkov/more-vf-tests

1d2937e

Use better inputs so that result is not always zero

499552c

Rewrite group-barrier verification function so it is more aligned with the actual kernel

11db515

Outline common tests requirements into lit.local.cfg

2a73588

Drop custom async exception handler

90c9175

Switch to use group; Fix group-barrier host code

0ea83a8

Fix incorrect comparison; XFAIL -> UNSUPPORTED due to hang on GPU

b2c691d

aelovikov-intel reviewed Oct 4, 2024

aelovikov-intel approved these changes Oct 4, 2024

AlexeySachkov added 4 commits October 10, 2024 02:31

Merge remote-tracking branch 'origin/sycl' into private/asachkov/more-vf-tests

d0c67e5

Drop group_broadcast to be focused on group_barrier only

b948b36

Apply review comments to simplify the test

f92ac85

Apply code review comments

21adc25

AlexeySachkov merged commit 6ba05b7 into intel:sycl Oct 11, 2024
12 checks passed

AlexeySachkov deleted the private/asachkov/more-vf-tests branch October 11, 2024 08:52
