-
Notifications
You must be signed in to change notification settings - Fork 795
[SYCL][Doc] Fix work_group_scratch_memory example #20506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
The example in the specification was using an old API that we removed in intel#14785. Update the example and the spec wording to use `launch_config` from sycl_ext_oneapi_enqueue_functions to pass the property with the size of the scratch memory.
| local memory capacity as reported by `info::device::local_mem_size` | ||
| then the implementation must throw a synchronous exception with the | ||
| `errc::memory_allocation` error code from the kernel invocation command | ||
| (e.g. `parallel_for`). This check must take all APIs that allocation device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we have a check to confirm whether the SLM size is set for certain kernel?
Say, if a kernel calls the syclex::get_work_group_scratch_memory() in kernel body, but we do not set the syclex::work_group_scratch_size in nd_launch/parallel_for, will an exception be thrown?
I'm asking this because looks like there will be no any warning or error in current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, and it seems to me that it would be easy to implement. I think we just need a check here to see if WorkGroupMemorySize has been set by the user.
https://github.com/intel/llvm/blob/sycl/sycl/source/detail/scheduler/commands.cpp#L2474-L2481
We should be careful, though, not to throw an exception just because the user set WorkGroupMemorySize to 0. Setting the size to 0 should be allowed.
@AlexeySachkov and @steffenlarsen do you see any problem with this? I assume this code is executed synchronously by the same application thread that launches the kernel, right? Or, is this code executed asynchronously by another thread? If that is the case, the error would be an async error.
I think we could throw errc::memory_allocation in this scenario too, and we can distinguish this case from the out-of-memory case via the message string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head, it should be possible to make this a synchronous exception as the information about the existence of the implicit local argument is present both in the compiler-generated specializations of DeviceKernelInfo and the kernel properties. For SYCLBIN or RTC compiled kernels, we should have the kernel object ahead of enqueuing the kernel launch, which should be enough to find its properties. How much overhead the lookup for the latter would be is TBD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction: The information is only in the properties, so the implementation will have to look up the kernel's information. However, it should just be a single map lookup so it should be relatively cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code I point to in the link above already seems to have all the necessary information. Essentially, the code looks like:
if (/*kernel has the implicit local arg*/) {
/* Push "WorkGroupMemorySize" as an argument */
}
Therefore, the code already knows whether the kernel has the implicit argument, and it knows the value of the work_group_scratch_size property that was passed at launch time. Therefore, it seems like we just need to rewrite the code here like:
if (/*kernel has the implicit local arg*/) {
if (WorkGroupMemorySize == -1) {
/* report synchronous exception */
}
/* Push "WorkGroupMemorySize" as an argument */
}
And we need to trace back the calls to this code and arrange to pass -1 for WorkGroupMemorySize in the case when the property is not set. (Or, choose some other sentinel value other than -1.)
If we all agree, I think this should be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we could do it there, but it would have to be an async exception. If we are fine with that then it would definitely be the simplest solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to drive this to some sort of decision. I think there are three options:
-
We can add the error check I propose above. When I wrote that, I didn't realize that this would result in an asynchronous error, though. That means you won't get the error unless you call
queue::wait_and_throw, which makes the error a lot less useful. -
We could add an error check to the "submit" code path, so that you get a synchronous exception. (This means you will get the error even without calling either
queue::waitorqueue::wait_and_throw.) This is a much better error diagnostic. However, we are really trying to optimize the submit code path. Therefore, I don't think we should do this if it adds any significant code to the hot path. -
If the error check is expensive, maybe we can do the error check only if the user compiles with some special
-Dflag.
I think option (1) would not be a great solution. @steffenlarsen what are your thoughts on options (2) and (3)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go for (2) and see how expensive it will be. I don't suspect the implementation will be incredibly difficult and we could implement it selectively at first while we evaluate the performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'd like to create a separate spec PR with the wording about the error condition. Let's keep this PR focused on the example.
If everything else is good, can I get an approval on this one? Note that I just made a small clarification to the example in 1cf4ff9.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the PR that updates the wording on error conditions: #20651
These `constexpr` declarations aren't necessary, and they make the example more confusing. The purpose of this extension is to allocate a runtime-defined amount of local memory, so the memory size shouldn't be declared `constexpr`.
The example in the specification was using an old API that we removed in #14785. Update the example and the spec wording to use
launch_configfrom sycl_ext_oneapi_enqueue_functions to pass the property with the size of the scratch memory.