-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[SYCL] Documentation for Clang driver and core language support. #170602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Clang's emerging support for the SYCL 2020 specification includes new driver command line options, predefined macros, and several C++11-style attributes that enable core language features intended to support implementation of SYCL run-time libraries. Included is documentation intended to provide an architectural overview with a target audience of SYCL implementers. This update discards previous documentation from an earlier SYCL implementation effort that is unlikely to be correct or relevant for the new SYCL implementation effort.
Fznamznon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove address space info from the doc. It still correctly describes the model used by SYCL compiler now.
| clang++ -c -fsycl source-file.cpp | ||
| The choice of which target devices will be supported is made at compile time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of this sentence. With single -fsycl flag the target is SPIR-V, an intermediate device-agnostic representation. The device for execution is chosen at runtime depending on the device selector used to create a queue or it is chosen automatically depending on the availability and SYCL runtime library heuristics. I.e. the front-end doesn't make any choices about which devices are supported/targeted, with SPIR-V anything supporting SPIR-V translation can be targeted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unsure whether it was worthwhile to distinguish between concrete and virtual device architectures and intermediate representations other than via some vague hand waving about AOT and JIT. SPIR-V targets aren't completely device agnostic since the LLVM IR is still constrained by ABI decisions (e.g., 32-bit vs 64-bit).
What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time. I'll try to clarify this. I'll look at adding some commentary about concrete vs virtual architectures in the "Supported Targets" section too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time.
I still don't quite get this.
| By default, SYCL source files will be compiled with support for a host target | ||
| dependent set of target devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to say that the device targets depend on the host?
| target support will be enabled for ``spirv64-unknown-unknown`` devices. | ||
| The set of supported target devices can be specified via a comma separated list | ||
| of target triples with the `--offload-targets= <opt-offload-targets_>`_ option. | ||
| The following Clang invocation enables support for AMD, NVIDIA, and Intel GPU |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that will be supported any time soon. I'm not sure it makes sense to highlight it here.
| With that point established, the tasks required to actually execute a SYCL | ||
| kernel are delegated according to the following division of responsibilities. | ||
|
|
||
| Clang is responsible for: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang is also responsible for splitting "device code" and "host code" according to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_compilation_of_functions and validating that "device code" conforms to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:language.restrictions.kernels .
| selected device. | ||
|
|
||
| The SYCL run-time library tasks are expected to be performed in conjunction | ||
| with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero; | |
| with an offload backend such as liboffload, OpenCL, CUDA, HIP, or Level Zero; |
|
|
||
| The SYCL run-time library tasks are expected to be performed in conjunction | ||
| with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero; | ||
| their details are out of scope for this document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably makes sense to mention explicitly that SYCL is simply an abstraction layer over some lower level offloading models. I.e. SYCL itself doesn't provide a compiler that emits GPU assembly. SYCL also doesn't provide a library that actually works with the target device directly. There is always some additional proxy layer aka offload backend.
| kernel arguments. | ||
| The call to ``kernel_entry_point<KernelName>(kernelFunc)`` in the implementation | ||
| of ``single_task()`` results in an implicit call to ``sycl_kernel_launch`` that | ||
| looks similar to the following (the access to the captured copy of ``sout`` via |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is sout?
| Depending on the compiler mode, ``multi_ptr`` will either decorate its internal | ||
| data with the address space attribute or not. | ||
| Continuing with the earlier example, assume that the ``sycl::stream`` type holds | ||
| a reference to a buffer and that the SYCL run-time implementation uses an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK SYCL buffer is a host only class, how about switching to accessor here to avoid possible confusion
| a reference to a buffer and that the SYCL run-time implementation uses an | |
| a reference to an accessor and that the SYCL run-time implementation uses an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intent was to refer to a buffer in a generic sense, not to the sycl::buffer class specifically. I can see how that is confusing though. I'm reluctant to pull in other SYCL types for fear that might increase confusion (e.g., does this situation only arise in conjunction with those other SYCL types?). I'll play with the wording to either add some distance from sycl::buffer or to change the example in some way.
| For example, when compiling for a ``x86_64-unknown-linux-gnu`` host target, | ||
| target support will be enabled for ``spirv64-unknown-unknown`` devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads like target support for spirv64-unknown-unknown will only be enabled when x86_64-unknown-linux-gnu target is used. Is that the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that wasn't the intention. I'll try to reword.
| ================= | ||
| Support for SYCL is still in the implementation phase, but all targets | ||
| supported by the `--offload-targets= <opt-offload-targets_>`_ option | ||
| are intended to eventually be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to try and wordsmith this to not say 'supported' so many times, but I failed spectacularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 1: Support for SYCL is still in the implementation phase, but all targets available through the --offload-targets= <opt-offload-targets_>_ option are intended to eventually work with SYCL.
Option 2: Support for SYCL is still in the implementation phase, but all targets accepted by the --offload-targets= <opt-offload-targets_>_ option are intended to eventually be available for SYCL.
|
|
||
| sycl_kernel_launch<KernelName>("kernel-entry-point", kernelFunc)(kernelFunc.sout) | ||
|
|
||
| The SYCL kernel name type, ``KernelName``, is passed as an explicit template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we might need to pass any code location information as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is other compile-time information that would be useful to make available to the sycl_kernel_launch function, now is definitely the time to discuss that. I have been a little concerned about the lack of extension points for passing such information.
As for code location, I presume you would be interested in the location from which the call to the SYCL kernel invocation function (e.g., single_task()) occurred? If so, there are limitations on what we can do there because some of the SYCL kernel invocation functions have a function parameter list that already ends with a parameter pack and can't accommodate passing std::source_location (or similar) as a default argument.
| .. code-block:: C++ | ||
|
|
||
| namespace sycl { | ||
| class handler { | ||
| template <typename KernelName, typename... KernelArgs> | ||
| auto sycl_kernel_launch(const char *entryPointName, KernelArgs &... args) { | ||
| return [&] (auto &... subobjects) { | ||
| // Process the kernel arguments and special subobjects, schedule | ||
| // execution of 'entryPointName' on the selected device, and | ||
| // return a type-list object sepcifying additional parameters to | ||
| // add to the offload kernel entry point function (see below). | ||
| return detail::type_list<...>{}; | ||
| }; | ||
| } | ||
| ... | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, I'd restructure to include this code snippet early and then describe what it does. I was able to follow the description above only because I was already familiar with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that is a good idea. I was feeling like the gaps between the code fragments was getting large. I'll try to make changes to introduce the code earlier.
Do you think it might be better to combine all the code fragments into one code block rather than interspersing them with explanatory prose? That would make it easier to review the code, but I'm worried the prose might become too disconnected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start with the full snippet and, maybe, repeat parts of it when talking about something specific.
| // Process the kernel arguments and special subobjects, schedule | ||
| // execution of 'entryPointName' on the selected device, and | ||
| // return a type-list object sepcifying additional parameters to | ||
| // add to the offload kernel entry point function (see below). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Process the kernel arguments and special subobjects, schedule | |
| // execution of 'entryPointName' on the selected device, and | |
| // return a type-list object sepcifying additional parameters to | |
| // add to the offload kernel entry point function (see below). | |
| // Process the kernel arguments and special subobjects, schedule | |
| // execution of 'entryPointName' on the selected device | |
| process(...); | |
| // return a type-list object sepcifying additional parameters to | |
| // add to the offload kernel entry point function (see below). |
Subjective, but I think that would slightly increase readability.
|
|
||
| When translating calls to functions declared with the | ||
| `sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute, Clang | ||
| inspects each kernel argument type to look for data members, catures, and base |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| inspects each kernel argument type to look for data members, catures, and base | |
| inspects each kernel argument type to look for data members, captures, and base |
| For example, kernel arguments that contain a subobject of | ||
| ``sycl::local_accessor`` type usually cannot be bit-copied to device memory for | ||
| use by a SYCL kernel; additional support from the offload backend is generally | ||
| required to manage their associated memory. | ||
| A SYCL run-time library implementation might therefore declare this type | ||
| similar to the following. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be an error to have a special type inside a special type? And if not, would clang communicate both or the outer only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question and one I haven't given much consideration to. My immediate reaction is to either make it an error or to communicate both. I lean slightly towards communicating both.
I think communicating only the outer one is workable, but it would create a burden for the SYCL RT to handle and, since failure to handle it would be silent, seems somewhat bug prone. It would be especially difficult for the SYCL RT to handle if the inner type was somehow template dependent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see how reporting both results in a more extensible design, but I'm unsure why we'd ever want that and error might be a better choice. Also, since the outer is special, we have full control over it and know why inner specials are there without FE's help.
| In some cases, it is useful to restrict compilation to just the host target or | ||
| just the device targets; the `-fsycl-host-only <opt-fsycl-host-only_>`_ and | ||
| `-fsycl-device-only <opt-fsycl-device-only_>`_ options are available for these | ||
| purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ability to pass some clang options (or at least clang -cc1 ones) to just host/device compiler would be very useful too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that. These options already exist as -Xarch_<arch>, -Xarch_device, and -Xarch_host. I'll check if they work. If they do and @mdtoguchi has no concerns, I'll add mention of them.
A related question is whether the documentation should mention --offload-device-only and --offload-host-only instead of -fsycl-device-only and -fsycl-host-only. My understanding is that they behave the same but that the direction is to move towards unified options for all the offloading languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no issue with usages of -Xarch_*, -Xarch_device and -Xarch_host. These options should work as intended.
| This is accomplished by, in the definition of a SYCL kernel invocation function, | ||
| including a call to a function declared with the attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| This is accomplished by, in the definition of a SYCL kernel invocation function, | |
| including a call to a function declared with the attribute. | |
| This is accomplished by including, in the definition of a SYCL kernel invocation | |
| function, a call to a function declared with the attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is pretty confusing. I like Yuri's suggestion.
| The remaining arguments, for which there is just one in this case, | ||
| ``kernelFunc``, are the kernel arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
| The remaining arguments, for which there is just one in this case, | |
| ``kernelFunc``, are the kernel arguments. | |
| The remaining arguments are the kernel arguments; in this case, | |
| there is just one: ``kernelFunc``. |
| return [&] (auto &... subobjects) { | ||
| // Process the kernel arguments and special subobjects, schedule | ||
| // execution of 'entryPointName' on the selected device, and | ||
| // return a type-list object sepcifying additional parameters to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // return a type-list object sepcifying additional parameters to | |
| // return a type-list object specifying additional parameters to |
| clang++ -c -fsycl source-file.cpp | ||
| The choice of which target devices will be supported is made at compile time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time.
I still don't quite get this.
| By default, SYCL source files will be compiled with support for a host target | ||
| dependent set of target devices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to say that the device targets depend on the host?
|
|
||
| The `SMCP`_ and `SSCP`_ compilation models require that code generation be | ||
| performed for each SYCL kernel for each target device. | ||
| In order for Clang to perform that code generation, it needs to be informed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this line.
| This is accomplished by, in the definition of a SYCL kernel invocation function, | ||
| including a call to a function declared with the attribute. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is pretty confusing. I like Yuri's suggestion.
Clang's emerging support for the SYCL 2020 specification includes new driver command line options, predefined macros, and several C++11-style attributes that enable core language features intended to support implementation of SYCL run-time libraries. Included is documentation intended to provide an architectural overview with a target audience of SYCL implementers.
This update discards previous documentation from an earlier SYCL implementation effort that is unlikely to be correct or relevant for the new SYCL implementation effort.