Skip to content

Conversation

@tahonermann
Copy link
Contributor

Clang's emerging support for the SYCL 2020 specification includes new driver command line options, predefined macros, and several C++11-style attributes that enable core language features intended to support implementation of SYCL run-time libraries. Included is documentation intended to provide an architectural overview with a target audience of SYCL implementers.

This update discards previous documentation from an earlier SYCL implementation effort that is unlikely to be correct or relevant for the new SYCL implementation effort.

Clang's emerging support for the SYCL 2020 specification includes
new driver command line options, predefined macros, and several
C++11-style attributes that enable core language features intended
to support implementation of SYCL run-time libraries. Included is
documentation intended to provide an architectural overview with
a target audience of SYCL implementers.

This update discards previous documentation from an earlier SYCL
implementation effort that is unlikely to be correct or relevant
for the new SYCL implementation effort.
@tahonermann tahonermann self-assigned this Dec 4, 2025
@tahonermann tahonermann added the SYCL https://registry.khronos.org/SYCL label Dec 4, 2025
Copy link
Contributor

@Fznamznon Fznamznon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should remove address space info from the doc. It still correctly describes the model used by SYCL compiler now.

clang++ -c -fsycl source-file.cpp
The choice of which target devices will be supported is made at compile time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this sentence. With single -fsycl flag the target is SPIR-V, an intermediate device-agnostic representation. The device for execution is chosen at runtime depending on the device selector used to create a queue or it is chosen automatically depending on the availability and SYCL runtime library heuristics. I.e. the front-end doesn't make any choices about which devices are supported/targeted, with SPIR-V anything supporting SPIR-V translation can be targeted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure whether it was worthwhile to distinguish between concrete and virtual device architectures and intermediate representations other than via some vague hand waving about AOT and JIT. SPIR-V targets aren't completely device agnostic since the LLVM IR is still constrained by ABI decisions (e.g., 32-bit vs 64-bit).

What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time. I'll try to clarify this. I'll look at adding some commentary about concrete vs virtual architectures in the "Supported Targets" section too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time.

I still don't quite get this.

Comment on lines +44 to +45
By default, SYCL source files will be compiled with support for a host target
dependent set of target devices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this sentence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to say that the device targets depend on the host?

target support will be enabled for ``spirv64-unknown-unknown`` devices.
The set of supported target devices can be specified via a comma separated list
of target triples with the `--offload-targets= <opt-offload-targets_>`_ option.
The following Clang invocation enables support for AMD, NVIDIA, and Intel GPU
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that will be supported any time soon. I'm not sure it makes sense to highlight it here.

With that point established, the tasks required to actually execute a SYCL
kernel are delegated according to the following division of responsibilities.

Clang is responsible for:
Copy link
Contributor

@Fznamznon Fznamznon Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang is also responsible for splitting "device code" and "host code" according to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_compilation_of_functions and validating that "device code" conforms to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:language.restrictions.kernels .

selected device.

The SYCL run-time library tasks are expected to be performed in conjunction
with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero;
with an offload backend such as liboffload, OpenCL, CUDA, HIP, or Level Zero;


The SYCL run-time library tasks are expected to be performed in conjunction
with an offload backend such as liboffload, OpenCL, CUDA, Hip, or Level Zero;
their details are out of scope for this document.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to mention explicitly that SYCL is simply an abstraction layer over some lower level offloading models. I.e. SYCL itself doesn't provide a compiler that emits GPU assembly. SYCL also doesn't provide a library that actually works with the target device directly. There is always some additional proxy layer aka offload backend.

kernel arguments.
The call to ``kernel_entry_point<KernelName>(kernelFunc)`` in the implementation
of ``single_task()`` results in an implicit call to ``sycl_kernel_launch`` that
looks similar to the following (the access to the captured copy of ``sout`` via
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is sout?

Depending on the compiler mode, ``multi_ptr`` will either decorate its internal
data with the address space attribute or not.
Continuing with the earlier example, assume that the ``sycl::stream`` type holds
a reference to a buffer and that the SYCL run-time implementation uses an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK SYCL buffer is a host only class, how about switching to accessor here to avoid possible confusion

Suggested change
a reference to a buffer and that the SYCL run-time implementation uses an
a reference to an accessor and that the SYCL run-time implementation uses an

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to refer to a buffer in a generic sense, not to the sycl::buffer class specifically. I can see how that is confusing though. I'm reluctant to pull in other SYCL types for fear that might increase confusion (e.g., does this situation only arise in conjunction with those other SYCL types?). I'll play with the wording to either add some distance from sycl::buffer or to change the example in some way.

Comment on lines +46 to +47
For example, when compiling for a ``x86_64-unknown-linux-gnu`` host target,
target support will be enabled for ``spirv64-unknown-unknown`` devices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads like target support for spirv64-unknown-unknown will only be enabled when x86_64-unknown-linux-gnu target is used. Is that the intention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that wasn't the intention. I'll try to reword.

=================
Support for SYCL is still in the implementation phase, but all targets
supported by the `--offload-targets= <opt-offload-targets_>`_ option
are intended to eventually be supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to try and wordsmith this to not say 'supported' so many times, but I failed spectacularly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 1: Support for SYCL is still in the implementation phase, but all targets available through the --offload-targets= <opt-offload-targets_>_ option are intended to eventually work with SYCL.

Option 2: Support for SYCL is still in the implementation phase, but all targets accepted by the --offload-targets= <opt-offload-targets_>_ option are intended to eventually be available for SYCL.


sycl_kernel_launch<KernelName>("kernel-entry-point", kernelFunc)(kernelFunc.sout)

The SYCL kernel name type, ``KernelName``, is passed as an explicit template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we might need to pass any code location information as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is other compile-time information that would be useful to make available to the sycl_kernel_launch function, now is definitely the time to discuss that. I have been a little concerned about the lack of extension points for passing such information.

As for code location, I presume you would be interested in the location from which the call to the SYCL kernel invocation function (e.g., single_task()) occurred? If so, there are limitations on what we can do there because some of the SYCL kernel invocation functions have a function parameter list that already ends with a parameter pack and can't accommodate passing std::source_location (or similar) as a default argument.

Comment on lines +342 to +358
.. code-block:: C++

namespace sycl {
class handler {
template <typename KernelName, typename... KernelArgs>
auto sycl_kernel_launch(const char *entryPointName, KernelArgs &... args) {
return [&] (auto &... subobjects) {
// Process the kernel arguments and special subobjects, schedule
// execution of 'entryPointName' on the selected device, and
// return a type-list object sepcifying additional parameters to
// add to the offload kernel entry point function (see below).
return detail::type_list<...>{};
};
}
...
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I'd restructure to include this code snippet early and then describe what it does. I was able to follow the description above only because I was already familiar with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that is a good idea. I was feeling like the gaps between the code fragments was getting large. I'll try to make changes to introduce the code earlier.

Do you think it might be better to combine all the code fragments into one code block rather than interspersing them with explanatory prose? That would make it easier to review the code, but I'm worried the prose might become too disconnected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start with the full snippet and, maybe, repeat parts of it when talking about something specific.

Comment on lines +349 to +352
// Process the kernel arguments and special subobjects, schedule
// execution of 'entryPointName' on the selected device, and
// return a type-list object sepcifying additional parameters to
// add to the offload kernel entry point function (see below).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Process the kernel arguments and special subobjects, schedule
// execution of 'entryPointName' on the selected device, and
// return a type-list object sepcifying additional parameters to
// add to the offload kernel entry point function (see below).
// Process the kernel arguments and special subobjects, schedule
// execution of 'entryPointName' on the selected device
process(...);
// return a type-list object sepcifying additional parameters to
// add to the offload kernel entry point function (see below).

Subjective, but I think that would slightly increase readability.


When translating calls to functions declared with the
`sycl_kernel_entry_point <attr-sycl_kernel_entry_point_>`_ attribute, Clang
inspects each kernel argument type to look for data members, catures, and base
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inspects each kernel argument type to look for data members, catures, and base
inspects each kernel argument type to look for data members, captures, and base

Comment on lines +501 to +506
For example, kernel arguments that contain a subobject of
``sycl::local_accessor`` type usually cannot be bit-copied to device memory for
use by a SYCL kernel; additional support from the offload backend is generally
required to manage their associated memory.
A SYCL run-time library implementation might therefore declare this type
similar to the following.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be an error to have a special type inside a special type? And if not, would clang communicate both or the outer only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good question and one I haven't given much consideration to. My immediate reaction is to either make it an error or to communicate both. I lean slightly towards communicating both.

I think communicating only the outer one is workable, but it would create a burden for the SYCL RT to handle and, since failure to handle it would be silent, seems somewhat bug prone. It would be especially difficult for the SYCL RT to handle if the inner type was somehow template dependent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how reporting both results in a more extensible design, but I'm unsure why we'd ever want that and error might be a better choice. Also, since the outer is special, we have full control over it and know why inner specials are there without FE's help.

Comment on lines +93 to +96
In some cases, it is useful to restrict compilation to just the host target or
just the device targets; the `-fsycl-host-only <opt-fsycl-host-only_>`_ and
`-fsycl-device-only <opt-fsycl-device-only_>`_ options are available for these
purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ability to pass some clang options (or at least clang -cc1 ones) to just host/device compiler would be very useful too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that. These options already exist as -Xarch_<arch>, -Xarch_device, and -Xarch_host. I'll check if they work. If they do and @mdtoguchi has no concerns, I'll add mention of them.

A related question is whether the documentation should mention --offload-device-only and --offload-host-only instead of -fsycl-device-only and -fsycl-host-only. My understanding is that they behave the same but that the direction is to move towards unified options for all the offloading languages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with usages of -Xarch_*, -Xarch_device and -Xarch_host. These options should work as intended.

Comment on lines +199 to +200
This is accomplished by, in the definition of a SYCL kernel invocation function,
including a call to a function declared with the attribute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
This is accomplished by, in the definition of a SYCL kernel invocation function,
including a call to a function declared with the attribute.
This is accomplished by including, in the definition of a SYCL kernel invocation
function, a call to a function declared with the attribute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is pretty confusing. I like Yuri's suggestion.

Comment on lines +325 to +326
The remaining arguments, for which there is just one in this case,
``kernelFunc``, are the kernel arguments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
The remaining arguments, for which there is just one in this case,
``kernelFunc``, are the kernel arguments.
The remaining arguments are the kernel arguments; in this case,
there is just one: ``kernelFunc``.

return [&] (auto &... subobjects) {
// Process the kernel arguments and special subobjects, schedule
// execution of 'entryPointName' on the selected device, and
// return a type-list object sepcifying additional parameters to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// return a type-list object sepcifying additional parameters to
// return a type-list object specifying additional parameters to

clang++ -c -fsycl source-file.cpp
The choice of which target devices will be supported is made at compile time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to convey is that the set of target devices that are eligible for use at run-time is constrained by choices made at compile-time.

I still don't quite get this.

Comment on lines +44 to +45
By default, SYCL source files will be compiled with support for a host target
dependent set of target devices.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to say that the device targets depend on the host?


The `SMCP`_ and `SSCP`_ compilation models require that code generation be
performed for each SYCL kernel for each target device.
In order for Clang to perform that code generation, it needs to be informed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this line.

Comment on lines +199 to +200
This is accomplished by, in the definition of a SYCL kernel invocation function,
including a call to a function declared with the attribute.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is pretty confusing. I like Yuri's suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

SYCL https://registry.khronos.org/SYCL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants