Copyright © 2019 Intel Corporation. All rights reserved.
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.
To report problems with this extension, please open a new issue at:
This extension is written against the SYCL 2020 revision 4 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.
This extension also depends on the following other SYCL extensions:
This is a proposed extension specification, intended to gather community feedback. Interfaces defined in this specification may not be implemented yet or may be in a preliminary state. The specification itself may also change in incompatible ways before it is finalized. Shipping software products should not rely on APIs defined in this specification.
SYCL provides a mechanism to set a required sub-group size for a kernel via an attribute, and the sycl_ext_oneapi_kernel_properties extension provides an equivalent property.
Either mechanism is sufficient when tuning individual kernels for specific devices, but their usage quickly becomes complicated in real-life scenarios because:
- An integral sub-group size must be provided at host compile-time.
- The sub-group sizes supported by a device are not known until run-time.
- It is common for the same sub-group size to be used for all kernels (e.g. because the sub-group size is reflected in data structures).
Applications wishing to write portable sub-group code that can target multiple architectures must therefore multi-version their C++ code (e.g. via templates), dispatch to the correct kernel(s) based on the result of a run-time query, and repeat this process for every kernel individually.
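The multi-versioning burden described above can be illustrated with a plain C++ sketch. It is a simplified model, not SYCL code: `launch_kernel` and `dispatch` are hypothetical names, the kernel body is replaced by a function that just returns its compile-time size, and the `supported` vector stands in for the result of a run-time device query.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a kernel that must be instantiated once per
// sub-group size. A real SYCL kernel would carry the size as an attribute
// or property; here it is simply returned so the choice can be observed.
template <std::uint32_t SubGroupSize>
std::uint32_t launch_kernel() {
  return SubGroupSize;
}

// Hypothetical dispatcher: selects the largest pre-instantiated size that
// the device reports as supported. `supported` models a run-time query.
inline std::uint32_t dispatch(const std::vector<std::uint32_t>& supported) {
  auto has = [&](std::uint32_t s) {
    return std::find(supported.begin(), supported.end(), s) != supported.end();
  };
  if (has(32)) return launch_kernel<32>();
  if (has(16)) return launch_kernel<16>();
  if (has(8)) return launch_kernel<8>();
  throw std::runtime_error("no instantiated sub-group size is supported");
}
```

Every kernel in an application would need its own copy of this dispatch logic, which is the repetition the extension is intended to remove.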
This extension aims to simplify the process of using sub-groups by introducing the notion of named sub-group sizes, allowing developers to request a sub-group size that meets certain requirements at host compile-time and deferring the selection of a specific sub-group size until the kernel is compiled for a specific device.
This extension also defines the default behavior of sub-groups in SYCL code to improve the out-of-the-box experience for new developers, without preventing experts and existing developers from requesting the existing compiler behavior.
This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro’s value to determine which of the extension’s features the implementation supports.
Value | Description |
---|---|
1 | The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. |
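As a minimal sketch of how an application might test the macro, the helper below reports whether it is defined. The function name `named_sub_group_size_support` is hypothetical. In a real SYCL program the check would normally follow inclusion of the SYCL header; the sketch omits it so that it also compiles with a compiler that has no SYCL support, in which case the fallback branch is taken.

```cpp
#include <string>

// Reports whether the extension's feature-test macro is defined. When the
// macro is absent the fallback branch is the only one compiled.
inline std::string named_sub_group_size_support() {
#if defined(SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES)
  return "supported, macro value " +
         std::to_string(SYCL_EXT_ONEAPI_NAMED_SUB_GROUP_SIZES);
#else
  return "not supported";
#endif
}
```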
Much of the behavior related to sub-groups in SYCL 2020 is implementation-defined. Different kernels may use different sub-group sizes, and even the same kernel may use different sub-group sizes on some devices (e.g. for different ND-range launch configurations).
The extension introduces simpler behavior for sub-groups:
- If no sub-group size property appears on a kernel or SYCL_EXTERNAL function, the default behavior of an implementation must be to compile and execute the kernel or function using a device’s primary sub-group size. The primary sub-group size must be compatible with all core language features.
- If a developer does not require a stable sub-group size across all kernels and kernel launches, they can explicitly request an automatic sub-group size chosen by the implementation.
- Implementations are free to provide mechanisms which override the default sub-group behavior (e.g. via compiler flags), but developers must use this mechanism explicitly in order to opt-in to any change in behavior.
A new info::device::primary_sub_group_size device query is introduced to query a device’s primary sub-group size.
Device Descriptor | Return Type | Description |
---|---|---|
info::device::primary_sub_group_size | | Return a sub-group size supported by this device that is guaranteed to support all core language features for the device. |
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {
struct named_sub_group_size {
  static constexpr uint32_t primary = /* unspecified */;
  static constexpr uint32_t automatic = /* unspecified */;
};
inline constexpr sub_group_size_key::value_t<named_sub_group_size::primary> sub_group_size_primary;
inline constexpr sub_group_size_key::value_t<named_sub_group_size::automatic> sub_group_size_automatic;
} // namespace experimental
} // namespace oneapi
} // namespace ext
} // namespace sycl
Note: The named sub-group size properties are deliberately designed to reuse as much of the existing sub_group_size property infrastructure as possible. Implementations are free to choose the integral value associated with each named sub-group type, but it is expected that many implementations will use values like 0 (which is otherwise not a meaningful sub-group size) or -1 (which would otherwise correspond to a sub-group size so large it is unlikely any device would support it).
Property | Description |
---|---|
|
The |
|
The |
At most one of the sub_group_size, sub_group_size_primary and sub_group_size_automatic properties may be associated with a kernel or device function.
Note: No special handling is required to detect this case, since sub_group_size_primary and sub_group_size_automatic are simply named shorthands for properties associated with sub_group_size_key.
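The reason no special handling is needed can be sketched with a simplified model of the property machinery. This is an illustration only, not the real implementation: the reserved values 0 and 0xFFFFFFFF are assumptions, and `count_key` and `at_most_one_sub_group_size` are hypothetical helpers. Because all three properties share one key type, a generic "one property per key" check already rejects any combination of them.

```cpp
#include <cstdint>
#include <type_traits>

// Simplified model: every sub-group size property is a value_t of one key.
struct sub_group_size_key {
  template <std::uint32_t S>
  struct value_t {
    using key_t = sub_group_size_key;
    static constexpr std::uint32_t value = S;
  };
};

// Assumed reserved values; the extension leaves them unspecified.
struct named_sub_group_size {
  static constexpr std::uint32_t primary = 0;
  static constexpr std::uint32_t automatic = 0xFFFFFFFFu;
};

template <std::uint32_t S>
inline constexpr sub_group_size_key::value_t<S> sub_group_size{};
inline constexpr sub_group_size_key::value_t<named_sub_group_size::primary>
    sub_group_size_primary{};
inline constexpr sub_group_size_key::value_t<named_sub_group_size::automatic>
    sub_group_size_automatic{};

// Counts how many properties in the pack are associated with Key.
template <typename Key, typename... Props>
constexpr int count_key() {
  return (0 + ... + (std::is_same_v<typename Props::key_t, Key> ? 1 : 0));
}

// True when the given properties contain at most one sub-group size property.
template <typename... Props>
constexpr bool at_most_one_sub_group_size(const Props&...) {
  return count_key<sub_group_size_key, Props...>() <= 1;
}
```

In this model, combining `sub_group_size<16>` with `sub_group_size_primary` is rejected by the same check that rejects two integral sizes.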
The sub_group_size, sub_group_size_primary and sub_group_size_automatic properties can be associated with a kernel launch using one of the overloaded kernel invocation commands or associated with a kernel definition using the get(properties_tag) mechanism. The properties can be associated with a device function using the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.
There are special requirements whenever a device function defined in one translation unit makes a call to a device function that is defined in a second translation unit. In such a case, the second device function is always declared using SYCL_EXTERNAL. If the kernel calling these device functions is defined using a sub-group size property, the functions declared using SYCL_EXTERNAL must be similarly decorated to ensure that the same sub-group size is used. This decoration must exist in both the translation unit making the call and also in the translation unit that defines the function. If the sub-group size property is missing in the translation unit that makes the call, or if the sub-group size of the called function does not match the sub-group size of the calling function, the program is ill-formed and the compiler must raise a diagnostic.
Note that a compiler may choose a different sub-group size for each kernel and SYCL_EXTERNAL function using an automatic sub-group size. If kernels with an automatic sub-group size call SYCL_EXTERNAL functions using an automatic sub-group size, the program may be ill-formed. The behavior when SYCL_EXTERNAL is used in conjunction with an automatic sub-group size is implementation-defined, and code relying on specific behavior should not be expected to be portable across implementations. If a kernel calls a SYCL_EXTERNAL function with an incompatible sub-group size, the compiler must raise a diagnostic. It is expected that this diagnostic will be raised at link time, since this is the first time the compiler will see both translation units together.
This non-normative section describes command line flags that the DPC++ compiler supports. Other compilers are free to provide their own command line flags (if any).
The -fsycl-default-sub-group-size flag controls the default sub-group size used within a translation unit, which applies to all kernels and SYCL_EXTERNAL functions without an explicitly specified sub-group size.
If the argument passed to -fsycl-default-sub-group-size is an integer S, all kernels and SYCL_EXTERNAL functions without an explicitly specified sub-group size are compiled as-if sub_group_size<S> was specified as a property of that kernel or function.
If the argument passed to -fsycl-default-sub-group-size is a string NAME, all kernels and SYCL_EXTERNAL functions without an explicitly specified sub-group size are compiled as-if sub_group_size_NAME was specified as a property of that kernel or function.
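The two cases above can be summarized with a small model of the mapping from flag argument to implied property. The helper `default_property_for` is hypothetical and is not part of DPC++; it only mirrors the rule as stated, treating an all-digit argument as the integer S and anything else as NAME.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the spelling of the property that a given
// flag argument implies for kernels without an explicit sub-group size.
inline std::string default_property_for(const std::string& arg) {
  bool all_digits = !arg.empty();
  for (unsigned char c : arg) {
    if (!std::isdigit(c)) all_digits = false;
  }
  if (all_digits) return "sub_group_size<" + arg + ">";  // integer S
  return "sub_group_size_" + arg;                        // string NAME
}
```

Under this reading, an argument of `primary` or `automatic` selects the corresponding named property defined by this extension.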
This non-normative section provides information about one possible implementation of this extension. It is not part of the specification of the extension’s API.
The existing mechanism of describing a required sub-group size in SPIR-V may need to be augmented to support named sub-group sizes. The existing sub-group size descriptors could be used with reserved values (similar to the template arguments in the properties), or new descriptors could be created for each case.
Device compilers will need to be taught to interpret these named sub-group sizes as equivalent to a device-specific integral sub-group size at compile-time.
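One way to picture this translation step is the sketch below, which assumes the reserved-value approach with 0 for "primary" and the maximum 32-bit value (the unsigned form of -1) for "automatic". The constants, the `device_model` struct and `resolve_sub_group_size` are all hypothetical; a real device compiler would obtain the primary size and its own heuristic choice from its target description.

```cpp
#include <cstdint>
#include <limits>

// Assumed reserved encodings for the named sizes (hypothetical choices).
constexpr std::uint32_t kPrimary = 0;
constexpr std::uint32_t kAutomatic = std::numeric_limits<std::uint32_t>::max();

// Hypothetical description of one target device as seen by a device compiler.
struct device_model {
  std::uint32_t primary_size;    // size compatible with all core features
  std::uint32_t heuristic_size;  // size the compiler would pick on its own
};

// Maps a requested size, possibly a named one, to a concrete integral size.
inline std::uint32_t resolve_sub_group_size(std::uint32_t requested,
                                            const device_model& dev) {
  if (requested == kPrimary) return dev.primary_size;
  if (requested == kAutomatic) return dev.heuristic_size;
  return requested;  // an explicit integral request passes through unchanged
}
```

The same kernel binary description can therefore resolve to different integral sizes on different devices without any change to host code.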
- What should the sub-group size compatible with all features be called?
  RESOLVED: The name adopted is "primary", to convey that it is an integral part of sub-group support provided by the device. Other names considered are listed here for posterity: "default", "stable", "fixed", "core". These terms are easy to misunderstand (i.e. the "default" size may not be chosen by default, the "stable" size is unrelated to the software release cycle, the "fixed" sub-group size may change between devices or compiler releases, the "core" size is unrelated to hardware cores).
- How does sub-group size interact with SYCL_EXTERNAL functions? The current behavior requires exact matching. Should this be relaxed to allow alternative implementations (e.g. link-time optimization, multi-versioning)?
  RESOLVED: Exact matching is required to ensure that developers can reason about the portability of their code across different implementations. Setting the default sub-group size to "primary" and providing an override flag to select "automatic" everywhere means that only advanced developers who are tuning sub-group size on a per-kernel basis will have to worry about potential matching issues.