Copyright © 2021 Intel Corporation. All rights reserved.
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.
To report problems with this extension, please open a new issue at:
This extension is written against the SYCL 2020 specification, Revision 4 and the following extensions:
This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.
Jessica Davies, Intel
Joe Garvey, Intel
Greg Lueck, Intel
John Pennycook, Intel
Roland Schulz, Intel
SYCL 2020 allows for attributes to be applied to the function-type of kernel function declarations. These attributes are often used to pass information to a backend device compiler.
There are several known disadvantages to relying on attributes for such information, including:
-
Host C++ compilers are free to ignore unrecognized attributes, implying that attributes should not be employed to convey information that cannot be ignored. Many of the attributes in SYCL 2020 convey information that cannot be ignored (e.g. a kernel may only execute correctly with a specific sub-group size, or on devices which have specific aspects).
-
Library-only implementations of SYCL cannot reason about attributes at all.
-
SMCP implementations of SYCL must ensure that the host and device compiler both understand and agree upon the meaning of each attribute.
-
It is complicated (although not impossible) to generate multiple variants of a kernel with different combinations of attributes.
-
There is no way to inspect the properties associated with a kernel at compile-time (e.g. via type traits or similar); any such inspection must be performed at run-time and after compiling the kernel.
This extension proposes a replacement for these kernel attributes, in the form of a property list accepting properties with compile-time constant values, to address several of these issues.
This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro SYCL_EXT_ONEAPI_KERNEL_PROPERTIES
to one of the values defined in the
table below. Applications can test for the existence of this macro to determine
if the implementation supports this feature, or applications can test the
macro’s value to determine which of the extension’s features the implementation
supports.
Value | Description |
---|---|
1 |
The APIs of this experimental extension are not versioned, so the feature-test macro always has this value. |
The kernel properties below correspond to kernel attributes defined in
Section 5.8.1 of the SYCL 2020 specification. Note that deprecated attributes
(such as vec_type_hint
) are not included.
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {
// Corresponds to reqd_work_group_size
struct work_group_size_key {
template <size_t... Dims>
using value_t = property_value<work_group_size_key, std::integral_constant<size_t, Dims>...>;
}; // work_group_size_key
// Corresponds to work_group_size_hint
struct work_group_size_hint_key {
template <size_t... Dims>
using value_t = property_value<work_group_size_hint_key, std::integral_constant<size_t, Dims>...>;
}; // work_group_size_hint_key
// Corresponds to reqd_sub_group_size
struct sub_group_size_key {
template <uint32_t Size>
using value_t = property_value<sub_group_size_key, std::integral_constant<uint32_t, Size>>;
}; // sub_group_size_key
// Corresponds to device_has
struct device_has_key {
template <sycl::aspect... Aspects>
using value_t = property_value<device_has_key, std::integral_constant<sycl::aspect, Aspects>...>;
}; // device_has_key
template <size_t... Dims>
struct property_value<work_group_size_key, std::integral_constant<size_t, Dims>...> {
using key_t = work_group_size_key;
constexpr size_t operator[](int dim) const;
};
template <size_t... Dims>
struct property_value<work_group_size_hint_key, std::integral_constant<size_t, Dims>...> {
using key_t = work_group_size_hint_key;
constexpr size_t operator[](int dim) const;
};
template <sycl::aspect... Aspects>
struct property_value<device_has_key, std::integral_constant<sycl::aspect, Aspects>...> {
using key_t = device_has_key;
static constexpr std::array<sycl::aspect, sizeof...(Aspects)> value;
};
template <size_t... Dims>
inline constexpr work_group_size_key::value_t<Dims...> work_group_size;
template <size_t... Dims>
inline constexpr work_group_size_hint_key::value_t<Dims...> work_group_size_hint;
template <uint32_t Size>
inline constexpr sub_group_size_key::value_t<Size> sub_group_size;
template <sycl::aspect... Aspects>
inline constexpr device_has_key::value_t<Aspects...> device_has;
template <> struct is_property_key<work_group_size_key> : std::true_type {};
template <> struct is_property_key<work_group_size_hint_key> : std::true_type {};
template <> struct is_property_key<sub_group_size_key> : std::true_type {};
template <> struct is_property_key<device_has_key> : std::true_type {};
} // namespace experimental
} // namespace oneapi
} // namespace ext
} // namespace sycl
Property | Description |
---|---|
|
The |
|
The |
|
The |
|
The |
SYCL implementations may introduce additional kernel properties. If any combinations of kernel attributes are invalid, this must be clearly documented as part of the new kernel property definition.
To enable properties to be associated with kernels, this extension adds
new overloads to each of the variants of single_task
, parallel_for
and
parallel_for_work_group
defined in the queue
and handler
classes. These
new overloads accept a sycl::ext::oneapi::experimental::properties
argument. For
variants accepting a parameter pack, the sycl::ext::oneapi::experimental::properties
argument is inserted immediately prior to the parameter pack; for variants not
accepting a parameter pack, the sycl::ext::oneapi::experimental::properties
argument is
inserted immediately prior to the kernel function.
The overloads introduced by this extension are listed below:
namespace sycl {
class queue {
public:
template <typename KernelName, typename KernelType, typename PropertyList>
event single_task(PropertyList properties, const KernelType &kernelFunc);
template <typename KernelName, typename KernelType, typename PropertyList>
event single_task(event depEvent, PropertyList properties,
const KernelType &kernelFunc);
template <typename KernelName, typename KernelType, typename PropertyList>
event single_task(const std::vector<event> &depEvents,
PropertyList properties,
const KernelType &kernelFunc);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(range<Dims> numWorkItems,
Rest&&... rest);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(range<Dims> numWorkItems, event depEvent,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(range<Dims> numWorkItems,
const std::vector<event> &depEvents,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(nd_range<Dims> executionRange,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(nd_range<Dims> executionRange,
event depEvent,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
event parallel_for(nd_range<Dims> executionRange,
const std::vector<event> &depEvents,
PropertyList properties,
Rest&&... rest);
}
}
namespace sycl {
class handler {
public:
template <typename KernelName, typename KernelType, typename PropertyList>
void single_task(PropertyList properties, const KernelType &kernelFunc);
template <typename KernelName, int dimensions, typename PropertyList, typename... Rest>
void parallel_for(range<dimensions> numWorkItems,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, int dimensions, typename PropertyList, typename... Rest>
void parallel_for(nd_range<dimensions> executionRange,
PropertyList properties,
Rest&&... rest);
template <typename KernelName, typename WorkgroupFunctionType, int dimensions, typename PropertyList>
void parallel_for_work_group(range<dimensions> numWorkGroups,
PropertyList properties,
const WorkgroupFunctionType &kernelFunc);
template <typename KernelName, typename WorkgroupFunctionType, int dimensions, typename PropertyList>
void parallel_for_work_group(range<dimensions> numWorkGroups,
range<dimensions> workGroupSize,
PropertyList properties,
const WorkgroupFunctionType &kernelFunc);
}
}
Passing a property list as an argument in this way allows properties to be associated with a kernel function without modifying its type. This enables the same kernel function (e.g. a lambda) to be submitted multiple times with different properties, or for libraries building on SYCL to add properties (e.g. for performance reasons) to user-provided kernel functions.
All the properties defined in this extension have compile-time values. However,
an implementation may support additional properties which could have run-time
values. When this occurs, the properties
parameter may be a property list
containing a mix of both run-time and compile-time values, and a SYCL
implementation should respect both run-time and compile-time information when
determining the correct way to launch a kernel. However, only compile-time
information can modify the compilation of the kernel function itself.
A simple example of using this extension to set a required work-group size and required sub-group size is given below:
sycl::ext::oneapi::experimental::properties properties{sycl::ext::oneapi::experimental::work_group_size<8, 8>,
sycl::ext::oneapi::experimental::sub_group_size<8>};
q.parallel_for(range<2>{16, 16}, properties, [=](id<2> i) {
a[i] = b[i] + c[i];
}).wait();
Note
|
It is currently not possible to use the same kernel function in two commands with different properties. For example, the following will result in an error at compile-time: |
auto kernelFunc = [=](){};
q.single_task(kernelFunc);
q.single_task(
sycl::ext::oneapi::experimental::properties{
sycl::ext::oneapi::experimental::sub_group_size<8>},
kernelFunc);
In other situations it may be useful to embed a kernel’s properties directly into its type, to ensure that a kernel cannot be launched without a property that it depends upon for correctness.
To enable this use-case, this extension adds a mechanism for implementations to
extract a property list from a kernel functor, if a kernel functor declares
a member function named get
accepting a sycl::ext::oneapi::experimental::properties_tag
tag type and returning an instance of sycl::ext::oneapi::experimental::properties
.
namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {
struct properties_tag {};
}
}
}
}
Note
|
P1895 proposes a function called tag_invoke
as a general mechanism for customization points that could be used as a
replacement for the get function proposed here. If tag_invoke becomes
a feature in a future version of C++, a future version of this extension
may expose a new interface compatible with tag_invoke .
|
Note
|
The attribute mechanism in SYCL 2020 allows for different kernel attributes to be applied to different call operators within the same functor. An embedded property list applies to all call operators in the functor. |
The example below shows how the kernel from the previous section could be rewritten to leverage an embedded property list:
struct KernelFunctor {
KernelFunctor(sycl::accessor<int, 2> a,
sycl::accessor<int, 2> b,
sycl::accessor<int, 2> c) : a(a), b(b), c(c)
{}
void operator()(id<2> i) const {
a[i] = b[i] + c[i];
}
auto get(sycl::ext::oneapi::experimental::properties_tag) {
return sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::work_group_size<8, 8>,
sycl::ext::oneapi::experimental::sub_group_size<8>};
}
sycl::accessor<int, 2> a;
sycl::accessor<int, 2> b;
sycl::accessor<int, 2> c;
};
...
q.parallel_for(range<2>{16, 16}, KernelFunctor(a, b, c)).wait();
If a kernel functor with embedded properties is enqueued for execution using an
invocation function with a property list argument, the kernel is launched as-if
the embedded properties and argument were combined. If the combined list
contains any invalid combinations of properties, then this is an error: invalid
combinations that can be detected at compile-time should be reported via a
diagnostic; invalid combinations that can only be detected at run-time should
result in an implementation throwing an exception
with the errc::invalid
error code.
Any properties embedded into a kernel type via a property list are reflected
in the results of a call to kernel::get_info
with the
info::kernel::attributes
information descriptor, as if the corresponding
attribute from the SYCL 2020 specification had been applied to the kernel
function.
The SYCL 2020 sycl::device_has
attribute can be applied to the declaration
of a non-kernel device function, to assert that the device function uses a
specific set of optional features. This extension provides a mechanism exposing
similar behavior, allowing for kernel properties to be associated with
a function via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
macro. Each instance of
the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
macro accepts one argument,
corresponding to a single property value.
Note
|
Due to limitations of the C preprocessor, property value expressions containing commas (e.g. due to template arguments) must be enclosed in parentheses to avoid being interpreted as multiple arguments. |
The example below shows a function that uses two optional features,
corresponding to the fp16
and atomic64
aspects.
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((sycl::device_has<aspect::fp16, aspect::atomic64>))
void foo();
The table below describes the effects of associating each kernel property
with a non-kernel device function via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
macro.
Property | Description |
---|---|
|
The |
The SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
macro can be used alongside the
SYCL_EXTERNAL
macro, and the macros may be specified in any order.
Whenever SYCL_EXTERNAL
is used, there are two relevant translation units: the
translation unit that defines the function and the translation unit that
calls the function. If a given SYCL_EXTERNAL
function F is defined in
one translation unit with a set of properties P, then all other translation
units that declare that same function F must list the same set of properties
P via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY
macro. Programs which fail to do this
are ill-formed, but no diagnostic is required.
Note
|
Due to a restriction on attribute ordering in Clang it is only currently
possible to use SYCL_EXT_ONEAPI_FUNCTION_PROPERTY before SYCL_EXTERNAL in
DPC++.
|
-
How should we handle kernels supporting more than one set of device aspects?
UNRESOLVED: A compiler can evaluate complex Boolean expressions in an attribute, but this is non-trivial to emulate using only the C++ type system. A simple alternative may be to introduce an additional level of indirection via new properties, for example
device_has_all_of
anddevice_has_any_of
:device_has_all_of<device_has<aspect::atomic64>, device_has_any_of<device_has<aspect::fp16, device_has<aspect::fp64>>
. -
How should an embedded property list behave with inheritance?
RESOLVED: The specification currently allows for a class to inspect the property list embedded into its base class(es) and construct a new property list that applies to all call operators. Associating different properties with different call operators via inheritance has the potential to be confusing and would increase implementation complexity.