Skip to content

Latest commit

 

History

History
527 lines (412 loc) · 19.9 KB

sycl_ext_oneapi_kernel_properties.asciidoc

File metadata and controls

527 lines (412 loc) · 19.9 KB

sycl_ext_oneapi_kernel_properties

Notice

Copyright © 2021 Intel Corporation. All rights reserved.

Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.

Contact

To report problems with this extension, please open a new issue at:

Dependencies

This extension is written against the SYCL 2020 specification, Revision 4 and the following extensions:

Status

This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.

Contributors

Jessica Davies, Intel
Joe Garvey, Intel
Greg Lueck, Intel
John Pennycook, Intel
Roland Schulz, Intel

Overview

SYCL 2020 allows for attributes to be applied to the function-type of kernel function declarations. These attributes are often used to pass information to a backend device compiler.

There are several known disadvantages to relying on attributes for such information, including:

  • Host C++ compilers are free to ignore unrecognized attributes, implying that attributes should not be employed to convey information that cannot be ignored. Many of the attributes in SYCL 2020 convey information that cannot be ignored (e.g. a kernel may only execute correctly with a specific sub-group size, or on devices which have specific aspects).

  • Library-only implementations of SYCL cannot reason about attributes at all.

  • SMCP implementations of SYCL must ensure that the host and device compiler both understand and agree upon the meaning of each attribute.

  • It is complicated (although not impossible) to generate multiple variants of a kernel with different combinations of attributes.

  • There is no way to inspect the properties associated with a kernel at compile-time (e.g. via type traits or similar); any such inspection must be performed at run-time and after compiling the kernel.

This extension proposes a replacement for these kernel attributes, in the form of a property list accepting properties with compile-time constant values, to address several of these issues.

Specification

Feature test macro

This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro SYCL_EXT_ONEAPI_KERNEL_PROPERTIES to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro’s value to determine which of the extension’s features the implementation supports.

Value Description

1

The APIs of this experimental extension are not versioned, so the feature-test macro always has this value.

Kernel Properties

The kernel properties below correspond to kernel attributes defined in Section 5.8.1 of the SYCL 2020 specification. Note that deprecated attributes (such as vec_type_hint) are not included.

namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

// Corresponds to reqd_work_group_size
struct work_group_size_key {
  template <size_t... Dims>
  using value_t = property_value<work_group_size_key, std::integral_constant<size_t, Dims>...>;
}; // work_group_size_key

// Corresponds to work_group_size_hint
struct work_group_size_hint_key {
  template <size_t... Dims>
  using value_t = property_value<work_group_size_hint_key, std::integral_constant<size_t, Dims>...>;
}; // work_group_size_hint_key

// Corresponds to reqd_sub_group_size
struct sub_group_size_key {
  template <uint32_t Size>
  using value_t = property_value<sub_group_size_key, std::integral_constant<uint32_t, Size>>;
}; // sub_group_size_key

// Corresponds to device_has
struct device_has_key {
  template <sycl::aspect... Aspects>
  using value_t = property_value<device_has_key, std::integral_constant<sycl::aspect, Aspects>...>;
}; // device_has_key

template <size_t... Dims>
struct property_value<work_group_size_key, std::integral_constant<size_t, Dims>...> {
  using key_t = work_group_size_key;
  constexpr size_t operator[](int dim) const;
};

template <size_t... Dims>
struct property_value<work_group_size_hint_key, std::integral_constant<size_t, Dims>...> {
  using key_t = work_group_size_hint_key;
  constexpr size_t operator[](int dim) const;
};

template <sycl::aspect... Aspects>
struct property_value<device_has_key, std::integral_constant<sycl::aspect, Aspects>...> {
  using key_t = device_has_key;
  static constexpr std::array<sycl::aspect, sizeof...(Aspects)> value;
};

template <size_t... Dims>
inline constexpr work_group_size_key::value_t<Dims...> work_group_size;

template <size_t... Dims>
inline constexpr work_group_size_hint_key::value_t<Dims...> work_group_size_hint;

template <uint32_t Size>
inline constexpr sub_group_size_key::value_t<Size> sub_group_size;

template <sycl::aspect... Aspects>
inline constexpr device_has_key::value_t<Aspects...> device_has;

template <> struct is_property_key<work_group_size_key> : std::true_type {};
template <> struct is_property_key<work_group_size_hint_key> : std::true_type {};
template <> struct is_property_key<sub_group_size_key> : std::true_type {};
template <> struct is_property_key<device_has_key> : std::true_type {};

} // namespace experimental
} // namespace oneapi
} // namespace ext
} // namespace sycl
Property Description

work_group_size

The work_group_size property adds the requirement that the kernel must be launched with the specified work-group size. The number of template arguments in the Dims parameter pack must match the dimensionality of the work-group used to invoke the kernel. The order of the template arguments matches the constructor of the range class. An implementation may throw an exception for certain combinations of property values, devices and launch configurations, as described for the reqd_work_group_size attribute in Table 180 of the SYCL 2020 specification.

work_group_size_hint

The work_group_size_hint property hints to the compiler that the kernel is likely to be launched with the specified work-group size. The number of template arguments in the Dims parameter pack must match the dimensionality of the work-group used to invoke the kernel. The order of the template arguments matches the constructor of the range class.

sub_group_size

The sub_group_size property adds the requirement that the kernel must be compiled and executed with the specified sub-group size. An implementation may throw an exception for certain combinations of property values, devices and launch configurations, as described for the reqd_sub_group_size attribute in Table 180 of the SYCL 2020 specification.

device_has

The device_has property adds the requirement that the kernel must be launched on a device that has all of the aspects listed in the Aspects parameter pack. An implementation may throw an exception or issue a diagnostic for certain combinations of aspects, devices and kernel functions, as described for the device_has attribute in Table 180 of the SYCL 2020 specification.

SYCL implementations may introduce additional kernel properties. If any combinations of kernel attributes are invalid, this must be clearly documented as part of the new kernel property definition.

Adding a Property List to a Kernel Launch

To enable properties to be associated with kernels, this extension adds new overloads to each of the variants of single_task, parallel_for and parallel_for_work_group defined in the queue and handler classes. These new overloads accept a sycl::ext::oneapi::experimental::properties argument. For variants accepting a parameter pack, the sycl::ext::oneapi::experimental::properties argument is inserted immediately prior to the parameter pack; for variants not accepting a parameter pack, the sycl::ext::oneapi::experimental::properties argument is inserted immediately prior to the kernel function.

The overloads introduced by this extension are listed below:

namespace sycl {
class queue {
 public:
  template <typename KernelName, typename KernelType, typename PropertyList>
  event single_task(PropertyList properties, const KernelType &kernelFunc);

  template <typename KernelName, typename KernelType, typename PropertyList>
  event single_task(event depEvent, PropertyList properties,
                    const KernelType &kernelFunc);

  template <typename KernelName, typename KernelType, typename PropertyList>
  event single_task(const std::vector<event> &depEvents,
                    PropertyList properties,
                    const KernelType &kernelFunc);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(range<Dims> numWorkItems,
                     Rest&&... rest);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(range<Dims> numWorkItems, event depEvent,
                     PropertyList properties,
                     Rest&&... rest);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(range<Dims> numWorkItems,
                     const std::vector<event> &depEvents,
                     PropertyList properties,
                     Rest&&... rest);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(nd_range<Dims> executionRange,
                     PropertyList properties,
                     Rest&&... rest);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(nd_range<Dims> executionRange,
                     event depEvent,
                     PropertyList properties,
                     Rest&&... rest);

  template <typename KernelName, int Dims, typename PropertyList, typename... Rest>
  event parallel_for(nd_range<Dims> executionRange,
                     const std::vector<event> &depEvents,
                     PropertyList properties,
                     Rest&&... rest);
}
}

namespace sycl {
class handler {
 public:
  template <typename KernelName, typename KernelType, typename PropertyList>
  void single_task(PropertyList properties, const KernelType &kernelFunc);

  template <typename KernelName, int dimensions, typename PropertyList, typename... Rest>
  void parallel_for(range<dimensions> numWorkItems,
                    PropertyList properties,
                    Rest&&... rest);

  template <typename KernelName, int dimensions, typename PropertyList, typename... Rest>
  void parallel_for(nd_range<dimensions> executionRange,
                    PropertyList properties,
                    Rest&&... rest);

  template <typename KernelName, typename WorkgroupFunctionType, int dimensions, typename PropertyList>
  void parallel_for_work_group(range<dimensions> numWorkGroups,
                               PropertyList properties,
                               const WorkgroupFunctionType &kernelFunc);

  template <typename KernelName, typename WorkgroupFunctionType, int dimensions, typename PropertyList>
  void parallel_for_work_group(range<dimensions> numWorkGroups,
                               range<dimensions> workGroupSize,
                               PropertyList properties,
                               const WorkgroupFunctionType &kernelFunc);
}
}

Passing a property list as an argument in this way allows properties to be associated with a kernel function without modifying its type. This enables the same kernel function (e.g. a lambda) to be submitted multiple times with different properties, or for libraries building on SYCL to add properties (e.g. for performance reasons) to user-provided kernel functions.

All the properties defined in this extension have compile-time values. However, an implementation may support additional properties which could have run-time values. When this occurs, the properties parameter may be a property list containing a mix of both run-time and compile-time values, and a SYCL implementation should respect both run-time and compile-time information when determining the correct way to launch a kernel. However, only compile-time information can modify the compilation of the kernel function itself.

A simple example of using this extension to set a required work-group size and required sub-group size is given below:

sycl::ext::oneapi::experimental::properties properties{sycl::ext::oneapi::experimental::work_group_size<8, 8>,
                                                       sycl::ext::oneapi::experimental::sub_group_size<8>};
q.parallel_for(range<2>{16, 16}, properties, [=](id<2> i) {
  a[i] = b[i] + c[i];
}).wait();
Note
It is currently not possible to use the same kernel function in two commands with different properties. For example, the following will result in an error at compile-time:
  auto kernelFunc = [=](){};
  q.single_task(kernelFunc);
  q.single_task(
      sycl::ext::oneapi::experimental::properties{
          sycl::ext::oneapi::experimental::sub_group_size<8>},
      kernelFunc);

Embedding Properties into a Kernel

In other situations it may be useful to embed a kernel’s properties directly into its type, to ensure that a kernel cannot be launched without a property that it depends upon for correctness.

To enable this use-case, this extension adds a mechanism for implementations to extract a property list from a kernel functor, if a kernel functor declares a member function named get accepting a sycl::ext::oneapi::experimental::properties_tag tag type and returning an instance of sycl::ext::oneapi::experimental::properties.

namespace sycl {
namespace ext {
namespace oneapi {
namespace experimental {

struct properties_tag {};

}
}
}
}
Note
P1895 proposes a function called tag_invoke as a general mechanism for customization points that could be used as a replacement for the get function proposed here. If tag_invoke becomes a feature in a future version of C++, a future version of this extension may expose a new interface compatible with tag_invoke.
Note
The attribute mechanism in SYCL 2020 allows for different kernel attributes to be applied to different call operators within the same functor. An embedded property list applies to all call operators in the functor.

The example below shows how the kernel from the previous section could be rewritten to leverage an embedded property list:

struct KernelFunctor {

  KernelFunctor(sycl::accessor<int, 2> a,
                sycl::accessor<int, 2> b,
                sycl::accessor<int, 2> c) : a(a), b(b), c(c)
  {}

  void operator()(id<2> i) const {
    a[i] = b[i] + c[i];
  }

  auto get(sycl::ext::oneapi::experimental::properties_tag) {
    return sycl::ext::oneapi::experimental::properties{sycl::ext::oneapi::experimental::work_group_size<8, 8>,
                                                       sycl::ext::oneapi::experimental::sub_group_size<8>};
  }

  sycl::accessor<int, 2> a;
  sycl::accessor<int, 2> b;
  sycl::accessor<int, 2> c;

};

...

q.parallel_for(range<2>{16, 16}, KernelFunctor(a, b, c)).wait();

If a kernel functor with embedded properties is enqueued for execution using an invocation function with a property list argument, the kernel is launched as-if the embedded properties and argument were combined. If the combined list contains any invalid combinations of properties, then this is an error: invalid combinations that can be detected at compile-time should be reported via a diagnostic; invalid combinations that can only be detected at run-time should result in an implementation throwing an exception with the errc::invalid error code.

Querying Properties in a Compiled Kernel

Any properties embedded into a kernel type via a property list are reflected in the results of a call to kernel::get_info with the info::kernel::attributes information descriptor, as if the corresponding attribute from the SYCL 2020 specification had been applied to the kernel function.

Device Functions

The SYCL 2020 sycl::device_has attribute can be applied to the declaration of a non-kernel device function, to assert that the device function uses a specific set of optional features. This extension provides a mechanism exposing similar behavior, allowing for kernel properties to be associated with a function via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro. Each instance of the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro accepts one argument, corresponding to a single property value.

Note
Due to limitations of the C preprocessor, property value expressions containing commas (e.g. due to template arguments) must be enclosed in parentheses to avoid being interpreted as multiple arguments.

The example below shows a function that uses two optional features, corresponding to the fp16 and atomic64 aspects.

SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((sycl::device_has<aspect::fp16, aspect::atomic64>))
void foo();

The table below describes the effects of associating each kernel property with a non-kernel device function via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro.

Property Description

device_has

The device_has property asserts that the device function uses optional features corresponding to the aspects listed in the Aspects parameter pack. The effects of this property are identical to those described for the device_has attribute in Table 181 of the SYCL 2020 specification.

The SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro can be used alongside the SYCL_EXTERNAL macro, and the macros may be specified in any order. Whenever SYCL_EXTERNAL is used, there are two relevant translation units: the translation unit that defines the function and the translation unit that calls the function. If a given SYCL_EXTERNAL function F is defined in one translation unit with a set of properties P, then all other translation units that declare that same function F must list the same set of properties P via the SYCL_EXT_ONEAPI_FUNCTION_PROPERTY macro. Programs which fail to do this are ill-formed, but no diagnostic is required.

Note
Due to a restriction on attribute ordering in Clang it is only currently possible to use SYCL_EXT_ONEAPI_FUNCTION_PROPERTY before SYCL_EXTERNAL in DPC++.

Issues

  1. How should we handle kernels supporting more than one set of device aspects?

    UNRESOLVED: A compiler can evaluate complex Boolean expressions in an attribute, but this is non-trivial to emulate using only the C++ type system. A simple alternative may be to introduce an additional level of indirection via new properties, for example device_has_all_of and device_has_any_of: device_has_all_of<device_has<aspect::atomic64>, device_has_any_of<device_has<aspect::fp16, device_has<aspect::fp64>>.

  2. How should an embedded property list behave with inheritance?

    RESOLVED: The specification currently allows for a class to inspect the property list embedded into its base class(es) and construct a new property list that applies to all call operators. Associating different properties with different call operators via inheritance has the potential to be confusing and would increase implementation complexity.