From 001b54d5973c3052d8bc992921d01667f904088d Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 8 Feb 2024 11:26:28 +0000 Subject: [PATCH 01/18] [SYCL][Doc] Add specialization constant-length alloca extension proposal Document extension proposal for specialization constant length private memory allocations. Users will be able to perform dynamic memory allocations using specialization constants and a new `private_alloca` function returning a `private_ptr` that will be automatically freed on function return. This is included as an experimental extension as implementation will shortly follow once the extension is approved. Signed-off-by: Victor Perez --- ...neapi_spec_constant_length_alloca.asciidoc | 195 ++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc new file mode 100644 index 0000000000000..242eb3b694100 --- /dev/null +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -0,0 +1,195 @@ += sycl_ext_oneapi_spec_constant_length_alloca + +:source-highlighter: coderay +:coderay-linenums-mode: table + +// This section needs to be after the document title. +:doctype: book +:toc2: +:toc: left +:encoding: utf-8 +:lang: en +:dpcpp: pass:[DPC++] + +// Set the default source code type in this document to C++, +// for syntax highlighting purposes. This is needed because +// docbook uses c++ and html5 uses cpp. +:language: {basebackend@docbook:c++:cpp} + + +== Notice + +[%hardbreaks] +Copyright (C) Codeplay Software Limited. All rights reserved. + +Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks +of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by +permission by Khronos. + + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + + +== Dependencies + +This extension is written against the SYCL 2020 revision 8 specification. All +references below to the "core SYCL specification" or to section numbers in the +SYCL specification refer to that revision. + + +== Status + +This is an experimental extension specification, intended to provide early +access to features and gather community feedback. Interfaces defined in this +specification are implemented in {dpcpp}, but they are not finalized and may +change incompatibly in future versions of {dpcpp} without prior notice. +*Shipping software products should not rely on APIs defined in this +specification.* + +== Overview + +C++ arrays and `std::array` can be used in SYCL code to represent fixed-size +sequences of objects. However, these constructs have a significant restriction: +the number of elements must be known at compile time. In host-code context, +users can make use of dynamic memory allocations, e.g., `std::vector`, but this +is not the case in SYCL device code. + +SYCL specialization constants (SYCL Section 4.9.5.) can be used to represent +constants whose values can be set dynamically during the execution of a SYCL +application, but that will not change when a SYCL kernel function is +invoked. This way, specialization constants could be used to implement SYCL +private arrays whose size is given during the execution of the SYCL +application. There is no possible way of implementing this using `std::array`, +as the size of such container must be known at compile time, so we propose to +define a new `private_alloca` function whose size is specified using SYCL +specialization constants. + +[NOTE] +==== +This extension only supports SPIR-V backends for now, as it relies on +SPIR-V-specific capabilities, +such as specialization constants. +==== + +== Specification + +=== Feature test macro + +This extension provides a feature-test macro as described in the core SYCL +specification. An implementation supporting this extension must predefine the +macro `SYCL_EXT_ONEAPI_SPEC_CONSTANT_LENGTH_ALLOCA` to one of the values defined +in the table below. Applications can test for the existence of this macro to +determine if the implementation supports this feature, or applications can test +the macro's value to determine which of the extension's features the +implementation supports. + +[%header,cols="1,5"] +|=== +|Value +|Description + +|1 +|The APIs of this experimental extension are not versioned, so the + feature-test macro always has this value. +|=== + +=== The `private_alloca` function + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { +template +private_ptr +private_alloca(kernel_handler &h); +} // namespace sycl::ext::oneapi::experimental +---- + +This extension adds a new `private_alloca` function that can be used to allocate +a private memory region with capacity for +`h.get_specialization_constant()` elements of type `ElementType`. The +returned pointer will be aligned for `ElementType`. + +The underlying memory region is automatically freed when the caller to +`private_alloca` is returned. + +In case of private memory exhaustion, the underlying backend must report an +error in the same fashion as if the allocation size were static. + +If this function is called from host context or an unsupported backend, an +`exception` with the `errc::feature_not_supported` error code must be thrown. + +`ElementType` must be a cv-unqualified trivial type. The return memory is +default initialized. + +==== Parameters + +`h`:: `sycl::kernel_handler` used to obtain the value of `SpecName` + +==== Template Parameters + +`ElementType`:: Cv-unqualified trivial type serving as `value_type` of the + returned `sycl::multi_ptr`. +`SpecName`:: `sycl::specialization_id` of integral `value_type`. The allocated + memory region has capacity for `h.get_specialization_constant` + `ElementType` elements. The default value for the specialization constant must + be at least one and the specialization constant must not be set to a value + less than one. Setting the specialization constant to a value less than 1 or + providing a default value less than 1 is undefined behaviour. +`DecorateAddress`:: Whether the returned `sycl::multi_ptr` is decorated or not. + +==== Return Value + +`sycl::private_ptr` to a region of `h.get_specialization_constant()` +elements of type `ElementType` aligned for such type. The underlying memory +region will be automatically deallocated when the function from which +`private_alloca` is called returns. + +== Example usage + +This non-normative section shows some example usages of the extension. + +[source,c++] +---- +constexpr specialization_id size(1); + +class Kernel; + +SYCL_EXTERNAL void impl(const float *in, float *out, size_t n, + decorated_private_ptr ptr); + +void run(queue q, const float *in, float *out, size_t n) { + q.submit([&](handler &h) { + h.set_specialization_constant(n); + h.parallel_for(n, [=](id<1> i, kernel_handler h) { + // Allocate memory for 'n' 'float's + auto ptr = private_alloca(h); + // Use pointer in implementation + impl(in, out, h.get_specialization_constant(), ptr); + }); + }); +---- + +== Design Constraints + +The big design constraint stems from the unknown allocation size at compile +time. C++ does not support variable length arrays and complete type sizes must +be known at compile time. Thus, the free function interface returning a pointer +to private memory is the better way to represent this construct in C++. Lifetime +of the underlying memory region was a concern too, but the current design +automatically freeing the memory when the caller is returned is in line with +similar constructs in other platforms. + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Authors|Changes +|1|2024-02-08|Victor Lomüller, Lukas Sommer, Victor Perez, Julian Oppermann, Tadej Ciglaric, Romain Biessy|*Initial draft* +|======================================== From 50bcc0bceca864e1e5bb4605b250dab81f0751b3 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 11:41:27 +0000 Subject: [PATCH 02/18] Address comments Signed-off-by: Victor Perez --- ...neapi_spec_constant_length_alloca.asciidoc | 91 +++++++++---------- 1 file changed, 42 insertions(+), 49 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 242eb3b694100..7ed0875571922 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -50,6 +50,15 @@ change incompatibly in future versions of {dpcpp} without prior notice. *Shipping software products should not rely on APIs defined in this specification.* +== Backend support status + +The APIs in this extension may be used only on a device that has +`aspect::ext_oneapi_scla`. The application must check that the device has this +aspect before submitting a kernel using any of the APIs in this extension. If +the application fails to do this, the implementation throws a synchronous +exception with the `errc::kernel_not_supported` error code when the kernel is +submitted to the queue. + == Overview C++ arrays and `std::array` can be used in SYCL code to represent fixed-size @@ -64,17 +73,10 @@ application, but that will not change when a SYCL kernel function is invoked. This way, specialization constants could be used to implement SYCL private arrays whose size is given during the execution of the SYCL application. There is no possible way of implementing this using `std::array`, -as the size of such container must be known at compile time, so we propose to +as the size of such containers must be known at compile time, so we propose to define a new `private_alloca` function whose size is specified using SYCL specialization constants. -[NOTE] -==== -This extension only supports SPIR-V backends for now, as it relies on -SPIR-V-specific capabilities, -such as specialization constants. -==== - == Specification === Feature test macro @@ -97,6 +99,22 @@ implementation supports. feature-test macro always has this value. |=== +=== New Aspect for Specification Constant-Length Allocations + +This extension adds a new device aspect: + +[source,c++] +---- +namespace sycl { +enum class aspect : /*unspecified*/ { + ext_oneapi_scla +}; +} // namespace sycl +---- + +The `ext_oneapi_scla` aspect indicates that the device is capable of using the +`private_alloca` API defined in the following sections. + === The `private_alloca` function [source,c++] @@ -105,49 +123,24 @@ namespace sycl::ext::oneapi::experimental { template private_ptr -private_alloca(kernel_handler &h); +private_alloca(kernel_handler &kh); } // namespace sycl::ext::oneapi::experimental ---- +_Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` +must be a reference to a specialization constant of integral `value_type`. -This extension adds a new `private_alloca` function that can be used to allocate -a private memory region with capacity for -`h.get_specialization_constant()` elements of type `ElementType`. The -returned pointer will be aligned for `ElementType`. - -The underlying memory region is automatically freed when the caller to -`private_alloca` is returned. - -In case of private memory exhaustion, the underlying backend must report an -error in the same fashion as if the allocation size were static. - -If this function is called from host context or an unsupported backend, an -`exception` with the `errc::feature_not_supported` error code must be thrown. - -`ElementType` must be a cv-unqualified trivial type. The return memory is -default initialized. - -==== Parameters - -`h`:: `sycl::kernel_handler` used to obtain the value of `SpecName` - -==== Template Parameters - -`ElementType`:: Cv-unqualified trivial type serving as `value_type` of the - returned `sycl::multi_ptr`. -`SpecName`:: `sycl::specialization_id` of integral `value_type`. The allocated - memory region has capacity for `h.get_specialization_constant` - `ElementType` elements. The default value for the specialization constant must - be at least one and the specialization constant must not be set to a value - less than one. Setting the specialization constant to a value less than 1 or - providing a default value less than 1 is undefined behaviour. -`DecorateAddress`:: Whether the returned `sycl::multi_ptr` is decorated or not. +_Effects_: `h.get_specialization_constant()` elements of type +`ElementType` are allocated and default initialized in the stack. -==== Return Value +_Returns_: A pointer to a default initialized region of private memory of +`h.get_specialization_constant()` elements of type +`ElementType`. `DecorateAddress` defines whether the returned `multi_ptr` is +decorated. -`sycl::private_ptr` to a region of `h.get_specialization_constant()` -elements of type `ElementType` aligned for such type. The underlying memory -region will be automatically deallocated when the function from which -`private_alloca` is called returns. +_Remarks_: In case of private memory exhaustion, the backend should notify the +user in the same way as if a statically sized vector of private memory lead to +memory exhaustion. In case of a successful call, memory is freed automatically +when the function which called `private_alloca` returns to its caller. == Example usage @@ -165,11 +158,11 @@ SYCL_EXTERNAL void impl(const float *in, float *out, size_t n, void run(queue q, const float *in, float *out, size_t n) { q.submit([&](handler &h) { h.set_specialization_constant(n); - h.parallel_for(n, [=](id<1> i, kernel_handler h) { + h.parallel_for(n, [=](id<1> i, kernel_handler kh) { // Allocate memory for 'n' 'float's - auto ptr = private_alloca(h); + auto ptr = private_alloca(kh); // Use pointer in implementation - impl(in, out, h.get_specialization_constant(), ptr); + impl(in, out, kh.get_specialization_constant(), ptr); }); }); ---- From b4eaaaa133fe398e9acf0a8778bbc33b715c6db6 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 12:19:12 +0000 Subject: [PATCH 03/18] Add precondition --- ...cl_ext_oneapi_spec_constant_length_alloca.asciidoc | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 7ed0875571922..147472350f967 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -129,6 +129,9 @@ private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. +_Preconditions_: `SpecName` has a default value of at least 1 and is not set to +a value less than 1 during program execution. + _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in the stack. @@ -137,10 +140,10 @@ _Returns_: A pointer to a default initialized region of private memory of `ElementType`. `DecorateAddress` defines whether the returned `multi_ptr` is decorated. -_Remarks_: In case of private memory exhaustion, the backend should notify the -user in the same way as if a statically sized vector of private memory lead to -memory exhaustion. In case of a successful call, memory is freed automatically -when the function which called `private_alloca` returns to its caller. +_Remarks_: In case of private memory exhaustion, the underlying backend must +report an error in the same fashion as if the allocation size were static.. In +case of a successful call, memory is freed automatically when the function which +called `private_alloca` returns to its caller. == Example usage From a7541458685cde7e418e17aa0dd42a9dc8d0f4bf Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 12:21:24 +0000 Subject: [PATCH 04/18] Change precondition with remark --- .../sycl_ext_oneapi_spec_constant_length_alloca.asciidoc | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 147472350f967..ce0ab23465b43 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -129,9 +129,6 @@ private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. -_Preconditions_: `SpecName` has a default value of at least 1 and is not set to -a value less than 1 during program execution. - _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in the stack. @@ -141,9 +138,11 @@ _Returns_: A pointer to a default initialized region of private memory of decorated. _Remarks_: In case of private memory exhaustion, the underlying backend must -report an error in the same fashion as if the allocation size were static.. In +report an error in the same fashion as if the allocation size were static. In case of a successful call, memory is freed automatically when the function which -called `private_alloca` returns to its caller. +called `private_alloca` returns to its caller. Additionally, `SpecName` must +have a default value of at least 1 and not be set to a value less than 1 during +program execution. Violation of these conditions lead to undefined behavior. == Example usage From ee4fca68ede3dc770995bbe1302bbe63f2f7f613 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 12:26:50 +0000 Subject: [PATCH 05/18] Do not mention stack --- .../sycl_ext_oneapi_spec_constant_length_alloca.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index ce0ab23465b43..c8324f429204c 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -130,7 +130,7 @@ _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. _Effects_: `h.get_specialization_constant()` elements of type -`ElementType` are allocated and default initialized in the stack. +`ElementType` are allocated and default initialized in private memory. _Returns_: A pointer to a default initialized region of private memory of `h.get_specialization_constant()` elements of type From 975b7fee205e516da7a02f9704988ad9d0d6b2e0 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 12:31:13 +0000 Subject: [PATCH 06/18] NIT --- .../sycl_ext_oneapi_spec_constant_length_alloca.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index c8324f429204c..bc22e0f1bbcb1 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -142,7 +142,7 @@ report an error in the same fashion as if the allocation size were static. In case of a successful call, memory is freed automatically when the function which called `private_alloca` returns to its caller. Additionally, `SpecName` must have a default value of at least 1 and not be set to a value less than 1 during -program execution. Violation of these conditions lead to undefined behavior. +program execution. Violation of these conditions results in undefined behavior. == Example usage From bdaa3466db2dc999e06f28ac80f2ab5e1609bc60 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 12 Feb 2024 16:42:21 +0000 Subject: [PATCH 07/18] Add two issues --- ...neapi_spec_constant_length_alloca.asciidoc | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index bc22e0f1bbcb1..7d103595b19ca 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -179,6 +179,38 @@ of the underlying memory region was a concern too, but the current design automatically freeing the memory when the caller is returned is in line with similar constructs in other platforms. +== Issues + +=== Default `DecorateAddress` Value + +At the time this extension was first proposed, there was no consensus for a +default value for `sycl::access::decorate` in SYCL. The SYCL specification +chooses `sycl::access::decorate::legacy` to avoid making breaking changes, but +this would not justify using that value in this extension. + +Although it would be desirable to have one, the SCLA extension will not commit +to a default value until the SYCL community has come to an agreement. + +=== Passing Size as an Argument + +Initial design passes size as a `sycl::specialization_id &` template +argument and receives a `sycl::kernel_handler &` as an argument. This decision +comes from the current situation in which `sycl::specialization_id` is a unique +identifier to represent a specialization constant and `sycl::kernel_handler` is +used to query the **value** of specialization constants with +`sycl::kernel_handler::get_specialization_constant`. Having a +`sycl::specialization_constant` class representing specialization constants +would enable cleaner interfaces to this function like: + +[source,c++] +---- +namespace sycl::ext::oneapi::experimental { +template +private_ptr +private_alloca(const specialization_constant &size); +} // namespace sycl::ext::oneapi::experimental +---- + == Revision History [cols="5,15,15,70"] From d100f378283d18e02e395a7ad9dcfaf9d41e2f46 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 14 Feb 2024 09:43:17 +0000 Subject: [PATCH 08/18] Add aligned version --- ...t_oneapi_spec_constant_length_alloca.asciidoc | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 7d103595b19ca..9f11c8af9fc11 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -115,7 +115,7 @@ enum class aspect : /*unspecified*/ { The `ext_oneapi_scla` aspect indicates that the device is capable of using the `private_alloca` API defined in the following sections. -=== The `private_alloca` function +=== The SCLA API [source,c++] ---- @@ -124,10 +124,18 @@ template private_ptr private_alloca(kernel_handler &kh); + +template +private_ptr +aligned_private_alloca(kernel_handler &kh); } // namespace sycl::ext::oneapi::experimental ---- + _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` -must be a reference to a specialization constant of integral `value_type`. +must be a reference to a specialization constant of integral `value_type`. In +the case of `aligned_private_alloca`, `Alignment` must be a power of 2 +fundamental alignment stricter than `ElementType`'s alignment requirement. _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in private memory. @@ -135,7 +143,9 @@ _Effects_: `h.get_specialization_constant()` elements of type _Returns_: A pointer to a default initialized region of private memory of `h.get_specialization_constant()` elements of type `ElementType`. `DecorateAddress` defines whether the returned `multi_ptr` is -decorated. +decorated. In the case of `private_alloca`, the pointer is suitably aligned for +an object of type `ElementType`. In the case of `aligned_private_alloca`, the +pointer is aligned to the specified `Alignment`. _Remarks_: In case of private memory exhaustion, the underlying backend must report an error in the same fashion as if the allocation size were static. In From 6fc644acdb31259ae14f91942f6a7e2a354b6c93 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 14 Feb 2024 09:45:42 +0000 Subject: [PATCH 09/18] Fix formatting --- .../sycl_ext_oneapi_spec_constant_length_alloca.asciidoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 9f11c8af9fc11..cad4f48b9fd46 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -135,7 +135,7 @@ aligned_private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. In the case of `aligned_private_alloca`, `Alignment` must be a power of 2 -fundamental alignment stricter than `ElementType`'s alignment requirement. +fundamental alignment stricter than the alignment requirement of `ElementType`. _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in private memory. From 92be4998c808dad4011413b55ca77d49bcf21304 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 15 Feb 2024 13:16:03 +0000 Subject: [PATCH 10/18] Change memory storage duration to automatic --- ...ext_oneapi_spec_constant_length_alloca.asciidoc | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index cad4f48b9fd46..89642fda12dd0 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -149,10 +149,10 @@ pointer is aligned to the specified `Alignment`. _Remarks_: In case of private memory exhaustion, the underlying backend must report an error in the same fashion as if the allocation size were static. In -case of a successful call, memory is freed automatically when the function which -called `private_alloca` returns to its caller. Additionally, `SpecName` must -have a default value of at least 1 and not be set to a value less than 1 during -program execution. Violation of these conditions results in undefined behavior. +case of a successful call, allocated memory has automatic storage +duration. Additionally, `SpecName` must have a default value of at least 1 and +not be set to a value less than 1 during program execution. Violation of these +conditions results in undefined behaviour. == Example usage @@ -185,9 +185,9 @@ The big design constraint stems from the unknown allocation size at compile time. C++ does not support variable length arrays and complete type sizes must be known at compile time. Thus, the free function interface returning a pointer to private memory is the better way to represent this construct in C++. Lifetime -of the underlying memory region was a concern too, but the current design -automatically freeing the memory when the caller is returned is in line with -similar constructs in other platforms. +of the underlying memory region was a concern too, but the current design with +automatic storage duration for the allocated memory region closely follows what +the user would get from a stack-allocated array. == Issues From c18c5c548d670dd28a2218cde9cd4056198b0a35 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 15 Feb 2024 13:48:50 +0000 Subject: [PATCH 11/18] Add storage duration example Signed-off-by: Victor Perez --- ...neapi_spec_constant_length_alloca.asciidoc | 33 +++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 89642fda12dd0..37d3ddf31fce8 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -158,6 +158,8 @@ conditions results in undefined behaviour. This non-normative section shows some example usages of the extension. +=== Basic Usage + [source,c++] ---- constexpr specialization_id size(1); @@ -179,6 +181,37 @@ void run(queue q, const float *in, float *out, size_t n) { }); ---- +=== Storage Duration Clarification + +The following example is intended to clarify storage duration of memory +allocated by `private_alloca`. + +[source,c++] +---- +constexpr specialization_id size(1); + +class Kernel; + +SYCL_EXTERNAL void impl(const float *in, float *out, size_t n, + raw_private_ptr ptr); + +void run(queue q, const float *in, float *out, size_t n) { + q.submit([&](handler &h) { + h.set_specialization_constant(n); + h.parallel_for(n, [=](id<1> i, kernel_handler kh) { + raw_private_ptr ptr; + { + ptr = private_alloca(kh); + // 'private_alloca' has allocated a private memory region we can use in + // this block. + impl(in, out, kh.get_specialization_constant(), ptr); + } + // Memory allocated by 'private_alloca' has been deallocated. + // Dereferencing 'ptr' at this program point is undefined behaviour. + }); + }); +---- + == Design Constraints The big design constraint stems from the unknown allocation size at compile From 083168598e8418fd9354b1fb2ac227468861d721 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 15 Feb 2024 17:10:22 +0000 Subject: [PATCH 12/18] Fix formattign --- ...l_ext_oneapi_spec_constant_length_alloca.asciidoc | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 37d3ddf31fce8..f4f6d7a87418f 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -103,7 +103,7 @@ implementation supports. This extension adds a new device aspect: -[source,c++] +[source] ---- namespace sycl { enum class aspect : /*unspecified*/ { @@ -117,7 +117,7 @@ The `ext_oneapi_scla` aspect indicates that the device is capable of using the === The SCLA API -[source,c++] +[source] ---- namespace sycl::ext::oneapi::experimental { template size(1); @@ -186,7 +186,7 @@ void run(queue q, const float *in, float *out, size_t n) { The following example is intended to clarify storage duration of memory allocated by `private_alloca`. -[source,c++] +[source] ---- constexpr specialization_id size(1); @@ -240,12 +240,12 @@ Initial design passes size as a `sycl::specialization_id &` template argument and receives a `sycl::kernel_handler &` as an argument. This decision comes from the current situation in which `sycl::specialization_id` is a unique identifier to represent a specialization constant and `sycl::kernel_handler` is -used to query the **value** of specialization constants with +used to query the *value* of specialization constants with `sycl::kernel_handler::get_specialization_constant`. Having a `sycl::specialization_constant` class representing specialization constants would enable cleaner interfaces to this function like: -[source,c++] +[source] ---- namespace sycl::ext::oneapi::experimental { template From 09742ab4f50abeafa6a0cb40502b5de71cd6b637 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 15 Feb 2024 17:14:33 +0000 Subject: [PATCH 13/18] Try another fix --- ...neapi_spec_constant_length_alloca.asciidoc | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index f4f6d7a87418f..7fd69803653ee 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -103,7 +103,7 @@ implementation supports. This extension adds a new device aspect: -[source] +[source,c++] ---- namespace sycl { enum class aspect : /*unspecified*/ { @@ -117,7 +117,7 @@ The `ext_oneapi_scla` aspect indicates that the device is capable of using the === The SCLA API -[source] +[source,c++] ---- namespace sycl::ext::oneapi::experimental { template size(1); @@ -186,7 +186,7 @@ void run(queue q, const float *in, float *out, size_t n) { The following example is intended to clarify storage duration of memory allocated by `private_alloca`. -[source] +[source,c++] ---- constexpr specialization_id size(1); @@ -215,12 +215,12 @@ void run(queue q, const float *in, float *out, size_t n) { == Design Constraints The big design constraint stems from the unknown allocation size at compile -time. C++ does not support variable length arrays and complete type sizes must +time. C\+\+ does not support variable length arrays and complete type sizes must be known at compile time. Thus, the free function interface returning a pointer -to private memory is the better way to represent this construct in C++. Lifetime -of the underlying memory region was a concern too, but the current design with -automatic storage duration for the allocated memory region closely follows what -the user would get from a stack-allocated array. +to private memory is the better way to represent this construct in +C\+\+. Lifetime of the underlying memory region was a concern too, but the +current design with automatic storage duration for the allocated memory region +closely follows what the user would get from a stack-allocated array. == Issues @@ -245,7 +245,7 @@ used to query the *value* of specialization constants with `sycl::specialization_constant` class representing specialization constants would enable cleaner interfaces to this function like: -[source] +[source,c++] ---- namespace sycl::ext::oneapi::experimental { template From 66d583a8ffe1fde543c7ffb61d706f3d6f9c23a2 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Thu, 15 Feb 2024 17:21:02 +0000 Subject: [PATCH 14/18] Use substitution --- .../sycl_ext_oneapi_spec_constant_length_alloca.asciidoc | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index 7fd69803653ee..c5dc65aa5f3d7 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -10,6 +10,7 @@ :encoding: utf-8 :lang: en :dpcpp: pass:[DPC++] +:cpp: pass:[C++] // Set the default source code type in this document to C++, // for syntax highlighting purposes. This is needed because @@ -61,7 +62,7 @@ submitted to the queue. == Overview -C++ arrays and `std::array` can be used in SYCL code to represent fixed-size +{cpp} arrays and `std::array` can be used in SYCL code to represent fixed-size sequences of objects. However, these constructs have a significant restriction: the number of elements must be known at compile time. In host-code context, users can make use of dynamic memory allocations, e.g., `std::vector`, but this @@ -215,10 +216,10 @@ void run(queue q, const float *in, float *out, size_t n) { == Design Constraints The big design constraint stems from the unknown allocation size at compile -time. C\+\+ does not support variable length arrays and complete type sizes must +time. {cpp} does not support variable length arrays and complete type sizes must be known at compile time. Thus, the free function interface returning a pointer to private memory is the better way to represent this construct in -C\+\+. Lifetime of the underlying memory region was a concern too, but the +{cpp}. Lifetime of the underlying memory region was a concern too, but the current design with automatic storage duration for the allocated memory region closely follows what the user would get from a stack-allocated array. From 3949fefe6788a94d96c2f7af5dd7fb0215ac05cf Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 19 Feb 2024 11:21:14 +0000 Subject: [PATCH 15/18] Change section headers --- ...neapi_spec_constant_length_alloca.asciidoc | 28 +++++++++---------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc index c5dc65aa5f3d7..9e9716f6144e4 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc @@ -100,7 +100,7 @@ implementation supports. feature-test macro always has this value. |=== -=== New Aspect for Specification Constant-Length Allocations +=== New aspect for specialization constant-length allocations This extension adds a new device aspect: @@ -136,7 +136,7 @@ aligned_private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. In the case of `aligned_private_alloca`, `Alignment` must be a power of 2 -fundamental alignment stricter than the alignment requirement of `ElementType`. +fundamental alignment and must be a positive multiple of `alignof(ElementType)`. _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in private memory. @@ -148,18 +148,18 @@ decorated. In the case of `private_alloca`, the pointer is suitably aligned for an object of type `ElementType`. In the case of `aligned_private_alloca`, the pointer is aligned to the specified `Alignment`. -_Remarks_: In case of private memory exhaustion, the underlying backend must -report an error in the same fashion as if the allocation size were static. In -case of a successful call, allocated memory has automatic storage -duration. Additionally, `SpecName` must have a default value of at least 1 and -not be set to a value less than 1 during program execution. Violation of these -conditions results in undefined behaviour. +_Remarks_: In case of private memory exhaustion, the implementation must report +an error in the same fashion as if the allocation size were static. In case of a +successful call, allocated memory has automatic storage duration. Additionally, +`SpecName` must have a default value of at least 1 and not be set to a value +less than 1 during program execution. Violation of these conditions results in +undefined behaviour. == Example usage This non-normative section shows some example usages of the extension. -=== Basic Usage +=== Basic usage [source,c++] ---- @@ -182,7 +182,7 @@ void run(queue q, const float *in, float *out, size_t n) { }); ---- -=== Storage Duration Clarification +=== Storage duration clarification The following example is intended to clarify storage duration of memory allocated by `private_alloca`. @@ -213,7 +213,7 @@ void run(queue q, const float *in, float *out, size_t n) { }); ---- -== Design Constraints +== Design constraints The big design constraint stems from the unknown allocation size at compile time. {cpp} does not support variable length arrays and complete type sizes must @@ -225,7 +225,7 @@ closely follows what the user would get from a stack-allocated array. == Issues -=== Default `DecorateAddress` Value +=== Default `DecorateAddress` value At the time this extension was first proposed, there was no consensus for a default value for `sycl::access::decorate` in SYCL. The SYCL specification @@ -235,7 +235,7 @@ this would not justify using that value in this extension. Although it would be desirable to have one, the SCLA extension will not commit to a default value until the SYCL community has come to an agreement. -=== Passing Size as an Argument +=== Passing size as an argument Initial design passes size as a `sycl::specialization_id &` template argument and receives a `sycl::kernel_handler &` as an argument. This decision @@ -255,7 +255,7 @@ private_alloca(const specialization_constant &size); } // namespace sycl::ext::oneapi::experimental ---- -== Revision History +== Revision history [cols="5,15,15,70"] [grid="rows"] From 609a863d230100fb19efd7818dada513be81e7ab Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Mon, 19 Feb 2024 12:28:37 +0000 Subject: [PATCH 16/18] Apply renaming Signed-off-by: Victor Perez --- ...iidoc => sycl_ext_private_alloca.asciidoc} | 33 +++++++++---------- 1 file changed, 16 insertions(+), 17 deletions(-) rename sycl/doc/extensions/experimental/{sycl_ext_oneapi_spec_constant_length_alloca.asciidoc => sycl_ext_private_alloca.asciidoc} (89%) diff --git a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc similarity index 89% rename from sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc rename to sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc index 9e9716f6144e4..a378a6cbeda00 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_oneapi_spec_constant_length_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc @@ -1,4 +1,4 @@ -= sycl_ext_oneapi_spec_constant_length_alloca += sycl_ext_oneapi_private_alloca :source-highlighter: coderay :coderay-linenums-mode: table @@ -54,11 +54,11 @@ specification.* == Backend support status The APIs in this extension may be used only on a device that has -`aspect::ext_oneapi_scla`. The application must check that the device has this -aspect before submitting a kernel using any of the APIs in this extension. If -the application fails to do this, the implementation throws a synchronous -exception with the `errc::kernel_not_supported` error code when the kernel is -submitted to the queue. +`aspect::ext_oneapi_private_alloca`. The application must check that the device +has this aspect before submitting a kernel using any of the APIs in this +extension. If the application fails to do this, the implementation throws a +synchronous exception with the `errc::kernel_not_supported` error code when the +kernel is submitted to the queue. == Overview @@ -84,11 +84,10 @@ specialization constants. This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the -macro `SYCL_EXT_ONEAPI_SPEC_CONSTANT_LENGTH_ALLOCA` to one of the values defined -in the table below. Applications can test for the existence of this macro to -determine if the implementation supports this feature, or applications can test -the macro's value to determine which of the extension's features the -implementation supports. +macro `SYCL_EXT_ONEAPI_PRIVATE_ALLOCA` to one of the values defined in the table +below. Applications can test for the existence of this macro to determine if the +implementation supports this feature, or applications can test the macro's value +to determine which of the extension's features the implementation supports. [%header,cols="1,5"] |=== @@ -108,15 +107,15 @@ This extension adds a new device aspect: ---- namespace sycl { enum class aspect : /*unspecified*/ { - ext_oneapi_scla + ext_oneapi_private_alloca }; } // namespace sycl ---- -The `ext_oneapi_scla` aspect indicates that the device is capable of using the -`private_alloca` API defined in the following sections. +The `ext_oneapi_private_alloca` aspect indicates that the device is capable of +using the `private_alloca` API defined in the following sections. -=== The SCLA API +=== The `private_alloca` API [source,c++] ---- @@ -232,8 +231,8 @@ default value for `sycl::access::decorate` in SYCL. The SYCL specification chooses `sycl::access::decorate::legacy` to avoid making breaking changes, but this would not justify using that value in this extension. -Although it would be desirable to have one, the SCLA extension will not commit -to a default value until the SYCL community has come to an agreement. +Although it would be desirable to have one, the `private_alloca` extension will +not commit to a default value until the SYCL community has come to an agreement. === Passing size as an argument From 1d4c7f6e26e24933f012c5546bc34c666c074f58 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 21 Feb 2024 09:49:25 +0000 Subject: [PATCH 17/18] Fix alignment claim --- .../extensions/experimental/sycl_ext_private_alloca.asciidoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc index a378a6cbeda00..4cba89ee5fcd6 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc @@ -134,8 +134,8 @@ aligned_private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. In -the case of `aligned_private_alloca`, `Alignment` must be a power of 2 -fundamental alignment and must be a positive multiple of `alignof(ElementType)`. +the case of `aligned_private_alloca`, `Alignment` must be a fundamental +alignment and must be a positive multiple of `alignof(ElementType)`. _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in private memory. From 33ba6779d27ff81a5ab54a1ea208b41eefaaa1f0 Mon Sep 17 00:00:00 2001 From: Victor Perez Date: Wed, 21 Feb 2024 10:07:59 +0000 Subject: [PATCH 18/18] Rephrase alignment --- .../extensions/experimental/sycl_ext_private_alloca.asciidoc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc index 4cba89ee5fcd6..fbf37679bcb99 100644 --- a/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc +++ b/sycl/doc/extensions/experimental/sycl_ext_private_alloca.asciidoc @@ -134,8 +134,9 @@ aligned_private_alloca(kernel_handler &kh); _Mandates_: `ElementType` must be a cv-unqualified trivial type and `SpecName` must be a reference to a specialization constant of integral `value_type`. In -the case of `aligned_private_alloca`, `Alignment` must be a fundamental -alignment and must be a positive multiple of `alignof(ElementType)`. +the case of `aligned_private_alloca`, `Alignment` must be an alignment value and +must be a positive multiple of `alignof(ElementType)`. If `Alignment` is an +extended alignment, it must be supported by the implementation. _Effects_: `h.get_specialization_constant()` elements of type `ElementType` are allocated and default initialized in private memory.