In OpenCL 2.0 and later, a user is able to allocate program
scope memory which can be accessed like a C++ global variable by any kernel in
an OpenCL program (cl_program
). When a program is shared between multiple
devices, each device receives its own unique instance of the program scope
memory allocation.
This extension introduces device scoped memory allocations into SYCL that can be
accessed within a kernel using syntax similar to C++ global variables, but
that have unique instances per sycl::device
and sycl::context
. Mechanisms
are provided for the host program to enqueue copies to or from the allocations
on a specific device. Restrictions are placed on the types of data that can be
stored within device_global
allocations, particularly around copyability and
constructors/destructors.
Copyright (c) 2021 - 2023 Intel Corporation. All rights reserved.
Note
|
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos. |
This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.
Artem Radzikhovskyy, Intel
Michael Kinsner, Intel
Jessica Davies, Intel
Joe Garvey, Intel
Mohammad Fawaz, Intel
Tommy Hoffner, Intel
John Pennycook, Intel
Greg Lueck, Intel
Roland Schulz, Intel
This extension is written against the SYCL 2020 specification, revision 3.
It also depends on the SYCL_EXT_ONEAPI_PROPERTIES
extension.
Note
|
In this document, we use |
The purpose of this document is to clearly describe and specify device_global
and related
concepts, types, and mechanisms, and to give examples and context for their usage.
Device scope memory allocations can provide an efficient mechanism for communication between multiple invocations of a kernel, or between kernels executing on the same device and context. There are additional benefits and optimization opportunities when a device compiler has visibility into the allocation size (static sizing) and uses of the allocation.
Syntax allowing direct use of an allocation (without passing pointers or parameters through function call boundaries) can also lead to syntax simplification in some important use cases.
Two example device_global
variables can be declared at namespace scope, as follows:
struct MyClass {
bool flag;
};
using namespace sycl::ext::oneapi::experimental;
device_global<MyClass> dm1;
static device_global<int[4]> dm2;
dm1
creates an allocation on each sycl::device
that contains an object of type MyClass
.
dm2
has internal linkage (due to static
), and creates allocations containing an array
of four int
per device.
Uses of dm1
and dm2
in device functions are syntactically similar to uses of global variables
in C++ (access directly through the namespace scope identifier), and device_global
has
reference wrapper-like semantics on a device. Of note, because C++ doesn’t allow for
overloading of the "dot operator", a get()
member of device_global
allows a reference
to be extracted, to which the usual dot operator may be applied as in:
sycl::queue Q;
Q.submit([&](sycl::handler& h) {
h.single_task([=] {
int x = 5;
if (dm1.get().flag)
x = dm2[0];
});
});
For both dm1
and dm2
, the MyClass
and int[4]
allocations on each device
in the context associated with Q
are zero-initialized before any
non-initialization accesses occur.
This extension provides a feature-test macro as described in the core SYCL
specification, Section 6.3.3 "Feature test macros". Therefore, an
implementation supporting this extension must predefine the macro
SYCL_EXT_ONEAPI_DEVICE_GLOBAL
to one of the values defined in the table below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro’s
value to determine which of the extension’s features
that the implementation supports.
Value | Description |
---|---|
1 |
Initial extension version |
device_global
provides a mechanism to allocate device scope memory - memory
which has unique underlying storage (of type T) for each sycl::device
and
sycl::context
combination. If multiple valid device and context combinations
are present then each receives its own unique underlying allocation. All kernels
that reference the same device_global
entity (either directly or via a pointer
to its underlying object of type T) share the same allocation of that object
when those kernels run on the same device and context.
device_global
allocations are in the global address space, as are any
underlying allocations of type T
which are implicitly allocated on each device
as a result of a device_global
object. It is undefined behavior if the host
program directly accesses a device_global
or any address obtained from a
device_global
member function, and similarly it is undefined behavior if a
device_global
or address obtained on one device from a device_global
member
function is accessed on a different device or context. There is no mechanism to
obtain addresses of or directly access a device’s device_global
allocation
within the host program.
A device_global
on a given device and context maintains its state (address of
the allocation and data within the allocation) even after the application
changes the value of a specialization constant via
handler::set_specialization_constant()
. Additionally, a device_global
maintains its state even when it is referenced from a kernel in a different
kernel_bundle
.
namespace sycl::ext::oneapi::experimental {
template <typename T, typename PropertyListT = empty_properties_t>
class device_global {
...
device_global
is a class template, parameterized by the type of the underlying allocation T, and a list of properties PropertyListT. The type of the allocation T also encodes the size of the allocation for potentially multidimensional array types.
T is restricted to types that have a trivial destructor. PropertyListT enables properties to be associated with a device_global
.
When compiling with C++ versions before C++20, T must also have a trivial default constructor. In this case, the allocation of type T for a given device_global
is zero-initialized on a given device prior to the first access to that device_global
on that device. For the purposes of this definition an access can be a direct access of the device_global
in kernel code or a copy to or from that device_global
enqueued to the given device.
When compiling with C++20 or later, T must have a constructor that can be constexpr
evaluated, and the parameters to the device_global
constructor are forwarded to the T constructor. In this case, the allocation of type T for a given device_global
is initialized on a given device prior to the first access to that device_global
on that device.
Properties may be specified for a device_global
to provide semantic
modification or optimization hint information to the compiler. See the section
below for a list of the properties that are allowed.
Note
|
On a device, |
The section below and the table following describe the constructors, member functions and factory methods for device_global
.
namespace sycl::ext::oneapi::experimental {
template <typename T, typename PropertyListT = empty_properties_t>
class device_global {
using subscript_return_t =
std::remove_reference_t<decltype(std::declval<T>()[std::ptrdiff_t{}])>;
public:
using element_type = std::remove_extent_t<T>;
static_assert(std::is_trivially_destructible_v<T>,
"Type T must be trivially destructible.");
// device_global initializes underlying T with the args argument
#if __cpp_consteval
template <typename... Args>
consteval explicit device_global(Args&&... args);
#else
static_assert(std::is_trivially_default_constructible_v<T>,
"Type T must be trivially default constructable (until C++20 "
"consteval is supported and enabled)");
// The underlying memory allocations of type T on devices will be
// zero-initialized before any non-initialization accesses occur.
device_global() = default;
#endif // __cpp_consteval
device_global(const device_global &) = delete;
device_global(const device_global &&) = delete;
device_global &operator=(const device_global &) = delete;
device_global &operator=(const device_global &&) = delete;
template <access::decorated IsDecorated>
multi_ptr<T, access::address_space::global_space, IsDecorated>
get_multi_ptr() noexcept;
template <access::decorated IsDecorated>
multi_ptr<const T, access::address_space::global_space, IsDecorated>
get_multi_ptr() const noexcept;
// Access the underlying data
operator T&() noexcept;
operator const T&() const noexcept;
T& get() noexcept;
const T& get() const noexcept;
// Enable assignments from underlying type
device_global& operator=(const T&) noexcept;
// Available if the operator[] is valid for objects of type T
subscript_return_t& operator[]( std::ptrdiff_t idx ) noexcept;
const subscript_return_t& operator[]( std::ptrdiff_t idx ) const noexcept;
// Available if the operator-> is valid for objects of type T
T& operator->() noexcept;
const T& operator->() const noexcept;
// Note that there is no need for "device_global" to define member functions for
// operators like "++", comparison, etc. Instead, the type "T" need only define
// these operators as non-member functions. Because there is an implicit conversion
// from "device_global" to "T&", the operations can be applied to objects of type
// "device_global<T>".
template<typename propertyT>
static constexpr bool has_property();
// The return type is an unspecified internal class used to represent
// instances of propertyT
template<typename propertyT>
static constexpr /*unspecified*/ get_property();
};
} // namespace sycl::ext::oneapi::experimental
Functions | Description |
---|---|
device_global(); |
Constructs a The storage on each device for
|
template <typename... Args>
consteval explicit device_global(Args&&... args); |
Constructs a The object of type
|
template <access::decorated IsDecorated>
multi_ptr<T, access::address_space::global_space, IsDecorated>
get_multi_ptr() noexcept;
template <access::decorated IsDecorated>
multi_ptr<T, access::address_space::global_space, IsDecorated>
get_multi_ptr() const noexcept; |
Available only in device functions. Returns a |
operator T&() noexcept;
operator const T&() const noexcept; |
Available only in device functions. Implicit conversion to a reference to the underlying |
T& get() noexcept;
const T& get() const noexcept; |
Available only in device functions. Returns a reference to the underlying |
device_global& operator=(const T&) noexcept; |
Available only in device functions. Enables assignment of type |
element_type& operator[]( std::ptrdiff_t idx ) noexcept;
const element_type& operator[]( std::ptrdiff_t idx ) const noexcept; |
Available only in device functions. Available only when the underlying Indexes into the underlying |
T& operator->() noexcept;
const T& operator->() const noexcept; |
Available only in device functions. Available only when Provides member access through |
template<typename propertyT>
static constexpr bool has_property(); |
Returns true if the |
template<typename propertyT>
static constexpr auto get_property(); |
Returns an object of the class used to represent the value of property |
There are restrictions on how the application can create objects of type
device_global
. Applications that violate these restrictions are ill-formed.
-
The application may declare a variable of type
device_global
in the following ways:-
As a variable at namespace scope, or
-
As a static member variable, but only if the member variable is publicly accessible from namespace scope.
The application must not create an object of type
device_global
in any other way. (E.g. variables with automatic storage duration or objects created vianew
are not allowed.) -
-
The
device_global
variable must not itself be an array. The underlying type T may be an array type, but thedevice_global
variable itself must not be an array. -
The
device_global
variable must not be shadowed by another identifier X which has the same name and is declared in an inline namespace, such that thedevice_global
variable is no longer accessible after the declaration of X. -
If the
device_global
variable is declared in a namespace, none of the enclosing namespace names N may be shadowed by another identifier X which has the same name as N and is declared in an inline namespace, such that N is no longer accessible after the declaration of X.
Note
|
The expectation is that some implementations may conceptually insert code at
the end of a translation unit which references each |
The following example illustrates some of these restrictions:
#include <sycl/sycl.hpp>
using namespace sycl::ext::oneapi::experimental;
device_global<int> a; // OK
static device_global<int> b; // OK
inline device_global<int> c; // OK
struct Foo {
static device_global<int> d; // OK
};
device_global<int> Foo::d;
struct Bar {
device_global<int> e; // ILLEGAL: non-static member variable not
}; // allowed
struct Baz {
private:
static device_global<int> f; // ILLEGAL: not publicly accessible from
}; // namespace scope
device_global<int> Baz::f;
device_global<int[4]> g; // OK
device_global<int> h[4]; // ILLEGAL: array of "device_global" not
// allowed
device_global<int> same_name; // OK
namespace foo {
device_global<int> same_name; // OK
}
namespace {
device_global<int> same_name; // OK
}
inline namespace other {
device_global<int> same_name; // ILLEGAL: shadows "device_global" variable
} // with same name in enclosing namespace scope
inline namespace other2 {
namespace foo { // ILLEGAL: namespace name shadows "::foo"
} // namespace which contains "device_global"
// variable.
}
When compiling with C++20, constant compile-time initialization for device_globals is supported. The following example shows a few examples of what this would look like:
// Constant int and array of int device_globals
device_global<int> no_device_image_dg {3};
device_global<int, decltype(properties(device_image_scope))> dg_int{5};
device_global<int[3], decltype(properties(device_image_scope))>
dg_int_arr{5, 2, 3};
// Constant char and array of char device_globals
device_global<char, decltype(properties(device_image_scope))> dg_char{'f'};
device_global<char[3], decltype(properties(device_image_scope))>
dg_char_arr{'d', '4', 'S'};
// Multidimensional array of integers
device_global<int[3][2], decltype(properties(device_image_scope))>
dg_multi_dim_arr{3, 4, 5, 6, 7, 8};
// Constant float and array of float device_globals
device_global<float, decltype(properties(device_image_scope))> dg_float{4.5};
device_global<float[6], decltype(properties(device_image_scope))>
dg_float_arr{4.5, 2.1, 3.5, 9.33, 2.33, 2.1};
// Constant double and array of double device_globals
device_global<double, decltype(properties(device_image_scope))>
dg_double{3.56543};
device_global<double[3], decltype(properties(device_image_scope))>
dg_double_arr{2.2341234, 233.23423, 236.52321};
// Constant bool and array of bool device_globals
device_global<bool, decltype(properties(device_image_scope))> dg_bool{true};
device_global<bool[3], decltype(properties(device_image_scope))>
dg_bool_arr{true, false, true};
// Constant struct and array of struct device_globals
struct TestStruct {
int field1;
bool field2;
float field3;
int field4[4];
};
constexpr TestStruct TS1(5, true, 2.1, {1, 2, 3, 4});
constexpr TestStruct TS2(7, false, 2.4, {1, 2, 3, 4});
constexpr TestStruct TS3(6, false, 4.34534, {5, 6, 7, 8});
device_global<TestStruct, decltype(properties(device_image_scope))>
dg_struct{TS3};
device_global<TestStruct[2], decltype(properties(device_image_scope))>
dg_struct_arr{TS1, TS2};
The device_global
class supports several compile-time-constant properties.
If specified, these properties are included in the PropertyListT
template
parameter as shown in this example:
using namespace sycl::ext::oneapi::experimental;
device_global<MyClass, decltype(properties(device_image_scope))> dm1;
device_global<int[4], decltype(properties(host_access_read))> dm2;
The following code synopsis shows the set of supported properties, and the following table describes their effect.
namespace sycl::ext::oneapi::experimental {
struct device_image_scope_key {
using value_t = property_value<device_image_scope_key>;
};
enum class host_access_enum : /* unspecified */ {
read,
write,
read_write,
none
};
struct host_access_key {
template <host_access_enum Access>
using value_t =
property_value<host_access_key,
std::integral_constant<host_access_enum, Access>>;
};
enum class init_mode_enum : /* unspecified */ {
reprogram,
reset
};
struct init_mode_key {
template <init_mode_enum Trigger>
using value_t =
property_value<init_mode_key,
std::integral_constant<init_mode_enum, Trigger>>;
};
struct implement_in_csr_key {
template <bool Enable>
using value_t =
property_value<implement_in_csr_key, std::bool_constant<Enable>>;
};
inline constexpr device_image_scope_key::value_t device_image_scope;
template <host_access_enum Access>
inline constexpr host_access_key::value_t<Access> host_access;
inline constexpr host_access_key::value_t<host_access_enum::read>
host_access_read;
inline constexpr host_access_key::value_t<host_access_enum::write>
host_access_write;
inline constexpr host_access_key::value_t<host_access_enum::read_write>
host_access_read_write;
inline constexpr host_access_key::value_t<host_access_enum::none>
host_access_none;
template <init_mode_enum Trigger>
inline constexpr init_mode_key::value_t<Trigger> init_mode;
inline constexpr init_mode_key::value_t<init_mode_enum::reprogram>
init_mode_reprogram;
inline constexpr init_mode_key::value_t<init_mode_enum::reset> init_mode_reset;
template <bool Enable>
inline constexpr implement_in_csr_key::value_t<Enable> implement_in_csr;
inline constexpr implement_in_csr_key::value_t<true> implement_in_csr_on;
inline constexpr implement_in_csr_key::value_t<false> implement_in_csr_off;
template <> struct is_property_key<device_image_scope_key> : std::true_type {};
template <> struct is_property_key<host_access_key> : std::true_type {};
template <> struct is_property_key<init_mode_key> : std::true_type {};
template <> struct is_property_key<implement_in_csr_key> : std::true_type {};
template <typename T, typename PropertyListT>
struct is_property_key_of<device_image_scope_key, device_global<T, PropertyListT>>
: std::true_type {};
template <typename T, typename PropertyListT>
struct is_property_key_of<host_access_key, device_global<T, PropertyListT>>
: std::true_type {};
template <typename T, typename PropertyListT>
struct is_property_key_of<init_mode_key, device_global<T, PropertyListT>>
: std::true_type {};
template <typename T, typename PropertyListT>
struct is_property_key_of<implement_in_csr_key, device_global<T, PropertyListT>>
: std::true_type {};
} // namespace sycl::ext::oneapi::experimental
Property | Description |
---|---|
device_image_scope |
This property is most useful for kernels that are submitted to an FPGA device, but it may be used with any kernel. Normally, a single instance of a device global variable is allocated for each device, and that instance is shared by all kernels that belong to the same context and are submitted to the same device, regardless of which device image contains the kernel. When this property is specified, it is an assertion by the user that on a given device a given device_global decorated with this property is only ever accessed in a single device_image. An implementation may be able to optimize accesses to the device global when this property is specified (especially on an FPGA device), but the user must be aware of which device image contains the kernels that use the variable. A device global that is decorated with this property may not be accessed from kernels that reside in different device images, either by direct reference to the variable or indirectly by passing the variable’s address to another kernel. The implementation is required to diagnose an error if the kernels that directly access a variable do not all reside in the same device image, however no diagnostic is required for an indirect access from another device image. A device global variable is guaranteed to be initialized for a device prior to the first time it is accessed (whether from a kernel or a copy operation). Device globals may also be re-initialized at implementation-defined times if multiple device images are used on the same device. To avoid unexpected re-initializations, applications should ensure that all kernels that are enqueued to a device D come from the same device image. In addition, applications should ensure that all device global copy operation enqueued to device D correspond to that same device image. The application may copy to or from a device global even before any kernel in the device image is submitted to the device. Doing so causes the device global to be initialized immediately before the copy happens. (Typically, the copy operation causes the device image to be loaded onto the device also.) As a result, copying from a device global returns the initial value if the device image that contains the variable is not currently loaded onto the device. |
host_access |
This property provides an assertion by the user telling the implementation whether the host code copies to or from the device global. As a result, the implementation may be able to perform certain optimizations. Although this property may be used with any device, it is generally only beneficial when used on FPGA devices. The following values are supported:
|
init_mode |
This property is only meaningful when used with an FPGA device. It is ignored for other devices. The following values are supported:
If the |
implement_in_csr |
This property is only meaningful when used with an FPGA device. It is ignored for other devices. The following values are supported:
If the |
Note
|
As stated above, the user must understand which device image contains a
kernel in order to use the
In addition, the following factors also affect how kernels are bundled into device images:
|
SYCL 2020 restrictions must be relaxed to allow device_global
to be used within
device functions without being const
or constexpr
and without being zero-initialized
or constant-initialized. This is achieved by adding device_global
exceptions to the
following point in Section 5.4 "Language restrictions for device functions". The modified restriction is:
-
Variables with static storage duration that are odr-used inside a device function, must be
const
orconstexpr
and zero-initialized or constant-initialized, except if the variable is of typedevice_global
in which case it can be odr-used inside a device function without beingconst
/constexpr
or zero-/constant-initialized.-
Amongst other things, this restriction makes it illegal for a device function to access a global variable that isn’t
const
orconstexpr
unless the variable is of typedevice_global
.
-
This extension broadens the use of the SYCL_EXTERNAL
macro to apply also to
device global variables. If the implementation defines the SYCL_EXTERNAL
macro, device code in one translation unit may reference a device global
variable that is defined in a different translation unit so long as the
declaration of the variable in both translation units uses SYCL_EXTERNAL
.
For example:
// In one translation unit
#include <sycl/sycl.hpp>
using namespace sycl::ext::oneapi::experimental;
SYCL_EXTERNAL device_global<int> Foo; // definition (also a declaration)
// In another translation unit
#include <sycl/sycl.hpp>
using namespace sycl::ext::oneapi::experimental;
using namespace sycl;
SYCL_EXTERNAL extern device_global<int> Foo; // declaration
void bar(queue q) {
q.single_task([=] {
Foo = 42;
});
}
Add the following functions to the sycl::queue
interface described in Section 4.6.5.1 of
the SYCL 2020 specification.
Note
|
A pointer to the allocation within a |
namespace sycl {
class queue {
public:
// Copy to device_global
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0);
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count, size_t startIndex, event depEvent);
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count, size_t startIndex,
const std::vector<event> &depEvents);
// Copy from device_global
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0);
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count, size_t startIndex, event depEvent);
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count,size_t startIndex, const std::vector<event> &depEvents);
// memcpy to device_global
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes = sizeof(T), size_t offset = 0);
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes,
size_t offset, event depEvent);
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes,
size_t offset, const std::vector<event> &depEvents);
// memcpy from device_global
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src,
size_t numBytes = sizeof(T), size_t offset = 0);
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src, size_t numBytes,
size_t offset, event depEvent);
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src, size_t numBytes,
size_t offset, const std::vector<event> &depEvents);
};
} // namespace sycl
Add the following function descriptions to the sycl::queue
interface description table
in Section 4.6.5.1 of the SYCL 2020 specification.
Function Definition | Function type |
---|---|
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0); |
Explicit copy |
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count, size_t startIndex, event depEvent); |
Explicit copy |
template <typename T, typename PropertyListT>
event copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count, size_t startIndex, const std::vector<event> &depEvents); |
Explicit copy |
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0); |
Explicit copy |
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count, size_t startIndex, event depEvent); |
Explicit copy |
template <typename T, typename PropertyListT>
event copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count, size_t startIndex, const std::vector<event> &depEvents); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes = sizeof(T), size_t offset = 0); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes,
size_t offset, event depEvent); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes,
size_t offset, const std::vector<event> &depEvents); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src,
size_t numBytes = sizeof(T), size_t offset = 0); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src, size_t numBytes,
size_t offset, event depEvent); |
Explicit copy |
template <typename T, typename PropertyListT>
event memcpy(void *dest,
const device_global<T, PropertyListT>& src, size_t numBytes,
size_t offset, const std::vector<event> &depEvents); |
Explicit copy |
Add the following functions to the sycl::handler
interface described in Section 4.9.4.3 of
the SYCL 2020 specification.
Add to Table 130, "Member functions of the handler class".
Member Function | Description |
---|---|
template <typename T, typename PropertyListT>
void copy(const std::remove_all_extents_t<T> *src,
device_global<T, PropertyListT>& dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0); |
Not available if Copies count elements of type If count and startIndex would cause data to be written beyond the end of
the variable dest, the implementation throws an If If |
template <typename T, typename PropertyListT>
void copy(const device_global<T, PropertyListT>& src,
std::remove_all_extents_t<T> *dest,
size_t count = sizeof(T) / sizeof(std::remove_all_extents_t<T>),
size_t startIndex = 0); |
Not available if Copies count elements of type If count and startIndex would cause data to be read beyond the end of
the variable src, the implementation throws an If If |
template <typename T, typename PropertyListT>
void memcpy(device_global<T, PropertyListT>& dest,
const void *src, size_t numBytes = sizeof(T), size_t offset = 0); |
Not available if Copies count bytes from the pointer src to the If numBytes and offset would cause data to be written beyond the end of
the variable dest, the implementation throws an If If |
template <typename T, typename PropertyListT>
void memcpy(void *dest,
const device_global<T, PropertyListT>& src,
size_t numBytes = sizeof(T), size_t offset = 0); |
Not available if Copies count bytes from the If numBytes and offset would cause data to be read beyond the end of
the variable src, the implementation throws an If If |
Note
|
As specified above, the |
device_global
prioritizes usability over simplicity of implementation, and therefore adds requirements such as (1) that contents and addresses of the allocation on each device remain stable across changes to specialization constant values, and (2) that the allocation be accessible across device_image
on the same device. These requirements mean that the semantics of device_global
do not match the semantics of SPIR-V module scope variables, and therefore may not be implementable exclusively using the SPIR-V feature in existing SPIR-V consuming implementations.
Also note that there are no restrictions on passing (and subsequent dereferencing) of pointers obtained on a device from a device_global
, between kernels on the same device, including through storage to memory.
1) Can sycl::atomic_ref
be used with device_global
?
Resolved: Yes, but only on the device side. There is no visibility/communication across devices because each device receives a unique allocation of type T underlying the device_global
. There is no way for an atomic_ref
associated with the allocation to be created in host code because there is no way to extract a pointer or reference in host code (only copy/memcpy).
2) Should we restrict device_global
to static storage duration, and if so how?
Resolved: Yes, through similar language as specialization_id
. Moreover restricted to namespace scope, because it is expensive to implement function scope statics. This could change if a compelling use case arises that needs function scope static support.
3) Should the returned multi_ptr
default to decorated or an undecorated?
Resolved: No default - follow convention on this set by multi_ptr
4) Is a mechanism needed that can mark device accesses as read only, while allowing for host write access?
Resolved: No known compelling use cases at this point.
5) Are there important use cases that require arbitrary destructors to be supported by device_global
?
Resolved: No important cases known at this time. May loosen restriction in the future.
Rev | Date | Author | Changes |
---|---|---|---|
1 |
2021-06-11 |
Artem Radzikhovskyy |
Initial review version |
2 |
2021-08-01 |
Mike Kinsner |
Restrict to trivial default constructors for first release, change from pointer to reference semantics, swap order of arguments in |
3 |
2023-09-14 |
Justin Rosner |
Adding consteval constructor to allow for constant compile-time initialization of |