Copyright © Codeplay. All rights reserved.
Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.
This extension is written against the SYCL 2020 revision 6 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.
This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.
This extension is currently implemented in DPC++ only for GPU devices and only when using the CUDA backend. Attempting to use this extension in kernels that run on other devices or backends will not work. Be aware that the compiler may not be able to issue a diagnostic to warn you if this happens.
Images in SYCL 1.2.1 were designed to work with OpenCL. SYCL 2020 tried to make them more versatile by splitting the image type into sampled and unsampled images. This enabled SYCL 2020 images to work better with other backends. However, SYCL 2020 images still didn’t quite meet user expectations. There was feedback about various use cases where the current model falls short (see examples at the end of this document for some of the use cases).
One of the key issues is requesting access to arbitrary images through handles, and not accessors. Accessing images through handles instead of accessors grants much more flexibility to the user, at the expense of automatic data dependency tracking. Bypassing accessors allows users to implement programs where the number of images is not known at compile-time, such as a texture atlas where one image holds references to other images. This kind of feature is impossible to implement with the accessor model outlined in the core specification.
These shortcomings are why we propose a new extension for SYCL 2020 images. Per our proposal, users would be able to separate memory allocation for the image from the actual image creation. Images will be represented by opaque handle types that can be passed directly into a kernel without requesting access. In many ways, this model more closely resembles the USM model when accessing data on the device, but it’s specialized for dealing with images.
The proposed model does not replace SYCL 2020 images, it is instead meant as building blocks for implementing SYCL 2020 images on top of it.
In addition to bindless images, this document also proposes an interoperability extension providing functionality to allow users to import external memory and semaphore objects from other APIs, such as Vulkan or DirectX.
Importing memory allows it to be shared between APIs without the need to duplicate allocations and perform multiple copies between host and device to ensure that said memory is kept uniform across those APIs at all times.
Importing semaphores will also allow SYCL to schedule command groups and queue operations that depend on completion of GPU commands submitted by external APIs.
Note
|
The interoperability outlined in this document concerns only the importing of external API objects into the SYCL runtime. We do not expose exportation of SYCL objects to external APIs. Interoperability capabilities vary between APIs. For example, CUDA allows the import of external memory and semaphores, but does not allow export of its own resources. |
This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro SYCL_EXT_ONEAPI_BINDLESS_IMAGES
to one of the values defined in the
table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro’s value to determine which of the extension’s features the
implementation supports.
Rev | Description |
---|---|
1 |
Initial draft of the proposal |
2 |
Second revision of the proposal |
3 |
Third revision of the proposal |
4 |
Fourth revision of the proposal |
See the revision history at the bottom of this document for features added in each revision.
We provide the following device queries to retrieve information on whether a SYCL implementation provides support for various bindless image features.
The device aspects for these queries are:
Device descriptor | Description |
---|---|
|
Indicates if the device supports
creation of bindless images backed by the |
|
Indicates if the device supports the creation of bindless images backed by shared USM memory. |
|
Indicates if the device supports creation of 1D bindless images backed by USM. |
|
Indicates if the device supports creation of 2D bindless images backed by USM. |
Note
|
Not all SYCL backends may provide support for bindless images constructed from USM memory with all dimensions. As an example, CUDA does not have native support for 3D image resources constructed from USM. In the future, some backends may support this, and this proposal may be updated to allow creation of 3D USM images. |
namespace sycl::ext::oneapi::experimental {
enum class image_channel_order : /* unspecified */ {
a,
r,
rx,
rg,
rgx,
ra,
rgb,
rgbx,
rgba,
argb,
bgra,
intensity,
luminance,
abgr,
};
enum class image_channel_type : /* unspecified */ {
snorm_int8,
snorm_int16,
unorm_int8,
unorm_int16,
signed_int8,
signed_int16,
signed_int32,
unsigned_int8,
unsigned_int16,
unsigned_int32,
fp16,
fp32,
};
enum class image_type : /* unspecified */ {
standard,
mipmap,
array,
cubemap,
};
struct image_descriptor {
size_t width;
size_t height;
size_t depth;
image_channel_type channel_type;
image_channel_order channel_order;
image_type type;
unsigned int num_levels;
unsigned int array_size;
image_descriptor(sycl::range<1> dims, image_channel_order channel_order,
image_channel_type channel_type,
image_type type = image_type::standard,
unsigned int num_levels = 1, unsigned int array_size = 1);
image_descriptor(sycl::range<2> dims, image_channel_order channel_order,
image_channel_type channel_type,
image_type type = image_type::standard,
unsigned int num_levels = 1, unsigned int array_size = 1);
image_descriptor(sycl::range<3> dims, image_channel_order channel_order,
image_channel_type channel_type,
image_type type = image_type::standard,
unsigned int num_levels = 1, unsigned int array_size = 1);
image_descriptor get_mip_level_desc(unsigned int level) const;
void verify() const;
};
}
The image descriptor represents the image dimensions, channel type, and channel
order. An image_type
member is also present to allow for implementation of
mipmapped, image array, and cubemapped images.
The image_descriptor
shall be default constructible and follow by-value
semantics.
Note
|
Additional future `image_type`s may include combined image types like "mipmapped cubemap". |
Note that image_channel_type
and image_channel_order
existed in SYCL 1.2.1,
but were removed in SYCL 2020 in favor of a single, unified enum class.
We propose separating them again to enable better flexibility
and to avoid combinatorial complexity.
The verify
member function is available to check the validity of the image
descriptor against the limitations outlined below. If the given descriptor is
deemed invalid, then a sycl::exception
will be thrown with error code
sycl::errc::invalid
.
For the standard
image type, the value of num_levels
and array_size
must
both be 1
.
The type
member will inform the implementation of the type of image to
create, allocate, or free.
Only mipmap image types support more than one level. For mipmap images, the
member function get_mip_level_desc
will return an image_descriptor
for
a given level of a mipmap, with valid dimension values for that level, and the
type of the returned image_descriptor
will be image_type::standard
.
Only array image types support more than one array layer.
The process of creating an image is two-fold: allocate an image’s memory, then create an image handle from the allocation. Allocation of image memory can be achieved in two ways.
namespace sycl::ext::oneapi::experimental {
struct image_mem_handle {
using raw_handle_type = /* implementation defined */;
raw_handle_type raw_handle;
}
class image_mem {
public:
image_mem();
image_mem(const image_mem &rhs);
image_mem(image_mem &&rhs) noexcept;
image_mem(const image_descriptor &imageDesc,
const sycl::device &syclDevice,
const sycl::context &syclContext);
image_mem(const image_descriptor &imageDesc,
const sycl::queue &syclQueue);
~image_mem();
image_mem &operator=(image_mem &rhs);
image_mem &operator=(image_mem &&rhs) noexcept;
bool operator==(const image_mem &rhs) const;
bool operator!=(const image_mem &rhs) const;
image_mem_handle get_handle() const;
image_descriptor get_descriptor() const;
sycl::device get_device() const;
sycl::context get_context() const;
sycl::range<3> get_range() const;
sycl::image_channel_type get_image_channel_type() const;
sycl::image_channel_type get_image_channel_order() const;
unsigned int get_image_num_channels() const;
image_type get_type() const;
image_mem_handle get_mip_level_mem_handle(unsigned int level) const;
};
image_mem_handle alloc_image_mem(const image_descriptor &imageDesc,
const sycl::device &syclDevice,
const sycl::context &syclContext);
image_mem_handle alloc_image_mem(const image_descriptor &imageDesc,
const sycl::queue &syclQueue);
void free_image_mem(image_mem_handle memHandle,
image_type imageType,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void free_image_mem(image_mem_handle memHandle,
image_type imageType,
const sycl::queue &syclQueue);
}
The first method of allocating device memory for images is through
alloc_image_mem
. This takes a sycl::device
, sycl::context
,
and image_descriptor
to allocate device memory, with the appropriate image
type and size based on the image_descriptor
. Alternatively, we can also pass a
sycl::queue
instead of both sycl::device
and sycl::context
.
Memory allocated in this way requires the user to free that memory after all
operations using the memory are completed and no more operations operating on
the memory will be scheduled. This is done using free_image_mem
. An
image_type
should be passed to free_image_mem
to inform the implementation
of the type of memory to be freed.
The second method involves the image_mem
class, which is a RAII class wrapper
that performs allocation and deallocation of device memory.
The default constructor does not allocate any memory on the device and the
resulting image_mem
object is in an uninitialized state.
the constructor is a wrapper for alloc_image_mem
functionality.
The destructor is a wrapper for free_image_mem
functionality.
image_mem
also provides some functions to get various properties of the
image memory allocation such as the image range, channel type, channel order,
number of channels, number of levels, and image type.
In the case where a mipmap has been allocated, get_mip_level_mem_handle
can
be used to return an image_mem_handle
to a specific level of the mipmap. This
can then be used to copy data to that specific level or create an image handle
based on that level.
Note that the handle type image_mem_handle::raw_handle_type
is an opaque type,
and the handle cannot be dereferenced on the host. The layout of the memory is
backend-specific, and may be an optimized layout, e.g. tile swizzle patterns.
The image_mem
class must follow Common Reference Semantics as outlined by the
core SYCL 2020 specification.
The image_mem
class is not a valid kernel argument.
If the construction of the image_mem
class fails, a
sycl::exception
with error code sycl::errc::memory_allocation
will be
thrown.
Similarly, if alloc_image_mem
or free_image_mem
fail, a sycl::exception
with error code sycl::errc::memory_allocation
will be thrown.
Note
|
In the DPC++ CUDA backend, |
Extension functions are provided to retrieve information about images allocated
using the image_mem_alloc
function. These are similar to the member functions
provided by image_mem
. However, since the image_mem_handle
is a minimal
struct representing just the opaque handle the underlying memory object, there
is some information that we cannot retrieve from it, namely the image_type
,
image_channel_order
, the sycl::context
or sycl::device
the memory was
allocated in, and the image_descriptor
used to allocate the memory.
namespace sycl::ext::oneapi {
sycl::range<3> get_image_range(const image_mem_handle memHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
sycl::range<3> get_image_range(const image_mem_handle memHandle,
const sycl::queue &syclQueue);
sycl::image_channel_type
get_image_channel_type(const image_mem_handle memHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
sycl::image_channel_type
get_image_channel_type(const image_mem_handle memHandle,
const sycl::queue &syclQueue);
unsigned int get_image_num_channels(const image_mem_handle memHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
unsigned int get_image_num_channels(const image_mem_handle memHandle,
const sycl::queue &syclQueue);
image_mem_handle get_mip_level_mem_handle(const image_mem_handle mipMemHandle,
unsigned int level,
const sycl::device &syclDevice,
const sycl::context &syclContext);
image_mem_handle get_mip_level_mem_handle(const image_mem_handle mipMemHandle,
unsigned int level,
const sycl::queue &syclQueue);
}
For get_image_range
where the underlying image memory was allocated with one
or two dimensions, the returned sycl::range<3>
will contain zero values for
the dimensions unused by the underlying image memory object.
The second way to allocate image memory is to use USM allocations. SYCL already
provides a number of USM allocation functions. This proposal would add another,
pitched memory allocation, through pitched_alloc_device
.
namespace sycl::ext::oneapi::experimental {
void *pitched_alloc_device(size_t *retRowPitch,
size_t widthInBytes, size_t height,
unsigned int elementSizeBytes,
const sycl::queue &syclQueue);
void *pitched_alloc_device(size_t *retRowPitch
size_t widthInBytes, size_t height,
unsigned int elementSizeBytes,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void *pitched_alloc_device(size_t *resultPitch,
const image_descriptor &desc,
const sycl::queue &queue);
void *pitched_alloc_device(size_t *resultPitch,
const image_descriptor &desc,
const sycl::device &syclDevice,
const sycl::context &syclContext);
}
This function will allocate a memory region aimed to be used for two-dimensional images. It allocates memory that is guaranteed to adhere to the device’s alignment requirements for USM images.
If the user does not wish to use pitched_alloc_device
to allocate
two-dimensional USM images, but prefers to use another USM allocation
function instead, then that allocation must adhere to some alignment
restrictions. These restrictions are device specific, and queries for them can
be found in the "Pitch alignment restrictions and queries" section below.
If the allocation of pitched memory fails, pitched_alloc_device
will throw a
sycl::exception
with error code sycl::errc::memory_allocation
.
For the purposes of this document, the row pitch of an image memory allocation
is the distance in bytes between the first elements of adjacent rows of the
image. Some devices may require two-dimensional USM images to be allocated with
specific alignments for their width and pitch values. The pitched_alloc_device
API intends to make allocation of USM memory adhering to these restrictions
easy, returning the appropriate pitch value to the user. However, if a user
wishes to use another USM allocation function, they must be aware of these
restrictions, and query the device to ensure the allocations they wish to use
adhere to those restrictions.
This proposal provides a number of additional device queries that enable the user to allocate appropriate pitched USM memory for two-dimensional images. One-dimensional images do not require any pitch values.
The device information descriptors for these queries are:
Device descriptor | Return type | Description |
---|---|---|
|
|
Returns the required alignment of the pitch between two rows of an image in bytes for images allocated using USM. |
|
|
Returns the maximum linear width allowed for images allocated using USM. |
|
|
Returns the maximum linear height allowed for images allocated using USM. |
|
|
Returns the maximum linear row pitch allowed for images allocated using USM. |
The next step is to create the image, and obtain the handle.
namespace sycl::ext::oneapi::experimental {
/// Opaque unsampled image handle type.
struct unsampled_image_handle {
using raw_image_handle_type = /* Implementation defined */;
unsampled_image_handle();
unsampled_image_handle(raw_image_handle_type raw_handle);
raw_image_handle_type raw_handle;
};
/// Opaque sampled image handle type.
struct sampled_image_handle {
using raw_image_handle_type = /* Implementation defined */;
sampled_image_handle();
sampled_image_handle(raw_image_handle_type raw_image_handle);
raw_image_handle_type raw_handle;
};
// Creating an unsampled image from an `image_mem_handle`
unsampled_image_handle create_image(image_mem_handle memHandle,
const image_descriptor &desc,
const sycl::device &syclDevice,
const sycl::context &syclContext);
unsampled_image_handle create_image(image_mem_handle memHandle,
const image_descriptor &desc,
const sycl::queue &syclQueue);
// Creating a sampled image from an `image_mem_handle`
sampled_image_handle create_image(image_mem_handle memHandle,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::device &syclDevice,
const sycl::context &syclContext);
sampled_image_handle create_image(image_mem_handle memHandle,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::queue &syclQueue);
// Creating an unsampled image from an `image_mem` object
unsampled_image_handle create_image(const image_mem &memHandle,
const image_descriptor &desc,
const sycl::device &syclDevice,
const sycl::context &syclContext);
unsampled_image_handle create_image(const image_mem &memHandle,
const image_descriptor &desc,
const sycl::queue &syclQueue);
// Creating a sampled image from an `image_mem` object
sampled_image_handle create_image(const image_mem &memHandle,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::device &syclDevice,
const sycl::context &syclContext);
sampled_image_handle create_image(const image_mem &memHandle,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::queue &syclQueue);
// Creating a sampled image from a USM allocation and pitch
sampled_image_handle create_image(const void *usmPtr, size_t pitch,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::device &syclDevice,
const sycl::context &syclContext);
sampled_image_handle create_image(const void *usmPtr, size_t pitch,
const image_descriptor &desc,
const bindless_image_sampler &sampler,
const sycl::queue &syclQueue);
// Destroying an image handle
void destroy_image_handle(sampled_image_handle &imageHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void destroy_image_handle(sampled_image_handle &imageHandle,
const sycl::queue &syclQueue);
void destroy_image_handle(unsampled_image_handle &imageHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void destroy_image_handle(unsampled_image_handle &imageHandle,
const sycl::queue &syclQueue);
}
Once we have allocated memory, we can pass it into the create_image
function
to obtain a sampled_image_handle
or unsampled_image_handle
.
These objects are opaque types that represent an image object.
They can be captured by value into a SYCL kernel, or they can be passed in a
buffer as a dynamic array of images (see examples at the bottom of this
document).
We can either provide a bindless_image_sampler
(defined in section below) or
not when creating the image. Doing so will create a sampled_image_handle
,
where otherwise an unsampled_image_handle
would be returned. A
sampled_image_handle
should contain a raw sampler handle that will be used
when sampling an image.
Whether an image_descriptor
or void *
USM allocation was passed to
create_image
, it must have been allocated in the same context and on the same
device as the one passed to create_image
.
If we choose to create a 2D image from a USM allocation by passing a void *
,
we must also pass the pitch of the memory allocation. If the memory was
allocated using pitched_alloc_device
, the pitch passed must be the one which
was returned by pitched_alloc_device
. If the user did not use
pitched_alloc_device
to allocate this memory, then that memory must still
adhere to device specific alignment restrictions. These restrictions and their
queries are outlined in the section "Pitch alignment restrictions and queries"
below.
The pitch is ignored for 1D USM images.
If the creation of an image fails, create_image
will throw a sycl::exception
with error code sycl::errc::runtime
.
The unsampled_image_handle
and sampled_image_handle
types shall be
default-constructible, copy-constructible, and device-copyable. When default
constructed, image handles are not valid until a user manually assigns a valid
raw_image_handle_type
to the raw_handle
field of the handle struct. The
default value of the raw_handle
is implementation defined.
The unsampled_image_handle
and sampled_image_handle
types have a
constructor to allow creation of the types from a raw_image_handle_type
Note
|
In the DPC++ CUDA backend a sampled image will correspond to a CUDA texture, whereas an unsampled image will correspond to a CUDA surface. |
After we’re done with the image, we need to destroy the handle using
destroy_image_handle
. Destroying an image handle does not deallocate the
underlying image memory. The user is responsible for deallocation, either
through free_image_mem
, or destroying the image_mem
object, if one was used.
The bindless_image_sampler
struct shown below is used to set the sampling
properties of sampled_images
upon image creation. It can be used to set
sampling properties that exist in the SYCL 2020 image_sampler
as well as
extra properties used for sampling additional image types including
level-of-detail (LOD) and anisotropic filtering for mipmaps, and seamless
filtering for cubemaps.
namespace sycl::ext::oneapi::experimental {
enum class cubemap_filtering_mode : /* unspecified */ {
disjointed,
seamless,
};
struct bindless_image_sampler {
// Assign addressing mode to all dimensions
bindless_image_sampler(sycl::addressing_mode addressing,
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering);
bindless_image_sampler(sycl::addressing_mode addressing,
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering,
sycl::filtering_mode mipFiltering,
float minMipmapLevelClamp, float maxMipmapLevelClamp,
float maxAnisotropy);
bindless_image_sampler(sycl::addressing_mode addressing,
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering,
cubemap_filtering_mode cubemapFiltering);
// Specific addressing modes per dimension
bindless_image_sampler(sycl::addressing_mode addressing[3],
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering);
bindless_image_sampler(sycl::addressing_mode addressing[3],
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering,
sycl::filtering_mode mipmapFiltering,
float minMipmapLevelClamp, float maxMipmapLevelClamp,
float maxAnisotropy);
bindless_image_sampler(sycl::addressing_mode addressing[3],
sycl::coordinate_normalization_mode coordinate,
sycl::filtering_mode filtering,
cubemap_filtering_mode cubemapFiltering);
sycl::addressing_mode addressing[3] = {sycl::addressing_mode::none};
sycl::coordinate_normalization_mode coordinate =
sycl::coordinate_normalization_mode::unnormalized;
sycl::filtering_mode filtering = sycl::filtering_mode::nearest;
sycl::filtering_mode mipmap_filtering = sycl::filtering_mode::nearest;
float min_mipmap_level_clamp = 0.f;
float max_mipmap_level_clamp = 0.f;
float max_anisotropy = 0.f;
ext::oneapi::experimental::cubemap_filtering_mode cubemap_filtering =
cubemap_filtering_mode::disjointed;
};
}
The bindless_image_sampler
shall be default constructible and follow by-value
semantics. The value for the addressing mode, addressing_mode::none
,
represents the backend’s default addressing mode. On CUDA this is Wrap
, i.e.
addressing_mode::repeat
.
addressing[3]
defines the addressing mode per texture dimension. A
bindless_image_sampler
can be constructed with a singular
sycl::addressing_mode
, where this parameter will define all dimensions.
mipmap_filtering
dictates the method in which sampling between mipmap
levels is performed.
min_mipmap_level_clamp
defines the minimum mipmap level from which we can
sample, with the minimum value being 0.
max_mipmap_level_clamp
defines the maximum mipmap level from which we can
sample. This value cannot be higher than the number of allocated levels.
max_anisotropy
dictates the anisotropic ratio used when sampling the mipmap
with anisotropic filtering.
cubemap_filtering
dictates the method of sampling along cubemap face borders.
Disjointed indicates no sampling between faces whereas seamless indicates that
sampling across face boundaries is enabled.
Note
|
In CUDA, when seamless cubemap filtering is enabled, sampled image address modes
specified are ignored. Instead, if the |
namespace sycl {
class handler {
public:
// Simple host to device copy
void ext_oneapi_copy(
void *Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental:image_descriptor &DestImgDesc);
// Host to device copy with offsets and extent
void ext_oneapi_copy(
void *Src,
sycl::range<3> SrcOffset,
sycl::range<3> SrcExtent,
ext::oneapi::experimental::image_mem_handle Dest,
sycl::range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
sycl::range<3> CopyExtent);
// Simple device to host copy
void ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
void *Dest,
const ext::oneapi::experimental::image_descriptor &SrcImgDesc);
// Device to host copy with offsets and extent
void ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
sycl::range<3> SrcOffset,
const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
void *Dest,
sycl::range<3> DestOffset,
sycl::range<3> DestExtent,
sycl::range<3> CopyExtent);
// Simple HtoD or DtoH copy with USM device memory
void ext_oneapi_copy(void *Src,
void *Dest,
const ext::oneapi::experimental::image_descriptor &Desc,
size_t DeviceRowPitch);
// HtoD or DtoH copy with USM device memory, using offsets, extent
void ext_oneapi_copy(
void *Src,
sycl::range<3> SrcOffset,
void *Dest,
sycl::range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
sycl::range<3> HostExtent,
sycl::range<3> CopyExtent);
// Simple device to device copy
void ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &ImageDesc);
};
class queue {
public:
// Simple host to device copy
event ext_oneapi_copy(void *Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &DestImgDesc);
event ext_oneapi_copy(void *Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
event DepEvent);
event ext_oneapi_copy(void *Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
const std::vector<event> &DepEvents);
// Host to device copy with offsets and extent
event ext_oneapi_copy(
void *Src,
range<3> SrcOffset,
range<3> SrcExtent,
ext::oneapi::experimental::image_mem_handle Dest,
range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
range<3> CopyExtent);
event ext_oneapi_copy(
void *Src,
range<3> SrcOffset,
range<3> SrcExtent,
ext::oneapi::experimental::image_mem_handle Dest,
range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
range<3> Extent, event DepEvent);
event ext_oneapi_copy(
void *Src,
range<3> SrcOffset,
range<3> SrcExtent,
ext::oneapi::experimental::image_mem_handle Dest,
range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DestImgDesc,
range<3> CopyExtent, const std::vector<event> &DepEvents);
// Simple device to host copy
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
void *Dest,
const ext::oneapi::experimental::image_descriptor &Desc);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
void *Dest,
const ext::oneapi::experimental::image_descriptor &Desc,
event DepEvent);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
void *Dest,
const ext::oneapi::experimental::image_descriptor &Desc,
const std::vector<event> &DepEvents);
// Device to host copy with offsets and extent
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
range<3> SrcOffset,
const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
void *Dest,
range<3> DestOffset,
range<3> DestExtent,
range<3> CopyExtent);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
range<3> SrcOffset,
const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
void *Dest,
range<3> DestOffset,
range<3> DestExtent,
range<3> CopyExtent, event DepEvent);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
range<3> SrcOffset,
const ext::oneapi::experimental::image_descriptor &SrcImgDesc,
void *Dest,
range<3> DestOffset,
range<3> DestExtent,
range<3> CopyExtent, const std::vector<event> &DepEvents);
// Host to device OR device to host using USM device memory
event ext_oneapi_copy(
void *Src, void *Dest,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch);
event ext_oneapi_copy(
void *Src, void *Dest,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
event DepEvent);
event ext_oneapi_copy(
void *Src, void *Dest,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
const std::vector<event> &DepEvents);
// Host to device OR device to host using USM device memory,
// with control over sub-region
event ext_oneapi_copy(
void *Src, sycl::range<3> SrcOffset,
void *Dest, sycl::range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
sycl::range<3> HostExtent,
sycl::range<3> CopyExtent);
event ext_oneapi_copy(
void *Src, sycl::range<3> SrcOffset,
void *Dest, sycl::range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
sycl::range<3> HostExtent,
sycl::range<3> CopyExtent);
event ext_oneapi_copy(
void *Src, sycl::range<3> SrcOffset,
void *Dest, sycl::range<3> DestOffset,
const ext::oneapi::experimental::image_descriptor &DeviceImgDesc,
size_t DeviceRowPitch,
sycl::range<3> HostExtent,
sycl::range<3> CopyExtent);
// Simple device to device copy
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &ImageDesc);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &ImageDesc,
event DepEvent);
event ext_oneapi_copy(
ext::oneapi::experimental::image_mem_handle Src,
ext::oneapi::experimental::image_mem_handle Dest,
const ext::oneapi::experimental::image_descriptor &ImageDesc,
const std::vector<event> &DepEvents);
};
}
To enable the copying of images an ext_oneapi_copy
function is proposed as a
method of the queue and handler. It can be used to copy image memory, whether
allocated through USM or using an image_mem_handle
, from host to
device, or device to host. Device to device copies are currently supported only
through image_mem_handle
allocations.
For the ext_oneapi_copy
variants that do not take
offsets and extents, the image descriptor passed to the ext_oneapi_copy
API
is used to determine the pixel size, dimensions, and extent in memory of the
image to copy. If performing sub-region copy, the size of the memory region is
also determined by the offsets and extent passed.
For images allocated using USM, existing SYCL functionality can be used to
copy their memory, but we also provide ext_oneapi_copy
functions that take
USM pointers. If the image memory was allocated using pitched_alloc_device
,
then the source and destination, row pitch parameter passed must match that
which was returned from pitched_alloc_device
. If the user opted to
use another allocation function then the device pitch parameters must adhere to
the alignment restrictions outlined in the
"Pitch alignment restrictions and queries" section.
Unless performing a sub-region copy, the user must ensure that the memory
regions accessed through Dest
and Src
have the same capacity.
Whether copying image memory to the device through a USM Dest
pointer, or an
image_mem_handle
, the host memory is always assumed to be tightly packed.
Similarly, the host memory is assumed to be packed when copying from device to
host.
For the functions that take an image_mem_handle
, the handle must have been
allocated within the same context and device of the queue
.
For the forms that take a USM pointer, the image memory must also have been
allocated within the same context and device of the queue
. The USM memory
must be accessible on the queue’s device.
The ext_oneapi_copy
function variants that don’t take offsets and extents may
fail in the following scenarios:
-
The
Src
andDest
memory was not allocated on the same device and context of the queue. -
The
Src
andDest
memory regions, whereSrc
orDest
can be either on the host or device, do not have the same memory capacity, where the capacity is calculate from thewidth
,height
,depth
,channel_order
, andchannel_type
members of theimage_descriptor
parameter.
The ext_oneapi_copy
function variants that do take offsets and extents may
fail in the following scenarios:
-
The
Src
andDest
memory was not allocated on the same device and context of the queue. -
The image descriptor passed does not match the image descriptor used to allocate the image on the device.
-
the
CopyExtent
describes a memory region larger than that which was allocated on either the host or the device. -
The
HostExtent
describes a memory region larger than that which was allocated on the host. -
The
SrcExtent
describes a memory region larger than that which was allocated, whereSrc
can be either the host or device. -
The
DestExtent
describes a memory region larger than that which was allocated, whereDest
can be either the host or device. -
If
SrcOffset + CopyExtent
moves the memory sub-region outside the bounds of the memory described bySrc
, irrespective of whetherSrc
is on the host or the device. -
If
DestOffset + CopyExtent
moves the memory sub-region outside the bounds of the memory described byDest
, irrespective of whetherDest
is on the host or the device. -
The
DeviceRowPitch
does not adhere to the alignment requirements outlined in section "Pitch alignment restrictions and queries" -
The value of
DeviceRowPitch
is smaller than the width of the image on the device.
If copying of an image fails, ext_oneapi_copy
will throw a sycl::exception
with error code sycl::errc::invalid
, and relay an error message back to the
user through sycl::exception::what()
, describing which of the scenarios
listed above caused the failure.
namespace sycl::ext::oneapi::experimental {
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image(const unsampled_image_handle &ImageHandle,
const CoordT &Coords);
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image(const sampled_image_handle &ImageHandle,
const CoordT &Coords);
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_image(const sampled_image_handle &ImageHandle,
const CoordT &Coords);
template <typename DataT, typename CoordT>
void write_image(unsampled_image_handle ImageHandle,
const CoordT &Coords, const DataT &Color);
}
Inside a kernel, it’s possible to retrieve data from an image via fetch_image
or sample_image
, passing the appropirate image handle. The fetch_image
API
is applicable to sampled and unsampled images, and the data will be fetched
exactly as is in device memory. The sample_image
API is only applicable to
sampled images, the image data will be sampled according to the
bindless_image_sampler
that was passed to the image upon construction.
When fetching from a sampled image handle, data exatly as is in memory, no
sampling operations will be performed, and the bindless_image_sampler
passed
to the image upon creation has no effect on the returned image data. Note that
not all devices may support fetching of sampled image data depending on the
dimension or backing memory type. We provide device aspect queries for this in
Querying sampled image fetch support.
The user is required to pass a DataT
template parameter, which specifies the
return type of the fetch_image
and sample_image
functions. If DataT
is
not a recognized standard type, as defined in Recognized standard types,
and instead a user-defined type, the user must provide a HintT
template
parameter to the fetch_image
and sample_image
functions, to allow the
backend to select the correct device intrinsic to fetch or sample their data.
HintT
must be one of the the Recognized standard types, and must be the
same size as DataT
.
If DataT
is a recognized standard type, and HintT
is also passed, HintT
will be ignored.
When fetching or sampling an image backed by a normalized integer channel type,
either DataT
must be a 32-bit or 16-bit floating point value, a sycl::vec
of 32-bit or 16-bit floating point values, or, in the case DataT
is not one
of the above, then HintT
must be one of the above, and be of the same size as
DataT
.
It’s possible to write to an unsampled image via write_image
passing the
handle of the image to be written to, along with the coordinates to write to and
the data. User-defined types are allowed to be written provided that type is
trivially copyable. The user defined type must also be of the same size as any
of the Recognized standard types.
Sampled images cannot be written to using write_image
.
For fetching and writing of unsampled images, coordinates are specified by
int
, sycl::vec<int, 2>
, and sycl::vec<int, 3>
for 1D, 2D, and 3D images,
respectively.
Sampled image "sampled reads" take float
, sycl::vec<float, 2>
, and
sycl::vec<float, 3>
coordinate types for 1D, 2D, and 3D images, respectively.
Sampled image "fetch reads" take int
, sycl::vec<int, 2>
, and
sycl::vec<int, 3>
coordinate types for 1D, 2D, and 3D images, respectively.
Note also that all images must be used in either read-only or write-only fashion within a single kernel invocation; read/write images are not supported.
Note also that read-after-write functionality is not supported. Unsampled images may be read from and written back to within the same kernel, however, reading from that same image again will result in undefined behaviour. A new kernel must be submitted for the written data to be accessible.
Note
|
Attempting to sample a standard sampled image with |
For the purposes of this extension, the following are classified as recognized standard types.
-
All POD types (
char
,short
,int
,float
, etc.) excludingdouble
-
sycl::half
-
Variants of
sycl::vec<T, N>
whereT
is one of the above, andN
is1
,2
, or3
Any other types are classified as user-defined types.
Some examples of a user-defined types may be:
struct my_float4 {
float r, g, b, a;
};
struct my_short2 {
short r, g;
};
When providing the above types as DataT
parameters to an image read function,
the corresponding HintT
parameters to use would be sycl::vec<float, 4>
and
sycl::vec<short, 2>
, respectively.
We provide the following device queries to query support for sampled image fetch with various backing memory types and dimensionalities.
The device aspect descriptors for these queries are:
Device descriptor | Description |
---|---|
|
Indicates if the device is capable of fetching USM backed 1D sampled image data. |
|
Indicates if the device is capable of fetching non-USM backed 1D sampled image data. |
|
Indicates if the device is capable of fetching USM backed 2D sampled image data. |
|
Indicates if the device is capable of fetching non-USM backed 2D sampled image data. |
|
Indicates if the device is capable of fetching USM backed 3D sampled image data. |
|
Indicates if the device is capable of fetching non-USM backed 3D sampled image data. |
So far, we have described how to create and operate on standard bindless images.
Another type of image we propose support for is a mipmapped image. Mipmapped
images are an image type with multiple levels. Each consecutive dimension of a
mipmapped image level is smaller than the previous level. The dimensions of a
succeeding mip level is half that of the preceding level. As an example, a
two-dimensional mipmapped image where the top-most level (level==0
) image has
a width==16
and height==16
, the succeeding level (level==1
) in the mipmap
will have sizes width==8
and height==8
. This pattern continues until either
the final level has sizes of width==1
and height==1
, or the user-specified
maximum mip level has been reached (described by the num_levels
member of
image_descriptor
).
We provide the following device queries to retrieve information on a SYCL implementation of various mipmap features.
The device aspect descriptors for these queries are:
Device descriptor | Description |
---|---|
|
Indicates if the device supports allocating mipmap resources. |
|
Indicates if the device supports sampling mipmap images with anisotropic filtering |
|
Indicates if the device supports using images created from individual mipmap levels |
The device information descriptors for these queries are:
Device descriptor | Return type | Description |
---|---|---|
|
|
Return the maximum anisotropic ratio supported by the device |
Mipmaps are allocated in a similar manner to standard images, however, mipmaps do not support USM backed memory.
Mipmap memory is allocated through alloc_image_mem
. The user should populate
the image_descriptor
with the image type of image_type::mipmap
, and provide
the number of mipmaps levels they wish to allocate. The value of num_levels
must be greater than 1
.
Mipmap memory allocated this way requires the user to free that memory after all
operations using the memory are completed and no more operations operating on
the memory will be scheduled. This is done using free_image_mem
, passing
image_type::mipmap
. Importantly, individual levels of a mipmap must not be
freed before calling free_image_mem
.
The RAII class image_mem
may also be used to perform allocation and
deallocation of mipmap device memory. The constructor and destructor act as a
wrapper for the functions alloc_image_mem
and free_image_mem
respectively.
When the underlying memory of image_mem
is a mipmap,
get_mip_level_mem_handle
can be used to return an image_mem_handle
to a
specific level of the mipmap. This can then be used to copy data to that
specific level or create an image based on that level.
A handle to a mipmap image is acquired in the same way as a
sampled_image_handle
. Mipmaps can only be sampled image types. We can create a
sampled_image_handle
to the allocated mipmap through the create_image
functions which take a bindless_image_sampler
. To sample a mipmap correctly,
the mipmap attributes of this sampler must be defined.
Attempting to create an unsampled_image_handle
to a mipmap will result in a
sycl::exception
with error code sycl::errc::runtime
being thrown.
In order to copy to or from mipmaps, the user should retrieve an individual
level’s image_mem_handle
through image_mem::get_mip_level_mem_handle
, which
can then be passed to ext_oneapi_copy
. The user must ensure that the image
descriptor passed to ext_oneapi_copy
is dimensioned correctly for the mip
level being copied to/from. The provided image_descriptor::get_mip_level_desc
allows the user to retrieve a correctly dimensioned image descriptor for any
level of a given top-level descriptor.
Inside the kernel, it’s possible to sample a mipmap via sample_mipmap
,
passing the sampled_image_handle
, the coordinates, and either the level or
anisotropic gradient values.
The method of sampling a mipmap is different based on which sample_mipmap
function is used, and the sampler attributes passed upon creation of the
mipmap.
// Nearest/linear filtering between mip levels
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_mipmap(const sampled_image_handle &ImageHandle,
const CoordT &Coords,
const float Level);
// Anisotropic filtering
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT sample_mipmap(const sampled_image_handle &ImageHandle,
const CoordT &Coords,
const CoordT &Dx, const CoordT &Dy);
Reading a mipmap follows the same restrictions on what coordinate types may be used as laid out in Reading and writing inside the kernel, and the viewing gradients are bound to the same type as used for the coordinates.
Reading a mipmap by providing a user-defined return DataT
type also follows
the restrictions as laid out in Reading and writing inside the kernel.
Note
|
Attempting to sample a mipmap with |
Another type of image we propose support for is image arrays. Image arrays are images made up of multiple array indices where each index is itself an image and every index has the same dimensionality, size, and data type.
Image arrays may also be referred to as layered images, and the array indices may be referred to layers.
Image arrays are allocated in a similar manner to standard images.
Image array memory is allocated through alloc_image_mem
. The user should
populate the image_descriptor
with the image type of image_type::array
,
and provide the size of the array they wish to allocate. The value of
array_size
must be greater than 1
.
Image array memory allocated this way requires the user to free that memory
after all operations using the memory are completed and no more operations
operating on the memory will be scheduled. This is done using free_image_mem
,
passing image_type::array
.
The RAII class image_mem
may also be used to perform allocation and
deallocation of arrayed image device memory. The constructor and destructor act
as a wrapper for the functions alloc_image_mem
and free_image_mem
respectively.
Note
|
Currently there is no support for image arrays backed by USM. |
A handle to an image array is acquired in the same way as
unsampled_image_handle
. We create the handle through the create_image
functions which take image_descriptor
that has image_type::array
and
array_size
greater than 1
.
Note
|
Currently there is no support for sampled image arrays. |
When copying to or from image arrays, the user should copy to/from the entire
array of images in one call to ext_oneapi_copy
by passing the image arrays'
image_mem_handle
.
Inside the kernel, it’s possible to fetch data from an unsampled image array
via fetch_image_array
, passing the unsampled_image_handle
, the coordinates,
and the array index.
// Fetch an unsampled image array
template <typename DataT, typename HintT = DataT, typename CoordT>
DataT fetch_image_array(const unsampled_image_handle &ImageHandle,
const CoordT &Coords, const unsigned int ArrayLayer);
Fetching an image array follows the same restrictions on what coordinate types may be used as laid out in Reading and writing inside the kernel.
Fetching an image array by providing a user-defined return DataT
type also
follows the restrictions as laid out in Reading and writing inside the kernel.
Note
|
Attempting to fetch an image array with |
Inside the kernel, it’s possible to write to an image array via
write_image_array
, passing the unsampled_image_handle
, the coordinates, the
array index, and the data to write. User-defined types are allowed to be written
provided that type is trivially copyable.
// Write to an unsampled image array
template <typename DataT, typename CoordT>
DataT write_image_array(unsampled_image_handle ImageHandle,
const CoordT &Coords, const unsigned int ArrayLayer
const DataT &Color);
Writing to an image array follows the same restrictions on what coordinate types may be used as laid out in Reading and writing inside the kernel.
Note
|
Attempting to write to an image array with |
Another image type this extension supports is cubemapped images. Cubemap images are a specialisation of 2D image arrays that have exactly six layers representing the faces of a cube where the width and height of each layer (cube face) are equal. Cube mapping is a method of environment mapping, where the environment is projected onto the sides of the cube. Cubemaps have been applied in graphical systems such as skylight illumination, dynamic reflection, and skyboxes.
We provide the following device aspects to retrieve support information on a SYCL implementation of just a couple of cubemap features.
The device aspect descriptors for these queries are:
Device descriptor | Description |
---|---|
|
Indicates if the device supports allocating and accessing cubemap resources |
|
Indicates if the device supports sampling cubemapped images across face bounderies |
As with all other image types, cubemap memory is allocated through
alloc_image_mem
with the appropriately populated image_descriptor
, where
width
and height
are equal, and the type is set to image_type::cubemap
.
Since cubemaps are specialised image arrays, the array_size
must be populated
with the only valid value, 6. Overriding this with any other value for
array_size
could result in an exception or undefined behaviour. Cubemaps are
not supported with USM backed memory.
Cubemap memory allocated this way requires the user to free that memory after
all operations using the memory are completed and no more operations operating
on the memory will be scheduled. This is done using free_image_mem
, passing
image_type::cubemap
.
The RAII class image_mem
may also be used to perform allocation and
deallocation of cubemapped device memory. The constructor and destructor act as
a wrapper for the functions alloc_image_mem
and free_image_mem
respectively.
A handle to a cubemap is acquired in the same way as a standard
image for both
an unsampled_image_handle
and sampled_image_handle
. We create the handle for
a cubemap through the appropriate create_image
functions which take the
image_descriptor
and bindless_image_sampler
for a sampled_image_handle
, or
just the image_descriptor
for an unsampled_image_handle
.
As with allocation, the descriptor must be populated appropriately, i.e.
image_type::cubemap
, width
and height
are equal, and array_size
is equal
to 6. To sample a cubemap as expected, the cubemap sampling attribute of the
sampler, namely seamless_filtering_mode
, must be defined.
In order to copy to or from cubemaps, the user should utilise the provided
ext_oneapi_copy
functions following the details laid out in
Copying image array data.
Cubemaps are supported as both unsampled and sampled images, however, the meaning of their usage is quite different.
An unsampled cubemap is treated as an image array with six layers, i.e. an
integer index denoting a face and two integer coordinates addressing a texel
within the layer corresponding to this face. Inside the kernel, this is done via
fetch_cubemap
, passing the unsampled_image_handle
, the integer coordinates,
int2
, and an integer index denoting the face, int
. Being an unsampled image,
a cubemap can be written with write_cubemap
, passing the
unsampled_image_handle
, the integer coordinates, int2
, and an integer index
denoting the face, int
.
On the other hand, a sampled cubemap is addressed using three floating-point
coordinates x
, y
, and z
that are interpreted as a direction vector
emanating from the centre of the cube and pointing to one face of the cube and a
texel within the layer corresponding to that face. Inside the kernel, this is
done via sample_cubemap
, passing the sampled_image_handle
, the
floating-point coordinates x
, y
, and z
, as a float3
. The method of
sampling depends on the sampler attributes passed upon creation of the cubemap.
// Unsampled cubemap read
template <typename DataT, typename HintT = DataT>
DataT fetch_cubemap(const unsampled_image_handle &ImageHandle,
const int2 &Coords,
const int Face);
// Sampled cubemap read
template <typename DataT, typename HintT = DataT>
DataT sample_cubemap(const sampled_image_handle &ImageHandle,
const float3 &Vec);
// Unsampled cubemap write
template <typename DataT>
void write_cubemap(unsampled_image_handle ImageHandle,
const int2 &Coords,
const int Face,
const DataT &Color);
Note
|
Attempting to read or write to a cubemap with any other defined read/write function will result in undefined behaviour. |
We provide the following device queries to retrieve information on whether a SYCL implementation provides support for various interoperability features.
The device aspect descriptors for these queries are:
Device descriptor | Description |
---|---|
|
Indicates if the device supports importing external memory resources. |
|
Indicates if the device supports exporting internal memory resources. |
|
Indicates if the device supports importing external semaphore resources. |
|
Indicates if the device supports exporting internal event resources. |
Note
|
Not all SYCL backends may provide support for importing or exporting native memory or semaphore objects. CUDA for example only supports importation of external memory and semaphores, but provides no support for their exportation. |
In order to facilitate the importing of a number of different external memory and external semaphore handle types, we propose the following resource structures.
Note
|
We only show three examples of external resource handle types here, but the
|
namespace sycl::ext::oneapi::experimental {
// POSIX file descriptor handle type
struct resource_fd {
int file_descriptor;
};
// Windows NT handle type
struct resource_win32_handle {
void *handle;
};
// Windows NT name type
struct resource_win32_name {
const void *name;
};
}
In order to import a memory object, an external API must provide an appropriate handle to that memory. The exact structure and type of this handle can depend on the external API, and the operating system the application is running on.
External memory import is facilitated through the following proposed descriptor struct.
namespace sycl::ext::oneapi::experimental {
// Descriptor templated on specific resource type
template <typename ResourceType>
struct external_mem_descriptor {
ResourceType external_resource;
size_t size_in_bytes;
};
}
The user should create an external_mem_descriptor
templated on the appropriate
handle type, ResourceType
, for their purposes, e.g. resource_fd
to describe
a POSIX file descriptor resource on Linux systems, or a resource_win32_handle
for Windows NT resource handles.
Once the user populates the external_mem_descriptor
with the appropriate
ResourceType
values, and the size of the external memory in bytes,
they can then import that memory into SYCL through import_external_memory
.
namespace sycl::ext::oneapi::experimental {
struct interop_mem_handle {
using raw_handle_type = /* Implementation defined */;
raw_handle_type raw_handle;
};
template <typename ResourceType>
interop_mem_handle import_external_memory(
external_mem_descriptor<ResourceType> externalMemDescriptor,
const sycl::device &syclDevice,
const sycl::context &syclContext);
template <typename ResourceType>
interop_mem_handle import_external_memory(
external_mem_descriptor<ResourceType> externalMemDescriptor,
const sycl::queue &syclQueue);
image_mem_handle map_external_image_memory(
interop_mem_handle interopMemHandle,
const image_descriptor &imageDescriptor,
const sycl::device &syclDevice,
const sycl::context &syclContext);
image_mem_handle map_external_image_memory(
interop_mem_handle interopMemHandle,
const image_descriptor &imageDescriptor,
const sycl::queue &syclQueue);
}
The resulting interop_mem_handle
can then be mapped, where the resulting type
is an image_mem_handle
. This can be used to construct images in the same way
as memory allocated through alloc_image_mem
. The ext_oneapi_copy
operations
also work with imported memory mapped to image_mem_handle
types.
When calling create_image
with an image_mem_handle
mapped from an external
memory object, the user must ensure that the image descriptor they pass to
create_image
has members that match or map to those of the external API.
A mismatch between any of the width
, height
, depth
, image_channel_type
,
or image_channel_order
members will result in undefined behavior.
Additionally, the image_type
describing the image must match to the image of
the external API. The current supported importable image types are standard
and mipmap
. Attempting to import other image types will result in undefined
behaviour.
Once a user has finished operating on imported memory, they must ensure that
they destroy the imported memory handle through release_external_memory
.
release_external_memory
can only accept interop_mem_handles
that were
created through import_external_memory
.
namespace sycl::ext::oneapi::experimental {
void release_external_memory(interop_mem_handle interopMem,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void release_external_memory(interop_mem_handle interopMem,
const sycl::queue &syclQueue);
}
Destroying or freeing any imported memory through image_mem_free
or
sycl::free
will result in undefined behavior.
In addition to proposing importation of external memory resources, we also propose importation of synchronization primitives. Just like the sharing of memory between APIs described above, any external APIs must provide a valid a handle to a valid semaphore resource they wish to share, and just as external memory resources handles can take different forms of structure and type depending on the API and operating system, so do external semaphore resource handles.
External semaphore import is facilitated through the following proposed descriptor struct.
namespace sycl::ext::oneapi::experimental {
// Descriptor templated on specific resource type
template <typename ResourceType>
struct external_semaphore_descriptor {
ResourceType external_resource;
};
}
The user should create an external_semaphore_descriptor
templated on the
appropriate handle type, ResourceType
, for their purposes, e.g. resource_fd
to describe a POSIX file descriptor resource on Linux systems, or a
resource_win32_handle
for Windows NT resource handles.
Once the user populates the external_semaphore_descriptor
with the appropriate
ResourceType
values, they can then import that semaphore into SYCL through
import_external_semaphore
.
namespace sycl::ext::oneapi::experimental {
struct interop_semaphore_handle {
using raw_handle_type = /* Implementation defined */;
raw_handle_type raw_handle;
};
template <typename ResourceType>
interop_semaphore_handle import_external_semaphore(
external_semaphore_descriptor<ResourceType>
externalSemaphoreDescriptor,
const sycl::device &syclDevice,
const sycl::context &syclContext);
}
template <typename ResourceType>
interop_semaphore_handle import_external_semaphore(
external_semaphore_descriptor<ResourceType>
externalSemaphoreDescriptor,
const sycl::queue &syclQueue);
}
The resulting interop_semaphore_handle
can then be used in a SYCL command
group, to either wait until the semaphore is in the signaled state, or set the
semaphore to a signaled state.
We propose to extend the SYCL queue and handler classes with semaphore waiting and signalling operations.
namespace sycl {
class handler {
public:
void ext_oneapi_wait_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle);
void ext_oneapi_signal_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle);
};
class queue {
public:
event ext_oneapi_wait_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle);
event ext_oneapi_wait_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle,
event DepEvent);
event ext_oneapi_wait_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle,
const std::vector<event> &DepEvents);
event ext_oneapi_signal_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle);
event ext_oneapi_signal_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle,
event DepEvent);
event ext_oneapi_signal_external_semaphore(
ext::oneapi::experimental::interop_semaphore_handle
interop_semaphore_handle,
const std::vector<event> &DepEvents);
};
}
Any operations submitted to the queue after a
ext_oneapi_wait_external_semaphore
call will not begin until the imported
semaphore is in a signaled state.
When ext_oneapi_signal_external_semaphore
is called, the external semaphore
will be set to the signaled state after all commands submitted to the queue
prior to the ext_oneapi_signal_external_semaphore
call complete.
ext_oneapi_wait_external_semaphore
and ext_oneapi_signal_external_semaphore
are non-blocking, asynchronous operations.
The user must ensure to destroy all external semaphore objects once they are no
longer required through destroy_external_semaphore
.
namespace sycl::ext::oneapi::experimental {
void destroy_external_semaphore(interop_semaphore_handle semaphoreHandle,
const sycl::device &syclDevice,
const sycl::context &syclContext);
void destroy_external_semaphore(interop_semaphore_handle semaphoreHandle,
const sycl::queue &syclQueue);
}
// Set up device, queue, and context
sycl::device device;
sycl::queue queue(device);
sycl::context context = queue.get_context();
// Initialize input data
constexpr size_t width = 512;
std::vector<float> dataIn(width);
std::vector<float> dataOut(width);
for (int i = 0; i < width; i++) {
dataIn[i] = static_cast<float>(i);
}
// Image descriptor - can use the same for both images
sycl::ext::oneapi::experimental::image_descriptor desc(
sycl::range{width}, sycl::ext::oneapi::experimental::image_channel_order::r,
sycl::ext::oneapi::experimental::image_channel_type::fp32);
try {
// Extension: returns the device pointer to the allocated memory
sycl::ext::oneapi::experimental::image_mem imgMemoryIn(desc, queue);
sycl::ext::oneapi::experimental::image_mem imgMemoryOut(desc, queue);
// Extension: create the image and return the handle
sycl::ext::oneapi::experimental::unsampled_image_handle imgIn =
sycl::ext::oneapi::experimental::create_image(imgMemoryIn, desc, queue);
sycl::ext::oneapi::experimental::unsampled_image_handle imgOut =
sycl::ext::oneapi::experimental::create_image(imgMemoryOut, desc, queue);
// Extension: copy over data to device
q.ext_oneapi_copy(dataIn.data(), imgMemoryIn, desc);
// Bindless images require manual synchronization
// Wait for copy operation to finish
q.wait_and_throw();
q.submit([&](sycl::handler &cgh) {
// No need to request access, handles captured by value
cgh.parallel_for(width, [=](sycl::id<1> id) {
// Extension: read image data from handle
float pixel = sycl::ext::oneapi::experimental::fetch_image<float>(
imgIn, int(id[0]));
// Extension: write to image data using handle
sycl::ext::oneapi::experimental::write_image(imgOut, int(id[0]), pixel);
});
});
// Using image handles requires manual synchronization
q.wait_and_throw();
// Copy data written to imgOut to host
q.ext_oneapi_copy(imgMemoryOut, dataOut.data(), desc);
// Cleanup
sycl::ext::oneapi::experimental::destroy_image_handle(imgIn, queue);
sycl::ext::oneapi::experimental::destroy_image_handle(imgOut, queue);
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught: " << e.what();
exit(-1);
}
// Validate that `dataIn` correctly transferred to `dataOut`
bool validated = (dataIn == dataOut);
// Set up device, queue, and context
sycl::device device;
sycl::queue queue(device);
sycl::context context = queue.get_context();
// declare image data
size_t numImages = 5;
size_t width = 8;
size_t height = 8;
size_t numPixels = width * height;
std::vector<float> dataIn(numPixels);
std::vector<float> dataOut(numPixels);
std::vector<float> dataExpected(numPixels);
for (int i = 0; i < width; i++) {
for (int j = 0; j < height; j++) {
int index = j + (height * i);
dataIn[index] = index;
dataExpected[index] = index * numImages;
}
}
// Image descriptor - can use the same for all images
sycl::ext::oneapi::experimental::image_descriptor desc(
{width, height}, sycl::ext::oneapi::experimental::image_channel_order::r,
sycl::ext::oneapi::experimental::image_channel_type::fp32);
try {
// Allocate each image and save the handles
std::vector<sycl::ext::oneapi::experimental::image_mem> imgAllocations;
for (int i = 0; i < numImages; i++) {
// Extension: move-construct device allocated memory
imgAllocations.emplace_back(
sycl::ext::oneapi::experimental::image_mem{desc, queue});
}
// Copy over data to device for each image
for (int i = 0; i < numImages; i++) {
// Extension: copy over data to device
q.ext_oneapi_copy(dataIn.data(), imgAllocations[i], desc);
}
// Wait for copy operations to finish
q.wait_and_throw();
// Create the images and return the handles
std::vector<sycl::ext::oneapi::experimental::unsampled_image_handle>
imgHandles;
for (int i = 0; i < numImages; i++) {
// Extension: create the image and return the handle
sycl::ext::oneapi::experimental::unsampled_image_handle imgHandle =
sycl::ext::oneapi::experimental::create_image(imgAllocations[i],
desc, queue);
imgHandles.push_back(imgHandle);
}
sycl::buffer outBuf{dataOut.data(), sycl::range{height, width}};
sycl::buffer imgHandlesBuf{imgHandles.data(), sycl::range{numImages}};
q.submit([&](sycl::handler &cgh) {
sycl::accessor outAcc{outBuf, cgh, sycl::write_only};
sycl::accessor imgHandleAcc{imgHandlesBuf, cgh, sycl::read_only};
cgh.parallel_for(
sycl::nd_range<2>{{width, height}, {width, height}},
[=](sycl::nd_item<2> it) {
size_t dim0 = it.get_local_id(0);
size_t dim1 = it.get_local_id(1);
// Sum each image by reading via its handle
float sum = 0;
for (int i = 0; i < numImages; i++) {
// Extension: read image data from handle
sum += (sycl::ext::oneapi::experimental::fetch_image<float>(
imgHandleAcc[i], sycl::vec<int, 2>(dim0, dim1)));
}
outAcc[sycl::id{dim1, dim0}] = sum;
});
});
// Using image handles requires manual synchronization
q.wait_and_throw();
// Cleanup
for (int i = 0; i < numImages; i++) {
sycl::ext::oneapi::experimental::destroy_image_handle(imgHandles[i], queue);
}
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught: " << e.what();
exit(-1);
}
// Validate that `dataOut` is correct
bool validated = (dataOut == dataExpected);
// Set up device, queue, and context
sycl::device device;
sycl::queue queue(device);
sycl::context context = q.get_context();
// declare image data
constexpr size_t width = 16;
unsigned int num_levels = 2;
std::vector<float> dataIn1(width);
std::vector<float> dataIn2(width / 2);
std::vector<float> dataOut(width);
std::vector<float> dataExpected(width);
int j = 0;
for (int i = 0; i < width; i++) {
dataExpected[i] = static_cast<float>(i + (j + 10));
if (i % 2)
j++;
dataIn1[i] = static_cast<float>(i);
if (i < (N / 2))
dataIn2[i] = static_cast<float>(i + 10);
}
try {
// Image descriptor -- number of levels
sycl::ext::oneapi::experimental::image_descriptor desc(
{width}, sycl::ext::oneapi::experimental::image_channel_order::r,
sycl::ext::oneapi::experimental::image_channel_type::fp32,
sycl::ext::oneapi::experimental::image_type::mipmap, num_levels);
// Allocate the mipmap
sycl::ext::oneapi::experimental::image_mem mip_mem(desc, queue);
// Retrieve level 0
sycl::ext::oneapi::experimental::image_mem_handle img_mem1 =
mip_mem.get_mip_level_mem_handle(0)
// Copy over data to level 0
q.ext_oneapi_copy(dataIn1.data(), img_mem1, desc);
// Copy over data to level 1
q.ext_oneapi_copy(dataIn2.data(), mip_mem.get_mip_level_mem_handle(1),
desc.get_mip_level_desc(1));
q.wait_and_throw();
// Extended sampler object to take in mipmap attributes
sycl::ext::oneapi::experimental::bindless_image_sampler samp(
addressing_mode::mirrored_repeat,
coordinate_normalization_mode::normalized, filtering_mode::nearest,
mipmap_filtering_mode::nearest, 0.0f, (float)num_levels, 8.0f);
// Create a sampled image handle to represent the mipmap
sycl::ext::oneapi::experimental::sampled_image_handle mipHandle =
sycl::ext::oneapi::experimental::create_image(mip_mem, samp, desc, queue);
q.wait_and_throw();
sycl::buffer<float, 1> buf((float *)dataOut.data(), width);
q.submit([&](handler &cgh) {
auto outAcc = buf.get_access<access_mode::write>(cgh, width);
cgh.parallel_for<image_addition>(width, [=](id<1> id) {
float sum = 0;
float x = (static_cast<float>(id[0]) + 0.5f) / static_cast<float>(width);
// Read mipmap level 0 with anisotropic filtering
// and level 1 with level filtering
float px1 = sycl::ext::oneapi::experimental::sample_mipmap<float>(
mipHandle, x, 0.0f, 0.0f);
float px2 = sycl::ext::oneapi::experimental::sample_mipmap<float>(
mipHandle, x, 1.0f);
sum = px1 + px2;
outAcc[id] = sum;
});
});
q.wait_and_throw();
// Cleanup
sycl::ext::oneapi::experimental::destroy_image_handle(mipHandle, queue);
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught! : " << e.what() << "\n";
exit(-1);
} catch (...) {
std::cerr << "Unknown exception caught!\n";
exit(-1);
}
// Validate that `dataOut` is correct
bool validated = (dataOut == dataExpected);
using VecType = sycl::vec<float, 4>;
sycl::device dev;
sycl::queue q(dev);
auto ctxt = q.get_context();
// declare image data
constexpr size_t width = 5;
constexpr size_t array_size = 2;
constexpr size_t N = width;
std::vector<VecType> out(N * array_size);
std::vector<float> expected(N * array_size);
std::vector<float> outBuf(N);
std::vector<VecType> dataIn1(N * array_size);
std::vector<VecType> dataIn2(N * array_size);
for (int i = 0; i < N * array_size; i++) {
// Populate input data (to-be image arrays)
dataIn1[i] = VecType(i);
dataIn2[i] = VecType(2*i);
}
// Populate expected output
for (int i = 0; i < width; i++) {
for (int l = 0; l < array_size; l++) {
expected[l * N + i] = dataIn1[l * N + i][0] + dataIn2[l * N + i][0];
}
}
try {
// Extension: image descriptor -- number of layers
sycl::ext::oneapi::experimental::image_descriptor desc(
{width}, sycl::image_channel_order::rgba, sycl::image_channel_type::fp32,
sycl::ext::oneapi::experimental::image_type::array, 1, array_size);
// Extension: allocate image array memory on device
sycl::ext::oneapi::experimental::image_mem arrayMem1(desc, dev, ctxt);
sycl::ext::oneapi::experimental::image_mem arrayMem2(desc, dev, ctxt);
sycl::ext::oneapi::experimental::image_mem outMem(desc, dev, ctxt);
// Extension: copy over data to device
q.ext_oneapi_copy(dataIn1.data(), arrayMem1.get_handle(), desc);
q.ext_oneapi_copy(dataIn2.data(), arrayMem2.get_handle(), desc);
q.wait_and_throw();
// Extension: create a unsampled image handles to represent the image arrays
sycl::ext::oneapi::experimental::unsampled_image_handle arrayHandle1 =
sycl::ext::oneapi::experimental::create_image(arrayMem1, desc, dev,
ctxt);
sycl::ext::oneapi::experimental::unsampled_image_handle arrayHandle2 =
sycl::ext::oneapi::experimental::create_image(arrayMem2, desc, dev,
ctxt);
sycl::ext::oneapi::experimental::unsampled_image_handle outHandle =
sycl::ext::oneapi::experimental::create_image(outMem, desc, dev,
ctxt);
q.submit([&](sycl::handler &cgh) {
cgh.parallel_for<kernel>(N, [=](sycl::id<1> id) {
float sum1 = 0;
float sum2 = 0;
// Extension: read image layers 0 and 1
VecType px1 = sycl::ext::oneapi::experimental::fetch_image_array<VecType>(
arrayHandle1, int(id[0]), 0);
VecType px2 = sycl::ext::oneapi::experimental::fetch_image_array<VecType>(
arrayHandle1, int(id[0]), 1);
// Extension: read image layers 0 and 1
VecType px3 = sycl::ext::oneapi::experimental::fetch_image_array<VecType>(
arrayHandle2, int(id[0]), 0);
VecType px4 = sycl::ext::oneapi::experimental::fetch_image_array<VecType>(
arrayHandle2, int(id[0]), 1);
sum1 = px1[0] + px3[0];
sum2 = px2[0] + px4[0];
// Extension: write to image layers with handle
sycl::ext::oneapi::experimental::write_image_array<VecType>(
outHandle, int(id[0]), 0, VecType(sum1));
sycl::ext::oneapi::experimental::write_image_array<VecType>(
outHandle, int(id[0]), 1, VecType(sum2));
});
});
q.wait_and_throw();
// Extension: copy data from device to host
q.ext_oneapi_copy(outMem.get_handle(), out.data(), desc);
q.wait_and_throw();
// Extension: cleanup
sycl::ext::oneapi::experimental::destroy_image_handle(arrayHandle1, dev, ctxt);
sycl::ext::oneapi::experimental::destroy_image_handle(arrayHandle2, dev, ctxt);
sycl::ext::oneapi::experimental::destroy_image_handle(outHandle, dev, ctxt);
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught! : " << e.what() << "\n";
std::cout << "Test failed!" << std::endl;
exit(1);
} catch (...) {
std::cerr << "Unknown exception caught!\n";
std::cout << "Test failed!" << std::endl;
exit(2);
}
// collect and validate output
bool validated = true;
for (int i = 0; i < N * array_size; i++) {
bool mismatch = false;
if (out[i][0] != expected[i]) {
mismatch = true;
validated = false;
}
}
if (validated) {
return 0;
}
return 1;
#include <iostream>
#include <sycl/sycl.hpp>
int main() {
namespace syclexp = sycl::ext::oneapi::experimental;
sycl::device dev;
sycl::queue q(dev);
auto ctxt = q.get_context();
// declare image data
// width and height must be equal
size_t width = 8;
size_t height = 8;
size_t N = width * height;
std::vector<float> out(N);
std::vector<float> expected(N);
std::vector<sycl::float4> dataIn1(N * 6);
for (int i = 0; i < width; i++) {
for (int j = 0; j < height; j++) {
for (int k = 0; k < 6; k++) {
dataIn1[i + width * (j + height * k)] = {i + width * (j + height * k),
0, 0, 0};
}
}
}
int j = 0;
for (int i = N - 1; i >= 0; i--) {
expected[j] = (float)i;
j++;
}
// Extension: image descriptor - Cubemap
syclexp::image_descriptor desc(
{width, height}, sycl::image_channel_order::rgba,
sycl::image_channel_type::fp32, syclexp::image_type::cubemap, 1, 6);
syclexp::bindless_image_sampler samp(
sycl::addressing_mode::clamp_to_edge,
sycl::coordinate_normalization_mode::normalized,
sycl::filtering_mode::nearest, syclexp::cubemap_filtering_mode::seamless);
try {
// Extension: allocate memory on device and create the handle
syclexp::image_mem imgMem(desc, dev, ctxt);
// Extension: create the image and return the handle
syclexp::sampled_image_handle imgHandle =
syclexp::create_image(imgMem, samp, desc, dev, ctxt);
// Extension: copy over data to device (handler variant)
q.submit([&](sycl::handler &cgh) {
cgh.ext_oneapi_copy(dataIn1.data(), imgMem.get_handle(), desc);
});
q.wait_and_throw();
sycl::buffer<float, 2> buf((float *)out.data(),
sycl::range<2>{height, width});
q.submit([&](sycl::handler &cgh) {
auto outAcc = buf.get_access<sycl::access_mode::write>(
cgh, sycl::range<2>{height, width});
// Emanating vector scans one face
cgh.parallel_for<kernel>(
sycl::nd_range<2>{{width, height}, {width, height}},
[=](sycl::nd_item<2> it) {
size_t dim0 = it.get_local_id(0);
size_t dim1 = it.get_local_id(1);
// Direction Vector
// x -- largest magnitude
// y -- shifted between [-0.99, 0.99] + offset
// z -- shifted between [-0.99, 0.99] + offset
//
// [-0.99, 0.99] -- maintains x as largest magnitude
//
// 4 elems == [-1, -0.5, 0, 0.5] -- need offset to bring uniformity
// +0.25 = [-0.75, -0.25, 0.25, 0.75]
float fdim0 = 1.f;
float fdim1 = (((float(dim0) / (float)width) * 1.98) - 0.99) +
(1.f / (float)width);
float fdim2 = (((float(dim1) / (float)height) * 1.98) - 0.99) +
(1.f / (float)height);
// Extension: read texture cubemap data from handle
sycl::float4 px = syclexp::sample_cubemap<sycl::float4>(
imgHandle, sycl::float3(fdim0, fdim1, fdim2));
outAcc[sycl::id<2>{dim0, dim1}] = px[0];
});
});
q.wait_and_throw();
// Extension: cleanup
syclexp::destroy_image_handle(imgHandle, dev, ctxt);
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught! : " << e.what() << "\n";
return 1;
} catch (...) {
std::cerr << "Unknown exception caught!\n";
return 2;
}
// collect and validate output
bool validated = true;
for (int i = 0; i < N; i++) {
bool mismatch = false;
if (out[i] != expected[i]) {
mismatch = true;
validated = false;
}
if (mismatch) {
std::cout << "Result mismatch! Expected: " << expected[i]
<< ", Actual: " << out[i] << std::endl;
}
}
if (validated) {
std::cout << "Test passed!" << std::endl;
return 0;
}
std::cout << "Test failed!" << std::endl;
return 3;
}
// Set up device, queue, and context
sycl::device device;
sycl::queue queue(device);
sycl::context context = queue.get_context();
size_t width = /* passed from external API */;
size_t height = /* passed from external API */;
sycl::ext::oneapi::experimental::image_channel_order channel_order =
/* mapped from external API */
/* we assume sycl::image_channel_order::r */;
sycl::ext::oneapi::experimental::image_channel_type channel_type =
/* mapped from external API */
/* we assume sycl::image_channel_type::unsigned_int32 */;
// Image descriptor - mapped to external API image layout
sycl::ext::oneapi::experimental::image_descriptor desc(
{width, height}, channel_order, channel_type);
size_t img_size_in_bytes = width * height * sizeof(uint32_t);
int external_input_image_file_descriptor = /* passed from external API */
int external_output_image_file_descriptor = /* passed from external API */
// Extension: populate external memory descriptors
sycl::ext::oneapi::experimental::external_mem_descriptor<
sycl::ext::oneapi::experimental::resource_fd>
input_ext_mem_desc{external_input_image_file_descriptor,
img_size_in_bytes};
sycl::ext::oneapi::experimental::external_mem_descriptor<
sycl::ext::oneapi::experimental::resource_fd>
output_ext_mem_desc{external_output_image_file_descriptor,
img_size_in_bytes};
// An external API semaphore will signal this semaphore before our SYCL commands
// can begin execution
int wait_semaphore_file_descriptor = /* passed from external API */;
// An external API will wait on this semaphore to be signalled by us before it
// can execute some commands
int done_semaphore_file_descriptor = /* passed from external API */;
// Extension: populate external semaphore descriptor.
// We assume POSIX file descriptor resource types
sycl::ext::oneapi::experimental::external_semaphore_descriptor<
sycl::ext::oneapi::experimental::resource_fd>
wait_external_semaphore_desc{wait_semaphore_file_descriptor};
sycl::ext::oneapi::experimental::external_semaphore_descriptor<
sycl::ext::oneapi::experimental::resource_fd>
done_external_semaphore_desc{done_semaphore_file_descriptor};
try {
// Extension: import external semaphores
sycl::ext::oneapi::experimental::interop_semaphore_handle
wait_interop_semaphore_handle =
sycl::ext::oneapi::experimental::import_external_semaphore(
wait_external_semaphore_desc, queue);
sycl::ext::oneapi::experimental::interop_semaphore_handle
done_interop_semaphore_handle =
sycl::ext::oneapi::experimental::import_external_semaphore(
done_external_semaphore_desc, queue);
// Extension: import external memory from descriptors
sycl::ext::oneapi::experimental::interop_mem_handle
input_interop_mem_handle =
sycl::ext::oneapi::experimental::import_external_memory(
input_ext_mem_desc, queue);
sycl::ext::oneapi::experimental::interop_mem_handle
output_interop_mem_handle =
sycl::ext::oneapi::experimental::import_external_memory(
output_ext_mem_desc, queue);
// Extension: map imported external memory to image memory
sycl::ext::oneapi::experimental::image_mem_handle input_mapped_mem_handle =
sycl::ext::oneapi::experimental::map_external_image_memory(
input_interop_mem_handle, desc, queue);
sycl::ext::oneapi::experimental::image_mem_handle output_mapped_mem_handle =
sycl::ext::oneapi::experimental::map_external_image_memory(
output_interop_mem_handle, desc, queue);
// Extension: create images from mapped memory and return the handles
sycl::ext::oneapi::experimental::unsampled_image_handle img_input =
sycl::ext::oneapi::experimental::create_image(
input_mapped_mem_handle, desc, queue);
sycl::ext::oneapi::experimental::unsampled_image_handle img_output =
sycl::ext::oneapi::experimental::create_image(
output_mapped_mem_handle, desc, queue);
// Extension: wait for imported semaphore
q.ext_oneapi_wait_external_semaphore(wait_interop_semaphore_handle)
// Submit our kernel that depends on imported "wait_semaphore_file_descriptor"
q.submit([&](sycl::handler &cgh) {
cgh.parallel_for<>(
sycl::nd_range<2>{{width, height}, {32, 32}},
[=](sycl::nd_item<2> it) {
size_t dim0 = it.get_global_id(0);
size_t dim1 = it.get_global_id(1);
// Extension: read image data from handle to imported image
uint32_t pixel =
sycl::ext::oneapi::experimental::fetch_image<uint32_t>(
img_input, sycl::vec<int, 2>(dim0, dim1));
// Modify the data before writing back
pixel *= 10;
// Extension: write image data using handle to imported image
sycl::ext::oneapi::experimental::write_image(
img_output, sycl::vec<int, 2>(dim0, dim1), pixel);
});
});
// Extension: signal imported semaphore
q.ext_oneapi_signal_external_semaphore(done_interop_semaphore_handle)
// The external API can now use the semaphore it exported to
// "done_semaphore_file_descriptor" to schedule its own command submissions
q.wait_and_throw();
// Extension: destroy all external resources
sycl::ext::oneapi::experimental::release_external_memory(
input_interop_mem_handle, queue);
sycl::ext::oneapi::experimental::release_external_memory(
output_interop_mem_handle, queue);
sycl::ext::oneapi::experimental::destroy_external_semaphore(
wait_interop_semaphore_handle, queue);
sycl::ext::oneapi::experimental::destroy_external_semaphore(
done_interop_semaphore_handle, queue);
sycl::ext::oneapi::experimental::destroy_image_handle(img_input, queue);
sycl::ext::oneapi::experimental::destroy_image_handle(img_output, queue);
} catch (sycl::exception e) {
std::cerr << "SYCL exception caught! : " << e.what() << "\n";
exit(-1);
} catch (...) {
std::cerr << "Unknown exception caught!\n";
exit(-1);
}
The current DPC++ prototype only implements the proposal for the CUDA backend, however we are actively exploring Level Zero with SPIR-V. We are looking at other backend as well in order to ensure the extension can work across different backends.
Because this extension allows images to work in a USM-like model, there are similar limitations to using USM for non-images, mainly the lack of dependency tracking and the need for users to manually synchronize operations.
There are dimension specific limitations:
-
1D - Linear interpolation not possible in the CUDA backend. A workaround is to allocate 2D pitched memory with a height of 1.
-
2D - There are some alignment restrictions. See the "Pitch alignment restrictions and queries" section, or use
pitched_alloc_device
to allocate 2D USM image memory. -
3D - No support at the moment. Possible support in non CUDA backends in the future.
Rev | Date | Changes |
---|---|---|
1 |
2023-02-03 |
Initial draft |
2 |
2023-02-23 |
- Added - Added ability to create images from USM - Added new way to copy images, removed requirement for copy direction - Added image memory information getters to reflect
|
3 |
2023-03-30 |
- Some text clarifications. - Unsampled images can no longer be created from USM. - Added SYCL 1.2.1 - Added - Added device information descriptors for querying pitched allocation size and alignment requirement. - Added - - Created - Renamed - - Added interoperability features - Added support to query bindless image and interoperability capabilities - Added mipmap support |
4 |
2023-06-23 |
- Added - Changed naming and order of some parameters to be consistent throughout the proposal and with core SYCL. - Added variants of functions that take a - Removed standalone wait and signal semaphore functions. These should always go through the queue or handler methods. - Removed - Made all enum types and values unspecified - Moved support queries to device aspects, improved naming of queries for better consistency, and moved device info queries to the experimental namespace. - Added - Fixed - Removed - Removed - Added - Created a list of failure scenarios for - Added a - Specified that - Updated code samples. |
4.1 |
2023-07-21 |
- Made bindless image sampler member names snake-case |
4.2 |
2023-08-18 |
- |
4.3 |
2023-09-08 |
- Clarify how normalized image formats are read
- Remove support for packed normalized image formats
( |
4.4 |
2023-09-12 |
- Added overload with |
4.5 |
2023-09-14 |
- Update wording for allocating images + fix typo |
4.6 |
2023-09-19 |
- Clarify restrictions on reading/writing coordinate types |
4.7 |
2023-10-16 |
- Introduce |
4.8 |
2023-10-25 |
- Change the name of |
4.9 |
2023-11-13 |
- Add that the bindless sampler is default constructible and follows by-value semantics |
4.10 |
2023-11-15 |
- Added constructors for |
5.0 |
2023-11-21 |
- Added section "Recognized standard types", to simplify
wording around what types are allowed to be read or written.
- Allow |
5.1 |
2024-01-17 |
- Added overload for |
5.1 |
2023-12-06 |
- Added unique addressing modes per dimension to the
|
5.2 |
2024-02-14 |
- Image read and write functions now accept 3-component coordinates for 3D reads, instead of 4-component coordinates. |
5.3 |
2024-02-16 |
- Replace |
5.4 |
2024-02-23 |
- Added support for unsampled image arrays.
- Creation of unsampled image arrays.
- Fetching/writing of unsampled image arrays.
- |
5.5 |
2024-02-27 |
- Update interop with mipmap interop and slight redesign
- |
5.6 |
2024-03-04 |
- Added cubemap support.
- Allocation of cubemaps.
- Creation of cubemaps.
- Fetching/writing of unsampled cubemaps and sampling cubemaps.
- |
5.7 |
2024-04-09 |
- Allow fetching of sampled image data through the
|
5.8 |
2024-05-09 |
- Add missing cubemap |