Skip to content
Merged
9 changes: 9 additions & 0 deletions sycl/doc/SYCLInstrumentationUsingXPTI.md
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,15 @@ All trace point types in bold provide semantic information about the graph, node
| `barrier_begin` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_begin` that marks the beginning of a barrier while enqueuing a command group object</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |
| `barrier_end` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::barrier_end` that marks the end of the barrier that is encountered during enqueue.</li> <li> **parent**: The global graph event that is created during the `graph_create` event.</li> <li> **event**: The event ID will reflect the ID of the command group object that has encountered a barrier during the enqueue operation. </li> <li> **instance**: Unique ID to allow the correlation of the `barrier_begin` event with the `barrier_end` event. </li> <li> **user_data**: String indicating `enqueue.barrier` and the reason for the barrier as a `const char *` </li> <p></p>The reason for the barrier could be one of `Buffer locked by host accessor`, `Blocked by host task` or `Unknown reason`.</div> | <li> Computational Kernels </li> `sycl_device`, `kernel_name`, `from_source`, `sym_function_name`, `sym_source_file_name`, `sym_line_no` <li>Memory operations</li> `memory_object`, `offset`, `access_range`, `allocation_type`, `copy_from`, `copy_to` |

## Buffer management stream `"sycl.experimental.buffer"` Notification Signatures

| Trace Point Type | Parameter Description | Metadata |
| :------------------------: | :-------------------- | :------- |
| `offload_alloc_construct` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_data_t` that marks offload buffer createtion point</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed. </li> <li> **user_data**: A pointer to `offload_buffer_data_t` object, that includes user object ID, source code location (file name (if available), function name, line number) where the buffer object is created. </li></div> | None |
| `offload_alloc_associate` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_association_data_t` that provides association between user level buffer object and platform specific memory object</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed.</li> <li> **user_data**: A pointer to `offload_buffer_association_data_t` object, that includes user object ID and platform-specific representation for offload buffer. </li></div> | None |
| `offload_alloc_release` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_release_data_t` that provides information about release of platform specific memory object</li> <li> **parent**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed.</li> <li> **user_data**: A pointer to `offload_buffer_association_data_t` object, that includes user object ID and platform-specific representation for offload buffer. </li></div> | None |
| `offload_alloc_construct` | <div style="text-align: left"><li>**trace_type**: `xpti::trace_point_type_t::offload_buffer_data_t` that marks offload buffer createtion point</li> <li> **parent**: Event ID created for all functions in the `oneapi.experimental.buffer` layer.</li> <li> **event**: `nullptr` - since the stream of data just captures functions being called.</li> <li> **instance**: `nullptr` since no begin-end event alignment is needed. </li> <li> **user_data**: A pointer to `offload_buffer_data_t` object, that includes user object ID. </li></div>| None |

## Level Zero Plugin Stream `"oneapi.level_zero.experimental.mem_alloc"` Notification Signatures

| Trace Point Type | Parameter Description | Metadata |
Expand Down
95 changes: 71 additions & 24 deletions sycl/include/CL/sycl/buffer.hpp
100755 → 100644
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,8 @@ class buffer {
template <class Container>
using EnableIfContiguous =
detail::void_t<detail::enable_if_t<std::is_convertible<
detail::remove_pointer_t<decltype(
std::declval<Container>().data())> (*)[],
detail::remove_pointer_t<
decltype(std::declval<Container>().data())> (*)[],
const T (*)[]>::value>,
decltype(std::declval<Container>().size())>;
template <class It>
Expand All @@ -73,157 +73,187 @@ class buffer {
std::is_same<ItA, ItB>::value && !std::is_const<ItA>::value, ItA>;

buffer(const range<dimensions> &bufferRange,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)), propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(const range<dimensions> &bufferRange, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)), propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(T *hostData, const range<dimensions> &bufferRange,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(T *hostData, const range<dimensions> &bufferRange,
AllocatorT allocator, const property_list &propList = {})
AllocatorT allocator, const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

template <typename _T = T>
buffer(EnableIfSameNonConstIterators<T, _T> const *hostData,
const range<dimensions> &bufferRange,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

template <typename _T = T>
buffer(EnableIfSameNonConstIterators<T, _T> const *hostData,
const range<dimensions> &bufferRange, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(const std::shared_ptr<T> &hostData,
const range<dimensions> &bufferRange, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(const std::shared_ptr<T[]> &hostData,
const range<dimensions> &bufferRange, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(const std::shared_ptr<T> &hostData,
const range<dimensions> &bufferRange,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(const std::shared_ptr<T[]> &hostData,
const range<dimensions> &bufferRange,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(bufferRange) {
impl = std::make_shared<detail::buffer_impl>(
hostData, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

template <class InputIterator, int N = dimensions,
typename = EnableIfOneDimension<N>,
typename = EnableIfItInputIterator<InputIterator>>
buffer(InputIterator first, InputIterator last, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(range<1>(std::distance(first, last))) {
impl = std::make_shared<detail::buffer_impl>(
first, last, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

template <class InputIterator, int N = dimensions,
typename = EnableIfOneDimension<N>,
typename = EnableIfItInputIterator<InputIterator>>
buffer(InputIterator first, InputIterator last,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(range<1>(std::distance(first, last))) {
impl = std::make_shared<detail::buffer_impl>(
first, last, size() * sizeof(T), detail::getNextPowerOfTwo(sizeof(T)),
propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>());
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

// This constructor is a prototype for a future SYCL specification
template <class Container, int N = dimensions,
typename = EnableIfOneDimension<N>,
typename = EnableIfContiguous<Container>>
buffer(Container &container, AllocatorT allocator,
const property_list &propList = {})
const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range(range<1>(container.size())) {
impl = std::make_shared<detail::buffer_impl>(
container.data(), size() * sizeof(T),
detail::getNextPowerOfTwo(sizeof(T)), propList,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(
allocator));
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

// This constructor is a prototype for a future SYCL specification
template <class Container, int N = dimensions,
typename = EnableIfOneDimension<N>,
typename = EnableIfContiguous<Container>>
buffer(Container &container, const property_list &propList = {})
: buffer(container, {}, propList) {}
buffer(Container &container, const property_list &propList = {},
const detail::code_location CodeLoc = detail::code_location::current())
: buffer(container, {}, propList, CodeLoc) {}

buffer(buffer<T, dimensions, AllocatorT> &b, const id<dimensions> &baseIndex,
const range<dimensions> &subRange)
const range<dimensions> &subRange,
const detail::code_location CodeLoc = detail::code_location::current())
: impl(b.impl), Range(subRange),
OffsetInBytes(getOffsetInBytes<T>(baseIndex, b.Range)),
IsSubBuffer(true) {
impl->constructorNotification(CodeLoc, (void *)impl.get());

if (b.is_sub_buffer())
throw cl::sycl::invalid_object_error(
"Cannot create sub buffer from sub buffer.", PI_INVALID_VALUE);
Expand All @@ -239,7 +269,8 @@ class buffer {
#ifdef __SYCL_INTERNAL_API
template <int N = dimensions, typename = EnableIfOneDimension<N>>
buffer(cl_mem MemObject, const context &SyclContext,
event AvailableEvent = {})
event AvailableEvent = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range{0} {

size_t BufSize = detail::SYCLMemObjT::getBufSizeForContext(
Expand All @@ -250,12 +281,23 @@ class buffer {
detail::pi::cast<pi_native_handle>(MemObject), SyclContext, BufSize,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(),
AvailableEvent);
impl->constructorNotification(CodeLoc, (void *)impl.get());
}
#endif

buffer(const buffer &rhs) = default;
buffer(const buffer &rhs,
const detail::code_location CodeLoc = detail::code_location::current())
: impl(rhs.impl), Range(rhs.Range), OffsetInBytes(rhs.OffsetInBytes),
IsSubBuffer(rhs.IsSubBuffer) {
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer(buffer &&rhs) = default;
buffer(buffer &&rhs,
const detail::code_location CodeLoc = detail::code_location::current())
: impl(std::move(rhs.impl)), Range(rhs.Range),
OffsetInBytes(rhs.OffsetInBytes), IsSubBuffer(rhs.IsSubBuffer) {
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

buffer &operator=(const buffer &rhs) = default;

Expand Down Expand Up @@ -424,7 +466,8 @@ class buffer {
// Interop constructor
template <int N = dimensions, typename = EnableIfOneDimension<N>>
buffer(pi_native_handle MemObject, const context &SyclContext,
event AvailableEvent = {})
event AvailableEvent = {},
const detail::code_location CodeLoc = detail::code_location::current())
: Range{0} {

size_t BufSize = detail::SYCLMemObjT::getBufSizeForContext(
Expand All @@ -435,14 +478,18 @@ class buffer {
MemObject, SyclContext, BufSize,
make_unique_ptr<detail::SYCLMemObjAllocatorHolder<AllocatorT>>(),
AvailableEvent);
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

// Reinterpret contructor
buffer(std::shared_ptr<detail::buffer_impl> Impl,
range<dimensions> reinterpretRange, size_t reinterpretOffset,
bool isSubBuffer)
bool isSubBuffer,
const detail::code_location CodeLoc = detail::code_location::current())
: impl(Impl), Range(reinterpretRange), OffsetInBytes(reinterpretOffset),
IsSubBuffer(isSubBuffer){};
IsSubBuffer(isSubBuffer) {
impl->constructorNotification(CodeLoc, (void *)impl.get());
}

template <typename Type, int N>
size_t getOffsetInBytes(const id<N> &offset, const range<N> &range) {
Expand Down
4 changes: 4 additions & 0 deletions sycl/include/CL/sycl/detail/buffer_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,9 @@ class __SYCL_EXPORT buffer_impl final : public SYCLMemObjT {

void *allocateMem(ContextImplPtr Context, bool InitFromUserData,
void *HostPtr, RT::PiEvent &OutEventToWait) override;
void constructorNotification(const detail::code_location &CodeLoc,
void *UserObj);
void destructorNotification(void *UserObj);

MemObjType getType() const override { return MemObjType::Buffer; }

Expand All @@ -163,6 +166,7 @@ class __SYCL_EXPORT buffer_impl final : public SYCLMemObjT {
BaseT::updateHostMemory();
} catch (...) {
}
destructorNotification(this);
}

void resize(size_t size) { BaseT::MSizeInBytes = size; }
Expand Down
1 change: 1 addition & 0 deletions sycl/source/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ set(SYCL_SOURCES
"detail/sycl_mem_obj_t.cpp"
"detail/usm/usm_impl.cpp"
"detail/util.cpp"
"detail/xpti_registry.cpp"
"accessor.cpp"
"context.cpp"
"device.cpp"
Expand Down
Loading