[SYCL][Doc] Add sycl_ext_oneapi_cache_size draft #14837
Merged
6 commits
f7fa6c2 [SYCL][Doc] Add sycl_ext_oneapi_cache_size draft (Pennycook)
6b908a6 Add note about non-L2 cache sizes on NVIDIA GPUs (Pennycook)
4b764bc Add note about unknown/unsupported cache sizes (Pennycook)
229dc32 Make extension experimental and declare namespace (Pennycook)
6b57068 Adopt new style for info descriptor (Pennycook)
d77dbd4 Fix namespace in template argument (Pennycook)
162 changes: 162 additions & 0 deletions
sycl/doc/extensions/proposed/sycl_ext_oneapi_cache_size.asciidoc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,162 @@ | ||
| = sycl_ext_oneapi_cache_size | ||
|
|
||
| :source-highlighter: coderay | ||
| :coderay-linenums-mode: table | ||
|
|
||
| // This section needs to be after the document title. | ||
| :doctype: book | ||
| :toc2: | ||
| :toc: left | ||
| :encoding: utf-8 | ||
| :lang: en | ||
| :dpcpp: pass:[DPC++] | ||
| :endnote: —{nbsp}end{nbsp}note | ||
|
|
||
| // Set the default source code type in this document to C++, | ||
| // for syntax highlighting purposes. This is needed because | ||
| // docbook uses c++ and html5 uses cpp. | ||
| :language: {basebackend@docbook:c++:cpp} | ||
|
|
||
|
|
||
| == Notice | ||
|
|
||
| [%hardbreaks] | ||
| Copyright (C) 2024 Intel Corporation. All rights reserved. | ||
|
|
||
| Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
| of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
| permission by Khronos. | ||
|
|
||
|
|
||
| == Contact | ||
|
|
||
| To report problems with this extension, please open a new issue at: | ||
|
|
||
| https://github.com/intel/llvm/issues | ||
|
|
||
|
|
||
| == Dependencies | ||
|
|
||
| This extension is written against the SYCL 2020 revision 8 specification. All | ||
| references below to the "core SYCL specification" or to section numbers in the | ||
| SYCL specification refer to that revision. | ||
|
|
||
|
|
||
| == Status | ||
|
|
||
| This is a proposed extension specification, intended to gather community | ||
| feedback. Interfaces defined in this specification may not be implemented yet | ||
| or may be in a preliminary state. The specification itself may also change in | ||
| incompatible ways before it is finalized. *Shipping software products should | ||
| not rely on APIs defined in this specification.* | ||
|
|
||
|
|
||
| == Overview | ||
|
|
||
| SYCL 2020's device partitioning functions acknowledge that devices typically | ||
| have multiple levels of cache (L1, L2, L3 and L4), but its device queries only | ||
| allow developers to request information about one (unnamed) level of cache. | ||
|
|
||
| This extension proposes a mechanism to query the availability and size of | ||
| specific levels of cache on individual devices, to help developers with | ||
| performance tuning and writing other cache-aware operations. | ||
|
|
||
|
|
||
| == Specification | ||
|
|
||
| === Feature test macro | ||
|
|
||
| This extension provides a feature-test macro as described in the core SYCL | ||
| specification. An implementation supporting this extension must predefine the | ||
| macro `SYCL_EXT_ONEAPI_CACHE_SIZE` to one of the values defined in the table | ||
| below. Applications can test for the existence of this macro to determine if | ||
| the implementation supports this feature, or applications can test the macro's | ||
| value to determine which of the extension's features the implementation | ||
| supports. | ||
|
|
||
|
|
||
| [%header,cols="1,5"] | ||
| |=== | ||
| |Value | ||
| |Description | ||
|
|
||
| |1 | ||
| |The APIs of this experimental extension are not versioned, so the | ||
| feature-test macro always has this value. | ||
| |=== | ||
|
|
||
|
|
||
| === Cache Levels | ||
|
|
||
| A new `enum` is added to describe the four levels of cache: | ||
|
|
||
| [source,c++] | ||
| ---- | ||
| namespace sycl::ext::oneapi::experimental { | ||
| enum class cache_level : /* unspecified */ | ||
| { | ||
| L1 = 1, | ||
| L2 = 2, | ||
| L3 = 3, | ||
| L4 = 4, | ||
| }; | ||
| } // namespace sycl::ext::oneapi::experimental | ||
| ---- | ||
|
|
||
|
|
||
| === Device Queries | ||
|
|
||
| [source,c++] | ||
| ---- | ||
| namespace sycl::ext::oneapi::experimental::info::device { | ||
| template <cache_level CacheLevel> | ||
| struct cache_size { | ||
| using return_type = size_t; | ||
| }; | ||
| } // namespace sycl::ext::oneapi::experimental::info::device | ||
| ---- | ||
|
|
||
| _Remarks_: Template parameter to `device::get_info`. | ||
|
|
||
| _Returns_: The size in bytes of the cache at the requested `cache_level` for | ||
| this device, or 0 if this level of cache does not exist on this device. | ||
|
|
||
| The set of cache levels for which a device returns a non-zero value is not | ||
| required to be contiguous (e.g., a device may report an L1 and L3 cache without | ||
| reporting an L2 cache). | ||
|
|
||
| [_Note:_ Although this may seem an unusual choice, several real-life devices | ||
| name their cache levels such that there are gaps. This extension allows for | ||
| this behavior so that developers do not have to translate between the names of | ||
| cache levels in hardware specification sheets and in SYCL. _{endnote}_] | ||
|
|
||
|
|
||
| == Implementation notes | ||
|
|
||
| This non-normative section provides information about one possible | ||
| implementation of this extension. It is not part of the specification of the | ||
| extension's API. | ||
|
|
||
| CUDA exposes an `l2CacheSize` property via the `cudaDeviceProp` struct, which | ||
| could be used to implement the size query for `cache_level::L2`. Other sizes | ||
| could be derived from the Compute Capability. | ||
|
|
||
|
|
||
| == Issues | ||
|
|
||
| . Should devices be able to signal an "unknown"/"unsupported" cache size? | ||
| + | ||
| -- | ||
| *UNRESOLVED*: | ||
| There are many mechanisms that could be used to signal that an implementation | ||
| simply does not know anything about a specific level of cache (e.g., | ||
| an exception, a special return value, an orthogonal query). However, requiring | ||
| implementations to determine and return an accurate size would make the query | ||
| significantly easier for developers to use. | ||
|
|
||
| We should revisit this issue once we have implementation experience across | ||
| multiple backends, which should give us a better idea of how hard it is to | ||
| return accurate cache sizes in practice. | ||
|
||
| -- | ||
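The device query described under "Device Queries" could be used roughly as sketched below. Because the extension is only proposed and may not be implemented anywhere yet, this sketch does not include any SYCL header. Everything in the `mock` namespace is a hypothetical stand-in: `mock::device`, its constructor, and the `level` member are invented for illustration, and only `cache_level` and `cache_size` mirror the declarations in the proposal. The helpers `present_levels` and `tile_bytes`, and the "half of L2" heuristic, are likewise assumptions rather than part of the extension.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins so the sketch builds without a SYCL implementation.
namespace mock {

enum class cache_level : int { L1 = 1, L2 = 2, L3 = 3, L4 = 4 };

namespace info::device {
template <cache_level CacheLevel>
struct cache_size {
  using return_type = std::size_t;
  static constexpr cache_level level = CacheLevel;  // mock-only helper
};
}  // namespace info::device

// Mock device: one size per cache level, where 0 means "level not present".
class device {
 public:
  explicit device(std::array<std::size_t, 4> sizes) : sizes_(sizes) {}

  template <typename Param>
  typename Param::return_type get_info() const {
    return sizes_[static_cast<std::size_t>(Param::level) - 1];
  }

 private:
  std::array<std::size_t, 4> sizes_;
};

}  // namespace mock

// Appends Level to `out` if the device reports a non-zero size for it.
template <mock::cache_level Level>
void append_if_present(const mock::device& dev,
                       std::vector<mock::cache_level>& out) {
  if (dev.get_info<mock::info::device::cache_size<Level>>() != 0) {
    out.push_back(Level);
  }
}

// Collects the cache levels the device reports. The result may have gaps
// (for example L1 and L3 without L2), which the proposal explicitly allows.
std::vector<mock::cache_level> present_levels(const mock::device& dev) {
  std::vector<mock::cache_level> levels;
  append_if_present<mock::cache_level::L1>(dev, levels);
  append_if_present<mock::cache_level::L2>(dev, levels);
  append_if_present<mock::cache_level::L3>(dev, levels);
  append_if_present<mock::cache_level::L4>(dev, levels);
  return levels;
}

// Picks a working-set size: half of the L2 size if the device reports an L2
// cache, otherwise the caller's fallback. The factor of one half is an
// illustrative tuning assumption only.
std::size_t tile_bytes(const mock::device& dev, std::size_t fallback) {
  using namespace mock;
  const std::size_t l2 =
      dev.get_info<info::device::cache_size<cache_level::L2>>();
  return l2 != 0 ? l2 / 2 : fallback;
}
```

With a real implementation, the `mock` names would presumably be replaced by `sycl::device` and the `sycl::ext::oneapi::experimental` namespace. Treating a return value of 0 as "this level does not exist" follows the _Returns_ clause above; how an implementation would report an unknown size is still the open issue listed under "Issues".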