-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Specification & design for "if_device_has" #6712
Conversation
Document the limitations that are expected in the first implementation phase for these extensions.
This is the design document for the first implementation phase. More will be added later for the second implementation phase.
* Allow `if_device_has` when AOT compiling for the CPU. * Improve error checking when compiling for an unsupported target.
sycl/doc/extensions/proposed/sycl_ext_intel_device_aspects.asciidoc
Outdated
Show resolved
Hide resolved
Remove redundant list of target names in "OptionalDeviceFeatures.md" by referring to the list in the "sycl_ext_intel_device_aspects" extension.
@kbobrovs and @mdtoguchi: I have removed "Draft" status. Please approve or give comments. @intel/dpcpp-specification-reviewers: I will need an approval from one of you too. |
namespace sycl { | ||
|
||
enum class aspect : /* unspecified */ { | ||
ext_intel_x86_64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this sets a bad precedent for aspects: having a separate aspect for every architecture and SKU is going to lead to a very long list of aspects, and the aspects are set up such that any device can only ever have exactly one of the aspects.
If I understand correctly, testing one of these aspects effectively tests a bunch of other aspects implicitly, so it seems pretty unlikely that anybody would ever use if_device_has<one_of_these_new_aspects>
in conjunction with any other aspects. For example, once you know that something is x86_64, you also know that it supports fp64, atomic64, cpu, and a bunch of other things; nobody would ever write something like if_device_has<ext_intel_x86_64, fp64, atomic64>
.
I think things would be cleaner if we separated these architecture descriptions from aspects. We'd need two query interfaces, but everything else could remain basically as specified. Most of the draft implementation (the macros, the new compiler switches, etc) wouldn't have to be changed -- only the definition of if_device_has
.
That would leave us with:
if_device_has<Aspects...>
=> Test if a device has a specific set of fine-grained featuresif_device_is<Architecture>
=> Test if a device is a specific architecture (since it can only be one of them anyway)
This separation would enable us to teach people that aspects are fine-grained features that may be shared by multiple architectures, while architectures are more coarse-grained. That would enable developers to use aspects for future-proof code (e.g. because all Intel GPUs after a certain architecture are expected to have aspect X), and architectures when they don't care about being future-proof (i.e. because the next Intel GPU will have a different architecture name).
(I'm not tied to the name "architecture" here, but I had to pick something.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are pros and cons to this. Here are some more things to consider:
-
I expected these aspects to also be used (eventually) in the
[[sycl::device_has()]]
C++ attribute or the proposeddevice_has
kernel property. If we change them to be a new enumeration likearchitecture
, we'd have to add a new C++ attribute and / or a new kernel property named something likedevice_architecture_is
. -
Wouldn't we also want a way to test the device architecture from host code? If so, we'd also need to add an extended member function like
device::ext_intel_architecture_is()
. -
I could easily imagine adding designations in the future for categories of GPU architecture like "intel_gpu_xe". Would that be added to the "aspect" enum or to the "architecture" enum? This starts to blur the lines between the two enums, and this might be the strongest reason to represent everything as aspects. If we do have two enums, I think "intel_gpu_xe" would probably be part of the "architecture" enum.
-
Adding a new API like
if_device_architecture_is
let's us define new semantics when multiple things are listed. Althoughif_device_has
uses the "AND" semantic when there are multiple aspects, it would make more sense forif_device_architecture_is
to use "OR" semantics. This does seem useful, and it might be the strongest reason to have two different enums for aspects vs. architectures.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I expected these aspects to also be used (eventually) in the
[[sycl::device_has()]]
C++ attribute or the proposeddevice_has
kernel property. If we change them to be a new enumeration likearchitecture
, we'd have to add a new C++ attribute and / or a new kernel property named something likedevice_architecture_is
.
I think a new kernel property would be okay. Once the machinery for compile-time kernel properties is in place, adding new properties should be much easier than adding new attributes is today.
- Wouldn't we also want a way to test the device architecture from host code? If so, we'd also need to add an extended member function like
device::ext_intel_architecture_is()
.
Good point.
I can see some use-cases where we'd want this to be dynamic, and calling something like ext_intel_architecture_is<ext_intel_x86_64>
would be inconvenient. Perhaps we could also have info::device::ext_intel_architecture
that returns a size_t
matching the architecture's value in the enum
?
- I could easily imagine adding designations in the future for categories of GPU architecture like "intel_gpu_xe". Would that be added to the "aspect" enum or to the "architecture" enum? This starts to blur the lines between the two enums, and this might be the strongest reason to represent everything as aspects. If we do have two enums, I think "intel_gpu_xe" would probably be part of the "architecture" enum.
You're right that this is blurry. Something about an "intel_gpu_xe" aspect representing multiple aspects still feels off to me, so I agree that making it part of the "architecture" enum would make more sense...
We'd have to drop what I said about how if_device_architecture_is
could only return true
for a single "architecture" value, but I think the semantics are still good (e.g. "this device is an 'intel_gpu_xe' and this device is an 'intel_gpu_pvc'").
If we go this route and also had an info::device::ext_intel_architecture
returning a size_t
as I suggested above, then I think we'd want it to return the most precise value (e.g. return "intel_gpu_pvc", not "intel_gpu_xe").
- Adding a new API like
if_device_architecture_is
let's us define new semantics when multiple things are listed. Althoughif_device_has
uses the "AND" semantic when there are multiple aspects, it would make more sense forif_device_architecture_is
to use "OR" semantics. This does seem useful, and it might be the strongest reason to have two different enums for aspects vs. architectures.
Completely agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the idea for info::device::ext_intel_architecture
. Are you saying that you want to support code like:
using sycl::ext::intel::experimental;
architecture a = dev.get_info<info::device::architecture>();
switch (a) {
case architecture::intel_gpu_pvc:
/* ... */
}
If that is the goal, I don't see why the info descriptor should return the type size_t
vs. the architecture
enumeration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, it can return the architecture
enumeration. I clearly need more coffee. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aelovikov-intel: This was actually the first iteration of this API! We decided against it because of @AerialMantis's proposal to return an object with "else_if" and "else" member functions. This allows us to write if-elseif-else chains of arbitrary length:
if_architecture_is<architecture::intel_gpu_bdw>([] {
/* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
/* ... */
}).else_([] {
/* ... */
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rolandschulz: That's not bad, but it creates a bit of an asymmetry with the else_if_architecture_is
function:
if_architecture_is<architecture::intel_gpu_bdw>([] {
/* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
/* ... */
}).or_else([] {
/* ... */
});
Another possibility is to use fallback
:
if_architecture_is<architecture::intel_gpu_bdw>([] {
/* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
/* ... */
}).fallback([] {
/* ... */
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.otherwise([] { /*...*/ })
I really like otherwise
. It's the closest to the English I'd use to describe these sorts of conditions: "A if X, B if Y, and Z otherwise".
I don't like fallback
as much. It's a good word for the default path in this sort of device specialization, but it looks out of place to me alongside the if
/else
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it change your mind when you see it in the context of a long chain?
if_architecture_is<architecture::intel_gpu_bdw>([] {
/* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
/* ... */
}).otherwise([] {
/* ... */
});
It seems a bit weird to me to have else_if_architecture
and then otherwise
. We could rename the middle function to otherwise_if_architecture_is
, but that's a lot of typing:
if_architecture_is<architecture::intel_gpu_bdw>([] {
/* ... */
}).otherwise_if_architecture_is<architecture::intel_gpu_skl>([] {
/* ... */
}).otherwise([] {
/* ... */
});
My concern with otherwise
(and my concern with or_else
) are both fairly weak, though.
FWIW, I kind of liked else_
because it kept the symmetry with else_if_architecture_is
, but I'm not very wild about the trailing underscore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it seems like discussion has stopped, I decided on otherwise
. See eb8aed6.
The implementation is divided into two phases. In the first phase, we support | ||
only AOT mode. The second phase adds support also for JIT mode. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a section where all targets are summarized for simpler reference, and all other section would refer to it?
## Target device identification. | |
A number of sections in this document refer to device targets in various | |
contexts. This section provides a table with complete list of the targets | |
supported by this design, where a row corresponds to a specific target, and | |
columns define: | |
Description | |
Description of the target. | |
AOT compiler ID | |
How the target is specified in the AOT compiler invocation in the | |
-fsycl-targets= option. | |
AOT compiler ID alias | |
Alternative AOT compiler ID (alias) for some of the targets. | |
Aspect | |
Specific enum id of sycl::aspect corresponding to the target. | |
Aspect alias | |
Aleternative aspec enum id (with the same value as main enum id) | |
Preprocessor Macro | |
Preprocessor macro (device compiler only) predefined when AOT-compiling | |
for the target. | |
|AOT compiler ID |AOT compiler ID alias |Aspect | Aspect alias |Preprocessor Macro |Description | | |
|-----------------|----------------------|---------------------|---------------------|--------------------------------------|------------------------------------------------| | |
|ptx64 | | | | |Generic 64-bit PTX target architecture | | |
|spir64 | | | | |Generic 64-bit SPIR-V target | | |
|x86_64 | |ext_intel_x86_64 | | |Any CPU device with the x86_64 instruction set. | | |
|intel_gpu_pvc | |ext_intel_gpu_pvc | |`__SYCL_TARGET_INTEL_GPU_PVC__` |Ponte Vecchio Intel graphics architecture | | |
|intel_gpu_acm_g12| |ext_intel_gpu_acm_g12| |`__SYCL_TARGET_INTEL_GPU_ACM_G12__`|Alchemist G12 Intel graphics architecture | | |
|intel_gpu_acm_g11| |ext_intel_gpu_acm_g11| |`__SYCL_TARGET_INTEL_GPU_ACM_G11__`|Alchemist G11 Intel graphics architecture | | |
|intel_gpu_acm_g10| |ext_intel_gpu_acm_g10| |`__SYCL_TARGET_INTEL_GPU_ACM_G10__`|Alchemist G10 Intel graphics architecture | | |
|intel_gpu_dg1 |intel_gpu_12_10_0 |ext_intel_gpu_dg1 |ext_intel_gpu_12_10_0|`__SYCL_TARGET_INTEL_GPU_DG1__` |DG1 Intel graphics architecture | | |
|intel_gpu_adl_n | |ext_intel_gpu_adl_n | |`__SYCL_TARGET_INTEL_GPU_ADL_N__` |Alder Lake N Intel graphics architecture | | |
|intel_gpu_adl_p | |ext_intel_gpu_adl_p | |`__SYCL_TARGET_INTEL_GPU_ADL_P__` |Alder Lake P Intel graphics architecture | | |
|intel_gpu_rpl_s | |ext_intel_gpu_rpl_s | |`__SYCL_TARGET_INTEL_GPU_RPL_S__` |Raptor Lake Intel graphics architecture | | |
|intel_gpu_adl_s | |ext_intel_gpu_adl_s | |`__SYCL_TARGET_INTEL_GPU_ADL_S__` |Alder Lake S Intel graphics architecture | | |
|intel_gpu_rkl | |ext_intel_gpu_rkl | |`__SYCL_TARGET_INTEL_GPU_RKL__` |Rocket Lake Intel graphics architecture | | |
|intel_gpu_tgllp | intel_gpu_12_0_0 |ext_intel_gpu_tgllp |ext_intel_gpu_12_0_0 |`__SYCL_TARGET_INTEL_GPU_TGLLP__` |Tiger Lake Intel graphics architecture | | |
|intel_gpu_ehl | intel_gpu_11_2_0 |ext_intel_gpu_ehl |ext_intel_gpu_11_2_0 |`__SYCL_TARGET_INTEL_GPU_EHL__` |Elkhart Lake Intel graphics architecture | | |
|intel_gpu_icllp | intel_gpu_11_0_0 |ext_intel_gpu_icllp |ext_intel_gpu_11_0_0 |`__SYCL_TARGET_INTEL_GPU_ICLLP__` |Ice Lake Intel graphics architecture | | |
|intel_gpu_cml | intel_gpu_9_7_0 |ext_intel_gpu_cml | ext_intel_gpu_9_7_0 |`__SYCL_TARGET_INTEL_GPU_CML__` |Comet Lake Intel graphics architecture | | |
|intel_gpu_aml | intel_gpu_9_6_0 |ext_intel_gpu_aml | ext_intel_gpu_9_6_0 |`__SYCL_TARGET_INTEL_GPU_AML__` |Amber Lake Intel graphics architecture | | |
|intel_gpu_whl | intel_gpu_9_5_0 |ext_intel_gpu_whl | ext_intel_gpu_9_5_0 |`__SYCL_TARGET_INTEL_GPU_WHL__` |Whiskey Lake Intel graphics architecture | | |
|intel_gpu_glk | intel_gpu_9_4_0 |ext_intel_gpu_glk | ext_intel_gpu_9_4_0 |`__SYCL_TARGET_INTEL_GPU_GLK__` |Gemini Lake Intel graphics architecture | | |
|intel_gpu_apl | intel_gpu_9_3_0 |ext_intel_gpu_apl | ext_intel_gpu_9_3_0 |`__SYCL_TARGET_INTEL_GPU_APL__` |Apollo Lake Intel graphics architecture | | |
|intel_gpu_cfl | intel_gpu_9_2_9 |ext_intel_gpu_cfl | ext_intel_gpu_9_2_9 |`__SYCL_TARGET_INTEL_GPU_CFL__` |Coffee Lake Intel graphics architecture | | |
|intel_gpu_kbl | intel_gpu_9_1_9 |ext_intel_gpu_kbl | ext_intel_gpu_9_1_9 |`__SYCL_TARGET_INTEL_GPU_KBL__` |Kaby Lake Intel graphics architecture | | |
|intel_gpu_skl | intel_gpu_9_0_9 |ext_intel_gpu_skl | ext_intel_gpu_9_0_9 |`__SYCL_TARGET_INTEL_GPU_SKL__` |Skylake Intel graphics architecture | | |
|intel_gpu_bdw | intel_gpu_8_0_0 |ext_intel_gpu_bdw | ext_intel_gpu_8_0_0 |`__SYCL_TARGET_INTEL_GPU_BDW__` |Broadwell Intel graphics architecture | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not ignoring this, but I wanted to resolve @Pennycook's issue about the aspect vs. architecture enums first.
Note, however, that there isn't any duplication in the lists currently in this document, so adding a new section like this doesn't reduce any verbosity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is no duplication, but my point is that e.g. to match e.g. AOT compiler ID
to the corresponding Preprocessor macro
one needs to scroll back and forth.
I'm not insisting, this is just how I would prefer this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a weak preference not to combine these into a single table. The names for the -fsycl-targets
option are relevant to both phase 1 and 2. However, I think the macros like __SYCL_TARGET_INTEL_GPU_BDW__
will be specific to phase 2. Separating them will make it easier to update this document once we implement phase 2.
@AerialMantis mentioning you here so you can see this proposal. Comments welcome! |
We decided not add extended aspects for the device architectures and to add a separate enumeration instead. This means that we will have both `if_device_has` (which tests aspects) and `if_device_architecture_is` (which tests the architecture). The first phase of the implementation plan will implement only `if_device_architecture_is` and only in AOT mode.
* `if_device_architecture_is` -> `if_architecture_is` * `else_device` -> `otherwise`
@intel/llvm-gatekeepers can this be merged? |
Add specifications for two proposed extensions:
"sycl_ext_oneapi_device_if": Allows device code to conditionally use
"optional kernel features" based on the device's aspects.
"sycl_ext_intel_device_architecture": Allows device code to
conditionally use "optional kernel features" based on the device's
architecture.
This PR also adds a design document describing how this can be
implemented. However, the design is split into two phases, and the
document currently describes only the first phase. We expect to update
this design soon(ish) to include the second phase. There are several
limitations imposed by the first phase, and these are documented in the
extension specifications.