Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Specification & design for "if_device_has" #6712

Merged
merged 12 commits into from
Sep 26, 2022

Conversation

gmlueck
Copy link
Contributor

@gmlueck gmlueck commented Sep 6, 2022

Add specifications for two proposed extensions:

  • "sycl_ext_oneapi_device_if": Allows device code to conditionally use
    "optional kernel features" based on the device's aspects.

  • "sycl_ext_intel_device_architecture": Allows device code to
    conditionally use "optional kernel features" based on the device's
    architecture.

This PR also adds a design document describing how this can be
implemented. However, the design is split into two phases, and the
document currently describes only the first phase. We expect to update
this design soon(ish) to include the second phase. There are several
limitations imposed by the first phase, and these are documented in the
extension specifications.

Document the limitations that are expected in the first implementation
phase for these extensions.
This is the design document for the first implementation phase.
More will be added later for the second implementation phase.
* Allow `if_device_has` when AOT compiling for the CPU.
* Improve error checking when compiling for an unsupported target.
Remove redundant list of target names in "OptionalDeviceFeatures.md"
by referring to the list in the "sycl_ext_intel_device_aspects"
extension.
@gmlueck gmlueck marked this pull request as ready for review September 9, 2022 19:55
@gmlueck gmlueck requested a review from a team as a code owner September 9, 2022 19:55
@gmlueck
Copy link
Contributor Author

gmlueck commented Sep 9, 2022

@kbobrovs and @mdtoguchi: I have removed "Draft" status. Please approve or give comments.

@intel/dpcpp-specification-reviewers: I will need an approval from one of you too.

namespace sycl {

enum class aspect : /* unspecified */ {
ext_intel_x86_64,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sets a bad precedent for aspects: having a separate aspect for every architecture and SKU is going to lead to a very long list of aspects, and the aspects are set up such that any device can only ever have exactly one of the aspects.

If I understand correctly, testing one of these aspects effectively tests a bunch of other aspects implicitly, so it seems pretty unlikely that anybody would ever use if_device_has<one_of_these_new_aspects> in conjunction with any other aspects. For example, once you know that something is x86_64, you also know that it supports fp64, atomic64, cpu, and a bunch of other things; nobody would ever write something like if_device_has<ext_intel_x86_64, fp64, atomic64>.

I think things would be cleaner if we separated these architecture descriptions from aspects. We'd need two query interfaces, but everything else could remain basically as specified. Most of the draft implementation (the macros, the new compiler switches, etc) wouldn't have to be changed -- only the definition of if_device_has.

That would leave us with:

  • if_device_has<Aspects...> => Test if a device has a specific set of fine-grained features
  • if_device_is<Architecture> => Test if a device is a specific architecture (since it can only be one of them anyway)

This separation would enable us to teach people that aspects are fine-grained features that may be shared by multiple architectures, while architectures are more coarse-grained. That would enable developers to use aspects for future-proof code (e.g. because all Intel GPUs after a certain architecture are expected to have aspect X), and architectures when they don't care about being future-proof (i.e. because the next Intel GPU will have a different architecture name).

(I'm not tied to the name "architecture" here, but I had to pick something.)

Copy link
Contributor Author

@gmlueck gmlueck Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are pros and cons to this. Here are some more things to consider:

  • I expected these aspects to also be used (eventually) in the [[sycl::device_has()]] C++ attribute or the proposed device_has kernel property. If we change them to be a new enumeration like architecture, we'd have to add a new C++ attribute and / or a new kernel property named something like device_architecture_is.

  • Wouldn't we also want a way to test the device architecture from host code? If so, we'd also need to add an extended member function like device::ext_intel_architecture_is().

  • I could easily imagine adding designations in the future for categories of GPU architecture like "intel_gpu_xe". Would that be added to the "aspect" enum or to the "architecture" enum? This starts to blur the lines between the two enums, and this might be the strongest reason to represent everything as aspects. If we do have two enums, I think "intel_gpu_xe" would probably be part of the "architecture" enum.

  • Adding a new API like if_device_architecture_is let's us define new semantics when multiple things are listed. Although if_device_has uses the "AND" semantic when there are multiple aspects, it would make more sense for if_device_architecture_is to use "OR" semantics. This does seem useful, and it might be the strongest reason to have two different enums for aspects vs. architectures.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I expected these aspects to also be used (eventually) in the [[sycl::device_has()]] C++ attribute or the proposed device_has kernel property. If we change them to be a new enumeration like architecture, we'd have to add a new C++ attribute and / or a new kernel property named something like device_architecture_is.

I think a new kernel property would be okay. Once the machinery for compile-time kernel properties is in place, adding new properties should be much easier than adding new attributes is today.

  • Wouldn't we also want a way to test the device architecture from host code? If so, we'd also need to add an extended member function like device::ext_intel_architecture_is().

Good point.

I can see some use-cases where we'd want this to be dynamic, and calling something like ext_intel_architecture_is<ext_intel_x86_64> would be inconvenient. Perhaps we could also have info::device::ext_intel_architecture that returns a size_t matching the architecture's value in the enum?

  • I could easily imagine adding designations in the future for categories of GPU architecture like "intel_gpu_xe". Would that be added to the "aspect" enum or to the "architecture" enum? This starts to blur the lines between the two enums, and this might be the strongest reason to represent everything as aspects. If we do have two enums, I think "intel_gpu_xe" would probably be part of the "architecture" enum.

You're right that this is blurry. Something about an "intel_gpu_xe" aspect representing multiple aspects still feels off to me, so I agree that making it part of the "architecture" enum would make more sense...

We'd have to drop what I said about how if_device_architecture_is could only return true for a single "architecture" value, but I think the semantics are still good (e.g. "this device is an 'intel_gpu_xe' and this device is an 'intel_gpu_pvc'").

If we go this route and also had an info::device::ext_intel_architecture returning a size_t as I suggested above, then I think we'd want it to return the most precise value (e.g. return "intel_gpu_pvc", not "intel_gpu_xe").

  • Adding a new API like if_device_architecture_is let's us define new semantics when multiple things are listed. Although if_device_has uses the "AND" semantic when there are multiple aspects, it would make more sense for if_device_architecture_is to use "OR" semantics. This does seem useful, and it might be the strongest reason to have two different enums for aspects vs. architectures.

Completely agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand the idea for info::device::ext_intel_architecture. Are you saying that you want to support code like:

using sycl::ext::intel::experimental;

architecture a = dev.get_info<info::device::architecture>();
switch (a) {
  case architecture::intel_gpu_pvc:
  /* ... */
}

If that is the goal, I don't see why the info descriptor should return the type size_t vs. the architecture enumeration.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it can return the architecture enumeration. I clearly need more coffee. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel: This was actually the first iteration of this API! We decided against it because of @AerialMantis's proposal to return an object with "else_if" and "else" member functions. This allows us to write if-elseif-else chains of arbitrary length:

if_architecture_is<architecture::intel_gpu_bdw>([] {
  /* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
  /* ... */
}).else_([] {
 /* ... */
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolandschulz: That's not bad, but it creates a bit of an asymmetry with the else_if_architecture_is function:

if_architecture_is<architecture::intel_gpu_bdw>([] {
  /* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
  /* ... */
}).or_else([] {
 /* ... */
});

Another possibility is to use fallback:

if_architecture_is<architecture::intel_gpu_bdw>([] {
  /* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
  /* ... */
}).fallback([] {
 /* ... */
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • .otherwise([] { /*...*/ })

I really like otherwise. It's the closest to the English I'd use to describe these sorts of conditions: "A if X, B if Y, and Z otherwise".

I don't like fallback as much. It's a good word for the default path in this sort of device specialization, but it looks out of place to me alongside the if/else.

Copy link
Contributor Author

@gmlueck gmlueck Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it change your mind when you see it in the context of a long chain?

if_architecture_is<architecture::intel_gpu_bdw>([] {
  /* ... */
}).else_if_architecture_is<architecture::intel_gpu_skl>([] {
  /* ... */
}).otherwise([] {
 /* ... */
});

It seems a bit weird to me to have else_if_architecture and then otherwise. We could rename the middle function to otherwise_if_architecture_is, but that's a lot of typing:

if_architecture_is<architecture::intel_gpu_bdw>([] {
  /* ... */
}).otherwise_if_architecture_is<architecture::intel_gpu_skl>([] {
  /* ... */
}).otherwise([] {
 /* ... */
});

My concern with otherwise (and my concern with or_else) are both fairly weak, though.

FWIW, I kind of liked else_ because it kept the symmetry with else_if_architecture_is, but I'm not very wild about the trailing underscore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it seems like discussion has stopped, I decided on otherwise. See eb8aed6.

sycl/doc/design/DeviceIf.md Outdated Show resolved Hide resolved
The implementation is divided into two phases. In the first phase, we support
only AOT mode. The second phase adds support also for JIT mode.


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a section where all targets are summarized for simpler reference, and all other section would refer to it?

Suggested change
## Target device identification.
A number of sections in this document refer to device targets in various
contexts. This section provides a table with complete list of the targets
supported by this design, where a row corresponds to a specific target, and
columns define:
Description
Description of the target.
AOT compiler ID
How the target is specified in the AOT compiler invocation in the
-fsycl-targets= option.
AOT compiler ID alias
Alternative AOT compiler ID (alias) for some of the targets.
Aspect
Specific enum id of sycl::aspect corresponding to the target.
Aspect alias
Aleternative aspec enum id (with the same value as main enum id)
Preprocessor Macro
Preprocessor macro (device compiler only) predefined when AOT-compiling
for the target.
|AOT compiler ID |AOT compiler ID alias |Aspect | Aspect alias |Preprocessor Macro |Description |
|-----------------|----------------------|---------------------|---------------------|--------------------------------------|------------------------------------------------|
|ptx64 | | | | |Generic 64-bit PTX target architecture |
|spir64 | | | | |Generic 64-bit SPIR-V target |
|x86_64 | |ext_intel_x86_64 | | |Any CPU device with the x86_64 instruction set. |
|intel_gpu_pvc | |ext_intel_gpu_pvc | |`__SYCL_TARGET_INTEL_GPU_PVC__` |Ponte Vecchio Intel graphics architecture |
|intel_gpu_acm_g12| |ext_intel_gpu_acm_g12| |`__SYCL_TARGET_INTEL_GPU_ACM_G12__`|Alchemist G12 Intel graphics architecture |
|intel_gpu_acm_g11| |ext_intel_gpu_acm_g11| |`__SYCL_TARGET_INTEL_GPU_ACM_G11__`|Alchemist G11 Intel graphics architecture |
|intel_gpu_acm_g10| |ext_intel_gpu_acm_g10| |`__SYCL_TARGET_INTEL_GPU_ACM_G10__`|Alchemist G10 Intel graphics architecture |
|intel_gpu_dg1 |intel_gpu_12_10_0 |ext_intel_gpu_dg1 |ext_intel_gpu_12_10_0|`__SYCL_TARGET_INTEL_GPU_DG1__` |DG1 Intel graphics architecture |
|intel_gpu_adl_n | |ext_intel_gpu_adl_n | |`__SYCL_TARGET_INTEL_GPU_ADL_N__` |Alder Lake N Intel graphics architecture |
|intel_gpu_adl_p | |ext_intel_gpu_adl_p | |`__SYCL_TARGET_INTEL_GPU_ADL_P__` |Alder Lake P Intel graphics architecture |
|intel_gpu_rpl_s | |ext_intel_gpu_rpl_s | |`__SYCL_TARGET_INTEL_GPU_RPL_S__` |Raptor Lake Intel graphics architecture |
|intel_gpu_adl_s | |ext_intel_gpu_adl_s | |`__SYCL_TARGET_INTEL_GPU_ADL_S__` |Alder Lake S Intel graphics architecture |
|intel_gpu_rkl | |ext_intel_gpu_rkl | |`__SYCL_TARGET_INTEL_GPU_RKL__` |Rocket Lake Intel graphics architecture |
|intel_gpu_tgllp | intel_gpu_12_0_0 |ext_intel_gpu_tgllp |ext_intel_gpu_12_0_0 |`__SYCL_TARGET_INTEL_GPU_TGLLP__` |Tiger Lake Intel graphics architecture |
|intel_gpu_ehl | intel_gpu_11_2_0 |ext_intel_gpu_ehl |ext_intel_gpu_11_2_0 |`__SYCL_TARGET_INTEL_GPU_EHL__` |Elkhart Lake Intel graphics architecture |
|intel_gpu_icllp | intel_gpu_11_0_0 |ext_intel_gpu_icllp |ext_intel_gpu_11_0_0 |`__SYCL_TARGET_INTEL_GPU_ICLLP__` |Ice Lake Intel graphics architecture |
|intel_gpu_cml | intel_gpu_9_7_0 |ext_intel_gpu_cml | ext_intel_gpu_9_7_0 |`__SYCL_TARGET_INTEL_GPU_CML__` |Comet Lake Intel graphics architecture |
|intel_gpu_aml | intel_gpu_9_6_0 |ext_intel_gpu_aml | ext_intel_gpu_9_6_0 |`__SYCL_TARGET_INTEL_GPU_AML__` |Amber Lake Intel graphics architecture |
|intel_gpu_whl | intel_gpu_9_5_0 |ext_intel_gpu_whl | ext_intel_gpu_9_5_0 |`__SYCL_TARGET_INTEL_GPU_WHL__` |Whiskey Lake Intel graphics architecture |
|intel_gpu_glk | intel_gpu_9_4_0 |ext_intel_gpu_glk | ext_intel_gpu_9_4_0 |`__SYCL_TARGET_INTEL_GPU_GLK__` |Gemini Lake Intel graphics architecture |
|intel_gpu_apl | intel_gpu_9_3_0 |ext_intel_gpu_apl | ext_intel_gpu_9_3_0 |`__SYCL_TARGET_INTEL_GPU_APL__` |Apollo Lake Intel graphics architecture |
|intel_gpu_cfl | intel_gpu_9_2_9 |ext_intel_gpu_cfl | ext_intel_gpu_9_2_9 |`__SYCL_TARGET_INTEL_GPU_CFL__` |Coffee Lake Intel graphics architecture |
|intel_gpu_kbl | intel_gpu_9_1_9 |ext_intel_gpu_kbl | ext_intel_gpu_9_1_9 |`__SYCL_TARGET_INTEL_GPU_KBL__` |Kaby Lake Intel graphics architecture |
|intel_gpu_skl | intel_gpu_9_0_9 |ext_intel_gpu_skl | ext_intel_gpu_9_0_9 |`__SYCL_TARGET_INTEL_GPU_SKL__` |Skylake Intel graphics architecture |
|intel_gpu_bdw | intel_gpu_8_0_0 |ext_intel_gpu_bdw | ext_intel_gpu_8_0_0 |`__SYCL_TARGET_INTEL_GPU_BDW__` |Broadwell Intel graphics architecture |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not ignoring this, but I wanted to resolve @Pennycook's issue about the aspect vs. architecture enums first.

Note, however, that there isn't any duplication in the lists currently in this document, so adding a new section like this doesn't reduce any verbosity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no duplication, but my point is that e.g. to match e.g. AOT compiler ID to the corresponding Preprocessor macro one needs to scroll back and forth.
I'm not insisting, this is just how I would prefer this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a weak preference not to combine these into a single table. The names for the -fsycl-targets option are relevant to both phase 1 and 2. However, I think the macros like __SYCL_TARGET_INTEL_GPU_BDW__ will be specific to phase 2. Separating them will make it easier to update this document once we implement phase 2.

sycl/doc/design/DeviceIf.md Show resolved Hide resolved
@gmlueck
Copy link
Contributor Author

gmlueck commented Sep 13, 2022

@AerialMantis mentioning you here so you can see this proposal. Comments welcome!

We decided not add extended aspects for the device architectures and to
add a separate enumeration instead.  This means that we will have both
`if_device_has` (which tests aspects) and `if_device_architecture_is`
(which tests the architecture).  The first phase of the implementation
plan will implement only `if_device_architecture_is` and only in AOT
mode.
Pennycook
Pennycook previously approved these changes Sep 16, 2022
@bader bader changed the title [SYCL] Specification & design for "if_device_has" [SYCL][Doc] Specification & design for "if_device_has" Sep 19, 2022
* `if_device_architecture_is` -> `if_architecture_is`
* `else_device` -> `otherwise`
@gmlueck gmlueck requested a review from a team as a code owner September 26, 2022 15:08
@gmlueck
Copy link
Contributor Author

gmlueck commented Sep 26, 2022

@intel/llvm-gatekeepers can this be merged?

@steffenlarsen steffenlarsen merged commit 7f2b17e into intel:sycl Sep 26, 2022
@gmlueck gmlueck deleted the gmlueck/device-if branch October 5, 2022 15:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants