[SYCL][Doc] Specification & design for "if_device_has" (#6712)
Add specifications for two proposed extensions:

* "sycl_ext_oneapi_device_if": Allows device code to conditionally use
  "optional kernel features" based on the device's aspects.

* "sycl_ext_intel_device_architecture": Allows device code to
  conditionally use "optional kernel features" based on the device's
  architecture.

This PR also adds a design document describing how this can be
implemented.  However, the design is split into two phases, and the
document currently describes only the first phase.  We expect to update
this design soon(ish) to include the second phase.  There are several
limitations imposed by the first phase, and these are documented in the
extension specifications.
gmlueck committed Sep 26, 2022
1 parent 2f58f44 commit 7f2b17e
Showing 5 changed files with 856 additions and 323 deletions.
272 changes: 272 additions & 0 deletions sycl/doc/design/DeviceIf.md
@@ -0,0 +1,272 @@
# Implementation design for "device\_if" and "device\_architecture"

This document describes the design for the DPC++ implementation of the
[sycl\_ext\_oneapi\_device\_if][1] and
[sycl\_ext\_intel\_device\_architecture][2] extensions.

[1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
[2]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>


## Phased implementation

The implementation is divided into two phases. In the first phase, we support
only the [sycl\_ext\_intel\_device\_architecture][2] extension, and only in AOT
mode. The second phase adds support for both extensions in both AOT and JIT
modes.


## Changes to compiler driver

Both phases require changes to the `-fsycl-targets` option that is recognized
by the compiler driver. The problem is that the current form of that option
does not identify a specific device name. As a reminder, the current command
line for AOT compilation on GPU looks like this:

```
$ clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device skl" ...
```

Notice that the `-fsycl-targets` option specifies only the generic name
`spir64_gen` whereas the device name is passed directly to `ocloc` (the Intel
GPU AOT compiler) via `-Xs "-device skl"`. Since the compiler driver merely
passes the `-Xs` options directly to the underlying `ocloc` without
understanding them, the compiler driver does not currently know the target
device(s) of the AOT compilation.

To fix this, the `-fsycl-targets` option should be changed to accept the
following GPU device names in addition to the target names it currently
recognizes:

* `intel_gpu_bdw`
* `intel_gpu_skl`
* `intel_gpu_kbl`
* `intel_gpu_cfl`
* `intel_gpu_apl`
* `intel_gpu_glk`
* `intel_gpu_whl`
* `intel_gpu_aml`
* `intel_gpu_cml`
* `intel_gpu_icllp`
* `intel_gpu_ehl`
* `intel_gpu_tgllp`
* `intel_gpu_rkl`
* `intel_gpu_adl_s`
* `intel_gpu_rpl_s`
* `intel_gpu_adl_p`
* `intel_gpu_adl_n`
* `intel_gpu_dg1`
* `intel_gpu_acm_g10`
* `intel_gpu_acm_g11`
* `intel_gpu_acm_g12`
* `intel_gpu_pvc`
* `intel_gpu_8_0_0` (alias for `intel_gpu_bdw`)
* `intel_gpu_9_0_9` (alias for `intel_gpu_skl`)
* `intel_gpu_9_1_9` (alias for `intel_gpu_kbl`)
* `intel_gpu_9_2_9` (alias for `intel_gpu_cfl`)
* `intel_gpu_9_3_0` (alias for `intel_gpu_apl`)
* `intel_gpu_9_4_0` (alias for `intel_gpu_glk`)
* `intel_gpu_9_5_0` (alias for `intel_gpu_whl`)
* `intel_gpu_9_6_0` (alias for `intel_gpu_aml`)
* `intel_gpu_9_7_0` (alias for `intel_gpu_cml`)
* `intel_gpu_11_0_0` (alias for `intel_gpu_icllp`)
* `intel_gpu_11_2_0` (alias for `intel_gpu_ehl`)
* `intel_gpu_12_0_0` (alias for `intel_gpu_tgllp`)
* `intel_gpu_12_10_0` (alias for `intel_gpu_dg1`)
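
One way a driver could handle the numeric aliases is to normalize each alias
to its mnemonic name before any further processing. The following is a minimal
sketch of that idea, not the actual driver code. The namespace `demo`, the
`alias_entry` type and the function `canonical_device_name` are hypothetical
names chosen for illustration; only the alias pairs come from the list above.

```cpp
#include <string_view>

namespace demo {

// Hypothetical helper type: one alias and the device name it stands for.
struct alias_entry {
  std::string_view alias;
  std::string_view canonical;
};

// Alias pairs copied from the list of device names above.
constexpr alias_entry aliases[] = {
    {"intel_gpu_8_0_0", "intel_gpu_bdw"},
    {"intel_gpu_9_0_9", "intel_gpu_skl"},
    {"intel_gpu_9_1_9", "intel_gpu_kbl"},
    {"intel_gpu_9_2_9", "intel_gpu_cfl"},
    {"intel_gpu_9_3_0", "intel_gpu_apl"},
    {"intel_gpu_9_4_0", "intel_gpu_glk"},
    {"intel_gpu_9_5_0", "intel_gpu_whl"},
    {"intel_gpu_9_6_0", "intel_gpu_aml"},
    {"intel_gpu_9_7_0", "intel_gpu_cml"},
    {"intel_gpu_11_0_0", "intel_gpu_icllp"},
    {"intel_gpu_11_2_0", "intel_gpu_ehl"},
    {"intel_gpu_12_0_0", "intel_gpu_tgllp"},
    {"intel_gpu_12_10_0", "intel_gpu_dg1"},
};

// Returns the mnemonic name for an alias. Any other name is returned
// unchanged, so non-alias device names pass through as they are.
constexpr std::string_view canonical_device_name(std::string_view name) {
  for (const alias_entry &entry : aliases) {
    if (entry.alias == name)
      return entry.canonical;
  }
  return name;
}

} // namespace demo
```

With this sketch, `intel_gpu_9_0_9` and `intel_gpu_skl` both resolve to
`intel_gpu_skl`, so later steps only need to know the mnemonic names.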

The device names listed above may not be mixed with the existing target name
`spir64_gen` on the same command line. In addition, the user must not pass the
`-device` option to `ocloc` via `-Xs` or related command line options because
the compiler driver passes this option to `ocloc` automatically.

Note that in the first phase of implementation, only one of the GPU device
names listed above may appear on the command line. As a result, the first phase
supports AOT compilation in this new mode only for a single GPU device.
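
The two command line restrictions described in this section can be sketched as
a small check over the list of requested targets. This is an illustration
under stated assumptions, not the driver's implementation: the namespace
`demo`, the `target_check` enumeration and both functions are hypothetical,
and a name is treated as a GPU device name simply because it starts with
`intel_gpu_`, whereas a real driver would compare against the full list.

```cpp
#include <cstddef>
#include <string_view>

namespace demo {

// Hypothetical result codes for the phase 1 restrictions.
enum class target_check { ok, mixed_with_spir64_gen, multiple_gpu_devices };

// Simplifying assumption: any name with the "intel_gpu_" prefix is one of the
// new GPU device names.
constexpr bool is_gpu_device_name(std::string_view target) {
  constexpr std::string_view prefix = "intel_gpu_";
  return target.substr(0, prefix.size()) == prefix;
}

// Checks the targets given to "-fsycl-targets" against the two rules:
//  - GPU device names may not be mixed with "spir64_gen".
//  - In phase 1, at most one GPU device name may appear.
template <std::size_t N>
constexpr target_check
check_phase1_targets(const std::string_view (&targets)[N]) {
  std::size_t gpu_devices = 0;
  bool has_spir64_gen = false;
  for (std::string_view target : targets) {
    if (is_gpu_device_name(target))
      ++gpu_devices;
    if (target == "spir64_gen")
      has_spir64_gen = true;
  }
  if (gpu_devices > 0 && has_spir64_gen)
    return target_check::mixed_with_spir64_gen;
  if (gpu_devices > 1)
    return target_check::multiple_gpu_devices;
  return target_check::ok;
}

} // namespace demo
```

Under these assumptions, the target list `intel_gpu_skl,spir64_x86_64` is
accepted, while `intel_gpu_skl,spir64_gen` and `intel_gpu_skl,intel_gpu_pvc`
are both rejected.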


## Phase 1

The first phase requires changes only to the compiler driver and to the
device headers.

### Compiler driver macro predefines

Most of the changes to the compiler driver are described above, but there are
a few small additional changes that are specific to phase 1. If the user
invokes the compiler driver with `-fsycl-targets=<device>` where `<device>` is
one of the GPU device names listed above, the compiler driver must predefine
the corresponding C++ macro name from the following list:

* `__SYCL_TARGET_INTEL_GPU_BDW__`
* `__SYCL_TARGET_INTEL_GPU_SKL__`
* `__SYCL_TARGET_INTEL_GPU_KBL__`
* `__SYCL_TARGET_INTEL_GPU_CFL__`
* `__SYCL_TARGET_INTEL_GPU_APL__`
* `__SYCL_TARGET_INTEL_GPU_GLK__`
* `__SYCL_TARGET_INTEL_GPU_WHL__`
* `__SYCL_TARGET_INTEL_GPU_AML__`
* `__SYCL_TARGET_INTEL_GPU_CML__`
* `__SYCL_TARGET_INTEL_GPU_ICLLP__`
* `__SYCL_TARGET_INTEL_GPU_EHL__`
* `__SYCL_TARGET_INTEL_GPU_TGLLP__`
* `__SYCL_TARGET_INTEL_GPU_RKL__`
* `__SYCL_TARGET_INTEL_GPU_ADL_S__`
* `__SYCL_TARGET_INTEL_GPU_RPL_S__`
* `__SYCL_TARGET_INTEL_GPU_ADL_P__`
* `__SYCL_TARGET_INTEL_GPU_ADL_N__`
* `__SYCL_TARGET_INTEL_GPU_DG1__`
* `__SYCL_TARGET_INTEL_GPU_ACM_G10__`
* `__SYCL_TARGET_INTEL_GPU_ACM_G11__`
* `__SYCL_TARGET_INTEL_GPU_ACM_G12__`
* `__SYCL_TARGET_INTEL_GPU_PVC__`

If the user invokes the compiler driver with `-fsycl-targets=spir64_x86_64`,
the compiler driver must predefine the following C++ macro name:

* `__SYCL_TARGET_INTEL_X86_64__`

These macros are an internal implementation detail, so they should not be
documented to users, and user code should not make use of them.
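
The GPU macro names above follow a regular pattern: the device name in upper
case, wrapped in `__SYCL_TARGET_` and `__`. The sketch below shows that
pattern. It is only an illustration, and it does not cover
`spir64_x86_64`, whose macro `__SYCL_TARGET_INTEL_X86_64__` does not follow
the pattern. The namespace `demo`, the `macro_name` type and the function
`predefined_macro_for` are hypothetical names, not part of the driver.

```cpp
#include <cstddef>
#include <string_view>

namespace demo {

// Hypothetical fixed-size buffer so the name can be built at compile time.
struct macro_name {
  char text[64] = {};
  std::size_t size = 0;
  constexpr std::string_view view() const { return {text, size}; }
};

constexpr char to_upper_ascii(char c) {
  return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Builds "__SYCL_TARGET_<DEVICE>__" from a GPU device name such as
// "intel_gpu_skl". Returns an empty name if the result would not fit.
constexpr macro_name predefined_macro_for(std::string_view device) {
  constexpr std::string_view prefix = "__SYCL_TARGET_";
  constexpr std::string_view suffix = "__";
  macro_name result{};
  if (prefix.size() + device.size() + suffix.size() > sizeof(result.text))
    return result;
  for (char c : prefix)
    result.text[result.size++] = c;
  for (char c : device)
    result.text[result.size++] = to_upper_ascii(c);
  for (char c : suffix)
    result.text[result.size++] = c;
  return result;
}

} // namespace demo
```

For example, `intel_gpu_adl_s` maps to `__SYCL_TARGET_INTEL_GPU_ADL_S__` under
this scheme, which matches the corresponding entry in the list.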

### Changes to the device headers

The device headers implement the [sycl\_ext\_intel\_device\_architecture][2]
extension using these predefined macros and leverage `if constexpr` to discard
statements in the "if" or "else" body when the device does not match one of the
listed architectures. The following code snippet illustrates the technique:

```
namespace sycl {
namespace ext::intel::experimental {

enum class architecture {
  x86_64,
  intel_gpu_bdw,
  intel_gpu_skl,
  intel_gpu_kbl
  // ...
};

} // namespace ext::intel::experimental

namespace detail {

#ifndef __SYCL_TARGET_INTEL_X86_64__
#define __SYCL_TARGET_INTEL_X86_64__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_BDW__
#define __SYCL_TARGET_INTEL_GPU_BDW__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_SKL__
#define __SYCL_TARGET_INTEL_GPU_SKL__ 0
#endif
#ifndef __SYCL_TARGET_INTEL_GPU_KBL__
#define __SYCL_TARGET_INTEL_GPU_KBL__ 0
#endif
// ...

// This is true when the translation unit is compiled in AOT mode with target
// names that support the "if_architecture_is" features. If an unsupported
// target name is specified via "-fsycl-targets", the associated invocation of
// the device compiler will set this variable to false, and that will trigger
// an error for code that uses "if_architecture_is".
static constexpr bool is_allowable_aot_mode =
  (__SYCL_TARGET_INTEL_X86_64__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1) ||
  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
  // ...
  ;

// One entry for each enumerator in "architecture" telling whether the AOT
// target matches that architecture.
static constexpr bool is_aot_for_architecture[] = {
  (__SYCL_TARGET_INTEL_X86_64__ == 1),
  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1),
  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1),
  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
  // ...
};

// Read the value of "is_allowable_aot_mode" via a template to defer triggering
// static_assert() until template instantiation time.
template<ext::intel::experimental::architecture... Archs>
constexpr static bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

// Tells if the current device has one of the architectures in the parameter
// pack.
template<ext::intel::experimental::architecture... Archs>
constexpr static bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// Helper object used to implement "else_if_architecture_is" and "otherwise".
// The "MakeCall" template parameter tells whether a previous clause in the
// "if-elseif-elseif ..." chain was true. When "MakeCall" is false, some
// previous clause was true, so none of the subsequent
// "else_if_architecture_is" or "otherwise" member functions should call the
// user's function.
template<bool MakeCall>
class if_architecture_helper {
 public:
  template<ext::intel::experimental::architecture ...Archs, typename T,
           typename ...Args>
  constexpr auto else_if_architecture_is(T fnTrue, Args ...args) {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fnTrue(args...);
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template<typename T, typename ...Args>
  constexpr void otherwise(T fn, Args ...args) {
    if constexpr (MakeCall) {
      fn(args...);
    }
  }
};

} // namespace detail

namespace ext::intel::experimental {

template<architecture ...Archs, typename T, typename ...Args>
constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
  static_assert(detail::allowable_aot_mode<Archs...>(),
    "The if_architecture_is function may only be used when AOT "
    "compiling with '-fsycl-targets=spir64_x86_64' or "
    "'-fsycl-targets=intel_gpu_*'");
  if constexpr (detail::device_architecture_is<Archs...>()) {
    fnTrue(args...);
    return detail::if_architecture_helper<false>{};
  } else {
    return detail::if_architecture_helper<true>{};
  }
}

} // namespace ext::intel::experimental
} // namespace sycl
```
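
To show how a chain of clauses resolves at compile time, the following is a
reduced, SYCL-free sketch of the same technique. It is not the real header:
everything lives in a hypothetical namespace `demo`, the macros
`DEMO_TARGET_INTEL_GPU_SKL` and `DEMO_TARGET_INTEL_GPU_KBL` are stand-ins for
the driver's predefines, the `static_assert` on the AOT mode is omitted, and
the sketch pretends that the compilation targets `intel_gpu_skl`.

```cpp
// Stand-ins for the driver's predefines (hypothetical names). We pretend the
// translation unit is compiled for Skylake.
#ifndef DEMO_TARGET_INTEL_GPU_SKL
#define DEMO_TARGET_INTEL_GPU_SKL 1
#endif
#ifndef DEMO_TARGET_INTEL_GPU_KBL
#define DEMO_TARGET_INTEL_GPU_KBL 0
#endif

namespace demo {

enum class architecture { intel_gpu_skl, intel_gpu_kbl };

// One entry per enumerator, in the same order as the enumeration.
constexpr bool is_aot_for_architecture[] = {
    DEMO_TARGET_INTEL_GPU_SKL == 1,
    DEMO_TARGET_INTEL_GPU_KBL == 1,
};

template <architecture... Archs>
constexpr bool device_architecture_is() {
  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
}

// "MakeCall" is false once an earlier clause in the chain has matched.
template <bool MakeCall>
struct if_architecture_helper {
  template <architecture... Archs, typename T>
  constexpr auto else_if_architecture_is([[maybe_unused]] T fn) const {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fn();
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template <typename T>
  constexpr void otherwise([[maybe_unused]] T fn) const {
    if constexpr (MakeCall)
      fn();
  }
};

template <architecture... Archs, typename T>
constexpr auto if_architecture_is([[maybe_unused]] T fn) {
  if constexpr (device_architecture_is<Archs...>()) {
    fn();
    return if_architecture_helper<false>{};
  } else {
    return if_architecture_helper<true>{};
  }
}

// Example chain: records which clause ran.
constexpr int selected_branch() {
  int branch = 0;
  if_architecture_is<architecture::intel_gpu_kbl>([&] { branch = 1; })
      .else_if_architecture_is<architecture::intel_gpu_skl>(
          [&] { branch = 2; })
      .otherwise([&] { branch = 3; });
  return branch;
}

} // namespace demo
```

With the Skylake stand-in macro set to 1, the first clause is discarded, the
second clause runs, and `otherwise` is skipped, so `demo::selected_branch()`
evaluates to 2. The bodies of the clauses that do not match are never called
because the calls sit in discarded `if constexpr` branches.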

### Analysis of error checking for unsupported AOT modes

The header file code presented above triggers a `static_assert` if the
`if_architecture_is` function is used in a translation unit that is compiled
for an unsupported target. The only supported targets are `spir64_x86_64` and
the new `intel_gpu_*` GPU device names.

The error checking relies on the fact that the device compiler is invoked
separately for each target listed in `-fsycl-targets`. If any target is
unsupported, the associated device compilation will compute
`is_allowable_aot_mode` as `false`, and this will trigger the `static_assert`
in that compilation phase.
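
The reason the check is read through a template can be seen in a reduced
sketch. Because the condition of the `static_assert` depends on the template
parameters, the compiler evaluates it only when the function template is
instantiated, so a translation unit that never calls the function still
compiles for an unsupported target. The namespace `demo` and the function
`unrelated_kernel_code` are hypothetical, and the sketch hard-codes the flag
to `false` to model an unsupported target.

```cpp
namespace demo {

enum class architecture { x86_64, intel_gpu_skl };

// Stand-in for the value computed from the predefined target macros. Here we
// pretend that this device compilation is for an unsupported target.
constexpr bool is_allowable_aot_mode = false;

// Reading the flag through a template makes the expression depend on "Archs",
// which defers the static_assert below until instantiation.
template <architecture... Archs>
constexpr bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

template <architecture... Archs, typename T>
constexpr void if_architecture_is(T fn) {
  static_assert(allowable_aot_mode<Archs...>(),
                "if_architecture_is requires a supported AOT target");
  fn();
}

// Code that never calls if_architecture_is is unaffected by the check.
constexpr int unrelated_kernel_code() { return 42; }

} // namespace demo
```

In this sketch, adding a call such as
`demo::if_architecture_is<demo::architecture::intel_gpu_skl>([] {})` would
instantiate the template and produce the error, which is the behavior the
analysis above relies on.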


## Phase 2

TBD.
65 changes: 26 additions & 39 deletions sycl/doc/design/OptionalDeviceFeatures.md
@@ -1025,43 +1025,30 @@ architectures need to be identified:

- `-fsycl-targets` option
- a device configuration file entry
- `-target` option of the `sycl-aspec-filter` tool
- a SYCL aspect enum identifier (we expect to add a new SYCL aspect for each
device target architecture)

In all such places architecture naming should be the same. In some cases aliases
are allowed. Below is a list of target architectures supported by DPC++:

| target/alias(es) | description |
|--------------------------------------|-------------------------------------------|
| ptx64 | Generic 64-bit PTX target architecture |
| spir64 | Generic 64-bit SPIR-V target |
| x86_64 | Generic 64-bit x86 architecture |
| intel_gpu_pvc | Ponte Vecchio Intel graphics architecture |
| intel_gpu_acm_g12 | Alchemist G12 Intel graphics architecture |
| intel_gpu_acm_g11 | Alchemist G11 Intel graphics architecture |
| intel_gpu_acm_g10 | Alchemist G10 Intel graphics architecture |
| intel_gpu_12_10_0, intel_gpu_dg1 | DG1 Intel graphics architecture |
| intel_gpu_adl_n | Alder Lake N Intel graphics architecture |
| intel_gpu_adl_p | Alder Lake P Intel graphics architecture |
| intel_gpu_rpl_s | Raptor Lake Intel graphics architecture |
| intel_gpu_adl_s | Alder Lake S Intel graphics architecture |
| intel_gpu_rkl | Rocket Lake Intel graphics architecture |
| intel_gpu_12_0_0, intel_gpu_tgllp | Tiger Lake Intel graphics architecture |
| intel_gpu_11_2_0, intel_gpu_ehl | Elkhart Lake Intel graphics architecture |
| intel_gpu_11_0_0, intel_gpu_icllp | Ice Lake Intel graphics architecture |
| intel_gpu_9_7_0, intel_gpu_cml | Comet Lake Intel graphics architecture |
| intel_gpu_9_6_0, intel_gpu_aml | Amber Lake Intel graphics architecture |
| intel_gpu_9_5_0, intel_gpu_whl | Whiskey Lake Intel graphics architecture |
| intel_gpu_9_4_0, intel_gpu_glk | Gemini Lake Intel graphics architecture |
| intel_gpu_9_3_0, intel_gpu_apl | Apollo Lake Intel graphics architecture |
| intel_gpu_9_2_9, intel_gpu_cfl | Coffee Lake Intel graphics architecture |
| intel_gpu_9_1_9, intel_gpu_kbl | Kaby Lake Intel graphics architecture |
| intel_gpu_9_0_9, intel_gpu_skl | Skylake Intel graphics architecture |
| intel_gpu_8_0_0, intel_gpu_bdw | Broadwell Intel graphics architecture |

TODO: Provide full list of AOT targets supported by the identification
mechanism.
- `-target` option of the `sycl-aspect-filter` tool
- a new SYCL enumeration named `architecture`

The following table lists these target names:

| name | has `architecture` enum | description |
|-----------------------|-------------------------|----------------------------------------|
| ptx64 | no | Generic 64-bit PTX target architecture |
| spir64 | no | Generic 64-bit SPIR-V target |
| x86\_64 | yes | Generic 64-bit x86 architecture |
| intel\_gpu\_\<name\> | yes | Intel graphics architecture \<name\> |

The "name" column in this table lists the possible target names. Since not all
targets have a corresponding enumerator in the `architecture` enumeration, the
second column tells when there is such an enumerator. The last row in this
table corresponds to all of the architecture names listed in the
[sycl\_ext\_intel\_device\_architecture][8] extension whose name starts with
`intel_gpu_`.

[8]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
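
The second column of the table can be read as a simple predicate on the target
name. The sketch below is only an approximation made for illustration: the
namespace `demo` and the function `has_architecture_enumerator` are
hypothetical, and the prefix test accepts any name starting with `intel_gpu_`,
whereas only the names actually listed in the extension have an enumerator.

```cpp
#include <string_view>

namespace demo {

// Approximates the "has architecture enum" column of the table above.
constexpr bool has_architecture_enumerator(std::string_view target) {
  constexpr std::string_view gpu_prefix = "intel_gpu_";
  if (target == "x86_64")
    return true;
  // Require at least one character after the prefix.
  return target.size() > gpu_prefix.size() &&
         target.substr(0, gpu_prefix.size()) == gpu_prefix;
}

} // namespace demo
```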

TODO: This table needs to be filled out for the CPU variants supported by the
`opencl-aot` tool (avx512, avx2, avx, sse4.2) and for the FPGA targets. We
also need to figure out how CUDA fits in here.

Example of clang compilation invocation with 2 AOT targets and generic SPIR-V:
```
@@ -1115,15 +1102,15 @@ sub-group size.

## Appendix: Adding an attribute to 8-byte `atomic_ref`

As described above under ["Changes to DPC++ headers"][8], we need to decorate
As described above under ["Changes to DPC++ headers"][9], we need to decorate
any SYCL type representing an optional device feature with the
`[[sycl_detail::uses_aspects()]]` attribute. This is somewhat tricky for
`atomic_ref`, though, because it is only an optional feature when specialized
for an 8-byte type. However, we can accomplish this by using partial
specialization techniques. The following code snippet demonstrates (best read
from bottom to top):

[8]: <#changes-to-dpc-headers>
[9]: <#changes-to-dpc-headers>

```
namespace sycl {