[SYCL][Doc] Specification & design for "if_device_has" (#6712)

Add specifications for two proposed extensions:

* "sycl_ext_oneapi_device_if": Allows device code to conditionally use "optional kernel features" based on the device's aspects.
* "sycl_ext_intel_device_architecture": Allows device code to conditionally use "optional kernel features" based on the device's architecture.

This PR also adds a design document describing how this can be implemented. However, the design is split into two phases, and the document currently describes only the first phase. We expect to update this design soon(ish) to include the second phase. There are several limitations imposed by the first phase, and these are documented in the extension specifications.
intel · Sep 26, 2022 · 7f2b17e
1 parent 2f58f44

Showing 5 changed files with 856 additions and 323 deletions.
diff --git a/sycl/doc/design/DeviceIf.md b/sycl/doc/design/DeviceIf.md
@@ -0,0 +1,272 @@
+# Implementation design for "device\_if" and "device\_architecture"
+
+This document describes the design for the DPC++ implementation of the
+[sycl\_ext\_oneapi\_device\_if][1] and
+[sycl\_ext\_intel\_device\_architecture][2] extensions.
+
+[1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
+[2]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
+
+
+## Phased implementation
+
+The implementation is divided into two phases.  In the first phase, we support
+only [sycl\_ext\_intel\_device\_architecture][2], and only in AOT mode.  The
+second phase adds support for both extensions in both AOT and JIT modes.
+
+
+## Changes to compiler driver
+
+Both phases require changes to the `-fsycl-targets` option that is recognized
+by the compiler driver.  The problem is that the current form of that option
+does not identify a specific device name.  As a reminder, the current command
+line for AOT compilation on GPU looks like this:
+
+```
+$ clang++ -fsycl -fsycl-targets=spir64_gen -Xs "-device skl" ...
+```
+
+Notice that the `-fsycl-targets` option specifies only the generic name
+`spir64_gen`, whereas the device name is passed directly to `ocloc` (the Intel
+GPU AOT compiler) via `-Xs "-device skl"`.  Since the compiler driver merely
+forwards the `-Xs` options to `ocloc` without interpreting them, it does not
+currently know the target device(s) of the AOT compilation.
+
+To fix this, the `-fsycl-targets` option should be changed to accept the
+following GPU device names in addition to the target names it currently
+recognizes:
+
+* `intel_gpu_bdw`
+* `intel_gpu_skl`
+* `intel_gpu_kbl`
+* `intel_gpu_cfl`
+* `intel_gpu_apl`
+* `intel_gpu_glk`
+* `intel_gpu_whl`
+* `intel_gpu_aml`
+* `intel_gpu_cml`
+* `intel_gpu_icllp`
+* `intel_gpu_ehl`
+* `intel_gpu_tgllp`
+* `intel_gpu_rkl`
+* `intel_gpu_adl_s`
+* `intel_gpu_rpl_s`
+* `intel_gpu_adl_p`
+* `intel_gpu_adl_n`
+* `intel_gpu_dg1`
+* `intel_gpu_acm_g10`
+* `intel_gpu_acm_g11`
+* `intel_gpu_acm_g12`
+* `intel_gpu_pvc`
+* `intel_gpu_8_0_0` (alias for `intel_gpu_bdw`)
+* `intel_gpu_9_0_9` (alias for `intel_gpu_skl`)
+* `intel_gpu_9_1_9` (alias for `intel_gpu_kbl`)
+* `intel_gpu_9_2_9` (alias for `intel_gpu_cfl`)
+* `intel_gpu_9_3_0` (alias for `intel_gpu_apl`)
+* `intel_gpu_9_4_0` (alias for `intel_gpu_glk`)
+* `intel_gpu_9_5_0` (alias for `intel_gpu_whl`)
+* `intel_gpu_9_6_0` (alias for `intel_gpu_aml`)
+* `intel_gpu_9_7_0` (alias for `intel_gpu_cml`)
+* `intel_gpu_11_0_0` (alias for `intel_gpu_icllp`)
+* `intel_gpu_11_2_0` (alias for `intel_gpu_ehl`)
+* `intel_gpu_12_0_0` (alias for `intel_gpu_tgllp`)
+* `intel_gpu_12_10_0` (alias for `intel_gpu_dg1`)
+
+The device names listed above may not be mixed with the existing target name
+`spir64_gen` on the same command line.  In addition, the user must not pass the
+`-device` option to `ocloc` via `-Xs` or related command line options because
+the compiler driver will pass this option to `ocloc` automatically.
+
+Note that in the first phase of implementation, only one of the GPU device
+names listed above may appear on the command line.  As a result, the first
+phase of implementation supports AOT compilation in this new mode only for a
+single GPU device.
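
The command-line rules above can be sketched in C++ as follows. This is a minimal, hypothetical sketch rather than DPC++ driver code: `canonical_target`, `check_phase1_targets`, and `TargetError` are illustrative names, the `-fsycl-targets` value is assumed to be already split at commas, the `-Xs` options are assumed to be already split into tokens, and any name starting with `intel_gpu_` is treated as a GPU device name without checking it against the full list.

```cpp
#include <array>
#include <initializer_list>
#include <string_view>

// One numeric alias and the code name it stands for (from the list above).
struct Alias {
  std::string_view alias;
  std::string_view canonical;
};

constexpr std::array<Alias, 13> kAliases = {{
    {"intel_gpu_8_0_0", "intel_gpu_bdw"},
    {"intel_gpu_9_0_9", "intel_gpu_skl"},
    {"intel_gpu_9_1_9", "intel_gpu_kbl"},
    {"intel_gpu_9_2_9", "intel_gpu_cfl"},
    {"intel_gpu_9_3_0", "intel_gpu_apl"},
    {"intel_gpu_9_4_0", "intel_gpu_glk"},
    {"intel_gpu_9_5_0", "intel_gpu_whl"},
    {"intel_gpu_9_6_0", "intel_gpu_aml"},
    {"intel_gpu_9_7_0", "intel_gpu_cml"},
    {"intel_gpu_11_0_0", "intel_gpu_icllp"},
    {"intel_gpu_11_2_0", "intel_gpu_ehl"},
    {"intel_gpu_12_0_0", "intel_gpu_tgllp"},
    {"intel_gpu_12_10_0", "intel_gpu_dg1"},
}};

// Hypothetical: resolve an alias to its code name; other names pass through.
constexpr std::string_view canonical_target(std::string_view name) {
  for (const Alias &a : kAliases) {
    if (a.alias == name) return a.canonical;
  }
  return name;
}

// Simplification: any name with the "intel_gpu_" prefix counts as a GPU
// device name; a real driver would check it against the full list.
constexpr bool is_intel_gpu_target(std::string_view name) {
  return name.substr(0, 10) == "intel_gpu_";
}

enum class TargetError {
  none,
  mixed_with_spir64_gen,  // an intel_gpu_* name used together with spir64_gen
  multiple_gpu_devices,   // more than one GPU device name (phase 1 limit)
  device_in_xs            // "-device" passed to ocloc by the user via -Xs
};

// Hypothetical check of the command-line rules described above.
constexpr TargetError check_phase1_targets(
    std::initializer_list<std::string_view> targets,
    std::initializer_list<std::string_view> xs_options) {
  bool has_spir64_gen = false;
  int gpu_devices = 0;
  for (std::string_view t : targets) {
    if (t == "spir64_gen") {
      has_spir64_gen = true;
    } else if (is_intel_gpu_target(t)) {
      ++gpu_devices;
    }
  }
  // The rules only apply when one of the new GPU device names is used.
  if (gpu_devices == 0) return TargetError::none;
  if (has_spir64_gen) return TargetError::mixed_with_spir64_gen;
  if (gpu_devices > 1) return TargetError::multiple_gpu_devices;
  for (std::string_view opt : xs_options) {
    if (opt == "-device") return TargetError::device_in_xs;
  }
  return TargetError::none;
}
```

A real driver would report these cases as diagnostics; the `constexpr` form here only keeps the sketch easy to check at compile time.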
+
+
+## Phase 1
+
+The first phase requires changes only to the compiler driver and to the
+device headers.
+
+### Compiler driver macro predefines
+
+Most of the changes to the compiler driver are described above, but there are
+a few small additional changes that are specific to phase 1.  If the user
+invokes the compiler driver with `-fsycl-targets=<device>` where `<device>` is
+one of the GPU device names listed above, the compiler driver must predefine
+the corresponding C++ macro from the following list, with the value `1` (the
+device headers test each macro with `== 1`):
+
+* `__SYCL_TARGET_INTEL_GPU_BDW__`
+* `__SYCL_TARGET_INTEL_GPU_SKL__`
+* `__SYCL_TARGET_INTEL_GPU_KBL__`
+* `__SYCL_TARGET_INTEL_GPU_CFL__`
+* `__SYCL_TARGET_INTEL_GPU_APL__`
+* `__SYCL_TARGET_INTEL_GPU_GLK__`
+* `__SYCL_TARGET_INTEL_GPU_WHL__`
+* `__SYCL_TARGET_INTEL_GPU_AML__`
+* `__SYCL_TARGET_INTEL_GPU_CML__`
+* `__SYCL_TARGET_INTEL_GPU_ICLLP__`
+* `__SYCL_TARGET_INTEL_GPU_EHL__`
+* `__SYCL_TARGET_INTEL_GPU_TGLLP__`
+* `__SYCL_TARGET_INTEL_GPU_RKL__`
+* `__SYCL_TARGET_INTEL_GPU_ADL_S__`
+* `__SYCL_TARGET_INTEL_GPU_RPL_S__`
+* `__SYCL_TARGET_INTEL_GPU_ADL_P__`
+* `__SYCL_TARGET_INTEL_GPU_ADL_N__`
+* `__SYCL_TARGET_INTEL_GPU_DG1__`
+* `__SYCL_TARGET_INTEL_GPU_ACM_G10__`
+* `__SYCL_TARGET_INTEL_GPU_ACM_G11__`
+* `__SYCL_TARGET_INTEL_GPU_ACM_G12__`
+* `__SYCL_TARGET_INTEL_GPU_PVC__`
+
+If the user invokes the compiler driver with `-fsycl-targets=spir64_x86_64`,
+the compiler driver must predefine the following C++ macro name:
+
+* `__SYCL_TARGET_INTEL_X86_64__`
+
+These macros are an internal implementation detail, so they should not be
+documented to users, and user code should not make use of them.
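
The naming pattern in these lists is regular: each GPU macro is the upper-cased target name wrapped in `__SYCL_TARGET_` and `__`, while `spir64_x86_64` maps to `__SYCL_TARGET_INTEL_X86_64__`. The sketch below derives the macro name from a target name based on that observation. `target_macro` and `check_examples` are hypothetical helpers, not DPC++ driver functions, and the sketch assumes aliases such as `intel_gpu_9_0_9` were already resolved to their code names, since only code names appear in the macro list.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper: the macro the driver would predefine for a canonical
// -fsycl-targets name, or "" if the lists above give that target no macro.
std::string target_macro(std::string_view target) {
  if (target == "spir64_x86_64") {
    // Special case: the macro name is not derived from the target name.
    return "__SYCL_TARGET_INTEL_X86_64__";
  }
  if (target.substr(0, 10) != "intel_gpu_") {
    return "";  // e.g. spir64 or spir64_gen
  }
  std::string macro = "__SYCL_TARGET_";
  for (char c : target) {
    macro += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  macro += "__";
  return macro;
}

// Usage: spot-check a few names against the lists above.
void check_examples() {
  assert(target_macro("intel_gpu_adl_s") == "__SYCL_TARGET_INTEL_GPU_ADL_S__");
  assert(target_macro("spir64_x86_64") == "__SYCL_TARGET_INTEL_X86_64__");
  assert(target_macro("spir64_gen").empty());
}
```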
+
+### Changes to the device headers
+
+The device headers implement the [sycl\_ext\_intel\_device\_architecture][2]
+extension using these predefined macros and leverage `if constexpr` to discard
+statements in the "if" or "else" body when the device does not match one of the
+listed architectures.  The following code snippet illustrates the technique:
+
+```
+namespace sycl {
+namespace ext::intel::experimental {
+
+enum class architecture {
+  x86_64,
+  intel_gpu_bdw,
+  intel_gpu_skl,
+  intel_gpu_kbl
+  // ...
+};
+
+} // namespace ext::intel::experimental
+
+namespace detail {
+
+#ifndef __SYCL_TARGET_INTEL_X86_64__
+#define __SYCL_TARGET_INTEL_X86_64__ 0
+#endif
+#ifndef __SYCL_TARGET_INTEL_GPU_BDW__
+#define __SYCL_TARGET_INTEL_GPU_BDW__ 0
+#endif
+#ifndef __SYCL_TARGET_INTEL_GPU_SKL__
+#define __SYCL_TARGET_INTEL_GPU_SKL__ 0
+#endif
+#ifndef __SYCL_TARGET_INTEL_GPU_KBL__
+#define __SYCL_TARGET_INTEL_GPU_KBL__ 0
+#endif
+// ...
+
+// This is true when the translation unit is compiled in AOT mode with target
+// names that support the "if_architecture_is" feature.  If an unsupported
+// target name is specified via "-fsycl-targets", the associated invocation of
+// the device compiler will set this variable to false, and that will trigger
+// an error for code that uses "if_architecture_is".
+static constexpr bool is_allowable_aot_mode =
+  (__SYCL_TARGET_INTEL_X86_64__ == 1) ||
+  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1) ||
+  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1) ||
+  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
+  // ...
+  ;
+
+// One entry for each enumerator in "architecture" telling whether the AOT
+// target matches that architecture.
+static constexpr bool is_aot_for_architecture[] = {
+  (__SYCL_TARGET_INTEL_X86_64__ == 1),
+  (__SYCL_TARGET_INTEL_GPU_BDW__ == 1),
+  (__SYCL_TARGET_INTEL_GPU_SKL__ == 1),
+  (__SYCL_TARGET_INTEL_GPU_KBL__ == 1)
+  // ...
+};
+
+// Read the value of "is_allowable_aot_mode" via a template to defer triggering
+// static_assert() until template instantiation time.
+template<ext::intel::experimental::architecture... Archs>
+constexpr static bool allowable_aot_mode() {
+  return is_allowable_aot_mode;
+}
+
+// Tells if the current device has one of the architectures in the parameter
+// pack.
+template<ext::intel::experimental::architecture... Archs>
+constexpr static bool device_architecture_is() {
+  return (is_aot_for_architecture[static_cast<int>(Archs)] || ...);
+}
+
+// Helper object used to implement "else_if_architecture_is" and "otherwise".
+// The "MakeCall" template parameter tells whether a previous clause in the
+// "if-elseif-elseif ..." chain was true.  When "MakeCall" is false, some
+// previous clause was true, so none of the subsequent
+// "else_if_architecture_is" or "otherwise" member functions should call the
+// user's function.
+template<bool MakeCall>
+class if_architecture_helper {
+ public:
+  template<ext::intel::experimental::architecture ...Archs, typename T,
+           typename ...Args>
+  constexpr auto else_if_architecture_is(T fnTrue, Args ...args) {
+    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
+      fnTrue(args...);
+      return if_architecture_helper<false>{};
+    } else {
+      return if_architecture_helper<MakeCall>{};
+    }
+  }
+
+  template<typename T, typename ...Args>
+  constexpr void otherwise(T fn, Args ...args) {
+    if constexpr (MakeCall) {
+      fn(args...);
+    }
+  }
+};
+
+} // namespace detail
+
+namespace ext::intel::experimental {
+
+template<architecture ...Archs, typename T, typename ...Args>
+constexpr static auto if_architecture_is(T fnTrue, Args ...args) {
+  static_assert(detail::allowable_aot_mode<Archs...>(),
+    "The if_architecture_is function may only be used when AOT "
+    "compiling with '-fsycl-targets=spir64_x86_64' or "
+    "'-fsycl-targets=intel_gpu_*'");
+  if constexpr (detail::device_architecture_is<Archs...>()) {
+    fnTrue(args...);
+    return detail::if_architecture_helper<false>{};
+  } else {
+    return detail::if_architecture_helper<true>{};
+  }
+}
+
+} // namespace ext::intel::experimental
+} // namespace sycl
+```
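
To show how a caller chains these functions, here is a standalone sketch that mirrors the header logic above but compiles without DPC++. As assumptions for illustration, it replaces the predefined macros and the `is_aot_for_architecture` table with a single hypothetical `aot_target` constant (pretending the translation unit is AOT-compiled for `intel_gpu_skl`), omits the `static_assert` on the AOT mode, and places everything in a hypothetical `demo` namespace; `pick_branch` is likewise an illustrative name.

```cpp
namespace demo {  // hypothetical stand-in for sycl::ext::intel::experimental

enum class architecture { x86_64, intel_gpu_bdw, intel_gpu_skl, intel_gpu_kbl };

// Assumption: stands in for the driver-predefined macros; pretend this
// translation unit is AOT-compiled for Skylake.
constexpr architecture aot_target = architecture::intel_gpu_skl;

// True if the AOT target is any architecture in the pack.
template <architecture... Archs>
constexpr bool device_architecture_is() {
  return ((Archs == aot_target) || ...);
}

// MakeCall == false means an earlier clause in the chain already ran.
template <bool MakeCall>
struct if_architecture_helper {
  template <architecture... Archs, typename T, typename... Args>
  constexpr auto else_if_architecture_is(T fn, Args... args) {
    if constexpr (MakeCall && device_architecture_is<Archs...>()) {
      fn(args...);
      return if_architecture_helper<false>{};
    } else {
      return if_architecture_helper<MakeCall>{};
    }
  }

  template <typename T, typename... Args>
  constexpr void otherwise(T fn, Args... args) {
    if constexpr (MakeCall) {
      fn(args...);
    }
  }
};

template <architecture... Archs, typename T, typename... Args>
constexpr auto if_architecture_is(T fn, Args... args) {
  if constexpr (device_architecture_is<Archs...>()) {
    fn(args...);
    return if_architecture_helper<false>{};
  } else {
    return if_architecture_helper<true>{};
  }
}

}  // namespace demo

// Usage: exactly one clause runs; with aot_target == intel_gpu_skl the
// else_if_architecture_is clause is taken.
constexpr int pick_branch() {
  int taken = 0;
  demo::if_architecture_is<demo::architecture::intel_gpu_bdw>(
      [&] { taken = 1; })
      .else_if_architecture_is<demo::architecture::intel_gpu_skl,
                               demo::architecture::intel_gpu_kbl>(
          [&] { taken = 2; })
      .otherwise([&] { taken = 3; });
  return taken;
}

static_assert(pick_branch() == 2);
```

Because each call sits in an `if constexpr` branch, the calls for non-matching clauses are discarded at compile time rather than skipped at run time.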
+
+### Analysis of error checking for unsupported AOT modes
+
+The header file code presented above triggers a `static_assert` if the
+`if_architecture_is` function is used in a translation unit that is compiled
+for an unsupported target.  The only supported targets are `spir64_x86_64` and
+the new `intel_gpu_*` GPU device names.
+
+The error checking relies on the fact that the device compiler is invoked
+separately for each target listed in `-fsycl-targets`.  If any target is
+unsupported, the associated device compilation will compute
+`is_allowable_aot_mode` as `false`, and this will trigger the `static_assert`
+in that compilation phase.
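
The header reads `is_allowable_aot_mode` through the `allowable_aot_mode<Archs...>()` template because, under the C++17 rules, a `static_assert` whose condition does not depend on any template parameter can be diagnosed as soon as the template definition is parsed, even if the template is never used. Making the condition depend on `Archs` postpones the check until a call instantiates the function. The minimal sketch below illustrates this; `guarded_call` is a hypothetical stand-in for `if_architecture_is`, and the constant `false` models a device compilation for an unsupported target.

```cpp
enum class architecture { x86_64, intel_gpu_skl };

// Assumption: models a device compilation for an unsupported target
// (for example plain spir64), where the headers compute false.
constexpr bool is_allowable_aot_mode = false;

// Depends on Archs, so a static_assert that calls it is evaluated only
// when a specialization is instantiated.
template <architecture... Archs>
constexpr bool allowable_aot_mode() {
  return is_allowable_aot_mode;
}

// Hypothetical stand-in for if_architecture_is.  This definition compiles
// even though is_allowable_aot_mode is false; only a call such as
// guarded_call<architecture::intel_gpu_skl>([] {}) would fail to compile.
template <architecture... Archs, typename T>
constexpr void guarded_call(T fn) {
  static_assert(allowable_aot_mode<Archs...>(),
                "only allowed when AOT compiling for a supported target");
  fn();
}

// By contrast, writing static_assert(is_allowable_aot_mode, "...") directly
// inside guarded_call may be rejected even if guarded_call is never called.
```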
+
+
+## Phase 2
+
+TBD.
diff --git a/sycl/doc/design/OptionalDeviceFeatures.md b/sycl/doc/design/OptionalDeviceFeatures.md
@@ -1025,43 +1025,30 @@ architectures need to be identified:
 
 - `-fsycl-targets` option
 - a device configuration file entry
-- `-target` option of the `sycl-aspec-filter` tool
-- a SYCL aspect enum identifier (we expect to add a new SYCL aspect for each
-  device target architecture)
-
-In all such places architecture naming should be the same. In some cases aliases
-are allowed. Below is a list of target architectures supported by DPC++:
-
-| target/alias(es)                     | description                               |
-|--------------------------------------|-------------------------------------------|
-| ptx64                                | Generic 64-bit PTX target architecture    |
-| spir64                               | Generic 64-bit SPIR-V target              |
-| x86_64                               | Generic 64-bit x86 architecture           |
-|                    intel_gpu_pvc     | Ponte Vecchio Intel graphics architecture |
-|                    intel_gpu_acm_g12 | Alchemist G12 Intel graphics architecture |
-|                    intel_gpu_acm_g11 | Alchemist G11 Intel graphics architecture |
-|                    intel_gpu_acm_g10 | Alchemist G10 Intel graphics architecture |
-| intel_gpu_12_10_0, intel_gpu_dg1     | DG1           Intel graphics architecture |
-|                    intel_gpu_adl_n   | Alder Lake N  Intel graphics architecture |
-|                    intel_gpu_adl_p   | Alder Lake P  Intel graphics architecture |
-|                    intel_gpu_rpl_s   | Raptor Lake   Intel graphics architecture |
-|                    intel_gpu_adl_s   | Alder Lake S  Intel graphics architecture |
-|                    intel_gpu_rkl     | Rocket Lake   Intel graphics architecture |
-|  intel_gpu_12_0_0, intel_gpu_tgllp   | Tiger Lake    Intel graphics architecture |
-|  intel_gpu_11_2_0, intel_gpu_ehl     | Elkhart Lake  Intel graphics architecture |
-|  intel_gpu_11_0_0, intel_gpu_icllp   | Ice Lake      Intel graphics architecture |
-|   intel_gpu_9_7_0, intel_gpu_cml     | Comet Lake    Intel graphics architecture |
-|   intel_gpu_9_6_0, intel_gpu_aml     | Amber Lake    Intel graphics architecture |
-|   intel_gpu_9_5_0, intel_gpu_whl     | Whiskey Lake  Intel graphics architecture |
-|   intel_gpu_9_4_0, intel_gpu_glk     | Gemini Lake   Intel graphics architecture |
-|   intel_gpu_9_3_0, intel_gpu_apl     | Apollo Lake   Intel graphics architecture |
-|   intel_gpu_9_2_9, intel_gpu_cfl     | Coffee Lake   Intel graphics architecture |
-|   intel_gpu_9_1_9, intel_gpu_kbl     | Kaby Lake     Intel graphics architecture |
-|   intel_gpu_9_0_9, intel_gpu_skl     | Skylake       Intel graphics architecture |
-|   intel_gpu_8_0_0, intel_gpu_bdw     | Broadwell     Intel graphics architecture |
-
-TODO: Provide full list of AOT targets supported by the identification
-mechanism.
+- `-target` option of the `sycl-aspect-filter` tool
+- a new SYCL enumeration named `architecture`
+
+The following table lists these target names:
+
+| name                 | has `architecture` enum | description                            |
+-----------------------|-------------------------|----------------------------------------|
+| ptx64                | no                      | Generic 64-bit PTX target architecture |
+| spir64               | no                      | Generic 64-bit SPIR-V target           |
+| x86\_64              | yes                     | Generic 64-bit x86 architecture        |
+| intel\_gpu\_\<name\> | yes                     | Intel graphics architecture \<name\>   |
+
+The "name" column in this table lists the possible target names.  Since not all
+targets have a corresponding enumerator in the `architecture` enumeration, the
+second column tells whether there is such an enumerator.  The last row in this
+table corresponds to all of the architecture names listed in the
+[sycl\_ext\_intel\_device\_architecture][8] extension whose name starts with
+`intel_gpu_`.
+
+[8]: <../extensions/proposed/sycl_ext_intel_device_architecture.asciidoc>
+
+TODO: This table needs to be filled out for the CPU variants supported by the
+`opencl-aot` tool (avx512, avx2, avx, sse4.2) and for the FPGA targets.  We
+also need to figure out how CUDA fits in here.
 
 Example of clang compilation invocation with 2 AOT targets and generic SPIR-V:
 ```
@@ -1115,15 +1102,15 @@ sub-group size.
 
 ## Appendix: Adding an attribute to 8-byte `atomic_ref`
 
-As described above under ["Changes to DPC++ headers"][8], we need to decorate
+As described above under ["Changes to DPC++ headers"][9], we need to decorate
 any SYCL type representing an optional device feature with the
 `[[sycl_detail::uses_aspects()]]` attribute.  This is somewhat tricky for
 `atomic_ref`, though, because it is only an optional feature when specialized
 for an 8-byte type.  However, we can accomplish this by using partial
 specialization techniques.  The following code snippet demonstrates (best read
 from bottom to top):
 
-[8]: <#changes-to-dpc-headers>
+[9]: <#changes-to-dpc-headers>
 
 ```
 namespace sycl {