Skip to content

Latest commit

 

History

History
636 lines (491 loc) · 14.4 KB

sycl_ext_oneapi_device_architecture.asciidoc

File metadata and controls

636 lines (491 loc) · 14.4 KB

sycl_ext_oneapi_device_architecture

Notice

Copyright © 2022-2023 Intel Corporation. All rights reserved.

Khronos® is a registered trademark and SYCL™ and SPIR™ are trademarks of The Khronos Group Inc. OpenCL™ is a trademark of Apple Inc. used by permission by Khronos.

Contact

To report problems with this extension, please open a new issue at:

Dependencies

This extension is written against the SYCL 2020 revision 5 specification. All references below to the "core SYCL specification" or to section numbers in the SYCL specification refer to that revision.

Status

This is an experimental extension specification, intended to provide early access to features and gather community feedback. Interfaces defined in this specification are implemented in DPC++, but they are not finalized and may change incompatibly in future versions of DPC++ without prior notice. Shipping software products should not rely on APIs defined in this specification.

There are important limitations with the DPC++ implementation of this experimental extension. In particular, some parts of this extension may only be used when the application is compiled in AOT mode. See the section below titled "Limitations with the experimental version" for a full description of the limitations.

Overview

This extension provides a way for device code to query the device architecture on which it is running. This is similar to the sycl_ext_oneapi_device_if extension except the comparison is for the device’s architecture not the device’s aspects. In some cases, low-level application code can use special features or do specific optimizations depending on the device architecture, and this extension enables such applications.

Specification

Feature test macro

This extension provides a feature-test macro as described in the core SYCL specification. An implementation supporting this extension must predefine the macro SYCL_EXT_ONEAPI_DEVICE_ARCHITECTURE to one of the values defined in the table below. Applications can test for the existence of this macro to determine if the implementation supports this feature, or applications can test the macro’s value to determine which of the extension’s features the implementation supports.

Value Description

1

The APIs of this experimental extension are not versioned, so the feature-test macro always has this value.

New enumeration of architectures

This extension adds a new enumeration of the architectures that can be tested.

namespace sycl::ext::oneapi::experimental {

enum class architecture : /* unspecified */ {
  x86_64,
  intel_gpu_bdw,
  intel_gpu_skl,
  intel_gpu_kbl,
  intel_gpu_cfl,
  intel_gpu_apl,
  intel_gpu_bxt = intel_gpu_apl,
  intel_gpu_glk,
  intel_gpu_whl,
  intel_gpu_aml,
  intel_gpu_cml,
  intel_gpu_icllp,
  intel_gpu_ehl,
  intel_gpu_jsl = intel_gpu_ehl,
  intel_gpu_tgllp,
  intel_gpu_rkl,
  intel_gpu_adl_s,
  intel_gpu_rpl_s = intel_gpu_adl_s,
  intel_gpu_adl_p,
  intel_gpu_adl_n,
  intel_gpu_dg1,
  intel_gpu_acm_g10,
  intel_gpu_dg2_g10 = intel_gpu_acm_g10,
  intel_gpu_acm_g11,
  intel_gpu_dg2_g11 = intel_gpu_acm_g11,
  intel_gpu_acm_g12,
  intel_gpu_dg2_g12 = intel_gpu_acm_g12,
  intel_gpu_pvc,

  nvidia_gpu_sm_50,
  nvidia_gpu_sm_52,
  nvidia_gpu_sm_53,
  nvidia_gpu_sm_60,
  nvidia_gpu_sm_61,
  nvidia_gpu_sm_62,
  nvidia_gpu_sm_70,
  nvidia_gpu_sm_72,
  nvidia_gpu_sm_75,
  nvidia_gpu_sm_80,
  nvidia_gpu_sm_86,
  nvidia_gpu_sm_87,
  nvidia_gpu_sm_89,
  nvidia_gpu_sm_90,

  amd_gpu_gfx700,
  amd_gpu_gfx701,
  amd_gpu_gfx702,
  amd_gpu_gfx801,
  amd_gpu_gfx802,
  amd_gpu_gfx803,
  amd_gpu_gfx805,
  amd_gpu_gfx810,
  amd_gpu_gfx900,
  amd_gpu_gfx902,
  amd_gpu_gfx904,
  amd_gpu_gfx906,
  amd_gpu_gfx908,
  amd_gpu_gfx90a,
  amd_gpu_gfx1010,
  amd_gpu_gfx1011,
  amd_gpu_gfx1012,
  amd_gpu_gfx1013,
  amd_gpu_gfx1030,
  amd_gpu_gfx1031,
  amd_gpu_gfx1032,
  amd_gpu_gfx1034,

  intel_gpu_8_0_0 = intel_gpu_bdw,
  intel_gpu_9_0_9 = intel_gpu_skl,
  intel_gpu_9_1_9 = intel_gpu_kbl
  intel_gpu_9_2_9 = intel_gpu_cfl,
  intel_gpu_9_3_0 = intel_gpu_apl,
  intel_gpu_9_4_0 = intel_gpu_glk,
  intel_gpu_9_5_0 = intel_gpu_whl,
  intel_gpu_9_6_0 = intel_gpu_aml,
  intel_gpu_9_7_0 = intel_gpu_cml,
  intel_gpu_11_0_0 = intel_gpu_icllp,
  intel_gpu_12_0_0 = intel_gpu_tgllp,
  intel_gpu_12_10_0 = intel_gpu_dg1
};

} // namespace sycl::ext::oneapi::experimental

The following table tells which version of this extension first included each of these enumerators, and it provides a brief description of their meanings.

Enumerator name Added in version Description

x86_64

-

Any CPU device with the x86_64 instruction set.

intel_gpu_bdw

-

Broadwell Intel graphics architecture.

intel_gpu_skl

-

Broadwell Intel graphics architecture.

intel_gpu_kbl

-

Kaby Lake Intel graphics architecture.

intel_gpu_cfl

-

Coffee Lake Intel graphics architecture.

intel_gpu_apl

-

Apollo Lake Intel graphics architecture.

intel_gpu_glk

-

Gemini Lake Intel graphics architecture.

intel_gpu_whl

-

Whiskey Lake Intel graphics architecture.

intel_gpu_aml

-

Amber Lake Intel graphics architecture.

intel_gpu_cml

-

Comet Lake Intel graphics architecture.

intel_gpu_icllp

-

Ice Lake Intel graphics architecture.

intel_gpu_tgllp

-

Tiger Lake Intel graphics architecture.

intel_gpu_rkl

-

Rocket Lake Intel graphics architecture.

intel_gpu_adl_s
intel_gpu_rpl_s

-

Alder Lake S Intel graphics architecture or Raptor Lake Intel graphics architecture.

intel_gpu_adl_p

-

Alder Lake P Intel graphics architecture.

intel_gpu_adl_n

-

Alder Lake N Intel graphics architecture.

intel_gpu_dg1

-

DG1 Intel graphics architecture.

intel_gpu_acm_g10

-

Alchemist G10 Intel graphics architecture.

intel_gpu_acm_g11

-

Alchemist G11 Intel graphics architecture.

intel_gpu_acm_g12

-

Alchemist G12 Intel graphics architecture.

intel_gpu_pvc

-

Ponte Vecchio Intel graphics architecture.

intel_gpu_8_0_0

-

Alias for intel_gpu_bdw.

intel_gpu_9_0_9

-

Alias for intel_gpu_skl.

intel_gpu_9_1_9

-

Alias for intel_gpu_kbl.

intel_gpu_9_2_9

-

Alias for intel_gpu_cfl.

intel_gpu_9_3_0

-

Alias for intel_gpu_apl.

intel_gpu_9_4_0

-

Alias for intel_gpu_glk.

intel_gpu_9_5_0

-

Alias for intel_gpu_whl.

intel_gpu_9_6_0

-

Alias for intel_gpu_aml.

intel_gpu_9_7_0

-

Alias for intel_gpu_cml.

intel_gpu_11_0_0

-

Alias for intel_gpu_icllp.

intel_gpu_12_0_0

-

Alias for intel_gpu_tgllp.

intel_gpu_12_10_0

-

Alias for intel_gpu_dg1.

nvidia_gpu_sm_50

-

NVIDIA Maxwell architecture (compute capability 5.0).

nvidia_gpu_sm_52

-

NVIDIA Maxwell architecture (compute capability 5.2).

nvidia_gpu_sm_53

-

NVIDIA Maxwell architecture (compute capability 5.3).

nvidia_gpu_sm_60

-

NVIDIA Pascal architecture (compute capability 6.0).

nvidia_gpu_sm_61

-

NVIDIA Pascal architecture (compute capability 6.1).

nvidia_gpu_sm_62

-

NVIDIA Pascal architecture (compute capability 6.2).

nvidia_gpu_sm_70

-

NVIDIA Volta architecture (compute capability 7.0).

nvidia_gpu_sm_72

-

NVIDIA Volta architecture (compute capability 7.2).

nvidia_gpu_sm_75

-

NVIDIA Turing architecture (compute capability 7.5).

nvidia_gpu_sm_80

-

NVIDIA Ampere architecture (compute capability 8.0).

nvidia_gpu_sm_86

-

NVIDIA Ampere architecture (compute capability 8.6).

nvidia_gpu_sm_87

-

Jetson/Drive AGX Orin architecture.

nvidia_gpu_sm_89

-

NVIDIA Ada Lovelace architecture.

nvidia_gpu_sm_90

-

NVIDIA Hopper architecture.

amd_gpu_gfx700

-

AMD GCN GFX7 (Sea Islands (CI)) architecture.

amd_gpu_gfx701

-

AMD GCN GFX7 (Sea Islands (CI)) architecture.

amd_gpu_gfx702

-

AMD GCN GFX7 (Sea Islands (CI)) architecture.

amd_gpu_gfx801

-

AMD GCN GFX8 (Volcanic Islands (VI)) architecture.

amd_gpu_gfx802

-

AMD GCN GFX8 (Volcanic Islands (VI)) architecture.

amd_gpu_gfx803

-

AMD GCN GFX8 (Volcanic Islands (VI)) architecture.

amd_gpu_gfx805

-

AMD GCN GFX8 (Volcanic Islands (VI)) architecture.

amd_gpu_gfx810

-

AMD GCN GFX8 (Volcanic Islands (VI)) architecture.

amd_gpu_gfx900

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx902

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx904

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx906

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx908

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx90a

-

AMD GCN GFX9 (Vega) architecture.

amd_gpu_gfx1010

-

AMD GCN GFX10.1 (RDNA 1) architecture.

amd_gpu_gfx1011

-

AMD GCN GFX10.1 (RDNA 1) architecture.

amd_gpu_gfx1012

-

AMD GCN GFX10.1 (RDNA 1) architecture.

amd_gpu_gfx1013

-

AMD GCN GFX10.1 (RDNA 1) architecture.

amd_gpu_gfx1030

-

AMD GCN GFX10.3 (RDNA 2) architecture.

amd_gpu_gfx1031

-

GCN GFX10.3 (RDNA 2) architecture.

amd_gpu_gfx1032

-

GCN GFX10.3 (RDNA 2) architecture.

amd_gpu_gfx1034

-

GCN GFX10.3 (RDNA 2) architecture.

Note
  • An "alias" enumerator is generally added for new devices only after hardware has finalized and the exact version is known.

  • For NVIDIA GPUs, the architecture enumerator corresponds to the compute capability of the device, and ext_oneapi_architecture_is can be used similarly to the __CUDA_ARCH__ macro in CUDA.

New if_architecture_is free function

This extension adds one new free function which may be called from device code. This function is not available in host code.

namespace sycl::ext::oneapi::experimental {

template<architecture ...Archs, typename T>
/* unspecified */ if_architecture_is(T fn);

} // namespace sycl::ext::oneapi::experimental

This function operates exactly like if_device_has from the sycl_ext_oneapi_device_if extension except that the condition gating execution of the callable function fn is determined by the Archs parameter pack. This condition is true if the device which executes if_architecture_is matches any of the architectures listed in this pack.

The value returned by if_architecture_is is an object F of an unspecified type, which provides the following member functions:

class /* unspecified */ {
 public:
  template<architecture ...Archs, typename T>
  /* unspecified */ else_if_architecture_is(T fn);

  template<typename T>
  void otherwise(T fn);
};

The otherwise function behaves exactly like the otherwise function from the sycl_ext_oneapi_device_if extension. The else_if_architecture_is function behaves exactly like else_if_device_has from that extension except that the condition gating execution of the callable object fn is determined by the Archs parameter pack. This condition is true only if the object F comes from a previous call to if_architecture_is or else_if_architecture_is whose condition is false and if the device calling else_if_architecture_is has one of the architectures in the Archs parameter pack.

New member function of device class

This extension adds the following new member function to the device class, which returns a Boolean telling whether the device has the specified architecture.

namespace sycl {

class device {
  bool ext_oneapi_architecture_is(
    ext::oneapi::experimental::architecture arch);
};

// namespace sycl

New device descriptor

Device descriptor Return type Description

ext::oneapi::experimental::info::device::architecture

ext::oneapi::experimental::architecture

Returns the architecture of the device

This device descriptor allows host code such as:

namespace syclex = sycl::ext::oneapi::experimental;

syclex::architecture arch = dev.get_info<syclex::info::device::architecture>();
switch (arch) {
case syclex::architecture::x86_64:
  /* ... */
  break;
case syclex::architecture::intel_gpu_bdw:
  /* ... */
  break;
/* etc. */
}

Limitations with the experimental version

The DPC++ implementation of this extension currently has some important limitations with the if_architecture_is free function. In order to use this feature, the application must be compiled in ahead-of-time (AOT) mode using -fsycl-targets=<special-target> where <special-target> is one of the "special target values" listed in the users manual description of the -fsycl-targets option. These are the target names of the form "intel_gpu_*", "nvidia_gpu_*", or "amd_gpu_*".

The two APIs device::ext_oneapi_architecture_is and the ext::oneapi::experimental::info::device::architecture device descriptor are currently supported only for Intel devices (both GPU and CPU). There is no support yet for Nvidia or AMD devices.

Future direction

This experimental extension is still evolving. We expect that future versions will include the following:

  • A compile-time constant property that can be used to decorate kernels and non-kernel device functions:

    namespace sycl::ext::oneapi::experimental {
    
    struct device_architecture_is_key {
      template <architecture... Archs>
      using value_t = property_value<device_architecture_is_key,
        std::integral_constant<architecture, Archs>...>;
    };
    
    template <architecture... Archs>
    struct property_value<device_architecture_is_key,
      std::integral_constant<architecture, Archs>...>
    {
      static constexpr std::array<architecture, sizeof...(Archs)> value;
    };
    
    template <architecture... Archs>
    inline constexpr device_architecture_is_key::value_t<Archs...>
      device_architecture_is;
    
    } // namespace sycl::ext::oneapi::experimental

    This property indicates that a kernel or non-kernel device function uses features that are available on devices with the given architecture list but may not be available on devices with other architectures.

  • Additional enumerators in the architecture enumeration. This could include entries for different x86_64 architectures.