| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| ======================= | ||
| Clang SYCL Linker | ||
| ======================= | ||
|
|
||
| .. contents:: | ||
| :local: | ||
|
|
||
| .. _clang-sycl-linker: | ||
|
|
||
| Introduction | ||
| ============ | ||
|
|
||
| This tool works as a wrapper around the SYCL device code linking process. | ||
| The purpose of this tool is to provide an interface to link SYCL device bitcode | ||
| in LLVM IR format, SYCL device bitcode in SPIR-V IR format, and native binary | ||
| objects, and then use the SPIR-V LLVM Translator tool on fully linked device | ||
| objects to produce the final output. | ||
| After the linking stage, the fully linked device code in LLVM IR format may | ||
| undergo several SYCL-specific finalization steps before the SPIR-V code | ||
| generation step. | ||
| The tool will also support the ahead-of-time (AOT) compilation flow. AOT | ||
| compilation invokes the back-end at compile time to produce the final binary, | ||
| as opposed to just-in-time (JIT) compilation, in which final code generation | ||
| is deferred until application runtime. | ||
|
|
||
| Device code linking for SYCL offloading has several known quirks that | ||
| make it difficult to use in a unified offloading setting. Two of the primary | ||
| issues are: | ||
| 1. Several finalization steps must be run on the fully linked LLVM IR bitcode | ||
| to guarantee conformance to SYCL standards. These steps are unique to the | ||
| SYCL offloading compilation flow. | ||
| 2. The SPIR-V LLVM Translator tool is an external tool and hence SPIR-V IR code | ||
| generation cannot be done as part of LTO. This limitation can be lifted once | ||
| the SPIR-V backend is available as a viable LLVM backend. | ||
|
|
||
| This tool has been proposed to work around these issues. | ||
|
|
||
| Usage | ||
| ===== | ||
|
|
||
| This tool can be used with the following options. Several of these options are | ||
| passed down to downstream tools such as ``llvm-link`` and ``llvm-spirv``. | ||
|
|
||
| .. code-block:: console | ||
| OVERVIEW: A utility that wraps around the SYCL device code linking process. | ||
| This enables linking and code generation for SPIR-V JIT targets and AOT | ||
| targets. | ||
| USAGE: clang-sycl-linker [options] | ||
| OPTIONS: | ||
| --arch <value> Specify the name of the target architecture. | ||
| --dry-run Print generated commands without running. | ||
| -g Specify that this was a debug compile. | ||
| -help-hidden Display all available options | ||
| -help Display available options (--help-hidden for more) | ||
| --library-path=<dir> Set the library path for SYCL device libraries | ||
| --device-libs=<value> A comma separated list of device libraries that are linked during the device link | ||
| -o <path> Path to file to write output | ||
| --save-temps Save intermediate results | ||
| --triple <value> Specify the target triple. | ||
| --version Display the version number and exit | ||
| -v Print verbose information | ||
| -spirv-dump-device-code=<dir> Directory to dump SPIR-V IR code into | ||
| -is-windows-msvc-env Specify if we are compiling under windows environment | ||
| -llvm-spirv-options=<value> Pass options to llvm-spirv tool | ||
| --llvm-spirv-path=<dir> Set the system llvm-spirv path | ||
| Example | ||
| ======= | ||
|
|
||
| This tool is intended to be invoked when targeting any of the offloading | ||
| toolchains. When the ``--sycl-link`` option is passed to the clang driver, the | ||
| driver invokes the linking job of the target offloading toolchain, which in | ||
| turn invokes this tool. The tool can create one or more fully linked device | ||
| images that are ready to be wrapped and linked with host code to generate the | ||
| final executable. | ||
|
|
||
| .. code-block:: console | ||
| clang-sycl-linker --triple spirv64 --arch native input.bc |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| /*===------------- amxfp8intrin.h - AMX intrinsics -*- C++ -*----------------=== | ||
| * | ||
| * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| * See https://llvm.org/LICENSE.txt for license information. | ||
| * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| * | ||
| *===------------------------------------------------------------------------=== | ||
| */ | ||
|
|
||
| #ifndef __IMMINTRIN_H | ||
| #error "Never use <amxfp8intrin.h> directly; include <immintrin.h> instead." | ||
| #endif /* __IMMINTRIN_H */ | ||
|
|
||
| #ifndef __AMXFP8INTRIN_H | ||
| #define __AMXFP8INTRIN_H | ||
| #ifdef __x86_64__ | ||
|
|
||
| /// Perform the dot product of a BF8 value \a a by a BF8 value \a b accumulating | ||
| /// into a Single Precision (FP32) source/dest \a dst. | ||
| /// | ||
| /// \headerfile <immintrin.h> | ||
| /// | ||
| /// \code | ||
| /// void _tile_dpbf8ps (__tile dst, __tile a, __tile b) | ||
| /// \endcode | ||
| /// | ||
| /// This intrinsic corresponds to the \c TDPBF8PS instruction. | ||
| /// | ||
| /// \param dst | ||
| /// The destination tile. Max size is 1024 Bytes. | ||
| /// \param a | ||
| /// The 1st source tile. Max size is 1024 Bytes. | ||
| /// \param b | ||
| /// The 2nd source tile. Max size is 1024 Bytes. | ||
| #define _tile_dpbf8ps(dst, a, b) __builtin_ia32_tdpbf8ps((dst), (a), (b)) | ||
|
|
||
| /// Perform the dot product of a BF8 value \a a by an HF8 value \a b | ||
| /// accumulating into a Single Precision (FP32) source/dest \a dst. | ||
| /// | ||
| /// \headerfile <immintrin.h> | ||
| /// | ||
| /// \code | ||
| /// void _tile_dpbhf8ps (__tile dst, __tile a, __tile b) | ||
| /// \endcode | ||
| /// | ||
| /// This intrinsic corresponds to the \c TDPBHF8PS instruction. | ||
| /// | ||
| /// \param dst | ||
| /// The destination tile. Max size is 1024 Bytes. | ||
| /// \param a | ||
| /// The 1st source tile. Max size is 1024 Bytes. | ||
| /// \param b | ||
| /// The 2nd source tile. Max size is 1024 Bytes. | ||
| #define _tile_dpbhf8ps(dst, a, b) __builtin_ia32_tdpbhf8ps((dst), (a), (b)) | ||
|
|
||
| /// Perform the dot product of an HF8 value \a a by a BF8 value \a b | ||
| /// accumulating into a Single Precision (FP32) source/dest \a dst. | ||
| /// | ||
| /// \headerfile <immintrin.h> | ||
| /// | ||
| /// \code | ||
| /// void _tile_dphbf8ps (__tile dst, __tile a, __tile b) | ||
| /// \endcode | ||
| /// | ||
| /// This intrinsic corresponds to the \c TDPHBF8PS instruction. | ||
| /// | ||
| /// \param dst | ||
| /// The destination tile. Max size is 1024 Bytes. | ||
| /// \param a | ||
| /// The 1st source tile. Max size is 1024 Bytes. | ||
| /// \param b | ||
| /// The 2nd source tile. Max size is 1024 Bytes. | ||
| #define _tile_dphbf8ps(dst, a, b) __builtin_ia32_tdphbf8ps((dst), (a), (b)) | ||
|
|
||
| /// Perform the dot product of an HF8 value \a a by an HF8 value \a b | ||
| /// accumulating into a Single Precision (FP32) source/dest \a dst. | ||
| /// | ||
| /// \headerfile <immintrin.h> | ||
| /// | ||
| /// \code | ||
| /// void _tile_dphf8ps (__tile dst, __tile a, __tile b) | ||
| /// \endcode | ||
| /// | ||
| /// This intrinsic corresponds to the \c TDPHF8PS instruction. | ||
| /// | ||
| /// \param dst | ||
| /// The destination tile. Max size is 1024 Bytes. | ||
| /// \param a | ||
| /// The 1st source tile. Max size is 1024 Bytes. | ||
| /// \param b | ||
| /// The 2nd source tile. Max size is 1024 Bytes. | ||
| #define _tile_dphf8ps(dst, a, b) __builtin_ia32_tdphf8ps((dst), (a), (b)) | ||
|
|
||
| #endif /* __x86_64__ */ | ||
| #endif /* __AMXFP8INTRIN_H */ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| // RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-fp8 \ | ||
| // RUN: -emit-llvm -o - -Werror -pedantic | FileCheck %s | ||
| #include <immintrin.h> | ||
|
|
||
| void test_amx(void *data) { | ||
| //CHECK-LABEL: @test_amx | ||
| //CHECK: call void @llvm.x86.tdpbf8ps(i8 1, i8 2, i8 3) | ||
| _tile_dpbf8ps(1, 2, 3); | ||
| } | ||
|
|
||
| void test_amx2(void *data) { | ||
| //CHECK-LABEL: @test_amx2 | ||
| //CHECK: call void @llvm.x86.tdpbhf8ps(i8 1, i8 2, i8 3) | ||
| _tile_dpbhf8ps(1, 2, 3); | ||
| } | ||
|
|
||
| void test_amx3(void *data) { | ||
| //CHECK-LABEL: @test_amx3 | ||
| //CHECK: call void @llvm.x86.tdphbf8ps(i8 1, i8 2, i8 3) | ||
| _tile_dphbf8ps(1, 2, 3); | ||
| } | ||
|
|
||
| void test_amx4(void *data) { | ||
| //CHECK-LABEL: @test_amx4 | ||
| //CHECK: call void @llvm.x86.tdphf8ps(i8 1, i8 2, i8 3) | ||
| _tile_dphf8ps(1, 2, 3); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| // RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-tile -target-feature +amx-fp8 -verify | ||
|
|
||
| #include <immintrin.h> | ||
|
|
||
| void test_amx(void *data) { | ||
| _tile_dpbf8ps(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} | ||
| _tile_dpbhf8ps(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} | ||
| _tile_dphbf8ps(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} | ||
| _tile_dphf8ps(4, 3, 3); // expected-error {{tile arguments must refer to different tiles}} | ||
| } |
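
The error test above expects a diagnostic whenever two tile operands name the same tile. The rule can be sketched in a few lines of Python. This is only an illustrative model, not clang's actual semantic check: the function name `check_tile_args` is hypothetical, and the 0 to 7 range is an assumption based on AMX exposing eight tile registers (`tmm0` through `tmm7`).

```python
def check_tile_args(dst: int, a: int, b: int) -> None:
    """Model of the operand rule exercised by the error test (hypothetical helper)."""
    for tile in (dst, a, b):
        # Assumption: valid tile indices are 0..7 (tmm0..tmm7).
        if not 0 <= tile <= 7:
            raise ValueError(f"tile index {tile} is out of range")
    # All three operands must name different tiles.
    if len({dst, a, b}) != 3:
        raise ValueError("tile arguments must refer to different tiles")
```

Under this model `check_tile_args(1, 2, 3)` passes, while `check_tile_args(4, 3, 3)` is rejected because both source operands name tile 3, mirroring the calls in the tests.
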
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| // RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +amx-fp8 -emit-llvm -o - -Wall -Werror -pedantic | FileCheck %s | ||
|
|
||
| void f_tilemul(short a) | ||
| { | ||
| //CHECK: call void asm sideeffect "tileloadd 0(%rsi,%r13,4), %tmm0 \0A\09tileloadd 0(%rdx,%r14,4), %tmm6 \0A\09tdpbf8ps %tmm6, %tmm0, %tmm7 \0A\09tilestored %tmm7, 0(%r12,%r15,4) \0A\09", "~{memory},~{tmm0},~{tmm6},~{tmm7},~{dirflag},~{fpsr},~{flags}"() | ||
| __asm__ volatile ("tileloadd 0(%%rsi,%%r13,4), %%tmm0 \n\t" | ||
| "tileloadd 0(%%rdx,%%r14,4), %%tmm6 \n\t" | ||
| "tdpbf8ps %%tmm6, %%tmm0, %%tmm7 \n\t" | ||
| "tilestored %%tmm7, 0(%%r12,%%r15,4) \n\t" | ||
| ::: "memory", "tmm0", "tmm6", "tmm7"); | ||
|
|
||
| //CHECK: call void asm sideeffect "tileloadd 0(%rsi,%r13,4), %tmm0 \0A\09tileloadd 0(%rdx,%r14,4), %tmm6 \0A\09tdpbhf8ps %tmm6, %tmm0, %tmm7 \0A\09tilestored %tmm7, 0(%r12,%r15,4) \0A\09", "~{memory},~{tmm0},~{tmm6},~{tmm7},~{dirflag},~{fpsr},~{flags}"() | ||
| __asm__ volatile ("tileloadd 0(%%rsi,%%r13,4), %%tmm0 \n\t" | ||
| "tileloadd 0(%%rdx,%%r14,4), %%tmm6 \n\t" | ||
| "tdpbhf8ps %%tmm6, %%tmm0, %%tmm7 \n\t" | ||
| "tilestored %%tmm7, 0(%%r12,%%r15,4) \n\t" | ||
| ::: "memory", "tmm0", "tmm6", "tmm7"); | ||
|
|
||
| //CHECK: call void asm sideeffect "tileloadd 0(%rsi,%r13,4), %tmm0 \0A\09tileloadd 0(%rdx,%r14,4), %tmm6 \0A\09tdphbf8ps %tmm6, %tmm0, %tmm7 \0A\09tilestored %tmm7, 0(%r12,%r15,4) \0A\09", "~{memory},~{tmm0},~{tmm6},~{tmm7},~{dirflag},~{fpsr},~{flags}"() | ||
| __asm__ volatile ("tileloadd 0(%%rsi,%%r13,4), %%tmm0 \n\t" | ||
| "tileloadd 0(%%rdx,%%r14,4), %%tmm6 \n\t" | ||
| "tdphbf8ps %%tmm6, %%tmm0, %%tmm7 \n\t" | ||
| "tilestored %%tmm7, 0(%%r12,%%r15,4) \n\t" | ||
| ::: "memory", "tmm0", "tmm6", "tmm7"); | ||
|
|
||
| //CHECK: call void asm sideeffect "tileloadd 0(%rsi,%r13,4), %tmm0 \0A\09tileloadd 0(%rdx,%r14,4), %tmm6 \0A\09tdphf8ps %tmm6, %tmm0, %tmm7 \0A\09tilestored %tmm7, 0(%r12,%r15,4) \0A\09", "~{memory},~{tmm0},~{tmm6},~{tmm7},~{dirflag},~{fpsr},~{flags}"() | ||
| __asm__ volatile ("tileloadd 0(%%rsi,%%r13,4), %%tmm0 \n\t" | ||
| "tileloadd 0(%%rdx,%%r14,4), %%tmm6 \n\t" | ||
| "tdphf8ps %%tmm6, %%tmm0, %%tmm7 \n\t" | ||
| "tilestored %%tmm7, 0(%%r12,%%r15,4) \n\t" | ||
| ::: "memory", "tmm0", "tmm6", "tmm7"); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| // Tests the clang-sycl-linker tool. | ||
| // | ||
| // Test a simple case without arguments. | ||
| // RUN: %clangxx -emit-llvm -c %s -o %t_1.bc | ||
| // RUN: %clangxx -emit-llvm -c %s -o %t_2.bc | ||
| // RUN: clang-sycl-linker --dry-run -triple spirv64 %t_1.bc %t_2.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=SIMPLE | ||
| // SIMPLE: "{{.*}}llvm-link{{.*}}" {{.*}}.bc {{.*}}.bc -o [[FIRSTLLVMLINKOUT:.*]].bc --suppress-warnings | ||
| // SIMPLE-NEXT: "{{.*}}llvm-spirv{{.*}}" {{.*}}-o a.spv [[FIRSTLLVMLINKOUT]].bc | ||
| // | ||
| // Test that llvm-link is not called when only one input is present. | ||
| // RUN: clang-sycl-linker --dry-run -triple spirv64 %t_1.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=SIMPLE-NO-LINK | ||
| // SIMPLE-NO-LINK: "{{.*}}llvm-spirv{{.*}}" {{.*}}-o a.spv {{.*}}.bc | ||
| // | ||
| // Test a simple case with device library files specified. | ||
| // RUN: touch %T/lib1.bc | ||
| // RUN: touch %T/lib2.bc | ||
| // RUN: clang-sycl-linker --dry-run -triple spirv64 %t_1.bc %t_2.bc --library-path=%T --device-libs=lib1.bc,lib2.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=DEVLIBS | ||
| // DEVLIBS: "{{.*}}llvm-link{{.*}}" {{.*}}.bc {{.*}}.bc -o [[FIRSTLLVMLINKOUT:.*]].bc --suppress-warnings | ||
| // DEVLIBS-NEXT: "{{.*}}llvm-link{{.*}}" -only-needed [[FIRSTLLVMLINKOUT]].bc {{.*}}lib1.bc {{.*}}lib2.bc -o [[SECONDLLVMLINKOUT:.*]].bc --suppress-warnings | ||
| // DEVLIBS-NEXT: "{{.*}}llvm-spirv{{.*}}" {{.*}}-o a.spv [[SECONDLLVMLINKOUT]].bc | ||
| // | ||
| // Test a simple case with .o (fat object) as input. | ||
| // TODO: Remove this test once fat object support is added. | ||
| // RUN: %clangxx -c %s -o %t.o | ||
| // RUN: not clang-sycl-linker --dry-run -triple spirv64 %t.o -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=FILETYPEERROR | ||
| // FILETYPEERROR: Unsupported file type | ||
| // | ||
| // Test to see if device library related errors are emitted. | ||
| // RUN: not clang-sycl-linker --dry-run -triple spirv64 %t_1.bc %t_2.bc --library-path=%T --device-libs= -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=DEVLIBSERR1 | ||
| // DEVLIBSERR1: Number of device library files cannot be zero | ||
| // RUN: not clang-sycl-linker --dry-run -triple spirv64 %t_1.bc %t_2.bc --library-path=%T --device-libs=lib1.bc,lib2.bc,lib3.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=DEVLIBSERR2 | ||
| // DEVLIBSERR2: '{{.*}}lib3.bc' SYCL device library file is not found | ||
| // | ||
| // Test that the correct set of llvm-spirv options is emitted for the Windows environment. | ||
| // RUN: clang-sycl-linker --dry-run -triple spirv64 --is-windows-msvc-env %t_1.bc %t_2.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=LLVMOPTSWIN | ||
| // LLVMOPTSWIN: -spirv-debug-info-version=ocl-100 -spirv-allow-extra-diexpressions -spirv-allow-unknown-intrinsics=llvm.genx. -spirv-ext= | ||
| // | ||
| // Test that the correct set of llvm-spirv options is emitted for the Linux environment. | ||
| // RUN: clang-sycl-linker --dry-run -triple spirv64 %t_1.bc %t_2.bc -o a.spv 2>&1 \ | ||
| // RUN: | FileCheck %s --check-prefix=LLVMOPTSLIN | ||
| // LLVMOPTSLIN: -spirv-debug-info-version=nonsemantic-shader-200 -spirv-allow-unknown-intrinsics=llvm.genx. -spirv-ext= |
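
The SIMPLE, SIMPLE-NO-LINK, and DEVLIBS checks above imply a fixed command order for a dry run: an optional `llvm-link` over the inputs, an optional `llvm-link -only-needed` over the device libraries, then `llvm-spirv`. The Python sketch below models that ordering only. It is not the tool's implementation: `plan_link_commands` and the temporary file names are hypothetical, and the extra translator options that the real tool places before `-o` are left out.

```python
from typing import List


def plan_link_commands(
    inputs: List[str], device_libs: List[str], output: str
) -> List[List[str]]:
    """Return the commands a dry run would be expected to print, in order."""
    if not inputs:
        raise ValueError("at least one input file is required")

    commands: List[List[str]] = []
    current = inputs[0]

    # Step 1: link the input bitcode files; skipped for a single input.
    if len(inputs) > 1:
        linked = "linked_1.bc"  # hypothetical temporary file name
        commands.append(
            ["llvm-link", *inputs, "-o", linked, "--suppress-warnings"]
        )
        current = linked

    # Step 2: link in only the needed parts of the device libraries.
    if device_libs:
        linked = "linked_2.bc"  # hypothetical temporary file name
        commands.append(
            ["llvm-link", "-only-needed", current, *device_libs,
             "-o", linked, "--suppress-warnings"]
        )
        current = linked

    # Step 3: translate the fully linked bitcode to SPIR-V.
    commands.append(["llvm-spirv", "-o", output, current])
    return commands
```

With two inputs and two device libraries the sketch yields three commands, matching the DEVLIBS checks. With a single input and no libraries it yields only the `llvm-spirv` step, matching SIMPLE-NO-LINK.
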
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| // Tests the driver when linking LLVM IR bitcode files and targeting the SPIR-V | ||
| // architecture. | ||
| // | ||
| // Test that -Xlinker options are being passed to clang-sycl-linker. | ||
| // RUN: touch %t.bc | ||
| // RUN: %clangxx -### --target=spirv64 --sycl-link -Xlinker --llvm-spirv-path=/tmp \ | ||
| // RUN: -Xlinker --library-path=/tmp -Xlinker --device-libs=lib1.bc,lib2.bc %t.bc 2>&1 \ | ||
| // RUN: | FileCheck %s -check-prefix=XLINKEROPTS | ||
| // XLINKEROPTS: "{{.*}}clang-sycl-linker{{.*}}" "--llvm-spirv-path=/tmp" "--library-path=/tmp" "--device-libs=lib1.bc,lib2.bc" "{{.*}}.bc" "-o" "a.out" |
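
The XLINKEROPTS check above expects every value that follows `-Xlinker` to reach `clang-sycl-linker` unchanged and in the original order. A small Python sketch of that forwarding behavior follows. It is an illustration of what the test asserts, not the driver's code, and `forwarded_linker_args` is a hypothetical name.

```python
from typing import List


def forwarded_linker_args(driver_args: List[str]) -> List[str]:
    """Collect the value after each -Xlinker flag, preserving order."""
    forwarded: List[str] = []
    args = iter(driver_args)
    for arg in args:
        if arg == "-Xlinker":
            # The next argument is consumed as the forwarded value.
            value = next(args, None)
            if value is None:
                raise ValueError("-Xlinker requires an argument")
            forwarded.append(value)
    return forwarded
```

Applied to the driver arguments from the RUN line, the sketch returns the three options that the check expects to see on the `clang-sycl-linker` command line, ahead of the input file and `-o`.
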
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| // RUN: rm -rf %t | ||
| // RUN: mkdir %t | ||
| // RUN: split-file %s %t | ||
| // | ||
| // RUN: %clang_cc1 -std=c++20 %t/a.cppm -emit-module-interface -o %t/a.pcm -fretain-comments-from-system-headers | ||
| // RUN: %clang_cc1 -std=c++20 %t/b.cpp -fmodule-file=a=%t/a.pcm -verify -fsyntax-only | ||
|
|
||
| //--- a.cppm | ||
| export module a; | ||
|
|
||
| //--- b.cpp | ||
| // expected-no-diagnostics | ||
| import a; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| set(LLVM_LINK_COMPONENTS | ||
| ${LLVM_TARGETS_TO_BUILD} | ||
| Option | ||
| ) | ||
|
|
||
| set(LLVM_TARGET_DEFINITIONS SYCLLinkOpts.td) | ||
| tablegen(LLVM SYCLLinkOpts.inc -gen-opt-parser-defs) | ||
| add_public_tablegen_target(SYCLLinkerOpts) | ||
|
|
||
| if(NOT CLANG_BUILT_STANDALONE) | ||
| set(tablegen_deps intrinsics_gen SYCLLinkerOpts) | ||
| endif() | ||
|
|
||
| add_clang_tool(clang-sycl-linker | ||
| ClangSYCLLinker.cpp | ||
|
|
||
| DEPENDS | ||
| ${tablegen_deps} | ||
| ) | ||
|
|
||
| set(CLANG_SYCL_LINKER_LIB_DEPS | ||
| clangBasic | ||
| ) | ||
|
|
||
| target_link_libraries(clang-sycl-linker | ||
| PRIVATE | ||
| ${CLANG_SYCL_LINKER_LIB_DEPS} | ||
| ) |