
[AArch64][SME2] Add SQRSHRN, UQRSHRN, SQRSHRUN builtins for SME2, SVE2p1 #75325

Merged: 1 commit into llvm:main, Dec 14, 2023


dtemirbulatov (Contributor)

Add SQRSHRN, UQRSHRN and SQRSHRUN builtins (the two-vector svqrshrn and svqrshrun forms) for SME2 and SVE2p1.
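For reviewers less familiar with these instructions, the following is a hypothetical scalar reference model in C of what the new builtins compute per element. The function names are illustrative and come from neither arm_sve.h nor this PR. The model assumes the usual rounding shift (add 2^(imm-1), then shift right by imm) and saturation to the 16-bit result type. For the x2 forms, it also assumes that results from the first source vector fill the even result lanes and results from the second fill the odd lanes. That interleaving is our reading of Arm's descriptions of the SVE2p1 "narrow and interleave" instructions, so check it against the Arm ARM before relying on it. The immediate range 1..16 mirrors ImmCheck1_16 in the .td entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model, not part of arm_sve.h or this PR.
   imm is assumed to be in [1, 16], as enforced by ImmCheck1_16.
   Assumes >> on a negative int64_t is an arithmetic shift (GCC/Clang). */

/* Rounding shift right: add 2^(imm-1), then shift. Widened to 64 bits so
   the rounding add cannot overflow for INT32_MAX or UINT32_MAX. */
static int64_t rshr_round(int64_t x, unsigned imm) {
    return (x + ((int64_t)1 << (imm - 1))) >> imm;
}

/* SQRSHRN element: signed 32-bit -> signed 16-bit, saturating. */
static int16_t sqrshrn_elem(int32_t x, unsigned imm) {
    int64_t r = rshr_round(x, imm);
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}

/* UQRSHRN element: unsigned 32-bit -> unsigned 16-bit, saturating. */
static uint16_t uqrshrn_elem(uint32_t x, unsigned imm) {
    int64_t r = rshr_round((int64_t)x, imm);
    return r > UINT16_MAX ? UINT16_MAX : (uint16_t)r;
}

/* SQRSHRUN element: signed 32-bit -> unsigned 16-bit, clamped to
   [0, UINT16_MAX]. */
static uint16_t sqrshrun_elem(int32_t x, unsigned imm) {
    int64_t r = rshr_round(x, imm);
    if (r < 0) return 0;
    if (r > UINT16_MAX) return UINT16_MAX;
    return (uint16_t)r;
}

/* x2 form of SQRSHRN under the assumed interleaving: lane i of zn0 goes to
   out[2*i], lane i of zn1 goes to out[2*i + 1]. n is the number of 32-bit
   lanes per source vector; out must have room for 2*n elements. */
static void sqrshrn_x2_ref(const int32_t *zn0, const int32_t *zn1, size_t n,
                           unsigned imm, int16_t *out) {
    for (size_t i = 0; i < n; ++i) {
        out[2 * i] = sqrshrn_elem(zn0[i], imm);
        out[2 * i + 1] = sqrshrn_elem(zn1[i], imm);
    }
}
```

As a quick check of the model, sqrshrn_elem(INT32_MAX, 16) saturates to INT16_MAX, and 98304 (exactly 1.5 * 2^16) rounds up to 2 with imm = 16. Widening to 64 bits before the rounding add is what keeps the largest inputs from overflowing.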

llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels on Dec 13, 2023
llvmbot (Member) commented Dec 13, 2023

@llvm/pr-subscribers-clang

Author: Dinar Temirbulatov (dtemirbulatov)

Changes

Add SQRSHRN, UQRSHRN, SQRSHRUN builtins for SME2, SVE2p1.


Full diff: https://github.com/llvm/llvm-project/pull/75325.diff

2 Files Affected:

  • (modified) clang/include/clang/Basic/arm_sve.td (+9)
  • (added) clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c (+78)
diff --git a/clang/include/clang/Basic/arm_sve.td b/clang/include/clang/Basic/arm_sve.td
index aa9b105364a51a..4e9e318e7ade85 100644
--- a/clang/include/clang/Basic/arm_sve.td
+++ b/clang/include/clang/Basic/arm_sve.td
@@ -2170,6 +2170,15 @@ let TargetGuard = "sme2" in {
   def REINTERPRET_SVCOUNT_TO_SVBOOL : Inst<"svreinterpret[_b]", "P}", "Pc", MergeNone, "", [IsStreamingCompatible], []>;
 }
 
+let TargetGuard = "sve2p1|sme2" in {
+  // SQRSHRN / UQRSHRN
+  def SVQRSHRN_X2   : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "h2i", "i",    MergeNone, "aarch64_sve_sqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
+  def SVUQRSHRN_X2  : SInst<"svqrshrn[_n]_{0}[_{d}_x2]", "e2i", "Ui",   MergeNone, "aarch64_sve_uqrshrn_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
+
+  // SQRSHRUN
+  def SVSQRSHRUN_X2 : SInst<"svqrshrun[_n]_{0}[_{d}_x2]", "e2i", "i",  MergeNone, "aarch64_sve_sqrshrun_x2", [IsStreamingCompatible], [ImmCheck<1, ImmCheck1_16>]>;
+}
+
 let TargetGuard = "sve2p1" in {
   // ZIPQ1, ZIPQ2, UZPQ1, UZPQ2
   def SVZIPQ1 : SInst<"svzipq1[_{d}]", "ddd", "cUcsUsiUilUlbhfd", MergeNone, "aarch64_sve_zipq1", [], []>;
diff --git a/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
new file mode 100644
index 00000000000000..8e8b7203148934
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
@@ -0,0 +1,78 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s
+// RUN: %clang_cc1 -DSVE_OVERLOADED_FORMS -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - -x c++ %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -target-feature +sme-f64f64 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve2p1 -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include <arm_sve.h>
+
+#ifdef SVE_OVERLOADED_FORMS
+// A simple used,unused... macro, long enough to represent any SVE builtin.
+#define SVE_ACLE_FUNC(A1,A2_UNUSED,A3,A4_UNUSED,A5) A1##A3##A5
+#else
+#define SVE_ACLE_FUNC(A1,A2,A3,A4,A5) A1##A2##A3##A4##A5
+#endif
+
+
+// SQRSHRN x 2
+
+// CHECK-LABEL: @test_svqrshrn_s16_s32_x2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+// CPP-CHECK-LABEL: @_Z24test_svqrshrn_s16_s32_x211svint32x2_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CPP-CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+svint16_t test_svqrshrn_s16_s32_x2(svint32x2_t zn) __arm_streaming_compatible {
+  return SVE_ACLE_FUNC(svqrshrn,_n,_s16,_s32_x2,)(zn, 16);
+}
+
+// UQRSHRN x 2
+
+// CHECK-LABEL: @test_svqrshrn_u16_u32_x2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+// CPP-CHECK-LABEL: @_Z24test_svqrshrn_u16_u32_x212svuint32x2_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.uqrshrn.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CPP-CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+svuint16_t test_svqrshrn_u16_u32_x2(svuint32x2_t zn) __arm_streaming_compatible {
+  return SVE_ACLE_FUNC(svqrshrn,_n,_u16,_u32_x2,)(zn, 16);
+}
+
+// SQRSHRUN x 2
+
+// CHECK-LABEL: @test_svqrshrun_u16_s32_x2(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrun.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+// CPP-CHECK-LABEL: @_Z25test_svqrshrun_u16_s32_x211svint32x2_t(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN:%.*]], i64 0)
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = tail call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv8i32(<vscale x 8 x i32> [[ZN]], i64 4)
+// CPP-CHECK-NEXT:    [[TMP2:%.*]] = tail call <vscale x 8 x i16> @llvm.aarch64.sve.sqrshrun.x2.nxv4i32(<vscale x 4 x i32> [[TMP0]], <vscale x 4 x i32> [[TMP1]], i32 16)
+// CPP-CHECK-NEXT:    ret <vscale x 8 x i16> [[TMP2]]
+//
+svuint16_t test_svqrshrun_u16_s32_x2(svint32x2_t zn) __arm_streaming_compatible {
+  return SVE_ACLE_FUNC(svqrshrun,_n,_u16,_s32_x2,)(zn, 16);
+}
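
The SVE_ACLE_FUNC macro in the test lets one source file exercise both the explicit and the overloaded intrinsic names. The sketch below stringifies what each branch of the macro produces for the calls in this test; the _FULL/_OVL suffixes and the STR helpers are our own names, introduced only so both definitions can coexist in one file.

```c
/* The two SVE_ACLE_FUNC definitions from the test, renamed so both can be
   shown side by side: FULL is the default branch, OVL is the
   SVE_OVERLOADED_FORMS branch, which drops the 2nd and 4th pieces. */
#define SVE_ACLE_FUNC_FULL(A1, A2, A3, A4, A5) A1##A2##A3##A4##A5
#define SVE_ACLE_FUNC_OVL(A1, A2_UNUSED, A3, A4_UNUSED, A5) A1##A3##A5

/* Two-level stringification so the inner macro is expanded before # runs. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* The empty 5th argument pastes as nothing (C99 empty macro argument). */
static const char *const qrshrn_full =
    STR(SVE_ACLE_FUNC_FULL(svqrshrn, _n, _s16, _s32_x2, ));
static const char *const qrshrn_ovl =
    STR(SVE_ACLE_FUNC_OVL(svqrshrn, _n, _s16, _s32_x2, ));
static const char *const qrshrun_full =
    STR(SVE_ACLE_FUNC_FULL(svqrshrun, _n, _u16, _s32_x2, ));
static const char *const qrshrun_ovl =
    STR(SVE_ACLE_FUNC_OVL(svqrshrun, _n, _u16, _s32_x2, ));
```

The explicit names come out as svqrshrn_n_s16_s32_x2 and svqrshrun_n_u16_s32_x2, and the overloaded ones as svqrshrn_s16 and svqrshrun_u16, matching the optional [_n] and [_{d}_x2] parts of the .td name patterns. Because both spellings are expected to lower to the same LLVM intrinsic call, the RUN lines with and without -DSVE_OVERLOADED_FORMS share the same CHECK and CPP-CHECK prefixes.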

dtemirbulatov merged commit 101083e into llvm:main on Dec 14, 2023
6 checks passed
dtemirbulatov deleted the sme2-svqrshrn branch on December 14, 2023 at 11:38
jayfoad (Contributor) commented Dec 14, 2023

Please remember to add a suitable REQUIRES: line to these new codegen tests, or put them in an ARM subdirectory with a suitable lit.local.cfg!

This new test is failing in non-ARM builds with:

FAIL: Clang :: CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c (5567 of 76786)
******************** TEST 'Clang :: CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/clang -cc1 -internal-isystem /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/lib/clang/18/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c | /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/opt -S  -passes=mem2reg,instcombine,tailcallelim | /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/FileCheck /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
+ /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/clang -cc1 -internal-isystem /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/lib/clang/18/include -nostdsysteminc -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
+ /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/opt -S -passes=mem2reg,instcombine,tailcallelim
+ /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/build/bin/FileCheck /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c
/jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c:10:10: fatal error: 'arm_sve.h' file not found
   10 | #include <arm_sve.h>
      |          ^~~~~~~~~~~
1 error generated.
/jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c:22:17: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: @test_svqrshrn_s16_s32_x2(
                ^
<stdin>:1:1: note: scanning from here
; ModuleID = '<stdin>'
^
<stdin>:1:14: note: possible intended match here
; ModuleID = '<stdin>'
             ^

Input file: <stdin>
Check file: /jenkins/workspace/llvm-llpc/llvm-npi-merge/llvm-project/clang/test/CodeGen/aarch64-sve2p1-intrinsics/acle_sve2p1_qrshr.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = '<stdin>' 
label:22'0     X~~~~~~~~~~~~~~~~~~~~~~ error: no match found
label:22'1                  ?          possible intended match
            2: source_filename = "<stdin>" 
label:22'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

jayfoad (Contributor) commented Dec 14, 2023

> Please remember to add a suitable REQUIRES: line to these new codegen tests

I've added one in 50e78de
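For readers following along: a REQUIRES: line in a lit test marks the test UNSUPPORTED, rather than failed, when the named feature is missing. Below is a minimal sketch of how the top of acle_sve2p1_qrshr.c looks with such a line, assuming the aarch64-registered-target feature that clang's lit configuration provides when the AArch64 backend is built. It is not necessarily the exact line added in 50e78de.

```c
// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
// REQUIRES: aarch64-registered-target
// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sve -target-feature +sme2 -S -disable-O0-optnone -Werror -Wall -emit-llvm -o - %s | opt -S  -passes=mem2reg,instcombine,tailcallelim | FileCheck %s
```

With this line, a build such as the non-ARM one in the log above skips the test instead of failing on the missing arm_sve.h.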
