-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang][AArch64] Add missing SME functions to header file. #75791
Conversation
This includes: * __arm_in_streaming_mode() * __arm_has_sme() * __arm_za_disable() * __svundef_za()
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang Author: Sander de Smalen (sdesmalen-arm) ChangesThis includes:
Full diff: https://github.com/llvm/llvm-project/pull/75791.diff 4 Files Affected:
diff --git a/clang/include/clang/Basic/BuiltinsAArch64.def b/clang/include/clang/Basic/BuiltinsAArch64.def
index 82a1ba3c82ad35..31ec84143f65c1 100644
--- a/clang/include/clang/Basic/BuiltinsAArch64.def
+++ b/clang/include/clang/Basic/BuiltinsAArch64.def
@@ -68,6 +68,9 @@ TARGET_BUILTIN(__builtin_arm_ldg, "v*v*", "t", "mte")
TARGET_BUILTIN(__builtin_arm_stg, "vv*", "t", "mte")
TARGET_BUILTIN(__builtin_arm_subp, "Uiv*v*", "t", "mte")
+// SME state function
+BUILTIN(__builtin_arm_get_sme_state, "vULi*ULi*", "n")
+
// Memory Operations
TARGET_BUILTIN(__builtin_arm_mops_memset_tag, "v*v*iz", "", "mte,mops")
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 4eb1686f095062..ca9070aad95842 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -10570,6 +10570,26 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
return Builder.CreateCall(F, llvm::ConstantInt::get(Int32Ty, HintID));
}
+ if (BuiltinID == clang::AArch64::BI__builtin_arm_get_sme_state) {
+ // Create call to __arm_sme_state and store the results to the two pointers.
+ CallInst *CI = EmitRuntimeCall(CGM.CreateRuntimeFunction(
+ llvm::FunctionType::get(StructType::get(CGM.Int64Ty, CGM.Int64Ty), {},
+ false),
+ "__arm_sme_state"));
+ auto Attrs =
+ AttributeList()
+ .addFnAttribute(getLLVMContext(), "aarch64_pstate_sm_compatible")
+ .addFnAttribute(getLLVMContext(), "aarch64_pstate_za_preserved");
+ CI->setAttributes(Attrs);
+ CI->setCallingConv(
+ llvm::CallingConv::
+ AArch64_SME_ABI_Support_Routines_PreserveMost_From_X2);
+ Builder.CreateStore(Builder.CreateExtractValue(CI, 0),
+ EmitPointerWithAlignment(E->getArg(0)));
+ return Builder.CreateStore(Builder.CreateExtractValue(CI, 1),
+ EmitPointerWithAlignment(E->getArg(1)));
+ }
+
if (BuiltinID == clang::AArch64::BI__builtin_arm_rbit) {
assert((getContext().getTypeSize(E->getType()) == 32) &&
"rbit of unusual size!");
diff --git a/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c
new file mode 100644
index 00000000000000..282819c8ca3501
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c
@@ -0,0 +1,72 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -O1 -Werror -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include <arm_sme_draft_spec_subject_to_change.h>
+
+// CHECK-LABEL: @test_in_streaming_mode(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3:[0-9]+]]
+// CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CHECK-NEXT: [[AND_I:%.*]] = and i64 [[TMP1]], 1
+// CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp ne i64 [[AND_I]], 0
+// CHECK-NEXT: ret i1 [[TOBOOL_I]]
+//
+// CPP-CHECK-LABEL: @_Z22test_in_streaming_modev(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3:[0-9]+]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CPP-CHECK-NEXT: [[AND_I:%.*]] = and i64 [[TMP1]], 1
+// CPP-CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp ne i64 [[AND_I]], 0
+// CPP-CHECK-NEXT: ret i1 [[TOBOOL_I]]
+//
+bool test_in_streaming_mode(void) __arm_streaming_compatible {
+ return __arm_in_streaming_mode();
+}
+
+// CHECK-LABEL: @test_za_disable(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR4:[0-9]+]]
+// CHECK-NEXT: ret void
+//
+// CPP-CHECK-LABEL: @_Z15test_za_disablev(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: tail call void @__arm_za_disable() #[[ATTR4:[0-9]+]]
+// CPP-CHECK-NEXT: ret void
+//
+void test_za_disable(void) __arm_streaming_compatible {
+ __arm_za_disable();
+}
+
+// CHECK-LABEL: @test_has_sme(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3]]
+// CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
+// CHECK-NEXT: ret i1 [[TOBOOL_I]]
+//
+// CPP-CHECK-LABEL: @_Z12test_has_smev(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3]]
+// CPP-CHECK-NEXT: [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CPP-CHECK-NEXT: [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
+// CPP-CHECK-NEXT: ret i1 [[TOBOOL_I]]
+//
+bool test_has_sme(void) __arm_streaming_compatible {
+ return __arm_has_sme();
+}
+
+// CHECK-LABEL: @test_svundef_za(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: ret void
+//
+// CPP-CHECK-LABEL: @_Z15test_svundef_zav(
+// CPP-CHECK-NEXT: entry:
+// CPP-CHECK-NEXT: ret void
+//
+void test_svundef_za(void) __arm_streaming_compatible __arm_shared_za {
+ svundef_za();
+}
+
diff --git a/clang/utils/TableGen/SveEmitter.cpp b/clang/utils/TableGen/SveEmitter.cpp
index a59b7099d5adf2..5758cf59b6eca0 100644
--- a/clang/utils/TableGen/SveEmitter.cpp
+++ b/clang/utils/TableGen/SveEmitter.cpp
@@ -1600,6 +1600,24 @@ void SVEEmitter::createSMEHeader(raw_ostream &OS) {
OS << "extern \"C\" {\n";
OS << "#endif\n\n";
+ OS << "void __arm_za_disable(void) __arm_streaming_compatible;\n\n";
+
+ OS << "__ai bool __arm_has_sme(void) __arm_streaming_compatible {\n";
+ OS << " uint64_t x0, x1;\n";
+ OS << " __builtin_arm_get_sme_state(&x0, &x1);\n";
+ OS << " return x0 & (1ULL << 63);\n";
+ OS << "}\n\n";
+
+ OS << "__ai bool __arm_in_streaming_mode(void) __arm_streaming_compatible "
+ "{\n";
+ OS << " uint64_t x0, x1;\n";
+ OS << " __builtin_arm_get_sme_state(&x0, &x1);\n";
+ OS << " return x0 & 1;\n";
+ OS << "}\n\n";
+
+ OS << "__ai void svundef_za(void) __arm_streaming_compatible __arm_shared_za "
+ "{ }\n\n";
+
createCoreHeaderIntrinsics(OS, *this, ACLEKind::SME);
OS << "#ifdef __cplusplus\n";
|
@@ -10570,6 +10570,26 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID, | |||
return Builder.CreateCall(F, llvm::ConstantInt::get(Int32Ty, HintID)); | |||
} | |||
|
|||
if (BuiltinID == clang::AArch64::BI__builtin_arm_get_sme_state) { | |||
// Create call to __arm_sme_state and store the results to the two pointers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest why does this builtin return the result via memory? instead of being an alias of __arm_sme_state
. Or rather, can __arm_has_sme
and __arm_in_streaming_mode
call __arm_sme_state
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could call __arm_sme_state
directly, but we don't expose the special calling convention for __arm_sme_state with an attribute at the C level, hence the indirection through a builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that makes sense. But the new builtin can still return by value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this a while back when I implemented this builtin, but couldn't find any prior work where a builtin returned two scalar values which is why I opted for doing this through memory. LLVM will optimise these away similar to how it optimises all of Clang's other memory operations away (mem2reg).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. In which case I thinking creating specific builtins to avoid having to go through memory would be better because I don’t get the point of creating a builtin that has a less efficient interface than the function it targets. That said, I’ll not push for it if you feel this implementation is better.
@@ -10570,6 +10570,26 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID, | |||
return Builder.CreateCall(F, llvm::ConstantInt::get(Int32Ty, HintID)); | |||
} | |||
|
|||
if (BuiltinID == clang::AArch64::BI__builtin_arm_get_sme_state) { | |||
// Create call to __arm_sme_state and store the results to the two pointers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. In which case I thinking creating specific builtins to avoid having to go through memory would be better because I don’t get the point of creating a builtin that has a less efficient interface than the function it targets. That said, I’ll not push for it if you feel this implementation is better.
This includes: