[Clang][AArch64] Add missing SME functions to header file. #75791

Merged (2 commits) on Jan 2, 2024

sdesmalen-arm
Collaborator

This includes:

  • __arm_in_streaming_mode()
  • __arm_has_sme()
  • __arm_za_disable()
  • __svundef_za()

@llvmbot added the clang, backend:AArch64, clang:frontend, and clang:codegen labels on Dec 18, 2023
@llvmbot
Collaborator

llvmbot commented Dec 18, 2023

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-clang

Author: Sander de Smalen (sdesmalen-arm)

Changes

This includes:

  • __arm_in_streaming_mode()
  • __arm_has_sme()
  • __arm_za_disable()
  • __svundef_za()

Full diff: https://github.com/llvm/llvm-project/pull/75791.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsAArch64.def (+3)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+20)
  • (added) clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c (+72)
  • (modified) clang/utils/TableGen/SveEmitter.cpp (+18)
diff --git a/clang/include/clang/Basic/BuiltinsAArch64.def b/clang/include/clang/Basic/BuiltinsAArch64.def
index 82a1ba3c82ad35..31ec84143f65c1 100644
--- a/clang/include/clang/Basic/BuiltinsAArch64.def
+++ b/clang/include/clang/Basic/BuiltinsAArch64.def
@@ -68,6 +68,9 @@ TARGET_BUILTIN(__builtin_arm_ldg, "v*v*", "t", "mte")
 TARGET_BUILTIN(__builtin_arm_stg, "vv*", "t", "mte")
 TARGET_BUILTIN(__builtin_arm_subp, "Uiv*v*", "t", "mte")
 
+// SME state function
+BUILTIN(__builtin_arm_get_sme_state, "vULi*ULi*", "n")
+
 // Memory Operations
 TARGET_BUILTIN(__builtin_arm_mops_memset_tag, "v*v*iz", "", "mte,mops")
 
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 4eb1686f095062..ca9070aad95842 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -10570,6 +10570,26 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateCall(F, llvm::ConstantInt::get(Int32Ty, HintID));
   }
 
+  if (BuiltinID == clang::AArch64::BI__builtin_arm_get_sme_state) {
+    // Create call to __arm_sme_state and store the results to the two pointers.
+    CallInst *CI = EmitRuntimeCall(CGM.CreateRuntimeFunction(
+        llvm::FunctionType::get(StructType::get(CGM.Int64Ty, CGM.Int64Ty), {},
+                                false),
+        "__arm_sme_state"));
+    auto Attrs =
+        AttributeList()
+            .addFnAttribute(getLLVMContext(), "aarch64_pstate_sm_compatible")
+            .addFnAttribute(getLLVMContext(), "aarch64_pstate_za_preserved");
+    CI->setAttributes(Attrs);
+    CI->setCallingConv(
+        llvm::CallingConv::
+            AArch64_SME_ABI_Support_Routines_PreserveMost_From_X2);
+    Builder.CreateStore(Builder.CreateExtractValue(CI, 0),
+                        EmitPointerWithAlignment(E->getArg(0)));
+    return Builder.CreateStore(Builder.CreateExtractValue(CI, 1),
+                               EmitPointerWithAlignment(E->getArg(1)));
+  }
+
   if (BuiltinID == clang::AArch64::BI__builtin_arm_rbit) {
     assert((getContext().getTypeSize(E->getType()) == 32) &&
            "rbit of unusual size!");
diff --git a/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c
new file mode 100644
index 00000000000000..282819c8ca3501
--- /dev/null
+++ b/clang/test/CodeGen/aarch64-sme-intrinsics/acle_sme_state_funs.c
@@ -0,0 +1,72 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
+// REQUIRES: aarch64-registered-target
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -O1 -Werror -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -O1 -Werror -emit-llvm -o - -x c++ %s | FileCheck %s -check-prefix=CPP-CHECK
+// RUN: %clang_cc1 -triple aarch64-none-linux-gnu -target-feature +sme -S -disable-O0-optnone -Werror -Wall -o /dev/null %s
+
+#include <arm_sme_draft_spec_subject_to_change.h>
+
+// CHECK-LABEL: @test_in_streaming_mode(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3:[0-9]+]]
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CHECK-NEXT:    [[AND_I:%.*]] = and i64 [[TMP1]], 1
+// CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp ne i64 [[AND_I]], 0
+// CHECK-NEXT:    ret i1 [[TOBOOL_I]]
+//
+// CPP-CHECK-LABEL: @_Z22test_in_streaming_modev(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3:[0-9]+]]
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[AND_I:%.*]] = and i64 [[TMP1]], 1
+// CPP-CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp ne i64 [[AND_I]], 0
+// CPP-CHECK-NEXT:    ret i1 [[TOBOOL_I]]
+//
+bool test_in_streaming_mode(void) __arm_streaming_compatible {
+  return __arm_in_streaming_mode();
+}
+
+// CHECK-LABEL: @test_za_disable(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR4:[0-9]+]]
+// CHECK-NEXT:    ret void
+//
+// CPP-CHECK-LABEL: @_Z15test_za_disablev(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    tail call void @__arm_za_disable() #[[ATTR4:[0-9]+]]
+// CPP-CHECK-NEXT:    ret void
+//
+void test_za_disable(void) __arm_streaming_compatible {
+  __arm_za_disable();
+}
+
+// CHECK-LABEL: @test_has_sme(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3]]
+// CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
+// CHECK-NEXT:    ret i1 [[TOBOOL_I]]
+//
+// CPP-CHECK-LABEL: @_Z12test_has_smev(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    [[TMP0:%.*]] = tail call aarch64_sme_preservemost_from_x2 { i64, i64 } @__arm_sme_state() #[[ATTR3]]
+// CPP-CHECK-NEXT:    [[TMP1:%.*]] = extractvalue { i64, i64 } [[TMP0]], 0
+// CPP-CHECK-NEXT:    [[TOBOOL_I:%.*]] = icmp slt i64 [[TMP1]], 0
+// CPP-CHECK-NEXT:    ret i1 [[TOBOOL_I]]
+//
+bool test_has_sme(void) __arm_streaming_compatible {
+  return __arm_has_sme();
+}
+
+// CHECK-LABEL: @test_svundef_za(
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:    ret void
+//
+// CPP-CHECK-LABEL: @_Z15test_svundef_zav(
+// CPP-CHECK-NEXT:  entry:
+// CPP-CHECK-NEXT:    ret void
+//
+void test_svundef_za(void) __arm_streaming_compatible __arm_shared_za {
+  svundef_za();
+}
+
diff --git a/clang/utils/TableGen/SveEmitter.cpp b/clang/utils/TableGen/SveEmitter.cpp
index a59b7099d5adf2..5758cf59b6eca0 100644
--- a/clang/utils/TableGen/SveEmitter.cpp
+++ b/clang/utils/TableGen/SveEmitter.cpp
@@ -1600,6 +1600,24 @@ void SVEEmitter::createSMEHeader(raw_ostream &OS) {
   OS << "extern \"C\" {\n";
   OS << "#endif\n\n";
 
+  OS << "void __arm_za_disable(void) __arm_streaming_compatible;\n\n";
+
+  OS << "__ai bool __arm_has_sme(void) __arm_streaming_compatible {\n";
+  OS << "  uint64_t x0, x1;\n";
+  OS << "  __builtin_arm_get_sme_state(&x0, &x1);\n";
+  OS << "  return x0 & (1ULL << 63);\n";
+  OS << "}\n\n";
+
+  OS << "__ai bool __arm_in_streaming_mode(void) __arm_streaming_compatible "
+        "{\n";
+  OS << "  uint64_t x0, x1;\n";
+  OS << "  __builtin_arm_get_sme_state(&x0, &x1);\n";
+  OS << "  return x0 & 1;\n";
+  OS << "}\n\n";
+
+  OS << "__ai void svundef_za(void) __arm_streaming_compatible __arm_shared_za "
+        "{ }\n\n";
+
   createCoreHeaderIntrinsics(OS, *this, ACLEKind::SME);
 
   OS << "#ifdef __cplusplus\n";
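
A note on reading the test expectations: the generated header tests bit 63 of x0 with `x0 & (1ULL << 63)`, while the CHECK lines for test_has_sme show `icmp slt i64 ..., 0`. These are the same predicate, since bit 63 is the sign bit of a 64-bit value. The sketch below illustrates that equivalence in plain C; the function names are hypothetical and are not part of the patch or of the ACLE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the header's test for __arm_has_sme: is bit 63 of x0 set? */
static bool has_sme_mask(uint64_t x0) {
  return (x0 & (1ULL << 63)) != 0;
}

/* The same predicate as a signed comparison, matching the `icmp slt`
   in the CHECK lines. The cast relies on two's-complement conversion,
   which mainstream compilers provide. */
static bool has_sme_signed(uint64_t x0) {
  return (int64_t)x0 < 0;
}

/* Mirrors the header's test for __arm_in_streaming_mode: bit 0 of x0. */
static bool in_streaming_mode_bit(uint64_t x0) {
  return (x0 & 1) != 0;
}
```

Both forms of the bit-63 test agree for every input, which is why the optimizer is free to pick the comparison.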

@@ -10570,6 +10570,26 @@ Value *CodeGenFunction::EmitAArch64BuiltinExpr(unsigned BuiltinID,
     return Builder.CreateCall(F, llvm::ConstantInt::get(Int32Ty, HintID));
   }

+  if (BuiltinID == clang::AArch64::BI__builtin_arm_get_sme_state) {
+    // Create call to __arm_sme_state and store the results to the two pointers.
Collaborator

Out of interest, why does this builtin return its result via memory instead of being an alias of __arm_sme_state? Or rather, can __arm_has_sme and __arm_in_streaming_mode call __arm_sme_state directly?

Collaborator Author

It could call __arm_sme_state directly, but we don't expose the special calling convention for __arm_sme_state with an attribute at the C level, hence the indirection through a builtin.

Collaborator

OK, that makes sense. But the new builtin can still return by value?

Collaborator Author

I looked into this a while back when I implemented this builtin, but couldn't find any prior work where a builtin returned two scalar values, which is why I opted for doing this through memory. LLVM will optimise these stores and loads away, just as it optimises away Clang's other memory operations on local variables (mem2reg).
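
For readers following the thread, here is a minimal sketch of the out-pointer shape being discussed. It is plain C with a stub in place of the runtime routine, so it runs anywhere; `sketch_sme_state_stub`, `sketch_get_sme_state`, and the struct are hypothetical names, and the stub's return value is an assumption chosen for illustration, not what a real `__arm_sme_state` would report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two-register result of __arm_sme_state. */
struct sketch_sme_state {
  uint64_t x0;
  uint64_t x1;
};

/* Stub: pretends SME is present (bit 63) and streaming mode is on (bit 0). */
static struct sketch_sme_state sketch_sme_state_stub(void) {
  struct sketch_sme_state s = {(1ULL << 63) | 1ULL, 0};
  return s;
}

/* Same shape as __builtin_arm_get_sme_state: both results are written
   through pointers rather than returned by value. */
static inline void sketch_get_sme_state(uint64_t *x0, uint64_t *x1) {
  struct sketch_sme_state s = sketch_sme_state_stub();
  *x0 = s.x0;
  *x1 = s.x1;
}

/* Caller in the style of the generated header: two locals, one call,
   then a bit test on x0. */
static inline bool sketch_in_streaming_mode(void) {
  uint64_t x0, x1;
  sketch_get_sme_state(&x0, &x1);
  (void)x1; /* unused here, as in the header's helpers */
  return (x0 & 1) != 0;
}
```

Because `x0` and `x1` are locals whose addresses do not escape once the helper is inlined, an optimizing compiler can typically keep them in registers, which is consistent with the -O1 CHECK lines in the test containing no loads or stores.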

Collaborator

I see what you mean. In that case I think creating specific builtins that avoid going through memory would be better, because I don't see the point of a builtin whose interface is less efficient than the function it targets. That said, I won't push for it if you feel this implementation is better.
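
One way to picture the alternative suggested here is a dedicated query per question, each returning a scalar by value. The sketch below is only an illustration of that shape: `sketch_state_word`, `sketch_builtin_has_sme`, and `sketch_builtin_in_streaming_mode` are hypothetical names, no such builtins are part of this patch, and the stub value is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state word that __arm_sme_state returns
   in x0. This stub reports: SME present, not in streaming mode. */
static uint64_t sketch_state_word(void) {
  return 1ULL << 63;
}

/* One query per question, each returning its answer by value. */
static inline bool sketch_builtin_has_sme(void) {
  return (sketch_state_word() >> 63) != 0;
}

static inline bool sketch_builtin_in_streaming_mode(void) {
  return (sketch_state_word() & 1) != 0;
}
```

The trade-off is more builtins to define and maintain in exchange for an interface with no out-pointers.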

@sdesmalen-arm merged commit 5055eee into llvm:main on Jan 2, 2024
3 of 4 checks passed
@sdesmalen-arm deleted the sme-missing-functions branch on February 23, 2024, 11:35