[AArch64][SME] Emit Zero instruction for NewZA functions #66361
The ACLE (65e0d386889fe3e8f2284a4515945e426ae72f8d) demands that functions with the aarch64_pstate_za_new attribute set all bits of the ZA register to zero upon entry.
@llvm/pr-subscribers-backend-aarch64

Changes

The ACLE (65e0d386889fe3e8f2284a4515945e426ae72f8d) demands that functions with the aarch64_pstate_za_new attribute set all bits of the ZA register to zero upon entry.

Full diff: https://github.com//pull/66361.diff

3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/SMEABIPass.cpp b/llvm/lib/Target/AArch64/SMEABIPass.cpp
index 72e87a663fceb1b..3b4e86aa9feb29e 100644
--- a/llvm/lib/Target/AArch64/SMEABIPass.cpp
+++ b/llvm/lib/Target/AArch64/SMEABIPass.cpp
@@ -112,6 +112,12 @@ bool SMEABI::updateNewZAFunctions(Module *M, Function *F,
       Intrinsic::getDeclaration(M, Intrinsic::aarch64_sme_za_enable);
   Builder.CreateCall(EnableZAIntr->getFunctionType(), EnableZAIntr);
 
+  // ZA state must be zeroed upon entry to a function with NewZA
+  Function *ZeroIntr =
+      Intrinsic::getDeclaration(M, Intrinsic::aarch64_sme_zero);
+  Builder.CreateCall(ZeroIntr->getFunctionType(), ZeroIntr,
+                     Builder.getInt32(0b11111111));
+
   // Before returning, disable pstate.za
   for (BasicBlock &BB : *F) {
     Instruction *T = BB.getTerminator();
diff --git a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
index 848cc9903b34417..98a8769afea8513 100644
--- a/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
+++ b/llvm/test/CodeGen/AArch64/sme-disable-gisel-fisel.ll
@@ -229,6 +229,7 @@ define double @za_new_caller_to_za_shared_callee(double %x) nounwind noinline o
 ; CHECK-COMMON-NEXT: b .LBB6_2
 ; CHECK-COMMON-NEXT: .LBB6_2: // %entry
 ; CHECK-COMMON-NEXT: smstart za
+; CHECK-COMMON-NEXT: zero {za}
 ; CHECK-COMMON-NEXT: bl za_shared_callee
 ; CHECK-COMMON-NEXT: mov x8, #4631107791820423168 // =0x4045000000000000
 ; CHECK-COMMON-NEXT: fmov d1, x8
diff --git a/llvm/test/CodeGen/AArch64/sme-new-za-function.ll b/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
index 54ef5fd432755f9..0cee26dbb349edb 100644
--- a/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
+++ b/llvm/test/CodeGen/AArch64/sme-new-za-function.ll
@@ -15,6 +15,7 @@ define void @private_za() "aarch64_pstate_za_new" {
 ; CHECK-NEXT: br label [[TMP0]]
 ; CHECK: 0:
 ; CHECK-NEXT: call void @llvm.aarch64.sme.za.enable()
+; CHECK-NEXT: call void @llvm.aarch64.sme.zero(i32 255)
 ; CHECK-NEXT: call void @shared_za_callee()
 ; CHECK-NEXT: call void @llvm.aarch64.sme.za.disable()
 ; CHECK-NEXT: ret void
@@ -35,6 +36,7 @@ define i32 @private_za_multiple_exit(i32 %a, i32 %b, i64 %cond) "aarch64_pstate_
 ; CHECK-NEXT: br label [[ENTRY]]
 ; CHECK: entry:
 ; CHECK-NEXT: call void @llvm.aarch64.sme.za.enable()
+; CHECK-NEXT: call void @llvm.aarch64.sme.zero(i32 255)
 ; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i64 [[COND:%.*]], 1
 ; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_ELSE:%.*]], label [[IF_END:%.*]]
 ; CHECK: if.else:
LGTM with nit addressed.
  Function *ZeroIntr =
      Intrinsic::getDeclaration(M, Intrinsic::aarch64_sme_zero);
  Builder.CreateCall(ZeroIntr->getFunctionType(), ZeroIntr,
                     Builder.getInt32(0b11111111));
nit: could you write 0xff instead? I find that more readable than what I assume I'm counting is eight 1's :)
Are you sure? Since each bit in the zero mask represents a ZA .d slice I think the binary form gives more info. I don't mind though.
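For anyone weighing the two spellings, here is a small standalone sketch of what the mask encodes. It assumes bit i of the 8-bit immediate selects the 64-bit tile za<i>.d, which is my reading of the comment above; `tilesForZeroMask` is a hypothetical helper written for illustration and is not part of LLVM.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not an LLVM API: list the 64-bit tiles that a
// ZERO mask selects, assuming bit i of the mask selects tile za<i>.d.
std::vector<std::string> tilesForZeroMask(std::uint8_t mask) {
  std::vector<std::string> tiles;
  for (unsigned bit = 0; bit < 8; ++bit)
    if (mask & (1u << bit))
      tiles.push_back("za" + std::to_string(bit) + ".d");
  return tiles;
}

// The spellings used in the patch, the review nit, and the CHECK lines
// all denote the same value.
static_assert(0b11111111 == 0xff, "binary and hex spellings agree");
static_assert(0xff == 255, "matches the i32 255 in the CHECK lines");
```

Under that assumption, `tilesForZeroMask(0xff)` returns all eight tiles, which lines up with the `zero {za}` emitted in the codegen test. The binary literal makes the one-bit-per-tile layout visible, while the hex literal is quicker to verify at a glance.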
The ACLE demands that functions with the aarch64_pstate_za_new attribute set all bits of the ZA register to zero upon entry.