@tomershafir
Contributor

@tomershafir tomershafir commented Oct 20, 2025

Given a GPR32 zeroing instruction, if the target supports zero-cycle zeroing for GPR64 but not for GPR32, widen the instruction to the 64-bit `$xn = MOVZXi 0, 0` instead of writing to `$wn`, so that the zeroing is still zero-cycle.

It also aligns the check-prefix naming in the generic zeroing test.
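
The selection order described above can be sketched outside LLVM as a small standalone function. This is only an illustration: `ZczFeatures`, `Gpr32Zeroing`, `selectGpr32Zeroing`, and `checkSelectionTable` are hypothetical names, not LLVM APIs, and the `ORRWrr` fallback is an assumption inferred from the pre-patch MIR check line rather than from code shown in this diff.

```cpp
#include <cassert>

// Hypothetical stand-ins for Subtarget.hasZeroCycleZeroingGPR32() and
// Subtarget.hasZeroCycleZeroingGPR64(); these are not LLVM types.
struct ZczFeatures {
  bool gpr32;
  bool gpr64;
};

// Hypothetical labels for the instruction forms that appear in the patch
// and its tests.
enum class Gpr32Zeroing {
  MovzX, // $xn = MOVZXi 0, 0 (widened to the 64-bit super-register)
  MovzW, // MOVZWi with immediate 0 written to $wn
  OrrW,  // $wn = ORRWrr $wzr, $wzr (assumed fallback, see lead-in)
};

// Mirrors the order of the checks in the patched copyPhysReg branch:
// the widening case is tested first, then the native GPR32 case.
constexpr Gpr32Zeroing selectGpr32Zeroing(ZczFeatures f) {
  if (f.gpr64 && !f.gpr32)
    return Gpr32Zeroing::MovzX;
  if (f.gpr32)
    return Gpr32Zeroing::MovzW;
  return Gpr32Zeroing::OrrW;
}

// Exercises all four feature combinations.
inline void checkSelectionTable() {
  assert(selectGpr32Zeroing({false, true}) == Gpr32Zeroing::MovzX);
  assert(selectGpr32Zeroing({true, true}) == Gpr32Zeroing::MovzW);
  assert(selectGpr32Zeroing({true, false}) == Gpr32Zeroing::MovzW);
  assert(selectGpr32Zeroing({false, false}) == Gpr32Zeroing::OrrW);
}
```

Only the first combination (GPR64 zero-cycle zeroing without GPR32) changes behavior; targets that already have GPR32 zero-cycle zeroing keep writing to `$wn`.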

@github-actions

github-actions bot commented Oct 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@tomershafir tomershafir force-pushed the aarch64-expand-gpr-zeroing-for-zcz branch from 64d1a8e to 0462d59 Compare October 20, 2025 12:08
@tomershafir tomershafir marked this pull request as ready for review October 20, 2025 14:15
@llvmbot
Member

llvmbot commented Oct 20, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Tomer Shafir (tomershafir)

Changes

Given a GPR32 zeroing instruction, if the target supports zero cycle zeroing for GPR64 but not for GPR32, widen the zeroing instruction.

It also aligns naming in the generic zeroing test.


Full diff: https://github.com/llvm/llvm-project/pull/164244.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+9-1)
  • (modified) llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir (+1-1)
  • (modified) llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing-gpr.ll (+11-8)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 84b54b3c707af..b74202e70d7a3 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5116,7 +5116,15 @@ void AArch64InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
 
   // GPR32 zeroing
   if (AArch64::GPR32spRegClass.contains(DestReg) && SrcReg == AArch64::WZR) {
-    if (Subtarget.hasZeroCycleZeroingGPR32()) {
+    if (Subtarget.hasZeroCycleZeroingGPR64() &&
+        !Subtarget.hasZeroCycleZeroingGPR32()) {
+      MCRegister DestRegX = RI.getMatchingSuperReg(DestReg, AArch64::sub_32,
+                                                   &AArch64::GPR64spRegClass);
+      assert(DestRegX.isValid() && "Destination super-reg not valid");
+      BuildMI(MBB, I, DL, get(AArch64::MOVZXi), DestRegX)
+          .addImm(0)
+          .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
+    } else if (Subtarget.hasZeroCycleZeroingGPR32()) {
       BuildMI(MBB, I, DL, get(AArch64::MOVZWi), DestReg)
           .addImm(0)
           .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0));
diff --git a/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
index f34d3ed510a97..6b2a31b02c097 100644
--- a/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
+++ b/llvm/test/CodeGen/AArch64/arm64-copy-phys-zero-reg.mir
@@ -35,7 +35,7 @@ body:             |
     ; CHECK-NOZCZ-GPR32-ZCZ-GPR64-LABEL: name: f0
     ; CHECK-NOZCZ-GPR32-ZCZ-GPR64: liveins: $x0, $lr
     ; CHECK-NOZCZ-GPR32-ZCZ-GPR64-NEXT: {{  $}}
-    ; CHECK-NOZCZ-GPR32-ZCZ-GPR64-NEXT: $w0 = ORRWrr $wzr, $wzr
+    ; CHECK-NOZCZ-GPR32-ZCZ-GPR64-NEXT: $x0 = MOVZXi 0, 0
     ; CHECK-NOZCZ-GPR32-ZCZ-GPR64-NEXT: BL @f2, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $w0, implicit-def $sp, implicit-def $w0
     ;
     ; CHECK-ZCZ-GPR32-ZCZ-GPR64-LABEL: name: f0
diff --git a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing-gpr.ll b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing-gpr.ll
index dc643062d8697..0f284aa9ad282 100644
--- a/llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing-gpr.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing-gpr.ll
@@ -1,41 +1,44 @@
-; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR
+; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR32-NOZCZ-GPR64
 ; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+zcz-gpr32 | FileCheck %s -check-prefixes=ALL,ZCZ-GPR32
-; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+zcz-gpr64 | FileCheck %s -check-prefixes=ALL,ZCZ-GPR64
-; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=generic | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+zcz-gpr64 | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR32-ZCZ-GPR64
+; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=generic | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR32-NOZCZ-GPR64
 ; RUN: llc < %s -mtriple=arm64-apple-ios -mcpu=cyclone | FileCheck %s -check-prefixes=ALL,ZCZ-GPR32,ZCZ-GPR64
 ; RUN: llc < %s -mtriple=arm64-apple-macosx -mcpu=apple-m1 | FileCheck %s -check-prefixes=ALL,ZCZ-GPR32,ZCZ-GPR64
-; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR
+; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=exynos-m3 | FileCheck %s -check-prefixes=ALL,NOZCZ-GPR32-NOZCZ-GPR64
 ; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=kryo | FileCheck %s -check-prefixes=ALL,ZCZ-GPR32,ZCZ-GPR64
 ; RUN: llc < %s -mtriple=aarch64-linux-gnu -mcpu=falkor | FileCheck %s -check-prefixes=ALL,ZCZ-GPR32,ZCZ-GPR64
 
 define i8 @ti8() {
 entry:
 ; ALL-LABEL: ti8:
-; NOZCZ-GPR: mov w0, wzr
+; NOZCZ-GPR32-NOZCZ-GPR64: mov w0, wzr
 ; ZCZ-GPR32: mov w0, #0
+; NOZCZ-GPR32-ZCZ-GPR64: mov x0, #0
   ret i8 0
 }
 
 define i16 @ti16() {
 entry:
 ; ALL-LABEL: ti16:
-; NOZCZ-GPR: mov w0, wzr
+; NOZCZ-GPR32-NOZCZ-GPR64: mov w0, wzr
 ; ZCZ-GPR32: mov w0, #0
+; NOZCZ-GPR32-ZCZ-GPR64: mov x0, #0
   ret i16 0
 }
 
 define i32 @ti32() {
 entry:
 ; ALL-LABEL: ti32:
-; NOZCZ-GPR: mov w0, wzr
+; NOZCZ-GPR32-NOZCZ-GPR64: mov w0, wzr
 ; ZCZ-GPR32: mov w0, #0
+; NOZCZ-GPR32-ZCZ-GPR64: mov x0, #0
   ret i32 0
 }
 
 define i64 @ti64() {
 entry:
 ; ALL-LABEL: ti64:
-; NOZCZ-GPR: mov x0, xzr
+; NOZCZ-GPR32-NOZCZ-GPR64: mov x0, xzr
 ; ZCZ-GPR64: mov x0, #0
   ret i64 0
 }

Member

@dtellenbach dtellenbach left a comment

The commit message could be a bit more precise, mentioning that you emit `$xn = MOVZXi 0, 0` even when copying into `$wn` to take advantage of the GPR64 zero-cycle move.

Other than that, lgtm.
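
For readers wondering why writing `$xn` is an acceptable substitute for zeroing `$wn`: on AArch64 a write to a W register also clears the upper 32 bits of the corresponding X register, so both forms leave the full register at zero. The toy model below illustrates that equivalence; `Gpr` and its members are hypothetical names for this sketch and are unrelated to LLVM's register classes.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a single AArch64 general-purpose register, where Wn is the
// low 32 bits of Xn. Hypothetical type, for illustration only.
struct Gpr {
  std::uint64_t x = 0;

  // A 32-bit write zero-extends into the 64-bit register.
  void writeW(std::uint32_t v) { x = v; }

  // A 64-bit write replaces the whole register.
  void writeX(std::uint64_t v) { x = v; }

  std::uint32_t readW() const { return static_cast<std::uint32_t>(x); }
};

// Returns true if zeroing through Wn and through Xn leave the same state,
// starting from the same arbitrary register contents.
inline bool zeroingFormsAgree(std::uint64_t initial) {
  Gpr viaW;
  viaW.x = initial;
  viaW.writeW(0); // models writing zero to $wn

  Gpr viaX;
  viaX.x = initial;
  viaX.writeX(0); // models $xn = MOVZXi 0, 0

  return viaW.x == viaX.x && viaW.readW() == 0 && viaX.readW() == 0;
}
```

Calling `zeroingFormsAgree` with any starting value returns `true`, which is the property the widened form relies on.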

@tomershafir tomershafir force-pushed the aarch64-expand-gpr-zeroing-for-zcz branch from 0462d59 to d56419f Compare October 24, 2025 08:31
@tomershafir tomershafir merged commit f7585ad into llvm:main Oct 25, 2025
10 checks passed
@tomershafir tomershafir deleted the aarch64-expand-gpr-zeroing-for-zcz branch October 25, 2025 06:37
dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025