@tomershafir

This change improves the accuracy of LLVM's AArch64 model by splitting the zero cycle zeroing subtarget features per register class. This matches how the uarch is designed (each register bank has its own capabilities) and follows the same approach we used to improve ZCM modeling.

It splits HasZeroCycleZeroingGP into HasZeroCycleZeroingGPR32 and HasZeroCycleZeroingGPR64, removes the opaque FeatureZCZeroing, and infers that FeatureNoZCZeroingFP means FeatureNoZCZeroingFPR64, based on its single usage in AArch64AsmPrinter.cpp.

It also splits arm64-zero-cycle-zeroing.ll into two tests, one -gpr and one -fpr, as was done for ZCM, to make the tests more focused and manageable in line with the new modeling.

The test cases are updated as well, exploiting the fact that this is a refactor patch:

  • remove redundant functions that just mix isolated ones (t1-4)
  • specialize check prefixes
  • replace apple-a10 with apple-m1
  • add a -mtriple=arm64-apple-macosx -mcpu=generic test case for GPR
  • isolate the -mtriple=arm64-apple-ios -mcpu=cyclone FP workaround test case and move -fullfp16 to another, non-workaround test case
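
To make the per-register-class split concrete, here is a minimal standalone sketch. It is not LLVM code: the struct, enum, and function names are hypothetical and only mirror the feature names in the description. The FPR64 condition follows the check in emitFMov0 shown in the clang-format diff on this page (feature present, no FP workaround, NEON available).

```cpp
// Standalone illustration only; not LLVM code. All names are hypothetical.

// One flag per register class, mirroring the split described in this PR.
struct ZeroingFeatures {
  bool GPR32 = false;        // stands in for HasZeroCycleZeroingGPR32
  bool GPR64 = false;        // stands in for HasZeroCycleZeroingGPR64
  bool FPR64 = false;        // stands in for HasZeroCycleZeroingFPR64
  bool FPWorkaround = false; // stands in for the FP workaround feature
  bool Neon = true;          // stands in for isNeonAvailable()
};

enum class RegClass { GPR32, GPR64, FPR64 };

// Each register class is queried independently, so a core can report
// zero cycle zeroing for one register bank without implying it for another.
bool canZeroInZeroCycles(const ZeroingFeatures &F, RegClass RC) {
  switch (RC) {
  case RegClass::GPR32:
    return F.GPR32;
  case RegClass::GPR64:
    return F.GPR64;
  case RegClass::FPR64:
    // Same shape as the condition in emitFMov0.
    return F.FPR64 && !F.FPWorkaround && F.Neon;
  }
  return false;
}
```

With a single combined GP flag, a core that only zeroes 32-bit GPRs for free could not be described; with separate flags, each query answers for its own register class.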

@github-actions

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 23f5331ce..5634bb98e 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -1829,8 +1829,8 @@ void AArch64AsmPrinter::emitMOVK(Register Dest, uint64_t Imm, unsigned Shift) {
 
 void AArch64AsmPrinter::emitFMov0(const MachineInstr &MI) {
   Register DestReg = MI.getOperand(0).getReg();
-  if (STI->hasZeroCycleZeroingFPR64() && !STI->hasZeroCycleZeroingFPWorkaround() &&
-      STI->isNeonAvailable()) {
+  if (STI->hasZeroCycleZeroingFPR64() &&
+      !STI->hasZeroCycleZeroingFPWorkaround() && STI->isNeonAvailable()) {
     // Convert H/S register to corresponding D register
     if (AArch64::H0 <= DestReg && DestReg <= AArch64::H31)
       DestReg = AArch64::D0 + (DestReg - AArch64::H0);

@tomershafir tomershafir deleted the aarch64-improve-zero-cycle-zeroing branch August 20, 2025 15:31