

@guy-david
Contributor

The motivation is to allow passes such as MachineLICM to hoist trivial FMOV instructions out of loops. Previously MachineLICM did not hoist them, even when the RHS is a constant.
On most microarchitectures, these cross-bank move instructions have a latency of 2-6 cycles, so they are certainly not as cheap as a 0-1 cycle move.

@llvmbot
Member

llvmbot commented Nov 12, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Guy David (guy-david)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/167661.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+24)
  • (added) llvm/test/CodeGen/AArch64/licm-regclass-copy.mir (+55)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 4b4073365483e..6482091c2cc70 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -1043,6 +1043,27 @@ static bool isCheapImmediate(const MachineInstr &MI, unsigned BitSize) {
   return Is.size() <= 2;
 }
 
+// Check if a COPY instruction is cheap.
+static bool isCheapCopy(const MachineInstr &MI,
+                        const AArch64RegisterInfo &RI) {
+  assert(MI.isCopy() && "Expected COPY instruction");
+  const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
+
+  // Cross-register-class copies (e.g., between GPR and FPR) are expensive on
+  // AArch64, typically requiring an FMOV instruction with a 2-6 cycle latency.
+  auto getRegClass = [&](Register Reg) -> const TargetRegisterClass * {
+    return Reg.isVirtual() ? MRI.getRegClass(Reg)
+           : Reg.isPhysical() ? RI.getMinimalPhysRegClass(Reg)
+           : nullptr;
+  };
+  const TargetRegisterClass *DstRC = getRegClass(MI.getOperand(0).getReg());
+  const TargetRegisterClass *SrcRC = getRegClass(MI.getOperand(1).getReg());
+  if (DstRC && SrcRC && !RI.getCommonSubClass(DstRC, SrcRC))
+    return false;
+
+  return MI.isAsCheapAsAMove();
+}
+
 // FIXME: this implementation should be micro-architecture dependent, so a
 // micro-architecture target hook should be introduced here in future.
 bool AArch64InstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
@@ -1056,6 +1077,9 @@ bool AArch64InstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
   default:
     return MI.isAsCheapAsAMove();
 
+  case TargetOpcode::COPY:
+    return isCheapCopy(MI, RI);
+
   case AArch64::ADDWrs:
   case AArch64::ADDXrs:
   case AArch64::SUBWrs:
diff --git a/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir b/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir
new file mode 100644
index 0000000000000..287379774f519
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/licm-regclass-copy.mir
@@ -0,0 +1,55 @@
+# NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+# RUN: llc -mtriple=aarch64 -run-pass=early-machinelicm -verify-machineinstrs -o - %s | FileCheck %s
+
+# This test verifies that cross-register-class copies (e.g., between GPR and FPR)
+# ARE hoisted out of loops by MachineLICM, as they translate to expensive
+# instructions like FMOV (2-6 cycles) on AArch64.
+
+---
+name: cross_regclass_copy_hoisted
+tracksRegLiveness: true
+registers:
+  - { id: 0, class: gpr64 }
+  - { id: 1, class: gpr64 }
+  - { id: 2, class: fpr64 }
+body: |
+  ; CHECK-LABEL: name: cross_regclass_copy_hoisted
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   successors: %bb.1(0x80000000)
+  ; CHECK-NEXT:   liveins: $x0, $d0
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %0:gpr64 = COPY $x0
+  ; CHECK-NEXT:   %2:fpr64 = COPY $d0
+  ; CHECK-NEXT:   %1:gpr64 = COPY %2
+  ; CHECK-NEXT:   B %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1(0x40000000), %bb.2(0x40000000)
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %0:gpr64 = ADDXri %0, 1, 0
+  ; CHECK-NEXT:   $xzr = SUBSXri %0, 100, 0, implicit-def $nzcv
+  ; CHECK-NEXT:   Bcc 11, %bb.1, implicit $nzcv
+  ; CHECK-NEXT:   B %bb.2
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.2:
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   $x0 = COPY %1
+  ; CHECK-NEXT:   RET_ReallyLR implicit $x0
+  bb.0:
+    liveins: $x0, $d0
+    %0:gpr64 = COPY $x0
+    %2:fpr64 = COPY $d0
+    B %bb.1
+
+  bb.1:
+    ; This COPY between FPR64 and GPR64 should be hoisted
+    %1:gpr64 = COPY %2:fpr64
+    %0:gpr64 = ADDXri %0:gpr64, 1, 0
+    $xzr = SUBSXri %0:gpr64, 100, 0, implicit-def $nzcv
+    Bcc 11, %bb.1, implicit $nzcv
+    B %bb.2
+
+  bb.2:
+    $x0 = COPY %1:gpr64
+    RET_ReallyLR implicit $x0
+...

@github-actions

github-actions bot commented Nov 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@guy-david guy-david force-pushed the users/guy-david/aarch64-fmov-not-cheap branch from 258e4cf to 0714e8b on November 12, 2025 10:31
Contributor

@nasherm nasherm left a comment


LGTM

@guy-david guy-david force-pushed the users/guy-david/aarch64-fmov-not-cheap branch 2 times, most recently from 3f257e5 to e584890 on November 12, 2025 22:35
@aemerson
Contributor

To nitpick on the language: I think the concept here is not disjoint register classes but cross-bank copies. For a given bank (set of physical registers) we can have an arbitrary configuration of register classes (GPR64, GPR64sp, etc.), but those are not the problem.
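The distinction between disjoint register classes and cross-bank copies can be illustrated with a toy model in which each class records its member registers and the bank it belongs to. The types, register numbering, and function names below (`Bank`, `ToyRegClass`, `disjointClasses`, `crossBank`) are hypothetical and are not LLVM's `TargetRegisterClass` or register-bank APIs. In this model, two classes in the same bank can still have no registers in common (here, 32-bit and 64-bit views of the general-purpose registers), so a class-based test and a bank-based test can disagree.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical register banks: groups of physical registers between which
// values can be moved without crossing the GPR/FPR boundary.
enum class Bank { GPR, FPR };

// Toy numbering (hypothetical): 0..30 -> X0..X30, 31 -> SP,
// 40..71 -> D0..D31, 80..110 -> W0..W30.
using RegSet = std::bitset<128>;

// Returns the set of registers numbered Lo..Hi (inclusive).
RegSet regRange(std::size_t Lo, std::size_t Hi) {
  RegSet S;
  for (std::size_t I = Lo; I <= Hi; ++I)
    S.set(I);
  return S;
}

struct ToyRegClass {
  const char *Name;
  Bank RegBank;
  RegSet Regs;
};

// Class-based test: true when the two classes share no register.
bool disjointClasses(const ToyRegClass &A, const ToyRegClass &B) {
  return (A.Regs & B.Regs).none();
}

// Bank-based test: true when the copy moves a value between banks,
// e.g. GPR <-> FPR, which needs an FMOV on AArch64.
bool crossBank(const ToyRegClass &A, const ToyRegClass &B) {
  return A.RegBank != B.RegBank;
}

const ToyRegClass GPR64{"GPR64", Bank::GPR, regRange(0, 30)};
const ToyRegClass GPR64sp{"GPR64sp", Bank::GPR, regRange(0, 31)};
const ToyRegClass GPR32{"GPR32", Bank::GPR, regRange(80, 110)};
const ToyRegClass FPR64{"FPR64", Bank::FPR, regRange(40, 71)};
```

Both tests agree on GPR64/FPR64 (expensive) and GPR64/GPR64sp (cheap); only the bank test treats the GPR32/GPR64 pair as a same-bank copy, which is the framing suggested in the comment above.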

@guy-david guy-david force-pushed the users/guy-david/aarch64-fmov-not-cheap branch from e584890 to fb9a2e5 on November 14, 2025 16:19
@guy-david guy-david requested a review from fhahn November 16, 2025 21:05
Contributor

@fhahn fhahn left a comment


LGTM, thanks.

It would be good to also update the PR title/description to use "cross-register bank" instead of "disjoint register classes".

@guy-david guy-david force-pushed the users/guy-david/aarch64-fmov-not-cheap branch from fb9a2e5 to 24b8b47 on November 17, 2025 20:01
@guy-david guy-david changed the title from "[AArch64] Consider COPY between disjoint register classes as expensive" to "[AArch64] Treat COPY between cross-register banks as expensive" on Nov 17, 2025
@github-actions

🐧 Linux x64 Test Results

  • 186262 tests passed
  • 4852 tests skipped

@guy-david guy-david merged commit 7d0a208 into main Nov 17, 2025
10 checks passed
@guy-david guy-david deleted the users/guy-david/aarch64-fmov-not-cheap branch November 17, 2025 22:05
@llvm-ci
Collaborator

llvm-ci commented Nov 17, 2025

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-ubuntu running on as-builder-9 while building llvm at step 16 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/195/builds/17493

Here is the relevant piece of the build log for reference:
Step 16 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
...
PASS: lldb-api :: types/TestIntegerType.py (1317 of 1326)
PASS: lldb-api :: types/TestRecursiveTypes.py (1318 of 1326)
UNSUPPORTED: lldb-api :: windows/launch/missing-dll/TestMissingDll.py (1319 of 1326)
PASS: lldb-api :: types/TestIntegerTypeExpr.py (1320 of 1326)
UNSUPPORTED: lldb-api :: windows/launch/replace-dll/TestReplaceDLL.py (1321 of 1326)
PASS: lldb-api :: types/TestShortType.py (1322 of 1326)
PASS: lldb-api :: types/TestShortTypeExpr.py (1323 of 1326)
PASS: lldb-api :: types/TestLongTypes.py (1324 of 1326)
PASS: lldb-api :: types/TestLongTypesExpr.py (1325 of 1326)
TIMEOUT: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py (1326 of 1326)
******************** TEST 'lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py' FAILED ********************
Script:
--
/usr/bin/python3.12 /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --libcxx-include-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/c++/v1 --libcxx-include-target-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/include/aarch64-unknown-linux-gnu/c++/v1 --libcxx-library-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib/aarch64-unknown-linux-gnu --arch aarch64 --build-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/lldb --compiler /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang --dsymutil /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./bin --lldb-obj-root /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/tools/lldb --lldb-libs-dir /home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/./lib --cmake-build-type Release --platform-url connect://jetson-agx-2198.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot /mnt/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server 
/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/llvm-project/lldb/test/API/python_api/process/cancel_attach -p TestCancelAttach.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 7d0a2082bffb162f79fd739c79f2bf0b552b9007)
  clang revision 7d0a2082bffb162f79fd739c79f2bf0b552b9007
  llvm revision 7d0a2082bffb162f79fd739c79f2bf0b552b9007

--
Command Output (stderr):
--
WARNING:root:Custom libc++ is not supported for remote runs: ignoring --libcxx arguments
FAIL: LLDB (/home/buildbot/worker/as-builder-9/lldb-remote-linux-ubuntu/build/bin/clang-aarch64) :: test_scripted_implementation (TestCancelAttach.AttachCancelTestCase.test_scripted_implementation)

--

********************
Slowest Tests:
--------------------------------------------------------------------------
600.04s: lldb-api :: python_api/process/cancel_attach/TestCancelAttach.py
123.80s: lldb-api :: functionalities/progress_reporting/TestProgressReporting.py
74.10s: lldb-api :: functionalities/data-formatter/data-formatter-stl/libcxx-simulators/string/TestDataFormatterLibcxxStringSimulator.py
70.44s: lldb-api :: commands/process/attach/TestProcessAttach.py
60.77s: lldb-api :: commands/command/script_alias/TestCommandScriptAlias.py
35.78s: lldb-api :: functionalities/completion/TestCompletion.py
35.62s: lldb-api :: functionalities/single-thread-step/TestSingleThreadStepTimeout.py
25.44s: lldb-api :: commands/statistics/basic/TestStats.py
23.71s: lldb-api :: commands/dwim-print/TestDWIMPrint.py
23.10s: lldb-api :: python_api/watchpoint/watchlocation/TestTargetWatchAddress.py
21.29s: lldb-api :: functionalities/gdb_remote_client/TestGDBRemoteClient.py
20.59s: lldb-api :: functionalities/gdb_remote_client/TestPlatformClient.py
19.88s: lldb-api :: functionalities/thread/state/TestThreadStates.py
15.25s: lldb-api :: python_api/find_in_memory/TestFindRangesInMemory.py
