[AArch64] MachineCombiner msub matching #84267

vfdff · 2024-03-07T01:23:32Z

Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub is a multiplication, then we can use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix #84152

llvmbot · 2024-03-07T01:24:06Z

@llvm/pr-subscribers-backend-aarch64

Author: Allen (vfdff)

Changes

Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub, which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix #84152

Full diff: https://github.com/llvm/llvm-project/pull/84267.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+1-1)
(added) llvm/test/CodeGen/AArch64/scalar-mla-mls.ll (+31)

diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5df691f35275df..5893f76dbd5544 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -6110,8 +6110,8 @@ static bool getMaddPatterns(MachineInstr &Root,
     setFound(AArch64::MADDXrrr, 2, AArch64::XZR, MCP::MULADDX_OP2);
     break;
   case AArch64::SUBWrr:
-    setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
     setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
+    setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
     break;
   case AArch64::SUBXrr:
     setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
diff --git a/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
new file mode 100644
index 00000000000000..36ac36701fa8aa
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s
+
+define ptr @test_scalar_msub(ptr %a, ptr %b) {
+; CHECK-LABEL: test_scalar_msub:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ldp w8, w11, [x1]
+; CHECK-NEXT:    ldp w9, w10, [x0]
+; CHECK-NEXT:    mul w12, w8, w9
+; CHECK-NEXT:    mul w8, w10, w8
+; CHECK-NEXT:    madd w8, w11, w9, w8
+; CHECK-NEXT:    msub w9, w11, w10, w12
+; CHECK-NEXT:    stp w9, w8, [x0]
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr %a, align 4
+  %1 = load i32, ptr %b, align 4
+  %mul = mul nsw i32 %1, %0
+  %_M_imag = getelementptr inbounds i8, ptr %a, i64 4
+  %2 = load i32, ptr %_M_imag, align 4
+  %_M_imag.i = getelementptr inbounds i8, ptr %b, i64 4
+  %3 = load i32, ptr %_M_imag.i, align 4
+  %mul3 = mul nsw i32 %3, %2
+  %sub = sub nsw i32 %mul, %mul3
+  %mul6 = mul nsw i32 %3, %0
+  %mul9 = mul nsw i32 %2, %1
+  %add = add nsw i32 %mul6, %mul9
+  store i32 %add, ptr %_M_imag, align 4
+  store i32 %sub, ptr %a, align 4
+  ret ptr %a
+}

davemgreen · 2024-03-07T20:56:36Z

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp

    setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
+    setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
    break;
  case AArch64::SUBXrr:
    setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);


Would the same thing be needed here too for 64bit?

yes, I'll fix that, it has similar issue, thanks https://gcc.godbolt.org/z/79GnMPnY1

Done, add msub matching for i64, thanks

davemgreen

LGTM, thanks

Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub, which can be use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix llvm#84152

vfdff requested review from sebpop and sjoerdmeijer March 7, 2024 01:23

llvmbot added the backend:AArch64 label Mar 7, 2024

vfdff requested a review from davemgreen March 8, 2024 01:45

davemgreen reviewed Mar 8, 2024

View reviewed changes

vfdff requested a review from davemgreen March 8, 2024 10:11

davemgreen approved these changes Mar 8, 2024

View reviewed changes

vfdff added 2 commits March 8, 2024 18:14

[AArch64] MachineCombiner msub matching for i64

a110a1c

vfdff force-pushed the PR84152 branch from c464d51 to a110a1c Compare March 8, 2024 10:15

vfdff merged commit a110a1c into llvm:main Mar 8, 2024
2 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] MachineCombiner msub matching #84267

[AArch64] MachineCombiner msub matching #84267

vfdff commented Mar 7, 2024 •

edited

llvmbot commented Mar 7, 2024

davemgreen Mar 7, 2024

vfdff Mar 8, 2024 •

edited

vfdff Mar 8, 2024

davemgreen left a comment

[AArch64] MachineCombiner msub matching #84267

[AArch64] MachineCombiner msub matching #84267

Conversation

vfdff commented Mar 7, 2024 • edited

llvmbot commented Mar 7, 2024

davemgreen Mar 7, 2024

Choose a reason for hiding this comment

vfdff Mar 8, 2024 • edited

Choose a reason for hiding this comment

vfdff Mar 8, 2024

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

vfdff commented Mar 7, 2024 •

edited

vfdff Mar 8, 2024 •

edited