Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] MachineCombiner msub matching #84267

Merged
merged 2 commits into from
Mar 8, 2024
Merged

[AArch64] MachineCombiner msub matching #84267

merged 2 commits into from
Mar 8, 2024

Conversation

vfdff
Copy link
Contributor

@vfdff vfdff commented Mar 7, 2024

Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub is a multiplication, then we can use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix #84152

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 7, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Allen (vfdff)

Changes

Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub, which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix #84152


Full diff: https://github.com/llvm/llvm-project/pull/84267.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+1-1)
  • (added) llvm/test/CodeGen/AArch64/scalar-mla-mls.ll (+31)
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5df691f35275df..5893f76dbd5544 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -6110,8 +6110,8 @@ static bool getMaddPatterns(MachineInstr &Root,
     setFound(AArch64::MADDXrrr, 2, AArch64::XZR, MCP::MULADDX_OP2);
     break;
   case AArch64::SUBWrr:
-    setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
     setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
+    setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
     break;
   case AArch64::SUBXrr:
     setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
diff --git a/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
new file mode 100644
index 00000000000000..36ac36701fa8aa
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s
+
+define ptr @test_scalar_msub(ptr %a, ptr %b) {
+; CHECK-LABEL: test_scalar_msub:
+; CHECK:       // %bb.0: // %entry
+; CHECK-NEXT:    ldp w8, w11, [x1]
+; CHECK-NEXT:    ldp w9, w10, [x0]
+; CHECK-NEXT:    mul w12, w8, w9
+; CHECK-NEXT:    mul w8, w10, w8
+; CHECK-NEXT:    madd w8, w11, w9, w8
+; CHECK-NEXT:    msub w9, w11, w10, w12
+; CHECK-NEXT:    stp w9, w8, [x0]
+; CHECK-NEXT:    ret
+entry:
+  %0 = load i32, ptr %a, align 4
+  %1 = load i32, ptr %b, align 4
+  %mul = mul nsw i32 %1, %0
+  %_M_imag = getelementptr inbounds i8, ptr %a, i64 4
+  %2 = load i32, ptr %_M_imag, align 4
+  %_M_imag.i = getelementptr inbounds i8, ptr %b, i64 4
+  %3 = load i32, ptr %_M_imag.i, align 4
+  %mul3 = mul nsw i32 %3, %2
+  %sub = sub nsw i32 %mul, %mul3
+  %mul6 = mul nsw i32 %3, %0
+  %mul9 = mul nsw i32 %2, %1
+  %add = add nsw i32 %mul6, %mul9
+  store i32 %add, ptr %_M_imag, align 4
+  store i32 %sub, ptr %a, align 4
+  ret ptr %a
+}

@vfdff vfdff requested a review from davemgreen March 8, 2024 01:45
setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
break;
case AArch64::SUBXrr:
setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the same thing be needed here too for 64bit?

Copy link
Contributor Author

@vfdff vfdff Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I'll fix that, it has similar issue, thanks https://gcc.godbolt.org/z/79GnMPnY1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, add msub matching for i64, thanks

@vfdff vfdff requested a review from davemgreen March 8, 2024 10:11
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

Pattern should be sorted in priority order since the pattern evalutor
stops checking as soon as it finds a faster sequence.
so for a * b - c * d, we prefer to match the 2nd operands of sub,
which can be use msub to fold them.

Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm

Fix llvm#84152
@vfdff vfdff merged commit a110a1c into llvm:main Mar 8, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AArch64] there is redundant neg instruction for mull with complex type
3 participants