-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] MachineCombiner msub matching #84267
Conversation
@llvm/pr-subscribers-backend-aarch64 Author: Allen (vfdff) ChangesPattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub, which can be use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix #84152 Full diff: https://github.com/llvm/llvm-project/pull/84267.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5df691f35275df..5893f76dbd5544 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -6110,8 +6110,8 @@ static bool getMaddPatterns(MachineInstr &Root,
setFound(AArch64::MADDXrrr, 2, AArch64::XZR, MCP::MULADDX_OP2);
break;
case AArch64::SUBWrr:
- setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
+ setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1);
break;
case AArch64::SUBXrr:
setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
diff --git a/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
new file mode 100644
index 00000000000000..36ac36701fa8aa
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/scalar-mla-mls.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s
+
+define ptr @test_scalar_msub(ptr %a, ptr %b) {
+; CHECK-LABEL: test_scalar_msub:
+; CHECK: // %bb.0: // %entry
+; CHECK-NEXT: ldp w8, w11, [x1]
+; CHECK-NEXT: ldp w9, w10, [x0]
+; CHECK-NEXT: mul w12, w8, w9
+; CHECK-NEXT: mul w8, w10, w8
+; CHECK-NEXT: madd w8, w11, w9, w8
+; CHECK-NEXT: msub w9, w11, w10, w12
+; CHECK-NEXT: stp w9, w8, [x0]
+; CHECK-NEXT: ret
+entry:
+ %0 = load i32, ptr %a, align 4
+ %1 = load i32, ptr %b, align 4
+ %mul = mul nsw i32 %1, %0
+ %_M_imag = getelementptr inbounds i8, ptr %a, i64 4
+ %2 = load i32, ptr %_M_imag, align 4
+ %_M_imag.i = getelementptr inbounds i8, ptr %b, i64 4
+ %3 = load i32, ptr %_M_imag.i, align 4
+ %mul3 = mul nsw i32 %3, %2
+ %sub = sub nsw i32 %mul, %mul3
+ %mul6 = mul nsw i32 %3, %0
+ %mul9 = mul nsw i32 %2, %1
+ %add = add nsw i32 %mul6, %mul9
+ store i32 %add, ptr %_M_imag, align 4
+ store i32 %sub, ptr %a, align 4
+ ret ptr %a
+}
|
setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2); | ||
setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULSUBW_OP1); | ||
break; | ||
case AArch64::SUBXrr: | ||
setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the same thing be needed here too for 64bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I'll fix that, it has similar issue, thanks https://gcc.godbolt.org/z/79GnMPnY1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, add msub matching for i64, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub, which can be use msub to fold them. Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm Fix llvm#84152
Pattern should be sorted in priority order since the pattern evalutor stops checking as soon as it finds a faster sequence. so for a * b - c * d, we prefer to match the 2nd operands of sub is a multiplication, then we can use msub to fold them.
Refer to https://www.slideshare.net/chimerawang/instruction-combine-in-llvm
Fix #84152