Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Enable expansion mul expansion to shNadd x, (slli x, c) #87105

Closed
wants to merge 1 commit into from

Conversation

preames
Copy link
Collaborator

@preames preames commented Mar 29, 2024

This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take.

The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.

This expansion was originally added in https://reviews.llvm.org/D106648,
but restricted to only the non-simm12 case.  As noted in the original
review, the restriction was conservative.  The problem being dodged
appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this
change, but I think those are reasonable to take.

The basic problem illustrated by those diffs is that the expansion
(via decomposeMulByConstant) runs early and thus inhibits other
optimizations in some cases.  This problem doesn't appear specific
to this case at all.  The alternate expansion some targets do
via post-legalize dag combines seems aimed at exactly this issue.
I plan to revisit this general problem in a separate change in
the near future.
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 29, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Philip Reames (preames)

Changes

This expansion was originally added in https://reviews.llvm.org/D106648, but restricted to only the non-simm12 case. As noted in the original review, the restriction was conservative. The problem being dodged appears to be the deltas in test/CodeGen/RISCV/addimm-mulimm.ll in this change, but I think those are reasonable to take.

The basic problem illustrated by those diffs is that the expansion (via decomposeMulByConstant) runs early and thus inhibits other optimizations in some cases. This problem doesn't appear specific to this case at all. The alternate expansion some targets do via post-legalize dag combines seems aimed at exactly this issue. I plan to revisit this general problem in a separate change in the near future.


Full diff: https://github.com/llvm/llvm-project/pull/87105.diff

5 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+26-6)
  • (modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+33-15)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 564fda674317f4..22bd4b8c7230a9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -20393,8 +20393,8 @@ bool RISCVTargetLowering::decomposeMulByConstant(LLVMContext &Context, EVT VT,
         (1 - Imm).isPowerOf2() || (-1 - Imm).isPowerOf2())
       return true;
 
-    // Optimize the MUL to (SH*ADD x, (SLLI x, bits)) if Imm is not simm12.
-    if (Subtarget.hasStdExtZba() && !Imm.isSignedIntN(12) &&
+    // Optimize the MUL to (SH*ADD x, (SLLI x, bits)).
+    if (Subtarget.hasStdExtZba() &&
         ((Imm - 2).isPowerOf2() || (Imm - 4).isPowerOf2() ||
          (Imm - 8).isPowerOf2()))
       return true;
diff --git a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
index 48fa69e1045656..931edacfbfff24 100644
--- a/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
+++ b/llvm/test/CodeGen/RISCV/addimm-mulimm.ll
@@ -551,8 +551,9 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV32IMB-NEXT:    sh3add a1, a1, a2
 ; RV32IMB-NEXT:    sh1add a0, a0, a0
 ; RV32IMB-NEXT:    slli a2, a0, 3
-; RV32IMB-NEXT:    addi a0, a2, 2047
-; RV32IMB-NEXT:    addi a0, a0, 1
+; RV32IMB-NEXT:    li a3, 1
+; RV32IMB-NEXT:    slli a3, a3, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a3
 ; RV32IMB-NEXT:    sltu a2, a0, a2
 ; RV32IMB-NEXT:    add a1, a1, a2
 ; RV32IMB-NEXT:    ret
@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
 ; RV64IMB:       # %bb.0:
 ; RV64IMB-NEXT:    addi a0, a0, 86
 ; RV64IMB-NEXT:    sh1add a0, a0, a0
-; RV64IMB-NEXT:    li a1, -16
-; RV64IMB-NEXT:    sh3add a0, a0, a1
+; RV64IMB-NEXT:    slli a0, a0, 3
+; RV64IMB-NEXT:    addi a0, a0, -16
 ; RV64IMB-NEXT:    ret
   %tmp0 = mul i64 %x, 24
   %tmp1 = add i64 %tmp0, 2048
@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) {
 define i1 @pr53831(i32 %x) {
 ; RV32IMB-LABEL: pr53831:
 ; RV32IMB:       # %bb.0:
-; RV32IMB-NEXT:    li a0, 0
+; RV32IMB-NEXT:    addi a1, a0, 1
+; RV32IMB-NEXT:    sh1add a1, a1, a1
+; RV32IMB-NEXT:    slli a1, a1, 3
+; RV32IMB-NEXT:    addi a1, a1, 1
+; RV32IMB-NEXT:    sh1add a0, a0, a0
+; RV32IMB-NEXT:    li a2, 1
+; RV32IMB-NEXT:    slli a2, a2, 11
+; RV32IMB-NEXT:    sh3add a0, a0, a2
+; RV32IMB-NEXT:    xor a0, a0, a1
+; RV32IMB-NEXT:    seqz a0, a0
 ; RV32IMB-NEXT:    ret
 ;
 ; RV64IMB-LABEL: pr53831:
 ; RV64IMB:       # %bb.0:
-; RV64IMB-NEXT:    li a0, 0
+; RV64IMB-NEXT:    addi a1, a0, 1
+; RV64IMB-NEXT:    sh1add a1, a1, a1
+; RV64IMB-NEXT:    slliw a1, a1, 3
+; RV64IMB-NEXT:    addi a1, a1, 1
+; RV64IMB-NEXT:    sh1add a0, a0, a0
+; RV64IMB-NEXT:    li a2, 1
+; RV64IMB-NEXT:    slli a2, a2, 11
+; RV64IMB-NEXT:    sh3add a0, a0, a2
+; RV64IMB-NEXT:    sext.w a0, a0
+; RV64IMB-NEXT:    xor a0, a0, a1
+; RV64IMB-NEXT:    seqz a0, a0
 ; RV64IMB-NEXT:    ret
   %tmp0 = add i32 %x, 1
   %tmp1 = mul i32 %tmp0, 24
diff --git a/llvm/test/CodeGen/RISCV/rv32zba.ll b/llvm/test/CodeGen/RISCV/rv32zba.ll
index 0908a393338c50..cc632a09c8054b 100644
--- a/llvm/test/CodeGen/RISCV/rv32zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv32zba.ll
@@ -271,31 +271,49 @@ define i32 @mul288(i32 %a) {
 }
 
 define i32 @mul258(i32 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul258:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 258
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul258:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh1add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 258
   ret i32 %c
 }
 
 define i32 @mul260(i32 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul260:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 260
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul260:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh2add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 260
   ret i32 %c
 }
 
 define i32 @mul264(i32 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV32I-LABEL: mul264:
+; RV32I:       # %bb.0:
+; RV32I-NEXT:    li a1, 264
+; RV32I-NEXT:    mul a0, a0, a1
+; RV32I-NEXT:    ret
+;
+; RV32ZBA-LABEL: mul264:
+; RV32ZBA:       # %bb.0:
+; RV32ZBA-NEXT:    slli a1, a0, 8
+; RV32ZBA-NEXT:    sh3add a0, a0, a1
+; RV32ZBA-NEXT:    ret
   %c = mul i32 %a, 264
   ret i32 %c
 }
diff --git a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
index 90cfb1fdcb779f..ee9b73ca82f213 100644
--- a/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64-legal-i32/rv64zba.ll
@@ -811,31 +811,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
 }
 
 define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul258:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 258
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh1add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 258
   ret i64 %c
 }
 
 define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul260:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 260
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh2add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 260
   ret i64 %c
 }
 
 define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul264:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 264
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh3add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 264
   ret i64 %c
 }
diff --git a/llvm/test/CodeGen/RISCV/rv64zba.ll b/llvm/test/CodeGen/RISCV/rv64zba.ll
index d9d83633a8537f..e25ad50ac4c1b4 100644
--- a/llvm/test/CodeGen/RISCV/rv64zba.ll
+++ b/llvm/test/CodeGen/RISCV/rv64zba.ll
@@ -762,31 +762,49 @@ define i64 @adduw_imm(i32 signext %0) nounwind {
 }
 
 define i64 @mul258(i64 %a) {
-; CHECK-LABEL: mul258:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 258
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul258:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 258
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul258:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh1add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 258
   ret i64 %c
 }
 
 define i64 @mul260(i64 %a) {
-; CHECK-LABEL: mul260:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 260
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul260:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 260
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul260:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh2add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 260
   ret i64 %c
 }
 
 define i64 @mul264(i64 %a) {
-; CHECK-LABEL: mul264:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    li a1, 264
-; CHECK-NEXT:    mul a0, a0, a1
-; CHECK-NEXT:    ret
+; RV64I-LABEL: mul264:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    li a1, 264
+; RV64I-NEXT:    mul a0, a0, a1
+; RV64I-NEXT:    ret
+;
+; RV64ZBA-LABEL: mul264:
+; RV64ZBA:       # %bb.0:
+; RV64ZBA-NEXT:    slli a1, a0, 8
+; RV64ZBA-NEXT:    sh3add a0, a0, a1
+; RV64ZBA-NEXT:    ret
   %c = mul i64 %a, 264
   ret i64 %c
 }

@preames
Copy link
Collaborator Author

preames commented Mar 29, 2024

For the record, I can also implement this as the dag combine expansion if anyone wants. I'd actually implemented that originally before figuring out the generic code could handle this case at all.

; RV32IMB-NEXT: addi a0, a2, 2047
; RV32IMB-NEXT: addi a0, a0, 1
; RV32IMB-NEXT: li a3, 1
; RV32IMB-NEXT: slli a3, a3, 11
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we bypass the early shift here, so this sh3add match is actually an improvement. Note that with zbs, the immediate materialization becomes a single bseti.

@@ -561,8 +562,8 @@ define i64 @add_mul_combine_infinite_loop(i64 %x) {
; RV64IMB: # %bb.0:
; RV64IMB-NEXT: addi a0, a0, 86
; RV64IMB-NEXT: sh1add a0, a0, a0
; RV64IMB-NEXT: li a1, -16
; RV64IMB-NEXT: sh3add a0, a0, a1
; RV64IMB-NEXT: slli a0, a0, 3
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a mild regression, but I think only indirectly related to this patch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we end up with this instead of (x*3+256)*8?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be the result of transformAddImmMulImm. I agree that looks like a questionable transform.

@@ -879,12 +880,31 @@ define i64 @mulneg3000_sub8990_c(i64 %x) {
define i1 @pr53831(i32 %x) {
; RV32IMB-LABEL: pr53831:
; RV32IMB: # %bb.0:
; RV32IMB-NEXT: li a0, 0
; RV32IMB-NEXT: addi a1, a0, 1
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is checking for an infinite combine loop, but this result is the missed optimization I mentioned in the review summary.

@wangpc-pp wangpc-pp requested review from wangpc-pp and removed request for pcwang-thead April 1, 2024 03:07
preames added a commit to preames/llvm-project that referenced this pull request Apr 12, 2024
This expansion is directly inspired by the analogous code in the x86 backend
for LEA.  shXadd and (this sub-case of) LEA are largely equivalent.

This is an alternative to llvm#87105.

This expansion is also supported via the decomposeMulByConstant callback,
but restricted because of interactions with other combines since that code
runs before legalization.  As discussed in the other review, my original
plan had been to support post legalization expansion through the same
interface, but that ended up being more complicated than seems justified.

Instead, lets go ahead and do the general expansion post-legalize.  Other
targets use the combine approach, and matching that structure makes it
easier for us to adapt ideas from other targets to RISCV.
@preames
Copy link
Collaborator Author

preames commented Apr 12, 2024

I dug into whether it was easy to extend this interface to run post-legalize, and ran into more complexity than I expected. It turns out that we're relying on this running pre-legalize to catch some cases on illegal types. There are possible mul expansions which could handle these after legalization, but they aren't currently implemented.

I decided the lift required wasn't really worth it, and have an alternate patch for the same change as a DAGCombine here: #88524

preames added a commit that referenced this pull request Apr 16, 2024
This expansion is directly inspired by the analogous code in the x86
backend for LEA. shXadd and (this sub-case of) LEA are largely
equivalent.

This is an alternative to
#87105.

This expansion is also supported via the decomposeMulByConstant
callback, but restricted because of interactions with other combines
since that code runs before legalization. As discussed in the other
review, my original plan had been to support post legalization expansion
through the same interface, but that ended up being more complicated
than seems justified.

Instead, lets go ahead and do the general expansion post-legalize. Other
targets use the combine approach, and matching that structure makes it
easier for us to adapt ideas from other targets to RISCV.
@preames
Copy link
Collaborator Author

preames commented Apr 16, 2024

Alternative patch landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants