Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Move RISCVDeadRegisterDefinitions to post vector regalloc #90636

Merged

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Apr 30, 2024

Currently RISCVDeadRegisterDefinitions runs after vsetvli insertion, but in #70549 vsetvli insertion runs after vector regalloc and as a result we no longer convert some vsetvli a0, a0s to vsetvli x0, a0. This patch moves it to after vector regalloc, but before scalar regalloc so we still get the benefits of reducing register pressure.

There are no differences on llvm-test-suite or SPEC CPU 2017 with -O3 and -march=rv64gv, just one case in the in-tree tests where a vsetvli output VL is now marked as dead after phi elimination.

The original patch mentions that it may reduce register pressure and improve register allocation, but I believe the register allocator already accounts for dead defs hence why this almost an NFC.

There are no differences on llvm-test-suite or SPEC CPU 2017 with -O3 and -march=rv64gv, just one case in the in-tree tests where a vsetvli output VL is now marked as dead after phi elimination.

The motivation for this is to slightly simplify how we handle vleff with post (vector) regalloc in llvm#70549, by keeping GPR defs virtual where possible.
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 30, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

The original patch mentions that it may reduce register pressure and improve register allocation, but I believe the register allocator already accounts for dead defs hence why this almost an NFC.

There are no differences on llvm-test-suite or SPEC CPU 2017 with -O3 and -march=rv64gv, just one case in the in-tree tests where a vsetvli output VL is now marked as dead after phi elimination.

The motivation for this is to slightly simplify how we handle vleff with post (vector) regalloc in #70549, by keeping GPR defs virtual where possible.


Full diff: https://github.com/llvm/llvm-project/pull/90636.diff

4 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp (+1-3)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+2-2)
diff --git a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
index df607236f7d56f..fd6708c44e7720 100644
--- a/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
+++ b/llvm/lib/Target/RISCV/RISCVDeadRegisterDefinitions.cpp
@@ -77,10 +77,8 @@ bool RISCVDeadRegisterDefinitions::runOnMachineFunction(MachineFunction &MF) {
           LLVM_DEBUG(dbgs() << "    Ignoring, def is tied operand.\n");
           continue;
         }
-        // We should not have any relevant physreg defs that are replacable by
-        // zero before register allocation. So we just check for dead vreg defs.
         Register Reg = MO.getReg();
-        if (!Reg.isVirtual() || (!MO.isDead() && !MRI->use_nodbg_empty(Reg)))
+        if (!MO.isDead() && !MRI->use_nodbg_empty(Reg))
           continue;
         LLVM_DEBUG(dbgs() << "    Dead def operand #" << I << " in:\n      ";
                    MI.print(dbgs()));
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 0876f46728a10c..d4978abf205f53 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -536,9 +536,6 @@ void RISCVPassConfig::addPreRegAlloc() {
   if (TM->getOptLevel() != CodeGenOptLevel::None)
     addPass(createRISCVMergeBaseOffsetOptPass());
   addPass(createRISCVInsertVSETVLIPass());
-  if (TM->getOptLevel() != CodeGenOptLevel::None &&
-      EnableRISCVDeadRegisterElimination)
-    addPass(createRISCVDeadRegisterDefinitionsPass());
   addPass(createRISCVInsertReadWriteCSRPass());
   addPass(createRISCVInsertWriteVXRMPass());
 }
@@ -553,6 +550,9 @@ void RISCVPassConfig::addPostRegAlloc() {
   if (TM->getOptLevel() != CodeGenOptLevel::None &&
       EnableRedundantCopyElimination)
     addPass(createRISCVRedundantCopyEliminationPass());
+  if (TM->getOptLevel() != CodeGenOptLevel::None &&
+      EnableRISCVDeadRegisterElimination)
+    addPass(createRISCVDeadRegisterDefinitionsPass());
 }
 
 yaml::MachineFunctionInfo *
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 4121d111091117..174996b9c10b00 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -116,7 +116,6 @@
 ; CHECK-NEXT:       RISC-V Pre-RA pseudo instruction expansion pass
 ; CHECK-NEXT:       RISC-V Merge Base Offset
 ; CHECK-NEXT:       RISC-V Insert VSETVLI pass
-; CHECK-NEXT:       RISC-V Dead register definitions
 ; CHECK-NEXT:       RISC-V Insert Read/Write CSR Pass
 ; CHECK-NEXT:       RISC-V Insert Write VXRM Pass
 ; CHECK-NEXT:       Detect Dead Lanes
@@ -153,6 +152,7 @@
 ; CHECK-NEXT:       Machine Copy Propagation Pass
 ; CHECK-NEXT:       Machine Loop Invariant Code Motion
 ; CHECK-NEXT:       RISC-V Redundant Copy Elimination
+; CHECK-NEXT:       RISC-V Dead register definitions
 ; CHECK-NEXT:       Remove Redundant DEBUG_VALUE analysis
 ; CHECK-NEXT:       Fixup Statepoint Caller Saved
 ; CHECK-NEXT:       PostRA Machine Sink
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
index 4ff2fc7a5fff5d..088d121564bc96 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll
@@ -88,13 +88,13 @@ define <vscale x 1 x double> @test3(i64 %avl, i8 zeroext %cond, <vscale x 1 x do
 ; CHECK:       # %bb.0: # %entry
 ; CHECK-NEXT:    beqz a1, .LBB2_2
 ; CHECK-NEXT:  # %bb.1: # %if.then
-; CHECK-NEXT:    vsetvli a0, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; CHECK-NEXT:    vfadd.vv v9, v8, v9
 ; CHECK-NEXT:    vfmul.vv v8, v9, v8
 ; CHECK-NEXT:    # implicit-def: $x10
 ; CHECK-NEXT:    ret
 ; CHECK-NEXT:  .LBB2_2: # %if.else
-; CHECK-NEXT:    vsetvli a0, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
 ; CHECK-NEXT:    vfsub.vv v9, v8, v9
 ; CHECK-NEXT:    vfmul.vv v8, v9, v8
 ; CHECK-NEXT:    # implicit-def: $x10

Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@topperc
Copy link
Collaborator

topperc commented Apr 30, 2024

Other than vsetvli, the original patch only affected atomics and non-volatile loads. I'm not sure if those are well tested by SPEC or llvm-test-suite. Especially in high register pressure situation.

believe the register allocator already accounts for dead defs hence why this almost an NFC.

If all the non-reserved registers are in use, the register allocation won't pick X0. It would spill to pick a non-reserved register.

@lukel97
Copy link
Contributor Author

lukel97 commented May 1, 2024

If all the non-reserved registers are in use, the register allocation won't pick X0. It would spill to pick a non-reserved register.

That makes more sense, I forgot that X0 was reserved.

@BeMg I also found we have some regressions in #70549 since this no longer runs after vsetvli insertion, it seems to affect cases where two vsetvlis were coalesced e.g.:

 define <vscale x 1 x double> @test1(i64 %avl, <vscale x 1 x double> %a, <vscale x 1 x double> %b) nounwind {
 ; CHECK-LABEL: test1:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    vsetvli zero, a0, e64, m1, ta, ma
+; CHECK-NEXT:    vsetvli a0, a0, e64, m1, ta, ma
 ; CHECK-NEXT:    vfadd.vv v8, v8, v9
 ; CHECK-NEXT:    ret
entry:
  %0 = tail call i64 @llvm.riscv.vsetvli(i64 %avl, i64 2, i64 7)
  %1 = tail call <vscale x 1 x double> @llvm.riscv.vfadd.nxv1f64.nxv1f64(
    <vscale x 1 x double> undef,
    <vscale x 1 x double> %a,
    <vscale x 1 x double> %b,
    i64 7, i64 %0)
  ret <vscale x 1 x double> %1
}

We could potentially run this twice, before regalloc and after regalloc. Or I think we might be able to also make coalescing smarter and have it preserve the x0 def.

@lukel97 lukel97 changed the title [RISCV] Move RISCVDeadRegisterDefinitions to post regalloc [RISCV] Move RISCVDeadRegisterDefinitions to post vector regalloc May 2, 2024
@lukel97
Copy link
Contributor Author

lukel97 commented May 2, 2024

I've updated this to run after vector regalloc but before scalar regalloc, so we should have the best of both worlds now.

@lukel97 lukel97 requested a review from dtcxzyw May 2, 2024 08:29
Copy link
Contributor

@BeMg BeMg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit

@lukel97 lukel97 merged commit 52187b9 into llvm:main May 6, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants