Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Fold addi into load / store even if they are in different BBs. #67024

Closed
wants to merge 1 commit into from

Conversation

mgudim
Copy link
Contributor

@mgudim mgudim commented Sep 21, 2023

Currently, since ISel only looks at one basic block at a time we miss some opportunities to combine load / store with addi. Such opportunities may occur when GEP and the use of GEP are in different basic blocks.

In this PR we combine addi with memory access in RISCVISelLowering:finalizeLowering.

According to my measurements, this patch is about 1% in dynamic instruciton count on mcf.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 21, 2023

@llvm/pr-subscribers-backend-risc-v

Changes

Currently, since ISel only looks at one basic block at a time we miss some opportunities to combine load / store with addi. Such opportunities may occur when GEP and the use of GEP are in different basic blocks.

In this PR we combine addi with memory access in RISCVISelLowering:finalizeLowering.


Full diff: https://github.com/llvm/llvm-project/pull/67024.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+79)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.h (+2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index d176fcaf54c2db0..492714ae2a78f8e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -49,6 +49,7 @@ using namespace llvm;
 #define DEBUG_TYPE "riscv-lower"
 
 STATISTIC(NumTailCalls, "Number of tail calls");
+STATISTIC(NumADDIsMerged, "Number of ADDIs merged.");
 
 static cl::opt<unsigned> ExtensionMaxWebSize(
     DEBUG_TYPE "-ext-max-web-size", cl::Hidden,
@@ -2278,6 +2279,84 @@ bool RISCVTargetLowering::isLegalElementTypeForRVV(EVT ScalarTy) const {
   }
 }
 
+static bool tryToFoldInstIntoUse(MachineInstr &UseMI, MachineInstr &MI) {
+
+  if (MI.getOpcode() != RISCV::ADDI)
+    return false;
+  if (!(MI.getOperand(0).isReg() && MI.getOperand(1).isReg()))
+    return false;
+
+  switch (UseMI.getOpcode()) {
+  default:
+    return false;
+  case RISCV::LB:
+  case RISCV::LH:
+  case RISCV::LW:
+  case RISCV::LD:
+  case RISCV::LBU:
+  case RISCV::LHU:
+  case RISCV::SB:
+  case RISCV::SH:
+  case RISCV::SW:
+  case RISCV::SD:
+    break;
+  }
+  MachineOperand &OriginalBaseMO = UseMI.getOperand(1);
+  if (!OriginalBaseMO.isReg())
+    return false;
+  if (OriginalBaseMO.getReg() != MI.getOperand(0).getReg())
+    return false;
+
+  MachineOperand &OriginalOffsetMO = UseMI.getOperand(2);
+  MachineOperand &ADDIOffsetMO = MI.getOperand(2);
+  if (!(OriginalOffsetMO.isImm() && ADDIOffsetMO.isImm()))
+    return false;
+
+  int64_t OriginalOffset = OriginalOffsetMO.getImm();
+  int64_t ADDIOffset = ADDIOffsetMO.getImm();
+  int64_t TotalOffset = OriginalOffset + ADDIOffset;
+  if (!isInt<12>(TotalOffset))
+    return false;
+
+  OriginalOffsetMO.setImm(TotalOffset);
+  OriginalBaseMO.setReg(MI.getOperand(1).getReg());
+  NumADDIsMerged++;
+  return true;
+}
+
+void RISCVTargetLowering::finalizeLowering(MachineFunction &MF) const {
+  TargetLoweringBase::finalizeLowering(MF);
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+
+  SmallVector<MachineInstr *, 8> ToErase;
+  for (MachineFunction::iterator I = MF.begin(), E = MF.end(); I != E; ++I) {
+    MachineBasicBlock *MBB = &*I;
+    for (MachineBasicBlock::iterator MBBI = MBB->begin(), MBBE = MBB->end();
+         MBBI != MBBE;) {
+      MachineInstr &MI = *MBBI++;
+      if (MI.getOpcode() != RISCV::ADDI)
+        continue;
+      if (!MI.getOperand(0).isReg())
+        continue;
+      SmallVector<MachineInstr *, 4> Users;
+      for (MachineInstr &UseMI :
+           MRI.use_instructions(MI.getOperand(0).getReg())) {
+        Users.push_back(&UseMI);
+      }
+      bool AllUsesWereFolded = true;
+      for (MachineInstr *UseMI : Users) {
+        AllUsesWereFolded &= tryToFoldInstIntoUse(*UseMI, MI);
+      }
+      if (AllUsesWereFolded)
+        ToErase.push_back(&MI);
+    }
+  }
+  for (MachineInstr *MI : ToErase) {
+    MI->eraseFromParent();
+  }
+
+  return;
+}
 
 unsigned RISCVTargetLowering::combineRepeatedFPDivisors() const {
   return NumRepeatedDivisors;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.h b/llvm/lib/Target/RISCV/RISCVISelLowering.h
index 815b9be47f56026..1f0416231a2f4cb 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.h
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.h
@@ -951,6 +951,8 @@ class RISCVTargetLowering : public TargetLowering {
     return false;
   };
 
+  void finalizeLowering(MachineFunction &MF) const override;
+
   /// For available scheduling models FDIV + two independent FMULs are much
   /// faster than two FDIVs.
   unsigned combineRepeatedFPDivisors() const override;

Currently, since ISel only looks at one basic block at a time we miss
some opportunities to combine load / store with `addi`. Such
opportunities may occur when GEP and the use of GEP are in different
basic blocks.

In this PR we combine `addi` with memory access in `RISCVISelLowering:finalizeLowering`.
@topperc
Copy link
Collaborator

topperc commented Sep 21, 2023

Tests?

Isn't CodeGenPrepare::optimizeMemoryInst supposed to move GEPs to their uses?

Does this increase register pressure?

@wangpc-pp
Copy link
Contributor

Why can't this be done in RISCVMergeBaseOffset?


MachineOperand &OriginalOffsetMO = UseMI.getOperand(2);
MachineOperand &ADDIOffsetMO = MI.getOperand(2);
if (!(OriginalOffsetMO.isImm() && ADDIOffsetMO.isImm()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an assert since we know this instruction is an ADDI?

Copy link
Contributor Author

@mgudim mgudim Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be a symbol? Like something that only gets resolved by linker?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it can be a symbol. We do need to check isImm().

for (MachineBasicBlock::iterator MBBI = MBB->begin(), MBBE = MBB->end();
MBBI != MBBE;) {
MachineInstr &MI = *MBBI++;
if (MI.getOpcode() != RISCV::ADDI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you already check Opcode == ADDI and Operand0.isReg() in tryToFoldInstIntoUse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks!

@mgudim
Copy link
Contributor Author

mgudim commented Sep 22, 2023

Tests?
I wanted to commit a test first:
#67022

This way the diff for this PR can show what's changed (https://llvm.org/docs/TestingGuide.html#precommit-workflow-for-tests)

Isn't CodeGenPrepare::optimizeMemoryInst supposed to move GEPs to their uses?
Yes, I've looked at that at first too. The problem is that the gep lowers into more than one instruction, but only one of them (addi) can be sinked into the loop to be absorbed into offset of ld.

Does this increase register pressure?

I hope not. Without this transformation, the addi inside the loop would use a register until the ld. Let me collect the number of spills before / after this pathch and I'll post that soon.

case RISCV::SB:
case RISCV::SH:
case RISCV::SW:
case RISCV::SD:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized I should also include FP loads and stores.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're also missing LWU

@mgudim
Copy link
Contributor Author

mgudim commented Sep 22, 2023

Why can't this be done in RISCVMergeBaseOffset?

I haven't seen this pass when I wrote this. I agree, good idea.

I can't understand why is RISCVMergeBaseOffset so late in the pipeline? If we move it right after ISel, we can do the optmizations earlier, which should be generally better, no? Also, some many things are easier to implement while we are still in SSA. What is the latest point at which opportunities for MergeBaseOffset are created?

I see two questions here:

(1) Should we combine this with MergeBaseOffset (I vote "yes").
(2) Where in pipeline should this be? (Move MergeBaseOffset to finalizeLowering?)
@wangpc-pp @topperc @michaelmaitland @asb What do you think?

Update: I see that there is ExpandPseudoInst pass which needs MergeBaseOffsets to be run after it. This raises another question: can we move ExpandPseudoInst earlier?

@topperc
Copy link
Collaborator

topperc commented Sep 22, 2023

Also, some many things are easier to implement while we are still in SSA.

We're still in SSA form where MergeBaseOffset is now. So I don't understand this statement.

bool AllUsesWereFolded = true;
for (MachineInstr *UseMI : Users)
AllUsesWereFolded &= tryToFoldInstIntoUse(*UseMI, MI);
if (AllUsesWereFolded)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all uses aren't folded, then this would increase register pressure wouldn't it?

@topperc
Copy link
Collaborator

topperc commented Sep 25, 2023

Could we use https://reviews.llvm.org/D152828 for this?

@mgudim
Copy link
Contributor Author

mgudim commented Sep 25, 2023

Could we use https://reviews.llvm.org/D152828 for this?

Good idea, I'll take a look but not this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants