Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV] Add macro fusions for Xiangshan #72362

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

wangpc-pp
Copy link
Contributor

@wangpc-pp wangpc-pp commented Nov 15, 2023

Doc: https://xiangshan-doc.readthedocs.io/zh-cn/latest/frontend/decode/

This PR is to show the usage of TableGen-based macro fusions.

Some instrcution pairs can be folded into one MacroFusion definition
but I leave them standalone to show the different ways to define a
macro fusion.

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 15, 2023

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-backend-amdgpu

Author: Wang Pengcheng (wangpc-pp)

Changes

Doc: https://xiangshan-doc.readthedocs.io/zh-cn/latest/frontend/decode/

This PR is to show the usage of TableGen-based macro fusions.

Some instrcution pairs can be folded into one MacroFusion definition
but I leave them standalone to show the different ways to define a
macro fusion.

This PR is stacked on #72219, #72222, #72223, #72224, #72227


Patch is 56.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72362.diff

35 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MacroFusion.h (+9-9)
  • (modified) llvm/include/llvm/CodeGen/TargetSubtargetInfo.h (+4)
  • (modified) llvm/include/llvm/MC/MCSchedule.h (+9)
  • (modified) llvm/include/llvm/MC/MCSubtargetInfo.h (+13)
  • (modified) llvm/include/llvm/Target/TargetInstrPredicate.td (+6)
  • (modified) llvm/include/llvm/Target/TargetSchedule.td (+65)
  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+13-2)
  • (modified) llvm/lib/CodeGen/MacroFusion.cpp (+27-12)
  • (modified) llvm/lib/MC/MCSchedule.cpp (+1)
  • (modified) llvm/lib/Target/AArch64/AArch64MacroFusion.cpp (+1-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp (+2-2)
  • (modified) llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp (+2-2)
  • (modified) llvm/lib/Target/ARM/ARMMacroFusion.cpp (+2-2)
  • (modified) llvm/lib/Target/PowerPC/PPCMacroFusion.cpp (+2-2)
  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1-1)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h (+3)
  • (modified) llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp (+1-1)
  • (modified) llvm/lib/Target/RISCV/RISCV.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVFeatures.td (+1-6)
  • (removed) llvm/lib/Target/RISCV/RISCVMacroFusion.cpp (-69)
  • (removed) llvm/lib/Target/RISCV/RISCVMacroFusion.h (-28)
  • (added) llvm/lib/Target/RISCV/RISCVMacroFusion.td (+114)
  • (modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+10-4)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+3-2)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+2-23)
  • (modified) llvm/lib/Target/X86/X86MacroFusion.cpp (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/macro-fusion-lui-addi.ll (+5-5)
  • (modified) llvm/utils/TableGen/CMakeLists.txt (+1)
  • (modified) llvm/utils/TableGen/CodeGenSchedule.cpp (+15)
  • (modified) llvm/utils/TableGen/CodeGenSchedule.h (+11)
  • (added) llvm/utils/TableGen/MacroFusionPredicatorEmitter.cpp (+208)
  • (modified) llvm/utils/TableGen/PredicateExpander.cpp (+8)
  • (modified) llvm/utils/TableGen/PredicateExpander.h (+1)
  • (modified) llvm/utils/TableGen/SubtargetEmitter.cpp (+46-1)
diff --git a/llvm/include/llvm/CodeGen/MacroFusion.h b/llvm/include/llvm/CodeGen/MacroFusion.h
index ea2c7a5faae385a..a97f776335368c7 100644
--- a/llvm/include/llvm/CodeGen/MacroFusion.h
+++ b/llvm/include/llvm/CodeGen/MacroFusion.h
@@ -14,8 +14,8 @@
 #ifndef LLVM_CODEGEN_MACROFUSION_H
 #define LLVM_CODEGEN_MACROFUSION_H
 
-#include <functional>
 #include <memory>
+#include <vector>
 
 namespace llvm {
 
@@ -29,10 +29,10 @@ class SUnit;
 /// Check if the instr pair, FirstMI and SecondMI, should be fused
 /// together. Given SecondMI, when FirstMI is unspecified, then check if
 /// SecondMI may be part of a fused pair at all.
-using ShouldSchedulePredTy = std::function<bool(const TargetInstrInfo &TII,
-                                                const TargetSubtargetInfo &TSI,
-                                                const MachineInstr *FirstMI,
-                                                const MachineInstr &SecondMI)>;
+using MacroFusionPredTy = bool (*)(const TargetInstrInfo &TII,
+                                   const TargetSubtargetInfo &STI,
+                                   const MachineInstr *FirstMI,
+                                   const MachineInstr &SecondMI);
 
 /// Checks if the number of cluster edges between SU and its predecessors is
 /// less than FuseLimit
@@ -48,15 +48,15 @@ bool fuseInstructionPair(ScheduleDAGInstrs &DAG, SUnit &FirstSU,
 
 /// Create a DAG scheduling mutation to pair instructions back to back
 /// for instructions that benefit according to the target-specific
-/// shouldScheduleAdjacent predicate function.
+/// predicate functions.
 std::unique_ptr<ScheduleDAGMutation>
-createMacroFusionDAGMutation(ShouldSchedulePredTy shouldScheduleAdjacent);
+createMacroFusionDAGMutation(std::vector<MacroFusionPredTy> Predicates);
 
 /// Create a DAG scheduling mutation to pair branch instructions with one
 /// of their predecessors back to back for instructions that benefit according
-/// to the target-specific shouldScheduleAdjacent predicate function.
+/// to the target-specific predicate functions.
 std::unique_ptr<ScheduleDAGMutation>
-createBranchMacroFusionDAGMutation(ShouldSchedulePredTy shouldScheduleAdjacent);
+createBranchMacroFusionDAGMutation(std::vector<MacroFusionPredTy> Predicates);
 
 } // end namespace llvm
 
diff --git a/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h b/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
index 55ef95c28543190..7c76293f3e5eaea 100644
--- a/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetSubtargetInfo.h
@@ -16,6 +16,7 @@
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/CodeGen/MacroFusion.h"
 #include "llvm/CodeGen/PBQPRAConstraint.h"
 #include "llvm/CodeGen/SchedulerRegistry.h"
 #include "llvm/IR/GlobalValue.h"
@@ -323,6 +324,9 @@ class TargetSubtargetInfo : public MCSubtargetInfo {
   /// helps removing redundant copies generated by register allocator when
   /// handling complex eviction chains.
   virtual bool enableSpillageCopyElimination() const { return false; }
+
+  /// Get the list of MacroFusion predicates.
+  virtual std::vector<MacroFusionPredTy> getMacroFusions() const { return {}; }
 };
 
 } // end namespace llvm
diff --git a/llvm/include/llvm/MC/MCSchedule.h b/llvm/include/llvm/MC/MCSchedule.h
index 98ebe42cfd133b5..aa187e5cb400672 100644
--- a/llvm/include/llvm/MC/MCSchedule.h
+++ b/llvm/include/llvm/MC/MCSchedule.h
@@ -14,6 +14,7 @@
 #ifndef LLVM_MC_MCSCHEDULE_H
 #define LLVM_MC_MCSCHEDULE_H
 
+#include "llvm/ADT/Bitset.h"
 #include "llvm/Config/llvm-config.h"
 #include "llvm/Support/DataTypes.h"
 #include <cassert>
@@ -196,6 +197,9 @@ struct MCExtraProcessorInfo {
   unsigned StoreQueueID;
 };
 
+const unsigned MaxMacroFusions = 256;
+using MacroFusionBitset = Bitset<MaxMacroFusions>;
+
 /// Machine model for scheduling, bundling, and heuristics.
 ///
 /// The machine model directly provides basic information about the
@@ -325,9 +329,14 @@ struct MCSchedModel {
   const InstrItinerary *InstrItineraries;
 
   const MCExtraProcessorInfo *ExtraProcessorInfo;
+  const MacroFusionBitset *MacroFusionBits;
 
   bool hasExtraProcessorInfo() const { return ExtraProcessorInfo; }
 
+  const MacroFusionBitset *getMacroFusionBits() const {
+    return MacroFusionBits;
+  }
+
   unsigned getProcessorID() const { return ProcID; }
 
   /// Does this machine model include instruction-level scheduling.
diff --git a/llvm/include/llvm/MC/MCSubtargetInfo.h b/llvm/include/llvm/MC/MCSubtargetInfo.h
index f172a799aa3331c..66fb6c9383272e1 100644
--- a/llvm/include/llvm/MC/MCSubtargetInfo.h
+++ b/llvm/include/llvm/MC/MCSubtargetInfo.h
@@ -120,6 +120,12 @@ class MCSubtargetInfo {
     return FeatureBits[Feature];
   }
 
+  bool hasMacroFusion(unsigned MacroFusion) const {
+    const MacroFusionBitset *MacroFusionBits =
+        CPUSchedModel->getMacroFusionBits();
+    return MacroFusionBits && MacroFusionBits->test(MacroFusion);
+  }
+
 protected:
   /// Initialize the scheduling model and feature bits.
   ///
@@ -295,6 +301,13 @@ class MCSubtargetInfo {
 
   /// \return if target want to issue a prefetch in address space \p AS.
   virtual bool shouldPrefetchAddressSpace(unsigned AS) const;
+
+  /// Enable macro fusion for this subtarget.
+  virtual bool enableMacroFusion() const {
+    const MacroFusionBitset *MacroFusionBits =
+        CPUSchedModel->getMacroFusionBits();
+    return MacroFusionBits && MacroFusionBits->any();
+  }
 };
 
 } // end namespace llvm
diff --git a/llvm/include/llvm/Target/TargetInstrPredicate.td b/llvm/include/llvm/Target/TargetInstrPredicate.td
index 9f2cde9d923050a..82c4c7b23a49b6a 100644
--- a/llvm/include/llvm/Target/TargetInstrPredicate.td
+++ b/llvm/include/llvm/Target/TargetInstrPredicate.td
@@ -95,6 +95,12 @@ class MCOperandPredicate<int Index> : MCInstPredicate {
 // Return true if machine operand at position `Index` is a register operand.
 class CheckIsRegOperand<int Index> : MCOperandPredicate<Index>;
 
+// Return true if machine operand at position `Index` is a virtual register operand.
+class CheckIsVRegOperand<int Index> : MCOperandPredicate<Index>;
+
+// Return true if machine operand at position `Index` is not a virtual register operand.
+class CheckIsNotVRegOperand<int Index> : CheckNot<CheckIsVRegOperand<Index>>;
+
 // Return true if machine operand at position `Index` is an immediate operand.
 class CheckIsImmOperand<int Index> : MCOperandPredicate<Index>;
 
diff --git a/llvm/include/llvm/Target/TargetSchedule.td b/llvm/include/llvm/Target/TargetSchedule.td
index 949baa5d2105c45..a31248c7fe0627c 100644
--- a/llvm/include/llvm/Target/TargetSchedule.td
+++ b/llvm/include/llvm/Target/TargetSchedule.td
@@ -53,6 +53,7 @@
 include "llvm/Target/TargetItinerary.td"
 
 class Predicate; // Forward def
+class MacroFusion;
 
 // DAG operator that interprets the DAG args as Instruction defs.
 def instrs;
@@ -122,6 +123,9 @@ class SchedMachineModel {
   // using intervals via ResourceSegments (see
   // llvm/include/llvm/CodeGen/MachineScheduler.h).
   bit EnableIntervals = false;
+
+  // List of MacroFusion.
+  list<MacroFusion> MacroFusions = [];
 }
 
 def NoSchedModel : SchedMachineModel {
@@ -469,6 +473,67 @@ class SchedAlias<SchedReadWrite match, SchedReadWrite alias> {
   SchedMachineModel SchedModel = ?;
 }
 
+// Base class of MacroFusionPredicate, etc. The avaliable variables are:
+// * const TargetInstrInfo &TII
+// * const TargetSubtargetInfo &STI
+// * const MachineRegisterInfo &MRI
+// * const MachineInstr *FirstMI
+// * const MachineInstr &SecondMI
+class MacroFusionPredicateBase;
+
+// MacroFusionPredicate with raw code predicate.
+class MacroFusionPredicate<code pred> : MacroFusionPredicateBase {
+  code Predicate = pred;
+}
+
+// Tie firstOpIdx and secondOpIdx. The operand of `FirstMI` at position
+// `firstOpIdx` should be the same as the operand of `SenondMI` at position
+// `secondOpIdx`.
+class TieReg<int firstOpIdx, int secondOpIdx> : MacroFusionPredicateBase {
+  int FirstOpIdx = firstOpIdx;
+  int SecondOpIdx = secondOpIdx;
+}
+
+// A predicate for wildcard. The generated code will be like:
+// ```
+// if (!FirstMI)
+//   return ReturnValue;
+// ```
+class WildcardPred<bit ret> : MacroFusionPredicateBase {
+  bit ReturnValue = ret;
+}
+def WildcardFalse : WildcardPred<0>;
+def WildcardTrue : WildcardPred<1>;
+
+// Indicates that the destination register of `FirstMI` should be have one
+// use if it is an virtual register.
+class OneUsePred : MacroFusionPredicateBase;
+def OneUse : OneUsePred;
+
+// Handled by MacroFusionPredicatorEmitter backend.
+// The generated predicator will be like:
+// ```
+// bool isNAME(const TargetInstrInfo &TII,
+//             const TargetSubtargetInfo &STI,
+//             const MachineInstr *FirstMI,
+//             const MachineInstr &SecondMI) {
+//   auto &MRI = SecondMI.getMF()->getRegInfo();
+//   /* Prolog */
+//   /* Predicate for `FirstMI` */
+//   /* Predicate for `SecondMI` */
+//   /* Epilog */
+//   return true;
+// }
+// ```
+class MacroFusion<MCInstPredicate first, MCInstPredicate second,
+                  list<MacroFusionPredicateBase> prolog = [],
+                  list<MacroFusionPredicateBase> epilog = []> {
+  MCInstPredicate First = first;
+  MCInstPredicate Second = second;
+  list<MacroFusionPredicateBase> Prolog = prolog;
+  list<MacroFusionPredicateBase> Epilog = epilog;
+}
+
 // Allow the definition of processor register files for register renaming
 // purposes.
 //
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 4add33ba0996af0..d0cc006ff2598cc 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -3791,6 +3791,11 @@ ScheduleDAGMILive *llvm::createGenericSchedLive(MachineSchedContext *C) {
   // data and pass it to later mutations. Have a single mutation that gathers
   // the interesting nodes in one pass.
   DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
+
+  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
+  // Add MacroFusionDAGMutation if enabled.
+  if (STI.enableMacroFusion())
+    DAG->addMutation(createMacroFusionDAGMutation(STI.getMacroFusions()));
   return DAG;
 }
 
@@ -3940,8 +3945,14 @@ void PostGenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {
 }
 
 ScheduleDAGMI *llvm::createGenericSchedPostRA(MachineSchedContext *C) {
-  return new ScheduleDAGMI(C, std::make_unique<PostGenericScheduler>(C),
-                           /*RemoveKillFlags=*/true);
+  ScheduleDAGMI *DAG =
+      new ScheduleDAGMI(C, std::make_unique<PostGenericScheduler>(C),
+                        /*RemoveKillFlags=*/true);
+  const TargetSubtargetInfo &STI = C->MF->getSubtarget();
+  // Add MacroFusionDAGMutation if enabled.
+  if (STI.enableMacroFusion())
+    DAG->addMutation(createMacroFusionDAGMutation(STI.getMacroFusions()));
+  return DAG;
 }
 
 //===----------------------------------------------------------------------===//
diff --git a/llvm/lib/CodeGen/MacroFusion.cpp b/llvm/lib/CodeGen/MacroFusion.cpp
index fa5df68b8abcc0f..1ce2f49763b076f 100644
--- a/llvm/lib/CodeGen/MacroFusion.cpp
+++ b/llvm/lib/CodeGen/MacroFusion.cpp
@@ -137,19 +137,35 @@ namespace {
 /// Post-process the DAG to create cluster edges between instrs that may
 /// be fused by the processor into a single operation.
 class MacroFusion : public ScheduleDAGMutation {
-  ShouldSchedulePredTy shouldScheduleAdjacent;
+  std::vector<MacroFusionPredTy> Predicates;
   bool FuseBlock;
   bool scheduleAdjacentImpl(ScheduleDAGInstrs &DAG, SUnit &AnchorSU);
 
 public:
-  MacroFusion(ShouldSchedulePredTy shouldScheduleAdjacent, bool FuseBlock)
-    : shouldScheduleAdjacent(shouldScheduleAdjacent), FuseBlock(FuseBlock) {}
+  MacroFusion(std::vector<MacroFusionPredTy> Predicates, bool FuseBlock)
+      : Predicates(std::move(Predicates)), FuseBlock(FuseBlock) {}
 
   void apply(ScheduleDAGInstrs *DAGInstrs) override;
+
+  bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
+                              const TargetSubtargetInfo &STI,
+                              const MachineInstr *FirstMI,
+                              const MachineInstr &SecondMI);
 };
 
 } // end anonymous namespace
 
+bool MacroFusion::shouldScheduleAdjacent(const TargetInstrInfo &TII,
+                                         const TargetSubtargetInfo &STI,
+                                         const MachineInstr *FirstMI,
+                                         const MachineInstr &SecondMI) {
+  for (MacroFusionPredTy Predicate : Predicates) {
+    if (Predicate(TII, STI, FirstMI, SecondMI))
+      return true;
+  }
+  return false;
+}
+
 void MacroFusion::apply(ScheduleDAGInstrs *DAG) {
   if (FuseBlock)
     // For each of the SUnits in the scheduling block, try to fuse the instr in
@@ -197,17 +213,16 @@ bool MacroFusion::scheduleAdjacentImpl(ScheduleDAGInstrs &DAG, SUnit &AnchorSU)
 }
 
 std::unique_ptr<ScheduleDAGMutation>
-llvm::createMacroFusionDAGMutation(
-     ShouldSchedulePredTy shouldScheduleAdjacent) {
-  if(EnableMacroFusion)
-    return std::make_unique<MacroFusion>(shouldScheduleAdjacent, true);
+llvm::createMacroFusionDAGMutation(std::vector<MacroFusionPredTy> Predicates) {
+  if (EnableMacroFusion) {
+    return std::make_unique<MacroFusion>(Predicates, true);
+  }
   return nullptr;
 }
 
-std::unique_ptr<ScheduleDAGMutation>
-llvm::createBranchMacroFusionDAGMutation(
-     ShouldSchedulePredTy shouldScheduleAdjacent) {
-  if(EnableMacroFusion)
-    return std::make_unique<MacroFusion>(shouldScheduleAdjacent, false);
+std::unique_ptr<ScheduleDAGMutation> llvm::createBranchMacroFusionDAGMutation(
+    std::vector<MacroFusionPredTy> Predicates) {
+  if (EnableMacroFusion)
+    return std::make_unique<MacroFusion>(Predicates, false);
   return nullptr;
 }
diff --git a/llvm/lib/MC/MCSchedule.cpp b/llvm/lib/MC/MCSchedule.cpp
index 990a693559a7776..19c36cb0e58d9c1 100644
--- a/llvm/lib/MC/MCSchedule.cpp
+++ b/llvm/lib/MC/MCSchedule.cpp
@@ -37,6 +37,7 @@ const MCSchedModel MCSchedModel::Default = {DefaultIssueWidth,
                                             0,
                                             0,
                                             nullptr,
+                                            nullptr,
                                             nullptr};
 
 int MCSchedModel::computeInstrLatency(const MCSubtargetInfo &STI,
diff --git a/llvm/lib/Target/AArch64/AArch64MacroFusion.cpp b/llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
index 05d60872bf51aca..8f46f3eabb3ef45 100644
--- a/llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
+++ b/llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
@@ -478,5 +478,5 @@ static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
 
 std::unique_ptr<ScheduleDAGMutation>
 llvm::createAArch64MacroFusionDAGMutation() {
-  return createMacroFusionDAGMutation(shouldScheduleAdjacent);
+  return createMacroFusionDAGMutation({shouldScheduleAdjacent});
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp
index c15c94ee17f8b1d..b2b11d661523e9c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMacroFusion.cpp
@@ -59,8 +59,8 @@ static bool shouldScheduleAdjacent(const TargetInstrInfo &TII_,
 
 namespace llvm {
 
-std::unique_ptr<ScheduleDAGMutation> createAMDGPUMacroFusionDAGMutation () {
-  return createMacroFusionDAGMutation(shouldScheduleAdjacent);
+std::unique_ptr<ScheduleDAGMutation> createAMDGPUMacroFusionDAGMutation() {
+  return createMacroFusionDAGMutation({shouldScheduleAdjacent});
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
index 29c9b9ccf27614f..0bddeeef9e9b1a3 100644
--- a/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNVOPDUtils.cpp
@@ -142,10 +142,10 @@ namespace {
 /// be turned into VOPD instructions
 /// Greedily pairs instruction candidates. O(n^2) algorithm.
 struct VOPDPairingMutation : ScheduleDAGMutation {
-  ShouldSchedulePredTy shouldScheduleAdjacent; // NOLINT: function pointer
+  MacroFusionPredTy shouldScheduleAdjacent; // NOLINT: function pointer
 
   VOPDPairingMutation(
-      ShouldSchedulePredTy shouldScheduleAdjacent) // NOLINT: function pointer
+      MacroFusionPredTy shouldScheduleAdjacent) // NOLINT: function pointer
       : shouldScheduleAdjacent(shouldScheduleAdjacent) {}
 
   void apply(ScheduleDAGInstrs *DAG) override {
diff --git a/llvm/lib/Target/ARM/ARMMacroFusion.cpp b/llvm/lib/Target/ARM/ARMMacroFusion.cpp
index 38bf28ba8219b90..7de117925e464fe 100644
--- a/llvm/lib/Target/ARM/ARMMacroFusion.cpp
+++ b/llvm/lib/Target/ARM/ARMMacroFusion.cpp
@@ -62,8 +62,8 @@ static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
   return false;
 }
 
-std::unique_ptr<ScheduleDAGMutation> createARMMacroFusionDAGMutation () {
-  return createMacroFusionDAGMutation(shouldScheduleAdjacent);
+std::unique_ptr<ScheduleDAGMutation> createARMMacroFusionDAGMutation() {
+  return createMacroFusionDAGMutation({shouldScheduleAdjacent});
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/PowerPC/PPCMacroFusion.cpp b/llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
index bf1c39a3a3a2d47..d6a4a5dd5faabae 100644
--- a/llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
+++ b/llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
@@ -286,8 +286,8 @@ static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
 
 namespace llvm {
 
-std::unique_ptr<ScheduleDAGMutation> createPowerPCMacroFusionDAGMutation () {
-  return createMacroFusionDAGMutation(shouldScheduleAdjacent);
+std::unique_ptr<ScheduleDAGMutation> createPowerPCMacroFusionDAGMutation() {
+  return createMacroFusionDAGMutation({shouldScheduleAdjacent});
 }
 
 } // end namespace llvm
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index a0c3345ec1bbd7e..ac88cd49db4e4ba 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -5,6 +5,7 @@ set(LLVM_TARGET_DEFINITIONS RISCV.td)
 tablegen(LLVM RISCVGenAsmMatcher.inc -gen-asm-matcher)
 tablegen(LLVM RISCVGenAsmWriter.inc -gen-asm-writer)
 tablegen(LLVM RISCVGenCompressInstEmitter.inc -gen-compress-inst-emitter)
+tablegen(LLVM RISCVGenMacroFusion.inc -gen-macro-fusion-pred)
 tablegen(LLVM RISCVGenDAGISel.inc -gen-dag-isel)
 tablegen(LLVM RISCVGenDisassemblerTables.inc -gen-disassembler)
 tablegen(LLVM RISCVGenInstrInfo.inc -gen-instr-info)
@@ -43,7 +44,6 @@ add_llvm_target(RISCVCodeGen
   RISCVISelDAGToDAG.cpp
   RISCVISelLowering.cpp
   RISCVMachineFunctionInfo.cpp
-  RISCVMacroFusion.cpp
   RISCVMergeBaseOffset.cpp
   RISCVOptWInstrs.cpp
   RISCVPostRAExpandPseudoInsts.cpp
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h
index 3cfddb530cdf630..f3f27efbee95f7d 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.h
@@ -48,6 +48,9 @@ std::unique_ptr<MCObjectTargetWriter> createRISCVELFObjectWriter(uint8_t OSABI,
 #define GET_INSTRINFO_MC_HELPER_DECLS
 #include "RISCVGenInstrInfo.inc"
 
+#define GET_MACRO_FUSION_ENUM
+#include "RISCVGenMacroFusion.inc"
+
 #define GET_SUBTARGETINFO_ENUM
 #include "RISCVGenSubtargetInfo.inc"
 
diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
index 4358a5b878e6316..dbc286288d2c897 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVMatInt.cpp
@@ -236,7 +236,7 @@ InstSeq generateInstSeq(int64_t Val, const MCSubtargetInfo &STI) {
     // NOTE: We don't check for C extension to minimize differences in generated
     // code.
     bool IsShiftedCompressible =
-        isInt<6>(ShiftedVal) && !STI.hasFeature(RISCV::TuneLUIADDIFusion);
+        isInt<6>(ShiftedVal) && !STI.hasMacroFusion(RISCV::LUIADDI);
     RISCVMatInt::InstSeq TmpSeq;
     generateInstSeqImpl(ShiftedVal, STI, TmpSeq);
 
diff --git a/llvm/lib/Target/RISCV/RISCV.td b/llvm/lib/Target/RISCV/RISCV.td
index be93d5933d3329e..8c7f5b4bb91e891 100644
--- a/llvm/lib/Target/RISCV/RISCV.td
+++ b/llvm/lib/Target/RISCV/RISCV.td
@@ -34,6 +34,7 @@ include "GISel/RISCVRegisterBanks.td"
 // RISC-V Scheduling Models
 //===---------------------------------...
[truncated]

@dtcxzyw
Copy link
Member

dtcxzyw commented Nov 15, 2023

cc: @poemonsense

llvm/lib/Target/RISCV/RISCVMacroFusion.td Outdated Show resolved Hide resolved
llvm/lib/Target/RISCV/RISCVMacroFusion.td Outdated Show resolved Hide resolved
llvm/lib/Target/RISCV/RISCVMacroFusion.td Outdated Show resolved Hide resolved
@wangpc-pp wangpc-pp force-pushed the main-tablegen-macro-fusion-xiangshan branch from 5876e1c to addafc8 Compare November 20, 2023 03:47
Copy link

github-actions bot commented Nov 20, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

[CheckImmOperand<2, 16>],
[CheckImmOperand<2, 16>]>;

// sign-extend a 16-bit number: slliw r1, r0, 16 + sraiw r1, r1, 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LLVM generate this pattern? I think think we always use slli 48 + srai 48. Is that something that needs to be changed in isel?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible if the corresponding C program is using int32? I'm not sure. Just to raise a question

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now I don't think it's possible and if I knew of a case I'd probably change it to slli+srai since we have c.slli and c.srai but not c.slliw/c.sraiw.

[CheckImmOperand<2, 48>],
[CheckImmOperand<2, 48>]>;

// clear upper 48 bits / get lower 16 bits: slliw r1, r0, 16 + srliw r1, r1, 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does LLVM generate this pattern? I think think we always use slli 48 + srli 48.


// addw and extract its lower 8 bits (fused into addwbyte)
// addw and extract its lower 1 bit (fused into addwbit)
def AddAndExtractNBits : RISCVMacroFusion<[ADDW], [ANDI], [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would emit ADD+ANDI instead of ADDW since ADD is more compressible than ADDW. Is that something we need to change to enable this macro fusion?


// addw and zext.h (fused into addwzexth)
// addw and sext.h (fused into addwsexth)
def AddwAndExt : RISCVMacroFusion<[ADDW], [ZEXT_H_RV32, ZEXT_H_RV64, SEXT_H]>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, we would emit ADD not ADDW.

@wangpc-pp
Copy link
Contributor Author

cc @dtcxzyw @poemonsense
Can you address @topperc's comments? It seems that some fusions don't match instruction patterns LLVM generated. Is it because you are using GCC when profiling?
It seems that some ISel patterns need to be changed to support these fusions.

@poemonsense
Copy link

poemonsense commented Nov 22, 2023

cc @dtcxzyw @poemonsense Can you address @topperc's comments? It seems that some fusions don't match instruction patterns LLVM generated. Is it because you are using GCC when profiling? It seems that some ISel patterns need to be changed to support these fusions.

Yes, I can confirm that we were using GCC (version 9.2 or 10.2) in 2021 to decide these fusions (and thus Nanhu implements them). If they will not be emitted by LLVM, I think it makes sense to remove them in LLVM?

I was planning to try LLVM to test the fusions and instruction latency hints this week. But I still did not get enough time to do this. Sorry for that.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs update to tip of tree

@wangpc-pp wangpc-pp force-pushed the main-tablegen-macro-fusion-xiangshan branch from 990d735 to 35de0b9 Compare January 30, 2024 12:36
@wangpc-pp wangpc-pp force-pushed the main-tablegen-macro-fusion-xiangshan branch from 35de0b9 to d149cef Compare March 19, 2024 11:58
@wangpc-pp wangpc-pp requested a review from dtcxzyw March 19, 2024 12:00
@wangpc-pp
Copy link
Contributor Author

I added some MIR tests, but I think I need more data about performance. cc @dtcxzyw @poemonsense
And, we may not need to support some fusions in LLVM since the patterns don't match what LLVM generates.

Doc: https://xiangshan-doc.readthedocs.io/zh-cn/latest/frontend/decode/

This PR is to show the usage of TableGen-based macro fusions.

Some instrcution pairs can be folded into one MacroFusion definition
but I leave them standalone to show the different ways to define a
macro fusion.
@poemonsense
Copy link

I added some MIR tests, but I think I need more data about performance. cc @dtcxzyw @poemonsense And, we may not need to support some fusions in LLVM since the patterns don't match what LLVM generates.

I'm glad to provide help for testing on RTL-simulation and real chips. We've already got the SoC board back.

@wangpc-pp wangpc-pp force-pushed the main-tablegen-macro-fusion-xiangshan branch from d149cef to e59fd14 Compare March 19, 2024 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants