Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen][MISched] Add misched post-regalloc bidirectional scheduling #77138

Merged

Conversation

michaelmaitland
Copy link
Contributor

This PR is stacked on #76186.

This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
-misched-postra-direction=bidirectional.

It is up to targets to decide whether they would like to enable this
option for themselves.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 5, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Changes

This PR is stacked on #76186.

This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
-misched-postra-direction=bidirectional.

It is up to targets to decide whether they would like to enable this
option for themselves.


Patch is 29.23 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77138.diff

7 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachineScheduler.h (+19-9)
  • (modified) llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h (+12)
  • (modified) llvm/lib/CodeGen/MachineScheduler.cpp (+195-25)
  • (modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+1)
  • (modified) llvm/test/CodeGen/RISCV/machine-combiner.ll (+4-4)
  • (added) llvm/test/CodeGen/RISCV/misched-postra-direction.mir (+75)
  • (modified) llvm/test/CodeGen/RISCV/short-forward-branch-opt.ll (+32-32)
diff --git a/llvm/include/llvm/CodeGen/MachineScheduler.h b/llvm/include/llvm/CodeGen/MachineScheduler.h
index 9f16cf5d5bc387..af18e983030188 100644
--- a/llvm/include/llvm/CodeGen/MachineScheduler.h
+++ b/llvm/include/llvm/CodeGen/MachineScheduler.h
@@ -1293,19 +1293,24 @@ class PostGenericScheduler : public GenericSchedulerBase {
 protected:
   ScheduleDAGMI *DAG = nullptr;
   SchedBoundary Top;
-  SmallVector<SUnit*, 8> BotRoots;
+  SchedBoundary Bot;
+  MachineSchedPolicy RegionPolicy;
+
+  /// Candidate last picked from Top boundary.
+  SchedCandidate TopCand;
+  /// Candidate last picked from Bot boundary.
+  SchedCandidate BotCand;
 
 public:
-  PostGenericScheduler(const MachineSchedContext *C):
-    GenericSchedulerBase(C), Top(SchedBoundary::TopQID, "TopQ") {}
+  PostGenericScheduler(const MachineSchedContext *C)
+      : GenericSchedulerBase(C), Top(SchedBoundary::TopQID, "TopQ"),
+        Bot(SchedBoundary::BotQID, "BotQ") {}
 
   ~PostGenericScheduler() override = default;
 
   void initPolicy(MachineBasicBlock::iterator Begin,
                   MachineBasicBlock::iterator End,
-                  unsigned NumRegionInstrs) override {
-    /* no configurable policy */
-  }
+                  unsigned NumRegionInstrs) override;
 
   /// PostRA scheduling does not track pressure.
   bool shouldTrackPressure() const override { return false; }
@@ -1316,6 +1321,8 @@ class PostGenericScheduler : public GenericSchedulerBase {
 
   SUnit *pickNode(bool &IsTopNode) override;
 
+  SUnit *pickNodeBidirectional(bool &IsTopNode);
+
   void scheduleTree(unsigned SubtreeID) override {
     llvm_unreachable("PostRA scheduler does not support subtree analysis.");
   }
@@ -1326,17 +1333,20 @@ class PostGenericScheduler : public GenericSchedulerBase {
     if (SU->isScheduled)
       return;
     Top.releaseNode(SU, SU->TopReadyCycle, false);
+    TopCand.SU = nullptr;
   }
 
-  // Only called for roots.
   void releaseBottomNode(SUnit *SU) override {
-    BotRoots.push_back(SU);
+    if (SU->isScheduled)
+      return;
+    Bot.releaseNode(SU, SU->BotReadyCycle, false);
+    BotCand.SU = nullptr;
   }
 
 protected:
   virtual bool tryCandidate(SchedCandidate &Cand, SchedCandidate &TryCand);
 
-  void pickNodeFromQueue(SchedCandidate &Cand);
+  void pickNodeFromQueue(SchedBoundary &Zone, SchedCandidate &Cand);
 };
 
 /// Create the standard converging machine scheduler. This will be used as the
diff --git a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
index ef7662a8e7a26a..85de18f5169e5e 100644
--- a/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
+++ b/llvm/include/llvm/CodeGen/ScheduleDAGInstrs.h
@@ -191,7 +191,19 @@ namespace llvm {
     /// applicable).
     using SUList = std::list<SUnit *>;
 
+    /// The direction that should be used to dump the scheduled Sequence.
+    enum DumpDirection {
+      TopDown,
+      BottomUp,
+      Bidirectional,
+      NotSet,
+    };
+
+    void setDumpDirection(DumpDirection D) { DumpDir = D; }
+
   protected:
+    DumpDirection DumpDir = NotSet;
+
     /// A map from ValueType to SUList, used during DAG construction, as
     /// a means of remembering which SUs depend on which memory locations.
     class Value2SUsMap;
diff --git a/llvm/lib/CodeGen/MachineScheduler.cpp b/llvm/lib/CodeGen/MachineScheduler.cpp
index 886137d86f87de..b9e6f5c042b551 100644
--- a/llvm/lib/CodeGen/MachineScheduler.cpp
+++ b/llvm/lib/CodeGen/MachineScheduler.cpp
@@ -81,6 +81,26 @@ cl::opt<bool> ForceTopDown("misched-topdown", cl::Hidden,
                            cl::desc("Force top-down list scheduling"));
 cl::opt<bool> ForceBottomUp("misched-bottomup", cl::Hidden,
                             cl::desc("Force bottom-up list scheduling"));
+namespace MISchedPostRASched {
+enum Direction {
+  TopDown,
+  BottomUp,
+  Bidirectional,
+};
+} // end namespace MISchedPostRASched
+cl::opt<MISchedPostRASched::Direction> PostRADirection(
+    "misched-postra-direction", cl::Hidden,
+    cl::desc("PostRA List scheduling direction"),
+    // Default to top-down because it was implemented first and existing targets
+    // expect that behavior by default.
+    cl::init(MISchedPostRASched::TopDown),
+    cl::values(
+        clEnumValN(MISchedPostRASched::TopDown, "topdown",
+                   "Force top-down post reg-alloc list scheduling"),
+        clEnumValN(MISchedPostRASched::BottomUp, "bottomup",
+                   "Force bottom-up post reg-alloc list scheduling"),
+        clEnumValN(MISchedPostRASched::Bidirectional, "bidirectional",
+                   "Force bidirectional post reg-alloc list scheduling")));
 cl::opt<bool>
 DumpCriticalPathLength("misched-dcpl", cl::Hidden,
                        cl::desc("Print critical path length to stdout"));
@@ -440,6 +460,14 @@ bool MachineScheduler::runOnMachineFunction(MachineFunction &mf) {
   // Instantiate the selected scheduler for this target, function, and
   // optimization level.
   std::unique_ptr<ScheduleDAGInstrs> Scheduler(createMachineScheduler());
+  ScheduleDAGMI::DumpDirection Dir;
+  if (ForceTopDown)
+    Dir = ScheduleDAGMI::DumpDirection::TopDown;
+  else if (ForceBottomUp)
+    Dir = ScheduleDAGMI::DumpDirection::BottomUp;
+  else
+    Dir = ScheduleDAGMI::DumpDirection::Bidirectional;
+  Scheduler->setDumpDirection(Dir);
   scheduleRegions(*Scheduler, false);
 
   LLVM_DEBUG(LIS->dump());
@@ -473,6 +501,15 @@ bool PostMachineScheduler::runOnMachineFunction(MachineFunction &mf) {
   // Instantiate the selected scheduler for this target, function, and
   // optimization level.
   std::unique_ptr<ScheduleDAGInstrs> Scheduler(createPostMachineScheduler());
+
+  ScheduleDAGInstrs::DumpDirection D;
+  if (PostRADirection == MISchedPostRASched::TopDown)
+    D = ScheduleDAGMI::DumpDirection::TopDown;
+  else if (PostRADirection == MISchedPostRASched::BottomUp)
+    D = ScheduleDAGMI::DumpDirection::BottomUp;
+  else
+    D = ScheduleDAGMI::DumpDirection::Bidirectional;
+  Scheduler->setDumpDirection(D);
   scheduleRegions(*Scheduler, true);
 
   if (VerifyScheduling)
@@ -1125,12 +1162,14 @@ LLVM_DUMP_METHOD void ScheduleDAGMI::dumpScheduleTraceBottomUp() const {
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 LLVM_DUMP_METHOD void ScheduleDAGMI::dumpSchedule() const {
   if (MISchedDumpScheduleTrace) {
-    if (ForceTopDown)
+    if (DumpDir == DumpDirection::TopDown)
       dumpScheduleTraceTopDown();
-    else if (ForceBottomUp)
+    else if (DumpDir == DumpDirection::BottomUp)
       dumpScheduleTraceBottomUp();
-    else {
+    else if (DumpDir == DumpDirection::Bidirectional) {
       dbgs() << "* Schedule table (Bidirectional): not implemented\n";
+    } else {
+      dbgs() << "* Schedule table: DumpDirection not set.\n";
     }
   }
 
@@ -3817,7 +3856,7 @@ void PostGenericScheduler::initialize(ScheduleDAGMI *Dag) {
 
   Rem.init(DAG, SchedModel);
   Top.init(DAG, SchedModel, &Rem);
-  BotRoots.clear();
+  Bot.init(DAG, SchedModel, &Rem);
 
   // Initialize the HazardRecognizers. If itineraries don't exist, are empty,
   // or are disabled, then these HazardRecs will be disabled.
@@ -3827,13 +3866,33 @@ void PostGenericScheduler::initialize(ScheduleDAGMI *Dag) {
         DAG->MF.getSubtarget().getInstrInfo()->CreateTargetMIHazardRecognizer(
             Itin, DAG);
   }
+  if (!Bot.HazardRec) {
+    Bot.HazardRec =
+        DAG->MF.getSubtarget().getInstrInfo()->CreateTargetMIHazardRecognizer(
+            Itin, DAG);
+  }
+}
+
+void PostGenericScheduler::initPolicy(MachineBasicBlock::iterator Begin,
+                                      MachineBasicBlock::iterator End,
+                                      unsigned NumRegionInstrs) {
+  if (PostRADirection == MISchedPostRASched::TopDown) {
+    RegionPolicy.OnlyTopDown = true;
+    RegionPolicy.OnlyBottomUp = false;
+  } else if (PostRADirection == MISchedPostRASched::BottomUp) {
+    RegionPolicy.OnlyTopDown = false;
+    RegionPolicy.OnlyBottomUp = true;
+  } else if (PostRADirection == MISchedPostRASched::Bidirectional) {
+    RegionPolicy.OnlyBottomUp = false;
+    RegionPolicy.OnlyTopDown = false;
+  }
 }
 
 void PostGenericScheduler::registerRoots() {
   Rem.CriticalPath = DAG->ExitSU.getDepth();
 
   // Some roots may not feed into ExitSU. Check all of them in case.
-  for (const SUnit *SU : BotRoots) {
+  for (const SUnit *SU : Bot.Available) {
     if (SU->getDepth() > Rem.CriticalPath)
       Rem.CriticalPath = SU->getDepth();
   }
@@ -3890,12 +3949,13 @@ bool PostGenericScheduler::tryCandidate(SchedCandidate &Cand,
   return false;
 }
 
-void PostGenericScheduler::pickNodeFromQueue(SchedCandidate &Cand) {
-  ReadyQueue &Q = Top.Available;
+void PostGenericScheduler::pickNodeFromQueue(SchedBoundary &Zone,
+                                             SchedCandidate &Cand) {
+  ReadyQueue &Q = Zone.Available;
   for (SUnit *SU : Q) {
     SchedCandidate TryCand(Cand.Policy);
     TryCand.SU = SU;
-    TryCand.AtTop = true;
+    TryCand.AtTop = Zone.isTop();
     TryCand.initResourceDelta(DAG, SchedModel);
     if (tryCandidate(Cand, TryCand)) {
       Cand.setBest(TryCand);
@@ -3904,32 +3964,137 @@ void PostGenericScheduler::pickNodeFromQueue(SchedCandidate &Cand) {
   }
 }
 
+/// Pick the best candidate node from either the top or bottom queue.
+SUnit *PostGenericScheduler::pickNodeBidirectional(bool &IsTopNode) {
+  // FIXME: This is similiar to GenericScheduler::pickNodeBidirectional. Factor
+  // out common parts.
+
+  // Schedule as far as possible in the direction of no choice. This is most
+  // efficient, but also provides the best heuristics for CriticalPSets.
+  if (SUnit *SU = Bot.pickOnlyChoice()) {
+    IsTopNode = false;
+    tracePick(Only1, false);
+    return SU;
+  }
+  if (SUnit *SU = Top.pickOnlyChoice()) {
+    IsTopNode = true;
+    tracePick(Only1, true);
+    return SU;
+  }
+  // Set the bottom-up policy based on the state of the current bottom zone and
+  // the instructions outside the zone, including the top zone.
+  CandPolicy BotPolicy;
+  setPolicy(BotPolicy, /*IsPostRA=*/false, Bot, &Top);
+  // Set the top-down policy based on the state of the current top zone and
+  // the instructions outside the zone, including the bottom zone.
+  CandPolicy TopPolicy;
+  setPolicy(TopPolicy, /*IsPostRA=*/false, Top, &Bot);
+
+  // See if BotCand is still valid (because we previously scheduled from Top).
+  LLVM_DEBUG(dbgs() << "Picking from Bot:\n");
+  if (!BotCand.isValid() || BotCand.SU->isScheduled ||
+      BotCand.Policy != BotPolicy) {
+    BotCand.reset(CandPolicy());
+    pickNodeFromQueue(Bot, BotCand);
+    assert(BotCand.Reason != NoCand && "failed to find the first candidate");
+  } else {
+    LLVM_DEBUG(traceCandidate(BotCand));
+#ifndef NDEBUG
+    if (VerifyScheduling) {
+      SchedCandidate TCand;
+      TCand.reset(CandPolicy());
+      pickNodeFromQueue(Bot, BotCand);
+      assert(TCand.SU == BotCand.SU &&
+             "Last pick result should correspond to re-picking right now");
+    }
+#endif
+  }
+
+  // Check if the top Q has a better candidate.
+  LLVM_DEBUG(dbgs() << "Picking from Top:\n");
+  if (!TopCand.isValid() || TopCand.SU->isScheduled ||
+      TopCand.Policy != TopPolicy) {
+    TopCand.reset(CandPolicy());
+    pickNodeFromQueue(Top, TopCand);
+    assert(TopCand.Reason != NoCand && "failed to find the first candidate");
+  } else {
+    LLVM_DEBUG(traceCandidate(TopCand));
+#ifndef NDEBUG
+    if (VerifyScheduling) {
+      SchedCandidate TCand;
+      TCand.reset(CandPolicy());
+      pickNodeFromQueue(Top, TopCand);
+      assert(TCand.SU == TopCand.SU &&
+           "Last pick result should correspond to re-picking right now");
+    }
+#endif
+  }
+
+  // Pick best from BotCand and TopCand.
+  assert(BotCand.isValid());
+  assert(TopCand.isValid());
+  SchedCandidate Cand = BotCand;
+  TopCand.Reason = NoCand;
+  if (tryCandidate(Cand, TopCand)) {
+    Cand.setBest(TopCand);
+    LLVM_DEBUG(traceCandidate(Cand));
+  }
+
+  IsTopNode = Cand.AtTop;
+  tracePick(Cand);
+  return Cand.SU;
+}
+
 /// Pick the next node to schedule.
 SUnit *PostGenericScheduler::pickNode(bool &IsTopNode) {
   if (DAG->top() == DAG->bottom()) {
-    assert(Top.Available.empty() && Top.Pending.empty() && "ReadyQ garbage");
+    assert(Top.Available.empty() && Top.Pending.empty() &&
+           Bot.Available.empty() && Bot.Pending.empty() && "ReadyQ garbage");
     return nullptr;
   }
   SUnit *SU;
   do {
-    SU = Top.pickOnlyChoice();
-    if (SU) {
-      tracePick(Only1, true);
+    if (RegionPolicy.OnlyBottomUp) {
+      SU = Bot.pickOnlyChoice();
+      if (SU) {
+        tracePick(Only1, true);
+      } else {
+        CandPolicy NoPolicy;
+        BotCand.reset(NoPolicy);
+        // Set the top-down policy based on the state of the current top zone
+        // and the instructions outside the zone, including the bottom zone.
+        setPolicy(BotCand.Policy, /*IsPostRA=*/true, Bot, nullptr);
+        pickNodeFromQueue(Bot, BotCand);
+        assert(BotCand.Reason != NoCand && "failed to find a candidate");
+        tracePick(BotCand);
+        SU = BotCand.SU;
+      }
+      IsTopNode = false;
+    } else if (RegionPolicy.OnlyTopDown) {
+      SU = Top.pickOnlyChoice();
+      if (SU) {
+        tracePick(Only1, true);
+      } else {
+        CandPolicy NoPolicy;
+        TopCand.reset(NoPolicy);
+        // Set the top-down policy based on the state of the current top zone
+        // and the instructions outside the zone, including the bottom zone.
+        setPolicy(TopCand.Policy, /*IsPostRA=*/true, Top, nullptr);
+        pickNodeFromQueue(Top, TopCand);
+        assert(TopCand.Reason != NoCand && "failed to find a candidate");
+        tracePick(TopCand);
+        SU = TopCand.SU;
+      }
+      IsTopNode = true;
     } else {
-      CandPolicy NoPolicy;
-      SchedCandidate TopCand(NoPolicy);
-      // Set the top-down policy based on the state of the current top zone and
-      // the instructions outside the zone, including the bottom zone.
-      setPolicy(TopCand.Policy, /*IsPostRA=*/true, Top, nullptr);
-      pickNodeFromQueue(TopCand);
-      assert(TopCand.Reason != NoCand && "failed to find a candidate");
-      tracePick(TopCand);
-      SU = TopCand.SU;
+      SU = pickNodeBidirectional(IsTopNode);
     }
   } while (SU->isScheduled);
 
-  IsTopNode = true;
-  Top.removeReady(SU);
+  if (SU->isTopReady())
+    Top.removeReady(SU);
+  if (SU->isBottomReady())
+    Bot.removeReady(SU);
 
   LLVM_DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
                     << *SU->getInstr());
@@ -3939,8 +4104,13 @@ SUnit *PostGenericScheduler::pickNode(bool &IsTopNode) {
 /// Called after ScheduleDAGMI has scheduled an instruction and updated
 /// scheduled/remaining flags in the DAG nodes.
 void PostGenericScheduler::schedNode(SUnit *SU, bool IsTopNode) {
-  SU->TopReadyCycle = std::max(SU->TopReadyCycle, Top.getCurrCycle());
-  Top.bumpNode(SU);
+  if (IsTopNode) {
+    SU->TopReadyCycle = std::max(SU->TopReadyCycle, Top.getCurrCycle());
+    Top.bumpNode(SU);
+  } else {
+    SU->BotReadyCycle = std::max(SU->BotReadyCycle, Bot.getCurrCycle());
+    Bot.bumpNode(SU);
+  }
 }
 
 ScheduleDAGMI *llvm::createGenericSchedPostRA(MachineSchedContext *C) {
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index f531ab2fac8f9f..31745341fe1da1 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -198,6 +198,7 @@ def SiFive7Model : SchedMachineModel {
   let LoadLatency = 3;
   let MispredictPenalty = 3;
   let CompleteModel = 0;
+  let PostRAScheduler = true;
   let EnableIntervals = true;
   let UnsupportedFeatures = [HasStdExtZbkb, HasStdExtZbkc, HasStdExtZbkx,
                              HasStdExtZcmt, HasStdExtZknd, HasStdExtZkne,
diff --git a/llvm/test/CodeGen/RISCV/machine-combiner.ll b/llvm/test/CodeGen/RISCV/machine-combiner.ll
index 7c1792e2f101f5..cfdefec04600c8 100644
--- a/llvm/test/CodeGen/RISCV/machine-combiner.ll
+++ b/llvm/test/CodeGen/RISCV/machine-combiner.ll
@@ -1096,10 +1096,10 @@ declare double @llvm.maxnum.f64(double, double)
 define double @test_fmadd_strategy(double %a0, double %a1, double %a2, double %a3, i64 %flag) {
 ; CHECK_LOCAL-LABEL: test_fmadd_strategy:
 ; CHECK_LOCAL:       # %bb.0: # %entry
-; CHECK_LOCAL-NEXT:    fmv.d fa5, fa0
 ; CHECK_LOCAL-NEXT:    fsub.d fa4, fa0, fa1
-; CHECK_LOCAL-NEXT:    fmul.d fa0, fa4, fa2
 ; CHECK_LOCAL-NEXT:    andi a0, a0, 1
+; CHECK_LOCAL-NEXT:    fmv.d fa5, fa0
+; CHECK_LOCAL-NEXT:    fmul.d fa0, fa4, fa2
 ; CHECK_LOCAL-NEXT:    beqz a0, .LBB76_2
 ; CHECK_LOCAL-NEXT:  # %bb.1: # %entry
 ; CHECK_LOCAL-NEXT:    fmul.d fa4, fa5, fa1
@@ -1110,10 +1110,10 @@ define double @test_fmadd_strategy(double %a0, double %a1, double %a2, double %a
 ;
 ; CHECK_GLOBAL-LABEL: test_fmadd_strategy:
 ; CHECK_GLOBAL:       # %bb.0: # %entry
-; CHECK_GLOBAL-NEXT:    fmv.d fa5, fa0
 ; CHECK_GLOBAL-NEXT:    fsub.d fa4, fa0, fa1
-; CHECK_GLOBAL-NEXT:    fmul.d fa0, fa4, fa2
 ; CHECK_GLOBAL-NEXT:    andi a0, a0, 1
+; CHECK_GLOBAL-NEXT:    fmv.d fa5, fa0
+; CHECK_GLOBAL-NEXT:    fmul.d fa0, fa4, fa2
 ; CHECK_GLOBAL-NEXT:    beqz a0, .LBB76_2
 ; CHECK_GLOBAL-NEXT:  # %bb.1: # %entry
 ; CHECK_GLOBAL-NEXT:    fmul.d fa5, fa5, fa1
diff --git a/llvm/test/CodeGen/RISCV/misched-postra-direction.mir b/llvm/test/CodeGen/RISCV/misched-postra-direction.mir
new file mode 100644
index 00000000000000..425b8984bf985b
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/misched-postra-direction.mir
@@ -0,0 +1,75 @@
+# RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -run-pass=postmisched \
+# RUN:   -debug-only=machine-scheduler -misched-dump-schedule-trace \
+# RUN:   -misched-postra-direction=topdown -o - %s 2>&1 \
+# RUN:   | FileCheck --check-prefix=TOPDOWN %s
+# RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -run-pass=postmisched \
+# RUN:   -debug-only=machine-scheduler -misched-dump-schedule-trace \
+# RUN:   -misched-postra-direction=bottomup -o - %s 2>&1 \
+# RUN:   | FileCheck --check-prefix=BOTTOMUP %s
+# RUN: llc -mtriple=riscv64 -mcpu=sifive-x280 -run-pass=postmisched \
+# RUN:   -debug-only=machine-scheduler -misched-dump-schedule-trace \
+# RUN:   -misched-postra-direction=bidirectional -o - %s 2>&1 \
+# RUN:   | FileCheck --check-prefix=BIDIRECTIONAL %s
+
+# REQUIRES: asserts
+
+---
+name:            add_m2
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $x10, $v8m2, $v10m2, $v12m2, $v14m2
+
+    dead $x0 = PseudoVSETVLI killed renamable $x10, 153 /* e64, m2, tu, ma */, implicit-def $vl, implicit-def $vtype
+    renamable $v8m2 = PseudoVADD_VV_M2 undef renamable $v8m2, killed renamable $v8m2, killed renamable $v10m2, $noreg, 6 /* e64 */, 0 /* tu, mu */, implicit $vl, implicit $vtype, implicit $vl, implicit $vtype
+    renamable $v10m2 = PseudoVADD_VV_M2 undef renamable $v10m2, killed renamable $v14m2, killed renamable $v12m2, $noreg, 6 /* e64 */, 0 /* tu, mu */, implicit $vl, implicit $vtype, implicit $vl, implicit $vtype
+    renamable $v8m2 = PseudoVADD_VV_M2 undef renamable $v8m2, killed renamable $v8m2, killed renamable $v10m2, $noreg, 6 /* e64 */, 0 /* tu, mu */, implicit $vl, implicit $vtype, implicit $vl, implicit $vtype
+    PseudoRET implicit killed $v8m2
+...
+
+# TOPDOWN: *** Final schedule for %bb.0 ***
+# TOPDOWN-NEXT:  * Schedule table (TopDown):
+# TOPDOWN-NEXT:   i: issue
+# TOPDOWN-NEXT:   x: resource booked
+# TOPDOWN-NEXT: Cycle              | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 |
+# TOPDOWN-NEXT: SU(0)              | i  |    |    |    |    |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:       SiFive7PipeA | x  |    |    |    |    |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:      SiFive7PipeAB | x  |    |    |    |    |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT: SU(1)              | i  |    |    |    |    |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:         SiFive7VCQ | x  |    |    |    |    |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:          SiFive7VA |    | x  | x  | x  | x  |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT: SU(2)              |    |    |    |    | i  |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:         SiFive7VCQ |    |    |    |    | x  |    |    |    |    |    |    |    |    |
+# TOPDOWN-NEXT:          SiFive7VA |    |    |  ...
[truncated]

Copy link

github-actions bot commented Jan 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -51,3 +61,6 @@ body: |
# BOTTOMUP-NEXT: SU(1): renamable $x13 = ADD renamable $x11, renamable $x10
# BOTTOMUP-NEXT: SU(0): renamable $x12 = MUL renamable $x11, renamable $x10
# BOTTOMUP-NEXT: SU(2): renamable $x14 = DIVW renamable $x12, renamable $x13

# BIDIRECTIONAL: *** Final schedule for %bb.0 ***
# BIDIRECTIONAL-NEXT: * Schedule table (Bidirectional): not implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it still not implemented? Should we change ScheduleDAGMI::dumpSchedule() too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to reflect on my usage of the topdown table and the bottomup table. They were useful in helping me understand (a) resource utilization of the final schedule, (b) where the pipeline faced stalls, and (c) where pipeline was able to take advantage of parallelism. I think that llvm-mca could be a good candidate to extract this kind of information as well, although I concede that the tables here allow us to work on pseudo instructions and virtual registers, which does communicate some finer grained information.

I assumed the reason we do not model bidirectional table is because there is ambiguity in how it might be presented. There are three main concerns I can think of:

(1) The topdown print out starts at cycle 0 and each subsequent cycle increases by 1. For bottomup it starts at cycle N and counts down to cycle zero. We would need to have a useful way to impose the two on top of each other for bidirectional. I initially thought about doing something like:

topdown:  1 2 3 4
bottomup: 4 3 2 1

but I have noticed that the number of topdown cycles is not always equal to the number of bottom up cycles.

(2) For the sake of the point I am going to make, assume a case that the number of top down cycles is equal to the number of bottom up cycles: I predict that a bottom up view could model an instruction issued on different absolute cycle than the topdown view modeled the same instruction to be issued.

topdown:  1 2 3 4
bottomup: 4 3 2 1
absolute:   0 1 2 3

ADD            i i  // <======= maybe the topdown view had the add issued on absolute cycle 0 but bottomup view had the add issued on absolute cycle 1.

It may be tricky to represent both views at the same time for these two reasons. How do you model subsequent instructions interacting with both possibilities?

(3) In the bidirectional scheduler there is the notion that you could schedule from top or bottom each time you pick a node to be scheduled. Would such a table be able to capture when the decision came from a top choice or a bottom choice? This information is captured in the debug output that precedes the table. Would this data impact the way the table is generated?

In summary, there is some ambiguity on what the table would look like. I think that llvm-mca might be able to communicate some of the information that these tables intend to communicate, but not all of it. I am open to us adding this bidirectional table in the future, but we need to think how the data would be presented.

@michaelmaitland
Copy link
Contributor Author

ping

// Set the bottom-up policy based on the state of the current bottom zone and
// the instructions outside the zone, including the top zone.
CandPolicy BotPolicy;
setPolicy(BotPolicy, /*IsPostRA=*/false, Bot, &Top);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is IsPostRA false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this!

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits.
These changes are straightforward, but please wait for some days before landing.

@@ -497,11 +501,13 @@ bool PostMachineScheduler::runOnMachineFunction(MachineFunction &mf) {
// Instantiate the selected scheduler for this target, function, and
// optimization level.
std::unique_ptr<ScheduleDAGInstrs> Scheduler(createPostMachineScheduler());
ScheduleDAGMI::DumpDirection D;
ScheduleDAGInstrs::DumpDirection D;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change ScheduleDAGMI to ScheduleDAGInstrs?

@@ -51,3 +61,6 @@ body: |
# BOTTOMUP-NEXT: SU(1): renamable $x13 = ADD renamable $x11, renamable $x10
# BOTTOMUP-NEXT: SU(0): renamable $x12 = MUL renamable $x11, renamable $x10
# BOTTOMUP-NEXT: SU(2): renamable $x14 = DIVW renamable $x12, renamable $x13

# BIDIRECTIONAL: *** Final schedule for %bb.0 ***
# BIDIRECTIONAL-NEXT: * Schedule table (Bidirectional): not implemented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can test SU nodes here, just like llvm/test/CodeGen/AArch64/dump-schedule-trace.mir.

This PR is stacked on llvm#76186.

This PR keeps the default strategy as top-down since that is what
existing targets expect. It can be enabled using
`-misched-postra-direction=bidirectional`.

It is up to targets to decide whether they would like to enable this
option for themselves.
Copy link

✅ With the latest revision this PR passed the Python code formatter.

@michaelmaitland michaelmaitland merged commit 865294b into llvm:main Mar 25, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants