[llvm-mca] Account for AcquireAtCycles in llvm-mca #80742

Open
wants to merge 2 commits into main

Conversation

@michaelmaitland (Contributor) commented Feb 5, 2024

In the past, there was a variable named ResourceCycles which modeled how long a resource was consumed. Then a variable named AcquireAtCycles was added, allowing the scheduler to specify how many cycles after issuing the instruction the resource was acquired. Relative to the issue cycle, the resource was consumed from AcquireAtCycle until cycle ResourceCycles.

We decided ResourceCycles should be renamed ReleaseAtCycle, and the number of cycles a resource is consumed is ReleaseAtCycle - AcquireAtCycle. This renaming happened globally, but only the scheduler was updated to account for AcquireAtCycle.

This patch makes llvm-mca calculate the total number of consumed cycles as ReleaseAtCycle - AcquireAtCycle. It also renames the class ReleaseAtCycles to NumCyclesUsed, since that better describes what the class now models.
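As a rough standalone illustration of the new accounting (hypothetical names, not the actual MCA code):

#include <cstdio>

// Hypothetical stand-in for a write-proc-res entry with acquire/release cycles.
struct ProcResUse {
  unsigned AcquireAtCycle; // first cycle, relative to issue, at which the resource is held
  unsigned ReleaseAtCycle; // cycle, relative to issue, at which the resource is released
};

int main() {
  ProcResUse U{/*AcquireAtCycle=*/1, /*ReleaseAtCycle=*/3};
  unsigned Before = U.ReleaseAtCycle;                    // old accounting: 3 cycles
  unsigned After = U.ReleaseAtCycle - U.AcquireAtCycle;  // this patch: 2 cycles
  std::printf("old count = %u, new count = %u\n", Before, After);
  return 0;
}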

In the past, there was a variable named ResourceCycles which modeled how
long a resource was consumed. Then a variable named AcquireAtCycles was
added and allowed the scheduler to specify how many cycles after "cycle
0" the resource was acquired. It was consumed from that AcquireAtCycle
until the cycle ResourceCycles - AcquireAtCycle.

We decided ResourceCycles should be renamed ReleaseAtCycle, and the
number of cycles a resource was consumed is ReleaseAtCycle -
AcquireAtCycle. This renaming happened globally, but only the scheduler
was updated to account for AcquireAtCycle.

This patch accounts for the total number of cycles to be calculated as
ReleaseAtCycle - AcquireAtCycle in llvm-mca. This patch renames the
class ReleaseAtCycles to NumCyclesUsed since that better describes what that
class now models.
@llvmbot (Collaborator) commented Feb 5, 2024

@llvm/pr-subscribers-tools-llvm-mca

Author: Michael Maitland (michaelmaitland)

Changes

Patch is 263.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80742.diff

37 Files Affected:

  • (modified) llvm/include/llvm/MCA/HWEventListener.h (+1-1)
  • (modified) llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h (+1-1)
  • (modified) llvm/include/llvm/MCA/HardwareUnits/Scheduler.h (+2-2)
  • (modified) llvm/include/llvm/MCA/Support.h (+5-5)
  • (modified) llvm/lib/MCA/HardwareUnits/ResourceManager.cpp (+4-4)
  • (modified) llvm/lib/MCA/HardwareUnits/Scheduler.cpp (+2-2)
  • (modified) llvm/lib/MCA/InstrBuilder.cpp (+2-2)
  • (modified) llvm/lib/MCA/Stages/InstructionTables.cpp (+2-2)
  • (modified) llvm/lib/MCA/Support.cpp (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s (+106-106)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-store.s (+81-81)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-x0.s (+29-29)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/vector-integer-arithmetic.s (+373-373)
  • (modified) llvm/test/tools/llvm-mca/RISCV/different-lmul-instruments.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/different-sew-instruments.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/disable-im.s (+15-15)
  • (modified) llvm/test/tools/llvm-mca/RISCV/fractional-lmul-data.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-at-start.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-middle.s (+9-9)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-straddles-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-lmul-instruments.s (+19-19)
  • (modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-sew-instruments.s (+8-8)
  • (modified) llvm/test/tools/llvm-mca/RISCV/needs-sew-but-only-lmul.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/no-vsetvli-to-start.s (+9-9)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-at-start.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-middle.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-straddles-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vle-vse.s (+205-205)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-instrument.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-sew-instrument.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-instrument.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-sew-instrument.s (+5-5)
  • (modified) llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp (+3-3)
  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.cpp (+2-2)
  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.h (+1-1)
diff --git a/llvm/include/llvm/MCA/HWEventListener.h b/llvm/include/llvm/MCA/HWEventListener.h
index a27b1f12e6a6e..adcf6e152d298 100644
--- a/llvm/include/llvm/MCA/HWEventListener.h
+++ b/llvm/include/llvm/MCA/HWEventListener.h
@@ -63,7 +63,7 @@ class HWInstructionEvent {
 // ResourceRef::second is a bitmask of the referenced sub-unit of the resource.
 using ResourceRef = std::pair<uint64_t, uint64_t>;
 
-using ResourceUse = std::pair<ResourceRef, ReleaseAtCycles>;
+using ResourceUse = std::pair<ResourceRef, NumCyclesUsed>;
 
 class HWInstructionIssuedEvent : public HWInstructionEvent {
 public:
diff --git a/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h b/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
index 0e3f16d2a490b..6947b9f6ff2fd 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
@@ -430,7 +430,7 @@ class ResourceManager {
 
   void issueInstruction(
       const InstrDesc &Desc,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
 
   void cycleEvent(SmallVectorImpl<ResourceRef> &ResourcesFreed);
 
diff --git a/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h b/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
index 272f6b197868b..3acf142fae1ab 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
@@ -136,7 +136,7 @@ class Scheduler : public HardwareUnit {
   /// Issue an instruction without updating the ready queue.
   void issueInstructionImpl(
       InstRef &IR,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
 
   // Identify instructions that have finished executing, and remove them from
   // the IssuedSet. References to executed instructions are added to input
@@ -202,7 +202,7 @@ class Scheduler : public HardwareUnit {
   /// result of this event.
   void issueInstruction(
       InstRef &IR,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Used,
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Used,
       SmallVectorImpl<InstRef> &Pending,
       SmallVectorImpl<InstRef> &Ready);
 
diff --git a/llvm/include/llvm/MCA/Support.h b/llvm/include/llvm/MCA/Support.h
index e3c155435e234..d93043f673b46 100644
--- a/llvm/include/llvm/MCA/Support.h
+++ b/llvm/include/llvm/MCA/Support.h
@@ -48,12 +48,12 @@ template <typename T> char InstructionError<T>::ID;
 /// number of resources, are kept separate.  This is used by the
 /// ResourcePressureView to calculate the average resource cycles
 /// per instruction/iteration.
-class ReleaseAtCycles {
+class NumCyclesUsed {
   unsigned Numerator, Denominator;
 
 public:
-  ReleaseAtCycles() : Numerator(0), Denominator(1) {}
-  ReleaseAtCycles(unsigned Cycles, unsigned ResourceUnits = 1)
+  NumCyclesUsed() : Numerator(0), Denominator(1) {}
+  NumCyclesUsed(unsigned Cycles, unsigned ResourceUnits = 1)
       : Numerator(Cycles), Denominator(ResourceUnits) {}
 
   operator double() const {
@@ -67,7 +67,7 @@ class ReleaseAtCycles {
   // Add the components of RHS to this instance.  Instead of calculating
   // the final value here, we keep track of the numerator and denominator
   // separately, to reduce floating point error.
-  ReleaseAtCycles &operator+=(const ReleaseAtCycles &RHS);
+  NumCyclesUsed &operator+=(const NumCyclesUsed &RHS);
 };
 
 /// Populates vector Masks with processor resource masks.
@@ -105,7 +105,7 @@ inline unsigned getResourceStateIndex(uint64_t Mask) {
 /// Compute the reciprocal block throughput from a set of processor resource
 /// cycles. The reciprocal block throughput is computed as the MAX between:
 ///  - NumMicroOps / DispatchWidth
-///  - ProcReleaseAtCycles / #ProcResourceUnits  (for every consumed resource).
+///  - ProcNumCyclesUsed / #ProcResourceUnits  (for every consumed resource).
 double computeBlockRThroughput(const MCSchedModel &SM, unsigned DispatchWidth,
                                unsigned NumMicroOps,
                                ArrayRef<unsigned> ProcResourceUsage);
diff --git a/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp b/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
index 8d99695f4c29e..d58e3e68cdd32 100644
--- a/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
+++ b/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
@@ -346,7 +346,7 @@ uint64_t ResourceManager::checkAvailability(const InstrDesc &Desc) const {
 
 void ResourceManager::issueInstruction(
     const InstrDesc &Desc,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes) {
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes) {
   for (const std::pair<uint64_t, ResourceUsage> &R : Desc.Resources) {
     const CycleSegment &CS = R.second.CS;
     if (!CS.size()) {
@@ -354,13 +354,13 @@ void ResourceManager::issueInstruction(
       continue;
     }
 
-    assert(CS.begin() == 0 && "Invalid {Start, End} cycles!");
+    assert(CS.isValid() && "Invalid {Start, End} cycles!");
     if (!R.second.isReserved()) {
       ResourceRef Pipe = selectPipe(R.first);
       use(Pipe);
       BusyResources[Pipe] += CS.size();
-      Pipes.emplace_back(std::pair<ResourceRef, ReleaseAtCycles>(
-          Pipe, ReleaseAtCycles(CS.size())));
+      Pipes.emplace_back(std::pair<ResourceRef, NumCyclesUsed>(
+          Pipe, NumCyclesUsed(CS.size())));
     } else {
       assert((llvm::popcount(R.first) > 1) && "Expected a group!");
       // Mark this group as reserved.
diff --git a/llvm/lib/MCA/HardwareUnits/Scheduler.cpp b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
index a9bbf69799198..06013d4d0cfd5 100644
--- a/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
+++ b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
@@ -69,7 +69,7 @@ Scheduler::Status Scheduler::isAvailable(const InstRef &IR) {
 
 void Scheduler::issueInstructionImpl(
     InstRef &IR,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &UsedResources) {
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &UsedResources) {
   Instruction *IS = IR.getInstruction();
   const InstrDesc &D = IS->getDesc();
 
@@ -98,7 +98,7 @@ void Scheduler::issueInstructionImpl(
 // Release the buffered resources and issue the instruction.
 void Scheduler::issueInstruction(
     InstRef &IR,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &UsedResources,
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &UsedResources,
     SmallVectorImpl<InstRef> &PendingInstructions,
     SmallVectorImpl<InstRef> &ReadyInstructions) {
   const Instruction &Inst = *IR.getInstruction();
diff --git a/llvm/lib/MCA/InstrBuilder.cpp b/llvm/lib/MCA/InstrBuilder.cpp
index 1a82e45763a26..f346368da5de8 100644
--- a/llvm/lib/MCA/InstrBuilder.cpp
+++ b/llvm/lib/MCA/InstrBuilder.cpp
@@ -89,11 +89,11 @@ static void initializeUsedResources(InstrDesc &ID,
       AllInOrderResources &= (PR.BufferSize <= 1);
     }
 
-    CycleSegment RCy(0, PRE->ReleaseAtCycle, false);
+    CycleSegment RCy(PRE->AcquireAtCycle, PRE->ReleaseAtCycle, false);
     Worklist.emplace_back(ResourcePlusCycles(Mask, ResourceUsage(RCy)));
     if (PR.SuperIdx) {
       uint64_t Super = ProcResourceMasks[PR.SuperIdx];
-      SuperResources[Super] += PRE->ReleaseAtCycle;
+      SuperResources[Super] += PRE->ReleaseAtCycle - PRE->AcquireAtCycle;
     }
   }
 
diff --git a/llvm/lib/MCA/Stages/InstructionTables.cpp b/llvm/lib/MCA/Stages/InstructionTables.cpp
index 937cc7da8de72..c4e83d00f4769 100644
--- a/llvm/lib/MCA/Stages/InstructionTables.cpp
+++ b/llvm/lib/MCA/Stages/InstructionTables.cpp
@@ -38,7 +38,7 @@ Error InstructionTables::execute(InstRef &IR) {
       for (unsigned I = 0, E = NumUnits; I < E; ++I) {
         ResourceRef ResourceUnit = std::make_pair(Index, 1U << I);
         UsedResources.emplace_back(
-            std::make_pair(ResourceUnit, ReleaseAtCycles(Cycles, NumUnits)));
+            std::make_pair(ResourceUnit, NumCyclesUsed(Cycles, NumUnits)));
       }
       continue;
     }
@@ -54,7 +54,7 @@ Error InstructionTables::execute(InstRef &IR) {
         ResourceRef ResourceUnit = std::make_pair(SubUnitIdx, 1U << I2);
         UsedResources.emplace_back(std::make_pair(
             ResourceUnit,
-            ReleaseAtCycles(Cycles, NumUnits * SubUnit.NumUnits)));
+            NumCyclesUsed(Cycles, NumUnits * SubUnit.NumUnits)));
       }
     }
   }
diff --git a/llvm/lib/MCA/Support.cpp b/llvm/lib/MCA/Support.cpp
index f8b8a2d129c1c..849a0c7aa01f6 100644
--- a/llvm/lib/MCA/Support.cpp
+++ b/llvm/lib/MCA/Support.cpp
@@ -21,7 +21,7 @@ namespace mca {
 
 #define DEBUG_TYPE "llvm-mca"
 
-ReleaseAtCycles &ReleaseAtCycles::operator+=(const ReleaseAtCycles &RHS) {
+NumCyclesUsed &NumCyclesUsed::operator+=(const NumCyclesUsed &RHS) {
   if (Denominator == RHS.Denominator)
     Numerator += RHS.Numerator;
   else {
@@ -92,18 +92,18 @@ double computeBlockRThroughput(const MCSchedModel &SM, unsigned DispatchWidth,
   // The number of available resource units affects the resource pressure
   // distribution, as well as how many blocks can be executed every cycle.
   for (unsigned I = 0, E = SM.getNumProcResourceKinds(); I < E; ++I) {
-    unsigned ReleaseAtCycles = ProcResourceUsage[I];
-    if (!ReleaseAtCycles)
+    unsigned NumCyclesUsed = ProcResourceUsage[I];
+    if (!NumCyclesUsed)
       continue;
 
     const MCProcResourceDesc &MCDesc = *SM.getProcResource(I);
-    double Throughput = static_cast<double>(ReleaseAtCycles) / MCDesc.NumUnits;
+    double Throughput = static_cast<double>(NumCyclesUsed) / MCDesc.NumUnits;
     Max = std::max(Max, Throughput);
   }
 
   // The block reciprocal throughput is computed as the MAX of:
   //  - (NumMicroOps / DispatchWidth)
-  //  - (NumUnits / ReleaseAtCycles)   for every consumed processor resource.
+  //  - (NumUnits / NumCyclesUsed)   for every consumed processor resource.
   return Max;
 }
 
diff --git a/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s b/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
index a6b756ba8151b..6ffb2bd1a23d1 100644
--- a/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
+++ b/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
@@ -223,13 +223,13 @@ vfredmin.vs  v4, v8, v12
 
 # CHECK:      Iterations:        1
 # CHECK-NEXT: Instructions:      206
-# CHECK-NEXT: Total Cycles:      8746
+# CHECK-NEXT: Total Cycles:      8644
 # CHECK-NEXT: Total uOps:        206
 
 # CHECK:      Dispatch Width:    2
 # CHECK-NEXT: uOps Per Cycle:    0.02
 # CHECK-NEXT: IPC:               0.02
-# CHECK-NEXT: Block RThroughput: 8743.0
+# CHECK-NEXT: Block RThroughput: 8640.0
 
 # CHECK:      Instruction Info:
 # CHECK-NEXT: [1]: #uOps
@@ -459,213 +459,213 @@ vfredmin.vs  v4, v8, v12
 
 # CHECK:      Resource pressure per iteration:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]
-# CHECK-NEXT:  -      -     103.00  -     8743.00 103.00  -     -
+# CHECK-NEXT:  -      -     103.00  -     8640.00 103.00  -     -
 
 # CHECK:      Resource pressure by instruction:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    Instructions:
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     50.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     54.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     53.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     62.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     61.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     43.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     45.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     44.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     57.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     56.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     37.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     36.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     38.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     37.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     40.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     39.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     44.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     43.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     52.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     51.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     33.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     32.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     35.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     34.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     39.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     38.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     50.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     54.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     53.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     62.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     61.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredmax.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredmax.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredmax.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredmax.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m1, tu, mu
-# CHECK-NEXT:  -      -     ...
[truncated]

github-actions bot commented Feb 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wangpc-pp (Contributor) left a comment

LGTM.
I don't see any problem here, but please wait for @fpetrogalli to see if there are any comments.

@fpetrogalli (Member) commented

In the past, there was a variable named ResourceCycles which modeled how long a resource was consumed. Then a variable named AcquireAtCycles was added and allowed the scheduler to specify how many cycles after "cycle 0" the resource was acquired. It was consumed from that AcquireAtCycle until the cycle ResourceCycles - AcquireAtCycle.

Nit: I think you should rephrase as follows:

[...] the scheduler to specify how many cycles after issuing the instruction the resource was acquired. Relative to the issue cycle, the resource was consumed from that AcquireAtCycle until the cycle ResourceCycles.

The reason is that, with the introduction of AcquireAtCycle, if you mark the issue cycle as c, you end up with the following resource usage (marked with x):

| c | c+1 | ... | c+AcquireAtCycle | ... | c+ReleaseAtCycle |
+---+-----+-----+------------------+-----+------------------+
|   |     |     | xxxxxxxxxxxxxxxx | xxx | xxxxxxxxxxxxxxxx |

Therefore, when you say how many cycles after "cycle 0", you are taking "cycle 0" to be the issue cycle, which means that the resource is held until ResourceCycles, not until ResourceCycles - AcquireAtCycles. (Note that I am assuming ReleaseAtCycle == ResourceCycles.)
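A small standalone sketch of that timeline (hypothetical values; assuming the interval is half-open on the right, so the resource is busy in cycles [c + AcquireAtCycle, c + ReleaseAtCycle) and the total comes out to ReleaseAtCycle - AcquireAtCycle):

#include <cstdio>

int main() {
  unsigned IssueCycle = 10;     // "c" in the diagram above
  unsigned AcquireAtCycle = 1;  // resource grabbed one cycle after issue
  unsigned ReleaseAtCycle = 3;  // resource released three cycles after issue
  for (unsigned Cy = IssueCycle + AcquireAtCycle; Cy < IssueCycle + ReleaseAtCycle; ++Cy)
    std::printf("busy at cycle %u\n", Cy); // prints cycles 11 and 12
  return 0;
}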

@@ -430,7 +430,7 @@ class ResourceManager {

   void issueInstruction(
       const InstrDesc &Desc,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
Member

How come LLVM-MCA does not need to know about the AcquireAtCycle?

Contributor Author

NumCyclesUsed is based on ReleaseAtCycle - AcquireAtCycle. In that sense, it does know about AcquireAtCycle.

I think what you are pointing out here is that the current implementation will use the resource for the correct number of cycles now, but it ignores when it can start acquiring the resource relative to when the instruction is issued. To demonstrate, say an instruction X uses resource A with (Acquire, Release) = (0, 2) and resource B with (Acquire, Release) = (1, 2). This patch will model that resource A is used for 2 cycles and resource B for 1 cycle, but it does not model that we have to wait one cycle before consuming resource B.

I should look into adding this support. Thanks for catching this!
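A tiny standalone sketch of that A/B example (hypothetical names, not the MCA API), showing what this patch does and does not model:

#include <cstdio>

struct Use {
  const char *Name;
  unsigned AcquireAtCycle;
  unsigned ReleaseAtCycle;
};

int main() {
  // Instruction X from the example above.
  Use Uses[] = {{"A", 0, 2}, {"B", 1, 2}};
  for (const Use &U : Uses) {
    // Modeled by this patch: the total number of consumed cycles.
    unsigned Cycles = U.ReleaseAtCycle - U.AcquireAtCycle;
    // Not yet modeled: consumption only starts U.AcquireAtCycle cycles after issue.
    std::printf("resource %s: %u cycle(s), starting %u cycle(s) after issue\n",
                U.Name, Cycles, U.AcquireAtCycle);
  }
  return 0;
}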

Collaborator

As Michael wrote, this patch doesn't add support for AcquireAtCycle.

The difficult part will be refactoring the ResourceManager logic to keep track of multiple AcquireAtCycle requests and to quickly resolve checks on whether resources are busy or not.
Changing that logic might not be too complicated. However, doing it without slowing down the process too much might be challenging.

At the moment, mca doesn't know about the AcquireAtCycles, so this will be a problem for future patches.

@@ -37,13 +37,13 @@ vle64.v v1, (a1)

 # CHECK: Iterations: 1
 # CHECK-NEXT: Instructions: 26
-# CHECK-NEXT: Total Cycles: 234
+# CHECK-NEXT: Total Cycles: 211
Member

I would expect the fix for llvm-mca to be an NFC for its output, at least for all the scheduling models that already exist and that by default have AcquireAtCycle = 0.

How do you explain these new values?

Contributor

I think llvm-mca has overestimated the cycles before. With this change, the cycles of RVV instructions should be reduced by 1 cycle (for SiFive7VCQ), so here the total cycles are reduced by 23 cycles (there are 24 RVV instructions, with an extra 1 cycle for the first RVV instruction?).

Contributor Author

I think the changing of the cycles makes sense since AcquireAtCycle != 0 for the SiFive7 scheduler model. You can see this here for some of the vector alu operations. llvm-mca thought that the arithmetic/load/store sequencer (VA,VS, or VL) was used for one extra cycle than it should have been since AcquireAtCycle = 1 for those resources.
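As a concrete reading of the numbers in the patch (assuming, for the first vredsum.vs entry in the reductions.s diff above, a single resource use with ReleaseAtCycle = 47 and AcquireAtCycle = 1): the old accounting charged ReleaseAtCycle = 47 cycles, while the new accounting charges ReleaseAtCycle - AcquireAtCycle = 47 - 1 = 46 cycles, matching the 47.00 -> 46.00 change in that test.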

@fpetrogalli (Member) left a comment

Thank you for the patch @michaelmaitland! When talking about the AcquireAtCycle/ReleaseAtCycle change with @adibiagio, he mentioned that llvm-mca needed updating to account for this.

Unfortunately I do not know enough about LLVM-MCA to be able to tell whether what you are doing is correct or not.

What puzzles me are the changes in the tests, because I was expecting the rework to be an NFC.

I kindly ask you to seek @adibiagio's feedback before moving forward.

@adibiagio (Collaborator) commented

Thank you for the patch @michaelmaitland! When talking about the AcquireAtCycle/ReleaseAtCycle change with @adibiagio, he mentioned that llvm-mca needed updating to account for this.

Unfortunately I do not know enough about LLVM-MCA to be able to tell whether what you are doing is correct or not.

What puzzles me are the changes in the tests, because I was expecting the rework to be an NFC.

I kindly ask you to seek @adibiagio's feedback before moving forward.

I'll review the patch today :-).

Thanks @michaelmaitland for working on this!

@adibiagio (Collaborator) left a comment

LGTM

@@ -89,11 +89,11 @@ static void initializeUsedResources(InstrDesc &ID,
       AllInOrderResources &= (PR.BufferSize <= 1);
     }

-    CycleSegment RCy(0, PRE->ReleaseAtCycle, false);
+    CycleSegment RCy(PRE->AcquireAtCycle, PRE->ReleaseAtCycle, false);
Contributor Author

This code here made me wonder if we are accounting for AcquireAtCycle correctly.

The NumCyclesUsed class can only represent the total number of cycles used, without saying anything about when the resource is acquired or released. The CycleSegment class looks like the one responsible for taking this information into account. After looking into it, I found that although this information is being tracked, it is not being used as needed:

In ResourceManager::issueInstruction, we do BusyResources[Pipe] += CS.size(); BusyResources is a map from pipe to how many cycles are left before the pipe is usable again. The problem is that this map does not model the fact that a resource can be free from the issue cycle until the acquire-at cycle. My thought is to add another map that tracks this data, so that in conjunction with BusyResources we can tell the full story.

The other part that needs some fixing is that reserveResource is called in issueInstruction. We need to model that a resource may not need to be reserved at instruction issue, but AcquireAtCycle cycles after instruction issue. It may be a good idea to move the call to reserveResource from ResourceManager::issueInstruction to ResourceManager::cycleEvent.

I will take a look into this once I get some bandwidth. I am leaving this comment here and marking this PR as needs changes in the meantime.

EDIT: It turns out the creator of a PR cannot request changes.
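A rough standalone sketch of that idea (hypothetical types and a single resource; the real ResourceManager is much more involved): the extra map holds the remaining delay before a pending acquisition starts, cycleEvent drains it, and only then does the resource become busy.

#include <cstdio>
#include <map>

using ResourceRef = unsigned; // stand-in for mca's pair-based ResourceRef

int main() {
  // Existing idea: cycles left before the resource is usable again.
  std::map<ResourceRef, unsigned> BusyResources;
  // Proposed extra map: cycles left before a delayed acquisition starts.
  std::map<ResourceRef, unsigned> PendingAcquire;

  // At issue (cycle 0): resource 0 has AcquireAtCycle = 1, ReleaseAtCycle = 3.
  PendingAcquire[0] = 1;                // wait one cycle before reserving
  unsigned CyclesOnceAcquired = 3 - 1;  // ReleaseAtCycle - AcquireAtCycle

  for (unsigned Cycle = 0; Cycle < 5; ++Cycle) {
    // cycleEvent(): once the acquire delay has elapsed, start the reservation.
    if (PendingAcquire.count(0)) {
      if (PendingAcquire[0] == 0) {
        BusyResources[0] = CyclesOnceAcquired;
        PendingAcquire.erase(0);
      } else {
        --PendingAcquire[0];
      }
    }
    bool Busy = BusyResources.count(0) && BusyResources[0] > 0;
    std::printf("cycle %u: resource %s\n", Cycle, Busy ? "busy" : "free");
    if (Busy)
      --BusyResources[0];
  }
  return 0;
}

With this, the resource is free at cycle 0 (the issue cycle) and busy during cycles 1 and 2, which is the delayed-acquisition behavior described above.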

Collaborator

This code here made me wonder if we are accounting for AcquireAtCycle correctly.

The NumCyclesUsed class can only represent the total number of cycles used, without saying anything about when the resource is acquired or released. The CycleSegment class looks like the one responsible for taking this information into account. After looking into it, I found that although this information is being tracked, it is not being used as needed.

Correct.

If you see my comment from three weeks ago, this is literally what I was trying to say when I wrote "this patch doesn't add support for AcquireAtCycle".
The ResourceManager still assumes that every resource is consumed at relative cycle zero (i.e. relative to the issue event).

Contributor Author

I missed your comment above. I think your point is that we need to be able to do this without slowing things down too much. I have a rough idea in my head of how we might do this, but I am not sure when I will have time to pick this task back up again, since it is much less straightforward than the original patch.

Collaborator

I missed your comment above. I think your point is that we need to be able to do this without slowing things down too much. I have a rough idea in my head of how we might do this, but I am not sure when I will have time to pick this task back up again, since it is much less straightforward than the original patch.

No problem. It is something that I always wanted to do in my spare time but never ended up doing.
Redesigning the ResourceManager is definitely not straightforward; the simulation time is heavily dependent on how quickly it replies to individual dispatch/issue requests.
It won't be easy to find a design which doesn't hurt performance too much. In the end we will inevitably have to compromise a bit between simulation speed and simulation accuracy. That is expected.

The complexity of the new design also depends on how many pending CycleSegments can be "issued" for every resource. I am mainly thinking about out-of-order processors where multiple delayed resource consumptions can be pending at the same time. Not sure if that makes sense...

@wangpc-pp (Contributor) left a comment

It turns out the creator of a PR cannot request changes.

:-)
