[llvm-mca] Account for AcquireAtCycles in llvm-mca #80742

Open
wants to merge 2 commits into main

Conversation

@michaelmaitland (Contributor) commented Feb 5, 2024

In the past, there was a variable named ResourceCycles which modeled how long a resource was consumed. Then a variable named AcquireAtCycles was added, allowing the scheduler to specify how many cycles after issuing the instruction the resource was acquired. Relative to the issue cycle, the resource was consumed from AcquireAtCycle until cycle ResourceCycles.

We decided ResourceCycles should be renamed ReleaseAtCycle, and the number of cycles a resource is consumed is ReleaseAtCycle - AcquireAtCycle. This renaming happened globally, but only the scheduler was updated to account for AcquireAtCycle.

This patch makes llvm-mca calculate the total number of consumed cycles as ReleaseAtCycle - AcquireAtCycle. It also renames the class ReleaseAtCycles to NumCyclesUsed, since that better describes what the class now models.
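As a rough standalone illustration of the new accounting (hypothetical names, not the actual MCA code):

#include <cstdio>

// Hypothetical stand-in for a write-proc-res entry with acquire/release cycles.
struct ProcResUse {
  unsigned AcquireAtCycle; // first cycle, relative to issue, at which the resource is held
  unsigned ReleaseAtCycle; // cycle, relative to issue, at which the resource is released
};

int main() {
  ProcResUse U{/*AcquireAtCycle=*/1, /*ReleaseAtCycle=*/3};
  unsigned Before = U.ReleaseAtCycle;                    // old accounting: 3 cycles
  unsigned After = U.ReleaseAtCycle - U.AcquireAtCycle;  // this patch: 2 cycles
  std::printf("old count = %u, new count = %u\n", Before, After);
  return 0;
}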

In the past, there was a variable named ResourceCycles which modeled how
long a resource was consumed. Then a variable named AcquireAtCycles was
added and allowed the scheduler to specify how many cycles after "cycle
0" the resource was acquired. It was consumed from that AcquireAtCycle
until the cycle ResourceCycles - AcquireAtCycle.

We decided ResourceCycles should be renamed ReleaseAtCycle, and the
number of cycles a resource was consumed is ReleaseAtCycle -
AcquireAtCycle. This renaming happened globally, but only the scheduler
was updated to account for AcquireAtCycle.

This patch accounts for the total number of cycles to be calculated as
ReleaseAtCycle - AcquireAtCycle in llvm-mca. This patch renames the
class ReleaseAtCycles to NumCyclesUsed since that better describes what that
class now models.
@llvmbot (Collaborator) commented Feb 5, 2024

@llvm/pr-subscribers-tools-llvm-mca

Author: Michael Maitland (michaelmaitland)

Changes

Patch is 263.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80742.diff

37 Files Affected:

  • (modified) llvm/include/llvm/MCA/HWEventListener.h (+1-1)
  • (modified) llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h (+1-1)
  • (modified) llvm/include/llvm/MCA/HardwareUnits/Scheduler.h (+2-2)
  • (modified) llvm/include/llvm/MCA/Support.h (+5-5)
  • (modified) llvm/lib/MCA/HardwareUnits/ResourceManager.cpp (+4-4)
  • (modified) llvm/lib/MCA/HardwareUnits/Scheduler.cpp (+2-2)
  • (modified) llvm/lib/MCA/InstrBuilder.cpp (+2-2)
  • (modified) llvm/lib/MCA/Stages/InstructionTables.cpp (+2-2)
  • (modified) llvm/lib/MCA/Support.cpp (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s (+106-106)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-store.s (+81-81)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-x0.s (+29-29)
  • (modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/vector-integer-arithmetic.s (+373-373)
  • (modified) llvm/test/tools/llvm-mca/RISCV/different-lmul-instruments.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/different-sew-instruments.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/disable-im.s (+15-15)
  • (modified) llvm/test/tools/llvm-mca/RISCV/fractional-lmul-data.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-at-start.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-middle.s (+9-9)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-straddles-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-lmul-instruments.s (+19-19)
  • (modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-sew-instruments.s (+8-8)
  • (modified) llvm/test/tools/llvm-mca/RISCV/needs-sew-but-only-lmul.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/no-vsetvli-to-start.s (+9-9)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-at-start.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-middle.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-straddles-region.s (+3-3)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vle-vse.s (+205-205)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-instrument.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-sew-instrument.s (+5-5)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-instrument.s (+4-4)
  • (modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-sew-instrument.s (+5-5)
  • (modified) llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp (+3-3)
  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.cpp (+2-2)
  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.h (+1-1)
diff --git a/llvm/include/llvm/MCA/HWEventListener.h b/llvm/include/llvm/MCA/HWEventListener.h
index a27b1f12e6a6e..adcf6e152d298 100644
--- a/llvm/include/llvm/MCA/HWEventListener.h
+++ b/llvm/include/llvm/MCA/HWEventListener.h
@@ -63,7 +63,7 @@ class HWInstructionEvent {
 // ResourceRef::second is a bitmask of the referenced sub-unit of the resource.
 using ResourceRef = std::pair<uint64_t, uint64_t>;
 
-using ResourceUse = std::pair<ResourceRef, ReleaseAtCycles>;
+using ResourceUse = std::pair<ResourceRef, NumCyclesUsed>;
 
 class HWInstructionIssuedEvent : public HWInstructionEvent {
 public:
diff --git a/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h b/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
index 0e3f16d2a490b..6947b9f6ff2fd 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/ResourceManager.h
@@ -430,7 +430,7 @@ class ResourceManager {
 
   void issueInstruction(
       const InstrDesc &Desc,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
 
   void cycleEvent(SmallVectorImpl<ResourceRef> &ResourcesFreed);
 
diff --git a/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h b/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
index 272f6b197868b..3acf142fae1ab 100644
--- a/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
+++ b/llvm/include/llvm/MCA/HardwareUnits/Scheduler.h
@@ -136,7 +136,7 @@ class Scheduler : public HardwareUnit {
   /// Issue an instruction without updating the ready queue.
   void issueInstructionImpl(
       InstRef &IR,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
 
   // Identify instructions that have finished executing, and remove them from
   // the IssuedSet. References to executed instructions are added to input
@@ -202,7 +202,7 @@ class Scheduler : public HardwareUnit {
   /// result of this event.
   void issueInstruction(
       InstRef &IR,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Used,
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Used,
       SmallVectorImpl<InstRef> &Pending,
       SmallVectorImpl<InstRef> &Ready);
 
diff --git a/llvm/include/llvm/MCA/Support.h b/llvm/include/llvm/MCA/Support.h
index e3c155435e234..d93043f673b46 100644
--- a/llvm/include/llvm/MCA/Support.h
+++ b/llvm/include/llvm/MCA/Support.h
@@ -48,12 +48,12 @@ template <typename T> char InstructionError<T>::ID;
 /// number of resources, are kept separate.  This is used by the
 /// ResourcePressureView to calculate the average resource cycles
 /// per instruction/iteration.
-class ReleaseAtCycles {
+class NumCyclesUsed {
   unsigned Numerator, Denominator;
 
 public:
-  ReleaseAtCycles() : Numerator(0), Denominator(1) {}
-  ReleaseAtCycles(unsigned Cycles, unsigned ResourceUnits = 1)
+  NumCyclesUsed() : Numerator(0), Denominator(1) {}
+  NumCyclesUsed(unsigned Cycles, unsigned ResourceUnits = 1)
       : Numerator(Cycles), Denominator(ResourceUnits) {}
 
   operator double() const {
@@ -67,7 +67,7 @@ class ReleaseAtCycles {
   // Add the components of RHS to this instance.  Instead of calculating
   // the final value here, we keep track of the numerator and denominator
   // separately, to reduce floating point error.
-  ReleaseAtCycles &operator+=(const ReleaseAtCycles &RHS);
+  NumCyclesUsed &operator+=(const NumCyclesUsed &RHS);
 };
 
 /// Populates vector Masks with processor resource masks.
@@ -105,7 +105,7 @@ inline unsigned getResourceStateIndex(uint64_t Mask) {
 /// Compute the reciprocal block throughput from a set of processor resource
 /// cycles. The reciprocal block throughput is computed as the MAX between:
 ///  - NumMicroOps / DispatchWidth
-///  - ProcReleaseAtCycles / #ProcResourceUnits  (for every consumed resource).
+///  - ProcNumCyclesUsed / #ProcResourceUnits  (for every consumed resource).
 double computeBlockRThroughput(const MCSchedModel &SM, unsigned DispatchWidth,
                                unsigned NumMicroOps,
                                ArrayRef<unsigned> ProcResourceUsage);
diff --git a/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp b/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
index 8d99695f4c29e..d58e3e68cdd32 100644
--- a/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
+++ b/llvm/lib/MCA/HardwareUnits/ResourceManager.cpp
@@ -346,7 +346,7 @@ uint64_t ResourceManager::checkAvailability(const InstrDesc &Desc) const {
 
 void ResourceManager::issueInstruction(
     const InstrDesc &Desc,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes) {
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes) {
   for (const std::pair<uint64_t, ResourceUsage> &R : Desc.Resources) {
     const CycleSegment &CS = R.second.CS;
     if (!CS.size()) {
@@ -354,13 +354,13 @@ void ResourceManager::issueInstruction(
       continue;
     }
 
-    assert(CS.begin() == 0 && "Invalid {Start, End} cycles!");
+    assert(CS.isValid() && "Invalid {Start, End} cycles!");
     if (!R.second.isReserved()) {
       ResourceRef Pipe = selectPipe(R.first);
       use(Pipe);
       BusyResources[Pipe] += CS.size();
-      Pipes.emplace_back(std::pair<ResourceRef, ReleaseAtCycles>(
-          Pipe, ReleaseAtCycles(CS.size())));
+      Pipes.emplace_back(std::pair<ResourceRef, NumCyclesUsed>(
+          Pipe, NumCyclesUsed(CS.size())));
     } else {
       assert((llvm::popcount(R.first) > 1) && "Expected a group!");
       // Mark this group as reserved.
diff --git a/llvm/lib/MCA/HardwareUnits/Scheduler.cpp b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
index a9bbf69799198..06013d4d0cfd5 100644
--- a/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
+++ b/llvm/lib/MCA/HardwareUnits/Scheduler.cpp
@@ -69,7 +69,7 @@ Scheduler::Status Scheduler::isAvailable(const InstRef &IR) {
 
 void Scheduler::issueInstructionImpl(
     InstRef &IR,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &UsedResources) {
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &UsedResources) {
   Instruction *IS = IR.getInstruction();
   const InstrDesc &D = IS->getDesc();
 
@@ -98,7 +98,7 @@ void Scheduler::issueInstructionImpl(
 // Release the buffered resources and issue the instruction.
 void Scheduler::issueInstruction(
     InstRef &IR,
-    SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &UsedResources,
+    SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &UsedResources,
     SmallVectorImpl<InstRef> &PendingInstructions,
     SmallVectorImpl<InstRef> &ReadyInstructions) {
   const Instruction &Inst = *IR.getInstruction();
diff --git a/llvm/lib/MCA/InstrBuilder.cpp b/llvm/lib/MCA/InstrBuilder.cpp
index 1a82e45763a26..f346368da5de8 100644
--- a/llvm/lib/MCA/InstrBuilder.cpp
+++ b/llvm/lib/MCA/InstrBuilder.cpp
@@ -89,11 +89,11 @@ static void initializeUsedResources(InstrDesc &ID,
       AllInOrderResources &= (PR.BufferSize <= 1);
     }
 
-    CycleSegment RCy(0, PRE->ReleaseAtCycle, false);
+    CycleSegment RCy(PRE->AcquireAtCycle, PRE->ReleaseAtCycle, false);
     Worklist.emplace_back(ResourcePlusCycles(Mask, ResourceUsage(RCy)));
     if (PR.SuperIdx) {
       uint64_t Super = ProcResourceMasks[PR.SuperIdx];
-      SuperResources[Super] += PRE->ReleaseAtCycle;
+      SuperResources[Super] += PRE->ReleaseAtCycle - PRE->AcquireAtCycle;
     }
   }
 
diff --git a/llvm/lib/MCA/Stages/InstructionTables.cpp b/llvm/lib/MCA/Stages/InstructionTables.cpp
index 937cc7da8de72..c4e83d00f4769 100644
--- a/llvm/lib/MCA/Stages/InstructionTables.cpp
+++ b/llvm/lib/MCA/Stages/InstructionTables.cpp
@@ -38,7 +38,7 @@ Error InstructionTables::execute(InstRef &IR) {
       for (unsigned I = 0, E = NumUnits; I < E; ++I) {
         ResourceRef ResourceUnit = std::make_pair(Index, 1U << I);
         UsedResources.emplace_back(
-            std::make_pair(ResourceUnit, ReleaseAtCycles(Cycles, NumUnits)));
+            std::make_pair(ResourceUnit, NumCyclesUsed(Cycles, NumUnits)));
       }
       continue;
     }
@@ -54,7 +54,7 @@ Error InstructionTables::execute(InstRef &IR) {
         ResourceRef ResourceUnit = std::make_pair(SubUnitIdx, 1U << I2);
         UsedResources.emplace_back(std::make_pair(
             ResourceUnit,
-            ReleaseAtCycles(Cycles, NumUnits * SubUnit.NumUnits)));
+            NumCyclesUsed(Cycles, NumUnits * SubUnit.NumUnits)));
       }
     }
   }
diff --git a/llvm/lib/MCA/Support.cpp b/llvm/lib/MCA/Support.cpp
index f8b8a2d129c1c..849a0c7aa01f6 100644
--- a/llvm/lib/MCA/Support.cpp
+++ b/llvm/lib/MCA/Support.cpp
@@ -21,7 +21,7 @@ namespace mca {
 
 #define DEBUG_TYPE "llvm-mca"
 
-ReleaseAtCycles &ReleaseAtCycles::operator+=(const ReleaseAtCycles &RHS) {
+NumCyclesUsed &NumCyclesUsed::operator+=(const NumCyclesUsed &RHS) {
   if (Denominator == RHS.Denominator)
     Numerator += RHS.Numerator;
   else {
@@ -92,18 +92,18 @@ double computeBlockRThroughput(const MCSchedModel &SM, unsigned DispatchWidth,
   // The number of available resource units affects the resource pressure
   // distribution, as well as how many blocks can be executed every cycle.
   for (unsigned I = 0, E = SM.getNumProcResourceKinds(); I < E; ++I) {
-    unsigned ReleaseAtCycles = ProcResourceUsage[I];
-    if (!ReleaseAtCycles)
+    unsigned NumCyclesUsed = ProcResourceUsage[I];
+    if (!NumCyclesUsed)
       continue;
 
     const MCProcResourceDesc &MCDesc = *SM.getProcResource(I);
-    double Throughput = static_cast<double>(ReleaseAtCycles) / MCDesc.NumUnits;
+    double Throughput = static_cast<double>(NumCyclesUsed) / MCDesc.NumUnits;
     Max = std::max(Max, Throughput);
   }
 
   // The block reciprocal throughput is computed as the MAX of:
   //  - (NumMicroOps / DispatchWidth)
-  //  - (NumUnits / ReleaseAtCycles)   for every consumed processor resource.
+  //  - (NumUnits / NumCyclesUsed)   for every consumed processor resource.
   return Max;
 }
 
diff --git a/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s b/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
index a6b756ba8151b..6ffb2bd1a23d1 100644
--- a/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
+++ b/llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s
@@ -223,13 +223,13 @@ vfredmin.vs  v4, v8, v12
 
 # CHECK:      Iterations:        1
 # CHECK-NEXT: Instructions:      206
-# CHECK-NEXT: Total Cycles:      8746
+# CHECK-NEXT: Total Cycles:      8644
 # CHECK-NEXT: Total uOps:        206
 
 # CHECK:      Dispatch Width:    2
 # CHECK-NEXT: uOps Per Cycle:    0.02
 # CHECK-NEXT: IPC:               0.02
-# CHECK-NEXT: Block RThroughput: 8743.0
+# CHECK-NEXT: Block RThroughput: 8640.0
 
 # CHECK:      Instruction Info:
 # CHECK-NEXT: [1]: #uOps
@@ -459,213 +459,213 @@ vfredmin.vs  v4, v8, v12
 
 # CHECK:      Resource pressure per iteration:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]
-# CHECK-NEXT:  -      -     103.00  -     8743.00 103.00  -     -
+# CHECK-NEXT:  -      -     103.00  -     8640.00 103.00  -     -
 
 # CHECK:      Resource pressure by instruction:
 # CHECK-NEXT: [0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    Instructions:
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     50.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     54.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     53.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     62.00  1.00    -      -     vredsum.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     61.00  1.00    -      -     vredsum.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     43.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     45.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     44.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     57.00  1.00    -      -     vredand.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     56.00  1.00    -      -     vredand.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     37.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     36.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     38.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     37.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     40.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     39.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     44.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     43.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e32, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     52.00  1.00    -      -     vredor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     51.00  1.00    -      -     vredor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     33.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     32.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     35.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     34.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     39.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     38.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e64, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredxor.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredxor.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf8, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     46.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m1, tu, mu
-# CHECK-NEXT:  -      -      -      -     48.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     47.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m2, tu, mu
-# CHECK-NEXT:  -      -      -      -     50.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     49.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m4, tu, mu
-# CHECK-NEXT:  -      -      -      -     54.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     53.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e8, m8, tu, mu
-# CHECK-NEXT:  -      -      -      -     62.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     61.00  1.00    -      -     vredmaxu.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf4, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredmax.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredmax.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, mf2, tu, mu
-# CHECK-NEXT:  -      -      -      -     42.00  1.00    -      -     vredmax.vs	v4, v8, v12
+# CHECK-NEXT:  -      -      -      -     41.00  1.00    -      -     vredmax.vs	v4, v8, v12
 # CHECK-NEXT:  -      -     1.00    -      -      -      -      -     vsetvli	zero, zero, e16, m1, tu, mu
-# CHECK-NEXT:  -      -     ...
[truncated]

github-actions bot commented Feb 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@wangpc-pp (Contributor) left a comment

LGTM.
I don't see any problem here, but please wait for @fpetrogalli to see if there are any comments.

@fpetrogalli (Member) commented

In the past, there was a variable named ResourceCycles which modeled how long a resource was consumed. Then a variable named AcquireAtCycles was added and allowed the scheduler to specify how many cycles after "cycle 0" the resource was acquired. It was consumed from that AcquireAtCycle until the cycle ResourceCycles - AcquireAtCycle.

Nit: I think you should rephrase as follows:

[...] the scheduler to specify how many cycles after issuing the instruction the resource was acquired. Relative to the issue cycle, the resource was consumed from that AcquireAtCycle until the cycle ResourceCycles.

The reason is that, with the introduction of AcquireAtCycle, if you mark the issue cycle as c, you end up with the following resource usage (marked with x):

| c | c+1 | ... | c+AcquireAtCycle | ... | c+ReleaseAtCycle |
+---+-----+-----+------------------+-----+------------------+
|   |     |     | xxxxxxxxxxxxxxxx | xxx | xxxxxxxxxxxxxxxx |

Therefore, when you say how many cycles after "cycle 0", you are taking "cycle 0" to be the issue cycle, which means that the resource is held until ResourceCycles, not until ResourceCycles - AcquireAtCycles. (Note that I am assuming ReleaseAtCycle == ResourceCycles.)
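A small standalone sketch of that timeline (hypothetical values; assuming the interval is half-open on the right, so the resource is busy in cycles [c + AcquireAtCycle, c + ReleaseAtCycle) and the total comes out to ReleaseAtCycle - AcquireAtCycle):

#include <cstdio>

int main() {
  unsigned IssueCycle = 10;     // "c" in the diagram above
  unsigned AcquireAtCycle = 1;  // resource grabbed one cycle after issue
  unsigned ReleaseAtCycle = 3;  // resource released three cycles after issue
  for (unsigned Cy = IssueCycle + AcquireAtCycle; Cy < IssueCycle + ReleaseAtCycle; ++Cy)
    std::printf("busy at cycle %u\n", Cy); // prints cycles 11 and 12
  return 0;
}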

@@ -430,7 +430,7 @@ class ResourceManager {

   void issueInstruction(
       const InstrDesc &Desc,
-      SmallVectorImpl<std::pair<ResourceRef, ReleaseAtCycles>> &Pipes);
+      SmallVectorImpl<std::pair<ResourceRef, NumCyclesUsed>> &Pipes);
Member

How come LLVM-MCA does not need to know about the AcquireAtCycle?

Contributor Author

NumCyclesUsed is based on ReleaseAtCycle - AcquireAtCycle. In that sense, it does know about AcquireAtCycle.

I think what you are pointing out here is that the current implementation will use the resource for the correct number of cycles now, but it ignores when it can start acquiring the resource relative to when the instruction is issued. To demonstrate, say an instruction X uses resource A with (Acquire, Release) = (0, 2) and resource B with (Acquire, Release) = (1, 2). This patch will model that resource A is used for 2 cycles and resource B for 1 cycle, but it does not model that we have to wait one cycle before consuming resource B.

I should look into adding this support. Thanks for catching this!
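A tiny standalone sketch of that A/B example (hypothetical names, not the MCA API), showing what this patch does and does not model:

#include <cstdio>

struct Use {
  const char *Name;
  unsigned AcquireAtCycle;
  unsigned ReleaseAtCycle;
};

int main() {
  // Instruction X from the example above.
  Use Uses[] = {{"A", 0, 2}, {"B", 1, 2}};
  for (const Use &U : Uses) {
    // Modeled by this patch: the total number of consumed cycles.
    unsigned Cycles = U.ReleaseAtCycle - U.AcquireAtCycle;
    // Not yet modeled: consumption only starts U.AcquireAtCycle cycles after issue.
    std::printf("resource %s: %u cycle(s), starting %u cycle(s) after issue\n",
                U.Name, Cycles, U.AcquireAtCycle);
  }
  return 0;
}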

Collaborator

As Michael wrote, this patch doesn't add support for AcquireAtCycle.

The difficult part will be refactoring the ResourceManager logic to keep track of multiple AcquireAtCycle requests and to quickly resolve checks on whether resources are busy or not.
Changing that logic might not be too complicated. However, doing it without slowing down the process too much might be challenging.

At the moment, mca doesn't know about the AcquireAtCycles, so this will be a problem for future patches.

@@ -37,13 +37,13 @@ vle64.v v1, (a1)

 # CHECK: Iterations: 1
 # CHECK-NEXT: Instructions: 26
-# CHECK-NEXT: Total Cycles: 234
+# CHECK-NEXT: Total Cycles: 211
Member

I would expect the fix for llvm-mca to be an NFC for its output, at least for all the scheduling models that already exist and that by default have AcquireAtCycle = 0.

How do you explain these new values?

Contributor

I think llvm-mca has overestimated the cycles before. With this change, the cycles of RVV instructions should be reduced by 1 cycle (for SiFive7VCQ), so here the total cycles are reduced by 23 cycles (there are 24 RVV instructions, with an extra 1 cycle for the first RVV instruction?).

Contributor Author

I think the changing of the cycles makes sense since AcquireAtCycle != 0 for the SiFive7 scheduler model. You can see this here for some of the vector alu operations. llvm-mca thought that the arithmetic/load/store sequencer (VA,VS, or VL) was used for one extra cycle than it should have been since AcquireAtCycle = 1 for those resources.
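As a concrete reading of the numbers in the patch (assuming, for the first vredsum.vs entry in the reductions.s diff above, a single resource use with ReleaseAtCycle = 47 and AcquireAtCycle = 1): the old accounting charged ReleaseAtCycle = 47 cycles, while the new accounting charges ReleaseAtCycle - AcquireAtCycle = 47 - 1 = 46 cycles, matching the 47.00 -> 46.00 change in that test.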

@fpetrogalli (Member) left a comment

Thank you for the patch @michaelmaitland! When talking about the AcquireAtCycle/ReleaseAtCycle change with @adibiagio, he mentioned that llvm-mca needed updating to account for this.

Unfortunately I do not know enough about LLVM-MCA to be able to tell whether what you are doing is correct or not.

What puzzles me are the changes in the tests, because I was expecting the rework to be an NFC.

I kindly ask you to seek @adibiagio's feedback before moving forward.

@adibiagio (Collaborator) commented

Thank you for the patch @michaelmaitland! When talking about the AcquireAtCycle/ReleaseAtCycle change with @adibiagio, he mentioned that llvm-mca needed updating to account for this.

Unfortunately I do not know enough about LLVM-MCA to be able to tell whether what you are doing is correct or not.

What puzzles me are the changes in the tests, because I was expecting the rework to be an NFC.

I kindly ask you to seek @adibiagio's feedback before moving forward.

I'll review the patch today :-).

Thanks @michaelmaitland for working on this!

@adibiagio (Collaborator) left a comment

LGTM

@@ -89,11 +89,11 @@ static void initializeUsedResources(InstrDesc &ID,
       AllInOrderResources &= (PR.BufferSize <= 1);
     }

-    CycleSegment RCy(0, PRE->ReleaseAtCycle, false);
+    CycleSegment RCy(PRE->AcquireAtCycle, PRE->ReleaseAtCycle, false);
Contributor Author

This code here made me wonder if we are accounting for AcquireAtCycle correctly.

The NumCyclesUsed class can only represent the total number of cycles used, without saying anything about when the resource is acquired or released. The CycleSegment class looks like the one responsible for taking this information into account. After looking into it, I found that although this information is being tracked, it is not being used as needed:

In ResourceManager::issueInstruction, we do BusyResources[Pipe] += CS.size(); BusyResources is a map from pipe to how many cycles are left before the pipe is usable again. The problem is that this map does not model the fact that a resource can be free from the issue cycle until the acquire-at cycle. My thought is to add another map that tracks this data, so that in conjunction with BusyResources we can tell the full story.

The other part that needs some fixing is that reserveResource is called in issueInstruction. We need to model that a resource may not need to be reserved at instruction issue, but AcquireAtCycle cycles after instruction issue. It may be a good idea to move the call to reserveResource from ResourceManager::issueInstruction to ResourceManager::cycleEvent.

I will take a look into this once I get some bandwidth. I am leaving this comment here and marking this PR as needs changes in the meantime.

EDIT: It turns out the creator of a PR cannot request changes.
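A rough standalone sketch of that idea (hypothetical types and a single resource; the real ResourceManager is much more involved): the extra map holds the remaining delay before a pending acquisition starts, cycleEvent drains it, and only then does the resource become busy.

#include <cstdio>
#include <map>

using ResourceRef = unsigned; // stand-in for mca's pair-based ResourceRef

int main() {
  // Existing idea: cycles left before the resource is usable again.
  std::map<ResourceRef, unsigned> BusyResources;
  // Proposed extra map: cycles left before a delayed acquisition starts.
  std::map<ResourceRef, unsigned> PendingAcquire;

  // At issue (cycle 0): resource 0 has AcquireAtCycle = 1, ReleaseAtCycle = 3.
  PendingAcquire[0] = 1;                // wait one cycle before reserving
  unsigned CyclesOnceAcquired = 3 - 1;  // ReleaseAtCycle - AcquireAtCycle

  for (unsigned Cycle = 0; Cycle < 5; ++Cycle) {
    // cycleEvent(): once the acquire delay has elapsed, start the reservation.
    if (PendingAcquire.count(0)) {
      if (PendingAcquire[0] == 0) {
        BusyResources[0] = CyclesOnceAcquired;
        PendingAcquire.erase(0);
      } else {
        --PendingAcquire[0];
      }
    }
    bool Busy = BusyResources.count(0) && BusyResources[0] > 0;
    std::printf("cycle %u: resource %s\n", Cycle, Busy ? "busy" : "free");
    if (Busy)
      --BusyResources[0];
  }
  return 0;
}

With this, the resource is free at cycle 0 (the issue cycle) and busy during cycles 1 and 2, which is the delayed-acquisition behavior described above.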

Collaborator

This code here made me wonder if we are accounting for AcquireAtCycle correctly.

The NumCyclesUsed class can only represent the total number of cycles used, without saying anything about when the resource is acquired or released. The CycleSegment class looks like the one responsible for taking this information into account. After looking into it, I found that although this information is being tracked, it is not being used as needed.

Correct.

If you see my comment from three weeks ago, this is literally what I was trying to say when I wrote "this patch doesn't add support for AcquireAtCycle".
The ResourceManager still assumes that every resource is consumed at relative cycle zero (i.e. relative to the issue event).

Contributor Author

I missed your comment above. I think your point is that we need to be able to do this without slowing things down too much. I have a rough idea in my head of how we might do this, but I am not sure when I will have time to pick this task back up again, since it is much less straightforward than the original patch.

Collaborator

I missed your comment above. I think your point is that we need to be able to do this without slowing things down too much. I have a rough idea in my head of how we might do this, but I am not sure when I will have time to pick this task back up again, since it is much less straightforward than the original patch.

No problem. It is something that I always wanted to do in my spare time but never ended up doing.
Redesigning the ResourceManager is definitely not straightforward; the simulation time is heavily dependent on how quickly it replies to individual dispatch/issue requests.
It won't be easy to find a design which doesn't hurt performance too much. In the end we will inevitably have to compromise a bit between simulation speed and simulation accuracy. That is expected.

The complexity of the new design also depends on how many pending CycleSegments can be "issued" for every resource. I am mainly thinking about out-of-order processors where multiple delayed resource consumptions can be pending at the same time. Not sure if that makes sense...

@wangpc-pp (Contributor) left a comment

It turns out the creator of a PR cannot request changes.

:-)
