[RISCV] Remove SiFive7PipeV and replace it with SiFive7VCQ #73969

michaelmaitland · 2023-11-30T18:56:23Z

The Arithmetic, Load, and Store sequencers can accept instructions in parallel. PipeV prevented that: because VA, VL, and VS were all modeled as sub-units of PipeV, PipeV became busy whenever any one of the sequencers was busy. This change allows the sequencers to accept instructions in parallel.

The VCQ (Vector Command Queue) accepts instructions from the A Pipe and holds them until the vector unit is ready to dequeue them. The unit dequeues up to one instruction per cycle, in order, as soon as the sequencer for that type of instruction is available. This resource is meant to be used for 1 cycle by all vector instructions, to model that only one vector instruction may be dequeued at a time. The actual dequeueing into the sequencer is modeled by the VA, VL, and VS sequencer resources. Each of them accepts only a single instruction at a time and remains busy for the number of cycles associated with that instruction.
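To illustrate the difference, here is a minimal Python sketch of the two models. It is not LLVM code and not how llvm-mca is implemented. It assumes each vector instruction needs exactly one sequencer (VA, VL, or VS) for a fixed number of cycles; `VecInstr`, `schedule_vcq`, and `schedule_pipev` are hypothetical names. The VCQ model follows the reservation pattern used in the diff (VCQ for cycle 0, then the sequencer starting at cycle 1) and dequeues in order. The PipeV model is a simplification in which every sequencer use also reserves one shared resource.

```python
from dataclasses import dataclass


@dataclass
class VecInstr:
    seq: str     # sequencer the instruction needs: "VA", "VL", or "VS"
    cycles: int  # cycles the sequencer stays busy for this instruction


def schedule_vcq(instrs):
    """Issue cycles under the VCQ model (hypothetical helper).

    Each instruction holds the VCQ for [t, t + 1) and its sequencer for
    [t + 1, t + 1 + cycles). The VCQ dequeues at most one instruction per
    cycle, in program order.
    """
    seq_free_at = {"VA": 0, "VL": 0, "VS": 0}  # first cycle each sequencer is free
    vcq_free_at = 0                             # first cycle the VCQ can dequeue again
    issue = []
    for ins in instrs:
        # The sequencer is acquired one cycle after the VCQ, so the instruction
        # may leave the VCQ one cycle before its sequencer frees up.
        t = max(vcq_free_at, seq_free_at[ins.seq] - 1)
        issue.append(t)
        vcq_free_at = t + 1
        seq_free_at[ins.seq] = t + 1 + ins.cycles
    return issue


def schedule_pipev(instrs):
    """Issue cycles under a simplified PipeV model (hypothetical helper).

    Every sequencer use also reserves one shared PipeV resource, so vector
    instructions run strictly one after another.
    """
    pipev_free_at = 0
    issue = []
    for ins in instrs:
        t = pipev_free_at
        issue.append(t)
        pipev_free_at = t + ins.cycles
    return issue


program = [VecInstr("VL", 4), VecInstr("VA", 4), VecInstr("VS", 4)]
print("VCQ:", schedule_vcq(program))
print("PipeV:", schedule_pipev(program))
```

With a load, an arithmetic op, and a store of 4 cycles each, the sketch issues them at cycles 0, 1, 2 under the VCQ model but at 0, 4, 8 under PipeV. Back-to-back instructions that need the same sequencer gain nothing, since each sequencer is still exclusive.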

llvmbot · 2023-11-30T18:56:47Z

@llvm/pr-subscribers-backend-risc-v

Author: Michael Maitland (michaelmaitland)

Patch is 465.37 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73969.diff

28 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVSchedSiFive7.td (+242-233)
(modified) llvm/lib/Target/RISCV/RISCVScheduleV.td (+10-6)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/gpr-bypass-c.s (+2-2)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/gpr-bypass.s (+2-2)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/reductions.s (+211-211)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-x0.s (+53-53)
(modified) llvm/test/tools/llvm-mca/RISCV/SiFive7/vector-integer-arithmetic.s (+749-749)
(modified) llvm/test/tools/llvm-mca/RISCV/different-lmul-instruments.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/different-sew-instruments.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/disable-im.s (+20-20)
(modified) llvm/test/tools/llvm-mca/RISCV/fractional-lmul-data.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-at-start.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-middle.s (+13-13)
(modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-in-region.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/lmul-instrument-straddles-region.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-lmul-instruments.s (+26-26)
(modified) llvm/test/tools/llvm-mca/RISCV/multiple-same-sew-instruments.s (+15-15)
(modified) llvm/test/tools/llvm-mca/RISCV/needs-sew-but-only-lmul.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/no-vsetvli-to-start.s (+13-13)
(modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-at-start.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-middle.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-in-region.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/sew-instrument-straddles-region.s (+6-6)
(modified) llvm/test/tools/llvm-mca/RISCV/vle-vse.s (+407-407)
(modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-instrument.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/vsetivli-lmul-sew-instrument.s (+9-9)
(modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-instrument.s (+8-8)
(modified) llvm/test/tools/llvm-mca/RISCV/vsetvli-lmul-sew-instrument.s (+9-9)

diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index 53ef9d1baf7b59a..6962546de7e0c48 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -208,20 +208,29 @@ def SiFive7Model : SchedMachineModel {
 // Pipe A can handle memory, integer alu and vector operations.
 // Pipe B can handle integer alu, control flow, integer multiply and divide,
 // and floating point computation.
-// Pipe V can handle the V extension.
+// The V pipeline is modeled by the VCQ, VA, VL, and VS resources.
 let SchedModel = SiFive7Model in {
 let BufferSize = 0 in {
 def SiFive7PipeA       : ProcResource<1>;
 def SiFive7PipeB       : ProcResource<1>;
-def SiFive7PipeV       : ProcResource<1>;
+def SiFive7VA          : ProcResource<1>; // Arithmetic sequencer
+def SiFive7VL          : ProcResource<1>; // Load sequencer
+def SiFive7VS          : ProcResource<1>; // Store sequencer
+// The VCQ accepts instructions from the A Pipe and holds them until the
+// vector unit is ready to dequeue them. The unit dequeues up to one instruction
+// per cycle, in order, as soon as the sequencer for that type of instruction is
+// available. This resource is meant to be used for 1 cycle by all vector
+// instructions, to model that only one vector instruction may be dequeued at a
+// time. The actual dequeueing into the sequencer is modeled by the VA, VL, and
+// VS sequencer resources above. Each of them will only accept a single
+// instruction at a time and remain busy for the number of cycles associated
+// with that instruction.
+def SiFive7VCQ         : ProcResource<1>; // Vector Command Queue
 }
 
 let BufferSize = 1 in {
 def SiFive7IDiv        : ProcResource<1> { let Super = SiFive7PipeB; } // Int Division
 def SiFive7FDiv        : ProcResource<1> { let Super = SiFive7PipeB; } // FP Division/Sqrt
-def SiFive7VA          : ProcResource<1> { let Super = SiFive7PipeV; } // Arithmetic sequencer
-def SiFive7VL          : ProcResource<1> { let Super = SiFive7PipeV; } // Load sequencer
-def SiFive7VS          : ProcResource<1> { let Super = SiFive7PipeV; } // Store sequencer
 }
 
 def SiFive7PipeAB : ProcResGroup<[SiFive7PipeA, SiFive7PipeB]>;
@@ -433,21 +442,21 @@ def : WriteRes<WriteVSETVL, [SiFive7PipeA]>;
 foreach mx = SchedMxList in {
   defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  let Latency = 4, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVLDE",    [SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDFF",   [SiFive7VL], mx, IsWorstCase>;
+  let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVLDE",    [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDFF",   [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
   }
-  let Latency = 1, ReleaseAtCycles = [Cycles] in
-  defm "" : LMULWriteResMX<"WriteVSTE",    [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
+  defm "" : LMULWriteResMX<"WriteVSTE",    [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
 }
 
 foreach mx = SchedMxList in {
   defvar Cycles = SiFive7GetMaskLoadStoreCycles<mx>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  let Latency = 4, ReleaseAtCycles = [Cycles] in
-  defm "" : LMULWriteResMX<"WriteVLDM",    [SiFive7VL], mx, IsWorstCase>;
-  let Latency = 1, ReleaseAtCycles = [Cycles] in
-  defm "" : LMULWriteResMX<"WriteVSTM",    [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
+  defm "" : LMULWriteResMX<"WriteVLDM",    [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
+  defm "" : LMULWriteResMX<"WriteVSTM",    [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
 }
 
 // Strided loads and stores operate at one element per cycle and should be
@@ -466,17 +475,17 @@ foreach mx = SchedMxList in {
   defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 8>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS8",  VLDSX0Pred, [SiFive7VL],
-                                       4, [VLDSX0Cycles], !add(3, Cycles),
-                                       [Cycles], mx, IsWorstCase>;
-  let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVLDUX8", [SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDOX8", [SiFive7VL], mx, IsWorstCase>;
+  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS8",  VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
+                                       4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
+                                       [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
+  let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVLDUX8", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX8", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
   }
-  let Latency = 1, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVSTS8",  [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTUX8", [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTOX8", [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVSTS8",  [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX8", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX8", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
   }
 }
 // TODO: The MxLists need to be filtered by EEW. We only need to support
@@ -486,72 +495,72 @@ foreach mx = ["MF4", "MF2", "M1", "M2", "M4", "M8"] in {
   defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 16>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS16",  VLDSX0Pred, [SiFive7VL],
-                                       4, [VLDSX0Cycles], !add(3, Cycles),
-                                       [Cycles], mx, IsWorstCase>;
-  let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVLDUX16", [SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDOX16", [SiFive7VL], mx, IsWorstCase>;
+  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS16",  VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
+                                       4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
+                                       [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
+  let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVLDUX16", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX16", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
   }
-  let Latency = 1, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVSTS16",  [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTUX16", [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTOX16", [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVSTS16",  [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX16", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX16", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
   }
 }
 foreach mx = ["MF2", "M1", "M2", "M4", "M8"] in {
   defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 32>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS32",  VLDSX0Pred, [SiFive7VL],
-                                       4, [VLDSX0Cycles], !add(3, Cycles),
-                                       [Cycles], mx, IsWorstCase>;
-  let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVLDUX32", [SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDOX32", [SiFive7VL], mx, IsWorstCase>;
+  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS32",  VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
+                                       4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
+                                       [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
+  let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVLDUX32", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX32", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
   }
-  let Latency = 1, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVSTS32",  [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTUX32", [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTOX32", [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVSTS32",  [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX32", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX32", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
   }
 }
 foreach mx = ["M1", "M2", "M4", "M8"] in {
   defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 64>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS64",  VLDSX0Pred, [SiFive7VL],
-                                       4, [VLDSX0Cycles], !add(3, Cycles),
-                                       [Cycles], mx, IsWorstCase>;
-  let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVLDUX64", [SiFive7VL], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVLDOX64", [SiFive7VL], mx, IsWorstCase>;
+  defm SiFive7 : LMULWriteResMXVariant<"WriteVLDS64",  VLDSX0Pred, [SiFive7VCQ, SiFive7VL],
+                                       4, [0, 1], [1, !add(1, VLDSX0Cycles)], !add(3, Cycles),
+                                       [0, 1], [1, !add(1, Cycles)], mx, IsWorstCase>;
+  let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVLDUX64", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVLDOX64", [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
   }
-  let Latency = 1, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVSTS64",  [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTUX64", [SiFive7VS], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVSTOX64", [SiFive7VS], mx, IsWorstCase>;
+  let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVSTS64",  [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTUX64", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVSTOX64", [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
   }
 }
 
 // VLD*R is LMUL aware
-let Latency = 4, ReleaseAtCycles = [2] in
-  def : WriteRes<WriteVLD1R,  [SiFive7VL]>;
-let Latency = 4, ReleaseAtCycles = [4] in
-  def : WriteRes<WriteVLD2R,  [SiFive7VL]>;
-let Latency = 4, ReleaseAtCycles = [8] in
-  def : WriteRes<WriteVLD4R,  [SiFive7VL]>;
-let Latency = 4, ReleaseAtCycles = [16] in
-  def : WriteRes<WriteVLD8R,  [SiFive7VL]>;
+let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 2)] in
+  def : WriteRes<WriteVLD1R,  [SiFive7VCQ, SiFive7VL]>;
+let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 4)] in
+  def : WriteRes<WriteVLD2R,  [SiFive7VCQ, SiFive7VL]>;
+let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 8)] in
+  def : WriteRes<WriteVLD4R,  [SiFive7VCQ, SiFive7VL]>;
+let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 16)] in
+  def : WriteRes<WriteVLD8R,  [SiFive7VCQ, SiFive7VL]>;
 // VST*R is LMUL aware
-let Latency = 1, ReleaseAtCycles = [2] in
-  def : WriteRes<WriteVST1R,   [SiFive7VS]>;
-let Latency = 1, ReleaseAtCycles = [4] in
-  def : WriteRes<WriteVST2R,   [SiFive7VS]>;
-let Latency = 1, ReleaseAtCycles = [8] in
-  def : WriteRes<WriteVST4R,   [SiFive7VS]>;
-let Latency = 1, ReleaseAtCycles = [16] in
-  def : WriteRes<WriteVST8R,   [SiFive7VS]>;
+let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 2)] in
+  def : WriteRes<WriteVST1R,   [SiFive7VCQ, SiFive7VS]>;
+let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 4)] in
+  def : WriteRes<WriteVST2R,   [SiFive7VCQ, SiFive7VS]>;
+let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 8)] in
+  def : WriteRes<WriteVST4R,   [SiFive7VCQ, SiFive7VS]>;
+let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, 16)] in
+  def : WriteRes<WriteVST8R,   [SiFive7VCQ, SiFive7VS]>;
 
 // Segmented Loads and Stores
 // Unit-stride segmented loads and stores are effectively converted into strided
@@ -564,22 +573,22 @@ foreach mx = SchedMxList in {
     defvar Cycles = SiFive7GetCyclesSegmentedSeg2<mx>.c;
     defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
     // Does not chain so set latency high
-    let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-      defm "" : LMULWriteResMX<"WriteVLSEG2e" # eew,   [SiFive7VL], mx, IsWorstCase>;
-      defm "" : LMULWriteResMX<"WriteVLSEGFF2e" # eew, [SiFive7VL], mx, IsWorstCase>;
+    let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+      defm "" : LMULWriteResMX<"WriteVLSEG2e" # eew,   [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+      defm "" : LMULWriteResMX<"WriteVLSEGFF2e" # eew, [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
     }
-    let Latency = 1, ReleaseAtCycles = [Cycles] in
-    defm "" : LMULWriteResMX<"WriteVSSEG2e" # eew,   [SiFive7VS], mx, IsWorstCase>;
+    let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
+    defm "" : LMULWriteResMX<"WriteVSSEG2e" # eew,   [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
     foreach nf=3-8 in {
       defvar Cycles = SiFive7GetCyclesSegmented<mx, eew, nf>.c;
       defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
       // Does not chain so set latency high
-      let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-        defm "" : LMULWriteResMX<"WriteVLSEG" # nf # "e" # eew,   [SiFive7VL], mx, IsWorstCase>;
-        defm "" : LMULWriteResMX<"WriteVLSEGFF" # nf # "e" # eew, [SiFive7VL], mx, IsWorstCase>;
+      let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+        defm "" : LMULWriteResMX<"WriteVLSEG" # nf # "e" # eew,   [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLSEGFF" # nf # "e" # eew, [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
       }
-      let Latency = 1, ReleaseAtCycles = [Cycles] in
-      defm "" : LMULWriteResMX<"WriteVSSEG" # nf # "e" # eew,   [SiFive7VS], mx, IsWorstCase>;
+      let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
+      defm "" : LMULWriteResMX<"WriteVSSEG" # nf # "e" # eew,   [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
     }
   }
 }
@@ -589,15 +598,15 @@ foreach mx = SchedMxList in {
       defvar Cycles = SiFive7GetCyclesSegmented<mx, eew, nf>.c;
       defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
       // Does not chain so set latency high
-      let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
-        defm "" : LMULWriteResMX<"WriteVLSSEG" # nf # "e" # eew,  [SiFive7VL], mx, IsWorstCase>;
-        defm "" : LMULWriteResMX<"WriteVLUXSEG" # nf # "e" # eew, [SiFive7VL], mx, IsWorstCase>;
-        defm "" : LMULWriteResMX<"WriteVLOXSEG" # nf # "e" # eew, [SiFive7VL], mx, IsWorstCase>;
+      let Latency = !add(3, Cycles), AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+        defm "" : LMULWriteResMX<"WriteVLSSEG" # nf # "e" # eew,  [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLUXSEG" # nf # "e" # eew, [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVLOXSEG" # nf # "e" # eew, [SiFive7VCQ, SiFive7VL], mx, IsWorstCase>;
       }
-      let Latency = 1, ReleaseAtCycles = [Cycles] in {
-        defm "" : LMULWriteResMX<"WriteVSSSEG" # nf # "e" # eew,  [SiFive7VS], mx, IsWorstCase>;
-        defm "" : LMULWriteResMX<"WriteVSUXSEG" # nf # "e" # eew, [SiFive7VS], mx, IsWorstCase>;
-        defm "" : LMULWriteResMX<"WriteVSOXSEG" # nf # "e" # eew, [SiFive7VS], mx, IsWorstCase>;
+      let Latency = 1, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+        defm "" : LMULWriteResMX<"WriteVSSSEG" # nf # "e" # eew,  [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVSUXSEG" # nf # "e" # eew, [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
+        defm "" : LMULWriteResMX<"WriteVSOXSEG" # nf # "e" # eew, [SiFive7VCQ, SiFive7VS], mx, IsWorstCase>;
       }
     }
   }
@@ -607,41 +616,41 @@ foreach mx = SchedMxList in {
 foreach mx = SchedMxList in {
   defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
   defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
-  let Latency = 4, ReleaseAtCycles = [Cycles] in {
-    defm "" : LMULWriteResMX<"WriteVIALUV",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIALUX",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIALUI",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVICALUV",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVICALUX",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVICALUI",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVShiftV",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVShiftX",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVShiftI",    [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMinMaxV",  [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMinMaxX",  [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMulV",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMulX",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMulAddV",  [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMulAddX",  [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMergeV",   [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMergeX",   [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMergeI",   [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMovV",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMovX",     [SiFive7VA], mx, IsWorstCase>;
-    defm "" : LMULWriteResMX<"WriteVIMovI",     [SiFive7VA], mx, IsWorstCase>;
+  let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+    defm "" : LMULWriteResMX<"WriteVIALUV",     [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIALUX",     [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVIALUI",     [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUV",    [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUX",    [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVICALUI",    [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    defm "" : LMULWriteResMX<"WriteVShiftV",    [SiFive7VCQ, SiFive7VA], mx, IsWorstCase>;
+    ...
[truncated]
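Nearly every WriteRes in this patch follows the same pattern: resources `[SiFive7VCQ, <sequencer>]` with `AcquireAtCycles = [0, 1]` and `ReleaseAtCycles = [1, !add(1, Cycles)]`. The Python sketch below expands that pattern into the cycles each resource is held. It assumes the half-open `[AcquireAtCycles, ReleaseAtCycles)` interpretation of the two lists and an implicit acquire cycle of 0 for the old definitions; `reservation_table` and `vcq_write` are hypothetical helpers, and the cycle counts come from the `WriteVLD*R` definitions in the diff.

```python
def reservation_table(resources, acquire, release):
    """Map each resource to the cycles it is held (hypothetical helper).

    Assumes resource i is held over the half-open interval [acquire[i], release[i]).
    """
    if not (len(resources) == len(acquire) == len(release)):
        raise ValueError("resources, acquire and release must have equal length")
    table = {}
    for res, a, r in zip(resources, acquire, release):
        if not 0 <= a < r:
            raise ValueError(f"{res}: expected 0 <= acquire < release, got {a}, {r}")
        table[res] = list(range(a, r))
    return table


def vcq_write(sequencer, cycles):
    """The pattern used throughout the patch: VCQ for one cycle, then the
    sequencer for `cycles` cycles starting at cycle 1."""
    return reservation_table(["SiFive7VCQ", sequencer],
                             acquire=[0, 1],
                             release=[1, 1 + cycles])


# Sequencer occupancy for the whole-register loads, taken from the diff.
for name, cycles in [("WriteVLD1R", 2), ("WriteVLD2R", 4),
                     ("WriteVLD4R", 8), ("WriteVLD8R", 16)]:
    new = vcq_write("SiFive7VL", cycles)
    # Old definition: ReleaseAtCycles = [cycles] on SiFive7VL alone (acquired at 0).
    old = reservation_table(["SiFive7VL"], acquire=[0], release=[cycles])
    # The sequencer is busy for the same number of cycles, just shifted by one.
    assert len(new["SiFive7VL"]) == len(old["SiFive7VL"]) == cycles
    print(name, "VCQ cycles:", new["SiFive7VCQ"], "VL cycles:", new["SiFive7VL"])
```

Under these assumptions the patch does not change how long a sequencer is occupied; it puts a one-cycle VCQ reservation in front of it and starts the sequencer one cycle later.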

wangpc-pp · 2023-12-01T04:03:15Z

I don't know the implementation details, but the changes LGTM.
Have you tried to evaluate it on FPGA/chip and to see the improvements? I think these model changes will make the vector pipeline more efficient.

michaelmaitland · 2023-12-01T16:50:15Z

> Have you tried to evaluate it on FPGA/chip and to see the improvements?

We evaluated it on SPEC CPU2006 and saw improvements. This change is important for taking advantage of the parallelism in the vector pipeline.

llvm/lib/Target/RISCV/RISCVSchedSiFive7.td

llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-x0.s

wangpc-pp

LGTM.

michaelmaitland requested review from topperc and wangpc-pp November 30, 2023 18:56

michaelmaitland added the backend:RISC-V label Nov 30, 2023

wangpc-pp reviewed Dec 4, 2023

llvm/lib/Target/RISCV/RISCVSchedSiFive7.td

llvm/test/tools/llvm-mca/RISCV/SiFive7/strided-load-x0.s

wangpc-pp approved these changes Dec 4, 2023

michaelmaitland force-pushed the sifive7-vcq branch from 9ddc5b4 to f155f7a December 4, 2023 18:58

michaelmaitland force-pushed the sifive7-vcq branch from f155f7a to e85173a December 4, 2023 19:01

!fixup fix test after rebase

203a5c9

michaelmaitland merged commit d9570ba into llvm:main Dec 4, 2023
2 of 3 checks passed
