[RISCV] Toggle throttled FP64 feature in SiFive7 scheduling model with subtarget feature #162400
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Stacks on top of #162399 (which also has more context on the rationale of this patch).

This patch teaches the SiFive7 scheduling model to configure/toggle the throttled FP64 vector feature with a subtarget feature rather than a hard-coded TableGen parameter, which would otherwise force us to instantiate a new scheduling model for every performance feature like this.

Full diff: https://github.com/llvm/llvm-project/pull/162400.diff

6 Files Affected:
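For orientation before the file-by-file diff, here are the moving parts of the change condensed into one place (everything below is taken from, or describes, hunks in the diff that follows):

// (1) A tune feature on the subtarget (RISCVFeatures.td):
def TuneHasThrottledVecFP64
    : SubtargetFeature<"throttled-vec-fp64", "HasThrottledVectorFP64", "true",
                       "Certain vector FP64 operations have limited performance">;

// (2) A scheduling predicate keyed on that feature (RISCVInstrPredicates.td):
def ThrottledVecFP64SchedPred : FeatureSchedPredicate<TuneHasThrottledVecFP64>;

// (3) Processors that want the throttled timings simply add the tune feature
//     (RISCVProcessors.td) instead of getting a dedicated scheduling model, e.g.
//     sifive-x390 now uses
//     !listconcat(SiFiveIntelligenceTuneFeatures, [TuneHasThrottledVecFP64]).
// (4) In RISCVSchedSiFive7.td, each affected SchedWrite is emitted through the new
//     LMULSEWWriteResMXSEWVariant multiclass (RISCVScheduleV.td), which uses the
//     predicate to choose between the one-element-per-cycle (throttled) and the
//     default resource/latency numbers at scheduling time.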
diff --git a/llvm/lib/Target/RISCV/RISCVFeatures.td b/llvm/lib/Target/RISCV/RISCVFeatures.td
index 27cf057112869..0d3df0e188505 100644
--- a/llvm/lib/Target/RISCV/RISCVFeatures.td
+++ b/llvm/lib/Target/RISCV/RISCVFeatures.td
@@ -1823,6 +1823,10 @@ def TuneConditionalCompressedMoveFusion
def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
def NoConditionalMoveFusion : Predicate<"!Subtarget->hasConditionalMoveFusion()">;
+def TuneHasThrottledVecFP64
+ : SubtargetFeature<"throttled-vec-fp64", "HasThrottledVectorFP64", "true",
+ "Certain vector FP64 operations have limited performance">;
+
def TuneMIPSP8700
: SubtargetFeature<"mips-p8700", "RISCVProcFamily", "MIPSP8700",
"MIPS p8700 processor">;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrPredicates.td b/llvm/lib/Target/RISCV/RISCVInstrPredicates.td
index 6d86aff581604..8a449c7e3dd08 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrPredicates.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrPredicates.td
@@ -14,6 +14,8 @@
// otherwise.
def VLDSX0Pred : MCSchedPredicate<CheckRegOperand<3, X0>>;
+def ThrottledVecFP64SchedPred : FeatureSchedPredicate<TuneHasThrottledVecFP64>;
+
// Returns true if this is the sext.w pattern, addiw rd, rs1, 0.
def isSEXT_W
: TIIPredicate<"isSEXT_W",
diff --git a/llvm/lib/Target/RISCV/RISCVProcessors.td b/llvm/lib/Target/RISCV/RISCVProcessors.td
index 17a794867be9e..1927bfdb689c1 100644
--- a/llvm/lib/Target/RISCV/RISCVProcessors.td
+++ b/llvm/lib/Target/RISCV/RISCVProcessors.td
@@ -338,7 +338,8 @@ def SIFIVE_X390 : RISCVProcessorModel<"sifive-x390",
FeatureStdExtZvl1024b,
FeatureVendorXSiFivecdiscarddlone,
FeatureVendorXSiFivecflushdlone],
- SiFiveIntelligenceTuneFeatures>;
+ !listconcat(SiFiveIntelligenceTuneFeatures,
+ [TuneHasThrottledVecFP64])>;
defvar SiFiveP400TuneFeatures = [TuneNoDefaultUnroll,
TuneConditionalCompressedMoveFusion,
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index 3e07eff72bf70..22bf835a20267 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -317,7 +317,6 @@ multiclass SiFive7WriteResBase<int VLEN,
ProcResourceKind VL, ProcResourceKind VS,
ProcResourceKind VCQ,
SiFive7FPLatencies fpLatencies,
- bit isFP64Throttled = false,
bit hasFastGather = false> {
// Branching
@@ -832,29 +831,56 @@ multiclass SiFive7WriteResBase<int VLEN,
// 13. Vector Floating-Point Instructions
foreach mx = SchedMxListF in {
foreach sew = SchedSEWSet<mx, isF=1>.val in {
- defvar Cycles = !if(!and(isFP64Throttled, !eq(sew, 64)),
- SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c,
- SiFive7GetCyclesDefault<mx>.c);
- defvar Lat8 = !if(!and(isFP64Throttled, !eq(sew, 64)), Cycles, 8);
- defvar VA = !if(!and(isFP64Throttled, !eq(sew, 64)), VA1, VA1OrVA2);
defvar IsWorstCase = SiFive7IsWorstCaseMXSEW<mx, sew, SchedMxListF, isF=1>.c;
- let Latency = Lat8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm : LMULSEWWriteResMXSEW<"WriteVFALUV", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFALUF", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFMulV", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFMulF", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFMulAddV", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFMulAddF", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFRecpV", [VCQ, VA1], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
- }
- defvar Lat4 = !if(!and(isFP64Throttled, !eq(sew, 64)), Cycles, 4);
- let Latency = Lat4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm : LMULSEWWriteResMXSEW<"WriteVFSgnjV", [VCQ, VA], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFSgnjF", [VCQ, VA], mx, sew, IsWorstCase>;
- // min max require merge
- defm : LMULSEWWriteResMXSEW<"WriteVFMinMaxV", [VCQ, VA1], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFMinMaxF", [VCQ, VA1], mx, sew, IsWorstCase>;
+ if !eq(sew, 64) then {
+ defvar ThrottledCycles = SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c;
+ foreach SchedWriteName = ["WriteVFALUV", "WriteVFALUF", "WriteVFMulV", "WriteVFMulF",
+ "WriteVFMulAddV", "WriteVFMulAddF"] in
+ defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], ThrottledCycles, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1OrVA2], 8, [0, 1], [1, !add(1, SiFive7GetCyclesDefault<mx>.c)],
+ mx, sew, IsWorstCase>;
+ foreach SchedWriteName = ["WriteVFRecpV", "WriteVFCvtIToFV"] in
+ defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], ThrottledCycles, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, SiFive7GetCyclesDefault<mx>.c)],
+ mx, sew, IsWorstCase>;
+ foreach SchedWriteName = ["WriteVFSgnjV", "WriteVFSgnjF"] in
+ defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], ThrottledCycles, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1OrVA2], 4, [0, 1], [1, !add(1, SiFive7GetCyclesDefault<mx>.c)],
+ mx, sew, IsWorstCase>;
+ foreach SchedWriteName = ["WriteVFMinMaxV", "WriteVFMinMaxF"] in
+ defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], ThrottledCycles, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1], 4, [0, 1], [1, !add(1, SiFive7GetCyclesDefault<mx>.c)],
+ mx, sew, IsWorstCase>;
+ } else {
+ let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, SiFive7GetCyclesDefault<mx>.c)] in {
+ defm : LMULSEWWriteResMXSEW<"WriteVFALUV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFALUF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFMulV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFMulF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFMulAddV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFMulAddF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFRecpV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ }
+ let Latency = 4, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, SiFive7GetCyclesDefault<mx>.c)] in {
+ defm : LMULSEWWriteResMXSEW<"WriteVFSgnjV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFSgnjF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
+ // min max require merge
+ defm : LMULSEWWriteResMXSEW<"WriteVFMinMaxV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFMinMaxF", [VCQ, VA1], mx, sew, IsWorstCase>;
+ }
}
}
}
@@ -892,19 +918,28 @@ multiclass SiFive7WriteResBase<int VLEN,
// Widening
foreach mx = SchedMxListW in {
foreach sew = SchedSEWSet<mx, isF=0, isWidening=1>.val in {
- defvar Cycles = !if(!and(isFP64Throttled, !eq(sew, 32)),
- SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c,
- SiFive7GetCyclesDefault<mx>.c);
defvar IsWorstCase = SiFive7IsWorstCaseMXSEW<mx, sew, SchedMxListW>.c;
- let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in
- defm : LMULSEWWriteResMXSEW<"WriteVFWCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ defvar DefaultCycles = SiFive7GetCyclesDefault<mx>.c;
+ if !eq(sew, 32) then {
+ defvar ThrottledCycles = SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c;
+ defm : LMULSEWWriteResMXSEWVariant<"WriteVFWCvtIToFV", ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, DefaultCycles)],
+ mx, sew, IsWorstCase>;
+ } else {
+ let Latency = 8,
+ AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, DefaultCycles)] in
+ defm : LMULSEWWriteResMXSEW<"WriteVFWCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ }
}
}
foreach mx = SchedMxListFW in {
foreach sew = SchedSEWSet<mx, isF=1, isWidening=1>.val in {
- defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
+ defvar DefaultCycles = SiFive7GetCyclesDefault<mx>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMXSEW<mx, sew, SchedMxListFW, isF=1>.c;
- let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
+ let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, DefaultCycles)] in {
defm : LMULSEWWriteResMXSEW<"WriteVFWALUV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
defm : LMULSEWWriteResMXSEW<"WriteVFWALUF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
defm : LMULSEWWriteResMXSEW<"WriteVFWMulV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
@@ -912,11 +947,19 @@ multiclass SiFive7WriteResBase<int VLEN,
defm : LMULSEWWriteResMXSEW<"WriteVFWMulAddV", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
defm : LMULSEWWriteResMXSEW<"WriteVFWMulAddF", [VCQ, VA1OrVA2], mx, sew, IsWorstCase>;
}
- defvar CvtCycles = !if(!and(isFP64Throttled, !eq(sew, 32)),
- SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c,
- SiFive7GetCyclesDefault<mx>.c);
- let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, CvtCycles)] in
- defm "" : LMULSEWWriteResMXSEW<"WriteVFWCvtFToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ if !eq(sew, 32) then {
+ defvar ThrottledCycles = SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c;
+ defm : LMULSEWWriteResMXSEWVariant<"WriteVFWCvtFToFV", ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, DefaultCycles)],
+ mx, sew, IsWorstCase>;
+ } else {
+ let Latency = 8,
+ AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, DefaultCycles)] in
+ defm : LMULSEWWriteResMXSEW<"WriteVFWCvtFToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ }
}
defvar Cycles = SiFive7GetCyclesDefault<mx>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxListFW>.c;
@@ -933,13 +976,23 @@ multiclass SiFive7WriteResBase<int VLEN,
}
foreach mx = SchedMxListFW in {
foreach sew = SchedSEWSet<mx, isF=1, isWidening=1>.val in {
- defvar Cycles = !if(!and(isFP64Throttled, !eq(sew, 32)),
- SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c,
- SiFive7GetCyclesNarrowing<mx>.c);
defvar IsWorstCase = SiFive7IsWorstCaseMXSEW<mx, sew, SchedMxListFW, isF=1>.c;
- let Latency = 8, AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, Cycles)] in {
- defm : LMULSEWWriteResMXSEW<"WriteVFNCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
- defm : LMULSEWWriteResMXSEW<"WriteVFNCvtFToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ defvar DefaultCycles = SiFive7GetCyclesNarrowing<mx>.c;
+ if !eq(sew, 32) then {
+ defvar ThrottledCycles = SiFive7GetCyclesOnePerElement<mx, sew, VLEN>.c;
+ foreach SchedWriteName = ["WriteVFNCvtIToFV", "WriteVFNCvtFToFV"] in
+ defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, ThrottledVecFP64SchedPred,
+ // Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, ThrottledCycles)],
+ // Not Predicated
+ [VCQ, VA1], 8, [0, 1], [1, !add(1, DefaultCycles)],
+ mx, sew, IsWorstCase>;
+ } else {
+ let Latency = 8,
+ AcquireAtCycles = [0, 1], ReleaseAtCycles = [1, !add(1, DefaultCycles)] in {
+ defm : LMULSEWWriteResMXSEW<"WriteVFNCvtIToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ defm : LMULSEWWriteResMXSEW<"WriteVFNCvtFToFV", [VCQ, VA1], mx, sew, IsWorstCase>;
+ }
}
}
}
@@ -1499,7 +1552,6 @@ multiclass SiFive7ReadAdvance {
/// eventually be supplied by different SchedMachineModels.
multiclass SiFive7SchedResources<int vlen, bit extraVALU,
SiFive7FPLatencies fpLatencies,
- bit isFP64Throttled,
bit hasFastGather> {
defm SiFive7 : SiFive7ProcResources<extraVALU>;
@@ -1527,8 +1579,7 @@ multiclass SiFive7SchedResources<int vlen, bit extraVALU,
: SiFive7WriteResBase<vlen, SiFive7PipeA, SiFive7PipeB, SiFive7PipeAB,
SiFive7IDiv, SiFive7FDiv, SiFive7VA1,
SiFive7VA1OrVA2, SiFive7VL, SiFive7VS,
- SiFive7VCQ, fpLatencies, isFP64Throttled,
- hasFastGather>;
+ SiFive7VCQ, fpLatencies, hasFastGather>;
//===----------------------------------------------------------------------===//
// Bypass and advance
@@ -1560,7 +1611,6 @@ class SiFive7SchedMachineModel<int vlen> : SchedMachineModel {
bit HasExtraVALU = false;
SiFive7FPLatencies FPLatencies;
- bit IsFP64Throttled = false;
bit HasFastGather = false;
string Name = !subst("Model", "", !subst("SiFive7", "", NAME));
@@ -1587,7 +1637,6 @@ def SiFive7VLEN512Model : SiFive7SchedMachineModel<512> {
def SiFive7VLEN1024X300Model : SiFive7SchedMachineModel<1024> {
let HasExtraVALU = true;
let FPLatencies = SiFive7LowFPLatencies;
- let IsFP64Throttled = true;
let HasFastGather = true;
}
@@ -1596,7 +1645,6 @@ foreach model = [SiFive7VLEN512Model, SiFive7VLEN1024X300Model] in {
let SchedModel = model in
defm model.Name : SiFive7SchedResources<model.VLEN, model.HasExtraVALU,
model.FPLatencies,
- model.IsFP64Throttled,
model.HasFastGather>;
}
diff --git a/llvm/lib/Target/RISCV/RISCVScheduleV.td b/llvm/lib/Target/RISCV/RISCVScheduleV.td
index 01a4308a1366d..d11b446920c4e 100644
--- a/llvm/lib/Target/RISCV/RISCVScheduleV.td
+++ b/llvm/lib/Target/RISCV/RISCVScheduleV.td
@@ -128,6 +128,22 @@ multiclass LMULWriteResMXVariant<string name, SchedPredicateBase Pred,
IsWorstCase>;
}
+multiclass LMULSEWWriteResMXSEWVariant<string name, SchedPredicateBase Pred,
+ list<ProcResourceKind> predResources,
+ int predLat, list<int> predAcquireCycles,
+ list<int> predReleaseCycles,
+ list<ProcResourceKind> noPredResources,
+ int noPredLat, list<int> noPredAcquireCycles,
+ list<int> noPredReleaseCycles,
+ string mx, int sew, bit IsWorstCase> {
+ defm "" : LMULWriteResVariantImpl<name, name # "_" # mx # "_E" # sew, Pred, predResources,
+ predLat, predAcquireCycles,
+ predReleaseCycles, noPredResources,
+ noPredLat, noPredAcquireCycles,
+ noPredReleaseCycles,
+ IsWorstCase>;
+}
+
// Define multiclasses to define SchedWrite, SchedRead, WriteRes, and
// ReadAdvance for each (name, LMUL) pair and for each LMUL in each of the
// SchedMxList variants above. Each multiclass is responsible for defining
diff --git a/llvm/test/CodeGen/RISCV/features-info.ll b/llvm/test/CodeGen/RISCV/features-info.ll
index 1a7a72d3e072b..40a976e871988 100644
--- a/llvm/test/CodeGen/RISCV/features-info.ll
+++ b/llvm/test/CodeGen/RISCV/features-info.ll
@@ -179,6 +179,7 @@
; CHECK-NEXT: svpbmt - 'Svpbmt' (Page-Based Memory Types).
; CHECK-NEXT: svvptc - 'Svvptc' (Obviating Memory-Management Instructions after Marking PTEs Valid).
; CHECK-NEXT: tagged-globals - Use an instruction sequence for taking the address of a global that allows a memory tag in the upper address bits.
+; CHECK-NEXT: throttled-vec-fp64 - Certain vector FP64 operations have limited performance.
; CHECK-NEXT: unaligned-scalar-mem - Has reasonably performant unaligned scalar loads and stores.
; CHECK-NEXT: unaligned-vector-mem - Has reasonably performant unaligned vector loads and stores.
; CHECK-NEXT: use-postra-scheduler - Schedule again after register allocation.
def HasConditionalMoveFusion : Predicate<"Subtarget->hasConditionalMoveFusion()">;
def NoConditionalMoveFusion : Predicate<"!Subtarget->hasConditionalMoveFusion()">;

def TuneHasThrottledVecFP64
How about SingleElementVecFP64 instead of some arbitrary notion of "throttling"?
Yeah that sounds better, it's fixed now (here and the base PR where this feature was introduced)
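Presumably the rename also touches the feature definition itself. A hypothetical sketch of what the renamed definitions might look like (only SingleElementVecFP64SchedPred is visible in the updated hunks below; the feature string, field name, and description here are assumptions, not taken from the actual revision):

// Hypothetical renamed definitions; names other than SingleElementVecFP64SchedPred
// are assumptions for illustration only.
def TuneSingleElementVecFP64
    : SubtargetFeature<"single-element-vec-fp64", "HasSingleElementVectorFP64", "true",
                       "Certain vector FP64 operations process one element per cycle">;

def SingleElementVecFP64SchedPred : FeatureSchedPredicate<TuneSingleElementVecFP64>;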
LGTM.

                          "WriteVFMulAddV", "WriteVFMulAddF"] in
  defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, SingleElementVecFP64SchedPred,
           // Predicated
           [VCQ, VA1], SingleElementCycles, [0, 1], [1, !add(1, SingleElementCycles)],
Latency for single element should be SingleElementCycles+7. The last element still takes 8 cycles.
that makes sense. It's fixed now.
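The arithmetic: with one element issued per cycle, the last of N elements issues at cycle N-1 and its result is ready 8 cycles later, so the latency is N + 7 rather than N. A sketch of the presumed shape of the fix (the exact revised lines are not shown in this thread; SingleElementCycles, mx, sew, IsWorstCase, and the VCQ/VA1/VA1OrVA2 resources come from the surrounding loops in RISCVSchedSiFive7.td):

foreach SchedWriteName = ["WriteVFALUV", "WriteVFALUF", "WriteVFMulV", "WriteVFMulF",
                          "WriteVFMulAddV", "WriteVFMulAddF"] in
  defm : LMULSEWWriteResMXSEWVariant<SchedWriteName, SingleElementVecFP64SchedPred,
           // Predicated: one element per cycle, latency = SingleElementCycles + 7
           [VCQ, VA1], !add(SingleElementCycles, 7), [0, 1],
           [1, !add(1, SingleElementCycles)],
           // Not predicated: default SiFive7 numbers
           [VCQ, VA1OrVA2], 8, [0, 1], [1, !add(1, SiFive7GetCyclesDefault<mx>.c)],
           mx, sew, IsWorstCase>;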
LGTM
The branch was force-pushed from 50e8625 to d474ae8.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/166/builds/2850

Here is the relevant piece of the build log for reference: