-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] SiFive7 VLDS Sched should not depend on VL when stride is x0. #70266
Conversation
When stride is x0, a strided load should behave like a unit stride load, which uses the VLDE sched class.
@llvm/pr-subscribers-backend-risc-v Author: Michael Maitland (michaelmaitland) ChangesWhen stride is x0, a strided load should behave like a unit stride load, which uses the VLDE sched class. Full diff: https://github.com/llvm/llvm-project/pull/70266.diff 2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
index 96ebe8e3e67686a..7433d88bfaba096 100644
--- a/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
+++ b/llvm/lib/Target/RISCV/RISCVSchedSiFive7.td
@@ -455,11 +455,20 @@ foreach mx = SchedMxList in {
// specific suffixes, but since SEW is already encoded in the name of the
// resource, we do not need to use LMULSEWXXX constructors. However, we do
// use the SEW from the name to determine the number of Cycles.
+
+// This predicate is true when the rs2 operand of vlse or vsse is x0, false
+// otherwise.
+def VLDSX0Pred: SchedPredicate<[{ MI->getOperand(3).getReg() == RISCV::X0 }]>;
+
foreach mx = SchedMxList in {
+ defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 8>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
+ defm "" : LMULWriteResMXVariant<"WriteVLDS8", VLDSX0Pred, [SiFive7VL],
+ 4, [VLDSX0Cycles],
+ !add(3, Cycles), [Cycles], mx,
+ IsWorstCase, "SiFive7">;
let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
- defm "" : LMULWriteResMX<"WriteVLDS8", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDUX8", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDOX8", [SiFive7VL], mx, IsWorstCase>;
}
@@ -469,11 +478,18 @@ foreach mx = SchedMxList in {
defm "" : LMULWriteResMX<"WriteVSTOX8", [SiFive7VS], mx, IsWorstCase>;
}
}
-foreach mx = SchedMxList in {
+// TODO: The MxLists need to be filtered by EEW. We only need to support
+// LMUL >= SEW_min/ELEN. Here, the smallest EEW prevents us from having MF8
+// since LMUL >= 16/64.
+foreach mx = ["MF4", "MF2", "M1", "M2", "M4", "M8"] in {
+ defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 16>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
+ defm "" : LMULWriteResMXVariant<"WriteVLDS16", VLDSX0Pred, [SiFive7VL],
+ 4, [VLDSX0Cycles],
+ !add(3, Cycles), [Cycles], mx,
+ IsWorstCase, "SiFive7">;
let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
- defm "" : LMULWriteResMX<"WriteVLDS16", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDUX16", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDOX16", [SiFive7VL], mx, IsWorstCase>;
}
@@ -483,11 +499,15 @@ foreach mx = SchedMxList in {
defm "" : LMULWriteResMX<"WriteVSTOX16", [SiFive7VS], mx, IsWorstCase>;
}
}
-foreach mx = SchedMxList in {
+foreach mx = ["MF2", "M1", "M2", "M4", "M8"] in {
+ defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 32>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
+ defm "" : LMULWriteResMXVariant<"WriteVLDS32", VLDSX0Pred, [SiFive7VL],
+ 4, [VLDSX0Cycles],
+ !add(3, Cycles), [Cycles], mx,
+ IsWorstCase, "SiFive7">;
let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
- defm "" : LMULWriteResMX<"WriteVLDS32", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDUX32", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDOX32", [SiFive7VL], mx, IsWorstCase>;
}
@@ -497,11 +517,15 @@ foreach mx = SchedMxList in {
defm "" : LMULWriteResMX<"WriteVSTOX32", [SiFive7VS], mx, IsWorstCase>;
}
}
-foreach mx = SchedMxList in {
+foreach mx = ["M1", "M2", "M4", "M8"] in {
+ defvar VLDSX0Cycles = SiFive7GetCyclesDefault<mx>.c;
defvar Cycles = SiFive7GetCyclesOnePerElement<mx, 64>.c;
defvar IsWorstCase = SiFive7IsWorstCaseMX<mx, SchedMxList>.c;
+ defm "" : LMULWriteResMXVariant<"WriteVLDS64", VLDSX0Pred, [SiFive7VL],
+ 4, [VLDSX0Cycles],
+ !add(3, Cycles), [Cycles], mx,
+ IsWorstCase, "SiFive7">;
let Latency = !add(3, Cycles), ReleaseAtCycles = [Cycles] in {
- defm "" : LMULWriteResMX<"WriteVLDS64", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDUX64", [SiFive7VL], mx, IsWorstCase>;
defm "" : LMULWriteResMX<"WriteVLDOX64", [SiFive7VL], mx, IsWorstCase>;
}
diff --git a/llvm/lib/Target/RISCV/RISCVScheduleV.td b/llvm/lib/Target/RISCV/RISCVScheduleV.td
index 7af7716c96b8564..2bad3b88867a975 100644
--- a/llvm/lib/Target/RISCV/RISCVScheduleV.td
+++ b/llvm/lib/Target/RISCV/RISCVScheduleV.td
@@ -62,6 +62,45 @@ multiclass LMULSEWWriteResMXSEW<string name, list<ProcResourceKind> resources,
def : WriteRes<!cast<SchedWrite>(name # "_WorstCase"), resources>;
}
+// Define a SchedAlias for the SchedWrite associated with (name, mx) whose
+// behavior is aliased to a Variant. The Variant has Latency predLad and
+// ReleaseAtCycles predCycles if the SchedPredicate Pred is true, otherwise has
+// Latency noPredLat and ReleaseAtCycles noPredCycles. The WorstCase SchedWrite
+// is created similiarly if IsWorstCase is true. The prefix is used to uniquely
+// define SchedWriteRes and Variant resources.
+multiclass LMULWriteResMXVariant<string name, SchedPredicate Pred,
+ list<ProcResourceKind> resources,
+ int predLat, list<int> predCycles,
+ int noPredLat, list<int> noPredCycles,
+ string mx, bit IsWorstCase, string prefix> {
+ defvar nameMX = prefix # name # "_" # mx;
+
+ // Define the different behaviors
+ def nameMX # "_Pred" : SchedWriteRes<resources>
+ { let Latency = predLat; let ReleaseAtCycles = predCycles; }
+ def nameMX # "_NoPred" : SchedWriteRes<resources>
+ { let Latency = noPredLat; let ReleaseAtCycles = noPredCycles; }
+
+ // Tie behavior to predicate
+ def nameMX # "_Variant" : SchedWriteVariant<[
+ SchedVar<Pred, [!cast<SchedWriteRes>(nameMX # "_Pred")]>,
+ SchedVar<NoSchedPred, [!cast<SchedWriteRes>(nameMX # "_NoPred")]>
+ ]>;
+ def : SchedAlias<
+ !cast<SchedReadWrite>(name # "_" # mx),
+ !cast<SchedReadWrite>(nameMX # "_Variant")>;
+
+ if IsWorstCase then {
+ def prefix # name # "_WorstCase_Variant" : SchedWriteVariant<[
+ SchedVar<Pred, [!cast<SchedWriteRes>(nameMX # "_Pred")]>,
+ SchedVar<NoSchedPred, [!cast<SchedWriteRes>(nameMX # "_NoPred")]>
+ ]>;
+ def : SchedAlias<
+ !cast<SchedReadWrite>(name # "_WorstCase"),
+ !cast<SchedReadWrite>(prefix # name # "_WorstCase_Variant")>;
+ }
+}
+
// Define multiclasses to define SchedWrite, SchedRead, WriteRes, and
// ReadAdvance for each (name, LMUL) pair and for each LMUL in each of the
// SchedMxList variants above. Each multiclass is responsible for defining
|
defvar nameMX = prefix # name # "_" # mx; | ||
|
||
// Define the different behaviors | ||
def nameMX # "_Pred" : SchedWriteRes<resources> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am all ears if anyone has an idea to remove the use of prefix
. The problem is that if we removed it and there is another processor who wants to create a LMULWriteResMXVariant for the same name
and mx
, then we will be creating two records with the same name, which is not allowed.
But using it feels awkward because we would like it to be in the namespace of the caller of LMULWriteResMXVariant. We cant give a name like defm "SiFive7" : LMULWriteResMXVariant
because then it will suffix the SchedAlias with that, and we don't want that.
I tried to use defvar Pred = SchedWriteRes<resources>;
but that leads to error: Record
anonymous_53386', field SchedModel' does not have a def initializer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with the Name being applied to the SchedAlias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean. There is no name on SchedAlias. It is anonymous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because then it will suffix the SchedAlias with that, and we don't want that.
Why don't we want that? Or why is it a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of SchedAlias
doesn't matter I think, we can just use the form defm "SiFive7" : LMULWriteResMXVariant
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with the Name being applied to the SchedAlias?
Say we had something like this:
defm "SiFive7" : LMULWriteResMXVariant<"WriteVLDS8", VLDSX0Pred, [SiFive7VL],
Then when we define a SchedWriteRes using LMULWriteResMXVariant, then it also gets "SiFIve7" prepended to it, creating something like "SiFive7VLDS_M1_Pred". Then to create the variant, we must have that prefix:
// Tie behavior to predicate
def nameMX # "_Variant" : SchedWriteVariant<[
- SchedVar<Pred, [!cast<SchedWriteRes>(nameMX # "_Pred")]>,
- SchedVar<NoSchedPred, [!cast<SchedWriteRes>(nameMX # "_NoPred")]>
+ SchedVar<Pred, [!cast<SchedWriteRes>("SiFive7" # nameMX # "_Pred")]>,
+ SchedVar<NoSchedPred, [!cast<SchedWriteRes>("SiFive7" # nameMX # "_NoPred")]>
]>;
def : SchedAlias<
- !cast<SchedReadWrite>(name # "_" # mx),
- !cast<SchedReadWrite>(nameMX # "_Variant")>;
+ !cast<SchedReadWrite>(nameMX),
+ !cast<SchedReadWrite>("SiFive7" # nameMX # "_Variant")>;
The problem here is that we need to know the prefix anyway, to create the Variant and SchedAlias, so specifying it in the defm
achieves absolutely nothing.
The only alternative I can think of is for us to use something like LMULSchedWritesVariant
similiar to LMULSchedWrites
to be used in conjunction with LMULWriteResVariant
. However, I don't think this is the right approach either because this would force all subtargets to use a variant in those places, even if they didn't need to use a variant. I think the current approach may be the best approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use NAME
to get the name of the defm that was used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some llvm-mca tests?
defvar nameMX = prefix # name # "_" # mx; | ||
|
||
// Define the different behaviors | ||
def nameMX # "_Pred" : SchedWriteRes<resources> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of SchedAlias
doesn't matter I think, we can just use the form defm "SiFive7" : LMULWriteResMXVariant
.
I have added them, but llvm-mca does not know how to give a correct report yet. I have confirmed that this change does work as expected on some internal benchmarks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with suggestions.
Co-authored-by: Wang Pengcheng <wangpengcheng.pp@bytedance.com>
When stride is x0, a strided load should behave like a unit stride load, which uses the VLDE sched class.