[RISCV] Minimally modify incoming state in transferBefore #72352
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

transferBefore currently takes an incoming state and an instruction, computes the new state needed for the instruction, and then modifies that new state to be more similar to the incoming state.

This patch reverses the approach by instead taking the incoming state and modifying only the bits that are demanded by the instruction.

I find this makes things slightly easier to reason about. It also allows us to use the x0, x0 form in more places, in particular with vmv.x.s, since we're no longer relying on isScalarInsertInstr but instead reusing the logic in getDemanded.

I haven't had a chance to check, but hopefully this might also address some of the regressions seen in #71501.

Patch is 100.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72352.diff

13 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
index 999ff9920716e87..87e595ab20cf83e 100644
--- a/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
@@ -479,6 +479,8 @@ class VSETVLIInfo {
unsigned getSEW() const { return SEW; }
RISCVII::VLMUL getVLMUL() const { return VLMul; }
+ bool getTailAgnostic() const { return TailAgnostic; }
+ bool getMaskAgnostic() const { return MaskAgnostic; }
bool hasNonZeroAVL(const MachineRegisterInfo &MRI) const {
if (hasAVLImm())
@@ -1013,73 +1015,76 @@ bool RISCVInsertVSETVLI::needVSETVLI(const MachineInstr &MI,
return true;
}
-// Given an incoming state reaching MI, modifies that state so that it is minimally
-// compatible with MI. The resulting state is guaranteed to be semantically legal
-// for MI, but may not be the state requested by MI.
+// Given an incoming state reaching MI, minimally modifies that state so that it
+// is compatible with MI. The resulting state is guaranteed to be semantically
+// legal for MI, but may not be the state requested by MI.
void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info,
const MachineInstr &MI) const {
uint64_t TSFlags = MI.getDesc().TSFlags;
if (!RISCVII::hasSEWOp(TSFlags))
return;
- const VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
+ VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
if (Info.isValid() && !needVSETVLI(MI, NewInfo, Info))
return;
- const VSETVLIInfo PrevInfo = Info;
- Info = NewInfo;
+ if (Info.hasSEWLMULRatioOnly() || !Info.isValid() || Info.isUnknown())
+ Info = NewInfo;
- if (!RISCVII::hasVLOp(TSFlags))
- return;
+ DemandedFields Demanded = getDemanded(MI, MRI, ST);
// If we don't use LMUL or the SEW/LMUL ratio, then adjust LMUL so that we
// maintain the SEW/LMUL ratio. This allows us to eliminate VL toggles in more
// places.
- DemandedFields Demanded = getDemanded(MI, MRI, ST);
if (!Demanded.LMUL && !Demanded.SEWLMULRatio && Info.isValid() &&
- PrevInfo.isValid() && !Info.isUnknown() && !PrevInfo.isUnknown()) {
- if (auto NewVLMul = RISCVVType::getSameRatioLMUL(
- PrevInfo.getSEW(), PrevInfo.getVLMUL(), Info.getSEW()))
- Info.setVLMul(*NewVLMul);
- }
-
- // For vmv.s.x and vfmv.s.f, there are only two behaviors, VL = 0 and
- // VL > 0. We can discard the user requested AVL and just use the last
- // one if we can prove it equally zero. This removes a vsetvli entirely
- // if the types match or allows use of cheaper avl preserving variant
- // if VLMAX doesn't change. If VLMAX might change, we couldn't use
- // the 'vsetvli x0, x0, vtype" variant, so we avoid the transform to
- // prevent extending live range of an avl register operand.
- // TODO: We can probably relax this for immediates.
- if (isScalarInsertInstr(MI) && PrevInfo.isValid() &&
- PrevInfo.hasEquallyZeroAVL(Info, *MRI) &&
- Info.hasSameVLMAX(PrevInfo)) {
- if (PrevInfo.hasAVLImm())
- Info.setAVLImm(PrevInfo.getAVLImm());
- else
- Info.setAVLReg(PrevInfo.getAVLReg());
- return;
+ !Info.isUnknown() && !Info.hasSEWLMULRatioOnly()) {
+ if (auto SameRatioLMUL = RISCVVType::getSameRatioLMUL(
+ Info.getSEW(), Info.getVLMUL(), NewInfo.getSEW())) {
+ NewInfo.setVLMul(*SameRatioLMUL);
+ Demanded.LMUL = true;
+ }
}
- // If AVL is defined by a vsetvli with the same VLMAX, we can
- // replace the AVL operand with the AVL of the defining vsetvli.
- // We avoid general register AVLs to avoid extending live ranges
- // without being sure we can kill the original source reg entirely.
- if (!Info.hasAVLReg() || !Info.getAVLReg().isVirtual())
- return;
- MachineInstr *DefMI = MRI->getVRegDef(Info.getAVLReg());
- if (!DefMI || !isVectorConfigInstr(*DefMI))
- return;
+ // If AVL is defined by a vsetvli with the same VLMAX, we can replace the AVL
+ // operand with the AVL of the defining vsetvli. We avoid general register
+ // AVLs to avoid extending live ranges without being sure we can kill the
+ // original source reg entirely.
+ if (RISCVII::hasVLOp(TSFlags) && NewInfo.hasAVLReg() &&
+ NewInfo.getAVLReg().isVirtual()) {
+ MachineInstr *DefMI = MRI->getVRegDef(NewInfo.getAVLReg());
+ if (DefMI && isVectorConfigInstr(*DefMI)) {
+ VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
+ if (DefInfo.hasSameVLMAX(NewInfo) &&
+ (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) {
+ if (DefInfo.hasAVLImm())
+ NewInfo.setAVLImm(DefInfo.getAVLImm());
+ else
+ NewInfo.setAVLReg(DefInfo.getAVLReg());
+ }
+ }
+ }
- VSETVLIInfo DefInfo = getInfoForVSETVLI(*DefMI);
- if (DefInfo.hasSameVLMAX(Info) &&
- (DefInfo.hasAVLImm() || DefInfo.getAVLReg() == RISCV::X0)) {
- if (DefInfo.hasAVLImm())
- Info.setAVLImm(DefInfo.getAVLImm());
+ // If MI only demands that VL has the same zeroness, we only need to set the
+ // AVL if the zeroness differs, or if VLMAX changes (since that prevents us
+ // from using vsetvli x0, x0).
+ bool CanUseX0X0Form =
+ Info.hasEquallyZeroAVL(NewInfo, *MRI) && Info.hasSameVLMAX(NewInfo);
+ if (Demanded.VLAny || (Demanded.VLZeroness && !CanUseX0X0Form)) {
+ if (NewInfo.hasAVLImm())
+ Info.setAVLImm(NewInfo.getAVLImm());
else
- Info.setAVLReg(DefInfo.getAVLReg());
- return;
- }
+ Info.setAVLReg(NewInfo.getAVLReg());
+ }
+
+ Info.setVTYPE(
+ ((Demanded.LMUL || Demanded.SEWLMULRatio) ? NewInfo : Info).getVLMUL(),
+ ((Demanded.SEW || Demanded.SEWLMULRatio) ? NewInfo : Info).getSEW(),
+ // Prefer tail/mask agnostic since it can be relaxed to undisturbed later
+ // if needed.
+ (Demanded.TailPolicy ? NewInfo : Info).getTailAgnostic() ||
+ NewInfo.getTailAgnostic(),
+ (Demanded.MaskPolicy ? NewInfo : Info).getMaskAgnostic() ||
+ NewInfo.getMaskAgnostic());
}
// Given a state with which we evaluated MI (see transferBefore above for why
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
index 9d689c732d7999f..7ad01ceea33b154 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll
@@ -106,7 +106,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
; RV32-NEXT: vsetivli zero, 16, e8, m1, ta, ma
; RV32-NEXT: vle8.v v8, (a0)
; RV32-NEXT: vmseq.vi v8, v8, 0
-; RV32-NEXT: vsetivli zero, 1, e16, mf4, ta, ma
+; RV32-NEXT: vsetvli zero, zero, e16, m2, ta, ma
; RV32-NEXT: vmv.x.s a0, v8
; RV32-NEXT: srl a0, a0, a1
; RV32-NEXT: andi a0, a0, 1
@@ -117,7 +117,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
; RV64-NEXT: vsetivli zero, 16, e8, m1, ta, ma
; RV64-NEXT: vle8.v v8, (a0)
; RV64-NEXT: vmseq.vi v8, v8, 0
-; RV64-NEXT: vsetivli zero, 1, e16, mf4, ta, ma
+; RV64-NEXT: vsetvli zero, zero, e16, m2, ta, ma
; RV64-NEXT: vmv.x.s a0, v8
; RV64-NEXT: srl a0, a0, a1
; RV64-NEXT: andi a0, a0, 1
@@ -128,7 +128,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
; RV32ZBS-NEXT: vsetivli zero, 16, e8, m1, ta, ma
; RV32ZBS-NEXT: vle8.v v8, (a0)
; RV32ZBS-NEXT: vmseq.vi v8, v8, 0
-; RV32ZBS-NEXT: vsetivli zero, 1, e16, mf4, ta, ma
+; RV32ZBS-NEXT: vsetvli zero, zero, e16, m2, ta, ma
; RV32ZBS-NEXT: vmv.x.s a0, v8
; RV32ZBS-NEXT: bext a0, a0, a1
; RV32ZBS-NEXT: ret
@@ -138,7 +138,7 @@ define i1 @extractelt_v16i1(ptr %x, i64 %idx) nounwind {
; RV64ZBS-NEXT: vsetivli zero, 16, e8, m1, ta, ma
; RV64ZBS-NEXT: vle8.v v8, (a0)
; RV64ZBS-NEXT: vmseq.vi v8, v8, 0
-; RV64ZBS-NEXT: vsetivli zero, 1, e16, mf4, ta, ma
+; RV64ZBS-NEXT: vsetvli zero, zero, e16, m2, ta, ma
; RV64ZBS-NEXT: vmv.x.s a0, v8
; RV64ZBS-NEXT: bext a0, a0, a1
; RV64ZBS-NEXT: ret
@@ -155,7 +155,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
; RV32-NEXT: vsetvli zero, a2, e8, m2, ta, ma
; RV32-NEXT: vle8.v v8, (a0)
; RV32-NEXT: vmseq.vi v10, v8, 0
-; RV32-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV32-NEXT: vsetvli zero, zero, e32, m8, ta, ma
; RV32-NEXT: vmv.x.s a0, v10
; RV32-NEXT: srl a0, a0, a1
; RV32-NEXT: andi a0, a0, 1
@@ -167,7 +167,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
; RV64-NEXT: vsetvli zero, a2, e8, m2, ta, ma
; RV64-NEXT: vle8.v v8, (a0)
; RV64-NEXT: vmseq.vi v10, v8, 0
-; RV64-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV64-NEXT: vsetvli zero, zero, e32, m8, ta, ma
; RV64-NEXT: vmv.x.s a0, v10
; RV64-NEXT: srl a0, a0, a1
; RV64-NEXT: andi a0, a0, 1
@@ -179,7 +179,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
; RV32ZBS-NEXT: vsetvli zero, a2, e8, m2, ta, ma
; RV32ZBS-NEXT: vle8.v v8, (a0)
; RV32ZBS-NEXT: vmseq.vi v10, v8, 0
-; RV32ZBS-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV32ZBS-NEXT: vsetvli zero, zero, e32, m8, ta, ma
; RV32ZBS-NEXT: vmv.x.s a0, v10
; RV32ZBS-NEXT: bext a0, a0, a1
; RV32ZBS-NEXT: ret
@@ -190,7 +190,7 @@ define i1 @extractelt_v32i1(ptr %x, i64 %idx) nounwind {
; RV64ZBS-NEXT: vsetvli zero, a2, e8, m2, ta, ma
; RV64ZBS-NEXT: vle8.v v8, (a0)
; RV64ZBS-NEXT: vmseq.vi v10, v8, 0
-; RV64ZBS-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
+; RV64ZBS-NEXT: vsetvli zero, zero, e32, m8, ta, ma
; RV64ZBS-NEXT: vmv.x.s a0, v10
; RV64ZBS-NEXT: bext a0, a0, a1
; RV64ZBS-NEXT: ret
@@ -221,7 +221,7 @@ define i1 @extractelt_v64i1(ptr %x, i64 %idx) nounwind {
; RV64-NEXT: vsetvli zero, a2, e8, m4, ta, ma
; RV64-NEXT: vle8.v v8, (a0)
; RV64-NEXT: vmseq.vi v12, v8, 0
-; RV64-NEXT: vsetivli zero, 1, e64, m1, ta, ma
+; RV64-NEXT: vsetvli zero, a2, e64, m4, ta, ma
; RV64-NEXT: vmv.x.s a0, v12
; RV64-NEXT: srl a0, a0, a1
; RV64-NEXT: andi a0, a0, 1
@@ -246,7 +246,7 @@ define i1 @extractelt_v64i1(ptr %x, i64 %idx) nounwind {
; RV64ZBS-NEXT: vsetvli zero, a2, e8, m4, ta, ma
; RV64ZBS-NEXT: vle8.v v8, (a0)
; RV64ZBS-NEXT: vmseq.vi v12, v8, 0
-; RV64ZBS-NEXT: vsetivli zero, 1, e64, m1, ta, ma
+; RV64ZBS-NEXT: vsetvli zero, a2, e64, m4, ta, ma
; RV64ZBS-NEXT: vmv.x.s a0, v12
; RV64ZBS-NEXT: bext a0, a0, a1
; RV64ZBS-NEXT: ret
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
index 728cf18e1a77d8a..aec9babec33a095 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
@@ -543,9 +543,8 @@ define <4 x i8> @mgather_truemask_v4i8(<4 x ptr> %ptrs, <4 x i8> %passthru) {
; RV64ZVE32F-NEXT: .LBB9_6: # %cond.load1
; RV64ZVE32F-NEXT: ld a2, 8(a0)
; RV64ZVE32F-NEXT: lbu a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
-; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vsetivli zero, 2, e8, mf4, tu, ma
+; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vslideup.vi v8, v9, 1
; RV64ZVE32F-NEXT: andi a2, a1, 4
; RV64ZVE32F-NEXT: beqz a2, .LBB9_3
@@ -1267,9 +1266,8 @@ define <4 x i16> @mgather_truemask_v4i16(<4 x ptr> %ptrs, <4 x i16> %passthru) {
; RV64ZVE32F-NEXT: .LBB20_6: # %cond.load1
; RV64ZVE32F-NEXT: ld a2, 8(a0)
; RV64ZVE32F-NEXT: lh a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetvli zero, zero, e16, m1, ta, ma
-; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vsetivli zero, 2, e16, mf2, tu, ma
+; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vslideup.vi v8, v9, 1
; RV64ZVE32F-NEXT: andi a2, a1, 4
; RV64ZVE32F-NEXT: beqz a2, .LBB20_3
@@ -2351,9 +2349,8 @@ define <4 x i32> @mgather_truemask_v4i32(<4 x ptr> %ptrs, <4 x i32> %passthru) {
; RV64ZVE32F-NEXT: .LBB32_6: # %cond.load1
; RV64ZVE32F-NEXT: ld a2, 8(a0)
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
-; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vsetivli zero, 2, e32, m1, tu, ma
+; RV64ZVE32F-NEXT: vmv.s.x v9, a2
; RV64ZVE32F-NEXT: vslideup.vi v8, v9, 1
; RV64ZVE32F-NEXT: andi a2, a1, 4
; RV64ZVE32F-NEXT: beqz a2, .LBB32_3
@@ -3844,7 +3841,7 @@ define <4 x i64> @mgather_truemask_v4i64(<4 x ptr> %ptrs, <4 x i64> %passthru) {
; RV32ZVE32F-NEXT: vmv.x.s a6, v9
; RV32ZVE32F-NEXT: bnez zero, .LBB45_5
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a2, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -4265,12 +4262,12 @@ define <8 x i64> @mgather_baseidx_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8 x i1>
; RV32ZVE32F-NEXT: vsext.vf4 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB48_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -4539,12 +4536,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8
; RV32ZVE32F-NEXT: vsext.vf4 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB49_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -4815,12 +4812,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i8_v8i64(ptr %base, <8 x i8> %idxs, <8
; RV32ZVE32F-NEXT: vzext.vf4 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB50_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -5098,12 +5095,12 @@ define <8 x i64> @mgather_baseidx_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <8 x i
; RV32ZVE32F-NEXT: vsext.vf2 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB51_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -5373,12 +5370,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <
; RV32ZVE32F-NEXT: vsext.vf2 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB52_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -5650,12 +5647,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i16_v8i64(ptr %base, <8 x i16> %idxs, <
; RV32ZVE32F-NEXT: vzext.vf2 v10, v8
; RV32ZVE32F-NEXT: vsll.vi v8, v10, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB53_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -5934,12 +5931,12 @@ define <8 x i64> @mgather_baseidx_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <8 x i
; RV32ZVE32F-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vsll.vi v8, v8, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB54_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -6207,12 +6204,12 @@ define <8 x i64> @mgather_baseidx_sext_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <
; RV32ZVE32F-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vsll.vi v8, v8, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB55_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -6481,12 +6478,12 @@ define <8 x i64> @mgather_baseidx_zext_v8i32_v8i64(ptr %base, <8 x i32> %idxs, <
; RV32ZVE32F-NEXT: vsetivli zero, 8, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vsll.vi v8, v8, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB56_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a3, v8
; RV32ZVE32F-NEXT: lw a1, 4(a3)
; RV32ZVE32F-NEXT: lw a3, 0(a3)
@@ -6779,12 +6776,12 @@ define <8 x i64> @mgather_baseidx_v8i64(ptr %base, <8 x i64> %idxs, <8 x i1> %m,
; RV32ZVE32F-NEXT: vslide1down.vx v8, v8, a4
; RV32ZVE32F-NEXT: vsll.vi v8, v8, 3
; RV32ZVE32F-NEXT: vadd.vx v8, v8, a1
-; RV32ZVE32F-NEXT: vsetivli zero, 1, e8, mf4, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e8, mf2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s t0, v0
; RV32ZVE32F-NEXT: andi a1, t0, 1
; RV32ZVE32F-NEXT: beqz a1, .LBB57_7
; RV32ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; RV32ZVE32F-NEXT: vsetvli zero, zero, e32, m2, ta, ma
; RV32ZVE32F-NEXT: vmv.x.s a2, v8
; RV32ZVE32F-NEXT: lw a1, 4(a2)
; RV32ZVE32F-NEXT: lw a2, 0(a2)
@@ -7216,9 +7213,8 @@ define <4 x half> @mgather_truemask_v4f16(<4 x ptr> %ptrs, <4 x half> %passthru)
; RV64ZVE32F-NEXT: .LBB61_6: # %cond.load1
; RV64ZVE32F-NEXT: ld a2, 8(a0)
; RV64ZVE32F-NEXT: flh fa5, 0(a2)
-; RV64ZVE32...
[truncated]
This seems like a generally good idea; some style cleanup is needed before landing.
void RISCVInsertVSETVLI::transferBefore(VSETVLIInfo &Info,
                                        const MachineInstr &MI) const {
  uint64_t TSFlags = MI.getDesc().TSFlags;
  if (!RISCVII::hasSEWOp(TSFlags))
    return;

-  const VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
+  VSETVLIInfo NewInfo = computeInfoForInstr(MI, TSFlags, MRI);
Please leave NewInfo and PrevInfo unchanged. Use Info as your mutable state variable if desired. Mixing the meaning leads to subtle confusions below.
I gave this an attempt, but couldn't figure out a good way of doing so, mainly because the SEW/LMUL ratio "touch up" has to happen on NewInfo before we actually mutate Info: CanUseX0X0Form depends on that ratio.
We were still essentially mutating NewInfo before this patch, it was just a bit obfuscated because we swapped it into Info first. Any ideas here?
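To make the ordering point concrete, here is a minimal standalone sketch (a toy struct and a made-up VLEN of 64, not the real VSETVLIInfo or DemandedFields types) of why the ratio touch-up has to land on NewInfo before CanUseX0X0Form is computed:

// Toy model of the ordering issue only; none of these names are the real
// pass types. The point is that the LMUL "touch up" must be applied to
// NewInfo before the VLMAX comparison that gates the x0, x0 form runs.
#include <cstdio>

struct State {
  unsigned SEW;                                       // element width in bits
  unsigned LMUL;                                      // group multiplier (kept integral for simplicity)
  unsigned vlmax() const { return 64 * LMUL / SEW; }  // assume a toy VLEN of 64
};

int main() {
  State Info    = {/*SEW=*/8,  /*LMUL=*/1};  // incoming state
  State NewInfo = {/*SEW=*/16, /*LMUL=*/1};  // what MI nominally requests

  // Keep Info's SEW/LMUL ratio (8/1 = 8), so SEW=16 wants LMUL=2.
  NewInfo.LMUL = NewInfo.SEW * Info.LMUL / Info.SEW;

  // Only after that adjustment does the VLMAX check succeed, keeping the
  // cheaper x0, x0 form available.
  bool CanUseX0X0Form = Info.vlmax() == NewInfo.vlmax();
  std::printf("CanUseX0X0Form = %d\n", CanUseX0X0Form);  // prints 1
}

If the comparison ran before the touch-up, the toy VLMAX values would be 8 and 4 and the x0, x0 form would be rejected.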
The property we're explicitly looking for is whether or not MI only cares about VL zeroness and not VL itself, so we can just use DemandedFields for this. This should simplify an upcoming change in #72352
It's always guaranteed to be valid since we compute it ourselves from MI. This should simplify an upcoming change in #72352
Force-pushed from 534f7c3 to 9d9fcf6
There is an optimisation in transferBefore where if a VSETVLIInfo uses the AVL of a defining vsetvli, it uses that vsetvli's AVL provided VLMAX is the same. This patch moves it out of transferBefore and up into computeInfoForInstr to show how it isn't affected by the other optimisations in transferBefore, and to simplify the control flow by removing an early return. This should make llvm#72352 easier to reason about.
Force-pushed from 9d9fcf6 to 53a16ba
I have rebased this upon #73909 and added back an early exit that was responsible for a lot of test diffs. Ideally we should be able to view the diff of the four commits past #73909, but GitHub seems to error when I try this. Here's a link to the diff instead: lukel97/llvm-project@move-avl-reg-coalescing...lukel97:llvm-project:reverse-transferBefore
Force-pushed from 53a16ba to 731800f
// Given an incoming state reaching MI, modifies that state so that it is minimally
// compatible with MI. The resulting state is guaranteed to be semantically legal
// for MI, but may not be the state requested by MI.
static VSETVLIInfo adjustIncoming(VSETVLIInfo PrevInfo, VSETVLIInfo NewInfo,
This forward definition is solely to improve the diff.
LGTM
…FC (#73909) There is an optimisation in transferBefore where if a VSETVLIInfo uses the AVL of a defining vsetvli, it uses that vsetvli's AVL provided VLMAX is the same. This patch moves it out of transferBefore and up into computeInfoForInstr to show how it isn't affected by the other optimisations in transferBefore, and to simplify the control flow by removing an early return. This should make #72352 easier to reason about.
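As a loose standalone illustration of the rule that patch hoists (toy types and a hypothetical forwardAVL helper, not the actual computeInfoForInstr code):

// Rough sketch of the AVL-forwarding rule described above. VConfig and
// forwardAVL are stand-ins, not the real VSETVLIInfo API.
#include <optional>

struct VConfig {
  std::optional<unsigned> AVLImm;  // immediate AVL, if the vsetvli used one
  bool AVLIsX0 = false;            // AVL register was x0 (i.e. "give me VLMAX")
  unsigned VLMAX = 0;              // derived from SEW, LMUL and VLEN
};

// If the AVL of Use is produced by the vsetvli described by Def, and both
// configurations yield the same VLMAX, reuse Def's AVL. Only immediates and
// x0 are taken, so no virtual-register live range is extended.
static void forwardAVL(VConfig &Use, const VConfig &Def) {
  if (Def.VLMAX != Use.VLMAX)
    return;                        // different VLMAX: forwarding could change VL
  if (Def.AVLImm)
    Use.AVLImm = Def.AVLImm;
  else if (Def.AVLIsX0)
    Use.AVLIsX0 = true;
}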
transferBefore currently takes an incoming state and an instruction, computes the new state needed for the instruction, and then modifies that new state to be more similar to the incoming state. This patch reverses the approach by instead taking the incoming state and modifying only the bits that are demanded by the instruction. I find this makes things slightly easier to reason about. It also allows us to use the x0, x0 form in more places, in particular with vmv.x.s, since we're no longer relying on isScalarInsertInstr but instead reusing the logic in getDemanded. I haven't had a chance to check, but hopefully this might also address some of the regressions seen in llvm#71501.
Force-pushed from 731800f to 365ebae
transferBefore currently takes an incoming state and an instruction, computes
the new state needed for the instruction, and then modifies that new state to
be more similar to the incoming state.
This patch reverses the approach by instead taking the incoming state and
modifying only the bits that are demanded by the instruction.
It is currently NFC, but later patches will rework adjustIncoming to bring back the improvements from earlier revisions of this PR.
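Read alongside the diff at the top of this page, the shape of the reworked transferBefore can be summarised by the following simplified sketch; the struct fields and the CanUseX0X0Form flag are stand-ins for the real VSETVLIInfo/DemandedFields interfaces, not the pass itself.

// Simplified stand-in types; the real pass works on VSETVLIInfo and
// DemandedFields. This only mirrors the field-by-field merging shape.
struct Demanded {
  bool VLAny, VLZeroness, SEW, LMUL, SEWLMULRatio, TailPolicy, MaskPolicy;
};
struct VState {
  unsigned AVL, SEW, LMUL;
  bool TailAgnostic, MaskAgnostic;
};

// Merge into the incoming state (Info) only the fields that MI demands,
// taking the demanded values from the state MI requested (NewInfo).
void transferBefore(VState &Info, const VState &NewInfo, const Demanded &D,
                    bool CanUseX0X0Form) {
  // AVL is only overwritten if MI cares about the exact VL, or about VL
  // zeroness when the x0, x0 form isn't available.
  if (D.VLAny || (D.VLZeroness && !CanUseX0X0Form))
    Info.AVL = NewInfo.AVL;

  // Each VTYPE field is taken from NewInfo only when demanded; undemanded
  // fields keep the incoming value so adjacent vsetvlis can be coalesced.
  Info.LMUL = (D.LMUL || D.SEWLMULRatio) ? NewInfo.LMUL : Info.LMUL;
  Info.SEW  = (D.SEW  || D.SEWLMULRatio) ? NewInfo.SEW  : Info.SEW;

  // Prefer tail/mask agnostic since it can be relaxed to undisturbed later.
  Info.TailAgnostic = (D.TailPolicy ? NewInfo.TailAgnostic : Info.TailAgnostic) ||
                      NewInfo.TailAgnostic;
  Info.MaskAgnostic = (D.MaskPolicy ? NewInfo.MaskAgnostic : Info.MaskAgnostic) ||
                      NewInfo.MaskAgnostic;
}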