[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

lukel97 · 2024-02-02T15:45:10Z

This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.

Note

The tests have been included in a separate commit in this PR so reviewers can view the diff.

This handles two cases where we can work out some known-zero bits for ISD::STEP_VECTOR. The first case handles when we know the low bits are zero because the step amount is a power of two. This is taken from https://reviews.llvm.org/D128159, and even though the original patch didn't end up landing this case due to it not having any test difference, I've included it here for completeness's sake. The second case handles the case when we have an upper bound on vscale_range. We can use this to work out the upper bound on the number of elements, and thus what the maximum step will be. From the maximum step we then know which hi bits are zero. On its own, computing the known hi bits results in some small improvements for RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping to be able to use this later to reduce the LMUL in index calculations for vrgather/indexed accesses. Co-authored-by: Philip Reames <preames@rivosinc.com>

llvmbot · 2024-02-02T15:45:40Z

@llvm/pr-subscribers-llvm-selectiondag

Author: Luke Lau (lukel97)

Changes

This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.

> [!NOTE]
> The tests have been included in a separate commit in this PR so reviewers can view the diff.

Full diff: https://github.com/llvm/llvm-project/pull/80452.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+13)
(modified) llvm/test/CodeGen/RISCV/rvv/stepvector.ll (+26)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..b20230ff96f4e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3110,6 +3110,19 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
     }
     break;
   }
+  case ISD::STEP_VECTOR: {
+    const APInt &Step = Op.getConstantOperandAPInt(0);
+
+    if (Step.isPowerOf2())
+      Known.Zero.setLowBits(Step.logBase2());
+
+    const Function &F = getMachineFunction().getFunction();
+    const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *
+                             Op.getValueType().getVectorMinNumElements();
+    const APInt MaxValue = (MaxNumElts - 1) * Step;
+    Known.Zero.setHighBits(MaxValue.countl_zero());
+    break;
+  }
   case ISD::BUILD_VECTOR:
     assert(!Op.getValueType().isScalableVector());
     // Collect the known bits that are shared by every demanded vector element.
diff --git a/llvm/test/CodeGen/RISCV/rvv/stepvector.ll b/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
index 6ce307146be19..d18b45105cff7 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
@@ -733,3 +733,29 @@ entry:
   %3 = shl <vscale x 16 x i64> %2, %1
   ret <vscale x 16 x i64> %3
 }
+
+; maximum step is 4 * 2 = 8, so maximum step value is 7, so hi 61 bits are known
+; zero
+define <vscale x 2 x i64> @hi_bits_known_zero() vscale_range(2, 4) {
+; CHECK-LABEL: hi_bits_known_zero:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    ret
+  %step = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %and = and <vscale x 2 x i64> %step, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 u0xfffffffffffffff8, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  ret <vscale x 2 x i64> %and
+}
+
+; step values are multiple of 8, so lo 3 bits are known zero
+define <vscale x 2 x i64> @lo_bits_known_zero() {
+; CHECK-LABEL: lo_bits_known_zero:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    ret
+  %step = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %step.mul = mul <vscale x 2 x i64> %step, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 8, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  %and = and <vscale x 2 x i64> %step.mul, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 7, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  ret <vscale x 2 x i64> %and
+}

topperc · 2024-02-02T17:26:02Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+      Known.Zero.setLowBits(Step.logBase2());
+
+    const Function &F = getMachineFunction().getFunction();
+    const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *


Do we need to check for overflow on these multiplies?

Yeah, I did a quick check that * does indeed perform wrapping. But forgot to actually bail if it overflows.

I think in practice, the overflow will never happen for current targets. The maximum value of vscale for AArch64 and RISCV is quite small and the maximum size of the vector isn't likely to be much larger than a m8 in practice. Should definitely guard the case properly for future proofing though.

I think it's the multiplicand that's most likely to cause overflow though. On RVV strided ops can have a 2^XLEN - 1 stride max. But as you say, I hope we won't see this in practice.

An SEW=8 LMUL=8 vid.v overflows on a VLEN=512 core like sifive-x280. That's 512 elements. so the step vector produces [0, ..., 255, 0, ..., 255].

topperc · 2024-02-02T18:17:49Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+      Known.Zero.setLowBits(Step.logBase2());
+
+    const Function &F = getMachineFunction().getFunction();
+    const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *


The element count multiply can also overflow.

I think the min number of elements itself can also overflow, so I've added a check that it fits into the bitwidth

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

…h APInt op

preames

LGTM

topperc · 2024-02-07T20:42:26Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+
+    const Function &F = getMachineFunction().getFunction();
+
+    if (!isUIntN(BitWidth, Op.getValueType().getVectorMinNumElements()))


I think this might be off by 1. Only Op.getValueType().getVectorMinNumElements() - 1 needs to fit in the bitwidth.

Err. nevermind we need it to fit to load it into the APInt.

topperc · 2024-02-07T20:46:40Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

+        APInt(BitWidth, Op.getValueType().getVectorMinNumElements());
+
+    bool Overflow;
+    const APInt MaxNumElts = getVScaleRange(&F, BitWidth)


Are fixed vector ISD::STEP_VECTOR allowed?

I don't believe we construct them. SDAGBuilder doesn't, so unless we have something somewhere which combines to them, no.

However, good point. Unless we have an assert to that effect somewhere, we should handle them (via a bail out) here.

Never mind, we have the assert in SelectionDAG::getNode

assert(VT.isScalableVector() && "STEP_VECTOR can only be used with scalable types"); assert(OpOpcode == ISD::TargetConstant && VT.getVectorElementType() == N1.getValueType() && "Unexpected step operand");

topperc

LGTM

lukel97 and others added 2 commits February 2, 2024 22:26

Add tests where we should be able to know the lo or hi bits are zero

ff14dd1

llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Feb 2, 2024

lukel97 requested review from preames, RKSimon and topperc February 2, 2024 15:45

topperc reviewed Feb 2, 2024

View reviewed changes

Check for overflow

40012cf

topperc reviewed Feb 2, 2024

View reviewed changes

Check for overflow in the max num of elements too

d472126

topperc reviewed Feb 2, 2024

View reviewed changes

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp Show resolved Hide resolved

lukel97 added 2 commits February 3, 2024 09:40

Check the minimum num of elts doesn't overflow either.

d35a5a2

Overflow flag actually gets cleared every call, so check it after eac…

32daac9

…h APInt op

preames approved these changes Feb 7, 2024

View reviewed changes

topperc reviewed Feb 7, 2024

View reviewed changes

topperc approved these changes Feb 7, 2024

View reviewed changes

lukel97 merged commit ece66db into llvm:main Feb 8, 2024
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

lukel97 commented Feb 2, 2024

llvmbot commented Feb 2, 2024

topperc Feb 2, 2024

lukel97 Feb 2, 2024

preames Feb 2, 2024

lukel97 Feb 2, 2024

topperc Feb 2, 2024 •

edited

topperc Feb 2, 2024

lukel97 Feb 3, 2024

preames left a comment

topperc Feb 7, 2024

topperc Feb 7, 2024

topperc Feb 7, 2024

preames Feb 7, 2024

preames Feb 7, 2024

topperc left a comment


		const Function &F = getMachineFunction().getFunction();

		if (!isUIntN(BitWidth, Op.getValueType().getVectorMinNumElements()))

[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

Conversation

lukel97 commented Feb 2, 2024

llvmbot commented Feb 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

topperc Feb 2, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

preames left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

topperc left a comment

Choose a reason for hiding this comment

topperc Feb 2, 2024 •

edited