Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SelectionDAG] Add computeKnownBits support for ISD::STEP_VECTOR #80452

Merged
merged 6 commits into from
Feb 8, 2024

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Feb 2, 2024

This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.

Note

The tests have been included in a separate commit in this PR so reviewers can view the diff.

lukel97 and others added 2 commits February 2, 2024 22:26
This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.

Co-authored-by: Philip Reames <preames@rivosinc.com>
@llvmbot llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Feb 2, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2024

@llvm/pr-subscribers-llvm-selectiondag

Author: Luke Lau (lukel97)

Changes

This handles two cases where we can work out some known-zero bits for
ISD::STEP_VECTOR.

The first case handles when we know the low bits are zero because the step
amount is a power of two. This is taken from https://reviews.llvm.org/D128159,
and even though the original patch didn't end up landing this case due to it
not having any test difference, I've included it here for completeness's sake.

The second case handles the case when we have an upper bound on vscale_range.
We can use this to work out the upper bound on the number of elements, and thus
what the maximum step will be. From the maximum step we then know which hi bits
are zero.

On its own, computing the known hi bits results in some small improvements for
RVV with -mrvv-vector-bits=zvl across the llvm-test-suite. However I'm hoping
to be able to use this later to reduce the LMUL in index calculations for
vrgather/indexed accesses.

> [!NOTE]
> The tests have been included in a separate commit in this PR so reviewers can view the diff.


Full diff: https://github.com/llvm/llvm-project/pull/80452.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp (+13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stepvector.ll (+26)
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 3c1343836187a..b20230ff96f4e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -3110,6 +3110,19 @@ KnownBits SelectionDAG::computeKnownBits(SDValue Op, const APInt &DemandedElts,
     }
     break;
   }
+  case ISD::STEP_VECTOR: {
+    const APInt &Step = Op.getConstantOperandAPInt(0);
+
+    if (Step.isPowerOf2())
+      Known.Zero.setLowBits(Step.logBase2());
+
+    const Function &F = getMachineFunction().getFunction();
+    const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *
+                             Op.getValueType().getVectorMinNumElements();
+    const APInt MaxValue = (MaxNumElts - 1) * Step;
+    Known.Zero.setHighBits(MaxValue.countl_zero());
+    break;
+  }
   case ISD::BUILD_VECTOR:
     assert(!Op.getValueType().isScalableVector());
     // Collect the known bits that are shared by every demanded vector element.
diff --git a/llvm/test/CodeGen/RISCV/rvv/stepvector.ll b/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
index 6ce307146be19..d18b45105cff7 100644
--- a/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/stepvector.ll
@@ -733,3 +733,29 @@ entry:
   %3 = shl <vscale x 16 x i64> %2, %1
   ret <vscale x 16 x i64> %3
 }
+
+; maximum step is 4 * 2 = 8, so maximum step value is 7, so hi 61 bits are known
+; zero
+define <vscale x 2 x i64> @hi_bits_known_zero() vscale_range(2, 4) {
+; CHECK-LABEL: hi_bits_known_zero:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    ret
+  %step = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %and = and <vscale x 2 x i64> %step, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 u0xfffffffffffffff8, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  ret <vscale x 2 x i64> %and
+}
+
+; step values are multiple of 8, so lo 3 bits are known zero
+define <vscale x 2 x i64> @lo_bits_known_zero() {
+; CHECK-LABEL: lo_bits_known_zero:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli a0, zero, e64, m2, ta, ma
+; CHECK-NEXT:    vmv.v.i v8, 0
+; CHECK-NEXT:    ret
+  %step = call <vscale x 2 x i64> @llvm.experimental.stepvector.nxv2i64()
+  %step.mul = mul <vscale x 2 x i64> %step, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 8, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  %and = and <vscale x 2 x i64> %step.mul, shufflevector(<vscale x 2 x i64> insertelement(<vscale x 2 x i64> poison, i64 7, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer)
+  ret <vscale x 2 x i64> %and
+}

Known.Zero.setLowBits(Step.logBase2());

const Function &F = getMachineFunction().getFunction();
const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check for overflow on these multiplies?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did a quick check that * does indeed perform wrapping. But forgot to actually bail if it overflows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in practice, the overflow will never happen for current targets. The maximum value of vscale for AArch64 and RISCV is quite small and the maximum size of the vector isn't likely to be much larger than a m8 in practice. Should definitely guard the case properly for future proofing though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's the multiplicand that's most likely to cause overflow though. On RVV strided ops can have a 2^XLEN - 1 stride max. But as you say, I hope we won't see this in practice.

Copy link
Collaborator

@topperc topperc Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An SEW=8 LMUL=8 vid.v overflows on a VLEN=512 core like sifive-x280. That's 512 elements. so the step vector produces [0, ..., 255, 0, ..., 255].

Known.Zero.setLowBits(Step.logBase2());

const Function &F = getMachineFunction().getFunction();
const APInt MaxNumElts = getVScaleRange(&F, BitWidth).getUnsignedMax() *
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The element count multiply can also overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the min number of elements itself can also overflow, so I've added a check that it fits into the bitwidth

Copy link
Collaborator

@preames preames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


const Function &F = getMachineFunction().getFunction();

if (!isUIntN(BitWidth, Op.getValueType().getVectorMinNumElements()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be off by 1. Only Op.getValueType().getVectorMinNumElements() - 1 needs to fit in the bitwidth.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err. nevermind we need it to fit to load it into the APInt.

APInt(BitWidth, Op.getValueType().getVectorMinNumElements());

bool Overflow;
const APInt MaxNumElts = getVScaleRange(&F, BitWidth)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are fixed vector ISD::STEP_VECTOR allowed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we construct them. SDAGBuilder doesn't, so unless we have something somewhere which combines to them, no.

However, good point. Unless we have an assert to that effect somewhere, we should handle them (via a bail out) here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, we have the assert in SelectionDAG::getNode

   assert(VT.isScalableVector() &&
           "STEP_VECTOR can only be used with scalable types");
    assert(OpOpcode == ISD::TargetConstant &&
           VT.getVectorElementType() == N1.getValueType() &&
           "Unexpected step operand");

Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lukel97 lukel97 merged commit ece66db into llvm:main Feb 8, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants