Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats #65706

Merged
merged 14 commits into from
Sep 20, 2023

Conversation

michaelmaitland
Copy link
Contributor

of the scalar operation

VP Intrinsics whose vector operands are both splat values may be simplified into the scalar version of the operation and the result is splatted. If this simplification occurs, then it can lead to scalarization during CodeGen.

This issue is the intrinsic dual of #65072. This issue scalarizes non-legal types when the operations are VP Intrinsics.

@michaelmaitland michaelmaitland requested review from a team as code owners September 8, 2023 03:16
@michaelmaitland michaelmaitland changed the title [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats of scalar operand Sep 8, 2023
@michaelmaitland michaelmaitland changed the title [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats of scalar operand [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats Sep 8, 2023
Copy link
Contributor Author

@michaelmaitland michaelmaitland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pointed out a few cases that we may not want to do this optimization. Wondering if anyone has any feedback.

llvm/test/CodeGen/RISCV/rvv/vpbinops-scalarization.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/RISCV/rvv/vpbinops-scalarization.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/RISCV/rvv/vpbinops-scalarization.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/RISCV/rvv/vpbinops-scalarization.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/RISCV/rvv/vpbinops-scalarization.ll Outdated Show resolved Hide resolved
@lukel97
Copy link
Contributor

lukel97 commented Sep 8, 2023

@ChunyuLiao pointed out that the VectorCombine pass already scalarizes binary ops with splatted operands at the IR level: https://godbolt.org/z/MPvvTG5dT

Would it make sense to do this in VectorCombine::scalarizeBinopOrCmp? It seems to have some cost modelling in there which I'd imagine would be good to take advantage of

@michaelmaitland
Copy link
Contributor Author

Would it make sense to do this in VectorCombine::scalarizeBinopOrCmp? It seems to have some cost modelling in there which I'd imagine would be good to take advantage of

It isn't quite similar enough to fit right into VectorCombine::scalarizeBinopOrCmp, but I've put it in right around that same area. Having the cost model from VectorCombine there is a great idea.

// is a poison value. For now, only do this simplification if all lanes
// are active.
// TODO: Relax the condition that all lanes are active by using insertelement
// on inactive lanes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this patch, but for reinserting the inactive lanes later maybe we could do something like

%x = scalar
%v = splat
%res = @llvm.vp.merge.v16i32(%mask, %v, poison, %evl)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. I will work on this after this patch lands.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Vectorize/VectorCombine.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@lukel97 lukel97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. Just a note, not related to your patch, but about a missed scalarization in the existing non-vp scalarization: It only catches binops where both operands aren't constant, e.g. like this:

define <vscale x 1 x i64> @f(i64 %x, i64 %y) {
  %head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i32 0
  %splat.x = shufflevector <vscale x 1 x i64> %head.x, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  %head.y = insertelement <vscale x 1 x i64> poison, i64 %y, i32 0
  %splat.y = shufflevector <vscale x 1 x i64> %head.y, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  %v = add <vscale x 1 x i64> %splat.x, %splat.y
  ret <vscale x 1 x i64> %v
}

Because this happens to get transformed by instcombine into:

define <vscale x 1 x i64> @f(i64 %x, i64 %y) #0 {
  %head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i64 0
  %head.y = insertelement <vscale x 1 x i64> poison, i64 %y, i64 0
  %1 = add <vscale x 1 x i64> %head.x, %head.y
  %v = shufflevector <vscale x 1 x i64> %1, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  ret <vscale x 1 x i64> %v
}

And scalarizeBinopOrCmp only looks for insertelements.

But if one of the operands of the binop is a constant:

define <vscale x 1 x i64> @g(i64 %x) {
  %head.x = insertelement <vscale x 1 x i64> poison, i64 %x, i64 0
  %splat.x = shufflevector <vscale x 1 x i64> %head.x, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  %splat.y = shufflevector <vscale x 1 x i64> insertelement(<vscale x 1 x i64> poison, i64 42, i32 0), <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
  %v = add <vscale x 1 x i64> %splat.x, %splat.y
  ret <vscale x 1 x i64> %v
}

Then the above transformation doesn't happen, and it stays in the shufflevector %x, poison, zeroinitializer form. Which scalarizeBinopOrCmp doesn't handle.

@lukel97
Copy link
Contributor

lukel97 commented Sep 13, 2023

Thank you very much for your review on this. You were very helpful in improving this patch and I learned a lot about a space I was not too familiar with previously.

No problem, I learnt lots about UB too :)

@nikic nikic changed the title [InstCombine][RISCV] Convert VPIntrinsics with splat operands to splats [VectorCombine][RISCV] Convert VPIntrinsics with splat operands to splats Sep 14, 2023
ScalarIntrID = VPI.getFunctionalIntrinsicID();
if (!ScalarIntrID)
return false;
ScalarIsIntr = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this flag? Can we check if ScalarIntrID has a value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

bool MustHaveNonZeroVL =
IntrID == Intrinsic::vp_sdiv || IntrID == Intrinsic::vp_udiv ||
IntrID == Intrinsic::vp_srem || IntrID == Intrinsic::vp_urem ||
IntrID == Intrinsic::vp_fdiv || IntrID == Intrinsic::vp_frem;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fp isn't an issue. only integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Do you mind explaining why?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FP division by 0 produces infinity or negative infinity unless than numerator is 0, then it's NaN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For integer division: The quotient of division by zero has all bits set, i.e. 2XLEN − 1 for unsigned division or −1 for signed division [Source, page 48] (https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf).

Why do we want to avoid doing integer division but okay with fp division?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not RISC-V specific code so the RISC-V spec does not apply. https://llvm.org/docs/LangRef.html#udiv-instruction "Division by zero is undefined behavior."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I see, my bad. I've updated to remove check for fp here.

IntrID == Intrinsic::vp_srem || IntrID == Intrinsic::vp_urem ||
IntrID == Intrinsic::vp_fdiv || IntrID == Intrinsic::vp_frem;

if ((MustHaveNonZeroVL && IsKnownNonZeroVL) || !MustHaveNonZeroVL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just !MustHaveNonZeroVL || IsKnownNonZeroVL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

ElementCount EC = cast<VectorType>(Op0->getType())->getElementCount();
Value *EVL = VPI.getArgOperand(3);
const DataLayout &DL = VPI.getModule()->getDataLayout();
bool IsKnownNonZeroVL = isKnownNonZero(EVL, DL, 0, &AC, &VPI, &DT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only call isKnownNonZero if we need it. It's expensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

of the scalar operation

VP Intrinsics whose vector operands are both splat values may be simplified
into the scalar version of the operation and the result is splatted. If
this simplification occurs, then it can lead to scalarization during CodeGen.

This issue is the intrinsic dual of llvm#65072. This issue scalarizes
non-legal types when the operations are VP Intrinsics.
…to splats of the scalar operation

Use getFunctionalIntrinsicID
…to splats of the scalar operation

Add zvfh and VEC-COMBINE-64/32
…to splats of the scalar operation

Respond to craigs comments
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…lvm#66190)

This adds a helper method to get the ID of the functionally equivalent
intrinsic, similar to the existing getFunctionalOpcodeForVP and
getConstrainedIntrinsicIDForVP methods.

Not sure if it's notable or not, but I can't find any existing uses of
VP_PROPERTY_FUNCTIONAL_INTRINSIC?

It could potentially be used in llvm#65706 to scalarize VP intrinsics.
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
VPIntrinsics with VP_PROPERTY_BINARYOP property should have the ability
to be queried with with VPBinOpIntrinsic::isVPBinOp, similiar to how
intrinsics with the VP_PROPERTY_REDUCTION property can be queried with
VPReductionIntrinsic::isVPReduction.

This will be used in llvm#65706. In that PR the usage of this class is
tested.
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaelmaitland michaelmaitland merged commit e0aaa19 into llvm:main Sep 20, 2023
2 checks passed
@michaelmaitland michaelmaitland deleted the vpbinops-scalarization branch September 20, 2023 22:27
michaelmaitland added a commit that referenced this pull request Sep 20, 2023
This directory was missing a lit.local.cfg which was causing some build
bots to fail when #65706 was comitted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants